DOI: 10.1145/3510003.3510155
research-article

Log-based anomaly detection with deep learning: how far are we?

Published: 05 July 2022

ABSTRACT

Software-intensive systems produce logs for troubleshooting purposes. Recently, many deep learning models have been proposed to automatically detect system anomalies based on log data. These models typically claim very high detection accuracy. For example, most models report an F-measure greater than 0.9 on the commonly used HDFS dataset. To understand how far we are from solving the problem of log-based anomaly detection, we conduct an in-depth analysis of five state-of-the-art deep learning-based models for detecting system anomalies on four public log datasets. Our experiments focus on several aspects of model evaluation, including training data selection, data grouping, class distribution, data noise, and early detection ability. Our results show that all these aspects have a significant impact on the evaluation, and that the studied models do not always work well. The problem of log-based anomaly detection has not been solved yet. Based on our findings, we also suggest possible future work.
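
As a reminder of what the F-measure reported above summarizes, the following is a minimal sketch of how precision, recall, and F-measure can be computed for binary anomaly labels. The function name `f_measure` and the toy labels are hypothetical and are not taken from the paper or its datasets.

```python
def f_measure(y_true, y_pred):
    """Return (precision, recall, F-measure) for binary labels, where 1 = anomaly."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # anomalies correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal items wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # anomalies that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy labels for illustration only (not data from the paper).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]
precision, recall, f1 = f_measure(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f-measure={f1:.2f}")
```

Because the F-measure is computed only from the anomaly class counts, it can shift noticeably when the proportion of anomalies in the evaluation data changes, which is one reason class distribution matters when comparing reported scores.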

Published in

ICSE '22: Proceedings of the 44th International Conference on Software Engineering
May 2022
2508 pages
ISBN: 9781450392211
DOI: 10.1145/3510003

Copyright © 2022 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Overall Acceptance Rate: 276 of 1,856 submissions, 15%
