ABSTRACT
Software-intensive systems produce logs for troubleshooting purposes. Recently, many deep learning models have been proposed to automatically detect system anomalies based on log data. These models typically claim very high detection accuracy. For example, most models report an F-measure greater than 0.9 on the commonly-used HDFS dataset. To achieve a profound understanding of how far we are from solving the problem of log-based anomaly detection, in this paper, we conduct an in-depth analysis of five state-of-the-art deep learning-based models for detecting system anomalies on four public log datasets. Our experiments focus on several aspects of model evaluation, including training data selection, data grouping, class distribution, data noise, and early detection ability. Our results point out that all these aspects have significant impact on the evaluation, and that all the studied models do not always work well. The problem of log-based anomaly detection has not been solved yet. Based on our findings, we also suggest possible future work.
- 2021. Implementation of PLELog. Retrieved August 27, 2021 from https://github.com/YangLin-George/PLELogGoogle Scholar
- 2021. A large collection of system log datasets for AI-powered log analytics. Retrieved August 31, 2021 from https://github.com/logpai/loghubGoogle Scholar
- 2021. Log Anomaly Detection Toolkit. Retrieved August 27, 2021 from https://github.com/donglee-afar/logdeepGoogle Scholar
- 2021. A Pytorch implementation of DeepLog. Retrieved August 21, 2021 from https://github.com/wuyifan18/DeepLogGoogle Scholar
- 2021. A toolkit for automated log parsing. Retrieved August 31, 2021 from https://github.com/logpai/logparserGoogle Scholar
- Jakub Breier and Jana Branišová. 2015. Anomaly detection from log files using data mining techniques. In Information Science and Applications. Springer, 449--457.Google Scholar
- Mike Chen, Alice X Zheng, Jim Lloyd, Michael I Jordan, and Eric Brewer. 2004. Failure diagnosis using decision trees. In International Conference on Autonomic Computing, 2004. Proceedings. IEEE, 36--43.Google ScholarCross Ref
- Zhuangbin Chen, Jinyang Liu, Wenwei Gu, Yuxin Su, and Michael R Lyu. 2021. Experience Report: Deep Learning-based System Log Analysis for Anomaly Detection. arXiv preprint arXiv:2107.05908 (2021).Google Scholar
- Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014).Google Scholar
- Hetong Dai, Heng Li, Che Shao Chen, Weiyi Shang, and Tse-Hsun Chen. 2020. Logram: Efficient log parsing using n-gram dictionaries. IEEE Transactions on Software Engineering (2020).Google Scholar
- Min Du and Feifei Li. 2016. Spell: Streaming parsing of system event logs. In 2016 IEEE 16th International Conference on Data Mining (ICDM). IEEE, 859--864.Google ScholarCross Ref
- Min Du, Feifei Li, Guineng Zheng, and Vivek Srikumar. 2017. Deeplog: Anomaly detection and diagnosis from system logs through deep learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. 1285--1298.Google ScholarDigital Library
- Amir Farzad and T Aaron Gulliver. 2020. Unsupervised log message anomaly detection. ICT Express 6, 3 (2020), 229--237.Google ScholarCross Ref
- Norman E. Fenton and Niclas Ohlsson. 2000. Quantitative analysis of faults and failures in a complex software system. IEEE Transactions on Software engineering 26, 8 (2000), 797--814.Google ScholarDigital Library
- Haixuan Guo, Shuhan Yuan, and Xintao Wu. 2021. LogBERT: Log Anomaly Detection via BERT. arXiv preprint arXiv:2103.04475 (2021).Google Scholar
- Hossein Hamooni, Biplob Debnath, Jianwu Xu, Hui Zhang, Guofei Jiang, and Abdullah Mueen. 2016. Logmine: Fast pattern recognition for log analytics. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. 1573--1582.Google ScholarDigital Library
- Pinjia He, Jieming Zhu, Zibin Zheng, and Michael R Lyu. 2017. Drain: An online log parsing approach with fixed depth tree. In 2017 IEEE International Conference on Web Services (ICWS). IEEE, 33--40.Google ScholarCross Ref
- Shilin He, Pinjia He, Zhuangbin Chen, Tianyi Yang, Yuxin Su, and Michael R Lyu. 2020. A Survey on Automated Log Analysis for Reliability Engineering. arXiv preprint arXiv:2009.07237 (2020).Google Scholar
- Shilin He, Jieming Zhu, Pinjia He, and Michael R Lyu. 2016. Experience report: System log analysis for anomaly detection. In ISSRE 2016. IEEE, 207--218.Google ScholarCross Ref
- Shilin He, Jieming Zhu, Pinjia He, and Michael R Lyu. 2020. Loghub: a large collection of system log datasets towards automated log analytics. arXiv preprint arXiv:2008.06448 (2020).Google Scholar
- Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural computation 9, 8 (1997), 1735--1780.Google ScholarDigital Library
- Zhen Ming Jiang, Ahmed E Hassan, Parminder Flora, and Gilbert Hamann. 2008. Abstracting execution logs to execution events for enterprise applications (short paper). In 2008 The Eighth International Conference on Quality Software. IEEE, 181--186.Google ScholarDigital Library
- Armand Joulin, Edouard Grave, Piotr Bojanowski, Matthijs Douze, Hérve Jégou, and Tomas Mikolov. 2016. Fasttext. zip: Compressing text classification models. arXiv preprint arXiv:1612.03651 (2016).Google Scholar
- Suhas Kabinna, Cor-Paul Bezemer, Weiyi Shang, Mark D Syer, and Ahmed E Hassan. 2018. Examining the stability of logging statements. Empirical Software Engineering 23, 1 (2018), 290--333.Google ScholarDigital Library
- Yoon Kim. 2014. Convolutional Neural Networks for Sentence Classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Doha, Qatar.Google ScholarCross Ref
- Van-Hoang Le and Hongyu Zhang. 2021. Log-based Anomaly Detection Without Log Parsing. In 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE). 492--504.Google ScholarDigital Library
- Xiaoyun Li, Pengfei Chen, Linxiao Jing, Zilong He, and Guangba Yu. 2020. Swiss-Log: Robust and Unified Deep Learning Based Log Anomaly Detection for Diverse Faults. In ISSRE 2020. IEEE, 92--103.Google Scholar
- Qingwei Lin, Ken Hsieh, Yingnong Dang, Hongyu Zhang, Kaixin Sui, Yong Xu, Jian-Guang Lou, Chenggang Li, Youjiang Wu, Randolph Yao, et al. 2018. Predicting node failure in cloud service systems. In ESEC/FSE 2018. 480--490.Google Scholar
- Qingwei Lin, Hongyu Zhang, Jian-Guang Lou, Yu Zhang, and Xuewei Chen. 2016. Log Clustering Based Problem Identification for Online Service Systems. In Proceedings of the 38th International Conference on Software Engineering Companion (Austin, Texas) (ICSE '16). 102--111.Google ScholarDigital Library
- Fucheng Liu, Yu Wen, Dongxue Zhang, Xihe Jiang, Xinyu Xing, and Dan Meng. 2019. Log2vec: A heterogeneous graph embedding based approach for detecting cyber threats within enterprise. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security. 1777--1794.Google ScholarDigital Library
- Xu Liu, Weiyou Liu, Xiaoqiang Di, Jinqing Li, Binbin Cai, Weiwu Ren, and Huamin Yang. 2021. LogNADS: Network anomaly detection scheme based on semantic representation. Future Generation Computer Systems (2021).Google Scholar
- Jian-Guang Lou, Qiang Fu, Shengqi Yang, Ye Xu, and Jiang Li. 2010. Mining Invariants from Console Logs for System Problem Detection.. In USENIX Annual Technical Conference. 1--14.Google Scholar
- Siyang Lu, Xiang Wei, Yandong Li, and Liqiang Wang. 2018. Detecting anomaly in big data system logs using convolutional neural network. In 2018 IEEE 16th Intl Conf on Dependable, Autonomic and Secure Computing, 16th Intl Conf on Pervasive Intelligence and Computing, 4th Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC/PiCom/DataCom/CyberSciTech). IEEE, 151--158.Google Scholar
- Adetokunbo AO Makanju, A Nur Zincir-Heywood, and Evangelos E Milios. 2009. Clustering event logs using iterative partitioning. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. 1255--1264.Google ScholarDigital Library
- Leland McInnes, John Healy, and Steve Astels. 2017. hdbscan: Hierarchical density based clustering. Journal of Open Source Software 2, 11 (2017), 205.Google ScholarCross Ref
- Weibin Meng, Ying Liu, Yichen Zhu, Shenglin Zhang, Dan Pei, Yuqing Liu, Yihao Chen, Ruizhi Zhang, Shimin Tao, Pei Sun, et al. 2019. LogAnomaly: Unsupervised Detection of Sequential and Quantitative Anomalies in Unstructured Logs.. In IJCAI, Vol. 7. 4739--4745.Google Scholar
- Haibo Mi, Huaimin Wang, Yangfan Zhou, Michael Rung-Tsong Lyu, and Hua Cai. 2013. Toward fine-grained, unsupervised, scalable performance diagnosis for production cloud computing systems. IEEE Transactions on Parallel and Distributed Systems 24, 6 (2013), 1245--1255.Google ScholarDigital Library
- Meiyappan Nagappan and Mladen A Vouk. 2010. Abstracting log lines to log event types for mining software system logs. In 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010). IEEE, 114--117.Google ScholarCross Ref
- Sasho Nedelkoski, Jasmin Bogatinovski, Alexander Acker, Jorge Cardoso, and Odej Kao. 2020. Self-attentive classification-based anomaly detection in unstructured logs. arXiv preprint arXiv:2008.09340 (2020).Google Scholar
- Adam Oliner and Jon Stearley. 2007. What supercomputers say: A study of five system logs. In DSN 2007. IEEE, 575--584.Google ScholarDigital Library
- Jeffrey Pennington, Richard Socher, and Christopher D Manning. 2014. Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). 1532--1543.Google ScholarCross Ref
- Keiichi Shima. 2016. Length matters: Clustering system log messages using length of words. arXiv preprint arXiv:1611.03213 (2016).Google Scholar
- Marina Sokolova, Nathalie Japkowicz, and Stan Szpakowicz. 2006. Beyond accuracy, F-score and ROC: a family of discriminant measures for performance evaluation. In Australasian joint conference on artificial intelligence. Springer, 1015--1021.Google Scholar
- Liang Tang, Tao Li, and Chang-Shing Perng. 2011. LogSig: Generating system events from raw textual logs. In Proceedings of the 20th ACM international conference on Information and knowledge management. 785--794.Google ScholarDigital Library
- Risto Vaarandi and Mauno Pihelgas. 2015. Logcluster-a data clustering and pattern mining algorithm for event logs. In 2015 11th International conference on network and service management (CNSM). IEEE, 1--7.Google ScholarDigital Library
- Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. arXiv preprint arXiv:1706.03762 (2017).Google Scholar
- Wei Xu, Ling Huang, Armando Fox, David Patterson, and Michael I Jordan. 2009. Detecting large-scale system problems by mining console logs. In Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles. 117--132.Google ScholarDigital Library
- Lin Yang, Junjie Chen, Zan Wang, Weijing Wang, Jiajun Jiang, Xuyuan Dong, and Wenbin Zhang. 2021. Semi-supervised log-based anomaly detection via probabilistic label estimation. In ICSE 2021. IEEE, 1448--1460.Google ScholarDigital Library
- Jacob Yerushalmy. 1947. Statistical problems in assessing methods of medical diagnosis, with special reference to X-ray techniques. Public Health Reports (1896--1970) (1947), 1432--1449.Google Scholar
- Bo Zhang, Hongyu Zhang, Pablo Moscato, and Aozhong Zhang. 2020. Anomaly Detection via Mining Numerical Workflow Relations from Logs. In 2020 International Symposium on Reliable Distributed Systems (SRDS). IEEE, 195--204.Google ScholarCross Ref
- Hongyu Zhang. 2008. On the distribution of software faults. IEEE Transactions on Software Engineering 34, 2 (2008), 301--302.Google ScholarDigital Library
- Hongyu Zhang. 2009. An investigation of the relationships between lines of code and defects. In 2009 IEEE International Conference on Software Maintenance. IEEE, 274--283.Google ScholarCross Ref
- Hongyu Zhang and Xiuzhen Zhang. 2007. Comments on" data mining static code attributes to learn defect predictors". IEEE Transactions on Software Engineering 33, 9 (2007), 635--637.Google ScholarDigital Library
- Xu Zhang, Yong Xu, Qingwei Lin, Bo Qiao, Hongyu Zhang, Yingnong Dang, Chunyu Xie, Xinsheng Yang, Qian Cheng, Ze Li, et al. 2019. Robust log-based anomaly detection on unstable log data. In ESEC/FSE 2019. 807--817.Google Scholar
- Jieming Zhu, Shilin He, Jinyang Liu, Pinjia He, Qi Xie, Zibin Zheng, and Michael R Lyu. 2019. Tools and benchmarks for automated log parsing. In ICSE 2019. IEEE, 121--130.Google ScholarDigital Library
Index Terms
- Log-based anomaly detection with deep learning: how far are we?
Recommendations
Log-based anomaly detection without log parsing
ASE '21: Proceedings of the 36th IEEE/ACM International Conference on Automated Software EngineeringSoftware systems often record important runtime information in system logs for troubleshooting purposes. There have been many studies that use log data to construct machine learning models for detecting system anomalies. Through our empirical study, we ...
A Theoretical Framework for Understanding the Relationship Between Log Parsing and Anomaly Detection
Runtime VerificationAbstractLog-based anomaly detection identifies systems’ anomalous behaviors by analyzing system runtime information recorded in logs. While many approaches have been proposed, all of them have in common an essential pre-processing step called log parsing. ...
Robust log-based anomaly detection on unstable log data
ESEC/FSE 2019: Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software EngineeringLogs are widely used by large and complex software-intensive systems for troubleshooting. There have been a lot of studies on log-based anomaly detection. To detect the anomalies, the existing methods mainly construct a detection model using log event ...
Comments