research-article

Exathlon: a benchmark for explainable anomaly detection over time series

Published: 01 July 2021

Abstract

Access to high-quality data repositories and benchmarks has been instrumental in advancing the state of the art in many experimental research domains. While advanced analytics tasks over time series data have been gaining considerable attention, the lack of such community resources severely limits scientific progress. In this paper, we present Exathlon, the first comprehensive public benchmark for explainable anomaly detection over high-dimensional time series data. Exathlon has been systematically constructed from real data traces collected during repeated executions of large-scale stream processing jobs on an Apache Spark cluster. Some of these executions were intentionally disturbed by introducing instances of six different types of anomalous events (e.g., misbehaving inputs, resource contention, process failures). For each anomaly instance, ground truth labels are provided for both the root cause interval and the extended effect interval, supporting the development and evaluation of a wide range of anomaly detection (AD) and explanation discovery (ED) tasks. We demonstrate the practical utility of Exathlon's dataset, evaluation methodology, and end-to-end data science pipeline design through an experimental study with three state-of-the-art AD and ED techniques.
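Labeling both a root cause interval and an extended effect interval matters for evaluation: a detector that only fires after the disturbance has ended, while its effects still linger, can count as a hit if the effect period is scored and as a miss if it is not. The following minimal Python sketch shows one way such labels could be represented and scored. The `AnomalyLabel` record, its field names, the inclusive integer timestamps, and the simple "any-overlap" hit rate are all hypothetical illustrations. They are not Exathlon's actual file format or its evaluation methodology.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical: an inclusive [start, end] range of integer timestamps.
Interval = Tuple[int, int]


@dataclass(frozen=True)
class AnomalyLabel:
    """Hypothetical ground-truth record for one injected anomaly instance."""
    anomaly_type: str                           # illustrative name, e.g. "resource_contention"
    root_cause: Interval                        # period during which the disturbance was active
    extended_effect: Optional[Interval] = None  # period during which its effects persist

    def full_span(self) -> Interval:
        """Smallest interval covering the root cause and the extended effect (if any)."""
        if self.extended_effect is None:
            return self.root_cause
        return (min(self.root_cause[0], self.extended_effect[0]),
                max(self.root_cause[1], self.extended_effect[1]))


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two inclusive intervals share at least one timestamp."""
    return a[0] <= b[1] and b[0] <= a[1]


def detected_fraction(labels: List[AnomalyLabel],
                      predictions: List[Interval],
                      include_effect: bool) -> float:
    """Fraction of anomaly instances overlapped by at least one predicted interval.

    With include_effect=False only the root cause interval counts as the target;
    with include_effect=True the extended effect interval counts as well.
    """
    if not labels:
        return 0.0
    hits = 0
    for label in labels:
        target = label.full_span() if include_effect else label.root_cause
        if any(overlaps(target, p) for p in predictions):
            hits += 1
    return hits / len(labels)


if __name__ == "__main__":
    labels = [
        AnomalyLabel("resource_contention", (100, 160), (161, 220)),
        AnomalyLabel("process_failure", (500, 530)),
    ]
    # A single detection that fires only after the first root cause has ended.
    preds = [(180, 200)]
    print(detected_fraction(labels, preds, include_effect=False))  # 0.0
    print(detected_fraction(labels, preds, include_effect=True))   # 0.5
```

In this example, the late detection misses the first anomaly when only root cause intervals are scored, but it counts as a hit once the extended effect interval is included. This is the kind of distinction that separate ground-truth intervals make measurable.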



Published in

Proceedings of the VLDB Endowment, Volume 14, Issue 11, July 2021 (732 pages). ISSN: 2150-8097

Publisher

VLDB Endowment
