Exathlon: a benchmark for explainable anomaly detection over time series

Authors:
Vincent Jacob (Ecole Polytechnique, France),
Fei Song (Ecole Polytechnique, France),
Arnaud Stiegler (Ecole Polytechnique, France),
Bijan Rad (Ecole Polytechnique, France),
Yanlei Diao (Ecole Polytechnique, France),
Nesime Tatbul (Intel Labs and MIT)

Proceedings of the VLDB Endowment, Volume 14, Issue 11, pp. 2613–2626. https://doi.org/10.14778/3476249.3476307

Published: 01 July 2021

Abstract

Access to high-quality data repositories and benchmarks has been instrumental in advancing the state of the art in many experimental research domains. While advanced analytics tasks over time series data have been gaining considerable attention, the lack of such community resources severely limits scientific progress. In this paper, we present Exathlon, the first comprehensive public benchmark for explainable anomaly detection over high-dimensional time series data. Exathlon has been systematically constructed from real data traces collected during repeated executions of large-scale stream processing jobs on an Apache Spark cluster. Some of these executions were intentionally disturbed by introducing instances of six different types of anomalous events (e.g., misbehaving inputs, resource contention, process failures). For each anomaly instance, ground truth labels are provided for both the root cause interval and the extended effect interval, supporting the development and evaluation of a wide range of anomaly detection (AD) and explanation discovery (ED) tasks. We demonstrate the practical utility of Exathlon's dataset, evaluation methodology, and end-to-end data science pipeline design through an experimental study with three state-of-the-art AD and ED techniques.
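To make the two kinds of ground truth labels concrete, the following is a minimal Python sketch of how interval labels of this sort might be represented and turned into per-time-step labels. It is an illustration only: the `AnomalyInstance` record, its field names, and the `pointwise_labels` and `detected` helpers are hypothetical and are not Exathlon's actual data format or API. The sketch also assumes that the extended effect interval begins where the root cause interval ends.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AnomalyInstance:
    """Hypothetical record for one labeled anomaly (not Exathlon's schema)."""
    anomaly_type: str
    root_cause_start: int  # inclusive, in time steps
    root_cause_end: int    # exclusive end of the root cause interval
    effect_end: int        # exclusive end of the extended effect interval


def pointwise_labels(n_steps: int, instances: List[AnomalyInstance],
                     include_effect: bool = True) -> List[int]:
    """Return one 0/1 label per time step; 1 marks a step inside a labeled interval."""
    labels = [0] * n_steps
    for inst in instances:
        # Either stop at the end of the root cause interval or extend
        # the positive range through the extended effect interval.
        end = inst.effect_end if include_effect else inst.root_cause_end
        for t in range(inst.root_cause_start, min(end, n_steps)):
            labels[t] = 1
    return labels


def detected(inst: AnomalyInstance, predictions: List[int]) -> bool:
    """Assumed criterion: an instance is detected if any step flagged as
    anomalous lies between the root cause start and the effect end."""
    return any(predictions[inst.root_cause_start:inst.effect_end])


if __name__ == "__main__":
    inst = AnomalyInstance("resource contention", 3, 5, 8)
    print(pointwise_labels(10, [inst]))
    print(pointwise_labels(10, [inst], include_effect=False))
    print(detected(inst, [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]))
```

Keeping the root cause interval and the extended effect interval separate lets an evaluation choose whether a detection raised during the effect period still counts, which is one plausible reason for providing both labels.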

Index Terms

1. Information systems
  1. Information systems applications
    1. Data mining

Published in

Proceedings of the VLDB Endowment, Volume 14, Issue 11
July 2021
732 pages
ISSN: 2150-8097
Editors: Xin Luna Dong (Amazon), Felix Naumann (HPI, University of Potsdam)
Publisher: VLDB Endowment
Published: 1 July 2021
Qualifiers: research-article