Abstract
Access to high-quality data repositories and benchmarks has been instrumental in advancing the state of the art in many experimental research domains. While advanced analytics tasks over time series data have been gaining considerable attention, the lack of such community resources severely limits scientific progress. In this paper, we present Exathlon, the first comprehensive public benchmark for explainable anomaly detection over high-dimensional time series data. Exathlon has been systematically constructed based on real data traces from repeated executions of large-scale stream processing jobs on an Apache Spark cluster. Some of these executions were intentionally disturbed by introducing instances of six different types of anomalous events (e.g., misbehaving inputs, resource contention, process failures). For each anomaly instance, ground truth labels are provided for both the root cause interval and the extended effect interval, supporting the development and evaluation of a wide range of anomaly detection (AD) and explanation discovery (ED) tasks. We demonstrate the practical utility of Exathlon's dataset, evaluation methodology, and end-to-end data science pipeline design through an experimental study with three state-of-the-art AD and ED techniques.