Exact variable-length anomaly detection algorithm for univariate and multivariate time series

Wang, Xing; Lin, Jessica; Patel, Nital; Braun, Martin

doi:10.1007/s10618-018-0569-7

Published: 31 July 2018

Data Mining and Knowledge Discovery, Volume 32, pages 1806–1844 (2018)

Xing Wang¹, Jessica Lin¹ (ORCID: orcid.org/0000-0002-4887-0692), Nital Patel² & Martin Braun²

Abstract

Anomaly detection in time series has received considerable attention over the past two decades. However, existing techniques either cannot locate where the anomalies occur within an anomalous time series, or require users to specify the length of potential anomalies in advance. To address these limitations, we propose a self-learning online anomaly detection algorithm that automatically identifies anomalous time series, as well as the exact locations where the anomalies occur in the detected time series. Detecting anomalies in multivariate time series is harder still, for three reasons. First, anomalies may occur in only a subset of dimensions (variables). Second, the locations and lengths of anomalous subsequences may differ across dimensions. Third, some anomalies may look normal in each individual dimension but abnormal when dimensions are considered in combination. To mitigate these problems, we introduce a multivariate anomaly detection algorithm that detects anomalies and identifies the dimensions and locations of the anomalous subsequences. We evaluate our approaches on several real-world datasets, including two CPU manufacturing datasets from Intel. We demonstrate that our approach can successfully detect the correct anomalies without requiring any prior knowledge about the data.
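
The third challenge named in the abstract (an anomaly that looks normal in each dimension but not in combination) can be illustrated with a small sketch. This is not the authors' algorithm: the toy data, the z-score check, the use of the difference between two dimensions as a joint signal, and the cutoff of 3.0 are all assumptions chosen only to make the idea concrete.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Return the z-score of each value relative to the whole list."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    return [(v - mu) / sigma for v in values]


# Hypothetical toy data: dimension y tracks dimension x closely,
# except at index 6, where the two dimensions disagree.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 1.0, 4.0, 3.0, 2.0, 1.0, 0.0]
y = [0.1, 0.9, 2.1, 2.9, 4.1, 4.9, 4.0, 4.1, 2.9, 2.1, 0.9, 0.1]

THRESHOLD = 3.0  # assumed cutoff on |z|; not taken from the paper

# Per-dimension view: score each dimension on its own.
z_x = z_scores(x)
z_y = z_scores(y)
flagged_x = [i for i, z in enumerate(z_x) if abs(z) > THRESHOLD]
flagged_y = [i for i, z in enumerate(z_y) if abs(z) > THRESHOLD]

# Combined view: score the difference between the two dimensions.
diff = [a - b for a, b in zip(x, y)]
z_diff = z_scores(diff)
flagged_joint = [i for i, z in enumerate(z_diff) if abs(z) > THRESHOLD]

print("flagged in x alone:", flagged_x)
print("flagged in y alone:", flagged_y)
print("flagged jointly:   ", flagged_joint)
```

In this toy data, the values at index 6 (x = 1.0, y = 4.0) are unremarkable within each dimension, so neither per-dimension check flags anything; only the combined view singles out index 6. A fixed pairwise difference is a deliberately crude stand-in for "combinations of dimensions" and would not generalise to dimensions with other relationships.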

Notes

We use the whole time series to demonstrate the idea, but our approach also works when the points of the time series arrive in a streaming fashion.
Reflow soldering: https://en.wikipedia.org/wiki/Reflow_soldering.
http://www.eigenvector.com/data/Etch/index.html.

Author information

Authors and Affiliations

George Mason University, Fairfax, USA
Xing Wang & Jessica Lin
Intel Corporation, Chandler, AZ, USA
Nital Patel & Martin Braun

Corresponding author

Correspondence to Jessica Lin.

Additional information

Communicated by Eamonn Keogh.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, X., Lin, J., Patel, N. et al. Exact variable-length anomaly detection algorithm for univariate and multivariate time series. Data Min Knowl Disc 32, 1806–1844 (2018). https://doi.org/10.1007/s10618-018-0569-7

Received: 16 October 2017
Accepted: 22 April 2018
Published: 31 July 2018
Issue Date: November 2018
DOI: https://doi.org/10.1007/s10618-018-0569-7
