
Exact variable-length anomaly detection algorithm for univariate and multivariate time series

Data Mining and Knowledge Discovery

Abstract

The problem of anomaly detection in time series has received a lot of attention in the past two decades. However, existing techniques either cannot locate where the anomalies occur within an anomalous time series, or require the user to specify the length of potential anomalies. To address these limitations, we propose a self-learning online anomaly detection algorithm that automatically identifies anomalous time series, as well as the exact locations where the anomalies occur within them. Anomaly detection in multivariate time series poses additional challenges. First, anomalies may occur in only a subset of the dimensions (variables). Second, the locations and lengths of anomalous subsequences may differ across dimensions. Third, some anomalies may look normal in each individual dimension but appear abnormal when the dimensions are considered in combination. To address these problems, we introduce a multivariate anomaly detection algorithm that detects anomalies and identifies the dimensions and locations of the anomalous subsequences. We evaluate our approaches on several real-world datasets, including two CPU manufacturing datasets from Intel, and demonstrate that our approach can successfully detect the correct anomalies without requiring any prior knowledge about the data.
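The abstract's central idea (locating anomalies whose length is not supplied in advance, and reporting which dimensions they occur in) can be illustrated with a deliberately simple stand-in. The Python sketch below is not the authors' algorithm, which this preview does not describe. It assumes a plain three-sigma rule applied to each dimension and reports maximal runs of out-of-range points as (start, end) intervals, so anomaly lengths fall out of the data rather than being a user parameter. The function names `anomalous_intervals`, `per_dimension_intervals`, and `mahalanobis_scores` are hypothetical. The last function speaks to the third challenge: a time step can be in range on every dimension yet far from the joint distribution, which a Mahalanobis distance exposes and a per-dimension rule misses.

```python
import numpy as np


def anomalous_intervals(x, k=3.0, min_len=1):
    """Return (start, end) pairs (end exclusive) of maximal runs where
    |x - mean| > k * std. Run lengths come from the data, not the user.
    Assumption: global mean/std, which large anomalies inflate."""
    x = np.asarray(x, dtype=float)
    mu, sigma = x.mean(), x.std()
    if sigma == 0:
        return []
    flags = np.abs(x - mu) > k * sigma
    intervals, start = [], None
    for i, flagged in enumerate(flags):
        if flagged and start is None:
            start = i                      # a run of outliers begins
        elif not flagged and start is not None:
            if i - start >= min_len:
                intervals.append((start, i))
            start = None                   # the run has ended
    if start is not None and len(x) - start >= min_len:
        intervals.append((start, len(x)))  # run extends to the end
    return intervals


def per_dimension_intervals(X, k=3.0, min_len=1):
    """X has shape (n_timesteps, n_dims). Returns {dim: intervals} for the
    dimensions in which at least one anomalous run was found."""
    X = np.asarray(X, dtype=float)
    found = {}
    for d in range(X.shape[1]):
        intervals = anomalous_intervals(X[:, d], k, min_len)
        if intervals:
            found[d] = intervals
    return found


def mahalanobis_scores(X):
    """Distance of each time step from the joint mean, scaled by the
    (pseudo-)inverse covariance, so broken correlations stand out."""
    X = np.asarray(X, dtype=float)
    diff = X - X.mean(axis=0)
    inv_cov = np.linalg.pinv(np.cov(X, rowvar=False))
    sq = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    return np.sqrt(np.maximum(sq, 0.0))  # guard tiny negative round-off


t = np.arange(200)
base = np.sin(0.3 * t)

# Case 1: a 10-step spike in dimension 0 only.
a1, b1 = base.copy(), base + 0.01 * np.cos(0.9 * t)
a1[120:130] += 10.0
X1 = np.column_stack([a1, b1])
print(per_dimension_intervals(X1))  # {0: [(120, 130)]}

# Case 2: at t = 60 the two correlated channels disagree, yet each
# value stays inside its own normal range.
a2, b2 = base.copy(), base + 0.01 * np.cos(0.9 * t)
a2[60], b2[60] = 0.8, -0.8
X2 = np.column_stack([a2, b2])
print(per_dimension_intervals(X2))            # {}
print(int(np.argmax(mahalanobis_scores(X2))))  # 60
```

The two cases mirror the first and third challenges in the abstract: the per-dimension rule finds the spike and names its dimension and extent, but only the joint score isolates the correlation break. A robust location/scale estimate (for example median and MAD) could replace the global mean and standard deviation if large anomalies are expected to distort them.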

Figs. 1–27 (images not included)

Notes

  1. We use the whole time series to demonstrate the idea, but our approach also works when the points of a time series arrive in a streaming fashion.

  2. Reflow soldering: https://en.wikipedia.org/wiki/Reflow_soldering.

  3. http://www.eigenvector.com/data/Etch/index.html.
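Note 1 states that the approach also applies when points arrive one at a time. The sketch below shows one way a run-based interval detector can be made online; it is a hypothetical stand-in, not the paper's method. It assumes Welford's running mean and variance computed over the points judged normal, a warm-up period before any flagging (the `warmup` parameter and the class name `StreamingIntervalDetector` are illustrative assumptions), and a k-sigma rule. Each maximal run of out-of-range points is reported as a (start, end) interval as soon as it closes.

```python
import math


class StreamingIntervalDetector:
    """Hypothetical online detector. Keeps running mean/variance (Welford)
    of points judged normal and reports each maximal run of points farther
    than k standard deviations from the running mean as (start, end)."""

    def __init__(self, k=3.0, warmup=30):
        self.k = k
        self.warmup = max(warmup, 2)  # need >= 2 points for a sample std
        self.n = 0                    # points absorbed into the statistics
        self.mean = 0.0
        self.m2 = 0.0                 # sum of squared deviations
        self.t = 0                    # index of the next incoming point
        self.run_start = None

    def _absorb(self, x):
        # Welford's update of the running mean and sum of squares.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def update(self, x):
        """Consume one point; return a just-closed interval or None."""
        finished = None
        anomalous = False
        if self.n >= self.warmup:
            std = math.sqrt(max(self.m2, 0.0) / (self.n - 1))
            anomalous = std > 0 and abs(x - self.mean) > self.k * std
        if anomalous:
            if self.run_start is None:
                self.run_start = self.t
        else:
            if self.run_start is not None:
                finished = (self.run_start, self.t)
                self.run_start = None
            self._absorb(x)  # only normal points update the statistics
        self.t += 1
        return finished

    def flush(self):
        """Close a run still open at the end of the stream, if any."""
        if self.run_start is None:
            return None
        interval = (self.run_start, self.t)
        self.run_start = None
        return interval


stream = [10.0 if 120 <= i < 130 else math.sin(0.3 * i) for i in range(200)]
detector = StreamingIntervalDetector(k=3.0, warmup=30)
found = [iv for iv in (detector.update(v) for v in stream) if iv is not None]
print(found)  # [(120, 130)]
```

Excluding flagged points from the running statistics keeps a long anomaly from inflating the variance and masking itself; the trade-off is that a genuine, permanent level shift would stay flagged indefinitely, which a practical system would need to handle separately.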

Author information

Corresponding author

Correspondence to Jessica Lin.

Additional information

Communicated by Eamonn Keogh.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, X., Lin, J., Patel, N. et al. Exact variable-length anomaly detection algorithm for univariate and multivariate time series. Data Min Knowl Disc 32, 1806–1844 (2018). https://doi.org/10.1007/s10618-018-0569-7

  • DOI: https://doi.org/10.1007/s10618-018-0569-7
