Finding the most unusual time series subsequence: algorithms and applications

  • Regular Paper
Knowledge and Information Systems

Abstract

In this work we introduce the new problem of finding time series discords. Time series discords are subsequences of a longer time series that are maximally different from all the other subsequences of that time series. They thus capture the sense of the most unusual subsequence within a time series. While discords have many uses for data mining, they are particularly attractive as anomaly detectors because they require only one intuitive parameter (the length of the subsequence), unlike most anomaly detection algorithms, which typically require many parameters. While the brute force algorithm to discover time series discords is quadratic in the length of the time series, we show a simple algorithm that is three to four orders of magnitude faster than brute force, while guaranteed to produce identical results. We evaluate our work with a comprehensive set of experiments on diverse data sources including electrocardiograms, space telemetry, respiration physiology, anthropological and video datasets.
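To make the quadratic brute force baseline concrete, here is a minimal Python sketch. It rests on assumptions the abstract does not spell out: that "maximally different" means having the largest distance to the nearest non-overlapping subsequence, and that distance is plain Euclidean distance on the raw values. The function names `euclidean` and `brute_force_discord` are illustrative only, and the faster algorithm the abstract refers to is not shown.

```python
import math


def euclidean(series, i, j, n):
    """Euclidean distance between the length-n subsequences starting at i and j."""
    return math.sqrt(sum((series[i + k] - series[j + k]) ** 2 for k in range(n)))


def brute_force_discord(series, n):
    """Return (start, distance) for the length-n subsequence whose nearest
    non-overlapping neighbor is farthest away (assumed discord definition)."""
    if n < 1 or len(series) < 2 * n:
        raise ValueError("need at least two non-overlapping subsequences")
    m = len(series) - n + 1  # number of candidate subsequences
    best_start, best_dist = -1, -1.0
    for i in range(m):  # outer loop: every candidate subsequence
        nearest = math.inf
        for j in range(m):  # inner loop: every possible neighbor
            if abs(i - j) >= n:  # skip overlapping (trivial) matches
                nearest = min(nearest, euclidean(series, i, j, n))
        # Candidates with no non-overlapping neighbor keep nearest == inf; skip them.
        if nearest != math.inf and nearest > best_dist:
            best_start, best_dist = i, nearest
    return best_start, best_dist


# Usage: a periodic signal with one corrupted sample at index 10.
signal = [0, 1] * 10
signal[10] = 5
start, dist = brute_force_discord(signal, 4)
print(start, dist)  # the reported window should cover index 10
```

The two nested loops over all subsequences are what make this baseline quadratic in the length of the series; the subsequence length `n` is the only parameter the caller supplies.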

Author information

Corresponding author

Correspondence to Eamonn Keogh.

Additional information

Eamonn Keogh is an Assistant Professor of computer science at the University of California, Riverside. His research interests include data mining, machine learning and information retrieval. Several of his papers have won best paper awards, including papers at SIGKDD and SIGMOD. Dr. Keogh is the recipient of a 5-year NSF Career Award for “Efficient discovery of previously unknown patterns and relationships in massive time series databases.”

Jessica Lin is an Assistant Professor of information and software engineering at George Mason University. She received her Ph.D. from the University of California, Riverside. Her research interests include data mining and information retrieval.

Sang-Hee Lee is a paleoanthropologist at the University of California, Riverside. Her research interests include the evolution of human morphological variation and how different mechanisms (such as taxonomy, sex, age, and time) explain what is observed in fossil data. Dr. Lee obtained her Ph.D. in anthropology from the University of Michigan in 1999.

Helga Van Herle is an Assistant Clinical Professor of medicine at the Division of Cardiology of the Geffen School of Medicine at UCLA. She received her M.D. from UCLA in 1993 and completed her residency in internal medicine at the New York Hospital (Cornell University, 1993–1996) and her cardiology fellowship at UCLA (1997–2001). Dr. Van Herle holds an M.Sc. in bioengineering from Columbia University (1987) and a B.Sc. in chemical engineering from UCLA (1985).

About this article

Cite this article

Keogh, E., Lin, J., Lee, SH. et al. Finding the most unusual time series subsequence: algorithms and applications. Knowl Inf Syst 11, 1–27 (2007). https://doi.org/10.1007/s10115-006-0034-6

  • DOI: https://doi.org/10.1007/s10115-006-0034-6
