Finding the most unusual time series subsequence: algorithms and applications

Keogh, Eamonn; Lin, Jessica; Lee, Sang-Hee; Herle, Helga Van

doi:10.1007/s10115-006-0034-6

Regular Paper
Published: 23 November 2006

Volume 11, pages 1–27, (2007)
Knowledge and Information Systems

Abstract

In this work we introduce the new problem of finding time series discords. Time series discords are subsequences of longer time series that are maximally different to all the rest of the time series subsequences. They thus capture the sense of the most unusual subsequence within a time series. While discords have many uses for data mining, they are particularly attractive as anomaly detectors because they only require one intuitive parameter (the length of the subsequence) unlike most anomaly detection algorithms that typically require many parameters. While the brute force algorithm to discover time series discords is quadratic in the length of the time series, we show a simple algorithm that is three to four orders of magnitude faster than brute force, while guaranteed to produce identical results. We evaluate our work with a comprehensive set of experiments on diverse data sources including electrocardiograms, space telemetry, respiration physiology, anthropological and video datasets.
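
To make the discord definition and the quadratic brute force baseline concrete, the following is a minimal sketch, not the authors' implementation. It rests on assumptions that the abstract does not spell out: plain Euclidean distance between subsequences, no normalization, and a rule that a subsequence is only compared against subsequences that do not overlap it. The function name `brute_force_discord` is hypothetical.

```python
import math


def brute_force_discord(series, n):
    """Return (start, distance) for the length-n subsequence whose nearest
    non-overlapping neighbor is farthest away.

    Assumptions (not taken from the abstract): Euclidean distance on raw
    values, and overlapping subsequences are excluded as trivial matches.
    """
    if n < 1 or len(series) < 2 * n:
        raise ValueError("need at least two non-overlapping subsequences")

    m = len(series) - n + 1          # number of candidate subsequences
    best_start, best_dist = -1, -1.0

    for p in range(m):               # outer loop: each candidate discord
        nearest = math.inf
        for q in range(m):           # inner loop: find its nearest neighbor
            if abs(p - q) < n:       # skip subsequences that overlap p
                continue
            d = math.dist(series[p:p + n], series[q:q + n])
            nearest = min(nearest, d)
        # The discord is the candidate with the largest nearest-neighbor distance.
        if nearest > best_dist:
            best_start, best_dist = p, nearest

    return best_start, best_dist
```

The two nested loops over all subsequence start positions are what make this baseline quadratic in the length of the series. The only parameter the caller chooses is the subsequence length `n`, for example `brute_force_discord(readings, 128)` for a hypothetical list of numeric `readings`.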

Author information

Authors and Affiliations

Department of Computer Science and Engineering, University of California, Riverside, CA, USA
Eamonn Keogh
Department of Information and Software Engineering, George Mason University, Fairfax, VA, USA
Jessica Lin
Anthropology Department, University of California, Riverside, CA, USA
Sang-Hee Lee
David Geffen School of Medicine, University of California, Los Angeles, CA, USA
Helga Van Herle

Corresponding author

Correspondence to Eamonn Keogh.

Additional information

Eamonn Keogh is an Assistant Professor of computer science at the University of California, Riverside. His research interests include data mining, machine learning and information retrieval. Several of his papers have won best paper awards, including papers at SIGKDD and SIGMOD. Dr. Keogh is the recipient of a 5-year NSF Career Award for “Efficient discovery of previously unknown patterns and relationships in massive time series databases.”

Jessica Lin is an Assistant Professor of information and software engineering at George Mason University. She received her Ph.D. from the University of California, Riverside. Her research interests include data mining and information retrieval.

Sang-Hee Lee is a paleoanthropologist at the University of California, Riverside. Her research interests include the evolution of human morphological variation and how different mechanisms (such as taxonomy, sex, age, and time) explain what is observed in fossil data. Dr. Lee obtained her Ph.D. in anthropology from the University of Michigan in 1999.

Helga Van Herle is an Assistant Clinical Professor of medicine at the Division of Cardiology of the Geffen School of Medicine at UCLA. She received her M.D. from UCLA in 1993, completed her residency in internal medicine at the New York Hospital (Cornell University, 1993–1996), and completed her cardiology fellowship at UCLA (1997–2001). Dr. Van Herle holds an M.Sc. in bioengineering from Columbia University (1987) and a B.Sc. in chemical engineering from UCLA (1985).

About this article

Cite this article

Keogh, E., Lin, J., Lee, SH. et al. Finding the most unusual time series subsequence: algorithms and applications. Knowl Inf Syst 11, 1–27 (2007). https://doi.org/10.1007/s10115-006-0034-6

Received: 30 November 2005
Revised: 20 December 2005
Accepted: 20 February 2006
Published: 23 November 2006
Issue Date: January 2007
DOI: https://doi.org/10.1007/s10115-006-0034-6
