Abstract
In this work we introduce the new problem of finding time series discords. Time series discords are subsequences of a longer time series that are maximally different from all the other subsequences of that time series. They thus capture the sense of the most unusual subsequence within a time series. Although discords have many uses in data mining, they are particularly attractive as anomaly detectors because they require only one intuitive parameter (the length of the subsequence), unlike most anomaly detection algorithms, which typically require many parameters. The brute force algorithm for discovering time series discords is quadratic in the length of the time series; we present a simple algorithm that is three to four orders of magnitude faster than brute force while guaranteed to produce identical results. We evaluate our work with a comprehensive set of experiments on diverse data sources, including electrocardiograms, space telemetry, respiration physiology, and anthropological and video datasets.
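To make the quadratic brute force baseline concrete, the following is a minimal sketch of one way it could be written. It is not the authors' code and it is not the faster algorithm the abstract refers to. Several details are assumptions that the abstract does not state: the function name `brute_force_discord` is hypothetical, distance is plain Euclidean distance on raw (unnormalized) values, and a candidate is compared only against subsequences that do not overlap it, so that a subsequence is not trivially matched to a shifted copy of itself.

```python
import math


def brute_force_discord(series, n):
    """Return (start_index, distance) for the length-n subsequence whose
    nearest non-overlapping neighbour is farthest away.

    Assumptions (not taken from the abstract): Euclidean distance on raw
    values, and neighbours that overlap the candidate are skipped.
    """
    if n < 1 or len(series) < 2 * n:
        raise ValueError("need n >= 1 and len(series) >= 2 * n")

    m = len(series) - n + 1  # number of length-n subsequences
    best_loc, best_dist = -1, -1.0

    for p in range(m):  # outer loop: every candidate subsequence
        nearest = math.inf
        for q in range(m):  # inner loop: every possible neighbour
            if abs(p - q) < n:
                continue  # skip overlapping (trivial) matches
            d = math.dist(series[p:p + n], series[q:q + n])
            nearest = min(nearest, d)
        # Candidates with no non-overlapping neighbour keep nearest == inf
        # and are ignored.
        if nearest != math.inf and nearest > best_dist:
            best_loc, best_dist = p, nearest

    return best_loc, best_dist
```

Calling `brute_force_discord(values, n)` on a list of numbers returns the start index of the candidate discord and its distance to its nearest non-overlapping neighbour. The two nested loops over all subsequences are what make this baseline quadratic in the length of the series, and the only parameter the caller supplies is the subsequence length `n`.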
Author information
Eamonn Keogh is an Assistant Professor of computer science at the University of California, Riverside. His research interests include data mining, machine learning and information retrieval. Several of his papers have won best paper awards, including papers at SIGKDD and SIGMOD. Dr. Keogh is the recipient of a 5-year NSF Career Award for “Efficient discovery of previously unknown patterns and relationships in massive time series databases.”
Jessica Lin is an Assistant Professor of information and software engineering at George Mason University. She received her Ph.D. from the University of California, Riverside. Her research interests include data mining and information retrieval.
Sang-Hee Lee is a paleoanthropologist at the University of California, Riverside. Her research interests include the evolution of human morphological variation and how different mechanisms (such as taxonomy, sex, age, and time) explain what is observed in fossil data. Dr. Lee obtained her Ph.D. in anthropology from the University of Michigan in 1999.
Helga Van Herle is an Assistant Clinical Professor of medicine at the Division of Cardiology of the Geffen School of Medicine at UCLA. She received her M.D. from UCLA in 1993, completed her residency in internal medicine at the New York Hospital (Cornell University, 1993–1996), and completed her cardiology fellowship at UCLA (1997–2001). Dr. Van Herle holds an M.Sc. in bioengineering from Columbia University (1987) and a B.Sc. in chemical engineering from UCLA (1985).
About this article
Cite this article
Keogh, E., Lin, J., Lee, SH. et al. Finding the most unusual time series subsequence: algorithms and applications. Knowl Inf Syst 11, 1–27 (2007). https://doi.org/10.1007/s10115-006-0034-6
DOI: https://doi.org/10.1007/s10115-006-0034-6