HIME: discovering variable-length motifs in large-scale time series

  • Regular Paper
  • Knowledge and Information Systems

Abstract

Detecting repeated variable-length patterns, also called variable-length motifs, has received considerable attention in recent years. The state-of-the-art algorithm uses a fixed-length motif discovery algorithm as a subroutine to enumerate variable-length motifs. As a result, it may take hours or days to execute when the enumeration range is large. In this work, we introduce an approximate algorithm called hierarchical-based motif enumeration (HIME) to detect variable-length motifs with a large enumeration range in million-scale time series. Our experiments show that the proposed algorithm scales significantly better than the state-of-the-art algorithm. Moreover, the motif length range detected by HIME is considerably larger than that of the previous sequence-matching-based approximate variable-length motif discovery approach. We demonstrate that HIME can efficiently detect meaningful variable-length motifs in long, real-world time series.
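
To make the cost argument concrete, the following is a minimal sketch of enumerating variable-length motifs by calling a fixed-length motif search once per candidate length. It is neither HIME nor the state-of-the-art algorithm mentioned in the abstract: the brute-force subroutine, the z-normalized Euclidean distance, and the names `znorm`, `fixed_length_motif`, and `enumerate_motifs` are all assumptions chosen for illustration.

```python
import math


def znorm(x):
    """Z-normalize a subsequence; a constant subsequence maps to all zeros."""
    mean = sum(x) / len(x)
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    if sd < 1e-12:
        return [0.0] * len(x)
    return [(v - mean) / sd for v in x]


def fixed_length_motif(ts, m):
    """Brute-force search for the closest pair of non-overlapping
    length-m subsequences. Returns (distance, start_i, start_j)."""
    n = len(ts) - m + 1
    subs = [znorm(ts[i:i + m]) for i in range(n)]
    best = (math.inf, -1, -1)
    for i in range(n):
        # Start j at i + m so the two subsequences never overlap
        # (overlapping pairs are trivial matches).
        for j in range(i + m, n):
            d = math.dist(subs[i], subs[j])
            if d < best[0]:
                best = (d, i, j)
    return best


def enumerate_motifs(ts, min_len, max_len):
    """Naive variable-length enumeration: one full fixed-length search
    per length in [min_len, max_len]."""
    return {m: fixed_length_motif(ts, m) for m in range(min_len, max_len + 1)}
```

Because the outer loop repeats a full pairwise search for every length, the total work in this sketch grows linearly with the width of the enumeration range on top of the quadratic cost of each search, which illustrates why a large range becomes expensive on long series.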

Author information

Corresponding author

Correspondence to Yifeng Gao.

About this article

Cite this article

Gao, Y., Lin, J. HIME: discovering variable-length motifs in large-scale time series. Knowl Inf Syst 61, 513–542 (2019). https://doi.org/10.1007/s10115-018-1279-6

  • DOI: https://doi.org/10.1007/s10115-018-1279-6
