DOI: 10.1145/3340964.3340969
research-article

Efficient Interval-focused Similarity Search under Dynamic Time Warping

Published: 19 August 2019

ABSTRACT

Similarity search on time series from large temporal text corpora is of interest in many settings. Our use case is the Google Books Ngram corpus, where historians study how word frequencies change over time. More specifically, users want to run similarity search restricted to a specific period of time, also known as interval-focused similarity search. Related work formalizes interval-focused similarity search, but the few existing approaches are limited to metric distance measures such as the Euclidean distance. Most other approaches in this area that address warping distance measures focus on whole-matching similarity search. In this work, we present a novel search tree that uses so-called time series envelopes to group objects. To speed up the tree traversal, our search tree approximates the envelopes based on the node height, i.e., envelopes are tighter further down in the tree. We combine this with various time series pruning techniques, mainly to reduce the number of expensive distance computations. Our experimental evaluation shows that this combination is worthwhile and indeed decisive for a significant speedup compared to less sophisticated adaptations of known approaches. First, we show that combining the pruning of groups of time series with the pruning of single time series outperforms the use of a single pruning technique. Second, we compare the wall-clock run times of our data structure to those of existing approaches and find a significant speedup for interval-focused similarity search queries on large temporal data sets such as the Google Books Ngram corpus.
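To make the ideas of envelopes and pruning more concrete, the following is a minimal Python sketch of envelope-based pruning for an interval-focused nearest-neighbor query under Dynamic Time Warping. It is not the search tree presented in this work: it is a flat scan that uses the well-known LB_Keogh-style envelope lower bound to skip expensive DTW computations. All function names (`dtw`, `envelope`, `lb_keogh`, `interval_nn`), the Sakoe-Chiba `window` parameter, the squared-difference cost, and the choice to simply slice every series to the interval `[start, end)` are assumptions made for illustration.

```python
import math


def dtw(a, b, window):
    """DTW distance with a Sakoe-Chiba band (squared costs, final square root)."""
    n, m = len(a), len(b)
    w = max(window, abs(n - m))  # the band must be wide enough to reach (n, m)
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        cur = [math.inf] * (m + 1)
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            cur[j] = cost + min(prev[j], prev[j - 1], cur[j - 1])
        prev = cur
    return math.sqrt(prev[m])


def envelope(series, window):
    """Lower and upper envelope: min and max over a sliding window around each point."""
    n = len(series)
    lower = [min(series[max(0, i - window): i + window + 1]) for i in range(n)]
    upper = [max(series[max(0, i - window): i + window + 1]) for i in range(n)]
    return lower, upper


def lb_keogh(candidate, lower, upper):
    """Lower bound on DTW: how far the candidate leaves the query envelope."""
    total = 0.0
    for c, lo, up in zip(candidate, lower, upper):
        if c > up:
            total += (c - up) ** 2
        elif c < lo:
            total += (lo - c) ** 2
    return math.sqrt(total)


def interval_nn(query, database, start, end, window):
    """Nearest neighbor of `query` restricted to the interval [start, end).

    Assumption: all series are aligned on the same time axis and have the
    same length, so slicing yields comparable sub-series.
    """
    q = query[start:end]
    lower, upper = envelope(q, window)
    best_name, best_dist, computed = None, math.inf, 0
    for name, series in database.items():
        c = series[start:end]
        # Prune: the lower bound already exceeds the best distance so far.
        if lb_keogh(c, lower, upper) >= best_dist:
            continue
        computed += 1
        d = dtw(q, c, window)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name, best_dist, computed


if __name__ == "__main__":
    # Hypothetical toy data standing in for word-frequency series.
    query = [0, 1, 2, 3, 2, 1, 0, 0]
    database = {
        "a": [0, 0, 1, 2, 3, 2, 1, 0],
        "b": [5, 5, 5, 5, 5, 5, 5, 5],
        "c": [3, 2, 1, 0, 0, 1, 2, 3],
    }
    name, dist, computed = interval_nn(query, database, start=1, end=7, window=1)
    print(name, round(dist, 3), f"DTW computations: {computed} of {len(database)}")
```

This sketch prunes single time series only. The search tree described above additionally groups several series under a shared envelope, so that a whole group can be discarded with one bound check, and it uses coarser envelopes higher up in the tree; neither of these is modeled here.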


    Published in

      SSTD '19: Proceedings of the 16th International Symposium on Spatial and Temporal Databases
      August 2019
      245 pages

      Copyright © 2019 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Qualifiers

      • research-article
      • Research
      • Refereed limited
