Efficient Interval-focused Similarity Search under Dynamic Time Warping

Authors:
Jens Willkomm, Janek Bettinger, Martin Schäler, Klemens Böhm
Karlsruhe Institute of Technology (KIT)

SSTD '19: Proceedings of the 16th International Symposium on Spatial and Temporal Databases, August 2019, Pages 130–139
https://doi.org/10.1145/3340964.3340969

Published: 19 August 2019

ABSTRACT

Similarity search on time series from large temporal text corpora is useful in many settings. Our use case is the Google Books Ngram corpus, with historians interested in how word frequencies change over time. More specifically, users want to search for similar time series within a specific period of time, known as interval-focused similarity search. Related work formalizes interval-focused similarity search, but the few existing approaches are limited to metric distance measures such as the Euclidean distance. Most other approaches in this area that do support warping distance measures focus on whole-matching similarity search. In this work, we present a novel search tree that uses so-called time series envelopes to group objects. To speed up tree traversal, our search tree approximates the envelopes based on node height, i.e., envelopes are tighter further down in the tree. We combine this with various time series pruning techniques, mainly to reduce the number of expensive distance computations. Our experimental evaluation shows that this combination is worthwhile and indeed decisive for a significant speedup compared to less sophisticated adaptations of known approaches. First, we show that combining the pruning of groups of time series with the pruning of single time series outperforms using either pruning technique alone. Second, we compare the wall-clock run times of our data structure to those of existing approaches and find a significant speedup for interval-focused similarity search queries on large temporal data sets such as the Google Books Ngram corpus.
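
The ideas named in the abstract (envelopes that enclose a group of time series, coarser envelope approximations, and pruning before the expensive distance computation) can be illustrated with a minimal Python sketch. This is not the search tree from the paper. It assumes equal-length series, a Sakoe-Chiba warping band of width w, squared point costs, and an LB_Keogh-style lower bound, and it reads "interval-focused" simply as slicing every series to the queried interval. All function names are hypothetical.

```python
import math

def dtw_sq(a, b, w):
    """Squared-cost DTW of two equal-length series under a band of width w."""
    n = len(a)
    assert len(b) == n, "this sketch assumes equal-length series"
    prev = [math.inf] * (n + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        cur = [math.inf] * (n + 1)
        for j in range(max(1, i - w), min(n, i + w) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            cur[j] = cost + min(prev[j], prev[j - 1], cur[j - 1])
        prev = cur
    return prev[n]

def group_envelope(series, w):
    """Lower/upper envelope enclosing every series of the group within the band."""
    n = len(series[0])
    lower, upper = [], []
    for i in range(n):
        lo, hi = max(0, i - w), min(n, i + w + 1)
        window = [s[k] for s in series for k in range(lo, hi)]
        lower.append(min(window))
        upper.append(max(window))
    return lower, upper

def coarsen(lower, upper, seg):
    """Piecewise-constant approximation: looser, but still enclosing."""
    lo2, up2 = [], []
    for start in range(0, len(lower), seg):
        end = min(len(lower), start + seg)
        lo2.extend([min(lower[start:end])] * (end - start))
        up2.extend([max(upper[start:end])] * (end - start))
    return lo2, up2

def lb_envelope(q, lower, upper):
    """Squared distance from q to the envelope; never exceeds dtw_sq to a member."""
    total = 0.0
    for x, lo, hi in zip(q, lower, upper):
        if x > hi:
            total += (x - hi) ** 2
        elif x < lo:
            total += (lo - x) ** 2
    return total

def interval_search(query, groups, start, end, w, eps):
    """Members whose squared DTW to the query on [start, end) is at most eps."""
    q = query[start:end]
    hits = []
    for group in groups:
        cut = [s[start:end] for s in group]
        lower, upper = group_envelope(cut, w)
        if lb_envelope(q, lower, upper) > eps:
            continue  # prune the whole group with one cheap check
        for s, c in zip(group, cut):
            lo, up = group_envelope([c], w)
            if lb_envelope(q, lo, up) > eps:
                continue  # prune a single series
            if dtw_sq(q, c, w) <= eps:
                hits.append(s)
    return hits

# Toy data (made up for illustration): two groups of two series each.
query = [0, 1, 2, 3, 2, 1, 0, 0]
groups = [
    [[0, 1, 2, 3, 2, 1, 0, 0], [0, 0, 1, 2, 3, 2, 1, 0]],
    [[9, 9, 9, 9, 9, 9, 9, 9], [8, 9, 8, 9, 8, 9, 8, 9]],
]
hits = interval_search(query, groups, start=1, end=7, w=1, eps=2.0)
```

For brevity the sketch recomputes envelopes per query; an index would presumably store them. The coarsen helper hints at why a looser envelope higher up in a tree is still safe for pruning: widening an envelope can only lower the bound, so no qualifying series is discarded.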

Index Terms

Efficient Interval-focused Similarity Search under Dynamic Time Warping
1. Information systems
  1. Information systems applications
    1. Data mining
      1. Nearest-neighbor search

Published in
SSTD '19: Proceedings of the 16th International Symposium on Spatial and Temporal Databases
August 2019
245 pages
ISBN: 9781450362801
DOI: 10.1145/3340964
Editors: Walid G. Aref, Michela Bertolotto, Panagiotis Bouros, Christian S. Jensen, Ahmed Mahmood, Kjetil Nørvåg, Dimitris Sacharidis, Mohamed Sarwat
Copyright © 2019 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
data structure
dynamic time warping
similarity search
time series
Qualifiers
- research-article
- Research
- Refereed limited