Time series representation and similarity based on local autopatterns

Baydogan, Mustafa Gokce; Runger, George

doi:10.1007/s10618-015-0425-y

Published: 07 July 2015

Volume 30, pages 476–509, (2016)

Data Mining and Knowledge Discovery

Mustafa Gokce Baydogan¹ &
George Runger²

Abstract

Time series data mining has received much greater interest along with the increase in temporal data sets from different domains such as medicine, finance, multimedia, etc. Representations are important to reduce dimensionality and generate useful similarity measures. High-level representations such as Fourier transforms, wavelets, piecewise polynomial models, etc., were considered previously. Recently, autoregressive kernels were introduced to reflect the similarity of the time series. We introduce a novel approach to model the dependency structure in time series that generalizes the concept of autoregression to local autopatterns. Our approach generates a pattern-based representation along with a similarity measure called learned pattern similarity (LPS). A tree-based ensemble-learning strategy that is fast and insensitive to parameter settings is the basis for the approach. Then, a robust similarity measure based on the learned patterns is presented. This unsupervised approach to represent and measure the similarity between time series generally applies to a number of data mining tasks (e.g., clustering, anomaly detection, classification). Furthermore, an embedded learning of the representation avoids pre-defined features and an extraction step which is common in some feature-based approaches. The method generalizes in a straightforward manner to multivariate time series. The effectiveness of LPS is evaluated on time series classification problems from various domains. We compare LPS to eleven well-known similarity measures. Our experimental results show that LPS provides fast and competitive results on benchmark datasets from several domains. Furthermore, LPS provides a research direction and template approach that breaks from the linear dependency models to potentially foster other promising nonlinear approaches.
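The abstract's pipeline (local segments, a tree ensemble that partitions them into patterns, and a similarity computed from how often each pattern occurs) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the tree learner, so purely random axis-aligned splits stand in for it here, and every name (`lag_segments`, `build_tree`, `leaf_histogram`, `dissimilarity`), the fixed segment width, the tree depth, and the absolute-difference comparison of leaf counts are assumptions made for illustration.

```python
import math
import random

def lag_segments(series, width):
    """All overlapping windows of `width` consecutive values (assumed segment form)."""
    return [series[i:i + width] for i in range(len(series) - width + 1)]

def build_tree(rows, depth, rng):
    """Recursively partition segments with random splits; None marks a leaf.

    Stand-in for the paper's tree learner, whose details the abstract omits.
    """
    if depth == 0 or len(rows) < 2:
        return None
    col = rng.randrange(len(rows[0]))
    values = [r[col] for r in rows]
    lo, hi = min(values), max(values)
    if lo == hi:
        return None
    threshold = rng.uniform(lo, hi)
    left = [r for r in rows if r[col] <= threshold]
    right = [r for r in rows if r[col] > threshold]
    return (col, threshold,
            build_tree(left, depth - 1, rng),
            build_tree(right, depth - 1, rng))

def leaf_id(tree, row):
    """Route one segment down the tree and return its leaf as a path string."""
    path = ""
    while tree is not None:
        col, threshold, left, right = tree
        if row[col] <= threshold:
            path, tree = path + "L", left
        else:
            path, tree = path + "R", right
    return path

def leaf_histogram(tree, series, width):
    """Pattern-based representation: how many segments of the series fall in each leaf."""
    counts = {}
    for row in lag_segments(series, width):
        key = leaf_id(tree, row)
        counts[key] = counts.get(key, 0) + 1
    return counts

def dissimilarity(trees, a, b, width):
    """Sum of absolute leaf-count differences over all trees (assumed comparison rule)."""
    total = 0
    for tree in trees:
        ha = leaf_histogram(tree, a, width)
        hb = leaf_histogram(tree, b, width)
        total += sum(abs(ha.get(k, 0) - hb.get(k, 0)) for k in set(ha) | set(hb))
    return total

# Toy data: two phase-shifted sine waves and a ramp (illustrative only).
rng = random.Random(0)
width = 4
sine = [math.sin(0.3 * t) for t in range(60)]
shifted = [math.sin(0.3 * t + 1.0) for t in range(60)]
ramp = [t / 60 for t in range(60)]

# Unsupervised: trees are grown on pooled segments, with no class labels.
pool = [row for s in (sine, shifted, ramp) for row in lag_segments(s, width)]
trees = [build_tree(pool, depth=5, rng=rng) for _ in range(20)]

print("sine vs shifted sine:", dissimilarity(trees, sine, shifted, width))
print("sine vs ramp:", dissimilarity(trees, sine, ramp, width))
```

Because the representation counts pattern occurrences regardless of where in the series they appear, a series compared with itself always scores zero, and the measure is symmetric in its two arguments.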

Acknowledgments

This research was partially supported by the Scientific and Technological Research Council of Turkey (TUBITAK) Grant Number 114C103.

Author information

Authors and Affiliations

Department of Industrial Engineering, Boğaziçi University, Istanbul, Turkey
Mustafa Gokce Baydogan
School of Computing, Informatics and Decision Systems Engineering, Arizona State University, Tempe, AZ, USA
George Runger

Corresponding author

Correspondence to Mustafa Gokce Baydogan.

Additional information

Responsible editor: Eamonn Keogh.

About this article

Cite this article

Baydogan, M.G., Runger, G. Time series representation and similarity based on local autopatterns. Data Min Knowl Disc 30, 476–509 (2016). https://doi.org/10.1007/s10618-015-0425-y

Received: 17 November 2014
Accepted: 19 June 2015
Published: 07 July 2015
Issue Date: March 2016
DOI: https://doi.org/10.1007/s10618-015-0425-y
