Time series representation and similarity based on local autopatterns

Data Mining and Knowledge Discovery

Abstract

Time series data mining has attracted growing interest as temporal data sets accumulate in domains such as medicine, finance, and multimedia. Representations matter because they reduce dimensionality and support useful similarity measures. Earlier work considered high-level representations such as Fourier transforms, wavelets, and piecewise polynomial models; more recently, autoregressive kernels were introduced to capture the similarity of time series. We introduce a novel approach to modeling the dependency structure in time series that generalizes the concept of autoregression to local autopatterns. The approach generates a pattern-based representation together with a similarity measure called learned pattern similarity (LPS). It is built on a tree-based ensemble-learning strategy that is fast and insensitive to parameter settings, and a robust similarity measure is then derived from the learned patterns. Because the approach is unsupervised, the representation and similarity measure apply to a range of data mining tasks (e.g., clustering, anomaly detection, classification). Furthermore, because the representation is learned in an embedded manner, the method avoids the pre-defined features and separate extraction step common in some feature-based approaches. The method generalizes in a straightforward manner to multivariate time series. We evaluate the effectiveness of LPS on time series classification problems from various domains and compare it to eleven well-known similarity measures. Our experimental results show that LPS provides fast and competitive results on benchmark datasets from several domains. Furthermore, LPS offers a research direction and template approach that breaks from linear dependency models and may foster other promising nonlinear approaches.
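The idea of a tree-learned, pattern-based representation can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the algorithmic details, so the segment length, tree depth, split rule, and count-based dissimilarity below are all assumptions chosen for illustration, and every function name is hypothetical. Each tree predicts the last value of a short segment from one randomly chosen earlier lag (a nonlinear stand-in for autoregression), each leaf is treated as a "pattern", and a series is represented by how often its segments fall into each leaf.

```python
import math
import random
from collections import Counter


def segments(series, length):
    """Return every overlapping subsequence of `length` consecutive values."""
    return [tuple(series[i:i + length]) for i in range(len(series) - length + 1)]


def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def grow_tree(rows, depth, rng):
    """Grow one regression tree that predicts a segment's last value from a
    randomly chosen earlier lag. A leaf is represented by None."""
    if depth == 0 or len(rows) < 2:
        return None
    col = rng.randrange(len(rows[0]) - 1)      # random predictor lag (assumption)
    cuts = sorted({r[col] for r in rows})
    best = None
    for lo, hi in zip(cuts, cuts[1:]):         # candidate thresholds: midpoints
        thr = (lo + hi) / 2
        left = [r for r in rows if r[col] <= thr]
        right = [r for r in rows if r[col] > thr]
        cost = sse([r[-1] for r in left]) + sse([r[-1] for r in right])
        if best is None or cost < best[0]:
            best = (cost, thr, left, right)
    if best is None:                           # predictor is constant at this node
        return None
    _, thr, left, right = best
    return (col, thr, grow_tree(left, depth - 1, rng), grow_tree(right, depth - 1, rng))


def leaf_of(tree, row):
    """Return the left/right path that `row` follows, used as a leaf identifier."""
    path = ""
    while tree is not None:
        col, thr, left, right = tree
        go_left = row[col] <= thr
        path += "L" if go_left else "R"
        tree = left if go_left else right
    return path


def fit_forest(all_series, seg_len=4, n_trees=20, depth=3, seed=0):
    """Train the ensemble on segments pooled from all series (no labels used)."""
    rng = random.Random(seed)
    pooled = [row for s in all_series for row in segments(s, seg_len)]
    return [grow_tree(pooled, depth, rng) for _ in range(n_trees)]


def represent(series, forest, seg_len=4):
    """Count how many segments of `series` land in each (tree, leaf) pair."""
    counts = Counter()
    for row in segments(series, seg_len):
        for t, tree in enumerate(forest):
            counts[(t, leaf_of(tree, row))] += 1
    return counts


def dissimilarity(a, b):
    """Sum of absolute differences between leaf counts (an assumed measure)."""
    return sum(abs(a[k] - b[k]) for k in set(a) | set(b))


# Toy data: two phase-shifted sine waves and a sawtooth.
sine = [math.sin(0.3 * t) for t in range(60)]
shifted = [math.sin(0.3 * t + 1.0) for t in range(60)]
sawtooth = [(t % 10) / 10 for t in range(60)]
data = [sine, shifted, sawtooth]

forest = fit_forest(data)
reps = [represent(s, forest) for s in data]
print(dissimilarity(reps[0], reps[1]), dissimilarity(reps[0], reps[2]))
```

Because the counts are taken over all segments regardless of position, a representation of this kind depends on which local patterns occur rather than on where they occur. The resulting dissimilarity could then be passed to a nearest-neighbor classifier or a clustering routine, in line with the tasks listed in the abstract.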

Acknowledgments

This research was partially supported by the Scientific and Technological Research Council of Turkey (TUBITAK) Grant Number 114C103.

Author information

Corresponding author

Correspondence to Mustafa Gokce Baydogan.

Additional information

Responsible editor: Eamonn Keogh.

About this article

Cite this article

Baydogan, M.G., Runger, G. Time series representation and similarity based on local autopatterns. Data Min Knowl Disc 30, 476–509 (2016). https://doi.org/10.1007/s10618-015-0425-y

  • DOI: https://doi.org/10.1007/s10618-015-0425-y
