
Abstract

Shapelets are discriminative subsequences of time series, usually embedded in shapelet-based decision trees. The enumeration of time series shapelets is, however, computationally costly, which, in addition to the inherent difficulty of the decision tree learning algorithm in effectively handling high-dimensional data, severely limits the applicability of shapelet-based decision tree learning from large (multivariate) time series databases. This paper introduces a novel tree-based ensemble method for univariate and multivariate time series classification using shapelets, called the generalized random shapelet forest algorithm. The algorithm generates a set of shapelet-based decision trees, where both the choice of instances used for building a tree and the choice of shapelets are randomized. For univariate time series, it is demonstrated through an extensive empirical investigation that the proposed algorithm yields predictive performance comparable to the current state-of-the-art and significantly outperforms several alternative algorithms, while being at least an order of magnitude faster. Similarly for multivariate time series, it is shown that the algorithm is significantly less computationally costly and more accurate than the current state-of-the-art.


Notes

  1. An earlier version of the algorithm, restricted to univariate time series, was presented together with a limited empirical evaluation in Karlsson et al. (2015).

  2. All currently available datasets in both repositories have been included.

  3. http://fs.ismll.de/publicspace/LearningShapelets.

  4. Available at the supporting website.

  5. We note, however, that since gRSF and LTS are distributed over several cores, the comparison to the non-parallel fast shapelet algorithm is not entirely fair.

  6. For the run-time experiment, we limit the total number of shapelets to 10,000 for computational convenience, i.e., reducing the cost of UFS by a factor d.

  7. Note, however, that the LPS algorithm is run on a single core.

  8. http://github.com/briljant/mimir.

  9. http://people.dsv.su.se/~isak-kar/grsf/.

References

  • Bagnall A, Lines J (2014) An experimental evaluation of nearest neighbour time series classification. CoRR arXiv:1406.4757

  • Bankó Z (2012) Correlation based dynamic time warping of multivariate time series. Expert Syst Appl 39(17):12814–12823

  • Batista GE, Wang X, Keogh EJ (2011) A complexity-invariant distance measure for time series. In: Proceedings of SIAM, SIAM international conference on data mining, pp 699–710

  • Baydogan MG, Runger G (2014) Learning a symbolic representation for multivariate time series classification. Data Min Knowl Discov 29(2):400–422

  • Baydogan MG, Runger G (2015) Time series representation and similarity based on local autopatterns. Data Min Knowl Discov 30(2):1–34

  • Baydogan MG, Runger G, Tuv E (2013) A bag-of-features framework to classify time series. IEEE Trans Pattern Anal Mach Intell 35(11):2796–2802

  • Berndt DJ, Clifford J (1994) Using dynamic time warping to find patterns in time series. In: KDD workshop, knowledge discovery and data mining, pp 359–370

  • Boström H (2011) Concurrent learning of large-scale random forests. In: Proceedings of the Scandinavian conference on artificial intelligence, pp 20–29

  • Boström H (2012) Forests of probability estimation trees. Int J Pattern Recognit Artif Intell 26(02):125–147

  • Bradley AP (1997) The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognit 30(7):1145–1159

  • Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140

  • Breiman L (2001) Random forests. Mach Learn 45(1):5–32

  • Breiman L, Friedman J, Stone CJ, Olshen RA (1984) Classification and regression trees. CRC Press, Boca Raton

  • Cetin MS, Mueen A, Calhoun VD (2015) Shapelet ensemble for multi-dimensional time series. In: Proceedings of SIAM international conference on data mining, SIAM, pp 307–315

  • Chen L, Ng R (2004) On the marriage of \(l_p\)-norms and edit distance. In: Proceedings of the international conference on very large data bases, ACM, pp 792–803

  • Chen L, Özsu MT (2005) Robust and fast similarity search for moving object trajectories. In: Proceedings of the ACM SIGMOD international conference on management of data, ACM, pp 491–502

  • Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297

  • Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30

  • Deng H, Runger G, Tuv E, Vladimir M (2013) A time series forest for classification and feature extraction. Inf Sci 239:142–153

  • Ding H, Trajcevski G, Scheuermann P, Wang X, Keogh E (2008) Querying and mining of time series data: experimental comparison of representations and distance measures. Proc VLDB Endow 1(2):1542–1552

  • Friedman JH (1997) On bias, variance, 0/1–loss, and the curse-of-dimensionality. Data Min Knowl Discov 1(1):55–77

  • Fulcher BD, Jones NS (2014) Highly comparative feature-based time-series classification. IEEE Trans Knowl Data Eng 26(12):3026–3037

  • Gordon D, Hendler D, Rokach L (2012) Fast randomized model generation for shapelet-based time series classification. arXiv:1209.5038

  • Grabocka J, Schilling N, Wistuba M, Schmidt-Thieme L (2014) Learning time-series shapelets. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 392–401

  • Hills J, Lines J, Baranauskas E, Mapp J, Bagnall A (2014) Classification of time series by shapelet transformation. Data Min Knowl Discov 28(4):851–881

  • Ho TK (1998) The random subspace method for constructing decision forests. IEEE Trans Pattern Anal Mach Intell 20(8):832–844

  • Hu B, Chen Y, Keogh EJ (2013) Time series classification under more realistic assumptions. In: Proceedings of SIAM international conference on data mining, SIAM, pp 578–586

  • James GM (2003) Variance and bias for general loss functions. Mach Learn 51(2):115–135

  • Kampouraki A, Manis G, Nikou C (2009) Heartbeat time series classification with support vector machines. Inf Technol Biomed 13(4):512–518

  • Karlsson I, Papapetrou P, Boström H (2015) Forests of randomized shapelet trees. In: Proceedings of statistical learning and data sciences, Springer, pp 126–136

  • Keogh E, Zhu Q, Hu B, Hao Y, Xi X, Wei L, Ratanamahatana CA (2015) The UCR time series classification/clustering homepage. www.cs.ucr.edu/~eamonn/time_series_data/

  • Lines J, Bagnall A (2014) Time series classification with ensembles of elastic distance measures. Data Min Knowl Discov 29(3):565–592

  • Lines J, Davis LM, Hills J, Bagnall A (2012) A shapelet transform for time series classification. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 289–297

  • Maier D (1978) The complexity of some problems on subsequences and supersequences. J ACM 25(2):322–336

  • Mueen A, Keogh E, Young N (2011) Logical-shapelets: an expressive primitive for time series classification. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 1154–1162

  • Nanopoulos A, Alcock R, Manolopoulos Y (2001) Feature-based classification of time-series data. Int J Comput Res 10:49–61

  • Patri OP, Sharma AB, Chen H, Jiang G, Panangadan AV, Prasanna VK (2014) Extracting discriminative shapelets from heterogeneous sensor data. In: Proceedings of IEEE international conference on big data, IEEE, pp 1095–1104

  • Quinlan JR (1993) C4.5: programs for machine learning. Elsevier, Amsterdam

  • Rakthanmanon T, Keogh E (2013) Fast shapelets: a scalable algorithm for discovering time series shapelets. In: Proceedings of SIAM international conference on data mining, SIAM

  • Ratanamahatana CA, Keogh E (2004) Everything you know about dynamic time warping is wrong. In: 3rd workshop on mining temporal and sequential data, pp 22–25

  • Rebbapragada U, Protopapas P, Brodley CE, Alcock C (2009) Finding anomalous periodic time series. Mach Learn 74(3):281–313

  • Rodríguez JJ, Alonso CJ (2004) Interval and dynamic time warping-based decision trees. In: Proceedings of the 2004 ACM Symposium on applied computing, ACM, pp 548–552

  • Rodríguez JJ, Alonso CJ, Maestro JA (2005) Support vector machines of interval-based features for time series classification. Knowl Based Syst 18(4):171–178

  • Sakoe H, Chiba S (1978) Dynamic programming algorithm optimization for spoken word recognition. In: Transactions on ASSP, IEEE, pp 43–49

  • Schmidhuber J (2014) Deep learning in neural networks: an overview. arXiv:1404.7828

  • Shannon CE (1948) A mathematical theory of communication. Bell Syst Tech J 27(3):379–423

  • Shokoohi-Yekta M, Wang J, Keogh E (2015) On the non-trivial generalization of dynamic time warping to the multi-dimensional case. In: Proceedings of SIAM international conference on data mining, SIAM, pp 289–297

  • Valentini G, Dietterich TG (2004) Bias-variance analysis of support vector machines for the development of SVM-based ensemble methods. J Mach Learn Res 5:725–775

  • Wang X, Mueen A, Ding H, Trajcevski G, Scheuermann P, Keogh E (2013) Experimental comparison of representation methods and distance measures for time series data. Data Min Knowl Discov 26(2):275–309

  • Wistuba M, Grabocka J, Schmidt-Thieme L (2015) Ultra-fast shapelets for time series classification. CoRR arXiv:1503.05018

  • Wu Y, Chang EY (2004) Distance-function design and fusion for sequence data. In: Proceedings of ACM international conference on information and knowledge management, ACM, pp 324–333

  • Xi X, Keogh E, Shelton C, Wei L, Ratanamahatana CA (2006) Fast time series classification using numerosity reduction. In: Proceedings of the 23rd international conference on machine learning, ACM, pp 1033–1040

  • Ye L, Keogh E (2009) Time series shapelets: a new primitive for data mining. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 947–956

  • Ye L, Keogh E (2011) Time series shapelets: a novel technique that allows accurate, interpretable and fast classification. Data Min Knowl Discov 22(1–2):149–182


Acknowledgments

This work was partly supported by the project High-Performance Data Mining for Drug Effect Detection at Stockholm University, funded by Swedish Foundation for Strategic Research under Grant IIS11-0053.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Isak Karlsson.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Source code

Source code (MIT license) for the Generalized Random Shapelet Forest is available on GitHub (see Note 8). Instructions and datasets can be found at the supporting website (see Note 9).

Additional information

Responsible editor: Thomas Gärtner, Mirco Nanni, Andrea Passerini, and Celine Robardet.

Appendices

Appendix 1: Decomposing the mean squared error in a forest

To investigate whether an observed difference in predictive performance between parameter configurations is mainly due to bias, variance, or both, the mean squared error of the forest can be decomposed into these two terms. In contrast to regression, where the decomposition of the prediction error into bias and variance is well understood and widely used (James 2003), there is no generally accepted definition for classification. Hence, several decompositions of the average classification error rate into bias and variance have been proposed (see Friedman 1997; James 2003; Valentini and Dietterich 2004). It is, however, difficult to directly study the effect of bias and variance of randomized algorithms in the context of misclassification error, since a decrease in the variability of the predictions can increase the average error, making the averaged model worse than the randomized one. Instead, in analogy with the mean squared error (MSE) of a regressor, the mean squared error of a forest whose trees output conditional class probability estimates for a given example can be computed as follows (Boström 2012). Given the class labels \(\{c_1, c_2, \ldots ,c_l\}\), let \(\bar{b}^{(i)}_k = (b^{(i)}_{k1}, b^{(i)}_{k2},\ldots ,b^{(i)}_{kl})\) be the class probability vector assigned by the kth random shapelet tree \(ST_k\) in the forest to a labeled time series \(z^{(i)}\). Furthermore, let \(\bar{c}^{(i)} = (\bar{c}^{(i)}_1, \bar{c}^{(i)}_2, \ldots , \bar{c}^{(i)}_l)\) represent the true class vector for \(z^{(i)}\), where \(\bar{c}^{(i)}_j\) is 1 if \(y^{(i)} = c_j\) and 0 otherwise. The mean squared error (mse) of the forest over the n examples can then be defined as:

$$\begin{aligned} \mathrm {mse} = \frac{1}{n}\sum _{i=1}^{n} \frac{1}{p}\sum _{k=1}^{p}\left( \bar{b}^{(i)}_k-\bar{c}^{(i)}\right) ^2 \end{aligned}$$
(5)

Given the mean class probability vector \(\bar{b}^{(i)}_{\mu }\) for the ith example, averaged over the p trees, the mse can be decomposed into two parts, the bias (left term) and the variance (right term):

$$\begin{aligned} \mathrm {mse} = \frac{1}{n}\sum _{i=1}^{n} \left( \bar{c}^{(i)} - \bar{b}^{(i)}_{\mu } \right) ^2 + \frac{1}{n}\sum _{i=1}^{n} \frac{1}{p}\sum _{k=1}^{p}\left( \bar{b}^{(i)}_k - \bar{b}^{(i)}_{\mu } \right) ^2 \end{aligned}$$
(6)
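To make the decomposition concrete, the following is a minimal sketch (not the paper's implementation) of how the mse, bias, and variance terms above could be computed from the per-tree class probability vectors; the array names probs and y_onehot, and their shapes, are assumptions introduced here for illustration.

```python
import numpy as np

def mse_bias_variance(probs, y_onehot):
    """Illustrative sketch; array names and shapes are assumptions, not from the paper's code.

    probs:    array of shape (p, n, l) with the per-tree class probability
              vectors b_k^(i) for p trees, n examples and l classes.
    y_onehot: array of shape (n, l) with the true class vectors c^(i).
    """
    # mse (Eq. 5): squared error of each tree, averaged over trees and examples
    mse = np.mean(np.sum((probs - y_onehot) ** 2, axis=2))
    # mean class probability vector b_mu^(i), averaged over the p trees
    b_mu = probs.mean(axis=0)                                 # shape (n, l)
    # bias (Eq. 6, left): squared distance between truth and the averaged prediction
    bias = np.mean(np.sum((y_onehot - b_mu) ** 2, axis=1))
    # variance (Eq. 6, right): average spread of the individual trees around b_mu
    variance = np.mean(np.sum((probs - b_mu) ** 2, axis=2))
    return mse, bias, variance
```

Under this quadratic loss the two terms sum exactly to the mse, so shifts between bias and variance across parameter configurations can be read off directly.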

Appendix 2: Internal estimates of strength and correlation

In the original Random Forest, Breiman (2001) proposes internal estimates of the strength (i.e., how accurate the individual classifiers are) and the correlation (i.e., the dependence between classifiers) of the forest, based on the out-of-bag instances not used when training each tree. Using these measures, an upper bound on the generalization error can be derived. More precisely, for the case of random shapelet forests, each random shapelet tree \(ST_k \in \mathcal {R}\) can be seen as a base classifier function \(f_k\). Hence, we can define a set of p base classifier functions \(\{f_1(T), \ldots , f_p(T)\}\) as well as the ensemble classifier function \(f_{\mathcal {R}}(T)\). Let \(\mathcal {D}_{T_k}\) denote the set of out-of-bag instances for the classifier \(ST_k\). Furthermore, given a class label \(c\in \mathcal {C}\), \(Q(T, c)\) is an approximation of \(P(f_{\mathcal {R}}(T) = c)\), corresponding to the out-of-bag proportion of votes for class c for the input time series T. More formally:

$$\begin{aligned} Q(T, c) = \frac{ \sum _{k=1}^{p} \mathbf {1}(ST_k(T) = c \, ; \, T \in \mathcal {D}_{T_k}) }{ |\{k : T \in \mathcal {D}_{T_k}\}| } , \end{aligned}$$
(7)

where \(\mathbf {1}(\cdot )\) is the indicator function. The margin then measures the extent to which the average number of votes for the right class exceeds the average number of votes for any other class (Breiman 2001).

Definition 7

(margin function) The empirical margin function for a random shapelet forest, analogous to that of a random forest, is

$$\begin{aligned} \mathrm {mr}(T,c) = P(f_{\mathcal {R}}(T) = c) - \max _{\begin{array}{c} j=1 \\ c_j \ne c \end{array}}^{|\mathcal {C}|} P\left( f_{\mathcal {R}}(T) = c_j \right) \end{aligned}$$
(8)

where \(P(\cdot )\) is estimated using \(Q(\cdot )\).
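As an illustration, the following sketch estimates \(Q(T, c)\) and the empirical margins from out-of-bag predictions; the inputs preds (per-tree predicted class indices) and oob (a boolean out-of-bag mask) are hypothetical names introduced here, not part of the authors' code.

```python
import numpy as np

def oob_vote_proportions(preds, oob, n_classes):
    """Estimate Q(T, c), the out-of-bag proportion of votes per class (Eq. 7).

    preds: (p, n) integer array of class indices predicted by each of p trees.
    oob:   (p, n) boolean array; oob[k, i] is True iff series i is out-of-bag
           for tree k.  (Illustrative sketch; names are assumptions.)
    """
    p, n = preds.shape
    votes = np.zeros((n, n_classes))
    for k in range(p):
        idx = np.where(oob[k])[0]
        votes[idx, preds[k, idx]] += 1           # count OOB votes of tree k
    oob_counts = oob.sum(axis=0)                  # trees for which each series is OOB
    return votes / np.maximum(oob_counts, 1)[:, None]

def empirical_margins(Q, y):
    """mr(T_i, y_i): OOB vote share of the true class minus the best other class (Eq. 8)."""
    n = Q.shape[0]
    true_q = Q[np.arange(n), y]
    other = Q.copy()
    other[np.arange(n), y] = -np.inf              # exclude the true class from the max
    return true_q - other.max(axis=1)
```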

The expectation over the margin function gives a measure of how accurate, or strong, the set of classifiers is.

Definition 8

(strength) The strength of a random shapelet forest is the expected margin, which can be empirically estimated as the average margin over the training set:

$$\begin{aligned} s = \frac{1}{n}\sum _{i=1}^{n}\mathrm {mr}(T_i, y_i) \end{aligned}$$
(9)
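Continuing the sketch above (variable names remain illustrative), the strength is then simply the mean of the empirical out-of-bag margins over the training set:

```python
# Strength (Eq. 9): average out-of-bag margin over the n training series.
# Reuses Q, y and empirical_margins from the sketch above (hypothetical names).
margins = empirical_margins(Q, y)
s = margins.mean()
```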

By computing the variance of the margin, the correlation (i.e., interdependence) between the individual classifiers can be estimated as the variance of the margin divided by the squared standard deviation of the random shapelet forest.

Definition 9

(correlation) The correlation of the random shapelet forest can be empirically estimated as:

$$\begin{aligned} \bar{p}=\frac{\mathrm {var}(\mathrm {mr})}{\mathrm {sd}(\mathcal {R})^2} = \frac{ \frac{1}{n}\sum _{i=1}^{n}\mathrm {mr}(T_i, y_i)^2 - s^2 }{ \left( \frac{1}{p}\sum _{k=1}^{p}\sqrt{b_k+\hat{b}_k+(b_k-\hat{b}_k)^2} \right) ^2 } \end{aligned}$$
(10)

where \(b_k\) is an out-of-bag estimate of \(P(f_k(T) = y)\), i.e., the probability that the kth tree predicts the true class, with

$$\begin{aligned} b_k = \frac{ \sum _{i=1}^{n}\mathbf {1}(f_k(T_i) = y_i \, ; \, T_i \in \mathcal {D}_{T_k}) }{ |\mathcal {D}_{T_k}| }, \end{aligned}$$
(11)

and \(\hat{b}_k\) is an out-of-bag estimate of \(P(f_k(T) = \hat{c})\), i.e., the probability that the kth tree predicts the most voted-for class other than the true one:

$$\begin{aligned} \hat{b}_k = \frac{ \sum _{i=1}^{n}\mathbf {1}(f_k(T_i) = \hat{c}_i \, ; \, T_i \in \mathcal {D}_{T_k}) }{ |\mathcal {D}_{T_k}| }, \end{aligned}$$
(12)

where \(\hat{c}_i\) is estimated for every instance \(T_i\) in the training set using \(Q(T, c)\) as

$$\begin{aligned} \hat{c}_i = \mathrm {arg} \max _{\begin{array}{c} j=1 \\ c_j \ne y_i \end{array}}^{|\mathcal {C}|} Q(T_i, c_j). \end{aligned}$$
(13)
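The following sketch combines Eqs. (10)-(13) into one out-of-bag estimate of the correlation; it reuses the hypothetical preds, oob, Q, and y arrays from the earlier sketches and illustrates the computation rather than reproducing the authors' implementation.

```python
import numpy as np

def correlation_estimate(preds, oob, Q, y, s):
    """Out-of-bag estimate of the correlation (Eq. 10); illustrative sketch only."""
    p, n = preds.shape
    margins = empirical_margins(Q, y)             # from the earlier sketch
    var_mr = (margins ** 2).mean() - s ** 2       # variance of the margin
    # c_hat_i (Eq. 13): the non-true class receiving the most OOB votes
    other = Q.copy()
    other[np.arange(n), y] = -np.inf
    c_hat = other.argmax(axis=1)
    sds = []
    for k in range(p):
        m = oob[k]                                # OOB instances of tree k
        b_k = np.mean(preds[k, m] == y[m])        # Eq. (11): OOB accuracy of tree k
        bhat_k = np.mean(preds[k, m] == c_hat[m]) # Eq. (12): OOB rate of predicting c_hat
        sds.append(np.sqrt(b_k + bhat_k + (b_k - bhat_k) ** 2))
    sd = np.mean(sds)                             # average per-tree standard deviation
    return var_mr / sd ** 2
```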

Assuming that the strength satisfies \(s > 0\), Breiman (2001) showed that a (rather loose) upper bound on the generalization error of a random forest, and by a similar argument of a random shapelet forest, is given by \(\frac{\bar{p}(1-s^2)}{s^2}\). The bound shows that the two ingredients governing the generalization error of forests of randomized trees are the strength of the individual classifiers and the dependence between them in terms of the margin function (Breiman 2001). Furthermore, the correlation divided by the squared strength, \(\bar{p}/s^2\), provides a ratio (smaller is better) that can be used to understand the functioning of the forest (Breiman 2001).
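Given the estimates above, the bound and the ratio are one-line computations (again using the hypothetical names from the sketches, and meaningful only when s > 0):

```python
# Upper bound on the generalization error (Breiman 2001), valid when s > 0.
rho_bar = correlation_estimate(preds, oob, Q, y, s)
error_bound = rho_bar * (1 - s ** 2) / s ** 2
ratio = rho_bar / s ** 2    # smaller is better
```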



Cite this article

Karlsson, I., Papapetrou, P. & Boström, H. Generalized random shapelet forests. Data Min Knowl Disc 30, 1053–1085 (2016). https://doi.org/10.1007/s10618-016-0473-y
