Abstract
Shapelets are discriminative subsequences of time series, usually embedded in shapelet-based decision trees. The enumeration of time series shapelets is, however, computationally costly, which, in addition to the inherent difficulty decision tree learning algorithms have in effectively handling high-dimensional data, severely limits the applicability of shapelet-based decision tree learning to large (multivariate) time series databases. This paper introduces a novel tree-based ensemble method for univariate and multivariate time series classification using shapelets, called the generalized random shapelet forest algorithm. The algorithm generates a set of shapelet-based decision trees, where both the choice of instances used for building a tree and the choice of shapelets are randomized. For univariate time series, it is demonstrated through an extensive empirical investigation that the proposed algorithm yields predictive performance comparable to the current state-of-the-art and significantly outperforms several alternative algorithms, while being at least an order of magnitude faster. Similarly for multivariate time series, it is shown that the algorithm is significantly less computationally costly and more accurate than the current state-of-the-art.
Notes
An earlier version of the algorithm, restricted to univariate time series, was presented together with a limited empirical evaluation in Karlsson et al. (2015).
All currently available datasets in both repositories have been included.
Available at the supporting website.
We note, however, that since gRSF and LTS are distributed over several cores, the comparison to the non-parallel fast shapelet algorithm is not entirely fair.
For the run-time experiment, we limit the total number of shapelets to 10,000 for computational convenience, i.e., reducing the cost of UFS by a factor d.
Note, however, that the LPS algorithm is run on a single core.
References
Bagnall A, Lines J (2014) An experimental evaluation of nearest neighbour time series classification. CoRR arXiv:1406.4757
Bankó Z (2012) Correlation based dynamic time warping of multivariate time series. Expert Syst Appl 39(17):12814–12823
Batista GE, Wang X, Keogh EJ (2011) A complexity-invariant distance measure for time series. In: Proceedings of SIAM, SIAM international conference on data mining, pp 699–710
Baydogan MG, Runger G (2014) Learning a symbolic representation for multivariate time series classification. Data Min Knowl Discov 29(2):400–422
Baydogan MG, Runger G (2015) Time series representation and similarity based on local autopatterns. Data Min Knowl Discov 30(2):1–34
Baydogan MG, Runger G, Tuv E (2013) A bag-of-features framework to classify time series. IEEE Trans Pattern Anal Mach Intell 35(11):2796–2802
Berndt DJ, Clifford J (1994) Using dynamic time warping to find patterns in time series. In: KDD workshop, knowledge discovery and data mining, pp 359–370
Boström H (2011) Concurrent learning of large-scale random forests. In: Proceedings of the Scandinavian conference on artificial intelligence, pp 20–29
Boström H (2012) Forests of probability estimation trees. Int J Pattern Recognit Artif Intell 26(02):125–147
Bradley AP (1997) The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognit 30(7):1145–1159
Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140
Breiman L (2001) Random forests. Mach Learn 45(1):5–32
Breiman L, Friedman J, Stone CJ, Olshen RA (1984) Classification and regression trees. CRC Press, Boca Raton
Cetin MS, Mueen A, Calhoun VD (2015) Shapelet ensemble for multi-dimensional time series. In: Proceedings of SIAM international conference on data mining, SIAM, pp 307–315
Chen L, Ng R (2004) On the marriage of \(l_p\)-norms and edit distance. In: Proceedings of the international conference on very large data bases, ACM, pp 792–803
Chen L, Özsu MT (2005) Robust and fast similarity search for moving object trajectories. In: Proceedings of the ACM SIGMOD international conference on management of data, ACM, pp 491–502
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297
Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30
Deng H, Runger G, Tuv E, Vladimir M (2013) A time series forest for classification and feature extraction. Inf Sci 239:142–153
Ding H, Trajcevski G, Scheuermann P, Wang X, Keogh E (2008) Querying and mining of time series data: experimental comparison of representations and distance measures. Proc VLDB Endow 1(2):1542–1552
Friedman JH (1997) On bias, variance, 0/1–loss, and the curse-of-dimensionality. Data Min Knowl Discov 1(1):55–77
Fulcher BD, Jones NS (2014) Highly comparative feature-based time-series classification. IEEE Trans Knowl Data Eng 26(12):3026–3037
Gordon D, Hendler D, Rokach L (2012) Fast randomized model generation for shapelet-based time series classification. arXiv:1209.5038
Grabocka J, Schilling N, Wistuba M, Schmidt-Thieme L (2014) Learning time-series shapelets. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 392–401
Hills J, Lines J, Baranauskas E, Mapp J, Bagnall A (2014) Classification of time series by shapelet transformation. Data Min Knowl Discov 28(4):851–881
Ho TK (1998) The random subspace method for constructing decision forests. IEEE Trans Pattern Anal Mach Intell 20(8):832–844
Hu B, Chen Y, Keogh EJ (2013) Time series classification under more realistic assumptions. In: Proceedings of SIAM international conference on data mining, SIAM, pp 578–586
James GM (2003) Variance and bias for general loss functions. Mach Learn 51(2):115–135
Kampouraki A, Manis G, Nikou C (2009) Heartbeat time series classification with support vector machines. IEEE Trans Inf Technol Biomed 13(4):512–518
Karlsson I, Papapetrou P, Boström H (2015) Forests of randomized shapelet trees. In: Proceedings of statistical learning and data sciences, Springer, pp 126–136
Keogh E, Zhu Q, Hu B, Hao Y, Xi X, Wei L, Ratanamahatana CA (2015) The UCR time series classification/clustering homepage. www.cs.ucr.edu/~eamonn/time_series_data/
Lines J, Bagnall A (2014) Time series classification with ensembles of elastic distance measures. Data Min Knowl Discov 29(3):565–592
Lines J, Davis LM, Hills J, Bagnall A (2012) A shapelet transform for time series classification. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 289–297
Maier D (1978) The complexity of some problems on subsequences and supersequences. J ACM 25(2):322–336
Mueen A, Keogh E, Young N (2011) Logical-shapelets: an expressive primitive for time series classification. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 1154–1162
Nanopoulos A, Alcock R, Manolopoulos Y (2001) Feature-based classification of time-series data. Int J Comput Res 10:49–61
Patri OP, Sharma AB, Chen H, Jiang G, Panangadan AV, Prasanna VK (2014) Extracting discriminative shapelets from heterogeneous sensor data. In: Proceedings of IEEE international conference on big data, IEEE, pp 1095–1104
Quinlan JR (1993) C4.5: programs for machine learning. Elsevier, Amsterdam
Rakthanmanon T, Keogh E (2013) Fast shapelets: a scalable algorithm for discovering time series shapelets. In: Proceedings of SIAM international conference on data mining, SIAM
Ratanamahatana CA, Keogh E (2004) Everything you know about dynamic time warping is wrong. In: 3rd workshop on mining temporal and sequential data, pp 22–25
Rebbapragada U, Protopapas P, Brodley CE, Alcock C (2009) Finding anomalous periodic time series. Mach Learn 74(3):281–313
Rodríguez JJ, Alonso CJ (2004) Interval and dynamic time warping-based decision trees. In: Proceedings of the 2004 ACM Symposium on applied computing, ACM, pp 548–552
Rodríguez JJ, Alonso CJ, Maestro JA (2005) Support vector machines of interval-based features for time series classification. Knowl Based Syst 18(4):171–178
Sakoe H, Chiba S (1978) Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans Acoust Speech Signal Process 26(1):43–49
Schmidhuber J (2014) Deep learning in neural networks: an overview. arXiv:1404.7828
Shannon CE (1948) A mathematical theory of communication. Bell Syst Tech J 27(3):379–423
Shokoohi-Yekta M, Wang J, Keogh E (2015) On the non-trivial generalization of dynamic time warping to the multi-dimensional case. In: Proceedings of SIAM international conference on data mining, SIAM, pp 289–297
Valentini G, Dietterich TG (2004) Bias-variance analysis of support vector machines for the development of svm-based ensemble methods. J Mach Learn Res 5:725–775
Wang X, Mueen A, Ding H, Trajcevski G, Scheuermann P, Keogh E (2013) Experimental comparison of representation methods and distance measures for time series data. Data Min Knowl Discov 26(2):275–309
Wistuba M, Grabocka J, Schmidt-Thieme L (2015) Ultra-fast shapelets for time series classification. CoRR arXiv:1503.05018
Wu Y, Chang EY (2004) Distance-function design and fusion for sequence data. In: Proceedings of ACM international conference on information and knowledge management, ACM, pp 324–333
Xi X, Keogh E, Shelton C, Wei L, Ratanamahatana CA (2006) Fast time series classification using numerosity reduction. In: Proceedings of the 23rd international conference on machine learning, ACM, pp 1033–1040
Ye L, Keogh E (2009) Time series shapelets: a new primitive for data mining. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 947–956
Ye L, Keogh E (2011) Time series shapelets: a novel technique that allows accurate, interpretable and fast classification. Data Min Knowl Discov 22(1–2):149–182
Acknowledgments
This work was partly supported by the project High-Performance Data Mining for Drug Effect Detection at Stockholm University, funded by Swedish Foundation for Strategic Research under Grant IIS11-0053.
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
Source code
Source code (MIT license) for the Generalized Random Shapelet Forest is available at GitHub. Instructions and datasets can be found at the supporting website.
Additional information
Responsible editor: Thomas Gärtner, Mirco Nanni, Andrea Passerini, and Celine Robardet.
Appendices
Appendix 1: Decomposing the mean squared error in a forest
To investigate whether the observed difference in predictive performance between parameter configurations is mainly due to bias, variance, or both, the mean squared error of the forest can be decomposed into those terms. In contrast to regression, where the decomposition of prediction error into bias and variance is well understood and widely used (James 2003), there is no general definition for classification. Hence, several decompositions of the average classification error rate into bias and variance have been proposed (see Friedman 1997; James 2003; Valentini and Dietterich 2004). It is, however, difficult to directly study the effect of bias and variance of randomized algorithms in the context of misclassification error, since a decrease in the variability of the predictions can increase the average error, making the averaged model worse than the individual randomized models. Instead, similar to the mean squared error (MSE) of a regressor, the mean squared error of a forest consisting of trees that output conditional class probability estimates for a given example can be computed as follows (Boström 2012). Given the class labels \(\{c_1, c_2, \ldots ,c_l\}\), let \(\bar{b}^{(i)}_k = (b^{(i)}_{k1}, b^{(i)}_{k2},\ldots ,b^{(i)}_{kl})\) be the probabilities assigned by the k-th random shapelet tree \(ST_k\) in the forest to a labeled time series \(z^{(i)}\). Furthermore, let \(\bar{c}^{(i)} = (\bar{c}^{(i)}_1, \bar{c}^{(i)}_2, \ldots , \bar{c}^{(i)}_l)\) represent the true class vector for a labeled time series \(z^{(i)}\), where \(\bar{c}^{(i)}_j\) is 1 if \(y^{(i)} = c_j\) and 0 otherwise. The mean squared error (mse) of a forest of \(p\) trees over \(n\) examples can then be defined as:

\[ mse = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{p}\sum_{k=1}^{p}\sum_{j=1}^{l}\left(b^{(i)}_{kj} - \bar{c}^{(i)}_{j}\right)^{2} \]
Given the mean class probability vector \(\bar{b}^{(i)}_{\mu } = \frac{1}{p}\sum_{k=1}^{p}\bar{b}^{(i)}_{k}\) for the ith example, the mse can be decomposed into two parts, the bias (left) and the variance (right):

\[ mse = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{l}\left(b^{(i)}_{\mu j} - \bar{c}^{(i)}_{j}\right)^{2} + \frac{1}{n}\sum_{i=1}^{n}\frac{1}{p}\sum_{k=1}^{p}\sum_{j=1}^{l}\left(b^{(i)}_{kj} - b^{(i)}_{\mu j}\right)^{2} \]
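As a concrete illustration of this decomposition (a minimal sketch of our own, not the authors' implementation; the function name `forest_mse_decomposition` and the array layout are assumptions), the mse, bias and variance terms can be computed from the per-tree probability vectors, and the identity mse = bias + variance holds exactly:

```python
import numpy as np

def forest_mse_decomposition(B, C):
    """Decompose the forest mean squared error into bias and variance.

    B: array of shape (p, n, l) -- probability vectors b_k^(i) assigned
       by each of the p trees to each of the n examples over l classes.
    C: array of shape (n, l) -- one-hot true class vectors c^(i).
    """
    # mse: squared error of each tree against the true class vector,
    # averaged over trees and examples
    mse = ((B - C[None, :, :]) ** 2).sum(axis=2).mean()
    # mean class probability vector per example (the "averaged model")
    B_mu = B.mean(axis=0)
    # bias: squared error of the averaged model
    bias = ((B_mu - C) ** 2).sum(axis=1).mean()
    # variance: spread of the individual trees around the averaged model
    var = ((B - B_mu[None, :, :]) ** 2).sum(axis=2).mean()
    return mse, bias, var
```

Running the function on random probability vectors confirms numerically that the two right-hand terms sum to the mse.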
Appendix 2: Internal estimates of strength and correlation
In the original Random Forest, Breiman (2001) proposes internal estimates for the strength (i.e., how accurate the individual classifiers are) and correlation (i.e., the dependence between classifiers) of the forest based on the out-of-bag instances not included during training. Using these measures, an upper bound can be derived for the generalization error. More precisely, for the case of random shapelet forests, each random shapelet tree \(ST_k \in \mathcal {R}\) can be seen as a base classifier function \(f_k\). Hence, we can define a set of p base classifier functions \(\{f_1(T), \ldots , f_p(T)\}\) as well as the ensemble classifier function \(f_{\mathcal {R}}(T)\). Let us denote the set of out-of-bag instances for a classifier \(ST_k\) as \(\mathcal {D}_{T_k}\). Furthermore, given a class label \(c\in \mathcal {C}\), Q(T, c) is an approximation function for \(P(f_{\mathcal {R}}(T) = c)\) corresponding to the out-of-bag proportion of votes for class c for the input time series T. More formally:
\[ Q(T, c) = \frac{\sum_{k=1}^{p} \mathbf {1}\left(f_k(T) = c \wedge T \in \mathcal {D}_{T_k}\right)}{\sum_{k=1}^{p} \mathbf {1}\left(T \in \mathcal {D}_{T_k}\right)} \]

where \(\mathbf {1}(\cdot )\) is the indicator function. Then, the margin measures the extent to which the average number of votes for the right class exceeds the average number of votes for any other class (Breiman 2001).
Definition 7
(margin function) The empirical margin function for a random shapelet forest, similar to a random forest, is
\[ mg(T, y) = P\left(f_{\mathcal {R}}(T) = y\right) - \max _{c \in \mathcal {C},\, c \ne y} P\left(f_{\mathcal {R}}(T) = c\right) \]

where \(P(\cdot )\) is estimated using \(Q(\cdot )\).
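To make the out-of-bag vote proportion and the empirical margin concrete, the following Python sketch (our own illustration, not the authors' code; names such as `oob_vote_proportion` are hypothetical, and class labels are assumed to be integers indexing positions in `classes`) computes both quantities from a matrix of per-tree predictions:

```python
import numpy as np

def oob_vote_proportion(preds, oob_mask, c):
    """Q(T, c): out-of-bag proportion of votes for class c, per example.

    preds:    (p, n) int array, preds[k, i] = f_k(T_i)
    oob_mask: (p, n) bool array, True where example i is out-of-bag
              for tree k
    """
    votes = ((preds == c) & oob_mask).sum(axis=0)
    counts = oob_mask.sum(axis=0)
    return np.where(counts > 0, votes / np.maximum(counts, 1), 0.0)

def empirical_margin(preds, oob_mask, y, classes):
    """mg(T, y) = Q(T, y) - max_{c != y} Q(T, c), per example.

    Assumes each label in y is an integer index into `classes`.
    """
    Q = np.stack([oob_vote_proportion(preds, oob_mask, c)
                  for c in classes])            # shape (l, n)
    idx = np.arange(len(y))
    q_true = Q[y, idx]                          # votes for the true class
    Q_other = Q.copy()
    Q_other[y, idx] = -np.inf                   # exclude the true class
    return q_true - Q_other.max(axis=0)
```

With three trees, two examples and all instances out-of-bag, two out of three trees voting for the true class yields a margin of 2/3 - 1/3 = 1/3 for each example.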
The expectation of the margin function gives a measure of how accurate, or strong, a set of classifiers is.
Definition 8
(strength) The strength of a random shapelet forest is the expected margin, and can be empirically estimated as the average over the \(n\) training instances:

\[ s = \frac{1}{n}\sum_{i=1}^{n} mg\left(T^{(i)}, y^{(i)}\right) \]
By computing the variance of the margin, the correlation, i.e., the interdependence between the individual classifiers, can be estimated as the variance of the margin divided by the square of the average standard deviation of the individual trees.
Definition 9
(correlation) The correlation of the random shapelet forest can be empirically estimated as:

\[ \bar{\rho } = \frac{\widehat{\mathrm{var}}(mg)}{\left(\frac{1}{p}\sum_{k=1}^{p} sd_k\right)^{2}}, \qquad sd_k = \sqrt{b_k + \hat{b}_k - \left(b_k - \hat{b}_k\right)^{2}} \]

where \(b_k\) is an out-of-bag estimate of \(P(f_k(T) = y)\), with

\[ b_k = \frac{1}{|\mathcal {D}_{T_k}|}\sum_{z^{(i)} \in \mathcal {D}_{T_k}} \mathbf {1}\left(f_k(T^{(i)}) = y^{(i)}\right) \]

and \(\hat{b}_k\) is an out-of-bag estimate of \(P(f_k(T) = \hat{c}_j)\), with

\[ \hat{b}_k = \frac{1}{|\mathcal {D}_{T_k}|}\sum_{z^{(i)} \in \mathcal {D}_{T_k}} \mathbf {1}\left(f_k(T^{(i)}) = \hat{c}_j^{(i)}\right) \]

where \(\hat{c}_j\) is estimated for every instance in the training set with \(Q(T, c)\) as

\[ \hat{c}_j = \mathop {\mathrm {arg\,max}}\limits _{c \in \mathcal {C},\, c \ne y} Q(T, c) \]
Assuming that the strength is \(s > 0\), Breiman (2001) showed that a (rather loose) upper bound on the generalization error of a random forest, and by a similar argument of a random shapelet forest, is given by \(\frac{\bar{\rho }(1-s^2)}{s^2}\). The bound shows that the two ingredients involved in the generalization error for forests of randomized trees are the strength of the individual classifiers and the dependence between them in terms of the margin function (Breiman 2001). Furthermore, the correlation divided by the squared strength, \(\bar{\rho }/s^2\), provides a ratio, where smaller is better, that can be used to understand the functioning of the forest (Breiman 2001).
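The out-of-bag estimates of strength, correlation and the resulting bound can be sketched together as follows (a hedged illustration of our own, not the authors' implementation; the function name `strength_and_correlation` is hypothetical, labels are assumed to be integer indices into `classes`, and every tree is assumed to have at least one out-of-bag instance):

```python
import numpy as np

def strength_and_correlation(preds, oob_mask, y, classes):
    """Out-of-bag estimates of strength s and mean correlation rho_bar
    in the style of Breiman (2001), plus the generalization-error
    bound rho_bar * (1 - s^2) / s^2."""
    p, n = preds.shape
    # Q[c, i]: out-of-bag vote proportion for class c on example i
    Q = np.stack([((preds == c) & oob_mask).sum(axis=0)
                  / np.maximum(oob_mask.sum(axis=0), 1) for c in classes])
    idx = np.arange(n)
    q_true = Q[y, idx]
    Q_other = Q.copy()
    Q_other[y, idx] = -np.inf
    c_hat = Q_other.argmax(axis=0)      # most-voted wrong class, per example
    mg = q_true - Q[c_hat, idx]         # empirical margin
    s = mg.mean()                       # strength
    # per-tree standard deviation of the raw margin (in {-1, 0, 1}),
    # estimated on each tree's out-of-bag instances
    sd = np.empty(p)
    for k in range(p):
        oob = oob_mask[k]
        b_k = (preds[k, oob] == y[oob]).mean()
        b_hat_k = (preds[k, oob] == c_hat[oob]).mean()
        sd[k] = np.sqrt(b_k + b_hat_k - (b_k - b_hat_k) ** 2)
    rho_bar = mg.var() / sd.mean() ** 2
    bound = rho_bar * (1 - s ** 2) / s ** 2 if s > 0 else np.inf
    return s, rho_bar, bound
```

In the degenerate case where every example has the same margin, the variance of the margin, and hence the estimated correlation and the bound, are zero.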
Cite this article
Karlsson, I., Papapetrou, P. & Boström, H. Generalized random shapelet forests. Data Min Knowl Disc 30, 1053–1085 (2016). https://doi.org/10.1007/s10618-016-0473-y