
Optimizing dynamic time warping’s window width for time series data mining applications

Data Mining and Knowledge Discovery

Abstract

Dynamic Time Warping (DTW) is a highly competitive distance measure for most time series data mining problems. Obtaining the best performance from DTW requires setting its only parameter, the maximum amount of warping (w). In the supervised case with ample data, w is typically set by cross-validation in the training stage. However, this method is likely to yield suboptimal results for small training sets. For the unsupervised case, learning via cross-validation is not possible because we do not have access to labeled data. Many practitioners have thus resorted to assuming that “the larger the better”, and they use the largest value of w permitted by the computational resources. However, as we will show, in most circumstances, this is a naïve approach that produces inferior clusterings. Moreover, the best warping window width is generally non-transferable between the two tasks, i.e., for a single dataset, practitioners cannot simply apply the best w learned for classification to clustering or vice versa. In addition, we will demonstrate that the appropriate amount of warping depends not only on the data structure, but also on the dataset size. Thus, even if a practitioner knows the best setting for a given dataset, they will likely be at a loss if they apply that setting to a larger version of that dataset. All these issues seem largely unknown or at least unappreciated in the community. In this work, we demonstrate the importance of setting DTW’s warping window width correctly, and we also propose novel methods to learn this parameter in both supervised and unsupervised settings. The algorithms we propose to learn w can produce significant improvements in classification accuracy and clustering quality. We demonstrate the correctness of our novel observations and the utility of our ideas by testing them with more than one hundred publicly available datasets.
Our forceful results allow us to make a perhaps unexpected claim: an underappreciated “low-hanging fruit” in optimizing DTW’s performance can produce improvements that make it an even stronger baseline, closing most or all of the improvement gap to the more sophisticated methods proposed in recent years.
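
To make the role of w concrete, the following is a minimal sketch of the conventional baseline the abstract refers to: DTW restricted to a warping window, with w chosen by leave-one-out cross-validation of a 1-nearest-neighbor classifier. It is not the method proposed in the article. The function names (`dtw_distance`, `loocv_accuracy`, `select_window`), the squared-difference local cost, the interpretation of w as a band half-width in samples, and the tie-breaking rule (prefer the smallest w) are all assumptions made for illustration.

```python
import math


def dtw_distance(a, b, w):
    """DTW distance between sequences a and b, with warping limited to |i - j| <= w."""
    n, m = len(a), len(b)
    # The band must be at least |n - m| wide, or the end cell is unreachable.
    w = max(w, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2  # assumed local cost: squared difference
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


def loocv_accuracy(series, labels, w):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier under DTW with window w."""
    correct = 0
    for i, query in enumerate(series):
        best_dist, best_label = math.inf, None
        for j, candidate in enumerate(series):
            if i == j:
                continue  # hold out the query itself
            dist = dtw_distance(query, candidate, w)
            if dist < best_dist:
                best_dist, best_label = dist, labels[j]
        correct += best_label == labels[i]
    return correct / len(series)


def select_window(series, labels, candidate_windows):
    """Pick the window with the best leave-one-out accuracy; ties go to the smallest w."""
    return max(candidate_windows, key=lambda w: (loocv_accuracy(series, labels, w), -w))
```

With w = 0 the measure reduces to the Euclidean distance for equal-length series, so a search over candidate windows that includes 0 covers the no-warping case as well. As the abstract notes, this kind of search requires labels and can be unreliable when the training set is small.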



Notes

  1. “Essentially,” since some clustering algorithms are not defined (or lose certain guarantees) for non-metric distance measures.

  2. NMI (normalized mutual information) is an information-theoretic measure of clustering quality. It takes values between 0 and 1; higher is better.

  3. For conditional entropy, smaller is better.
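
The two clustering measures mentioned in the notes above can be sketched as follows. This is an illustrative implementation only: the helper names are hypothetical, natural logarithms are used, and NMI is normalized by the arithmetic mean of the two entropies, which is one of several conventions in use and is an assumption here rather than something the notes specify.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(truth, clusters):
    """H(truth | clusters): uncertainty left about the true class given the cluster."""
    n = len(truth)
    joint = Counter(zip(clusters, truth))
    cluster_sizes = Counter(clusters)
    return -sum(
        (c / n) * math.log(c / cluster_sizes[k]) for (k, _), c in joint.items()
    )


def nmi(truth, clusters):
    """Normalized mutual information (assumed normalization: mean of the entropies)."""
    h_truth, h_clusters = entropy(truth), entropy(clusters)
    if h_truth == 0.0 and h_clusters == 0.0:
        return 1.0  # assumed convention when both labelings are a single group
    mutual_info = h_truth - conditional_entropy(truth, clusters)
    return 2.0 * mutual_info / (h_truth + h_clusters)
```

A clustering that reproduces the true classes up to a renaming of cluster IDs scores an NMI of 1 and a conditional entropy of 0, while a clustering that is independent of the classes scores an NMI of 0.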


Acknowledgements

This material is based upon work supported by the Air Force Office of Scientific Research, Asian Office of Aerospace Research and Development (AOARD) under award number FA2386-16-1-4023. The Australian Research Council under grant DE170100037 and the UK Engineering and Physical Sciences Research Council (EPSRC) under grant number EP/M015807/1 have also supported this work. Finally, we acknowledge the funding from NSF IIS-1161997 II and NSF IIS-1510741. We also wish to take this opportunity to thank the donors of the data to the UCR Time Series Archive.

Author information

Corresponding author

Correspondence to Hoang Anh Dau.

Additional information

Responsible editor: Jian Pei.

About this article

Cite this article

Dau, H.A., Silva, D.F., Petitjean, F. et al. Optimizing dynamic time warping’s window width for time series data mining applications. Data Min Knowl Disc 32, 1074–1120 (2018). https://doi.org/10.1007/s10618-018-0565-y

  • DOI: https://doi.org/10.1007/s10618-018-0565-y
