Elsevier

Pattern Recognition

Volume 44, Issue 9, September 2011, Pages 2231-2240

Weighted dynamic time warping for time series classification

https://doi.org/10.1016/j.patcog.2010.09.022

Abstract

Dynamic time warping (DTW), which finds the minimum path by providing non-linear alignments between two time series, has been widely used as a distance measure for time series classification and clustering. However, DTW does not account for the relative importance regarding the phase difference between a reference point and a testing point. This may lead to misclassification especially in applications where the shape similarity between two sequences is a major consideration for an accurate recognition. Therefore, we propose a novel distance measure, called a weighted DTW (WDTW), which is a penalty-based DTW. Our approach penalizes points with higher phase difference between a reference point and a testing point in order to prevent minimum distance distortion caused by outliers. The rationale underlying the proposed distance measure is demonstrated with some illustrative examples. A new weight function, called the modified logistic weight function (MLWF), is also proposed to systematically assign weights as a function of the phase difference between a reference point and a testing point. By applying different weights to adjacent points, the proposed algorithm can enhance the detection of similarity between two time series. We show that some popular distance measures such as DTW and Euclidean distance are special cases of our proposed WDTW measure. We extend the proposed idea to other variants of DTW such as derivative dynamic time warping (DDTW) and propose the weighted version of DDTW. We have compared the performances of our proposed procedures with other popular approaches using public data sets available through the UCR Time Series Data Mining Archive for both time series classification and clustering problems. The experimental results indicate that the proposed approaches can achieve improved accuracy for time series classification and clustering problems.

Introduction

There has been a long-standing interest in time series classification and clustering in diverse applications such as pattern recognition, signal processing, biology, aerospace, finance, medicine, and meteorology [1], [2], [8], [12], [14], [18], [23], [25], [26]. Several notable techniques have therefore been developed, including nearest neighbor classifiers with a given distance measure, support vector machines, and neural networks [2], [4], [20]. Nearest neighbor classifiers with dynamic time warping (DTW) have been shown to be effective for time series classification and clustering because of DTW's non-linear mapping capability [7], [18], [25]. The DTW technique finds an optimal match between two sequences by allowing a non-linear mapping of one sequence to another and minimizing the distance between the two sequences [7], [8], [12], [22]. The sequences are "warped" non-linearly to determine their similarity independent of any non-linear variations in the time dimension. The technique was originally developed for speech recognition, but several researchers have evaluated its application in other domains and have developed variants such as derivative DTW (DDTW) [11], [21], [22]. Fig. 1 shows an example of aligning two out-of-phase sequences by DTW.

The methodology for DTW is as follows. Assume a sequence $A$ of length $m$, $A = a_1, a_2, \ldots, a_i, \ldots, a_m$, and a sequence $B$ of length $n$, $B = b_1, b_2, \ldots, b_j, \ldots, b_n$. We create an $m$-by-$n$ path matrix whose $(i, j)$th element contains the distance between the two points $a_i$ and $b_j$, $d(a_i, b_j) = \|a_i - b_j\|_p$, where $\|\cdot\|_p$ denotes the $l_p$ norm. A warping path $u_1, u_2, \ldots, u_K$ is a sequence of matrix elements that defines a mapping between $A$ and $B$. The warping path is typically subject to several constraints such as [22]

  • Endpoint constraint: the warping path has to start at the first element and end at the last element of the path matrix, that is, $u_1 = (a_1, b_1)$ and $u_K = (a_m, b_n)$, where $u_K$ is the final element of the path.

  • Continuity constraint: the path can advance only one step at a time. That is, if $u_k = (a_i, b_j)$ and $u_{k+1} = (a_{i'}, b_{j'})$, then $i' - i \le 1$ and $j' - j \le 1$.

  • Monotonicity: the path does not go backwards, i.e., if $u_k = (a_i, b_j)$ and $u_{k+1} = (a_{i'}, b_{j'})$, then $i' \ge i$ and $j' \ge j$.
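The three constraints above can be checked mechanically for a candidate path. The following is a minimal sketch, not part of the original paper: the function name `is_valid_warping_path` and the representation of a path as a list of 1-based index pairs are illustrative assumptions, as is the extra rule that a path may not repeat a cell.

```python
def is_valid_warping_path(path, m, n):
    """Check a candidate warping path against the three constraints.

    `path` is a list of 1-based (i, j) index pairs into the m-by-n
    path matrix (hypothetical representation chosen for this sketch).
    """
    if not path:
        return False
    # Endpoint constraint: start at (1, 1) and end at (m, n).
    if path[0] != (1, 1) or path[-1] != (m, n):
        return False
    for (i, j), (i_next, j_next) in zip(path, path[1:]):
        di, dj = i_next - i, j_next - j
        # Monotonicity: neither index may decrease.
        if di < 0 or dj < 0:
            return False
        # Continuity: each index advances by at most one per step.
        if di > 1 or dj > 1:
            return False
        # Assumption of this sketch: a step must move to a new cell.
        if di == 0 and dj == 0:
            return False
    return True
```

For example, the path (1,1), (2,2), (2,3), (3,4) is valid for m = 3 and n = 4, whereas a path that jumps from (1,1) directly to (3,3) violates continuity.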

The best match between two sequences is the one with the lowest distance path after aligning one sequence to the other. Therefore, the optimal warping path can be found by using the recursive formula given by

$$DTW_p(A, B) = \sqrt[p]{\gamma(i, j)} \tag{1}$$

where $\gamma(i, j)$ is the cumulative distance described by

$$\gamma(i, j) = |a_i - b_j|^p + \min\{\gamma(i-1, j-1),\ \gamma(i-1, j),\ \gamma(i, j-1)\} \tag{2}$$

As seen from Eq. (1), given a search space defined by two time series, $DTW_p$ is guaranteed to find the warping path with the minimum cumulative distance among all warping paths that are valid in the search space. Thus, $DTW_p$ can be seen as the minimization of the warped $l_p$ distance, with time complexity $O(mn)$. Restricting the search space with constraint techniques such as the Sakoe–Chiba band [22] or the Itakura parallelogram [7] reduces the time complexity of DTW. Fig. 2 shows the warping matrix and the optimal warping path between two sequences by DTW. In Fig. 2, a band of width $w$ is used to constrain the warping.
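The recursion and the band constraint described above can be sketched with a short dynamic program. This is an illustrative sketch rather than the authors' implementation: the function name `dtw`, the `window` parameter, and the choice to widen the band to at least |m − n| so that the end point stays reachable are assumptions of this example. The cumulative distance is evaluated at the final cell (m, n) before the p-th root is taken.

```python
import math


def dtw(a, b, p=2, window=None):
    """DTW_p distance between numeric sequences a and b.

    `window` is an optional Sakoe-Chiba style band half-width
    (hypothetical parameter name); None means no band constraint.
    """
    m, n = len(a), len(b)
    if m == 0 or n == 0:
        raise ValueError("both sequences must be non-empty")
    # The band must be at least |m - n| wide for (m, n) to be reachable.
    w = max(m, n) if window is None else max(window, abs(m - n))
    # gamma[i][j] is the cumulative distance for a[:i] and b[:j].
    gamma = [[math.inf] * (n + 1) for _ in range(m + 1)]
    gamma[0][0] = 0.0
    for i in range(1, m + 1):
        # Only cells with |i - j| <= w are evaluated.
        for j in range(max(1, i - w), min(n, i + w) + 1):
            cost = abs(a[i - 1] - b[j - 1]) ** p
            gamma[i][j] = cost + min(
                gamma[i - 1][j - 1],  # diagonal step
                gamma[i - 1][j],      # advance in a only
                gamma[i][j - 1],      # advance in b only
            )
    return gamma[m][n] ** (1.0 / p)
```

With `window=0` and sequences of equal length, only the diagonal is evaluated and the result reduces to the $l_p$ (for p = 2, Euclidean) distance; with no window, a sequence and a time-stretched copy of it, such as [1, 2, 3] and [1, 2, 2, 3], have distance zero.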

However, conventional DTW weights every pair of points equally when computing the distance between two series, regardless of the phase difference between a reference point and a testing point. This may lead to misclassification, especially in applications such as image retrieval where the shape similarity between two sequences is a major consideration for accurate recognition, so that neighboring points between two sequences are more important than distant ones. In other words, the relative significance of a pair of points, which depends on the phase difference between them, should be considered.

Therefore, this paper proposes a novel distance measure, called the weighted dynamic time warping (WDTW), which weights nearer neighbors more heavily depending on the phase difference between a reference point and a testing point. Because WDTW takes into consideration the relative importance of the phase difference between two points, this approach can prevent a point in one sequence from being mapped to distant points in the other and can reduce unexpected singularities, which are alignments of a single point of one series with multiple points of the other series. Some practical examples will be presented to graphically illustrate situations where WDTW is clearly the better approach.

In addition, a new weight function, called the modified logistic weight function (MLWF), is proposed to assign weights as a function of the phase difference between a reference point and a testing point. The proposed weight function extends the properties of logistic function to enhance the flexibility of setting bounds on weights. By applying different weights to adjacent points, the proposed algorithm can enhance the detection of similarity between series.

Finally, we extend the proposed idea to other variants of DTW such as derivative dynamic time warping (DDTW) and propose the weighted version of DDTW (WDDTW). We compare the performance of our proposed procedures with other popular approaches using public data sets available through the UCR Time Series Data Mining Archive [13] for both time series classification and clustering problems. The experimental results show that the proposed procedures achieve improved accuracy for time series classification and clustering problems.

The remainder of the paper is organized as follows. In Section 2, we review related literature on time series classification and its methodologies. Section 3 explains the rationale for the advantage of the proposed idea. In Section 4, we describe the proposed WDTW and the modified logistic weight function for automatic time series classification. The experimental results are presented and discussed in Section 5. The paper ends with concluding remarks and future work in Section 6.

Section snippets

Related works

As a result of the increasing importance of time series classification in diverse fields, many algorithms have been proposed for different applications. Husken and Stagge [6] utilized recurrent neural networks for time series classification, and Guler and Ubeyli [4] presented a wavelet-based adaptive neuro-fuzzy inference system model for classification of electroencephalogram (EEG) signals. Rath and Manmatha [21] used DTW for word image matching and compared the performance of DTW with other

Rationale for the performance advantages of WDTW

In this section, we will present the rationale underlying the proposed WDTW with practical examples to graphically illustrate situations where WDTW shows better performance than conventional DTW. The first example deals with automatic classification of defect patterns on semiconductor wafer maps. Fig. 3(a)–(d) shows four common classes of defect patterns on wafer maps. Jeong et al. [9] presented the effectiveness of using spatial correlograms (i.e., time series data) as new features for the

Proposed algorithm for time series classification

This section presents the proposed WDTW measure and a new weighting function for time series data, the modified logistic weight function (MLWF).
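This excerpt names the MLWF but does not reproduce its formula, so the following is only a hedged sketch of how a penalty-based DTW of this kind could look. It assumes a logistic weight of the form $w(d) = w_{max} / (1 + \exp(-g\,(d - m_c)))$, where $d = |i - j|$ is taken as the phase difference and $m_c$ is the midpoint of the sequence length, and it assumes the weight multiplies the pointwise difference inside the $l_p$ cost. The function names, parameter names, and default values are illustrative and should be checked against the full text of Section 4.

```python
import math


def mlwf_weights(length, g=0.05, w_max=1.0):
    """Assumed logistic weights indexed by phase difference d = 0..length-1."""
    mid = length / 2.0  # assumed midpoint m_c of the sequence
    return [w_max / (1.0 + math.exp(-g * (d - mid))) for d in range(length)]


def wdtw(a, b, g=0.05, w_max=1.0, p=2):
    """Sketch of a weighted DTW: larger |i - j| receives a larger penalty."""
    m, n = len(a), len(b)
    if m == 0 or n == 0:
        raise ValueError("both sequences must be non-empty")
    weights = mlwf_weights(max(m, n), g, w_max)
    # gamma[i][j] is the weighted cumulative distance for a[:i] and b[:j].
    gamma = [[math.inf] * (n + 1) for _ in range(m + 1)]
    gamma[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # Weight the pointwise difference by the phase difference |i - j|.
            cost = (weights[abs(i - j)] * abs(a[i - 1] - b[j - 1])) ** p
            gamma[i][j] = cost + min(
                gamma[i - 1][j - 1], gamma[i - 1][j], gamma[i][j - 1]
            )
    return gamma[m][n] ** (1.0 / p)
```

Under these assumptions, g = 0 gives every phase difference the same weight $w_{max}/2$, so the measure is a constant multiple of conventional DTW, while larger values of g penalize far-apart alignments more strongly.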

Performance comparison for time series classification

In this section, we perform extensive experiments to verify the effectiveness of the proposed algorithm for time series classification and clustering. All data sets, which include real-life time series, synthetic time series, and generic time series, come from different application domains and are obtained from “UCR Time Series Data Mining Archive” [13]. For the detailed descriptions of the data sets, please see Ratanamahatana and Keogh [20].

Euclidean distance, conventional DTW, and DDTW

Conclusion

New distance measures for time series data, WDTW and WDDTW, are proposed to classify or cluster time series data sets in diverse applications. Compared with the conventional DTW and DDTW, the proposed algorithm weights each point according to the phase difference between a test point and a reference point. The proposed method is a generalization of the Euclidean distance, DTW, and DDTW, and its effectiveness is maximized by choosing the optimal g value for each application. A new

Acknowledgements

The authors acknowledge the support of Dr. Eamonn Keogh in providing us with the experimental data set. The authors would also like to thank the anonymous reviewers for their valuable comments, which improved our paper dramatically. Part of this work was supported by the National Science Foundation (NSF) Grant no. CMMI-0853894. Dr. Olufemi A. Omitaomu acts in his own independent capacity and not on behalf of UT-Battelle, LLC, or its affiliates or successors.

Young-Seon Jeong is now working toward his Ph.D. degree in the Department of Industrial and Systems Engineering, Rutgers University, New Brunswick, NJ. His research interests include spatial modeling of wafer map data, wavelet application for functional data analysis, and statistical modeling for intelligent transportation systems.

References (28)

  • Y.S. Jeong et al., Automatic identification of defect patterns in semiconductor wafer maps using spatial correlogram and dynamic time warping, IEEE Transactions on Semiconductor Manufacturing (2008)
  • E. Keogh et al., Clustering of time series subsequences is meaningless: implications for previous and future research, Knowledge and Information Systems (2005)
  • E. Keogh, M. Pazzani, Derivative dynamic time warping, in: Proceedings of the SIAM International Conference on Data...
  • E. Keogh et al., Exact indexing of dynamic time warping, Knowledge and Information Systems (2005)

    Myong K. Jeong is an Assistant Professor in the Department of Industrial and Systems Engineering and the Center for Operation Research, Rutgers University, New Brunswick, NJ. His research interests include statistical data mining, recommendation systems, machine health monitoring, and sensor data analysis. He is currently an Associate Editor of IEEE Transactions on Automation Science and Engineering and International Journal of Quality, Statistics and Reliability.

    Olufemi A. Omitaomu is a Research Scientist in the Geographic Information Science & Technology Group, Computational Sciences and Engineering Division, at Oak Ridge National Laboratory, Oak Ridge, TN. He is also an Adjunct Assistant Professor in the Department of Industrial and Information Engineering at the University of Tennessee, Knoxville, TN. His research areas include streaming and real-time data mining, signal processing, optimization techniques in data mining, infrastructure modeling and analysis, and disaster risk analysis in space and time.
