Elsevier

Neurocomputing

Volume 129, 10 April 2014, Pages 556-569

A new similarity measure based on shape information for invariant with multiple distortions

https://doi.org/10.1016/j.neucom.2013.09.003

Abstract

Because time series are noisy and volatile, two similar time series often differ by various distortions, which are usually regarded as combinations of the following basic transformations: noise, amplitude shift, amplitude scaling, temporal scaling, and linear drift. In this paper, a novel similarity measure (SIMshape) that is invariant to these basic distortions and to any combination of them is proposed. It is parameter-free and easy to implement. Specifically, a multi-scale shape approximation for time series based on the Discrete Haar Wavelet Transform, key point extraction, and symbolization is presented first; then, based on this representation and a scale-weight factor, a robust similarity measure is defined. The novelty of SIMshape lies in two aspects: (a) symbolizing the key point sequence extracted from the approximate wavelet coefficients; (b) adding the scale-weight factor and shape similarity to the similarity criterion. To show its effectiveness and efficiency, SIMshape is compared with other popular methods, namely Euclidean Distance (ED), LB_keogh, Complexity Invariant Distance (CID), and ASEAL (Approximate Shape Exchange ALgorithm), using two indices: the number of kinds of distortions and the degree of distortion. The results show that, compared with ED, CID, LB_keogh, and ASEAL, SIMshape is more robust on synthetic data and performs better in real time series classification.

Introduction

Research on similarity measures is one of the core topics in time series data mining [1], [2], [3]. Almost every time series data mining task requires a subtle notion of similarity between series [1], [2], [3], [5]. Because time series are noisy and volatile, two similar time series often differ by various distortions, which are usually seen as combinations of the following five basic distortions: noise, amplitude scaling, amplitude shifting, temporal scaling, and linear drift [3], [4], [6], [7], [8].
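The five basic distortions can be made concrete with a small synthetic example. The following is a minimal sketch, not the paper's experimental setup: the function names, the use of linear interpolation for temporal scaling, and all numeric settings are assumptions chosen for illustration.

```python
import numpy as np

def add_noise(x, sigma, rng):
    """Noise: add zero-mean Gaussian noise with standard deviation sigma."""
    return x + rng.normal(0.0, sigma, size=x.shape)

def amplitude_shift(x, offset):
    """Amplitude shift: move the whole series up or down by a constant."""
    return x + offset

def amplitude_scale(x, factor):
    """Amplitude scaling: multiply every value by a constant factor."""
    return x * factor

def temporal_scale(x, factor):
    """Temporal scaling: stretch (factor > 1) or compress (factor < 1)
    the series along the time axis by linear interpolation (assumed)."""
    n_out = max(2, int(round(len(x) * factor)))
    old_t = np.linspace(0.0, 1.0, num=len(x))
    new_t = np.linspace(0.0, 1.0, num=n_out)
    return np.interp(new_t, old_t, x)

def linear_drift(x, slope):
    """Linear drift: add a straight-line trend with the given slope."""
    return x + slope * np.arange(len(x))

# Build a series Q and a distorted version Qd that combines all five.
rng = np.random.default_rng(0)
q = np.sin(np.linspace(0.0, 4.0 * np.pi, 128))
qd = amplitude_shift(amplitude_scale(q, 1.5), 2.0)
qd = linear_drift(temporal_scale(qd, 1.25), 0.01)
qd = add_noise(qd, 0.05, rng)
```

Note that temporal scaling changes the length of the series, so a measure that requires equal lengths cannot compare `q` and `qd` directly.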

In recent years, hundreds of techniques have been proposed for time series similarity measures that are invariant to the basic distortions mentioned above [3], [4]. As a result, the similarity model has been extended in many different directions [3]: taking time warping into account [7], [8], [9], [11], [16], [17], [18], [19], [20], [21], [22], allowing amplitude shifting [7], [8], [22], allowing time series of different lengths [7], [8], [9], [11], [16], [17], [18], [19], [20], [21], [22], tolerating some degree of noise [7], [8], [10], [11], [15], [16], [17], [18], [21], and providing invariance to complexity [6].
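As one concrete instance of these extensions, complexity invariance [6] can be sketched in a few lines. The code below follows the commonly stated definition of CID: the Euclidean distance multiplied by the ratio of the larger to the smaller complexity estimate, where complexity is estimated as the square root of the sum of squared successive differences. The function names and the handling of constant series are assumptions made for this sketch.

```python
import numpy as np

def euclidean_distance(q, c):
    """Euclidean distance between two series of equal length."""
    q = np.asarray(q, dtype=float)
    c = np.asarray(c, dtype=float)
    return float(np.sqrt(np.sum((q - c) ** 2)))

def complexity_estimate(x):
    """Complexity estimate: root of the summed squared successive differences."""
    x = np.asarray(x, dtype=float)
    return float(np.sqrt(np.sum(np.diff(x) ** 2)))

def cid(q, c):
    """Complexity-invariant distance: ED scaled by a complexity correction."""
    ce_q, ce_c = complexity_estimate(q), complexity_estimate(c)
    low, high = min(ce_q, ce_c), max(ce_q, ce_c)
    if high == 0.0:
        factor = 1.0          # both series are constant (assumed convention)
    elif low == 0.0:
        factor = float("inf")  # only one series is constant (assumed convention)
    else:
        factor = high / low
    return euclidean_distance(q, c) * factor
```

Both functions require series of equal length, so they do not by themselves handle temporal scaling.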

Many studies use the number of kinds of tolerable distortions to assess the performance of a similarity measure [3], [6], [8]. The more kinds of distortions a similarity model tolerates, the more powerful it is. For example, according to the results of [7], [8], ASEAL has been shown to outperform other measures used in the literature on ECG data sets when dealing with four basic distortions: noise, offset translation, amplitude scaling, and time axis scaling. However, few studies discuss the tolerable degree of a specific distortion when evaluating a similarity measure; in this paper, it is considered an important index of performance. Besides, most existing approaches are costly to tune and to compute [10], [11], [15], [16], [18], [20], [21], [22]. For instance, the EDR and LCSS measures [15], [16], [18] require a threshold parameter to be set, which is difficult without a priori knowledge of the data.
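The sensitivity to such a threshold can be illustrated with a simplified LCSS. This is a minimal sketch under stated assumptions: it uses only the matching threshold `epsilon`, omits the temporal window that LCSS formulations typically also use, and normalizes by the shorter length; the function name is hypothetical.

```python
import numpy as np

def lcss_similarity(a, b, epsilon):
    """Longest common subsequence similarity in [0, 1].

    Two points match when their values differ by at most epsilon.
    Simplified: no temporal window constraint is applied.
    """
    n, m = len(a), len(b)
    table = np.zeros((n + 1, m + 1), dtype=int)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(a[i - 1] - b[j - 1]) <= epsilon:
                table[i, j] = table[i - 1, j - 1] + 1
            else:
                table[i, j] = max(table[i - 1, j], table[i, j - 1])
    return table[n, m] / min(n, m)

# The same pair of series looks unrelated or identical depending on epsilon.
a = np.array([0.0, 1.0, 2.0, 3.0])
b = a + 0.5                               # amplitude-shifted copy of a
strict = lcss_similarity(a, b, 0.1)       # no points match
tolerant = lcss_similarity(a, b, 0.6)     # every point matches
```

Choosing between these two settings requires knowing the scale of the data in advance, which is the difficulty noted above.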

Inspired by shape recognition, a novel similarity measure, SIMshape, is introduced in this paper to address multiple distortions. The number of kinds of distortions and the degree of distortion are used to measure the robustness of SIMshape. To provide a comprehensive validation, experiments have been conducted on synthetic data and on five real time series data sets from different domains. The major contributions of this paper are the following:

  • To record the most salient features of the original time series at different scales, a new symbolic approximate representation for time series is introduced. The representation is obtained through the Multi-scale Discrete Haar Wavelet Transform, key point extraction, and symbolization. Unlike traditional approximation methods, such as Discrete Fourier Transform (DFT) [23], Symbolic Aggregate Approximation (SAX) [24], [25] and Piecewise Aggregate Approximation (PAA) [27], it does not need any preset parameter. The symbolization technique significantly reduces dimensionality. The essential characteristics of the original time series are retained by applying the Multi-scale Discrete Haar Wavelet Transform and key point extraction. Therefore, the multi-scale shape information assures the efficiency and effectiveness of SIMshape.

  • To improve the robustness of SIMshape to various transformations, a scale-weight function is designed. It makes SIMshape emphasize the basic shape information of the original sequence, which is preserved at the coarse level. Because the essential characteristics of a time series are not altered by the degrees of distortion considered, the distortions have relatively little impact on the information at the coarse level. As a result, the robustness of SIMshape is improved by assigning larger weights to the coarse level.

  • To measure the similarity between two time series, a novel similarity measure, SIMshape, based on the multi-scale shape information and the scale-weight function, is presented. SIMshape is parameter-free and easy to implement. A set of objective tests on synthetic data sets and five real time series data sets from different application domains shows that SIMshape is more robust to various deformations than LB_keogh [10], [11], ED, CID [6] and ASEAL [7], [8], and more accurate than the other four methods when applied to classify real time series.
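The pipeline outlined in these contributions can be illustrated with a rough sketch. This is not the SIMshape algorithm itself, whose exact definitions are not given in this excerpt. Everything below is an assumption made for illustration: key points are taken as endpoints plus strict local extrema, symbols are 'U', 'D', and 'F' for rising, falling, and flat steps, per-scale similarity is the `difflib.SequenceMatcher` ratio, and the weight doubles with each coarser scale. The `levels` argument is also an artifact of the sketch; the paper describes SIMshape as parameter-free.

```python
import difflib
import numpy as np

def haar_approximations(x, levels):
    """Haar approximation coefficients for scales 1..levels, fine to coarse."""
    current = np.asarray(x, dtype=float)
    result = []
    for _ in range(levels):
        if len(current) < 2:
            break
        if len(current) % 2:           # drop the last sample so pairs line up
            current = current[:-1]
        current = (current[0::2] + current[1::2]) / np.sqrt(2.0)
        result.append(current)
    return result

def key_point_values(coeffs):
    """Keep the endpoints and the points where the slope strictly changes sign."""
    if len(coeffs) < 3:
        return coeffs
    slope = np.sign(np.diff(coeffs))
    turning = np.where(slope[:-1] * slope[1:] < 0)[0] + 1
    index = np.concatenate(([0], turning, [len(coeffs) - 1]))
    return coeffs[index]

def symbolize(values):
    """Encode each step between consecutive key points as 'U', 'D', or 'F'."""
    steps = np.diff(values)
    return "".join("U" if s > 0 else "D" if s < 0 else "F" for s in steps)

def shape_similarity(q, c, levels=4):
    """Weighted average of per-scale symbol-string similarity, in [0, 1]."""
    sims, weights = [], []
    pairs = zip(haar_approximations(q, levels), haar_approximations(c, levels))
    for level, (aq, ac) in enumerate(pairs, start=1):
        sq = symbolize(key_point_values(aq))
        sc = symbolize(key_point_values(ac))
        sims.append(difflib.SequenceMatcher(None, sq, sc).ratio())
        weights.append(2.0 ** level)   # assumed: coarser scale, larger weight
    if not sims:
        raise ValueError("series are too short for one Haar level")
    return float(np.average(sims, weights=weights))

# Example: compare a sine wave with an amplitude-scaled and shifted copy.
q = np.sin(np.linspace(0.0, 4.0 * np.pi, 128))
similarity = shape_similarity(q, 3.0 * q + 5.0)
```

Because the symbols depend only on whether the coefficients rise or fall, amplitude shifting and positive amplitude scaling leave the symbol strings of this sketch unchanged, and comparing strings rather than aligned samples allows series of different lengths.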

The rest of this paper is organized as follows. In Section 2, the known basic distortions for time series are reviewed, and related methods are cited and discussed. In Section 3, a new similarity measure and its specific implementation are put forward. In Section 4, experiments on synthetic and real-world time series are conducted to evaluate the proposed method. In Section 5, conclusions and some potential future work are given.

Section snippets

Problem statement

Suppose that there are a time series Q and a slightly distorted version of it, Qd, where the degree of distortion does not alter the nature of Q. Specifically, Q and Qd have the same basic shape information, ignoring subtle differences, and they look very similar to the human eye.

In this paper, the following five basic transformations [3], [4], [6], [7], [8] are considered. As shown in Fig. 1, these basic transformations are defined as follows:

  • Noise: The noise distortion means that two time

The proposed method

The core contribution of this work is introduced in this section. To record the shape information at different scales, a symbolic approximate representation for time series based on the multi-scale discrete wavelet transform and key points is first proposed; then a new similarity measure is defined, which is based on this representation and a scale-weight factor. The overview of the proposed method is shown in Fig. 2. To make the presentation of the proposed work clear, the description of various

Performance analysis

This section presents the experimental results that show the performance of SIMshape. To evaluate the proposed similarity measure, two experiments are conducted on synthetic data sets and five real time series data sets from different domains. The synthetic data sets simulate the basic distortions and their degrees of deformation, and the real time series reflect combinations of the five basic distortions. In the first experiment, on tolerance to the five basic distortions, SIMshape is compared

Conclusions

The goal of this paper is to propose a similarity measure that is more robust to distortion. That is, if we have a sequence Q and modify it into a sequence Qd by introducing distortions (such as noise, amplitude scaling, amplitude shifting, time scaling, linear drift, and so on), the sequences Q and Qd should be judged reasonably similar by the proposed similarity measure SIMshape. With this in mind, a novel similarity measure SIMshape is introduced to address

Acknowledgments

The authors would like to thank the anonymous referees for their valuable criticism and suggestions. Their comments greatly contributed to the quality of this work. This work is supported by the Natural Science Foundation of China (NSFC) under Grant nos. 61174144, 61232018, and 60874065.

References (55)

  • G.E.A.P.A. Batista et al., A Complexity-Invariant Distance Measure for Time Series (2011)
  • D. Berndt, J. Clifford, Using dynamic time warping to find patterns in time series, in: KDD Workshop, 1994, pp....
  • E. Keogh et al., Exact indexing of dynamic time warping, Knowl. Inf. Syst. (2005)
  • E. Keogh, L. Wei, X. Xi, S.-H. Lee, M. Vlachos, LB_Keogh supports exact indexing of shapes under rotation invariance...
  • E. Keogh, A. Ratanamahatana, Everything you know about dynamic time warping is wrong, in: 3rd Workshop on Mining...
  • S. Salvador et al., Toward accurate dynamic time warping in linear time and space, Intell. Data Anal. (2007)
  • C.A. Ratanamahatana, E. Keogh, Making Time-Series Classification More Accurate Using Learned Constraints,...
  • G. Das et al.
  • M. Vlachos et al., Indexing multidimensional time-series, VLDB J. (2006)
  • F. Elias, in: G. Kostas, T. Yannis (Eds.), Index-Based Most Similar Trajectory Search, 2007, pp....
  • L. Chen, M.T. Özsu, V. Oria, Robust and fast similarity search for moving object trajectories, in: Proceedings of the...
  • L. Chen, R. Ng, On the marriage of Lp-norms and edit distance, in: VLDB ’04: Proceedings of the 30th International...
  • J. Assfalg et al., Similarity search on time series based on threshold queries, Lect. Notes Comput. Sci. (2006)
  • M.D. Morse, J.M. Patel, An efficient and accurate method for evaluating time series similarity, in: Proceedings of the...
  • C. Yueguo, M.A. Nascimento, O. Beng Chin, A.K.H. Tung, SpADe: on shape-based pattern detection in streaming time...
  • R. Agrawal et al.
  • J. Lin, E. Keogh, S. Lonardi, B. Chiu, A symbolic representation of time series, with implications for streaming...

    Xiaoxu He received the Bachelor's degree in Computer Science from Shandong Normal University in 2009. She is expected to receive the Ph.D. degree from the Department of Computer Science and Technology, University of Science and Technology of China, in June 2014. Since September 2009, she has been a researcher in the Intelligent Qualitative and Virtual Reality Lab of the University of Science and Technology of China. Her research interests include time series data mining (with emphasis on representation and similarity measures), nonlinear complex time series analysis, and uncertainty knowledge discovery.

    Chenxi Shao received the Master's degree in Computer Science from the University of Science and Technology of China (USTC) in 1995. He is now the Director of the Intelligent Qualitative and Virtual Reality Lab and an associate professor in the School of Computer Science of USTC. His research interests lie mainly in qualitative simulation, non-stationary signal processing, and non-linear complex system theory, and their applications to biomedical and communication systems. He has published more than eighty research papers in those areas.

    Yan Xiong received the B.S., M.S. and Ph.D. degrees in Computer Science from the University of Science and Technology of China (USTC) in 1983, 1986 and 1990, respectively. He was a post-doctoral fellow in the Department of Computer Science and Communication, University of Missouri-Kansas City (UMKC), from 1995 to 1997. He is now the Director of the computer network and information security laboratory and a professor in the School of Computer Science of USTC. His research interests mainly include computer networks, information security, mobile computing, spatiotemporal information retrieval, data mining, mobile networks, and distributed processing.
