
Pattern Recognition

Volume 100, April 2020, 107122

Dissimilarity-based representations for one-class classification on time series

https://doi.org/10.1016/j.patcog.2019.107122

Highlights

  • One-class classification experiment on 86 time series classification problems.

  • Dissimilarity-based representations in the context of one-class time series classification.

  • Evaluation of 12 dissimilarity measures and 8 prototype methods.

Abstract

In several real-world classification problems it can be impractical to collect samples from classes other than the one of interest, hence the need for classifiers trained on a single class. There is a rich literature concerning binary and multi-class time series classification but less concerning one-class learning.

In this study, we investigate the little-explored one-class time series classification problem. We represent time series as vectors of dissimilarities from a set of time series referred to as prototypes. Based on this approach, we evaluate a Cartesian product of 12 dissimilarity measures, and 8 prototype methods (strategies to select prototypes). Finally, a one-class nearest neighbor classifier is used on the dissimilarity-based representations (DBR).

Experimental results show that DBR are competitive overall when compared with a strong baseline on the data-sets of the UCR/UEA archive. Additionally, DBR enable dimensionality reduction, and visual exploration of data-sets.

Introduction

Time series are temporally ordered sequences of numbers, for instance the temperature of a room recorded every hour for one day. The way observations are ordered can determine a potentially infinite variety of shapes. It is understood that the shape of a time series, in terms of its peaks and troughs and its ascending or descending slopes, is of fundamental importance in the context of time series classification. This is in line with the large research effort directed towards learning dissimilarity functions able to address invariance to transformations along the time axis, as in the widely used dynamic time warping (DTW) [1]. Consequently, by treating a time series as a mere vector, the modeler is likely to discard information needed to discriminate between classes. While the order-dependent nature of time series limits the performance of off-the-shelf classifiers unable to detect temporal structure, it has also fueled the rise of several dedicated classifiers [2], whose usefulness is sometimes demonstrated on just a limited number of cases [3].
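To make the time-axis invariance concrete, the following is a minimal, unconstrained DTW sketch (quadratic time, no warping window; the function name and toy series are illustrative, not from the paper):

```python
import math

def dtw(a, b):
    """Classic dynamic time warping distance between two 1-D sequences.

    D[i][j] holds the cost of the cheapest warping path aligning a[:i]
    with b[:j]; each step may advance one or both series, which gives
    the invariance to shifts and stretches along the time axis.
    """
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j],      # repeat b[j-1]
                                 D[i][j - 1],      # repeat a[i-1]
                                 D[i - 1][j - 1])  # match both
    return math.sqrt(D[n][m])

# The same peak shifted by one step: Euclidean distance would penalise
# the lag, while DTW aligns the peaks and reports zero distance.
x = [0, 0, 1, 2, 1, 0, 0]
y = [0, 1, 2, 1, 0, 0, 0]
print(dtw(x, y))  # 0.0
```

A production implementation would typically add a warping-window constraint (as mentioned in the text below) to limit pathological alignments and reduce cost.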

The current state of the art in time series classification relies heavily on the k-nearest neighbor (kNN) classifier, a simple and effective algorithm. For instance, the hierarchical vote collective of transformation-based ensembles (HIVE-COTE) [4], the best-performing algorithm [2] when evaluated on the data-sets of the UCR/UEA archive [3], relies on the 1-nearest neighbor (1NN) classifier for some of its components. Likewise, several works in the literature consider the 1NN classifier equipped with DTW as the standard benchmark method [2]. However, as a “lazy learner” [5], the nearest neighbor classifier has no training phase and must scan through all the training samples each time a new object is classified. Moreover, hyper-parameters such as the neighborhood size k, or the warping window size for the DTW dissimilarity, are usually selected by cross-validation, at extra computational cost.

The problem of representation is a core issue in pattern recognition [6], and can dramatically impact the classification performance as well as the computational resources required to solve a particular problem, and the interpretability of the solutions found. In this study, we investigate dissimilarity-based representations (DBR) [7] as a means to attain a vectorial representation of time series that, while preserving the information that allows classification, could enable scalable machine learning algorithms [8].

The characteristics of DBR are evaluated in the context of the little-explored one-class time series classification problem. In one-class classification, the goal is to learn a concept using only examples of that concept [9]. To distinguish an apple from another type of fruit, humans do not need to be trained on every type of fruit on the planet: seeing a few examples of the “class” apple is sufficient to learn what an apple is and to separate it from what is not. This approach can be useful in a variety of real-world applications, for instance network security [10].

Our main contributions can be summarized as follows. There is no research using DBR for time series in the context of one-class classification, and very little using other representations (Section 2.3). We address this gap by evaluating a variety of DBR derived through a Cartesian product of 12 dissimilarity measures and 8 “prototype methods”. Prototype methods are strategies designed to extract a subset of samples (prototypes) from a set of training samples; these prototypes are then used along with a dissimilarity measure to derive the DBR. We benchmark the classification performance of DBR against raw data and find that the two are close. However, DBR have an advantage on problems where class membership depends on the global time series shape. We also show that this representation enables dimensionality reduction, with consequent savings in computational time, and visual exploration of time series data-sets.
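The eight prototype methods evaluated in the paper are not detailed in this snippet. Purely as an illustration of what a prototype method can look like (an assumption, not one of the evaluated strategies), here is a greedy farthest-first selection sketch:

```python
import math
import random

def select_prototypes(train, n_protos, dist, seed=0):
    """Illustrative prototype method (NOT one of the paper's 8 methods):
    greedy farthest-first traversal. Start from a random training series,
    then repeatedly add the series farthest from the current prototype
    set, so that prototypes spread out over the training class.
    """
    rng = random.Random(seed)
    protos = [train[rng.randrange(len(train))]]
    while len(protos) < n_protos:
        # For each candidate, its distance to the nearest chosen prototype.
        nearest = [min(dist(s, p) for p in protos) for s in train]
        protos.append(train[max(range(len(train)), key=nearest.__getitem__)])
    return protos

# Toy usage with Euclidean distance on equal-length series.
euclid = lambda u, v: math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
train = [[0, 0], [0, 1], [5, 5], [5, 6], [9, 0]]
protos = select_prototypes(train, 3, euclid)
```

Any dissimilarity function (DTW included) can be passed as `dist`, which is what makes the Cartesian product of measures and prototype methods straightforward to evaluate.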

The rest of the paper is organized as follows. In Section 2, we review relevant scientific literature. In Section 3, we describe the dissimilarity measures and the prototype methods we evaluate. In Section 4, we describe the experiment design. In Section 5, we report and analyze our results. Finally, in Section 6 we summarize our conclusions and discuss future work.

Section snippets

Related work

In this section, we discuss the general idea of one-class classification (Section 2.1) and DBR (Section 2.2). A brief overview of the time series classification problem is outlined in Section 2.3. Works related to dissimilarity-based time series classification are discussed in Section 2.4.

Proposed method

We propose to represent a time series as a vector of dissimilarities from a set of carefully selected time series. Then we use a 1NN classifier equipped with the Euclidean distance (1NN-ED) on this representation. We discuss the functions we use to evaluate the dissimilarity between pairs of time series in Section 3.1, and the methods we use to choose prototypes in Section 3.2.
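The pipeline above can be sketched in a few lines. This is a minimal illustration: the function names are ours, plain Euclidean distance stands in for the paper's 12 dissimilarity measures, and the acceptance threshold is left as a free parameter since its calibration is not specified in this snippet:

```python
import math

def euclid(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def to_dbr(series, prototypes, dissim):
    """Dissimilarity-based representation: the j-th feature is the
    dissimilarity between the series and the j-th prototype."""
    return [dissim(series, p) for p in prototypes]

def accepts(train_dbr, test_dbr, threshold):
    """One-class 1NN-ED: accept the test object as the target class when
    its Euclidean distance in DBR space to the nearest training object
    is within the threshold (threshold calibration is an assumption
    left to the reader here)."""
    return min(euclid(test_dbr, t) for t in train_dbr) <= threshold

# Toy example: two prototypes, Euclidean dissimilarity on the raw series.
prototypes = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
train = [[0.0, 0.1, 0.0], [0.1, 0.0, 0.1]]
train_dbr = [to_dbr(s, prototypes, euclid) for s in train]
new_dbr = to_dbr([0.0, 0.0, 0.1], prototypes, euclid)
print(accepts(train_dbr, new_dbr, threshold=0.5))  # True
```

Note that the DBR vector length equals the number of prototypes, which is how the representation enables the dimensionality reduction discussed in the results.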

Data and experiment design

We provide an overview of the data-sets used in Section 4.1. We describe the classification experiment in Section 4.2.

Results

A summary of the results of our experimental study is shown in Table 1. For each dissimilarity measure and prototype method, results are averaged over all data-sets. The associated standard deviations are in the range [11,16], as expected, since some problems are far harder than others. We evaluate the impact of the number of prototypes on the classification performance of 1NN-ED by varying the hyper-parameter n ∈ {1, 2, 10%, 20%, 100%}. For instance, when n=1 we use only one prototype; when n=10% we

Conclusions

For the first time (to the best of our knowledge) we have conducted a comprehensive one-class classification experiment on the main archive of time series data-sets available in the literature (UCR/UEA archive).

We have investigated DBR as a methodology to derive a vector from a time series while retaining information about its shape, through comparison to a set of carefully selected time series (prototypes) via a dissimilarity measure. In this regard, we have evaluated a Cartesian product of 12

Acknowledgements

This research is funded by ICON plc.

Stefano Mauceri is a PhD student at University College Dublin. He holds a BSc in Business Administration from Bocconi University in Milan. He also holds an MSc in Tourism Economics from Bicocca University in Milan, and an MSc in Business Analytics from UCD Michael Smurfit Business School in Dublin. His research interests include machine learning, anomaly detection, time series.

References (65)

  • M. Längkvist et al., A review of unsupervised feature learning and deep learning for time-series modeling, Pattern Recognit. Lett. (2014)
  • T. Rakthanmanon et al., Addressing big data time series: mining trillions of time series subsequences under dynamic time warping, ACM Trans. Knowl. Discov. Data (TKDD) (2013)
  • A. Bagnall et al., The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances, Data Min. Knowl. Discov. (2017)
  • H.A. Dau, A. Bagnall, K. Kamgar, C.-C. M. Yeh, Y. Zhu, S. Gharghabi, C.A. Ratanamahatana, E. Keogh, The UCR time series...
  • J. Lines et al., Time series classification with HIVE-COTE: the hierarchical vote collective of transformation-based ensembles, ACM Trans. Knowl. Discov. Data (2018)
  • E.K. Garcia et al., Completely lazy learning, IEEE Trans. Knowl. Data Eng. (2010)
  • S.C. Hoi, D. Sahoo, J. Lu, P. Zhao, Online learning: a comprehensive survey, arXiv:1802.02871v1...
  • H. He et al., Imbalanced Learning: Foundations, Algorithms, and Applications (2013)
  • V.L. Cao et al., Learning neural representations for network anomaly detection, IEEE Trans. Cybern. (2018)
  • S.S. Khan et al., One-class classification: taxonomy of study and review of techniques, Knowl. Eng. Rev. (2014)
  • V. Chandola et al., Anomaly detection: a survey, ACM Comput. Surv. (CSUR) (2009)
  • S. Mauceri et al., Subject recognition using wrist-worn triaxial accelerometer data, International Workshop on Machine Learning, Optimization, and Big Data (2017)
  • C. Phua, V. Lee, K. Smith, R. Gayler, A comprehensive survey of data mining-based fraud detection research,...
  • T. Schlegl et al., Unsupervised anomaly detection with generative adversarial networks to guide marker discovery, International Conference on Information Processing in Medical Imaging (2017)
  • D.M. Tax et al., Support vector data description, Mach. Learn. (2004)
  • B. Schölkopf et al., Support vector method for novelty detection, Advances in Neural Information Processing Systems (2000)
  • S.S. Khan et al., Relationship between variants of one-class nearest neighbors and creating their accurate ensembles, IEEE Trans. Knowl. Data Eng. (2018)
  • M. Neuhaus et al., Bridging the Gap Between Graph Edit Distance and Kernel Machines (2007)
  • L. Sørensen et al., Image dissimilarity-based quantification of lung disease from CT, International Conference on Medical Image Computing and Computer-Assisted Intervention (2010)
  • R.J. Kate, Using dynamic time warping distances as features for improved time series classification, Data Min. Knowl. Discov. (2016)
  • Y. Li et al., Support vector based prototype selection method for nearest neighbor rules, International Conference on Natural Computation (2005)
  • G.E. Batista et al., A complexity-invariant distance measure for time series, Proceedings of the 2011 SIAM International Conference on Data Mining (2011)
James Sweeney is a Lecturer in Statistics in the Department of Mathematics & Statistics at the University of Limerick. He holds a BSc in Mathematical Sciences from UCD, Dublin, and a PhD in Statistics from Trinity College Dublin. His post-doctoral work was in medical imaging in University College Cork. His research interests include statistical climatology, image analysis, geospatial statistics, Bayesian experimental design, fraud identification and anomaly detection.

James McDermott is a Lecturer in Computer Science in the National University of Ireland, Galway. He holds a BSc in Computer Science with Mathematics from the National University of Ireland, Galway, and a PhD in evolutionary computation and computer music from the University of Limerick. He has also worked on supercomputing in Compaq/Hewlett-Packard. His post-doctoral work was in evolutionary design and genetic programming in University College Dublin and Massachusetts Institute of Technology. His research interests are in evolutionary computing, artificial intelligence, and computational music and design. He has chaired EuroGP and EvoMUSART and is a member of the Genetic Programming and Evolvable Machines editorial board.
