Unsupervised Interpretable Pattern Discovery in Time Series Using Autoencoders

Bascol, Kevin; Emonet, Rémi; Fromont, Elisa; Odobez, Jean-Marc

doi:10.1007/978-3-319-49055-7_38

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10029)

Included in the following conference series:

Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR)

Abstract

We study the use of feed-forward convolutional neural networks for the unsupervised problem of mining recurrent temporal patterns mixed in multivariate time series. Traditional convolutional autoencoders lack interpretability for two main reasons: the number of patterns corresponds to the manually-fixed number of convolution filters, and the patterns are often redundant and correlated. To recover clean patterns, we introduce different elements in the architecture, including an adaptive rectified linear unit function that improves pattern interpretability, and a group-lasso regularizer that helps automatically find the relevant number of patterns. We illustrate the necessity of these elements on synthetic data and real data in the context of activity mining in videos.

1 Introduction

Unsupervised discovery of patterns in temporal data is an important data mining topic due to numerous application domains like finance, biology or video analysis. In some applications, the patterns are solely used as features for classification and thus the classification accuracy is the only criterion. This paper considers different applications where the patterns can also be used for data analysis, data understanding, and novelty or anomaly detection [4–6, 18].

Not all time series are of the same nature. In this work, we consider the difficult case of multivariate time series whose observations are the result of a combination of different recurring phenomena that can overlap. Examples include traffic videos where the activity of multiple cars causes the observed sequence of images [6], or aggregate power consumption where the observed consumption is due to a mixture of appliances [10]. Unlike many techniques from the data mining community, our aim is not to list all recurrent patterns in the data with their frequency but to reconstruct the entire temporal documents by means of a limited and unknown number of recurring patterns together with their occurrence times in the data. In this view, we want to un-mix multivariate time series to recover how they can be decomposed in terms of recurrent temporally-structured patterns. Following the conventions used in [6], we will call a temporal pattern a motif, and an input multivariate time series a temporal document.

Artificial neural networks (or deep learning architectures) have (re)become tremendously popular in the last decade due to their impressive, and so far unbeaten, results in image classification, speech recognition and natural language processing. In particular, autoencoders are artificial neural networks used to learn a compressed, distributed representation (encoding) of a set of data, typically for the purpose of dimensionality reduction. They are thus an unsupervised learning method whose (hidden) layers contain representations of the input data that are powerful enough to compress (and decompress) the data while losing as little information as possible. Given the temporal nature of our data, our pattern discovery task is fundamentally convolutional, since it needs to identify motifs whatever their time(s) of occurrence: the same network is applied at every time instant and is thus time-shift invariant. To tackle this task, we therefore focus on a particular type of autoencoder, the convolutional autoencoder. However, while well adapted to discriminative tasks like classification [1], the patterns captured by (convolutional) autoencoders are not fully interpretable and are often correlated.

In this paper, we address the discovery of interpretable motifs using convolutional auto-encoders and make the following contributions:

we show that the interpretability of standard convolutional autoencoders is limited;
we introduce an adaptive rectified linear unit (AdaReLU) which allows hidden layers to capture clear occurrences of motifs;
we propose a regularization inspired by group lasso to automatically select the number of filters in a convolutional neural net;
we show, through experiments on synthetic and real data, how these elements (and others) allow us to recover interpretable motifs¹.

It is important to note that some previous generative models [6, 21] have obtained very good results on this task. However, their extensions to semi-supervised settings (i.e. with partially labelled data) or to hierarchical schemes are cumbersome to achieve. In contrast, to solve the same modeling problem, we present in this paper a radically different method which lends itself to more flexible and systematic end-to-end training frameworks and extensions.

The paper is organized as follows. In Sect. 2, we clarify the link between our data mining technique and previous work. Section 3 gives the details of our method while Sect. 4 shows experiments both on synthetic and real data. We conclude and draw future directions in Sect. 5.

2 Related Work

Our paper shows how to use a popular method (autoencoders) to tackle a task (pattern discovery in time series) that has seldom been addressed with this type of method. We thus briefly review other methods used in this context, and then other works that use neural networks for unsupervised time series modeling.

Unsupervised Pattern Discovery in Time Series. Traditional unsupervised approaches that deal with time series do not aim at modeling the series but rather at extracting interesting pieces of the series that can be used as high-level descriptions for direct analysis or as input features for other algorithms. This category includes all the event-based (e.g. [7, 22, 23]), sequence [15] and trajectory mining methods [25]. Unlike in the settings addressed by these methods, we do not know in advance the occurrence time, type, length or number of the (possibly) overlapping patterns that can be used to describe the entire multivariate time series. These methods therefore cannot be directly used in our application context.

Generative methods for modeling time series assume an a priori model and estimate its parameters. In the precursor work of [16], the unsupervised problem of finding patterns was decomposed into two steps: a supervised step involving an oracle who identifies patterns and series containing such patterns, and an EM step where a model of the series is generated according to those patterns. In [13], the authors propose a functional independent component analysis method for finding linearly varying patterns of activation in the data. They assume the availability of pre-segmented data where the occurrence time of each possible pattern is known in advance. The authors of [10] address the discovery of overlapping patterns to disaggregate the energy level of electric consumption. They propose to use additive factorial hidden Markov models, assuming that the electrical signal is univariate and that the known devices (each one represented by one HMM) have a finite, known number of states. This also implies that the motif occurrences of one particular device cannot overlap. The work of [6] proposes to extract an a priori unknown number of patterns and their possibly overlapping occurrences in documents using Dirichlet processes. The model automatically finds the number of patterns, their length and their occurrence times by fitting infinite mixtures of categorical distributions to the data. This approach achieved very good results, but its extensions to semi-supervised settings [19] and hierarchical schemes [2] were either less effective [19] or more cumbersome [2]. In contrast, the neural network approach of this paper lends itself to more flexible and systematic end-to-end training frameworks and extensions.

Networks for Time Series Mining. A recent survey [11] reviews the network-based unsupervised feature learning methods for time series modeling. As explained in Sect. 1, autoencoders [17] and also Restricted Boltzmann Machines (RBM) [8] are neural networks designed to be trained on unlabeled data. The two types of networks can achieve similar goals but differ in their objective functions and related optimization algorithms. Both methods were extended to handle time series [1, 14], but the goal was to minimize a reconstruction error without regard to interpretability or to finding the relevant number of patterns. In this paper, we show that convolutional autoencoders can indeed capture the spatio-temporal structure in temporal documents. We build on the above works and propose a model that discovers the right number of meaningful patterns in the convolution filters and generates sparse activations.

3 Motif Mining with Convolutional Autoencoders (AE)

Convolutional AEs [12] are particular AEs whose connection weights are constrained to be convolution kernels. In practice, this means that most of the learned parameters are shared within the network and that the weight matrices which store the convolution filters can be directly interpreted and visualized. Below, we first present the traditional AE model and then introduce our contributions to enforce at the same time a good interpretability of the convolutional filters and a clean and sparse activation of these filters.

3.1 Classical Convolutional Autoencoders

A main difference between an AE and a standard neural network is the loss function used to train the network. In an AE, the loss does not depend on labels: it is the reconstruction error between the input data and the network output. Figure 1 illustrates the main modeling components of our network. In our case, a training example is a multivariate time series $\mathbf {x}$ whose $L$ time steps are described by vectors $\mathbf {x} _{(:,t)} \in \mathcal {R}^d$, and the network is parameterized by the set of weights $\mathbf {W} =\{\mathbf {{}^eW},\mathbf {{}^dW} \}$ involved in the encoding and decoding processes. If we denote by $\mathbf {X} = \{ \mathbf {x} ^b \in \mathcal {R}^{d \times L}, b = 1 \ldots N \}$ the set of all training elements, these weights are classically estimated by optimizing the cost function $C(\mathbf {W},\mathbf {X}) = MSE(\mathbf {W},\mathbf {X}) + R_{reg}(\mathbf {W},\mathbf {X})$, where the Mean Squared Error (MSE) reconstruction loss can be written as:

$$\begin{aligned} MSE(\mathbf {W},\mathbf {X}) = \frac{1}{N}\sum \limits _{b=1}^{N}\sum \limits _{i=1}^{d}\sum \limits _{t=1}^{L} \left( \mathbf {x} _{(i,t)}^b - \mathbf {o} _{(i,t)}^b\right) ^2 \end{aligned}$$

(1)

where $\mathbf {o} ^b$ (which depends on the parameters $\mathbf {W} $) is the AE output for the $b^{th}$ input document. To avoid learning trivial and unstable mappings, a regularization term $R_{reg}$ is often added to the MSE; it usually comprises two terms. The first, known as weight decay because it discourages unnecessarily large weight values, is an $\ell _2$ norm on the weight matrices. The second (used with binary activations) is a Kullback-Leibler divergence $\sum _{j=1}^{M} KL(\rho || \hat{\rho }_j)$ that encourages the probability of activation $\hat{\rho }_j$ of each hidden unit, estimated across samples, to be close to a chosen parameter $\rho $, thus enforcing some activation sparsity when $\rho $ is small. The parameters are typically learned using stochastic gradient descent (SGD) with momentum and an appropriate learning-rate schedule [3].
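For concreteness, the loss of Eq. (1) and the two classical regularizers can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: documents are assumed to be stored as arrays of shape $(N, d, L)$, weight decay is taken as the squared $\ell _2$ norm (its most common form), and the KL term is written for Bernoulli activation probabilities. The function names are hypothetical.

```python
import numpy as np

def mse(X, O):
    """Eq. (1): squared error summed over features i and time steps t,
    averaged over the N documents. X, O: arrays of shape (N, d, L)."""
    return np.sum((X - O) ** 2) / X.shape[0]

def weight_decay(weights):
    """Assumed squared-l2 weight decay: sum of squared entries of every weight array."""
    return sum(np.sum(w ** 2) for w in weights)

def kl_sparsity(rho, rho_hat, eps=1e-12):
    """Sum over hidden units j of KL(rho || rho_hat_j) between Bernoulli distributions.
    rho_hat: estimated activation probability of each unit across samples."""
    rho_hat = np.clip(rho_hat, eps, 1 - eps)   # avoid log(0)
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
```

The KL term is zero when every unit is active with exactly the target probability $\rho$ and grows as units become more (or less) active than that, which is what pushes average activity down when $\rho$ is small.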

3.2 Interpretable Pattern Discovery with Autoencoders

In our application, the learned convolution filters should not only minimize the reconstruction error but also be directly interpretable. Ideally, we would like to extract only filters which capture and represent interesting data patterns, as illustrated in Fig. 2-c–d. To achieve this, we add a number of elements to the network architecture and to the optimization cost function to constrain the network appropriately.

Enforcing Non-negative Decoding Filters. Since the AE output is essentially a linear combination of (time-shifted) decoding filters, these filters can represent the patterns we are looking for, and we can interpret the hidden-layer activations $\mathbf {a}$ (see Fig. 1) as the occurrences of these patterns. Thus, as our input (a temporal document) is non-negative, we constrain the decoding filter weights to be non-negative by thresholding them at every SGD iteration. The assumption that the input is non-negative holds in our case, and it would also hold in deeper AEs provided that ReLU-like activation functions are used. Note that we do not constrain the encoding filters, so that they can take negative values to compensate for the pattern auto-correlation (see below).
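The following sketch illustrates why the decoding filters can be read as motifs: each activation $\mathbf {a}_{f,t}$ pastes a scaled copy of filter $f$ into the output starting at time $t$, so that with non-negative activations and non-negative decoding filters the reconstruction is itself non-negative. The projection onto non-negative weights after each SGD update is shown as a separate step. Array shapes and function names are our assumptions, and the explicit loops favour readability over speed.

```python
import numpy as np

def decode(A, dW):
    """Reconstruct a document as a sum of time-shifted, scaled decoding filters.
    A:  activations, shape (M, T) with T = L - Lf + 1
    dW: decoding filters, shape (M, d, Lf)
    Returns the output document, shape (d, T + Lf - 1) = (d, L)."""
    M, d, Lf = dW.shape
    T = A.shape[1]
    O = np.zeros((d, T + Lf - 1))
    for f in range(M):
        for t in range(T):
            # an activation of filter f at time t adds a scaled copy of the filter from t on
            O[:, t:t + Lf] += A[f, t] * dW[f]
    return O

def project_nonnegative(dW):
    """Projection applied after each SGD update: negative decoder weights are set to 0."""
    return np.maximum(dW, 0.0)
```

For example, a single activation of value 1 at time 1 reproduces the filter itself, shifted by one step, in the output.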

Sparsifying the Filters. The traditional $\ell _2$ regularization allows many small but non-zero values. To force these values to zero and thus obtain sparser filters, we replace the $\ell _2$ norm by the sparsity-promoting $\ell _1$ norm, known as lasso:

$$\begin{aligned} R_{las}(\mathbf {W}) = \sum \limits _{f=1}^{M}\sum \limits _{i=1}^{d}\sum \limits _{k=1}^{L\!_f}\left| \mathbf {{}^eW} _{(i,k)}^f\right| + \sum \limits _{f=1}^{M}\sum \limits _{i=1}^{d}\sum \limits _{k=1}^{L\!_f}\left| \mathbf {{}^dW} _{(i,k)}^f\right| \end{aligned}$$

(2)
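A direct transcription of Eq. (2), assuming the filters of each layer are stacked in an array of shape $(M, d, L_f)$, is given below (function names are hypothetical). The subgradient helper shows the intuition behind the switch from $\ell _2$ to $\ell _1$: its magnitude stays at 1 however small a non-zero weight becomes, so small weights keep being pushed toward zero, whereas the $\ell _2$ gradient $2w$ fades as $w \to 0$.

```python
import numpy as np

def lasso_penalty(eW, dW):
    """Eq. (2): sum of absolute values of all encoding and decoding filter weights.
    eW, dW: arrays of shape (M, d, Lf)."""
    return np.abs(eW).sum() + np.abs(dW).sum()

def lasso_subgradient(W):
    """A subgradient of sum(|W|): sign(W), using 0 at W == 0.
    Its magnitude does not shrink with |W|, unlike the l2 gradient 2 * W."""
    return np.sign(W)
```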

Encouraging Sparse Activations. The traditional KL divergence aims at making all hidden units equally useful on average, whereas our goal is for the activation layer to be as sparse as possible for each given input document. We achieve this by encouraging peaky activations, i.e. activations of low entropy when seen as a document-level probability distribution, as was proposed in [20] for topic models for motif discovery. This results in an entropy-based regularization expressed on the set $\mathbf {A} = \{\mathbf {a} ^b \}$ of document-level activations:

$$\begin{aligned} R_{ent}(\mathbf {A}) = \! -\frac{1}{N}\sum \limits _{b=1}^{N} \left( \sum _{f=1}^{M}\sum _{t=1}^{L-L\!_f +1} \hat{\mathbf {a}}^b_{f,t} \log \left( \hat{\mathbf {a}}^b_{f,t} \right) \right) \text{ with } \hat{\mathbf {a}}^b_{f,t} \! = \! \mathbf {a} ^b_{f,t} \bigg / \sum _{f'=1}^{M}\sum _{t'=1}^{L-L\!_{f'} +1}\mathbf {a} ^b_{f',t'} \end{aligned}$$

(3)
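Eq. (3) can be evaluated as follows, in a minimal sketch assuming non-negative activations stored in an array of shape $(N, M, T)$ with $T = L - L_f + 1$ (i.e. all filters share the same length). The convention $0 \log 0 = 0$ is applied explicitly, and the small constant `eps` (our addition) only guards against all-zero documents. A document whose activation mass sits on a single entry has zero entropy, while spreading the same mass uniformly over $n$ entries gives $\log n$.

```python
import numpy as np

def entropy_penalty(A, eps=1e-12):
    """Eq. (3): mean over documents of the entropy of the normalized activations.
    A: non-negative activations, shape (N, M, T)."""
    N = A.shape[0]
    flat = A.reshape(N, -1)
    P = flat / (flat.sum(axis=1, keepdims=True) + eps)   # \hat{a}^b_{f,t}
    # 0 * log(0) is treated as 0: only positive entries are passed to the log
    plogp = np.where(P > 0, P * np.log(np.where(P > 0, P, 1.0)), 0.0)
    return -plogp.sum(axis=1).mean()
```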

Local Non-maximum Activation Removal. The previous entropy regularizer encourages peaked activations. However, as the encoding layer remains a convolutional layer, if a filter is correlated in time with itself or with another filter, the activations cannot be sparse. This phenomenon is due to the feed-forward nature of the network, where activations depend on the input and not on each other: hence, no activation can inhibit its neighboring activations. To handle this issue, we add a local non-maximum suppression layer which, from a network perspective, is obtained by convolving the activations with a temporal Gaussian filter, subtracting from the result the activation intensities, and applying a ReLU, thereby concentrating spread-out activations into central peaks.
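A possible sketch of this suppression layer is shown below. We read the operation as $\mathrm{ReLU}(\mathbf {a} - G * \mathbf {a})$, i.e. the Gaussian-smoothed signal is removed from the activations, which is the ordering under which local peaks survive and their neighbours are zeroed; this reading, as well as the values of $\sigma$ and of the kernel radius, are our assumptions rather than settings reported in the paper.

```python
import numpy as np

def gaussian_kernel(sigma=1.0, radius=3):
    """Normalized 1-D Gaussian; sigma and radius are illustrative choices."""
    k = np.arange(-radius, radius + 1)
    g = np.exp(-0.5 * (k / sigma) ** 2)
    return g / g.sum()

def local_nonmax_suppression(A, sigma=1.0, radius=3):
    """Assumed form ReLU(a - G * a), applied to each filter's activation sequence.
    A: activations, shape (M, T). T >= 2 * radius + 1 is required so that
    np.convolve(..., mode="same") returns length T."""
    g = gaussian_kernel(sigma, radius)
    smoothed = np.stack([np.convolve(row, g, mode="same") for row in A])
    return np.maximum(A - smoothed, 0.0)
```

For a spread activation such as 0, 0, 0, 1, 3, 1, 0, 0, 0, only the central value remains non-zero after this layer (with a reduced amplitude), since each neighbour is smaller than its Gaussian-weighted surroundings.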

Handling Distant Filter Correlations with AdaReLU. The Gaussian layer cannot handle non-local (in time) correlations. To handle these, we propose to replace the traditional ReLU activation function with a novel one called adaptive ReLU (AdaReLU). AdaReLU works on groups of units and sets to 0 all the values that are below a percentage (e.g., 60 %) of the maximal value in the group. In our architecture, AdaReLU is applied separately to each filter's activation sequence.
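AdaReLU can be sketched in a few lines, assuming activations of shape $(M, T)$ where each row is one filter's activation sequence (one group). The text does not say how negative inputs are handled; here they are first clipped at 0 as an ordinary ReLU would, which is our assumption. Values at or above the threshold are kept unchanged.

```python
import numpy as np

def ada_relu(A, ratio=0.6):
    """Adaptive ReLU, applied separately to each row (one filter's activations).
    A: pre-activations, shape (M, T). Values below ratio * (row maximum) are set to 0."""
    A = np.maximum(A, 0.0)                          # assumption: start from a ReLU output
    thresh = ratio * A.max(axis=1, keepdims=True)   # one threshold per filter
    return np.where(A >= thresh, A, 0.0)
```

With the default ratio of 0.6, the row (1, 5, 3.5, 2) becomes (0, 5, 3.5, 0): only values of at least 3 survive, however far apart in time they are.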

Finding the True Number of Patterns. One main advantage and contribution of our AE-based method compared to the methods presented in Sect. 2 is the possibility to discover the “true” number of patterns in the data. One solution to achieve this is to introduce a large set of filters in the network and “hope” that learning leads to only a few non-null filters capturing the interesting patterns. However, in practice, standard regularization terms and optimization tend to produce networks “using” all or many more filters than the number of true patterns, which results in partial and less interpretable patterns. To overcome this problem, we propose to use a group lasso regularization term, the $\ell _{2,1}$ norm [24], which constrains the network to “use” as few filters as possible. For our weight matrices, it can be formulated as:

$$\begin{aligned} R_{grp}(\mathbf {W}) = \sum \limits _{f=1}^{M}\sqrt{\sum \limits _{i=1}^{d}\sum \limits _{k=1}^{L\!_f} \left( \mathbf {{}^eW} _{(i,k)}^f\right) ^2} \, + \, \sum \limits _{f=1}^{M}\sqrt{\sum \limits _{i=1}^{d}\sum \limits _{k=1}^{L\!_f} \left( \mathbf {{}^dW} _{(i,k)}^f\right) ^2} \end{aligned}$$

(4)
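Eq. (4) treats each filter as one group. The sketch below (hypothetical function name, filters stacked as $(M, d, L_f)$) also illustrates why this favours fewer filters: two filters holding one unit weight each cost $1 + 1 = 2$, whereas a single filter holding both weights costs $\sqrt{2}$, although the $\ell _1$ norm is 2 in both cases.

```python
import numpy as np

def group_lasso_penalty(eW, dW):
    """Eq. (4): sum over filters of the l2 norm of each filter, for encoder and decoder.
    eW, dW: arrays of shape (M, d, Lf); one group = one filter."""
    M = eW.shape[0]
    enc = np.sqrt((eW.reshape(M, -1) ** 2).sum(axis=1)).sum()
    dec = np.sqrt((dW.reshape(M, -1) ** 2).sum(axis=1)).sum()
    return enc + dec

# Same total weight, spread over two filters vs. concentrated in one filter.
spread = np.zeros((2, 1, 2)); spread[0, 0, 0] = 1.0; spread[1, 0, 1] = 1.0
concentrated = np.zeros((2, 1, 2)); concentrated[0, 0, :] = 1.0
print(group_lasso_penalty(spread, np.zeros_like(spread)))              # 2.0
print(group_lasso_penalty(concentrated, np.zeros_like(concentrated)))  # about 1.414
```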

Overall Objective Function. Combining Eqs. (1), (2), (3) and (4), we obtain the objective function that is optimized by our network:

$$\begin{aligned} C(\mathbf {W},\mathbf {X}) = MSE(\mathbf {W},\mathbf {X}) + \lambda _{las} R_{las}(\mathbf {W}) + \lambda _{grp} R_{grp}(\mathbf {W}) + \lambda _{ent} R_{ent}(\mathbf {A} (\mathbf {W},\mathbf {X})) \end{aligned}$$

(5)
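To make the dependence of $\mathbf {A}$ on $\mathbf {W}$ and $\mathbf {X}$ in Eq. (5) explicit, the following sketch evaluates the full objective for a small batch: valid cross-correlation for encoding, a sum of shifted filters for decoding, then the four terms. It only evaluates the cost; gradients would in practice come from an automatic differentiation framework, which we do not show. The `hidden` argument is a hypothetical hook where the suppression and AdaReLU stages would be plugged in; a plain ReLU is used by default to keep the example short, and all filters are assumed to share the same length $L_f$.

```python
import numpy as np

def encode(x, eW):
    """Valid cross-correlation of a document x (d, L) with encoding filters eW (M, d, Lf).
    Returns pre-activations of shape (M, L - Lf + 1)."""
    M, d, Lf = eW.shape
    T = x.shape[1] - Lf + 1
    return np.array([[np.sum(eW[f] * x[:, t:t + Lf]) for t in range(T)]
                     for f in range(M)])

def decode(a, dW):
    """Sum of decoding filters dW (M, d, Lf), each shifted to and scaled by its activations a (M, T)."""
    M, d, Lf = dW.shape
    o = np.zeros((d, a.shape[1] + Lf - 1))
    for f in range(M):
        for t in range(a.shape[1]):
            o[:, t:t + Lf] += a[f, t] * dW[f]
    return o

def objective(X, eW, dW, lam_las, lam_grp, lam_ent,
              hidden=lambda z: np.maximum(z, 0.0)):
    """Eq. (5) for a batch X of shape (N, d, L).
    `hidden` (hypothetical hook) maps pre-activations to activations; the plain ReLU
    default stands in for the suppression + AdaReLU stages of the full model."""
    A = [hidden(encode(x, eW)) for x in X]             # activations depend on W and X
    O = np.stack([decode(a, dW) for a in A])           # reconstructions, shape (N, d, L)
    mse = np.sum((X - O) ** 2) / len(X)                # Eq. (1)
    r_las = np.abs(eW).sum() + np.abs(dW).sum()        # Eq. (2)
    M = eW.shape[0]
    r_grp = (np.linalg.norm(eW.reshape(M, -1), axis=1).sum()
             + np.linalg.norm(dW.reshape(M, -1), axis=1).sum())   # Eq. (4)
    r_ent = 0.0                                        # Eq. (3), with 0 log 0 = 0
    for a in A:
        s = a.sum()
        if s > 0:
            p = a[a > 0] / s
            r_ent -= np.sum(p * np.log(p))
    r_ent /= len(X)
    return mse + lam_las * r_las + lam_grp * r_grp + lam_ent * r_ent
```

With identity filters of length 1, for instance, the reconstruction is exact and only the regularizers contribute to the cost.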

4 Experiments

4.1 Experimental Setting

Datasets. To study the behavior of our approach, we experimented with both synthetic and real video datasets. The synthetic data were obtained using a known generation process: temporal documents were produced by sampling random observations of random linear combinations of motifs, along with salt-and-pepper noise whose amount was defined as a percentage of the total document intensity (noise levels: 0 %, 33 %, 66 %). Six motifs (defined as letter sequences for ease of visualization) were used. A document example is shown in Fig. 2-a, where the feature dimension ($d =25$) is represented vertically and time horizontally ($L =300$). For each experiment, 100 documents were generated using this process and used to train the autoencoders. This controlled environment allowed us to evaluate the importance of each modeling element. In particular, we are interested in (i) the number of patterns discovered (defined as the number of non-empty decoding filters²); (ii) the “sharpness” of the activations; and (iii) the robustness of our method with respect to parameters like $\lambda _{las}, \lambda _{grp}, \lambda _{ent}$, the number of filters $M$, and the noise level.
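The generation process can be sketched as follows. This is a hypothetical reconstruction, since the exact procedure is not specified: motifs are non-negative arrays of shape $(K, d, L_m)$, each occurrence is added with a random weight at a random start time, observations are drawn from a multinomial proportional to the resulting intensity, and the salt-and-pepper noise is modelled as additional observations at uniformly random (feature, time) cells, amounting to the chosen percentage of the document's total count. All parameter names and default values are illustrative.

```python
import numpy as np

def make_document(motifs, L=300, n_occ=10, n_obs=5000, noise=0.33, rng=None):
    """Hypothetical generator of one synthetic temporal document (assumed procedure).
    motifs: non-negative array of shape (K, d, Lm); returns integer counts of shape (d, L)."""
    rng = np.random.default_rng(rng)
    K, d, Lm = motifs.shape
    intensity = np.zeros((d, L))
    for _ in range(n_occ):
        k = rng.integers(K)                         # which motif occurs
        t = rng.integers(L - Lm + 1)                # where it starts
        intensity[:, t:t + Lm] += rng.uniform(0.5, 1.5) * motifs[k]   # random weight
    # Draw observations with probability proportional to the mixed intensity.
    p = (intensity / intensity.sum()).ravel()
    counts = rng.multinomial(n_obs, p).reshape(d, L)
    # Assumed noise model: extra observations at uniformly random (feature, time) cells,
    # amounting to `noise` times the document's total count.
    n_noise = int(round(noise * counts.sum()))
    cells = rng.integers(d * L, size=n_noise)
    counts += np.bincount(cells, minlength=d * L).reshape(d, L)
    return counts
```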

We also applied our approach to videos recorded from fixed cameras. We used videos from the QMUL [9] and far-field [21] datasets. The data pre-processing steps from the companion code of [6] were applied. Optical flow features were obtained by estimating, quantizing, and locally collecting optical flow over 1-second periods. Then, temporal documents were obtained by reducing the dimensionality of these features to $d =100$ and by cutting the videos into temporal documents of $L =300$ time steps.

Architecture Details and Parameter Setting. The proposed architecture is given in Fig. 1. As stated earlier, the goal of this paper is to make the most of a convolutional AE with a single hidden layer (corresponding to the activation layer)³. Weights are initialized according to a uniform distribution between 0 and $\frac{1}{d *L\!_f}$.
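The initialization and the emptiness criterion of Note 2 fit together: with weights drawn uniformly in $[0, \frac{1}{d\,L_f})$, each weight has mean $\frac{1}{2\,d\,L_f}$, so the expected sum over the $d\,L_f$ weights of a filter is exactly $\frac{1}{2}$. A filter whose weights sum to at most this value after training is therefore counted as empty. A minimal sketch, with hypothetical function names:

```python
import numpy as np

def init_filters(M, d, Lf, rng=None):
    """Uniform initialization in [0, 1/(d*Lf)), as stated in the text; shape (M, d, Lf)."""
    rng = np.random.default_rng(rng)
    return rng.uniform(0.0, 1.0 / (d * Lf), size=(M, d, Lf))

def count_nonempty(dW, threshold=0.5):
    """Note 2: a decoding filter is empty if the sum of its weights is <= 1/2."""
    sums = dW.reshape(dW.shape[0], -1).sum(axis=1)
    return int(np.sum(sums > threshold))
```

With the synthetic setting ($d = 25$, $L_f = 45$), the per-filter sums of a freshly initialized layer cluster tightly around 0.5.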

In general, the filter length $L\!_f $ should be large enough to capture the longest expected recurring pattern of interest in the data. The filter length was set to $L\!_f =45$ in the synthetic experiments, which is longer than the longest ground-truth motif. In the video experiments, we used $L\!_f =11$, corresponding to 10 seconds, which allows us to capture the different traffic activities and phases of our data [21].

4.2 Results on the Synthetic Dataset

Since we know the “true” number of patterns and their expected visualization, we first validate our approach by showing (see Fig. 2-c) that we can find a set of parameters such that our filters exactly capture our given motifs and the number of non-empty filters is exactly the “true” number of motifs in the dataset, even when this dataset is noisy (this is also true for a clean dataset). In this case (see Fig. 2-e), the activations for the complete document are, as expected, sparse and “peaky”. The output document (see Fig. 2-b) is a good denoised reconstruction of the input document shown in Fig. 2-a.

In Fig. 3, we evaluate the influence of the given number of filters $M$ and of the noise level on both the number of recovered motifs and the MSE, with the other parameters fixed as in Fig. 2. We can see that, with this set of parameters, the AE is able to recover the true number of patterns for the large majority of noise levels and values of $M$. For all noise levels, the low MSE shows that the AE reconstructs the original document well as long as the number of given filters is at least equal to the number of “true” patterns in the document.

Model Selection: Influence of $\lambda _{grp}$. Figure 4 shows the number of non-zero filters as a function of $\lambda _{grp}$ and of the noise level for the synthetic dataset with 6 known motifs, when using 12 filters (left) and 16 filters (right). The light blue area is the area in which the AE was able to discover the true number of patterns. With no group lasso regularization ($\lambda _{grp} =0$), the AE systematically uses all the available filters, capturing the original patterns (see the $2^{nd}, 4^{th}$ or $5^{th}$ filters in Fig. 2-d), redundant variants of the same pattern (the $1^{st}$ and $3^{rd}$ filters in Fig. 2-d) or a mix of patterns that is more difficult to interpret (the $6^{th}$ and $7^{th}$ filters in Fig. 2-d). Conversely, with too high values of $\lambda _{grp}$, the AE does not find any pattern (resulting in a high MSE). A good heuristic to set the value of $\lambda _{grp}$ could thus be to increase it as much as possible until the resulting MSE starts increasing. In the rest of the experiments, $\lambda _{grp}$ is set to 2.
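The heuristic above can be written as a simple sweep. In this sketch, `train_and_mse` is a hypothetical callable that trains an AE with a given $\lambda _{grp}$ and returns its reconstruction MSE, and "the MSE starts increasing" is operationalized as exceeding the MSE of the smallest candidate by a relative tolerance; both the tolerance and its default value are our assumptions.

```python
def select_lambda_grp(candidates, train_and_mse, tolerance=0.05):
    """Hypothetical helper: largest lambda_grp reached before the MSE rises noticeably.
    candidates: increasing lambda_grp values, e.g. starting at 0.
    train_and_mse: callable(lambda_grp) -> MSE of an AE trained with that value.
    tolerance: relative MSE increase over the first candidate that counts as 'increasing'."""
    baseline = train_and_mse(candidates[0])
    best = candidates[0]
    for lam in candidates[1:]:
        if train_and_mse(lam) > (1 + tolerance) * baseline:
            break            # MSE has started to increase: stop the sweep
        best = lam
    return best
```

In practice each call trains a full model, so the candidate grid is kept coarse; the sweep stops at the first value whose MSE clearly departs from the unregularized baseline.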

Influence of $\lambda _{ent}$, $\lambda _{las}$, AdaReLU, and Local Non-maximum Suppression. We conducted the same experiments as in Fig. 2 on clean and noisy datasets (up to 66 % noise) with $M =3$, $M =6$ and $M =12$ to assess the behavior of our system when disabling: (1) $\lambda _{ent}$, which controls the entropy of the activation layer; (2) $\lambda _{las}$, the lasso regularizer; (3) the AdaReLU function (using a simple ReLU in the encoding layer instead); and (4) the local non-maximum activation suppression layer. In all cases, all parameters but one were fixed according to the best set of values given in Fig. 2. For lack of space, we do not give all the corresponding figures, but we comment on the main results.

The $\lambda _{ent}$ parameter is particularly important in the presence of noise. Without noise, when this parameter is set to 0, the patterns are less sharp (smoother) and the activations are more spread out in time, with much smaller intensities. However, the MSE is as low as with the default parameters. In the presence of noise (see Fig. 2-f), the AE is more likely to miss some patterns even when the optimal number of filters is given (e.g. in some experiments only 5 out of the 6 filters were non-empty), and the MSE increases a lot compared to experiments on clean data. This shows again that the MSE can be a good heuristic to tune the parameters on real data. The $\lambda _{las}$ parameter has similar effects with and without noise: it helps remove all the small activation values, resulting in much sharper (and thus more interpretable) patterns.

The local non-maximum suppression layer (comprising the Gaussian filter) is essential in our proposed architecture. Indeed, without it, the system was not able to recover any pattern when $M =3$ (and only one blurry “false” pattern in the presence of noise). When $M =6$, it captured only 4 patterns (out of 6) in the clean dataset and did not find any in the noisy ones. When $M =12$, it was able to recover the 6 original true patterns in the clean dataset but only one blurry “false” pattern in the noisy ones.

The AdaReLU function also plays an important role in recovering interpretable patterns. Without it (using ReLU instead), the recovered patterns are not the “true” patterns: they have a very low intensity and are highly auto-correlated (as illustrated by the activations in Fig. 2-g).

4.3 Results on the Real Video Dataset

Due to space limitations, we only show some of the obtained results in Fig. 5. The parameters were selected by grid search, minimizing the MSE on the targeted dataset. For instance, on the Junction 1 dataset, the final parameters are $\lambda _{las} = 0.2$, $\lambda _{grp} = 50$, $\lambda _{ent} = 5$. Note that these values are larger than in the synthetic case, but the observation size is also much larger (100 vs 25) and the filters are thus sparser in general. On the Junction 1 dataset, the autoencoder recovers 4 non-empty and meaningful filters capturing the car activities related to the different traffic signal cycles, whereas in the far-field case, the main car trajectories were recovered, as also reported in [21].

5 Conclusion

We have shown that convolutional AEs are good candidate unsupervised data mining tools for discovering interpretable patterns in time series. We have introduced a number of layers and regularization terms into standard convolutional AEs to enforce the interpretability of both the convolutional filters and the hidden-layer activations. The filters are directly interpretable as spatio-temporal patterns, while the activations give the occurrence times of each pattern in the temporal document. This allows us to un-mix multivariate time series. A direct extension of this work is the use of multi-layer AEs to capture combinations of motifs. Although this was not the aim of this article, it may help reduce the number of parameters needed to obtain truly interpretable patterns and capture more complex patterns in the data.

Notes

1.
The complete source code will be made available online.
2.
We consider a filter empty if the sum of its weights is less than or equal to $\frac{1}{2}$ (the average sum of a filter's weights after initialization).
3.
Note however that the method can be generalized to hierarchical motifs using more layers, but then the interpretation of results would slightly differ.

References

[1] Baccouche, M., Mamalet, F., Wolf, C., Garcia, C., Baskurt, A.: Spatio-temporal convolutional sparse auto-encoder for sequence classification. In: British Machine Vision Conference (BMVC) (2012)
[2] Chockalingam, T., Emonet, R., Odobez, J.-M.: Localized anomaly detection via hierarchical integrated activity discovery. In: AVSS (2013)
[3] Darken, C., Moody, J.E.: Note on learning rate schedules for stochastic optimization. In: NIPS, pp. 832–838 (1990)
[4] Du, X., Jin, R., Ding, L., Lee, V.E., Thornton, J.H.: Migration motif: a spatial-temporal pattern mining approach for financial markets. In: KDD, pp. 1135–1144. ACM (2009)
[5] Emonet, R., Varadarajan, J., Odobez, J.-M.: Multi-camera open space human activity discovery for anomaly detection. In: IEEE International Conference on Advanced Video and Signal-Based Surveillance (AVSS), Klagenfurt, Austria, August 2011
[6] Emonet, R., Varadarajan, J., Odobez, J.-M.: Temporal analysis of motif mixtures using Dirichlet processes. IEEE PAMI 36(1), 140–156 (2014)
[7] Marwah, M., Shao, H., Ramakrishnan, N.: A temporal motif mining approach to unsupervised energy disaggregation: applications to residential and commercial buildings. In: Proceedings of the 27th AAAI Conference (2013)
[8] Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)
[9] Hospedales, T., Gong, S., Xiang, T.: A Markov clustering topic model for mining behavior in video. In: ICCV (2009)
[10] Kolter, J.Z., Jaakkola, T.: Approximate inference in additive factorial HMMs with application to energy disaggregation. In: Proceedings of AISTATS Conference (2012)
[11] Längkvist, M., Karlsson, L., Loutfi, A.: A review of unsupervised feature learning and deep learning for time-series modeling. Pattern Recognit. Lett. 42, 11–24 (2014)
[12] Masci, J., Meier, U., Cireşan, D., Schmidhuber, J.: Stacked convolutional auto-encoders for hierarchical feature extraction. In: Honkela, T., Duch, W., Girolami, M., Kaski, S. (eds.) ICANN 2011. LNCS, vol. 6791, pp. 52–59. Springer, Heidelberg (2011). doi:10.1007/978-3-642-21735-7_7
[13] Mehta, N.A., Gray, A.G.: FuncICA for time series pattern discovery. In: Proceedings of the SIAM International Conference on Data Mining, pp. 73–84 (2009)
[14] Memisevic, R., Hinton, G.E.: Unsupervised learning of image transformations. In: Computer Vision and Pattern Recognition (CVPR) (2007)
[15] Mooney, C.H., Roddick, J.F.: Sequential pattern mining - approaches, algorithms. ACM Comput. Surv. 45(2), 19:1–19:39 (2013)
[16] Oates, T.: PERUSE: an unsupervised algorithm for finding recurring patterns in time series. In: ICDM (2002)
[17] Ranzato, M., Poultney, C., Chopra, S., LeCun, Y.: Efficient learning of sparse representations with an energy-based model. In: NIPS. MIT Press (2006)
[18] Sallaberry, A., Pecheur, N., Bringay, S., Roche, M., Teisseire, M.: Sequential patterns mining and gene sequence visualization to discover novelty from microarray data. J. Biomed. Inform. 44(5), 760–774 (2011)
[19] Tavenard, R., Emonet, R., Odobez, J.-M.: Time-sensitive topic models for action recognition in videos. In: International Conference on Image Processing (ICIP), Melbourne (2013)
[20] Varadarajan, J., Emonet, R., Odobez, J.-M.: A sparsity constraint for topic models - application to temporal activity mining. In: NIPS Workshop on Practical Applications of Sparse Modeling: Open Issues and New Directions (2010)
[21] Varadarajan, J., Emonet, R., Odobez, J.-M.: A sequential topic model for mining recurrent activities from long term video logs. Int. J. Comput. Vis. 103(1), 100–126 (2013)
[22] Chu, W.-S., Zhou, F., De la Torre, F.: Unsupervised temporal commonality discovery. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7575, pp. 373–387. Springer, Heidelberg (2012). doi:10.1007/978-3-642-33765-9_27
[23] Peng, W.-C., Chen, Y.-C., Lee, S.-Y.: Mining temporal patterns in time interval-based data. IEEE Trans. Knowl. Data Eng. 27(12), 3318–3331 (2015)
[24] Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. (B) 68(1), 49–67 (2006)
[25] Zheng, Y.: Trajectory data mining: an overview. ACM Trans. Intell. Syst. Technol. 6(3), 29:1–29:41 (2015)

Acknowledgement

This work has been supported by the ANR project SoLStiCe (ANR-13-BS02-0002-01).

Author information

Authors and Affiliations

Univ Lyon, UJM-Saint-Etienne, CNRS, Institut d’Optique Graduate School, Laboratoire Hubert Curien UMR 5516, 42023, Saint-Etienne, France
Kevin Bascol, Rémi Emonet & Elisa Fromont
Idiap Research Institute, 1920, Martigny, Switzerland
Jean-Marc Odobez

Corresponding author

Correspondence to Elisa Fromont.

Editor information

Editors and Affiliations

Data61 - CSIRO, Canberra, Australia
Antonio Robles-Kelly
Pattern Recognition Laboratory, Technical University of Delft, Delft, The Netherlands
Marco Loog
Electrical and Electronic Engineering, University of Cagliari, Cagliari, Italy
Battista Biggio
Computación e IA, Universidad de Alicante, Alicante, Spain
Francisco Escolano
Computer Science, University of York, York, United Kingdom
Richard Wilson

About this paper

Cite this paper

Bascol, K., Emonet, R., Fromont, E., Odobez, JM. (2016). Unsupervised Interpretable Pattern Discovery in Time Series Using Autoencoders. In: Robles-Kelly, A., Loog, M., Biggio, B., Escolano, F., Wilson, R. (eds) Structural, Syntactic, and Statistical Pattern Recognition. S+SSPR 2016. Lecture Notes in Computer Science(), vol 10029. Springer, Cham. https://doi.org/10.1007/978-3-319-49055-7_38

DOI: https://doi.org/10.1007/978-3-319-49055-7_38
Published: 05 November 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49054-0
Online ISBN: 978-3-319-49055-7
eBook Packages: Computer Science (R0)
