Model-based clustering of time series in group-specific functional subspaces

  • Regular Article
Advances in Data Analysis and Classification

Abstract

This work develops a general procedure for clustering functional data by adapting the high dimensional data clustering (HDDC) method, originally proposed in the multivariate context. The resulting clustering method, called funHDDC, is based on a functional latent mixture model which fits the functional data in group-specific functional subspaces. By constraining model parameters within and between groups, a family of parsimonious models is obtained which can accommodate a variety of situations. An estimation procedure based on the EM algorithm is proposed for determining both the model parameters and the group-specific functional subspaces. Experiments on real-world datasets show that the proposed approach performs as well as or better than classical two-step clustering methods, while providing useful interpretations of the groups and avoiding the delicate choice of the discretization technique. In particular, funHDDC appears to always outperform HDDC applied to spline coefficients.



Author information

Corresponding author

Correspondence to Julien Jacques.

About this article

Cite this article

Bouveyron, C., Jacques, J. Model-based clustering of time series in group-specific functional subspaces. Adv Data Anal Classif 5, 281–300 (2011). https://doi.org/10.1007/s11634-011-0095-6


  • DOI: https://doi.org/10.1007/s11634-011-0095-6
