Abstract
This work develops a general procedure for clustering functional data by adapting high-dimensional data clustering (HDDC), a method originally proposed in the multivariate context. The resulting clustering method, called funHDDC, is based on a functional latent mixture model that fits the functional data in group-specific functional subspaces. Constraining the model parameters within and between groups yields a family of parsimonious models that can accommodate a variety of situations. An estimation procedure based on the EM algorithm is proposed for determining both the model parameters and the group-specific functional subspaces. Experiments on real-world datasets show that the proposed approach performs as well as or better than classical two-step clustering methods, while providing useful interpretations of the groups and avoiding the delicate choice of a discretization technique. In particular, funHDDC appears to consistently outperform HDDC applied to spline coefficients.
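For orientation, the classical two-step approach that the abstract uses as a point of comparison can be sketched as follows: each curve is first summarized by its coefficients in a fixed basis, and the coefficient vectors are then clustered with a multivariate method. This is only an illustrative sketch of that baseline, not funHDDC itself. The Fourier basis, the plain k-means step, the simulated curves, the deterministic initialization, and the names `fourier_coefficients` and `kmeans` are all assumptions made for the example and do not come from the article.

```python
import math
import random


def fourier_coefficients(curve, n_harmonics):
    """Step 1: project a curve sampled on a uniform grid onto a Fourier basis.

    Assumes the grid covers exactly one period and n_harmonics < len(curve) / 2,
    so the discrete basis functions are orthogonal and inner products suffice.
    """
    n = len(curve)
    coefs = [sum(curve) / n]  # constant term
    for k in range(1, n_harmonics + 1):
        cos_k = sum(y * math.cos(2 * math.pi * k * i / n) for i, y in enumerate(curve))
        sin_k = sum(y * math.sin(2 * math.pi * k * i / n) for i, y in enumerate(curve))
        coefs.extend([2 * cos_k / n, 2 * sin_k / n])
    return coefs


def kmeans(points, initial_centers, n_iter=20):
    """Step 2: plain Lloyd iterations on the coefficient vectors; returns labels."""
    centers = [list(c) for c in initial_centers]  # do not mutate the caller's data
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(
                range(len(centers)),
                key=lambda j: sum((p - c) ** 2 for p, c in zip(pt, centers[j])),
            )
            for pt in points
        ]
        # Recompute each center as the mean of its members.
        for j in range(len(centers)):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Simulated data (hypothetical): 10 noisy curves at frequency 1, then 10 at frequency 2.
rng = random.Random(0)
n_grid = 50
curves = []
for frequency in (1, 2):
    for _ in range(10):
        curves.append(
            [
                math.sin(2 * math.pi * frequency * i / n_grid) + rng.gauss(0, 0.1)
                for i in range(n_grid)
            ]
        )

coefs = [fourier_coefficients(curve, n_harmonics=2) for curve in curves]
# Deterministic initialization for the sketch: first and last coefficient vectors.
labels = kmeans(coefs, initial_centers=[coefs[0], coefs[-1]])
print(labels)
```

The result of such a pipeline depends on the basis and on the number of coefficients chosen in the first step, which is the choice the abstract says funHDDC avoids by estimating group-specific functional subspaces within the model.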
Cite this article
Bouveyron, C., Jacques, J. Model-based clustering of time series in group-specific functional subspaces. Adv Data Anal Classif 5, 281–300 (2011). https://doi.org/10.1007/s11634-011-0095-6
DOI: https://doi.org/10.1007/s11634-011-0095-6
Keywords
- Functional data
- Time series clustering
- Model-based clustering
- Group-specific functional subspaces
- Functional PCA