Multivariate functional data modeling with time-varying clustering

Abstract

We consider the setting of multivariate functional data collected over time at each of a set of sites. Our objective is to implement model-based clustering of the functions across the sites, where we allow such clustering to vary over time. Anticipating dependence between the functions within a site as well as across sites, we model the collection of functions using a multivariate Gaussian process. With many sites and several functions at each site, we use dimension reduction to provide a computationally manageable stochastic process specification. To jointly cluster the functions, we use the Dirichlet process, which enables shared labeling of the functions across the sites. Specifically, we cluster functions based on their response to exogenous variables. Though the functions arise over continuous time, clustering in continuous time is extremely computationally demanding and not of practical interest. Therefore, we partition the timescale to capture time-varying clustering. Our illustrative setting is bivariate, monitoring ozone and PM\(_{10}\) levels over time for one year at a set of monitoring sites. The data come from 24 monitoring sites in Mexico City that recorded hourly ozone and PM\(_{10}\) levels throughout 2017. Hence, we have 48 functions (two pollutants at each of 24 sites) observed across 8760 hours. We provide a Gaussian process model for each function using continuous-time meteorological variables as regressors along with adjustment for daily periodicity. We interpret the similarity of functions in terms of their shape, captured through site-specific coefficients, and use these coefficients to develop the clustering.
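The bookkeeping behind the partitioned-time clustering can be illustrated with a minimal Python sketch. It is not the authors' model: it only checks the counts stated in the abstract (24 sites, 2 pollutants, 8760 hours), splits the year into contiguous blocks, and draws site labels in each block from a Chinese restaurant process, which is the partition distribution induced by a Dirichlet process. The number of blocks (12), the concentration parameter `alpha = 1.0`, and the helper names `partition_hours` and `crp_labels` are assumptions made for illustration. The labels are prior draws, independent across blocks, and do not condition on any data or site-specific coefficients.

```python
import numpy as np

N_SITES = 24          # monitoring sites (from the abstract)
N_POLLUTANTS = 2      # ozone and PM10 (from the abstract)
N_HOURS = 365 * 24    # hourly records over one year (from the abstract)


def partition_hours(n_hours, n_blocks):
    """Split hour indices 0..n_hours-1 into contiguous, near-equal blocks."""
    return np.array_split(np.arange(n_hours), n_blocks)


def crp_labels(n_items, alpha, rng):
    """Draw cluster labels for n_items from a Chinese restaurant process.

    Item i joins existing cluster k with probability proportional to its
    current size, or opens a new cluster with probability proportional
    to alpha.
    """
    labels = np.zeros(n_items, dtype=int)
    counts = [1]  # the first item opens cluster 0
    for i in range(1, n_items):
        weights = np.array(counts + [alpha], dtype=float)
        k = int(rng.choice(len(weights), p=weights / weights.sum()))
        if k == len(counts):
            counts.append(1)   # open a new cluster
        else:
            counts[k] += 1     # join an existing cluster
        labels[i] = k
    return labels


rng = np.random.default_rng(0)

# Hypothetical choice: 12 blocks of equal length (730 hours each).
blocks = partition_hours(N_HOURS, n_blocks=12)

# One label vector per block, so cluster membership may change between blocks.
labels_by_block = [crp_labels(N_SITES, alpha=1.0, rng=rng) for _ in blocks]

n_functions = N_SITES * N_POLLUTANTS  # 24 sites x 2 pollutants
```

In a full analysis the labels in each block would be inferred from the site-specific regression coefficients rather than drawn from the prior; the sketch only shows how a partition of the timescale yields one clustering of the sites per block.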

Corresponding author

Correspondence to Philip A. White.

Cite this article

White, P.A., Gelfand, A.E. Multivariate functional data modeling with time-varying clustering. TEST 30, 586–602 (2021). https://doi.org/10.1007/s11749-020-00733-z
