Abstract
We consider multivariate functional data collected over time at each of a set of sites. Our objective is to implement model-based clustering of the functions across the sites, allowing the clustering to vary over time. Anticipating dependence between the functions within a site as well as across sites, we model the collection of functions using a multivariate Gaussian process. With many sites and several functions at each site, we use dimension reduction to provide a computationally manageable stochastic process specification. To jointly cluster the functions, we use a Dirichlet process, which enables shared labeling of the functions across the sites. Specifically, we cluster functions based on their response to exogenous variables. Though the functions arise over continuous time, clustering in continuous time is extremely computationally demanding and not of practical interest. Therefore, we partition the timescale to capture time-varying clustering. Our illustrative setting is bivariate: ozone and PM\(_{10}\) levels monitored over one year at a set of monitoring sites. The data come from 24 monitoring sites in Mexico City that recorded hourly ozone and PM\(_{10}\) levels during 2017. Hence, we have 48 functions (24 sites with two pollutants each) observed across 8760 hours (365 days of 24 hours). We provide a Gaussian process model for each function using continuous-time meteorological variables as regressors, along with an adjustment for daily periodicity. We interpret the similarity of functions in terms of their shape, captured through site-specific coefficients, and use these coefficients to develop the clustering.
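To make the idea of time-varying clustering over a partitioned timescale concrete, the following is a minimal sketch and not the model fitted in the article. It splits the 8760 hours into contiguous blocks and, within each block, draws cluster labels for the 24 sites from a truncated stick-breaking representation of a Dirichlet process prior. The number of blocks (12), the concentration parameter `alpha`, the truncation level `n_atoms`, and all function names are assumptions introduced for illustration only; the abstract does not state how the timescale is partitioned, and the article clusters on site-specific regression coefficients rather than sampling labels from the prior.

```python
import random


def stick_breaking_weights(alpha, n_atoms, rng):
    """Truncated stick-breaking weights: v_k ~ Beta(1, alpha),
    w_k = v_k * prod_{j<k} (1 - v_j); the last atom takes the leftover mass."""
    weights = []
    remaining = 1.0
    for _ in range(n_atoms - 1):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights


def sample_labels(weights, n_items, rng):
    """Draw one cluster label per item; items drawing the same atom share a label."""
    return rng.choices(range(len(weights)), weights=weights, k=n_items)


def time_blocks(n_hours, n_blocks):
    """Split hours 0..n_hours-1 into contiguous, near-equal (start, end) blocks."""
    edges = [round(i * n_hours / n_blocks) for i in range(n_blocks + 1)]
    return [(edges[i], edges[i + 1]) for i in range(n_blocks)]


rng = random.Random(0)
n_sites = 24                      # from the abstract
n_hours = 8760                    # from the abstract
blocks = time_blocks(n_hours, 12)  # 12 blocks is a hypothetical choice

# One independent prior draw of site labels per time block (illustration only).
labels_by_block = []
for _ in blocks:
    w = stick_breaking_weights(alpha=1.0, n_atoms=10, rng=rng)
    labels_by_block.append(sample_labels(w, n_sites, rng))
```

Because labels are redrawn in each block, two sites can share a cluster in one block and be separated in another, which is the sense in which the clustering varies over time.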
Cite this article
White, P.A., Gelfand, A.E. Multivariate functional data modeling with time-varying clustering. TEST 30, 586–602 (2021). https://doi.org/10.1007/s11749-020-00733-z
DOI: https://doi.org/10.1007/s11749-020-00733-z
Keywords
- Dimension reduction
- Dirichlet process
- Hierarchical model
- Latent factor models
- Multivariate Gaussian process
- Ozone
- PM\(_{10}\)