Abstract
Climate events and weather predictions have a substantial impact on human activities. To assess the accuracy of weather prediction, we applied functional principal component analysis (FPCA) to investigate the main patterns of variance in U.S. weather prediction error over a period of 3 years. We further grouped the U.S. states by the similarity of their weather forecast performance using two types of functional clustering approaches: the filtering method and the model-based method. The strengths and weaknesses of each clustering method were assessed through simulation studies. The clustering approaches were then applied to U.S. weather data from 2014 to 2017. Through clustering, cluster-specific patterns were identified visually, and the cluster-to-cluster differences were quantified in order to identify the most and least predictable U.S. states.
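As a rough illustration of the filtering idea mentioned above, the following sketch reduces each curve to a few FPCA scores and then clusters those scores with k-means. It is a minimal sketch under stated assumptions, not the authors' implementation: the data are synthetic stand-ins for forecast-error curves, the FPCA is a discretized version computed with an SVD on a common time grid, the k-means routine is a plain NumPy version, and the names `fpca_scores` and `kmeans` are hypothetical. The model-based method is not shown.

```python
import numpy as np


def fpca_scores(curves, n_components):
    """Discretized FPCA on curves observed on a common grid (rows = curves)."""
    mean_curve = curves.mean(axis=0)
    centered = curves - mean_curve
    # Rows of vt are the discretized principal component functions.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    scores = centered @ vt[:n_components].T
    return scores, vt[:n_components], explained[:n_components]


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd-style k-means; initial centers are random data points."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Keep the old center if a cluster happens to be empty.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Synthetic data (an assumption for illustration): two groups of 20 curves each.
rng = np.random.default_rng(42)
t = np.linspace(0.0, 1.0, 50)
group_a = 1.0 + 0.5 * np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal((20, t.size))
group_b = 3.0 + 0.5 * np.cos(2 * np.pi * t) + 0.1 * rng.standard_normal((20, t.size))
curves = np.vstack([group_a, group_b])

# Filtering step: FPCA scores first, then clustering on the scores.
scores, components, explained = fpca_scores(curves, n_components=2)
labels, centers = kmeans(scores, k=2)
print("share of variance per component:", np.round(explained, 3))
print("cluster labels:", labels)
```

In this sketch the number of components and the number of clusters are fixed by hand; in practice both would need to be chosen from the data.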
Acknowledgements
The authors are most appreciative of the organizers of the 2018 JSM Data Expo, who made this work possible. We also thank Dr. Peijun Sang, PhD candidates Yuping Yang and Zhiyang Zhou, and the faculty members and graduate students in the Department of Statistics and Actuarial Science at Simon Fraser University who provided helpful suggestions on this project.
About this article
Cite this article
Lin, C., Yu, Y., Wu, L.Y. et al. Unsupervised learning on U.S. weather forecast performance. Comput Stat 38, 1193–1213 (2023). https://doi.org/10.1007/s00180-023-01340-w
DOI: https://doi.org/10.1007/s00180-023-01340-w