Extreme value distributions describe interannual variability in the North Atlantic spring bloom

,

allowing phytoplankton biomass to increase if division rates exceed other losses (Behrenfeld andBoss 2014, 2018).The critical turbulence hypothesis assumes the same threshold as the critical depth hypothesis but focuses on division rates in the actively mixing turbulent layer (Huisman et al. 1999;Taylor and Ferrari 2011).The dilution-recovery hypothesis suggests a more general balance between phytoplankton division and loss rates (often driven by grazing) allowing blooms to develop when division rates are declining so long as loss rates decline below division rates; for example, with dilution via mixed layer deepening (Evans et al. 1985;Behrenfeld 2010).Across proposed mechanisms, the population-level phenomena of blooms arise due to transient positive imbalances between division and loss rates (driven by variations in division, loss, or both) such that the net exponential growth rate () is positive, i.e.
= (() − ()) = (), with () − () = () > 0 prior to the bloom, where () is the division rate and () is the total loss rate.Days to weeks post-bloom, losses increase to match and exceed division, most often because of increased grazer abundance supported by the elevated phytoplankton biomass (Behrenfeld andBoss 2014, 2018).The magnitude and duration of the transient exponential growth period determines the magnitude of the bloom biomass maximum.We hereafter refer to a 'bloom' as the period of positive net exponential growth and 'bloom magnitude' as the concentration maximum achieved over this period (see Behrenfeld and Boss 2018 for a discussion of bloom definitions).The North Atlantic basin exhibits one of the largest and well-studied seasonal blooms across the global ocean.Deep winter mixing and rapid re-stratification creates ideal conditions for exponential phytoplankton growth, with bloom initiation following a northward progression as re-stratification occurs earliest at low latitude (Dutkiewicz et al. 2001;Siegel et al. 2002).Meteorological variability then modulates the precise timing and magnitude of the bloom according to the impacts on local mixing dynamics (Dutkiewicz et al. 2001;Follows and Dutkiewicz 2002).Importantly, variability in bloom timing and magnitude has been linked to the magnitude of carbon export (Briggs et al. 2011) and to fisheries productivity via the 'matchmismatch hypothesis' (Platt et al. 2003).Climate change is expected to alter North Atlantic bloom dynamics via a range of factors, including changes in seasonal mixing depths, nutrient fluxes, and the metabolic impacts of warmer temperatures (Sommer and Lengfellner 2008).Quantifying the dynamics of the North Atlantic spring bloom is thus of central importance for understanding the relevant oceanographic and ecological processes and will aid in tracking the associated impacts of climate change.Viewing interannual bloom magnitudes as extreme values in observed phytoplankton time series brings to bear the statistical theory of extreme values.Under the Fisher-Tippet-Gnedenko theorem, the maximum of a sequences of random variables converges in distribution to the Generalized Extreme Value Distribution (GEVD), itself a generalization of the Gumbel, Frechet, and Weibull distributions (Coles 2001).The theorem yields a three-parameter probability density function describing the limiting distribution of maximum values generated from samples of a stochastic process, analogous to the central limit theorem for the mean of a distribution (Coles 2001;described below).Like the lognormal or chi-square distribution, the GEVD can exhibit a strong right skew; however, the GEVD has the advantage of being derived specifically for stochastic maxima.While the GEVD is increasingly applied to geophysical and climate studies (Easterling et al. 2000;Katz 2010;Aghakouchak et al. 2020) there have been fewer applications to biological time series (Batt et al. 2017; but see interesting exceptions, e.g.Gaines and Denny 1993;Benedetti-Cecchi et al. 2015;Butitta et al. 2017).Consistent with the exponential nature of biological growth, Batt et al. (2017) found that biological time series exhibit consistently heavier tails in their extreme value distributions relative to chemical and geophysical time series.These findings suggest North Atlantic bloom magnitudes as an ideal target of extreme value analysis, due to its annually repeating cycle of transient and variable exponential growth.
Here I analyzed the North Atlantic satellite chlorophyll record to quantify seasonal phytoplankton bloom variability via extreme value analysis.I estimated GEVD parameters at a ¼° latitude-longitude scale.I mapped the fitted parameters spatially and evaluated the GEVD goodness-of-fit to the chlorophyll time series.I correlated the fitted parameters to bathymetric properties of the North Atlantic basin.I further evaluated evidence for non-stationarity (i.e., time-variability) in GEVD parameters in the context of satellite-observed chlorophyll and temperature trends across the basin.Results of this study will provide a statistical framework to describe interannual bloom variability and allow us to test an important hypothesis with respect to basin-scale environmental change.Methods Observations I analyzed two sets of basin-scale satellite chlorophyll observations.First, I used chlorophyll estimates from the Moderate-resolution Imaging Spectroradiometer-Aqua (MODIS-Aqua) sensor, as used in standard net primary productivity products, spanning years 2002-2021 from the Oregon State University Ocean Productivity database (https://sites.science.oregonstate.edu/ocean.productivity/).Temporal resolution of the satellite images was eight days per image.Eighteen complete time series years yielded eighteen annual maxima observations for fitting GEVDs.Associated inherent optical properties were estimated using the Garver-Siegel-Maritorena (GSM) algorithm (Maritorena and Siegel 2005).MODIS-Aqua based chlorophyll estimates were gap-filled for missing observations due to clouds according to the algorithm described at http://orca.science.oregonstate.edu/gap_fill.php.While gap-filling can alter the underlying chlorophyll distribution, it is not expected to affect the annually measured maximum value as the method calculates a temporal average over observed time points and would not generally exceed observed magnitudes.Secondly, I used the Ocean Colour Climate Change Initiative (OC-CCI) chlorophyll product (Sathyendranath et al. 2019) covering the same time period as the MODIS-Aqua dataset, accessed from www.oceancolour.org.The same time period was chosen to be consistent between the two analysis and compare results.OC-CCI chlorophyll is a synthetic product generated by combining information from multiple sensors, including SeaWiFS (Sea-viewing Wide-Field-of-view Sensor), MODIS-Aqua, MERIS (Medium spectral Resolution Imaging Spectrometer) and VIIRS (Visible and Infrared Imaging Radiometer Suite).OC-CCI was not gap-filled and therefore contained missing values due to clouds.I grided both sets of chlorophyll observations to ¼° latitude-longitude resolution.Wintertime satellite chlorophyll observations were not available for higher latitudes due to light limitation and were replaced with a value of zero concentration which did not affect the extreme value analysis.Sea surface temperature data from the MODIS-Aqua sensor were obtained from the Oregon State University Ocean Productivity Database site cited above.Bathymetric depth data were obtained from https://www.gebco.net/data_and_products/gridded_bathymetry_data/.I analyzed chlorophyll observations as a proxy for phytoplankton carbon biomass, noting that chlorophyll and carbon can decouple due to photoacclimation processes, particularly at low light (Behrenfeld et al. 2005;Sathyendranath et al. 2020).Despite this, chlorophyll is generally estimated with lower uncertainty than carbon (Sathyendranath et al. 2020) and plays a central role in rates of primary production.Carbon biomass estimates may also be used in future work with the analytical framework presented here.

Extreme Value Analysis
Given a time series of chlorophyll observations at an individual location, I estimated the parameters of the GEVD via the block maxima approach (Gilleland and Katz 2016), taking time series blocks as individual years.I define  = max( !,  " , … ,  # ) as the maximum chlorophyll measurement in a single year of  measurements.Over  years I have  yearly maxima, thus defining the observed annual maxima time series  !,  " , … ,  $ (the map showing the month when the maxima most frequently occurred is displayed in Supplementary Figure 1).For an annual maximum , the GEVD has a probability density function given by with ,  = 0, where , , and  are the location, scale, and shape parameters, respectively.The location parameter shifts the GEVD along the  axis, the scale parameter controls the spread, and the shape parameter controls the peaked-ness of the mode and heaviness of the distribution tail.Examples of how , , and  modulate the GEVD are given in Supplementary Figure 2. Formulas for the expected value, variance, and mode of the GEVD are given in Appendix A. For the satellite observations, m=18 for years 2002-2021 and n=46, on average, due to eight day spacing through the year.Goodness of fit was evaluated by quantile-quantile plots where the empirical quantiles of the observations are correlated against the theoretical quantiles predicted from the fitted GEVD distributions.In addition to the three-parameter GEVD described above, I also apply a nonstationary extension where the parameters are described as simple linear functions of time (Gilleland and Katz 2016) of the form where  is one of the GEVD parameters,  . is the intercept of the linear relationship, and  ! is the slope, i.e. the rate of change with respect to time.
The log-likelihood for the GEVD parameters (Gilleland and Katz 2016) is given by . I maximized the log-likelihood function with respect to the parameters to obtain empirical estimates.I maximized with respect to , , and  in the case of stationary GEVDs, and with respect to the intercept and slope in the case of nonstationary GEVDs.I restricted the analysis to estimating one nonstationary parameter at a time due to data restrictions and weak identifiability when multiple parameters are allowed to vary.I only considered linear functions of time and suggest nonlinear functions for future work.I used numerical optimization routines implemented in the extRemes library within the R programming language (Gilleland and Katz 2016).I compared stationary and nonstationary fits according to the Bayesian Information Criterion (BIC), given by BIC = −2P,  R,  S T −  log , where P,  R,  S T is the maximized likelihood at empirical estimates ,  R, and  S . is the number of parameters in the GEVD (three for stationary GEVDs, four for nonstationary GEVDs), and  is the number of yearly maxima used in the fit.A GEVD was fit to chlorophyll time series in each ¼° pixel.The parameters were mapped spatially and correlated with basin bathymetry.Parameter uncertainty was derived by taking the squareroot of the inverse Hessian matrix evaluated at the maximum likelihood estimates.

Results
The distributions of annual chlorophyll maxima showed excellent agreement with those predicted from the GEVD.Across the Atlantic basin, observed distribution quantiles correlated with those from the fitted GEVDs at  = 0.97 on the arithmetic scale (Supplementary Figure 3a) and  = 0.98 on the log scale (Supplementary Figure 3b).The spatial distribution of the quantile-quantile correlation was also consistent across the basin, with no apparent relationship with latitude or distance from the coast (Supplementary Figure 3c).The region of largest disagreement occurred off the southwest coast of Europe, yet correlations were still above  = 0.7 and remained so across the basin.When fitted GEVD parameters were mapped spatially I found that parameter magnitude closely followed basin bathymetry (Figures 1-2).Location, scale, and shape parameters were consistently elevated on the shelf and adjacent waters (<700m depth; Figure 1; Figure 2a-c).Location parameters were elevated by over fivefold, scale parameters elevated approximately twofold, and shape parameters elevated over threefold in waters shallower than 700m.The distinction in magnitude between shelf and open ocean water was strongest for location and shape parameters, and weaker for scale parameters, demonstrating a correlation between the mean interannual bloom magnitude and the 'heaviness' of the underlying distribution tail.This pattern appeared on the eastern and western sides of the basin.Deeper slope waters off the coast of Greenland also showed elevated location and shape parameters.The largest parameter magnitudes were found in the shallow Baltic Sea (Figure 1c,d; Figure 2a,c).The scale parameter showed a different pattern with bathymetry where parameter magnitude was modestly elevated in shelf waters but also showed a step-change decrease in the deepest waters (>4000m; Figure 1b; Figure 2b).However, the aerial distribution of shelf vs. deep water is markedly different, with deep waters limited to the southern half of the basin around the Mid-Atlantic ridge while shelf seas are widely distributed.Parameter uncertainty weakly correlated with parameter magnitude for location and shape parameters, particularly around the Greenland slope, but had no consistent relationship across the basin (Supplementary Figure 4).Scale parameter uncertainty was elevated in deeper water (Supplementary Figure 4).The area-weighted average of the parameter coefficients of variation (the standard deviation of the parameter uncertainty divided by the fitted mean parameter) was 0.11, 0.38, 0.24 for location, scale, and shape parameters, meaning the 1 uncertainty was 11%, 23.5%, and 24% of the mean, respectively (Supplementary Figure 4).I repeated the uncertainty analysis by artificially halving and doubling the number of observations to diagnose the impact of sample size on estimation uncertainty.Area-weighted coefficients of variation increased to 14%, 26.3%, and 34.9% when sample size was halved, and decreased to 8%, 16.6%, and 17.0% when sample size was doubled, respectively (Supplementary Figure 5).I quantified the correlation between fitted GEVDs parameters using linear relationships (Figure 2d-f).Bivariate relationships between parameters were well described by a linear intercept and positive slope, indicating strong positive scaling relationships.The strongest relationship was found between fitted location and shape parameters, reflecting that GEVD distributions increase in magnitude and heavy tailed-ness with decreasing bathymetric depth.I visually characterized how the GEVD changes from deep to shelf waters using the fitted linear relationships (Figure 2g-h), noting that this characterization represents the basin-averaged relationships so may not necessarily be representative of individual regions.The empirical pattern underlying positive scaling between location, scale, and shape parameters is reflected in the extreme value time series where the mean and median of the annual extremes are positively related to the extreme variance (Supplementary Figures 6-7).Using a nonstationary GEVD analysis, I found weak evidence for temporal trends in GEVD parameters, despite significant trends in chlorophyll levels and sea surface temperature across the North Atlantic (Figure 3).Nonstationary parameters were favored in 34.3%, 29.9%, and 36.3% of basin area for location, scale, and shape parameters, respectively (Figures 3a-c).Where nonstationary parameters were favored, trends in the location parameter positively correlated with background chlorophyll trends (r=0.70; Figure 3, Supplementary Figure 8).However, trends in the scale and shape parameters did not correlate with chlorophyll nor sea surface temperature trends (Supplementary Figure 8; r<0.1 in all cases).The weakest evidence for nonstationary parameters was found in the area with the strongest sea surface temperature trends, specifically the warming-cooling dipole pattern on the western side of the basin caused in-part by a slowdown of the Atlantic overturning circulation and associated northward heat transport (i.e. the North Atlantic 'warming hole'; Keil et al. 2020).I also examined parameter variability at the level of Longhurst biogeochemical provinces but found no consistent pattern, with high interand intra-province variability (Supplementary Figure 9).I repeated the GEVD parameter estimation using the OC-CCI chlorophyll product and found consistent results to those based on MODIS observations presented above.Quantile-quantile correlations showed a similarly good fit between observed OC-CCI quantiles and those predicted from the fitted GEVDs with a correlation of  = 0.98 in arithmetic space and  = 0.98 in log space (Supplementary Figure 10-11).Spatial patterns in fitted GEVD parameters were consistent across parameters estimated using the two datasets, despite some evidence for slightly reduced scale parameter magnitudes using OC-CCI chlorophyll (Supplementary Figure 10), which may be due to reduced variance in the synthetic multi-sensor OC-CCI dataset.Finally, I examined the annually observed extremes for autocorrelation and found weak evidence across the basin (Supplementary Figure 12).Discussion Our analysis demonstrates that annual chlorophyll maxima are well-described by the GEVD based on the statistical theory of extreme values.I achieved a high goodness-of-fit and interpretable spatial patterns across the North Atlantic basin.A clear pattern emerged in the correlation between GEVD parameters and bathymetric depth, with the magnitude and tailedness of chlorophyll extremes increasing in shelf and slope environments.The mechanism for this pattern is unclear and will require a further understanding of how nutrients, shelf stratification dynamics, grazing, and other factors interact to generate extreme chlorophyll concentrations.The heavy distribution tail on the shelf may be related to variability in bloom timing which has been shown to impact interannual variability in shelf bloom magnitude (Friedland et al. 2015).However, the observed relationships may also be a more general phenomenon where the tailedness of the chlorophyll distribution is a function of background concentrations which are coincidentally elevated in shelf environments.I suggest further work examining the potential mechanisms that could explain different extreme value distributions under contrasting oceanographic environments.Classical models of phytoplankton blooms (e.g.Behrenfeld and Boss 2014) may be extended to include stochastic forcing and generate statistical distributions of bloom interannual magnitude.Targeted sensitivity analyses of annual maxima distributions to different underlying forcing and background chlorophyll levels may uncover mechanisms for changes in distribution parameters.Results of the nonstationary analysis suggested weak evidence for changes in GEVD parameters over time.The lack of correlation with observed trends in background chlorophyll and sea surface temperature suggests that climate-related drivers are playing a limited role in modulating interannual bloom magnitude, despite the North Atlantic showing significant climate change (Keil et al. 2020).I note, however, that the current nonstationary analysis is limited by time series length, here applied to eighteen years of observed annual maxima.Uncertainty in the chlorophyll satellite record is also introduced via missing values due to clouds which partly obscures the distribution of maxima.Here I found that gap-filling the missing observations had no appreciable effect on the estimated extreme value parameters when comparing the gap-filled and non-gap-filled chlorophyll datasets.Continued studies will be required to monitor changes in bloom magnitude over time.An extended satellite record will provide greater statistical power to detect climate-driven trends.An extended record will also constrain more complicated functions describing the variation of parameters with time and their statistical association with environmental factors.Biogeochemical general circulation models may also be analyzed to gain insight into the distribution and generating mechanisms of annual maxima.
Ecologically, interesting questions arise about how extreme blooms propagate through food webs, contribute to carbon export, and impact ecological processes more broadly.For example, annual fisheries recruitment is often characterized by heavy-tailed distributions where individual years exhibit extremely large cohorts, often fueling the fishery for years (Saetre et al. 2002).Extreme bloom years may increase the probability of strong cohorts via the match-mismatch mechanism (Platt et al. 2003), perhaps with temporal lags between blooms and recruitment modulated by trophic transfer.With respect to carbon export, I expect extreme blooms to contribute disproportionately to interannual carbon fluxes due to the commonly-observed positive effect of phytoplankton productivity on carbon export efficiency (Britten and Primeau 2016) and observed carbon fluxes associated with blooms in the North Atlantic (Briggs et al. 2011).The observed positive relationship between location and shape parameters may be of particular interest where the frequency and intensity of extreme blooms scales with the background chlorophyll levels.Databases of carbon flux observations may be used to test these relationships at the basin and global scales (Mouw et al. 2016).Beyond fisheries and carbon export, I envision extreme value distributions to be broadly useful in characterizing the response of ecological processes to environmental extremes.The increase in applications of extreme value theory to environmental processes (Aghakouchak et al. 2020) naturally leads to questions of how environmental extremes impact ecology.The GEVD is one extreme value analysis tool that is theoretically motivated when considering block maxima (which was particularly appropriate here to describe the maxima of a repeating annual cycle) however other statistical descriptions of ecological heavy tailed-ness can also be useful in this context, for example the lognormal distribution (e.g.Anderson et al. 2017).In summary, the GEVD provided a useful statistical description of bloom variability and a general framework to quantify the spatiotemporal statistics of interannual bloom maxima.I hope this study spawns further analysis of marine ecosystem variability using extreme value theory to better understand how environmental conditions give rise to ecological extreme events, how extreme blooms contribute to fisheries productivity and carbon export, and how these processes may change with climate.Acknowledgements I gratefully acknowledge funding through the Simons Foundation Collaboration on Computational Biogeochemical Modeling of Marine Ecosystems and the Simons Foundation Postdoctoral Fellowship in Marine Microbial Ecology.Data and Code Availability: Original data used in this study are publicly available via the sources cited in the text.Processed data and metadata used for analysis will be made available in the Dryad data repository (Britten 2021).Code used to perform the GEVD analysis and visualize the results is publicly available at: https://github.com/gregbritten/chl_extremes_public.References Aghakouchak, A., F. Chiang, L. S. Huning, and others. Figures