Compound hot temperature and high chlorophyll extreme events in global lakes

An emerging concern for lake ecosystems is the occurrence of compound extreme events, i.e. situations where multiple within-lake extremes occur simultaneously. Of particular concern is the co-occurrence of lake heatwaves (anomalously warm temperatures) and high chlorophyll-a extremes, two important variables that influence the functioning of aquatic ecosystems. Here, using satellite observations, we provide the first assessment of univariate and compound extreme events in lakes worldwide. Our analysis suggests that the intensity of lake heatwaves and high chlorophyll-a extremes differs across lakes and is influenced primarily by the annual range in surface water temperature and chlorophyll-a concentrations. The intensity of lake heatwaves is greater still in smaller lakes and in those that are shallow and experience cooler average temperatures. Our analysis also suggests that, in most of the studied lakes, compound extremes occur more often than would be assumed from the product of their independent probabilities. We anticipate compound extreme events to have more severe impacts on lake ecosystems than those previously reported due to the occurrence of univariate extremes.


Introduction
Climatic extremes, in particular storms, heatwaves and droughts, are becoming more frequent, a trend that has been linked to directional climate change (Fischer and Knutti 2015, Seneviratne et al 2021). These extreme events also frequently, but not exclusively, result in extreme within-lake conditions (Perga et al 2018, Stockwell et al 2020, Jennings et al 2021), including the occurrence of lake heatwaves (Woolway et al 2021a, 2021b), more severe hypoxia (Jankowski et al 2006), and the occurrence of high phytoplankton biomass (Jöhnk et al 2008, Posch et al 2012, Mastrotheodoros et al 2020). Within-lake extremes such as these can have a dramatic influence on the functioning of lake ecosystems, and likewise result in numerous negative impacts on the many benefits that they provide to society, including the provision of safe water for drinking and irrigation, recreational use, and economic benefits such as fisheries and tourism (Rinke et al 2019).
An emerging concern for freshwater ecosystems is compound extreme events, i.e. situations where multiple within-lake extremes occur simultaneously. Since the concept was introduced in 2012 by the Intergovernmental Panel on Climate Change Special Report on Climate Extremes (Seneviratne et al 2012), research into compound events has evolved into a rapidly growing interdisciplinary field at the interface of climate science, climate-impact research, and statistics. However, studies thus far have focused on compound events over land (Ridder et al 2020) and in the ocean (Le Grix et al 2021). To the best of our knowledge, no study to date has investigated compound extreme events in lakes, despite the negative effects that combined extremes can have on the ecosystem services that lakes provide. The joint occurrence of extreme events has the potential to exacerbate negative impacts in lakes compared to those that occur following univariate extremes, leading to more severe ecologically and socioeconomically damaging events (Zscheischler et al 2014). Given the expected increase in extremes across much of the globe with climate change, an evaluation of compound extreme events in lakes has now taken on new and critical importance.
The overarching aim of this study is to investigate compound extreme events in lakes, focussing on the occurrence of lake heatwaves and high phytoplankton biomass. This is achieved by analysing lake surface temperature observations to investigate lake heatwaves (Woolway et al 2021a), and chlorophyll-a (hereafter referred to as chlorophyll) concentrations to investigate the occurrence of high phytoplankton biomass (Kasprzak et al 2008). Compound lake heatwave and high chlorophyll extreme events can severely impact lake ecosystems, especially when they act synergistically. For example, while the increased severity of lake heatwaves can expose aquatic organisms to novel and, in some cases, lethal conditions, leading to an increased risk of mass mortality events (Till et al 2019), anomalously high phytoplankton biomass can lead to, among other things, hypolimnetic oxygen depletion as well as a deterioration in water quality (Jane et al 2021, Smucker et al 2021). In this study, we provide the first assessment of compound lake heatwave and high chlorophyll extreme events in lakes worldwide. Moreover, we investigate the main lake-specific variables that influence the intensity of univariate extremes and, likewise, the intensity of compound lake heatwave and high chlorophyll extreme events, and we identify the lakes in which these compound extremes are most likely to occur.

Satellite observations
Satellite observations of lake surface water temperature were drawn from the Global Observatory of Lake Responses to Environmental Change (GloboLakes) project (www.globolakes.stir.acuk/). Satellite observations of lake surface chlorophyll were drawn from the European Space Agency Data User Element programme GlobColour project (www.globcolour.info/). These satellite-derived lake surface temperature and chlorophyll data have been validated previously by Carrea and Merchant (2019) and Kraemer et al (2017), respectively. In this study, we only include lakes with at least 20 years of daily observational data. A total of 104 lakes with available lake surface temperature and chlorophyll data were investigated. The studied lakes varied in surface area between 113.5 km² and 82 103 km², in average depth between 0.4 and 738 m, in altitude between −10 m above sea level (a.s.l.) and 4724 m a.s.l., and in latitude between 50.2° S and 69.4° N (table S1 (available online at stacks.iop.org/ERL/16/124066/mmedia)) (Messager et al 2016).
To average across the intralake heterogeneity of lake responses to climate change (Woolway and Merchant 2018) we averaged the observed data across the surface area of each lake. Thus, here we calculate daily lake-wide mean time series of lake surface temperature and chlorophyll concentrations for each studied site. To prevent partial cloud or ice cover from biasing daily lake-wide averages, lake-specific boosted regression trees (i.e. fitted separately for each lake) were first used to factor out bias that could result from occasional partial data coverage of lake surface temperature and surface chlorophyll concentrations. Each lake-specific boosted regression tree modelled lake surface temperature and chlorophyll as a function of the day of the year, latitude, and longitude with a tree complexity of 3 to allow for three-way interactions between these variables. We optimized the learning rate separately for each boosted regression tree for each lake by iteratively running the model with progressively smaller learning rates (from 0.8, 0.4, 0.2, 0.1, 0.05 to 0.025) until the number of trees in the boosted regression which minimized the predicted deviance was greater than 1000 as suggested in previous literature (Elith et al 2008). The predicted values from the resulting model were also used to gapfill each daily lake-wide average time series.
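The learning-rate search described above can be sketched as a simple loop. This is a minimal illustration only, not the authors' code: the callback `optimal_trees_for` is a hypothetical stand-in for fitting a boosted regression tree at a given learning rate and returning the number of trees that minimises the predicted deviance, and the fallback to the smallest rate is an assumption.

```python
from typing import Callable, Sequence


def select_learning_rate(
    optimal_trees_for: Callable[[float], int],
    rates: Sequence[float] = (0.8, 0.4, 0.2, 0.1, 0.05, 0.025),
    min_trees: int = 1000,
) -> float:
    """Return the first (largest) rate whose optimal tree count exceeds min_trees.

    `optimal_trees_for` is a hypothetical callback: it should fit the model at
    the given learning rate and return the deviance-minimising number of trees.
    """
    for rate in rates:
        if optimal_trees_for(rate) > min_trees:
            return rate
    # Assumption: if no rate reaches the target, use the smallest rate tried.
    return rates[-1]


# Toy stand-in for a real model fit: smaller rates need more trees.
FAKE_OPTIMA = {0.8: 150, 0.4: 300, 0.2: 600, 0.1: 950, 0.05: 1800, 0.025: 3600}


def fake_fit(rate: float) -> int:
    return FAKE_OPTIMA[rate]


chosen = select_learning_rate(fake_fit)
print(chosen)
```

In practice the callback would wrap whichever boosted regression tree implementation is in use; the loop itself is independent of that choice.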

Extreme event definitions
Using the satellite-derived observations we identified extremes in each studied lake relative to a local and seasonally varying percentile threshold (figure 1). Most notably, lake heatwaves were identified from daily lake surface temperatures following the methods described by Woolway et al (2021a). Specifically, using the R package 'heatwaveR' (Schlegel and Smit 2018), lake heatwaves were defined as when daily lake surface temperatures were above a local and seasonally varying 90th percentile threshold, which was calculated for each calendar day using observations across all years. For consistency, the same procedure was then used to identify high chlorophyll extreme events in the studied lakes. In addition, the 90th percentile seasonal cycle was smoothed with a 31 day moving average in order to remove noise on a daily scale associated with the relatively short data record.

Figure 1. Shown are (a) the threshold values used for defining extremes relative to the climatological distribution of lake observations, with the 90th percentiles highlighted. These percentiles are defined for each day of the year; thus, the percentile values vary temporally. Also shown are example time series of (b) lake surface temperature and (c) chlorophyll concentration. A lake heatwave and/or a high chlorophyll extreme event occur (shaded regions) when the lake surface temperature and/or chlorophyll concentrations exceed their 90th percentiles. Yellow bands indicate the occurrence of compound events. The intensity of univariate extremes is calculated as the average of all anomalies during an event.
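The threshold-based definition can be illustrated with a short, self-contained sketch. It is a simplified stand-in for the 'heatwaveR' workflow rather than a reimplementation of it: it assumes 365-day years with no gaps, computes the percentile for each calendar day directly from the values on that day, wraps the 31 day smoothing around the end of the year, and measures anomalies relative to the climatological mean. All function names are hypothetical.

```python
import math
from typing import List, Tuple


def percentile(values: List[float], q: float) -> float:
    """Percentile q (0-100) of a non-empty list, with linear interpolation."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def circular_moving_average(cycle: List[float], window: int = 31) -> List[float]:
    """Centred moving average that wraps around the end of the year."""
    n, half = len(cycle), window // 2
    return [
        sum(cycle[(d + k) % n] for k in range(-half, half + 1)) / (2 * half + 1)
        for d in range(n)
    ]


def seasonal_baseline(years: List[List[float]], q: float = 90.0, window: int = 31):
    """Return (climatological mean, smoothed q-th percentile) per calendar day."""
    n_days = len(years[0])
    mean = [sum(y[d] for y in years) / len(years) for d in range(n_days)]
    raw = [percentile([y[d] for y in years], q) for d in range(n_days)]
    return mean, circular_moving_average(raw, window)


def _summarise(run: List[int], year: List[float], mean: List[float]):
    """Event intensity: average anomaly (value minus climatology) over the run."""
    anomalies = [year[d] - mean[d] for d in run]
    return run[0], run[-1], sum(anomalies) / len(anomalies)


def find_events(year, mean, threshold) -> List[Tuple[int, int, float]]:
    """List (first_day, last_day, mean anomaly) for each run above the threshold."""
    events, run = [], []
    for d, value in enumerate(year):
        if value > threshold[d]:
            run.append(d)
            continue
        if run:
            events.append(_summarise(run, year, mean))
            run = []
    if run:
        events.append(_summarise(run, year, mean))
    return events


# Synthetic data: 20 years of a sinusoidal seasonal cycle, each year slightly
# warmer than the last, with a 10-day +3 degree spike in the first year.
seasonal = [10 + 8 * math.sin(2 * math.pi * d / 365) for d in range(365)]
years = [[s + 0.01 * i for s in seasonal] for i in range(20)]
for d in range(100, 110):
    years[0][d] += 3.0

mean, threshold = seasonal_baseline(years)
events = find_events(years[0], mean, threshold)
print(events)
```

The same functions apply unchanged to a chlorophyll series, which mirrors the use of one procedure for both variables.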

Statistical analyses of univariate extremes
To understand the across-lake variations in the intensity of lake heatwaves and high chlorophyll extreme events, we examined the influence of lake-specific variables that we hypothesize may have an effect. For the intensity of lake heatwaves, we investigated the influence of lake size (surface area and average depth), which has been shown previously to influence lake heatwave intensity (Woolway et al 2021a, 2021b), and the annual mean, range and long-term trend in lake surface water temperature and chlorophyll concentrations. Here, chlorophyll concentration is used as a proxy for water clarity, which is known to influence lake surface temperature (Mazumder et al 1990, Read and Rose 2013) and, in turn, is hypothesized to alter the intensity of thermal extremes. Essentially, here we test if the observed intensity of lake heatwaves across the studied sites is influenced by the annual mean, range, and long-term trend in water clarity. The long-term trend in water temperature is also included as a potential driver of heatwave intensity, as one might expect a lake that experiences greater warming to also experience more intense heatwaves in the most recent and thus warmer years (Woolway et al 2021b). For the intensity of high chlorophyll extreme events, we investigated the influence of the same lake-specific variables as described above. We hypothesized that lake size could influence the intensity of high chlorophyll extremes as one might expect smaller lakes to, for example, receive greater nutrient input from the surrounding catchment relative to their size (e.g. during storms), and the influence of sediment internal nutrient loading on surface chlorophyll concentrations could be larger in shallow lakes, leading to more extremes. Lakes with a greater annual range in chlorophyll concentration could also experience more intense high chlorophyll extremes, as a property of a broader chlorophyll distribution.
Finally, the annual mean, range and long-term trend in lake surface temperature can be expected to influence chlorophyll extremes due to the temperature dependence of algal growth (Paerl and Huisman 2008), and are investigated here. In this study, trends were calculated separately for each lake by first subtracting the climatological seasonal cycle and then calculating Sen's slopes on the time series anomalies using the 'trend' package in R (R Core Team 2019, Pohlert 2020).
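As an illustration of the trend calculation, the sketch below removes a mean seasonal cycle and then takes the median of all pairwise slopes, which is the usual definition of Sen's slope. It is not the 'trend' package implementation; the function names and the toy monthly series are assumptions made for the example.

```python
import math
from statistics import median
from typing import List


def deseasonalise(values: List[float], period: int) -> List[float]:
    """Subtract the mean of each position in the seasonal cycle."""
    means = []
    for p in range(period):
        same_phase = values[p::period]
        means.append(sum(same_phase) / len(same_phase))
    return [v - means[i % period] for i, v in enumerate(values)]


def sens_slope(values: List[float]) -> float:
    """Median of all pairwise slopes (y_j - y_i) / (j - i) for i < j."""
    n = len(values)
    slopes = [
        (values[j] - values[i]) / (j - i)
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return median(slopes)


# Toy monthly series (hypothetical): 10 years of a seasonal cycle plus a trend.
series = [
    15 + 6 * math.sin(2 * math.pi * m / 12) + 0.002 * m for m in range(120)
]
slope_per_month = sens_slope(deseasonalise(series, 12))
print(slope_per_month * 120)  # trend expressed per decade
```

Because the estimate is a median, a single anomalous observation has little effect on the slope, which is one reason the method suits short and noisy records.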
To identify which of the lake-specific variables had the greatest influence on the across-lake variations in the intensity of these univariate extreme events, we conducted regression tree analysis (De'ath and Fabricius 2000). The most parsimonious regression tree for explaining the across-lake variations in the univariate extremes was selected by pruning the tree to the level where the complexity parameter minimized the cross-validation error. We calculated the percent variation explained by the regression tree (R²) as: R² = 1 − Relative Error (Sharma et al 2012). Regression trees were developed in R using the 'rpart' and 'rpart.
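The pruning rule and the R² calculation described above amount to a small amount of arithmetic on the tree's complexity table. The sketch below assumes a table of (complexity parameter, relative error, cross-validation error) rows, loosely modelled on the kind of table a regression tree package reports; the numbers are invented for illustration.

```python
from typing import List, Tuple

# Hypothetical complexity table: (cp, relative error, cross-validation error).
CP_TABLE: List[Tuple[float, float, float]] = [
    (0.50, 1.00, 1.05),
    (0.20, 0.50, 0.62),
    (0.05, 0.30, 0.41),
    (0.01, 0.25, 0.45),
]


def prune_and_score(cp_table: List[Tuple[float, float, float]]) -> Tuple[float, float]:
    """Pick the row with the lowest cross-validation error and return (cp, R^2)."""
    cp, relative_error, _ = min(cp_table, key=lambda row: row[2])
    return cp, 1.0 - relative_error


best_cp, r_squared = prune_and_score(CP_TABLE)
print(best_cp, r_squared)
```

Note that the largest tree (last row) has the lowest relative error but a higher cross-validation error, so it is not the one selected.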

Compound extreme event definitions
Intuitively, compound lake heatwave and high chlorophyll extreme events are defined as when both lake heatwaves and extreme chlorophyll conditions co-occur (figure 1). The frequency is thus calculated as the number of days in which lake heatwaves and high chlorophyll extreme events occurred at the same time relative to the number of days with available data. If lake heatwaves and high chlorophyll extreme events are independent, one would expect the frequency of their co-occurrence (i.e. a compound extreme event) to equal the product of their univariate frequencies. However, if lake temperature and chlorophyll concentration anomalies are correlated, these compound extreme events might occur more often than expected from the product of their independent probabilities. In this study, we investigate the dependence of these univariate extremes and, likewise, identify the lakes in which compound extreme events are more likely to occur. This is achieved by calculating the likelihood multiplication factor (LMF), which is defined as the ratio of the observed frequency of compound extreme events to their expected frequency under the assumption of independence (Zscheischler and Seneviratne 2017):

$$\mathrm{LMF} = \frac{f_{\mathrm{compound}}}{f_{\mathrm{LHW}} \times f_{\mathrm{chl}}}$$

where $f_{\mathrm{compound}}$ is the observed frequency of compound extreme events, and $f_{\mathrm{LHW}}$ and $f_{\mathrm{chl}}$ are the univariate frequencies of lake heatwaves and high chlorophyll extreme events, respectively. The LMF can vary between 0 and infinity. An LMF greater than 1 suggests a positive dependence between the univariate extremes and, likewise, that compound lake heatwave and high chlorophyll extreme events occur more often in that lake than expected under independence. In contrast, an LMF < 1 suggests a negative dependence between the univariate extremes, meaning that compound events are likely to occur less often than suggested by their univariate statistics. An LMF of 1 suggests that the univariate extremes are independent. As an example, if lake heatwaves and high chlorophyll extremes each occur 10% of the time, one would expect a compound extreme to occur during 1% of the study period if they were independent (i.e. 0.1 × 0.1 = 0.01 = 1%).
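The LMF calculation can be made concrete with a few lines of code. The sketch assumes two gap-free daily series of boolean flags of equal length (one per univariate extreme); the function name and the toy data are illustrative only.

```python
from typing import Sequence


def likelihood_multiplication_factor(
    heatwave: Sequence[bool], high_chl: Sequence[bool]
) -> float:
    """Observed compound frequency divided by the product of univariate frequencies."""
    if len(heatwave) != len(high_chl) or not heatwave:
        raise ValueError("series must be non-empty and of equal length")
    n = len(heatwave)
    f_heatwave = sum(heatwave) / n
    f_chl = sum(high_chl) / n
    f_compound = sum(1 for h, c in zip(heatwave, high_chl) if h and c) / n
    expected = f_heatwave * f_chl
    # With no extremes in one of the series the ratio is undefined.
    return f_compound / expected if expected > 0 else float("nan")


# Toy example: each extreme occurs on 10 of 100 days, and 5 of those days coincide.
heatwave_days = [d < 10 for d in range(100)]
high_chl_days = [5 <= d < 15 for d in range(100)]
lmf = likelihood_multiplication_factor(heatwave_days, high_chl_days)
print(round(lmf, 6))
```

Here the compound frequency is 5% against an expected 1% under independence, so the LMF is close to 5, indicating positive dependence.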
In this study, we also test if the calculated LMF values are significantly different from 1 by using a bootstrapping procedure where we compare the LMFs calculated from the observed and a resampled (1000 datasets) time series. Notably, we assess whether the computed LMF for each lake falls outside the 95th percentile range of the resampled ones (Ridder et al 2020). This is equivalent to a p-value of 0.05 or below. Finally, we conducted Random Forests analysis to try to explain the variations across lakes in the computed LMF. We tested the influence of all predictor variables described above for investigating the across-lake variability in the intensity of univariate extremes. The randomForest function in R (Liaw and Wiener 2002) was used for this analysis. Random forests are based on an ensemble of decision trees (Breiman 2001).
Here, we generated 1000 trees from which we calculated variable importance to generally identify how often a predictor variable was the most important predictor in a single decision tree. We used the mean decrease in accuracy, describing the prediction error calculated by the mean squared error on the out-of-bag portion of the data (Liaw and Wiener 2002).
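The significance test for the LMF can be sketched as a resampling procedure. The details of the resampling are not specified above, so this example makes two assumptions that should be read as such: independence is simulated by randomly shuffling one of the two series, and the "95th percentile range" is taken to be the central 2.5th to 97.5th percentile interval of the resampled LMFs. The function names are hypothetical.

```python
import random
from typing import Sequence, Tuple


def lmf(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Observed joint frequency divided by the product of marginal frequencies."""
    n = len(a)
    joint = sum(1 for x, y in zip(a, b) if x and y) / n
    expected = (sum(a) / n) * (sum(b) / n)
    return joint / expected if expected > 0 else float("nan")


def lmf_significance(
    heatwave: Sequence[bool],
    high_chl: Sequence[bool],
    n_resamples: int = 1000,
    seed: int = 0,
) -> Tuple[float, Tuple[float, float], bool]:
    """Return (observed LMF, resampled 2.5-97.5 percentile range, significant?)."""
    rng = random.Random(seed)
    observed = lmf(heatwave, high_chl)
    shuffled = list(high_chl)
    resampled = []
    for _ in range(n_resamples):
        # Assumption: shuffling one series breaks any dependence between them.
        rng.shuffle(shuffled)
        resampled.append(lmf(heatwave, shuffled))
    resampled.sort()
    lower = resampled[int(0.025 * (n_resamples - 1))]
    upper = resampled[int(0.975 * (n_resamples - 1))]
    return observed, (lower, upper), not (lower <= observed <= upper)


# Toy example: 1000 days, each extreme on 100 days, 50 of which coincide.
heatwave_days = [d < 100 for d in range(1000)]
high_chl_days = [50 <= d < 150 for d in range(1000)]
observed, (lower, upper), significant = lmf_significance(heatwave_days, high_chl_days)
print(observed, lower, upper, significant)
```

Simple shuffling ignores the day-to-day persistence of real lake time series; resampling in blocks would be one way to retain it, at the cost of a more involved procedure.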

Results
We begin our investigation by calculating the average intensity of lake heatwaves and high chlorophyll extreme events across the studied lakes. Our analysis suggests that the average intensity of these univariate extremes varies considerably across the global lake distribution (figures 2 and 3). Notably, the average intensity of lake heatwaves and high chlorophyll extremes varies between 0.3 and 2.5 °C, and between 0.05 and 7.8 µg l⁻¹, respectively. To offer insights about which lake-specific variables influence the intensity of these observed univariate extremes, we used a regression tree analysis which included a number of predictor variables that we hypothesised might have an effect. Using these predictors, we were able to explain 82% and 91% of the across-lake variation in the average intensity of lake heatwaves and high chlorophyll extreme events, respectively. However, the lake-specific variables shown by the regression trees to have the dominant influence on the across-lake variability differed between lake heatwaves and high chlorophyll extreme events. Also, not all the predictors originally considered in this analysis were present in the pruned regression trees, suggesting that the omitted predictors do not provide substantial predictive power. For the intensity of lake heatwaves, the most important predictor variables were the annual range in lake surface water temperature and average lake depth (figure 2(c)). Specifically, our analysis suggests that shallow lakes with high seasonal temperature variability experience the most intense lake heatwaves, on average (figures 2(b) and (c)). Moreover, the regression tree analysis suggested that the most intense heatwaves (=2 °C, on average) occur in lakes that have a seasonal temperature range of greater than 12 °C and an average depth of less than 10 m; lakes in this category account for 13% of the studied sites.
Another important predictor in the regression tree was the annual average lake surface temperature, with cooler lakes, notably those that are also relatively shallow, experiencing more intense heatwaves, on average (=1.6 °C); these lakes account for 12% of all lakes in this investigation. Lastly, our regression tree suggests that lake area is also an important predictor of the across-lake variation in heatwave intensity, with smaller lakes experiencing slightly more intense heatwaves compared to larger lakes with a similar seasonal temperature range and depth. The long-term trend in surface water temperature, which varied from −0.24 to 0.64 °C decade⁻¹ across the studied lakes (average = 0.16 °C decade⁻¹), with 80% of lakes experiencing a warming trend, was not an important predictor of lake heatwave intensity.
Similar to the observed intensity of lake heatwaves, our analysis suggests that the most important predictor variable explaining the across-lake variation in the intensity of high chlorophyll extremes is the annual range in chlorophyll concentration (figure 3). Most notably, the most intense high chlorophyll extremes occur in lakes with a high seasonal range in chlorophyll concentrations. None of the other predictor variables included in the regression tree analysis provided additional predictive power compared to that explained by the annual range in chlorophyll concentration. Also, similar to lake heatwaves, our analysis suggests that the long-term trend in chlorophyll concentrations, which varied from −2.1 to 2.5 µg l⁻¹ decade⁻¹ across lakes (48% of lakes experienced an increasing trend), was not an important predictor of the intensity of high chlorophyll extremes.
Intuitively, lakes that experience the most intense heatwaves and the most intense high chlorophyll extreme events also experience the most intense compound events. Less clear, however, is whether these compound events are more common in some lakes than in others, and whether they occur more frequently than suggested by their individual occurrence. By calculating the LMF, our data suggested that 96% of the studied lakes (88% of which were statistically significant) experience more compound extreme events than expected from the combined frequency of the univariate extremes (i.e. LMF > 1). Thus, in most of our studied sites, there is an increased likelihood of compound lake heatwave and high chlorophyll extremes due to dependence (figure 4). To explain the variation in LMF across lakes, we used a random forest analysis. Despite using a large number of predictor variables (lake surface area, average depth, the annual mean, range and long-term trend in lake surface water temperature and chlorophyll concentrations), the random forest analysis only explained 15% of the variation in LMF. This suggests that most of the across-lake variations were not explained by the predictor variables considered.

Discussion
In this study, we provide the first assessment of univariate and compound lake heatwave and high chlorophyll extremes, specifically focussing on their intensity across a global lake distribution. Below, we discuss our findings in the context of previous work as well as discuss the limitations of the present study and future directions for developing further our understanding of compound extreme events in lakes.

Univariate extreme events
To understand the drivers of compound extreme events, we first investigated the dominant drivers of the univariate extremes. Our analysis suggested that the intensity of lake heatwaves is, similarly to marine heatwaves (Frölicher et al 2018, Oliver et al 2018), influenced strongly by the annual range in surface water temperature. Heatwave intensity is higher in lakes with high surface temperature variability, such as high-latitude lakes, and lower in lakes with low variability, such as tropical lakes (Maberly et al 2020). These results agree with those reported by Woolway et al (2021a). However, by scaling the heatwave intensity by the observed annual temperature range in each lake (figure S1), we can better appreciate the ecological consequences of these thermal extremes for lakes worldwide. Most notably, a 1 °C heatwave in a lake that typically varies from 26 °C to 28 °C (e.g. low latitude lakes) might be far more impactful than a similar heatwave occurring in a lake that typically varies from 0 °C to 28 °C (e.g. a north temperate lake). This reflects evidence showing that species' thermal tolerances are narrower when environmental temperature variation is low (Kraemer et al 2016, 2021). Also, consistent with Woolway et al (2021a), our analysis suggests that deep lakes experience less intense heatwaves, which is explained by their large thermal inertia, resulting in thermal anomalies being diluted over a larger volume of water. Lastly, our analysis suggested an influence of lake surface area, with larger lakes experiencing less intense heatwaves.
This could be explained by (a) lakes with larger surface area (and thus greater fetch) typically experiencing higher near-surface wind speeds which, intuitively, results in thermal anomalies being eroded more quickly and thus preventing the development of intense lake heatwaves; and (b) higher wind speeds often lead to deeper surface mixed layers, resulting in thermal anomalies being diluted over a larger volume of water during the stratified season (Piccolroaz et al 2015).
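The range-scaling argument made earlier in this section, in which a 1 °C heatwave is compared between a lake spanning 26 °C to 28 °C and one spanning 0 °C to 28 °C, reduces to a simple ratio. The sketch below is an illustration of that arithmetic only; the function name is hypothetical and the scaling shown (intensity divided by annual range) is an assumed reading of the normalisation.

```python
def scaled_intensity(intensity_c: float, annual_min_c: float, annual_max_c: float) -> float:
    """Heatwave intensity expressed as a fraction of the annual temperature range."""
    annual_range = annual_max_c - annual_min_c
    if annual_range <= 0:
        raise ValueError("annual maximum must exceed annual minimum")
    return intensity_c / annual_range


low_latitude = scaled_intensity(1.0, 26.0, 28.0)     # narrow annual range
north_temperate = scaled_intensity(1.0, 0.0, 28.0)   # wide annual range
print(low_latitude, north_temperate)
```

In this example the same 1 °C anomaly corresponds to half of the annual range in the low latitude lake but only about 4% of it in the north temperate lake.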
In terms of high chlorophyll extremes, our analysis suggests that the most intense events occur in lakes that experience large seasonal variations in chlorophyll concentrations (i.e. similar to lake heatwaves). We expect that lakes with more variable chlorophyll concentrations will have more intense extremes, simply as a property of a broader chlorophyll distribution. Temporal variations in chlorophyll concentrations can differ among lakes according to various factors, including nutrient loading and its seasonal variability (Oliver et al 2017). Notably, higher nutrient input with considerable seasonality (e.g. flood pulses bringing in agricultural runoff) would typically lead to higher chlorophyll variability. Physical lake processes will also play an important role with, for example, amictic lakes expected to have lower chlorophyll variability compared to lakes that undergo seasonal mixing.

Compound extreme events
In this study, we also explored the frequency of compound lake heatwave and high chlorophyll extreme events and, specifically, investigated if they occurred more frequently than expected from their individual occurrence (i.e. the combined frequency of univariate extremes). Our analysis suggested that in 96% of our studied lakes, compound lake heatwave and high chlorophyll extreme events occurred more often than expected from the product of their independent probabilities (i.e. LMF > 1). An LMF greater than 1 suggests a positive dependence between lake heatwaves and high chlorophyll extreme events. This can be expected given the strong temperature dependence of algal growth (Yvon-Durocher et al 2015). This may be particularly evident for some phytoplankton species that have ecophysiological adaptations that allow them to dominate aquatic systems under extended periods of warm and irregular temperatures (Jöhnk et al 2008, Duan et al 2009, Zhang et al 2016, Rasconi et al 2017, Baker and Geider 2021), as occurs during lake heatwaves. In some lakes, we calculated an LMF < 1, which indicates a negative dependence between the drivers of lake heatwaves and high chlorophyll extremes, i.e. the driver of one extreme has the opposite effect on the occurrence of the other. This most likely occurs due to higher lake surface temperatures during lake heatwaves resulting in stronger thermal stratification (Jankowski et al 2006) and, in turn, a reduced supply of nutrient rich bottom waters to the near-surface layer, as previously reported in both marine (Bopp et al 2001, Steinacher et al 2010, Laufkötter et al 2015, Hayashida et al 2020) and lacustrine systems (Posch et al 2012, Yankova et al 2017, Schwefel et al 2019, Lau et al 2020, Krishna et al 2021).

Limitations and future directions
Although we consider our results robust, and believe that they fill an important knowledge gap in the field of compound extreme events, there are some limitations to consider when interpreting our key findings. Firstly, some of our results are influenced by the assumptions made during the quantitative analysis, including (a) the percentile threshold used in the definition of extremes, and (b) the length of data record. In this study, we decided to use a 90th percentile threshold when defining high chlorophyll and hot temperature extremes. This threshold was chosen primarily to be consistent with previous studies of univariate (Woolway et al 2021a) and oceanic compound extreme events (Frölicher et al 2018), but also to ensure a relatively large sample size for the extreme event detection given the relatively short time series. Using a different percentile threshold would be unlikely to influence our key findings but would influence the exact intensity of the univariate extremes. Furthermore, in this study, the number of years with available data for each studied lake was 20. Thus, the time series used to calculate the climatology is shorter than the recommended 30 year period defined by the World Meteorological Organisation guide to climatological practices (WMO 2011). This might influence the definition of extreme events because, with fewer years of data, the presence of, for example, an anomalously warm or cold year will have a larger effect on the climatology than with a larger sample size. However, we note that this has been shown previously to have relatively minimal influence on the definition of marine heatwaves (Schlegel et al 2019), and the validation of modelled and satellite-derived lake heatwaves by Woolway et al (2021a) also supports this finding. Despite these limitations, our results provide an important step forward in our understanding of lake responses to a more extreme world.
While we provide the first assessment of compound lake heatwave and high chlorophyll extreme events, additional observational-based and modelling studies are needed to identify their exact drivers, their evolution with climate change, and their ultimate impacts on lake ecosystems. Moreover, here we focused on the occurrence of extremes in two essential lake variables. Future studies should aim to investigate the occurrence of extremes in additional lake properties, including anomalous dissolved oxygen concentrations, pH, and/or variations in lake level, among others. This will require wider efforts to collate available data from lakes worldwide, including from international networks such as the Global Lake Ecological Observatory Network, from legacy Earth observations made over the past decades, and from current and prospective satellite missions. One effort in this direction is the ongoing European Space Agency Climate Change Initiative for Lakes project, which coordinates a range of remote sensing techniques to develop time series data for multiple lake essential climate variables, including lake surface temperature, ice cover, water level/extent, and water-leaving reflectance. These data could allow further studies to explore diverse compound extreme events in lakes (Bevacqua et al 2021), and they provide good prospects for the impact of compound extremes to be better quantified and appreciated at a global scale. Overall, we hope that our contribution will stimulate wider efforts to understand the far-reaching global implications of compound extreme events, and their impacts on aquatic ecosystems. Ultimately, we anticipate compound extreme events to have more severe impacts on lake ecosystems than those previously reported due to the occurrence of univariate extremes.

Data availability statement
All lake chlorophyll concentration data used here are also publicly available and can be accessed through an ftp server (ftp://ftp.hermes.acri.fr) following user registration (https://hermes.acri.fr/index.php?class=ftp_access).
The data that support the findings of this study are openly available at the following URL/DOI: https://catalogue.ceda.ac.uk/uuid/76a29c5b55204b66a40308fc2ba9cdb3.