Bottom-up identification of key elements of compound events

Compound weather and climate events are combinations of climate drivers and/or hazards that contribute to societal or environmental risk. Studying compound events often requires a multidisciplinary approach combining domain knowledge of the underlying processes with, for example, statistical methods and climate model outputs. Recently, to aid the development of research on compound events, four compound event types were introduced, namely (1) preconditioned, (2) multivariate, (3) temporally compounding, and (4) spatially compounding events. However, guidelines on how to study these types of events are still lacking. Here, based on a bottom-up approach, we consider four case studies, each associated with a specific event type and a research question, to illustrate how the key elements of compound events (e.g., analytical tools and relevant physical effects) can be identified. These case studies show that (1) impacts on crops from hot and dry summers can be exacerbated by preconditioning effects of dry and bright springs. (2) Assessing compound coastal flooding in Perth (Australia) requires considering the dynamics of a non-stationary multivariate process. For instance, future mean sea-level rise will lead to the


Introduction
Many high-impact weather and climate events arise from a combination of multiple drivers and/or hazards such as droughts, heatwaves, heavy precipitation, and storms (Zscheischler et al., 2018). Therefore, understanding and ultimately modelling the complex nature of such compound events is essential for accurately assessing weather-and climate-induced risks. However, although the inherently complex nature of the environmental system has always threatened societies via combined weather and climate processes, compound-event-oriented research has emerged only recently. As a result, our abil-ity to identify, understand, and model compound events is still in its infancy .
Studying compound events is a complex task, which often requires a multidisciplinary approach involving the understanding of the underlying physical processes beyond the impact, climate, and weather elements, and advanced process-based and statistical modelling (Bevacqua et al., 2017). Consequently, for users, it is often difficult to identify the key elements (i.e., physical variables, relevant spatial and temporal scales as well as suitable analysis tools and datasets) necessary to address a given compound event-related research question. In this context, a common framework for compound event analysis could be very helpful to users. However, developing such a framework is challenging because the physical elements that constitute a compound event can combine in a myriad of ways, for example across multiple spatiotemporal scales (Messori et al., 2021). In particular, these physical elements are (i) the societal or environmental impacts (e.g. crop failure), (ii) the climate-related hazards (e.g. a heatwave or flood), causing the impact itself, (iii) the drivers of the hazards (e.g. a long-lasting atmospheric blocking causing a heatwave), and (iv) the modulators of the drivers (e.g. an anomalous state of the atmospheric circulation during a season). Finally, (v) climate change can influence modulators, drivers and hazards . The multiple possible combinations of these physical elements result in distinct event types whose physical understanding, modelling, and analysis require different approaches, methods, and datasets.
Recently, to aid the framing and development of new research on compound events, four distinct classes of event types were proposed . (1) Preconditioned events refer to a situation where a climate or weather background precondition can enhance the impact triggered by one or more hazards; (2) Multivariate events can result in impacts due to concurrent drivers and/or hazards at about the same location; (3) Temporally compounding events, which can lead to impacts due to a sequence of hazards; (4) Spatially compounding events, which can lead to aggregated impacts as a result of hazards occurring in multiple connected locations. Importantly, this typology has the potential for enabling the development of user-friendly practical guidelines for compound event analysis, which would be tailored to specific compound event types.
In this study, we contribute to the development of such user-friendly guidelines. We take a bottom-up approach and illustrate how to address specific research questions for four case studies selected to represent the four different compound event types. Such a pragmatic approach allows for illustrating and discussing how to identify the key elements of compound events in a context that is plausible for users. While the presented insights will be somewhat specific for the selected four case studies, considering four event types allows exploring a relatively wide range of methods and physical characteristics that will be used to provide general guidance for compound event analysis.
The four considered compound events are summertime vegetation impacts from springtime weather (a preconditioned event), projections of compound coastal flooding (multivariate), landslides from precipitation clustering (temporally compounding), and concurrent crop failures across different European countries (spatially compounding). In each of the following four subsections, we first introduce the case study and associated research question. We then identify the key elements of the compound events including tools that can be used to address the research question at hand. At the end of each subsection, we identify compound events that could be investigated with tools similar to those considered here. In a final section, we discuss common aspects and guidelines emerging from the four case studies.
2 Preconditioned event: Springtime weather modulates summertime vegetation impacts

Introduction and research question
The negative impact of hot and dry summers on vegetation productivity can be exacerbated by a pre-existing soil moisture deficit arising from meteorological drought in spring Lian et al., 2020). As illustrated in Figure 1, a soil moisture deficit, i.e. the precondition, can result from high evaporation in spring (driven by high temperatures and shortwave radiation) and/or low precipitation. Furthermore, high radiation can favour vegetation growth in spring (when water is not limiting), increasing transpiration. Both effects increase evapotranspiration (ET), ultimately depleting soil moisture. Consequently, spring weather conditions can increase the vulnerability of vegetation to summer drought and heat. To investigate this effect, we ask: does accounting for spring weather conditions improve the prediction of summertime vegetation impacts? Figure 1. Physical elements of vegetation impacts arising from a combination of summertime hot and dry conditions and soil moisture precondition driven by springtime effects (see details in section 2.1). Arrows indicate causal links. Note that while wind and humidity also modulate evapotranspiration (ET), they are not shown here for simplicity.

Case Study
We focus on forests and croplands in the northern mid-latitudes (30-55°N), where we expect the effect of meteorological drought to be largest due to the close link between vegetation activity and spring onset. The analysis is carried out separately for croplands and forests as we expect that rainfed croplands are more sensitive to springtime preconditioning effects than forests, which have deeper access to water and are more conservative in their water use over the growing season (Teuling et al., 2010;Bastos et al., 2014;Flach et al., 2018).

Data
We employ temperature (T), precipitation (P), and shortwave incoming solar radiation (SSRD) from the ERA5 reanalysis dataset for the period 1981-2018 (Hersbach et al., 2020). For vegetation data, we employ the Leaf Area Index (LAI) that is derived from Advanced Very High-Resolution Radiometer (AVHRR) satellite data (Claverie et al., 2016). The LAI provides an indication of plant development; high LAI values indicate more growth and healthier plants, and vice versa for low LAI. All variables are first regridded to a 0.5°lat-lon grid via conservative remapping (Schulzweida et al., 2006)  precipitation (P), and radiation (SSRD) in years with extremely low summer leaf area (LAIJJA < 5 th percentile, red) and the remaining years (grey). Boxplots indicate the median (line), interquartile range (box) and 1.5 times the interquartile range (whiskers). Performance of the logistic regression model (based on ROC curves) fitted for (d) forests and (e) croplands. ROC curves are plotted for two separate models, i.e. the base model (black) and the model that includes spring radiation SSRDMAM (light blue), along with the corresponding area under the curve (AUC). Note, the larger the AUC, the better the model. The differences in AUC of the two models are statistically significant for croplands and not significant for forests (p-val< 0.01). (f) Predicted probabilities of extremely low (< 5 th percentile) summer LAI given specified values of PJJA (x-axis) and SSRDMAM (colours), while TJJA is set to its mean value (i.e., TJJA=0). and then averaged over the spring (MAM: March, April, May) and summer (JJA: June, July, August) seasons. The categorisation of grid cells in forest or crop grid cells (Figure 2a) is done using the European Space Agency Land Cover dataset ; see supplementary material for details. We create two datasets for croplands and forest, respectively, by first standardising all variables per grid cell (subtracting the mean and dividing by the standard deviation) and then pooling grid cells that have the same vegetation type.

Tools
As a first step, we identify potential drivers of extremely low vegetation activity in summer by building composites of spring/summer meteorological variables corresponding to years with extremely low (< 5 th percentile) and not extremely low (> 5 th percentile) LAI JJA . Then, we use logistic regression to predict years with extremely low LAI. Specifically, we predict whether summer vegetation is extremely low based on a reference model considering only summertime meteorological conditions as predictors, and test whether the prediction accuracy significantly increases when including springtime meteorological conditions in the model (see supplementary materials for details). We test the performance of the models using Receiver Operating Characteristic (ROC) curves (Postance et al., 2018;Mandrekar, 2010).

Results and discussion
In line with Figure 1, extremely low LAI JJA in croplands are associated with precipitation, temperature, and radiation anomalies in both spring and summer (Figure 2c). For forests, these anomalies are much smaller, particularly in spring, and show opposite patterns compared to croplands in summer (Figure 2b).
Using a logistic regression model, we investigate if the prediction of extremely low LAI JJA is improved by adding spring conditions to a reference model that considers only summertime hot and dry conditions (i.e. T JJA and P JJA ). For forests, including spring radiation (SSRD M AM ) does not improve the model's prediction (Figure 2d), while in contrast, a significant improvement (p-val< 0.001) is found in croplands ( Figure 2e). The fitted model unitless coefficients also reveal that the meteorological variables have a stronger influence on crops (SSRD M AM : 0.44, T JJA : 0.44, P JJA : -0.31) than forests (SSRD M AM : 0.17, T JJA : -0.12, P JJA : 0.18). Therefore, hereafter we focus on the effect of spring conditions on cropland only. Note that replacing SSRD M AM with P M AM yields the same performance (whose consequence is discussed later), while adding T M AM does not improve the performance.
For croplands, based on the logistic regression model, we estimate the contribution of SSRD M AM to the probability of a low LAI JJA . For simplicity, probabilities of low LAI JJA are predicted as a function of P JJA and SSRD M AM , while T JJA is set to its mean value. The probability of low LAI JJA increases with decreasing P JJA and increasing SSRD M AM (Figure 2f). Hence, cropland impacts are highest when summer with low precipitation follows a spring with high radiation. For instance, the probability of an extremely low summer cropland LAI given a dry summer (P JJA = -2) and a normal spring (SSRD M AM = 0) is 0.1, but increases to 0.3 for SSRD M AM = 4. Furthermore, for T JJA = 2 and P JJA = -2, indicative of a hot and dry summer, this probability increases to 0.5. Overall, the above indicates that spring weather conditions favouring low soil moisture can affect the vulnerability of vegetation to a hot and dry summer over croplands.
In our model, replacing SSRD M AM with P M AM yields the same performance, indicating that soil moisture deficits can be modulated by either or both high radiation and low precipitation (which often coincide) through the two mechanisms illustrated in Figure 1. Here, it is not possible to disentangle the relative contributions of the two mechanisms, though the depletion of soil moisture in spring via crop growth promoted by high radiation is supported by the fact that dry soils in summer are often preceded by increased springtime vegetation growth (Lian et al., 2020). The relative difference in model improvements between forests and crops (Figure 2d,e) due to the inclusion of spring conditions, may suggest a role of vegetation type in developing a soil moisture deficit. However, due to their deep root systems that provide greater access to moisture, forests are generally more vulnerable to consecutive than single drought events (Anderegg et al., 2020), therefore the results may simply highlight that forests are less vulnerable during single droughts than crops. Furthermore, the differences found between cropland and forest may arise from the fact that in our dataset there are more cropland than forest grid cells below 40°N, that is, regions where evapotranspiration is not limited by energy constraints (Zscheischler et al., 2015). Removing these grid cells reduces the model improve-ment achieved by adding spring radiation, suggesting that its influence is not uniform across the studied domain.
Process-based vegetation models, simulating the response of vegetation to atmospheric variables based on underlying physiological processes, could allow for disentangling the effects of a precondition in follow-up studies. These models provide an advantage over statistical models as they would allow for disentangling the direct role of vegetation in exacerbating springtime soil moisture deficits using counterfactual simulations . Importantly, process-based vegetation models are expensive to run and require carefully designed spin-up and simulation protocols as well as multi-model ensembles to properly account for uncertainties.

Link to other preconditioned events
The approach outlined here can be used as a guideline to assess other preconditioned events. Each event type will require its own specific considerations (e.g. definition of hazards, identification of potential preconditions) but the structure of the conceptual model and its components will be similar. For instance, Gudmundsson et al. (2014) used logistic regression to predict increased wildfire activity after a meteorological drought. In fact, wildfires occurrences can be enhanced by previous-year surface moisture conditions (Forkel et al., 2012). In another example, either or both soil moisture and extensive snow cover can enhance river floods triggered by storm-driven precipitation extremes (Berghuijs et al., 2016(Berghuijs et al., , 2019.

Multivariate event: Future Changes in Compound Coastal Flooding 3.1 Introduction and research question
A typical example of multivariate compound events is compound coastal flooding (CF), which occurs when high sea levels obstruct the gravity-driven flow of pluvial/fluvial water into the ocean (Wahl et al., 2015;Eilander et al., 2020). Recent large-scale studies indicate that CF drivers such as precipitation, river discharge, or sea-level extremes will change in a warmer climate (Bevacqua, Vousdoukas, Zappa, et al., 2020;Moftakhari et al., 2017;Winsemius et al., 2016), which may have a significant impact on flooding at the local scale. In this context, we ask: how will climate change affect the CF hazard at the local scale?

Case study
The above question is considered for the Swan River estuary (catchment area 124,000 km 2 ), which flows through the densely populated city of Perth, in the state of Western Australia, Australia.

Data
We use observed daily maxima  of water levels at the Barrack station near Perth City (about 20 km inland and influenced by compound effects), upstream river discharge at the Walyunga stream gauge, and sea water levels at the coast at Fremantle. We focus on the period of May-August, when both water levels and discharge are the highest. The sea-level data is decomposed into the astronomical tide and non-tidal residuals (i.e. mainly storm surge driven by meteorological fluctuations). To characterize the underlying process leading to CF, we also use daily accumulated precipitation and mean sea-level pressure fields from the ERA5 reanalysis dataset (Hersbach et al., 2020). caused by the extreme compound water level due to interacting flood drivers. Weather systems such as storms modulate the intensity of the drivers via processes such as winds, pushing the oceanic water towards the coast, and rainfall. Note that rainfall from more than one storm and also evaporation can modulate river discharge. Climate change can affect compound flooding via sea-level rise, changes in storm intensity and duration, associated rainfall, and evaporation.
Arrows indicate causal links.

Tools
We use a multivariate non-linear regression model (James et al., 2013) to quantify the present-day influence of the river discharge (Q), astronomical tide (T astr ), and nontidal residuals (S) on the compound water level at the Barrack station (H), i.e.
where a i are the regression coefficient and (c, σ) is the error term. Composite maps (i.e., average weather conditions), and lagged correlations are used to investigate CF drivers.

Results and discussion
To assess CF changes in a warmer climate, it is important to first understand the processes shaping present-day flooding ( Figure 3). The multivariate regression model (eq. 1) indicates that the water levels in Barrack are compounded by sea levels and river discharges, however mainly influenced by the sea levels (grey isolines in Figure 4b). More upstream in the estuary the sea's influence diminishes (Bilskie & Hagen, 2018;Gori et al., 2020), hence we expect more pronounced compound effects, i.e. discharge and sea levels to have a similar impact on the water levels (orange isolines in Figure 4b) (Wu et al., 2021;Helaire et al., 2020;Eliot, 2012).
Since water levels at Barrack are largely driven by sea levels, extremes in both compound water levels and non-tidal residuals are driven by similar atmospheric configurations (Figure 5a,b). Non-tidal residual extremes are favoured by an elongated region of low atmospheric pressure, i.e. stormy weather, via winds pushing towards the coast and low barometric pressure effect (Bevacqua et al., 2017). Such atmospheric configuration is associated with wet conditions on the river catchment, however, in line with the time needed by the accumulated rainfall to reach the discharge gauge through the large catchment ( Figure 5c) (Hendry et al., 2019;Bevacqua, Vousdoukas, Shepherd, & Vrac, 2020), high river discharges at the outlet tend to occur 3-4 days after high non-tidal residuals ( Figure 5d). (e,f) plausible future increase (red) and decrease (blue) in positive non-tidal residuals S by 20%; (g,h) a potential future increase in the dependence between Q and S (g,h). In (g), black isolines represent the parametric copula density derived from data (a Clayton copula); red isolines show a potential future copula with higher dependencies (copula parameters were modified, increasing Kendall's tau from 0.04 to 0.3 and the upper tail dependence coefficient from 0.0001 to 0.15). Grey isolines in the right-hand panels show the average compound water level in Barrack predicted by the multilinear regression model. In (c), i.e. between daily river discharge and non-tidal residual on day t-N indicating that a higher river discharge tends to follow a higher non-tidal residual with a delay of about 3-4 days.
By the end of the century, changes in river discharge, sea levels, and their interplay can largely affect flooding dynamics. However, flooding projections remain unclear due to uncertainties in anthropogenic forcing scenarios, internal climate variability, and climate model differences. While we detected rare concurrent river discharge and sealevel extremes in the present, sea-level rise could increase CF risk (Moftakhari et al., 2017). For example, under a plausible sea-level rise of 0.5 m ( Figure 4a) (Hope et al., 2015;Vousdoukas et al., 2018), present-day river discharge extremes and mild sea levels would become concurrent extremes (dashed area in Figure 4b) (Bevacqua et al., 2019). However, climate projections point towards a reduction in discharge extremes by 20-50% ( Figure  4c) (Evans & Schreider, 2002;Hirabayashi et al., 2013;Swan River Trust, 2007), which could decrease CF risk (Figure 4d). Changes in storminess are uncertain and could lead to an increase or decrease (Figure 4e) in non-tidal residuals (Vousdoukas et al., 2018;Hope et al., 2015) and consequently in CF risk (Figure 4f). Changes in the dependence between non-tidal residual and river discharge (Figure 4g) may affect flooding as well (Figure 4h), however these changes remain uncertain and understudied (Bevacqua, Vousdoukas, Zappa, et al., 2020). We move to discuss different tools/approaches for assessing the effect of the changes above on local CF.
Statistical models for estimating compound water levels, such as the multivariate regression model employed here (Bermúdez et al., 2019;Santos et al., 2021), are often derived from (short) observational records. This can affect model accuracy under climate change, when information needs to be extrapolated far beyond the observed range. Highresolution hydrodynamic models, which include the non-linear interaction between hydraulic processes, topography, and human interventions (Wu et al., 2021;Kumbier et al., 2018), can provide a valid alternative for a thorough CF analysis. Hydrodynamic simulations can provide a detailed mapping of water levels in the estuary and highlight peculiar features of the hydrological system (Khojasteh et al., 2021), e.g. directly accounting for a potential shift in the area affected by compound effects from future sea-level rise (Bilskie & Hagen, 2018;Helaire et al., 2020). Probabilistic approaches can be applied to assess future flood return levels using simulated multi-year water-level time-series via high-resolution hydrodynamic modelling forced by an ensemble of climate models. However, running these simulations is computationally expensive, though approaches to circumvent this have been suggested (Wu et al., 2021;Parker et al., 2019;Sopelana et al., 2018). Nevertheless, applying a probabilistic approach for local future risk assessment remains challenging since no probability is attached to anthropogenic forcing scenarios. Notably, even focusing on a given anthropogenic forcing scenario, multimodel ensembles cannot be interpreted probabilistically (Zappa & Shepherd, 2017).
In this context, we suggest employing hydrodynamic models to study event-based storylines which explore in detail low-likelihood, high-impact future plausible events, with an emphasis on plausibility rather than probability (Shepherd, 2019;Hazeleger et al., 2015). Event-based storylines can be built based on expert knowledge, major past floods (Helaire et al., 2020;Khanam et al., 2021), or near-misses (Woo, 2021). One could investigate: what would be the societal impact of past fluvial flooding combined with the projected sea-level rise? Which consequences would a past storm have under a projected scenario with sea-level rise, high astronomical tide and reduced streamflow? What if a sequence of storms, the first causing high river discharge at the outlet and the last a concurring storm surge, had to hit Perth? Initial-condition large ensemble simulations and seasonal-hindcasts can help identify plausible threatening combinations of drivers that could lead to unseen extreme impacts (Thompson et al., 2017;Van der Wiel et al., 2021). Storylines fit well within common practices in disaster risk management, which consider event-based scenarios for emergency preparedness (Sillmann et al., 2021), and also allow for interaction with local stakeholders to evaluate the effectiveness of selected measures, e.g., dam installations and use. Figure 6. Physical elements of landslides and associated impacts in Portugal arising from extreme soil moisture, which is driven by clusters of moderate to extreme rainfall events. Arrows indicate causal links. Note that evaporation also influences soil moisture, but such a relationship is not represented here given the focus of the research question on precipitation clustering.

Links to other multivariate compound events
The methods above can be used to study other multivariate compound events. Composite maps can help to identify the atmospheric drivers of, e.g., wildfires, crop failures, tree mortality, which are often associated with hot and dry conditions Hoerling et al., 2013;Hao et al., 2019). Combined with large ensemble climate model simulations (Deser et al., 2020), composites can help to identify drivers of, e.g., extremely low renewable energy production (Van der Wiel, Stoop, et al., 2019). The knowledge acquired from composites can, for example, also allow for assessing how extreme events conditional to identified impact-driving weather patterns may evolve in the future (Jézéquel et al., 2020). Regression and more sophisticated models can help studying landslides (from precipitation, evaporation, and snowmelt), avalanche risk (enhanced by wet and warm conditions), and impacts from concurrent wind and precipitation extremes . In general, event-based storylines can provide practical information targeted to users' needs, e.g. for planning adaptation to plausible extreme events. For both landslide types, accumulated precipitation that saturates the soil, therefore reducing its shear strength, is one of the most important drivers ( Figure 6) (Guzzetti et al., 2007). Shallow landslides are typically triggered by intense precipitation falling within 1-15 days before the landslide, while deep landslides generally occur as a result of rainfall accumulated over weeks to about three months (Zêzere et al., 2015). Previous studies have quantified the critical accumulated precipitation amounts that can trigger landslides (Guzzetti et al., 2007;Brunetti et al., 2010;Zêzere et al., 2015;Peruccacci et al., 2017). Given that accumulated precipitation extremes are often caused by clusters of precipitation events from multiple storms (Priestley et al., 2017;Bevacqua, Zappa, & Shepherd, 2020;Dacre & Pinto, 2020) sometimes in association with individual atmospheric rivers (Ramos et al., 2015;Hénin et al., 2021), here we investigate whether a clear relationship between precipitation clustering and landslides can be identified. Specifically, we ask: are deep-and/or shallow-landslides preceded by temporal clustering of moderate to extreme rainfall events?

Case study
We focus on the region north of Lisbon (Portugal), which has a high landslide risk (de Brum Ferreira & Zêzere, 1997;Pereira et al., 2020). The region is part of the Western Meso-Cenozoic borderland, which has the highest number of disastrous landslides in Portugal (about 10 cases per 10 3 km 2 during 1865-2010) (Zêzere et al., 2014).

Data
We analyse the period 1956-2008, for which 23 landslides events were recorded in the dataset (Zêzere et al., 2015). A landslide event is identified when a minimum of five individual landslides occurred on natural slopes on a given date (Zêzere & Trigo, 2011). Following Zêzere and Trigo (2011), a landslide event was classified as shallow if more than 50% of the landslide area has a slip surface depth ≤1.5 m and as deep otherwise. For precipitation, employ the daily high-resolution IBERIA02 dataset (Belo- Pereira et al., 2011;Ramos et al., 2014) selecting a single grid-point, the nearest to the location of the landslides.

Tools
In a four-step procedure, we quantify whether landslides are systematically preceded (within a certain temporal window before the landslides) by a cluster of moderate to extreme precipitation events.
1. We identified the occurrence of precipitation events, here daily precipitation amounts above 9.25 mm (i.e. the year-round 85 th percentile of daily amounts; such threshold allow for considering moderate to extreme events, potentially relevant for landslides). To disentangle occurrences arising from different weather systems, we filtered high-frequency clustered events, that is, groups of daily events less than 3 days apart were categorised as a unique precipitation event (dated as the earliest event in the group) (Ferro & Segers, 2003;Barton et al., 2016). 2. We quantified the typical number of precipitation events occurring in any given temporal window W t , noticing that the number of events N Wt within W t is Binomially distributed (B(W t , p); p is the probability of a precipitation event on any given day). 3. For a window W t before any given landslide of interest, we identified a precipitation cluster when the number of precipitation events N Wt was above the 95 th percentile of the Binomial distribution (at 5% significance level). To prevent making strong assumptions on the width of the window W t , we sequentially looked for clustered events in all of the eighty-seven backward windows W t starting on the day of the landslide and ending 4 to 90 days before the landslide (a correction for multiple tests was applied, see supporting information). 4. After repeating the procedure above for each landslide, we quantified the fraction (%) of landslides that were preceded by precipitation clustering (we employed bootstrap to assess whether such a fraction is significantly higher than expected under independence between landslides and precipitation clustering).
See supporting information for details on the methods.

Results and discussion
In the Lisbon region, all of the 23 registered landslides occurred during the period November-March, of which 11 were classified as deep and 12 as shallow (in blue and red in Figure 7a, respectively). This period of the year is characterised by the highest frequency of 3-day accumulated precipitation extremes (Bevacqua, Vousdoukas, Zappa, et al., 2020), indicating precipitation's relevant role for landslide occurrences. In addition, here we demonstrate that most (deep) landslides were preceded by clustering of precipitation events. Figure 7a shows whether clusters of precipitation events were detected in the windows [-n,0] days preceding any landslide. For example, for the shallow landslide in 1983, the brown rectangle 23 days before the landslide indicates that an anomalously high number of precipitation events occurred in the window [-23,0] days before the landslide. Altogether, we detected precipitation clustering (at least one marked rectangle) before 83% (19/23) of landslides. Notably, this fraction is larger than 30%, which is the 95 th percentile of the value expected under no association between landslides and precipitation. However, while all deep landslides (11/11, 100%) were associated with clustered precipitation events, the association is weaker for shallow landslides (8/12, 67%). This difference indicates that multiple consecutive precipitation events are a crucial driver of deeplandslides, while -in line with previous studies (e.g., Zêzere et al., 2015) -shallow-landslides can also be triggered by single precipitation events.
Given the different geological nature of deep and shallow landslides (Zêzere et al., 2015), we investigate whether clustered precipitation events preceding deep and shallow landslides have different temporal characteristics. That is, for deep-and shallow-landslide events separately, we quantify the fraction of events that are preceded by precipitation clustering over different time windows. In line with the above, the results indicate that precipitation clustering may favour both landslide types. Still, major differences between the two landslide types are identified (Figure 7b). In fact, about 70-82% of deep landslides are significantly preceded by precipitation clustering over long temporal windows of 30-90 days, in line with multiple precipitation events rising groundwater level of deep soil and rock layers (e.g., Iverson, 2000). In contrast, only about 10-13% of shallow landslides are significantly preceded by clustered precipitation events, which occur 4-25 days before the landslides.
Precipitation clustering are associated with the passage of multiple storms and intensified westerly flow, leading to continuous pulses of moist air masses approaching the Iberian Peninsula (Hénin et al., 2021). Our results indicate a causal relationship between precipitation clustering and landslides, especially for deep landslides, which is in line with physical understanding (Figure 6). Notably, given that the classification of deep and shallow landslides is based on geological features, the contrasting results found for the two landslide types are directly attributable to differences in the meteorological processes causing the landslides. Here, we focused on low-frequency precipitation clustering, hence we do not exclude that also high-frequency clustering, e.g. from multiple convective precipitation events occurring within a few (< 3) days, is a relevant driver of shallow-landslides. Further analyses based on a combination of physically-based modelling with analyses of meteorological conditions and counterfactual experiments could allow investigating more in-depth questions related to the cause-effect relations of landslides.

Link to other temporally compounding events
An approach similar to the one developed here could be employed to investigate the relevance of event clustering for other types of natural hazards or impacts. For example, it could allow for identifying areas where temporal clustering of precipitation extreme events causes lake and river flooding, as well as high accumulated precipitation extremes (Barton et al., 2016;Bevacqua, Zappa, & Shepherd, 2020). Similarly, it could be possible to investigate whether sequential multi-year droughts have an impact on, e.g., forests (Anderegg et al., 2020) and how losses to the insurance industry are affected by consecutive storms (Priestley et al., 2018). It would also be possible to consider clustering of multiple hazard types, e.g. to study the negative effects of consecutive precipita-  5 Spatially compounding event: Synchronous wheat failure across European countries

Introduction and research question
Synchronous crop failures across multiple countries can pose a high risk to international food security and supply chains (Anderson et al., 2019;Gaupp et al., 2019) as exemplified by the European crop failure in 2018 (Beillouin et al., 2020;Toreti et al., 2019). Therefore, for society to be resilient to large-scale crop failures, it is important to reliably assess the probability that crop failure can be caused by spatially compounding events and also understand the drivers of behind these failure events. In this context, we ask: does the interplay between crop failures in different European countries affect aggregated continental crop yields? What are the climate drivers of such continental crop failures?

Case study
We analyse the concurrent failure of wintertime wheat across Europe (wintertime wheat is planted at the beginning of the winter and typically harvested in summer around July). We mainly focus on France and Germany (Figure 8), which are the two largest European wheat producers and therefore strongly influence continental food security (FAOSTAT, 2021).

Data
We use large-ensemble simulations that consist of 1600 growing seasons and associated annual wintertime wheat yields from Vogel et al. (2021). Wheat yields were simulated with the APSIM-Wheat model (version 7.10) (Zheng et al., 2014) driven by meteorological data from the EC-Earth global climate model (Hazeleger et al., 2010;Van der Wiel, Wanders, et al., 2019) under present-day (2011 climate conditions. We consider yields aggregated at the country level (Ribeiro et al., 2021), at this country scale the simulated annual wintertime yields compare well with observed yields (Vogel et al., 2021). To investigate drivers of crop failure, we consider mean monthly precipitation, vapour pressure deficit, and maximum temperature fields (Ray et al., 2015).

Tools
To quantify the effect of the dependence between yields of individual countries, we compare continent-wide aggregated yields from the original dataset with those obtained after randomly shuffling annual yields in all countries in time. We assess the relationship between crop yields in France and Germany through correlations and rates of cooccurring extremes. We estimate the probability of the total loss in a year exceeding a given loss amount via Aggregate Exceedance Probability (AEP) curves, which are standard insurance sector plots (Mitchell-Wallace et al., 2017;Hillier et al., 2015;Hillier & Dixon, 2020). To identify meteorological anomalies associated with crop failures, we compare composites (i.e. averages) of both meteorological time series and spatial fields during years with extremely low and normal yield (e.g., Vogel et al., 2021).

Results and discussion
At the European scale, the spatial correlation of wheat yields between countries strongly increases the likelihood of years characterised by high and low extreme aggregated yields, compared to the case where national yields are independent (Figure 9a). Accordingly, the wheat production of the two leading European producers, i.e. France and Germany, show a positive dependence (ρ P earson ∼ 0.5; p-val<< 0.01). Such a relationship also appears for the largest crop failures of the two countries, which tend to co-occur about four times more often than expected under independence (binomial test, p-val< 0.01, Figure 9b). Notably, the dependencies described above can influence European food security. This is illustrated by the AEP curve, indicating that aggregated yield losses of France and Germany with a return period larger than 10-years are about 20% larger than expected under independence (Figure 9c; note that the contribution of the two countries to the aggregated yield differ by only ∼ 10%).
Given the dependence of European wheat yield on Germany and France, we aim to identify the physical processes ( Figure 8) associated with concurrent crop failure in both countries. Figure 10 shows that seasons resulting in the lowest 5% aggregated crop yields of Germany and France ('bad' seasons) are associated with high maximum temperature, low precipitation, and high vapour pressure deficit during the growing season in May-July (Figure 10a,c,e). We note that during this growing stage, when the vegetation is photosynthetically most active, crops' health is particularly susceptible to the effects of compound warm and dry conditions, inducing vegetation stress that limits the quantity and quality of the harvests (COGECA, 2003;Ribeiro et al., 2020;Vogel et al., 2021). Finally, the composites show that during May-July, bad seasons are characterized by spatially homogeneous anomalies in the weather conditions across Germany and France (Figure 10b,d,f), indicating that large-scale weather patterns, such as high-pressure systems, favour aggregated crop failures.
The above highlights that both preconditioning effects during May-July and multivariate contributions from different drivers shape the spatially compounding European wheat failures (in line with the results of section 2). Preconditioning effects have also been identified as a key driver of the record-breaking 2016 crop failure in France (Ben-Ari et al., 2018).

Link to other spatially compounding events
Spatially compounding events can challenge the ability of (re-)insurance companies and governments to respond to emergencies (Falco et al., 2014;Raymond et al., 2020). An increasing need for adapting to climate change in many sec- (red), and when assuming inter-country independence (blue); labelled at selected return periods.
All differences between simulated and independent cases are statistically significant (p-val< 0.01).
One hundred permutations and a t-test were used to compute p-values in (c). tors such as agriculture, energy, industry, buildings, and transport (Hoegh-Guldberg et al., 2018), further highlights the relevance of studying these events. Our approach can be extended to investigate crop yields at the global scale and to analyse other major crop types such as maize and soy (Mehrabi & Ramankutty, 2019). For example, composites would allow for identifying typical atmospheric configurations associated with concurrent low agricultural production across the main 'breadbasket' regions worldwide (Gaupp et al., 2019;Kornhuber et al., 2019). Similar analyses would help to deepen the understanding of extreme impacts from spatially dependent river floods (Jongman et al., 2014;Berghuijs et al., 2019), which may intensify in the future due to the larger spatial-footprints of rainfall extremes (Bevacqua et al., 2021), or continental scale renewable energy shortages driven by large scale high pressure systems (Van der Wiel, Stoop, et al., 2019). At the regional scale, relevant events include widespread storm surges (Enríquez et al., 2020) and multiple wildfires (Portier et al., 2017) such as those in California in 2017 (Balch et al., 2020), which can deplete response capacity, resulting in extreme losses.

Discussion and conclusions
Building on the recently introduced typology of compound events , in this work we provide guidelines for the study of a set of highly heterogeneous compound events. These guidelines are applicable to events that might differ from the ones analysed here (e.g., belonging to very different impact domains, such as geological system and insurance industries), but have similar causal structure. For example, the approaches that were presented to study clustering and associated impacts can be useful for analysing both landslides and insurance losses, where the former may be caused by clustered heavy precipitation events and the latter by clustered windstorms. Using our typology-based selection of case studies we highlight six recommendations that are common to the study of the four different types of compound events.
1. Causal diagrams (e.g., Figure 1) provide a simple but rigorous tool (Pearl & Mackenzie, 2018) that offers a common language to meet the challenges of representing, discussing, and communicating relevant compounding mechanisms among experts of different disciplines (Leonard et al., 2014;Lloyd & Shepherd, 2020). 2. From a methodological point of view, our results highlight the relevance of considering univariate impact indicators in compound event studies (here, the Leaf Area Index (LAI) for the vegetation impacts, water level for flooding, landslide occurrences, and aggregated crop yields). Such an impact-centric approach allows for a backward analysis to identify relevant hazards and drivers of the considered impacts, hence guaranteeing a focused analysis on impactful combinations of drivers (Zscheischler et al., 2014;Bevacqua et al., 2017;Van der Wiel et al., 2020). When impact data are not available, expert knowledge is required to define impact proxies or potentially impactful multivariate drivers. 3. Composites of climatic conditions (Figure 2, 5, and 10) associated with impacts provide a simple but useful tool to gain a first-hand understanding of the potential drivers of impacts. Nevertheless, to avoid running the risk of linking association with causation, composites should be interpreted carefully based on expert knowledge and/or combined with advanced process-based modelling to test causal hypothesises via, e.g., counterfactual experiments or causal inference Runge, Bathiany, et al., 2019). For example, anomalies in the composites of time series in Figure 10 are correlated, hence disentangling causal effects on crop yields requires additional modelling, e.g., based on advanced statistical methods (Runge, Nowack, et al., 2019). 4. Advanced physically-based models, such as crop and hydrodynamical flooding models, are important for studying the dynamics of many compound events, especially under climate change conditions. In fact, simple statistical models, such as regression, may not provide accurate information about future changes that are outside the observational range, and more sophisticated approaches from extreme value theory are required (Engelke & Ivanovs, 2021). 5. Compound event occurrence probabilities and their future changes can be highly uncertain due to combined uncertainties in multiple drivers and associated interplay (Bevacqua, Vousdoukas, Zappa, et al., 2020;Santos et al., 2021;Villalobos-Herrera et al., 2021). In this context, event-based storylines, which explore consequences of high-impact plausible events either or both under present and future climates putting emphasis on plausibility rather than probability, can provide very effective information for improving emergency preparedness to compound events (Sillmann et al., 2021). 6. The separation between compound event types is not clear-cut and soft boundaries exist between them . For example, multivariate compound coastal flooding is affected by temporally compounding features since high river discharge can be caused by precipitation from multiple storms (Ward et al., 2018;Bevacqua, Vousdoukas, Shepherd, & Vrac, 2020) (Figure 3). Hence, methods specifically useful for studying different types of compound events can sometimes be combined. Importantly, our case studies highlight that, depending on the research question of interest, a user will typically focus on a specific compound feature, for example the multivariate feature of compound flooding from ocean and riverine processes, and therefore ultimately employ tools that are useful for the compound feature of interest.
Given the complexity and diversity of compound events, each event is different and therefore our recommendations are somewhat dependent on the selected case studies. Follow-up work could expand and generalise the presented guidelines, for instance by considering different compound events and different research questions and in this way provide additional insights to advance compound event analysis. Furthermore, creating an inventory of methods that are useful for analysing different types of compound events would be particularly helpful to users in order to identify appropriate analysis tools that can address specific research questions of interest. Given the extreme impacts often caused by different types of compound events, this study is a step towards the development of fundamental guidelines for incorporating a compound event perspective across disciplines and sectors for improved process understanding and risk assessments.
Author contributions EB, JZ, and CDM conceived the project. EB organised the workshop that led to the paper together with AC, CM, CDM, AMR, EV, and JZ. EB led the writing of the manuscript. CM, JZ, and AB designed the preconditioning analysis; AB retrieved and processed the data, CM carried out the analysis, CM and JZ wrote the section with contributions from AB and EB. EB, AC, ER, SB, FD, and WW designed the compound flooding analysis; AC and EB carried out the final analyses and wrote the section, WW retrieved the data. CDM, PR, AMR, SCO, JGP designed the landslides' analysis with contributions from EB and JZ; CDM and PR carried out the final analyses; AMR, SCO, JGP retrieved the data, all the authors reviewed results; CDM and EB wrote the section. AFSR, EV, JH, KS, and JZ designed the spatially compounding event case-study. AFSR, JH, and EV conducted the analysis based on the data prepared by TZ and KvdW. AFSR, EV, JH, KS, and EB wrote the section. All authors discussed the results and reviewed the manuscript.  Text S1. Additional information on the tools employed for the preconditioned event analysis Classification of Crop and Forest Grid Cells. The categorisation of grid cells into forest or crop grid cells (Figure 2a) is done using the European Space Agency Land Cover dataset . This provides a land classification at a 300m resolution for the period 1992-2018, which we regrid to a 0.5°lat-lon grid using the LC-CCI User Tool. A grid cell is classified as a crop grid cell if rainfed croplands are the majority class, while it is classified as a forest grid cell if the majority class is either evergreen, deciduous broadleaf, needle leaved, or mixed forests. To exclude human influence due to land-cover change, we keep only grid cells where the sum of the absolute changes over the 27 year period is below 5%. We also exclude irrigated croplands. The locations of the resulting crop and forest grid cells is shown in Figure 2a.
Application of Logistic Regression. Using a logistic regression model, we estimate the probability of extremely low vegetation activity in summer, defined as occurrences of LAI in summer (LAI JJA ) below its 5 th percentile. In this case, the impact Y is a binary time series (1: LAI JJA < 5 th percentile; 0: LAI JJA > 5 th percentile), while the predictors X 1 , X 2 , ..., X n are continuous variables. The probability of a low anomaly is estimated as: where b 0 , b 1 , ... , b n are the regression coefficients. The logistic regression model is fitted using the glm function from the stats package in R (R Core Team, 2020). We fit two models for both cropland and forest grid cells separately, a base model that includes X -4 : summer precipitation and temperature as predictors and a second model which includes the two summer predictors along with spring radiation. After an iterative process, this was found to be the most parsimonious model with good performance. Similar performance is achieved using spring precipitation instead of spring radiation, though we choose to use spring radiation (see discussion in the main text). Finally, no improvement is achieved when adding other variables (i.e summer radiation or spring temperature). Before fitting, all predictors are checked for multi-collinearity using the variation inflation factor (VIF). Some variables have moderate correlation but it is small enough that it will not adversely affect the results.
July 22, 2021, 4:15pm : X -5 Text S2. Additional information on the tools employed for the temporally compounding event analysis We looked for clustering of precipitation events over multiple windows before the landslide of interest. The amplitude of the shortest window ([-4,0] days) is in line with the 3 days thresholds used to disentangle nearby daily precipitation amounts arising from individual weather systems (i.e., used for removing high-frequency clustered events). Due to the multiple windows, multiple tests are involved when assessing whether the number of precipitation events is above the 95 th percentile of the Binomial distribution (where some tests may be dependent due to the overlapping of time windows), hence we considered the Fuzzy Benjamini-Hochberg correction (Kulinskaya & Lewin, 2009).
Given that we look for precipitation clustering up to 90 days before the landslide, which occurred during the November-March period in each hydrological year, we searched for clustering backward until August when considering landslides occurred in November.
Hence, the parameter p (equal to 0.056) of the Binomial distribution was estimated based on data during August-March, specifically as N tot /L, where N tot is the total number of selected precipitation events and L is the length of the August-March time series. Note that in further analysis, a dependency of the parameter p from time could be considered, such as to account for seasonality effects. The goodness of fit of the Binomial distribution was tested (significance level of 0.05) for all temporal windows using the Chi-square goodness-of-fit test. X -6 : We quantified the association between precipitation clustering and landslide occurrences as the fraction of landslide events preceded by precipitation clustering. To assess whether this association is significant, we compared the observed fraction with that 95 th percentile of the fraction obtained in 5000 synthetic datasets that assume no association between clustered precipitation events and landslides. In these synthetic datasets, the landslides occurrence is shuffled in time. In particular, we shuffled the year of occurrence of the landslide events but left that day and month of occurrence as in the original dataset to preserve the temporal structure of landslide occurrences in the synthetic dataset (Witt et al., 2010;Bevacqua et al., 2019). The resulting increase of the confidence interval width with the temporal window in Figure 7b is in line with the tendency of precipitation events to cluster more at higher temporal windows.