The flood cooking book: ingredients and regional flavors of floods across Germany

River flooding is a major natural hazard worldwide, whose prediction is impaired by limited understanding of the interplay of processes triggering floods within large regions. In this study we use machine learning techniques such as decision trees and random forests to pinpoint spatio-temporal features of precipitation and catchment wetness states which led to floods among 177 267 rainfall-runoff events observed in 373 German river basins. In mountainous catchments with high annual precipitation rates and shallow soils, event rainfall characteristics primarily control flood occurrence, while wetness conditions and the spatial interplay between rainfall and catchment soil moisture drive flood occurrence even more than event rainfall volume in drier basins. The existence of a snow cover also enhances flood occurrence. The identified ingredients and regional flavors shed new light on the spatial dynamics of hydro-meteorological processes leading to floods and foster regional adaptation of flood management strategies and early warning systems.


Introduction
Every year floods cause severe damage to infrastructure and economies worldwide (UNISDR 2015), often leading to fatalities (Petrucci et al 2019). Despite floods having been the focus of a substantial part of hydrological research in the past decades, engineers, public authorities and the private sector still have limited capabilities to predict this type of environmental hazard. In addition, flood risk is expected to increase globally as a result of economic growth, urbanization and climate change (Arnell and Gosling 2016, Winsemius et al 2016, Dottori et al 2018), making advances in this field crucial to support policy and decision makers in adopting effective measures to reduce the impacts of flooding. The occurrence, duration, extent and severity of river floods are controlled by a variety of hydrometeorological processes. Because of this heterogeneity, their generation mechanisms at the scale of whole river basins are not yet well understood (Hall et al 2014). As a consequence, these mechanisms are usually ignored in statistical (Merz and Blöschl 2008) and comparative analyses (Salinas et al 2013), resulting in uncertain predictions of flood characteristics and their possible changes (Kundzewicz et al 2014). Progress in understanding and accounting for the physical processes leading to floods is key to improving their prediction and interpreting their transformations (Merz et al 2014, Blöschl et  Works using seasonality as a surrogate of flood-triggering processes (Villarini 2016, Berghuijs et al 2017, Blöschl et al 2017) also exist, as well as investigations which consider flood generating mechanisms to estimate statistics (Struthers and Sivapalan 2007, Merz and Blöschl 2008), extent (Kemter et al 2020) and trends (Mangini et  In recent years, process-based classifications of flood events have gained momentum (e.g. Merz and Blöschl 2003, Nied et al 2014, Sikorska et al 2015, Keller et al 2017, Stein et al 2019, Barth et al 2019, Tarasova et al 2020). 
These approaches typically group floods according to features of their triggering atmospheric events and catchment responses (Tarasova et al 2019). However, a clear identification of the combination of factors, their magnitudes and interactions leading or not leading to flood events is still missing (Sharma et al 2018).
In the present study we address this problem by deciphering which combinations and spatio-temporal features of precipitation characteristics (volume, intensity, spatial coverage, rainfall versus snowfall) and catchment wetness have led to floods in German catchments in the past decades. In analogy to cooking, we aim to discover the different regional recipes of floods by analyzing which ingredients, in which quantities and combinations, resulted in flood events. The study is based on the analysis of 177 267 rainfall-runoff events that occurred in 373 German catchments. We used machine learning tools (i.e. decision trees (DTs) and random forests (RFs)) to facilitate the analysis of such a large data set and to 'reveal fundamental organizing principles and emergent patterns' (Shen 2018) that remain hidden when flood generation processes are studied for a smaller number of individual flood events.

Data
The study uses observed daily streamflow data from 373 German catchments ranging from 30 to 23 719 km² (median = 463 km²), daily gridded rainfall data (on a 1 km grid) from the REGNIE (Regionalisierte Niederschlagshöhen) data set (Rauthe et al 2013) provided by the German Weather Service, air temperatures interpolated using external drift kriging (Zink et al 2017), and snowmelt and soil moisture (SM) time series (on a 4 km grid) simulated by the mesoscale hydrologic model (mHM) (Zink et al 2017).
Catchments with streamflow time series significantly impacted by reservoirs or river engineering works are not included in this study (Tarasova et al 2018a). Large basins with a significant fraction of their areas located outside Germany and where river routing processes might affect flood characteristics are also excluded from this analysis, as well as lowland catchments in north-eastern Germany, where runoff behavior is strongly controlled by lakes and groundwater dynamics.
A map of Germany showing the spatial distribution of the study catchments along with key climate and landscape characteristics is given in figure S1 (available online at http://stacks.iop.org/ERL/15/114024/mmedia). A detailed description of the study area is available in Tarasova et al (2018a).
Various catchment descriptors representative of regional variations in landscape (e.g. topography, geomorphology, land use, soil characteristics, hydrogeological information) and climate (e.g. mean annual precipitation) characteristics (table S1) are used to illustrate general climatological and landscape attributes corresponding to spatial patterns of flood ingredients.

Flood definition and hydro-meteorological indicators
The goal of this study is to identify specific spatio-temporal interactions between precipitation and wetness state leading to floods in river catchments. In a first step, single rainfall-runoff events are derived from continuous hydroclimatic time series using the procedure illustrated in Tarasova et al (2018b). In essence, runoff time series are separated into quick and base flow by means of a simple smoothing algorithm (Institute of Hydrology 1980) and event precipitation is assigned to each separated event by a lag time approach (Mei and Anagnostou 2015). Finally, multi-peak events and overlapping single-peak events are iteratively distinguished by comparing event statistics (Tarasova et al 2018b). The approach results in a collection of 177 267 runoff events that occurred between 1979 and 2002 in 373 German catchments.
In this study, we assume that at least one flood event per year occurs in every catchment. Hence, we identify the highest streamflow of each year to build the annual maximum flood series (AMS) of the catchment and tag each streamflow peak larger than the minimum value of the AMS as a flood event. This results in 35 565 events (i.e. ∼20% of all runoff events) labeled as flood events.
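The labeling rule can be illustrated with a minimal Python sketch for a single catchment. The function name `label_floods`, the `(year, peak)` input layout and the numbers are hypothetical and used only for illustration. The sketch also assumes that a peak equal to the smallest annual maximum counts as a flood, so that every year contains at least one flood event.

```python
def label_floods(events):
    """Label runoff events of one catchment as flood / non-flood.

    events: list of (year, peak_streamflow) tuples (hypothetical layout).
    Returns the threshold (smallest annual maximum) and one boolean per event.
    """
    annual_max = {}
    for year, peak in events:
        # Keep the highest peak observed in each year (the AMS).
        annual_max[year] = max(peak, annual_max.get(year, float("-inf")))
    threshold = min(annual_max.values())
    # Assumption: '>=' so that the smallest annual maximum is itself a flood.
    return threshold, [peak >= threshold for _, peak in events]


# Illustrative peaks (m3/s) for three years of one catchment.
events = [(1979, 12.0), (1979, 40.0), (1979, 31.0),
          (1980, 25.0), (1980, 8.0),
          (1981, 55.0), (1981, 27.0)]
threshold, is_flood = label_floods(events)
print(threshold, is_flood)
```

In this toy series the annual maxima are 40, 25 and 55, so the threshold is 25 and five of the seven events are tagged as floods. More than one event per year can therefore be labeled as a flood.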
For each rainfall-runoff event several indicators which capture the spatio-temporal dynamics of event precipitation, catchment wetness state and their interactions are calculated (table 1). Note that indicators directly derived from the streamflow time series, such as event runoff volume or event runoff coefficients are not used as potential flood ingredients.
Some indicators are spatio-temporal averages, such as event rainfall volume divided by mean annual precipitation (R.vol) and antecedent soil moisture relative to its climatology (Zink et al 2017). We also consider the portion of the catchment that is wet (SM.wet) (Tarasova et al 2020) to account for runoff generating areas.
However, Seo et al (2012) and Mei et al (2014) stated that information on the space-time interaction of hydro-meteorological processes is needed to correctly mimic magnitude and timing of flood events. Therefore, unlike many previous studies on regional flood processes, we additionally use spatially and temporally distributed event characteristics to better capture the spatio-temporal interactions of rainfall and soil moisture within the catchment (Tarasova et al 2020). For example, we derive the overlap between precipitation and wetter areas within the catchment (P.SM.overlap.mean), to distinguish between rain falling on wetter soils, which tends to produce fast runoff (Tarasova et al 2018a) and rain falling on drier soils, which tends to be stored in the catchment. This is known to be a key driver of runoff generation (McGlynn et al 2003). Similarly, we use the spatial covariance of precipitation and snow cover (P.SWE.cov) to account for rain on snow events, well reported in the literature as drivers of large scale floods (Musselman et al 2018).
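As an illustration of a spatially distributed indicator, the following Python sketch computes a time-averaged overlap between wet areas and event precipitation on a toy grid. It is a simplified reading, not the definition used in the study. It assumes that a cell is 'wet' when its antecedent relative soil moisture exceeds a hypothetical threshold, and that the overlap on a given day is the fraction of all catchment cells that are both wet and receiving precipitation. All function names and values are illustrative.

```python
def overlap_fraction(wet_mask, precip_day, precip_min=0.0):
    """Fraction of catchment cells that are wet AND receive precipitation."""
    both = sum(1 for wet, p in zip(wet_mask, precip_day) if wet and p > precip_min)
    return both / len(wet_mask)


def mean_overlap(antecedent_sm, daily_precip, wet_threshold):
    """Time-averaged overlap over all days of a precipitation event."""
    # Assumption: wetness is judged once, from the antecedent state.
    wet_mask = [sm >= wet_threshold for sm in antecedent_sm]
    daily = [overlap_fraction(wet_mask, day) for day in daily_precip]
    return sum(daily) / len(daily)


# Toy catchment with four grid cells and a two-day event.
antecedent_sm = [0.9, 0.8, 0.3, 0.2]      # relative soil moisture per cell
daily_precip = [
    [5.0, 0.0, 3.0, 0.0],                 # day 1: rain on cells 0 and 2
    [4.0, 6.0, 0.0, 0.0],                 # day 2: rain on cells 0 and 1
]
p_sm_overlap_mean = mean_overlap(antecedent_sm, daily_precip, wet_threshold=0.5)
print(p_sm_overlap_mean)
```

In the toy example rain falls on one wet cell on the first day and on two wet cells on the second, so the indicator increases as the storm moves onto the wet part of the catchment.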

Predicting flood occurrence and disentangling flood ingredients
In this study we use machine learning and data-mining tools, which have recently been applied to predict the occurrence of flood/non-flood events (Schmidt et al 2020) as an alternative to deterministic rainfall-runoff models (Vormoor et al 2016, Musselman et al 2018). Using machine learning allows for consistently analysing the drivers of runoff generation for a large number of events in different climate and landscape settings. Specifically, we use RF and DT approaches. The basic idea of tree-based data mining (Breiman 2001) is to recursively split the data space (in our case event characteristics) into sub-spaces according to the behavior of a response variable (i.e. flood/non-flood labels). The result is a DT where at each node events are divided into two branches according to whether a selected event characteristic is smaller or larger than a threshold. At the end of each branch, events are classified as 'flood events' or 'non-flood events'. RFs (Cutler et al 2007) are a variant of tree-based models consisting of a large number of DTs that operate as an ensemble. RFs apply bootstrap sampling to compose the training subsamples of the single trees. The average of all the individual DT estimates (or the majority vote in case of categorical classes) is finally taken as prediction. Because many DTs are combined into a single model, predictions tend to be more accurate and less sensitive to the specific training data. Whereas DTs are easily interpretable, because classification rules are given, the classification strategy of RFs is often seen as a black box (Schmidt et al 2020).
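The two building blocks described above, a threshold split at a tree node and a bootstrap ensemble with majority vote, can be sketched in a few lines of Python. This is a deliberately reduced illustration and not the implementation used in the study. It uses a single feature, one-node trees ('stumps') instead of full DTs, the Gini impurity as an assumed split criterion, and made-up values for R.vol. All function names are hypothetical.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Threshold on one feature minimizing the weighted Gini impurity."""
    best_thr, best_imp = None, float("inf")
    candidates = sorted(set(values))
    for lo, hi in zip(candidates, candidates[1:]):
        thr = (lo + hi) / 2.0
        left = [y for x, y in zip(values, labels) if x <= thr]
        right = [y for x, y in zip(values, labels) if x > thr]
        imp = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if imp < best_imp:
            best_thr, best_imp = thr, imp
    return best_thr, best_imp


def fit_stump(values, labels):
    """One-node tree: threshold plus the majority class on each side."""
    thr, _ = best_split(values, labels)
    if thr is None:  # all feature values identical: predict overall majority
        majority = int(2 * sum(labels) >= len(labels))
        return (float("inf"), majority, majority)
    left = [y for x, y in zip(values, labels) if x <= thr]
    right = [y for x, y in zip(values, labels) if x > thr]
    return (thr, int(2 * sum(left) >= len(left)), int(2 * sum(right) >= len(right)))


def predict_stump(stump, x):
    thr, left_class, right_class = stump
    return left_class if x <= thr else right_class


def fit_forest(values, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(values)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([values[i] for i in idx], [labels[i] for i in idx]))
    return forest


def predict_forest(forest, x):
    """Majority vote over all stumps (1 = flood, 0 = non-flood)."""
    votes = sum(predict_stump(s, x) for s in forest)
    return int(2 * votes > len(forest))


# Made-up events: event rainfall volume relative to mean annual precipitation.
r_vol = [0.01, 0.02, 0.03, 0.04, 0.05, 0.08, 0.09, 0.10, 0.12, 0.15]
is_flood = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

thr, imp = best_split(r_vol, is_flood)
forest = fit_forest(r_vol, is_flood)
print(thr, imp, predict_forest(forest, 0.005), predict_forest(forest, 0.2))
```

A full DT would apply `best_split` recursively to each branch and across all event characteristics. The stump is used here only to keep the sketch short.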
A flowchart of the study is given in figure S2. To identify regional flood ingredients we first train an RF for each catchment using the potential flood ingredients of table 1 as predictor variables. In each catchment, 60% of all events are randomly selected to build the RF (training sample), while the remaining 40% (test sample) are used to validate the prediction performance. The prediction performance is measured here in terms of the Matthews correlation coefficient (MCC) (equation S3) (Lever et al 2016). MCC ranges from −1 (when the classification is always wrong) through 0 (when it is no better than random) to 1 (when it is always correct). Although no single metric can summarize all possible strengths and weaknesses of a classifier, MCC has been shown to outperform other classification metrics on highly imbalanced data (Giuseppe 2012, Boughorbel 2017, Chicco and Jurman 2020). This feature motivates our choice of MCC as classification metric. The RFs thus obtained embody relevant flood ingredients for each catchment. To identify regional differences in the ingredients of flood events, catchments are then pooled into homogeneous groups. To do so, we assume that two catchments have similar flood ingredients if the RF locally derived in one catchment can reliably predict flood/non-flood occurrence in the other catchment. Hence, the RF of each catchment is used to predict flood/non-flood occurrence in all other catchments and MCC is used to measure the predictive performance. Hierarchical clustering techniques (Gordon 1987) which maximize pairwise MCC are used to derive clusters with similar flood ingredients. By pooling all training events within a region, we finally build RFs to predict flood/non-flood occurrence in each cluster. The number of clusters is selected to maximize the median MCC for predicting flood/non-flood occurrence in the test sample of all catchments by using regional RFs. 
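To make the behavior of MCC on imbalanced samples concrete, the following Python sketch computes it from the entries of the confusion matrix. The standard definition is used here and is assumed to correspond to equation S3. Returning 0 when the denominator vanishes is a common convention and is also an assumption. The labels are made up, with 20% flood events to mimic the imbalance of the data set.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = flood)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention (assumption): MCC = 0 if a row or column of the matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Ten events, two of them floods (20%).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

never_flood = [0] * 10                    # always predicts 'non-flood'
perfect = list(y_true)                    # always correct
inverted = [1 - t for t in y_true]        # always wrong

accuracy_never = sum(t == p for t, p in zip(y_true, never_flood)) / len(y_true)
print(accuracy_never, mcc(y_true, never_flood))
print(mcc(y_true, perfect), mcc(y_true, inverted))
```

A classifier that never predicts a flood reaches an accuracy of 0.8 on this sample but an MCC of 0, which shows why accuracy alone would be misleading when only about one event in five is a flood.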
To gain insight into the most important factors distinguishing flood from non-flood events, the distribution of the feature (or variable) importance (measured as mean decrease of the Gini index) (Breiman 2001) is used. The feature importance score rates how important each variable is for building the decision forest.
As the feature importance scores of RFs only provide the relative importance of flood ingredients (but no indication of how they combine), we also fit DTs to all training events in each cluster. Although DTs tend to have a weaker predictive performance than RFs, the resulting trees explicitly represent the classification rules, which can then be hydrologically interpreted. As the objective of this study is not the optimization of prediction performance, but a deeper comprehension of which spatio-temporal combinations of rainfall and soil moisture lead to flooding, we use RFs and DTs in parallel to take advantage of their individual strengths.

Results and discussion
Figure 1(b) shows the clustering of catchments based on the similarity of their flood ingredients, together with the predictive performance of the regional RFs (i.e. one RF per cluster). The predictive performance obtained by using regional RFs to predict flood/non-flood occurrence in each catchment is on average larger than by using local RFs. In fact, the mean MCC for all catchments increases from 0.70 to 0.77 when using regional RFs instead of local ones. Similarly, the lower and upper quartiles increase from 0.63 to 0.70 and from 0.75 to 0.82. This counterintuitive result is explained by the existence of two opposing factors affecting predictive performance when moving from local to regional RFs. As flood occurrence certainly depends on the local hydrological conditions, which can be better represented by local than regional RFs, one would expect a decrease in predictive performance when regional RFs are used. However, floods are by definition rare events and hence the sample size of local flood events always tends to be small. Although pooling catchments together may neglect local factors impacting flood characteristics, the resulting sample size is larger and more information on flood characteristics is available. Hence, assembling all events within a region makes it possible to train RFs better and thus enhances prediction performance. Our results indicate that the second factor dominates. These findings are in line with the general idea of pooling catchments to better estimate local flood characteristics (e.g. Salinas et al 2013).
Figure 2. (a) Boxplots of selected normalized catchment descriptors for the catchments in each cluster of figure 1(b); catchment descriptors (table S1) are catchment area (AREA), long-term median duration of dry spells (minimum 1 wet day between dry spells) (CL_DS.mean), long-term mean annual precipitation (CL_MAP), aridity index as ratio of mean annual potential evaporation and mean annual precipitation (CL_PET.P), ratio of long-term summer precipitation and winter precipitation (CL_Psum2win), percent of the catchment with aquitard (HGEO_aquifer_aquitard), percent of the catchment with karst aquifer and fractured aquifers (HGEO_aquifer_fract.karst), groundwater yield (HGEO_GW_yield), percent of the catchment covered by agricultural areas (LANDUSE_agri), percent of the catchment covered by forests (LANDUSE_forest), percent of the catchment covered by urban areas (LANDUSE_urban), mean elevation (MP_mean_dem), mean slope (MP_slope), mean topographic wetness index (MP_twi), river network density (MP_RND), mean soil depth (SOIL_soildepth), mean fraction of clay in subsoil (SOIL_sub_clay), mean fraction of silt in subsoil (SOIL_sub_silt), mean fraction of clay in topsoil (SOIL_top_clay), mean fraction of silt in topsoil (SOIL_top_silt). (b) Normalized feature importance of flood ingredients (table 1) of the regional RFs for each flood cluster of figure 1(b), which is indicated by the color of the bars. Flood ingredients used in regional RFs are the ratios between volumes of catchment-averaged event rainfall and mean annual precipitation (R.vol), antecedent catchment-averaged soil moisture (SM), the ratio of maximum daily rate (i.e. intensity) of catchment-averaged event precipitation and median of event rainfall volume (P.maxI), mean (in time) portion of overlap between areas with wet antecedent soil moisture state and precipitation during event (P.SM.overlap.mean), ratio of event melt volume and mean annual precipitation (M.vol), mean of spatial covariance between consecutive days within a precipitation event (P.steadiness.mean), spatial coefficient of variation of antecedent soil moisture (SM.spcv), spatial coefficient of variation of precipitation volumes (P.spcv), temporal coefficient of variation of daily precipitation rates (P.tcv), temporal coefficient of variation of spatial covariance between consecutive days within precipitation event (P.steadiness.cv), ratio of maximum daily precipitation rate and total precipitation volume (P.ts), temporal coefficient of variation of overlap between areas attributed to wet antecedent soil moisture state and precipitation during event (P.SM.overlap.cv), the ratio of catchment-averaged antecedent snow water equivalent (SWE) and mean annual precipitation, mean (in time) extent of snow covered areas during precipitation event (SWE.extent.mean), temporal coefficient of variation of the extent of snow covered areas during precipitation event (SWE.extent.cv) and the portion of catchment affected by precipitation event (P.extent).
The resulting clusters (figure 1(b)) of similar event characteristics responsible for the occurrence of floods mostly match the spatial distribution of key climate and landscape descriptors (CDs) in Germany. Figure 2(a) shows boxplots of selected normalized CDs for catchments in each cluster. These descriptors are solely used in the study to illustrate climatic and landscape attributes of the clusters identified from event characteristics. The importance of the single event characteristics for the occurrence of floods in each cluster is displayed in figure 2(b).
Cluster 1 mainly comprises mountainous catchments in the pre-alpine region and the higher parts of the mid-elevation mountain ranges of the Black and Bavarian Forests and the Rhenish Massif, which face the highest mean annual rainfall rates of about 1130 mm yr⁻¹. These catchments are widely forested and characterized by shallow soil depths, small topographic wetness indices (which indicate catchments with only a few flat areas near the streams) and small groundwater yield. Cluster 5, on the contrary, includes catchments in the transition zone between central Germany and the northern lowlands. Catchments in this cluster tend to be significantly drier, with mean annual rainfall rates of 660 mm yr⁻¹. The catchments are flatter than in cluster 1, with deeper soils and larger groundwater yields. Generally, aridity index (ratio between long-term potential evapotranspiration and rainfall), topographic wetness index, soil depth and groundwater yield increase from cluster 1 to 5, while mean annual precipitation, drainage density, catchment altitude, amount of forested areas and slope tend to decrease from cluster 1 to 5. Figure 2(b) displays the normalized importance of features used for splitting in the regional RF of each cluster. The figure reveals that rainfall volume (R.vol) is the most important feature to distinguish between floods and non-flood events in clusters 1 to 3. In the wettest cluster 1 the maximum intensity of event rainfall (P.maxI) is the second most important feature, which even precedes soil moisture. This may be explained by high annual rainfall and low evaporation rates, leading to relatively wet catchments with high chances of event-fed saturation which exhibit runoff peaks largely dependent on event rainfall characteristics (Tarasova et al 2018a).
In the driest clusters 4 and 5 wet antecedent SM states and spatial overlapping of precipitation and soil moisture (P.SM.overlap) tend to be more important than rainfall volume. Event runoff coefficients are usually low in dry catchments, but can significantly increase if soils are wet (Merz et al 2009, Tarasova et al 2018a). Hence, overlapping of rainfall and wet soils is a key ingredient for the generation of floods in these catchments. The emerging importance of P.SM.overlap for the occurrence of flood/non-flood events underpins findings of previous studies conducted at plot and hillslope scales (McGlynn and McDonnell 2003, Freyberg et al 2014), which show that event runoff is mainly produced in limited areas of the catchment. Our results also suggest that the other event characteristics have minor importance for flood occurrence.
To shed light on the composition of flood recipes in each region, we fit DTs to all training events in a region and analyze the resulting classification rules. As expected, the overall predictive performance in terms of mean MCC decreases to 0.70 when using regional DTs instead of regional RFs. This value is comparable to the performance of local RFs. Figure 3 displays the DTs for predicting flood/non-flood occurrence in each cluster together with sketches of archetypical catchments, where the main regional climate and landscape descriptors of each cluster are visualized to support transferring the findings of this study to other regions in the world. The flood ingredients unveiled by the DT of each cluster agree with those highlighted by the feature importance of the RF classification. Event rainfall volume (R.vol) is the first splitting criterion in clusters 1 to 3, showing the importance of event rainfall for flood occurrence. While in the wettest cluster 1 event rainfall volume (R.vol) and maximum precipitation intensity (P.maxI) are sufficient to label 70% of all events as 'non-floods', information on catchment wetness state always appears as splitting criterion in the two top branches in clusters 2 and 3. In these clusters floods only occur as a result of the interaction of rainfall volume and catchment wetness state.
In line with the feature importance of the RF approach, catchment-averaged soil moisture and the spatial overlap of precipitation and wet soil (P.SM.overlap) tend to play a more important role than event rainfall volume for the generation of floods in clusters 4 and 5, which comprise drier catchments. P.SM.overlap and SM are respectively used in these clusters as the top branch splitting criterion.
In clusters 1 to 4 melt runoff (M.vol) and snow water equivalent (SWE) also appear as splitting factors. The rather low threshold values for melt runoff in these clusters suggest that even if it does not substantially contribute to runoff volume, the existence of a snow cover is relevant because it might give rise to the occurrence of rain-on-snow events. This finding further stresses the importance of this type of flood as suggested by previous studies (McCabe et al 2007, Freudiger et al 2014, Vormoor et al 2016, Berghuijs et al 2016, Krug et al 2020).

Summary and conclusions
This study aims at deciphering spatio-temporal interactions between precipitation and wetness states of catchments that trigger floods. By using machine learning approaches to facilitate the analysis of 177 267 rainfall-runoff events, we consistently identify event characteristics which lead to flood runoff in a large number of catchments. Although the tree-based approaches used in this study do not necessarily disentangle causal relationships between precipitation, catchment wetness states and the occurrence of flood events, the resulting trees reveal a regional variability of flood ingredients that can be interpreted in the context of physiographic and climatic differences among catchments.
Our results show that regional RFs derived for clusters of catchments which are grouped together according to the similarity of their flood ingredients improve prediction of flood/non-flood occurrences compared to individual RFs derived for each catchment. Floods are intrinsically rare events and hence local flood samples might suffer from their limited sizes. The increase of information on flood generating processes resulting from the use of larger samples obtained by pooling basins together prevails over the benefit of having local RFs which better represent local flood characteristics. The clustering of catchments with similar flood ingredients shows clear spatial patterns, which follow general climate and landscape characteristics.
Figure 3. Archetypical catchment structure and regional decision tree to predict flood occurrence for each cluster with similar flood ingredients. Numbers in end leaves of decision trees correspond to the percentage of all events falling in each end leaf. Flood ingredients used in regional decision trees are the ratios between (i) volumes of catchment-averaged event rainfall and mean annual precipitation (R.vol), (ii) melt volume and mean annual precipitation volume (M.vol), and (iii) maximum daily rate (i.e. intensity) of catchment-averaged event precipitation and median of event rainfall volume (P.maxI), (iv) the mean of spatial covariance between consecutive days within a precipitation event (P.steadiness.mean), (v) the ratio between catchment-averaged antecedent snow water equivalent (SWE) and mean annual precipitation, (vi) antecedent catchment-averaged soil moisture (SM) and (vii) mean (in time) portion of overlap between areas with wet antecedent soil moisture state and precipitation during event (P.SM.overlap.mean). Typical catchment properties are presented based on selected catchment descriptors (table S1).
In mountainous catchments facing large rainfall rates and wet soils throughout the year, the amount of event rainfall primarily determines flood occurrence, while catchment wetness states seem to be less important for distinguishing flood from non-flood events. However, for the drier regions the interplay of rainfall with catchment soil moisture gains importance. In drier catchments small rainfall events falling on wet soils can also result in flood runoff. In the second-driest region of our study, the overlap of rainfall and wet soils tends to be more important than the event rainfall volume itself. In the driest catchments, wetness conditions tend to assume the driving role. The existence of a snow cover also seems to enhance flood occurrence. The identified regional flavors of flood ingredients may help to explain the spatially variable performance of flood forecasting models and flood monitoring systems, eventually strengthening their reliability. In fact, recognizing catchments where the interplay between precipitation and antecedent wetness dominates flood occurrence may indicate regions where enhanced accuracy in the representation of soil moisture dynamics and spatial rainfall patterns is required to establish effective flood hazard early warning systems. Future work will focus on investigating relevant ingredients for the occurrence of ordinary versus extreme floods and on modifications of flood recipes over time due to land use and climate change.

Acknowledgments
The financial support of the German Research Foundation ('Deutsche Forschungsgemeinschaft', DFG) in terms of the research group FOR 2416 'Space-Time Dynamics of Extreme Floods (SPATE)', the research project Propensity of rivers to extreme floods: climate-landscape controls and early detection (PREDICTED, Grant Number 421396820), and of the Helmholtz Centre for Environmental Research-UFZ is gratefully acknowledged. For providing the discharge data for Germany, we are grateful to: Bavarian State Office of Environment (LfU), Baden-Württemberg Office of Environment, Measurements and Environmental Protection (LUBW), Brandenburg Office of Environment, Health and Consumer Protection (LUGV), Saxony State Office of Environment, Agriculture and Geology (SMUL), Saxony-Anhalt Office of Flood Protection and Water Management (LHW), Thüringen State Office of Environment and Geology (TLUG), Hessian Agency for the Environment and Geology (HLUG), Rhineland Palatinate Office of Environment, Water Management and the Factory Inspectorate (LUWG), Saarland Ministry for Environment and Consumer Protection (MUV), Office for Nature, Environment and Consumer Protection North Rhine-Westphalia (LANUV NRW), Lower Saxony Office for Water Management, Coast Protection and Nature Protection (NLWKN), Water and Shipping Management of the Fed. Rep. (WSV), the European Water Archive (EWA) and the Global Runoff Data Centre (GRDC) prepared by the Federal Institute for Hydrology (BfG, http://www.bafg.de/GRDC). The simulations of the mHM model are available at http://www.ufz.de/index.php?en=41160. Climatic data can be obtained from the German Weather Service (DWD; https://opendata.dwd.de/climate_environment/CDC/).

Data availability statement
The data that support the findings of this study are available upon reasonable request from the authors.