Machine learning based identification of dominant controls on runoff dynamics

Hydrological models used for flood prediction in ungauged catchments are commonly fitted to regionally transferred data. The key issue of this procedure is the identification of hydrologically similar catchments, which requires that the dominant controls on the process of interest are known. In this study, we applied a new machine learning based approach to determine the catchment characteristics that indicate the active processes controlling runoff dynamics. A random forest (RF) regressor was trained to estimate the drainage velocity parameters of a geomorphologic instantaneous unit hydrograph (GIUH) in ungauged catchments, based on regionally available data. We analyzed the learning procedure of the algorithm and identified preferred donor catchments for each ungauged catchment. Based on the machine learning results for catchment grouping, a classification scheme based on drainage network characteristics was derived. This classification scheme was applied in a flood forecasting case study. The results demonstrate that the RF could be trained properly with the selected donor catchments to successfully estimate the required GIUH parameters. Moreover, our results showed that drainage network characteristics can be used to identify the influence of geomorphological dispersion on the dynamics of catchment response.


| INTRODUCTION
It is argued that floods caused by extreme precipitation are becoming more frequent due to climatic changes. In the last two decades, such flood events have been the main reason for major flood damages in Germany (Uhlemann, Thieken, & Merz, 2010). The increased relevance of rainfall-induced floods is caused by significant changes of precipitation extremes (Murawski, Zimmer, & Merz, 2016) and requires a reconsideration of flood forecasting systems. In particular, the lead time of the forecast becomes more crucial for public safety and has to be prolonged by improved hydrological models. Within a hydrological model, processes are numerically reproduced by the chosen model structure and by parameters that have been fitted to data of past events. Because streamflow records are only available as single-point records, the required data base is rarely available at the locations where flood risk assessments are needed.
A common strategy to overcome this problem is to utilize regionalized data from stream gauging sites for ungauged locations. The requirement for utilizing regionally transferred data is that the respective catchments have to be hydrologically similar, that is, the active processes within these catchments are concurrent (Blöschl & Sivapalan, 1995). Although several studies have developed frameworks for defining hydrologic similarity in different climatic regions and at different spatial scales (Wagener, Sivapalan, Troch, & Woods, 2007; Winter, 2001), to date neither a clear definition of similarity nor a clear catchment classification scheme has been established; direct observations of the governing processes are sparse and their recording is more expensive compared to streamflow gauging data. At the basin scale, a clear connection between hydrologic processes and catchment characteristics, derived from analysis of streamflow data, thus remains to be identified. This might be due to the fact that most previous studies either transferred parameters of sophisticated hydrological models or regionalized runoff signatures. As a result, findings were to a large extent restricted by the procedure used, that is, by the a priori perception of the hydrologic process. Shen et al. (2018) and Mount et al. (2016) specified an alternative approach for analysis. They suggested using data-driven methods, deep learning in particular, as a new method of scientific analysis. Key benefits were supposed to be a design free of preconceptions and the ability to adapt to specific problems. The learning procedure is essentially a procedure of hypothesis testing that can provide new insights into hydrological processes. Mount et al. (2016) complemented these findings by demonstrating that data-driven models can be a useful supplement to classical physics-based models. Even though data-driven models, especially machine learning (ML) algorithms, have been applied in many hydrological applications (see Elshorbagy, Corzo, Srinivasulu, & Solomatine, 2010a, 2010b; Solomatine & Ostfeld, 2008; Yaseen, El-shafie, Jaafar, Afan, & Sayl, 2015 for reviews), their focus was mainly on regression and classification results (Brunner et al., 2018; Heřmanovský, Havlíček, Hanel, & Pech, 2017) or on trained model structures; for example, Singh, Archfield, and Wagener (2014) analyzed the trained structures of a decision tree. These ways of using ML for knowledge extraction are a first step towards exploiting the potential of ML for process analysis.
A trained algorithm embodies a hypothesis about the processes it was trained to reproduce. This hypothesis emerged from a competition among numerous other hypotheses during the training phase. Herein lies the true power of ML-supported process analysis. Because the algorithm has no constraints in its ability to build process abstractions, it is able to test and discard more hypotheses than a single human researcher could in a comparable amount of time.
However, it has to be recognized that the result of a fitted algorithm might not be transferable in most cases, due to the inherent process uncertainty. ML algorithms are usually trained for a single purpose; therefore, results tend to overfit the problem. A solution to this issue is the use of ensemble techniques and the validation of the results on large data sets. Analyses based on trained ML algorithms benefit from the performed learning procedure: such studies did not only consider the regression and classification results of a trained algorithm, but also gained knowledge from analyzing the internal structures of the decision trees. Decision tree algorithms are compelling for this purpose because they are easy to interpret. While there are more powerful algorithms, like artificial neural networks or ensemble techniques such as the random forest (RF), their internal structures are not comprehensible to human logic (Kelleher, Mac Namee, & D'Arcy, 2015).
Here, we pursued a new approach for data-driven process research. We studied the training phase of an RF that was trained to characterize the runoff dynamics in a (pseudo)-ungauged catchment.
The training data were taken from gauged catchments in the regional neighborhood of the target catchment. The question we asked was: "Which data set minimizes the parameter estimation error?" Using a step-by-step analysis of the learning procedure, we assessed which data sets should be used for an optimal training of the RF. The selection of ideal donor catchments, based on model performance, was followed by an analysis of the catchment characteristics. The groups of donor catchments as defined by ML were analyzed for homogeneous grouping of various catchment characteristics, that is, basin shape (area, Horton ratios), land cover and soil properties. We thus recreated the ML-derived classification with catchment characteristics that could be used to identify donor catchments for an ungauged target catchment.
We used drainage velocity as a proxy for flow dynamics. In the concept of the geomorphologic instantaneous unit hydrograph (GIUH), the drainage velocity defines the variance of the catchment response, while the shape of the catchment response is defined by the geomorphology of the basin (Rodríguez-Iturbe & Valdés, 1979). Consequently, drainage velocity is directly connected to the catchment response, that is, the process of runoff concentration. Using drainage velocity also offered further benefits, which are outlined in the following.
First, this parameter can be derived analytically from runoff and precipitation data and can also be calibrated within a GIUH model. Second, the GIUH directly relates catchment properties to discharge characteristics and is therefore a valuable tool for predictions in ungauged basins (Hrachowitz et al., 2013; Rigon, Bancheri, Formetta, & de Lavenne, 2016). Yet, its application has been limited by the missing solution for event-wise parametrization. The use of machine learning presents a major step towards overcoming this problem.
Two basins in south-east Germany were used in this study. Uhlemann et al. (2010) found that the frequency of floods caused by heavy rainfall increased significantly in this region. The basins of the rivers Regen and Upper Main are both situated in a mid-range mountainous area of Bavaria.

| Case study catchments
We used rainfall-runoff events at hourly temporal resolution. Events influenced by snow were removed from the data set: floods caused or influenced by snow melt were excluded due to our focus on rainfall-induced floods.

| Catchment characteristics
For each sub-basin, several catchment characteristics were calculated in order to determine the catchment classification scheme (summarized in Table 1). From Corine land cover data (Bossard, Feranec, & Otahel, 2000), the percentages of agricultural (AGR) and forested areas (FOR) were determined to characterize the land cover of the sub-basins. The topographical characteristics slope (SLO) and mean elevation (ELE), as well as the basin area (ARE), were derived from the DEM (Jarvis, Reuter, Nelson, & Guevara, 2008). The soil was characterized by the total pore volume in the upper 2 m of the soil layer (TPV) (Federal Institute for Geosciences and Natural Resources, 2006). The Horton ratios of the drainage areas R_A and of the stream lengths R_L, the bifurcation ratio R_B, as well as the maximum flow length of the highest-order stream within the basin, L_MAX, were determined as characteristics of the drainage system.

FIGURE 1 (Left) Digital elevation model (from SRTM data) and gauges (triangles) in the case study basins Upper Main (upper left) and Regen (lower right), located in south-east Germany. (Right) Land cover classes of the case study basins derived from CORINE land cover.
The drainage network and the Horton ratios were calculated following the methods proposed by Grimaldi, Petroselli, and Nardi (2011) and Moussa (2009). The Horton ratios are the only catchment characteristics directly affecting the simulation of the time series, as they were used to parametrize the GIUH (Section 2.2). All other characteristics were solely used for catchment classification.

Note (Table 1): area (ARE), elevation (ELE) and slope (SLO) derived from the DEM; share of agricultural (AGR) and forested areas (FOR) from land cover data; total pore volume in the upper 2 m soil layer (TPV) from soil data; Horton ratios (R_A, R_L, R_B) and L_MAX derived from the DEM and the stream network.

| Geomorphologic instantaneous unit hydrograph model
A hydrological model was used to reproduce the flood events at hourly temporal resolution. The modelling study was performed in a leave-one-out procedure to test the ability of the model for runoff prediction in ungauged basins. From the variety of existing GIUH models (Rigon et al. (2016) and Singh, Mishra, and Jain (2014) provided thorough reviews), we chose the rather simplistic approach by Rosso (1984) due to its common application and its quick and robust results. The ordinates of the GIUH were calculated depending on the time step t, with v_D being the drainage velocity in [m/s] of the considered event. The constants in Equations 2 and 3 are part of the model proposed by Rosso (1984) and were determined for multiple combinations of R_B, R_A and R_L. Although the model and the constants derived by Rosso (1984) are commonly applied, the uncertainty concerning these constants has to be noted. However, the Horton ratios of the majority of the catchments used in this study were within the parameter boundaries tested by Rosso (1984). As Grimaldi et al. (2011) stated, the drainage velocity should be considered a calibration parameter, although it is connected to the concentration time. In order to focus on runoff dynamics, we omitted an explicit runoff generation module and used observed rainfall-runoff ratios to describe the runoff generation process; as a consequence, the volumes of the simulated and observed hydrographs are identical. Hence, the performance criterion β of Equation 5 is equal to 1 in all cases. The other components of Equation 5 are not affected by this decision; rather, they are influenced by v_D and the Horton ratios.
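The equations referenced in this section were lost during text extraction. As a hedged reconstruction, assuming the standard Nash-cascade form of the GIUH derived by Rosso (1984) and the Kling-Gupta efficiency (Gupta et al., 2009) as the performance criterion containing β, Equations 1-3 and 5 presumably read:

u(t) = \frac{1}{k\,\Gamma(n)} \left(\frac{t}{k}\right)^{n-1} e^{-t/k} \qquad (1)

n = 3.29 \left(\frac{R_B}{R_A}\right)^{0.78} R_L^{0.07} \qquad (2)

k = 0.70 \left(\frac{R_A}{R_B\,R_L}\right)^{0.48} \frac{L_{MAX}}{v_D} \qquad (3)

\mathrm{KGE} = 1 - \sqrt{(r-1)^2 + (\alpha-1)^2 + (\beta-1)^2}, \qquad \beta = \frac{\mu_{sim}}{\mu_{obs}}, \quad \alpha = \frac{\sigma_{sim}}{\sigma_{obs}} \qquad (5)

Here, n and k denote the shape and scale parameters of the gamma-shaped IUH, and r, α and β are the correlation, variability ratio and bias ratio between simulated and observed discharge. The assignment of the equation numbers follows the references in the text and is itself an assumption.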

| Drainage velocity and process indicators
In their original description of the GIUH, Rodríguez-Iturbe and Valdés (1979) stated that the drainage velocity v_D needs to be estimated for each event individually. This requirement has been one of the main factors restricting the operational applicability of GIUH models. In this study, we performed the event-wise parametrization of the GIUH with ML. The algorithms were trained to predict v_D from several process indicators, which are introduced in the following.
To train the algorithm, a set of known v_D values was required. For each event in our data base, v_D was calibrated with the BOBYQA (bound optimization by quadratic approximation) algorithm (Johnson, 2018; Powell, 2009). The ML algorithm used to estimate v_D from the process indicators was the random forest (RF), an ensemble of regression trees in which each tree is trained on a randomly chosen subset of the training data (Breiman, 2001).
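To illustrate the calibration step, the sketch below minimizes the simulation error of a single event over v_D. It uses SciPy's bounded scalar minimizer only as a stand-in for BOBYQA, and the function simulate_giuh, the parameter bounds and the error metric are illustrative assumptions, not taken from the study.

```python
import numpy as np
from scipy.optimize import minimize_scalar


def calibrate_drainage_velocity(q_obs, p_eff, simulate_giuh, bounds=(0.1, 5.0)):
    """Calibrate the drainage velocity v_D [m/s] for one rainfall-runoff event.

    q_obs         : observed event hydrograph (1-D array, hourly values)
    p_eff         : effective precipitation of the event (1-D array, hourly values)
    simulate_giuh : hypothetical function (p_eff, v_d) -> simulated hydrograph
    bounds        : assumed plausible range for v_D in m/s
    """
    def objective(v_d):
        q_sim = simulate_giuh(p_eff, v_d)
        # mean squared error between simulated and observed hydrograph ordinates
        return float(np.mean((q_sim - q_obs) ** 2))

    # Bounded one-dimensional minimization; the study used BOBYQA (NLopt),
    # SciPy's bounded scalar minimizer serves here only as a stand-in.
    result = minimize_scalar(objective, bounds=bounds, method="bounded")
    return result.x
```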
As an alternative to the random forest procedure, we applied an adaptive boosting strategy. Here, the base estimators, again regression trees, were trained sequentially based on the errors of the preceding base estimator. Because the results of adaptive boosting were inferior to those of the RF with randomly chosen subsets, this strategy was discarded. For a detailed description and the theoretical background of the algorithms, see Breiman (2001); details on the implementation were provided by Pedregosa et al. (2011).
We selected the RF due to its common application in hydrological studies (Addor et al., 2018; Brunner et al., 2018) and its ability to reduce process uncertainty resulting from overfitting (Breiman, 2001). The use of multiple base estimators trained on different data sets (i.e., subsets of the complete training data) creates a large ensemble of process perceptions. This decreases the tendency of single-estimator algorithms (like a neural network or a single decision tree) to overfit a problem.
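A minimal sketch of how such an RF regressor could be set up with scikit-learn (Pedregosa et al., 2011) is given below. The feature layout (event process indicators as rows) and the hyperparameter values are illustrative assumptions, not the configuration used in the study.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error


def train_and_evaluate_rf(X_donor, vd_donor, X_target, vd_target,
                          n_trees=200, seed=0):
    """Train an RF on donor-catchment events and evaluate it on the target catchment.

    X_donor   : process indicators of the donor events (2-D array, one row per event)
    vd_donor  : calibrated drainage velocities v_D of the donor events
    X_target  : process indicators of the withheld target-catchment events
    vd_target : calibrated v_D of the target events (used for evaluation only)
    """
    rf = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
    rf.fit(X_donor, vd_donor)
    vd_estimated = rf.predict(X_target)
    return rf, mean_absolute_error(vd_target, vd_estimated)
```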
The ML algorithms were evaluated with the mean absolute error (MAE) between the calibrated and the estimated drainage velocities:

\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| v_{D,i}^{\mathrm{cal}} - v_{D,i}^{\mathrm{est}} \right|

where N is the number of evaluated events.

| k-means algorithm
The k-means algorithm is a common tool for unsupervised classification of M objects into k clusters. Each object m is a vector comprising several characteristics. In this study, catchments were clustered based on selected characteristics taken from Table 1.
The k-means algorithm minimizes the intra-cluster variance while maximizing the inter-cluster variance, yielding k disjoint clusters. The target function Z of the algorithm is (Pedregosa et al., 2011)

Z = \sum_{j=1}^{k} \sum_{x \in S_j} \left\lVert x - \mu_j \right\rVert^2 \qquad (7)

that is, the sum of the squared Euclidean distances between each object x and its assigned cluster center μ_j, summed over all clusters S_j, is minimized. The classification of the objects is iterated in order to minimize Equation 7.
At the beginning of each iteration step, the cluster centers μ are calculated as the average of all assigned objects (note that in the first iteration step, all objects are randomly classified). Then the distance between each object and all available clusters is calculated. In the last step of the iteration, the objects are assigned to the nearest cluster center. This procedure is repeated until the classification is stable or a maximum number of iterations (in this case 100 steps) is reached.
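A minimal sketch of this clustering step with scikit-learn is given below, assuming the catchment characteristics are available as a numeric table; the selection of characteristics (here, for example, L_MAX and R_L) and the standardization step are illustrative assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def cluster_catchments(characteristics, n_clusters=3, max_iter=100, seed=0):
    """Group catchments with k-means on selected characteristics.

    characteristics : 2-D array, one row per catchment and one column per
                      characteristic (e.g. L_MAX and R_L, an assumed selection)
    """
    # the characteristics carry different units, so standardize them first
    X = StandardScaler().fit_transform(characteristics)
    km = KMeans(n_clusters=n_clusters, max_iter=max_iter, n_init=10,
                random_state=seed)
    labels = km.fit_predict(X)
    return labels, km.cluster_centers_
```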

| Silhouette coefficient
The silhouette coefficient was applied to search for dominant controls on the catchment grouping. For a known catchment classification, in this case derived from the analysis of the algorithm training, the catchment characteristic maximizing the silhouette coefficient s was sought (Pedregosa et al., 2011; Rousseeuw, 1987):

s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}

with a(i) being the mean distance between the characteristic x of sub-basin i and those of all other sub-basins belonging to the same cluster A. The second variable b(i) is the mean distance between the characteristic x of sub-basin i and those of the sub-basins of the next nearest cluster (calculated over all clusters C different from A):

b(i) = \min_{C \neq A} \frac{1}{\left|C\right|} \sum_{j \in C} d\!\left(x_i, x_j\right)

The silhouette coefficient is defined within the boundaries −1 and 1, with 1 being the ideal outcome, indicating a dense and exclusive structure of the classification relative to the chosen catchment characteristic x. Values around 0 indicate overlapping clusters.
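A sketch of how the characteristic with the strongest separation could be identified with scikit-learn's silhouette_score is shown below; the function name and the data structures are assumptions made for illustration.

```python
import numpy as np
from sklearn.metrics import silhouette_score


def rank_characteristics(characteristics, labels):
    """Rank catchment characteristics by how well they separate a given grouping.

    characteristics : dict mapping a characteristic name to a 1-D array with one
                      value per catchment, e.g. {"L_MAX": ..., "R_L": ...}
    labels          : cluster label of each catchment (the empirical grouping)
    """
    scores = {}
    for name, values in characteristics.items():
        # silhouette_score expects a 2-D feature matrix, here a single column
        scores[name] = silhouette_score(np.asarray(values).reshape(-1, 1), labels)
    # the highest score marks the densest, most exclusive grouping
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```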

| DEVELOPMENT OF A MACHINE LEARNING BASED CATCHMENT CLASSIFICATION SCHEME
Trained dependencies and structures of predictor interaction within machine learning algorithms are generally incomprehensible (Han & Kamber, 2010), with the exception of the CART algorithm. Hence, it is not reasonable to evaluate the algorithm parameters themselves; rather, the algorithm's functionality offers insight into the underlying processes.
The focus of this study was put on catchment grouping, that is, hydrologic similarity. Following the argumentation of Blöschl and Sivapalan (1995), we assumed that an ML algorithm trained with data from the most similar donor catchment (within the training data) will show superior model performance compared to ML algorithms trained with less similar donor catchments. In this case, model performance is a proxy for catchment similarity. A ranking of donor catchments according to the resulting model performance can thus be interpreted as a ranking of similarity.
As a first step, we analyzed the progress of the predictive capability of an RF with an increasing amount of training data in the Regen basin.
Based on this assessment, we determined an empirical ranking of donor catchments for each target catchment that minimized the model error. Catchments that served each other as donors were merged into groups. In a second step, we analyzed the connection between this empirical classification and catchment characteristics and developed the classification scheme. The findings were then tested in the basin of the Upper Main.

| Analysis of the ML learning procedure
To determine the rankings for each catchment, we performed the following analysis (Figure 2): One catchment was selected as the target and removed from the training data set. Then each remaining catchment was used as a donor to individually train an RF. Each model was evaluated with the data withheld from the training data set, that is, the data of the target catchment. The best performing model determined the most similar catchment.
Next, each of the remaining catchments was added, one at a time, to the already determined donors to train a new RF model. Again, the best performing model indicated the next catchment in the empirical catchment ranking. This procedure was repeated until a full ranking of all available catchments was determined.
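A condensed sketch of this greedy ranking procedure is given below; the helper train_and_evaluate_rf corresponds to the earlier sketch, and the data structures are assumptions made for illustration.

```python
import numpy as np


def rank_donor_catchments(target, candidates, indicators, velocities,
                          train_and_evaluate_rf):
    """Greedily rank donor catchments for one (pseudo-)ungauged target catchment.

    target                : identifier of the target catchment (data withheld)
    candidates            : identifiers of all potential donor catchments
    indicators            : dict catchment -> 2-D array of event process indicators
    velocities            : dict catchment -> 1-D array of calibrated v_D values
    train_and_evaluate_rf : helper returning (model, MAE), see the earlier sketch
    """
    ranking, pool = [], list(candidates)
    X_val, vd_val = indicators[target], velocities[target]
    while pool:
        errors = {}
        for candidate in pool:
            donors = ranking + [candidate]
            X_train = np.vstack([indicators[c] for c in donors])
            vd_train = np.concatenate([velocities[c] for c in donors])
            _, errors[candidate] = train_and_evaluate_rf(X_train, vd_train,
                                                         X_val, vd_val)
        best = min(errors, key=errors.get)  # donor set minimizing the MAE
        ranking.append(best)
        pool.remove(best)
    return ranking
```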
In order to evaluate and benchmark the results, two alternative rankings of donor catchments, based on common distance measures, were determined. One was based on the distance between the sub-basin centroids, assuming that catchment similarity could be defined by spatial proximity. In order to take catchment nesting into account, we chose a similarity measure based on the Top-Kriging method (Skøien, Merz, & Blöschl, 2006) as the second alternative ranking.
To visualize the learning procedure of the RF, the regressor was trained step by step with an increasing amount of data. The data used were either defined by the empirical order of donor catchments (EMP), the centroid-order (CENT) or the Top-Kriging order (TOPK).
After each step, that is, after an additional data set was added to the training data, the RF was evaluated with the data withheld from the target basin. Note that the data for evaluation were identical in all steps of the analysis, that is, the validation was always performed on the same data set. The development of the model error, expressed as the MAE, is shown for two example catchments in the Regen basin in Figure 3, as a function of the number of data sets used for training.
The development of the model error shown in Figure 3 revealed that, for the empirical ranking, the MAE increased once data from additional catchments were added. Since the validation data were unchanged in each step, the only possible explanation for the decrease in performance is that redundant or misleading information was added to the training data. In this context, connections between precipitation indices and v_D that are caused by processes other than those active in the target basin constitute misleading information. If such information is present in the training data, the performance of the ML algorithm decreases.
The results for CENT and TOPK were very similar and led to higher model errors. The increase of the error as a function of increasing data availability was not visible at first sight in most cases. The results for sub-basin Eschelkam (short: Eschel), displayed in the right diagram of Figure 3, showed a different behavior. In this particular case, all determined rankings led to the same result for the first three steps. From there on, the rankings of donor catchments as well as the model errors diverged. This indicated that for Eschelkam, donor catchments in close spatial proximity delivered the best training data and could thus be considered similar. However, this result was only obtained for three nested sub-basins located at a tributary stream of the Regen, which indicates that spatial proximity is a proxy for another factor defining the similarity of these three catchments.

| Catchment groups in the Regen basin
Based on the results obtained from the analysis of the learning processes, we derived a catchment classification. Due to the noted increase of the MAE as a function of increasing data quantities, we considered two donor catchments as the optimum quantity for RF training. This quantity is a compromise between sub-basins like Saegemuehle (Figure 3, left side), which showed an increasing model error with each added data set, and other sub-basins like Eschelkam (Figure 3, right side), which showed a stable, in some cases decreasing, MAE for a data base of up to four basins.
The first explorative analysis of the results was to localize the favored donor catchments. We created a map for each target catchment and its two donors. These maps showed that three affinity groups were present in the obtained empirical rankings. An RF to estimate v_D in a randomly chosen target catchment within one of these groups was likely to yield the minimum MAE if the training data were taken from the remaining catchments within this particular group.
Following the assumption that the empirical ranking of donor catchments corresponds to their similarity, these groups can be interpreted as similarity groups (Figure 4, left panel). Note that the catchments are shown as sub-basins, yet the RF regressors were fitted to represent entire catchments.
Please note that catchment group 3 differed from the two other groups. RFs for basins in cluster 3 were trained best with data taken from cluster 1. Additionally, these sub-basins were not preferred as training data for any other target sub-basin. The reason for this obvious anomaly will be discussed below. Silhouette scores for each catchment characteristic are summarized in the right panel of Figure 4. The characteristics with the highest scores, L_MAX and R_L, are shown for each sub-basin in Figure 5, including a marking of its catchment grouping. Note that the cluster boundary lines shown represent the perpendicular bisectors of the cluster center distances. The distinction of the derived catchment classification is obvious for L_MAX (Figure 5); the distinction for R_L is considerably weaker.
Recall that the reproduction of the empirically defined classification uses only catchment characteristics (Figure 5). It is also visible that the catchments of group 3 possess a significantly lower L_MAX than all other catchments, with all values <2 km.

| Validation case study
To test the validity of our findings, we applied the L_MAX-R_L classification scheme in a second case study. We transferred our findings to the basin of the Upper Main and identified four catchment groups.
The members of each group were again defined using the k-means algorithm (Pedregosa et al., 2011). The results of the catchment grouping are summarized in the right panel of Figure 6. The spatial arrangement of the catchments and groups in the Upper Main basin is additionally shown in the left panel of Figure 6. Note that we added an additional group to maximize the silhouette coefficient and to reduce the number of catchments per group. Without the additional cluster, catchment groups 2 and 3 would have been merged.
In the next step, we trained an RF for each sub-basin and used the data from the remaining sub-basins within the particular cluster.
Cluster 4 was handled differently, due to its resemblance to group 3 of the Regen catchment. Analogous to the empirical ranking, data from these catchments, with L_MAX lower than 2-4 km, were not used for training. The RFs for these basins used data from the nearest cluster for training.
In order to evaluate the model performance, we calculated the MAE of the prediction of v_D for the data withheld from the target catchment (Figure 7). For comparison, we trained three additional RFs as benchmarks. The decrease of performance in the validation case study showed that we did not fully decode the learning process of the machine learning algorithm. Nevertheless, the classification scheme produced satisfactory results in both case studies. It is important to note that we were able to reproduce a catchment grouping, originally determined by a modelling procedure, with catchment characteristics alone. Hence, we conclude that the drainage system characteristics R_L and L_MAX are closely linked to the drainage velocity, a proxy for short-term runoff dynamics.

| RESULTS OF ML SUPPORTED GIUH MODELLING
In the previous sections, drainage system characteristics were identified as dominant controls on runoff dynamics. The classification scheme we derived was then used to train an RF for each catchment included in this study, and we evaluated its capability to estimate the drainage velocity parameters. Yet, the impact of the estimation error on the runoff simulation results remained unknown. Therefore, we subsequently applied the GIUH model (Section 2.2) to all catchments. Runoff volumes were determined analytically from the observed discharge, consistent with the omission of a runoff generation module (Section 2.2). A small number of catchments showed noticeably low performance values; the average v_D in these catchments was significantly lower than in their assigned training data, which indicates that these catchments were incorrectly classified.
The catchments with the lowest performance were located at the boundaries of the catchment groups (Figure 5). Oberlauter (short: Oberl) is a member of cluster 3, the cluster that was treated differently from the others. All members of this cluster displayed noticeably low KGE values. Untersteinach (short: Unters) and Pfarrweisach are further low-performing catchments located close to the group boundaries.

| Limitations of the ML predictor
In the previous sections (Sections 3.1 and 3.3), the performance of the RF was used as a foundation for process-related conclusions. From the analysis of the learning curves, we concluded that, with increasing data from new catchments, redundant or misleading information was added to the training data set and hence the performance decreased. In our validation study, a decrease of the predictive performance was attributed to an insufficient classification scheme.
However, another explanation for these outcomes could be an insufficient capability of the model used, that is, the RF, to reproduce the full range of variation within the runoff dynamics. Hence, further analyses of the RF performance were required to eliminate this uncertainty.
Consequently, the RF was applied in a local application at each gauge. Each RF was trained with a randomly chosen subset of 50% of the available data and evaluated using the data withheld from the same gauge. With this procedure we analyzed whether the RF was capable of reproducing the variance of the observed v_D values. In order to reduce the effect of the randomly chosen subset, this procedure was repeated 10 times.
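A sketch of this repeated local split-sample test is given below, assuming that half of each gauge's events are used for training in every repetition; the function name and the fixed hyperparameters are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def local_variance_check(X, vd, n_repeats=10, train_fraction=0.5, seed=0):
    """Check whether an RF trained at a single gauge reproduces the spread of v_D.

    X  : process indicators of all events at the gauge (2-D array)
    vd : calibrated drainage velocities of those events (1-D array)
    Returns all RF estimates of the repetitions, to be compared (e.g. as
    box-whisker plots) against the observed v_D values.
    """
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_repeats):
        order = rng.permutation(len(vd))
        n_train = int(train_fraction * len(vd))
        train, test = order[:n_train], order[n_train:]
        rf = RandomForestRegressor(n_estimators=200, random_state=seed)
        rf.fit(X[train], vd[train])
        estimates.append(rf.predict(X[test]))
    return np.concatenate(estimates)
```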
If the results are compared to the range of observed values (Figure 10), it can be concluded that the RF was generally able to reproduce the variance of the occurring v_D values. A slight underestimation of the full range of v_D in the sub-basins of groups 1 and 2 (left and middle panels) is visible (Figure 10). This means that these sub-basins exhibit a slightly wider range of variance than the RF could reproduce. It has to be noted that the box-whisker plots of the observed and estimated values show different data sets: while the observed box contains all available data points, the estimated box contains the RF estimates of 10 iterations, each based on 25% of the available data. Hence, the two box-whisker plots are based on different numbers of data points.
However, this only explains a small part of the lower variance. Our findings indicate that a single algorithm is not sufficient to reproduce the full range of hydrologic process heterogeneity, which is in concordance with the findings of Elshorbagy et al. (2010b).
Our results showed that the RF was capable of reproducing a large part of the natural heterogeneity. Additionally, the results showed that, with training data from the correct catchments, the RF is able to estimate the drainage velocity with sufficient accuracy.

| Process implications
We showed that an ML algorithm can be trained to estimate the runoff dynamics of a flood based on characteristics of the upcoming precipitation event. Moreover, we were able to show that data from selected neighboring catchments can be used to train such a model. Catchment selection was based on characteristics of the drainage system, L_MAX and R_L, respectively. Robinson, Sivapalan, and Snell (1995) showed that catchment response is governed either by hillslope response or by network geomorphology, the latter being connected to the process of network dispersion. They defined a transition zone between these types of response governance based on catchment area. White, Kumar, Saco, Rhoads, and Yen (2004) examined the relative importance of geomorphologic and hydrodynamic dispersion for catchment response and its dependence on scale. Our findings were consistent with these findings in the literature.
The transition from catchment response governed by hillslope processes to dispersion-governed response was previously explained by drainage area. Our study showed that this transition can be described better with characteristics of the drainage system, L_MAX and R_L, respectively. Our findings are supported by studies on hydrologic similarity in meso-scale catchments, which identified topography or drainage characteristics as the relevant indicators of hydrologic similarity. Although we showed in a validation case study in the Upper Main basin that our classification scheme, based on L_MAX and R_L, was transferable, our results are restricted to the natural conditions of the basins used in this study. The Upper Main basin as well as the Regen basin are located in a mid-range mountainous area and share the same climatic conditions. Therefore, our findings on the dominant controls on runoff dynamics are, at this point, restricted to these specific conditions. In future research, the proposed analysis of the ML learning procedure will be applied to a wider set of basins in different natural and climatic regions. With this step, we will analyze the dependency of local similarity on these conditions and identify the respective dominant controls on runoff dynamics.
Beyond our results, which are constrained to this region and to rainfall-induced floods, we proposed a new way of process research. In this study we gained knowledge from the analysis of a random forest and its training procedure. More specifically, we followed the question: how did the algorithm learn and which data sources did it prefer? With a step-by-step analysis of the training data and the performance of the ML-based regression models, we drew conclusions about catchment groups. Subsequently, these groups were related to clusters of catchment characteristics, and we were able to build a catchment classification scheme. The derived scheme proved to be valid in the validation case study. Hence, we showed that our process assumptions, gained through the analysis of the ML model, were valid.
We also demonstrated two other benefits of ML as a supplement to physically based hydrological models. On the one hand, we obtained an operational benefit, because the RF performed an event-wise calibration of the GIUH model in ungauged catchments with sufficient performance, a problem that had been unresolved up to this point. On the other hand, its learning procedure allowed us to draw conclusions on runoff dynamics and catchment similarity. This dual benefit, operational applicability of the hydrological model and process analysis without a priori process assumptions, showed the power of ML applications in hydrologic analysis. We therefore propose the use of machine learning and related analysis schemes, as applied in this work, as a new way of interpreting data and conducting process research.
In ongoing and future research, we will apply the presented technique to a larger set of basins to test our results in different topographic and climatic regions. We will analyze the dependency of the dominant controls on runoff dynamics on catchment conditions and locations. Another focus will be placed on the input data. In this study we excluded snow-influenced floods, due to the different active processes. A classification of the remaining rainfall-induced flood events will be introduced to consider flood types, that is, different active processes (comparable to Oppel, 2019), which is expected to reduce the estimation errors of the RF. Additionally, we will include runoff generation as a target variable. In this study, we focused on runoff dynamics, a parameter of limited complexity compared to runoff generation parameters.
Furthermore, future studies will be based on the analysis of a larger set of ML algorithms. While we relied solely on the RF in this study, we will take other structures, like deep-learning artificial neural networks, into account in the future. A larger ensemble will raise the probability of finding a model suited to the represented catchment processes.

ACKNOWLEDGEMENT
The authors would like to thank the Bavarian Ministry of the Environment for providing the data used in this study.

DATA AVAILABILITY STATEMENT
Discharge and precipitation data used in this study can be retrieved from the Bavarian Ministry of the Environment.