Hydrologically Informed Machine Learning for Rainfall-Runoff Modelling: Towards Distributed Modelling

Despite great success in many commercial fields, machine learning, and data science models in general, see limited use in scientific fields including hydrology. The approach is often criticized for a lack of interpretability and physical consistency. This has led to the emergence of new paradigms, such as Theory Guided Data Science (TGDS) and physics informed machine learning. The motivation behind such approaches is to improve the physical meaningfulness of machine learning models by blending existing scientific knowledge with learning algorithms. Following the same principles, in our prior work (Chadalawada et al., 2020), a new model induction framework was founded on Genetic Programming (GP), namely the Machine Learning Rainfall-Runoff Model Induction Toolkit (ML-RR-MI). ML-RR-MI is capable of developing fully-fledged lumped conceptual rainfall-runoff models for a watershed of interest using the building blocks of two flexible rainfall-runoff modelling frameworks. In this study, we extend ML-RR-MI towards inducing semi-distributed rainfall-runoff models. This effort is motivated by the desire to address the decreasing meaningfulness of lumped models, which tends to be particularly pronounced within large catchments where the spatial heterogeneity of forcing variables and watershed properties is significant. Henceforth, our machine learning approach for rainfall-runoff modelling, titled Machine Induction Knowledge Augmented - System Hydrologique Asiatique (MIKA-SHA), captures spatial variabilities and automatically induces rainfall-runoff models for the catchment of interest without any subjectivity in model selection. Currently, MIKA-SHA learns models utilizing the model building components of two flexible modelling frameworks. However, the proposed framework can be coupled with any internally coherent collection of building blocks.
MIKA-SHA’s model induction capabilities have been tested on the Red Creek catchment near Vestry, Mississippi, United States. MIKA-SHA builds and tests many model configurations using the model building components of the two flexible modelling frameworks and quantitatively identifies the optimal model for the catchment of interest. In this study, MIKA-SHA is utilized to identify two optimal models (one from each flexible modelling framework) to represent the runoff dynamics of the Red Creek catchment. Both optimal models achieve high efficiency values and a good visual match with the observed runoff response of the catchment. Further, the resulting model architectures are compatible with previously reported research findings and fieldwork insights of the watershed and are readily interpretable by hydrologists.


Introduction
Understanding the underlying environmental dynamics occurring within watersheds is an essential and fundamental task in hydrology. Hydrological models play a key role in capturing the discharge dynamics of watersheds. Despite considerable advances over the past decades, there is still scope to advance the state of the art in hydrological knowledge to fully describe the functioning of a watershed upon a rainfall event, owing to the highly complex, interdependent, and non-linear behaviours of the governing physical phenomena. So far, no hydrological model structure can perform equally well over the entire range of problems (Fenicia et al., 2011; Beven, 2012a). This leads to different research directions seeking different hydrological models based on different modelling strategies (Beven, 2012b). Hydrological models are expected not only to have good predictive power but also to be interpretable in capturing relationships among the forcing terms and catchment response, which may lead to the advancement of scientific knowledge (Babovic, 2005, 2009; Karpatne et al., 2017). Therefore, any successful hydrological model must ultimately combine a physically meaningful model architecture with good predictive performance.
Data science methods (machine learning techniques) have shown limited success in many scientific fields including hydrology, compared to their level of success in many commercial fields (Karpatne et al., 2017). Although data-driven models often perform better in terms of predictive capability than traditional physics-based, conceptual, and empirical hydrological models (e.g. Nearing et al., 2018; Kratzert et al., 2019a), they may contribute little towards the advancement of scientific discovery due to the lack of interpretability of their model configurations (Karpatne et al., 2017). Recently, a novel modelling paradigm called Theory Guided Data Science (TGDS) (Karpatne et al., 2017) or physics informed machine learning (Physics Informed Machine Learning Conference, 2016) has emerged to enhance the explainability of machine learning models, or data science models in general. Here, the existing body of knowledge is blended with machine learning algorithms to induce physically consistent models.

Fundamental Approaches in Hydrological Modelling
A general discussion of the different hydrological modelling paradigms, some of which are used in the present study, is presented in this section. An in-depth discussion of each paradigm is beyond the scope of this paper.

Physics-based Models vs. Conceptual Models vs. Data Science Models
Physics-based models and conceptual models are founded on scientific principles and theories to describe different hydrological processes. These models follow an approach in which a hypothesis is assumed initially, and the observations are used to accept or reject it. While small-scale physics is used within physics-based models, conceptual models consist of a collection of reservoir units that approximate the moisture storage within the basin. The ideal route to understanding and prediction of any environmental dynamic would be through physics-based models, if and only if the body of knowledge were sufficient to fully describe the behaviour of those environmental processes. However, this is not the situation in hydrology and water resources science in general. Because conceptual models rely on conceptual representations instead of the small-scale physics utilized in physics-based models, their complexity is largely reduced compared to physics-based models.
As the conceptual components are derived from known physics but in a simplified manner, conceptual models can provide good process representation and reasonable physical meaningfulness in the model configurations.
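The reservoir concept described above can be illustrated with a minimal sketch (our own simplification, not a component of any framework discussed in this paper): a single linear reservoir with a hypothetical storage coefficient `k`, driven by catchment-average precipitation and evaporation.

```python
# Illustrative sketch: a single linear reservoir, the basic storage element of
# many conceptual rainfall-runoff models. Storage S evolves as
# dS/dt = P - E - Q with outflow Q = k * S (all values are hypothetical).

def linear_reservoir(precip, evap, k=0.1, s0=0.0):
    """Simulate runoff from a linear reservoir (explicit Euler, daily step).

    precip, evap : sequences of catchment-average forcing [mm/day]
    k            : storage coefficient [1/day] (hypothetical value)
    s0           : initial storage [mm]
    """
    storage, runoff = s0, []
    for p, e in zip(precip, evap):
        q = k * storage                      # storage-discharge relation Q = k*S
        storage = max(storage + p - e - q, 0.0)
        runoff.append(q)
    return runoff

# Example: a two-day rainfall pulse followed by recession
q = linear_reservoir([10, 20, 0, 0, 0], [1, 1, 1, 1, 1], k=0.2)
```

Calibrating `k` against measured input-output data is exactly the parameter estimation step described above for conceptual models.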
On the other hand, with advances in computing power and the acquisition of data through remote sensing and geographical information systems, data science models have gained traction in many fields. Especially within the last two decades, there has been an increase in data science model applications, such as machine learning models, in hydrological modelling (Babovic and Abbott, 1997; Babovic, 2005; Yaseen et al., 2015). Data science models utilize the available data to build input-output relationships, which provide actionable models with good predictive power. Physics-based, conceptual and data science models depend on data to different extents. For example, physics-based models require measured physical quantities to use as model parameters, whereas the model parameters of conceptual models are derived through a calibration process using measured input-output data. Similar to conceptual models, data science models utilize the measured input-output data for the model training process.
While theory-based models (physics-based and conceptual models) are frequently favoured by the community for their interpretability, which may lead to a better understanding of catchment dynamics, they often exhibit poorer predictive power than data science models. At the same time, simplistic applications of data-driven models, which often result in higher prediction accuracies than physics-based and conceptual models, may suffer serious difficulties with interpretation as they are unable to provide basic hydrological insights (Chadalawada et al., 2020). This dichotomy has led to the evolution of two major communities in water resources engineering: those who work with theory-based modelling and those who deal with machine learning techniques, which appear to work quite separately (Todini, 2007; Sellars, 2018).

Fixed Models vs. Flexible Models
Two types of modelling approaches can be identified within conceptual modelling: models based on a single hypothesis (fixed models) and models based on multiple hypotheses (flexible models). Fixed models are built around a general model architecture that gives satisfactory performance over a fairly broad range of watersheds and meteorological conditions.
At the same time, it is quite improbable for a single model to perform equally well in completely different climates and geological regions. In contrast to fixed modelling, flexible modelling frameworks provide more granularity in model building by allowing the hydrologist to customize the model structure to suit the intended task. These flexible modelling frameworks provide model building blocks that can be arranged in different ways to test many hypotheses about catchment dynamics instead of the one fixed hypothesis of fixed models. This quality of any modular modelling framework allows the modeller to consider the uniqueness of the area of application.
The high degree of transferability of flexible modelling frameworks is an aiding factor in progressing towards a unified hydrological theory at the watershed level. At the same time, due to the dynamic modularity and high level of granularity, constructing a suitable model for the watershed of concern may require significant effort and expert knowledge. Hence, a novice hydrologist would need to test many model structures before selecting an optimal model, which is time demanding and computationally intensive and, in consequence, hinders the opportunity to use flexible modelling frameworks to their full potential.

Lumped Models vs. Distributed Models
Hydrological models are broadly classified into lumped and distributed models based on how they treat the spatial variabilities of catchment properties and climate variables. Lumped models ignore spatial heterogeneity and treat the whole watershed as a single unit. Such models use catchment-average variable values as model inputs. However, especially as catchment size increases, the meaningfulness of the lumped values decreases, and hence the inferences made on the basis of a lumped model may be accurate but not reasonable or realistic. Namely, there is a possibility that macro-scale patterns of catchments are governed by this heterogeneity (Nearing et al., 2020a). In addition, if the modeller's requirement lies within the catchment (e.g. discharge at a particular location within the catchment), then the only option is to adopt a distributed model, in which spatial variabilities are considered in the modelling process.
One way of addressing the so-called uniqueness of place, a major issue in hydrological modelling (Beven, 2020), is to use distributed models. At the early stages of distributed modelling, the approach (fully distributed modelling) was constrained by the lack of data and computational power (Wood et al., 2011; Beven, 2012b; Fatichi et al., 2016). Hence, it was thought that this approach would gain success with the advancement of technology. Until today, however, fully distributed models have not achieved the expected outcome (Beven, 2020). This points out that the problem lies not only in the lack of local information but also in how processes are represented within the distributed model (Beven, 2020). A comprehensive review of the applications, challenges, and future trends of fully distributed modelling in hydrology is presented in Fatichi et al. (2016).
An effective alternative to both lumped and fully distributed models is the semi-distributed model, in which separate conceptual models are assigned to functionally distinguishable land segments (Boyle et al., 2001; Fenicia et al., 2016). In the semi-distributed modelling approach, each model operates individually and there are no interconnections or dependencies with the other models in the network. This, and the use of conceptual models instead of small-scale physics, makes this approach several orders of magnitude less complex than fully distributed models.
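The independence of the segment models makes the semi-distributed idea easy to sketch. The following is an illustrative example of our own (all functions, parameters and forcing values are hypothetical): each land segment is simulated by its own simple reservoir, and the catchment outflow is the area-weighted sum of the independent segment hydrographs.

```python
# Hypothetical sketch of the semi-distributed approach described above: each
# functionally distinct land segment gets its own conceptual model, and the
# outlet hydrograph is the area-weighted sum of the independent responses.
# Evaporation is omitted for brevity.

def segment_response(precip, k):
    """Linear-reservoir response of one land segment (k is its recession rate)."""
    s, out = 0.0, []
    for p in precip:
        q = k * s
        s += p - q
        out.append(q)
    return out

def semi_distributed(precip_per_segment, k_per_segment, area_fracs):
    """Sum independently simulated segment hydrographs at the outlet."""
    hydrographs = [segment_response(p, k)
                   for p, k in zip(precip_per_segment, k_per_segment)]
    return [sum(a * q[t] for a, q in zip(area_fracs, hydrographs))
            for t in range(len(hydrographs[0]))]

# Two segments: a fast-responding hillslope and a slow lowland store
q_out = semi_distributed(
    precip_per_segment=[[10, 0, 0, 0], [10, 0, 0, 0]],
    k_per_segment=[0.5, 0.1],
    area_fracs=[0.4, 0.6],
)
```

Because no segment feeds into another, each sub-model can be run (and calibrated) in isolation, which is the source of the complexity reduction noted above.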

Model Complexity
Selecting an optimal model out of the suitable candidate models (competing models) is not a trivial matter. According to Wainwright and Mulligan (2013), the optimal model is defined as the model with just enough complexity to explain the underlying physical phenomenon. Ideally, optimal model selection should be based on the bias-variance tradeoff, as more complex models result in low bias and high variance while simpler models result in low variance and high bias in their predictions (Hoge et al., 2018). However, there is no clear-cut definition of model complexity, and existing definitions differ across disciplines (Guthke, 2017). In the context of hydrology, model complexity is often defined based on the process complexity and spatial complexity of the model (Clark et al., 2016), where process complexity is a measure of the number of hydrological processes explicitly represented by the model and spatial complexity is a measure of the degree of the model's spatial discretization and its connectivity.
As per the survey conducted by Baartman et al. (2019), most researchers believe the selection of a model among competing models should be governed by the question at hand (i.e. the suitability of the model to achieve the research objectives). However, Addor and Melsen (2018) have reported that the choice of model in hydrological applications is often based on familiarity (i.e. based on legacy rather than adequacy). The inherent complexity of a model is frequently assessed in terms of the number of model parameters, the number of state variables, the number of physical processes included, or computational complexity, and the choice of such a metric to measure complexity is often subjective (Baartman et al., 2019). One possible alternative for measuring model complexity would be the analysis of the time series complexity of the resulting output signatures of the models, based on information theory and pattern matching (Sivakumar and Singh, 2012). Regardless of the metric used to measure model complexity, model parsimony should be a part of it, as unwarranted complexity may lead to overfitting and high uncertainty (Guthke, 2017).

Machine Learning in Water Resources
Machine learning, or data science in general, has become an irreplaceable tool, not only in commercial but also in many scientific fields. Data-driven techniques have gained considerable attention among hydrologists within the last two decades.
Artificial Neural Networks, Evolutionary Computation, Wavelet-Artificial Intelligence models, Support Vector Machines, and fuzzy sets are the most popular data science techniques in hydrological modelling (Yaseen et al., 2015). Each of these techniques has its strengths and weaknesses. A detailed discussion of the different data-driven techniques is beyond the scope of this paper.
Machine learning models have shown encouraging performance in a range of water resources applications because of their ability to capture the noise, complexity, non-linearity, non-stationarity and dynamism of data (Yaseen et al., 2015). Certainly, if we are only interested in better forecasting results, then machine learning models might be the preferred choice over conceptual or physics-based models due to their better predictive capability. Another major advantage of a machine learning model is that it requires much less human effort to develop and calibrate than a physics-based model (Nearing et al., 2020a).
Data-driven techniques have made it possible to develop actionable models with high prediction accuracy without depending on domain knowledge/physical hydrology. At the same time, this very nature of data-driven models has become the main point of criticism, especially in scientific fields including hydrology. They are regularly described as black-box models, where the user has little or no knowledge about how the model makes its predictions. Karpatne et al. (2017) offer two reasons for the limited success of data-driven models in scientific fields. The first is the limited availability of labelled instances for model training, which makes it harder to extrapolate model predictions beyond the available labelled data. The second is associated with the objectives of scientific discovery, where the final goal is not only to have actionable models but also to convey a mechanistic awareness of the underlying processes, which may lead to the advancement of scientific knowledge. Further, data science models such as Deep Learning (DL) models have shown better performance in hydrograph prediction than traditional approaches in ungauged catchments (Kratzert et al., 2019a). At the same time, a recent paper (Beven, 2020) questions the performance of a DL model in ungauged catchments when the geological characteristics are not well defined within the model. According to this paper, DL models have not solved the ungauged catchment problem; they have merely achieved higher efficiency values than the traditional approaches.
Nearing et al. (2020a) argue that there is a danger for the hydrologic community in not recognizing the potential that machine learning offers for the future of hydrological modelling. The authors argue that machine learning models can capture catchment similarities by providing good results even for catchments that were not used for training those models. This implies the capability of machine learning models to develop catchment-scale theories that traditional models have been unable to develop so well. Further, the authors reject the most common criticism of machine learning models (the lack of explainability) by stating that, since even the accuracy of process representation in physics-based models is questionable given their poorer prediction accuracies, criticizing only machine learning models is unfair and meaningless. Despite the huge potential of machine learning models, state-of-the-art machine learning capabilities have not been fully tested in hydrological modelling, and the authors expect that even distributed hydrological models will be developed primarily on machine learning in the near future. Beven (2020) highlights the importance of the interpretability of DL models and suggests more direct incorporation of process information into such models. Further, he points out that machine learning models also need to pay attention to issues associated with traditional modelling approaches, such as data and parameter uncertainties and equifinality. A brief discussion of two widely used machine learning techniques in hydrology is presented below.
Deep Learning (DL) models, such as Recurrent Neural Networks (RNNs), are often used for regression tasks, like modelling sequential data and time series analysis (Hu et al., 2018). Long Short-Term Memory (LSTM) is the most successful RNN architecture; it utilizes gates and memory cells to retain state information from sequential data. Hence, LSTMs are well suited to hydrological modelling applications, such as rainfall-runoff modelling. The state-of-the-art DL capabilities have not yet been fully tested in hydrological modelling, and there are only a few DL applications so far (Shen et al., 2018). Successful DL applications in hydrology include rainfall-runoff modelling (Hu et al., 2018; Kratzert et al., 2018, 2019a, 2019b; Fan et al., 2020; Nearing et al., 2020b; Xiang et al., 2020), soil moisture modelling (Xiaodong et al., 2016), precipitation forecasting (Kumar et al., 2019), groundwater estimation (Afzaal et al., 2019) and uncertainty estimation (Gude et al., 2020).

Genetic Programming (GP)
Genetic Programming is an evolutionary computation algorithm (Koza, 1992) inspired by the basic principle of Darwin's theory of evolution. GP is capable of the automatic generation of computer programs and falls under the supervised machine learning category. The most distinct feature of GP over other machine learning techniques is its ability to produce explicit mathematical expressions of input-output relationships. As a result, GP is referred to as a grey-box data-driven technique, which differentiates it from other black-box data-driven approaches, like ANNs (Mehr et al., 2018). In addition, its conceptual simplicity, suitability for parallel computing, and capability of obtaining near-global or global solutions make GP a powerful machine learning technique.
GP generates the structure of its solutions (GP individuals) by arranging mathematical functions, input variables, and random constants. These are known as the building blocks of the GP algorithm. The algorithm starts with a randomly generated set of candidate solutions for the task at hand. The performance of each candidate is then assessed using a user-defined objective function. Individuals are selected, with higher chances of selection assigned to better individuals (based on objective function value), to create offspring by applying genetic operators (crossover, mutation, and elitism). The new set of offspring becomes the candidate solutions in the next generation. This process is repeated until the algorithm meets its termination criteria (usually a maximum number of generations). The candidate solutions evolve towards the global optimum as the GP algorithm curtails the error margin between the simulated values of its individuals and measured observations (Babovic and Keijzer, 2000).
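The evolutionary loop described above can be sketched as follows. This is a deliberately minimal illustration of our own, not the MIKA-SHA implementation: the function set is reduced to {+, *}, the terminal set to the input variable `x` and random constants, mutation is simplified to replacing an individual with a fresh random subtree, and fitness is the sum of squared errors against observations of a target relationship.

```python
import random

# Minimal GP sketch of the loop described above (illustrative only).
random.seed(0)

FUNCS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}

def random_tree(depth=2):
    # Terminal: the input variable x or a random constant
    if depth == 0 or random.random() < 0.3:
        return "x" if random.random() < 0.5 else random.uniform(-2.0, 2.0)
    return (random.choice(list(FUNCS)), random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, x):
    if tree == "x":
        return x
    if isinstance(tree, tuple):
        return FUNCS[tree[0]](evaluate(tree[1], x), evaluate(tree[2], x))
    return tree  # numeric constant

def fitness(tree, data):
    # User-defined objective: sum of squared errors (lower is better)
    return sum((evaluate(tree, x) - y) ** 2 for x, y in data)

def crossover(a, b):
    # One-point crossover at the root (simplified)
    if isinstance(a, tuple) and isinstance(b, tuple):
        return (a[0], a[1], b[2])
    return a if random.random() < 0.5 else b

def evolve(data, pop_size=50, generations=30):
    pop = [random_tree() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda t: fitness(t, data))
        elite = pop[: pop_size // 5]                     # elitism
        children = []
        while len(children) < pop_size - len(elite):
            child = crossover(random.choice(elite), random.choice(elite))
            if random.random() < 0.2:                    # mutation: fresh subtree
                child = random_tree()
            children.append(child)
        pop = elite + children
    return min(pop, key=lambda t: fitness(t, data))

# Target input-output relationship y = 2x + 1 observed at a few points
data = [(x, 2 * x + 1) for x in range(-3, 4)]
best = evolve(data)
```

In MIKA-SHA the same loop operates not on arithmetic expressions but on trees assembled from hydrological building blocks, which is what makes the evolved individuals readable as rainfall-runoff model structures.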

Physics Informed Machine Learning
One promising way forward, which may bridge the gap between the physics-based and machine learning modelling communities, is to use existing hydrological knowledge to guide machine learning models (Babovic and Keijzer, 2002; Babovic, 2009). This recent paradigm is presently referred to as Theory Guided Data Science (TGDS) (Karpatne et al., 2017) or Physics Informed Machine Learning (Physics Informed Machine Learning Conference, 2016). This modelling paradigm aims to simultaneously address the limitations of data science and physics-based models (primarily, the lack of interpretability of data science models and the poorer predictive capabilities of physics-based models) and to induce more generalizable and physically consistent models. There are five ways of incorporating basic scientific knowledge into data-driven models (Karpatne et al., 2017): (i) theory-guided design of data science models, (ii) theory-guided learning of data science models, (iii) theory-guided refinement of data science outputs, (iv) learning hybrid models of theory and data science and (v) augmenting theory-based models using data science. A typical physics informed machine learning model may follow one or more of the above-mentioned approaches to bring together scientific knowledge and data science techniques. Although there have been a few reported applications of explainable artificial intelligence in hydrological modelling in the past (e.g. Cannon and Mckendry, 2002; Keijzer and Babovic, 2002; Fleming, 2007), there is an increasing trend of adopting theory-guided machine learning models in recent water resources applications (McGovern et al., 2019), such as hydroclimatic model building (Snauffer et al., 2018), automated model building (Chadalawada et al., 2020) and hydrologic process simulation (Solander et al., 2019). Even though there are attempts in almost every machine learning technique to incorporate existing hydrological knowledge into the basic frameworks, in the sequel we only discuss such attempts in ANNs and GP.
Relative to GP, ANNs suffer the most severe consequences of the lack of interpretability of the resulting models (Mehr et al., 2018). An effective solution is the use of augmented versions of neural networks, where existing theoretical knowledge is used to govern the learning algorithm to enhance the interpretability of the induced models. Brunton et al. (2016), Raissi et al. (2017) and Rudy et al. (2017) used Physics Informed Neural Networks (PINN) in time series analysis to derive governing partial differential equations. Prediction of extreme rainfall events was carried out by Cannon (2018) using a neural network architecture constrained by physical laws. Wang et al. (2020) introduced a deep learning framework called Theory Guided Neural Networks (TGNN) for subsurface flow modelling, where the governing equations, physical constraints, engineering controls and expert knowledge are used to guide the ANN model. Please refer to Fleming et al. (2015) and Xu et al. (2019) for further theory-guided neural network applications in water resources.
Although physics informed machine learning was only recently identified as a new modelling paradigm, in the context of GP there have been attempts over the past two decades to blend hydrological knowledge into the basic GP framework to induce more physically plausible hydrological models. To achieve the physical consistency and dimensional accuracy of GP-induced models, researchers developed several enhanced versions of the GP algorithm by incorporating existing hydrological knowledge.
Declarative bias and preferential bias were incorporated into the model-building phase of GP to reduce physical contraventions and to achieve dimensional accuracy of the induced equations (Babovic and Keijzer, 1999, 2002; Keijzer and Babovic, 2002). At the initialization stage, declarative bias forces the algorithm to sample only dimensionally correct solutions (a hard constraint on dimensional correctness), while preferential bias guides the algorithm towards dimensionally correct solutions (a soft constraint on dimensional correctness) and allows all solutions to be generated. The authors reported that this augmented version of GP resulted in fast convergence through the reduction of the solution space and achieved more parsimonious and regularized expressions than traditional GP. Dimensionally aware GP was utilized to extract hydraulic formulae from measurements by Babovic et al. (2001).
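The dimensional-correctness check underlying declarative bias can be sketched as follows. This is a hypothetical illustration of our own (the tree encoding, variable names and dimension tuples are not taken from the cited work): physical dimensions are tracked as exponent tuples, additions require matching dimensions, and multiplications add exponents, so dimensionally inconsistent candidate expressions can be rejected before evaluation.

```python
# Hypothetical sketch of the declarative-bias idea: reject candidate
# expressions whose physical dimensions are inconsistent. Dimensions are
# (length, time) exponent tuples, e.g. velocity = (1, -1).

def dims_of(tree, var_dims):
    """Return the (L, T) exponents of an expression tree, or None if the
    expression is dimensionally inconsistent."""
    op, args = tree[0], tree[1:]
    if op == "var":
        return var_dims[args[0]]
    left, right = dims_of(args[0], var_dims), dims_of(args[1], var_dims)
    if left is None or right is None:
        return None
    if op == "add":                        # addition requires matching dims
        return left if left == right else None
    if op == "mul":                        # multiplication adds exponents
        return (left[0] + right[0], left[1] + right[1])

var_dims = {"h": (1, 0), "v": (1, -1), "t": (0, 1)}   # depth, velocity, time

ok = dims_of(("add", ("var", "h"), ("mul", ("var", "v"), ("var", "t"))), var_dims)
bad = dims_of(("add", ("var", "h"), ("var", "v")), var_dims)
# ok is a valid depth dimension; bad is None (cannot add depth to velocity)
```

Applied as a hard filter at initialization this corresponds to declarative bias; applied as a fitness penalty instead of outright rejection, it corresponds to preferential bias.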
The inclusion of high-level theoretical concepts in sediment transport modelling with GP resulted in equal or superior performance compared with traditional modelling using human expert knowledge (Baptist et al., 2007; Babovic, 2009). Another augmented version of GP was used for the identification of predominant processes in hydrological system dynamics by Selle and Muttil (2011). A reservoir model, a cumulative sum and delay function, and a moving average operator were incorporated as basic hydrological insights into the GP function set by Havlicek et al. (2013) to develop a rainfall-runoff prediction programme called SORD. They achieved superior performance in terms of prediction accuracy with SORD compared to ANNs and GP without the above-mentioned special functions. GP was used as a model induction algorithm in Chadalawada et al. (2017) to optimize both model architecture and parameters to automatically induce the most appropriate Tank model structure for a watershed of interest. Here, hydrological knowledge is incorporated as special functions inspired by the Sugawara Tank model template (Sugawara, 1979). In our prior work (Chadalawada et al., 2020), an automatic lumped conceptual rainfall-runoff model induction toolkit was developed using GP, and the building blocks available in two modular modelling frameworks (FUSE and SUPERFLEX) were used as the components of hydrological insight.
The remainder of this text is arranged as follows. The specific modelling strategies used in the current study and the research objectives are presented in Sect. 2. The proposed model induction framework is introduced in Sect. 3. An application of the proposed framework is given in Sect. 4. The last section (Sect. 5) discusses the research findings and the conclusions of the present study. Additional details are presented in the Appendix.

Methods
In this contribution, following the TGDS modelling paradigm, we introduce a novel model induction engine called Machine Induction Knowledge Augmented - System Hydrologique Asiatique (MIKA-SHA) for the automatic induction of semi-distributed rainfall-runoff models for a catchment of interest. This work is motivated by the success of our previously introduced (Chadalawada et al., 2020) model induction toolkit titled Machine Learning Rainfall-Runoff Model Induction Toolkit (ML-RR-MI). ML-RR-MI is capable of inducing fully-fledged lumped conceptual rainfall-runoff models for a watershed of interest.
We use the term "hydrologically informed machine learning" to indicate that the existing body of hydrological knowledge is used to govern the machine learning algorithms to induce physically consistent model configurations. The proposed framework uses Genetic Programming (GP) as its learning algorithm, whereas the model building modules of two flexible rainfall-runoff modelling frameworks, namely FUSE (Clark et al., 2008) and SUPERFLEX (Fenicia et al., 2011; Kavetski and Fenicia, 2011), represent the elements of existing hydrological knowledge.
As a TGDS approach, the top priority of MIKA-SHA (as of ML-RR-MI) remains the induction of readily interpretable rainfall-runoff models with high prediction accuracy using GP. However, the specific objectives of the current study are: (1) the incorporation of the spatial heterogeneities of catchment properties and climate variables into rainfall-runoff modelling while maintaining the parsimony of the induced models; and (2) the adoption of a quantitative model selection approach to select an optimal model with appropriate complexity, instead of the "simpler the better" paradigm used in ML-RR-MI. The approach addresses common hydrological issues, such as equifinality, subjectivity, and uncertainty, in the context of semi-distributed modelling and machine learning. This study is part of a larger ongoing research effort on using hydrologically informed machine learning for automated model induction.
Considering the uniqueness of place is an important aspect of hydrological modelling (Beven, 2020). The use of distributed modelling concepts and flexible modelling frameworks are two available toolsets for incorporating spatial heterogeneity into the model building phase. Due to the limited success and higher-order complexity of fully distributed models, the semi-distributed modelling concept is used for the current study, where a network of functionally distinguishable conceptual models from flexible modelling frameworks is developed to represent the watershed dynamics. As a result of the high granularity and flexibility provided by the flexible modelling frameworks, even in a lumped application, one can try thousands of possible model architectures for a catchment of interest. This number may rise to millions of possible model combinations in the context of semi-distributed modelling, which makes it almost impossible to test them manually.
Further, the selection of a model configuration without testing alternative configurations would be highly subjective and may require considerable expert knowledge and time. A recent paper (Addor and Melsen, 2019), based on more than 1500 peer-reviewed research articles, concluded that model selection in hydrological modelling is more often driven by legacy than by adequacy. In such situations, model selection may be governed by factors such as the popularity of the model, ease of use, or prior experience instead of the appropriateness of the model for the intended task, which may result in biased research findings. Therefore, we see a necessity to automate the model building phase to overcome these limitations.
GP has been selected as the machine learning technique here due to its ability to optimize both model configuration and model parameters together. It is interesting to note that, in most state-of-the-art GP applications in water resources (Oyebode and Adeyemo, 2014; Mehr et al., 2018), GP is still utilized as a short-term prediction mechanism analogous to ANN applications. In our contribution, we explore the full potential of GP by inducing fully-fledged rainfall-runoff models, where hydrological insights are introduced through the integration of process understanding, by including model building components from existing flexible modelling frameworks in the function set of the GP algorithm. As per the taxonomy defined in Karpatne et al. (2017), our framework falls under the hybrid TGDS category. Currently, MIKA-SHA learns models utilizing the model building components of two flexible modelling frameworks. A brief discussion of the two flexible modelling frameworks used in the current study is given below. However, the proposed framework can be coupled with any internally coherent collection of building blocks.

SUPERFLEX
The SUPERFLEX framework (Fenicia et al., 2011; Kavetski and Fenicia, 2011) facilitates hydrologists in testing many different hypotheses about the functioning of the watershed of interest using the model building components (reservoirs, junctions, and lag functions) available in the framework. The water storages within the catchment, such as soil moisture, interception, groundwater, and snow, along with their release of water, are represented through reservoir units. Junction elements conceptualize the merging and splitting of different fluxes in catchment dynamics (e.g. Hortonian flow, evaporation). Channel routing (delays in flow transmission) is described using lag functions. A number of constitutive functions are available to describe lag function characteristics and the storage-discharge relationships of storage units (reservoirs). SUPERFLEX applications in rainfall-runoff modelling are found in van Esse et al. (2013), Fenicia et al. (2014, 2016), and Molin et al. (2020).

FUSE
Clark et al. (2008) developed the Framework for Understanding Structural Errors (FUSE) to examine the effect of model structural differences on rainfall-runoff modelling. FUSE conceptualizes the functioning of a catchment using a two-zone model architecture: an unsaturated zone (upper soil layer) and a saturated zone (lower soil layer). The model building modules of FUSE involve the choice of upper and lower soil configurations and parameterizations for different hydrological processes, such as evaporation, percolation, interflow, surface runoff, and baseflow. The modeller has the freedom of selecting these model building modules from four rainfall-runoff models (TOPMODEL, ARNO/VIC, SACRAMENTO, and PRMS), which are known as parent models. For more details and applications of FUSE, please refer to Clark et al. (2010) and Vitolo (2015).
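To illustrate the kind of building block such frameworks provide, the following is a minimal sketch of a single conceptual reservoir element. It assumes a linear storage-discharge law, which is a hypothetical simplification: actual SUPERFLEX elements offer several constitutive functions (e.g. power laws), and the water balance below is a bare-bones illustration, not the framework's implementation.

```python
import numpy as np

def linear_reservoir(precip, pet, k, s0=0.0):
    """Step a single conceptual reservoir through time.

    Assumption (illustrative only): storage-discharge law Q = k * S,
    with evaporation limited by the water available in storage.
    """
    storage = s0
    discharge = np.zeros(len(precip))
    for t in range(len(precip)):
        storage += precip[t]              # rainfall enters storage
        evap = min(pet[t], storage)       # evaporation limited by storage
        storage -= evap
        discharge[t] = k * storage        # linear storage-discharge relationship
        storage -= discharge[t]           # remove released water
    return discharge
```

A network of such elements, connected by junctions and lag functions, forms one candidate model hypothesis about the catchment.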

Performance Measures
MIKA-SHA includes a performance measures library covering the majority of widely adopted performance measures (Chadalawada and Babovic, 2017). In the present study, we have selected four absolute performance measures, namely volumetric efficiency (Criss and Winston, 2008), Kling-Gupta efficiency (Gupta et al., 2009), Nash-Sutcliffe efficiency (Nash and Sutcliffe, 1970) and log Nash-Sutcliffe efficiency (Krause et al., 2005), from MIKA-SHA's performance measures library to evaluate the simulated discharge values against the measured discharge values. The four selected objective functions are sensitive to different regions of the measured and simulated runoff signatures, and their details are given in Table 1. The four selected objective functions are used in the multi-objective optimization scheme of MIKA-SHA. The genetic programming algorithm drives its total population towards a global or near-global solution, which results in a set of possible solutions instead of a single solution; in the context of rainfall-runoff model induction, such solutions may represent different model structures (different hypotheses about catchment dynamics). Hence, a model selection scheme is required to select the optimal model from the competing models. The more straightforward approach to selecting an optimal model from a set of equally performing models would be to select the model with the least complexity in terms of model structure (the most parsimonious model). However, in the current study, we employ the following relative performance measures to evaluate the performance of each model relative to the other competing models at the optimal model selection stage. Details about how each performance measure is used within the proposed framework are given in Sect. 6.
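The four absolute measures in Table 1 can be sketched using their standard textbook formulations; the exact variants used in MIKA-SHA may differ in detail (e.g. the offset used in the log transform is an assumption here).

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is perfect, < 0 is worse than the mean."""
    return 1 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def log_nse(obs, sim, eps=1e-6):
    """NSE on log-transformed flows, emphasizing low-flow performance."""
    return nse(np.log(obs + eps), np.log(sim + eps))

def kge(obs, sim):
    """Kling-Gupta efficiency: correlation, variability and bias components."""
    r = np.corrcoef(obs, sim)[0, 1]
    alpha = sim.std() / obs.std()
    beta = sim.mean() / obs.mean()
    return 1 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

def volumetric_efficiency(obs, sim):
    """Fraction of total observed volume delivered at the right time."""
    return 1 - np.sum(np.abs(sim - obs)) / np.sum(obs)
```

All four equal 1 for a perfect simulation, which is why they can share one maximization-style multi-objective scheme.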

Standardized Signature Index Sum (SIS)
The Standardized Signature Index Sum value (Ley et al., 2016) is a relative performance measure which quantifies how well a model captures the observed flow duration curve (FDC) relative to the other competitive models. Models with negative SIS values indicate better-than-average performance in capturing the observed FDC, and vice versa. In the SIS calculation, both the observed and simulated FDCs are divided into four flow regimes based on flow exceedance probabilities, and the absolute difference between observed and simulated cumulative discharges is calculated in each regime. Then, four separate Z-score values (representing the four regimes) are assigned to each model based on the mean and standard deviation of all models considered. The algebraic sum of those four Z-score values becomes the SIS value of the model.
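The SIS calculation described above can be sketched as follows. The exceedance-probability breakpoints splitting the FDC into four regimes are an assumption here (Ley et al., 2016, define the regime boundaries used in practice).

```python
import numpy as np

def sis_scores(obs, sims, bounds=(0.05, 0.2, 0.7)):
    """Standardized Signature Index Sum for a set of competing models.

    obs:    observed discharge, shape (n_steps,)
    sims:   simulated discharges, shape (n_models, n_steps)
    bounds: hypothetical exceedance-probability breakpoints (4 regimes)
    """
    n_models = sims.shape[0]
    p = np.arange(1, obs.size + 1) / (obs.size + 1)   # exceedance probabilities
    obs_fdc = np.sort(obs)[::-1]                      # observed FDC
    edges = [0.0, *bounds, 1.0]
    diffs = np.zeros((n_models, 4))
    for m in range(n_models):
        sim_fdc = np.sort(sims[m])[::-1]              # simulated FDC
        for r in range(4):
            mask = (p >= edges[r]) & (p < edges[r + 1])
            # absolute difference in cumulative discharge within the regime
            diffs[m, r] = abs(obs_fdc[mask].sum() - sim_fdc[mask].sum())
    z = (diffs - diffs.mean(axis=0)) / diffs.std(axis=0)   # Z-score across models
    return z.sum(axis=1)                                   # one SIS per model
```

Because the Z-scores are computed across the model ensemble, the SIS values sum to zero over the models, and a negative SIS marks a better-than-average model.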

Cross sample entropy value (Cross-SampEn)
The Cross-SampEn value is a derivation of the commonly used Sample Entropy value (Richman and Moorman, 2000). Sample Entropy is a complexity measure of a data series which has its origin in information theory: it quantifies, in a mathematical way, the complexity of the series based on its information content. The Cross-SampEn value follows the same concept but is used to measure the correlation between two series by matching patterns from one series with another. A low Cross-SampEn value indicates that the two series are more similar to each other. More details about Cross-SampEn can be found in Delgado-Bonal and Marshak (2019).
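The pattern-matching idea can be sketched as below: templates of length m from one series are matched against the other series under a Chebyshev-distance tolerance r, and the statistic compares match counts at lengths m and m + 1. The choices m = 2 and r = 0.5 are illustrative defaults, not the values used in MIKA-SHA.

```python
import numpy as np

def cross_sampen(u, v, m=2, r=0.5):
    """Cross-sample entropy sketch: lower values mean more similar series."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)

    def n_matches(length):
        count = 0
        for i in range(len(u) - length + 1):
            for j in range(len(v) - length + 1):
                # Chebyshev distance between the two templates
                if np.max(np.abs(u[i:i + length] - v[j:j + length])) <= r:
                    count += 1
        return count

    b = n_matches(m)        # matches of length m
    a = n_matches(m + 1)    # matches of length m + 1 (always <= b)
    return -np.log(a / b)   # low value -> the series track each other
```

Since every length-(m + 1) match is also a length-m match, the ratio a / b lies in [0, 1] and the result is non-negative; an unrelated pair of series yields fewer long matches and hence a larger value.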

Dynamic Time Warping (DTW) distance
Dynamic Time Warping (Sakoe and Chiba, 1978) is a similarity measure between two time series which warps their time axes to find the optimal temporal alignment between the two. The DTW distance was derived as an alternative to the commonly used Euclidean distance. Two identical time series with a small time shift may end up with a large Euclidean distance and may therefore be considered dissimilar. The DTW method captures them as two similar time series, as it ignores the shift in the time axes. A low DTW distance indicates more similarity between the two compared time series. Details and applications of the DTW method can be found in Salvador and Chan (2007), Giorgino (2009) and Vitolo (2015).
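The classic dynamic-programming formulation can be sketched as follows, using the absolute difference as the local cost (a common choice; the cost function actually used in MIKA-SHA is not specified here).

```python
import numpy as np

def dtw_distance(a, b):
    """DTW distance between two 1-D series via dynamic programming."""
    n, m = len(a), len(b)
    d = np.full((n + 1, m + 1), np.inf)
    d[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # best of: insertion, deletion, match along the warping path
            d[i, j] = cost + min(d[i - 1, j], d[i, j - 1], d[i - 1, j - 1])
    return d[n, m]
```

A hydrograph peak shifted by one time step illustrates the point: the Euclidean distance is large, while DTW aligns the peaks and reports the series as near-identical.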

Model parsimony
Here, model parsimony is evaluated in terms of the number of model parameters associated with each model. One model is considered more parsimonious than another if the number of model parameters of the former is lower than that of the latter.

Model Identification
As shown in Fig. 2, the model identification stage starts with a randomly generated set of candidate model structures (semi-distributed model structures made out of the special functions, basic mathematical functions and random constants) to capture the runoff dynamics of the catchment of interest. These model structures (GP individuals) may differ from each other in terms of model structural components and parameter values. Then, the performance of each individual is assessed on a user-defined multi-objective criterion. Next, individuals are selected to create a mating pool, which is used to create offspring (the child population) through the application of genetic operators such as crossover and mutation. The selection scheme (explained later) ensures that individuals with higher performance values in terms of the objective functions used have a higher chance of selection. In this way, the GP algorithm optimizes both the model configuration and the associated parameters of the GP individuals simultaneously. Finally, the same selection scheme is utilized to select individuals to form the next generation (next parent population) from the individuals of both the child population and the parent population (combined population). This evolution process continues through the generations until the algorithm reaches the maximum number of generations. Here, the optimization algorithm is repeated for a user-specified number of iterations (independent runs) to cover the solution space to a greater extent. The output of the model identification stage consists of a set of non-dominated models (Pareto-optimal models) based on the selected objective criteria. More details about the basic steps involved at this stage are given in Chadalawada et al. (2020).
MIKA-SHA relies on a multi-objective optimization framework founded on the Non-dominated Sorting Genetic Algorithm-II (NSGA-II) (Deb et al., 2002) at the model identification stage, using the desired objective functions from the objective function library. Each individual (candidate solution) in the population is evaluated on each objective function separately. Based on the objective function values, each individual is assigned a non-domination rank and a crowding distance value. The ranks are identified based on the Pareto-optimality concept. For example, all individuals with non-domination rank 2 are dominated by individuals with rank 1; however, individuals with rank 2 are not dominated by any individuals with a higher rank (the lower the rank, the better the individual). The crowding distance, on the other hand, measures how an individual is located relative to the other individuals of the same rank (the larger the distance, the better the individual, as it preserves diversity). Therefore, at the selection phase of the algorithm, when two individuals are randomly selected and they have different ranks, the individual with the lower rank is selected. If both have the same rank, the individual with the higher crowding distance is selected. Having said that, the identification of the best performing model from the Pareto front of non-dominated solutions for a watershed of interest is not a trivial matter. The explanatory power of the performance measure used to assess the prediction accuracy of model simulations has a direct impact on optimal model selection (Chadalawada and Babovic, 2017).
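The crowded-comparison selection just described can be sketched as a binary tournament. The dictionary representation of an individual is illustrative only; the `rank` and `crowding` values are assumed to have been pre-computed by non-dominated sorting.

```python
import random

def crowded_tournament(population, rng=random):
    """NSGA-II binary tournament (Deb et al., 2002), sketch form.

    Each individual carries a non-domination `rank` (lower is better)
    and a `crowding` distance (higher is better).
    """
    a, b = rng.sample(population, 2)
    if a["rank"] != b["rank"]:
        return a if a["rank"] < b["rank"] else b          # prefer lower rank
    return a if a["crowding"] > b["crowding"] else b      # prefer more isolated
```

Repeating this tournament fills the mating pool while biasing selection towards better-ranked and more diverse candidate model structures.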

Model Selection
The model selection stage starts with the best models of each independent run (front-1 models of the final generation) derived through the GP framework at the model identification stage. The quantitative optimal model selection process is streamlined as follows.
1. Performance evaluation using the same multi-objective criterion on validation data for all identified models from the model identification stage.
2. Re-identification of Pareto-optimal models based on both calibration and validation fitness values.
3. Calculation of the Standardized Signature Index Sum (SIS) of each Pareto-optimal model.
4. Selection of the Pareto-optimal models with SIS scores below zero over both the calibration and validation periods.
5. Identify unique model structures (hereinafter referred to as competitive models) from the models in step 4. If there is more than one model with the same model structure, the model with the most negative SIS value is selected.
6. Quantitative selection of the optimal model to represent the catchment dynamics based on three relative measures: the cross sample entropy value (Cross-SampEn), the Dynamic Time Warping (DTW) distance, and model parsimony based on the number of associated model parameters. The competitive models are ranked according to each measure (the lower the value, the better the performance), and the model with the lowest summed rank is selected as the optimal model for the watershed of concern.
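Step 6 can be sketched as a rank-sum selection. The dictionary keys (`cross_sampen`, `dtw`, `n_params`, `name`) are hypothetical names for illustration; ties are broken arbitrarily here, whereas the actual scheme may define a tie-break rule.

```python
def select_optimal(models):
    """Rank competitive models on three measures (lower is better for
    all three) and return the model with the lowest summed rank."""
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0] * len(values)
        for pos, i in enumerate(order):
            r[i] = pos + 1          # rank 1 = best (lowest value)
        return r

    criteria = ["cross_sampen", "dtw", "n_params"]
    per_criterion = [ranks([m[c] for m in models]) for c in criteria]
    totals = [sum(col) for col in zip(*per_criterion)]
    return models[totals.index(min(totals))]
```

For example, a model ranked first on two measures and second on the third beats a model ranked first on one measure but last on the other two.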

Uncertainty Analysis
Once the optimal model is identified for the catchment of interest, its uncertainty and sensitivity analyses are performed using Generalized Likelihood Uncertainty Estimation (GLUE) (Beven and Binley, 1992) as described below.
1. A random subset of model parameters of the selected optimal model structure is varied uniformly within their parameter range (in this case between 0 and 1, as all parameter ranges are normalized within the MIKA-SHA framework) while keeping the remaining model parameters at their calibrated values. NSE is used as the likelihood estimate. If the model parameter set provides an NSE value greater than the likelihood threshold, the parameter set, its NSE value and the simulated discharge are recorded (such parameter sets are known as behavioural models).
2. Repeat the above step until the number of behavioural models reaches a user-defined value.
3. For each time step, the simulated discharge values of all behavioural models are sorted in ascending order. Then, a weight is assigned to each model (the NSE value itself is used as the weight). Finally, the cumulative distribution function (CDF) of the weights is calculated at each time step.
4. For each time step, a relationship diagram is obtained by taking the CDF as the x-axis and the simulated discharge as the y-axis. From the diagram, the simulated discharge values corresponding to the 95 % and 5 % quantiles of the CDF are selected as the upper and lower bounds of the 90 % confidence band.
5. The percentage of observed discharge values (in both the calibration and validation periods) which fall within the 90 % confidence band is used to measure the uncertainty estimation capability of the selected optimal model.
6. If the uncertainty estimation capabilities are satisfactory, the model performance of the optimal model is tested over an independent time frame (testing period) which is not used in the model selection or identification stages. If the uncertainty estimation is not satisfactory, then all the above steps are repeated with the next best competitive model.
7. Sensitivity scatter plots are drawn for each model parameter using the parameter values of the behavioural models. The shape of the scatter plot (x-axis: normalized parameter range, y-axis: NSE values) is used to identify the degree of sensitivity of each model parameter.
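Steps 3 and 4 above can be sketched as follows. Linear interpolation on the weighted CDF is an implementation assumption; clamping at the extremes when a quantile falls outside the ensemble range is another.

```python
import numpy as np

def glue_bounds(sims, nse_values, lower=0.05, upper=0.95):
    """90 % GLUE prediction bounds from behavioural simulations.

    sims:       simulated discharges, shape (n_models, n_steps)
    nse_values: likelihood weights (NSE itself), shape (n_models,)
    """
    n_models, n_steps = sims.shape
    lo = np.empty(n_steps)
    hi = np.empty(n_steps)
    w_all = np.asarray(nse_values, dtype=float)
    for t in range(n_steps):
        order = np.argsort(sims[:, t])      # sort discharges ascending
        q = sims[order, t]
        w = w_all[order]
        cdf = np.cumsum(w) / w.sum()        # weighted cumulative distribution
        lo[t] = np.interp(lower, cdf, q)    # 5 % quantile -> lower bound
        hi[t] = np.interp(upper, cdf, q)    # 95 % quantile -> upper bound
    return lo, hi
```

Step 5 then reduces to `np.mean((obs >= lo) & (obs <= hi)) * 100` for the percentage of observations captured by the band.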
The main target of MIKA-SHA is to induce physically consistent semi-distributed rainfall-runoff models through the incorporation of spatial heterogeneity data and existing hydrological knowledge into its machine learning algorithm. In addition, the following measures are taken to handle general hydrological modelling issues, such as overfitting, subjectivity and equifinality.
• Avoiding overfitting: limiting tree growth, considering validation fitness and model parsimony in the optimal model selection, using internally coherent special functions, and evaluating the fitness of the optimal model on an independent data set.
• Removing subjectivity: the automated process ensures no direct human involvement and tests many hypotheses before selecting an optimal model.
• Handling equifinality: the optimal model selection is based on many absolute and relative performance measures.

Application of MIKA-SHA
This section aims to demonstrate how MIKA-SHA works through a case study using a watershed in the United States.
MIKA-SHA was used to identify two optimal model configurations for the catchment of interest using SUPERFLEX and FUSE model building libraries separately.

Study Area
The Red Creek watershed near Vestry, Mississippi (Fig. 3) was selected to test the semi-distributed model induction capabilities of MIKA-SHA. The Red Creek watershed is located in the southeastern United States, and basin details are summarized in Table 2. Soil and land use data for the Red Creek catchment (resolution of 30 m x 30 m) were downloaded from the United States Department of Agriculture's (USDA's) Geospatial Data Gateway (USDA's Geospatial Data Gateway, 2020), whereas the Digital Elevation Model (DEM) data at 30 m resolution were obtained from the Shuttle Radar Topography Mission (SRTM) via the United States Geological Survey (USGS) EarthExplorer (USGS EarthExplorer, 2020). The whole watershed was divided into three subcatchments (Sub 1, Sub 2 and Sub 3) for the current application. HRUs were identified based on the topography of the area, and three HRUs, namely Hill (slope band > 10 %), Floodplain (slope position threshold = 0.1) and Plateau (slope band < 10 %), were selected. The HRU details are given in Table 3. Daily streamflow, potential evaporation and temperature data were obtained from the dataset of Newman et al. (2015). The spatial distribution of daily precipitation data was considered and lumped at the subcatchment scale (three precipitation time series for the three subcatchments). Precipitation data were downloaded from the Daymet dataset (Daymet, 2020), which provides daily weather parameters (resolution: 1 km x 1 km) over North America. The time series diagrams of precipitation, potential evaporation, temperature and streamflow of the Red Creek watershed are displayed in Fig. 4.
Once the relevant data were processed, the user can set the algorithmic parameters of MIKA-SHA, which eventually decide the computational power and time required for the model induction. Table 4 summarizes the algorithmic settings of MIKA-SHA used in the current study.

The selected model structural components and the calibrated parameter values suggest that the runoff generation in both the hillside and floodplain areas responds quickly to precipitation. This quick response is reasonable in both areas due to the higher slopes on the hillside and the widening of the river across the floodplain in high-flow events. Further, the main soil type in the floodplain area of the Red Creek catchment belongs to the Smithton soil series, which is characterized as soil with slow permeability and a seasonally high water table (Official Soil Series Descriptions, 2020). This may result in fast discharge generation dynamics, such as saturation excess overland flow. The constitutive function of the FR in both the hillside and floodplain structures is the power function, which may help to capture the non-linear response of runoff generation. On the other hand, in the plateau areas (around half of the total catchment area), one can expect higher residence times as the slopes are milder. This may lead to a delayed response of discharge to its forcing and may allow water to infiltrate more into the subsurface layers. On top of that, the majority of the plateau area consists of the McLaurin and Heidal soil types, which are characterized as sandy, well-drained soils with moderate permeability (Official Soil Series Descriptions, 2020). Therefore, having a UR with a delayed base flow component as the plateau area model structure in SUPERFLEX_TOPO_M2 is meaningful. Please refer to the Appendix for more details about the model parameters. The majority of the sensitive model parameters are connected with the plateau area model structure. This is acceptable, as the plateau area has the largest spatial coverage of the Red Creek catchment under the topography-based HRU classification. As reported earlier, obtaining a high percentage of measured data within the uncertainty bands (75 %) suggests that SUPERFLEX_TOPO_M2 is capable of estimating the total output uncertainty satisfactorily.

MIKA-SHA Models induced using FUSE Building Blocks
Application of MIKA-SHA with the FUSE library resulted in five competitive model structures using the topography-based HRU classification. The relative rank scores are given in Table 7. Model M1 (hereinafter referred to as FUSE_TOPO_M1) was selected as the optimal model as it gave the lowest summed rank. FUSE_TOPO_M1's model configuration is shown in Fig. 8. The sensitive model parameters include the two lag parameters (Lag_HRU and Lag_Sub). This demonstrates a lesser dependency of the total model performance on individual model parameters in semi-distributed modelling, owing to the large number of model parameters. FUSE_TOPO_M1 results in a high value (94 %) for the percentage of measured streamflow data within the confidence interval bands and hence shows a significant capability of estimating the associated uncertainty.

Discussion and Conclusions
In this study, the spatial heterogeneity of the catchment was incorporated into the model building process based on the topography of the area (i.e. three HRUs, namely hills, floodplain and plateau, were identified based on the topography of the area). The results obtained with the topography-based HRUs, such as the high efficiency values achieved for the absolute performance measures and the good visual match between the measured and modelled hydrographs, suggest that the topography of the catchment may have a strong impact on runoff generation. This demonstrates another potential use of MIKA-SHA: identifying the runoff drivers of a catchment of interest. For example, one can also define the HRUs based on either the geology or the soil type of the catchment of interest and use MIKA-SHA to identify optimal model configurations. This way, one can identify the relative dominance of runoff drivers in the total catchment runoff response.
One of the major issues with machine learning models is overfitting to the training dataset. The consistent performance over the calibration, validation and testing periods of all the optimal models selected through MIKA-SHA shows no such issues in this case. Deterministic semi-distributed modelling typically relies on a large number of model parameters; here, by comparison, only a smaller number of model parameters were found to be sensitive towards the total model performance. Further, the values of the two lag parameters associated with the "DISTRIBUTED" function (Lag_HRU and Lag_Sub) were found to be crucial in achieving high model performance. As the research findings of MIKA-SHA demonstrate a logical match with previously reported research findings and fieldwork insights, it may be safe to assume that MIKA-SHA is capable of handling the equifinality phenomenon satisfactorily (i.e. the selected optimal models perform well for the right reasons). Additionally, the quantitative model selection scheme of MIKA-SHA ensures that the selected optimal model has the appropriate complexity to describe the dominant runoff generation processes of the catchment, instead of selecting an optimal model based on model parsimony alone.
More importantly, both SUPERFLEX_TOPO_M2 and FUSE_TOPO_M1 share similarities in their model configurations, such as the inclusion of the same model structure for the hillside and floodplain HRUs, the demonstration of more subsurface-type delayed responses by the plateau area model structures, and the use of non-linear constitutive functions to represent storage-discharge relationships. This consistency demonstrated by MIKA-SHA in capturing similar runoff dynamics across different model inventories shows that the toolkit is capable of mining knowledge from data, which makes it feasible to rely on the induced model configurations beyond mere statistical confidence.
Further, it was possible to find a reasonable match between the model structural components of the optimal models and the catchment characteristics. For example, both the hillside and floodplain model configurations demonstrate a quick runoff response, while the plateau area model configurations demonstrate a delayed, subsurface-type runoff response to their forcing terms. This matches the topographic characteristics (e.g. higher slopes on the hillside, widening of the river across the floodplain in high-flow events, a high water table in the floodplain, and milder slopes in the plateau area) and soil characteristics (plateau area: sandy, well-drained soils with moderate permeability; floodplain: soil with slow permeability) of the Red Creek catchment.
In this contribution, we introduce Machine Induction Knowledge Augmented - System Hydrologique Asiatique (MIKA-SHA) for learning semi-distributed models in which the spatial distributions of catchment properties and climate variables are taken into account. MIKA-SHA utilizes existing hydrological knowledge to guide the machine learning algorithm, which eventually results in physically meaningful hydrological models that are readily interpretable by domain specialists. In the current study, background hydrological knowledge is blended with the machine learning algorithm through the model building components of flexible rainfall-runoff modelling frameworks.
The results of this study indicate that considering the spatial distributions of forcing data and catchment properties gives more meaningful insights into the environmental dynamics occurring within the watershed. A unique and distinct feature of MIKA-SHA is that it can be coupled with any internally coherent collection of building blocks representing elements of hydrological knowledge and use genetic programming to optimize both the model architecture and the model parameters simultaneously. This approach enables hydrologists to utilize flexible modelling frameworks to their full potential by testing many hypotheses before selecting an optimal model. MIKA-SHA is expected to be most valuable in circumstances where experimental insights regarding the catchment of interest or human expert knowledge are lacking.
We recognize the potential offered by machine learning algorithms for hydrological modelling. However, simplistic black-box data-driven models may produce accurate predictions that are severely difficult to interpret and thus may not serve the advancement of hydrological knowledge. Thus, the most promising way forward is the integration of existing hydrological knowledge with learning algorithms to induce more generalizable and physically consistent models. This was the motivation behind the development of the proposed MIKA-SHA framework, which is founded on both machine learning and hydrological theories. Therefore, we expect this work to strengthen the link between two leading, but historically largely independent, communities in water resources science and engineering: those working with physics-based process simulation modelling and those working with machine learning. Finally, we expect more research studies on theory-guided machine learning to be directed towards knowledge mining and automated model building in hydrological modelling.

Figure 1: Parse tree representation of the DISTRIBUTED function in MIKA-SHA

Figure 3: Red Creek catchment, Vestry, Mississippi, United States (map was generated through SWAT+ plugin in QGIS software using Shuttle Radar Topography Mission (SRTM) DEM data and USDA's Geospatial Data Gateway soil and land use data)

Figure 5: SUPERFLEX_TOPO_M2 model configuration (P: precipitation, E: evaporation, QT: total discharge, WR: snow reservoir, RR: riparian reservoir, FR: fast-reacting soil reservoir, UR: unsaturated soil reservoir, L: half-triangular lag function)

Finally, the choice of the modified logistic function as the constitutive function may help the plateau area model structure to capture threshold-like behaviours (e.g. saturation excess overland flow) in the catchment dynamics. Out of the 34 model parameters included in SUPERFLEX_TOPO_M2, 10 sensitive model parameters can be recognized by analysing the shapes of the sensitivity scatterplots: D_R and D_S in the floodplain model structure; Beta_Qq_UR, Ce, Smax_UR, D_S, mu_Qq_UR and K_Qb_UR in the plateau area model structure; and the two lag parameters (Lag_HRU and Lag_Sub).
Both the hillside and floodplain model structures have upper- and lower-layer architectures identical to the ARNO/VIC model, with a single-state upper soil reservoir and a fixed-size base flow reservoir. The plateau area model structure incorporates a lower zone configuration like the ARNO/VIC model and an upper zone configuration similar to the SACRAMENTO model, with the upper layer broken up into tension and free storages. Surface flow from all three model structures is generated as saturation-excess overland flow and described using the flux equations of the FUSE parent model TOPMODEL. Both the hillside and floodplain model structures have the same percolation mechanism, which allows water to percolate from field capacity to saturation and is described using the flux equations of the PRMS model, whereas in the plateau area percolation is controlled by the moisture amount in the saturated zone, as in the SACRAMENTO model. A root-weighting evaporation model is used in both the floodplain and plateau area model structures, while a sequential evaporation model is used in the hillside model structure. Interflow is allowed only in the plateau area model structure, and routing is allowed only in the hillside model structure.

Figure 8: FUSE_TOPO_M1 model configuration (P: precipitation, E: evaporation, qb: base flow, qsx: surface flow, q12: percolation, qif: interflow, Zuz and Zlz: depths of the unsaturated and saturated zones, S1 and S2: total water content in the unsaturated and saturated zones, S1F: free water content in the unsaturated zone, S1T: tension water content in the unsaturated zone, ϴwlt: soil moisture at the wilting point, ϴfld: soil moisture at field capacity, ϴsat: soil moisture at saturation)

Table 1 : Absolute performance measures used in the current study