Data Assimilation Informed Model Structure Improvement (DAISI) for Robust Prediction Under Climate Change: Application to 201 Catchments in Southeastern Australia

This paper presents a method to analyze and improve the set of equations constituting a rainfall‐runoff model structure based on a combination of a data assimilation algorithm and polynomial updates to the state equations. The method, which we have called “Data Assimilation Informed model Structure Improvement” (DAISI) is generic, modular, and demonstrated with an application to the GR2M model and 201 catchments in South‐East Australia. Our results show that the updated model generated with DAISI generally performed better for all metrics considered included Kling‐Gupta Efficiency, NSE on log transform flow and flow duration curve bias. In addition, the elasticity of modeled runoff to rainfall is higher in the updated model, which suggests that the structural changes could have a significant impact on climate change simulations. Finally, the DAISI diagnostic identified a reduced number of update configurations in the GR2M structure with distinct regional patterns in three sub‐regions of the modeling domain (Western Victoria, central region, and Northern New South Wales). These configurations correspond to specific polynomials of the state variables that could be used to improve equations in a revised model. Several potential improvements of DAISI are proposed including the use of additional observed variables such as actual evapotranspiration to better constrain internal model fluxes.


Introduction
The pressure on water resources is reaching unprecedented levels in many catchments around the world due to increasing anthropogenic presence and higher variability induced by climate change.In this tense context, catchment scale rainfall-runoff models are one of the main quantitative tools used by water managers to translate future climate predictions into water volumes and assess water sharing scenarios.Estimation of future streamflows like in the study by Chiew et al. (2008) is generally done by selecting a few rainfall-runoff models to generate streamflow projections based on future climate inputs.Unfortunately, the performance of these models degrades significantly when predicting values beyond the range of hydro-climate conditions seen during their calibration (Coron et al., 2012).Of particular worry is the tendency for rainfall-runoff models to over-estimate streamflow in dry years which are expected to become more common in the future in many regions, for example, in South Eastern Australia (Chiew et al., 2011).This paper presents a method to analyze and improve the equations constituting a rainfall-runoff model structure in the context of climate change scenario modeling demonstrated with an application to the GR2M model (Mouelhi et al., 2006) and a large data set of catchments in South-East Australia.
Most rainfall-runoff models are empirical and hence require their parameters to be calibrated based on observed data.Once input and output data of acceptable quality are obtained, improving model calibration is the first step to obtain defensible simulations of future streamflow.Calibration algorithms are the topic of a considerable literature including the development of stochastic (see the review by Arsenault et al., 2014), probabilistic (Beven & Freer, 2001;Kuczera & Parent, 1998;Vrugt & Ter Braak, 2011) or multi-objective (see the review by Efstratiadis & Koutsoyiannis, 2010) algorithms.These advances have allowed highly parameterized models to be routinely calibrated within operational systems.However, there are limits to what a better calibration strategy can achieve to simulate streamflow in a changing climate.Coron et al. (2014) showed that models are often incapable of simulating significant changes in rainfall-runoff relationships regardless of how they are calibrated.Zheng et al. (2022) go further by saying that "calibration can only marginally (if at all) improve the quantification of uncertainty in future runoff projection due to hydrological nonstationarity."These studies suggest that improvement in model structures is critical to obtain robust streamflow projections.Pursuing this idea, Fowler et al. (2020) identified that most model structures are not able to simulate multi-year processes that are driving changes in rainfall-runoff relationships during drought periods.
Unfortunately, formulating a rainfall-runoff model structure is not straightforward because of the difficulty to describe physical processes at the catchment scale (Beven, 2001) leading to a certain level of subjectivity in the process.To overcome these limitations, one can assemble a large collection of published models and compare their performance as was done by Perrin et al. (2001) and more recently by Knoben et al. (2020).These studies have laid the foundations for the development of robust model structures such as the GR4J model (Perrin et al., 2003).However, they also concluded that no single structure outperforms the others systematically and that the difference in performance between structures is not well explained by catchment descriptors.As a result, it is difficult to define a clear path leading to model structure improvement from this approach.To overcome these limitations, flexible software frameworks such as FUSE (Clark, Slater et al., 2008) or SUPERFLEX (Fenicia et al., 2011) have been proposed to create arbitrary model structures from selected components and hence allow the comparison of a much larger set of candidate structures.These tools remain complex to implement and few authors have applied them beyond pure research applications in a single catchment.A notable exception is the study by Van Esse et al. (2013) who applied a large number of model structures to 237 catchments in France.Van Esse et al. (2013) concluded on the difficulty to relate model structures with catchment characteristics.The study named several modeling components that proved generally beneficial (e.g., parallel routing stores, bypass flows), but did not offer a simple diagnostic to improve a particular model such as the ones used in climate change studies.
As an alternative to the previous approaches, data itself can guide the identification of model structures.Machine learning method such as deep learning offers powerful tools to generate purely data-driven model structures (Nearing et al., 2021).However, as suggested by Wi and Steinschneider (2022), pure machine learning models may lack the capacity to extrapolate far beyond historical conditions such as required in climate change studies.In addition, machine learning models remain complex compared to empirical lumped rainfall-runoff models which does not facilitate their use in an operational context.Consequently, this paper focuses on classical modeling approaches based on empirical equations derived from physical system knowledge.
In this context, Lamb and Beven (1997) and subsequently Kirchner (2009) used data analysis to infer the form of model equations.Their approach remains limited to specific hydrological processes (recession for Lamb & Beven, 1997) or catchment characteristics (dominant base flow contribution for Kirchner, 2009).This concept was expanded further by Gharari et al. (2021) in theoretical experiments who explored the uncertainty in model structure via randomly generated piecewise linear functions.Overall, these attempts of data-driven model structure identification are promising but lack practical and large-scale applications to improve climate projections in the short term.In contrast, the field of data assimilation has produced firmly established algorithms such as the Ensemble Kalman Filter (Evensen, 2009) to efficiently blend model simulations with observed data over large spatial domains (see Fox et al., 2018 for an application to a land surface model).These algorithms have been used in hydrology for several decades as reviewed by Ghorbanidehno et al. (2020) and Moradkhani et al. (2018).For example, Pathiraja et al. (2016a) used data assimilation to estimate time-varying parameters in synthetic case studies, which is a powerful approach to estimate model structure deficiencies.This approach could help in the case of important changes in land use as demonstrated by Pathiraja et al. (2016b).However, according to Beck (1985), large time variations of parameters could also be interpreted as a structural deficiency requiring remediation.Pathiraja et al. (2018a) confirmed this opinion by showing that poor model structures can lead to non-sensical parameters variations.Furthermore, Pathiraja et al. (2018b) showed in a forecasting context that adjusting model parameters is not required if the magnitude of model errors is small compared to overall predictive uncertainty, even in the case of non-stationary catchment behavior.Overall, to our knowledge, only Bulygina and Gupta (2009) have demonstrated the use of a data assimilation algorithm for the identification of a complete set of rainfall-runoff model equations and applied their model beyond synthetic experiments to observed data.The approach of Bulygina and Gupta (2009) relies on an iterative algorithm where the particle filter (Doucet et al., 2000) is used within each iteration to generate probabilistic model equations sampled from a mixture of multivariate normal distributions.This approach is elegant because it allows combining a prior estimate of the model structure with observed data in a fully Bayesian inference scheme.However, it is significantly more complex than classical data assimilation algorithms because it requires a repeated application of the particle filter (hence ensuring that the filter does not degenerate as warned by Abbaszadeh et al. (2019) and Moradkhani et al. (2012)) and a customized sampling scheme described by Bulygina and Gupta (2011).The method is promising but was applied to a single catchment in the United States with results qualified as "preliminary" by Bulygina and Gupta (2011).Consequently, it does not seem applicable to a large catchment data set in an operational context.
The previous review of the literature shows important research gaps related to the improvement of rainfall-runoff model structures: • Lack of methods to improve model structure beyond trial and error of pre-defined structures: the most advanced methods currently available to identify model structures are based on trial and error of pre-defined model structures either collected from published literature or built from selected components.These approaches often lead to model equifinality where multiple structures are seen as equally applicable, which provides limited guidance for the improvement of a specific model.Consequently, the three objectives of this paper are: • Define a new approach where data assimilation is used to identify structural improvements of an existing rainfall-runoff model structure.The method should be robust and computationally tractable enough to be applicable to a large data set of catchments offering insight into regional trends in model structure updates.The method focuses exclusively on model structure improvement and not on model parameterization by assuming that the model has been calibrated prior to the structural diagnostic.The rationale behind this choice is to avoid duplicating existing research on model calibration and leave open the choice of the calibration process.• Demonstrate the improvement of model simulations using a wide range of metrics and provide a detailed diagnostic on the structural improvement to guide future model development.
• Present an example of the overall process using the GR2M monthly rainfall-runoff model, an existing data assimilation scheme (ensemble smoother (ES)) and a large data set of catchments in a region experiencing a pronounced climate change signal (South-East Australia).
The proposed method is presented in Section 2 including its objective and principles.Section 3 describes the empirical case study with the description of the GR2M model, the evaluation process, and the catchment data set.Application of the method is presented in details for one example catchment in Section 4.1 and then generalized to 201 catchments in Section 4.2 and 4.3.The strength and weaknesses of the method, the knowledge gained on the GR2M structure and potential future development of the method are discussed in Section 5. Section 6 concludes the paper.

Objective and Principles of Data Assimilation Informed Model Structure Improvement (DAISI)
The main goal of DAISI is to provide a rapid diagnostic of an existing rainfall-runoff model structure by analyzing time series of state variables generated by a data assimilation algorithm.DAISI relies on Bayesian inference but aims at providing simple diagnostics that can be used outside of a probabilistic framework.The method can be applied to a single catchment or to a large data set of catchments to obtain a more robust diagnostic on the model structure.
DAISI is based on three steps outlined in Figure 1.The method starts by assimilating observations (e.g., observed streamflow data) during a calibration period resulting in an ensemble of state variable simulations.In a second step, the assimilated states are used to update the model structure.Finally, the new model structure is run over an independent validation period similarly to a classical rainfall-runoff model (i.e., without assimilation or structure update) and compared with the original structure in terms of model behavior and performance.

Step 1: Data Assimilation
A rainfall-runoff model is a numerical solution to a set of ordinary differential equations describing the water storages and fluxes at a catchment scale.When integrated over a time step, these equations take the following form referred to as "state equations" in the rest of the paper: Where t is the time step, ũt is the input vector, xt the state vector of length V, f is a vector value function, and θ is a parameter vector assumed to be obtained from a prior calibration exercise.Vector xt includes all variables that affect the dynamic of the model such as internal stores, fluxes and model outputs (a subset of xt of length O denoted as mt ).Observed data corresponding to mt are denoted dt .The concatenation of all xt vectors for t = 1,…,T is denoted x.Similar notations are used for vectors ũ and d.
The first step of DAISI is a smoothing data assimilation algorithm that aims at estimating the probability distribution of states x given f , inputs ũ and observed data d over the whole calibration period (t = 1 : T).Among the wide range of methods described in the literature (Moradkhani et al., 2012;Van Delft et al., 2009), the linear ES introduced by van Leeuwen and Evensen (1996) is one of the simplest smoothing algorithms where model errors are assumed to be linearly related to observations and follow a Gaussian prior distribution.Despite its limitations in handling non-linear dynamics, the high computational efficiency of ES, especially its non-sequential nature, is appealing in a diagnostic tool such as DAISI.Note that the use of smoothing algorithms considering the joint uncertainty of states over the whole calibration period remains limited in hydrology, mostly due to their high computing requirements (Li et al., 2014).
The ES algorithm implemented in this paper starts by transforming model and input variables so that their distribution becomes closer to normal following Clark, Rupp et al. (2008).The transformation adopted are the log transform for rainfall, the , respectively, where r = 1,…,R.In this paper, independent perturbations are added to transformed data and input vectors as follows (Moradkhani et al., 2005;Pathiraja et al., 2016a): Where ẽd t [r] and ẽu t [r] are sampled from multivariate normal distributions.The perturbed observed vectors dt [r] are then collated into a matrix D of dimension OT × R. Subsequently, the original model is run using perturbed inputs ũt [r] as forcings to the state equations (Equation 1) combined with model errors as follows: Where x f t [r] are the perturbed states (or "forecast" states to follow the data assimilation terminology) and ẽx t [r] is the model error sampled from a multivariate normal distribution.Finally, the perturbed ensembles x f [r] are collated into matrix X f of dimension VT × R. A subset of this matrix of dimension OT × R, referred to as HX f , contains the model outputs.This perturbation scheme has been the subject of numerous studies (Gong et al., 2023;Lei et al., 2014) with potentially complex parameterization such as the data driven approach presented by Pathiraja et al. (2018c).A pragmatic approach is adopted here by using perturbations with mean 0 and fixed covariance defined similarly for the three vectors ẽd t [r] , ẽu t [r] , and ẽx t [r] as follows: Where v is either d (observations), u (inputs) or x (state variables), α e is a scaling factor set to 0.1 and Σ v is the sample covariance matrix of variable v computed from the original deterministic model simulation run over the calibration period.The value chosen for α e remains subjective and based on values generally reported for the uncertainty in hydrological data (Seo et al., 2009;Vrugt et al., 2005) where an error rate of ±10% is common.Alternative values of α e have been tested with results reported in Section S3 of Supporting Information S1.
The final step in ES is to generate a set of R ensemble vectors {x[r],r = 1,…,R} denoted as "analyzed states" forming the matrix X a containing samples from the posterior distribution p(x| ũ, d,f , θ) .This matrix is computed from X f based on the well-known Kalman gain equation detailed in Appendix A with more details provided in Moradkhani et al. (2018).
Data assimilation schemes are notoriously complex to configure with parameters that are difficult to relate to actual observations.Consequently, it is important to verify that the assimilated ensemble is statistically consistent with observed data (reliable).This consistency was measured with the normalized RMSE ratio (Fortin et al., 2014;Moradkhani et al., 2005;Thiboult & Anctil, 2015): Where m t [k,r] is the rth ensemble of the kth model output corresponding to observation d t [k] .A value of N R close to one indicates statistical reliability while N R substantially smaller or greater than 1 suggests a too wide or narrow ensemble, respectively.

Step 2: Model Structure Update
In the second step of DAISI, the analyzed states X a are used to estimate updates in the state equation (Equation 1).
A preliminary transformation of the state equations, referred to as "normalization" in the remainder of the paper, is undertaken if DAISI is applied to a large number of catchments.This normalization aims at removing site specific parameters from the state equations to allow the comparison of structural updates between catchments (see examples in Section 3.1).It is acknowledged that such normalization is not always possible, which is an important limitation of DAISI.Let us assume that the normalization of the state and input vectors is given by where ψ x and ψ u are normalization functions for states and inputs, respectively.Using this normalization, the state equation becomes: where f * is the normalized function with no dependency on rainfall-runoff model parameters θ as opposed to f .Note that if DAISI is applied to a single catchment, the normalization process is not required and ỹt and f * can be replaced by xt and f , respectively.
The fundamental concept in DAISI is to alter the state variables with an update term as follows: Where y n,t is the nth component of ỹt , ŷn,t is the updated state and δ n,t is the update term.Let us assume that a subset of V n variables, noted {y i,t } i=1,⋯,V n , affects y n,t among the full set of V state variables.The update term is computed as a quadratic form of y i,t written as: Where is η n [i] the ith update coefficient and k(i,j) = 2V n + i + ( j 1)( j 2)/ 2 is the index of cross-product terms.The coefficient vector ηn is of length L n where Equation 9 is the fundamental equation of DAISI and is referred to as the "update equation" in the rest of the paper.The form of the update term in Equation 10 was chosen because it is a non-linear function of the state variables but a linear function of the update coefficients, which greatly facilitates their estimation as discussed below.Equation 9 provides a simple way to explore alternative model structures continuously (as opposed to predefined or discrete structures) by varying the coefficients ηn .Note that Equation 10 has a similar form to the second order Taylor series expansion of the nth component of f * at the origin.Consequently, the update coefficients ηn can be interpreted as modifications of its partial derivatives up to order 2 close to the origin.Despite these attractive properties, Equation 10 does not impose any physical constraint on the update term, which can lead to non-physical values of the updated state ŷn,t+1 .This is an important limitation of the method and the price to pay for the flexibility offered by Equation 10.When running the updated model structure, the lack of physical constraints requires that checks on their bounds be placed on the updated state variables to ensure their physical realism.An example of such checks is provided in Appendix C.
If the normalized states are known, for example via data assimilation as presented in the previous section, it is possible to compute what is referred to as the "assimilated update," denoted Δ n,t [r] , for each ensemble member r: Water Resources Research 10.1029/2023WR036595 LERAT ET AL.
Where ỹt [r] is the assimilated normalized state vector from the rth ensemble member and f * n is the nth component of f * .Introducing a residual ϵ n,t [r], the assimilated update Δ n,t [r] can be subsequently combined with Equation 9as follows: Where ϵ n,t [r] is assumed to follow a normal distribution with mean 0 and standard deviation σ n .Due to the form of δ n,t [r] given in Equation 10, Equation 13 is an ordinary multivariate regression with L n predictor variables given in the right-hand side of Equation 10.It can be solved easily by Bayesian inference if non-informative priors are assumed (Gelman et al., 2013).Consequently, Equation 13 provides a way to estimate the update coefficients ηn for each ensemble member.
Summarizing, DAISI aims at estimating the distribution of ηn for each state equation given the model structure f , model parameters θ, input ( ũ) and observed data ( d) over a calibration period t = 1 : T. This probability is noted Using the posterior distribution of state variables x estimated by data assimilation presented in the Section 2.2, the distribution of ηn can be obtained by introducing x and integrating as follows: Introducing the assimilated ensemble, Equation 14 can be approximated as This paper aims at producing a deterministic modified model, which requires a single estimate of ηn .The choice made here is to compute this estimate as the expected value of P(η n | ũ, d,f , θ), denoted ηa n and computed as follows: If a noninformative prior on ηn is assumed, the integral on the right-hand side of Equation 17is the posterior mean of the coefficients in a multivariate regression which is equal to the ordinary least square solution (Box & Tiao, 2011): where Y[r] is the predictor matrix associated with assimilated ensemble r in which the columns are the L n predictor variables in the right-hand side of Equation 10.
In summary, the second step of DAISI aims at modifying the N state equations, and hence the model structure, using a multivariate polynomial regression parameterized by coefficients ηn .Expected values of these coefficients, denoted ηa n , can be estimated to obtain a single set of update coefficients.

Step 3: Model Diagnostics
Once the expected coefficients ηa n are obtained from Step 2, the model is run using the updated state equation (Equation 9), leading to modified simulated variables.It is highlighted that data assimilation and coefficient fitting are not used at this stage of DAISI and that the updated model runs exactly like a classical rainfall-runoff To answer the first question, the simulations produced by both structures are compared over a validation period using evaluation metrics.Four metrics were selected starting from the Kling-Gupta Efficiency (KGE) performance metric (Gupta et al., 2009) which summarize model performance by aggregating measures of bias in the mean, bias in variance and correlation into a single metric.KGE alone is not sufficient to assess model performance, especially on low flows (Pushpalatha et al., 2012).To assess low-flow performance, we used the Nash-Stucliffe efficiency computed on log-transform flow with an offset of 1 mm/month to handle zero values.Furthermore, following the recommendations of Refsgaard et al. (2014) in the evaluation of climate change scenario, we included the flow duration curve bias index (Lerat et al., 2020):

Water Resources Research
Where Pct( q,k) is the kth percentile of streamflow time series q, and qo and qs are the observed and simulated streamflow series, respectively.F B is equal to 1 for a perfect simulation.The fourth metric is the relative elasticity of modeled streamflow to rainfall computed as: Where p+ and p are two rainfall scenarios in which historical rainfall series p are scaled up and down by +10% and 10%, respectively.qs ( p) is the streamflow simulation obtained when forcing the model with rainfall scenario p and E[ p] is the mean value of p.The choice of 10% as a scaling factor was guided by the range of rainfall variability expected in South-East Australia (Charles et al., 2020).ϵ P is distinct from the three previous metrics because it does not compare the model with an observed reference.Comparing ϵ P between the original and updated model quantifies the impact of the DAISI structural update on climate change scenarios.This last test is important because better performance, as measured by the three previous metrics, does not guarantee that the updated model will yield significantly different climate change scenario when forced with different climatological inputs (e.g., reduced rainfall scenario).
Additional metrics including absolute bias, NSE, NSE on reciprocal flow (Pushpalatha et al., 2012), the recent PMR robustness (Royer-Gaspard et al., 2021) and split KGE (Fowler et al., 2018) metrics are included in Section S2 of Supporting Information S1.These metrics lead to similar conclusions than the three described earlier in this section.
The second element of the DAISI diagnostic explores the trends in the update term δ n,t .To visualize how state variables affect the update term, a scatterplot is generated by plotting δ n,t on the vertical axis versus the percentile rank of one of the state variables on the horizontal axis.The percentile rank is used to allow the plotting of data from multiple sites in a single plot and hence analyze regional trends.The choice of the state variable on the horizontal axis is subjective and depends on the model and state equation.All variables were trialed and the one leading to the easiest plot to interpret was retained.When multiple sites are plotted simultaneously, a curve showing the update terms are binned based on the state variable.The median, 25% and 75% quantiles of the update term are computed for each bin and added to the plot to ease interpretation.
The third element of the DAISI diagnostic aims to find dominant patterns in the functional form of the updates.Let us assume that DAISI was applied to B sites and two calibration periods.A matrix C n of size 2B × L n (L n is the number of update coefficients in Equation 10) is constructed for each state variable n by concatenating as rows all the update coefficient vectors ηa n for each site and calibration period.The influence of outliers in this matrix is tempered by clipping the values between 1 and 1.These bounds are subjective and may vary depending on the model.Dominant patterns are identified in C n through a reduced singular value decomposition (Lawson & Hanson, 1974): where A n and B n are orthogonal matrices of size 2B × L n and L n × L n , respectively, and S n is a diagonal matrix of size L n × L n containing the singular values s n,1 …s n,L n along its diagonal in decreasing order by convention.The columns of B n are referred to as singular vectors.Equation 21provides important insights into the functional form of the update.First, the components of the singular vectors are directly related to the predictor variables in the update equation (Equation 10).Consequently, each singular vector corresponds to a set of coefficients and hence to a specific update polynomial.Second, assume that the weights ω n,k are defined from the singular values as: ω n,k varies between 1/ L n and 1 and represents the total distance between the rows of C n and their projection on the kth singular vector as per the inner-product (Hastie et al., 2009).A value of ω n,1 close to 1 indicates that the rows of C n are nearly colinear with the first singular vector, suggesting that the update polynomial is similar to the first singular vector for all sites and periods.Finally, the product s n,k × A n [: ,k] , referred to as principal component k, contains the projection of each row of C n on the kth singular vector.These projections can be used to find groups of sites where the update coefficients are colinear to the kth singular vector, and hence where the update is close to the corresponding polynomial.The uncertainty in the decomposition was assessed by replicating Equation 21whilst bootstrapping the rows of C n to obtain confidence intervals on the singular vectors.

Rainfall-Runoff Model
The DAISI method is applied to the GR2M monthly rainfall-runoff model (Makhlouf & Michel, 1994;Mouelhi et al., 2006) presented in detail in Appendix B. The model runs a sequence of two stores.The first one referred to as the "production" store (S) receives rainfall (P) and potential evapotranspiration (E).It generates effective rainfall P e which is then fed to the "routing" store of fixed capacity θ r = 60 mm which, in turn, produces streamflow (Q).The model has two calibrated parameters: the capacity of the production store θ 1 (mm) and the inter-basin exchange coefficient θ 2 (-) which controls the amount of water gained or lost from the surface water catchment (Mouelhi, Michel et al., 2006).The GR2M model has four state variables listed in Table 1 with more details provided in Appendix B. In this table, variables corresponding to the end of the time step are marked with a "+".
Note that the notation is modified in the rest of the paper to improve readability by referring to specific GR2M state variables using the names indicated in Table 1 as a lower-case subscript instead of the state variable number n used in the previous sections (e.g., y s + ,t instead of y 1,t ).To further simplify notations, reference to time step is also dropped when possible.
This model was chosen because it has been applied to a wide range of catchments across the world (Ditthakit et al., 2021;Huard & Mailhot, 2008) and in Australia (Bennett et al., 2017).It also has a smooth and parsimonious structure which simplifies the application of DAISI.Finally, GR2M shares its production store with the daily GR4J model which has been applied even more widely than GR2M, especially in Australia (Hapuarachchi et al., 2022;Lerat et al., 2020).The monthly time step further simplifies the process by reducing time lags between the model variables and corresponding observations that can penalize certain assimilation schemes significantly (Li et al., 2014).It is highlighted that GR2M does not model vegetation dynamic or groundwater-surface water interaction explicitly.Both processes, often cited as important in Australia (Doble et al., 2014;Donohue et al., 2012), are assumed to be captured within the dynamic of the production and routing stores, respectively.
In this paper, the GR2M model is calibrated by maximizing the KGE (Gupta, Kling et al., 2009).The calibration algorithm is a two-step approach where 10,000 random parameter sets are first generated followed by a Nelder-Mead gradient descent (Nelder & Mead, 1965) applied to the best parameter set as detailed in Lerat et al. (2020).This configuration is referred to as "GR2M-kge" in the rest of this paper.
To assess the influence of the objective function on GR2M performance and compare it with the DAISI performance, a benchmark configuration is obtained by calibrating GR2M using the sum of squared Box-Cox transformed flows with an exponent of 0.2.McInerney et al. (2017) found that this objective function is a satisfactory compromise for a wide range of performance metrics.The function is computed as follows: where BC qt ,λ) is the Box-Cox transformation of qt with exponent λ.In the rest of this paper, the calibration of GR2M using the BC02 objective function is referred to as "GR2M-bc02."Several other benchmarks are presented in Section S2 of Supporting Information S1.
As part of DAISI Step 1, the linear ES data assimilation algorithm described in Section 2.1 was applied to GR2M using the parameters obtained from the GR2M-kge calibration and ensemble of R = 500 members.For the DAISI Step 2 (model update), the expected update coefficients introduced in Equation 17 are computed for the four state variables described in Table 1.
It is highlighted that the update terms applied to the effective rainfall (δ p e ) and the simulated streamflow (δ q ) are flux corrections.In other words, these corrections alter the way GR2M computes effective rainfall and streamflow from its inputs and internal variables.The interpretation of the update terms corresponding to the production (δ s + ) and routing store (δ r + ) is more subtle.Via rearrangement of the model equations, Appendix C concludes that the opposite of the sum δ p e + δ s + is the update term for the actual evapotranspiration, hence a correction on the evapotranspiration flux.Similarly, the opposite of the sum δ q + δ r + is the update term for the inter-basin exchange flux, hence a correction on this flux.

Model Evaluation
The GR2M model calibration and application of the DAISI method were implemented within a split-sample cross-validation scheme where the GR2M model parameters, analyzed ensembles and update coefficients are obtained using half of the total data period.These model parameters and update coefficients are then applied to the second half of the period without any use of data assimilation.Both sub-periods are subsequently exchanged.As detailed in the following section, the study region used in this paper experienced a prolonged dry period during the second half of the period known as the "Millennium drought" (Chiew et al., 2014), which leads to significantly different hydro-climate conditions between the two sub-periods.
Overall, three modeling scenarios are run for each catchment and each validation period: (a) a GR2M simulation using parameters calibrated over the independent corresponding calibration period with the KGE objective function (GR2M-kge), (b) GR2M calibrated with BC02 objective function (GR2M-bc02), and (c) the DAISI Water Resources Research 10.1029/2023WR036595 LERAT ET AL. updated model structure using the GR2M-kge parameters and update coefficients fitted for the same calibration period.

Catchment Data Set
The DAISI approach was tested on a set of 201 catchments located in South-Eastern Australia as shown in Figure 2. The hydro-climatic catchment characteristics are provided in Table 2.The data was extracted from the data sets collated by Lerat et al. (2020) including rainfall and potential-evapotranspiration data obtained from the Bureau of Meteorology Australian Water Outlook website (Frost et al., 2016) and streamflow data obtained from the Bureau of Meteorology Water Data Online website (Bureau of Meteorology, 2019).The data was collected from 1980 to 2018, split into two sub-periods 1980-1999 (Period 1) and 1999-2018 (Period 2).
In addition, two sub-groups of stations including stations located in Western Victoria (WVIC) and Northern New South Wales (NNSW) are located in Figure 2 to support the presentation of results in Section 4.
Table 2 highlights the predominance of semi-arid conditions in this data set with median runoff coefficients of 0.17 and 0.12 for periods 1 and 2, respectively.The table also reveals that Period 1 was much wetter with median runoff of 160 mm/yr against 113 mm/yr for Period 2. Figure 2 shows that the relative reduction of rainfall between periods 1 and 2 exceeds 10% for a large number of catchments.
Among the 201 study catchments, the Jamieson River at Gerrang Bridge (station id 405218) was selected as an example to illustrate the DAISI method.This catchment represents 9% of the total catchment area of lake Eildon, one of the largest reservoirs in Australia with a maximum storage capacity of 3,334 Mm 3 .Lake Eildon is a key piece of infrastructure which supports irrigation and environmental flows along the Goulburn and Murray Rivers.

Example of DAISI Workflow Applied to the Jamieson River at Gerrang Bridge
This section follows the three steps of the DAISI workflows applied to the example catchment.The parameters and diagnostic metrics for this catchment are provided in Table 3.

Step 1: Data Assimilation
Figure 3 illustrates Step 1 of DAISI by showing as orange lines the ensemble time series resulting from the ES data assimilation algorithm applied to the GR2M model calibrated using the KGE objective function (GR2M-kge) over the first period (1980)(1981)(1982)(1983)(1984)(1985)(1986)(1987)(1988)(1989)(1990)(1991)(1992)(1993)(1994)(1995)(1996)(1997)(1998)(1999).Each plot in this figure corresponds to the four states listed in Table 1.In addition, the figure shows the original GR2M-kge simulations in green along with the observed streamflow data as black dotted lines in Figure 3d.The comparison between streamflow observations (black) and GR2M-kge simulations (green) in Figure 3d highlights the systematic overestimation of low to mid flows by GR2M-kge.This overestimation is particularly pronounced between the second half of 1997 and the first half of 1998 for which GR2M-kge simulation stays above 5 mm/mth whereas observations are close to cease-to-flow conditions with values as low as 1 mm/mth.As can be seen in Figure 3d, assimilation corrects the low-flow bias of GR2M-kge by bringing the ensemble closer to streamflow observations.During high flow periods, the assimilation does not affect the simulation significantly as can be seen during the period from July to October 1998.
Assimilation impacts the GR2M routing store shown in Figure 3c in a similar way than streamflow by decreasing the store level by 5-10 mm during the low flow periods compared to the original model.Like the two previous variables, the assimilated effective rainfall (P e , see Figure 3b) is reduced during low-flow periods but remains largely unaffected during high flow periods.The assimilated production store level (S + ) shown in Figure 3a remains close to its value in GR2M-kge throughout the simulation.Overall, the effect of data assimilation decreases for state variables located further apart from streamflow within the model structure.This is expected as their correlation with observed streamflow estimated via the Kalman gain matrix is likely to decrease (see Appendix A).
The RMSE ratio metric N R reported in Table 3 measures the statistical reliability of assimilated streamflow ensembles and reaches 0.73 and 0.78 for the two calibration periods.These values are below one, which denotes an ensemble that is slightly too wide.This result is not necessarily obvious from Figure 3d alone, which highlights the importance of assessing ensemble reliability quantitatively with metrics such as N R .Similar results are obtained across the whole catchment data set; hence the discussion of this point is deferred to Section 4.2.

Step 2: Model Structure Update
In Step 2 of DAISI, the update equation (Equation 9) is fitted for each assimilated ensemble to obtain the update coefficients ηn for each state variable.The process is illustrated in Figure 4 where the fitting is undertaken using data from the first calibration period.Figures 4a and 4c show time series of the update terms corresponding to the routing store state (y r + ) and the first two ensemble members.The assimilated updates (i.e., difference between assimilated variables and values computed using GR2M original equations as defined in Equation 12) are shown as dashed orange lines while the predicted updates computed from Equation 10 are shown as plain blue lines.For both ensemble members, the predicted update captures the general trends of the assimilated updates: both updates are close to 0 during the high flow periods from July to October 1998 while being negative during earlier low flow months.However, the variability of predicted updates appears to be underestimated as can be seen during the low flow period from October 1997 to June 1998.This result reveals the limitations of the regression model used in the update equation which can only explain a part of the variability seen in the assimilated updates.Figures 4b and 4d provide a more detailed analysis of the performance of the update equation by plotting predicted (on the horizontal axis) versus assimilated (on the vertical axis) updates for the first two ensembles.In these two plots, the points appear scattered around the 1:1 line (dotted line) in the lower left part of the plot which suggests that the predicted updates exhibit large differences with the assimilated updates for low updates values.The predicted updates are much closer to assimilated updates for large updates with points clustered along the 1:1 line.Overall, the Pearson correlation between assimilated and predicted updates is close to or above 0.4 (shown in bottom right corner of the plots) indicating that the regression captures the main trends of assimilated updates but does not reach a high predictive power.Note that Figures 4a-4d are limited to the first two ensembles.Figures 4e and 4f expand the analysis by showing both assimilated and predicted updates for all ensembles.Here again, the predicted updates correlate with the assimilated updates, but lack a high predictive power.
Figure 4e also highlights the high uncertainty of assimilated updates during the low-flow period between October 1997 and June 1998 during which the updates jump from low to high values following a noisy pattern.This point illustrates the challenge of selecting a suitable update equation able to capture the important trends of the updates without reproducing its noise which is unavoidable in the presence of uncertain data and empirical model structures.
The distribution of the update coefficients for the routing store (y r + ) resulting from the fitting of the update equation is shown in Figure 5. Plots corresponding to the remaining three state variables are shown in Section S1 of Supporting Information S1.As indicated in requiring six coefficients to be fitted.The expected value of each coefficient (η a n ) is represented by a vertical blue line.The predictor variable corresponding to the coefficient in the update equation is indicated in the top left of each plot.Figure 5 reveals that most coefficients are statistically significantly different from zero with a majority of the probability mass located on either side of 0 (black vertical line), which suggests that most predictors play a significant role in the update equation.There are exceptions: for example, the distribution of coefficient η r + ,4 (Figure 5e) is centered around 0, which suggests that the y 2 p * e variable could be excluded from the update equation without much loss to its predictive power.Such predictor selection could lead to a more parsimonious update equation.It was not undertaken in this paper to keep the fitting of the update equation consistent across all sites and calibration periods.

Step 3: Model Diagnostic
Once the expected update coefficients are computed (vertical blue line in Figure 5), the updated model can be run and DAISI proceeds to Step 3. We highlight that data assimilation is not used in this step and that the update coefficients stay fixed leading to an updated model that runs exactly like GR2M.
Figure 6 shows time series of state variables for the updated model along with the update terms δ n computed from Equation 10.The data covers the last 2 years of the first calibration period and the first 2 years of the following validation period.A comparison between Figures 3 and 6 suggests that the updated model reproduces the behavior of the assimilated ensembles reasonably well.For example, it corrects the GR2M-kge overestimation of low flows with runoff simulations that are closer to observations in Figure 6d.More important, this finding also applies to the validation period, for example, between January 2001 and June 2001 in which the updated model fits the observed flow particularly well.This result demonstrates that the structural changes introduced by DAISI persist beyond the calibration period and can improve simulations during an independent validation period as confirmed by the performance metrics listed in Table 3 and discussed in the following paragraph.
Figure 6 offers further insights on the update terms shown as dotted lines.For example, Figure 6d shows that the streamflow updates δ q are negative during most of the period except around high flow peaks (e.g., September 1998 and August 1999) where the updates become positive.This means that the updated model reduces low flows and increases high flows compared to GR2M.The updates of the routing store δ r + shown in Figure 6c exhibit a much smaller amplitude than δ q , which indicates that the sum δ r + + δ q is close to δ q .As shown in Appendix C, the opposite of this sum is the update term for the GR2M inter-basin exchange flux (amount of water leaving the catchment unaccounted).Consequently, the update of the inter-basin exchange flux is close to δ q .In other words, when the updated model increases streamflow compared to GR2M, the inter-basin flux is reduced by the  same amount.Considering this explanation, Figure 6d reveals that the updated model increases the inter-basin flux during low flows (negative δ q ), perhaps to increase losses to ground water.Conversely, the flux is decreased during high flows (positive δ q ).A similar analysis is more complex for the updates related to the production store shown in Figures 6a and 6b as the two updates δ s + and δ p e are of similar magnitude.
Table 3 displays the four evaluation metrics underlying the diagnostic performed in Step 3 of DAISI.The metrics are computed for the GR2M-kge, GR2M-bc02 and the updated model (DAISI).The three performance metrics (KGE, NSElog, and F B ) indicate a significant performance improvement of DAISI compared to both GR2M configurations for both calibration and validation periods.For example, KGE increases from 0.82 and 0.65 for the two GR2M configurations to 0.87 for DAISI when calibrating on Period 1 and evaluating on Period 2, and from 0.86 and 0.74 to 0.92 when calibrating on Period 2 and evaluating on Period 1. Similar metric improvements are seen for both NSElog and F B metrics with DAISI reaching systematically higher performance.
These improvements, especially when evaluating the model outside of the calibration period, suggest that the updated model is a robust alternative to GR2M.At the same time, the modeled rainfall elasticity ϵ P is generally higher for the updated model compared to both GR2M configurations.For example, ϵ P increases from 1.75 for GR2M-kge and 1.93 for GR2M-bc02 to 1.98 for the updated model when calibrating on Period 1 and evaluating on Period 2. Note that Period 1 was significantly wetter than Period 2 with a mean annual rainfall of 1,190 mm/ year compared to 1,093 mm/year for Period 2 as indicated in Table 3, which constitutes a valuable test to explore future climate scenario that are likely to be drier than present condition in South-East Australia (Charles et al., 2020).Given that the updated model improves all performance metrics compared to both GR2M configurations, it seems reasonable to assume that these elasticities are closer to the true elasticity, and hence better suited to evaluate the impact of future climate scenario.The high elasticity computed from the updated model suggests that the variability of future runoff projections will increase significantly compared to GR2M-kge, which is an important finding in a catchment contributing to inflows into one of the largest dams in Australia.
For the sake of brevity, the presentation of other diagnostic tools introduced in Section 2.4 is not done for the example catchment.Section 4.3 presents the application of these tools to the whole catchment data set.

DAISI Evaluation Metrics Computed for 201 Catchments
Following the application of DAISI Step 1 and 2 to the 201 catchments of our data set, this section and the next present the diagnostic obtained from DAISI Step 3.
The distribution of the Normalized RMSE ratio N R measuring the statistical reliability of the assimilated ensembles is presented in Table 4 covering the 201 catchments and the two calibration periods.The 25th percentile, median and 75th percentile are 0.70, 0.81, and 0.95.These values are lower than 1, indicating that the assimilated ensemble is slightly too wide for most catchments across the data set.Section S3 of Supporting Information S1 suggests that statistical reliability of the assimilated ensemble can be improved by tuning the variance reduction factor α e in the data assimilation algorithm (see Section 2.2).Such tuning was not undertaken here to keep the assimilation scheme as simple as possible and because it does not have a significant impact on performance metrics (Supporting Information S1).Overall, this result suggests that the ES algorithm reaches reasonable performance but could be improved, which is hardly surprising considering the strong linearity assumption underlying this data assimilation algorithm.This point will be further discussed in Section 5.3.
Figure 7 presents the distribution of the four metrics computed for the 402 catchments/periods over independent validation periods.The bar plots presented in the right-hand side of each plot show the percentage of catchments/ periods where metrics for the updated model (DAISI) are larger, similar or lower than GR2M-kge and GR2M-bc02 by more than 0.05. Figure 7 reveals that the median value of KGE, NSElog, and flow duration curve bias index F B is systematically higher for the updated model compared to both GR2M configurations, which confirms the superiority of the former over the later.With KGE in Figure 7a, the increase is small between the updated model (median of 0.69) and GR2M-kge (median of 0.65).However, it is much larger when comparing the updated model against GR2M-bc02 (median of 0.52).For NSElog in Figure 7b, the increase is large between the updated model (median of 0.76) and GR2M-kge (median of 0.68) but insignificant between the updated model and GR2M-bc02 (median of 0.75).F B shown in Figure 7c follows a similar pattern than NSElog.In terms of pairwise comparison, the updated model always obtains similar or better performance than the best of GR2M configuration for a majority of catchments and periods.For example, DAISI reaches a KGE that is significantly better than GR2M-kge in 43% of catchments/periods and similar to GR2M-kge in 46%.Against GR2M-bc02, these figures reach 65% and 20% of catchments/periods.All other metrics provided in Section S2 of Supporting Information S1 confirm these results showing that DAISI leads to a consistent and reliable improvement of performance compared to GR2M across all flow regimes.Even if the performance improvement is modest for certain metrics (e.g., KGE metric when comparing against GR2M-kge), the number of catchments where DAISI is worse than GR2M remains limited which suggests that the updated model does not introduce major structural trade-offs (e.g., favoring a certain type of catchments against another).
Overall, the updated model combines the strength of both GR2M configurations by equaling or exceeding their combined maximum for all performance metrics.This is an important result as it suggests that the DAISI structural updates surpass the performance obtained from alternative objective functions.
The fourth evaluation metric is the elasticity of modeled runoff to rainfall shown in Figure 7d.The median elasticity is 2.51 for GR2M-kge which increases to 2.73 for the updated model.Pairwise comparisons confirm this result with updated model elasticity being significantly higher than GR2M-kge in 71% of the site/periods.Comparing the updated model against GR2M-bc02 reveals that both models reach similar elasticity values with equal proportions of sites/periods where one is greater than the other.However, Section S4 of Supporting Information S1 shows that when using BC02 as an objective function, DAISI obtains a significantly higher elasticity on a majority of sites/periods (median elasticity of DAISI reaches 2.96 in this case).Consequently, it can be said that DAISI generally leads to higher elasticity values across the catchment data set.Considering that the updated model obtains better or equal performance than GR2M for most performance metrics, the elasticity from the updated model is very likely to be closer to reality than the GR2M elasticity.
Figure 8 explores the evaluation metrics further by showing the spatial distribution of metric averages between the two validation periods for each catchment.The first column of the figure shows the metrics for GR2M-kge, the second the metrics for the updated model and the last column the difference between the two.Figures 8a-8c corresponding to the KGE metric reveal that there are strong spatial trends in the performance improvement brought by DAISI which is mostly occurring in catchments located in the Western part of the state of Victoria (WVIC, see Figure 2 for location of this region) and the North-Eastern part of the state of New South Wales (NNSW).For these two regions, KGE improvements are greater than +0.10 (dark green triangles in Figure 8c).Improvement of rainfall-runoff model performance in the WVIC region is important because this region has been reported to suffer from strong rainfall-runoff non-stationarity with long lasting effects from recent drought (Peterson, Saft et al., 2021).Conversely, KGE values for the catchments located in the center of the domain (Eastern Victoria) are comparable between GR2M and the updated model (white dots).A closer inspection of Figures 8a and 8b reveals that GR2M reaches its highest KGE values in these catchments (dark blue points in Figure 8a).As GR2M simulations are of high quality there, it is difficult for DAISI to improve performance significantly.Nonetheless, it is important to note that DAISI does not degrade performance in this region.
The spatial distribution of performance differences for NSElog and F B metrics resembles the one of KGE as can be seen in Figures 8f and 8i.The updated model improves performance over GR2M in the WVIC and NNSW regions with limited gains in the central region.The rainfall elasticity follows the same spatial pattern with higher elasticity for the updated model compared to GR2M in WVIC and NNSW.It is worth noting that the GR2M elasticity in the WVIC and NNSW varies between 2.50 and 3.25 (light to dark blue points in Figure 7d) which increases by up to +0.50 with the updated model (dark green triangles in Figure 8l).This represents an increase in elasticity of 15%-20%.

DAISI Model Structure Diagnostic for 201 Catchments
The previous section confirmed the diagnostic undertaken in the example catchment which concluded that the updated model is a better alternative to GR2M over the catchment data set considered in this paper.Building on this result, the second part of the DAISI Step 3 diagnostic can be undertaken using plots described in Section 2.4. Figure 9 shows scatter plots of the update terms on the vertical axis.The horizontal axis displays percentile ranks of y s (production store) for the first two rows (Figure 9a-9d) and of y r (routing store) for the last two rows (Figures 9e-9h).
The streamflow updates (δ q ) shown in Figures 9g and 9h are similar for both regions with an update that is close to 0 for very low routing store levels, then decreasing to a median value of approximately 0.01 for percentiles of y r up to 0.7.Above this value, the streamflow update increases rapidly with y r reaching a positive median greater than +0.02 close to the maximum of y r .This pattern explains why the updated model improves performance on low flows measured by the NSElog metric discussed in the previous section by lowering simulated flows when the routing store is low, and hence correcting the tendency of GR2M to overestimate low flows (see example in Figure 3).The positive update seen for high values of y r leads to an increase in streamflow values if y r is high, explaining the modest increase in mid to high flow performance measured by the KGE metric.
The routing store updates (δ r + ) shown in Figures 9e and 9f are of much smaller magnitude than the streamflow updates.This suggests that the sum δ r + + δ q is largely dominated by the latter which, as explained in Section 3.1 and Appendix C, implies that the update term on the inter-basin exchange term (water gained or lost from the surface water catchment) is approximately equal to δ q .In other words, when the updated model increases streamflow by δ q , it decreases the exchange flux by δ q .
The behavior of the streamflow and routing store updates appears similar in NNSW and WVIC regions as can be seen by comparing Figure 9e with Figure 9f.Conversely, the updates on the production store δ s + and effective rainfall δ p e reveal a striking difference between the NNSW and WVIC regions.In NNSW region, variations of δ s + (Figure 9a) are negligible compared to those of δ p e (Figure 9c) but in WVIC region, they are of similar magnitude (Figures 9b and 9d).Based on Section 3.1 and Appendix C, this suggests that δ s + + δ p e ≈ δ pe in NNSW, and hence that the update on actual evapotranspiration is close to δ p e .In WVIC we can assume that δ s + ≈ δ p e hence that the update on the actual evapotranspiration is approximately 2δ p e .Consequently, the structural update affects the   actual evapotranspiration twice as much in WVIC as NNSW.This is an important finding to improve the representation of evapotranspiration in the model depending on the modeling region.
To go beyond the previous qualitative analysis of the structural updates, Figure 10 presents the singular value decomposition of the update coefficient matrix introduced in Equation 21.The results for streamflow (last row in Figure 10) are the easiest to interpret and are commented first.Figure 10j shows the components of the first two singular vectors along with their confidence intervals and weights (Equation 22) in the legend.The total weight for these two vectors is 0.93 (sum of 0.73 and 0.20), which is close to the maximum of 1 and indicates that the update coefficients are well approximated by linear combinations of these two vectors.This is a useful result because it suggests that the polynomial used to correct streamflow variable in the update equation (Equation 9) can be described accurately across the whole data set with two degrees of freedom only instead of 6 coefficients used in Equation 10.Furthermore, the singular vectors show narrow confidence intervals in Figure 10j which suggests that they are not influenced by the catchment selection and potentially applicable to a wider range of catchments.As seen in Figure 10j, the largest absolute values of the first vector components are associated with variables y r ( 0.24), y 2 r (0.76) and y r × y p e ( 0.60).As a result, the first singular vector corresponds approximately to the following update polynomial for the streamflow state variable: For a fixed value of y p e , this polynomial is equal to 0 when y r = 0, then decreases with y r to reach a minimum and finally increases with y r .This analysis explains the patterns seen in Figures 9g and 9h and precises the structural diagnostic by clarifying the role of y p e which was not apparent in Figure 9.In this discussion, the exact numerical values of the coefficients in Equation 24are less important than the functional form of the update which narrows considerably the type of form to be considered for future model improvement.In addition, Figure 10k shows that the first principal component exhibits strong regional trends with negative values for catchments in the WVIC and NNSW regions (dark purple triangles) and mild positive values in the central region (light green triangles).Consequently, the update equation which is mostly explained by this component exhibits a clear spatial pattern which could be used to define different update of the state equations in these regions.
The singular value decomposition corresponding to the R + state variable follows similar patterns than Q.The sum of the weights for the first two singular vectors is 0.95 (sum of 0.77 and 0.18, see legend in Figure 10g) which means that the update equation can be represented accurately by linear combinations of two vectors only in most catchment across the data set.The polynomial and regional trends (see figures Figures 10h and 10i) associated with the singular vectors are similar to the ones for Q.
The results for the S + (Figures 10a-10c) and P e (Figures 10d-10f) variables are less clear: the weights associated with the first two singular vectors are significantly lower than 1 (sum of 045 + 0.22 = 0.67 in Figure 10a and 0.40 + 0.32 = 0.72 in Figure 10c).For these two states, the low values of the weights reveal that the functional form of the update is more complex than linear combinations of the first two singular vectors in most catchments.
In addition, the confidence intervals of the singular vector shown in Figures 10a and 10c are relatively wide, suggesting that the component are affected by the catchment selection, and hence less likely to generalize beyond our data set.The most striking element visible in Figure 10b is the strong regional trend shown by the first principal component for S + .The component clearly differentiates the catchments located in the WVIC (positive component values) from the ones in the NNSW region (negative component values).Here again, these conclusions suggest that the improvement in the state equations are region specific.
Overall, the DAISI diagnostic identified several directions to guide future improvement of the GR2M model including elements related to parameterization of the update and its regional trends.A summary of these directions is provided in Section 5.2.

Advantages and Limitations of DAISI
Most existing methods used to improve model structures are based on trial and error using a finite set of structures that is arbitrarily selected by the modeler.Compared to this discrete approach, the first advantage of DAISI is that the exploration of alternative model structures is driven by data through data assimilation (DAISI Step 1) and   22) for the four state equations (rows) and for the 201 catchments with data pooled from the two calibration periods.The plots in the first column show the components of the first two singular vectors along with their 90% bootstrap confidence intervals.The plots in the second and third columns show the projection of the update coefficient vectors for each site and calibration period on the first (second column) and second (third column) principal component, respectively.
fitting of the update equation (DAISI Step 2).This process can identify modeling solutions that were not considered a priori due to the complexity of formulating multi-dimensional state equations to represent physical processes that are often poorly quantified at the catchment scale (e.g., see discussion about the difficulty to close mass balance by Safeeq, Bart et al., 2021;Huang, Wang et al., 2023).DAISI also offers an alternative to trial and error by considering a continuum of model structures generated via the update equation (Equation 9).The results presented in Sections 4.2 and 4.3 show that the updated equation improves performance significantly compared to the original model including for the simulation of contrasting hydro-climatic conditions, and converges to a reduced number of update configurations with clear regional patterns.At the same time, DAISI does not lose the potential for interpreting model equations based on a physical system understanding, which is the main issue with most machine learning approaches.
The second advantage of DAISI is its generic nature.The first two steps of DAISI, that is, data assimilation and fitting algorithms, are mostly independent from the model structure and observed data.The only model specific element in DAISI is the normalization of state equations used to remove the influence of model parameters.This normalization is needed to compare the structural updates across different sites as is done in Section 4.3.However, we point out that it is not compulsory if the focus is on a single site or if the model parameters are identical across sites (e.g., when using a landscape model with same parameter values across a region).Consequently, DAISI is a general method that could be used to guide improvement for a wide range of models.This opens opportunities for applying DAISI beyond catchment hydrology, for example, to landscape models (Yan et al., 2023).If run at point scale, these models can assimilate point measurements such as soil moisture as shown by Fox et al. (2018), allowing DAISI to be applied if updates of state equations can be inserted in the model code.In addition, DAISI can be used to incorporate observed data beyond the traditional climate inputs used in empirical rainfall-runoff models.This has been attempted (e.g., accounting for artificial storage in the GR4J lumped model by Payan et al., 2008) but remains a difficult exercise because model states in this type of model rarely correspond to observed data.In contrast, DAISI can incorporate additional data seamlessly via either the data assimilation algorithm by expanding the observed data vector d (see Section 2.2) or by adding a predictor to the update equation.This point is further discussed in Section 5.3.
Finally, DAISI is a flexible and modular method where each step is independent from the others.For example, the aim of data assimilation in Step 1 is to evaluate the conditional probability of states given input and observation data via an ensemble.In this paper, the linear ES described in Section 2.2 is used because of its limited computing requirements.However, any algorithm generating similar outputs more accurately could be used, which is discussed further in Section 5.3.Once an ensemble of states is generated, DAISI Step 2 fits the update equation to each assimilated ensemble.Here again the approach presented in this paper was chosen because of its parsimony and closed form solution but could be replaced with more powerful fitting techniques.
Despite the qualities highlighted above, the first obvious limitation of DAISI is that it requires an existing model structure to apply the update equation.Early attempts (not shown) of removing the existing state equation (f * n ) from Equation 9and creating a fully data-driven model structure led to poorer performance than the original model, which highlighted the difficulty of producing a model structure without a strong prior knowledge.However, relying on an existing model has benefits including the possibility to remove the structural update completely and revert to the original model if needed.Such a case is discussed in Section 4.2 where the GR2M model was seen to perform well in the central region of our modeling domain leading to structural update becoming negligible (see Figures 8c, 8f, 8i and 8l).In other words, the updated model identified by DAISI is unlikely to suffer from large reduction of performance against the original structure.
The second limitation of DAISI is the reliance on fixed model parameters obtained from a previous calibration exercise.A simple solution to overcome this limitation is to include parameters in the assimilated variables using the "state augmentation" technique (Pathiraja et al., 2016a;Vrugt, Diks et al., 2005).This approach was investigated (not shown) but did not lead to significant differences in both performance and diagnostic.Another more radical approach would be to repeat the whole DAISI process using parameters calibrated with different objective functions.This is done in Section S4 of Supporting Information S1 where the GR2M model is calibrated using a box-cox transformed sum of squared errors following McInerney et al. (2017).This exercise confirms that DAISI improves average performance for all metrics considered compared to GR2M but reveals that the largest improvements are obtained for different metrics compared to the ones identified in Section 4.2.This can be explained by the fact that the choice of objective function specializes the model in the simulation of a particular Water Resources Research 10.1029/2023WR036595 LERAT ET AL. streamflow regime (e.g., KGE focuses on mid to high flows).Within DAISI, data assimilation and structural updates correct the largest errors, most likely outside of this streamflow range, and consequently improve the corresponding performance metrics (e.g., low flow metrics when calibrating the model against KGE).Despite these performance differences, the results shown in Section S4 of Supporting Information S1 suggest that a change in objective function did not affect most DAISI diagnostic plots, especially the singular value decomposition shown in Figure 10, revealing that the choice of objective function may not be a critical factor in the DAISI diagnostic.These results are encouraging but it is acknowledged that more research is needed to formally incorporate parameter uncertainty in DAISI.
As indicated in Section 2.1, DAISI is a statistical method that aims to identify improved model structures that are deterministic and run like a classical rainfall-runoff models.To achieve this, the uncertainty in the update coefficients is not used in DAISI diagnostic, and only their expected value is retained through Equation 17.This choice was motivated by the need to improve rainfall-runoff modeling in climate change studies in which probabilistic hydrological modeling is seldom used.More fundamental research could explore this aspect further to combine climate scenario uncertainty with model structure uncertainty and obtain a more realistic range of water resources projections.
Finally, the simplicity of both the data assimilation and fitting algorithms used in this paper is another limitation of DAISI which may constrain its performance.As indicated above, the data assimilation algorithm could be replaced with more flexible approaches.Regarding the fitting algorithm, the lack of physical constraints is an important issue because it leads to update terms that are potentially non-physical (e.g., negative streamflow) and requires truncation when running the updated model as shown in Appendix C. Extensive checks on modeled time series such as the ones presented in Figure 6 along with the computation of multiple performance metrics reported in Section 4.2 did not reveal any obvious non-physical behavior of the updated model.This is likely to be due to the small amplitude of the updates compared to original model values which rarely leads to exceeding physical constraints.

What Have We Learned About the GR2M Model?
The DAISI method applied to GR2M, and more specifically the diagnostic conducted in Step 3, identified several elements to guide further improvement of this model.First, extensive analysis of performance metrics computed over a period independent from the calibration period concluded that the updated model improves all metrics, especially the ones related to low flow simulations.The updated model also increases the elasticity of modeled streamflow to rainfall significantly compared to GR2M-kge, which suggests that structural updates produce a more robust model for modeling future streamflow under climate change.
Second, clearly defined structural updates are found for the lower parts of the model including the routing store (R + ) and simulated streamflow (Q).Streamflow values are altered in the updated structure by reducing mid-range values (negative update) while increasing high values (positive update) following a form similar to the polynomial of Equation 24.The update for the routing store resembles the ones for streamflow but is of much lower magnitude, which lead to the conclusion that the updated model redistributes fluxes between streamflow and the inter-basin exchange (flux entering or leaving the surface water catchment).More precisely, the exchange flux is increased for low to mid-levels of the routing store and decreased for high levels of the routing store.Overall, these findings point to the need to modify the partition between streamflow and exchange flux in GR2M and relate this partition to the routing store level.
Third, the structural updates are less pronounced for the upper parts of the model including the production store (S + ) and effective rainfall (P e ).The updates reduce both variables for mid-range levels of the production store while leaving them unaffected for very low and very high values of the store.
Fourth, there are two regions where the structural updates in the production store differ significantly.In the Northern part of the state of New South Wales (NNSW), the updates of the effective rainfall are of much larger magnitude than updates of the production store level, which suggests that the updated structure introduced a redistribution of flux between effective rainfall and actual evapotranspiration compared to GR2M.The rainfall regime in this region is one of the most extreme in Australia, suggesting that infiltration excess runoff might be a dominant process there.This could explain why model updates favor an increase in effective rainfall.
In the WVIC, a similar flux redistribution is observed, but the modification in actual evapotranspiration is found to be approximately twice the change in effective rainfall.This more aggressive redistribution is likely to reduce production store level in this region, which is a recommendation formulated by Fowler et al. (2020) while investigating the cause for poor performance of rainfall-runoff models in this region.Western Victoria has experienced profound hydrological changes due to recent droughts that are still the topic of active investigation (Fowler et al., 2022).One of the plausible explanations is the persistence of high evaporative demand during dry periods, perhaps due to deep rooted vegetation response.This could explain the increase in actual evapotranspiration resulting from model updates in this region.Despite all these findings, it is acknowledged that the updated model generated by DAISI remains heavily parameterized as it depends on the two original GR2M parameters and 32 update coefficients (see Table 1).Incorporating the finding identified above into a more compact structure constitutes a logical follow-up of the work presented in this paper.

How Can DAISI Be Improved?
This paper presented a first version of the DAISI method.As mentioned in the previous sections, it is currently limited by the simplicity of the data assimilation in Step 1 and fitting algorithm used in Step 2.More flexible assimilation algorithms, such as ensemble particle filter (Moradkhani et al., 2005;Van Delft et al., 2009) could improve the quality of assimilated ensembles and allow the identification of more robust structural updates.For example, the data driven approach presented by Pathiraja et al. (2018c) constitutes a promising extension of the perturbation schemes used in the DAISI assimilation algorithm.In addition, the ES data assimilation algorithm used in this paper is applied independently in each catchment, hence neglecting spatial correlation that is likely to exist between observation errors in neighboring catchments.ES was originally designed by van Leeuwen and Evensen (1996) to assimilate observations in large gridded models which suggests that DAISI could be extended to a multi-site approach where streamflow from all catchments are assimilated jointly.Tian et al. (2022) showed an example of such approach with a spatially explicit error covariance matrix used to assimilate streamflow in a gridded landscape model.A multi-site configuration will be critical if DAISI is applied to areas where streamflow measurements are limited, hence reducing the effectiveness of data assimilation within DAISI.
As indicated in Section 5.1, the fitting of the update coefficients lacks physical constraints.This could be addressed by replacing the ordinary least squares solution introduced in Equation 18 by a Bayesian regression with a censored predictand defined according to physical constraints (see for example the model developed by Wang et al., 2009).However, such statistical models generally lack a closed form solution and rely on sampling methods, which would increase the computing time of DAISI significantly (the fit must be repeated for each assimilated ensemble).
Improving the current algorithms in DAISI as described above is important, but we believe that greater benefits would come from including more observed data.As mentioned in Section 5.1, DAISI is flexible enough to incorporate additional observed data in the assimilation algorithm or in the fitting of the update coefficients.In South-East Australia, evapotranspiration has a significant impact on runoff which is expected to grow in future climate (Fowler et al., 2020).Adding in-situ or remotely sensed actual evapotranspiration data to DAISI is possible and could lead to improvement in rainfall-runoff model structures for simulating both runoff and evapotranspiration.
Finally, it would be useful to extend the application of DAISI to daily models (e.g., GR4J) to confirm that the method can be applied to more complex structures and in the presence of delayed response.

Conclusion
This paper introduced the Data Assimilation Informed model Structure Improvement (DAISI) method which aims at analyzing and improving a hydrological model structure by combining the ES data assimilation algorithm with polynomial updates applied to the model state equations.The method is generic, modular and was demonstrated with an application to the GR2M monthly rainfall-runoff model and a data set of 201 catchments in South-East Australia.
The results show that the updated model generated with DAISI reaches higher median performance across the catchment data set for all metrics considered including KGE, NSE on log transform flow and flow duration curve bias.Performance improvement is largest for metrics measuring low flow performance such as log NSE where the updated model produced significantly higher performance scores.In addition, the elasticity of modeled runoff to rainfall was shown to increase from a median of 2.51 for GR2M to 2.73 for the updated model, which is closer to the observed data, suggesting that the structural changes will lead to more robust modeling of future streamflow under climate change.Finally, the DAISI diagnostic identified a reduced number of update configurations in the GR2M structure with clear regional patterns.These configurations correspond to specific polynomials of the inputs to the state equations that could form the basis for the definition of improved equations in a revised model.The regional patterns suggest that the structural updates correspond to distinct functions in three sub-regions of the modeling domain (WVIC, central region, and NNSW).
Several avenues for improvement were proposed starting with the incorporation of additional observed data in DAISI (e.g., actual evapotranspiration) to better constrain internal model variables.Other proposed improvements include the incorporation of parameter uncertainty and the testing of DAISI for more complex model structures or shorter simulation time steps.P e = P + S S 1 + S 2 S + (B4) R 2 = θ 2 (R + P e ) (B5) The four state equations listed in Table 1 correspond to equations Equation B3 (production store), Equation B4(effective rainfall), Equation B6 (routing store) and Equation B7 (streamflow).Dividing both sides of Equation B1 by X 1 leads to a form of the production store equation that is independent of θ 1 : y s 1 = tanh y p ) + y s 1 + tanh y p ) y s (B8) Where y s 1 = S 1 /θ 1 ,y s = S/ θ 1 ,y p = P/θ 1 .The same approach can be applied to equations Equations B2-B4, suggesting that one can obtain transformed state equations for states S + and P e that are independent of θ 1 when introducing the normalized variables y s = S/θ 1 ,y s + = S + /θ 1 ,y p e = P e /θ 1 ,y e = E/ θ 1 .Using such variables leads to the first two transform state equations: y p e = y p + y s y s 1 + y s 2 y s + (B10) where y s 2 = y s 1 1 tanh y e )) 1 + 1 y s 1 ) tanh y e ) (B11) Similar approach can be used for equations Equations B5-B7 by introducing y r = R θ 2 θ r , y r + = R + θ r , y p * e = P e θ 2 θ r ,y q = Q θ r .Using these variables, the two states equations Equations B6 and B7 become independent from parameter θ 2 and constant θ r as follows: Overall, Equations B9, B10, B12, and B13 constitute the four normalized state equations of GR2M.
It is worth noting that the state variables mentioned in Equations B1-B7 do not include actual evapotranspiration and inter-basin exchange (flux gained from or lost to neighboring catchments, see extensive discussion about this flux by Mouelhi et al. (2006)).The reason for this omission is that the variables listed above are sufficient to describe the model dynamic completely.In the case of the production store for example, once the store level at the start (S) and end (S + ) of the time step are known along with the effective rainfall (P e ), the actual evapotranspiration AE can be computed as a mass balance residual equal to In the right-hand side of this equation, all terms except the last one are derived from the GR2M structure while the last term is related to the update terms.Consequently, the opposite of the sum δ s + + δ p e can be considered as the update term for actual evapotranspiration.Similar manipulations for the routing store equations lead to F = R + P e θ r f r ŷr , ŷ * p e ) + f q ŷr , ŷ * p e )) + θ r ( δ r δ q ) (C13) As a result, the opposite of the sum δ r + δ q can be considered as the update term for the inter-basin exchange flux.
The findings derived from Equations C11 and C13 offer a way to relate the four update terms to actual evapotranspiration and inter-basin exchange flux.
last step of DAISI compares the original and updated model by answering three questions: (a) Is the updated model a robust alternative to the original model?(b) What is driving the updates?(c) Are there dominant functional forms of the update?

Figure 2 .
Figure 2. Site locations and relative change in rainfall between the two calibration periods.

Figure 3 .
Figure 3. Data Assimilation Informed model Structure Improvement Step 1-GR2M-kge variables (green dashed line) and assimilated ensembles (orange lines) for the Jamieson River at Gerrang bridge catchment.The plot covers the last 2 years of the first calibration period (1980-1999).Plots (b) and (d) use a log scale for the vertical axis.The black dotted line in plot (d) is the observed streamflow.

Figure 4 .
Figure 4. Data Assimilation Informed model Structure Improvement Step 2-Assimilated ( Δr + ) and predicted ( δr + ) update terms for the routing store level (R + state variable) and the first two assimilated ensemble members in plots (a) and (b).Plots (b) and (d) show the predicted versus assimilated update for the same ensembles along with the Pearson correlation coefficient between assimilated and predicted updates shown in the lower right corner of each plot.Updates from the 500 ensemble members are shown in plot (e) and (f).Data relates to the Jamieson River at Gerrang Bridge catchment and the first calibration period (1980-1999).

Figure 6 .
Figure 6.Data Assimilation Informed model Structure Improvement Step 3-Updated model simulations (blue line) and update terms (dashed blue line using secondary vertical axis) for the Jamieson River at Gerrang Bridge catchment.The plot covers the last 2 years of the first calibration period and the first 2 years of the second validation period.Plot (b) and (d) use a log scale for the vertical axis.The black dotted in plot (d) is the observed streamflow.

Figure 7 .
Figure 7. Data Assimilation Informed model Structure Improvement (DAISI)Step 3 evaluation metrics for GR2M calibrated using the Kling-Gupta Efficiency objective function (green), GR2M calibrated using the BC02 objective function (gray) and DAISI updated model (blue) for the 201 catchments.Metric values are computed for the two validation periods for each catchment.The bar charts on the right of each plot indicate the percentage of catchments/periods where DAISI is lower/similar/ higher or worse/similar/better than GR2M-kge and GR2M-bc02.

Figure 8 .
Figure 8. Spatial distribution of the four metrics for GR2M (first column) and updated model (Data Assimilation Informed model Structure Improvement [DAISI], second column) over the 201 catchments.Metric values are computed from and averaged over the two validation periods for each catchment.The difference between DAISI and GR2M metrics is shown in the third column with green upper pointing (pink lower pointing) triangles showing catchment with better (worse) performance for DASI versus GR2M.

Figure 9 .
Figure 9. Data Assimilation Informed model Structure Improvement Step 3-model structure diagnostic diagrams for four state equations (rows) and for catchments in the Northern New South Wales (first column) and Western Victoria (second column) regions.Data are from both calibration periods.The plots show the update term δ n on the vertical axis.The horizontal axis shows the percentile rank of y S for the first two rows and y R for the last two rows.Medians (black line), 25% and 75% percentiles (dotted lines) of the update term are computed by binning the data according to the variable on the horizontal axis.

Figure 10 .
Figure 10.Data Assimilation Informed model Structure Improvement Step 3-Reduced singular value decomposition of update coefficient matrices (see Equation22) for the four state equations (rows) and for the 201 catchments with data pooled from the two calibration periods.The plots in the first column show the components of the first two singular vectors along with their 90% bootstrap confidence intervals.The plots in the second and third columns show the projection of the update coefficient vectors for each site and calibration period on the first (second column) and second (third column) principal component, respectively.
The three steps of the Data Assimilation Informed model Structure Improvement method.
Box-Cox transform with a 0.2 exponent for streamflow following McInerney et al. (2017), while other variables are left untransformed.Subsequently, ES perturbs observed data, input andFigure 1. LERAT ET AL. state variables to obtain R ensembles for each time step t, noted dt [r] , ũt [r] , and xt [r]

Table 2
Hydro-Climatic Characteristics of the 201 Case Study Catchments

Table 3
Model Parameters and Metrics for the Jamieson River at Gerrang Bridge Note.Numbers highlighted in bold correspond to metrics computed over a validation period.LERAT ET AL.

Table 4
Distribution of Normalized RMSE Ratio (N R ) Computed From Assimilated Ensembles (Data Assimilation Informed Model Structure Improvement Step 1) Over the 201 Test Catchments and the Two Calibration Periods LERAT ET AL.