Bayesian parameter inference in hydrological modelling using a Hamiltonian Monte Carlo approach with a stochastic rain model

. Conceptual models of the rainfall-runoff behaviour of hydrological catchments have proven to be useful tools for making probabilistic predictions. However, model parameters need to be calibrated to measured data and their uncertainty quantified. Bayesian statistics is a consistent framework for learning from observed data, in which knowledge about model parameters is described through probability distributions. One of the dominant sources of uncertainty in rainfall-runoff modelling is the true rainfall over the catchment, which often needs to be inferred from a few rain-gauge and runoff measurements. Mod-5 elling this uncertainty naturally leads to stochastic differential equation models, which render traditional inference algorithms such as the Metropolis algorithm infeasible due to their expensive likelihood functions. Therefore, in hydrology and other applied fields of research, error models are traditionally oversimplified for ease of inference as additive errors on the output, leading to biased parameter estimates and unreliable predictions. However, thanks to recent advancements in algorithms and computing power, full-fledged Bayesian inference with stochastic models is no longer off-limits for hydrological applications. 10 We demonstrate this with a case study from urban hydrology, for which we employ a highly efficient Hamiltonian Monte Carlo inference algorithm with a time-scale separation.

Uncertainty in rainfall-runoff hydrological modelling arises mostly from input errors associated with an inaccurate estimation of the integrated rainfall over a catchment (Kavetski et al., 2006).These input errors, typically due to a combination of heterogeneous rainfall, sparse rain-gauge measurements and insufficient temporal resolution (McMillan et al., 2011;Renard et al., 2011;Ochoa-Rodriguez et al., 2015), can seriously deteriorate the quality of model calibration results for heavily inputdriven hydrological systems (Bárdossy and Das, 2008).A variety of stochastic weather generators (Rodriguez-Iturbe et al., 1987;Cowpertwait et al., 1996;Deidda et al., 1999;Paschalis et al., 2013;Langousis and Kaleris, 2014) have been proposed to simulate precipitation with its uncertainty.Although such weather generators can provide uncertain inputs to rainfall-runoff models, and therefore reproduce the effect of rainfall errors on runoff predictive uncertainties, such input uncertainties have been largely neglected in studies focusing on model parameter inference (Sikorska et al., 2012), probably because of the computational difficulty of including them in a likelihood function (Honti et al., 2013).Input uncertainties should be included directly in the input as a stochastic contribution, then transported through the model and thereby naturally incorporated in the likelihood function describing the probability distribution of observations given model parameters.In the Bayesian framework, the sought posterior probability distribution for the model parameters is proportional to the product of the likelihood function and the prior probability distribution describing our prior knowledge about model parameters.However, it is still common practice in hydrology, as well as in other applied disciplines, to consider an over-simplified error model based on likelihood functions defined as uncorrelated normal distributions centred on the outputs of a deterministic model (Yang et al., 2008;Reichert and Schuwirth, 2012;Sikorska et al., 2012).This inevitably leads to biased parameters and unreliable predictions (Renard et al., 2011;Honti et al., 2013;Del Giudice et al., 2015).Although so-called rainfall multipliers can mitigate this problem (Kavetski et al., 2006;Sun and Bertrand-Krajewski, 2013), they fail in assessing input uncertainties when a rainfall event is not detected by the available rain-gauges (Kavetski et al., 2006;Renard et al., 2011).
Del Giudice et al. (2016) proposed a method based on an input uncertainty model describing the rainfall as a continuous stochastic process.The method is called SIP (acronym for Stochastic Input Process).The idea of describing selected model parameters or inputs as stochastic time-dependent processes in order to take intrinsic uncertainties realistically into account in hydrological modelling has gained momentum in recent years (Tomassini et al., 2009;Reichert and Mieleitner, 2009;Reichert et al., 2021;Bacci et al., 2022).The SIP technique uses i) possibly inaccurate rain-gauge precipitation data, ii) runoff data from a flowmeter at the catchment outlet, iii) a hydrological runoff model, iv) a rainfall model in the form of a transformed stochastic Ornstein-Uhlenbeck (OU) process, v) models for rainfall and runoff observation errors, and vi) prior distributions, to infer in a Bayesian manner both the marginal posterior distributions for the parameters of interest and a "true" spatially-integrated average rainfall over the catchment.The SIP method uses the catchment as an additional rain-gauge to gather information about a catchment-averaged real rain, which is inferred from both prior knowledge and observations of rainfall and runoff.Both the hydrological model describing the catchment dynamics and runoff observations are assumed to be accurate in comparison to the rainfall data.The inferred rainfall pattern can then be used to calibrate the model and significantly reduce the bias in the estimated parameters, despite the inaccuracy of the rainfall data.As :::::::: However, :: as : described in detail in Del Giudice et al.
The HMC method bears valuable advantages with regard to both generality and efficiency.Indeed, it is by no means limited to an OU process, unlike the original SIP approach of Del Giudice et al. (2016), which strictly requires a linear stochastic process as a rain generator.Although in this study we also opt for an OU process for the sake of simplicity, it should be clear that such process could be arbitrarily replaced by any other stochastic process, thus giving the method significantly more flexibility in reproducing the statistical properties of real rainfall.The HMC method is also not at all limited to urban hydrology, and could be applied for instance in natural catchment hydrology as well.The specific case study presented here was chosen to put emphasis on input uncertainties, which often represent the biggest source of uncertainty in hydrological modelling in general.More information on possible hydrological applications can be found in Del Giudice et al. (2016).Moreover, the HMC algorithm allows us to sample from the posterior probability density both model parameters and a true averaged rainfall pattern that are simultaneously compatible with data, models and prior distributions.This yields very high acceptance rates even in the context of expensive high-dimensional problems, with great benefits in terms of performance and efficiency of the algorithm.Another interesting method from the family of particle filters to tackle high-dimensional inference problems for stochastic model calibration is Particle Markov Chain Monte Carlo (PMCMC) (Andrieu et al., 2010), which combines piece-wise forward simulations of the stochastic model with data-based importance sampling.Like HMC, PMCMC methods can be applied to any stochastic process, including (unlike HMC) processes with discrete states (e.g., numbers of organisms in ecological models).A generally low implementation effort could be an additional argument in favor of PMCMC algorithms, which do not require the differentiation of the posterior distribution like HMC methods.However, PMCMC methods may suffer significantly in terms of efficiency when compared to an HMC-based approach.A detailed comparison, based on the same case study presented here, of different methods for Bayesian inference with stochastic models, can be found in ?.
Hydrology can potentially take great advantages from more realistic stochastic models and a fast and reliable method for their calibration.Bayesian inference methods bear the great advantage over traditional optimization algorithms of providing an uncertainty estimation for the calibrated parameters in the form of a probability distribution.The knowledge of such uncertainty is important for making probabilistic predictions, which can be in turn a useful tool for decision makers.The ::::::::::::::: Bacci et al. (2023) : . ::: The ::::: HMC :::::::: approach :: is :::: very :::::: general ::: and ::::::: suitable ::: for : a ::::: broad ::::: range :: of :::::::::: applications :::::::: requiring :::::::: Bayesian ::::::: inference :::: with ::::::::: stochastic :::::: models.:: It :::::: should ::: be ::::: noted ::: that ::: the : method presented here is meant for offline-calibration of stochastic models, not for realtime updating, which might be needed in a model-based control setting.For the latter problem, filtering algorithms might be more appropriate, but we do not discuss this topic here.The HMC approach is very general and suitable for a broad range of applications requiring Bayesian inference with stochastic models.Moreover, HMC is inherently a MCMC method and it is therefore embarrassingly parallelizable by breaking it up into an arbitrary number of independent Markov chains.This makes HMC very well-suited for applications in the big-data regime or with expensive models.

Bayesian inference with a stochastic rain model
The SIP method of Del Giudice et al. ( 2016) describes the rain input to a hydrological system based on an unobserved and continuous stochastic process, the rainfall potential ξ(t).The latter should not be interpreted as a potential in the physical sense, but rather as a rain generator based on a latent, i.e., "potential", stochastic process that can be transformed into real precipitation P by a suitable empirical transformation P (t) = r(ξ(t)), which will be addressed below in Section 2.1.
The inference process allows us to learn from noisy rainfall and runoff time series, P obs and Q obs , respectively, the parameters θ of the hydrological system, the uncertainties σ ξ and σ z of both rainfall and runoff observation models, respectively, and the unobserved true rainfall over the catchment expressed as a discretized rainfall potential ξ = {ξ i }, to be interpreted as an evaluation of the stochastic process ξ(t) at a discrete set of time points t i , with i = 1, ..., N .
For this purpose, we subdivide each interval between consecutive rain observations into j P bins, yielding a total of n P j P + 1 = N discretization points, where n P + 1 is the number of rain observations, that is, the length of the precipitation timeseries P obs .Analogously, the system output dimension is discretized by partitioning the intervals between consecutive runoff observations into j Q sub-intervals, such that the total number of discretization points is n The number of discretization points is thus the same (N ) in both rainfall and runoff dimensions, and it defines the discretization time step dt = T /(N − 1), where T is the total time interval covered by observations.We will provide numerical values for all these constants later in Section 4. Here, suffice it to say that the only important requirement of the method is that the total number N of discretization points is large enough compared to the number of measurement points, n P and n Q , in order to accurately probe the fine dynamics occurring on short time scales between observations.Other features, such as for instance having the same number N of discretization points for rainfall and runoffs, are just arbitrary choices to ease the practical implementation of the method, which could be removed without altering the results and conclusions of this work.The same holds if the observations were irregularly spaced in time, in which case one could use observation windows with more/less intermediate discretization points.Note that a large number of discretization points N , with a fixed number of observation points, does only moderately increase the computational effort of the algorithm, since the part of the Hamiltonian dynamics that scales with N can be calculated analytically, as shown below.
The inference goal is to sample both parameter combinations (θ, σ ξ , σ z ) and realizations ξ of the stochastic process ξ(t) from a posterior distribution obeying Bayes' equation, which reads in its discretized form as, where f (θ, σ ξ , σ z ) is the joint prior distribution for the model parameters, and is the likelihood expressing the probability of observed data (P obs , Q obs ) given model parameters (θ, σ ξ , σ y ) and a system realization ξ, weighted by the prior probability density f (ξ).
The stochastic process realization ξ in Eq. 1 is a time-series of length N ≫ 1.The high dimensionality of the problem renders the likelihood computationally expensive, thus making the inference problem intractable with traditional Bayesian inference algorithms, such as random walk Metropolis algorithms, which require a large number of likelihood evaluations.On the other hand, when the inference target is only the posterior distribution for the model parameters (θ, σ ξ , σ z ) and model simulations are fast, one may resort to Approximate Bayesian Computation (ABC) algorithms (e.g., Albert et al. (2015)), which approximate the parameters posterior through repeated comparisons of model simulations with observations in terms of a low-dimensional set of summary statistics.However, here we are interested in the joint inference of model parameters and the real rainfall ξ, which makes ABC an inefficient approach.
To tackle this problem, we apply a Hamiltonian Monte Carlo (HMC) algorithm (Duane et al., 1987;Neal, 2011;Albert et al., 2016), which exploits the principles of Hamiltonian dynamics to attain very high acceptance rates and low auto-correlation even on high-dimensional sampling spaces.This makes it possible to explore such spaces with large steps, without compromising the acceptance rate, and thus making the algorithm very efficient.The inherent high efficiency of HMC is further boosted here by a time-scale separation analogous to the one described in Albert et al. (2016).
The HMC algorithm allows us to sample simultaneously from the posterior of Eq. 1 both model parameters (θ, σ ξ , σ z ) and realizations of the stochastic process ξ.In particular, the rainfall potential ξ is inferred indirectly using prior knowledge, the observed runoff Q obs , and the possibly inaccurate observed precipitation P obs .The discharge data is used as an indirect source of knowledge about the rainfall, that complements the unreliable information due to the sparse rain-gauge measurements.
The method described here requires a stochastic input process, i.e., a rainfall model, a hydrological rainfall-runoff model to describe the observed discharges Q obs , observation models for both rainfall and runoff, and prior probability distributions, which are all outlined below.

The rainfall model
The rainfall potential is described by a normal and linear Ornstein-Uhlenbeck (OU) process with mean zero and standard deviation unity, which can be written in the form of a Langevin equation as, where η(t) represents zero-mean Gaussian white noise and τ is the process correlation time.The latter is set equal to 636 seconds and will not be inferred.A list of model parameters that are assumed to be known and are not inferred is given in Table 1 at the end of Section 2.
The rainfall potential ξ(t) is then transformed into real rain P (t) by the nonlinear transformation (Sigrist et al., 2012;Ailliot et al., 2015;Del Giudice et al., 2016), where the zero/nonzero rain threshold ξ r , the scaling factor λ and the exponent γ are all inferred parameters.A list of all inferred parameters is shown in Table 2 (see end of Section 2).The inherent rainfall stochasticity is thus accounted for by the stochastic process of Eq. 2, while the skewness of the rainfall distribution and a finite probability of zero rain are embedded in the model by the transformation of Eq. 3. Note that since r (ξ) = 0 for every ξ ≤ ξ r , the transformation r is not invertible when the precipitation is zero.We will discuss this point in detail in Section 2.4.

The hydrological model
The stormwater runoff is modelled by a linear reservoir alimented by rainfall precipitations, P (t), and a constant groundwater flow, Q gw .The dynamics of the water volume S(t) in the reservoir is thus governed by, where A is the estimated catchment area contributing to the rainfall-runoff dynamics, Q M is the hydrological model describing the runoff at the outlet of the system and K is the retention time of the reservoir.It should be noted that the original hydrological model used in Del Giudice et al. ( 2016) is simplified here by omitting the daily variation due to the wastewater contribution.This is justified by the fact that in the present work we focus only on a single short dataset of 4 hours, whereas in Del Giudice et al. ( 2016) the authors consider 3 independent datasets covering a time span of about 48 days.One may refer to Section 4 for more details.
The discretized form of Eq. 4 reads as, where S i and P i are S(t i ) and P (t i ), respectively, with i = 1, ..., N .It should be noted that explicit methods, such as the forward Euler scheme applied in Eq. 5, are very easy to implement, however, they impose stringent limitations on the time step size to ensure numerical stability.In general, explicit methods might not be sufficiently accurate in regions where the solution exhibits a rapidly varying behaviour.In that case it would be advisable to apply an implicit backward scheme, numerically more stable, albeit more difficult to implement.In the application discussed here, we reckon that the problem is simple enough to opt for the explicit forward scheme, thus trading off accuracy for an easier implementation.The intrinsic inaccuracy of the method is attenuated by choosing a discretization time step dt that is sufficiently small compared to the system dynamics timescale. The ) can be calculated recursively using Eq. 5 with the initial con- where M,i is the ξ -dependent contribution defined recursively as M,1 = 0 and where we have used the rainfall potential transformation P i = r(ξ i ) of Eq. 3. The parameters of the hydrological model K, Q gw and the initial water volume S 1 are unknown and to be inferred (see Table 2), while the catchment area is known and fixed as A = 11815.8m 2 (Table 1).

Runoff observation model
The probability distribution for the observed discharges Q obs given the model predictions Q M,i (ξ, θ) is assumed to be a normal error model with standard deviation σ z , which reads as, where (s−1)j Q +1, with s = 1, ..., n Q +1, are the indices corresponding to real observations in the discretized runoff dimension, and H is a transformation introduced to take the heteroscedasticity of the errors into account (Del Giudice et al., 2016), with the parameters α = 25 l/s and β = 50 l/s (Table 1).Since we are interested only in terms depending on the parameters to be inferred, in Eq. 8 we can neglect constant multiplicative factors such as the Jacobian

Rainfall observation model
The observation error model for the rainfall, given the rainfall potential ξ(t), is defined in the space of the rainfall potential as a normal distribution centered on ξ(t) and with standard deviation σ ξ .Its discretized form can be expressed in terms of the discretized potential ξ as a product of normal distributions, where ξ obs is defined as the effective rainfall potential generating the observed rainfall and N (ξ,σ ξ ) denotes a normal distribution with mean ξ and standard deviation σ ξ .Note that in Eq. 10 the ξ (s−1)j P +1 with s = 1, ..., n P + 1 are the elements of the discretized potential ξ corresponding to time points where rainfall observations are available.This distribution is transformed to real rainfall by the inverse transformation ξ obs = r −1 (P obs ) (Eq. 3).However, since all ξ-values below ξ r are transformed to zero rain, the transformation r is invertible only where P obs ̸ = 0. Therefore, we need to distinguish two possible regimes, with and without rain: -At time points where P obs ̸ = 0, the probability density of Eq. 10 reads, where the sum i,P ̸ =0 extends only over time points t i where P obs,i ̸ = 0 and n P ̸ =0 is the total number of such points.
Moreover, the transformation from ξ− to P −values requires the Jacobians J i defined as, -At time points where P obs = 0 and therefore ξ obs can take any value below ξ r , we integrate the probability density of Eq. 10 over the interval [−∞, ξ r ].This yields the probability of zero observed rain, which turns out to be the cumulative distribution function of the normal distribution, that is, where the product i,P =0 extends over time points t i where P obs,i = 0. Therefore,

The priors
At this point, the only elements of Eq. 1 that still need to be defined are the prior distributions f (θ, σ ξ , σ z ) and f (ξ).The former is simply the product of normal or log-normal univariate probability densities for each individual parameter to be inferred.The parameter vector θ here includes the parameters of the hydrological model, K, Q gw and S 1 , as well as those of the transformation r (Eq.3), that is, ξ r , λ and γ.We infer a total of 8 parameters, listed in Table 2.For the parameters S 1 and ξ r the prior densities are assumed to be normal distributions, whereas for the other parameters (K, Q gw , λ and γ) the prior densities are assumed to be log-normal distributions, with mean and standard deviation µ LN and σ LN , respectively.Analogous log-normal distributions are assumed for the rainfall and discharge observation uncertainties, σ ξ and σ z , respectively.The prior distributions for all parameters to be inferred, with their mean values and standard deviations, are summarized in Table 2.
Our prior knowledge of the rainfall potential ξ is defined in terms of a function S(ξ), called action (Lau and Lubensky, 2007;Albert et al., 2016), as, where the action can be written in its discretized form for the SDE model of Eq. 2 as (see Appendix A), and the initial condition for ξ is specified as the marginal distribution of a standard OU process, which is a standard normal distribution for ξ 1 .It is noteworthy that the HMC method described here always requires an explicit analytical form for the action S(ξ), which is essentially just the negative log of the prior density f (ξ), which is also needed in any other Metropolistype sampling algorithm.Although in this study we follow the approach of Del Giudice et al. ( 2016) and resort to a linear OU process as a precipitation generator, an analytical expression for the action is generally readily available even for more 275 complex and nonlinear SDE models.More details about the procedure to derive the action for generic SDE models can be found in Appendix A. Before setting off to implement the HMC algorithm, we take one further convenient step, i.e., we apply the transformation from the coordinates ξ to the so-called staging variables u (Tuckerman et al., 1993) using, which leaves the components corresponding to measurement points unchanged, and with s = 0, ..., n P − 1 and k = 2, ..., j P , and with, for the intermediate discretization points.Also relevant are the inverse transformations for the discretization points, with s = 0, ..., n P − 1 and k = 2, ..., j P . (20) The action S(ξ) (Eq.16) can be formulated in the space of u-coordinates (see Appendix B) as, while the initial condition ξ 2 1 /2 can be simply replaced by u 2 1 /2.This coordinate transformation, analogous to a transformation to canonical coordinates, is not a strict requirement of the HMC method and undoubtedly adds a further degree of complexity to the overall strategy.However, it also bears remarkable computational benefits.Indeed, the first term on the right-hand side of Eq.21 describes the potential of a system of uncoupled harmonic oscillators, which can be effortlessly solved analytically.
These dynamics also turn out to be much faster than all other characteristic timescales of the system.In Section 3 we will describe in detail how this can be exploited.

The HMC algorithm
The HMC algorithm interprets the negative logarithm of the posterior density as a potential energy driving the dynamics of a fictitious statistical mechanics system whose configurations, namely, the system's degrees of freedom, are described by combinations of both parameters (θ, σ ξ , σ z ) and realizations ξ of the stochastic process ξ(t).The degrees of freedom of the system are coupled to conjugate variables, that is, to momenta π, paired with the parameters (θ, σ ξ , σ z ), and p, paired with the realizations ξ.The posterior density of Eq. 1 can be rewritten in the following discretized form, with the Hamiltonian H HMC defined as, where N p is the number of parameters to be inferred (8 in our case) and we have introduced two auxiliary terms on the righthand side using the vectors of momenta π and p, akin to the kinetic energies associated with the degrees of freedom of the fictitious statistical mechanics system.
The masses m α and m i are tuning parameters of the algorithm.Since we want the coordinates ξ i corresponding to actual observations to be more tightly constrained than those corresponding to the intermediate discretization points, we set m sj P +1 = M with s = 0, ..., n P , for the "heavy" measurement points, and m sj P +k = m with s = 0, ..., n P − 1 and k = 2, ..., j P , for the "lighter" intermediate discretization points, with M ≫ m.
The potential energy, proportional to − log [f (ξ, θ, σ ξ , σ z | P obs , Q obs )], guarantees that each state of the system is compatible with the observations, and constrained within their measurement uncertainties, as well as with the prior distributions for both model parameters and rainfall potential realizations ξ.
The HMC algorithm iterates the following steps.First, vectors of momenta π and p are drawn from the ::::::::: zero-mean normal distributions defined by the kinetic terms in Eq. 23.Then, the system is let to evolve for a fictitious time interval dτ in the (ξ, θ, σ ξ , σ z ; π, p) phase space according to Hamilton's equations.This Hamiltonian dynamics is controlled by tuning the fictitious masses in Eq. 23, which represent the variances of the normal distributions for the corresponding momenta.
Finally, the discretization error on the energy conservation introduced with the integration of Hamilton's equations is corrected by a Metropolis accept/reject step.The resulting marginal distributions of the Markov chains of the system configurations (ξ, θ, σ ξ , σ z ) represent a sample of the sought posterior density.The method is described in detail in Albert et al. (2016).
Using Eq. 1 and the definitions of Sections 2.1, 2.2, 2.3, 2.4 and 2.5, we write the HMC Hamiltonian as, where the three components describe dynamics occurring on different timescales.Let us consider them individually.The first component, contains the harmonic term for the intermediate discretization points from the action of Eq.21 and scales like the total number of discretization points N .The second component, contains a harmonic term for the measurement points from Eq.21 and the observation error models f (Q obs | u, θ, σ z ) and f (P obs | u, σ ξ ) from Sections 2.3 and 2.4, respectively.All the components of Eq. 26 scale like the number of observations n P or n Q .Note that both the observation models for runoff and rainfall are rewritten in the space of u-coordinates.While the coordinate transformation ξ → u is straightforward for the rainfall observation model, which depends only on measurement points, it is somewhat more cumbersome for the runoff observation model, which requires the ξ-dependent component Q (see Eqs. 6 and 7) to be rewritten in the u-space as Q (u) M,i .Such transformation is described in Appendix C. The third component of the Hamiltonian H HMC is, which includes the remaining terms from the rainfall potential prior (Eq.15) and the model parameter priors.The latter, i.e., the last two terms in Eq. 27, are sums over the parameters θ N = (S 1 , ξ r ) and θ LN = K, Q gw , λ, γ, σ ξ , σ z , whose prior densities are normal and log-normal distributions, respectively.Note the sign change in front of the term u 2 1 /4 with respect to Eq. 21 due to the initial condition for u 1 (see Eq. 15).The component H 1 does not grow unbounded with neither N nor n P or n Q (notice that dt ∝ 1/N ).
Let us now exploit the different timescales characterizing the three components of the Hamiltonian H HMC .We assume the regime H N ≫ H n ≈ H 1 , implying that the number of data points is not too large and that the total number of discretization points is instead large compared to the number of data points.Under these conditions the dynamics of the system occur on very different time scales.In particular, the dynamics described by H N are much faster than the other components of the Hamiltonian and therefore impose the most stringent limitations on the step size for the numerical integration of Hamilton equations.However, we use Trotter's formula (Tuckerman et al., 1992) to construct a reversible molecular dynamics integrator to take the different time scales into account as described in Albert et al. (2016).In particular, we exploit the fact that the much faster dynamics of the intermediate discretization points described by H N is analogous to a system of uncoupled harmonic oscillators that can be solved analytically.This analytical solution gives a significant boosting contribution to the intrinsic efficiency of the HMC algorithm.The "slow" remaining components of the Hamiltonian, H n and H 1 , can be integrated using a much larger integration step, which allows us to explore the high-dimensional parameter space of the system with remarkable efficiency.
As explained in Albert et al. (2016), the decoupling of the different dynamics and the analytical solution of the expensive fast component is always possible for one-dimensional SDE models.It is also possible in the case of multiple independent variables, where the decoupling procedure can be applied to each of them individually.In this way, our HMC approach covers a significant range of possible hydrological modelling scenarios.However, in this work we focus only on 1D models, leaving the exploration of higher-dimensional models to future studies.
In this work we apply a HMC method with a stochastic input model (SIP) following Del Giudice et al. (2016) to investigate the rainfall-runoff dynamics of an urban catchment based on real rainfall and runoff observations.The catchment is a small and fast-reacting sewer network in Adliswil, near Zurich, Switzerland.Two typical scenarios of possible rainfall data are considered here, that is, we use: 1) high-resolution data, with a time resolution of 1 minute, recorded by two pluviometers (denoted P1 a and P1 b ) located in the immediate vicinity of the catchment area, 2) low-resolution data, with a time resolution of 10 minutes, recorded by a pluviometer (denoted P2) located further away from the catchment, at a distance of about 6 Km.We shall refer to the two scenarios as scenario 1 (Sc1) and scenario 2 (Sc2), respectively.More information can be found in Del Giudice et al. (2016).
The precipitation data P obs used in this study contain n P + 1 = 241 observations in Sc1 and n P + 1 = 25 observations in Sc2, covering a total observation time T = 240 minutes.The initial and final observation time points are the same for both scenarios and include a storm event that took place in the evening of June 10, 2013, approximately between 18:00 and 20:00.
We should mention here a substantial difference with respect to Del Giudice et al. ( 2016).In their work, the authors base the inference process on three independent time series covering a total observation time of 1710 minutes distributed over a time span of 48 days.In the work presented here, instead, we consider only the first of the 3 time-series, which happens to be also the shortest.Although the HMC algorithm could deal with the multiple independent time-series used by Del Giudice et al. ( 2016), our simplification finds its justification in that it reduces the computational requirements of the problem, without compromising the goal of our work, that is, showing the feasibility of Bayesian inference of both model parameters and high-dimensional system states (i.e., the ξ's) under the considerable hardship of a stochastic input process.
The discharge flow at the outlet of the catchment was measured with a time resolution of 4 minutes, and the output observations Q obs , containing n Q + 1 = 61 measurements, are the same for both scenarios and considered accurate compared to the precipitation data.The initial and final observation times are the same as for the input time-series.The time-series for the observed outputs (Q obs ) as well as the observed precipitation (P obs ) for both Sc1 and Sc2 are shown in Fig. 5.
Scenario 1 represents a best-case scenario of input data availability, and we shall therefore classify it as the accurate input scenario, while scenario 2 is a typical example of inaccurate and unreliable input data due to both its sparsity and the distance of the rain-gauge P2 from the area of interest.The runoff observations Q obs exhibit an important rainfall event (the storm) represented by an evident discharge peak.While this event is properly recorded by the rain-gauges of Sc1, the inaccurate input data of Sc2 misleadingly recorded the event at an earlier time period, presumably when the storm passed over the location of the distant pluviometer P2.
We are particularly interested in the performance of the combined SIP-HMC method in the latter case, characterized by faulty precipitation data, which clearly represents the most challenging scenario and therefore the hardest test for the HMC method.Therefore, our work consists of three main steps.First, we use the combined SIP-HMC approach described above, with the inaccurate precipitation data P obs of Sc2, to calibrate the model and infer the unknown "true" average rainfall pattern over the catchment.Then, we use the accurate rainfall observations from Sc1 as a reference to assess the quality of the simulated "true" rain.Finally, we repeat the calibration process using the accurate data of Sc1 as a further validation for the method.

Implementation
The HMC algorithm is implemented in C++14 using the open-source ADEPT library (Automatic Differentiation using Expression Templates, version 1.1) for the calculation of the gradients of the Hamiltonian H HMC (Hogan, 2014).Automated differentiation allows us to automatize the algorithm to a large extent, thus making it applicable to a broad range of models with relatively little implementation effort.Indeed, for the hydrological application presented here we resorted to the algorithm already implemented for the much simpler toy model studied in Albert et al. (2016).The implementation of the algorithm remained essentially unaltered, except only for the Hamiltonian H HMC that had to be modified according to Eqs. 25, 26 and 27.
The simulations were run on 2.6-3.7 GHz processors Xeon-Gold 6142 with 196 GB of memory.We observed a relatively short burn-in phase for all inferred parameters, suggesting the possibility of a straightforward (embarrassingly simple) parallelization of the algorithm obtained by simply breaking up the Markov chains into smaller independent chains that can then be executed as parallel processes.It is well-known that Markov Chain Monte Carlo (MCMC) methods, like the HMC algorithm employed here, are indeed very well suited for parallel computing.This kind of approach was proven successfully in Albert et al. ( 2016) with a toy system.In that case we used an OpenMP-based parallel implementation of the algorithm and observed a reasonably linear strong scaling behaviour with up to 16 parallel processes.
In the present work, for Sc2, after an initial single burn-in chain of 75k steps, which is then disregarded, we limited ourselves to running 4 independent Markov chains each of length 100k steps based on a serial implementation of the algorithm.For Sc1, which is faster than Sc2 due to the much smaller number of intermediate discretization points (see below for more details), we considered a single chain of 750k steps and disregarded the first 150k steps.The extension of the current serial version to an OpenMP parallel implementation of the HMC code would be straightforward.
We set a fine-grid time step dt = 10 seconds and a total number of discretization points N = 1441 for both scenarios.It is easy to verify that these conditions are fulfilled by setting the number of bins between consecutive observations to j P = 6 or j P = 60 on the precipitation dimension for Sc1 and Sc2, respectively, and j Q = 24 for the runoff dimension.The initial configuration of the system state ξ for the burn-in chain is a random realization of the OU process of Eq. 2, while the parameters are set equal to the mean values of their prior distributions (see Table 2).
The algorithm requires tuning of two sets of parameters, that is, the parameters defining the Hamiltonian propagator in the molecular dynamics part of the HMC algorithm (see Eq. 26 in Albert et al. (2016)), and the masses m, M and m α defining the kinetic energy of the system in H HMC .Thus, in the Hamiltonian propagator we set the integration time ∆τ = 0.015 and the number of integration steps P = 3, while the masses were set to m = 0.4 for the intermediate discretization points and M = 1.6 for the measurement points.The masses for the inferred parameters are given in Table 3.
It should be noted that the integration time interval of the Hamiltonian propagator could be automatically optimized by employing the so-called No-U-Turn Sampler (NUTS) (Hoffman and Gelman, 2014) and the masses of the kinetic terms could be to Sc2 reflects exactly this attempt of the algorithm to find a better fit to the rain observations, especially where precipitation values are large.

465
Among the parameters of the hydrological model, the marginal distribution of the retention time K appears to be clearly shifted to smaller values in Sc1, suggesting a faster reacting system compared to Sc2, while the groundwater contributions Q gw and the initial water volume S 1 exhibit only very minor shifts.
Markov chains for the model parameters from Sc1.Unlike Sc2, we have in this case a single chain for each parameter.
In Fig. 4 we also show two typical Markov chains from Sc2 for the stochastic process ξ, evaluated at two time-points with and without rain.The two chains clearly fluctuate above (point with rain) and below (point without rain) the inferred 470 zero/nonzero rain threshold ξ r .Analogously to the model parameters, the Markov chains for ξ appear to have converged and to be well mixed.The ξ chains in Sc1 are qualitatively identical to those of Sc2 and are not shown here.
In the left panels of Fig. 5 we compare the inferred discharge and rainfall patterns, Q M (ξ, θ) and r(ξ), respectively, based on the inaccurate rainfall data of Sc2, with the observed runoff Q obs and precipitation P obs .The measured outflow (upper horizontal solid line is the median of the inferred zero/nonzero rain threshold ξr, while the shaded area represents its uncertainty given by the 2.5 and 97.5 percentiles.As in Fig. 1, we have 4 independent chains for each point.that some rainfall events were either not detected, or recorded at a different time point, by the rain-gauge (P2) that produced those data.This is of course in line with the inaccuracy of Sc2.
The observed output peaks are used by the HMC algorithm as an additional source of information about the rain falling over the catchment area during the observation time.This new information, together with the stochastic input model, is used to attempt a reconstruction of a true rainfall pattern.The simulated rainfall and outflow patterns are represented by the medians of their inferred distributions (black line) and an uncertainty given by the 2.5%-97.5% quantiles (gray area).The rainfall pattern reconstructed using the inaccurate data of Sc2 (lower left panel) clearly displays the peaks corresponding to the rainfall events that had been missed by the pluviometer P2 located away from the catchment (filled blue squares).Such predicted peaks reproduce very accurately in both time and duration the rainfall events detected in Sc1 by the rain-gauges P1 is that the rainfall patterns predicted in the two scenarios are qualitatively very similar, despite the significant difference in the accuracy of the data used for the inference.
Although not shown here, we have also run the HMC inference without rainfall data at all, i.e., omitting the term f (P obs | ξ, σ ξ ) in the posterior density, obtaining both model parameter marginals and a predicted rainfall pattern that are substantially identical to those obtained with the inaccurate data of Sc2.Essentially, in Sc2 the HMC algorithm "learns" that the observed rain is unreliable and should thus be ignored.However, in most applications the accuracy and reliability of the measured precipitation data is unknown a priori.In those cases the rainfall observations can be safely used in the inference process, since the algorithm itself will assess its accuracy and possibly disregard it in favor of a more reliable reconstructed rainfall.
The inferred outflows, shown in the upper panels of Figures 5 match very well the observations and are essentially identical, compatibly with the assumption that runoff data are accurate and the same for the two scenarios.we demonstrated the performance of the method using a rainfall-runoff toy model with synthetic data time-series.For purely didactic purposes, the rain input to the system was modeled using a smooth sinusoidal function.

Figure 1 .
Figure 1.Markov chains for the model parameters generated using the faulty input data of Sc2.As explained in Section 4.1, we have 4 independent chains for each parameter.

Figure 3 .Figure 4 .
Figure 3. Marginal posterior probability densities for the model parameters from Sc2 (thick black line) and Sc1 (thin gray line).The dashed lines represent the prior densities.

Figure 5 .
Figure 5.Comparison of observed and predicted discharges (top) and rainfall (bottom), based on Sc2 (left) and Sc1 (right) data.Predictions are represented by the median (black line) of the corresponding inferred distributions and a confidence interval given by the 2.5 and 97.5 percentiles (gray area).In the bottom panels the predicted rain is compared with both the inaccurate data of Sc2 (filled squares) and the accurate data of Sc1 (open squares).

Table 1 .
List of model parameters that are assumed to be known, with their values and units.

Table 2 .
Prior probability densities for the inferred parameters, with their mean values (µ), standard deviations (σ) and units.N and LN stand for normal and log-normal distributions, respectively.Note that here, also for log-normal distributions, µ and σ refer to the mean value and standard deviation of a parameter, not its log.The mean and standard deviation for the log, µLN and σLN, respectively, are given by µLN = log(µ 2 / µ 2 + σ 2 ) and σLN = log(1 + σ 2 /µ 2 ).