Estimating Bedload From Suspended Load and Water Discharge in Sand Bed Rivers

Estimates of fluvial sediment discharge from in situ instruments are an important component of large-scale sediment budgets that track long-term geomorphic change. Suspended sediment load can be reliably estimated using acoustic or physical sampling techniques; however, bedload is difficult to measure directly and can consequently be one of the largest sources of uncertainty in estimates of total load. We propose a physically informed predictive empirical model for bedload sand flux as a function of variables that are measured using existing acoustic or physical sampling techniques. This model depends on the assumption that concentration and grain size in suspension are in equilibrium with reach-averaged boundary conditions. Bayesian inference is used to fit model parameters to data from eight sand-bed rivers and to simulate bedload flux over the available gage record at one site on the Colorado River in Grand Canyon National Park. We find that the cumulative bedload flux during the 9 year period from 2008 to 2016 was 5% of the cumulative suspended sand load; however, instantaneous bedload flux ranged from as little as 1% to as much as 75% of the instantaneous suspended sand load due to fluctuations in flow strength and sediment supply. Changes in bedload flux at a constant discharge are indicative of short-term sediment supply enrichment and depletion. Long-term average bedload flux cannot be expected to remain constant in the future as the river adjusts to changes in sediment runoff and the dam-regulated discharge regime.


Introduction
Estimates of fluvial sediment load provide an important tool for quantifying large-scale geomorphic change. In a wide range of environments, suspended sediment load can be accurately constrained using acoustic surrogates for sediment concentration (Topping & Wright, 2016; Topping et al., 2004), enabling low-cost measurement of suspended load at high temporal resolutions over multiyear time scales (Dean et al., 2016; Grams et al., 2013, 2018). However, acoustic estimates of flux depend on assumptions about the vertical concentration distribution that are reasonable if not strictly valid in the interior of the flow (Gray & Gartner, 2009) but that become increasingly dubious in the near-bed region. Bedload may vary significantly with respect to suspended sediment load due to changes in Rouse conditions (van Rijn, 1984).
Existing procedures for measuring bedload separately from suspended load in sand-bedded rivers (Gray et al., 2010; Holmes, 2010) are incompatible with the goals and limitations of long-term monitoring. Direct physical sampling is costly and can be inaccurate in large rivers due to undersampling (Pitlick, 1988), and existing predictive bedload transport models that might be used in lieu of direct measurements (e.g., Wong & Parker, 2006) generally require, at minimum, an estimate of the skin friction component of bed shear stress, which in turn necessitates additional measurements or models, each subject to their own logistical limitations and uncertainty. Sediment budgets therefore rely on simplified treatments of bedload flux that can introduce large persistent biases to estimates of total bed material load. For example, bedload is typically estimated as a constant fraction of suspended load (Grams et al., 2013) or as a power law function of water discharge (i.e., a rating curve) (Ellison et al., 2016), or is ignored entirely (Wright et al., 2010). This is problematic because bedload flux can be a substantial fraction of total load in suspension-dominated rivers, particularly at low flow conditions (Turowski et al., 2010); bedload flux can vary relative to suspended load due to changes in suspension conditions, and it can vary with respect to a fixed water discharge due to changes in bed material composition and channel geometry (Topping, Rubin, & Vierra, 2000; Topping, Rubin, Nelson, et al., 2000).
The purpose of this paper is to provide a reliable means for estimating bedload flux in sand-bed rivers when suspended sediment information is available. The rationale behind our approach is that bedload and suspended load are mutually determined by the same causal boundary conditions at the reach-averaged scale. As a result, measured changes in concentration and grain size in suspension can be used to deduce changes in these boundary conditions and estimate bedload flux. This concept, first proposed in earlier work, underlies an empirical model that expresses bedload flux per unit channel width as a function of unit water discharge, suspended sand concentration, and suspended sand diameter.
Our primary goal is to estimate bedload flux from gage data and propagate uncertainty through estimates of cumulative load. This is accomplished using Bayesian inference, which provides a convenient framework for quantifying uncertainty in sediment transport parameters using numerical Markov chain Monte Carlo (MCMC) methods (Schmelter & Stevens, 2012; Schmelter et al., 2011, 2015). Moreover, Bayesian techniques implemented in the MCMC framework enable rigorous propagation of uncertainty through individual estimates of sediment load and time-integrated mass balance calculations. Our model can be applied in any sand-bedded river and does not require site-specific calibration. However, our analysis reveals that predictions may be biased on a site-specific basis such that greater predictive accuracy is achieved when the model is fit using only data from one site. This is particularly important when computing sediment budgets because error associated with model bias accumulates over time (Grams et al., 2013). Unfortunately, site-specific data are not always available; in order to meet the varying needs of different applications, we present three modeling approaches that utilize historical data from seven rivers reported by Toffaleti (1968) to varying degrees. The first approach involves pooling all data to estimate model parameters and is acceptable for obtaining single estimates of bedload flux at sites where no direct observations are available. The second approach utilizes only data from the site of interest and is suitable when extensive site-specific data are available. The third approach involves a hierarchical modeling framework (Christensen et al., 2011; Gelman et al., 1995) that optimizes use of limited site-specific data by using sites with many observations to inform prediction at sites with relatively few observations. Application of all three approaches is demonstrated at one sediment monitoring station on the Colorado River.
The statistical procedure presented here ultimately provides a convenient method for tracking changes in bedload flux driven by flow strength and sediment supply limitation over time scales ranging from days to years.

Colorado River Sediment Monitoring
On the Colorado River in Grand Canyon National Park, flux-based sediment budgets inform flow regulation protocols aimed at minimizing the downstream impact of Glen Canyon Dam. The primary management objective is the reversal of long-term depletion of alluvial sand deposits, especially emergent deposits known as eddy sand bars, through the use of controlled floods (Grams et al., 2015; Topping et al., 2010; Wright & Kaplinski, 2011). However, the range of available management solutions is limited; this objective must be accomplished without compromising other economic (Ingram et al., 1991) and ecological (Melis et al., 2012) objectives. Designing such a protocol requires a detailed understanding of the dynamics of flow and sediment transport through the canyon.
In the dam-regulated Colorado River, the upstream sediment supply is completely independent from water discharge. Undammed tributaries comprise the only resupply of alluvial material to the post-dam river, while the hydrograph is determined by clear water releases from Lake Powell (Andrews, 1991; Rubin et al., 2002; Topping, Rubin, & Vierra, 2000). Sediment supply and flow fluctuations cause complex morphodynamic interactions as the channel adjusts to accommodate pulses of sediment under the imposed discharge regime. Confinement by bedrock and bouldery debris fans also limits the extent to which flow can modify local slope and hydraulic geometry. As a result, antecedent sedimentary and morphological conditions are as important as water discharge in regulating instantaneous sediment transport. This condition, known as "supply limitation," is common in natural rivers but is particularly pronounced on the Colorado River and other dammed rivers due to artificial flow regulation and sediment starvation (Dolan et al., 1974; Schmidt & Graf, 1990).
Modeling the dynamics of alluvial deposits in supply-limited systems requires substantial physical simplifications and empiricism (e.g., Wright et al., 2010). Changes in stored sediment mass estimated from spatial gradients in sediment flux are a useful metric for evaluating the effects of past flow regimes and for testing predictive models that can be used to determine best-practice scenarios for the future. The canyon is divided into five sediment budget reaches (Grams et al., 2013), each bounded by monitoring stations on the main stem and major tributaries that estimate total sand load every 15 min (Figure 1); data used in this study come from the reach adjacent to the Diamond Creek gage located at river mile 225. At the time of writing, these records comprise over a decade of almost uninterrupted suspended sediment data that can be used to quantify morphodynamic trends over a range of time scales: Multiyear trends indicate regime-scale adjustment, while short-term variability reflects the transient response to individual or seasonal perturbations in flow strength and sediment supply. Data are available online (https://www.gcmrc.gov/discharge_qw_sediment/). Bedload flux is perhaps the largest source of uncertainty in estimates of total sediment load. At the time of writing, bedload is estimated at all monitoring sites on the Colorado River as a constant 5% of suspended load based on a single set of concurrent measurements of bedload and suspended load. Here, we aim to test this assumption at one site (Figure 2) and reduce bias in estimates of total load by developing and applying a robust statistical methodology for estimating bedload flux from gage data.

Modeling Approach
The goal of this paper is to predict total mass bedload flux, Q_b (M T^-1), from measurements of water discharge, suspended sand concentration, and suspended sand diameter. To this end, we adopt an empirical power law equation for bedload flux per unit width q_b (M T^-1 L^-1) given by

q_b = 10^β0 A q_w^β1 C_s^β2 D_s^β3    (1)

Here, q_w (L^2/T) is the average volumetric water discharge per unit width, equal to Q_w/W, where Q_w (L^3/T) is the total volumetric water discharge and W (L) is the surface width of the channel. C_s (L/L) is the discharge-averaged suspended sand concentration, D_s (L) is the median diameter of suspended sand, and A is a dimensional coefficient expressed in terms of fixed reference values for each variable (denoted by the subscript 0) as A = q_b0 / (q_w0^β1 C_s0^β2 D_s0^β3). Finally, β0 is an intercept term that is equal to 0 if reference values are chosen so that q_b = q_b0 when q_w = q_w0, C_s = C_s0, and D_s = D_s0.
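As a concrete illustration of equation (1), the sketch below evaluates the power law model for given reference values. The exponent values here are hypothetical placeholders, not the fitted parameters reported later; the 10^β0 placement of the intercept follows the log-linear form of the regression.

```python
import numpy as np

def bedload_flux(qw, Cs, Ds, beta, refs):
    """Evaluate the power law model of equation (1).

    qw   : unit water discharge (m^2/s)
    Cs   : suspended sand concentration (dimensionless)
    Ds   : median suspended sand diameter (m)
    beta : (beta0, beta1, beta2, beta3) -- illustrative values, not fitted ones
    refs : (qb0, qw0, Cs0, Ds0) fixed reference values
    Returns bedload flux per unit width qb (kg/s/m).
    """
    b0, b1, b2, b3 = beta
    qb0, qw0, Cs0, Ds0 = refs
    # Dimensional coefficient A built from the reference values
    A = qb0 / (qw0**b1 * Cs0**b2 * Ds0**b3)
    # beta0 enters as a log-space intercept (10**beta0)
    return 10**b0 * A * qw**b1 * Cs**b2 * Ds**b3

# Reference values reported in the Data Treatment section
refs = (0.039, 4.35, 1.07e-4, 0.13e-3)  # qb0 kg/s/m, qw0 m^2/s, Cs0 (-), Ds0 m

# Hypothetical exponents, for illustration only
beta = (0.0, 1.0, 1.0, 1.0)

# By construction, qb equals qb0 at the reference conditions when beta0 = 0
print(bedload_flux(4.35, 1.07e-4, 0.13e-3, beta, refs))  # ~0.039
```

Because A is defined from the reference values, the model is guaranteed to pass through the reference point whatever exponents are chosen, which is what makes β0 interpretable as a departure from that point.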
Equation (1) is purely empirical; however, we consider the form of this expression in the context of existing theory to (1) facilitate qualitative interpretation of our results and (2) support the notion that in-sample fit will extend to out-of-sample predictive accuracy. Forward models for equilibrium sediment transport (Einstein, 1950; McLean, 1992; Molinas & Wu, 2000; Wright & Parker, 2004) encompass the physical interactions that are relevant to this objective and generally involve several computational steps that incorporate various physical and empirical expressions. As an example, Wright and Parker (2004) proposed a computational procedure for estimating C_s, D_s, and the Shields stress due to skin friction τ*_s (among other variables) from specified reach-average boundary conditions, which are q_w, slope S (L/L), and bed material grain size D_b (L). Bedload flux can be computed from τ*_s using an empirical bedload transport formula (e.g., Wong & Parker, 2006). Additional relevant physical parameters that must be specified are often assumed to be constants. These are gravitational acceleration g (L/T^2), the kinematic viscosity of water ν (L^2/T), and the densities of sediment ρ_s (M/L^3) and water ρ_w (M/L^3). In summary, this model approximates three unknown physical equations of the following functional form:

C_s = f_1(q_w, S, D_b, g, ν, ρ_s, ρ_w)
D_s = f_2(q_w, S, D_b, g, ν, ρ_s, ρ_w)    (2)
τ*_s = f_3(q_w, S, D_b, g, ν, ρ_s, ρ_w)

Each forward equation has eight variables (seven predictor variables and one response variable) and three physical dimensions and can therefore be reduced to five dimensionless variables (four predictor variables and one response variable) according to the Buckingham Pi theorem (Gibbings, 2011). However, four of the eight physical variables are usually assumed to be constant, so one of these dimensionless variables will always be a constant or a linear combination of other variables.
Assuming power law forward equations between dimensionless variables, we assert that any choice of dimensionless variables can be rearranged to obtain the following dimensional equations:

C_s = γ_1 q_w^δ11 S^δ12 D_b^δ13
D_s = γ_2 q_w^δ21 S^δ22 D_b^δ23
τ*_s = γ_3 q_w^δ31 S^δ32 D_b^δ33

where γ_1, γ_2, and γ_3 are fixed dimensional coefficients that can be expressed in terms of g, ν, ρ_s, and ρ_w. This system of equations can then be solved to obtain equation (1), noting that the β_i exponents are simply algebraic combinations of the δ_ij exponents.
Based on these arguments, we offer the following interpretation of equation (1), leaving further discussion to section 5.2. We assume changes in fluvial sediment transport conditions are driven by changes in q_w, S, and D_b. By measuring one of these variables (q_w) and two variables that directly respond to changes in them (C_s and D_s), it is possible to constrain the state of the transport system and predict unknown variables including bedload flux. In this manner, C_s and D_s are viewed as proxies for S and D_b.
As an aside, equation (1) can also be derived by combining simplified relations presented in the canonical sediment transport literature (e.g., Brownlie, 1983; Engelund & Hansen, 1967; Garcia & Parker, 1991; Wong & Parker, 2006); however, many of these relations have empirical origins and thus contain large, unquantifiable uncertainty. Rather than combining a series of existing empirical expressions, we fit the β_i parameters and quantify predictive uncertainty directly; this approach minimizes predictive bias assuming that available data sufficiently capture the underlying physical processes.
The majority of this paper focuses on the development and application of a statistical methodology used to estimate empirical scaling parameters in equation (1) and predict bedload flux. We present an example application at our field site on the Colorado River in Grand Canyon National Park, where estimates of bedload flux obtained from repeat bathymetric surveys of dune migration paired with concurrent gage measurements form the observational basis for statistical analyses. Parameter estimation and prediction are conducted using Bayesian inference. This approach facilitates consistent propagation of uncertainty from multiple sources of information and prediction of distributions for quantities of interest (Schmelter & Stevens, 2012; Schmelter et al., 2011, 2015). This approach is particularly useful for propagating uncertainty arising from both measurement uncertainty and parameter estimation uncertainty in calculations of cumulative sediment load. In addition to the data from our site, we also consider data from six other rivers reported by Toffaleti (1968) in order to test generality and improve the predictive power of our model. These data cover a much wider range of discharge, slope, and bed grain size conditions than those that are found at the site on the Colorado River. In order to incorporate these data into the predictive model for bedload flux at our site, we consider three statistical models that are distinguished in principle by their assumptions regarding the universality of scaling exponents and in practice by their treatment of groups in the data. Each approach has relative advantages and disadvantages depending on the specific modeling conditions and objectives, as well as the quantity and quality of data available at a site of interest.

Bayesian Linear Regression
The generalized linear model given by equation (1) has four unknown parameters that must be estimated from a large number of observations of model variables. This system is overdetermined, and no single solution can fit all of the data simultaneously. As a result, it is necessary to employ regression analysis to handle uncertainty and error. Log-transformed variables enable linear regression, which assumes that the ith observation of the response variable log(q_b)_i can be expressed as a linear function of the predictor variables log(q_w)_i, log(C_s)_i, and log(D_s)_i, plus an error term ε_i:

log(q_b)_i = β_0 + β_1 log(q_w)_i + β_2 log(C_s)_i + β_3 log(D_s)_i + ε_i    (8)

Perhaps the most common variant of linear regression is ordinary least squares (OLS), which finds the combination of model parameters β_0, β_1, β_2, and β_3 that minimizes the sum of the squared error terms. OLS regression leads to an unbiased predictor of the response variable assuming ε_i is normally distributed and independent across all samples. However, for the purposes of the present research, this approach has several limitations. OLS regression cannot handle hierarchical organizations of data that potentially violate the assumed independence of ε_i, such as when individual observations are grouped by river or site. Additionally, analytical quantification of predictive uncertainty in the OLS framework does not readily allow for propagation of errors through mass-balance calculations.
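The log-linear regression described above can be sketched in a few lines. The example below fits OLS coefficients to synthetic centered log-transformed data; the "true" exponent values are arbitrary stand-ins chosen only to verify that the least squares fit recovers them.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic log-transformed observations:
# qb* = b0 + b1*qw* + b2*Cs* + b3*Ds* + eps,  eps ~ N(0, 0.1)
n = 300
true = np.array([0.0, 1.5, 0.8, -2.0])        # hypothetical exponents
qw, Cs, Ds = rng.normal(size=(3, n))           # centered log variables
X = np.column_stack([np.ones(n), qw, Cs, Ds])  # design matrix with intercept
y = X @ true + rng.normal(scale=0.1, size=n)

# OLS: minimize the sum of squared error terms
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # close to [0.0, 1.5, 0.8, -2.0]
```

The same design matrix X and response y reappear in the Bayesian treatment; only the handling of the error term and parameter uncertainty changes.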
Bayesian inference provides a convenient framework for overcoming these issues. The Bayesian approach to linear regression starts with the same assumptions as OLS that are encapsulated by equation (8). However, we introduce an additional parameter σ that quantifies the standard deviation of the error term, that is,

ε_i ~ N(0, σ)

where the tilde means "distributed as" and N(0, σ) is an independent normal distribution with zero mean and standard deviation σ. Consequently, we aim to draw inference on five parameters: β_0, β_1, β_2, β_3, and σ.
At this point we note for clarity that the term "variables" refers to measurable physical quantities, while the term "parameters" refers to unknown quantities that appear in the data model and are the object of statistical inference. Henceforth, we use θ to refer to the 5 × 1 vector of model parameters, that is, θ = [β_0, β_1, β_2, β_3, σ]. Additionally, we use X to refer to the 4 × N matrix of N observations of model variables q_w, C_s, D_s, and q_b.
While OLS regression seeks estimates of model parameters that minimize the global sum of the squared residuals, Bayesian model fitting embraces uncertainty associated with the fact that small differences in model parameters may fit the data nearly as well as the optimal result. These small differences are quantified by the likelihood function, which exists on the domain of model parameters assuming fixed observational data X and is denoted by L(θ|X). Here, the vertical line denotes conditional dependence, that is, the likelihood of θ given X. The likelihood can be computed for any combination of parameters, where higher likelihoods represent more likely combinations of parameters. Introducing the prior probability distribution P(θ), we obtain an expression for the posterior probability distribution of model parameters conditional on observational data, P(θ|X), through Bayes' theorem:

P(θ|X) ∝ L(θ|X) P(θ)

Once the posterior probability distribution of model parameters is known, unobserved values of q_b can be estimated from measured values of predictor variables using Bayesian posterior predictive distributions, which efficiently propagate uncertainty through individual estimates of q_b as well as time-integrated mass-balance calculations.

Grouped, Ungrouped, and Hierarchical Model Variations
The basis for equation (1) suggests that it is sufficient to predict bedload flux in any sand-bed river using a single universal set of scaling parameters. However, some degree of predictive uncertainty is inevitable owing to both measurement error and model bias arising from simplification of physical processes. While measurement error can be considered uncorrelated, systematic biases are caused by a failure of the data model to capture specific physical processes and are thus likely to be correlated when conditions are similar. As a result, we anticipate persistent site-specific biases using a general model based on data from many rivers. For example, details of channel geometry not explained by width and slope may cause bedload flux to be more or less sensitive to changes in water discharge at one site compared with the central tendency of all sand-bed rivers. In this case, better predictive accuracy would be achieved at that site by adjusting the value of β_1 to reflect this difference. In general, we anticipate better predictive performance if model parameters are constrained on a site-specific basis.
This theoretical consideration is at odds with practical limitations: Regression analysis requires numerous independent estimates of bedload flux that are expensive and difficult to obtain. Thus, it would be advantageous if existing data from many rivers could be used to help inform bedload prediction at a new site. Optimal model parameters may differ slightly from site to site; however, sand-bed rivers are all governed by the same general physical processes such that it is reasonable to expect that scaling parameters should be similar between rivers. In order to balance theoretical and practical concerns, we consider three distinct generative data models, each of which reflects a different trade-off between observational data requirements and assumptions regarding the generality of scaling parameters.
The first model (the grouped model, Appendix B1) assumes a single universal set of model parameters θ = [β_0, β_1, β_2, β_3, σ]; the standard deviation of the error term is the same for all data. All observations are therefore treated as independent observations from the same exchangeable group of observations. The advantage of this model is that it can be applied at a new site without collecting any additional data. However, it ignores the possibility of correlated errors by river or site and is therefore subject to unquantifiable systematic biases when applied at a specific site without local data. The second model (the ungrouped model, Appendix B2) assigns different independent scaling parameters θ_j = [β_0j, β_1j, β_2j, β_3j, σ_j], for j = 1, …, m, where m = 8 is the number of data groups (i.e., independent sites). This is equivalent to performing grouped regression independently on a site-specific basis: Each site is treated as an independent statistical entity comprising its own exchangeable group of observations. This model is perhaps the most theoretically conservative in that it assumes nothing with regard to physical similarity between sites. However, it is also the least practical in that it requires extensive uncorrelated observations of bedload flux from each monitoring site in order to ensure reliable results and cannot be applied at a site where bedload has never been measured directly.
The third model (the hierarchical model, Appendix B3) assigns different regression coefficients to each site but assumes some degree of physical similarity between sites. Observations are treated as exchangeable on a site-specific basis, and each site comes from an exchangeable group of sites, that is, all sand-bed rivers. We aim to draw inference, not only on the behavior of individual sites but also on the distribution of behaviors that can be observed at different sites. Site-specific coefficients are thus determined partly by data collected at that site but are also informed by the behavior of other rivers, which can reduce issues related to low sample size at one site if sufficient data exist at other sites. Hierarchical organization is implemented through priors for the regression coefficients, which are assumed to be normally distributed with a mean and variance that reflects the central tendency and variability of sand-bed rivers. This model lies somewhere between the grouped and ungrouped models in terms of both theoretical assumptions and data requirements. Some data are useful in order to constrain bedload flux at a new site, but limited observations are utilized to greater effect than in the ungrouped model.
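The pooling behavior of the hierarchical model can be illustrated with the standard normal-normal shrinkage formula, in which a site-specific coefficient is a precision-weighted compromise between the site's own estimate and the population mean. This is a minimal sketch of the partial-pooling idea, not the full model of Appendix B3; all numerical values below are hypothetical.

```python
def shrink(beta_site, n_site, sigma2, mu_pop, tau2):
    """Precision-weighted compromise between a site's own coefficient estimate
    and the population mean (normal-normal partial pooling).

    beta_site : coefficient estimated from the site's own data
    n_site    : number of observations at the site
    sigma2    : within-site error variance
    mu_pop    : population-level mean of the coefficient across rivers
    tau2      : between-site variance of the coefficient
    """
    w_site = n_site / sigma2   # precision of the site-level estimate
    w_pop = 1.0 / tau2         # precision of the population-level prior
    return (w_site * beta_site + w_pop * mu_pop) / (w_site + w_pop)

# A data-rich site stays close to its own estimate...
rich = shrink(1.8, n_site=60, sigma2=0.09, mu_pop=1.2, tau2=0.04)
# ...while a data-poor site is pulled toward the population mean
poor = shrink(1.8, n_site=2, sigma2=0.09, mu_pop=1.2, tau2=0.04)
print(rich, poor)
```

This is the sense in which "sites with many observations inform prediction at sites with relatively few observations": the between-site variance τ², itself inferred from the data, controls how strongly sparse sites are shrunk toward the group of rivers as a whole.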

Priors
Diffuse (i.e., wide, minimally informative) priors are commonly used to minimize influence on model results and are employed here for all three model variations. Diffuse priors are effectively constant over the relevant parameter domain, which means that the posterior distribution essentially reflects a renormalization of the likelihood function, preserving the relative likelihoods while ensuring the posterior integrates to 1. Due to the relatively large sample size, our results are not sensitive to the specific choice of diffuse prior.
Grouped and ungrouped regression models were fit using an approximation to the Jeffreys prior, which is an attractive choice due to its unique theoretical properties (Christensen et al., 2011; Gelman et al., 1995). The Jeffreys prior is a uniform distribution on the domain (β_0, β_1, β_2, β_3, log(σ)), which is an improper prior because it does not integrate to 1. Thus, normal distributions centered on zero with large standard deviations are used to approximate the Jeffreys prior because a normal distribution approaches a uniform distribution as the standard deviation goes to infinity. The Jeffreys prior is also uniform in log(σ), meaning the prior probability that the parameter σ is between 0.01 and 0.1 is the same as the probability that it is between 0.1 and 1. The inverse gamma distribution approaches a uniform distribution on log(σ) as its parameters go to zero.
The hierarchical model structure is implemented through informative, dynamic priors, where the parameters for these priors are referred to as "hyperparameters." Inference is drawn on parameters and hyperparameters simultaneously such that the hyperparameters have their own prior and posterior probability distributions. Priors for hyperparameters, or "hyperpriors," must be specified. Again, we utilized diffuse, minimally informative hyperpriors, the specific choice of which does not influence model results. For additional details on priors and hyperpriors, see Appendices B1-B3.

Model Fitting
All three models were fit using MCMC sampling methods. This technique is commonly used to sample the posterior distribution and conduct predictive simulation when analytical alternatives are cumbersome or impossible. For additional details on MCMC sampling, see Appendix B4 and example workflows (Ashley, 2019b).
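The sampling step above can be sketched with a simple random-walk Metropolis algorithm applied to the linear data model with a prior flat in the coefficients and in log(σ), as in the grouped model. This is a minimal illustration on synthetic data, not the sampler used for the paper's results; the true parameter values and tuning constants are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data for the linear model y = X @ beta + N(0, sigma)
n = 200
beta_true, sigma_true = np.array([0.0, 1.5]), 0.2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ beta_true + rng.normal(scale=sigma_true, size=n)

def log_post(theta):
    """Log posterior: normal likelihood with a prior flat in (beta, log sigma)."""
    beta, log_sigma = theta[:-1], theta[-1]
    sigma = np.exp(log_sigma)
    resid = y - X @ beta
    return -n * log_sigma - 0.5 * np.sum(resid**2) / sigma**2

# Random-walk Metropolis over theta = [beta0, beta1, log(sigma)]
steps, scale = 20000, 0.05
chain = np.empty((steps, 3))
theta = np.zeros(3)
lp = log_post(theta)
accepted = 0
for i in range(steps):
    prop = theta + scale * rng.normal(size=3)
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:   # Metropolis acceptance rule
        theta, lp = prop, lp_prop
        accepted += 1
    chain[i] = theta

post = chain[steps // 2:]  # discard first half as burn-in
print(post.mean(axis=0), accepted / steps)
```

The retained samples approximate the joint posterior; posterior predictive draws for q_b then follow by evaluating the data model at each sampled parameter vector, which is how uncertainty propagates into cumulative load calculations.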

Model Selection
Quantitative comparison of predictive power is accomplished using the Deviance Information Criterion (DIC; Gelman et al., 2014; Spiegelhalter et al., 2002; Appendix B6), a generalization of the Akaike Information Criterion that is suitable for comparing the hierarchical and nonhierarchical models used here. DIC includes two terms: one that quantifies in-sample predictive accuracy and one that corrects for model complexity to approximate out-of-sample predictive accuracy under certain assumptions. As a relative measure of predictive power, models with lower DIC are expected to have lower prediction error than models with higher DIC. However, DIC is not a perfect measure of relative prediction error and is reported here (Table 2) to inform model evaluation rather than as the sole discriminatory factor.
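The two DIC terms are straightforward to compute from posterior samples. In this sketch the "posterior" is a stand-in (independent normal draws around plausible values) so the example runs on its own; in practice the samples would come from the MCMC chain, and the model here is a simple one-coefficient regression rather than equation (8).

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: y = 2x + noise
n = 100
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)

def deviance(beta, sigma):
    """Deviance = -2 * log likelihood for the normal linear model."""
    resid = y - beta * x
    return -2 * np.sum(-np.log(sigma) - 0.5 * np.log(2 * np.pi)
                       - 0.5 * (resid / sigma) ** 2)

# Stand-in posterior samples (assumed, for illustration only)
betas = rng.normal(2.0, 0.1, size=5000)
sigmas = np.abs(rng.normal(1.0, 0.05, size=5000))

D = np.array([deviance(b, s) for b, s in zip(betas, sigmas)])
D_bar = D.mean()                               # mean posterior deviance
D_hat = deviance(betas.mean(), sigmas.mean())  # deviance at posterior mean
p_D = D_bar - D_hat                            # effective number of parameters
DIC = D_hat + 2 * p_D                          # equivalently D_bar + p_D
print(DIC, p_D)
```

The mean deviance D_bar measures in-sample fit, while p_D penalizes complexity; for a well-identified model p_D is close to the number of free parameters.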

Field Methods
Transport-related data were collected at one field site on the Colorado River in Grand Canyon National Park during three field campaigns in the spring and summer of 2015, as well as the fall of 2016. The site (Figure 2) is located at river mile 225 in the vicinity of USGS Monitoring Station 09404200 (Colorado River above Diamond Creek near Peach Springs, AZ). Hereafter, we refer to this site informally as "Diamond Creek" or the "Diamond Creek field site." Data include repeat bathymetric surveys of dune migration, ADCP surveys of flow velocity, suspended sediment and bed sediment samples, and bed photographs for optical grain-size analysis (Buscombe et al., 2010). Concurrent gage measurements of water discharge, suspended sand concentration, and grain size were also collected following standard procedures during this time (Rantz et al., 1982; Topping & Wright, 2016).
Estimates of bedload flux were obtained using 320 high-resolution, full-width bathymetric surveys of an approximately 400 m reach adjacent to the Diamond Creek gaging station. Surveys were collected using a 400 kHz Reson 7125 multibeam echo sounder, which produces a swath composed of 512 beams (each 1° × 0.5°) across a transverse subtended angle of 135°. In order to map sonar returns onto a global coordinate system, the location of the boat was tracked using a robotic Total Station referenced to a fixed position on the bank, and a fiber-optic gyrocompass and inertial sensors were used to calculate heading, roll, and pitch of the sonar head. Patch tests were conducted before the surveys to determine the offset angles and timing latency between the various system components. Bad soundings and sweep misalignments (due to, e.g., systematic side-lobe interference and scattering of soundings by air bubbles, drifting insects, and other organic matter in the water) were identified by manual sweep editing and systematically stepping through overlapping sweeps. Quality assurance assessments were performed after the surveys by comparing the mean standard deviation of selected soundings from all surveys over a large, flat-topped rock located along the channel margin. Simons et al. (1965) provide the method by which bathymetric data can be used to generate bedload flux estimates. Their expression is given by

q_b = (1 − p) V_c H_c / 2 + C    (11)

where q_b (L^2 T^-1) is the volumetric bedload flux per unit width, p (−) is the bed porosity taken to be a constant 0.35, V_c (LT^-1) is a characteristic bedform migration rate, H_c (L) is a characteristic bedform height, and C is a constant of integration assumed to be zero. Measured bedform heights ranged from 0.15 to 0.70 m, and measured migration rates ranged from 0.21 to 1.76 m/hr. Both of these quantities varied predictably with water discharge.
Equation (11) is derived from a statement of mass conservation (the Exner equation, Paola & Voller, 2005) combined with a simplified model for dune evolution characterized by translationally invariant migration of triangular or sinusoidal forms. Although it involves substantial simplification of physical processes (e.g., by ignoring bedform deformation and variability in bedform migration rate and geometry), flume and field studies find good agreement between (11) and other estimates of bedload flux across a wide range of conditions extending from the threshold of bedform development to suspension-dominated dunes (Engel & Lau, 1980; Mohrig & Smith, 1996; Simons et al., 1965; van den Berg, 1987). Consequently, we argue that this expression provides a reasonable estimate of bedload transport that is not captured by acoustic estimates of suspended sand load. Equation (11) was used to compute 55 hourly estimates of average bedload flux (Figure 4). Major elements of this procedure are discussed in Appendix A. Additional details can be found in the documentation of software developed for this purpose (Ashley, 2019a).
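Equation (11) amounts to a one-line calculation once a characteristic dune height and migration rate are in hand. The sketch below evaluates it over the range of dune geometries reported above; the sediment density used to convert volumetric to mass flux is an assumed quartz value, not a measured one.

```python
RHO_S = 2650.0   # sediment density, kg/m^3 (assumed quartz value)
POROSITY = 0.35  # bed porosity, as in the text

def bedload_from_dunes(H_c, V_c_m_per_hr, C=0.0):
    """Equation (11): volumetric bedload flux per unit width (m^2/s) from a
    characteristic dune height H_c (m) and migration rate V_c (m/hr).
    The constant of integration C is taken as zero, following the text."""
    V_c = V_c_m_per_hr / 3600.0  # convert m/hr to m/s
    return (1.0 - POROSITY) * H_c * V_c / 2.0 + C

# Range of measured dune geometries reported in the text
qb_low = bedload_from_dunes(0.15, 0.21)
qb_high = bedload_from_dunes(0.70, 1.76)

# Convert to mass flux (kg/s/m) for comparison with gage-based loads
print(RHO_S * qb_low, RHO_S * qb_high)
```

Because both H_c and V_c varied predictably with water discharge, the computed flux spans roughly an order of magnitude over the observed range of conditions.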

Additional Data From Other Rivers
The large river data set presented by Toffaleti (1968) (and derived quantities) is used to supplement limited data from our field site. Flow and transport conditions at each site are summarized in Table 1. This data set comprises a total of 262 concurrent observations of bedload flux Q_b, water discharge Q_w, suspended sand concentration C_s, median suspended sand diameter D_s, and channel width W on the Atchafalaya River (n = 60), the Mississippi River in Louisiana (n = 47), the Mississippi River in Missouri (n = 63), the Red River (n = 28), the Rio Grande (n = 36), the Middle Loup River (n = 9), and the Niobrara River (n = 19). Viewed collectively, the predictor variables (q_w, C_s, and D_s) cover a wide range of conditions and are only weakly correlated, although site-specific correlations are evident, especially between C_s and q_w. These sites are similar to the Diamond Creek field site in that the predominant bed material is sand; however, they are different in that they are all alluvial rivers, whereas the Colorado River in Grand Canyon is a bedrock-confined alluvial river with gravelly and sandy reaches. Our model is based on physical theory describing one-dimensional transport and assumes nothing about channel form. Consequently, it can be applied in rivers that are not fully alluvial as long as the bed material at the site of interest is sand.
Total suspended sand concentration C_s and median suspended sand grain size D_s were computed from the reported grain-size-specific suspended sediment concentrations. Bedload flux was computed according to the revised Meyer-Peter & Müller bedload equation (Wong & Parker, 2006) with grain stresses estimated using the Einstein drag partition as reformulated by Garcia (2008). This procedure was also used to compute bedload flux at our study site when flow velocity and bed sediment data were available, in order to check for approximate correspondence with estimates of flux from dune migration. Note that here, and throughout, "observations" is used as part of the statistical vernacular to refer to independent samples of variables and implies nothing about how those samples were obtained. This distinction is particularly important here because "observations" of bedload flux in the Toffaleti data set are actually computed from depth, slope, grain size, and flow velocity using a physically based model. Similarly, observations of bedload flux at Diamond Creek are computed from dune height and migration velocity using a physically based model.
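As a concrete sketch of the bedload computation described above, the revised Meyer-Peter & Müller relation of Wong and Parker (2006) can be written in a few lines. The drag-partition step is not shown; the grain (skin-friction) shear stress is assumed to have been computed already, and the default densities are illustrative values for quartz sand in water.

```python
import math

def wong_parker_qb(tau_grain, d50, rho=1000.0, rho_s=2650.0, g=9.81):
    """Mass bedload flux per unit width (kg/s/m) from the revised
    Meyer-Peter & Mueller relation (Wong & Parker, 2006):
        Phi = 3.97 * (tau_star - 0.0495)**1.5
    tau_grain is the grain (skin-friction) shear stress in Pa, assumed to
    come from a separate drag-partition step (e.g., Einstein's, not shown).
    """
    R = (rho_s - rho) / rho                            # submerged specific gravity
    tau_star = tau_grain / ((rho_s - rho) * g * d50)   # Shields number
    excess = tau_star - 0.0495
    if excess <= 0.0:
        return 0.0                                     # below threshold of motion
    phi = 3.97 * excess ** 1.5                         # dimensionless transport rate
    qb_vol = phi * math.sqrt(R * g * d50) * d50        # volumetric flux (m^2/s)
    return rho_s * qb_vol                              # convert to mass flux
```

Below the Shields threshold the predicted flux is zero, which is why supply-limited, low-stress conditions contribute little to the Toffaleti-derived bedload estimates.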

Data Treatment
The statistical methods employed here assume that errors in observations are uncorrelated. However, the 55 hourly estimates of average bedload flux from the Diamond Creek field site were collected over 7 days, during which temporal correlation is likely. Unqualified extrapolation of trends in this data set to the full gage record spanning nearly 10 years may therefore produce unrealistic results. To mitigate this effect, we use only the first and last measurement from each day (n = 14) to estimate model parameters.
The full data set used for statistical analysis comprises a total of 276 observations from eight sites. Data were log-transformed to obtain the linear regression variables q*_w, C*_s, D*_s, and q*_b using fixed reference values of each variable (Figure 5). We chose to use a single reference value for each variable (as opposed to individual reference values for each site), computed as the geometric mean of all 276 pooled observations of that variable, which results in centered (zero-mean) log-transformed variables. Other choices may provide additional insight (if, e.g., different physically important reference values are used on a site-specific basis, such as mean annual discharge or bankfull discharge); however, such analyses are beyond the scope of this paper. Reference values of model variables are q_b0 = 0.039 kg/s/m, q_w0 = 4.35 m²/s, C_s0 = 1.07 × 10⁻⁴, and D_s0 = 0.13 mm. Channel widths were computed using an empirical power law function of water discharge at the Diamond Creek field site; reported widths were used at other sites.
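The transformation just described can be sketched in a few lines. The discharge values below are hypothetical stand-ins, not observations from the data set; the point is that dividing by the geometric mean before taking logarithms yields a zero-mean predictor.

```python
import numpy as np

# Hypothetical pooled observations of unit water discharge (m^2/s);
# illustrative values only, not data from the paper
qw = np.array([2.1, 4.3, 8.8, 3.5])

qw0 = np.exp(np.mean(np.log(qw)))   # geometric mean as the reference value
qw_star = np.log(qw / qw0)          # centered log-transformed predictor

# Centering on the geometric mean makes the transformed variable zero-mean
assert abs(qw_star.mean()) < 1e-12
```

The same recipe applies to C_s, D_s, and q_b, each with its own pooled geometric mean as the reference value.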
Here, we emphasize that the full data set contains observations of bedload flux that were obtained using two very different methods. Bedload was estimated from grain stresses computed using the Einstein drag partition and the Wong and Parker bedload equation for the large river data set reported by Toffaleti (1968), while bedload flux at Diamond Creek was computed using observations of bedform migration. For the purposes of statistical analysis, we assume both methods produce unbiased estimates of bedload flux with comparable uncertainty. Consequently, both methods are treated identically in the context of inference and prediction.

Bedload Fluxes at Diamond Creek
Bedload flux computed from bedform migration is similar to bedload flux estimated as a constant 5% of suspended sand load during the July 2015 survey period, corresponding to the highest water discharges observed (450 to 600 m 3 /s). Bedload fractions are significantly higher during the March 2015 and November 2016 survey periods, corresponding to lower water discharges (275 to 400 m 3 /s). Bedload flux ranged from 0.33 to 8.6 kg/s during the various data collection intervals (Figure 4). The bedload fraction is negatively correlated with suspended sand flux, ranging from as little as 3% to as much as 26% of suspended sand flux.

Inference on Model Parameters
Kernel density estimates of the marginal posterior distributions of model parameters are plotted in Figure 6. The statistical effect of each predictor variable is quantified by the value of the exponent corresponding to that variable. Peaked distributions indicate low parameter estimation uncertainty, and wide distributions indicate high uncertainty. Median parameter estimates are reported in Table 2.
Computed DIC values indicate that the hierarchical model has the lowest expected prediction error averaged across all sites. In order to evaluate the effect of each parameter on predictive power, we computed DIC for permutations of each model involving only two predictor variables. The predictive power of the grouped model is significantly reduced by any of the two-variable permutations. However, we find that the predictive power of the ungrouped model is improved by ignoring D*_s (DIC = 298 compared to 382), indicating that considering D*_s does not improve model fit enough to justify the added complexity. Excluding D*_s has essentially no effect on the predictive power of the hierarchical model (DIC = 112 compared to 123).

Prediction
Predictive distributions of total mass bedload flux (Appendix B5) were computed with all three models using hourly average measurements of Q, C_s, and D_s recorded at the Diamond Creek gage from 1 January 2008 to 31 December 2016. This was accomplished by computing full posterior predictive distributions for each gage measurement of model variables. Median predictions are compared against observational data in Figure 7. The full simulated time series of bedload flux, the ratio of bedload to suspended load, and predictor variables are plotted in Figure 8.

Comparison of Model Variations
We have presented three variations on our generalized bedload modeling framework that differ in their assumptions, implementation, and interpretation. Here, we compare model variations in the context of the statistical inference and predictions reported in section 4.
The grouped model most closely encapsulates the physical reasoning presented in section 3.1, which argues that quasi-universal relationships between transport parameters emerge through the processes governing their interaction and equilibration. These relationships comprise three primary modes of variability driven by water discharge, channel geometry, and bed composition. Three predictor variables improve predictive power compared with any two-parameter model permutation, indicating that all three modes of variability are represented in the data.
In principle, the grouped model can be applied at any site to predict bedload flux, including new sites that lack direct observational data. However, while individual predictions are unbiased relative to the full data set, systematic biases exist among groups of measurements that come from a single site; for example, the grouped model underpredicts bedload flux at the Diamond Creek field site (Figure 7). Systematic biases are problematic when computing sediment budgets because they accumulate over time to cause compounded errors.
By considering each site separately, the ungrouped and hierarchical models reduce site-specific systematic biases. They also reflect a restricted scope of physical process: While the grouped model represents quasi-universal physical relationships across many sites, the ungrouped and hierarchical models capture site-specific associations between variables. As a result, we find that two-parameter permutations of the ungrouped and hierarchical models (ignoring D_s) provide equal or better predictive power than the generalized three-parameter approach. This observation can be explained by the fact that slope is effectively fixed at each site over human time scales in comparison to the differences observed between rivers, reducing the number of principal modes of variability to two. These modes are driven by fluctuations in flow strength and sediment supply, where sediment supply influences fluxes through both "grain size and reach-geometric effects" (sensu Topping, Rubin, & Vierra, 2000; Topping, Rubin, Nelson, et al., 2000). This finding is potentially valuable for sediment monitoring purposes because measurements of C_s are significantly easier to obtain than measurements of D_s: C_s varies by many orders of magnitude and can be measured accurately using single-frequency instruments in a wide range of conditions, while measuring D_s requires well-sorted suspended material and two-frequency instrumentation and is only accurate for a small range of grain sizes (Topping & Wright, 2016).
The hierarchical model differs from the ungrouped model in that the site-specific associations between variables are assumed to be similar between sites. Through this assumption, sites with many observations inform prediction at sites with relatively few observations. This effect is most clear at our field site, where few observations (n = 14) lead to spurious point estimates of regression parameters (Table 2) and large uncertainty ( Figure 6) using the ungrouped model. The hierarchical model produces a slightly poorer fit to the data but yields much more precise and consistent estimates of regression parameters.
In summary, each model has a specific set of assumptions, data requirements, and limitations that must be evaluated in order to be applied to a specific problem. The grouped model reflects quasi-universal physical relationships between variables and can be applied at any site without training data but introduces systematic bias to cumulative bedload estimates. The ungrouped model minimizes site-specific, systematic biases and assumes nothing about similarity between sites but requires extensive observational data to be applied at a given site. The hierarchical model reduces the number of observations needed at a site relative to the ungrouped model under the assumption that sites are similar. Grouped and hierarchical models can potentially be applied using only measurements of Q w and C s .
Presently, we aim to compute sediment budgets over the full gage record at the Diamond Creek sediment monitoring station. We argue that the hierarchical model is the best choice for this purpose because it reduces systematic bias while making efficient use of limited data. Time series predictions made using the hierarchical model are plotted over select intervals in Figures 9-11.

Comparison With Existing Methods for Estimating Bedload Flux
Prior to this research, the two primary methods for estimating bedload flux from gage data in practical applications have been (1) rating curves with discharge (e.g., Emmett & Wolman, 2001; Leopold & Maddock, 1953) and (2) constant bedload coefficients applied to continuous measurements of C_s (e.g., Grams et al., 2013). To highlight the advantages of the model presented here, we compare simulated bedload time series with rating curve and bedload coefficient predictions. Several short example intervals were selected for this purpose and are plotted in Figures 10 and 11. Both approaches are special cases of our general model (equation (1)) wherein certain parameters are fixed. For example, rating curves express bedload flux as a power law function of water discharge, that is, Q_b = k Q_w^m, which corresponds to equation (1) with null coefficients on suspended sand concentration C_s and diameter D_s: q_b = A e^β0 q_w^β1. Assuming width scales with discharge (W = a Q_w^b), this reduces to Q_b = k Q_w^m, where k = A e^β0 a^(1−β1) and m = β1 + b(1 − β1) are assumed to be constant. For the purposes of comparing rating curve and hierarchical predictions, rating curve parameters (k and m) were found using OLS regression applied to concurrent observations of water discharge and bedload flux obtained at the gaging station and from repeat surveys of dune migration, respectively.

[Figure caption: Note that predicted bedload flux may vary by over an order of magnitude at a fixed water discharge, an effect that is typically attributed to supply limitation. Suspended sand concentration is connected to the supply-limitation state of a reach; here, elevated suspended sand concentrations are indicative of fine-sediment enrichment and amplified bedload flux.]
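The rating-curve fit amounts to ordinary least squares in log-log space. A minimal sketch, using synthetic data (the survey observations are not reproduced here, so the k and m values below are illustrative):

```python
import numpy as np

def fit_rating_curve(Qw, Qb):
    """Fit Qb = k * Qw**m by OLS on log-transformed variables."""
    m, logk = np.polyfit(np.log(Qw), np.log(Qb), 1)
    return np.exp(logk), m

# Synthetic example: data generated with k = 1e-4, m = 2 are recovered exactly
Qw = np.array([300.0, 400.0, 500.0, 600.0])
k, m = fit_rating_curve(Qw, 1e-4 * Qw**2)
```

Because the fit is performed on logarithms, multiplicative scatter about the curve is treated symmetrically, consistent with the log-transformed regression variables used elsewhere in the paper.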
By specifying β2 = 0 and β3 = 0, rating curves assume a unique relationship between bed composition, channel geometry, and discharge, which is problematic because sediment supply limitation is known to modify the transport efficiency of a given discharge through reach-geometric and grain size effects (Topping, Rubin, & Vierra, 2000; Topping, Rubin, Nelson, et al., 2000). Sediment supply variability can thus cause systematic deviations from rating-curve predictions: pulses of fine bed material result in an enriched state characterized by increased bedload flux, and subsequent preferential evacuation of fine material produces a depleted state during which bedload flux is suppressed relative to a hypothetical rating curve prediction (Figure 12). Our modeling approach provides the potential to capture the effects of sediment supply limitation as parameterized by C_s and D_s. As a result, we interpret the difference between hierarchical model predictions and rating curve predictions as an indicator of the relative supply-limitation state of the Diamond Creek sediment monitoring reach: A positive difference is indicative of relative enrichment of fine sand, whereas a negative difference is indicative of relative depletion.
Such enrichments or depletions are particularly pronounced during and after controlled flood experiments (Figure 10). For example, the period following each controlled flood typically records finer suspended sand grain sizes and elevated suspended sand concentrations relative to antecedent conditions, indicating fine-sediment enrichment. This is perhaps caused by delivery of fine material accessed above the typical high water line and/or the reworking of existing alluvial deposits in a manner that increases transport efficiency. Hierarchical model predictions are correspondingly elevated relative to rating curve predictions following each controlled flood.
Bedload coefficients are sometimes used to account for the contribution of bedload to total load in scenarios where measurements of suspended flux are available and bedload is thought to be small (e.g., Grams et al., 2013). In order to estimate total load, researchers sometimes apply a universal correction factor 1 + Φ to measurements of suspended sand flux Q_s, which implies Q_b = Φ Q_s (equation (15)). Noting that q_s = q_w C_s, equation (15) is a special case of our general bedload model (1) wherein β1 = 1, β2 = 1, and β3 = 0, that is, q_b = Φ q_w C_s, where Φ = A e^β0 is the constant bedload coefficient. In some sense, this expression represents a crude attempt to account for supply limitation effects by assuming bedload and suspended load are equally sensitive to changes in their mutual causal predictors (water discharge, channel geometry, and bed composition). However, suspension conditions (parameterized by the Rouse number, Z_R = w_s/(κ u_*), where w_s is the particle settling velocity, u_* is the basal shear velocity, and κ is von Karman's constant) vary with flow strength and sediment supply and are the most important predictor of Φ (van Rijn, 1984, Equation 45). Insofar as the Rouse number may vary over time at a site, it is unreasonable to expect that the bedload fraction should remain constant. Instead, increasing Z_R should generally cause an increase in Φ. This may occur due to changes in u_* (as a function of water discharge, channel geometry, and bed roughness) or due to changes in w_s, which is a monotonically increasing function of the bed grain size D_b.
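To make the Rouse-number dependence concrete, the following sketch computes Z_R from grain size and shear velocity. The settling-velocity formula of Ferguson and Church (2004) and its shape constants are our assumption here for illustration; they are not the formulation used in the paper.

```python
import math

KAPPA = 0.41  # von Karman's constant

def settling_velocity(d, nu=1.0e-6, R=1.65, g=9.81, C1=18.0, C2=1.0):
    """Particle settling velocity w_s (m/s) for grain diameter d (m) via
    Ferguson & Church (2004); C1 and C2 are shape constants (the values
    here, typical for natural sand, are an assumption)."""
    return R * g * d**2 / (C1 * nu + math.sqrt(0.75 * C2 * R * g * d**3))

def rouse_number(d, u_star):
    """Z_R = w_s / (kappa * u_star); small Z_R means suspension dominates."""
    return settling_velocity(d) / (KAPPA * u_star)
```

For the 0.13 mm reference grain size, increasing the shear velocity lowers Z_R, shifting transport toward suspension and decreasing the bedload fraction, as discussed above.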
Comparison of hierarchical model and bedload coefficient predictions reveals several expected behaviors. In general, elevated suspended sand fluxes tend to correspond to increased suspension conditions (low Rouse numbers) and low bedload fractions ( Figure 13). Bedload flux is a larger fraction of total load when discharge is low, corresponding to higher Rouse numbers due to decreases in u * . Sediment supply depletion also increases the bedload fraction when discharge is held constant, corresponding to higher Rouse numbers due to increases in w s . Discharge effects are most pronounced before, during, and after controlled flood experiments (Figure 10), which exemplify both the high and low bedload fraction extremes. Supply limitation effects are evident during long periods of nearly constant discharge, which tend to be associated with gradual sediment-supply depletion ( Figure 11). Gradual depletion causes a concurrent increase in the bedload fraction, which is also apparent after the 2014 controlled flood experiment ( Figure 10).

Management Implications
Sediment budgets are used to estimate changes in stored sediment mass over a wide range of time scales. Short-term effects of interest include perturbations related to dam-regulated water discharge or tributary sand delivery. Serendipitously (in the context of the 5% bedload coefficients used by Grams et al., 2013, and Topping et al., 2010), we find that cumulative bedload discharge was approximately 5% of the cumulative suspended sand discharge over the 9 year record considered here. However, instantaneous bedload flux ranges from less than 1% to as much as 75% of suspended sand load depending on water discharge and the supply-limitation state. As a result, short-term mass-balance fluctuations caused by experimental changes in the discharge regime (i.e., controlled floods), transient accommodation of tributary sand pulses, or prolonged periods of constant discharge are not adequately represented using a constant bedload fraction or rating curve model for bedload flux. For example, cumulative bedload during controlled floods is only 2% of cumulative suspended load during the same intervals, whereas cumulative bedload is 10% of cumulative suspended load during periods when flow is below the mean annual discharge. In general, the magnitude of deviations in short-term average bedload fraction from a measured long-term average is a function of averaging time scale (Figure 14). Over longer time scales, researchers aim to constrain the effects of changes in the water discharge or sediment delivery regime as dictated by dam protocols, climate, and land use in the upper Colorado River basin (e.g., Andrews, 1991; Grams et al., 2013, 2015; Kasprak et al., 2018; Mueller et al., 2014, 2018). In particular, the dam-regulated water discharge regime is the primary tool for enacting management decisions aimed at balancing ecological, social, and economic goals.
Nearly three decades of Grand Canyon research suggests that a return to a more natural, seasonal discharge regime would induce a desirable geomorphic response. Actionable proposals like the "Fill Mead First" plan are designed to balance this and other management objectives by changing the annual cycle of dam releases, and flux-based sediment budgets are critical for accurately evaluating the effects and effectiveness of such plans. However, the intended geomorphic response will necessarily involve changes in channel geometry and bed composition, affecting sediment flux in a manner that cannot be tracked using traditional rating curve or bedload fraction approaches. Measurements of q_w, C_s, and D_s are indicative of changes in q_b, such that it is possible to resolve short-term morphodynamic adjustment and evaluate the effects of future changes in the water discharge and sediment supply regime.

Other Applications of Modeling Approach
Bedload has historically been difficult to measure directly. As a result, its role in governing large-scale river organization is poorly understood. Although this paper focuses on estimating bedload on the Colorado River, the modeling approach presented herein will enable improved estimates of bedload flux in any sand-bedded river. Our model can be applied retroactively to innumerable historical measurements of suspended sediment concentration and grain size, providing a new approach for connecting bedload transport to continent- and basin-scale river dynamics.
This work also supports a more general principle that extends beyond the problem of estimating bedload flux. We have argued that our bedload model provides reliable predictions because it approximates quasi-universal relationships between transport parameters that emerge through the processes governing their interaction and equilibration. In this view, first-order changes in flow and transport conditions, including bedload flux, suspended sand concentration, and suspended sand diameter, are driven by three variables: water discharge, slope, and bed material grain size. This implies that any relevant variable can be estimated from measurements of three other variables, providing a general formula for constructing predictive empirical relations in sandy fluvial systems. This strategy may prove useful for reconstructing hydraulic and transport conditions in scenarios where certain variables are difficult or impossible to measure, for example, in applications involving remotely sensed river data or measurements of fluvial sedimentary rocks.

Conclusions
The modeling approach presented here was developed to estimate reach-averaged bedload flux from measurements of water discharge, concentration, and grain size in suspension. This approach is based on the assumption that most of the variability in sand-bed rivers can be reduced to three principal modes of variation that are causally attributed to water discharge, slope, and bed grain size. Measurements of concentration and grain size in suspension provide reliable proxies for the effects of slope and bed material grain size on bedload flux.
Bayesian hierarchical modeling assumes similarity between rivers to ensure efficient use of limited data. This approach reduces in-sample bias compared with a fully grouped regression, and it improves parameter estimation precision compared with the ungrouped regression. However, we anticipate that the general modeling approach presented here may prove useful in other contexts for which grouped or ungrouped generative data models may be preferable.
We find that predicted bedload flux during the period from 2008 to 2016 averaged over the full gage record at Diamond Creek is approximately 5% of the measured suspended sediment load. However, instantaneous values deviate significantly from 5% depending on flow strength and sediment supply conditions. Notably, changes in bedload flux at a constant water discharge are indicative of short-term sediment supply enrichment and depletion. Using the median prediction from the hierarchical model, we find that bedload flux ranges from as high as 75% of suspended sand load (during fine-sand depleted, low-discharge periods) to less than 1% (during fine-sand enriched floods). The decade-average bedload fraction is expected to deviate systematically from 5% in the future if bed composition and channel geometry evolve due to changes in tributary sand supply or the dam-regulated discharge regime. In order to ensure accurate quantification of fluctuations in sediment storage over a range of time scales, it is critical to account for deviations in the ratio of bedload to suspended load driven both by individual events (e.g., high flow experiments or tributary floods) and long-term evolution of channel geometry and bed composition.
1. Flow direction is determined by inspection, and point clouds are transformed to streamwise and cross-stream coordinates.
2. An upstream and downstream extent is chosen to bracket a region of the bed used for computation of flux. The region used here is the largest region where the margins of the bedform field are parallel and bedform geometry appears to be uniform in all surveys.
3. Point clouds are divided by cross-stream coordinate into streamwise oriented transects spaced at 25 cm.
4. Ungridded points that fall within each 25 cm wide transect are gridded at a 10 cm streamwise resolution using a locally weighted nonparametric filter.
5. Transects are detrended using a high-pass Fourier filter. The filter wavelength used here is 3 times the largest dune length determined by inspection.
6. Characteristic bedform height is estimated as 2√2·σ, where σ is the root-mean-squared detrended bed elevation (McElroy, 2009).
7. A matrix of dune displacements (determined from the maximum of the cross-correlation function) is computed for each transect using every pair of surveys. Valid displacements are retained to calculate migration rate according to the following criteria: (a) temporal separation is not greater than 1 hr, (b) displacement is not greater than 20% of the bedform length, determined from the spectral centroid of the detrended bed profile (van der Mark & Blom, 2007), (c) the maximum of the cross-correlation function is not less than 0.8, and (d) the implied migration rate (displacement divided by temporal separation) is not greater than 3 m/hr and not less than 0.3 m/hr. These criteria optimize temporal resolution and stability of the bedload flux calculation and reliably discriminate transects with active dune evolution from plane-bed topography.
8. Bedform migration rate is computed for each transect using OLS regression forced through the origin with all valid displacements.
9. Volumetric bedload flux per unit width is computed for each streamwise transect using the bedform bedload equation (Simons et al., 1965).
10. Total bedload mass flux is computed for each transect by multiplying unit bedload flux by the transect width (25 cm) and the density of quartz (2,650 kg/m³) and then summed over transects.
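The bedform bedload equation used in the final steps gives the unit flux as half the product of dune height and migration rate for triangular dunes, corrected for bed porosity. A minimal sketch (the porosity value is an assumption, not a value reported in the paper):

```python
def bedform_qb(height, migration_rate, porosity=0.4, rho_s=2650.0):
    """Unit mass bedload flux (kg/s/m) from characteristic dune height (m)
    and migration rate (m/s) via the bedform bedload equation
    (Simons et al., 1965), assuming triangular dunes.
    The porosity value here is an illustrative assumption.
    """
    qb_vol = 0.5 * (1.0 - porosity) * height * migration_rate  # m^2/s
    return rho_s * qb_vol                                      # kg/s/m
```

For example, a 0.5 m dune migrating at 1 m/hr yields a unit flux on the order of 0.1 kg/s/m, comparable in magnitude to the fluxes reported in Figure 4.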
We find that regressing bedform migration rate using displacements both forward and backward in time is necessary to ensure stable results. However, this means that bedload flux estimates are derived from overlapping data. Downsampling is thus necessary to ensure that each reported value of bedload flux is computationally independent: We consider a maximum temporal resolution of 1 hr. Results are plotted in Figure 4.

Appendix B: Bayesian Regression
Here, we provide additional details on the statistical techniques employed in this paper. In order to make this explanation clearer, we adopt notation that is common in the statistical literature (e.g., Christensen et al., 2011; Gelman et al., 1995). We consider the problem of predicting a continuous response variable y from a vector of predictor variables x = [1, x_1, x_2, x_3]. The relationship between predictor variables and response variables is studied using a probabilistic model with parameters for the data generating process.
Physical variables of interest are log-transformed and normalized to obtain linear predictor and response variables such that y = log(q_b/q_b0), x_1 = log(Q/Q_0), x_2 = log(C_s/C_s0), and x_3 = log(D_s/D_s0), where the subscript "0" denotes the geometric mean of all observations. This is equivalent to subtracting the arithmetic mean of the log-transformed variables and results in centered response and predictor variables, a convention that facilitates interpretation of the intercept term β_0. The subscript i denotes a specific observation such that y_i and x_i are the ith of n observations of the response and predictor variables, respectively. A capital X is shorthand for all observations of model variables, that is, X = (x_1, …, x_n, y_1, …, y_n). Finally, we use x̃_i to denote a vector of observations of predictor variables for which we intend to predict an unobserved value of the response variable, ỹ_i.

B1. Grouped Model
The grouped model ignores potential correlations that may exist on a site-specific basis. All data are pooled into a single normal linear regression analysis, and regression coefficients and errors are assumed to be equivalent at all sites. Here, β = [β_0, β_1, β_2, β_3] is a 1 × 4 vector of regression coefficients. The ith observation of the response variable, y_i, is modeled as a linear function of the predictor variables plus a normally distributed independent error term (e.g., equation (8)). This is equivalent to specifying that y_i follows a normal distribution with mean x_i·β and standard deviation σ. Formally, the probability of observing y_i given x_i, β, and σ is p(y_i | x_i, β, σ) = N(y_i; x_i·β, σ), and the likelihood of the model parameters θ = (β, σ) conditional on all observational data X is simply the product of the probabilities of the individual observations: L(θ | X) = ∏_{i=1}^{n} p(y_i | x_i, β, σ). For the grouped model, we employ independent priors for the model parameters: a normal prior on each regression coefficient and an inverse gamma prior on σ. Here, N(μ, σ) denotes a normal distribution with mean μ and standard deviation σ, and Γ⁻¹(α_1, α_2) denotes the inverse gamma distribution with shape parameter α_1 and scale parameter α_2. Since the marginal priors are independent, p(β, σ) = p(β_0)p(β_1)p(β_2)p(β_3)p(σ). These priors approximate Jeffreys's prior for normal linear regression, which is a uniform distribution on (β, log(σ)) (Christensen et al., 2011; Gelman et al., 1995).
The posterior probability distribution of model parameters given data X is proportional to the product of the likelihood function and the prior: P(θ | X) ∝ L(θ | X) p(θ), where the constant of proportionality, [∫ L(θ | X) p(θ) dθ]⁻¹, ensures that the posterior integrates to 1.
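The unnormalized log posterior (log likelihood plus log prior) is all that MCMC sampling requires. A sketch for a grouped normal linear regression follows; the prior scales are illustrative assumptions, not the hyperparameter values used in the paper.

```python
import numpy as np

def log_posterior(beta, sigma, X, y):
    """Unnormalized log posterior for a pooled normal linear regression:
    Gaussian likelihood with a wide normal prior on the coefficients and
    a vague prior on sigma (prior scales here are illustrative assumptions).
    X is the n-by-k design matrix; y is the length-n response vector."""
    if sigma <= 0:
        return -np.inf                       # sigma must be positive
    resid = y - X @ beta
    loglik = -len(y) * np.log(sigma) - 0.5 * np.sum(resid**2) / sigma**2
    logprior = -0.5 * np.sum(beta**2) / 100.0**2 - 2.0 * np.log(sigma)
    return loglik + logprior
```

Because the priors are wide, the posterior is dominated by the likelihood: parameter vectors that fit the data closely receive higher log-posterior values.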

B2. Ungrouped Model
The ungrouped model involves fitting a separate regression model for each site. Henceforth, the subscript j = 1, …, m denotes the jth of m = 8 sites. β_j = [β_{j,0}, β_{j,1}, β_{j,2}, β_{j,3}] is thus the vector of regression coefficients corresponding to site j, and σ_j is the standard deviation of the error term at site j. The full data model thus contains 4 × m regression coefficients and m error terms, totaling 40 parameters compared with the five parameters used in the grouped model.
Each site has a different number of observations, n_j. The probability of observing the ith of the n_j observations of the response variable at site j, y_{i,j}, given x_{i,j}, β_j, and σ_j is p(y_{i,j} | x_{i,j}, β_j, σ_j) = N(y_{i,j}; x_{i,j}·β_j, σ_j).

B4. MCMC Sampling
Posterior distributions for model parameters were constructed using the No-U-Turn sampling algorithm (Hoffman & Gelman, 2014), as implemented in the open source Python package, PyMC3 (Salvatier et al., 2016). The sampler was initiated using the automatic differentiation variational inference algorithm (Kucukelbir et al., 2017). Three chains were used, and 1,000 burn-in steps were more than sufficient to achieve convergence. The posterior distribution of model parameters was approximated using 5,000 steps without thinning.
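The paper's sampler is the No-U-Turn sampler as implemented in PyMC3. Purely to illustrate the burn-in and posterior-sampling workflow described above in a self-contained way, here is a random-walk Metropolis sketch, a much simpler algorithm than NUTS:

```python
import numpy as np

def metropolis(logpost, x0, n_steps=5000, step=0.5, burn=1000, seed=0):
    """Random-walk Metropolis sampler. The paper uses the far more
    efficient No-U-Turn sampler via PyMC3; this minimal sketch only
    illustrates discarding burn-in steps and collecting posterior draws."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    lp = logpost(x)
    samples = []
    for i in range(n_steps):
        prop = x + step * rng.standard_normal(x.shape)  # propose a move
        lp_prop = logpost(prop)
        if np.log(rng.uniform()) < lp_prop - lp:        # accept/reject
            x, lp = prop, lp_prop
        if i >= burn:                                   # discard burn-in
            samples.append(x.copy())
    return np.array(samples)

# Example: sampling a standard normal target density
draws = metropolis(lambda x: -0.5 * np.sum(x**2), np.zeros(1))
```

Convergence diagnostics (multiple chains, as in the paper) matter more for Metropolis than for NUTS, since random-walk proposals mix slowly in high dimensions.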

B5. Prediction
Once the posterior probability distribution of model parameters is known, unobserved values of the response variable ỹ_i can be estimated using Bayesian posterior predictive distributions. The posterior predictive density P(ỹ_i | x̃_i, X) is found by integrating the sampling distribution of ỹ_i given a specific set of parameters, p(ỹ_i | x̃_i, θ), against the posterior distribution of model parameters, P(θ | X): P(ỹ_i | x̃_i, X) = ∫ p(ỹ_i | x̃_i, θ) P(θ | X) dθ. This distribution is straightforward to compute numerically using MCMC techniques. In addition to predicting single unobserved values of q_b, it is possible to obtain a simulated predictive distribution for any conceivable quantity that can be expressed as a function of model parameters (e.g., time-integrated bedload flux).
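Numerically, the predictive integral reduces to drawing one simulated response per MCMC sample of the parameters. A sketch (the draw arrays below stand in for real MCMC output):

```python
import numpy as np

def posterior_predictive(beta_draws, sigma_draws, x_new, rng=None):
    """Simulate the posterior predictive distribution of y_new at x_new by
    drawing one response per MCMC sample of (beta, sigma).
    beta_draws has shape (n_draws, k); sigma_draws has shape (n_draws,)."""
    rng = rng or np.random.default_rng(0)
    mu = beta_draws @ x_new              # one predicted mean per posterior draw
    return rng.normal(mu, sigma_draws)   # add observation noise per draw
```

A derived quantity such as time-integrated bedload flux inherits a predictive distribution the same way: evaluate it once per posterior draw and summarize the resulting sample.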

B6. DIC
The DIC is a measure of relative predictive power that reflects the trade-off between goodness of fit and parameter estimation precision (Spiegelhalter et al., 2002). It is used here instead of better-known model selection criteria such as the Akaike information criterion (AIC) or the Bayesian information criterion (BIC) because, unlike AIC, it is suitable for comparing the hierarchical and nonhierarchical models considered here, and, unlike BIC, its intended use is comparing expected out-of-sample predictive accuracy under the assumption that the data model is correct.
DIC uses the log-likelihood log L(θ | X) of different models to compare expected out-of-sample predictive accuracy. Models that achieve higher values of the likelihood function provide better in-sample fit. The log-likelihood of the posterior mean parameter estimate, log L(θ̄ | X), is used here to quantify model fit, where θ̄ = E(θ | X) is the posterior mean parameter estimate.
More complex models may lead to higher log-posterior densities and better in-sample fit at the cost of parameter estimation precision. In other words, a much wider range of model parameters provide a good fit to the data, such that it is difficult to select optimal values. For models that are too complex, predictive uncertainty is primarily related to uncertainty in model parameters rather than being directly quantified by the noise term (σ in the models presented here). It is thus necessary to introduce a correction factor that accounts for parameter estimation uncertainty. Here, the effective number of parameters, p_DIC = 2 var_post(log L(θ | X)), is framed in terms of the posterior variance of the log-likelihood and can be computed by taking the variance of MCMC-sampled log-likelihoods.
The expected log predictive density is given by elpd = log L(θ̄ | X) − p_DIC. Assuming predictive error is normally distributed, the expected log predictive density is proportional to the mean squared error. DIC is related to the expected log predictive density by a factor of −2 by convention: DIC = −2 (log L(θ̄ | X) − p_DIC). For additional details on the derivation and interpretation of DIC, see Spiegelhalter et al. (2002) and Gelman et al. (2014).
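Putting these definitions together, DIC can be computed directly from MCMC output; a minimal sketch:

```python
import numpy as np

def dic(loglik_draws, loglik_at_mean):
    """Deviance information criterion from MCMC output:
        p_DIC = 2 * var(log-likelihood over posterior draws)
        DIC   = -2 * (log L(theta_bar | X) - p_DIC)
    loglik_draws is the log-likelihood evaluated at each posterior draw;
    loglik_at_mean is the log-likelihood at the posterior mean parameters.
    Lower DIC indicates better expected out-of-sample predictive accuracy."""
    p_dic = 2.0 * np.var(loglik_draws)   # effective number of parameters
    return -2.0 * (loglik_at_mean - p_dic)
```

In practice, the log-likelihood at each draw is saved during sampling, so computing DIC for the grouped, ungrouped, and hierarchical models adds negligible cost.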