Simulating Debris Flow and Levee Formation in the 2D Shallow Flow Model D‐Claw: Channelized and Unconfined Flow

Debris flow runout poses a hazard to life and infrastructure. The expansion of human population into mountainous areas and onto alluvial fans increases the need to predict and mitigate debris flow runout hazards. Debris flows on unconfined alluvial fans can exhibit spontaneous self‐channelization through levee formation that reduces lateral spreading and extends runout distances compared to unchannelized flows. Here we modify the D‐Claw shallow flow model in two ways that are hypothesized to generate levees. We evaluate these modifications with observations from a large‐scale flume experiment. We investigate model performance when including the effect of two different friction sub‐models, as well as the inclusion of segregation effects on granular permeability. Results show that, for a wide range of plausible model input parameters, simulations including the effects of segregation promoted modeled levee formation, whereas simulations without the effects of segregation did not create levees. Further, using a forward predictive framework, simulations with the effects of segregation were more likely to better model the magnitude of debris flow depth and runout distance, whereas simulation timing of the debris flow was affected by the choice of friction sub‐model. Our results indicate that including the effects of segregation on granular permeability can improve the likelihood of better predictions of debris flow depth and runout prior to an event occurring.

Observations of debris flows point to the important role of segregation of heterogeneous material in levee formation, where coarse-grained debris segregates to the surface of the flow and is preferentially advected to the front of the flow. The coarse-grained debris then slows and is pushed aside by the more mobile finer-grained debris flow tail (Iverson, 2003;Johnson et al., 2012). Higher pore-fluid diffusivity of the coarse-grained debris decreases remobilization of the debris, generating static levees along the lateral perimeter of the debris flow (Major & Iverson, 1999). However, less pronounced levee formation is also observed in monodisperse (and therefore non-segregating) dry granular flows in which no pore fluid effects are present (Félix & Thomas, 2004). In two-and three-dimensional models of dry granular flow, a hysteretic friction coefficient, which changes depending on the state and history of the flow, has been shown to simulate levee formation (Edwards et al., 2017;Rocha et al., 2019). Studies have demonstrated levee formation resulting from segregation in debris flow models (e.g., Johnson et al., 2012), but have not investigated hysteretic friction as a modeled mechanism for levee formation in shallow flow models.
Levee formation in self-channelizing granular flows consist of two stages (Félix & Thomas, 2004). First, when the flow head passes, the entire width of the flow is composed of mobile material. The depth of the flow can be almost uniform or, typically for narrower flows, have a cross-flow varying depth that is deepest at the center of the flow. Second, after the head of the flow passes, the outside edges of the flow come to rest, forming static levees along the edges of the flow. The depth of the levees is similar to that of the flowing material. Levee formation can continue until the granular supply is exhausted. Lastly, the flowing tail material can erode remaining in-channel debris resulting in levee walls that are deeper than in-channel deposits (Major, 1997). However, if channel erosion does not occur, such as in cases when tail flow has low volume, the channel depth remains of the same magnitude as the levee walls (Iverson, 1997;Johnson et al., 2012).
We used the D-Claw two-phase shallow flow model George et al., 2022; (described in further detail in Section 2.1) to explore a range of input parameters for different sub-model combinations of granular friction (Coulomb and hysteretic) and segregation (using two permeabilities or no segregation) to simulate a debris flow through an initially confined channel onto an unconfined gentle slope. We use the term sub-model to describe a method to model a physical effect (i.e., friction or segregation) that fits within the wider D-Claw model framework. We used a range of input parameters to evaluate which combination of sub-models is most likely to produce a better result in a forward predictive analysis. This differs from previous studies in which all input parameters were fixed based on physically measured material properties to evaluate the model's reproduction of experimental data (e.g., . Prior to this study, the D-Claw model included a static Coulomb friction angle and a lateral transport model (Gray & Kokelaar, 2010) to spatially distribute segregated debris material. However, feedback from the effects of segregation on material properties (i.e., changes to frictional behavior, permeability, density) was not previously modeled. Importantly, the effects of segregation on the local material properties of flowing material changes the local rheology of the flow. Therefore, the two potential mechanisms of levee formation described above, hysteretic friction and the effects of segregation, have not been included in previous D-Claw studies.
Our simulations were compared to published data from a debris flow experiment performed at the U.S. Geological Survey (USGS) debris flow flume near Blue River, Oregon (Logan et al., 2018;Rengers et al., 2021). We evaluated simulation results for several performance metrics of three different simulated behaviors using each sub-model combination and a wide range of material property parameters. For our evaluation, we asked three questions, first, without comparison to matching flow behavior, did the sub-model combination allow for levee formation? Second, for a forward predictive framework where precise material parameters would not be known, how well did the sub-model combination recreate the magnitude and time of arrival of peak depth within the flume channel over the range of input parameters? Third, how well did the temporal evolution of the simulated runout distance and lateral spread, as well as the deposited inundation area, match the experimental deposition on the unconfined runout pad? The performance metrics were chosen because they represent easily observable behavior, either during a debris flow or through post event analysis. The ability to predict these metrics is important in determining the hazard a debris flow presents by answering the question, "how far will the debris flow go?" Furthermore, the goal of this paper is to evaluate if certain sub-model combinations are likely to perform better than other combinations in a forward predictive framework. Because debris flow composition strongly affects runout (e.g., de Haas et al., 2015) and because of input parameter uncertainty, especially for predictive modeling, it is necessary to know how each combination of sub-models performs over a range of input parameters. Even if one sub-model combination performs well with one set of parameters, it may perform relatively poorly with other possible parameter combinations. Using a sub-model combination that performs well with a large range of parameters enables the debris flow model to be applied, such as in a post-fire debris flow hazard assessment type scenario, without necessitating material property calibration to the specific study location.

Methods
We extended the D-Claw model from previous applications (e.g., George et al., 2017; to include an additional friction sub-model, a hysteretic friction representation (Rocha et al., 2019), and the effects of segregation on material properties through the combination of two different values of debris permeability. These two additions to the model are independent, allowing for the effects of each sub-model to be explored independently and in combination. For each of four D-Claw sub-model combinations (static friction angle with no segregation, hysteretic friction angle with no segregation, static friction with segregation, and hysteretic friction with segregation), we evaluated a range of input parameters for model performance. This differs from previous studies in which inputs were calibrated such that no parameter adjustment was needed (e.g., George et al., 2017;. Although better results can be achieved with calibrated input parameters, in a forward predictive framework, such as evaluated potential debris flow hazards, the best performing parameters are not known. All D-Claw simulations were performed on the USGS Denali supercomputer (U.S. Geological Survey Advanced Research Computing, 2021).

D-Claw Model
D-Claw is a two-phase, depth-averaged shallow flow model for fluid-granular flows, (George et al., 2022). The continuum model accounts for the coupled evolution of five dependent variables over irregular topography by solving hyperbolic equations. Importantly, the model includes the feedback between granular phase dilation and pore-fluid pressure, this feedback affects the apparent friction and mobility of the flow (Iverson, 2005;. A detailed derivation of the D-Claw model and the numerical implementation is given by Iverson and George (2014) and George and Iverson (2014). Below, we provide a brief description of the model governing equations and the modifications to the source terms that are implemented by our new sub-models.

Governing Equations
D-Claw solves five coupled equations, Equations 1a-1e, for five dependent variables: flow depth, h (L); depth averaged x-and y-velocity components, u (L T −1 ) and v (L T −1 ), respectively; depth averaged solid volume fraction, m (-); and the basal pore-fluid pressure, p b (M L −1 T −2 ). For simplicity, the spatial and temporal dependence of each variable, that is, ( ) , is omitted.
The bulk density, ρ (M L −3 ), depends on the solids density, ρ s , and the fluid density, ρ f , as The depth-averaged granular dilation rate, D (L T −1 ); is a function of the hydraulic permeability of the granular material, k (L 2 ); pore-fluid viscosity, μ (M L −1 T −1 ); and difference between the basal pore-fluid pressure and the hydrostatic fluid pressure (at the base of the flow) as The permeability varies with changing m by the empirical relationship, The dilation rate, D, is present in each of the governing equations, Equations 1a-1e, and accounts for the change in debris bulk density caused by the evolution of the solid volume fraction.
The shear basal tractions for the solid and fluid phases, τ s (M L −1 T −2 ) and τ f (M L −1 T −2 ), respectively, are where Φ (-) is the granular basal friction angle, Ψ (-) is the granular dilatancy angle that evolves as a function of m, σ n (M L −1 T −2 ) is the total normal stress, and ̂ is the unit vector in the direction of the flow velocity. For stationary debris, τ s is bound by the resistive force to maintain equilibrium.
The elastic compressibility of the granular component, α (-), empirically evolves as, where a and σ 0 are constants.

Friction Sub-Models
The effect of granular friction is modeled by the basal shear traction term, τ s , in Equations 1b and 1c. We extended D-Claw to include a second/alternative friction sub-model for Φ to evaluate if the new friction model alone would allow for levee formation. The existing sub-model uses a constant Coulomb friction angle and the new sub-model uses a Froude number, Fr (-), dependent hysteretic friction angle. The constant Coulomb friction angle, as described in Iverson and George (2014), uses a single material parameter for friction whereas the new sub-model uses three.
The frictional hysteresis sub-model varies the effective basal friction angle Φ e based on the local Froude number, Fr (Edwards et al., 2017;Rocha et al., 2019). The hysteretic behavior is modeled by splitting the effective basal friction angle into dynamic, Φ d(ynamic) , intermediate, Φ i(ntermediate) , and static, Φ s(tatic) , regimes as where the local Froude number, depends on the velocity, depth, and local slope angle, θ (-). The transitional Froude number between the intermediate and dynamic regimes, Fr ⋆ , is material dependent as where experimentally obtained, material dependent constants Λ (-) relates finite versus infinite depth flow behavior (Edwards et al., 2017), β (-) relates flow depth and Froude number, and Γ (-) accounts for non-spherical particles (Forterre & Pouliquen, 2003). For this study, the material properties Λ = 1.34, β = 0.65, and Γ = 0.77 are held constant using established values for dry sand (Rocha et al., 2019;Takagi et al., 2011). The transitional Froude number for dry sand, above which the flow is in the dynamic effective friction range, is Fr * = 0.1.
Although the values for Λ, β, and Γ may be different for wet sand, because we are looking at the performance of the sub-model combinations over a range of other parameters, we assumed that different values for Λ, β, and Γ would not change the overall conclusions of the general performance of the different sub-model combinations.
The effective friction angle depends on Fr and h as well as the angles ζ 1 (-), the minimum angle at which steady uniform flow can be maintained, ζ 2 (-), the maximum angle for steady uniform flow, (Forterre & Pouliquen, 2003;Pouliquen, 1999;Pouliquen & Forterre, 2002), and ζ 3 (-), which corresponds to the minimum angle of spontaneous flow motion for an infinitely deep static layer (Daerr & Douady, 1999;Pouliquen & Forterre, 2002). The maximum angle for steady uniform flow, ζ 2 , is equivalent to the static Coulomb friction angle. The effective friction angle in each regime is given by where  = 2 (L) is a length scale assumed to be twice the mean particle diameter, d (L). The effective static friction angle, Φ s , is the minimum friction angle that balances the other forces acting on the material with a maximum value defined by the first argument of the min(imum) function. Note that Rocha et al. (2019) used static, dynamic, and intermediate friction coefficients that we converted to the equivalent friction angles (ζ 1 , ζ 2 , and ζ 3 ) to fit within the framework of the D-Claw model. The values of ζ 1 and ζ 3 were chosen relative to ζ 2 (see Table 1). The intermediate friction coefficient is a Fr dependent linear interpolation between the maximum static friction Note. Fixed D-Claw input parameters can be found in the sample D-Claw input file (setrun.py) included in the Supporting Information S1. coefficient and the minimum dynamic friction coefficient, when Fr = Fr ⋆ (Edwards et al., 2017(Edwards et al., , 2019Rocha et al., 2019). An example of the effective friction angle, Φ e , versus Fr for different fixed depths, h, is shown in Figure 1. The figure shows the effective friction decreases with increased depth and that the effective friction increases with Fr in the dynamic friction regime. Note that the static Coulomb friction angle is recovered when The effect of this friction model is that moving material can have a lower effective friction angle than static material and that moving material can continue to move on gentler slopes than those required for incipient motion.

Segregation Sub-Model
Granular segregation occurs when a heterogeneous mixture of discrete elements undergoes shear (Gray, 2018;Umbanhowar et al., 2019). Segregation is the de-mixing of material, for example, when soils undergo size segregation, sorting from well graded to poorly graded. Segregation can occur as a result of many material property differences including size, density, shape, elasticity, and surface roughness. To maintain generality and not limit the representation of segregation to one material property difference, materials with different properties are typically referred to as "species," and a heterogeneous mixture can be described as having two or more species. Importantly, the materials of two different species can have overlapping distributions of material properties, including those that lead to a segregated state (Gao et al., 2021). The rate of segregation is non-linear with concentration (Gajjar & Gray, 2014;Jones et al., 2018) and depends on the magnitude of the difference between the material properties (Jones et al., 2020Schlick et al., 2015).
In a framework of dense gravity-driven flows, segregation vertically separates and advection spatially distributes the different species. In a vertical velocity profile, where material at and near the surface has a higher velocity than material near the bottom of the flow, species that segregate upwards toward the free surface is advected toward the flow front, and species that segregate downwards accumulates toward the back of the flow. We extended D-Claw to include material properties for two segregating species. The previous version of D-Claw included a lateral transport equation to track the spatial evolution of the species volume fraction, χ (-), of a two species debris flow (Gray, 2013;Gray & Kokelaar, 2010), which was included in the publicly available code but not presented in any publications. The unchanged lateral transport equation for simple shear takes the form (in the x-direction) This form of the lateral transport equation assumes simple shear and that the flow has instantaneous vertical segregation.
The validity of an instantaneous segregation assumption can be evaluated relative to two dimensionless parameters, the ratio of the advection timescale to the segregation timescale, Ω = SL/h 2 , and the Péclet number, which is the ratio of the advection timescale to the diffusion timescale, Pe = 2qh/DL (Fan et al., 2014). We explore these parameters using a range of values for the length scale, 0.1 ≤ L ≤ 20 m, which represents the length of a mesh cell to ≈1/4 the flume length; the flowing layer depth, 0.2 ≤ h ≤ 0.5 m, which are the range of depths observed in channel flow; the segregation length scale 5 × 10 −5 ≤ S ≤ 2 × 10 −4 m (Jones et al., 2020Schlick et al., 2015) for a reference particle diameter, d, of 0.25-1 mm; a 2D flow rate, 1 ≤ q = hv ≤ 5 m 2 /s; and a diffusion coefficient, 1.25 × 10 −6 ≤ D ∝ (v/h)d 2 ≤ 2.5 × 10 −5 m 2 /s (Fan et al., 2014). The non-dimensional numbers then range as 2 × 10 −5 ≤ Ω ≤ 0.1 and 800 ≤ Pe ≤ 4 × 10 7 . The low Ω and high Pe values correspond to a length scale of a single computational mesh cell and indicate that the surface of the flow will be strongly segregated, but below the surface the flow will be remain somewhat mixed, with the segregation increasing the farther the flow travels. Increasing the length scale in Ω further increases the segregation, whereas the lower end of the range of Pe is not likely to substantially decrease the segregation intensity. These values indicate that the distance a debris flow travels before being fully segregated is short relative to the length of the channel (Deng et al., 2018;Fan et al., 2014).
The effective material properties for a mixture of two species, labeled A and B, was modeled by a linear combination of species property based on the species concentration as † = † + (1 − ) † , where † is a placeholder for any material property (e.g., density, permeability, friction), the subscripts e, A, and B denote the effective bulk, species A, and species B material property, respectively. In bidisperse flows, the species volume fraction, χ, is only needed for one species as the volume fraction for the other species is 1 − χ.
In the experiments outlined below, the material properties of the two species differ only in their permeability, k 0 . Although all of the material properties may vary between species, we only varied the permeability because it is the most influential material property of all of those that we varied . Therefore, only the partial effects of segregation on material properties are modeled in this study. For simulations in which the effects of segregation are not considered, the initial species volume fraction, χ 0 , is set to one and the permeability of each species are equal so no effects of segregation could occur.

Simulated Experiments
All simulations were performed using the geometry of the USGS experimental debris flow flume located at the H. J. Andrews Experimental Forest near Blue River, Oregon (see Figure 2 and Iverson et al., 2010). The flume consists of a straight, rectangular concrete channel uniformly sloped at 31°. The flume channel is 80 m long, 2 m wide, and 1.2 m deep. The bottom of the flume channel, from 6 to 79 m in length, was covered in regularly spaced, 5 cm apart, 3 cm diameter, 16 mm high bumps that increased the effective roughness of the channel. The flume channel discharges onto an 25 m long, unconfined concrete runout pad that has an gentle slope of 2.4°, away from the end of the flume. The simulations did not limit the length of runout to be confined to the length of the pad, allowing the runout to extend as far as the debris flow remained mobile. The last 8.5 m of the flume channel follows a catenary curve to provide a smooth transition to the runout pad. The flume also has a small lateral, cross-flume tilt that varies through the length of the flume with a maximum of approximately 2°. The experimental debris flow flume was designed to develop a one-dimensional like flow through the flume channel and allows lateral spreading once debris discharges onto the runout pad (Iverson et al., 2010).

10.1029/2022EA002590
8 of 20 The computational geometry used in our simulations was generated to match the flume geometry but excludes the lateral tilt and the roughness increasing bumps. The x-axis was aligned along the inside corner of the flume channel wall, the y-axis was in the cross-channel direction, and the origin was aligned with the inside corner of the flume wall at the upstream end of the flume. The end of the flume walls was at x = 72 m. All reported depth values excluded the elevation of the flume and runout pad, representing just the depth of the flow. The debris was initialized as a triangular prism with a flat top and the front was located at x = 0 m. The bottom surface roughness in D-Claw was controlled by a Manning's n coefficient. All simulations were performed with a mesh resolution of 10 cm and the computational mesh was aligned with the flume such that the edge of a mesh cell was located along the inside corner of the flume walls. D-Claw allows for an adaptive mesh; therefore, regions of the computational domain with no debris material present were allowed to coarsen up to a maximum mesh size of 2 m. D-Claw also allows for an adaptive timestep that is bound by a maximum step size or a maximum Courant-Friedrichs-Lewy (CFL) limit. All simulations used a maximum adaptive timestep of 0.1 s and were subject to CFL less than 0.7. Simulation data were output every second for the entire domain and at every timestep at locations located every 5 m along the centerline of the flume.
Simulations were performed to match the experimental setup used at the debris flow flume. We initialized simulations with a flow volume of 10 m 3 , corresponding to the experiment performed at the flume on 25 May 2017 (Logan et al., 2018;Rengers et al., 2021). The pore pressure was initialized to hydrostatic pressure, which matches the failure pressure in previous simulations . The simulations allowed the saturated debris to instantaneously fail (dam break initiation) at the start of the simulation and did not include the opening of the experimental headgate, which has been shown to affect the flow depth and timing 2 m downslope from the headgate . Because the study evaluated sub-model performance across a wide range of input parameters, which substantially affect the flow depth and timing, the simple dam break initiation was used. Further, in a forward predictive scenario the actual flow initiation is unknown. The different initiation mechanism, dam break versus gate release, was not anticipated to substantially affect overall sub-model performance evaluations. The experimental debris flow consisted of equal parts sand and gravel up to 32 mm in diameter; see (Iverson et al., 2010) for a complete description of the material used in the experiment. The experimental flow also included cobble sized rocks that were not discretely modeled by D-Claw. The debris was fully saturated with water and flowed down the flume after release from the headgate.
We evaluate the ability of the sub-models to represent the experimental debris flow. Since the start of operations (Iverson et al., 1992),  . Data from the debris flow runout, after discharging from the flume channel, is not aggregated from previous experiments because it would result in substantial information loss (Iverson et al., 2010). Therefore, we only compare our simulated runout dynamics to a single experiment. However, clear similarities in runout behavior have been observed for experiments with similar material properties and initial conditions (Iverson et al., 2010).
A vertically oriented terrestrial lidar instrument, located at the bottom of the flume, was used to capture the temporal evolution of the debris flow depth profile, along the center-line of the flume, as it traveled down the flume (Rengers & Olsen, 2020;Rengers et al., 2021). An overhead camera captured video of the runout. The debris flow was observed to form levees on the unconfined runout pad during the experimental flow, see example in Figure 3. The levees are highlighted by green outline in the figure, demonstrating that they remain intact throughout the flow. However, tail flow was observed to seep through the levees at the end of the flow. Similar behavior has been observed in previous work at the flume focused on the levee formation (Johnson et al., 2012). Results for the lateral spread and runout distance of the debris flow on the runout pad were measured from still frames of the overhead video recording of the experiment (Logan et al., 2018). The final runout distance, which extended beyond the camera field of view, approximately 13 m (x = 85 m) from the end of the flume walls, was determined from the post-event lidar scan (Rengers & Olsen, 2020).
For the simulations, we varied the range of material property input parameters based on the range of measured material properties from previous debris flows at the flume . The material properties that we used do not account for the cobble sized material that was present in the experimental debris flow. We evaluated a total of 1,800 simulations across all combinations of sub-models and parameters (see Table 1). The added sub-models, hysteretic friction and the effects of segregation, each require additional parameters, which results in more simulations for combinations that use these sub-models. The different parameter sets for each sub-model combination result in 36 simulations using a static friction angle and no segregation, 144 simulations with hysteretic friction and no segregation, 324 simulations with a static friction angle and effects of segregation, and 1,296 simulations with hysteretic friction and effects of segregation. Material properties were applied with either min/max or min/intermediate/max sampling rather than continuous or random sampling to allow for evaluation of the sub-model combination performance for the range of parameters without trying to discern the best (i.e., calibration) values of the parameters within the range. For simulations using a static friction angle, ζ 1 = ζ 2 = ζ 3 , and for simulations without the effects of segregation k A,0 = k B,0 and χ 0 = 1.

Reproducing Levee Formation
We explored the ability of the model to reproduce levee formation by assessing simulation results for a set of levee formation criteria. Channel draining, which occurs after levees have formed and does not effect levee formation, was not evaluated. The comparison to the models recreation of in-channel flow and runout was performed separately from the evaluation of the model's ability to reproduce levee formation.
To determine the presence of levees, we evaluated simulation results at transects (perpendicular to the x-axis) 2, 4, 6, and 8 m from the end of the flume walls (x = 74, 76, 78, and 80 m) on the runout pad, at one second intervals. The post-processing determined if flowing material was bound by stationary levee material. The static material had to be of sufficient width and depth and had to remain static through the remainder of the simulation to be classified as a levee. Specifically, the following criteria had to be satisfied for a particular simulation to have modeled levee formation.
1. Static material on each side of flow at least two cells (20 cm) wide.
2. The depth of static material on each side of the flow was at least half the depth of the flowing material.
3. The width of flowing material, at each transect, remained the same or decreased between output frames.
The first and second criteria ensure that the static material was originally part of the flowing material and then came to rest, rather than just representing the edge of the flow at the wet/dry interface. The third criteria ensures the static material is stable and is not a result of lateral spreading. We assessed the above three criteria for each output time step, every one second, and they had to remain present for at least the final two output frames to be classified as a levee. Although the formation of a rocky flow front and depositional snout is observationally linked to segregation and varying friction within the flow, this study did not evaluate the formation of a distinct snout within the model flow.

Model Performance Measures
We compared simulation results to experimental data to asses model performance. We performed comparisons of multiple observations (in-channel peak depth, time of arrival of peak depth, and lateral spread with runout distance) using the root mean squared error, RMSE. The RMSE is given by, where E is the error between each observation and the simulation and N is the total number of observations. For the peak depth comparison (subscript p), the error was calculated as E p = (h p,m − h p,l )/h p,l (-) at each of the N p = 14 observations. The subscripts m and l represent model and lidar values, respectively. At each location, the peak depth difference was normalized by the experimental peak depth. We performed this normalization because flow depth decreases the farther a debris flow has traveled down the flume and the normalized error represents typical behavior along the entire flume channel rather than weighting toward the top of the flume where the peak depth magnitude was the greatest. Calculations of the time of arrival (subscript t) of peak depth used the direct error between observed and simulated arrival times, E t = t p,m − t p,l (s), at each of the N t = 14 observations. To evaluate the velocity of the flow front, given that the time of arrival is dependent on differences between the simulated and experimental flow initiation, we time-shifted the simulated time of arrival results such that the flow reaches the first peak at the same time as in the experiment, thereby removing the effects of the simulated flow initiation. The time-shifted comparison (subscript ts) for time of arrival of peak depth, therefore, has N ts = 13 because the simulation data are time-shifted so that the time of arrival at the first observation point is the same as the experimental data. Experimental data for the lateral spread and runout distance were extracted from still frames from a video captured by the overhead camera of the experimental flows (Logan et al., 2018). Experimental data were extracted at one second intervals along transects spaced one m apart from the end of the flume sidewalls, and the runout distance was measured along the centerline, perpendicular to the end of the flume walls. Observations of the error (subscript r) for lateral spread, = − for each transect, and runout distance, = − , occurred every one second from t = 8 to t = 30 s at one m intervals from the end of the flume walls to an additional 27 m down the runout pad (x = 99 m), resulting in N r = 616. For the error calculations, δ y is the lateral spread (along the y − axis) and δ x is the runout distance (along the x − axis). The time over which the observations were collected covers the range from the earliest time a simulated flow reached the end of the flume walls, through the end of the simulation. The length observations were collected from the end of the flume wall to the longest distance a simulated flow reached. The high number of observations included simulation results and experimental data that had E r = 0, depressing the RMSE r (m) to a smaller error value as a result of the "true negative" values included in several comparisons. To aid in the comparison of performance across all simulations using the relative magnitude of RMSE r , the number of observations was kept the same for all simulations, including the "true negative" observations. We compared the total area of inundation in two ways. The absolute value of the normalized simulated inundation area error, E A , was calculated, ignoring inundation location, by E A = |(A m − A l )/A l |. The other method assessed inundation performance used the threat score, TS, of the confusion matrix. Each mesh cell of the final output frame from each simulation was classified as true positive, TP, false positive, FP, or false negative, FN, by comparison to the lidar scan of the experimental debris flow deposition. The fourth quantity of the confusion matrix, true negative, TN, was not used for the analysis because TP + FP + TN + FN = 1 and therefore any three of the four elements contain all the information of the matrix. The threat score measured debris flow inundation performance as The TS ranges from 0, when no inundation is correctly modeled and any area is incorrectly inundated, to 1, when the model perfectly simulates the observed inundation.

Results
We examined simulation results for three different simulated behaviors. First, results were evaluated for the formation of levees. Second, simulated results within the flume channel, peak flow depth, and time of arrival of the peak flow depth were compared to experimental lidar data. Third, the temporal evolution, length and lateral spread, of the flow on the runout pad and the inundation map were compared with experimental data.

Simulated Levee Formation
Using simulation results for levee formation, we explored the ability of the different D-Claw sub-models to recreate observed behavior. An example of levee formation from a simulation with hysteretic friction and the effects of segregation is shown in Figure 4. The figure illustrates (a) that flowing material is observed between static For each combination of sub-models, simulation results were evaluated to determine the percentage of parameter sets that produced levees on the runout pad. When a static friction angle and no segregation were used, no parameter sets reproduced levee formation. Using the hysteretic friction model with no segregation resulted in 4.2% (6/144) of the parameter sets creating levees. The static friction sub-model with the effects of segregation resulted in levee formation in 34.9% (113/324) of the parameter sets. Lastly, using the hysteretic friction and effects of segregation resulted in 36.8% (477/1,296) of the parameter sets producing levees.

Reproducing USGS Debris Flow Flume Results
We compared simulation flow dynamics to experimental data from the 25 May 2017 large scale debris flow experiment with a starting volume of 10 m 3 . The dynamics for comparison were separated into in-channel flow and flow on the runout pad.

In-Channel Flow Dynamics
In-channel flow dynamics were evaluated for peak depth magnitude and time of arrival of peak depth along the length of the flume channel. We compared simulation results to lidar depth profiles of the debris flow in the flume channel that were captured at a 60-Hz sampling frequency . Lidar data were filtered using a spatial median filter as described in Rengers et al. (2021). We compared simulation results to the experimental data every 5 m in the x-direction along the flume channel centerline. Because the variable D-Claw timestep can lead to a larger timestep size than the experimental data sampling rate, not all experimental data points were used. An example comparison is shown in Figure 5. For Figure 5, and for the analysis shown in Figure 6c (described later), the simulation results are time-shifted such that the time of arrival of peak depth at 5 m from the front of the location of the initial material aligns with the time of arrival of the experimental peak, as shown in Figure 5a. The experimental debris flow had bouncing cobbles that preceded the coherent flow front, which can be seen in the filtered lidar data as sharp peaks proceeding the flow front (shown as the first peak in Figures 5d-5f). We disregarded these early sharp peaks in the determination of the peak depth and time of arrival for the experimental data, as in previous work .
The model performance for peak depth magnitude, RMSE d , is shown in Figures 6a and 6d. In the figure, the simulations are grouped by sub-model combination and ordered from lowest RMSE (better performance) to highest. The x-axis shows the percentage of simulations using the sub-model combination that performs at or better than the associated RMSE shown on the y-axis. The figure highlights the best 20% of simulations for each sub-model combination in Figures 6a-6c and shows all simulations in Figures 6d-6f. Figures 6a and 6d shows that for all parameter sets evaluated, simulations with the partial effects of segregation (purple and yellow curves) and with the hysteretic friction sub-model (yellow and red curves) improved the peak depth performance for in-channel flow compared to the static friction sub-model. Simulations that used the hysteretic friction sub-model had similar performance either with (purple curve) or without (red curve) the segregation sub-model, although the inclusion of the segregation sub-model allowed for better performance for the best 20% of simulations. The segregation sub-model also improved model performance when the static friction sub-model was used (yellow curve with segregation, blue curve without segregation) for the full range of parameter values.
The modeled flow front velocity was compared to experimental data by looking at the time of arrival of peak flow depth along the flume channel. Model performance for the time of arrival from the start of the simulation compared to the experimental release of material is shown in Figures 6b and 6e. The model best reproduced the time of arrival using the static friction sub-model (blue and yellow curves) with better performance when no segregation (blue curve) was included. Simulation performance with segregation (red and purple curves) was not heavily influenced by which friction sub-model was used.
When the simulation results were time-shifted such that time of arrival of peak flow depth 5 m down flume matched that of the experimental flow, all sub-model combinations performed similarly, shown in Figure 6f. This result, combined with the performance differences shown in Figure 6b, indicates that all sub-model combinations recreated the flow front velocity similarly but that the timing of the initial motion and acceleration of the debris was reproduced differently. However, as with the results shown in Figures 6b and 6e, simulations with the static friction and no segregation sub-models performed better than the other sub-model combinations.

Runout Flow Dynamics
The debris flow runout dynamics that we investigated were the temporal evolution of the lateral spread and runout distance on the runout pad, as well as for the final inundation area. We compared modeled debris flow runout data to experimental data captured by overhead camera and, for the final inundation area, by lidar scan. Because the comparison occurs at one second intervals, the time of arrival of the runout influences the RMSE. As with the evaluation of in-channel flow dynamics, runout behavior was investigated from the simulation start, as well as time-shifting the model data to match the experimental timing of the debris flow reaching the runout pad.
An example of the simulated runout (with no time-shifting), using the hysteretic friction and effects of segregation sub-models, is compared to the experimental flow in Figure 7. The figure shows that the modeled flow initially spreads laterally to approximately twice the channel width, Figure 7a, compared to the experiment where the flow remained the width of the channel. The modeled flow then travels farther than the experimental flow with a width that is smaller than the spread of the experiment, Figure 7b. After the flow has stopped, Figures 7c and 7d, the total inundation area is similar, captured by the total inundation error metric being near zero, but the location of the deposition varies between the model and experiment, shown by TS = 0.68.
We assessed the model's ability to represent the temporal evolution of the debris flow runout on an unconfined runout pad. A comparison of the performance for the temporal evolution of the lateral spread and runout distance, RMSE r , is shown in Figures 8a and 8e. Figures 8b and 8f shows the RMSE r,ts when the simulation results were time-shifted so that the simulated flow reaches the end of the flume walls at 12 s, the same time as the experimental flow. The second comparison avoids temporal differences created by the simulated in-channel flow. For both metrics of model performance, simulations with segregation (purple and yellow curves) match the experiment better than simulations with no segregation (red and blue curves). The choice of friction sub-model had little effect of model performance over the entire range of input parameters.
We also evaluated the models performance at recreating the final inundation area with no temporal component to the comparison. The error in the final inundation area, E A , is shown in Figures 8c and 8g. Simulations using the effects of segregation were able to best reproduce the same total inundation area as the experimental flow.
As with the temporal evolution of the runout, the friction sub-model had little effect on model performance for inundation area. The threat score, TS, for inundation performance is shown in Figures 8d and 8h. Again, inundation performance was better when the effects of segregation were used and the choice of friction sub-model had a much smaller effect compared to the segregation sub-model.

Discussion
Accurate predictive modeling of debris flow runout can provide a valuable tool to understanding the hazard that a potential debris flow presents. In particular, predicting the magnitude of flow depth, time of arrival of the flow front, and area of potential inundation provides important information for protecting life and infrastructure. Therefore, improving the likelihood of better performance for a forward predictive model over a range of uncertain material property input parameters is beneficial to assessing debris flow hazards.
The ability of the model to reproduce levee formation on the unconfined runout pad was governed by the segregation model with little effect from the friction sub-model. Across all simulations that used the segregation sub-model, levee formation was more likely to occur with intermediate values for permeability; k A,0 = 1 × 10 −11 m 2 and k B,0 = 100k A,0 ; and a species volume fraction; χ 0 = 0.25. In these simulations, 25% of a debris flow, which advected to the front of the flow, had a higher permeability (k B,0 = 1e − 9 m 2 ) that allowed for faster dissipation of pore pressure. This dissipation caused the front of the flow to slow and get pushed aside by the remainder of the flow, which remained more mobile because of the retained pore pressure. Parameters other than permeability had a less consistent effect on levee formation.
Peak flow depth along the flume channel was best reproduced by the hysteretic friction sub-model. Including the effects of segregation with the hysteretic friction sub-model increased model performance somewhat, but not substantially over the range of input parameters. The segregation sub-model provided a larger performance improvement for the static friction sub-model. Note that for all sub-model combinations, the top 20% of simulations has an RMSE d of less than 50%, but that the best 20% of simulations using the hysteretic friction sub-model, with or without segregation, had an RMSE d less than the best performing parameter set using only the static friction sub-model. Similarly, the best 18% of simulations using the segregation sub-model, and either friction sub-model, performed better than the best parameter set using only the static friction sub-model. These results demonstrates that, for a range of parameters, the model using the segregation and/or hysteretic friction sub-models can well represent the in-channel depth of the experimental debris flow. For simulations using the hysteretic friction sub-model, low friction angle, ζ 2 , and a large friction offset, ζ 1 , producing a lower effective friction angle, performed best at recreating the flume flow depth. Permeability also was an important factor in model performance. For simulations with segregation, a large permeability difference between the two species (1,000× difference) and high species volume fraction, χ 0 = 0.75, of the material that advects to the front of the flow performed best.
The time of arrival of the flow front, measured at the peak flow depth, was best reproduced with the static friction sub-model. Model performance was better without segregation. However, when the simulation data were time-shifted to remove effects from flow initiation differences, all sub-model combinations performed similarly, with the best performance by the static friction/no segregation sub-model combination and worst performance by the hysteric friction/no segregation sub-model combination. Overall model performance, for all sub-model combinations, at recreating the flow front velocity was improved by shifting the simulation data to remove the difference in flow initiation. This improvement is likely the result of not modeling the opening of the headgate, as noted by George and Iverson (2014). The initial solid volume fraction, m 0 , which affects the initial dilation of the material, was most important in time of arrival performance. The higher m 0 value, and therefore smaller difference between the critical solid volume fraction, m crit , produced an average of 72% of the top 20% best performing simulations (all simulations shown in Figure 6b) for the time of arrival of peak flow depth in the channel.
The temporal evolution of the lateral spread and runout distance after discharging the flume channel was best reproduced by the segregation sub-model. The friction sub-model had little effect relative the segregation sub-model. The segregation sub-model provides a substantial improvement over simulations with no segregation, with the average RMSE r of best 20% of simulations with segregation less than half that of simulations without segregation. However, the RMSE r of the best 20% of simulations using the segregation sub-model combinations had a magnitude of approximately 2 m (1.75 m for time-shifted data), which is approximately half the magnitude of the approximate 4-5 m lateral spread of the experimental flow. This result shows that although modeling the lateral spread and runout distance with the segregation sub-model showed improvement, further improvement to model performance of runout, further reducing the magnitude of the error, is possible. Simulations that use the effects of segregation sub-model show a decrease in performance after the best 30%-40% of parameter sets for the runout comparison, as shown in Figures 8a and 8b insets. This performance appears to be the result of the change in permeability between species rather than from levee formation. Only approximately 32% of the top 40% of simulations using the effects of segregation formed levees.
Model performance at matching total inundation area, evaluated using E A and TS, was better using the segregation sub-model. The choice of friction sub-model did not have much effect on the inundation performance. Model performance for both metrics was most influenced by the difference in permeability of the two species; however, the amount of difference, k B,0 = 10k A,0 for the inundation area and k B,0 = 1,000k A,0 for TS, was different.
The parameter sets that performed best for all metrics except time of arrival of peak flow depth were different, but all used the segregation sub-model. However, good performance on some metrics was correlated with good performance on other metrics. Simulations that performed well at matching the evolution of the runout distance and lateral spread (Figures 8a and 8b) also performed well at matching the inundation area (Figure 8c), inundation TS (Figure 8d), and depth of in-channel peak flow (Figure 6a). Similarly, simulations that performed well at reproducing the total inundation area (Figure 8c) also performed well at matching the evolution of the runout (Figures 8a and 8b) and the inundation TS (Figure 8d).
These results demonstrate that the segregation sub-model provides a substantial improvement to the D-Claw model for simulating debris flows, not only for the best matching input parameters, but for a wide range of reasonable input parameters. The results also show that, except for recreating in channel flow depth, using the hysteretic friction sub-model had little benefit or decreased performance. Furthermore, the time of arrival of the debris flow was best recreated using the static Coulomb friction sub-model and no segregation. From an overview of all the physical flow dynamics metrics, the best combination of sub-models to reproduce the entire debris flow, from in the flume channel through the final deposit, was the segregation and static friction sub-model combination.
An important observation from these results is that observed debris flow dynamics were best reproduced when the segregation sub-model allowed the material properties, and therefore rheology of the flow, to evolve along the length of the runout path. This behavior was partially reproduced by the segregation model. Further model performance improvement may be achieved by varying more material properties as a result of segregation and by discretizing the continuum of materials into a larger number of species, rather than just two. Additionally, this result indicates that other computational models focused on debris flow runout, including other two-phase models (Mergili et al., 2017;Pudasaini & Mergili, 2019) and single phase flows with a non-Newtonian rheology (Christen et al., 2010;Gibson et al., 2022;Hungr & McDougall, 2009;O'Brien et al., 1993), would benefit from the ability to capture evolving rheology or material properties.

Conclusions
The performance of debris flow simulations depends on a combination of what, and how, different physics is modeled, and on the material and model input parameters. Given the uncertainty associated with material and model input parameters, even in a constrained experiment, it is desirable for a debris flow model to perform well over a range of appropriate material input parameters. In this paper we evaluated the D-Claw shallow flow model for two different friction representations, either including or excluding the effects of segregation on material properties. The model results were evaluated for performance against several different aspects of debris flow dynamics, including the ability to generate self-channelizing levees, in-channel behavior, and unconfined runout and inundation. Model performance was evaluated over the whole range of input parameters, focusing on the best 20% of simulations for each sub-model combination, rather than calibrating to only the best performing simulation.
These results demonstrate that, for the D-Claw shallow flow model, the choice of which sub-model combination to use depends on which output metric(s) the simulation is trying to simulate. Debris flow timing is best modeled using the static friction sub-model with no segregation, whereas all other flow dynamics were best modeled using the segregation sub-model. Using the segregation sub-model, the choice of friction sub-model did not substantially affect performance, except for reproducing in-channel flow depth where the hysteretic friction sub-model improved model performance over the static friction sub-model. Model performance across all metrics was better with the static friction and segregation sub-model combination.
A challenge remains in constraining the inputs to real world conditions for the segregation model. This study did not attempt to determine which constraints should exist on the input parameters, including for the segregation sub-model. Such a study may benefit from allowing all material properties (permeability, friction, density) to vary between the two species as well as evaluating flow performance compared to a field mapped flow over natural and built terrain.
Application of debris flow simulations as a tool for both post-event evaluation and pre-event prediction would benefit from further research to constrain debris flow material properties that are used as model input parameters. Additionally, a more extensive evaluation of parameter sensitivity could aid in understanding uncertainty associated with input parameter ranges. Improving model performance across a range of input parameters helps alleviate some of challenges associated with the debris flow runout hazard assessments.

Data Availability Statement
The D-Claw model was developed based on the extensive effort of the Clawpack (clawpack.org) framework.
The D-Claw model used for this study (George et al., 2022), [Software], is available in its most recent version at: https://github.com/geoflows/D-Claw. A sample D-Claw input file can be found in the Supporting Information S1.