Can high resolution 3D topographic surveys provide reliable grain size estimates in gravel bed rivers?

High resolution topographic surveys such as those provided by Structure-from-Motion (SfM) contain a wealth of information that is not always exploited in the generation of Digital Elevation Models (DEMs). In particular, several authors have related sub-metre scale topographic variability (or ‘ surface roughness ’ ) to sediment grain size by deriving empirical relationships between the two. In ﬂ uvial applications, such relationships permit rapid analysis of the spatial distribution of grain size over entire river reaches, providing improved data to drive three- dimensional hydraulic models, allowing rapid geomorphic monitoring of sub-reach river restoration projects, and enabling more robust characterisation of riverbed habitats. However, comparison of previously published roughness-grain-size relationships shows substantial variability between ﬁ eld sites. Using a combination of over 300 laboratory and ﬁ eld-based SfM surveys, we demonstrate the in ﬂ uence of inherent survey error, irregularity of natural gravels, particle shape, grain packing structure, sorting, and form roughness on roughness-grain-size relationships. Roughness analysis from SfM datasets can accurately predict the diameter of smooth hemispheres, though natural, irregular gravels result in a higher roughness value for a given diameter and di ﬀ erent grain shapes yield di ﬀ erent relationships. A suite of empirical relationships is presented as a decision tree which improves predictions of grain size. By accounting for di ﬀ erences in patch facies, large improvements in D 50 prediction are possible. SfM is capable of providing accurate grain size estimates, although further re ﬁ nement is needed for poorly sorted gravel patches, for which c -axis percentiles are better predicted than b -axis percentiles.


Introduction
The interaction of river channel morphology and hydraulics at the sediment-water interface influences both water and sediment fluxes through fluvial systems as well as in-stream ecological processes by providing habitat . As such, understanding and manipulating these interactions are often the focus of river restoration activities (Palmer et al., 2014). Bed material size represents one of the smallest scales of topographic variability within river channels. Analyses of grain size variability can yield information on flow history, sediment origins and patterns of previous transport and deposition (e.g., Folk and Ward, 1957;Friedman, 1979;Bui et al., 1989). Information on bed material size is essential for hydraulic modelling (Smart et al., 2004) and is required to predict boundary shear stress and sediment transport (e.g., Wilcock and Crowe, 2003) in morphodynamic models (e.g., Langendoen et al., 2016). Bed material size is also an influential determinant of the availability of suitable spawning habitat for fish and habitat for macroinvertebrates (Armstrong et al., 2003;Fryirs, 2015).
Gravel bed rivers exhibit marked reach-scale variability in grain size distributions (e.g., Church and Kellerhals, 1978;Nelson et al., 2014), which necessitates sampling over large areas to represent accurately the heterogeneity of grain sizes (Church et al., 1998). Conventional and well-established methods of grain size analysis (e.g., Wolman, 1954), although relatively reliable (despite known biases; Hey and Thorne, 1983;Bunte and Abt, 2001), are limited in their ability to provide high resolution information of grain size variability owing to the timeconsuming nature of physical sampling. Both Wolman pebble counts and even more laborious areal sampling methods are destructive techniques which limits the potential for repeated surveys and disrupts the habitat they are intended to quantify. To address the clear need for a fast, accurate and reproducible grain size measurement technique (Woodget and Austrums, 2017), recent studies have developed new methods of obtaining grain size information from remotely sensed data.
Summary values of local grain size (e.g., D 84 or D 50 , the size at which 84% or 50% of measured b-axes are finer) have been estimated from photographs (so-called 'photosieving' methods) using semivariograms of image texture (e.g., Carbonneau et al., 2004;Verdú et al., 2005), the autocorrelation of image intensity (e.g., Rubin, 2004;Barnard et al., 2007;Warrick et al., 2009) or other statistical properties of images (see Buscombe and Masselink, 2009;Buscombe and Rubin, 2012). Others have sought to identify and measure individual grains within an image (e.g., Graham et al., 2005;Detert and Weitbrecht, 2012), allowing the full grain size distribution to be estimated. While photosieving methods focus on the planimetric dimensions of grains, parallel developments have utilized the increased availability of high resolution Digital Elevation Models (DEMs) and other topographic data products to estimate the vertical scale of topographic variability attributable to grain size. Survey methods such as Airborne Laser Scanning (ALS) and Terrestrial Laser Scanning (TLS) can provide decimetre and millimetre scale 3D survey point accuracies, respectively (Gallay, 2013;Smith, 2015). In particular, TLS has been used at scales from individual grains and small patches of 10 0 -10 1 m 2 (e.g., Hodge et al., 2009a), to fluvial landforms of 10 2 -10 3 m 2 (e.g., Leyland et al., 2015), and even to reaches of 10 6 -10 7 m 2 (e.g., Williams et al., 2014). More recently, Structure-from-Motion with Multi-View-Stereo (SfM-MVS) methods have arisen as a flexible, low-cost alternative to TLS Eltner et al., 2016;Carrivick et al., 2016) and have gained considerable traction in fluvial applications (e.g., Javernick et al., 2014;Woodget et al., 2015;Marteau et al., 2016).
The focus of many fluvial geomorphic surveys is the broad-scale topography and quantification of surface changes (e.g., Milan et al., 2011). However, the high spatial resolution of TLS and SfM-MVS data has enabled sub-grid scale surface roughness to be investigated as a proxy for grain size (i.e., topographic scales smaller than the resolution of Digital Elevation Models which are typically decimetre-metre scale in fluvial studies) (Smith, 2014). The development of an empirical relationship between surface roughness and grain size essentially reverses the established practice of using measured grain size percentiles to estimate roughness height for use in flow resistance equations (e.g., Strickler, 1923;Clifford et al., 1992;Wilcock, 1996;Powell, 2014). These relationships, however, have been confounded by the range of particle sizes, shapes and packing-structures (e.g., imbrication) observed in natural alluvial channels (Furbish, 1987;Powell, 2014;Vázquez-Tarrío et al., 2017). Moreover, the presence of bedforms and multiple scales of topographic variability limits the usefulness of particle axes measurements to estimate flow resistance (Robert, 1990;Gomez, 1993;Smart et al., 2004). To some extent, the same confounding factors will hinder attempts to estimate grain size from surface roughness.
The complex geometry of fluvially-formed surfaces cannot be described fully by a finite number of parameters  and as such no single property or parameter can be defined uniquely as surface roughness (Smith, 2014). Nevertheless, a single roughness metric has dominated the geomorphological literature for several years: the standard deviation of elevations (σ z ) (e.g., Grohmann et al., 2011;Brasington et al., 2012). Studies that relate sub-grid surface roughness to grain size (typically D 50 ) also preferentially use σ z , often detrended at the grid-scale (Smart et al., 2004;Aberle and Nikora, 2006;Hodge et al., 2009b;Brasington et al., 2012;Schneider et al., 2015;Westoby et al., 2015;Bertin and Friedrich, 2016) resulting in a single roughness value. However, estimates of grain size from surface roughness have not been subject to the same rigorous evaluation as photosieving approaches (e.g., Graham et al., 2010). Fig. 1A summarises the scatter observed across all these previously published studies; with all data combined an R 2 of 0.49 is observed (n = 97; p < 0.001) though the 95% confidence intervals of the gradient (from 1.19 to 1.81) are sufficiently wide to preclude accurate prediction across a range of facies types. Table 1 provides details of each of these studies; while fluvial gravels dominate the literature, glacial moraines have also been sampled (e.g., Westoby et al., 2015) reflecting the fact that grain size classification is of benefit in a wide range of process environments beyond fluvial gravels. TLS data have dominated such studies, though the most recent studies have used SfM. Direct comparison of these results is complicated further by the different grain size measurement methods employed (which may require correction factors; Bunte and Abt, 2001;Graham et al., 2012), the different patch sizes (and hence different detrending scales) and the differences in methods used to extract roughness data.
At individual study sites, linear relationships between D 50 and σ z are observed (typically R 2~0 .9; Table 1). Brasington et al. (2012) observed a strong linear relationship between σ z and D 50 for cobble-sized material at the River Feshie in Scotland (R 2 = 0.92) that improved (R 2 = 0.95) when additional data from Wales (beach sediment) and New Zealand (braided river) were added, though the parameters of the regression equation were different (Table 1). Conversely, on poorly sorted glacial sediment, Westoby et al. (2015) observed no significant relationship between σ z and D 50 . Table 1 also describes studies that used two alternative roughness metrics for grain size prediction. Heritage and Milan (2009) avoided detrending patches over large scales by summarising the localised standard deviation of elevations within a moving window, equivalent in size to the largest clast (0.15 m in this case) and multiplied the result by two to generate the effective roughness equivalent (a convention maintained herein). Comparison of percentiles of the resulting roughness metric (given the notation 2σ loc herein) with grain size percentiles yielded strong relationships in eight individual patches. To facilitate comparison with other studies, the median value of 2σ loc for each patch has been plotted in Fig. 1B and a relationship between that and D 50 summarised in Table 1. Baewert et al. (2014) used a similar approach (on c-axis data) with a smaller window size (~50 mm) while Vázquez-Tarrío et al. (2017) used a grid size of 1 m. Comparison of these D 50 -2σ loc relations shows that they plot as expected with larger window sizes yielding higher roughness values; indeed Baewert et al. (2014) demonstrated this directly. Finally, Woodget et al. (2016) and Vázquez-Tarrío et al. (2017) outlined a third roughness metric (rh) which summarises the deviation from a fitted plane of all points within a given radius (the former study used the mean value within 0.2 m, the latter study used the median value within 0.5 m). In a comparison of all three roughness metrics, Vázquez-Tarrío et al. (2017) found rh to provide the best prediction of D 50 . While Woodget et al. (2015) plotted rh against D 84 (for which there was the strongest linear relationship) rather than D 50 , this relationship remains very different from the rh-D 84 relationship reported in Vázquez-Tarrío et al. (2017) (though note the calculation differences reported above). Although there are some differences between roughness metrics, they are fundamentally very similar; each presents a summary of local variability in elevations while filtering out larger wavelength topographic variability. For better comparison with the existing literature, herein we focus on σ z as the main roughness metric used for grain size prediction; the other metrics are compared briefly for field patches in Section 3.7.
Comparison of the relationships in Fig. 1A with that of Westoby et al. (2015), obtained over sediment of very different character, highlights that properties of sediment facies other than grain size influence surface roughness (see also the comparison in Fig. 15 of Vázquez-Tarrío et al., 2017). It is not surprising, therefore, that each study represented in Fig. 1 obtained a different empirical relationship. At a minimum, site-specific calibration of any σ z -D 50 relationship is required. However, a more generalizable approach would be to identify the individual factors that have caused these differences and present a suite of σ z -D 50 relationships, each one of which is appropriate for a different fluvial sediment facies. Such factors include: (1) Survey error: low precision survey observations would increase σ z and the sources of such error in SfM surveys (e.g., image network geometry, tie point quality, control measurement; see James et al., 2017) are very different from those in TLS surveys (e.g., registration errors, beam divergence and mixed pixels, range errors; see Smith, 2015) ; (2) Particle irregularity: natural deviations of gravels from abstract shapes (e.g., spheres, prolates) would increase σ z ; (3) Particle shape: sphere and prolate-shaped particles normally protrude above the bed more in comparison to oblates with the same aand b-axes measurements; (4) Grain packing structure: imbrication would affect grain orientation and protrusion (Buffington and Montgomery, 1999) and influence σ z as particles do not lie flat, but are layered in the direction of the flow with one end elevated above the general surface level; (5) Sorting: a well-sorted patch would exhibit a smaller σ z compared to an otherwise identical poorly sorted patch as grains at the upper range of grain sizes would increase σ z (Hodge et al., 2009a); (6) Bedforms and form roughness: grain scale topographic variation may be superimposed upon longer scale topographic variability that is not filtered out by linear detrending (particularly at larger grid sizes) which would inflate σ z for a given grain size.
The aim of this study is to advance the use of using surface roughness as a proxy for grain size in high resolution topographic surveys by disentangling the confounding effects of survey error, particle shape, sorting, grain packing structure, and bedforms on empirical relationships. Through a mixture of > 300 laboratory and field SfM-MVS surveys, we: (i) isolate the effect of each influencing factor; (ii) identify limits of application for σ z -D 50 relationships; and (iii) produce a suite of empirical σ z -D 50 relationships for use with distinct fluvial sedimentary facies.

Methods and study site
A combination of laboratory and field-based SfM-MVS and grain size surveys were undertaken. Laboratory surveys were performed on artificial spheres and grains sampled from the field that had been prepared deliberately to isolate the effects listed above. Field surveys were then undertaken to investigate the influence of bedforms on the relationships observed.

Study sites
Field surveys were undertaken at two contrasting locations (Fig. 2). The River Cover is a moderate gradient river (slope = 0.025) that runs through the Coverdale catchment in the Yorkshire Dales. The River Cover is a tributary of the River Ure which eventually flows into the Yorkshire Ouse. The study reach was c. 900 m long and c. 1.5 km from the headwaters with a channel width of 5-10 m and an upstream catchment area of approximately 15 km 2 . The reach is characterised by a combination of gentle meanders with associated point bars and historically straightened sections with a gravel bed, where sediment sizes range from fine gravels (2 mm diameter) to boulders (> 256 mm diameter). The River Ure study area is 15 km downstream of the River Cover study site, and 2 km from the confluence of the Cover and Ure rivers. The River Ure study area has a larger upstream catchment area (~500 km 2 ), wider channel (20-30 m) and lower channel gradient (0.003). The study area was focused on a large gravel bar, with an area of 5000 m 2 . The bar was characterised by a less diverse size range of sediments than the River Cover, with cobbles (< 256 mm) representing the largest sediments found. The sediments were well sorted towards the vegetated field, and less well sorted adjacent to the water edge.

Sediment sampling
Field survey patches were established on exposed gravel bars throughout the study reaches. Thirty-one patches of 1 m 2 were chosen. Immediately following SfM-MVS surveys, a manual areal sample of the a-, b-and c-axes of all pebbles represented on the surface of the patch was conducted. Grain b-axis sizes ranged from 9-300 mm (aside from a single grain at 5 mm) and all particles were rounded. To provide grains for laboratory surveys, a further 2000 rounded, fluvial particles were collected randomly from a 150 m 2 area and each axis measured (with baxis sizes from 9-111 mm). Wentworth's (1922) size classification based on particle b-axes was used to separate grains into size categories. All sediment was fluvial in origin. Pebbles were further divided into shape categories based on the classification of Sneed and Folk (1958) which uses three ratios of particle a-b-and c-axes to classify shape into three categories: spheres, oblates (disc-like) and prolates (rod-like). These ratios measure a particle's relative equancy (c/a), elongation (b/ a) and 'disc versus rod-like aspect' (a− b/a− c).

Structure-from-Motion with Multi-View-Stereo (SfM-MVS) photogrammetric surveys
SfM-MVS surveys followed standard workflows, which are welldescribed in the geomorphological literature (e.g., Micheletti et al., 2015;. The same survey methods were used in both field and laboratory settings. Each gravel patch was surrounded by a number of Ground Control Points (GCPs). These consisted of either yellow survey targets or crosshairs on bright red backgrounds. Each target was surveyed using a reflectorless total station (Leica TS15) in a local coordinate system. For the laboratory patches, all surveys were  Table 1 Summary of existing studies estimating grain size percentiles from relating sub-grid surface roughness of high resolution topographic surveys. See text for details. All units stated in m. The Heritage and Milan (2009)  undertaken within an immobile array of GCPs that were surveyed prior to the photogrammetric surveys. The 3D co-ordinates of each GCP were recorded as an average over 10 distance measurements, with a 3D accuracy of 0.003 m. An average of 40 ground-based hand-held oblique photos were taken for laboratory patches and an average of 25 ground-based photos were taken for field patches. The survey range was typically~2 m. The same camera was used as much as possible; however, changes in equipment were required due to camera failure. The majority of surveys were completed using a Canon PowerShot ISO200 (10 MP, 6 mm focal length). Several patches were surveyed using both FujiFilm FinePix J350 (16 MP, 5 mm focal length) and AV200 (14 MP, 5.7 mm focal length) models. Both cameras malfunctioned; while no direct comparisons were made between cameras on the same patch, previous studies suggest this would have minimal influence on the results (Thoeni et al., 2014;Eltner and Schneider, 2015;Micheletti et al., 2015;Prosdocimi et al., 2015;Mosbrucker et al., 2017).
Photogrammetric surveys were processed using Agisoft PhotoScan Professional 1.2.6. Following bundle adjustment, three GCPs were located in the imagery for lab patches and five GCPs for field patches and the point cloud was scaled and oriented. Georeferencing errors were typically 0.79 mm for lab patches and 1.21 mm for field patches. MVS image matching algorithms were performed to produce final dense point clouds which were cropped to the measured gravels. For laboratory patches, this method resulted in edge effects as the sides of peripheral particles (i.e., particles located at the edge of the survey area) were visible in the point cloud. Thus, to better reproduce field conditions, further cropping was undertaken to remove peripheral pebbles. Using this method, sub-millimetre data resolution was obtained on all patches. All patches were linearly detrended prior to calculation of σ z , yielding a single roughness value for each patch (as

Experimental design
The overall experimental design is outlined in Fig. 2. The patch surveys were undertaken on a range of grain sizes for different types of patch to isolate the six factors under investigation. Within the experimental design, each of the laboratory patches were surveyed three times to provide replication, and the average of these three values was used in linear regressions.

Factor 1: precision limitations of SfM-MVS
While the majority of studies in Table 1 used TLS data to evaluate σ z , SfM-MVS is increasingly being applied as an alternative to TLS (e.g., Marteau et al., 2016) and was thus used in this study. In any case, given the small magnitude of errors for both techniques at this scale (typicallỹ 2 mm; Smith and Vericat, 2015), the specific survey method is unlikely to have a large effect. Nevertheless, the performance of SfM-MVS was evaluated using regular arrays of uniform spheres of known diameter. The standard deviation of points distributed uniformly over a perfect hemisphere can be calculated theoretically because the surface area of a sphere is constant at any given elevation. With a large number of equally spaced points over that surface, σ z will equal 1/√12 the particle radius. Thus, a linear relationship between hemisphere diameter D and σ z is expected such that: For comparison with Eq. (1), 11 sets of polystyrene balls of diameters between 0.02 m and 0.12 m were painted to represent natural sediment colours (Fig. 2). Balls of the same diameter were arranged into a tightly packed patch in a levelled tray. Surveys were evaluated both before ('Full hemispheres') and after cropping the peripheral particles (as detailed in Section 2.2), as the non-cropped surveys provide a better comparison with theoretical calculations, while cropped surveys better simulate field conditions.

Factor 2: irregularity of natural gravel particles
To assess the effect of naturally occurring particle roughness, the collected particles classified as spherical in shape were used to manually create patches of different sizes. The σ z -D 50 relationship was then compared with the patches of smooth hemispheres.
2.4.3. Factor 3: particle shape effects Patches based on each particle shape (sphere/oblate/prolate) were made for each particle size class for which a sufficient number of particles were available to create a patch. Relationships between σ z and D 50 for oblates and prolates were compared with the relationship for spheres. Ratios of c:b axes were in the range of 0.53-1.0 (spheres), 0.18-0.69 (oblates), and 0.30-1.0 (prolates).

Factor 4: imbrication effects
Imbrication is characterised by particles leaning on each other on a contact surface, with particles consistently dipping upstream, typically at an angle of approximately 36° (Laronne and Carson, 1976), although a range of dip angles has been observed in fluvial sediments. This analysis was undertaken on oblate-and prolate-shaped particles only as spherical particles, by definition, cannot be imbricated. Patches were arranged per size class, and pebbles were manipulated to re-create signs of imbrication on a row by row basis, with measured dip angles between 36°and 38°. Regression equations for imbricated oblates and prolates were compared with their non-imbricated counterparts.

Factor 5: sorting effects
Sorting was highlighted by Brasington et al. (2012) as a key issue when using high resolution topographic data to derive grain size estimates. Many upland rivers are of glacial origin (Howard and Macklin, 2003), resulting in a large range of sizes, with the largest boulders being well above entrainment thresholds for their associated channels. Although laboratory analysis was undertaken on relatively small particles (with the largest particle having a b-axis measurement of 136 mm), the effect of sorting was evaluated. The sphere patches used in Factor (2) were well sorted due to their separation into size classes. A further 20 laboratory patches were created to evaluate the effect of sorting with particles of different sizes mixed throughout the patch (i.e., no clustering of small and large particles). To signify poor (-moderately poor) sorting, the inclusive graphic standard deviation, σ I (Folk, 1968) of b-axes of each patch had to be > 0.8 (range 0.8-1.1).
Field patches consisted on average of 53% oblate, 24% prolate and 23% sphere shaped particles. Particles measured ranged in b-axis size from 5-311 mm. The within patch range in b-axis size varied from a minimum of 74 mm to a maximum of 288 mm. Visual assessment of each patch revealed that patches were not imbricated, despite being of fluvial origin. Inclusive graphic standard deviation was again calculated per patch, ranging from 0.698-0.983. To test the effect of sorting on these patches, the data set was split in half, consisting of 16 moderately well-sorted patches (σ I < 0.76), and 15 moderately sorted patches (σ I > 0.76).

Factor 6: form roughness
Unlike the patches created manually in the laboratory, naturally occurring patches were assumed not to be on an underlying flat surface. To assess the effect of form roughness on σ z , three methods of removing form roughness were compared to σ z values calculated from a linearly detrended surface. CloudCompare (Girardeau-Montaut, 2016) was used to detrend the patch surface using: (i) a 2.5D quadric curved surface fitted to the point cloud; (ii) a coarse DEM at 100 mm grid size; and (iii) a coarse DEM at 75 mm grid size.
For (ii) and (iii), care was taken not to reduce the resolution so much that genuine grain-scale variability was removed during the process. In each case, the fitted surface was subtracted from the original point cloud and σ z recalculated. Finally, alternative roughness metrics (2σ loc and rh) were computed for field patches as described previously using a 0.2 m kernel/window size to reflect the maximum grain size.

Statistical analysis
Linear regressions were performed in Stata to identify σ z -D 50 relationships and to provide comparison with the theoretical linear relationship of Eq. (1). Normality of residuals was evaluated using Shapiro-Wilk and skewness-kurtosis tests (D'Agostino et al., 1990); heteroscedasticity was evaluated using the tests of Breusch and Pagan (1979) and Cook and Weisberg (1983). The relationships for oblate and imbricated oblate particles showed non-normality of residuals; however, these were only minor and in the case of oblates arises from a single data point. For clear comparison with other relationships, linear regressions were retained for these patches. The relationship for the oblate patch was also recalculated without the outlier.

Precision limitations of SfM-MVS
A direct comparison of the σ z -D 50 relationship for 'Full hemispheres' with Eq. (1) showed a very close correspondence (Fig. 3). The gradient of the relationship was slightly lower, but overall differences are negligible with a significant relationship observed (R 2 = 0.995; p < 0.001) ( Table 2). However, when peripheral hemispheres were cropped (i.e., better reflecting a field situation as no particle sides are visible in the point cloud), systematically lower σ z values were observed (Fig. 3), though there was a small increase in the R 2 (0.997, p < 0.001). For comparison with field conditions, the data sets presented in subsequent sections have had the peripheral particles cropped.

Irregularity of natural gravel particles
Gravels are more irregular in shape than the hemispheres used  Table 2.

E. Pearson et al.
Geomorphology 293 (2017) 143-155 above ( Fig. 2 and Fig. 3), resulting in a higher σ z for a given D 50 . Nevertheless, a strong relationship is observed (R 2 = 0.974; p < 0.001) with no intercept term and a gradient that is similar to Eq. (1).

Particle shape effects
Patches of prolate-shaped ('rod-like') gravels behaved similar to spherical gravels with a slightly higher σ z for a given D 50 (Fig. 4). A similarly significant relationship was observed (R 2 = 0.988; p < 0.001). However, oblate-shaped ('disc-like') gravels exhibited a much lower σ z and considerable scatter in the σ z -D 50 relationship for larger grains (though with R 2 = 0.776; p < 0.001). This result is a consequence of oblate-shaped gravels having minimal vertical expression. Removing the clear outlier for oblates in Fig. 4 improved the relationship (R 2 = 0.936; p < 0.001) and lowered the gradient term (8.47).

Imbrication effects
Imbricated prolate patches resulted in a higher σ z than predicted theoretically for hemispheres (Eq. (1)) and for otherwise identical nonimbricated prolate patches (Fig. 5). However, overall, there was little difference observed between imbricated and non-imbricated prolates, and a strongly significant relationship was again observed (R 2 = 0.949; p < 0.001). Imbrication exhibited a larger effect on oblates resulting in an inflation of the σ z as the now inclined disc-shaped particles have greater expression of the b-axis in the vertical dimension (and indeed a greater vertical expression in general). Consequently, the model fit improved (R 2 = 0.966; p < 0.001).

Sorting effects
When compared with well-sorted spherical gravels, poorly sorted laboratory patches exhibited a much larger σ z for any given D 50 (Fig. 6A) and no σ z -D 50 relationship was observed. This difference results from small particles being obscured by surrounding larger particles and are thus not represented in the point cloud such that roughness no longer reflects D 50 . The same situation is observed in field patches (Fig. 6B), though σ z values are even higher than poorly sorted laboratory patches, reflecting the additional factor of form roughness (see Section 3.6 below). For moderately well-sorted field patches, a significant linear relationship was observed, though with a lower R 2 than all well-sorted patches described above (R 2 = 0.710; p < 0.001).
Different grain size percentiles (D 50 and D 84 ) and particle axes (band c-axes), were examined for relationships among the patches exhibiting a range of grain sizes both in the laboratory and in the field. Table 3 identifies significant relationships between σ z and the 84th percentile of c-axis measurements for all such patches (Fig. 7). This relationship is most notable for the poorly-sorted laboratory patches for which all other grain size metrics showed no significant relationship; however, two outliers account for the poor relationship with D 84 in this case (when removed R 2 = 0.63, p < 0.001).  . 4. The effect of particle shape on measured σ z -D 50 relationships.  Table 3 Linear regression models between grain size percentiles for both b-and c-axes and σ z for patches with a range of sorting both in the laboratory and the field. 3.6. Form roughness Fig. 6 shows σ z and D 50 values for field patches that had only been detrended linearly (i.e., within-patch form roughness had not been removed). There was no overlap in σ z between the laboratory and field patches with field patches having much larger σ z values. This difference suggests that larger scale undulations at the patch scale (1 m 2 ) were inflating σ z , as the patches otherwise were identical. When a 2.5D quadric curved surface was fitted, the relationship between D 50 and σ z did not improve (R 2 = 0.701; p < 0.001) (Fig. 8A). However, minor improvements were noticed when a coarse DEM was subtracted from the point cloud (Fig. 8B). DEMs with a resolution of 100 mm and 75 mm resulted in an R 2 of 0.742 and 0.736, respectively. The σ z values were reduced to a level similar to (but still slightly higher than) the poorly sorted laboratory patches, suggesting that the higher σ z values of field patches were associated with form roughness.
For the poorly sorted field patches, no relationships between D 50 and σ z were observed for any type of detrending. While the quadric detrending further improved the relationship between σ z and the 84th percentile of c-axis (R 2 = 0.351; p < 0.020), no improvement was observed for the DEM-detrending methods.

Alternative roughness metrics
For comparison with previous studies relating roughness and grain size, Table 4 shows the performance of 2σ loc and rh on the field patches ( Fig. 9). Very few differences were observed and the three roughness metrics were highly correlated (Spearman's r s > 0.87 for each pair of metrics).

Discussion
Extracting grain size data from surface roughness metrics is increasingly popular in geomorphology as high resolution survey techniques have become more readily available. A number of empirical relationships have emerged in the literature, but there has been little synthesis of these findings and no attempt has been made to explain the very different relationships observed ( Fig. 1; Table 1). This study demonstrated that while grain size information can be obtained accurately from SfM data (compare the 'Full Hemispheres' relationship with the theoretical line in Fig. 3), sediment facies type has a Fig. 6. The effect of sorting on measured σ z -D 50 relationships for both laboratory and field-based patches. Fig. 7. Relationships between σ z and the 84th percentile of c-axis measurements for patches exhibiting a range of sorting, both in the laboratory and the field. pronounced effect on the results obtained. This finding builds on previous work that indicated similar differences for a much smaller number of gravel patches (e.g., Heritage and Milan, 2009). The approach is most challenging with oblate-shaped gravels and on poorly sorted sediments, where c-axis percentiles are better represented by roughness (Fig. 7). For the first time, we have isolated several factors influencing σ z -D 50 relationships to provide plausible explanations for variability in the existing literature. Differences between theoretical and observed σ z -D 50 relationships are not simply a function of survey method (Fig. 3); when standardised hemispherical shapes were surveyed using SfM-MVS, only a very minor deviation from theoretical values was observed. Inherent SfM precision is not a limiting factor and any inaccuracies encountered would also occur when using TLS data. The differences between the observed relationships for different facies are much larger than the precision limitations of either method at the plot scale. This finding is encouraging and demonstrates that the use of surface roughness as a proxy for grain size is technically possible. However, the lower σ z observed for the hemispheres with peripheral particles removed indicates that high resolution surveys do not fully capture the interstitial surfaces. This inherent underestimation of σ z may have been obscured from previous studies, as Fig. 3 shows that it is counterbalanced by the inflation of σ z caused by particle irregularities.
Particle shape affected σ z primarily for oblate-shaped gravels, which displayed lower σ z for a given D 50 compared to both the theoretical calculation and other shapes. Oblate-shaped particles are relatively flat, plate-like surfaces that exhibit a reduced vertical expression relative to spheres or prolate-shaped particles with the same b-axis length. This result adds support to previous suggestions that roughness is more closely related to the c-axis than the b-axis, as fluvially transported grains typically come to rest with their c-axis orientated vertically (Gomez, 1993;Heritage and Milan, 2009). However, despite this effect, particle shape is not the major influence on σ z -D 50 relationships, as Brasington et al. (2012) examined a range of shapes including platy fluvial gravels and rounded beach sediment.
Imbrication caused an increase in σ z for oblate-shaped gravels because the individual particles support each other and thus reach higher local elevations for a given grain size. Such an increase was not observed for prolates, as the increased elevation at the top was balanced by the obscuring of the lower end of the particle and thus reducing the topographic variability within the patch. None of the previous studies explicitly mentioned that their patches were imbricated. Although with all but Westoby et al. (2015) examining water-worked sediments, some degree of imbrication would have been likely. All σ z values observed in the laboratory patches were smaller than the values obtained from a literature meta-analysis (Fig. 10). This higher σ z indicates that other factors such as particle sorting and form roughness are inflating the σ z of previously reported field studies.
All the previous studies showed higher σ z for a given D 50 than the laboratory-based analysis and the theoretical calculation (Fig. 10). The data presented here identifies particle sorting as the major problem when using surface roughness to estimate grain size. Poor sorting of gravels caused a large increase in σ z for a given D 50 as a result of a much wider range of grain sizes and a subsequent breakdown in the relationship for poorly sorted gravels. In this case, smaller particles lower the D 50 of the patch but are not proportionally represented in the point cloud. While no fines or sand were included in this study, the same issue of poor representation in point clouds is even more pronounced where they are present. The presence of large grains within a patch strongly  affects σ z and suggests a consistent methodological bias towards larger particles in poorly sorted facies. Larger particles take up a larger surface area within the patch and hence comprise a higher proportion of the points in the point cloud. While several studies report strong relationships between roughness and higher grain percentiles, specifically D 84 (Aberle and Nikora, 2006;Baewert et al., 2014;Schneider et al., 2015;Woodget et al., 2016;Vázquez-Tarrío et al., 2017), in this study roughness metrics could not predict D 84 in poorly sorted patches. However, more significant relationships were obtained when particle caxis data (c-axis D 84 ) were predicted in poorly sorted oblate field patches.
Particle sorting also had similar and pronounced effects on the relationship between σ z and D 50 in the literature. Brasington et al. (2012) examined relatively well-sorted patches compared to the work of Westoby et al. (2015) on glacial moraine that was poorly sorted. Of interest is the significant σ z -D 50 relationship observed by Schneider et al. (2015) in glacial streams (R 2 = 0.91; taking only directly measured values) which was surprising because glacial material is often very poorly sorted despite having some fluvial influence (Carrivick et al., 2013). The line-by-number grain size sampling method used by Schneider et al. (2015) may explain this difference, as this sampling method may lead to bias towards larger grain sizes (Bunte and Abt, 2001). In the one location where Schneider et al. (2015) used gridby-number sediment sampling, they observed a lower D 50 . Likewise, Smart et al. (2004) observed very different relationships when different grain sampling methods were used (Table 1). We undertook areal sampling of all surface particles by hand which yields very different results from the Wolman method (Bunte and Abt, 2001) but does provide an accurate representation of grains present on the surface and was best-suited for comparison with laboratory patches. The closest comparisons are the work of Westoby et al. (2015) and Bertin and Friedrich (2016) which used photo-sieving methods to represent all grains in the patches. Certainly, grain size sampling technique requires some standardisation, as different methods yield different D 50 values (see Graham et al., 2012).

Limitations and further work
The presence of methodological problems in sediment reporting must be addressed before standardised methods can be adopted. For example, naturally occurring patches are not one particle deep; therefore the degree of burial would also have an effect on the σ z of a patch and needs further consideration. Likewise, some particles may be buried in the presence of sand and fines. Analysis of surface roughness variability alongside hiding functions may yield further insight. Second, interactions between the different factors examined herein also require further analysis. Third, the effect of SfM survey range requires examination. Data from UAVs (e.g., Woodget et al., 2016;Vázquez-Tarrío et al., 2017) will exhibit lower precision and resolution (Smith and Vericat, 2015) than that presented here. Differences seen in Fig. 1 will arise from a combination of facies variability and survey factors (e.g., point precision and density, image blur, georeferencing errors). Precision limitations will set a minimum grain size for which these methods can be applied reliably; for ground-based surveys this should encompass the full size range of gravels. Fourth, where the b-axis is not orientated vertically, a roughness-only approach would be challenging. The most robust method of grain size estimation might be to combine a surface roughness approach with photosieving approaches that could be performed on orthophotographs produced using SfM. In combination, these methods might be able to represent all three dimensions of variability of gravel particles, thereby yielding a more robust grain size estimate (see also Steer et al. (2016) for an alternative approach involving 3D geometrical fitting algorithms to obtain grain size and shape).
While three different roughness metrics have been used in the literature, all three are highly correlated and all represent a related metric of local elevation variability subject to different detrending methods. The findings here support those of Baewert et al. (2014), who observed a large change in roughness value for different detrending lengths. Removal of sub-grid topographic variability improved regressions slightly, but came at the expense of potentially filtering out large grains. As expected, the reduction in σ z values achieved through these detrending methods brought field values close to the σ z observed for poorly sorted lab patches (for a given D 50 ).
The various relationships identified in Table 2 indicate that the nature of gravel facies has some influence on measured σ z for a given grain size. More accurate grain size estimates could therefore be obtained if a suite of empirical relationships were used to represent different gravel facies. However, use of surface roughness as a proxy for D 50 is inappropriate in poorly sorted gravels; this is a substantial limitation of the approach presented here. That said, stronger relationships with σ z were observed for higher grain size percentiles (D 84 ) in poorly sorted gravels, both for the c-axis and b-axis (if two extreme values were removed) and thus subsequent SfM-derived grain size estimates may still be of value for use in flow resistance formulae, for example. Based on the empirical research in this study, Fig. 11 presents a new decision tree that can be utilised by practitioners to identify which empirical equation is most suited to characterising river bed D 50 from SfM surveys along river reaches or for specific landforms. This process could potentially be automated, with gravel facies type (e.g., particle shape, degree of sorting) identified automatically. To achieve this level of automation, a much wider variety of roughness metrics would need to be applied to point clouds than is conventionally performed (Smith, 2014). For example, eigenvalue ratios identify clustering of normal vector orientations (Woodcock, 1977;McKean and Roering, 2004) and the inclination index of Qin et al. (2012) may potentially discriminate imbricated patches. Such automation is worthy of further investigation.
Prior to such automation, further work is needed to validate empirical relationships in Table 2 and expand the suite of equations in Fig. 11 by incorporating several other factors such as particle angularity, degree of surface armouring, packing density and burial. Indeed, Aberle and Nikora (2006) found σ z increased with successive armour development. The idealized end-member patches used herein are unlikely to be replicated in reality; Fig. 11 is intended as a demonstration of a potential approach. As with other such studies, it is unlikely that these relationships could be reliably extrapolated to consider rivers exhibiting much larger grain sizes. Most importantly, sorting effects need to be better understood as poorly sorted sediments are common in many environments. For such a method to be of use to practitioners, a set of sampling and reporting protocols would be needed to ensure consistent patch sizes and sedimentological reporting  Table 1. See Table 1 for methodological details of each study (e.g., grid size used).

Conclusions
High resolution topographic surveys of gravel facies can be interrogated to extract grain size information. Previous studies have identified site-specific empirical relationships; however, there is a large variation in such relationships in the published literature. This research has investigated the reasons for differences between published datasets and evaluated the impacts of several factors on σ z -D 50 relationships. Through a series of laboratory and field SfM surveys, sources of variability in surface roughness have been identified for the first time. Results indicated that the survey technique itself is capable of providing accurate grain size estimates. A suite of empirical equations was developed, reflecting the importance of gravel facies-type in the expression of grain size in surface roughness. Further refinement will be needed to improve the method for application to poorly sorted gravel patches and to validate σ z -D 50 relationships on other facies types.
The extraction of grain size information from SfM-MVS surveys would be of great benefit to river restoration practitioners. Estimating grain size using patch scale roughness could be used to aid ecological surveys where species have preferences towards certain grain sizes (e.g., Williams and Mundie, 1978;Kondolf and Wolman, 1993), or provide a detailed baseline survey of the sedimentological nature of an area prior to river restoration (Kondolf et al., 1996;Merz et al., 2006). Using SfM-MVS methods to measure patch scale roughness and consequently estimate grain size is a less expensive and comparatively simple method of obtaining results that are of the same quality as TLS outputs and have thus begun to increase in popularity (e.g., Westoby et al., 2015;Woodget et al., 2015). This can substantially decrease the time and financial cost of thorough post-restoration geomorphic monitoring (Marteau et al., 2016). A combination of surface roughness and photosieving methods, both of which can be extracted from SfM-MVS surveys, may ultimately provide the most reliable grain size estimates over the largest range of gravel facies. the research. Frances Proctor, Andrew Neverman and Zora van Leeuwen helped with field data collection. Jeff Warburton and Duncan Quincey also provided helpful advice.