A novel construct for scaling groundwater–river interactions based on machine-guided hydromorphic classification

Hydrologic exchange between river channels and adjacent subsurface environments is a key process that influences water quality and ecosystem function in river corridors. Predictive numerical models are needed to understand responses of river corridors to environmental change and to support sustainable watershed management. We posit that systematic hydromorphic classification provides a scaling construct that facilitates extrapolation of outputs from local-scale mechanistic models to reduced-order models applicable at reach and watershed scales. This in turn offers the potential to improve large-scale predictions of river corridor hydrobiogeochemical processes. Here we present a new machine-guided hydromorphic classification methodology that addresses the key requirements of this objective, and we demonstrate its application to a segment of the Columbia River in the northwestern United States. The resulting hydromorphic classes form spatially coherent and physically interpretable hydromorphic units that exhibit distinct behaviors in terms of distributions of subsurface transit times (a primary control on critical biogeochemical reactions). This approach forms the basis of ongoing research that is evaluating the formulation of reduced-order models and transferability of results to other river reaches and larger scales.


Introduction
Hydrologic exchange flows (HEFs) are defined as the bidirectional exchange of water between a river channel and hydrologically connected slow-moving environments such the adjacent subsurface porous media, and are a critical component of river corridor and watershed function [1][2][3]. HEFs expose nutrients and other surface water constituents to mineral surfaces and biological agents (e.g. microbes) that catalyze biogeochemical reactions. Reactions are further enhanced by the mixing of surface and subsurface waters of differing chemical composition that may relieve constraints that exist in river or groundwater alone. Surface water exchanges with adjacent sediments also support riparian vegetation that provides critical habitat and enriches productivity and biological diversity. HEFs modulate surface water temperatures through mixing with groundwater and exposure to subsurface media that is buffered against surface temperature extremes [4], potentially creating thermal refugia for fish and other organisms. Biogeochemical reactions in hydrologic exchange zones can represent a major contribution to overall respiration within riverine ecosystems [5], but the influence of exchange zones on river corridor biogeochemistry is highly variable, motivating the need to quantify those factors that control spatial variability in HEFs.
HEFs are primarily caused by pressure gradients generated by the interaction between dynamic river flows and riverbed topography (hydromorphology). The transit time distribution (TTD) of river water entering the subsurface and re-emerging into the channel is important as it controls the times of exposure of waterborne constituents to subsurface environments where biogeochemical reactions are enhanced [6,7]. Large-scale models of HEFs (e.g. NEXSS, [8]) have recently been developed and represent an important advance. However, they involve limiting assumptions such as steady river discharge, homogeneous subsurface properties, and simplified hydromorphic geometries. We aim to improve models through increased mechanistic representation of the primary drivers of hydrologic exchange, foremost of which are the interactions among transient river flow, riverbed hydromorphology, and heterogeneous subsurface properties [9]. We recognize, however, that mechanistic representation is not feasible at scales larger than a few tens of kilometers and thus cannot be directly used to predict cumulative impacts of HEFs on watershed and basin-scale phenomena.
Hydromorphic classification can serve as an organizing construct for extrapolating results of detailed mechanistic models to larger scales for system-scale prediction. If classes of hydromorphic units (HUs) exhibit distinct and characteristic TTDs as predicted by mechanistic models, then mapping HUs in any river segment could facilitate estimation of TTDs based on mechanistic models of selected representative river segments. This is an extension of the underlying concept of the NEXSS model that incorporates a greater degree of mechanistic representation. To be useful in this context, a hydromorphic classification method must exhibit the following characteristics: (a) quantitative: output of the classification must be easily represented in terms of digital data on a spatial mesh; (b) objective: The method must result in repeatable representation of hydromorphic patterns without human subjectivity; (c) automated: the method must be based on standardized digital data and computational workflows to minimize time and labor required for application; and (d) representative: the behaviors of interest (e.g. HEFs and TTDs) must be primarily controlled by hydromorphology, distinct among hydromorphic classes, and transferrable (consistent across multiple instances of hydromorphic classes).
We present a novel hydromorphic classification method that meets these criteria. It is based on unsupervised machine learning applied to digital data (bathymetric field observations and hydrodynamic model outputs) and is thus automated, quantitative, and objective. Our method produces results similar to previously published classification approaches of hydromorphology but eliminates the subjectivity inherent to those approaches. We applied and tested our method using data from the Columbia River, USA. We find that simulated TTDs are distinct between different hydromorphic classes, thus meeting the first requirement for representativeness. Ongoing research is evaluating transferability and influence of factors other than hydromorphology (e.g. subsurface heterogeneity) on HEFs and TTDs.

Classification approach overview
A wide range of classification schemes have been developed for fluvial systems, reflecting the intended purpose of the classification, different disciplines involved, and the characteristics of the studied environment [10]. Classification typically delineates spatial features such as river patterns, floodplains, and in-channel features (e.g. pools and riffles), which can be separated according to a set of measurable parameters or qualitative descriptors, some of them being descriptive of the form itself (width, depth, slope, length, geometry of nested features such as braided index for a braided reach) or inferring differences in terms of functional linkages between forms and associated processes [11]. Standardized classification methodologies have recently been developed by river geomorphologists, such as the Geomorphic Units Survey and classification system [12] and the formal taxonomy and mapping approach of Wheaton et al [13]. However, there does not yet exist a widely accepted standard. It is contended that 'the fluvial geomorphic community lacks a broadly applicable framework for consistent identification of such features [geomorphic units].' [13] Therefore, we have developed a novel approach that merges elements of several published frameworks. We focus on those methods that incorporate data from hydrodynamic simulations under variable flow conditions and provide an objective and quantitative means of mapping the entire river channel into well-defined units. In particular, the taxonomy and hierarchy of Wheaton et al [13] can be combined with the hydrodynamics-based characterization approach of Wyrick et al [14] to create an approach that has the desired attributes discussed in the introduction. Wheaton et al [13] proposed a taxonomy of fluvial landforms comprising 68 distinct geomorphic taxa. Their mapping methodology is manual in nature, but is based on specific quantitative and qualitative descriptors that reduce subjectivity in its application. Wyrick et al [14] proposed an approach to classification that is based explicitly on hydrodynamic simulation outputs. This approach is less manual than that of Wheaton et al [13] but does not provide a comprehensive taxonomy of morphologic unit types. A key step in this approach is the assignment of ranges of depth/velocity combinations to four geomorphic unit types (run, pool, riffle, glide), based on literature reports, field studies, and expert knowledge, and is therefore somewhat subjective in nature. The mapping process then consists simply of assigning each raster pixel to one of the four unit types based on its simulated flow depth and velocity. All contiguous pixels of the same type become a single geomorphologic unit, and some post-processing can be performed to enforce minimum size limits and eliminate outlier pixel values. Our approach combines outputs from 2D hydrodynamic models with metrics derived from field bathymetric observations (digital elevation models or DEMs) in a systematic fashion to define hydromorphic classes and map HUs. The subjectivity and manual implementation inherent to the above methods is eliminated by applying an unsupervised machine learning method to select feature sets and identify classes (clusters), then digitally mapping these clusters into 2D space based on site-specific data.

Study site
Our approach is developed and tested using data from the Hanford Reach of the Columbia River, a 70 km stretch of river in southeastern Washington State, USA (figure 1). River discharge in the Hanford Reach varies significantly over time scales ranging from hours and days up to seasonally and interannually. The associated river stage changes influence river corridor biogeochemistry and ecology, both directly through changes in inundation status and indirectly through dramatic fluctuations in the magnitude and direction of HEFs. The Hanford reach experiences a wide range of types and magnitudes of HEFs because of the highly variable discharge combined with highly permeable surficial aquifer sediments [9]. The understanding to be gained through this research has direct implications throughout the Columbia River Basin (CRB), which is the fourth largest by total discharge in the United States and includes critical natural resources and ecosystems.

Deriving bathymetric attributes
Riverbed digital elevation data were available from prior combined LIDAR and bathymetric surveys, on a grid resolution of 1 m [15]. The derivatives of digital elevation were calculated from these data with geographic information system (GIS) analysis [16]. Slope is calculated as the maximum rate of change in elevation from a cell to each of its eight neighbors. Aspect is the direction of maximum slope, stated as degrees in range [0, 359.9], measured clockwise from north. Aspect is flagged with a value of −1 when the surface is flat (no slope). Curvature is the second derivative of the surface elevation or the rate of change of the slope. Positive curvature indicates the surface is upwardly convex at the cell. A negative curvature indicates the surface is upwardly concave at the cell. A value of zero indicates a flat surface. Here we introduce an curvature indicator variable 'concavity' , which takes values of (−1, 0, 1) for concave upward, flat, and convex upward curvatures, respectively.

Computing dynamic flow attributes
The Modular Aquatic Simulation System 2D simulator (MASS2) [17,18] (see supplemental information (SI)) was previously calibrated and used to simulate 2D transient depth-averaged Hanford Reach hydrodynamics from 1917 to 2016 at a nominal 10 m resolution using observed flow conditions [19]. Data analysis was performed for MASS2 simulation outputs for each cell in the domain. For these analyses we used only simulation outputs for the 'pre-McNary' period (the time prior to the construction of McNary Dam in 1953 and its subsequent inundation of the lower Hanford Reach). Relative to the modern era, this period is characterized by higher peak magnitude and variability of discharges, and the hydrodynamics are found to have the best correspondence to riverbed texture as shear stress distributions, particularly under high flow rate dynamic conditions (e.g. pre-McNary), exert a primary control on riverbed substrate size distributions [20]. It is assumed that the current morphology and substrate texture of the Hanford Reach, which has been observed to be stable under current flows [21], was established under higher-flow regimes prior to dam construction that no longer occur due to the flood control function of the dams. Temporal statistical moments (e.g. mean, variance, skewness, and kurtosis) of simulated hydrodynamic attributes (e.g. river depth, flow velocity, shear stress) were computed for the simulated time series at each cell. These temporal moments at each grid cell are used as descriptors for classification in parallel to bathymetric properties such as slope, aspect, and concavity; this allows consideration of the influence of variable flows in the grid cell classification, as compared to most of the previous classification methods that consider only a single steady flow condition (e.g. baseflow or mean discharge).

Machine learning-based feature selection and clustering
Machine learning (ML) has been applied to make the best use of available data or provide guidance on further data acquisition, for developing physical understanding or models when there is limited physics knowledge (e.g. data-driven modeling) or when the computational cost of a detailed mechanistic model is too high (e.g. surrogate models). The application of these state-of-art techniques usually requires a significant amount of data to develop models with acceptable accuracy, stability and reliability.
When large amounts of unlabeled data (i.e. training samples of natural or human-created artifacts without meaningful tags or labels) are available, unsupervised ML techniques can be used for data mining tasks [22], such as heterogeneous data analysis, cluster and pattern identification, and anomaly detection (e.g. geological formation boundaries, preferential flow paths, faults, barriers, etc.). Similarly, when there are sufficient 'labeled' data, supervised methods (e.g. classifiers or regressors) [23] can be used to discover/establish causal relationships between predictors (e.g. hydrological properties/features) and key hydrobiogeochemical process performance metrics (e.g. HEFs, heat flux). When data are limited and incomplete (e.g. data gaps and partial knowledge), physics-informed reinforced machine learning can be used to simultaneously fill the gaps and learn unknown physics.
Machine learning methods can be applied, including several clustering and classification approaches, among which principal components analysis (PCA) is often used for feature selection of the most important predictors for classification purposes, while K-means and expectation-maximization (EM) clustering, combined with Bayesian information criteria, (BIC), are used to determine the optimal number of classes [24,25]. Instead of assigning examples to clusters to maximize the differences in means for continuous variables, the EM clustering algorithm computes probabilities of cluster memberships based on probability distributions, with the goal to maximize the overall probability or likelihood of the data points given the final clusters. The HU classification system is shown schematically in figure 2, in which DEM data were used to derive concavity, slope and aspect attributes while simulated hydrodynamics for the pre-dam time period were used to calculate the statistical moments. This full set of attributes was subjected to feature selection using PCA and EM-clustering to determine the subset of variables used for HU classification. The final classification was established through K-means clustering applied to the selected variable subset.

Verification of the hydromorphic classification system
The developed classification system was checked for spatial coherency and transferability, physical interpretability, and consistency with previous taxonomies. The identified hydromorphic classes were mapped on a pixel-by-pixel basis, then polygons of like pixels were extracted to derive HU-specific characteristics such as sizes, shapes, aspect ratios. Field ground truthing was performed by experienced geomorphologists through boat-based surveys and comparison to published observations of riverbed substrate sizes.

Relationship of hydromorphic classes to HEFs and TTDs
Three-dimensional models of coupled river hydrodynamics and subsurface flow and transport simulations were applied to a 10 km subset of the study area to evaluate relationships between hydromorphic classes, HEFs and subsurface TTDs. These simulations use computed hydrostatic pressures at the riverbed from the MASS2 model as transient boundary conditions to simulate subsurface flow using PFLOTRAN (an open source subsurface flow and transport model, see SI, section S1 (available online at stacks.iop.org/ERL/16/104016/ mmedia)) [26]. MASS2 time series of transient flows from the years 2013-2015 were used for these simulations, as the HEFs and TTDs of interest are controlled by current flow conditions (not pre-dam as were used to derive the HU classification scheme). The PFLOTRAN model spin-up period was January 2013 through September 2013. Particles were released from October 2013 through October 2014 and then tracked through the end of the simulation period (December 2015) to accommodate long transit times. PFLO-TRAN simulations were subject to prescribed patterns of aquifer properties [9] based on extensive previous studies at the Hanford Site. To account for the impacts of regional groundwater flow, boundary conditions for the PFLOTRAN domain (other than the riverbed pressures) were prescribed based on interpolation of outputs from the larger-scale regional model of Shuai et al [9]; see SI (section S1) for more information. In this initial test of the correspondence among HEFs, TTDs, and HUs, we assumed a homogeneous value for the conductance parameter that represents permeability of the riverbed alluvial layer, following Shuai et al [9]. We recognize that heterogeneity in riverbed conductance can exert considerable control on HEFs and our ongoing research and model development is considering alternative representations including HU-and facies-based heterogeneity models and mappings based on machine learning. Particle tracking was used based on transient flow fields output from PFLOTRAN to quantify TTDs for particles released at various locations and times. The simulated HEFs and TTDs were then evaluated with respect to the defined classification system.

Variables selected for classification
Sixteen variables were included in the initial set of potential variables for riverbed classification at every map pixel in the study domain. These include bathymetric slope, aspect ratio, variability, and concavity, and simulated hydrodynamic attributes (e.g. mean, variance, skewness and kurtosis of depth, velocity, and shear stress) during the simulated discharge period.
These 16 variables contain redundant information, for example, among mean flow depth and its statistical moments such as variance and skewness, or between flow velocity and riverbed shear stress. Furthermore, a classification system with such a high dimensional factor matrix may render interpretation of the results difficult. Considering both the physical interpretability and information redundancy of the factors, we reduced this initial set of variables using PCA. PCA results (see SI, section S3) show that (a) the high dimensional factor matrix can be reduced to four dimensions with four principal components explaining more than 90% of the total variability of the data matrix, and (b) the factors can be clustered into groups around four distinct variables: local slope, flow velocity, flow depth, and local concavity. The other 12 variables are highly correlated with these four key variables and can be eliminated from consideration for the classification system.

Number of clusters selected
BIC [27] was combined with the EM algorithm to determine the optimal number of classes in our study, where classes were identified using K-Means clustering. BIC = 2logL(θ) + klogn, where θ is the vector of model parameters (classifiers), L(θ) is the likelihood of the candidate (classification) model given the data when evaluated at the maximum likelihood estimate of θ, k is the number of estimated parameters in the candidate model, and n is the number of observations. It is widely used for model selection through balancing goodness of fit and complexity of models. BIC model selection indicated optimality of seven to nine classes, providing balance between the likelihood of correctly assigning observations to the defined clusters and model complexity. Based on this criteria and qualitative comparison to existing classification taxonomies, we selected the number of classes to be seven.

Clusters represent spatially coherent and interpretable hydromorphic classes
The identified seven clusters, shown in scatter plots in figure 3, are analogous to the four variable groupings in [14], except that dividing values are subjective in the Wyrick classification (replaced here by objective clustering) and we consider slope and concavity in addition to mean depth and mean velocity. As discussed above, these four variables were selected from the original set of 16 variables through PCA. We note that, while it appears that HUs can be effectively defined using only the mean flow attributes (velocity and depth), the transient flow impacts will still be important and were considered when computing HEFs and TTDs. When these clusters are mapped into our spatial domain, as shown in figure 4, they form spatially coherent HUs. Based on the values of the four variables that are characteristic of each class, these can be interpreted in terms of descriptors used in standard hydromorphic taxonomies [13], such as 'run' , 'fast glide' , 'pool' , 'slow glide' , 'riffle transition' , 'slack water' , and 'riffle' . For example, pools are characterized by large depth, low to intermediate velocity, low riverbed slope, and concave zones; while runs have intermediate depth, intermediate to large velocity, low riverbed slope, and can be concave, flat, or convex. We also tested the consistency of the approach by applying the classification approach (i.e. over the entire reach) and to local subdomains, each time yielding consistent classes and maps with only negligible changes along the boundary grid cells.
We note that, although results are not shown here for conciseness, we applied these methods to individual sub-sections of the reach as well as to the entire reach and observed that the classification scheme derived and the output maps of HUs varied only slightly among these various cases, indicating that the HU classifiers and resulting classes are generally applicable across the domain of study.

Field verification
The classification obtained was verified by experienced geomorphologists using near-shore and boatbased field surveys at locations along the reach from the meandering horn areas to the farther downstream of the study area during different seasons. The surveys were able to verify the spatial coherence of the classified neighborhood areas and spatiotemporal consistency in terms of flow velocity, river water depth, riverbed curvature, and substrate texture. In addition, spatially well-distributed observations of riverbed substrate size were available from a number of previous video-based surveys [28] and were also used for ground-truthing analysis, which showed distinct distributions of substrate size within several major HUs (see SI, section S5); for example, the 'runs' correspond to large substrate size with limited variation and therefore high riverbed permeability with small uncertainty, while the 'glides' have smaller substrate size on average but a wider range of variability. The distinct distributions observed for the various HUs support our assertion that they represent physically meaningful features in terms of riverbed hydrodynamic properties.
The HUs also exhibit distinct geometric attributes such as the principal length, width, and aspect ratio. The principal lengths are determined by finding the distance between the furthest pair of points within each polygon; the effective widths (and therefore aspect ratios) are calculated by dividing the polygon areas by the principal lengths. In general, the pool areas have the largest principal length and area on average, and the largest range of aspect ratios; while the slow glide zones tend to have the least principal length and area on average, and the smallest variations in aspect ratio.

Relationship to HEFs and TTDs
The objectives of our research require that not only are we able to define a meaningful hydromorphic classification scheme, but that the HUs so defined exhibit distinct and characteristic behavior in terms of variables of interest, specifically magnitudes and directions of HEFs and subsurface TTDs. As a preliminary test, we performed a set of simulations on a 6 × 6 km subdomain (approx. 10 km of river channel) using the integrated PFLOTRAN, MASS2, and particle tracking codes. Since particles are released at points on the riverbed which have an associated HU type, subsurface transit times of all particles released on each HU type can be gathered into distributions (TTDs). Calculated TTDs for each of the seven HU classes and for the subdomain as a whole are shown in figure 5. Each HU exhibits multimodal TTDs: the most common transit times (i.e. TTD modes) are around O(100) days for the entire reach, but the modes are typically below 100 days for some HUs (e.g. pools, runs) and above 100 days for others (e.g. riffles, fast glides); slow glide and slack water zones exhibit unimodal TTDs with the mode around 5-10 days. The transition zone represents a small fraction of the study domain, and exhibits two modes around O(10) and O(100), respectively. Note that the TTDs are calculated based on simplified spatial heterogeneity of the riverbed conductance, which may have reduced the distinguishing behaviors of TTDs among HUs; nevertheless, we find that the TTDs are distinct among the HUs for the study section, indicating that the hydromorphic classification does provide a potential construct for estimating TTDs in HUs of sections that have not been subjected to detailed 3D modeling.

Discussion
The integration of bathymetric information with output of 2D hydrodynamic simulations, the feature selection and dimensional reduction, together with the information criteria-based model selection, yield a classification system where hydromorphic features can be efficiently and objectively mapped at the reach scale and beyond. While our approach yields mappable units that are qualitatively similar to those defined in common taxonomies (e.g. [11].), it uses a systematic quantitative approach that eliminates the subjectivity associated with previous taxonomies.
Our approach extends that of Wyrick et al [12] by using unsupervised machine learning methods to identify the relationships between observable or computed variables and morphological units rather than assigning those relationships subjectively based on expert knowledge, and our method considers a wider range of variables than Wyrick et al [12].
The approach addresses data redundancy and possible data quality issues with the data compression and reduction procedure; in the regions where data are absent, it also provides guidance on the necessary data with desired spatial coverage and resolution for riverbed classification or verification purposes. The resulting mapped HUs enable selection of specific segments of the river (containing one or more features of given types) for three-dimensional integrated river and groundwater flow simulations [29] and provide guidance for selection of extended field characterization and experimentation sites for comprehensive understanding of a complex large river system.
Classification of a large complex system into HUs that correspond to distinct HEF and TTD behaviors can help identify HU-specific biogeochemical functions. For example, the Hanford Reach of study is subject to dynamic hydrological forcing with dam operations, the dynamic stage fluctuations created rapidly changing losing-gaining conditions in the river, which led to highly transient TTDs and resulted in multiple modes of TTDs. Among the HUs, slack water and slow glide regions experience many short transit times leading to high biogeochemical reaction potential relative to zones with a few long transit times, while fast glide and riffle correspond to spatially coherent upwelling zones (see SI, section S6) and intermediate transit times, providing friendly environment and nutrition for fish. In the study system, long transit times can also correspond to pathways for river water to interact with groundwater contaminant plumes [9,30]. We have also noted a spatial correspondence between patterns of salmon spawning and our derived HUs, and are currently exploring these relationships using machine learning approaches.
There are several remaining challenges and scientific questions. A critical question for our purposes is how transferrable these results are. There exists a large range in stream orders across the CRB, and it is likely that the dominant classification factors such as velocity, depth, and slopes and their relationships will vary across the different environments. While detailed bathymetric data and prior hydrodynamic simulation outputs are available for the mainstem Columbia and Snake Rivers, smaller streams typically have not been characterized in this manner. Also, regional groundwater flow conditions and aquifer properties also exert control on HEFs and TTDs, thus potentially limiting their transferability. Our study example, the Hanford Reach of the Columbia River, is a generally gaining reach that is well connected to the saturated groundwater aquifers; care would be needed in applying these results to losing river reaches and/or reaches that were disconnected from the saturated aquifer. Full evaluation of the extensibility of the approach across large basins will depend on advances in remote sensing to provide necessary DEMs for lower-order stream channels, and on more complete demonstration that the approach is transferrable to other fluvial settings.

Conclusions
In this study, we propose and apply an effective method for quantitatively and rigorously defining and mapping hydromorphic classes in a large mainstem river. The new approach considers both flow simulation outputs (depths and velocities) and bathymetric/topographic data in objectively defining and mapping hydromorphic classes. The resulting mapped classes correspond to spatially contiguous regions (HUs), and the spatially coherent features are physically interpretable and consistent over the entire reach, as well as in sub-domains within the reach. The classification scheme was verified by ground-truthing against field observations of feature geometry and riverbed textural properties.
Preliminary analysis of relationships between HU class and simulated values of HEFs and TTDs based on high-resolution mechanistic modeling confirm that these important characteristics of riversubsurface exchange are distinct for each HU class, meeting the first necessary condition of a construct for upscaling results of high-resolution mechanistic models within limited spatial domains to develop reduced-order models applicable over much larger domains. Our initial results suggest that these results are transferrable to other reaches within the mainstem Columbia and Snake Rivers, where sufficient data are available. Transferability to smaller streams within the watershed, and availability of data to support analysis in those environments, remains a topic of ongoing research and development.

Data availability statement
The data that support the findings of this study are available from the corresponding author upon request.
The data that support the findings of this study are openly available at the following URL/DOI: https://ess-dive.lbl.gov/. Data will be available from 31 December 2021.