Important Role of Overland Flows and Tile Field Pathways in Nutrient Transport

Nitrogen and phosphorus pollution is of great concern to aquatic life and human well-being. While most of these nutrients are applied to the landscape, little is known about the complex interplay among nutrient applications, transport attenuation processes, and coastal loads. Here, we enhance and apply the Spatially Explicit Nutrient Source Estimate and Flux model (SENSEflux) to simulate the total annual nitrogen and phosphorus loads from the US Great Lakes Basin to the coastline, identify nutrient delivery hotspots, and estimate the relative contributions of different sources and pathways at a high resolution (120 m). In addition to in-stream uptake, the main novelty of this model is that SENSEflux explicitly describes nutrient attenuation through four distinct pathways that are seldom described jointly in other models: runoff from tile-drained agricultural fields, overland runoff, groundwater flow, and septic plumes within groundwater. Our analysis shows that agricultural sources are dominant for both total nitrogen (TN) (58%) and total phosphorus (TP) (46%) deliveries to the Great Lakes. In addition, this study reveals that the surface pathways (sum of overland flow and tile field drainage) dominate nutrient delivery, transporting 66% of the TN and 76% of the TP loads to the US Great Lakes coastline. Importantly, this study provides the first basin-wide estimates of both nonseptic groundwater (TN: 26%; TP: 5%) and septic-plume groundwater (TN: 4%; TP: 2%) deliveries of nutrients to the lakes. This work provides valuable information for environmental managers to target efforts to reduce nutrient loads to the Great Lakes, which could be transferred to other regions worldwide that are facing similar nutrient management challenges.

Table S2.Total annual nitrogen and phosphorus flux and yield and ranges.Table S3.Range of source contributions to total basin nutrient delivery.Table S4.Range of pathway contributions to total nitrogen and phosphorus delivery.

S1. SENSEflux model equations
The main function of SENSEflux modeling is shown in equation (1) below.Where Lk is the simulated load (kg/day) at the sampling location for an individual catchment.S represents the nutrient source, Spoint is the point source and directly applied into the rivers and streams, Ssep is the septic tank source, and Sij is the application of other sources i that can be harvested to watershed cell j.SepEff is removal efficiency on septic loads, it's fixed as 0.3 for total nitrogen and 0.35 for total phosphorus 1,2 .ExHij is an extraction factor that describes the in-place root zone removal of nutrients before transport (such as Harvest).For all other cells and sources, ExHij is equal to 1. Fj is a subsurface partition parameter, describing the fraction of nutrients that are transported via a subsurface pathway.It is a function of normalized groundwater recharge fraction (recharge fraction in a cell j divides the maximum recharge fraction across the SENSEflux model domain), where rechFj is the groundwater recharge fraction in cell j, and max(rechF) is the maximum value of groundwater recharge fraction across the SENSEflux model domain (equation ( 2)).Recharge fraction was defined as the average annual precipitation becoming recharged and is limited by 0.55 across the basin 3 .
A major update between SENSEflux and its former version 1 is an updated river retention function (equation ( 3)-( 6)), lake retention, and subsurface phosphorus storage with two new model parameters Lacusj and Fstorj, defined in Equation ( 7) and ( 8) respectively.Rj describes river reduction of the remaining nutrients after landscape attenuation, and it split as denitrification for TN (sorption for TP) as well as biological uptake & burial (equation ( 3)).For N denitrification or P sorption (equation ( 4)), it is an exponential function of DNSPj that was calculated based on the flow length tool and used streambed interaction as an input weight raster.The streambed interaction rate calculation is shown in equation ( 6), where  ̂, , and  ̂ are derived from hydraulic conductivity, slope, and basin yield respectively, R and V represent hydraulic radius and velocity, see details in Supporting Information (S3).Biological uptake & burial is an exponential function of Tj that represents in-stream travel time from cell j to the downstream observation point (equation ( 5)).
=  − *   *  −1 *   (3) = (  ) (5) We also consider nutrient attenuation via lakes or reservoirs (Lacusj) as they travel down the stream network, which is a function of travel distance in lakes (equation ( 7)), and only the loss for a lake or reservoir which has a connection with streams is considered due to the inherent river routing scheme in the SENSEflux model.Fstorj is the fraction of groundwater pathway nutrients stored/lost in the soil and the deeper unsaturated zone where fstor is a calibrated constant (equation ( 8)).fstor was assumed to be zero due to the high mobility of nitrogen.By adding this term, we can estimate the amount of phosphorus in every grid cell delivered to the streams.
All basin attenuations are exponential functions of flow length (D) from each cell to the nearest downgradient stream cell.Boj, Bgj, and Bsj are basin reduction parameters, representing overland flow, non-septic groundwater, and septic plume pathways respectively (equations ( 9), ( 11), ( 12)).Tile field pathway Btj is also considered in the model as an alternative overland pathway, representing nutrient attenuation along with tile fields if tile exists in a cell (equation (10)).
Equations describing each of these terms are given in Luscz et al (2017) 1 and Martin et al (2021) 4 .

S2. Spatial distribution of loss terms and basin storage
Three in-situ loss terms are applied before nutrients are transported: Septic removal (SepEff), Harvest (ExH), and Subsurface storage (Fstor).Septic removal efficiency is applied on the septic load and fixed as 0.3 for N and 0.35 for P 1,2 .Harvest includes all in-place root zone nutrient loss and is assumed to occur in cells with manure or chemical agricultural fertilizers applied.The Subsurface storage loss term includes both in-place storage and loss of nutrients below the root zone, which we assume to occur for phosphorus (Eq (8)).Fstor was assumed as zero for nitrogen due to the high mobility of N. By adding this term, we can estimate the number of nutrients in every grid cell delivered to the streams.

S3. Spatial distribution and derivation for instream and lake losses
The attenuation of nutrients during stream/wetland transport is assumed to be broken into two components: 1) water column and sediment interface losses, and 2) hyporheic zone losses.Water column losses consist of biological uptake, followed by subsequent denitrification and particulate transport.For phosphorus, sediment burial is another active process.Hyporheic zone losses may be biological uptake (N or P), denitrification (N), sorption (P), or mineralization (P).We label the corresponding terms "water column (WC)" and "hyporheic zone (HZ)" attenuation.
Water column attenuation is assumed to be a function of travel time in the stream/wetland, given as   =     , where   is a calibrated constant across the domain, and   is the travel time in-stream.Travel time in-stream in each model cell is given by where  is the velocity of water in that stream cell.While this is the loss in a single cell, integrating across all cells along the flow path results in the delivery of nutrients   =     , where   =  −    .Here,   is the total travel time in streams, computed in ArcGIS as described in the main text (Section 3.5).
To compute losses due to streambed and hyporheic zone interactions beneath it, we define losses in this zone to be given by the residence time multiplied by a static constant,  ℎ =  ℎ  ℎ where  ℎ is the residence time in the hyporheic zone, defined as  ℎ =   ℎ ⁄ where  is the hyporheic zone depth and  ℎ is the velocity of water flowing into/out of the HZ.
The streambed interaction factor (D) is given by ~(, , 1 ℎ ⁄ ), where K is the hydraulic conductivity of the streambed sediments, S is the slope of the stream channel, and Rech is groundwater recharge.Thus, we have assumed that higher streambed sediments increase the exchange of surface and groundwater (increasing hyporheic zone depth) and that a higher slope leads to greater streambed morphometric variability, and thus greater incidence of flow paths entering the sediments and exiting shortly thereafter, promoting hyporheic exchange.We further assume that greater groundwater recharge upstream leads to a stronger influx of water from below, which would reduce the depth of exchange through the groundwater pushing back against streamflow.Here, recharge is assumed to be represented by the basin yield of streams at their 30 th percentile, thus ~ 30 .We chose to normalize each of the terms by their maximum and minimum values across the model domain, and we log-transformed K.The final equation for hyporheic zone depth D is: Where We further assume that nutrient uptake is linearly related to the residence time in the HZ,  ℎ which can be expressed as  ℎ = / ℎ .Thus, the flux of nutrients out   =   •  ℎ •  ℎ , where  ℎ is a parameter to be calibrated in the model.Substituting the definitions of   and  ℎ , .Finally, if we substitute this and equation (13)   into our equation for   (noting that the linear constants combine  ℎ =  ℎ •   ) we get the equation ( 14).
The bold terms in Equation ( 14) are independent of nutrient concentrations but vary in space.Therefore, we combine these terms into a single model input layer we term the streambed exchange shown in Figure S6e.Other intermediate inputs to calculate  are shown in Figures S2 and S6.While the values of  and  could be calculated from input static layers (e.g., Figure S2d and Figure S6a), , , and  needed to be computed for each point along the stream channel.For this, we used the at-many stations hydraulic geometry method 5 , deriving the functional power-law relationships that relate median velocity, and hydraulic radius to discharge.More specifically, we first extracted river geometries, including discharge, velocity, width, and area, from USGS gauges across the basin from 1/1/2000 to 12/31/2019.Hydraulic radius was then calculated as area divided by width with the assumption that river channels are rectangular.Then, the log-linear fit relationship was built for stream geometries as a function of median discharge for all gauges within the period.75% of the measurements are used to fit the model and the remaining 25% are used to validate the model, r squared values are used to assess the fitting performance.Lastly, we created a stream geometry raster for the USGLB with the following steps.First, calculation of basin yield.During this step, calculated flow accumulation based on DEM (Section 3.5) is extracted for all the USGS gauges, then up to 20% of the difference between the flow accumulation and reported area is allowed, quality assurance/quality control (QA/QC) are performed to remove outliers afterward, and finally nibble tool was used to replace cells with missing values by the value of their nearest neighbor.We then calculated streamflow for every cell by multiplying the nibbled basin yield with flow accumulation.Then, in-stream velocity and hydraulic radius across the basin were calculated based on the streamflow and derived relationships at gauges between discharge and stream geometries.

S4. Groundwater recharge and tile-drained area calculation
Groundwater recharge was estimated based on a series of linear models derived from the Landscape Hydrology Model (LHM), a coupled process-based hydrological model 3 .LHM was originally developed for the Muskegon River Watershed, located in the central part of Michigan's lower peninsula, and contains diverse land use representative of the broader region.This hydrological model combined several GIS layers including land use, soils, and station observation data to predict stream discharge, groundwater recharge, and evapotranspiration from 1990 to 2004.The linear regression models fit each land use type to the percentage of precipitation that becomes recharge as a function of soil hydraulic conductivity.The annual precipitation from 2008 to 2012 was downloaded from Parameter-elevation Regressions on Independent Slopes Model (PRISM) database 6 and average annual precipitation from 2008 to 2012 was used for linear models.The soil hydraulic conductivity is derived from the soil texture of the soil survey geographic database (SSURGO) 7 and land cover data from the national land cover database 2011 8 .The recharge estimates for US-GLB are shown in Figure S2.
To derive an estimated tile drainage map, we used GIS-based mapping based on the premise that crops grown on land with low slopes and poorly drained soil likely have tile drains.First, cells classified in the 2011 NLCD 8 as "Cultivated Crops" were extracted.Then, we fit a model to land drained by tile county-level data from land-use practice in 2012 for the 109 counties in the US-GLB (Figure S6), published by the United States Department of Agriculture, National Agricultural Statistical Service 9 .55 counties were randomly selected as training datasets and the remaining 54 counties as validation datasets.This model selects areas that are cultivated land-use types, moderately low soil permeability (Ksat < 14.4 mm/hr), and low average slopes (< 1.2%) as tile drainage (r 2 = 0.83).The rest of the 54 counties were used to calibrate the model (r 2 = 0.85).
Estimates of tile-drained areas are shown in Figure S6.The aggregation method (maximum) used in pre-processing data may overestimate the area of tiled fields in the model.

S5. Model parameter extended discussion.
Four of the model parameters (f, ExH, SepEff, and fstor) are linear coefficients on loss terms in the model, while the remainder (basin and river losses) are coefficients in the exponent of an attenuation term-thus their values are not directly comparable.
The subsurface partition parameter f is multiplied by the normalized recharge (equation ( 2)) and represents the proportion of mobile surface-applied nutrients (after Harvest) that are sent through the groundwater pathway.Thus, areas with the highest recharge in the basin have 49% of mobile surface-applied nutrients sent to groundwater for N while 78% is routed through groundwater for P (Table S1).See Figure S14 for a map of the final groundwater partition fraction.It is somewhat surprising that phosphorus has a higher fraction than nitrogen, given the relative ease with which nitrate in particular leaches from soils 10 , however, this may be an artifact of the simplistic relationship between recharge rates and groundwater pathway hydrologic fractionation imposed here.It may also be due to a non-linear relationship between soil texture (underlying recharge) and P mobility.P moves much more readily through sandy soils than finer textured ones 11 .That relationship is also not captured properly here.
Our model predicts that 80% of surface-applied N is harvested, lost to the atmosphere (N only), or stored in the root zone in agricultural settings, while 95% of P is.Figures S10a and S11a show total losses due to harvest, atmosphere, and storage for N and P, respectively.The higher rate of P harvest may be influenced by the tendency of P to sorb to unsaturated zone soils, rather than a more careful accounting of N and P needs when fertilizers are applied.
Phosphorus then experiences an additional deep unsaturated zone storage, where up to 55% of groundwater-mobile P (in the lowest recharge areas, Eq (8)) is stored (Figure S11b).Areas with higher recharge then experience less storage proportionately.
Nutrient attenuation during basin transport through surface runoff, tile drain fields, general groundwater, and septic plumes are determined by the overland travel distance along with the corresponding parameters (bo, bt, bg, and bs, equations ( 9) -( 12)).The lower calibrated parameter value means a higher delivery rate through the pathway, but the number of nutrients delivered through these pathways are not directly comparable due to the different amounts traveled before basin attenuation.Overland flow and tile drains are dominant pathways (Table S1) while groundwater and septic plumes have relatively fewer deliveries.Nutrient losses during nutrient transport across the landscape are determined by travel distance and the calibrated basin reduction parameters (Table S1), and the losses are shown in Figure S10b (TN) and Figure S11c (TP).P attenuations are mainly concentrated in the southern basin with agricultural soils, while N losses have more spatial variability.
Within stream and lake attenuation, water column losses (bio) are higher than streambed and hyporheic zone losses (dnsp) for both TN and TP (Table S1).The losses for TN and TP are shown in Figure S10c and Figure S11d, respectively.Generally, Areas close to the coastline have less stream and lake attenuation for TP due to less travel time.

S6. Load comparison with the SPARROW model
First of all, there are some important differences to note: SPARROW does not use spatially explicit nutrient sources nor attenuation processes and is run at a coarser resolution.Using the same observation dataset, the SENSEflux TN model slightly underestimated high loads while the SPARROW model slightly overpredicted them (Figure S10).Both SENSEflux and SPARROW models slightly underpredicted higher TP loads and overpredicted lower loads.These differences are not surprising because the two models have several notable differences, including nutrient attenuation processes, methods to model nutrient sources, spatial resolution, and timeframes.Specifically, SENSEflux includes four distinct pathways (tile fields, overland, septic plumes, and groundwater, see Figure 1) while the SPARROW model uses data on land-to-water delivery factors, such as soil permeability, drainage density, precipitation, air temperature, the fraction of the stream catchment with tile drains to describe attenuation processes broadly across the basin 12 .Moreover, SENSEflux uses spatially explicit nutrient source inputs from SENSEmap while SPARROW uses land use/cover and county-level estimates of nutrient masses to statistically compute sources more generally 13 .Finally, SPARROW models were developed for TN and TP with a 2002 base year 12 , while nutrient sources and watershed factor data used in SENSEflux are based on ca.2010 data.

( 1 )
6 Texts (pages S3 -S9) Text S1.SENSEflux model equations.Text S2.Spatial distribution of loss terms and basin storage.Text S3.Spatial distribution and derivation for in-stream and lake losses.Text S4.Groundwater recharge and tile-drained area calculation.Text S5.Extended model parameter discussion.Text S6.Load comparison with the SPARROW model.

Figure S2 .
Figure S2.Study region and data source.

Figure S5 .
Figure S5.Spatial domain showing nitrogen and phosphorus loads used for model calibration and validation.

Figure S6 .
Figure S6.Inputs are used to derive the river retention factor in SENSEflux.

Figure S8 .
Figure S8.Log model residuals (kg/day) by watersheds for both calibration and validation datasets.

Figure S10 .
Figure S10.Comparison to the simulated loads in the SPARROW models.

Figure S11 .
FigureS11.TN model loss and attenuation outputs: harvest, basin loss, river, and lake attenuation.

Figure S13 .
Figure S13.Estimated percentages of total nutrients delivered to lakes by sources.

Figure S14 .
Figure S14.The estimated TP yield is delivered to lakes by four key pathways.

Figure S15 .
Figure S15.Spatial distribution of SENSEflux surface and subsurface partition parameter.

( 4 )
Reference (pages S25 -S26) each of the terms with a tilde (~) represents 0-1 normalized values, corresponding to the three-square bracket sections of the left-hand side of the equation.The unit of D is [L] (here, m).Note for  ̃ we limited the maximum value to 1.If we assume that the total flux of nutrients into the HZ (  ) is given by   =  ℎ   , where  is the flux of water in/out of the HZ [L 3 /T] and   is the input concentration to the HZ [M/L 3 ], we can represent  ℎ =  • 1 •  ℎ , where  is the perimeter of the stream channel [L], 1 is the unit length of the channel [L], and  ℎ is the velocity of water exchange in the HZ [L/T].Therefore,   =  •  ℎ •   = • ℎ •   , where   is the input nutrient flux [M/T] in the stream channel above the HZ, and  is the streamflow in the channel [L 3 /T].
we get   = • ℎ •   •   ℎ •  ℎ .Canceling terms, we get   =   •   •  •  ℎ .We can further use the relationship between  (the wetted perimeter of the stream channel) and  =  •  [L 3 /T], determined by the hydraulic radius  = / [L], where  is the channel area, and  is the instream average water velocity.Thus

Figure S1 .
Figure S1.Model key inputs.(a) annual groundwater recharge for US-GLB; (b) overland flowlength; (c) harvested areas where either manure or chemical agricultural fertilizers applied for TN and TP; (d) estimated tile drainage (indicated in yellow) estimated from land use, soil permeability, and average slope

Figure S3 :
Figure S3: SENSEmap nitrogen source for US-GLB resampled to 720 m resolution for display, derived from Hamlin et al (2020).The units are kg/ha/yr except for point sources (kg/yr).The color breaks are based on quantile classification and round-off methods.

Figure S4 :
Figure S4: SENSEmap phosphorus source for US-GLB resampled to 720 m resolution for the display.The units are kg/ha/yr except for point sources (kg/yr).The color breaks are based on quantile classification and round-off methods.

Figure S5 .
Figure S5.Spatial domain showing nitrogen and phosphorus sampling locations that are used for delineating watersheds and loads used for model calibration and validation.(a) TN sampling locations (N = 116), (b) TP sampling locations (N = 119), (c) TN watersheds with loading, and (d) TP watersheds with loading.Maps are classified in quantiles, with each color representing 25% of the study domain.

Figure S6 .
Figure S6.Inputs are used to derive the river retention factor in SENSEflux.(a) average slope, (b) basin yield during baseflow; (c) hydraulic radius; (d) average velocity; (e) streambed exchange rate; (f) N denitrification or P sorption factor.

Figure S7 .
Figure S7.Model residual (log10 model -log10 observed) distribution and density are shown as violin plots.The bottom and top of each box represent the first and third quartiles, respectively, and the line inside each box represents the median.Zero residual is indicated as a black dashed line.The top and bottom bars (whiskers) represent the maximum and minimum residuals, respectively.Data beyond the end of the whiskers are "outlying" points and are plotted individually.None of the means were significantly different from zero, as measured by the onesample t-test, with P values > 0.05.

Figure S8 .
Figure S8.Log model residuals (kg/day) by watersheds for both calibration and validation datasets.The color breaks are based on quantile classification and are rounded to the nearest 0.1.

Figure S9 .
Figure S9.The boxplot represents the median, 25 th and 75 th percentile and max value within 1.5 times the interquartile range above the 75 th percentile, and minimum value within 1.5 times the interquartile range below the 25 th percentile.These values are based on 10% of global optimization runs with lower objective function (MAEL) values (100 for TN, 78 for TP).

Figure S10 .
Figure S10.Comparison to the simulated loads (log10 of kg/day) in the SPARROW models.Blue dots and lines are for SENSEflux simulation, and red is for SPARROW.The dashed black line is a 1:1 line, where simulated loads are equal to observed loads.

Figure S11 .
Figure S11.TN model loss and attenuation outputs.(a) crop extraction of nitrogen; (b) total nitrogen loss during basin transport; (c) nitrogen uptake in streams and connected lakes.Maps are resampled from 120m SENSEflux outputs to 720m resolution here for display purposes and classified in quantiles with each color representing 20% of the dataset.

Figure S12 .
Figure S12.TP model loss and attenuation outputs.(a) crop extraction of phosphorus; (b) in-place phosphorus storage and loss of phosphorus below the root zone; (c) TP loss during basin transport; (d) phosphorus uptake in streams and connected lakes.Maps are resampled from 120m SENSEflux outputs to 720m resolution here for display purposes and classified in quantiles with each color representing 20% of the dataset.

Figure S13 .
Figure S13.Estimated percentages of total nutrients (TN(a) & TP(b)) delivered to lakes by sources.Bars are the source percentage from the best-performing parameter set within the local optimization and error bars represent standard deviation from the unique best-performing global optimization runs.

Figure S14 .
Figure S14.The estimated total yield of TP delivered to lakes by four key pathways (kg/km 2 /yr).(a) -(d) for TP overland, tile fields, groundwater, and septic plume respectively.Maps are resampled from 120 m SENSEflux outputs to 720 m resolution here for display purposes and classified in quantiles, with each color representing 20% of the dataset; the white area in a&b within the basin boundary represents areas with no data as we assumed overland and tile fields are alternative pathways.

Figure S15 .
Figure S15.Spatial distribution of SENSEflux surface and subsurface partition parameter (f) for TN (a) and TP (b).

Table S1 .
Summary of optimized model parameters.The best-performing parameter set from local optimization is reported.The range (minimum, maximum) of the top 10% of global optimization which has lower objective function values is shown in parentheses.Note that capitalized parameter values appear directly in Equations 2 (f), 8 (fstor), 9 (bo), 10 (bt), 11 (bg), 12 (bs), 4 (dnsp), 5 (bio), 7 (lacus) while lower-case parameter values are input to related equations as defined inLuscz et  al. (2017).One exception is a newly added parameter fstor, which is defined in S1.

Table S2 .
Total annual nitrogen and phosphorus flux and yield from the best-performing parameter set with local optimization are reported.The range of flux (kiloton per year; kt/yr) and yield (a tenth of kilogram per square kilometer per year; kg/km 2 /yr/10) are signified by the minimum and maximum parameter value combinations within the 10% of global optimizations.

Table S3 .
Range of source contributions to total basin nutrient delivery at US Great Lake Basin.

Table S4 .
Range of pathway contributions to total nitrogen and phosphorus delivery at US Great Lake Basin.