Chesapeake legacies: the importance of legacy nitrogen to improving Chesapeake Bay water quality

In the Chesapeake Bay, excess nitrogen (N) from both landscape and atmospheric sources has for decades fueled algal growth, disrupted aquatic ecosystems, and negatively impacted coastal economies. Since the 1980s, Chesapeake Bay Program partners have worked to implement a wide range of measures across the region—from the upgrading of wastewater treatment plants to implementation of farm-level best management practices—to reduce N fluxes to the Bay. Despite widespread implementation of such measures and notable reductions in N inputs, water quality across the region has been slow to improve. Such lack of response has in some cases been attributed to N legacies—accumulations of surplus N in soils and groundwater—that can contribute to time lags between implementation of conservation measures and improvements in water quality. Here, we use the ELEMeNT-N modeling framework to explore the role of legacy N in slowing reductions in N loading to the Bay, and to provide estimates of the time required to meet water quality goals in nine major tributary watersheds. Our results first show that recent improvements in water quality can be attributed to decreases in N surplus magnitudes that began to occur in the 1970s and 1980s, and that such improvements will continue in the coming decades. Future simulations suggest that, even with no additional changes in current management practices, goals to reduce N loads across the region by 25% can nearly be met within the next two decades. The present results also suggest that time lags to achieving water quality may vary considerably in the individual study watersheds, with the longest lag times being found in the highly agricultural Choptank watershed, where N surplus magnitudes remain high and where legacy N remains a major control on water quality.


Introduction
Over the last century, coastal regions around the world have experienced an increased incidence of algal blooms and the development of large hypoxic zones (Diaz et al 2017). Hypoxia, which is commonly defined as dissolved oxygen levels less than 2 mg l −1 , degrades aquatic ecosystems by damaging habitats, disrupting the food web, altering nutrient cycling, and increasing the acidity of the water column (Du et al 2018). The widespread development of coastal eutrophication is primarily driven by increased fluxes of nitrogen (N) and phosphorus (P) from the land to the coasts (Paerl et al 2014), and it is broadly considered to be one of the greatest current threats to coastal ecosystems (Boesch 2019, Malone andNewton 2020).
Chesapeake Bay, North America's largest estuary, has not been exempt from such problems. Evidence of persistent summer hypoxia was first noted in the Chesapeake Bay in the 1930s (Newcombe and Horne 1938), and there have been significant increases in both the severity and extent of this seasonal hypoxia in subsequent decades (Kemp et al 2005, Murphy et al 2011. These increases have occurred concurrently with increases in both human population and the use of commercial N fertilizers across the region (Liu andScavia 2010, Du et al 2018). Many areas of the Chesapeake Bay region are densely populated, and others are home to high-input agriculture and intensive livestock production, leading to a long-term elevation of N and P loads to the Bay (Hagy et al 2004, Ator et al 2019. Currently, the majority of tidal waters across the Chesapeake Bay have been classified as 'impaired' by the U.S. Environmental Protection Agency (ChesapeakeProgress 2020), despite decades of attempts to improve nutrient management practices. Much of this lack of attainment can be attributed to a lack of reductions in watershed nutrient loading. It is increasingly understood that this delay in achieving water quality improvement can be attributed in part to the presence of legacy nutrients within watersheds, with legacy being defined as nutrients that have accumulated within the landscape over decades of elevated anthropogenic N inputs (Van Meter and Basu 2015. This legacy can exist in both organic and inorganic forms within soils, sediments, and groundwater. Across the Chesapeake Bay watershed, studies have demonstrated both an accumulation of legacy N in shallow groundwater systems and multi-decadal groundwater residence times, leading to conjecture that it could take several decades to see the full effects of newly implemented conservation measures (Bachman et al 1998, Focazio 1998, Sanford and Pope 2013 In 2010, the U.S. Environmental Protection Agency established a new plan to restore clean water across the Chesapeake Bay Region (USEPA 2010). Referred to as the Chesapeake Bay total maximum daily load (TMDL), the plan was established due to perceived insufficient progress toward improving water quality in the Bay and its tributaries. In short, a goal was set to limit N delivery from the Chesapeake Bay watershed to approximately 84 ktons of N per year, with additional limits being set for P and sediments (USEPA 2010). This goal of 84 ktons N corresponds to a 25% reduction in N loads compared with 2009 baseline values, with all measures to meet these goals to be put into place by the year 2025. These measures include not only commitments to reducing nonpoint source N runoff from agricultural areas, but also interventions such as wastewater treatment plant (WWTP) upgrades and stricter controls on atmospheric N emissions (USEPA 2010).
Development of the TMDL was based on both extensive monitoring and the results from multiple models, calibrated to decades of river discharge and water quality data. The Chesapeake Bay Program partnerships' Watershed Model, which simulates the transport of N, P, and sediments to the Bay, was used to estimate the extent to which changes in management and land use could reduce watershed nutrient loading (USEPA 2010). The Watershed Model is based on the Hydrologic Simulation Program-Fortran (HSPF) model, which is a physically based, lumped-parameter model that can simulate both discharge and water quality .
Like most water quality models, however, HSPF does not consider the potentially long residence times for N and P within the landscape, and it assumes fixed, empirical relationships between nutrient inputs and stream nutrient loading across various land-use types Linker 2013, Van Meter et al 2018). Accordingly, while these types of model simulations may be able to accurately predict reductions in nutrient load in response to changes in management and land use, they do not provide estimates of the time required to achieve such reductions. In particular, they do not take into account the role of nutrient legacies in increasing the time needed to meet water quality goals (Chen et al 2018, Ilampooranan et al 2019. In the present study, we have used the ELEMeNT-N (Exploration of Long-tErM Nutrient Trajectories-Nitrogen) model  to quantify the effects of legacy N on the time required to meet policy goals for reducing N loading to the Chesapeake Bay. ELEMeNT-N was specifically developed to simulate accumulation and depletion of legacy N stores in anthropogenic landscapes . In this work, we focus on nine tributary watersheds, which together account for more than 90% of riverine discharge in the Chesapeake Bay's nontidal watershed (Qian Zhang et al 2015). In our simulation of both historical and future N loading, we attempt to answer the following questions: (1) What is the relationship between N inputs and N load across the Chesapeake region, and how does this relationship vary, both spatially and temporally across the study watersheds? (2) How do changing relationships between surplus N and N load relate to accumulation and depletion trajectories for legacy N? (3) To what extent do N legacies contribute to time lags in achieving water quality goals? (4) How long will it take to achieve desired reductions in N loading?

The ELEMeNT-N model
The ELEMeNT-N model is a process-based model that pairs source-zone dynamics, including accumulation and depletion of soil organic N (SON) within the root zone, with a travel time-based approach that allows for transport and transformations of N along subsurface pathways . It explicitly takes into account the accumulation of legacy N, in both organic and inorganic forms, within soils and groundwater, and the contribution of these legacies to watershed N loading trajectories.
ELEMeNT-N conceptualizes the modeled watershed as a set of points, each corresponding to an individual stream tube with its own unique travel time to the river network (Jury and Utermann 1992. Accordingly, the watershed can be treated as a bundle of stream tubes, with a unique distribution of travel times, f (τ ). This distribution then controls the subsurface flux of nutrients to the outlet, M out (t) (ML −2 T −1 ) (Małoszewski and Zuber 1982, Haitjema 1995, McGuire et al 2005, Van Meter and Basu 2015. where J s,wshd (t − τ ) (ML −2 T −1 ) is the source function describing the mass flux of nitrate from the unsaturated zone to groundwater, γ (T −1 ) is the firstorder rate constant describing denitrification along hydrologic pathways, SURF basin,t (t) (ML −2 T −1 ) is the surface N yield (runoff, as nitrate-N), and DMSTC sw (t) (ML −2 T −1 ) is wastewater N (Van Meter et al 2017) (figure S1, supplementary information (SI)) (available online at stacks.iop.org/ERL/16/ 085002/mmedia). For detailed information regarding the derivation of these terms, see the SI (text S1). The guiding principle behind source-zone dynamics within the ELEMeNT model is that nutrient fluxes at any point in time are a function of not only current inputs to the landscape, but also to past land use and management . Accordingly, the watershed is evenly divided into s distinct units representing distinct land use trajectories, and the temporal evolution of each unit is stored within a 2D land-use array, LU (s, t): where A crop (t)and A past (t)correspond to the watershed percent cropland and pastureland, respectively. N inputs to cropland, pastureland, and nonagricultural land are calculated using a surface N balance approach (Bouwman et al 2005, Byrnes et al 2020a across the different landscape types, with surplus N being defined as landscape N inputs minus N uptake (Leip et al 2011): where NS (crop, t) (ML −2 T −1 ), NS (past, t) (ML −2 T −1 ), and NS (other, t)(ML −2 T −1 ) are N surplus inputs to cropland, pastureland, and nonagricultural areas; FERT crop (t) (ML −2 T −1 ) and FERT past (t) (ML −2 T −1 ) refer to inorganic N fertilizer; MAN crop (t) (ML −2 T −1 ) represents livestock manure N applied to cropland; MAN past (t) (ML −2 T −1 ) represents both manure N applied to pastureland as well as livestock N excretion during grazing; DEP (t) (ML −2 T −1 ) refers to atmospheric N deposition; BNF crop (t) (ML −2 T −1 ) and BNF past (t) (ML −2 T −1 ) are biological N fixation; UPTAKE crop (t) (ML −2 T −1 ) represents N removal by crop uptake; UPTAKE past (t) (ML −2 T −1 ) represents N removal by livestock during grazing; and UPTAKE othr (t) (ML −2 T −1 ) refers to forest N removal. Note that any N remaining on the landscape as crop residues is taken into account in the N surplus calculations, as only N in harvested product is considered as part of the crop N uptake term. See the supporting information regarding calculation of individual components of the N mass balance (SI text S1.1.1).

Study area: the nontidal Chesapeake Bay watershed
The 165 000 km 2 Chesapeake Bay watershed extends from New York to Virginia, including parts of New York, Pennsylvania, West Virginia, Delaware, Maryland, and Virginia, and all of the District of Columbia. While the Bay's estuarine system includes more than 50 tributaries (Pritchard and Schubel 2001), we focus here on nine tributaries that have been monitored since the mid-1980s at the watershed fall-lines-approximately 25-100 km from the Bay-by the USGS River Input Monitoring (RIM) program. These nine tributaries, the Appomattox, Choptank, James, Pamunkey, Patuxent, Potomac, Rappahannock, and Susquehanna Rivers, are collectively referred to herein as the Non-Tidal Chesapeake Bay Watershed (NTCBW) (figure 1).
Land-use trajectories for the nine study watersheds show remarkable similarities in trend (figure 2). Agricultural land-use expanded from 1850 to 1900 at the expense of forests and woodlands. However, since the early 1900s, much of the region has experienced a widespread reforestation of land previously used for agriculture along with increasing urbanization (Kemp et al 2005, Thompson et al 2013. As shown in the figure, the largest decreases in agricultural land use since 1900 have occurred in the Potomac watershed, where the proportion of agricultural land has decreased from 62.5% in 1900 to 24.7% in 2018.

N mass balance data
Components of the N mass balance were calculated using the following datasets: (1) For the period 1985-2018, data was obtained at the individual watershed scale from the 2019 Chesapeake Bay Program Partnership Phase 6 Chesapeake Assessment and Scenario Tool (CAST) (Devereux and Rigelman 2014, Chesapeake Bay Program 2019); for the period, 1930-1984, county-scale data was obtained from the TREND-Nitrogen dataset (Byrnes et al 2020a(Byrnes et al , 2020b and then aggregated to the individual watershed scale; prior to 1930, components of the N mass balance data were calculated from state-scale U.S. Census of Agriculture data, grid-scale atmospheric N deposition data (Hegglin et al 2016), and grid-scale N fertilizer data (Cao et al 2017). For additional details regarding calculation of individual components of the N mass balance, see the supplemental information (SI text S1.1.1).

Land use and population data
Watershed land-use trajectories (1700-2018) were constructed using watershed-scale data obtained from CAST (Chesapeake Bay Program 2019) , supplemented by historical modeled cropland data from Ramankutty and Foley (1999). Historical population estimates were made based on U.S. county-level population data from the U.S. Census Bureau (Forstall 1996) and from the National Bureau of Economic Research (https://data.nber.org/ data/census-intercensal-county-population.html).

River water quality and streamflow data
Water quality (NO x ) and daily streamflow data for the nine watersheds are available from the USGS National Water Information System web interface (USGS 2020). Daily concentrations and loads were calculated from the source data using the weighted regressions on time, discharge, and season method

Nitrogen use efficiency (NUE)
Time-varying agricultural N use efficiency (NUE) values were calculated for each of the study watersheds. Here, it is important to clarify that NUE can be calculated in different ways in different contexts (Cassman et al 2002). In agronomy, for example, the NUE has been defined as the crop recovery efficiency of applied N, calculated at the field scale as the kg increase in N uptake per kg N applied. In the present work, however, we use an approach typically used at larger scales, defining the agricultural NUE as the ratio between N yield and N inputs on agricultural land (Van Meter Submitted, Lassaletta et al 2014, Howarth 2019, Roy et al 2021). For additional details on these calculations and results for the study watersheds, see the supplementary information (SI text S2.0, figure S2).

Model calibration and validation
The ELEMeNT model was calibrated and validated based on (1) annual catchment N loads and (2) the cropland SON content. For catchment N loads, the Nash-Sutcliffe efficiency (NSE) was used as the objective function to assess goodness of fit to the observed data (Nash and Sutcliffe 1970). For SON, we first used gridded SSURGO soil data to obtain estimates of the soil organic C content for each of the nine study watersheds (NRCS 2020). As C:N ratios have been found to be remarkably constrained across a range of soil types, climate, and topographical position (Cleveland and Liptzin 2007), we then calculated the SON content from the soil organic C magnitudes based on cropland C:N ratios (Schimel et al 1985, Zhang et al 2014, Shi et al 2017, Xu et al 2019, NRCS 2021. The percent bias (PBIAS) was used as the objective function to evaluate model performance for the SON content (Moriasi et al 2007). For additional details regarding model calibration and validation, see the supplemental information (SI text S3.0; tables S3-S5).

Determining dominant controls on riverine N loads
Relationships between riverine N loads, watershed N inputs, and the overall N surplus were first determined using simple linear regression analysis. The time series was divided into three periods to capture the effects of different periods of land use and management on the surplus-load relationship: (1) 1930-1945, a period of relatively low N inputs, prior to the large, post-1945 increases in commercial N fertilizer use; (2) 1970-1980, a period of large increases in riverine N loading and high rates of fertilizer N application; and (3) 2000-2018 a period of declining N surplus and of a stabilization or reductions in N loads. For each period, we quantified the correlations between riverine N loads and individual components of the N mass balance, as well as between load and the overall N surplus for all of the study watersheds.

Simulated scenarios
Two types of scenarios were developed and run: (1) backcast scenarios to quantify the past effects of recent reductions in atmospheric N deposition and WWTP N removal; and (2) future scenarios to assess the time required to achieve N reduction goals. We emphasize here that these simulations are only intended for the NTCBW, or, more precisely, the nine RIM watersheds, while the Chesapeake Bay TMDL is designed for the entire Bay watershed (figure 1).

Backcast scenarios
Simulations were carried out to quantify the effects of recent reductions in atmospheric N deposition and in N removal at WWTPs on current N loads. These simulations were run based on the following two scenarios: (1) DEP87, in which there are no reductions in atmospheric N deposition subsequent to 1987; and (2) WWTP87, in which there are no reductions in N from WWTPs subsequent to 1987. For additional details, see the supplemental information (SI text S4.0, tables S7 and S8).

Future scenarios
Future simulations were carried out to assess the time required to meet goals for N reduction in the nine tributary watersheds and for the NTCBW as a whole. The simulations were run based on the following six scenarios: S1, Business-As-Usual (BAU); S2, a 16% to 47% reduction in atmospheric N deposition, varying across watersheds (see SI table S9 for details); S3, a 55%-83% reduction in wastewater N, varying across watersheds; and S4-S6, increases in the agricultural NUE to 75%, 85%, and 95%, respectively.
The future simulations were run assuming an intervention year of 2025. Historical N surplus magnitudes were used through the year 2018; between 2018 and 2024, N surplus magnitudes were considered to remain constant. To eliminate the influence of annual discharge variability on the modeled results, 500 simulations of each scenario were carried out for each watershed, with discharge being allowed to vary randomly across the historical distribution of annual discharge values for each watershed. Reported annual loads are based on annual mean values from the 500 simulations. For additional details, see the supplementation information (SI text 5.0).

Results and discussion
The primary objective of the present study was to better quantify the role of legacy N in controlling both current and future water quality in the Chesapeake Bay region. To achieve this objective, we used the ELEMeNT-N model to simulate N dynamics across the nine Chesapeake Bay study watersheds, with a specific focus of quantifying legacy N accumulation within the watersheds and understanding the changing contributions of legacy N to watershed N loads. Using this framework, we were able to explore future management scenarios across the study Chesapeake Bay region, and to estimate the time required to achieve desired improvements in water quality.

Model performance
Model performance was evaluated based on its simulation of annual catchment N loads as well as estimates of the current SON content in the study watersheds. For catchment N loads, the model performed well in both the calibration and validation periods across the top-performing simulations (figure 3).
Mean NSE values for the study watersheds during the calibration period ranged from 0.66 for the Susquehanna to 0.85 for the Potomac, with an average value of 0.76. During the validation period, there was a marginal reduction in performance, with the mean NSE across the study watersheds being reduced to 0.65. During the validation period, NSE for the Susquehanna increased to 0.71. Across all watersheds, the relationship between modeled and measured N loads gave an R-squared value of 0.95 (figure S3).
Due to the importance of effectively capturing soil N accumulation and depletion when modeling legacy N dynamics, we used the SON content as a calibration tool. Here, we evaluated the PBIAS between the modeled current SON content and estimates of  . Load values are normalized to the catchment area for the individual watersheds (kg ha −1 y −1 ). The black line represents median modeled values, and the bounds of the grey areas indicate the upper and lower limits of load estimates obtained from the Monte Carlo simulations. The red diamonds represent measured annual N loads. SON content in the study watersheds, based on the SSURGO soil survey. Based on our multi-objective calibration, the modeled results show our estimates of current SON levels to be closely in line with National Cooperative Soil Survey soil sampling-on the order of 2065 kg ha −1 basinwide and 4,600 kg ha −1 in cropland. The model was able to capture current SON magnitudes, with mean PBIAS values of only 2.2%-5.0%. Complete calibration and validation results are provided in the supplemental information (SI, tables S3 and S4).

Nitrogen surplus trajectories (1930-2017)
Simulation of N fluxes by the ELEMeNT-N model is dependent on the development of N surplus trajectories across the study period, with the N surplus being defined here as the difference between N inputs to the watershed (atmospheric N deposition, fertilizer N, manure N, biological N fixation, N in food for humans) and N outputs (crop, pasture, and forest N uptake). Surplus N can accumulate in soils and groundwater, runoff to surface waters, or be lost to denitrification.
Trajectories for the watershed N surplus across the nine study watersheds show similar temporal patterns, but also some notable variability (figure 4). Between 1930 and peak periods later in the 20th century, the watershed N surplus magnitudes increased approximately three-to nearly eightfold, depending on the individual watershed. The Susquehanna watershed reached its peak N surplus magnitude, 36.4 kg ha −1 y −1 , the earliest, in 1978, while most watersheds did not peak until the mid-1990s (Choptank, Mattaponi, Pamunkey, Patuxent, Potomac, Rappahannock). The N surplus increases in these watersheds can primarily be explained by two factors: (1) the steady increase in atmospheric N deposition across the Chesapeake Bay region (1930, 4.9 ± 1.1 kg ha −1 y −1 ; 1995, 13.3 ± 1.1 kg ha −1 y −1 ); and (2) the rapid increase in application of commercial N fertilizers (1930, 0.7 ± 0.2 kg ha −1 y −1 ; 1995, 9.5 ± 6.5 kg ha −1 y −1 ). Across the study watersheds, atmospheric N deposition in 1995 accounted for 34 ± 1% of total N inputs to the watershed, while fertilizer and livestock manure each accounted for another 19 ± 5% and 20 ± 9% of inputs, respectively. The contributions of N in human waste varied considerably across the watersheds, from a low of approximately 3% in the Choptank and Rappahannock watersheds to a maximum of 34% in the Patuxent watershed.
Most of the study watersheds have shown significant declines in N surplus magnitudes since reaching their peaks. The largest relative decreases in N surplus have occurred in the James (38%), the Susquehanna (37%), and the Rappahannock (35%) watersheds, and the fastest change has occurred in the Potomac, with a decrease in N surplus of approximately 10 kg ha −1 y −1 since 1996. The decreases during this period are driven primarily by reductions in atmospheric N deposition, which decreased by approximately 38% between 1995 and 2018. These decreases in N deposition can be attributed primarily to NO x emission reduction policies in the Clean Air Act Amendments implemented in 1990 (Krupnick et al 1998, Eshleman et al 2013. Variations in the timing of deposition decline have varied across the study areas, however, as decreases in NO x emissions from fossil-fuel burning have been offset in some regions by increasing automobile emissions . Noteworthy among the nine study watersheds are the post-peak N surplus trends for the Choptank and Patuxent watersheds. While the other seven watersheds have experienced 27%-38% decreases in the N surplus, decreases in the Choptank and Patuxent have been relatively small (Choptank, 9%; Patuxent, 5%). In addition, the actual surplus magnitudes for these watersheds are more than double those seen in the other watersheds (Choptank 34.3 kg ha −1 y −1 ; Pamunkey, 44.0 kg ha −1 y −1 ). The higher surplus magnitudes in these two watersheds, however, have very different drivers. In the Choptank, a highly agricultural watershed on the Delmarva Peninsula, N inputs to the watershed are currently dominated by manure (22%), fertilizer (30%), and biological N fixation (31%), while N in human waste accounts for only 4% of total inputs. In contrast, in the highly populated Patuxent, human waste N currently accounts for 47% of total N inputs to the watershed.

Dominant controls on riverine N loads (1930-2018)
Our modeled riverine N loads were used to explore dominant controls on N loading over time. For this exploration, we first divided the time series into three key periods: (1) 1930-1945, a period of relatively low N inputs, prior to the large, post-1945 increases in commercial N fertilizer use; (2) 1970-1980, a period of large increases in riverine N loading and high rates of fertilizer N application; and (3) 2000-2018 a period of declining N surplus and of a stabilization or reductions in N loads. For each period, simple linear regression analysis was used to test the significance of correlations between components of the N mass balance and riverine N loads, as described in section 4.3.1 (figure 5). We also examined the changing relationship between N surplus and N load over time, and explored the role of legacy N in driving these changing relationships (sections 4.3.2, 4.3.3).
After 1945, despite decreases in agricultural land use across much of the region, catchment N loads began to increase along with inputs of commercial N fertilizer. In most watersheds, the greatest increases in loading were seen between 1970 and 1990. During this period, mean annual N loading ranged from less than 0.4 kg ha −1 y −1 in the Mattaponi watershed to a high of more than 6 kg ha −1 y −1 in the Susquehanna and almost 8 kg ha −1 y −1 in the Patuxent. Of the various N input sources, N loads during this period were found to have the strongest correlation with atmospheric N deposition rates (R 2 = 0.96, p < 0.001) ( figure 5(b)).
Since 2000, the highest area-normalized N loading across the nine study watersheds is seen in the Susquehanna and Choptank watersheds, with loading magnitudes of 6.3 ± 0.7 kg ha −1 y −1 and 5.7 ± 0.6 kg ha −1 y −1 , respectively. N loading since 2000 is still significantly correlated with atmospheric N deposition across the study watersheds ( figure 5(c)), although the relationship is less significant than it was prior to 1945 (R 2 = 0.54, p = 0.02). During the current period, there is also a significant relationship between catchment N loads and fertilizer N use (R 2 = 0.45, p = 0.05), clearly demonstrating the increased importance of fertilizer N to nonpoint source N runoff. Notably, human population densities do not show a significant relationship with catchment N loads during the current period (p = 0.50), probably due to the importance of N removal by WWTPs as well as the variability in treatment levels across the study watersheds, which may be independent of population density (figure 5(f)) (USEPA 2010).

Relationships between N load and N surplus
Watershed N loads are frequently characterized as varying linearly with the N surplus, or with net watershed N inputs . To test the strength of this relationship over time, we also explored correlations between analysis N surplus and N load in the study watersheds, for the three time periods : 1930-1945, 1970-1980, and 2000-2018. Interestingly, while the surplus-load relationship was found to be significant in all three periods (p ⩽ 0.05), the strength of the relationship has weakened over time. The relationship was found to be the strongest from 1970 to 1980 (R 2 = 0.98, p < 0.001), when both atmospheric N deposition rates and N fertilizer use were on a steep upward trend, in comparison with the relatively weak relationship since 2000 (R 2 = 0.45, p = 0.05) (figures 5(j)-(l)).
To better our understanding of the changing surplus-load relationships across the Chesapeake watershed, we next explored the trajectories of these relationships over time. In figure 6, we show the time-varying relationship between the total N surplus across the Chesapeake watershed and total catchment N load . As shown in the plot, the relationship between N surplus and N load since the late 1970s is nonlinear, demonstrating clear hysteresis. As an example, the figure shows that the N surplus in 2018 was essentially the same as it was in 1970. N loads, however, were nearly 70% higher in 2018 than they were in 1970. This nonlinearity indicates that riverine N loads in the Chesapeake watershed are a function not just of current N inputs, but also of legacy N-in other words, they are dependent on the history of N inputs to the watershed. The counter-clockwise Figure 5. Mean relationships between N loads and components of the N mass balance across the nine Chesapeake Bay study watersheds. The panels show relationships between riverine N load and atmospheric N deposition (a)-(c), population density (d)-(f), fertilizer N application (g)-(i), and the overall watershed N surplus (j)-(l), across the nine watersheds for three snapshot periods: (a) 1930-1945; (b) 1970-1980, and (c) 2000-2018. The relationship between N surplus and load has weakened since the mid-1980s, quite likely due to the strong influence of legacy N on current riverine N loads.
hysteresis 'loop' formed by the surplus-load trajectories is driven by the presence of legacy N within the landscape, legacy that continues to contribute to riverine N loading even when current-year N inputs are decreased (Van Meter and Basu 2015, Zhang et al 2016, Vero et al 2017, Van Meter et al 2018.
The influence of N legacies on water quality trajectories does, however, seem to vary by watershed ( figure 7). Similar to the Chesapeake watershed as a whole, the Potomac, Choptank, Rappahannock, and Susquehanna watersheds (figures 7(d), (e), (g) and (h)), show evidence of strong legacy effects, with wide counter-clockwise hysteresis loops. In the Appomattox, Pamunkey, and James watersheds, however, the relationships between surplus and load are more linear across large (30%-40%) increases and decreases in N surplus (figures 7(a), (c) and (f)). Notably, loads in these watersheds are relatively low, less than 1 kg ha −1 y −1 , compared with the higher loads (>3 kg ha −1 y −1 ) of the more legacy-affected watersheds. In the Patuxent (figure 7(i)), we actually see a clockwise loop, indicating that while N inputs to the watershed have stayed nearly constant over the last 30 years, loads have decreased by more than 40%. In this case, the abrupt decrease in N loads was achieved primarily through implementation of point-source controls at WWTPs in the upper basin, where there are eight major sewage treatment plants (Boynton et al 2008). In other words, implementation of pointsource controls has accelerated improvement of water quality, despite increases in human population, thus leading to the observed nonlinearity between N surplus and load. Accordingly, the Patuxent has shown the most dramatic decline in long-term N load among the nine RIM tributaries (Hirsch et al 2010, Qian Zhang et al 2015.

Nitrogen age at the catchment outlet
The importance of legacy N can also be seen in a comparison of N age at the catchment outlet across the study watersheds (figure 8). The N age is a function of residence times of N in both soil and groundwater, and greater age can be expected to correlate with longer subsurface travel times and a stronger influence of legacy on catchment N loads (Puckett et    . The black dashed lines are linear fit lines between surplus and load . Deviations from the line represent nonlinearities in the relationship between surplus N and catchment N loads. Red arrows indicate the direction of the hysteresis relationship between surplus and load. Large counter-clockwise loops are indicative of larger legacy effects. et al 2020). Previous work in the Chesapeake Bay watersheds has shown a wide distribution of groundwater age in monitoring sites across the region, ranging from 5 years to greater than 50 years, suggesting that groundwater residence times may vary considerably (Focazio 1998). Our results show that in the watersheds with clear, legacy-driven behavior (Potomac, Choptank, Rappahannock, Susquehanna; figure 7), less than 30% of the N reaching the catchment outlet is current-year N, while approximately 50% or more of the N is greater than 5 years of age. With larger contributions of legacy N to currentyear loading, longer time lags to changes in water quality can be expected. For the Patuxent, however, we see a very different age distribution. In this more population-dense watershed, the N age in the Patuxent river is younger than in the legacy-driven watersheds, with more than 35% of the N originating from current-year sources. The young N age for the Patuxent is consistent with the strong point sourcedriven behavior indicated by the hysteresis plot in figure 8(i).

Backcast scenarios
To better our understanding of the relationship between atmospheric N deposition and N from human waste on recent trends in riverine N loads, we also carried out model simulations for scenarios that assume no reductions in atmospheric N deposition (DEP87) or in wastewater N (WWTP87) since 1987, the year of the second Chesapeake Bay Agreement, which set the first numeric goals for reducing nutrient pollution. With these simulations, our predictions of N load allow us to estimate the extent to which regulation of atmospheric and WWTP N emissions have improved water quality.
The simulation results show that upgrades to WWTP plants have had the largest effects on N loading in the Patuxent watershed. Our results show that in the Patuxent, 2018 loads would have been approximately 1.7 times greater if WWTP improvements had not been implemented (figure 9). In other words, while Patuxent dissolved N loads decreased by approximately 67% between 1987 and 2018, loads would only have decreased by approximately 10% without the WWTP upgrades. Other watersheds have also benefited from WWTP upgrades, namely the James, where loads would be approximately 19% higher than they are now without implemented improvements, the Potomac, where loads would be 12% higher, and the Susquehanna, where loads would be 6% higher. Across the NTCBW as a whole, reductions in loading related to WWTP upgrades have reduced loads by approximately 9%.
For most watersheds, reductions in atmospheric N deposition have had a greater effect on N loads. Across the individual watersheds, 2018 N loads would have been 8%-20% higher without reductions in N deposition, and, across the NTCBW as a whole, these reductions have reduced N loads by approximately 13%.

Future scenarios
While current policy specifies that measures should be put in place by 2025 to meet water quality goals for the Chesapeake Bay, it has remained unclear how long it will take to achieve the desired improvements-a question that is of great interest to the Chesapeake Bay Program partnership (USEPA 2010). With our future scenarios, our goal was to assess the time required to achieve the mandated 25% reduction in N loading from 2009 levels, and to explore the role of N legacies in controlling possible time lags. The time lags reported in the discussion below represent the number of years between the 2025 'intervention' year and the year in which the 25% load reduction goal is achieved.
The results of our simulations first show that under business-as-usual practices (S1), meaning no changes in management practices after 2018, the NTCBW comes very close to meeting the N reduction goal (figures 10(a), (b) and table S10). More specifically, loads reach a steady-state value representing an approximately 24% reduction by the year 2039. This reduction, achieved without additional interventions, would result from changes in management that have already been made across the Chesapeake region. Indeed, our results suggest that mean NO x -N loads across the NTCBW had already decreased by 10% between 2009 and 2017, from approximately 58 ktons y −1 in 2009 to 52 ktons y −1 in 2017. Additional decreases over the next two decades, as suggested by the model, occur due to a slow depletion of legacy N in response to reductions in the N surplus that have occurred over the last 40 years. In other words, reductions in legacy N, already triggered by past decreases in N inputs, will drive long-term reductions in N loading under business-as-usual practices until approximately 2039, after which loading will vary only interannually with variations in discharge.
In the second and third scenarios, we simulate reductions in atmospheric N deposition (S2) and in wastewater N (S3), in line with the 'Everyone . Notably, while dissolved N loads have decreased for the Appomattox, James, Pamunkey, and Patuxent watersheds, they have increased in all other areas. The blue and yellow bars represent changes in N loads for the DEP87 and WWTP87 simulations. Results show that without reductions in atmospheric N deposition across the NTCBW, N loads would be approximately 14%-20% higher than they are currently. In the Patuxent, N loads would have increased by approximately 5% without reductions in WWTP emissions, instead of the more than the actual 55% decrease in observed loads. In other watersheds, WWTP controls have had much more moderate effects (differences of 5%-25% between simulated and observed loads).
Everything Everywhere' E3 scenario used by the USEPA to develop the Chesapeake Bay TMDL (USEPA 2010). These scenarios involve reducing atmospheric N deposition anywhere from 15% to 45%, depending on location, and reducing N loads from wastewater plants by 56%-83% (SI, table S9). Across the NTCBW, the simulated atmospheric deposition (S2) and wastewater N (S3) reductions result in approximately 25% and 35% reductions in N loading, respectively. The strategies, however, differ somewhat in the time it takes to achieve the desired reductions ( figure 10(b)). With reductions in atmospheric deposition (S2), it takes 3 years to achieve a 25% reduction in N loading, while reductions in wastewater N (S3) require less than a year to achieve the 25% reduction goal due to the more immediate water-quality effects of reducing direct discharge of N from wastewater plants.
In the next four scenarios, we implement improvements in the NUE. Such scenarios mimic improvements in nutrient management that could occur through reductions in fertilizer N and manure N application, better timing of N application, use of cover crops, and other implementation of BMPs (Sharma and Bali 2017). Across the three scenarios, we assume progressively greater levels of NUE, from 65% (S4) to 95% (S7). Each of these scenarios provides an increase in efficiency over the current mean NUE of 61%. As expected, the greater the efficiency, the greater the reductions in N loading, with reductions of 32% (NUE = 65%) to 46% (NUE = 95%) being achieved across the study watersheds by 2050 ( figure 10(b)).
Notably, the time required to achieve the N reductions for the Chesapeake region also varies with the implemented efficiencies. As shown in the contour plot in figure 10(c) there are tradeoffs between the speed of achieving water quality goals and the extent to which agricultural N use efficiencies are improved across the NTCBW as a whole. For example, at the current NUE of 61% (S1), the 25% N reduction goal is not achieved, and it takes approximately 14 years to achieve a 24% reduction. In contrast, with a 95% NUE (S7), reduction goals could be reached faster, in just 2 years. These time differences become even more pronounced if goals are set higher. To achieve a 40% reduction in N loading-the goal set in the 1987 Chesapeake Agreement-we could expect a time lag of approximately 21 years with an increase in NUE to 75%. However, if the NUE could be increased to 95%, the time lag could be decreased to 9 years.
Of course, there is some variability in the responses of individual watersheds to the future simulations ( figure 10(b)). Our aggregated findings for the Chesapeake Bay are primarily driven by decreasing N loads in the Susquehanna watershed, which accounts for approximately 70% of the total NTCBW N load. In the Susquehanna, our results suggest that an increase in NUE from a current value of 57% to 65% (S4) would result in meeting the 25% reduction goal in 3 years. In the Patuxent, which has benefited from implementation of point-source controls, the 25% reduction has already been achieved. In many other watersheds, however, the 25% reduction is more difficult to achieve. In the Choptank, for example, where legacy N is a primary driver of current water quality, our simulations show that loads would actually increase by approximately 6% under business-as-usual conditions (S1), and that even with an increase in NUE Colored bars represent the time lag to achieving a 25% reduction in loading for the NTCBW as a whole, and for the individual study watersheds. Only simulations successful in meeting the 25% reduction goal. Note that for the Patuxent, the 25% reduction goal is achieved before the start of the 2025 interventions. (c) The contour lines in the plot represent achievement of reductions in N loads, from 25% (blue line) to 40% (yellow line). The y-axis represents a range of N use efficiencies (NUEs), and the x-axis the time lags to achieving the desired N reduction goal. These results demonstrate that with greater increases in NUE, time lags to achieving desired reductions in riverine N loads are reduced.
to 85% (S6), it would take approximately 13 years to reach the 25% load reduction goal. This difference between the Choptank and the Susquehanna can be attributed primarily to differences in the N surplus trajectories for the two watersheds. In the Susquehanna, the N surplus peaked in the mid-1970s and has since then decreased by approximately 37%, primarily as a result of decreases in atmospheric N deposition. In contrast, in the Choptank, where N deposition is a smaller part of the budget and where use of N fertilizer has increased in recent decades, the N surplus continued to increase between 1975 and 1998, and in 2018 the surplus magnitudes remained approximately 19% higher than they were in 1975. These results indicate that N has continued to accumulate within the Choptank watershed for decades, and that legacy N will continue to be a more dominant contributor to catchment N loads than it is in the Susquehanna, where legacy stores are being depleted.

Implications and conclusion
In the present work, we have constructed N mass balance trajectories across nine watersheds of the Non-Tidal Chesapeake Bay Watershed (NTCBW) and have used the ELEMeNT-N modeling framework to simulate catchment N dynamics for the period 1930-2018 for each of these watersheds. We have also developed future N management scenarios and have carried out future simulations to answer questions regarding both the time required to achieve desired reductions in N loading to the Chesapeake Bay and the role of legacy N in driving water quality trajectories.
Our work first demonstrates that N loading to the Chesapeake Bay, in the 30 years since the signing of the Chesapeake Bay Agreement, has been positively influenced by two major factors: upgrades to WWTPs and decreases in atmospheric N deposition. Across the nine NTCBW watersheds, our results suggest that without upgrades to wastewater plants and decreases in atmospheric N deposition, N loading from the study watersheds combined would be approximately 22% higher than it is today, with the majority of that difference being due to reductions in deposition. These findings are in line with those of Ator et al (2020) and Hyer et al (2021), who have also recognized the positive progress made in these two source sectors. Our simulations also show that more can be done in these domains. Additional wastewater upgrades (S3), when implemented, could allow for near-immediate achievement of the EPA's 25% N reduction goals for the Chesapeake Bay, and implementation of Clean Air Act 2030 management practices (S2), which would reduce deposition, would allow for achievement of these goals within approximately 3 years. While the treatment plant upgrades and reductions in atmospheric N deposition provide some of the fastest paths to achieving water quality goals, they are also some of the most expensive. For example, it has been estimated that WWTP upgrades can cost approximately $104 for each 1 kg reduction in N loads. In comparison, on-farm best management practices may cost only $3-$10 to obtain the same reduction (Jones et al 2010).
Next, our results suggest that legacy N has led to substantial delays in achieving water quality goals in the Chesapeake Bay watershed. Across the region, the N surplus reached a peak in 1978 and has continued to decrease at a rate of approximately 3 ktons y −1 since that time, driven primarily by decreases in atmospheric N deposition. Despite these decreases in the N surplus, catchment N loads have been slow to respond, and in 2009, the average total N load across the NTCBW was essentially the same as it was in 1987. Since 2009, however, the watershed has seen an approximately 20% decrease in loading, and our future business-as-usual simulation suggests that, even without intervention, N loads will come close to the 25% N reduction goal by the year 2035. This decreasing trend echoes the decrease in N surplus that has occurred in recent decades and is a function of slowly depleting stocks of legacy N. The nonlinear, temporally varying relationships between N surplus and riverine N loads shown in figure 7 provide tangible evidence of legacy effects in many of the study watersheds.
Finally, our results show that there are many tradeoffs to be considered when establishing policy to improve water quality. It is increasingly understood, for example, that spatial targeting of conservation measures can provide greater improvements in water quality on a per-dollar basis than ad hoc approaches (Randhir et al 2001, Van Meter and Basu 2015, Wardropper et al 2015, Cheng et al 2020. Accordingly, while it may create political goodwill to hold all areas of a region equally accountable for reducing N loads, targeting conservation efforts to areas with the highest levels of N runoff or the lowest N-use efficiencies may provide the greatest gains in water quality (Wardropper et al 2015, Kaye-Blake et al 2019. Similarly, we would argue that 'temporal targeting' should be taken into account when implementing water quality policy (Preisendanz et al 2020). If fast reductions in nutrient loading are required, it must be recognized that not all interventions are created equal. Upgrades to WWTPs, for example, provide the fastest reductions in nutrient loading, as these plants are major sources of nutrient pollution and they discharge effluents directly to surface waters. In contrast, while implementation of nutrient-management practices on farms across the region may ultimately provide even greater water quality benefits, these approaches take longer, and substantial lags must be expected. In this respect, a better understanding of the timescales over which legacy nutrients influence water quality is crucial to creating spatially and temporally explicit blueprints for meeting water quality goals, both in the Chesapeake Bay and beyond.

Data availability statement
The data that support the findings of this study are available upon reasonable request from the authors.