Castles built on sand or predictive limnology in action? Part B: Designing the next monitoring-modelling-assessment cycle of adaptive management in Lake Erie

In Lake Erie, a wide variety of statistical and process-based models have significantly advanced our understanding of the major causal linkages/ecosystem processes underlying the local water quality problems. In this study, our aim is to identify knowledge gaps, monitoring assessment objectives, and management recommendations that should be critically reviewed through the iterative monitoring-modelling-assessment cycles of adaptive management. In the watershed, the presence of multiple SWAT applications provides assurance that a wide array of physical, chemical, and biological processes with distinct characterizations are used to reproduce the patterns of flow and nutrient export in agricultural lands. While there are models with more advanced mechanistic representation of certain facets of the hydrological cycle (surface runoff, groundwater and sediment erosion) or better equipped to depict urban settings, we believe that greater insights will be gained by revisiting several influential assumptions (tile drainage, fertilizer/manure application rates, land-use/landcover data) and recalibrating the existing SWAT models to capture both baseline and event-flow conditions and daily nutrient concentration (not loading) variability in multiple locations rather than a single downstream site. It is also critical to redesign land-use management scenarios by accommodating recent conceptual and technical advancements of their life-cycle effectiveness, the variability in their starting operational efficiency, and differential response to storm events or seasonality, as well as the role of legacy phosphorus. In the receiving waterbody, the development of data-driven models to establish causal linkages between the trophic status of Lake Erie and external phosphorus loading represents a pragmatic means to draw forecasts regarding the phytoplankton community response to different management actions. Two critical next steps to further augment the empirical modelling work is the iterative updating as more data are acquired through monitoring and the introduction of additional explanatory variables that are likely associated with the occurrence of cyanobacteriadominated blooms. The majority of the process-based models are not constrained by the available data, and therefore their primary value is their use as heuristic tools to advance our understanding of Lake Erie. The validation of their predictive power should become one of the overarching objectives of the iterative monitoringmodelling-assessment cycles. With respect to the projected responses of the system to nutrient loading reduction, we express our skepticism with the optimistic predictions of the extent and duration of hypoxia, given our limited knowledge of the sediment diagenesis processes in the central basin along with the lack of data related to the vertical profiles of organic matter and phosphorus fractionation or sedimentation/burial rates. Our study also questions the adequacy of the coarse spatiotemporal (seasonal/annual, basinor lake-wide) scales characterizing the philosophy of both water quality management objectives and modelling enterprise in Lake Erie, as this strategy seems somewhat disconnected from the ecosystem services targeted. We conclude by emphasizing that the valuation of ecosystem services should be integrated into the decision-making process, as we track the evolution of the system over time. https://doi.org/10.1016/j.ecoinf.2019.05.015 Received 26 October 2018; Received in revised form 22 May 2019; Accepted 22 May 2019 ⁎ Corresponding author. E-mail address: georgea@utsc.utoronto.ca (G.B. Arhonditsis). Ecological Informatics 53 (2019) 100969 Available online 24 May 2019 1574-9541/ © 2019 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/BY/4.0/). T


Introduction
The rigorous analysis of decision problems in water quality management requires specification of ecosystem indicators that reliably reflect the prevailing conditions; an objective function to evaluate benefits and costs of alternative management strategies; predictive models formulated in terms of variables relevant to management objectives; a finite set of alternative management actions, including any conditional constraints on their use; and a monitoring program to follow system evolution and responses to management (Walters and Holling, 1990;Walters, 1986). In this context, one of the major challenges arises from the uncertainty in the predictions of management outcomes. This uncertainty may stem from incomplete control of management actions, errors in measurements and sampling, environmental variability, or incomplete knowledge of system behavior. Failure to recognize and account for these sources of uncertainty may lead to catastrophic environmental and economic losses. Consequently, there has been a growing interest in the policy practice of adaptive management, as it provides an iterative implementation strategy recommended to address the uncertainty associated with ecological forecasts and to minimize the impact of inefficient management plans (Allen et al., 2011;Williams et al., 2011). Adaptive implementation or "learning while doing" augments initial forecasts of management actions with post-implementation monitoring, and the resulting integration of monitoring and modelling provides the basis for revised management actions. As such, adaptive implementation is often considered the only defensible environmental management strategy, as a hedge against unknown forecast errors that can undermine the restoration practices and may result in misallocation of the taxpayers' money (Williams and Brown, 2014).
Environmental models play a critical role in the adaptive management process, as they can be used to evaluate the degree of our knowledge of the system being managed and to identify the outstanding questions we must address through monitoring and research. In Lake Erie, a unique combination of statistical and mathematical models have been developed to evaluate the relationships among watershed physiography, land-use patterns, and phosphorus loading, to understand ecological interactions, to elucidate the role of specific facets of the ecosystem functioning (internal loading, dreissenids), and to predict the response of the lake to external nutrient-loading reductions. In our companion paper , we presented a technical analysis by identifying strengths and weaknesses of the existing models. In particular, it was shown that the current modelling work in the Maumee River watershed, is not ready yet (i) to provide reliable predictions regarding the long-term achievability of the phosphorus loading targets, and (ii) to evaluate the impact of individual episodic events that can carry significant nutrient loads and presumably modulate the downstream water quality conditions. We also underscored the importance of revisiting several influential assumptions (fertilizer/ manure application rates, land-use/land-cover data, tile drainage) and recalibrate the existing watershed models to capture both baseline and event-flow conditions and daily nutrient concentration (not loading) variability in multiple locations rather than a single downstream site Kim et al., 2019). Another challenging aspect is the proper consideration of legacy phosphorus (P), e.g., initialization that accommodates the spatial soil P variability, sufficient model spinup period, parameter specification that reproduces the gradual P accumulation in the soils, and its ability to reproduce the critical hydrological and transformation mechanisms modulating the dissolved reactive phosphorus (DRP) loading in the Lake Erie basin.
Similar to the general trends in the international aquatic biogeochemical modelling literature (Arhonditsis and Brett, 2004), the ability of the local models to reproduce the spatio-temporal dynamics in Lake Erie declined from physical, chemical to biological variables (see Fig. 5 and Appendix A in Arhonditsis et al., 2019). Temperature and dissolved oxygen variability were successfully reproduced, but less so the ambient nutrient levels. Model performance against individual phytoplankton functional groups (e.g., cyanobacteria) was not up to par with those presented for chlorophyll a concentrations and zooplankton abundance . To further reduce the predictive uncertainty, Arhonditsis et al. (2019) argued that there is a rich research agenda that should be put in place with the next iteration of the adaptive management cycle, in order to improve our understanding of several facets of phytoplankton ecology, such as the degree of reliance of phytoplankton growth upon internal nutrient sources (e.g., microbially mediated regeneration, dreissenid or zooplankton excreted material in nearshore and offshore waters, respectively), the internal P loading from the sediments, the role of nitrogen, the trophic interactions with zooplankton, the degree and timing of the sediment response or the likelihood of unexpected feedback loops that could delay the realization of the anticipated outcomes, and factors that determine Cladophora growth in the nearshore zone of the eastern basin in Lake Erie (Fig. 1).
Consistent with the scientific process of progressive learning, the present study aims to assist with the next iteration of the modelling framework by pinpointing essential augmentations and research/monitoring priorities to effectively integrate watershed and aquatic ecosystem processes. We also discuss the optimal resolution in time and space required for the comprehensive skill assessment of the existing modelling work, as well as for the establishment of nutrient loading targets and ecosystem response indicators in Lake Erie. Viewing ecosystems as providers of economically valuable benefits to humans, we strongly recommend the development of a rigorous framework that quantifies in monetary terms the importance of a well-functioning ecosystem. Our thesis is that this could be a critically important strategy to gain leeway and keep the investments to the environment going, especially if the water quality improvements in Lake Erie are not realized in a timely manner.

Selection of the optimal scale to evaluate model performance in time and space
Recognizing that we can never have all the empirical information that necessitates to develop a completely constrained model, there is always a trade-off between knowing "much about little" or "little about much" in the environmental modelling practice. In Lake Erie, most of the existing work opted for the latter strategy, whereby complex mathematical tools have been used to advance our understanding of the mechanisms operating both in the watershed and receiving waterbody Kim et al., 2019;Shimoda et al., 2019). In a recent critique, Kim et al. (2014) argued that the majority of these process-based models are profoundly over-parameterized with unproven ability to provide robust predictive statements. Regarding the latter assertion, the skill assessment results presented by Scavia et al. (2016aScavia et al. ( ,b, 2017 were particularly favorable with respect to their ability to capture the magnitude of important eutrophication indicators, such as phosphorus loading, phytoplankton biomass, and hypoxia severity, at an aggregated spatiotemporal (seasonal/annual time scale, basin-or lake-wide) resolution.
In principle, the selected coarse scales for evaluating model performance in time and space are defensible, as they are consistent with those used for the established nutrient loading targets and water quality indicators in Lake Erie (Scavia et al., 2016a,b). Nonetheless, given that the majority of these models are based on daily (or sub-daily) simulations within one-to three-dimensional spatial domains, it would seem that the bar of what constitutes an acceptable model performance has been lowered significantly. There are compelling reasons why this practice is problematic and should be revisited during the next iteration of the modelling framework. From a technical standpoint, evaluating model goodness-of-fit with a coarser resolution not only entails the risk to obfuscate multiple daily or location-specific errors/biases that cancel each other out when seasonally or spatially averaged, but may also detract the attention from the much-needed critical evaluation of the process characterizations derived after the calibration of (prone-tooverfitting) complex models. In particular, many of the assumptions made or parameter values assigned could be adequate to describe spatially or temporally aggregated patterns, but could also be the culprits for the misrepresentation of important aspects of the intra-or inter-annual and spatial variability (e.g., magnitude of the spring freshet, timing of algal blooms, and response of the nearshore zone to extreme precipitation events). Our independent model-fit reassessment exercise on a daily scale reinforced the importance of the latter issue by showing the distinctly inferior performance of both watershed and aquatic ecosystem models Kim et al., 2019), as well as their inability to capture critical short-term or event-based facets of the simulated terrestrial and aquatic biogeochemical cycles.

Synthesis of predictions from multiple models
In Lake Erie, the development of an ensemble of models offers the unique ability to evaluate competing hypotheses regarding the relative importance of hydrological processes and mechanisms of nutrient fate and transport within a watershed context, or the plausibility of alternative aquatic ecosystem conceptualizations; especially when complex over-parameterized models are in place with inadequate empirical information to pose meaningful constraints. The propagation of this uncertainty through the environmental forecasts, i.e., scenarios of best management practices (BMPs) and load-response curves, was based on the generation of the uncertainty envelope from individual model predictions, without the introduction of weighting factors that consider their goodness-of-fit, bias, or model complexity (Scavia et al., 2016b). Counter to this practice, there are viewpoints in the literature advocating the development of weighting schemes to objectively synthesize ecological forecasts from multiple models (Raftery et al., 2005;Roulston and Smith, 2003;Wilks, 2002). One of the critical decisions involves the establishment of standards for the calibration and validation domains that will allow to rigorously evaluate the ability of a model for extrapolative tasks, i.e., forecast conditions distinctly different from those currently prevailing in the modelled system. Another criterion focuses on the goodness-of-fit of individual models as a weighting factor to determine their corresponding influence on the ensemble predictions.
A promising advancement towards a rigorous synthesis of multiple model predictions is the Bayesian averaging framework used by Scavia et al. (2017), whereby the weights of each of the SWAT models for the Maumee River watershed were based on their predictive performance either for total phosphorus (TP) or DRP loading over a selected validation dataset. In the same context, Ramin et al. (2012) advocated the consideration of the performance over all model endpoints, for which observed data exist, rather than individual modelled variables that are more closely related to the question being asked. In doing so, we ensure that the models included in an ensemble environmental forecast should have balanced performance over their entire structure. To put in another way, we ought to penalize the likelihood of calibration bias, whereby the maximization of the fit for a specific variable (e.g., nutrients or total phytoplankton biomass) may be accompanied by high error for other variables (e.g., individual phytoplankton functional groups or zooplankton), and thus avoid deriving forecasts founded upon models with misleadingly high weights that conceal fundamentally flawed representations of system behaviour. Other criteria for the development of ensemble weighting schemes are the consideration of penalties for model complexity that will favor parsimonious models (McDonald and Urban, 2010), and performance assessments that do not exclusively consider model endpoints but also evaluate the plausibility of the values assigned to major processes pertaining to water budget, nutrient cycles, and critical ecological pathways against empirical estimates; whenever these values exist Arhonditsis and Brett, 2004). Regarding the latter factor, it is also important to  also Table 1). G.B. Arhonditsis, et al. Ecological Informatics 53 (2019) 100969 reiterate one of the key points from our companion paper that no consistent information has been reported in the Lake Erie modelling literature regarding the relative magnitude of modelled biogeochemical fluxes , and therefore it is difficult to evaluate how "diverse" the model ensembles for the watershed or receiving waterbody are or to what extent the individual models replicate the same characterization of the system functioning.

Improving the credibility of the load-response curves
Several important structural augmentations of the existing modelling tools could increase both their heuristic and predictive values as long as commensurate empirical knowledge to constrain the mathematics becomes available from Lake Erie. If we strive to establish predictive linkages between the magnitude and timing of the response of the sediments and different loading regimes, the study of the sediment diagenesis processes is essential in understanding the control of redox chemistry on the vertical profiles of biodegradable organic matter and P binding forms (Gudimov et al., 2016). While their simple form is convenient, it must be noted that the expressions of sediment oxygen demand (SOD) as a function of dissolved oxygen (DO), temperature, or even total phosphorus (TP) loading are primarily conceptual without adequate ground-truthing in the literature and therefore carry little predictive power. Empirical information is also needed to constrain the submodels/differential equations related to dreissenids, Cladophora, and zooplankton. While some progress has been made in representing the role of dreissenid mussels in the system (Karatayev et al., 2017;Verhamme et al., 2016), little work has been done to adapt the existing Cladophora submodel to the nearshore zone and even less so to depict the phytoplankton-zooplankton interactions in Lake Erie (Table 1). Likewise, with the shift in focus to the average conditions of the offshore waters, the nearshore zone has received less attention from the existing modelling work in Lake Erie. These areas are intermediate zones in that they can receive polluted inland waters from watersheds with significant agricultural, urban and/or industrial activities while mixing with offshore waters, having different biological and chemical characteristics. Coastal upwelling events during the early summer appear to modulate offshore-nearshore mass exchanges, whereby nutrients and hypoxic waters are injected from the hypolimnion into the nearshore Lake Erie (Valipour et al., 2016). Surface waves can also resuspend bottom sediments in the shallow waters, and as they tend to be repositories of both nutrients and contaminants, resuspension events are highly important in predicting water quality. Thus, there is a need for an integrated watershed-receiving waterbody modelling framework to shed light on the interactions of surface/subsurface hydrological inflows with in-lake hydrodynamics that shape to a large degree the dispersal of pollutants and consequently the spatial extent and magnitude of associated ecological impacts in the different basins of Lake Erie (Schwab et al., 2009).

Challenges and prospects in forecasting HABs
Another critical challenge revolves around the establishment of robust phytoplankton group-specific parameterizations to support predictions in a wide array of spatiotemporal domains, given the uncertainty with the derivation of distinct functional groups from fairly heterogeneous algal assemblages and our knowledge gaps of cyanobacteria ecology . The ability of the current generation of plankton models to reproduce succession patterns and structural shifts in phytoplankton communities has not been proven yet, and thus efforts to predict cyanobacteria-dominated HABs (or cHABs) with process-based models are often characterized as attempts to "run before we can walk" (Anderson, 2005). Although we do not agree with these skeptical views, we do believe that the inclusion of Table 1 Major knowledge gaps and sources of uncertainty to guide monitoring and improve modelling in Lake Erie Shimoda et al., 2019).
Knowledge gaps Phytoplankton • Winter productivity under ice needs to be further examined to establish the causal linkages with spring phytoplankton dynamics and summer hypoxic conditions. Cyanobacteria/HABs • Increased soluble reactive phosphorus (SRP) loading since mid-1990s appears to correlate with more frequent and severe harmful algal blooms (HABs). Accurate fractionation of exogenous phosphorus loads (P bioavailability) is needed to confirm the relationship with algal blooms.
• Regular monitoring of cyanotoxin data (Microcystin-LR) in the nearshore zone. • Quantitative characterization of the selective rejection of cyanobacteria by dreissenids Hypoxia • It is critical to understand the intensive microbiological, geochemical, and physical processes occurring within the top few centimeters of the sediment and determine the fraction of organic matter and nutrients released into the overlying water. Field, experimental (e.g., porewater analysis, phosphorus fractionation, organic matter profiles), and modelling (e.g., primary and secondary redox reactions, mineral precipitation dissolution reactions, acid dissociation reactions, and P binding form reactions) work should be designed to shed light on the mechanisms of organic matter mobilization in the sediments and to identify process controls under a variety of conditions. Dreissenids • Empirical information on selective rejection of cyanobacteria by dreissenids and revisit of the mathematical representation of dreissenid feeding.
• Empirical data on the spatial distribution of zebra and quagga mussels is needed, since the two species differ significantly with regard to filtration and excretion rates.
• Empirical data on the year-to-year variability of dreissenid population density and size distribution is critical to realistically predict their impact on water quality.
• The role of dreissenids on N mineralization and the modification of the N:P ratios remains understudied Cladophora • Local modelling efforts in Lake Erie will greatly benefit from a high-resolution monitoring of the nearshore zone to establish the causal linkages between the abiotic conditions (e.g., phosphate, light, temperature) in the surrounding environment and the internal P content and sloughing rates in Cladophora mats. Sediment-related processes • Empirical information is needed to distinguish the relative importance of oxic and anoxic release of P from the sediments in the western basin of Lake Erie in order to validate model predictions.
• Empirical information is needed to quantify the contribution of particulate matter resuspension to the P concentrations in the water column (all models except WLEEM do not consider this process separately from P chemical release).

Refinement of monitoring framework
• Monitoring programs should be extended until the end of September to obtain a better characterization of hypoxia, given that the maximum hypoxic area in the nearshore zone has been projected to occur in September (Bocaniov and Scavia, 2016).
• Monitoring programs should target nearshore areas, which occupy a significant portion of the area of the central basin and represent an important habitat for many aquatic species as well as a source of drinking water (Bocaniov and Scavia, 2016).
• Whole-year phytoplankton and zooplankton sampling to quantitatively characterize the seasonal succession patterns, as well as the likelihood of top-down control.
G.B. Arhonditsis, et al. Ecological Informatics 53 (2019) 100969 empirical cause-effect relationships in the model ensemble to link the nutrient loading variability with the magnitude of the summer harmful algal bloom offers a reliable complementary framework to track the anticipated response of the system (Bertani et al., 2016;Stumpf et al., 2016). Building upon the Bayesian foundation of some of these empirical tools, the next steps should involve their sequential updating as more data are acquired through monitoring, as well as the consideration of additional predictors to accommodate the role of other nutrients, light availability, water column stability, and water temperature. The latter augmentation not only will improve our predictive power, but will also allow to establish hierarchical linkages between the year-toyear variability captured by the Bertani et al.'s (2016) and Stumpf et al.'s (2016) studies and within-year conditions that ultimately lead to cHAB formation .

Do we need other models to complement the SWAT framework in the Maumee River watershed?
In terms of the diversity of the watershed modelling framework, SWAT has been the only process-based model used to evaluate alternative agricultural management practices in the Maumee River watershed. Notwithstanding its conceptual and operational advantages, an important question arising is to what extent are we missing profound advancements of our understanding of watershed processes that other models can offer? To address the latter question, we compared SWAT against eight commonly used watershed models, i.e., the Annualized  (Refsgaard and Storm, 1995), and Storm Water Management Model (SWMM; Rossman and Huber, 2016), regarding their strategies to capture surface runoff, groundwater, sediment transport, nutrient cycling, and channel routing (Fig. 2). For the purpose of the present study, we only provide the distinct differences among the modelling strategies that are available in the literature, but detailed description of the various models can be found in Dong et al. (2019).
To quantify the potential magnitude of surface runoff, SWAT uses the empirical Soil Conservation Service Curve Number (SCS-CN) method based on the antecedent moisture condition and the hydrologic soil group of a particular location at a given day. The SCS-CN method lumps rainfall interception, depression storage, and soil infiltration as the initial abstractions, which are assumed to account for 20% of the potential maximum retention. The surface runoff is estimated by subtracting that amount from the rainfall total volume, and therefore the total precipitation volume should exceed the initial abstraction before any runoff is generated. Because of the limitations of the SCS-CN method in estimating cumulative runoff depth and peak flows (Borah et al., 2007), SWAT also provides a sub-daily simplified model, the Green-Ampt (GA) infiltration method, by postulating a uniform movement of water from the surface down through the deep soil with a sharp wetting front. The HBV and DLBRM models use the linear-reservoir concept to represent the rainfall-runoff process at the watershed scale, based on the assumption that the overland flow is linearly correlated with the water storage. HSPF uses the Philip's equation, which is a subdaily method simplified from the Richards' equation (Richards, 1931). The latter approach is considered by the MIKE SHE model and describes the vertical movement of water through the soil profile using Darcy's law. The flow rate through porous media is proportional to the hydraulic conductivity, which in turn is estimated from the soil water content (Baver et al., 1972). Compared to other models, the Richards' equation could more accurately quantify vertical water percolation and dynamic unsaturated flow based on various soil properties.
In terms of the representation of subsurface process, SWAT incorporates a kinematic storage model to simulate lateral flow in the unsaturated zone. The saturated zone is conceptualized as an unconfined-shallow aquifer and confined-deep aquifer. Groundwater flowing into the main channel is assumed to be linearly correlated with hydraulic conductivity and the changes of the water table height. By contrast, a simple linear reservoir model is used in DLBRM, HBV, HSPF, and GWLF, whereby interflow and groundwater are linearly proportional to moisture content of the unsaturated zone and saturated zone, respectively. Conceptually similar with the division of shallow and deep aquifers in SWAT, HSPF divides the saturated zone into two storage reservoirs: active and inactive groundwater. AnnAGNPS also uses a simple empirical method, the Darcy's equation, assuming that lateral flow is linearly related to saturated hydraulic conductivity and hydraulic gradient. Compared to subsurface modelling in SWAT, both the linear reservoir model and the Darcy's equation are simpler representations of the subsurface system. Conversely, MIKE SHE provides a physically based 3-D saturated zone model to explicitly represent the vertical and spatial characteristics of the subsurface profile. MIKE SHE incorporates a 3-D Finite Difference Method numerical engine, which is theoretically similar to MODFLOW (MODular 3-D Finite-Difference Ground-Water FLOW Model) (McDonald and Harbaugh, 2003). The subsurface modelling in SWAT could be complemented with MIKE SHE by tracking the vertical solute transport and providing a comprehensive, fully dynamic depiction of the hydrological interplay between surface and subsurface layers.
One of the commonly used strategies to quantify soil erosion, the Modified Universal Soil Loss Equation (MUSLE), is used in SWAT. MUSLE is developed from the original USLE method by replacing the rainfall-erosivity (R) factor with a runoff-energy factor (Smith et al., 1984). The Revised USLE (RUSLE) used in AnnAGNPS is another modification by converting the soil-erodibility (K) factor to a timevarying parameter (Renard et al., 1991). Counter to these USLE-based empirical approaches, DWSM, HBV-INCA, HSPF, and MIKE SHE offer physically based methods that explicitly accommodate the physical detachment/reattachment and transport processes. The European Soil Erosion Model (EUROSEM) used in MIKE SHE considers soil detachment affected by both raindrop and leaf drainage, which enables the explicit representation of the effects of vegetation heights. By contrast, HSPF, INCA and DWSM erosion models ignore the effects of leaf drip. Thus, the sediment erosion module in MIKE SHE introduces an advanced representation of our contemporary understanding of the soil erosion process, which could be used to complement the empirical MUSLE method in SWAT. Soil P simulation with SWAT is based on the Erosion Productivity Impact Calculator (EPIC) model (Sharpley and Williams, 1990), which includes three organic P pools (fresh, active and stable) and three inorganic pools (solution, active, and stable). In SWAT, fertilizer, manure, and residues are the input sources of soil P, which could be removed by plant uptake, water flow, and soil erosion. The plant P uptake is dependent upon plant growth, simulated as a function of the leaf area growth, light interception, and biomass production. One of the strengths of the EPIC model used in SWAT is the explicit simulation of the daily plant growth, whereas other models, like INCA and HSPF, represent plant growth either through a seasonal plant-growth index, or a simple empirical first-order kinetics equation. SWAT also takes into account the residue decay and mineralization, which is not considered with HSPF and INCA and may thus limit their ability to evaluate the importance of legacy P. SWAT provides two options in simulating soil P routines: (a) the conventional EPIC model based on the assumption of a constant equilibrium adsorption/desorption concentration, and (b) a dynamic approach that allows the P sorption coefficient to vary with solution P (Vadas and White, 2010). Similarly, to represent the longterm soil P dynamics, HSPF adjusts the adsorption rates by soil temperature, whereas INCA varies the equilibrium inorganic P concentration based on the mass of P in the labile soil pool. The consideration of dynamic P subroutines can be quite critical when evaluating the longterm watershed responses to various agricultural management strategies.
Existing submodels of water routing in channels could be divided into four main categories spanning a wide range of complexity: the dynamic wave; the diffusive wave; the kinematic wave; and the nonlinear reservoir models. The dynamic wave model used in MIKE SHE and SWMM is the most complex, physically based approach founded upon the continuity and momentum equations (Tayfur et al., 1993). The diffusive wave model uses simplified momentum equations by downplaying the role of local and convective acceleration. SWAT uses the kinematic wave model, which incorporates the most simplified momentum equation by omitting the pressure gradient and acceleration terms (Miller, 1984). The non-linear reservoir model divides the channel segments into a series of reservoirs with uniform water surface. Since SWAT omits pressurized flow and backwater effects, it is not capable of simulating pipe flows, whereas a fully dynamic wave equation to model water routing would be applicable to both open channels and closed pipes.
Considering the different strengths of the watershed models reviewed, SWAT could be complemented by the modules of other watershed models, especially for surface runoff, groundwater and sediment-erosion processes. For the hydrological and sediment processes, MIKE SHE seems to be more up-to-date with respect to the mechanisms considered, assuming that local empirical knowledge is available to constrain the additional parameters. SWAT has the advantage to explicitly simulate the daily plant growth, but it could be improved by considering the dynamic P equilibrium concentration. MIKE SHE and SWMM are superior to SWAT in channel routing because of their capability to simulate pipe flows. SWAT is suitable for agricultural BMPs (e.g., terracing, contouring, strip cropping, tillage operations, crop rotations, and fertilizer application), while the urban BMP modules in SWMM (e.g., rain gardens, green roofs, infiltration trenches, permeable pavement, and vegetative swales) offer a useful alternative .
Another challenge with the existing SWAT applications in the Maumee River watershed was the consistent event-flow underestimation, which can be potentially ameliorated by the use of an alternative runoff estimator, such as a Green-Ampt method, instead of the conventional SCS-CN method (King et al., 1999;James et al., 1992). It is also important to recognize that the characterization of surface runoff and subsurface processes during flow events is largely unknown in the area, and therefore the design of high-frequency, event-based, waterquality sampling coupled with stable-isotope analysis ( 18 O and 2 H) should be one of the priorities in our efforts to rectify the misrepresentation of extreme flow conditions Klaus and McDonnell, 2013). In the same context, recent advancements in hydrology suggest that baseline conditions and extreme events may be associated with distinct flow mechanisms, and thus two major strategies have been proposed to accommodate threshold behavior in watershed models (Zehe and Sivapalan, 2009). The first strategy is the introduction of a two-domain conceptualization of soil-water movement into numerical watershed models, whereby flow through the soil matrix or macropore flow is responsible for small-and large-runoff events, respectively (Zehe et al., 2001). The second approach postulates that the watershed operates in multiple states or modes of behavior, and the identification of which can be explicitly accommodated through the model calibration process (Ali et al., 2013). A characteristic example of this strategy is the Bayesian hierarchical framework used to calibrate the SWAT model in Hamilton Harbour (Ontario, Canada), which enabled the identification of precipitation thresholds that trigger shifts to alternative watershed states, as well as state-specific parameters to depict extreme states with higher propensity for runoff generation (Wellen et al., 2014a(Wellen et al., , 2014b. Together with the process-based modelling work in the Maumee River watershed, it is also critical to have simpler statistical models in place that not only provide predictive statements confined within the bounds of data-based parameter estimation, but can also constrain processes/fluxes parameterized by mechanistic models or even validate the corresponding forecasts drawn. A characteristic example of the potential benefits of a data-driven model is the use of SPARROW to validate the predicted spatial distribution of phosphorus loads in the Maumee River watershed by the five SWAT applications (Scavia et al., 2016c). SPARROW is a hybrid mechanistic-statistical model, with empirically based parameters (i.e., land-to-water delivery coefficients, nutrient export from different land uses, in-stream attenuation rates, reservoir settling velocities) to estimate nutrient loading from a series of hydrologically linked catchments and thus to delineate areas of high risk in many Great Lakes watersheds (Kim et al., 2017;Wellen et al., 2014aWellen et al., , 2014b. Interestingly, the SPARROW spatial projections were not in complete agreement with the SWAT-ensemble predictions of nutrient-export hot-spots in the northwestern part of the Maumee River watershed (see Table 1 in Arhonditsis et al., 2019), and were instead suggestive of an extensive area in the southern/southwestern Maumee River watershed with TP loading estimates distinctly higher than 150 kg km −2 yr −1 (Scavia et al., 2016c). Thus, we can infer that the SPARROW spatial nutrient export projections are influenced from the explicit consideration of farm fertilizers, and livestock wastes of unconfined animals on farms, pastures, and rangelands, as well as those in concentrated animal feeding operations (Robertson and Saad, 2011). In the same context, Kim et al. (2019) used the Base Flow Index (BFI) map as an independent source of information to reconcile these projected discrepancies in the Maumee River watershed attributes (Fig. 3). BFI is a measure of the ratio of long-term baseflow to total stream flow, representing the slow continuous contribution of groundwater to river flow, and therefore low (high) BFI values suggests higher (lower) likelihood of surface runoff which in turn can lead to higher (lower) suspended solid and particulate phosphorus loads. The consistently lower values of the empirically obtained baseflow index in the southern/southwestern Maumee River watershed appear to be closer to the SPARROW rather than the SWAT spatial predictions (Fig. 3 vs Table 1 in Arhonditsis et al., 2019). While the latter result may not be an evidence for unequivocally ground-truthing either of the two models, it does highlight the need for the current members of the SWAT-ensemble to be recalibrated against data from multiple sites across the watershed, as well as to revisit some of the fundamental assumptions regarding the fertilizer/manure application rates in the croplands or the spatial drainage of soils .
The existing SPARROW application in Lake Erie is part of the regional model developed for the Upper Midwest (Great Lakes and Upper Mississippi, Ohio, and Red River Basins or MRB3 model as referred to in Robertson and Saad, 2011), which comprised 810 sites of TP empirical loading estimates. Lake Erie accounted for < 6% of those sites, while the median and 90% percentile of the local loading estimates were 30% and 70% lower than the corresponding values in the entire dataset, respectively (Material S3 in Supporting Information section of Robertson and Saad, 2011). The use of "cross-sectional" datasets over broader regions has been one of the pillars of the SPARROW modelling enterprise, as this practice is deemed more suitable in unravelling the complex patterns within a watershed context while the significant loading range across the calibration locations typically leads to wellidentified parameters and a larger application domain (Alexander et al., 2004). To mitigate the impact of potential outliers or non-representative calibration data, USGS has implemented a number of procedures, including subjective data censorship of daily records for both flow and nutrient concentrations in case they hamper obtaining a satisfactory Fluxmaster regression fit (Schwarz et al., 2006); water quality monitoring stations with a number of records below a certain number are filtered out from further consideration; and stations with Fluxmaster model errors > 50% are omitted from SPARROW calibration datasets (Robertson and Saad, 2013). These practices can neither address the problem of datasets that disproportionately consist of baseline-rather than event-flow samples (Long et al., 2014(Long et al., , 2015Richards et al., 2013), nor do they overcome the fact that the assumption of regionally common parameters could introduce watershedspecific bias in the characterization of fundamental processes, such as nutrient export from different land-uses, land-to-water delivery, and instream attenuation rates. An appealing alternative that could rectify many of these problems will be the development of a Great Lakes SPARROW model that narrows the focus of the original MRB3 model, while maintaining its "global" character (Fig. 4). Importantly, the rigid common parameter estimates over the entire spatial model domain can be relaxed by the use of a hierarchical structure to estimate watershedspecific parameters, and thus accommodate the spatial variability within the Great Lakes basin. In addition, rather than the stringent data censoring currently implemented, the SPARROW practice should become more inclusive and instead the calibration datasets could be coupled with measurement-error models to characterize our degree of confidence on their quality or to accommodate the serial correlation among nested subwatersheds (Kim et al., 2017;Wellen et al., 2012Wellen et al., , 2014aBalin et al., 2010;Carroll et al., 2006). This is an important exercise that will consolidate the presence of an empirical modelling tool to guide the delineation of nutrient hot-spots alongside the processbased modelling work.

Risks and uncertainties with the implementation of best management practices: what does the literature suggest?
A variety of costly BMPs have been designed to mitigate pollution In the next level, the spatial heterogeneity is accommodated by introducing i(=6) "regional" distributions, associated with the basins of each waterbody of the Great Lakes system (Superior, Huron, Michigan, Ontario, Erie, Georgian Bay), which are used to draw values for the vectors β i = [β i1 , β i2 , …, β ij ]. Similarly, in the upper stage, the regional parameters β ig are specified probabilistically in terms of the global population parameters or hyperparameters μ and σ that correspond to the wider Great Lakes area. Coupled with the mechanistic tools for the Maumee River watershed, empirical (SPARROW-like) models geared to depict catchment-specific (rather than continental or regional) nutrient loading conditions can offer a multitude of complementary benefits, such as validate the spatial delineation of hot-spots with higher propensity for nutrient export, narrow down the uncertainty of processes/fluxes parameterized by mechanistic models, and obtain predictive statements constrained within the bounds of data-based parameter estimation. from diffuse sources in agricultural and urban areas (Leitão et al., 2018;Edwards et al., 2016;Dietz, 2007;Sharpley et al., 2006). Although their implementation has been based on the stipulation that both their shortand long-term effectiveness are guaranteed, emerging evidence is suggestive of moderate water quality improvements in many watersheds and broad variability in their performances, often much lower compared to the specs of the original design from BMP experimental studies (Jarvie et al., 2013;Kleinman et al., 2011). This form of scenario uncertainty can be attributed to a number of factors, such as suboptimal design, lack of landowner participation (Fig. 5a, b), erroneous selection of BMPs (Fig. 5c), failure to address non-point pollution sources, inadequate coverage of the watershed, lag time between BMP implementation and distinct improvements of downstream conditions, different efficiency between particulate and soluble nutrient forms (Fig. 5d, e), and variability induced by extreme events and other weather-related anomalies (Liu et al., 2017;Meals et al., 2010). In Lake Erie, Smith et al. (2018) noted that the majority of local farmers apply P fertilizers at or below the current recommendations, and are erroneously singled out as the main culprit for the recent re-eutrophication. It was asserted that agronomic changes (e.g., no-tillage adoption, crop cultivar advances) in the surrounding watersheds and the lack of appropriate fertility guidance and practices to protect water quality could instead be primarily responsible for the recent trends in nutrient biogeochemical cycles (Smith et al., 2018). The same study also questioned whether the "law of unintended consequences" has received sufficient consideration in the local decision-making process, as environmental interventions can conceivably have long-term damaging effects on ecosystem services given our limited knowledge of complex ecosystem interactions (Smith et al., 2018;May and Spears, 2012).
In the same context, Osmond et al. (2012) raised concerns that many important empirical findings from past conservation practices across North America have not been incorporated into current BMP guides. For example, earlier work in the area cautioned that the focus on sediment erosion control (no-till conservation, buffer strips, and fall fertilization) may entail a trade-off effect with elevated losses of bioavailable phosphorus (Gebhardt et al., 1985;Logan et al., 1979) and indeed recent studies by Jarvie et al. (2017) and Baker et al. (2017) have attributed the re-appearance of HABs to the unintended consequences from conservation decisions adopted 20-50 years ago (Fig. 6). More recently, Liu et al. (2017Liu et al. ( , 2018 identified that BMP performance assessments are predominantly based on short-term experimental studies, whereas long-term monitoring has registered variable performance trends. For example, Mitsch et al. (2012) has observed a gradual degradation of constructed wetlands in terms of their effectiveness for SRP removal within 15 years of monitoring, while Kieta et al. (2018) reported limited efficiency of vegetative buffer strips in Great Lakes basin, where the majority of nutrients are transported with spring freshet during the non-growing season. Similarly, Li and Babcock (2014) reported long-term orthophosphate areal export rates from green roofs comparable to those of highly intensive agricultural areas. In order to minimize the discrepancy between expected and actual environmental effects, Liu et al. (2018) proposed a framework to incorporate BMP life-cycle effectiveness into watershed management plans by explicit accounting for: (i) the variability in starting efficiency of each BMP type to reduce the severity of runoff and pollutant concentrations due to local condition differences and installation practices; (ii) intrinsic variability of operational performance due to watershed geophysical conditions, differential response to storm events, and seasonality; (iii) non-linearity of BMP effectiveness in response to different loading regimes as well as the expected decline in performance over time, which in turn enforces the need for regular maintenance; and (iv) lagged manifestation of water quality improvements after BMP adoption due to nutrient spiraling downstream or recycling in receiving water bodies (Fig. 6).
Promoting watershed management plans often requires financial incentives, such as tax credits, cost-sharing, reimbursements, insurance, and certification price premiums (Tuholske and Kilbert, 2015). The aforementioned discrepancy in timing between BMP implementation and water quality improvement can make the financial incentives unappealing, if we opt for the "pay-per-performance" practice. Failure of selected BMPs to achieve loading reduction targets should be viewed cumulatively as direct budget losses, environmental capital depreciation, and socio-economic values at risk (Wolf et al., 2017;Farber et al., 2002). The consideration of BMP uncertainties into scenario analysis would introduce financial risk assessment in strategic agro-environmental management decisions by weighting the amount of the proposed financial incentives with non-attainment risks of nutrient reduction goals (Palm-Forster et al., 2016). The Chesapeake Bay Program (CBP) protocol can serve as an exemplary case of comprehensive validation guidance of BMP effectiveness based on rigorous assessment of both treatment risks (known probabilities associated with BMP performance) and uncertainty (lack of knowledge surrounding these probabilities). The CBP protocol is based on transparency and inclusivity, and as such it considers detailed literature review, expert elicitation, data collection from local BMPs, and rigorous analysis (CBP, 2015).
To the best of our knowledge, none of the current watershed models accounts for the life-cycle non-stationarity or overall uncertainty in BMP effectiveness. In particular, SWWM5 does consider concentrationdependent removal of pollutants with specific BMPs during peak and base flows, but still relies on deterministic values of statistically significant median influent-and effluent-event concentrations (Rossman and Huber, 2016). Other major ecohydrological models, such as SWAT Fig. 6. Risks and uncertainties with the BMP implementation of Best Management Practices in the Maumee River watershed. Our study highlights the importance of designing land-use management scenarios that accommodate recent conceptual and technical advancements of the life-cycle effectiveness of various BMPs, the variability in their starting operational efficiency, and differential response to storm events or seasonality. and HSPF, are either based on a deterministic (pre-specified constant) nutrient removal effectiveness or on empirical relationships of variable statistical power (Dorioz et al., 2006). In particular, SWAT considers the impact of vegetative filter strips on dissolved phosphorus removal as a linear function of surface runoff reduction. Nonetheless, the corresponding regression model explains < 30% of the observed variability, while the empirical reduction efficiency ranges from 43% to −31% near zero runoff reduction (Dillaha et al., 1989). As a first step to accommodate BMP uncertainty, we thus propose a moderate enhancement with a stochastic time-invariant representation of BMP effectiveness in watershed models (Griffin, 1995), followed by the introduction of time-variant probability distributions for BMP life-cycle performance (Liu et al., 2018). The proposed stochastic augmentation would allow sampling over the uncertainty of BMP scenarios with Monte Carlo simulations, thereby providing a pragmatic tool to assess the likelihood of the achievability of the proposed nutrient-loading reduction goals. These probabilities can then be subjected to sequential updating through the iterative monitoring-modelling-assessment cycles of adaptive management, whereby our degree of confidence on the success of a selected BMP strategy can be refined.

Integration of economic values of ecosystem services with the Lake Erie modelling framework: an optional augmentation or an emerging imperative?
Ecosystem services are the benefits that humans directly or indirectly gain from ecosystem functions (Costanza et al., 1997). Viewing ecosystems as providers of economically valuable benefits to humans, the concept of ecosystem services effectively links their structural and functional integrity with human welfare. Lake Erie, in particular, provides numerous valuable benefits by supplying drinking water for approximately 11 million people, supporting a $50 billion industrial sector that encompasses tourism, boating, shipping, and fisheries, providing over 240,000 jobs to both the American and Canadian economies, and offering habitat for ecologically, culturally, and economically important biotic communities (Lake Erie Improvement Association, 2012). There is, however, a pressing need to collectively protect ecosystem services that are at risk in the current degraded state of Lake Erie. Given that environmental policy affects both the ecosystem state and the provision of services that human societies benefit from, we argue that the efficacy of local restoration efforts will be significantly enhanced by a rigorous framework that quantifies the economic benefits from a well-functioning ecosystem. Rather than solely acknowledging their vulnerabilities, the actual quantification of the value of ecosystem services is critical when considering trade-offs among diverse policies.
The rationale behind ecosystem valuation is to explicitly describe how human decisions affect ecosystem service values and to express these changes in monetary units that allow for their incorporation in the decision-making process (Pascual et al., 2010). Current markets provide information about the value of a limited subset of ecosystem services that are priced as commodities (Pascual et al., 2010), which poses challenges in our ability to estimate values of a comprehensive set of ecosystem services typically considered in the decision-making process (Millennium Ecosystem Assessment, 2005). To best communicate the trade-offs among policy choices, ecosystem service valuation must examine the marginal improvement in ecosystem services attributable to a policy change. For example, Isely et al. (2018) estimated that a $10 million investment to restore the Muskegon Lake Area of Concern would have a return on investment of approximately 6:1, or in other words, an added $50 million in environmental value over a 20-year period due to increased property prices and a more attractive recreational environment. Although extensive resources and capital are required to conduct ecosystem service valuation, the outcome of such an exercise places a premium on the communication of policy trade-offs in economic terms (commercial goods/services or non-market values such as the average consumer's willingness to pay), thereby increasing stakeholder engagement and societal relevance of conservation actions (Egoh et al., 2007).
Economic values of ecosystem services can help policy-makers determine the optimal degree of investment and action needed at each time step by defining the monetary trade-offs from different courses of management action (Fig. 7a). At the beginning of each restoration effort, the total returns and benefits are typically commensurate with the costs and investments, but this pattern may not hold true after a certain point, where we get diminishing (and ultimately negative) returns and marginal benefits. Viewed from this perspective, the question arising is how likely is it to experience environmental improvements proportional to the socioeconomic investments required (steep linear segment in Fig. 7a), given the presence of a wide array of feedback loops, ecological unknowns, and external factors (i.e., internal loading, dreissenid mussels, different trends between TP and DRP loading, changing climate and increased frequency of extreme events) in Lake Erie? Even more so, our analysis also highlights an additional layer of complexity that we need to factor in during the decision making process; namely, as we opt for drastic (and likely more costly) management actions that differ significantly from current conditions (right end of Fig. 7a), the forecasting error increases significantly (see Fig. 6c, d in Arhonditsis et al., 2019) and so does the likelihood of realizing benefits that are distinctly lower than our original investments. Do we have enough leeway to keep the investments to the environment going? While these assertions seem to paint a pessimistic picture about the challenges and associated risks with the next steps in Lake Erie, it is important to delve into (somewhat underappreciated) ideas, such as the Total Economic Value (TEV) of the ecosystem, the degree of our knowledge of the monetary value of ecosystem services in Lake Erie, and the mismatch between the scales where environmental goals are being set and the spatiotemporal domain that predominantly influences the perception of the public Ramin et al., 2018).
To facilitate ecosystem service valuation, the TEV framework can relate a wide array of ecosystem services to human well-being in monetary terms (Fig. 7b). Direct use values are derived from the uses made of Lake Erie's resources and services, such as drinking water and the natural environment for recreation, while indirect use values are associated with Lake Erie's natural functions, such as nutrient removal and ability to provide fish habitat, refugia, and nursery. TEV typology also helps to identify non-use values that are unrelated to present or future uses, but instead reflect the value associated with the Lake Erie's existence: option, bequest, and existence values (Gilpin, 2000). Option value is the willingness-to-pay a certain amount today to ensure the available use of a benefit provided by Lake Erie in the future. Bequest value refers to the willingness-to-pay to preserve Lake Erie for the benefit of other people, both in the present and future. Existence value is the value attached to knowing that Lake Erie and its benefits exist, even if the individual does not intend to ever actively use them. The Great Lakes, including Lake Erie, provide a wide array of ecosystem services, although they have yet to be comprehensively inventoried (Steinman et al., 2017). Efforts have been underway to rigorously assess the status of ecosystem services and facilitate future valuation studies in Lake Erie. Allan et al. (2017) mapped the distribution of ecosystem services in the Lake Erie basin, while Annis et al. (2017) delineated optimal areas for the conservation of multiple ecosystem services in the nearshore zone of western Lake Erie.
In this context, research on ecosystem service valuation in Lake Erie has concentrated in water quality improvements, erosion risk reduction, recreation, and recreational fishing (see Table 2 for details on the methods used for these valuation studies). Brox et al. (1996) conducted a contingent valuation survey to estimate the willingness to pay up to be $4.50 per household per month (19% of the average water bill) for residential water quality improvements. Likewise, Kriesel et al. (1993) found that the closer a lakefront property was to Lake Erie, the more the homeowner was willing to pay for to reduce the risk of damage from shoreline erosion. Building upon this finding, Dorfman et al. (1996) predicted that owners of high-risk properties would pay an average of $37,826 to effectively eliminate erosion risk, a fairly high amount given that the average selling property price in the study sample was $127,800 at that time. To estimate the value of reducing beach advisories in Lake Erie, Murray et al. (2001) surveyed visitors at 15 Lake Erie beaches in the summer of 1998 and estimated the average seasonal benefits of reducing one advisory to be $28 per visitor per year, while Chen (2013) more recently projected that day trips to a public Great Lakes beach (including Lake Erie) were valued at $32-39 per visitor. Importantly, Palm-Forster et al. (2016) estimated that a full-season closure for a single public beach in Lake Erie would result anywhere from $1.96 to $2.21 million depending on the valuation method used. Along the same line of evidence, Hayder and Beauchamp (2014) estimated that in 2018, the Great Lakes (including Lake Erie) provided approximately $7.76 billion in recreational benefits (recreational boating, wildlife viewing, and beach and lakefront use) and that value would increase to $354 billion after 50 years. In the same context, Kelch et al. (2006) found that angling in Lake Erie was valued at $36-46 per trip and also showed that an annual $0.6 million stocking program could result in a river steelhead fishery of $12-$15 million per year in Ohio. According to Sohngen et al. (2015a), angling trips in Lake Erie could be valued up to $88 per trip or total expenditures in the economy that amount to $67.1 million per year, while Wolf et al. (2017) estimated that fishing-license sales drop between 10% and 13% when algal conditions exceed the World Health Organization's risk-advisory threshold of 20,000 cyanobacteria cells mL -1 . The latter study concluded that a wide-scale, summer-long algal bloom in Lake Erie would reduce fishing licenses issued by 3600 and fishing expenditures by $2.25-5.58 million.
Ecosystem service valuation can facilitate the active involvement of stakeholders and allow for new insights and knowledge to be passed into the decision-making process. This can be particularly helpful in Lake Erie given its complex ecology and diverse stakeholder groups with divergent goals, priorities, and values (Egoh et al., 2007). Integrating scientific knowledge with ecosystem service values can Fig. 7. (a) Benefits accrued resulting from costs invested in an environmental restoration project. At the beginning of each restoration effort, the total returns and benefits are typically commensurate with the costs and investments, but this pattern may not hold true after a certain point, where we get diminishing (and ultimately negative) returns and marginal benefits. (b) Breakdown of Lake Erie's ecosystem services using the Total Economic Value (TEV) framework.

Table 2
Research on ecosystem service valuation concentrated on water quality improvements, erosion risk reduction, recreation, and recreational fishing in Lake Erie.  Dorfman et al. (1996); Kriesel et al. (1993) Travel-cost valuation (revealed preference): rationalizes that the value of a site is reflected in the willingness-to-pay the associated travel cost. Hayder and Beauchamp (2014); Palm-Forster et al. (2016) promote knowledge co-production and co-learning among technical experts, stakeholders, and decision-makers (Laniak et al., 2013). Fortunately, the wealth of watershed and aquatic ecosystem models in Lake Erie offers an excellent foundation upon which relationships among human actions, water quality trends, multiple ecosystem goods and services, and associated changes in values can be depicted (Fig. 8).
A characteristic example of the insights that could be gained by such an integrated modelling framework was the study presented by Roy et al. (2010Roy et al. ( , 2011, which examined the likelihood to find an optimal balance between the conflicting interests of two societal groups, "food producers" and "recreational water users", in Sandusky Bay. The latter group includes coastal homeowners, recreational lake users, and local firms that serve recreational lake users, whereas the former represents agricultural operations that generate revenues by activities that increased lake eutrophication. The proposed addition of a socioeconomic component to the existing integrated watershed-receiving waterbody models will allow the rigorous evaluation of conservation actions and identification of options that allocate financial incentives (direct payments, tax credits, insurance, and stewardship certification benefits) cost-effectively by funding practices with high predicted environmental benefits per dollar invested (Palm-Forster et al., 2017). Consistent with our criticism regarding the skill assessment of the existing modelling work against aggregated spatiotemporal (seasonal/ annual, basin-or lake-wide) resolutions , we also question the adequacy of the coarse scales selected to establish nutrient loading targets and water quality indicators in Lake Erie (Scavia et al., 2016b). This strategy is neither reflective of the range of spatiotemporal dynamics typically experienced in the system nor does it allow us to evaluate our progress with ecosystem services at the degree of granularity required to assess the public sentiment. It would seem paradoxical to expect a single-valued standard, based on monitoring and modelling of offshore waters, to capture the water quality conditions in nearshore areas of high public exposure (e.g., beaches). The degree of public satisfaction is primarily determined by the prevailing conditions at a particular recreational site and given date, and not by the average water quality over the entire basin (or lake) and growing season. In our view, the problems with the outdated practice to basing the water quality assessment on the offshore zone with a coarse time scale are twofold: (i) we cannot effectively track the progress with the response of the system, as it is not clear to what extent an incremental improvement in the open waters is translated into distinct changes in the nearshore; and (ii) the environment targets and decisions are implicitly disconnected from our aspiration to protect ecosystem services and gauge public satisfaction at the appropriate resolution. In the context of adaptive management, we believe that the critical next steps involve the determination of appropriate metrics and scales of expression along with the design of a monitoring program that will allow us to effectively track the progress of the system in both time and space (Table 1). Depending on the ERI considered, there are different areas for future augmentation in order to more comprehensively monitor the response of Lake Erie. In particular, the assessment of the trophic status may be more appropriate to revolve around extreme (or maximum allowable) phytoplankton or TP levels and must explicitly accommodate all the sources of uncertainty by permitting a realistic frequency of violations Arhonditsis et al., 2016). Rather than any type of data averaging, we advocate the assessment of compliance against the proposed probabilistic criteria using daily snapshots collected regularly from different sites during the growing season. The development of the "cyanobacteria index" is certainly useful, but given the technical limitations of the satellite images, we also need other cHAB proxy variables that will be collected regularly from the system, including toxins, e.g., Microcystin-LR (Kelly et al., 2019). The established thresholds for drinking water (1.5 μg L −1 ) and recreational purposes (20 μg L −1 ) offer easily defensible targets to track the frequency of compliance of Lake Erie in time and space. Regarding the hypoxia and Cladophora ERIs, given our limited mechanistic and quantitative understanding of the primary driving factors, we also propose the development of systematic records for variables that represent direct causal factors of the actual problem, such as phosphorus content in the Cladophora tissues, characterization of the organic matter and Fig. 8. Relationships among human actions, water quality changes, multiple ecosystem goods and services, and associated changes in values in Lake Erie. The proposed addition of a socioeconomic component to the existing integrated watershed-receiving waterbody models will allow the rigorous evaluation of conservation actions and identification of options that allocate financial incentives cost-effectively by funding practices with high predicted environmental benefits per dollar invested. (Modified from Keeler et al., 2012) phosphorus fractionation in the sediments, is the most prudent strategy to move forward.

Conclusions
With a wealth of models developed, the next steps of the modelling enterprise should be strategically designed to serve the aspiration of a sustainable resource management in Lake Erie. Rather than "reinventing the wheel" by building new models that bear significant similarity to the ones that are already in place (Mooij et al., 2010), it is critical to craft augmentations that will effectively complement the existing work. In particular, the presence of multiple SWAT applications provides assurance that a wide array of physical, chemical, and biological processes with distinct characterizations are considered to reproduce the patterns of flow and nutrient export in agricultural settings, like the Maumee River watershed. As noted in our companion paper , even though there are models with mechanistically more advanced representation of certain facets of the hydrological cycle (surface runoff, groundwater and sediment erosion) or better equipped to depict urban environments (e.g., MIKE SHE, SWMM), we believe that greater insights will be gained by revisiting several influential assumptions (tile drainage, fertilizer/manure application rates, land-use/land-cover data) and recalibrating the existing applications to capture both baseline and event-flow conditions and daily nutrient concentration (not loading) variability in multiple locations rather than a single downstream site. Of equal importance is to redesign the land-use management scenarios to accommodate recent conceptual and technical advancements of the life-cycle effectiveness of various BMPs, the variability in their starting operational efficiency, and differential response to storm events or seasonality. One of the focal points should also be the role of legacy P along with the hydrological and biotransformation mechanisms that modulate DRP loading trends. The assessment of the flow-concentration patterns for N species and the characterization of processes associated with the nitrogen cycle are still missing in the Lake Erie basin, even though nitrogen could be one of the regulatory factors of the downstream water quality conditions; especially with respect to the composition of the algal community Kim et al., 2019). Coupled with the mechanistic tools for the Maumee River watershed, empirical (SPARROWlike) models geared to depict basin-specific (rather than continental or regional) nutrient loading conditions can offer a multitude of complementary benefits, such as validate the spatial delineation of hotspots with higher propensity for nutrient export, narrow down the uncertainty of processes/fluxes parameterized by mechanistic models, and obtain predictive statements constrained within the bounds of databased parameter estimation.
Counter to the watershed modelling framework for the Maumee River watershed, the multi-model approach for Lake Erie included both data-oriented and process-based models to examine the ERI achievability under different nutrient loading conditions. The former models (UM-GLERL and NOOA Western Basin HAB models) established causal linkages between cHAB proxies and external phosphorus loading. Their foundation upon statistical parameter estimation allows for rigorous predictive uncertainty assessment, and thus they represent a pragmatic means to draw forecasts regarding the severity of cHABs. Two critical next steps to further augment the empirical modelling work is the iterative updating as more data are acquired through monitoring and the introduction of additional explanatory variables that likely favor the occurrence of cyanobacteria-dominated blooms. After all, while the availability of phosphorus may hierarchically be one of the primary conditions for cyanobacteria dominance, there are several other factors (e.g., nitrogen, iron, light availability, water column stability, and water temperature) that can ultimately determine the winners of the inter-specific competition within the phytoplankton assemblage (Kelly et al., 2019). Because the majority of the process-based models (ELCOM-CAEDYM, WLEEM, EcoLE) are far from being constrained by the available data, their primary use has been (and should continue to be) as heuristic tools to advance our understanding of the lake functioning (e.g., potential role of dreissenids, relative importance of meteorological forcing vis-à-vis nutrient availability on the severity of hypoxia), whereas their predictive power is still under question .
With respect to the load-response curves presented by Scavia et al. (2016a,b), the forecasting exercise related to the overall summer phytoplankton biomass in the western basin has a lot of potential to assist the local management efforts. The next augmentations should focus on the development of more reliable empirical model(s) that will connect chlorophyll a with a suite of significant predictors, and the advancement of the representation of several factors that could modulate the phytoplankton response to external nutrient loading reductions, such as the degree of reliance of phytoplankton growth upon internal nutrient sources (e.g., microbially mediated regeneration, P loading from the sediments), or the strength of top-down control . The coupling of empirical and process-based models to predict the cHAB likelihood of occurrence under reduced loading conditions offers a robust foundation to evaluate competing hypotheses and advance our knowledge on the suite of factors that may trigger cyanobacteria dominance in Lake Erie. However, as our companion paper underscored, it is important to recognize that the reported range of cumulative Maumee March-July loads of 890-1150 MT for achieving the cHAB target is likely narrow and does not capture the actual uncertainty with this ERI . We also remain skeptical with the optimistic projections of the extent and duration of hypoxia, given our limited knowledge of the sediment diagenesis processes in the central basin and the lack of data related to the vertical profiles of organic matter and phosphorus fractionation or sedimentation/burial rates. Without this piece of information is practically impossible to quantitatively characterize feedback loops of elevated internal loading and sediment oxygen demand, even when the prevailing conditions in the water column are improved, and thus offer strategic foresights into the likelihood to experience a delayed response of the sediments to reduced nutrient loading. It is important to keep in mind that one of the pillars of adaptive management is resilience thinking by monitoring existing problems, highlighting emerging threats, and redefining the research agenda (Cook et al., 2014;Johnson et al., 2013). In terms of the beach fouling by Cladophora blooms, the current modelling efforts have been insightful but further enhancement of their predictive value requires a high-resolution study of the northeastern nearshore zone to elucidate the relationships among abiotic conditions, internal P content, and sloughing rates in the local mats.
From a management standpoint, it is important to note that the complex mechanistic models are an absolutely worthwhile activity and will continue to assist the on-going management efforts in a meaningful way. Consistent with Anderson et al. (2006) views, we believe that prediction is not everything. Even if the structure of complex mathematical models reduces their forecasting power or even the ability to conduct rigorous uncertainty analysis, they still offer excellent platforms to gain insights into the direct, indirect, and synergistic effects of the ecological mechanisms forming the foundation of system behavior (Arhonditsis, 2009). For example, the virtual 3-D environment created by ELCOM-CAEDYM and/or WLEEM can offer a convenient platform to reconcile the coarse-scale (practically offshore) predictions, required to assess the ERI achievability, with the granularity that necessitates to elucidate nearshore processes and associated ecosystem services. Even more so, their dynamic integration with the watershed modelling framework will allow to trace the fate of nutrients and suspended solids transported by the Maumee River (and other major tributaries), and generate hypotheses about their impact on the timing and locations where structural shifts in the algal assemblage may occur. Furthermore, being an integral part of the iterative monitoring-modelling-assessment cycles, the foundation of the mechanistic modelling work in Lake Erie can be optimized through reduction of the uncertainty of critical ecological processes or refinement of their structure (e.g., mathematical reformulation of highly sensitive terms, exclusion of irrelevant mechanisms and inclusion of missing ones), thereby augmenting their ability to support ecological forecasts (Arhonditsis et al., 2007). It is thus critical that one of the priorities of the research agenda should be to maintain the ensemble character of the modelling work in Lake Erie. The wide variety of models that have been developed to understand the major causal linkages/ecosystem processes underlying the local water quality problems are a unique feature that should be embraced and further consolidated.
Our analysis questioned the adequacy of the coarse spatiotemporal (seasonal/annual, basin-or lake-wide) scales characterizing both the modelling enterprise and water quality management objectives in Lake Erie. More than anything else, this strategy seems somewhat disconnected from the ecosystem services targeted under Annex 4 of GLWQA. In the same context, we argued that ecosystem service valuation can facilitate the decision-making process by identifying costeffective restoration actions, as we track the evolution of the system over time. While adaptive management and ecosystem service valuation are not typically used together to support the decision-making process, they are exceptionally complementary. Both approaches assess ecological systems empirically and are policy-oriented as they describe management implications for stakeholders (Epanchin-Niell et al., 2018). To advance the operationalization of this integrative approach will however require greater interaction among different types of experts of methods, models, and data in social, economic, and environmental sciences. Applying an integrated adaptive management-ecosystem services framework places a premium on articulating policy trade-offs, and therefore has the potential to facilitate the management decisions in the face of uncertainty.