Groundwater sustainability: a review of the interactions between science and policy

Concerns over groundwater depletion and ecosystem degradation have led to the incorporation of the concept of groundwater sustainability as a groundwater policy instrument in several water codes and management directives worldwide. Because sustainable groundwater management is embedded within integrated, coevolving hydrological, ecological, and socioeconomic systems, implementing such policies remains a challenge for water managers and the scientific community. The problem is further exacerbated when participatory processes are lacking, resulting in a communication gap among water authorities, scientists, and the broader community. This paper provides a systematic review of the concept of groundwater sustainability, and situates this concept within the calls from the hydrologic literature for more participatory and integrated approaches to water security. We discuss the definition of groundwater sustainability from both a policy and a scientific perspective, tracing the evolution of this concept from safe yield to sustainable groundwater management. We focus on the diversity of societal values related to groundwater sustainability, and the typology of the aquifer performance and governance factors. In addition, we systematically review the main components of an effective scientific evaluation of groundwater sustainability policy, which are multi-process modeling, uncertainty analysis, and participation. We conclude that effective groundwater sustainability policy implementation requires an iterative scientific evaluation that (i) engages stakeholders in a participatory process through collaborative modeling and social learning; (ii) provides improved understanding of the coevolving scenarios between surface water-groundwater systems, ecosystems, and human activities; and (iii) acknowledges and addresses uncertainty in our scientific knowledge and the diversity of societal preferences using multi-model uncertainty analysis and adaptive management.
Although the development of such a transdisciplinary research approach, which connects policy, science, and practice for groundwater sustainability evaluation, is still in its infancy worldwide, we find that research towards groundwater sustainability is growing at a much faster rate than groundwater research as a whole.


Introduction
One of the most pressing global needs is to ensure water security in the face of a growing human population, agricultural expansion, and climate change (Vörösmarty et al 2010, Wheater and Gober 2015). Groundwater is the world's largest distributed store of fresh water that is central to sustaining ecosystems and enabling human adaptation to climate variability and change (Taylor et al 2013). Sustainable management of groundwater resources is particularly critical, with 50% of the world's drinking water and 43% of irrigation sourced from aquifers (Siebert et al 2010, IUCN 2016). Yet as a common pool resource that sustains humans and ecosystems (Ostrom 1990), groundwater is often subject to unsustainable levels of exploitation and depletion (Custodio 2002, Rodell et al 2018, Bierkens and Wada 2019, de Graaf et al 2019). This can increase with a growing demand for food (Mclaughlin and Kinzelbach 2015). Due to intensive groundwater exploitation, saltwater intrusion and land subsidence may become serious concerns in some areas (Galloway and Burbey 2011, Michael et al 2017), among other environmental threats and geo-hazards as discussed by Bierkens and Wada (2019). Additionally, these emerging threats to groundwater often disproportionately impact the poor (UPGro 2017). To cope with these threats, and enhance the utility of this vital natural resource, sustainability often enters the groundwater literature and practice via the concepts of safe yield, sustainable yield, sustainable groundwater development, and sustainable groundwater management. We utilize the term groundwater sustainability to refer to these four concepts interchangeably, unless otherwise specified. Groundwater sustainability depends on the environment, varying globally from arid to humid (Cuthbert et al 2019), and accordingly societal preferences change from quantity to quality. Additionally, not all aquifers are renewable at a human time-scale as discussed in section 2.
Generally, groundwater sustainability can be defined as 'maintaining long-term, dynamically stable storage [and flow] of high-quality groundwater using inclusive, equitable, and long-term governance and management' (Gleeson et al 2020).
Groundwater sustainability is increasingly being incorporated into groundwater policies, laws, and regulations in a number of places around the world, such as Australia (Quevauviller et al 2016), British Columbia (Ohdedar 2017), California (Grabert et al 2006, Owen et al 2019), France, Germany (Knüppe et al 2016), Hawaii (Sproat 2009), Massachusetts (Levangie 2008), the Netherlands (Lijzen et al 2014), and South Africa (Seward 2010), among other places as reviewed by Kalf and Woolley (2005). As a policy instrument, the goals might be variable, but often aim to prevent groundwater overdrafting, and might include measures to ensure water supply into the future, or to protect groundwater dependent systems (Kalf and Woolley 2005, Pierce et al 2013, Milne-Home 2016). However, the lack of transdisciplinary communication has a profound influence on the interactions between groundwater sustainability policy and implementation (Pandey et al 2011, Bakker 2012, Unver et al 2017). Despite being a policy instrument in several water codes and directives, the operationalization of groundwater sustainability policy in a dynamic and interconnected way remains problematic for both groundwater managers and the scientific community. For example, Owen et al (2019) note that while the California Sustainable Groundwater Management Act acknowledges surface water-groundwater interconnections, California will need years to reconcile legal and management systems that have spent decades in artificial separation. Similarly, Seward (2010) states that while the South Africa Water Act is widely regarded as one of the most progressive pieces of environmental legislation in the world, misunderstanding still abounds regarding its environmental aims. Also, Rejman (2007) concludes that the EU Water Framework Directive contributes little to the improvement of groundwater sustainability, without practical and factual groundwater management at an operational level.
Even in the Murray-Darling Basin, where the Australian water reform agenda has successfully returned overexploited aquifer systems to environmentally sustainable levels of withdrawal, several communities do not yet have full confidence in water plans or their processes (Jackson et al 2012).
Effective operationalization of groundwater policy related to groundwater sustainability requires developing a solid conceptual foundation for transferring scientific knowledge into societal decision making (Maimone 2004, Archfield et al 2010, Pierce et al 2013). This necessitates careful consideration around three main challenges. First, the science of groundwater sustainability, which includes both the natural and social sciences, is complex. Groundwater management is embedded within coevolving biophysical and socioeconomic systems, which are difficult to fully capture in a groundwater modeling framework. Understanding and modeling the coevolution of societies with water resources systems, ecosystems, and their interactions with the climate is a complex transdisciplinary problem that involves physical, socioeconomic, technological, and institutional aspects. Although there are many recent calls for such interlinkages to be incorporated into water sustainability agendas (Montanari et al 2013, Thompson et al 2013, Brown et al 2015, Sivapalan and Blöschl 2015, Wheater and Gober 2015), this type of research is only beginning to emerge within the groundwater literature. Second, these integrated models incorporating both the natural and human aspects of complex and […] (Wheater and Gober 2015, Maier et al 2016). Third, there is often a communication gap between the academic community, decision makers, and practitioners, which makes the scientific output less demand-driven (Bakker 2012). This article is a review of the concept and evolution of groundwater sustainability within science and emerging policies to highlight the science-policy gaps in the operationalization of groundwater sustainability policy.
In response to the aforesaid three challenges, we propose that effective operationalization of groundwater sustainability policy requires a science-policy interface that is deeply participatory; considers multiple uses of water by people and ecosystems; and effectively communicates uncertainty. Drawing on emerging case studies, the article provides details about the transdisciplinary groundwater management process to operationalize groundwater sustainability policy, and identifies knowledge gaps. The remainder of the review article consists of three parts. The first part attempts to define groundwater sustainability by showing the evolution of this concept (section 2), and then discusses the science-policy interface (section 3). The second part presents our systematic review method and results, highlighting recent trends in groundwater sustainability literature (section 4). The third part reviews the scientific evaluation of groundwater sustainability, focusing on the three essential components of this process. These three components are multiprocess modeling that includes hydrological modeling, ecosystem services modeling, and human activities modeling (section 5), uncertainty analysis (section 6), and participation (section 7). Finally, in section 8 we conclude the article with a discussion of directions for future research, based on the challenges and gaps that we identified in the literature reviewed.

Diverse perspectives on groundwater sustainability
Among the first attempts to understand groundwater sustainability is the introduction of the term 'safe yield' that is defined by Lee (1915) as 'the limit to the quantity of water which can be withdrawn regularly and permanently without dangerous depletion of the storage reserve.' The term and its definition have undergone many changes and transitioned to 'sustainable yield' as discussed by several studies (Kalf and Woolley 2005, Pierce et al 2006, Dzurik et al 2018). Yet despite the large number of papers regarding this topic, the safe yield definition remains elusive. 'Safe yield' is in the eyes of the beholder. As noted by Thomas and Harold (1951, p.262) 'safe yield is an Alice-in-Wonderland term which means whatever its user chooses.' This subjective understanding emerges due to the presence of diverse actors with different objectives, even though the principles of groundwater flow are scientifically well established. To further illustrate the inherent uncertainty and complexity of 'safe yield' with its transition to 'sustainable yield' and more generally to groundwater sustainability, Rudestam and Langridge (2014) attempt to catalogue different interpretations of the subject from the perspectives of different actors including academics, the courts in groundwater adjudications, state agencies, and local water practitioners as summarized in figure 1. However, these diverse perspectives share some common underlying themes that can be consolidated into three primary interrelated constituents of the groundwater sustainability concept: societal values, aquifer performance and governance factors, and groundwater sustainability criteria. These constituents are the subject of this section. This shall serve as an introduction to the main goal of the review article, which is to synthesize peer-reviewed and grey literature to illustrate the basic components of scientific evaluation to implement a groundwater sustainability policy.
The development of the concept of groundwater sustainability has come a long way. Starting with the concept of 'sustainable yield', various studies (Maimone 2004, Alley and Leake 2005, Rudestam and Langridge 2014) provide a historical context of the current notion of sustainable yield. At its core, defining the limits of groundwater sustainability in a manner capable of informing management decisions was first introduced via the concept of safe yield by Lee (1915). In that initial definition, the concept of safe yield was defined as any rate of water pumping that is less than or equal to recharge under steady state conditions, regardless of the role of discharge from the aquifer, and adhered to quantifiable parameters in groundwater hydrology. This concept, however, defies conservation of mass and is referred to in the hydrological literature as the 'Water Budget Myth' (Bredehoeft 2002, Devlin and Sophocleous 2005). More correctly, Theis (1940) concludes that groundwater pumping will be balanced by a loss of water elsewhere, largely from storage, and possibly from induced recharge, reduced discharge, or both. Subsequently, there has been a shift from the use of the concept of 'safe' to 'sustainable' yield (Alley and Leake 2005), with the emphasis that groundwater systems are embedded within the broader hydrological system (Alley et al 1999). Through the development of the technical understanding of critical linkages between groundwater and other management components, the concept of sustainable yield evolved to include economic feasibility, water quality degradation, water rights, and other factors piecewise (Kalf and Woolley 2005, Alley and Leake 2005). In the case of California, for example, a transition away from the use of the phrase 'safe yield' to increased use of 'sustainable yield' and 'sustainable groundwater management' in state documents may reflect progress towards an integrated, whole-systems view of groundwater management (Grabert et al 2006).
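The mass-balance argument behind the 'Water Budget Myth' can be written out in a few lines. The following is a minimal sketch rather than a formulation taken from the cited studies; the symbols $R_0$, $D_0$, $\Delta R$, $\Delta D$, $S$, and $Q$ are notation assumed here for illustration.

```latex
% Assumed notation (not from the reviewed papers):
%   R_0, D_0 : pre-development recharge and discharge
%   \Delta R : pumping-induced increase in recharge
%   \Delta D : pumping-induced decrease in discharge
%   S        : groundwater storage,  Q : pumping rate
\begin{align}
  R_0 &= D_0
    && \text{(pre-development steady state)} \\
  (R_0 + \Delta R) - (D_0 - \Delta D) - Q &= \frac{dS}{dt}
    && \text{(aquifer water budget with pumping)} \\
  Q &= \Delta R + \Delta D - \frac{dS}{dt}
    && \text{(sources of pumped water)} \\
  Q &= \Delta R + \Delta D
    && \text{(new steady state, } dS/dt = 0\text{)}
\end{align}
```

Under these assumptions $R_0$ cancels out of the budget, so the pumping rate that can be sustained at a new steady state is set by the induced recharge and reduced discharge (the capture), not by the pre-development recharge. Until that state is reached, the difference is supplied from storage ($dS/dt < 0$), which is consistent with the conclusion of Theis (1940) summarized above.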
This encompassing concept of sustainable groundwater management is defined in a seminal article by Gorelick et al (2015) as follows: A sustainable groundwater system is one in which pumping can safely continue indefinitely. If water managers adopt the definition of safe yield as the maximum prolonged pumping such that all logistic, environmental, legal, social, economic, and physical constraints are met, then sustainable groundwater use solutions can be identified. However, the other essential requirement is a complete understanding of the future hydrogeologic system, including ultimate long-term capture of surface waters and rejected recharge, as well as water quality degradation.
It took nearly a century for the safe yield concept to develop from Lee (1915) to Gorelick et al (2015). Yet the '[term] "Yield" (with safe or sustainable as a modifier) has an inherently exploitative connotation while "sustainable groundwater management" or "groundwater sustainability" is more clearly and inherently participatory and progressive' (Gleeson 2020). This transition has occurred in the academic literature and various laws and policies, such as the EU Water Framework Directive (WFD 2000, GWD 2006), the Sustainable Groundwater Management Act (CWC 2014) in California, and the Water Sustainability Act (SBC 2014) in British Columbia (Gleeson 2020). With participation, adaptive governance, and more rigorous representation of groundwater sustainability criteria, the concept of sustainable yield is further developed to sustainable groundwater development and management as shown in section 2.4.
However, Bredehoeft and Alley (2014) highlight how practical realities in the implementation of groundwater development and management make a 'true estimate' of sustained resource availability difficult to obtain in practice. This is mainly due to the existence of different perspectives on groundwater sustainability by different actors, and the diversity of interpretations of groundwater resources utility and quality through different societal value lenses. In addition, there exist complex interlinkages between multiple factors related to both aquifer performance and governance. Moreover, the presence of several aquifer yield terms such as safe yield, sustainable yield, perennial yield, renewable yield, consensus yield, operational yield, management yield, optimal pumping considering private and societal costs, optimal yield and many more as reviewed by Hata (1998) can create further misunderstanding. Molle (2011) expresses this underlying ambiguity stating that: water allocation [is] very much of [sic] a zero-sum game. Your benefit here is likely to be my cost there; ones's [sic] short-term use here conflicts with next generations' use there; what is safe for you is unsafe for me; impacts may appear negligible to me [sic] but not to another beholders's [sic] eyes, etc. In other words, because of the fluid nature of water [sic] my use, right, vision or values are not independent from those of other people equally connected to the same hydrologic regime. Groundwater use appears intricately linked to this wider cycle and inevitably speaks to issues of rights, equity, economic efficiency and environmental values.
With this debate continuing, a general consensus is emerging that sustainable groundwater management requires a stakeholder-driven approach from both the policy and science sides (Molle 2011, Pierce et al 2013, Rudestam and Langridge 2014, Alley 2018) with a more holistic systems-perspective (Molle 2011, Pierce et al 2013, Gleeson et al 2020). Before discussing the scientific challenges of groundwater sustainability evaluation, we first discuss societal values related to groundwater sustainability.

Diversity of values when debating groundwater sustainability
As noted by Gober et al (2014) the policy process is a debate about core human values and their meanings. This debate is one of the twenty-three unsolved problems in hydrology, which was formulated from a community perspective as: 'What are the synergies and tradeoffs between societal goals related to water management?' (Blöschl et al 2019). Figure 2 shows a classification of values related to groundwater sustainability that undergird these goals. Defining the core values of the decision context is generally debated and contested among different actors during the groundwater sustainability evaluation process (Baldwin et al 2012, Wiek and Larson 2012, Griffioen et al 2014). While this debate can be dialectic, arguing for instrumental values (i.e. the environment is important for human activities) or for intrinsic values (i.e. the environment has value independent of human activities) (Tallis and Lubchenco 2014), Chan et al (2016) argue that we need to rethink environmental values beyond this dichotomy to a third class that is relational values.
Relational values refer to individual and collective identity, relation, dependence, attachments, and responsibilities to a place (Chan et al 2016, Rudestam et al 2018). They can also be guided by norms, virtues, care, core-values, spiritual well-being, local narratives, traditional knowledge, and principles associated with place. Relational values also include cultural and environmental heritage sites. For example, Piscopo et al (2019) evaluate sustainable yield to ensure a significant flow to the natural thermal springs that are a heritage site. Relational values also capture rights of indigenous peoples and communities whose definition of water security is often grounded in traditional or spiritual values (Wheater and Gober 2015). Rudestam et al (2018) note that relational values go beyond materialistic theories of place to additionally account for humanistic theories of place, which allow for a better understanding of groundwater conservation ethics. Rudestam et al (2018) further highlight that metrics utilized by water managers and government agencies typically focus primarily on economic aspects and aesthetic qualities of place rather than the societal and psychological ways that residents engage with places. Understanding place identity and dependence is imperative for sustainable groundwater management. For example, Fernald et al (2015) show that the community cohesion in the valleys of northern New Mexico is maintained by the value of attachment to place derived from traditional irrigation and local farming culture; and this in turn maintains groundwater recharge that is important for groundwater dependent ecosystems. Similarly, Rudestam et al (2018) suggest that debating relational values provides a deeper understanding of why particular groundwater management practices occur.
Such understanding can, for example, aid in shaping future sustainability efforts (Rudestam et al 2018), and in the implementation of groundwater conservation practices related to climate change adaptation (Sanderson and Curtis 2016).
Aesthetic values, when incorporated into the design and management of natural and engineered systems, add a dimension of beauty that goes beyond mere functionality. For example, the Hawaii State Water Code considers maintaining scenic beauty a critical component of sustainable yield management, stating that 'adequate provision shall be made for … the maintenance of proper ecological balance and scenic beauty' (HRS chapter 174C 1987). In practice, societal preferences regarding aesthetic qualities can be accounted for using coupled behavioral-hydrological models to inform water policy (Conrad and Yates 2018). For more details, a recent review article by Roobavannan et al (2018) discusses the incorporation of norms, values, and social science insights into hydrological models.
A fifth class of values is equity, which centers around fair allocation of groundwater resources with respect to competing interests (Kumar et al 2011, Reddy et al 2014, Srinivasan and Kulkarni 2014, Farhadi et al 2016, Kumar 2016), and as related to social justice. For example, UPGro (2017) show that changes and emerging threats to groundwater access, quality, and quantity are likely to disproportionately impact the poor, and improved groundwater access can confer a variety of benefits for the poor, and drive long-term changes in poverty trajectories. There are generally two types of equity considered in water policy design, which are intra-generational and inter-generational equity. The difference between these two types is not merely semantic (Mckay 2011, Larson et al 2013). For example, Zagonari (2010) shows that certain policy measures such as groundwater subsidies that are beneficial for the current generation can be detrimental for future generations. Thus, the future generations need to be explicitly represented (e.g. via representatives who defend the anticipated interests of future generations) in the water governance process (Wiek and Larson 2012). Because intra-generational equity is evaluated under shorter time horizons, policies can be developed and implemented by backcasting and adaptive management; the process of backcasting entails defining a desirable future and developing plans to connect that predefined future to the present. However, inter-generational equity generally incorporates more complex values to discuss. This is mainly because the integration of a multi-generational perspective requires a time frame of 50 to 100 years that is often longer than the policy timeframe (<50 years), but shorter than the groundwater residence time. Note that groundwater residence time refers to the time required for the aquifer system to reach another steady state, and ranges from tens to millions of years.
Managing groundwater resources to protect and improve public health (e.g. Lapworth et al 2019) is another core value as discussed by Gorelick et al (2015). Moreover, public health can be closely tied to equity (Lele 2017).
Resilience is another class of values, which is more pragmatic in nature and is extensively addressed in groundwater sustainability evaluation studies. Resilience with respect to groundwater sustainability involves coping with externalities and crises, and returning to the pre-crisis status, or taking pro-active measures to mitigate the crisis. Crises related to groundwater include sea-level rise (Pulido-Velazquez et al 2018, Ha et al 2018b), severe drought (Leblanc et al 2012, Ranjan 2013, Gober et al 2016, Han et al 2017), and land subsidence and sinkholes (Calderhead et al 2012, Ruiz-Constan et al 2018, Wang et al 2018). While resilience is generally a major concern for stakeholders, the trade-off between economic payoffs and resilience can result in conflict (Katic and Quentin Grafton 2011, Tortajada et al 2017).
Another core value is consensus, which many water policies (e.g. the EU Water Framework Directive, the U.S. Federal Clean Water Act, the Sustainable Groundwater Management Act in California, the Australia water reform agenda, the Water Sustainability Act in British Columbia, etc) emphasize in groundwater management. Consensus involves reaching an agreement among different actors with the objective of not only improving groundwater resources management, but also enabling individuals and groups to participate freely and equally in management (Carr et al 2012). In the case of transboundary basins, consensus building can also involve multiple neighboring countries and stakeholders, which necessitates discussing nationally sensitive issues and engagement with those who may have differing ideas (Lee et al 2017, Gleeson et al 2020). Another important component of consensus building is indigenous rights. The 2007 United Nations Declaration on the Rights of Indigenous Peoples stipulates the right to self-determination, which includes the right of indigenous peoples worldwide to freely determine their political status and freely pursue their economic, social, and cultural development. As noted by Wheater and Gober (2015) respecting indigenous peoples' right to exercise self-determination requires policy reform and social efforts beyond the traditional collaborative process.
Given these diverse classes of values, the formulation and implementation of a groundwater sustainability policy is an opportunity to unite actors who otherwise would be separated by differing environmental and societal values; Gober et al (2014) come to a similar conclusion on the issue of water security. As noted by Custodio et al (2019), aquifer sustainability is not a purely scientific and technical issue, but the result of societal preferences and decisions. This understanding is explicitly expressed in groundwater sustainability policies. For example, the Australian National Groundwater Committee (NGC) defines sustainable groundwater yield as 'the groundwater extraction regime, measured over a specified planning time frame, that allows acceptable levels of stress and protects dependent economic, social, and environmental values' (NGC 2004). Explicit inquiry into the societal values motivating groundwater use and management is critical to groundwater sustainability (Johnston et al 2002, Bremer et al 2018, Lauer et al 2018).

Typology of the groundwater sustainability factors
Scientific evaluation of groundwater sustainability addresses multiple aquifer performance and governance factors. The debate about groundwater sustainability centers around understanding the hydrological fundamentals, while remaining cognizant of the science-policy interface (Sophocleous 2000, Asefa et al 2014, Jorgensen et al 2017). The aim is generally to define clear groundwater sustainability factors that can be measured and assessed. For example, Pierce et al (2013) present a conceptual typology that identifies six key aquifer yield factors derived from the literature: (1) recharge rates and storage conditions, (2) water quality, (3) discharge rates and environmental flows, (4) legal constraints, (5) economic feasibility, and (6) equity. Given these factors, multiple groundwater sustainability indicators can be formulated. For example, Henriksen et al (2008) develop four indicators that relate pumping to recharge, river runoff, and baseflow given current and pre-pumping conditions. Srinivasan et al (2017) categorize these factors into four main systems, which are the (1) natural groundwater system with other coevolving ecological systems, (2) infrastructure system that includes groundwater pumping and injection facilities and technologies, (3) socioeconomic system that includes societal values and preferences related to water use, and (4) institutional system that sets rules to decide who is permitted to pump how much water for what purpose. Figure 3 presents eight groundwater sustainability factors, updated from the six sustainable yield factors identified by Pierce et al (2013), and shown within the four system categories established by Srinivasan et al (2017).
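To make the idea of a measurable indicator concrete, the sketch below computes simple ratios that relate pumping to recharge, river runoff, and baseflow under current and pre-pumping conditions. It is only an illustration of the general form such indicators can take: the class name, field names, and the four ratios are assumptions made here, and they are not the indicator definitions or thresholds of Henriksen et al (2008).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AquiferBudget:
    """Hypothetical annual water-budget terms, all in the same volume unit."""
    pumping: float
    recharge_current: float
    recharge_prepumping: float
    river_runoff: float
    baseflow_current: float
    baseflow_prepumping: float


def indicator_ratios(b: AquiferBudget) -> dict:
    """Return illustrative, dimensionless pumping-related ratios.

    These ratios are assumptions for illustration only; acceptable values
    would have to come from the societal and policy process.
    """
    return {
        "pumping_to_current_recharge": b.pumping / b.recharge_current,
        "pumping_to_prepumping_recharge": b.pumping / b.recharge_prepumping,
        "pumping_to_river_runoff": b.pumping / b.river_runoff,
        # Fraction of pre-pumping baseflow that has been lost.
        "baseflow_reduction": (b.baseflow_prepumping - b.baseflow_current)
        / b.baseflow_prepumping,
    }


if __name__ == "__main__":
    # Made-up numbers, chosen only to exercise the function.
    budget = AquiferBudget(
        pumping=25.0,
        recharge_current=100.0,
        recharge_prepumping=125.0,
        river_runoff=250.0,
        baseflow_current=40.0,
        baseflow_prepumping=50.0,
    )
    for name, value in indicator_ratios(budget).items():
        print(f"{name}: {value:.2f}")
```

Keeping the ratios separate from any acceptance threshold reflects the point made throughout this section: what counts as an acceptable level of stress is a societal decision, not a property of the calculation.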
The eight factors related to the evaluation of groundwater sustainability, which are presented in figure 3, capture aquifer performance and governance. The first factor accounts for recharge rates and storage conditions, which includes climate and land-use change impacts on recharge, lowering of groundwater levels, and reductions in groundwater storage. This also includes increasing the recharge rate by changing agricultural technologies and practices (Qin et al 2013) and land use (Hyndman et al 2017) as discussed in section 5.3. Water quality, the second factor, includes point and non-point source contamination, unreasonable saltwater intrusion, and mobilization of heavy metals, such as arsenic. The third factor is related to discharge rates and environmental flow. This includes groundwater capture from surface water bodies, spring discharge, and submarine groundwater discharge with their associated beneficial uses detailed in section 5.2. The performance of an aquifer in relation to natural hazards such as land-subsidence, sinkholes, triggering earthquakes, sea-level rise, and severe drought is an additional factor to the six factors listed by Pierce et al (2013). Note that severe drought entails a non-stationary externality (Milly et al 2008, 2015) as opposed to managing the aquifer under stationary conditions.
Groundwater related infrastructures that enable pumping, injection, monitoring, treatment, and distribution are directly related to aquifer performance. This factor also includes facilities and technologies that can support and improve aquifer performance such as water transfer projects (e.g. Scanlon et al 2007), reservoirs for drought (Langridge and Daniels 2017), environmental flow maintenance (Shi et al 2012), desalination and water recycling (e.g. Badiuzzaman et al 2017), and conjunctive use and managed aquifer recharge (e.g. Harou and Lund 2008, Scanlon et al 2016), among others. Aquifer governance includes three main factors as shown in figure 3. Water governance comprises the processes of decision-making regarding resource goals, and the rules and practical measures defined to meet those resource goals (Gleeson et al 2020). The legal and institutional system is a central component in groundwater governance. Several recent studies and reports (Foster et al 2010, UN FAO 2016, Villholth and Conti 2018, Gleeson et al 2020) provide an overview on the structure and role of effective legal and institutional groundwater governance frameworks for promoting groundwater sustainability. We briefly discuss a few legal and institutional constraints that can be directly related to the evaluation of groundwater sustainability to operationalize a related policy. Legal and institutional constraints include water rights such as riparian, prior appropriation, common-pool resources, indigenous rights, or other doctrines. For example, Sophocleous (2012) suggests that improving the prior-appropriation framework and broadening the definition of 'beneficial use' under the Kansas Water Resources Appropriation Act of 1945 is a needed step toward groundwater sustainability.
Similarly, Llamas et al (2015) show that with respect to the implementation of the EU Water Framework Directive 2000 in Spain the overall groundwater management is still chaotic since groundwater remains in private hands instead of being in the public domain, and water planning relies on concessions. Legal and institutional constraints also include restrictions on production well locations and depths (e.g. Yihdego and Drury 2016, Zhang et al 2016) or specific activities such as irrigation groundwater use (Piscopo et al 2019). This factor also includes regulations related to water efficiency and conservation such as incentives or tariffs (e.g. Downward and Taylor 2007, Fishman et al 2015), and connection between surface water and groundwater (De La Hera et al 2016, Owen et al 2019). Legal and institutional constraints also include groundwater sustainability, no-overdraft, and similar policies related to aquifer management. Note that groundwater overdraft is a form of overexploitation that causes groundwater depletion, and occurs when extraction exceeds both natural and induced aquifer recharge over the long period required for the recharge and discharge of the aquifer to adjust (Harou and Lund 2008). Groundwater depletion occurs when prolonged (multi-annual) extraction causes persistent head declines in renewable aquifers or the mining of fossil aquifers translating into a reduction of aquifer volume or in the usable volume of fresh groundwater within an aquifer (Konikow and Kendy 2005).
The socioeconomic system contains two main factors related to aquifer governance as shown in figure 3. Societal values and preferences include the trade-off between the core values (section 2.2) that can be described and elicited by multiple approaches as discussed by Reichert et al (2015) and in section 7.0. Finally, economic feasibility includes pumping cost, groundwater substitution cost (e.g. by desalination or treated surface water), induced recharge cost, and other costs related to groundwater development. This factor is important because it allows for the comparison of environmental and socioeconomic benefits of candidate management strategies to select the best alternative by which groundwater sustainability can be attained (Harou and Lund 2008, Shi et al 2012). From the policy side, a subset of these factors is typically included in groundwater sustainability policies. For example, the California Sustainable Groundwater Management Act defines groundwater sustainability as 'sustainable yield: the maximum quantity of water calculated over long-term conditions in the basin, including any temporary excess that can be withdrawn over a year without an undesirable result; sustainable groundwater management: the management and use of groundwater that can be maintained without causing an undesirable result; Undesirable results include any of the following: Persistent lowering of groundwater levels, significant and unreasonable reductions in groundwater storage, significant and unreasonable saltwater intrusion, significant and unreasonable degradation of water quality, significant and unreasonable land subsidence, surface water depletion having significant and unreasonable effects on beneficial uses' (Cal. Water Code § 10721(v)). 
From the science side, a combination of these factors can be presented as groundwater sustainability objectives for case-specific studies on groundwater management, or for more general groundwater management across cases as illustrated in section 5.

Defining groundwater sustainability
Given the eight sustainability factors, groundwater sustainability can be evaluated as a function of aquifer performance and aquifer governance components. With respect to aquifer performance, the definition of groundwater sustainability has been extensively debated in the literature (Maimone 2004, Kalf and Woolley 2005, Rudestam and Langridge 2014). There have been transitions from safe yield to sustainable yield to sustainable groundwater development (Smith et al 2010), which all fall under the umbrella of sustainable groundwater management (Sikdar 2019) as shown in figure 4.
Starting with safe yield, a common practice is to utilize recharge as the criterion for addressing groundwater sustainability policies. In this case, groundwater development is considered to be safe if the pumping rate does not exceed the rate of natural recharge. However, the idea that, over a long period, pumping rates can be equal to groundwater recharge without causing negative consequences is referred to as the 'water budget myth' (Bredehoeft 2002, Devlin and Sophocleous 2005). Under the concept of sustainable yield, other researchers have proposed groundwater capture rather than recharge as the conceptual basis for sustainable groundwater use. Such capture is defined as the sum of the increase in recharge and decrease in discharge caused by pumping (e.g. Barlow et al 2018, Seward et al 2006, Zhou 2009). In this case, sustainable water use has nothing to do with natural recharge (before pumping), but depends on groundwater capture induced by pumping. As indicated by Konikow and Leake (2014), capture is a critical factor in assessing sustainability of groundwater development because it influences the water budget, groundwater storage depletion, and ecosystem services. However, Henriksen et al (2008) note that recharge measurements are necessary because sustainability is broader than just sustainable pumping such that recharge will have long-term effects on water quality, ecological, and socioeconomic factors. Thus, we agree with Zhou (2009) that it is safe to assume that both natural recharge and dynamic development of the capture determine the groundwater sustainability of a groundwater basin. An accurate approach would be site-specific and problem-specific, and should include all surface and subsurface inputs in the system.
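The capture-based water budget described above can be illustrated with a minimal sketch. It assumes a lumped basin in which all terms are expressed as volumetric rates in the same units; the class name, function names, and numbers are hypothetical and are not taken from any of the cited studies.

```python
from dataclasses import dataclass


@dataclass
class PumpingResponse:
    """Basin-wide changes caused by pumping (same rate units, e.g. m3/yr)."""
    pumping: float             # groundwater withdrawal rate
    recharge_increase: float   # increase in recharge induced by pumping
    discharge_decrease: float  # decrease in natural discharge caused by pumping


def capture(resp: PumpingResponse) -> float:
    """Capture = increase in recharge + decrease in discharge."""
    return resp.recharge_increase + resp.discharge_decrease


def storage_depletion_rate(resp: PumpingResponse) -> float:
    """Pumping that is not balanced by capture is supplied from storage."""
    return resp.pumping - capture(resp)


# Hypothetical numbers: early in development most pumping comes from storage;
# later, capture has grown until it balances pumping.
early = PumpingResponse(pumping=10.0, recharge_increase=1.0, discharge_decrease=2.0)
late = PumpingResponse(pumping=10.0, recharge_increase=3.0, discharge_decrease=7.0)

print(capture(early), storage_depletion_rate(early))
print(capture(late), storage_depletion_rate(late))
```

In this sketch the natural (pre-pumping) recharge rate never appears, which is the point made by the capture concept: whether pumping can be sustained depends on how much recharge can be induced and how much discharge can be given up, not on the pre-development recharge alone.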
Groundwater sustainability is a range that is a function of aquifer performance and governance factors. Seward et al (2006) note that a range of 'sustainable yields' is possible for any given situation depending on participation, and what is deemed acceptable or at least permissible. To clarify this point, Pierce et al (2013) present the concept of an aquifer yield continuum that pairs scientific evaluation of groundwater sustainability with societal preferences. This system view of groundwater availability integrates aquifer performance and aquifer governance components as shown in figure 5. Pierce et al (2013) define sustainable yield as the range of values bounded from the lower-end by permissive sustained groundwater stock to the upper-end by maximum sustained groundwater stock; maximum sustained yield is the amount of water that can be continuously withdrawn without eventually dewatering the most productive water-yielding formation, though storage may vary within the planning horizon. This range between the permissive and maximum sustained bounds indicates that groundwater can be considered as a dynamically responsive system (Gleeson et al 2020). As such, Gleeson et al (2020) define physical sustainability as follows: 'renewable groundwater may be any groundwater that can be dynamically captured during pumping that leads to a new dynamically stable equilibrium in groundwater levels within human timescales (∼100 years)'. As discussed in detail by Gleeson et al (2020), besides the concepts of safe yield and renewability, depletion (Bierkens and Wada 2019) and stress (Alley et al 2018) are additional concepts for defining physical sustainability.
Figure 5. Performance and governance components of groundwater stock that extends the aquifer yield continuum of Pierce et al (2013).
A more encompassing term than physical sustainability is sustainable groundwater development, which acknowledges human activities as an influential aspect in groundwater management (Smith et al 2010, Sikdar 2019). This is a transition from 'sustainable yield' to a more general 'groundwater sustainability' term (Alley et al 1999, Rudestam and Langridge 2014). As such, Alley et al (1999) define groundwater sustainability as the 'development and use of ground water in a manner that can be maintained for an indefinite time without causing unacceptable environmental, economic, or social consequences.' The definitions of physical sustainability, or sustainable groundwater development as described above, do not include aquifer governance factors. Groundwater sustainability is not only a function of the aquifer performance, but also of the larger participatory and adaptive governance processes. This is reflected in modern policies such as the California Sustainable Groundwater Management Act that differentiates between sustainable yield and sustainable groundwater management. Pierce et al (2013) define two additional yield terms: operational yield that describes candidate solutions for operational or technical implementation of policy; and consensus yield that accounts for societal consensus through participatory or adaptive governance processes (Molina et al 2012). This concept of an aquifer yield continuum links both the technical evaluation and participation components of groundwater sustainability evaluation. In addition, a safety margin can be added to these yield terms for the assessment of the production capacity of the aquifer. This type of 'managed yield' (Smith et al 2010, Meyland 2011) can also be adopted to safeguard ecosystem services, or as a general safety factor (e.g. Henriksen et al 2008). For example, Gallardo et al (2009), following the precautionary principle, evaluate safe yield with a safety margin on acceptable drawdown to avoid irreversible impact on groundwater dependent ecosystems, given uncertainty in climatic variability and frequent changes in irrigation strategies.
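The idea of a yield range with a safety margin can be made concrete with a small sketch. It is only an illustration under stated assumptions: the class `YieldRange`, the multiplicative form of the safety margin, and the numbers are hypothetical, and none of the cited studies prescribes this particular formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class YieldRange:
    """Hypothetical yield range for one aquifer (same rate units throughout)."""
    permissive: float  # lower bound of the range
    maximum: float     # upper bound of the range

    def managed_upper(self, safety_margin: float) -> float:
        """Upper bound reduced by a fractional safety margin (assumed form)."""
        if not 0.0 <= safety_margin < 1.0:
            raise ValueError("safety_margin must be in [0, 1)")
        return self.maximum * (1.0 - safety_margin)

    def classify(self, pumping: float, safety_margin: float = 0.0) -> str:
        """Place a candidate pumping rate relative to the range."""
        if pumping < self.permissive:
            return "below range"
        if pumping <= self.managed_upper(safety_margin):
            return "within range"
        return "above range"


aquifer = YieldRange(permissive=40.0, maximum=100.0)
print(aquifer.classify(60.0, safety_margin=0.2))
print(aquifer.classify(90.0, safety_margin=0.2))
print(aquifer.classify(90.0))
```

A candidate rate that is acceptable without a margin can fall outside the range once a margin is applied, which is how a precautionary 'managed yield' narrows the set of operational options.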
Two key points should be noted when discussing the groundwater sustainability timescale. First, some aquifers are non-renewable at human timescales (e.g. Bierkens and Wada 2019). Confined and deeper aquifers are more likely to have non-renewable groundwater (Klove et al 2014) as well as aquifers in arid climates. For example, the Nubian aquifer system in North Africa, the world's largest transboundary non-renewable aquifer, contains a large volume of high-quality groundwater that is millions of years old, but receives insignificant recharge (Voss and Soliman 2014). Engelhardt et al (2013) note that, for sustainable use of groundwater resources in arid areas, we need to develop the concept of 'smart mining' with the aim of using the groundwater resources in the most efficient way. Water mining means taking from storage. This is also referred to by Gleeson et al (2020) as managed aquifer depletion to extend the usable lifespan of the aquifer. Gleeson et al (2020) and Foster and Loucks (2006) provide an overview on the management of non-renewable aquifers. Gleeson et al (2020) also define strategic aquifer depletion during non-stationarity periods such as prolonged extreme drought periods. In addition, some basins may appear to be renewable at present, but cannot be maintained over longer time periods because of large-scale hydrological changes (Rudestam and Langridge 2014). To avoid overexploitation of aquifers with lengthy renewal periods such as the Nubian and High Plains aquifers, among many others, Gleeson et al (2010) suggest setting long-term sustainability goals for water quality and quantity on a multigenerational time horizon of 50 to 100 years with acknowledgement of the longer-term impacts, through community engagement. Note that groundwater overexploitation is a general expression referring to any groundwater development that creates consequences that are negative or perceived as such (Harou and Lund 2008).
The second key point when discussing the time-scale of groundwater sustainability is the planning horizon, or time scale of evaluation. The time-scale of the groundwater sustainability evaluation needs to be clearly defined, and it is a function of both aquifer performance and aquifer governance. Hugman et al (2013) discuss the evaluation of groundwater sustainability as a single number (or range) over a long period versus a more variable number (or range) that is determined by dynamic modeling over short periods, and show how groundwater sustainability changes accordingly.

The policy-science interface of groundwater sustainability
The policy side
Many groundwater policies and regulations around the world that aim at better balancing competing uses are beginning to incorporate a more integrated systems-approach to groundwater sustainability evaluation (e.g. WFD 2000, GWD 2006, and the State Water Code of Hawaii (HRS chapter 174C 1987)). As shown in figure 6, the state of Hawaii has a flexible definition for groundwater sustainability as it delegates to the state water regulators the task of determining the water source utility as the community preferences continue to evolve. This fluidity stems from the fact that the State Water Code of Hawaii was enacted with its roots in the indigenous Hawaiian concept of shared resource management, which naturally entertains participation and consensus, and is based on an intimate understanding of the environment and a deep connection to the land. Figure 6 also shows that the Hawaii State Water Code treats surface and groundwater as one integrated unit, includes concepts such as ecosystem services and human activities when determining groundwater sustainability, and further advocates for a water planning process that is based on participation and consensus.
The two key components in the Hawaii example, multi-process consideration (i.e. hydrological processes, ecosystem services, and human activities) and participation, are gradually being recognized as essential for sustainable groundwater management. For example, the EU Water Framework Directive calls for combined management of surface water and groundwater with proper assessment of the influence on groundwater quantity and quality of surface water ecology (Henriksen et al 2008). Similar integrated approaches can be found in Australia, California, South Africa, and other places, as reviewed by Rohde et al (2017). Some examples in contemporary policy literature, which offer lessons learned from developing or analyzing such policies, show that these policies mainly arise out of concern for degradation of groundwater dependent ecosystems and related ecosystem services (Tuinstra and van …). Another main factor is the general unsustainable water use that leads to both water quality and water quantity concerns, especially during water stressed periods (Quevauviller et al 2016). In a comparative study between Germany and South Africa, for example, Knüppe et al (2016) find that even though degradation of groundwater dependent ecosystems may motivate local policy change, explicit language around ecosystem services is needed in policy to ensure effective sustainable groundwater governance. Stakeholder engagement is also increasingly being recognized as a critical component of groundwater sustainability policy. Rinaudo et al (2016) offer a case of agricultural groundwater users in France demonstrating the need to provide cognitive legitimacy in groundwater sustainability evaluation through better communication of science results to groundwater users. The planning process for groundwater sustainability itself requires thoughtful understanding of institutional structures, engagement of key stakeholders, as well as solid analytical support (Quevauviller et al 2016).

The science side
There has been growing general interest, as this article quantitatively shows in section 4, in developing decision support that uses existing science and develops stakeholder-driven scientific research to ensure effective implementation of groundwater sustainability policy. Several review articles discuss the scientific evaluation of groundwater sustainability. General sentiments are echoed among these studies, which can be summarized as three main challenges. First, the science of groundwater sustainability that includes both natural and social sciences is complex. Understanding and modeling the coevolution of societies with water resources systems, ecosystems, and their interactions with the climate is a transdisciplinary problem that involves physical, socioeconomic, technological, and institutional aspects (section 5). Second, such integrated models involving both the natural and human aspects of complex and dynamic water systems have profound uncertainties (section 6). Third, there is often a communication gap between the academic science community, policymakers, and water managers, which can make the science outputs less demand-driven. Thus, there are increasing calls for participation to add legitimacy, credibility, and saliency to scientific assessments. This can lead to demand-driven scientific outputs, and more effective and readily adoptable water management decisions (section 7).
This article is meant to showcase the synthesis of both natural and social sciences to develop and operationalize groundwater sustainability policy. We draw on the literature, which indicates the broad future directions for water security and sustainability, to argue that a scientific process to better implement groundwater sustainability policy should be: (1) multi-process through analyzing the inherent feedback and coevolving processes and scenarios between the surface water-groundwater systems, ecosystems, and human activities (Montanari et al 2015, Sivapalan and Blöschl 2015); (2) multi-narrative through conducting a multi-model uncertainty analysis with adaptive management to clearly state 'what is known, what is possible, what is unknown' (Ferré 2017b); and (3) participatory such that 'the products of science must emerge from an iterative, collaborative, two-way exchange with management and policy communities' (Wheater and Gober 2015).
We propose that the basic recipe for an effective groundwater sustainability evaluation is to develop and present our scientific knowledge with its uncertainty based on societal preferences in a participatory process. With respect to the aforementioned three components, we surveyed 23 review and opinion articles about groundwater sustainability. Regarding multi-process modeling, while most of these articles generally support recognizing surface water and groundwater as a single resource, few articles emphasize the explicit description and modeling of …. With respect to human activities, several articles discuss human activities with relation to economic analysis (Custodio 2002, Harou and Lund 2008, Pierce et al 2013, Sikdar 2019), but without explicit reference to human activities modeling with bi-directional feedback to hydrological and ecological systems. Finally, articles that emphasize the importance of uncertainty analysis (Custodio 2002, Seward et …). Figure 7 shows how these three main components (i.e. multi-process modeling, uncertainty analysis, and participation) have evolving levels of involvement and integration. This article reviews the progress being made towards integrating these three components in the evaluation of groundwater sustainability.

Literature review methods and results
This article combines a systematic and experience-based review of literature. We follow the PRISMA guidelines (Moher et al 2009) for a systematic review of peer-reviewed papers. The first stage is the identification of relevant literature. A comprehensive search for peer-reviewed studies was performed using the Clarivate Analytics Web of Science search engine. We used the topic search function that searches title, abstract, author keywords, and Keywords Plus. We first searched the database in January 2019 and then reran the whole search in April 2019 to update our records. Searching for records about groundwater sustainability is not a straightforward task. For example, searching for the keyword 'groundwater' and filtering the results by ('sustainable' or 'sustainability') gives more than 5500 records. Most of these records are not directly related to groundwater sustainability evaluation. To directly target the groundwater sustainability evaluation records, we followed this procedure: (1) We searched for the terms ('sustainable yield' or 'safe yield'), and filtered the search records with 'groundwater', which resulted in 190 records. (2) We identified the relevant records that are directly related to groundwater sustainability evaluation, which were 107 records. (3) All the records that cited these 107 records were added to our search results. This resulted in 1346 records after removing duplicates. These records were mainly the safe yield and sustainable yield literature, and the relevant literature around these two topics. (4) To identify sustainable groundwater development and sustainable groundwater management literature, we performed additional searches for ('groundwater sustainable', 'groundwater sustainability', or 'sustainable groundwater') that resulted in 659 records. (5) Combining the records of these four steps resulted in 1927 records after removing the duplicates. 
(6) Limiting our review period from January 2001 until our search date resulted in 1727 records. This collection represents the groundwater sustainability literature and other relevant records around groundwater sustainability literature, given the search period. (7) To filter out records that are not directly related to groundwater sustainability evaluation, we performed text analytics on the titles and abstracts of the identified records (Elshall 2020). The text analytics method and Python code are included in the supplement. This resulted in 1185 records. The number of records from 2001 to 2010 and from 2011 to 2019 are 239 and 946 records, respectively. The supplement contains the data used and the processed results.
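The record-combination part of this procedure (merging several search results, removing duplicates, and restricting the publication period) can be sketched as follows. This is not the code from the supplement; the record layout (`id`, `year`) and the toy records are hypothetical and serve only to show the logic.

```python
def merge_unique(*record_sets):
    """Union of several search results, keeping the first copy of each record id."""
    merged = {}
    for records in record_sets:
        for record in records:
            merged.setdefault(record["id"], record)
    return list(merged.values())


def within_period(records, first_year, last_year):
    """Keep only records published between first_year and last_year (inclusive)."""
    return [r for r in records if first_year <= r["year"] <= last_year]


# Toy stand-ins for the three kinds of search results described in the text.
yield_hits = [{"id": "A", "year": 1999}, {"id": "B", "year": 2005}]
citing_hits = [{"id": "B", "year": 2005}, {"id": "C", "year": 2012}]
sustainability_hits = [{"id": "C", "year": 2012}, {"id": "D", "year": 2018}]

combined = merge_unique(yield_hits, citing_hits, sustainability_hits)
recent = within_period(combined, 2001, 2019)
print(len(combined), len(recent))
```

Keying on a stable identifier (assumed here to be available for every record) makes the duplicate removal independent of the order in which the searches are combined.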
We also conducted a keyword search on 'groundwater' in Web of Science for 2001-2018 to analyze the relative expansion of the general literature on groundwater (figure 8(a)) compared to groundwater sustainability specific literature (figure 8(b)) for the same period. In figure 8, we use 1141 groundwater sustainability records instead of 1185 because we exclude 2019 (since our search in April 2019 did not allow for a full record). The groundwater literature shows a linear increase (i.e., no acceleration), while the groundwater sustainability literature shows a polynomial increase (i.e., constant rate of acceleration). We further analyze these results by normalizing the number of records for each year by the total number of records for the whole period, and breaking the data set into two periods that are 2001-2010 and 2011-2018, respectively, as … RRI_period,GWS is 3.0 for the groundwater sustainability literature.
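The normalization and period comparison described above can be sketched in a few lines. The sketch assumes that the rate of expansion in each period is the least-squares slope of the normalized annual record counts, and that the two periods are compared through the ratio of their slopes; the annual counts are synthetic and were chosen only so that the ratio is easy to check by hand.

```python
def ols_slope(x, y):
    """Ordinary least-squares slope of y against x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def period_slope(counts_by_year, first_year, last_year):
    """Slope of annual counts normalized by the total over the whole data set."""
    total = sum(counts_by_year.values())
    years = [y for y in sorted(counts_by_year) if first_year <= y <= last_year]
    shares = [counts_by_year[y] / total for y in years]
    return ols_slope(years, shares)


# Synthetic annual record counts (not the data of this review).
counts = {2001 + i: 10 + 2 * i for i in range(10)}       # 2001-2010
counts.update({2011 + j: 40 + 6 * j for j in range(8)})  # 2011-2018

slope_early = period_slope(counts, 2001, 2010)
slope_late = period_slope(counts, 2011, 2018)
print(slope_late / slope_early)
```

Because both slopes are normalized by the same total, the ratio between periods does not depend on the size of the literature, which is what allows the groundwater and groundwater sustainability records to be compared on the same footing.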
From the above-mentioned results we can observe three main trends. First, the groundwater literature is expanding. This can be attributed to meeting research demands from both academic and industrial sectors (Vadiati et al 2018) as environmental changes are now progressing at an unprecedented pace (Ceola et al 2016). This can be additionally attributed to the launch of numerous academic journals related to groundwater, and partly to the emergence of unanticipated problems (Vadiati et al 2018). Second, the results show that the groundwater sustainability research is growing at a faster rate than overall groundwater research (i.e., 1.3 times faster for 2001-2010, and 1.8 times faster for 2011-2018). These quantitative findings are in line with the Vadiati et al (2018) overview on trends in groundwater research, showing progress towards sustainability. Third, the groundwater sustainability research is growing at a much faster rate in this decade than in the previous decade (3 times faster). It is worth noting that 2018 alone accounts for more than 17% of the total records in 2001-2018 (figure 8(c)). This accelerating expansion in groundwater sustainability research suggests that the concept of sustainability is becoming more applied in groundwater management (Chaminé and Chamine 2015), and that groundwater is increasingly being recognized as a key aspect of sustainability challenges facing humans in the Anthropocene.
We further analyzed the groundwater sustainability records by breaking them down by topic, using text analytics for titles and abstracts. The meta-analysis for the topic breakdown, the Python code, and data are presented in the supplement. The aim of this analysis is to gain insight about research trends of the main components of the scientific evaluation of groundwater sustainability policy. … are policy, human activities, and SW-GW interaction. This reflects a growing interest in integrating more social science in groundwater sustainability evaluation.
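A topic breakdown of this kind can be approximated with simple keyword matching on titles and abstracts. The sketch below is not the text analytics method of the supplement; the keyword lists, the record layout, and the example records are hypothetical, and a real analysis would need a much more careful vocabulary.

```python
from collections import Counter

# Hypothetical keyword stems for a few of the topics named in the text.
TOPIC_KEYWORDS = {
    "policy": ["policy", "regulation", "legislation"],
    "participation": ["stakeholder", "participat"],
    "uncertainty analysis": ["uncertaint"],
    "SW-GW interaction": ["surface water", "stream"],
}


def tag_topics(text):
    """Return the set of topics whose keywords occur in the text."""
    lowered = text.lower()
    return {
        topic
        for topic, keywords in TOPIC_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    }


def topic_counts(records):
    """Count records per topic; untagged records count as basic groundwater modeling."""
    counts = Counter()
    for record in records:
        topics = tag_topics(record["title"] + " " + record["abstract"])
        counts.update(topics or {"basic groundwater modeling"})
    return counts


records = [
    {"title": "Stakeholder participation in aquifer policy", "abstract": ""},
    {"title": "Numerical simulation of a confined aquifer", "abstract": ""},
    {"title": "Stream depletion under pumping", "abstract": "Parameter uncertainty is assessed."},
]
print(topic_counts(records))
```

A record can carry several topic tags at once, so the per-topic counts need not sum to the number of records.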
We further investigate the relative expansion of different research topics in groundwater sustainability literature as shown in figure 10. The groundwater sustainability literature for 2001-2018, which has 1141 records, has an increasing slope of 8.3 × 10⁻³. We consider this to be the average rate of expansion of groundwater sustainability literature. Research topics within groundwater sustainability literature that have a below average rate of expansion are ecosystem services, SW-GW interaction, and uncertainty analysis with a slope of 7.1 × 10⁻³, 7.7 × 10⁻³, and 8.3 × 10⁻³, respectively (figure 10). Research topics within groundwater sustainability literature that have an above average rate of expansion are policy, human activities, participation, and multi-process modeling, with a slope of 8.5 × 10⁻³, 9.1 × 10⁻³, 9.3 × 10⁻³, and 9.5 × 10⁻³, respectively (figure 10). These results indicate that multi-process modeling, participation, and human activities are the top three growing research areas in groundwater sustainability literature in comparison to SW-GW interaction, ecosystem services, and uncertainty analysis. This confirms the previous observations that groundwater sustainability research is progressing in the direction of integrating more social science research. In addition, these results echo the conclusion of the Velis et al (2017) review article about groundwater and sustainable development goals, which states that achieving a sound understanding of human impacts on groundwater resources across scales is paramount to integrated implementation of sustainable development goals. Having said that, it is worth noting that 80% and 45% of the groundwater sustainability records in 2001 and 2018, respectively, are basic groundwater modeling studies that do not have any SW-GW interaction, ecosystem services, human activities, multi-process modeling, uncertainty analysis, or participation component. Yet there is a clear trend toward a more integrated evaluation.
Figure 10. An analysis of groundwater sustainability literature during 2001-2018 shows the annual frequency of records covering a topic, normalized by the total number of records covering that topic during the period. The total number of records during 2001-2018 covering policy, SW-groundwater interaction, ecosystem services, human activities, multi-process modeling, uncertainty analysis, participation, basic groundwater modeling, and groundwater sustainability are 219, 202, 162, 195, 81, 117, 128, 529, and 1141, respectively. Basic groundwater modeling records refer to records that do not have a SW-GW interaction, ecosystem services, human activities, multi-process modeling, uncertainty analysis, or participation component.
The second stage in our systematic review was the screening, as shown in figure 11. We did a qualitative screening for the identified 1185 records based on titles and abstracts. This resulted in 261 records that are eligible for full-text assessment. The third stage was to assess records for eligibility and exclude records after a full-text assessment. The number of records that were included in the review after full-text assessment is 201 records. The criteria for inclusion and exclusion differ by topic, and are explained in the beginning of each section.
With respect to groundwater sustainability, grey literature is often an important source of knowledge and information as it includes key reports and initiatives from United Nations-related organizations, and governmental and non-governmental organizations. We did a non-systematic search for grey literature using Google Scholar and Google Web Search with the keywords 'groundwater', 'groundwater sustainability', 'sustainable groundwater management', and 'sustainable yield' paired with the following keywords: 'sociohydrology', 'policy', 'integrated groundwater management', and 'governance'. We collected more than 40 full-text grey literature records including institutional reports, government documents, state water codes and directives, book chapters, opinion articles, initiatives, and conference proceedings. While these records are mainly English written texts, some good reports can be found in Spanish and French covering the European Union and Central and South America, which are not included in this review.
We also conducted an experience-based search for topics related to groundwater sustainability. We conducted this search using Web of Science, Google Scholar and Google Web Search. We used several keywords such as 'adaptive management', 'decision support', 'ecohydrology', 'hydroeconomic', 'science' and 'policy', 'sociohydrology', 'participation', 'stakeholder engagement', 'groundwater sustainability', 'groundwater security', among many other keywords as needed. We collected more than 250 full-text records that are mainly peer-reviewed articles, book chapters, and technical reports. Records were selected based on our expertise to present the state-of-the-art tools and developments, and recent discussions. The main objective of the experience-based component of the search was to leverage recent advances in groundwater literature in order to point out gaps and future directions in groundwater sustainability literature. Given both systematic and experience-based searches, the total number of records that are included in the article is 450 records.

Hydrological modeling
The first layer of multi-process modeling for evaluating groundwater sustainability is hydrological modeling, which simulates both groundwater and surface water processes. Given aquifer properties and geometry, hydrological models assess groundwater sustainability based on the concept of groundwater capture to ensure that groundwater pumping does not exceed natural and induced recharge over long periods. Groundwater capture increases in response to recharge from increased percolation due to irrigation surplus, decreased evapotranspiration, and changes in soil storage (Harou and Lund 2008); induced recharge from connected groundwater aquifers and surface water bodies (such as streams, lakes, and wetlands); and reduction of terrestrial and marine groundwater discharge. Natural and induced recharge will balance decreases in storage due to pumping until a new sustainable equilibrium is reached. Due to this dynamic nature of groundwater capture, there is a general consensus that surface water and groundwater (SW-GW) should be jointly managed (Alley 2018, Gleeson et al 2020). In some cases, however, an aquifer can be decoupled from surface flow effects with only consideration of lateral groundwater recharge from neighboring geological formations (… Tsai 2014, Urrutia et al 2018). In either case, numerical modeling is required for groundwater sustainability evaluation to provide quantitative outputs, such as a basin water budget (Kalf and Woolley 2005). Note that several studies (e.g. Kalf and Woolley 2005) stress the importance of using numerical models versus analytical models to evaluate groundwater sustainability. However, given the results of our systematic review, we extend this discussion to also cover analytical models.
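The transition from storage depletion to capture until a new equilibrium is reached can be illustrated with a deliberately simple lumped model. The sketch assumes a single storage term, constant recharge, and natural discharge proportional to storage (a linear reservoir); these assumptions, the function name, and the parameter values are hypothetical, and the sketch is no substitute for the numerical models discussed in this section.

```python
def simulate_linear_aquifer(recharge, pumping, k, years, dt=0.1):
    """Explicit Euler simulation of dS/dt = recharge - k*S - pumping.

    Returns a list of (time, storage, capture) tuples, where capture is the
    decrease in natural discharge relative to pre-development conditions.
    Assumes pumping <= recharge so that a new equilibrium exists.
    """
    if pumping > recharge:
        raise ValueError("this toy model needs pumping <= recharge")
    storage = recharge / k            # pre-development equilibrium
    initial_discharge = k * storage   # equals recharge before pumping
    history = []
    steps = int(round(years / dt))
    for step in range(steps + 1):
        discharge = k * storage
        capture = initial_discharge - discharge
        history.append((step * dt, storage, capture))
        storage += dt * (recharge - discharge - pumping)
    return history


# Hypothetical values: recharge 100, pumping 40 (volume/yr), k = 0.05 per yr.
history = simulate_linear_aquifer(recharge=100.0, pumping=40.0, k=0.05, years=300)
_, first_storage, first_capture = history[0]
_, final_storage, final_capture = history[-1]
print(first_storage, first_capture)
print(round(final_storage, 2), round(final_capture, 2))
```

At the start all pumping is supplied from storage; as storage declines, discharge declines with it, and the system settles where capture equals pumping. In this toy model the adjustment time is controlled by 1/k, which hints at why aquifers with long response times can take generations to reach a new equilibrium.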
The groundwater management literature shows that a wide array of both mechanistic models (e.g. numerical models and analytical solutions) and phenomenological models (e.g. analytical models) are used to evaluate groundwater sustainability. The first class of models adopts a mechanistic description of flow and transport systems and the land-vegetation-atmosphere interface in a way that is consistent with physical governing equations derived from first principles. Phenomenological models describe the relationships and interactions between variables beyond measured data in a way that is consistent with theory and observations, but they are not necessarily physically based. Mechanistic groundwater models are generally fully distributed. Examples include MODFLOW (Hughes et al 2017), a commonly used mechanistic groundwater flow model; SUTRA (Provost and Voss 2019), a density-dependent flow model; and GSFLOW (Regan et al 2018), a coupled groundwater and surface-water flow model. Since analytical solutions are uncommon in groundwater management, hereinafter we focus only on mechanistic numerical models. Phenomenological models can range from lumped to distributed models, and from black-box models that are fully empirical to gray-box models that have mechanistic elements. We conducted a full-text review of 85 case-study articles about groundwater sustainability at different geographical locations, covering both numerical models (53 articles) and phenomenological models (32 articles). After full-text review, 34 and 20 articles relevant to the respective class were selected for inclusion in the review article. Articles were excluded mainly for brevity. The hydrological modeling approaches that are used in these studies are discussed below.

Numerical models.
A wide array of numerical models can be used to simulate various elements of the hydrologic cycle, ranging from groundwater models to fully-integrated, physically-based SW-GW models. For example, several groundwater sustainability studies use MODFLOW to simulate groundwater flow (Sakiyan and Yazicigil 2004). The limitation of this approach is that MODFLOW simulates rivers as either fully connected or fully disconnected, while transition stages between the two flow regimes exist in nature (Brunner et al 2010). A more physically-based approach includes studies (Nastev et al 2006, Stigter et al 2009) that use FEFLOW (Trefry and Muffels 2007), and the study of Calderhead et al (2012) that uses HydroGeoSphere (Brunner and Simmons 2012); both codes are capable of simulating streamflow, and unsaturated and saturated flow. Calderhead et al (2012) use HydroGeoSphere to additionally account for land subsidence as a sustainability factor. Using a more physically-based approach could be advantageous. For example, Brunner et al (2010) compare the relative accuracy of MODFLOW and HydroGeoSphere in simulating SW-GW interaction and show that MODFLOW cannot simulate negative pressures beneath disconnected streams, resulting in an underestimation of the infiltration flux. In addition, the discretization of both the river and aquifer in MODFLOW can cause errors in estimating the position of the water table under the river and in simulating the height of the groundwater mound (Brunner et al 2010).
In numerical models, groundwater recharge can be estimated as a calibration parameter (Nastev et al 2006, Mustafa et al 2018), from a pre-processing step using a numerical steady-state model (Sarma and Xu 2014), as a fraction of precipitation (Stigter et al 2009, Yihdego and Drury 2016), or as the difference between precipitation and actual evapotranspiration. Recharge can also be estimated externally using land-surface or watershed models such as SALUS, HELP, and SWAT. SALUS simulates crop, soil, and water interaction; HELP simulates rainfall, runoff, infiltration, and other water pathways, accounting for vegetation and soil types, among other land-surface aspects; SWAT is a complex watershed model that simulates spatial-temporal impacts of land-use and climate on the hydrologic cycle at the river basin scale. A combination of a cropping system model with SWAT is also used to estimate recharge in agricultural and natural areas (Hu et al 2010). This sequential approach of using a surface water model to estimate recharge for a groundwater model is a loose coupling because it does not allow for iterative feedback between the surface water model and the groundwater model.
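The two simplest recharge estimates listed above, a fixed fraction of precipitation and the difference between precipitation and actual evapotranspiration, can be illustrated with a minimal Python sketch. The function names, the zero floor on the residual, and the neglect of runoff are illustrative assumptions and are not taken from the cited studies.

```python
def recharge_as_fraction(precip, fraction):
    """Recharge as a fixed fraction of precipitation (same units as precip)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    return [fraction * p for p in precip]


def recharge_as_residual(precip, actual_et):
    """Recharge as precipitation minus actual evapotranspiration.

    Assumption for illustration: runoff is ignored and negative
    residuals are floored at zero (no recharge in deficit periods).
    """
    if len(precip) != len(actual_et):
        raise ValueError("precip and actual_et must have the same length")
    return [max(p - e, 0.0) for p, e in zip(precip, actual_et)]
```

Either series would then be passed once to the groundwater model as a boundary condition, which is what makes the approach loosely coupled: the groundwater model never feeds back into the recharge estimate.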
To better describe the interaction of surface and subsurface processes, coupled and integrated SW-GW models are emerging. Generally, a fully coupled or integrated SW-GW model would be more desirable. A simple approach is to use a surface water model that has a groundwater module. Zhang et al (2016) use SWAT to evaluate the sustainability of shallow groundwater in an irrigated plain river basin in China. Acero Triana et al (2019) compare MODFLOW and SWAT results, and show that SWAT results are not always as accurate as those of a groundwater model. The study concludes that, although SWAT has a groundwater module, such an approach does not provide a sufficiently detailed description of the groundwater system, which can lead to wrong conclusions that could misinform policy. Shi et al (2012) present a coupled SWAT and MODFLOW approach with two-way iterative feedback between the two models to evaluate groundwater sustainability in a semi-humid basin in China.
Alternatively, other studies use a fully integrated SW-GW model with a detailed description of the surface and subsurface processes, including the unsaturated zone. A few studies in this class implement an integrated hydrological modeling approach using the MIKE SHE model for sustainable groundwater management in Denmark (Henriksen et al 2008) and China (Qin et al 2013). MIKE SHE (Storm 1995, DHI 2003) is an integrated model that simulates the main land-surface processes in the hydrologic cycle, and simulates the unsaturated zone using the Richards equation. However, for computational efficiency, Henriksen et al (2008) disable the unsaturated zone component. Instead of solving the full Richards equation, which is computationally expensive, a more practical approach is to simulate the unsaturated zone using a one-dimensional column. This takes advantage of the dominant vertical flow direction within the unsaturated zone when averaged over a large area, as in the GSFLOW model (Regan et al 2018). Several versions of GSFLOW are applied to examine the impact of unsustainable groundwater management practices on ecosystem services (Tian et al 2015, Wu et al 2015) and the interlinked impact of climate change and human activities on groundwater sustainability (Feng et al 2018). In a study that is related to groundwater sustainability with respect to land-use and climate change, Ferguson and Maxwell (2012) use ParFlow to compare the effects of climate change with those of pumping and irrigation on the terrestrial water and energy budgets of an agricultural watershed in the semi-arid Southern Great Plains, USA. ParFlow (Kollet and Maxwell 2008) is a fully-integrated numerical model that simulates the hydrologic cycle from the bedrock to the top of the plant canopy. Integrated SW-GW models have also been tailored for specific regions, such as California's C2VSim groundwater-SW simulation model (Brush et al 2013). Macewan et al (2017) develop surrogate response functions from C2VSim to evaluate groundwater sustainability in the Central Valley agricultural region in California.
Coupled and integrated hydrological models have several advantages. First, they are more accurate in determining groundwater recharge through infiltration than external water balance models for water resource management in areas with complex SW-GW interaction (Feng et al 2018). This reduction in uncertainty is due to accounting for the dynamics of evaporation and irrigation over space and time during the simulation (Qin et al 2013). In addition, coupled and integrated hydrological models provide additional observation data types, other than hydraulic head and concentration data, to constrain the model (Wu et al 2014). However, these complex models could involve substantial simulation uncertainty when data for model development and calibration are insufficient, which hinders their wider application (Wu et al 2014, Feng et al 2018).

Phenomenological models.
It is not uncommon to use phenomenological models to estimate groundwater sustainability, which is the case, for example, in Kansas (Butler et al 2016), Hawaii (Mink 1981, Liu 2007), California (Miro and Famiglietti 2018), and elsewhere. Phenomenological models include analytical models as well as other types of models, as presented at the end of this section. We focus on analytical models because they are frequently used to evaluate groundwater sustainability. This class of models generally utilizes a simple closed-form equation for estimating groundwater sustainability, based mostly on the water budget. Elements of such a budget are simplified by using empirical equations. They are generally termed analytical models (Kalf and Woolley 2005) owing to the way the expressions are derived, in contrast to the spatially distributed numerical models. This class of models is appealing due to its simplicity and is preferred where the use of site-specific detailed modeling approaches is not feasible, mainly due to the lack of appropriate data. In general, at least one criterion, such as the maximum allowed water-level decline, is used to constrain water use while satisfying the aquifer water budget. Studies that simply assess the sustainability of the aquifer without setting such a criterion were excluded from this review.
Virtually all groundwater sustainability analytical models apply a form of the water budget equation. The models are generally lumped (e.g. Mink 1981, Zhang and Kennedy 2006, Liu 2007, Loaiciga 2008, Aksever et al 2015, Alcala et al 2015, Bailey et al 2015, Benini et al 2016, Butler et al 2016, Bailey and Tavakoli Kivi 2017, Miro and Famiglietti 2018), which is consistent with the nature of this class of models. As described by Butler et al (2016, 2018), a typical lumped model (also known as a single-cell aquifer model) considers groundwater recharge as an input and pumping and natural discharge as outputs. Water height is a parameter that is related to pumping through a linear relationship and can be translated to water volume given the surface area and storage coefficient of the aquifer. Baseline groundwater sustainability generally corresponds to the average level of pumping that causes zero average groundwater-level change, or that satisfies any other sustainability criterion.
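As a minimal sketch of the single-cell water budget described above, the following Python code converts the net volumetric balance into a water-level change using the aquifer area and storage coefficient. It is a generic illustration under stated assumptions, not the formulation of any cited study: the function names, units (m^3 per period, m^2, m), and the explicit period-by-period update are all hypothetical choices.

```python
def head_change(recharge, pumping, discharge, area, storage_coeff):
    """Water-level change (m) over one period for a single-cell aquifer.

    recharge, pumping, discharge: volumes per period (m^3)
    area: aquifer surface area (m^2); storage_coeff: dimensionless
    """
    return (recharge - pumping - discharge) / (storage_coeff * area)


def simulate_heads(h0, recharge, pumping, discharge, area, storage_coeff):
    """Accumulate period-by-period head changes starting from head h0."""
    heads = [h0]
    for r, q, d in zip(recharge, pumping, discharge):
        heads.append(heads[-1] + head_change(r, q, d, area, storage_coeff))
    return heads


def zero_change_pumping(recharge, discharge):
    """Average pumping that yields zero average water-level change,
    treating natural discharge as given (an illustrative assumption)."""
    n = len(recharge)
    return (sum(recharge) - sum(discharge)) / n
```

In this sketch the sustainability baseline is simply average recharge minus average natural discharge; any additional criterion, such as a maximum allowed water-level decline, would be checked against the simulated heads.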
Analytical models can differ in the processes included, the approaches used to quantify budget elements, and the criteria set for estimating groundwater sustainability. Some models include more processes than others in their water budget equation and model formulation. For example, Zhang and Kennedy (2006) account for artificial sources of groundwater recharge and alteration of groundwater systems due to development in an urban context. Similarly, the model of Voudouris (2006) accounts for exploitable dynamic groundwater reserve, artificial recharge, and irrigation return. Loaiciga (2008) uses a graphical method to account for surface reservoir sizing. Miro and Famiglietti (2018) use a modified water-budget approach for confined aquifers. Benini et al (2016) utilize an index that accounts for salinization to assess aquifer vulnerability to climate and land-use changes. Other models use various techniques to account for the freshwater lens in coastal aquifers (Liu 2007, Bailey et al 2015, Bailey and Tavakoli Kivi 2017). In addition, to improve the representation of the groundwater system, a few studies extend analytical models beyond the lumped or single-cell aquifer. For example, Jang et al (2012) treat the shallow and deep aquifers as two compartments and perform the water balance sequentially. Also, Jafari et al (2018) develop a distributed model that utilizes the area of the Thiessen polygon surrounding each piezometer in the aquifer and the respective value of water-level decline.
Hawaii is a good example of using an analytical model to evaluate groundwater sustainability. Mink (1981) developed the Robust Analytical Model (RAM) based on a steady-state assumption for an aquifer where inflows (recharge) equal outflows (leakage plus pumping), with all pumping lumped as a single value. The RAM models have been used as the primary tool to implement sustainable yield water policy in Hawaii (figure 6). Estimating sustainable yield is required to update the Water Resources Protection Plan, which is a component of the Hawaii Water Plan (CWRM 2008, 2019b). The model provides a parabolic relationship between average hydraulic head and pumping. The curve can be used to estimate optimal pumping based on known recharge and the ratio between equilibrium and initial heads. Difficulties in applying RAM are mainly related to identifying a value for the equilibrium head. Values suggested by water management agencies range between 50% and 75% of the initial aquifer head. RAM2 (Liu 2007, Liu and Dai 2012) improved the original RAM model by accounting for salinity. The improvements also include analytically estimating the equilibrium head to prevent the salinity of pumped water from exceeding a certain acceptable level. Limitations of the RAM and RAM2 models include the inability to account for groundwater leakage. Because they do not address leakage components, such as submarine groundwater discharge, spring discharge, base-flow, evapotranspiration, and drains, the models cannot address societal preferences with respect to ecosystem services. Simulating leakage is also important in accounting for changes in lateral inflow between different aquifer administrative units.
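To make the parabolic head-pumping relationship concrete, the sketch below uses the form in which RAM is commonly written, Q = R [1 - (h_e / h_0)^2], where Q is the sustainable pumping, R is recharge, and h_e / h_0 is the ratio of equilibrium to initial head. The equation is not given in the text above, so this form, the function name, and the example numbers should be read as assumptions for illustration only.

```python
def ram_sustainable_yield(recharge, head_ratio):
    """Sustainable pumping from a RAM-type parabolic relation.

    recharge: long-term average recharge (any volumetric rate unit)
    head_ratio: equilibrium head divided by initial head, in [0, 1]
    Assumed form: Q = R * (1 - head_ratio**2)
    """
    if not 0.0 <= head_ratio <= 1.0:
        raise ValueError("head_ratio must lie in [0, 1]")
    return recharge * (1.0 - head_ratio ** 2)
```

Under this assumed form, the 50% to 75% range of equilibrium-to-initial head ratios mentioned above corresponds to pumping between 75% and about 44% of recharge, which illustrates how sensitive the estimate is to the choice of equilibrium head.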

Numerical versus phenomenological models.
Although analytical models, which are the most commonly used class of phenomenological models, are widely used to evaluate groundwater sustainability, they are subject to criticism (e.g. Kalf and Woolley 2005, Henriksen et al 2008, Mulligan et al 2014). A first limitation, as stated by Kalf and Woolley (2005), is that the models are unsuitable for basin groundwater sustainability estimation, mainly because they lack the ability to simulate leakage, and their treatment of inflow and outflow component interaction cannot be rigorous. Not being able to simulate groundwater leakage is critical. As explained by Pereau and Pryet (2018), if leakage is not considered, then over a long period the single-cell aquifer will dry out if pumping rates exceed recharge, and will overflow if pumping remains below recharge. Seward et al (2006) also emphasize model limitations if only natural recharge and pumping are considered. The implications of this limitation for groundwater dependent ecosystems are discussed in section 5.2. More recently, Pereau and Pryet (2018) addressed this limitation by developing an analytical model in which leakage explicitly appears in the water budget equation as a function of the groundwater level using a linear conductance model.
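The role of leakage in stabilizing a single-cell aquifer can be illustrated with a short Python sketch in which leakage is a linear function of head, leakage = c (h - h_drain). This is a generic linear-conductance budget written for illustration; the symbols, the explicit time stepping, and the parameter values are assumptions and do not reproduce the exact formulation of Pereau and Pryet (2018).

```python
def simulate_leaky_cell(h0, recharge, pumping, conductance, h_drain,
                        area, storage_coeff, n_steps, dt=1.0):
    """Explicit-Euler heads for a single cell with head-dependent leakage.

    recharge, pumping: volumetric rates (m^3 per time unit)
    conductance: leakage coefficient c (m^2 per time unit)
    h_drain: head at which leakage vanishes (m)
    Stable for dt * conductance / (storage_coeff * area) < 1.
    """
    h = h0
    heads = [h]
    for _ in range(n_steps):
        leakage = conductance * (h - h_drain)
        h += dt * (recharge - pumping - leakage) / (storage_coeff * area)
        heads.append(h)
    return heads


def equilibrium_head(recharge, pumping, conductance, h_drain):
    """Head at which recharge balances pumping plus leakage (c > 0)."""
    return h_drain + (recharge - pumping) / conductance
```

With a positive conductance, the simulated head relaxes toward the equilibrium head; setting the conductance to zero recovers the behavior criticized above, in which the head rises or falls without bound whenever pumping differs from recharge.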
Another limitation of analytical models is the lack of consideration for spatiotemporal relationships, which has several implications. First, analytical models assume instantaneous change in the water level of the model domain, while in reality pressure diffuses gradually over space from pumping wells, and the impacts of pumping can be delayed in time (Pereau and Pryet 2018). Second, not accounting for the spatial distribution of hydrogeological data in the aquifer can yield inaccurate results. As indicated by Thomas and Famiglietti (2015), groundwater sustainability is critically dependent on accurate estimation of the temporal and spatial variability of groundwater behaviors as a response to both natural and anthropogenic influences. For example, Oki and Meyer (2001) compare results obtained by both the numerical model SHARP (Essaid 1990) and the analytical model RAM (Mink 1981) for a major aquifer in Hawaii. The results indicate that the field-measured water-level declines are larger than those predicted by RAM and are consistent with the results of the numerical model analysis. RAM underestimates water-level declines in areas where a low-permeability confining unit exists, and in the vicinity of pumping wells. Also, Elshall et al (2018) show that the RAM2 model (Liu 2007) overestimates sustainable yield by up to 30%, seriously underestimating the chance of aquifer salinization. Third, the distributed spatiotemporal response of soil moisture and SW-GW interaction to variations of precipitation and pumping can be vital when incorporating ecosystem services (Henriksen et al 2008) and human activities modeling (Mulligan et al 2014), as detailed in section 5.2 and section 5.3, respectively. Fourth, considering the spatial location of pumping wells is important. For example, in the presence of saltwater intrusion, turning off certain pumping wells near the coast can increase sustainable yield (Elshall et al 2018). In addition, a pumping well located close to a discharge zone could be critical for groundwater dependent ecosystems (Seward et al 2006).
Despite these limitations, analytical models have several advantages and uses. First, they can be applicable to specific cases, such as when groundwater sustainability evaluation is based on water-level response alone. This includes cases where groundwater levels are decoupled from environmental and river flow effects (Macewan et al 2017), and fossil groundwater resources where natural recharge and discharge are negligible (Pereau and Pryet 2018). Second, analytical models are useful as first-stage assessment tools (Kalf and Woolley 2005) and for conceptual and generic discussions on groundwater management strategies (Pereau and Pryet 2018). Third, analytical models are especially useful when there are insufficient site-specific data to develop a high-fidelity numerical model with more mathematical and geological realism. In many locations, especially in developing countries, site-specific data needed for numerical models are generally limited, which reduces the usefulness of numerical models. To overcome some of the limitations of analytical models and the high computational cost of numerical models, some studies use more elaborate phenomenological models to evaluate groundwater sustainability. These include the economic-engineering optimization model of California, CALVIN (Harou and Lund 2008), and machine learning models (Salem et al 2017). Due to the computational efficiency of the hydrological module, these models encourage accounting for multiple aquifer governance factors, as discussed in section 5.3.
In summary, the choice of using a phenomenological versus numerical model is case-specific and depends on data availability, aquifer type, and sustainability factors of interest. As discussed by Hill (2006), simpler models are often preferred, given that they are characterized by fewer parameters and shorter execution times. A recommended approach is to start with simple models and slowly build model fidelity and model complexity to reach the best fit with measurements, while avoiding underfitting or overfitting the observation data (Hill 2006, Elshall et al 2019). The challenge, especially with numerical models, is to develop the most parsimonious model given groundwater sustainability factors of interest (figure 3) and plausible data. To avoid what Voss (2011b) calls 'groundwater modeling fantasies,' in which the modeler finds herself 'adrift in the details,' the focus should be on 'down to earth' model development with reasonable data requirements (Voss 2011b). Data collection and model development can be time consuming and expensive, and require specialized training. The shared responsibility of stakeholders and the scientific community is to work together to define the state of knowledge and hydrological modeling tools required to address societal preferences and answer groundwater sustainability questions.

Coastal aquifers.
We use coastal aquifers as an example to illustrate the role of hydrological modeling in the scientific evolution of groundwater sustainability concepts. Several studies review management aspects of coastal aquifers. These include dealing with groundwater scientific challenges to meet societal needs in coastal areas (Michael et al 2017), modeling tools and challenges to characterize and manage coastal aquifers (Werner et al 2013), computational and conceptual issues in the calibration of seawater intrusion models (Carrera et al 2010), the use of simulation-optimization methods for coastal aquifer management (Singh 2014, Ketabchi and Ataie-Ashtiani 2015, Singh 2015, Sreekanth and Datta 2015), and the impact of sea-level rise and climate change on coastal aquifers (Holding et al 2016, Ferguson and Gleeson 2012, Ketabchi et al 2016, Werner et al 2017). Groundwater sustainability is discussed by a number of these articles (Ketabchi and Ataie-Ashtiani 2015, Werner et al 2017). Werner et al (2017) review approximate approaches for estimating groundwater sustainability in atoll islands, which include taking a percentage of recharge or rainfall as the groundwater sustainability estimate, or using analytical solutions with the Dupuit approximation and the Ghijben-Herzberg principle, along with field measurement and monitoring techniques. For example, Benini et al (2016) use the Dupuit-Ghijben-Herzberg relationship (Fetter 2001) and a salinization vulnerability index based on water table level to evaluate the long-term impact of land-use, climate changes, and sea-level rise on groundwater sustainability. These approaches are very useful for a first assessment, and when data are limited, as discussed in section 5.1.3. In addition, field measurement techniques, such as hydrogeophysical approaches (e.g. Vouillamoz et al 2012) and hydrogeochemical approaches (e.g. Bryan et al 2016), are used to quantify groundwater resources in coastal areas.
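The Ghijben-Herzberg principle mentioned above relates the depth of a sharp freshwater-saltwater interface below mean sea level, z, to the freshwater head above mean sea level, h, through the density contrast: z = rho_f / (rho_s - rho_f) * h. The Python sketch below is a minimal illustration; the default densities (1000 and 1025 kg m^-3) and the function names are assumptions, and the sharp-interface, hydrostatic idealization ignores the mixing zone discussed elsewhere in this section.

```python
def interface_depth_below_msl(head_above_msl, rho_fresh=1000.0, rho_salt=1025.0):
    """Ghijben-Herzberg depth (m) of a sharp interface below mean sea level.

    head_above_msl: freshwater head above mean sea level (m)
    rho_fresh, rho_salt: water densities (kg/m^3); defaults are typical
    textbook values, assumed here for illustration.
    """
    if rho_salt <= rho_fresh:
        raise ValueError("rho_salt must exceed rho_fresh")
    return rho_fresh / (rho_salt - rho_fresh) * head_above_msl


def lens_thickness(head_above_msl, rho_fresh=1000.0, rho_salt=1025.0):
    """Total freshwater thickness (m): head above sea level plus interface depth."""
    return head_above_msl + interface_depth_below_msl(
        head_above_msl, rho_fresh, rho_salt)
```

With these default densities the ratio is 40, so each 0.1 m of head decline corresponds to roughly 4 m of interface rise, which is why small pumping-induced drawdowns can matter so much for thin freshwater lenses.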
In general, numerical modeling approaches to evaluate groundwater sustainability can better project outcomes under a wide array of natural and anthropogenic scenarios. As documented by Post et al (2018), a numerical modeling approach better enables the analysis of freshwater volumes and fluxes of submarine groundwater discharge in comparison to other techniques. Hence, this section mainly focuses on numerical modeling studies, considering that phenomenological modeling studies are discussed in section 5.1.2. We conducted full-text assessments for 45 research articles about groundwater sustainability evaluation in coastal aquifers, and 32 articles were eligible for inclusion. We exclude most of the studies that only use synthetic case studies, as well as articles that do not provide sufficient justification for not accounting for density-dependent flow. However, we include studies that use a constant-density model while discussing the limitations of this approach; the justifications given include fewer data requirements, easier model development, and lower computational cost.
Groundwater sustainability evaluation is challenging in coastal aquifers as they are vulnerable to more threats than inland aquifers. The combination of natural and anthropogenic threats significantly affects water supplies from nearshore fresh groundwater lenses. Threats can be summarized from a number of coastal aquifer studies, including Ferguson and Gleeson (2012), Michael et al (2017), Rotzoll and Fletcher (2013), and Werner et al (2017). They include (1) drought due to the often large interannual variability of natural rainfall cycles, (2) seawater inundation due to storm surges, extremely high tides, and tsunamis, (3) groundwater inundation in coastal plains due to sea-level rise, (4) inundation by flooding of low-lying areas by seawater, (5) thinning of the fresh groundwater lenses and widening of the mixing zone due to pumping-induced drawdown and decrease of spring and submarine groundwater discharge, (6) upconing at individual wells due to over-pumping, (7) local salt storage in salt-making facilities, and (8) nutrient and antibiotics use in aquaculture. These factors adversely affect the thickness of the fresh groundwater lenses, the width and shifting of the mixing zones, the resilience of pumping wells to upconing, the resilience of fresh groundwater lenses to seawater inundation and drought, and the quantity and quality of groundwater discharge to groundwater dependent ecosystems. Accordingly, groundwater sustainability evaluation in coastal aquifers should ideally take into account density-dependent flow, mixing processes, recharge rates, drought periodicity, and alternative sources of freshwater (Werner et al 2017). Additional issues include accounting for submarine groundwater discharges and related water quality, including nutrient levels that are important for terrestrial ecosystem services.
In coastal aquifers, the volume of the freshwater lens depends on the location of the freshwater-saltwater interface and the recharge-discharge mechanisms of the aquifer. Budget analysis alone is not sufficient for groundwater sustainability evaluation. More accurately, groundwater sustainability can be evaluated by observing the changes in the volume of the freshwater lens under different pumping schemes and considering recharge and sea-level rise scenarios. In addition, the analysis would be enhanced by utilizing different aquifer conceptual models and including different aquifer performance and governance factors. The most commonly used aquifer performance factors when evaluating groundwater sustainability are drawdown and salinity. Salinity thresholds can be set to regulatory values or as determined by local conditions. For example, Zhao et al (2016) consider a monitoring point to be affected by seawater intrusion when the detected chloride concentration is higher than the limit for freshwater use of 250 mg l⁻¹ as set by the U.S. Environmental Protection Agency (1973). Alternatively, Pholkern et al (2019) set the groundwater salinity threshold to 1000 mg l⁻¹ TDS. When salinity is not explicitly considered, changes in head level can be taken as a surrogate for salinity and for the location of the saltwater-freshwater interface. For example, Sedki and Ouazar (2011) define an altitude value from mean sea level for hydraulic head along the coastal boundary as a threshold to prevent saltwater intrusion. Similarly, Liu et al (2006) use an adaptive management approach with a groundwater management index based on groundwater levels in wet and dry seasons. In addition to drawdown and salinity, other aquifer performance factors of concern include spring discharge (El-Kadi et al 2014, Hugman et al 2015, Burnett et al 2020) and submarine groundwater discharge (El-Kadi et al 2014, Hugman et al 2015, Post et al 2018). Groundwater discharge volumes and salinity are particularly important for groundwater dependent ecosystems and various human activities, such as in agriculture zones and culturally significant sites (Burnett et al 2020). These additional aquifer performance factors add more restrictive constraints on fresh groundwater utilization.
Fresh groundwater lenses are highly vulnerable to salinization due to excessive groundwater pumping rates, coupled with natural recharge variations and sea-level rise. Determining groundwater sustainability for freshwater lenses is challenging because the lens response during drought periods and the long-term effects of pumping are both difficult to predict. Post et al (2018) show that the lens contraction caused by pumping is nearly a linear function of the total pumping, yet this might not be the case under different recharge scenarios. This is mainly due to aquifer heterogeneity, including the existence of preferential flow paths (Elshall et al 2013, Werner et al 2013). In addition, the choice among multiple geological conceptualizations of the subsurface can have a large impact on groundwater sustainability estimates, similar to the impact of climate change (Pholkern et al 2019). A major gap in the groundwater sustainability literature on coastal aquifers is the absence of adequate characterization of subsurface heterogeneities, especially under different conceptual geological models.
Significant uncertainties also exist with respect to recharge evaluation, especially regarding climate modeling and difficulties in the downscaling processes necessary for local aquifer-scale simulations. Typical studies evaluate recharge scenarios utilizing data for climate change, land-use, and artificial recharge, among others. Different top-down and bottom-up approaches have been adopted to evaluate natural recharge due to climate change. A top-down approach evaluates a specific future climate change scenario, while a bottom-up approach assesses the impact of a systematic increase or decrease in recharge. The top-down approach includes recharge estimation under different Intergovernmental Panel on Climate Change (IPCC) scenarios for future climate projections, which includes using the Special Report on Emissions Scenarios (SRES) scenarios (El-Kadi et al 2014, Benini et al 2016, Hugman et al 2017), and the Representative Concentration Pathways (RCP) scenarios (Pholkern et al 2019). The SRES scenarios are designed around a set of development assumptions of regional and global patterns of economic growth and environmental sustainability. They were superseded in the IPCC Fifth Assessment Report, released in 2014, by RCP scenarios that are based on the change in radiative forcing and other forcing agents. Example studies include Rasmussen et al (2013), who assume an increase in groundwater recharge of 15% in the period from 2010 to 2100 for a coastal aquifer in Denmark, based on outputs from regional climate models representing an IPCC A2 scenario of regionally oriented economic development, and a B2 scenario of local environmental sustainability. Alternatively, Pholkern et al (2019) evaluate three RCP scenarios: a low-emissions mitigation scenario leading to a very low forcing level (RCP2.6), a midrange mitigation emission scenario (RCP4.5), and a very high baseline emission scenario (RCP8.5). Projections for the study basin in northeast Thailand indicate that average annual rainfall tends to increase by 6%, 7%, and 11% under the RCP2.6, RCP4.5, and RCP8.5 scenarios, respectively. The study shows that groundwater sustainability is consistently highest under RCP8.5, followed by RCP4.5 and then RCP2.6, under each of the four evaluated conceptual models. The study found that uncertainty about the model geological structure and about recharge due to future climate change is significant in comparison to uncertainty about boundary conditions. Other than IPCC scenarios, El-Kadi et al (2014) evaluate a drought scenario that is developed from historical drought events. Similarly, Zhao et al (2016) conduct a frequency analysis using historic regional rainfall data to develop three recharge scenarios representing wet, normal, or dry weather conditions. On the other hand, the bottom-up approach perturbs current or pre-development recharge by a percentage decrease.
Change in land-use is another important factor that controls recharge. To study the impact of human activities on future recharge, land-use scenarios are developed to evaluate increased urbanization (El-Kadi et al 2014), stakeholder-driven socioeconomic development (Benini et al 2016), and irrigation recharge return due to different agricultural practices (Charalambous and Garratt 2009). Land-use change includes developing adaptation scenarios. For example, Benini et al (2016) show that climate change is the most important driving force in comparison to land-use change dynamics. However, watershed restoration can help in decreasing water deficit.
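The bottom-up approach to recharge scenarios discussed above, in which a baseline recharge is systematically increased or decreased, can be sketched in a few lines of Python. The function name, the baseline value, and the percentage steps are hypothetical and chosen only for illustration; in practice each scenario would be passed to the groundwater model to evaluate the resulting change in the freshwater lens.

```python
def bottom_up_recharge_scenarios(baseline_recharge, percent_changes):
    """Map each percentage change to a perturbed recharge value.

    baseline_recharge: current or pre-development recharge (any unit)
    percent_changes: iterable of signed percentages, e.g. [-50, -25, 0, 25]
    """
    scenarios = {}
    for pct in percent_changes:
        if pct < -100:
            raise ValueError("a decrease cannot exceed 100%")
        scenarios[pct] = baseline_recharge * (1.0 + pct / 100.0)
    return scenarios
```

A top-down analysis differs in that the perturbations are not chosen freely but derived from specific climate projections, such as downscaled rainfall under a given emission scenario.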
A number of studies emphasize recharge-related factors in assessing changes in fresh groundwater lenses for groundwater sustainability evaluation. These include developing river boundary condition scenarios (Hugman et al). Another management strategy is to passively enhance infiltration from streams by using small dams or weirs and increasing the infiltration capacity of the streambed (Hugman et al 2017). Hugman et al (2017) show that using existing large-diameter wells to infiltrate surplus water from the large surface water reservoirs that are used for public water supply is effective for reducing saltwater intrusion and increasing groundwater sustainability. In general, all studies related to climate, land-use, and managed aquifer recharge show that the thickness of the fresh groundwater lenses is generally directly proportional to recharge. However, some areas in the aquifer can be more vulnerable to a decrease in recharge than others. Also, the increase in groundwater sustainability with respect to increases in recharge volume can be a nonlinear process under different recharge scenarios.
While an increase in recharge will generally increase groundwater sustainability, the impact of sea-level rise is not uniquely defined. This is mainly due to the intricate relation between several factors that include land-surface inundation, flux-controlled or head-controlled inland boundary conditions, recharge variations, aquifer bed slope, and the ratio of aquifer thickness to aquifer length. For example, the study by Rasmussen et al (2013) shows that only changes in groundwater recharge had an effect on saltwater intrusion for a system with flux-controlled boundary conditions, but for a system with head-controlled boundary conditions, changes in recharge, sea level, and the boundary value (e.g. the stage of a drainage canal) are all important for saltwater intrusion. The impact of sea-level rise on land-surface inundation and aquifer salinization has been studied extensively (Ferguson and Gleeson 2012, Rotzoll and Fletcher 2013, Ketabchi et al 2016). The review article by Ketabchi et al (2016) discusses and highlights the gaps in the literature about factors that control the impact of sea-level rise on saltwater intrusion. Here we focus on the few studies that assess the impact of sea-level rise on groundwater sustainability evaluation. Among these, the study of Unsal et al (2014) assesses the impact of a worst-case, maximum global average sea-level rise estimated by the IPCC, corresponding to an increase of 0.88 m at the end of 100 years, showing a reduction in groundwater sustainability of about 25% from the baseline case. Similar studies show that sea-level rise will increase the risk of seawater intrusion and cause a related decrease in groundwater sustainability (Zhou et al 2017). On the other hand, Lathashri and Mahesha (2016) illustrate that saltwater intrusion is more intense in the area adjoining the tidal rivers than in areas affected by the sea alone, and conclude that a regional sea-level rise of 1 mm/year has minimal impact on groundwater sustainability.
Similarly, Rasmussen et al (2013) show that the recharge variation has a much more dominant impact on saltwater intrusion in comparison to a sea level rise of 0.75 m by 2100, under flux-controlled inland boundary conditions that did not result in significant impacts on saltwater intrusion.
Tidal effect is another factor that can influence groundwater sustainability in coastal aquifers. Tidal dynamics induce periodic fluctuations of the water table in coastal aquifers resulting in further saltwater intrusion. For example, Zhao et al (2016) show that saltwater intrusion risk increases with high coastal tides. This study simulates tidal fluctuations by setting the specified head on the coastline boundary to the sea level, and evaluates tidal impact on the groundwater sustainability given nine future recharge scenarios due to climate change. The study shows that without considering tidal effects, the extent of seawater intrusion would be underestimated because tidal effects can influence the model boundary conditions, and accordingly, the groundwater levels in the aquifers. Lu et al (2013) evaluate high, medium, and low tide levels with different groundwater pumping schemes. The study shows that tide-induced seawater intrusion can significantly affect the groundwater levels and salinities when groundwater pumping exceeds a certain bound.
In the literature, diverse approaches are adopted to evaluate groundwater sustainability in coastal aquifers. One approach is to use the Hill method (Freeze and Cherry 1979), which plots simulated pumping rates against drawdowns. For example, Liu et al (2006) use this method to obtain a linear relationship between annual pumping and drawdown, and to define a groundwater management index based on the status of groundwater level changes to allow local government officials to dynamically adjust the pumping scheme. A second approach is the zero water-level change method (e.g. Pholkern et al 2019), which defines safe yield as the average pumping over a time period in which the groundwater level is the same at the beginning and end of the period. The main disadvantage of these two methods is that they do not explicitly account for groundwater discharge. However, their benefit is that they have a defined timeframe. A third approach is to define a long timeframe for the evaluation, which could be challenging, and must be based on timescales of several decades (Post et al 2018). One option is to define a multi-generational timeframe, for example, until the end of the current century (e.g. Benini et al 2016, Hugman et al 2017). Rasmussen et al (2013) further aim to capture the long-term effects of the imposed climate change by treating recharge and sea level as unchanged for an additional 200 years as a post-processing step. Another methodology is to evaluate groundwater sustainability when the numerical simulation reaches a new equilibrium condition, which can take a long period. For example, Post et al (2018) show that, even for a small permeable island like Bonriki, it takes three decades to reach new equilibrium conditions that reflect pumping stresses.
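The first two approaches can be illustrated with a minimal sketch. It assumes that the Hill method is applied by fitting a straight line to annual pumping against annual water-level decline and reading off the pumping rate at zero decline, and that the zero water-level change method averages pumping over the longest window whose start and end levels agree within a tolerance. The function names, the tolerance, and the sample numbers are hypothetical and are not taken from the cited studies.

```python
import numpy as np

def hill_method_yield(pumping, level_decline):
    """Fit level_decline = a * pumping + b and return the pumping rate
    at which the fitted decline is zero (assumed reading of the Hill method)."""
    a, b = np.polyfit(pumping, level_decline, 1)  # slope, intercept
    if a == 0:
        raise ValueError("water-level decline does not depend on pumping")
    return -b / a

def zero_change_yield(pumping, levels, tol=0.05):
    """Average pumping over the longest period whose start and end
    groundwater levels agree within tol.

    pumping[k] is the pumping in year k; levels has one more entry than
    pumping (level at the start of each year plus the final level).
    """
    pumping = np.asarray(pumping, dtype=float)
    best = None
    n = len(pumping)
    for i in range(n):
        for j in range(i + 1, n + 1):
            if abs(levels[j] - levels[i]) <= tol:
                if best is None or (j - i) > (best[1] - best[0]):
                    best = (i, j)
    if best is None:
        raise ValueError("no period with matching start and end levels")
    i, j = best
    return float(np.mean(pumping[i:j]))

# Hypothetical data: pumping in million m3/yr, decline in m/yr (positive = falling)
hill_estimate = hill_method_yield(
    np.array([10.0, 12.0, 14.0, 16.0]),
    np.array([-0.5, 0.0, 0.5, 1.0]),
)
zero_change_estimate = zero_change_yield(
    [14.0, 11.0, 11.0, 16.0],
    [50.0, 49.5, 49.8, 50.0, 49.0],
)
```

Neither function contains a discharge term, which mirrors the limitation noted above: both estimates rely only on pumping and water-level records.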
Another commonly used approach for groundwater sustainability evaluation is to assess different pumping scenarios against a set of aquifer performance and governance factors (e.g. Liu et al, El-Kadi et al 2014, Hugman et al 2017). For example, Hugman et al (2017) evaluate sustainability by assessing two pumping scenarios in which groundwater use for irrigation is decreased by 25% and 50% due to decreasing irrigated land, increasing agricultural efficiency, or promoting the use of alternative water sources. In addition, the study evaluates the impact of the spatiotemporal distribution of public-supply pumping given different users in wet and dry seasons. The aquifer factors considered comprise spring discharge and salinity levels for groundwater dependent ecosystems (Hugman et al 2017, Burnett et al 2020). Pumping schemes could also include alternative groundwater pumping options. For example, to reduce salinization, Alam and Olsthoorn (2014) evaluate the use of scavenging wells to pump shallow brackish water to replace deep fresh groundwater for irrigation.
Using a simulation optimization method to evaluate the optimal and sustainable pumping of groundwater from coastal aquifers is another common approach. This can include a manual search. For example, Pholkern et al (2019) divide the pumping scheme into eight clusters, and decrease pumping in each cluster until a feasible solution that does not violate groundwater sustainability factors is obtained. The groundwater sustainability factors considered in this study comprise drawdown change that is set to zero, reflecting the achievement of a balance between water inflow and outflow, and groundwater salinity levels at existing pumping wells that do not exceed the salinity threshold. Alternatively, a more common approach is to use an automatic search that determines the optimal pumping that satisfies the aquifer performance and governance factors (e.g. Qahman et al 2005, Schoups et al 2006, Rejani et al 2009, Javadi et al 2012, Nocchi and Salleolini 2013, Uddameri et al 2014, Renau-Prunonosa et al 2016, Kamali and Niksokhan 2017, Burnett et al 2020). For example, the study by Qahman et al (2005) develops a simulation optimization method to maximize pumping and the profit of selling water, minimize salinization and the operational and water treatment costs, and satisfy drawdown thresholds. The decision variables are pumping rates, while the constraints are head levels and concentrations. To reduce the high computational cost of density-dependent groundwater flow models, the study by Renau-Prunonosa et al (2016) uses a similar approach, but employs a constant-density groundwater flow model, which assumes a direct relationship between drawdown and seawater intrusion.
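The manual search described above can be sketched as a simple loop. This is an illustration under stated assumptions, not the procedure of Pholkern et al (2019): the round-robin order, the fixed decrement per cluster, and the feasibility check (a stand-in for a groundwater model run that tests drawdown and salinity factors) are all hypothetical.

```python
import numpy as np

def manual_search(initial_pumping, is_feasible, step_fraction=0.1, max_passes=100):
    """Reduce pumping cluster by cluster until is_feasible(q) returns True.

    is_feasible is a user-supplied callable; in practice it would wrap a
    simulation that checks the sustainability factors (hypothetical here).
    """
    q = np.asarray(initial_pumping, dtype=float).copy()
    decrement = step_fraction * q  # fixed cut per cluster, set from initial rates
    if is_feasible(q):
        return q
    for _ in range(max_passes):
        for k in range(q.size):
            q[k] = max(q[k] - decrement[k], 0.0)  # never pump a negative rate
            if is_feasible(q):
                return q
    raise RuntimeError("no feasible pumping scheme found")

# Toy feasibility rule (assumption): total pumping must not exceed 9 units
def total_within_limit(q):
    return q.sum() <= 9.0

scheme = manual_search([4.0, 4.0, 4.0], total_within_limit, step_fraction=0.25)
```

An automatic search would replace the loop with an optimizer that treats the pumping rates as decision variables and the head and concentration limits as constraints.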
The finite difference MODFLOW model is commonly used to simulate groundwater flow for groundwater sustainability evaluation in coastal and other aquifers (

Karst aquifers.
We selected karst aquifers as a second example for three main reasons. First, karst aquifers are an important water resource, because ∼15% of the global ice-free continental surface is characterized by the presence of karstifiable carbonate rock, and ∼16% of the global population lives on karst (Goldscheider et al 2020). Second, karst terrain has a high vulnerability in terms of groundwater quality and subsidence. Thus, developing methods to address data uncertainty and unavailability, developing mathematical models that fully capture karst complexity, and accordingly estimating groundwater sustainability are not trivial tasks. The third reason is that karst aquifers have generally received less attention in groundwater hydrology research than porous media and fractured rock aquifers.
A model simulation cannot be more accurate than its input data. Modeling flow and transport in subsurface media has a distinctive position among all other fluid mechanics applications, since hydrogeological data are uncertain, especially when it comes to karst aquifers. While porous media aquifers are often heterogeneous, conductive fractures are generally expected to have highly erratic heterogeneity, directional dependence, a dual or multicomponent nature, and multiscale behavior (Neuman 2005). Yet on top of the matrix and fracture porosities, karst media are commonly characterized by a tertiary porosity of interconnected conduits. In comparison to porous and fractured media, data unavailability is a peculiar feature of karst media. As these conduits occupy a small fraction of the aquifer area and volume, the probability of intercepting one of the high-permeability channels during a random drill is very low (Worthington and Ford 2009). In other words, the existing geophysical methods, or other site investigation methods, will not guarantee the location of active conduits, but will only increase the probability of finding them. Thus, while these conduits control most of the flow, they can remain undiscovered. Even a common definition of karst is unclear. As noted by Bakalowicz (2005), 'because of the complexity of karst medium, many think that every karst aquifer is currently considered as being representative of only itself'. This data uncertainty and unavailability problem makes karst one of the most challenging topics in hydrogeology.
Apart from the challenging data issue, the development and application of mathematical models to simulate flow and transport in karstified rocks are still in their infancy. Karst aquifers did not receive much attention in the past in comparison to porous media and fractured aquifers. In part, many of the methods such as the stochastic flow and transport equation that were developed for porous and fractured media did not extend to karst media. On the other hand, the methods that were developed or adopted to address the special complexity of karst are still under development in a sense that none of the existing codes can fully capture the full complexity of karst robustly. The complexity arises due to the presence of (1) diffuse, fracture, and conduit recharge, (2) laminar flow in porous matrix, fracture, and conduits filled with sediments, (3) turbulent flow in vuggy matrix, fracture, and conduits, (4) saturated or unsaturated flow for both matrix and fractures, (5) pipe or open channel flow for conduits, (6) solute transport under laminar and turbulent flow conditions, and (7) diffuse and spring discharge.
Unlike in porous media aquifers, phenomenological models are generally more commonly used in karst aquifers than numerical models due to the aforementioned difficulties. Phenomenological models in karst, which are termed global models, can be classified as black box or grey box models. These models are represented by functions that relate the system inputs, outputs, and responses. They do not provide specific spatial information, but rather temporal information on the global system responses to different inputs, and thus they can be used for prediction. Also, by analyzing the system inputs and outputs, the system components or functions can be resolved. Thus, these methods can be extended in application to characterize vulnerability at a spring catchment (Butscher and Huggenberger 2008), nitrate transport (Pinault et al 2001), coastal aquifers (Pinault et al 2004), and surface water and groundwater exchange (Pinault and Schomburgk 2006), among other applications related to groundwater sustainability. The use of these methods is common in karst hydrology, since they can bypass the data problems highlighted previously, yet the range of their applications is limited. On the other hand, the applications of numerical models can be wider with respect to assessment and management of water resources and mitigation of hazards such as subsidence.
Studies of groundwater sustainability for karst aquifers face a number of challenges in numerical modeling, aquifer characterization, and groundwater and surface water interaction. The major challenge is that the dual-media (matrix and conduit) characteristics of karst aquifers complicate the definition of groundwater sustainability, because groundwater flow is laminar in the matrix but turbulent in conduits. Therefore, groundwater sustainability should be defined for matrix and conduits separately. However, since the spatial distribution of conduits is always unknown, groundwater sustainability cannot be defined specifically for conduits. One solution to this dilemma is to use the concept of equivalent porous medium by assuming that conduit flow is also laminar (or more specifically that the representative elementary volume of a karst aquifer is large enough to average out the local influence of karst conduits), and thus can be studied with the tools developed for porous and fractured media (Scanlon et al 2003, Ghasemizadeh et al 2012). The field-scale modeling of Davis et al (2010) demonstrates that this concept is valid when using MODFLOW and MT3DMS to simulate groundwater flow and nitrogen transport in the Woodville Karst Plain of northern Florida, USA. However, Kuniansky (2016) points out that the concept of equivalent porous medium cannot simulate the response of the groundwater system to short-term hydrological events such as heavy rainfall. This is not surprising because the concept of equivalent porous medium is merely an approximation and cannot fully describe turbulent flow in conduits.
In the last decade, building on the concept of discrete conduit continuum, MODFLOW-CFP (conduit flow process) (Shoemaker et al 2007) and CFPv2 (Reimann et al 2014) have been developed to couple laminar flow in porous media and turbulent flow in conduits. While the concept of discrete conduit continuum is theoretically more advanced than the concept of equivalent porous medium, and the computer codes have been used for field-scale studies (e.g. Gallegos et al 2013, Xu et al 2015), research on coupling laminar and turbulent flows in karst aquifers is still in its infancy, and many mechanisms of the coupling are still largely unknown. For example, while MODFLOW-CFP and CFPv2 assume that the flow exchange between matrix and conduits is a linear function of the difference between the heads of matrix and conduits, this assumption may not be valid (Pacheco Castro et al 2020). In addition, field applications of the computer codes require knowing the spatial distribution of karst features (e.g. conduits and sinkholes), yet this distribution is a major unknown at any field site. The lack of quantitative groundwater modeling tools is the major challenge in evaluating groundwater sustainability in karst aquifers.
Efforts have been made to develop quantitative tools that do not require detailed aquifer characterization. For example, models based on the concept of lumped-parameter equivalent porous medium have been developed (e.g. Scanlon et al 2003, Santos and Andreu 2010, Jódar et al 2014). Such models treat the matrix as one reservoir and the conduits as another reservoir, and thus do not require knowing the spatial distribution of karst features. However, many parameters (e.g. reservoir areas and water depth) of these models are fudge factors, which limits the usefulness of the models in practice. These kinds of problems also exist in statistical models developed for groundwater sustainability studies. For example, Yin et al (2012) develop an artificial neural network for an assessment of the sustainable yield of karst water in Huaibei, China. Using machine learning techniques for groundwater sustainability studies may be a future direction if available data are sufficient to develop a statistical model (e.g. an artificial neural network), but insufficient to develop a mechanistic model (e.g. a MODFLOW-CFP model).
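The two-reservoir idea can be made concrete with a minimal sketch. It assumes linear reservoirs, a fixed split of recharge between matrix and conduits, drainage from the matrix into the conduits, and spring discharge from the conduit reservoir only. The parameter names and default values are hypothetical calibration coefficients and do not come from any of the cited models.

```python
def two_reservoir_spring(recharge, f_conduit=0.3, k_matrix=0.01, k_conduit=0.3):
    """Simulate spring discharge from a matrix reservoir and a conduit reservoir.

    recharge   : sequence of recharge depths per time step
    f_conduit  : fraction of recharge routed directly to conduits (assumed)
    k_matrix   : fraction of matrix storage draining to conduits per step (assumed)
    k_conduit  : fraction of conduit storage leaving as spring flow per step (assumed)
    Returns (spring discharge per step, final matrix storage, final conduit storage).
    """
    s_matrix, s_conduit = 0.0, 0.0
    discharge = []
    for r in recharge:
        q_exchange = k_matrix * s_matrix   # slow drainage, matrix to conduits
        q_spring = k_conduit * s_conduit   # fast drainage, conduits to spring
        s_matrix += (1.0 - f_conduit) * r - q_exchange
        s_conduit += f_conduit * r + q_exchange - q_spring
        discharge.append(q_spring)
    return discharge, s_matrix, s_conduit

# Hypothetical recharge pulse followed by a dry spell
recharge_series = [10.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0]
spring_q, final_matrix, final_conduit = two_reservoir_spring(recharge_series)
```

The three coefficients have no directly measurable counterpart in the field and must be calibrated against spring hydrographs, which illustrates why such parameters are described above as fudge factors.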
Studying groundwater sustainability of karst aquifers requires advanced understanding of groundwater and surface water interactions in the aquifers, because groundwater and surface water are strongly and dynamically connected through karst features such as sinkholes. For example, a stream can entirely disappear when it falls into a sinkhole and becomes groundwater. This phenomenon is well described in the review article of Tihansky (1999). The dynamic nature of groundwater and surface water interactions makes groundwater sustainability studies challenging. For example, a productive karst aquifer may become unsustainable quickly if its recharge from surface water is reduced substantially over a drought period. Studying groundwater and surface water interactions again faces the difficulty of lacking knowledge and data. For example, it is rarely known how much surface water directly drains into groundwater aquifers. Resolving this problem requires investing resources in aquifer characterization and in the development of monitoring programs.

Here we extend this discussion by conducting a systematic review of articles that incorporated the ecosystem services concept in groundwater sustainability evaluation. Given the pool of articles from the screening stage, we identified 40 research articles related to GDE, with 26 articles including a groundwater modeling component. The eligibility criterion for inclusion in this review article is relevance to groundwater sustainability evaluation, and all the articles were eligible to be included. In addition, we conducted a full-text assessment of eight review articles about GDE.
In this section we first provide a brief overview about the concept of ecosystem services and its importance for groundwater sustainability, threats to ecosystem services due to unsustainable groundwater management practices, and the state of science in incorporating the ecosystem services concept in groundwater sustainability from an ecohydrology perspective. This is followed by a summary of our aforementioned systematic review, which mainly focuses on how the ecosystem services concept has been incorporated in groundwater sustainability evaluation.

Ecosystem services modeling
Layering GDE on top of an inherently complicated SW-GW system adds another layer of complexity, which needs to be considered when evaluating a groundwater sustainability policy. GDE include springs, wetlands, peatlands, wet forests, lakes, rivers, streams, hyporheic zones, estuaries, coastal lagoons, nearshore marine environments, and aquifers. GDE provide ecosystem services, which are benefits that people obtain from ecosystems. These include provisioning services (e.g. supply of water, food, and timber), regulating services (e.g. water purification, waste treatment, erosion regulation, flood control, drought buffer, and flora and fauna habitat), cultural services (e.g. provision of recreational, educational, research, aesthetic and spiritual benefits), and supporting services that are necessary for the production of all other ecosystem services such as water and nutrient cycling (Millennium Ecosystem Assessment 2005, Knüppe et al 2016). The reader is referred to other studies (Tuinstra and van Wensem 2014, Knüppe et al 2016) that provide a detailed description of ecosystem services that depend on groundwater. Groundwater is important for safeguarding ecosystem functions and services because groundwater helps to maintain water levels, temperature, oxygen content, dissolved ions, nutrients, organic matter, and (bio)geochemical conditions required by plants and animals in GDE (Kløve et al 2011, Gleeson and Richter 2018). For a detailed discussion of the importance of various groundwater attributes to different classes of GDE, the reader is referred to Eamus et al (2016).
The aforementioned ecosystem functions and services will be altered by groundwater pumping. Although this relation is poorly understood, a few articles have reviewed the impact of groundwater pumping on wetlands (De La Hera et al 2016), streamflow (Gleeson and Richter 2018), and multiple GDE under climate change (Klove et al 2014). Yet there is a general understanding that groundwater capture can detrimentally impact ecosystems. Groundwater pumping and capture translate into (1) increased recharge rates, (2) decreased areas of rejected recharge where the water table is close to the ground surface, which is vital for wetlands, (3) changes in groundwater pressure, (4) increases in drawdown, (5) decreases in streamflow, baseflow, spring discharge, and submarine groundwater discharge, and (6) changes in water quality and physico-chemical properties (Kalf and Woolley 2005, Konikow and Leake 2014, De La Hera et al 2016). For example, the impacts of increased drawdown are that phreatophytes will no longer be able to reach the water table (De La Hera et al 2016), and soil salinization may increase (Shi et al 2012). For details about the impacts of groundwater pumping and quality on GDE, the reader is referred to Eamus et al (2016).

Science lagging behind policy.
Ecosystem services are an integral policy component in sustainable groundwater management. The aforementioned external pressures from the groundwater system can change the structure and functioning of GDE, cause habitat destruction, result in loss of sensitive species and biodiversity, and lead to gradual loss of ecosystem services, as discussed in detail by Klove et al (2014). This conflict between ecosystems and human needs is increasing, as in the past century the human population quadrupled, irrigated agricultural land increased sixfold, and water pumping rates from freshwater ecosystems increased eightfold (Sophocleous 2007). Thus, safeguarding ecosystem services is becoming mandatory under different groundwater sustainability policies. The review article by Rohde et al (2017) provides a global synthesis of managing GDE under groundwater sustainability policies with a special focus on Australia, California, the European Union, and South Africa.
The science to quantify the ecological water needs that support GDE is in its infancy (Wheater and Gober 2015, De La Hera et al 2016, Rohde et al 2017, Gleeson and Richter 2018). In Hawaii, for example, the impacts of submarine groundwater discharge on marine ecosystems (which are especially important to the indigenous Hawaiian culture and Hawaii's tourism-based economy) have been studied (e.g. Duarte et al 2010, Delevaux et al 2018). Yet we do not know the ecological water requirements, that is, how much water the GDE need. Thus, the groundwater sustainability estimates that would safeguard these ecosystem services have never been quantified, except as a trade-off curve between pumping and the available fraction of the pre-development ecological water (Burnett et al 2020). In other words, a significant challenge is to determine the groundwater needs of GDE, and the acceptable amount of water that can be withdrawn from ecosystems without severely degrading the natural functioning and productivity of GDE (Sophocleous 2007, De La Hera et al 2016). This is a challenging task because how ecosystems depend on hydrological drivers, and how they respond to predicted changes in hydrology at different spatial and temporal scales, are currently not well described, largely due to knowledge gaps in ecohydrology (Klove et al 2014, Rohde et al 2017). This is also due to the sparsity of monitoring data that include information on the geomorphology, ecology and hydrology of ecosystems (Klove et al 2014). In addition, the diversity of groundwater dependent ecosystems (e.g. Howard and Merrifield 2010) makes it difficult to provide a one-size-fits-all management solution (Rohde et al 2017). Although most tools for managing groundwater-ecological systems are not yet well defined, De La Hera et al (2016) conclude that the integration of groundwater processes and ecohydrological concepts is currently possible, and may result in more sustainable outcomes.

Science into action: modeling approaches.
How to incorporate ecosystem services to groundwater sustainability evaluation is unclear. To bridge the gap between scientific research and regulatory needs, Klove et al (2014) suggest establishment of a list of measurable indicators that describe GDE vulnerability at several spatial and temporal scales with multiple factors (physical, chemical and biotic); this is followed by identification of the linkages between these indicators and groundwater hydrology, climate change impacts, and human activities. In practice, to account for ecosystem services in groundwater sustainability evaluation, there are two related questions that can be assessed. These are: 'How much ecological water will be available given a certain pumping scheme? What are the ecological water needs of the GDE?' While the first question is relatively easy to answer, the second question is much more involved. Hydrological modeling can be used to answer the first question using multiple hydraulic, water budget and water quality indicators. For example, water-level is a commonly used indicator to account for ecological water demand (Stigter et (2017) discussed the scientific underpinning for establishing indicators with thresholds, highlighting that the inherent diversity of groundwater dependent ecosystems requires indicators and thresholds to be locally determined, due to differences in reliance on groundwater, species composition, and adaptive capacities to varying threats. The challenge is how to define the threshold of these indicators. Multiple approaches are used to address this question.
One straightforward approach is to conduct scenario analysis to show the impact of different indicator thresholds. For example, Hu et al (2010) tested the impact of different agricultural and irrigation scenarios on ecological water, showing that improving agricultural practices can help groundwater recovery and increase ecological water supply. Similarly, Stigter et al (2009) used FEFLOW to test future scenarios of recharge due to climate change and water demand increases, showing the difficulty of meeting the ecological water needs of GDE, as the lowering of hydraulic heads would significantly reduce or stop spring discharge, and increase saltwater intrusion. Ronayne et al (2017) used MODFLOW with the streamflow-routing package to evaluate groundwater sustainability for the natural base-case scenario and a managed aquifer recharge scenario, with the purpose of increasing streamflow to improve riverine species habitat in the South Platte River, Colorado. The study uses capture concepts to assess how pumping and recharge alter the head-dependent flows to and from the aquifer, which in turn can help explain impacts on streamflow. A similar study (Scherberg et al 2014) shows that a managed aquifer recharge scenario increased groundwater discharge through springs and stream beds, benefiting aquatic habitat rather than building long-term aquifer storage.
A second approach is to use insights gained from hydrological modeling to determine the indicator threshold. For example, Hsu et al (2012), in the Chih-Ben watershed, Taiwan, used as an indicator the changes in groundwater levels that can lead to variation in soil-water content, and accordingly to the deterioration of the GDE. The indicator threshold is determined using mechanistic SW-GW model simulations, which showed that groundwater sustainability evaluation is sensitive to any small drawdown in the downstream region. Such drawdown causes a significant decrease of groundwater levels in the mountain areas, impacting biodiversity and other ecosystem services. Also, Mulligan et al (2014) use average stream flow as an indicator, and the threshold is set based on a percentage of the historic flow that would not cause infeasible solutions in the model runs.
A third approach is to use a simulation optimization method with trade-off evaluation of different indicator thresholds. Shi et al (2012) used a trade-off approach for meeting ecological environment water demand while balancing the integrated benefits of the society, economy, environment, and resources. This study evaluated groundwater sustainability under twelve strategies of four surface reservoir supply scenarios with low to high proportions of water crossed by three river ecological water demand thresholds of minimum (10% of multi-annual runoff), suitable (30% of multi-annual runoff), and optimal (60% of multi-annual runoff). By conducting trade-off evaluation and ranking these solutions, Shi et al (2012) show that the optimized groundwater yield could be sustained by high reservoir supplies to maintain suitable ecological water demand, while meeting human needs. Similarly, Burnett et al (2020) use a simulation optimization method to develop a trade-off curve of optimal groundwater pumping at different levels of spring discharge, which is important for terrestrial GDE.
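The strategy set described above (four reservoir supply scenarios crossed with three ecological water demand thresholds) can be enumerated with a short sketch. The three threshold fractions are those quoted in the text; the scenario labels and the runoff value are hypothetical placeholders, and the ranking step of the trade-off evaluation is not reproduced here.

```python
from itertools import product

# Fractions of multi-annual runoff, as quoted in the text
ECO_FRACTIONS = {"minimum": 0.10, "suitable": 0.30, "optimal": 0.60}

# Hypothetical labels for the four reservoir supply scenarios
SUPPLY_SCENARIOS = ["supply_1", "supply_2", "supply_3", "supply_4"]

def build_strategies(multi_annual_runoff):
    """Cross supply scenarios with ecological demand levels and attach
    the ecological water demand implied by each threshold."""
    strategies = []
    for supply, (level, fraction) in product(SUPPLY_SCENARIOS, ECO_FRACTIONS.items()):
        strategies.append({
            "supply": supply,
            "eco_level": level,
            "eco_demand": fraction * multi_annual_runoff,
        })
    return strategies

# Hypothetical multi-annual runoff of 200 volume units per year
strategies = build_strategies(200.0)
suitable_demands = [s["eco_demand"] for s in strategies if s["eco_level"] == "suitable"]
```

Each of the twelve entries would then be passed to the simulation optimization model, and the resulting solutions compared in the trade-off evaluation.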
A fourth approach to determine the indicator threshold is to estimate the water needs of GDE. The goal is to define measurable ecological indicators that assess GDE, and to link these indicators to hydrological models to evaluate groundwater sustainability accordingly. Monitoring and targeted research can identify which aspects of the hydrological regime are most important in supporting the structure and function of an ecosystem, and accordingly the ecological water requirements can be quantified using this best available scientific information (Rohde et al 2017). For example, following the guidelines from the Danish EPA with a participatory evaluation process, Henriksen et al (2008) formulate an ecological-based indicator focusing on baseflow and the ecological objectives of river reaches. This indicator was used with other hydraulic and aquatic indicators to evaluate groundwater sustainability in Denmark using the integrated SW-GW model MIKE-SHE. Establishing these indicators with thresholds requires studying multiple ecological factors. For example, Aldous and Bach (2014) estimate groundwater sustainability by establishing hydro-ecological relationships that can be used to develop quantitative, measurable thresholds that are sensitive to changes in groundwater quantity and drawdown. Also, in response to the Massachusetts Water Management Act, Levangie (2008) shows that successive monitoring and research efforts resulted in moving from safe yield to groundwater sustainability policy through defining an indicator that focuses on the habitat and streamflow requirements of the basin. Ecohydrological modeling can also be used to formulate these indicators. For example, Yang et al (2016) use field survey data and an eco-environment assessment model to develop ecohydrological indicators for the Tuwei River watershed in Shaanxi Province, China.
The indicators are used to evaluate and rank multiple pumping schemes using an ensemble of mechanistic and phenomenological hydrological models that simulate water budget, groundwater levels, and water quality. Generally, selecting and formulating these indicators to maximize the amount and quality of information about the ecological integrity of a system, while minimizing the time and expense involved, remains a challenge (Klove et al 2014, De La Hera et al 2016).

Fifth, a pragmatic approach is to evaluate the indicator thresholds from a hydroeconomic modeling perspective. As explained in detail in section 4.3, hydroeconomic models can be used to find optimal groundwater pumping schemes that would maximize a utility function such as increasing ecosystem services, and to evaluate trade-offs between human and ecological water needs. In doing so, the optimum path of the pumping rate does not necessarily converge to the recharge rate, but depends on the costs associated with GDE damages (Pereau and Pryet 2018). As an example, Esteban and Dinar (2013) used a linear utility function to estimate damages to GDE given drawdown and reduction in the flows that feed GDE. Commonly, hydroeconomic models use lumped phenomenological hydrological models. Pereau and Pryet (2018) reviewed hydroeconomic modeling studies, highlighting that most of these studies can be considered an illustration of the water budget myth, as they do not account for natural discharge in the water budget, which is important for sustainable groundwater management in the presence of ecosystem damages. Thus, Pereau and Pryet (2018) developed a hydroeconomic model that takes into account environmental flows both in the water budget and in the utility function. Kahil et al (2016) note that a major gap in hydroeconomic modeling is the weak integration of physically based representations of SW-GW interactions to inform complex basin-scale policy choices.
Henriksen et al (2008) emphasize the need for a mechanistic SW-GW model to allow for discretization and explicit representation in space and time of exchange flow components between saturated zone flows, unsaturated zone, overland flows, river reaches, and drains. On the other hand, other studies (Seward et al 2006, Seward 2010) emphasize the importance of phenomenological models particularly when an adaptive management approach is adopted.
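The observation that the optimal pumping rate need not converge to the recharge rate can be illustrated with a deliberately simple steady-state sketch. It is not the model of Pereau and Pryet (2018) or of Esteban and Dinar (2013); the quadratic benefit function, the linear GDE damage term, the function names, and all numbers are assumptions made for illustration only.

```python
# Toy steady-state sketch; every number and function name here is an
# illustrative assumption, not a value or method from the cited studies.

def optimal_pumping(recharge, a, c, damage_cost):
    """Pumping rate W that maximizes net benefit for 0 <= W <= recharge.

    Net benefit = a*W - 0.5*c*W**2 - damage_cost*W
    The first two terms are a concave benefit of water use. The last term is
    a linear GDE damage: at steady state, every unit pumped is one unit less
    natural discharge available to the ecosystem.
    """
    # Interior optimum: marginal benefit (a - c*W) equals marginal damage.
    unconstrained = (a - damage_cost) / c
    return min(max(unconstrained, 0.0), recharge)


def natural_discharge(recharge, pumping):
    """Steady-state water budget: recharge = pumping + natural discharge."""
    return recharge - pumping


recharge = 60.0
for damage_cost in (0.0, 4.0, 8.0):
    w = optimal_pumping(recharge, a=10.0, c=0.125, damage_cost=damage_cost)
    d = natural_discharge(recharge, w)
    print(f"damage cost {damage_cost}: pumping {w}, discharge left for GDE {d}")
```

In this sketch, ignoring ecosystem damage leads to pumping the full recharge and leaving no natural discharge, whereas a higher damage cost moves the optimum below the recharge rate.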

Science into action: adaptive management.
Modeling is only one tool to evaluate ecosystem services indicators, and other practical approaches include adaptive management, risk assessment, and participation. The concept of adaptive management has a long history of application in ecosystem management (Knuppe and Pahl-Wostl 2011). This is mainly because we recognize that GDE are complex systems; our knowledge about GDE is imperfect; our ability to predict the characteristics and responses of GDE to groundwater pumping is limited; the outcomes of our groundwater management practices on GDE are uncertain; human activities and societal attitudes towards GDE are diverse and variable; our knowledge about GDE management needs to be developed incrementally; and more ecohydrological research and modeling studies alone will not help us to improve groundwater sustainability evaluation. The concept of adaptive management acknowledges these concerns, and adopts a pragmatic, flexible and experimental 'learning by doing' approach to management. Adaptive management enables water managers to allocate groundwater based on routine monitoring (e.g. groundwater levels, water quality metrics, instream flow criteria, and vegetation growth), targeted scientific investigations (e.g. plant water-use modeling, numerical groundwater modeling, and environmental or isotopic tracers), participation, and exploration programs to test potential pumping schemes (Seward et al 2006, Rohde et al 2017). This is to determine the hydrological conditions and thresholds required to maintain a GDE (Rohde et al 2017), to learn about the resilience and buffering capacity of the systems (Henriksen et al 2008), and to select preferred options (Seward et al 2006).
Several case studies show the effectiveness of the adaptive management approach when successfully implemented. For example, Rohde et al (2017) state that adaptive management is the core of Australia's approach for managing GDE. As such, the Australian government commissioned the development of a practical 'GDE Toolbox' to assist Australian state agencies in revising conceptual models, identifying threshold responses, and identifying and managing GDE for water plans. The study of Rohde et al (2017) shows that New South Wales adopted the most specific and comprehensive statewide approach for managing GDE by combining the adaptive management strategies required in the Australia National Water Initiative with ecological valuation and risk assessment. This is to strategically target GDE with high value and high risk that require the greatest attention. This is especially needed when large uncertainty exists in the early management years, and limited financial resources are available for monitoring (Rohde et al 2017). Similar regulatory programs were undertaken by the state of Michigan, USA, the province of Ontario, Canada, and under the EU Water Framework Directive as discussed by Gleeson and Richter (2018). For example, to meet the California Sustainable Groundwater Management Act requirements for GDE, a guidance document (Rohde et al 2018) was developed based on best available science to help agencies, consultants, and stakeholders to efficiently incorporate GDE into groundwater sustainability plans. Participation is an integral part of adaptive management because indicator thresholds and trade-offs cannot be decided by science alone and community values need to be integrated (Seward 2010), and because participation enables stakeholders and decision makers to make risk-based consensus decisions (Gleeson and Richter 2018).
While the concept of adaptive management may appear to be in conflict with the stance that sustainable groundwater volume is a fixed number or range that science can determine with more research, this is not the case. Seward et al (2006) clarify that the role of scientists is to identify a range of sustainability options with expected consequences, while the role of groundwater users and managers is to select the preferred option; then scientists can monitor the outcomes of that option and revise the sustainability scenarios as required. This is further elaborated in section 6.
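The cycle described above (select a pumping option, implement it, monitor the outcome, and revise) can be sketched as a simple loop. This is a minimal sketch under stated assumptions: the linear level response, the 10% reduction rule, the threshold, and all names and numbers are hypothetical and do not come from any of the cited studies.

```python
# Minimal 'learning by doing' loop; the response model and all numbers are
# hypothetical assumptions used only to illustrate the management cycle.

def simulate_year(level, pumping, recharge=30.0, response=0.05):
    """Stand-in for one year of monitoring: the groundwater level rises or
    falls in proportion to the net water budget (recharge minus pumping)."""
    return level + response * (recharge - pumping)


def adaptive_management(level, pumping, threshold, years, reduction=0.1):
    """Apply the current pumping rate, monitor the level, and cut pumping by
    a fixed fraction whenever the monitored level falls below the threshold."""
    history = []
    for _ in range(years):
        level = simulate_year(level, pumping)   # implement and monitor
        history.append((pumping, level))
        if level < threshold:                   # revise the selected option
            pumping *= (1.0 - reduction)
    return history


history = adaptive_management(level=52.0, pumping=50.0, threshold=50.0, years=10)
for year, (pumping, level) in enumerate(history, start=1):
    print(f"year {year}: pumping {pumping:.2f}, monitored level {level:.2f}")
```

The rule here only ever reduces pumping; a fuller sketch would also let stakeholders choose among several options and allow increases when monitoring shows recovery.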

Human activities and groundwater sustainability.
Human activities add a third layer of complexity to the modeling framework. Sustainable water resources management requires consideration of human-groundwater systems interaction for several reasons. From the policy side, accounting for human activities is imperative for groundwater sustainability policy-making and implementation. For example, most of the articles related to human activities modeling that were eligible to be included after our full-text review address a policy issue. These articles addressed policies related to regulation of flow, quality, allocation, or pricing of groundwater (Kahil et al 2016, Wurl et al 2018), and groundwater replacement policies (Badiuzzaman et al 2017). These policy components also encompass evaluation and comparisons of the impact of different groundwater policy options and mechanisms (Aistrup et al 2017, Esteban and Dinar 2013, Farhadi et al 2016, Harou and Lund 2008, Hyndman et al 2017, Zellner and Reeves 2012), groundwater policy guidance and development (Zellner 2007, Hu et al 2010, Shi et al 2012, Fernald et al 2015, Salem et al 2017), and evaluation of modeling approaches related to groundwater sustainability policy (Mulligan et al 2014, Macewan et al 2017). From the science side, limited characterization of the human-groundwater systems interaction has resulted in inadequate groundwater sustainability evaluation. This is mainly because groundwater sustainability is a function of both aquifer performance and governance factors (figure 3). As these factors are changing through space and time, it is becoming increasingly clear that improved accounting of changes, interactions, and feedback within coupled human-water resources systems is important (Montanari et al 2013, Vogel et al 2015). Moreover, human activities have a large impact on model simulation and predictions. 
Vogel et al (2015) challenge the emphasis on natural processes as the main predictor of hydrological response, given the integration of human systems into hydrological design. Accounting for human activities is important for studying the improvement, efficiency, and sustainability of groundwater resources systems because societal and land-use changes can cause fundamental shifts in basin functions that may be of the same order of magnitude, if not greater, than those predicted by climate change (Ceola et al 2016, Ferguson and Maxwell 2012). While mathematical models about natural aspects of groundwater resources systems have been a matter of research for centuries and are generally well developed, human activities and behavioral models in hydrology are much newer and less developed (Giuliani et al 2016).
In this section, we review human activities modeling in the groundwater sustainability literature, which is relatively limited in comparison to natural aspects, although rapidly expanding (figure 10). We conducted a full-text review of 46 articles about groundwater sustainability at different geographic locations that describe groundwater systems and human activities. After our full-text review, 36 articles were eligible to be included in the review article based on the relevance of the work. For example, we excluded articles that are not directly related to groundwater sustainability, and articles that are mainly economics-driven rather than hydrology-driven. We classified these articles under the three research areas of integrated water resources management, hydroeconomic modeling, and sociohydrology, resulting in 8, 11, and 17 articles respectively. This section summarizes human activities modeling approaches that are used in these studies. These include land-use scenarios and backcasting, simulation optimization methods, hydroeconomic optimization, regional economic-engineering models, system dynamics modeling, holistic natural-human models, and agent-based modeling.

Integrated water resource management.
Several studies in hydrosociology and integrated water resources management address human activities modeling for sustainable groundwater development. Hydrosociology is the study of the impacts of human activities on water resources systems and the societal impacts of water resources projects (Falkenmark 1979). Integrated water resources management extends hydrosociology. The Global Water Partnership (2000) defines integrated water resource management as 'a process which promotes the coordinated development and management of water, land, and related resources in order to maximize the resultant economic and social welfare in an equitable manner without compromising the sustainability of vital eco-systems.' The eight reviewed articles under these two research sub-fields use mechanistic groundwater models to relate human activities to groundwater systems, and MODFLOW is commonly used. In these studies, sustainable groundwater development is evaluated by modeling human activities using scenario analysis and simulation optimization. The scenario analysis approach estimates how the groundwater system responds to different scenarios related to human activities. These scenarios represent changes in agricultural and domestic water demand (Ramesh and Mahesha 2008). These scenarios were developed using an array of approaches, from simple to complex. As a simple approach, Feng et al (2018) use an attribution approach based on historical trend analysis of coupled SW-GW model simulations to develop scenarios of upstream inflow and groundwater pumping that are a function of climate conditions and human activities. While agricultural water use in this study was determined externally and assumed unchanged under future climate conditions, Ramesh and Mahesha (2008) estimate the increase in agricultural and domestic water demand based on population forecasting. 
This was used to evaluate sustainable groundwater development under different development scenarios that are a function of agricultural and domestic water demand and recharge. Similarly, Li et al (2013) define sustainable management practices by using a ground-settlement recovery scenario and groundwater exploitation planning scenarios generated from local development plans. Wurl et al (2018) adopt a more elaborate stakeholder-driven approach to conduct PESTE (political, economic, social, technological, and environmental) and SWOT (strengths, weaknesses, opportunities, and threats) analyses to define water insecurity scenarios and sustainable pumping scenarios. More elaborate dynamic scenarios can also be developed. For example, Shu et al (2012) adopt an integrated modeling approach using MIKE-SHE to conduct a scenario analysis of the effect of different crop rotations, irrigation intensity, water transfer projects, and other water management options. Also, Hu et al (2010) use a crop-growth model to develop agricultural water-saving scenarios with agronomic factors such as water use efficiency and crop productivity to ensure groundwater recovery. Similarly, Hyndman et al (2017) use SALUS (System Approach to Land-use Sustainability) to develop dynamic land-use scenarios to study groundwater sustainability in a megacity. Pulido-Velazquez et al (2015) sequentially couple SWAT and MODFLOW with a nitrate mass-transport model in MT3DMS to evaluate the impact of climate and land-use change on groundwater sustainability strategies such that land-use scenarios were developed based on historical analysis and by accounting for multiple socioeconomic factors.
Whereas scenario analysis provides a solution given a scenario, the purpose of simulation optimization is to maximize a target objective. Shi et al (2012) evaluate sustainable groundwater development for a river basin using an integrated simulation optimization method that assesses the integral benefit of water resources development and utilization by maximizing water use efficiency, optimizing environmental water demand, and minimizing anthropogenic influence on groundwater systems. This is carried out using an integrated hydrological model coupled with a benefit-cost-loss model, which accounts for multiple aquifer performance and governance factors using cost and trade-off functions (e.g. substitute expense method, shadow engineering method, allocation coefficient method, opportunity cost method, and market value method). These factors include benefits (industrial, irrigation and domestic water supply, flood control, inter-generational equity, tourism, and culture), cost (surface water and groundwater), and loss (groundwater and surface pollution, phreatic evaporation, soil salinization, increased exploitation cost, river sedimentation, impact on farmland and people, reservoir surface evaporation).
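The distinction between the two approaches can be made concrete with a toy lumped aquifer. The sketch below is not the method of Shi et al (2012) or of any other cited study; the bucket model, the storage constraint, the candidate grid, and all numbers are assumptions chosen only to contrast evaluating given scenarios with searching for the best feasible decision.

```python
# Toy contrast between scenario analysis and simulation optimization.
# The bucket model and all numbers are illustrative assumptions.

def simulate_final_storage(pumping, recharge=50.0, storage0=1000.0,
                           years=20, discharge_coeff=0.02):
    """Lumped aquifer: each year storage gains recharge and loses pumping
    plus a natural discharge proportional to storage."""
    storage = storage0
    for _ in range(years):
        storage += recharge - pumping - discharge_coeff * storage
        storage = max(storage, 0.0)             # storage cannot go negative
    return storage


# Scenario analysis: the pumping rates are given, the model reports the response.
scenarios = {"low demand": 20.0, "medium demand": 40.0, "high demand": 60.0}
responses = {name: simulate_final_storage(q) for name, q in scenarios.items()}


# Simulation optimization: the model is run repeatedly to find the largest
# pumping rate that still satisfies a constraint on final storage.
def max_feasible_pumping(min_storage, candidates):
    feasible = [q for q in candidates if simulate_final_storage(q) >= min_storage]
    return max(feasible) if feasible else None


best = max_feasible_pumping(min_storage=800.0,
                            candidates=[float(q) for q in range(0, 101)])
print(responses)
print("largest feasible pumping rate:", best)
```

A brute-force search over a candidate grid is used here for transparency; studies with multiple objectives and expensive models typically rely on dedicated optimization algorithms instead.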

Hydroeconomic modeling.
We reviewed 21 hydroeconomic studies and 11 were eligible to be included based on relevance. We only included hydroeconomic modeling studies and not hydroeconomic indicator studies. For example, Livingston and Garrido (2004) suggest a set of hydroeconomic indicators for assessing the relative success or failure of groundwater policies that address emerging physical, economic, and environmental stress on groundwater resources. Other similar studies that are not modeling-related were excluded. We also excluded studies that utilize global hydrological models to study groundwater sustainability at a large scale (Zaveri et al 2016).
Hydroeconomic modeling (Howitt 1982, Harou et al 2009) aims to optimize the economic objectives of a water system subject to natural or societal constraints, or to evaluate optimal water conservation options and development projects (Pande and Sivapalan 2017). Hydroeconomic models attempt to capture the complexity of water users' economic decisions and the biophysical constraints of the water system (Macewan et al 2017). Generally, the overarching goal of an economically efficient (optimized) water policy is to ensure that water can be sustainably provided to meet users' needs at an affordable price (Kahil et al 2016). Depending on the groundwater sustainability objectives, the hydroeconomic model that is linked to the hydrological model could be a simple crop or water demand model addressing one or a number of human activities related to groundwater sustainability such as agriculture ( , economic optimization agent-based model (Mulligan et al 2014), and Open Modeling Interface Standard (Aistrup et al 2017). Note that a system dynamics model uses a causal loop diagram to represent the constituent components of the system and their interactions, including feedback loops, to determine the system's behavior over a period of time. An agent-based model simulates the behaviors, actions and interactions of individual and collective agents to assess their effects on the system as a whole. The Open Modeling Interface Standard (Gregersen et al 2007) is an interface that facilitates model integration by allowing existing models to run simultaneously and share information. In addition, more comprehensive region-specific economic models are used such as California's CALVIN regional economic-engineering model ( (Salem et al 2017). Alternatively, Kahil et al (2016) and Mulligan et al (2014) use the distributed mechanistic model of MODFLOW. This approach more accurately simulates the spatial and temporal heterogeneity of the aquifer and its interaction with river flow (Kahil et al 2016). 
In addition, Mulligan et al (2014) compare lumped and distributed models to show that a detailed distributed mechanistic model with spatially heterogeneous representation (rather than a lumped groundwater model) allows for spatially disaggregated human activities representation, which is necessary to properly investigate policy instruments in a groundwater basin. Such a fully coupled approach between human-groundwater systems is generally a more robust framework to support water management decisions when optimal groundwater policy has multiple objectives with complex interactions with environmental variables (Macewan et al 2017). However, phenomenological models could still be advantageous when mechanistic understanding of the aquifer system is not possible due to data scarcity. In addition, calibration of a fully coupled human-groundwater hydroeconomic model could be difficult, and data exchange between the two systems is not generally straightforward. Data exchange between hydrological and socioeconomic systems can be bidirectional with system dynamics modeling (Valderrama et al 2011, Alcala et al 2015), agent-based modeling (Mulligan et al 2014), and the Open Modeling Interface Standard (Aistrup et al 2017). This bidirectional feedback, which allows for developing coevolving scenarios between human-water systems, is one of the emerging ideas and ways forward for the progress of hydrology (Montanari et al 2015).

Sociohydrology and other trending approaches.
With respect to sustainable groundwater development, while hydroeconomic modeling operationalizes economic concepts and incorporates them into water management to find feasible and optimal solutions, sociohydrology integrates the human dimension as an endogenous component of water systems to develop an understanding of the dynamics and bidirectional feedback of the coupled human-water system (Sivapalan and Blöschl 2015, Wheater and Gober 2015).
Hydrology is relevant to society, and thus requires integration with the natural and social sciences to better support water management decisions (Wagener et al 2010, Lund 2015, Vogel et al 2015). With respect to sustainable groundwater development, hydroeconomic models are mainly normative models aiming at operationalizing and incorporating economic concepts into water management to maximize a utility function to find feasible and optimal solutions (Giuliani et al 2016). On the other hand, sociohydrology is more descriptive, aiming at integrating the human dimension as an endogenous component of the water system. This is to develop an understanding about the dynamics, bidirectional feedback, and coevolving scenarios of the coupled human-water system in the past and when exposed to altered boundary conditions such as climate or other socioeconomic drivers (Sivapalan and Blöschl 2015, Wheater and Gober 2015). As noted by Wesselink et al (2017), the hydrological science community has recently launched sociohydrology as the research theme for the current decade (2013-2022) to advance hydrological science for the benefit of society. As such, the water system acts as a changing interface between environment and society (Montanari et al 2013). This poses a necessity and a challenge for water sustainability analysis to explore the endogenization of the human dimension (e.g. values, societal preferences, technology, economics, etc) in space and in space-time (Pande and Sivapalan 2017). A sociohydrological system is generally composed of four sub-systems that are the hydrological, ecological, economic, and societal sub-systems (Liu et al 2015). For details about sociohydrological system conceptualization and modeling the reader is referred to recent reviews (Sivapalan and Blöschl 2015, Blair and Buytaert 2016, Roobavannan et al 2018).
As the science of sociohydrological modeling is emerging, so are sustainable groundwater development studies from a sociohydrology perspective. We reviewed 17 studies that we divide into two categories. The first category includes articles that provide a sociohydrological perspective about the human-water system of the study area over a long period of time. These articles do not necessarily include a modeling component, but rather present a comprehensive overview based on reviewing and synthesizing existing data and literature. By considering the changes in values and norms governing the aquifer system along with natural events such as increase or decrease of precipitation, the aim of these studies is to describe changes in the structure, dynamics and different stages of sustainable groundwater development thresholds and tipping points. Thus, these studies provide insights on how groundwater resources react to pressures from human activities and in turn how the society reacts to threats to groundwater resources. This could also include using a human-ecosystem-water modeling framework with a water accounting method such as the concept of water metabolism (Cabello et al 2015).
The second category includes articles about coupled human-water system modeling to study sustainable groundwater development. Diverse coupled human-water system modeling approaches are used in sociohydrology, including agent-based modeling, coupled component modeling, system dynamics, game theory, Bayesian network, and pattern-oriented modeling (among others) as reviewed by Blair and Buytaert (2016). The reviewed groundwater sustainable development studies (Zellner 2007 , which is finite-difference solution of the governing equations of groundwater flow with agent-based simulation. Two studies in Michigan (Zellner 2007, Zellner and Reeves 2012) use an agent-based land-use model to study sustainable groundwater development with respect to urban growth by suggesting alternative forms of development and water-use to understand how these processes interact to create the observed patterns of water resource depletion and sustainability. Al-amin et al (2018) develop an agent-based model for Arizona that simulates water demands to capture the dynamic interactions among household-level consumers and policy makers to implement mandatory restriction policies for groundwater sustainability. Farhadi et al (2016) use an agent-based model in Iran with a multi-objective simulation optimization method that is informed by stakeholders. This is to evaluate groundwater sustainability policy mechanisms that encourage agents to cooperate with the management decisions. Castilla-Rho et al (2019) investigate case studies in the Murray-Darling Basin (Australia), the California Central Valley (USA), and the transboundary Punjab aquifer (India and Pakistan) to show that effective groundwater sustainability regulations and their implementation need to account for culture.
From the above analysis we can conclude that unlike classic Integrated Water Resource Management (IWRM) approaches that mainly focus on the one-way relationship between human-water systems, sociohydrology can provide more insights through a two-way dynamic relationship. This might not be useful in all cases. Yet such involved analysis is particularly important when human factors could have a larger impact on groundwater sustainability solutions in comparison to other factors such as climate and geology. In that case, descriptive sociohydrological models can provide insights into the dynamic interaction between human-water systems, and help to find better sustainable groundwater development solutions (Zellner 2007, Zellner and Reeves 2012, Farhadi et al 2016, Al-amin et al 2018). In addition, a few studies turn descriptive sociohydrological tools to normative approaches whose ultimate goal is to predict the optimal human decisions toward sustainable groundwater development using both phenomenological models (Valderrama et al 2011, Alcala et al 2015, Aistrup et al 2017) and mechanistic models (Mulligan et al 2014, Farhadi et al 2016). Using a mechanistic model can provide site-specific details, while using a phenomenological model can provide a regional economic context and an understanding about macro-level properties, functioning, dynamic interactions, pathways, and sustainability tipping points of human-water systems. However, major challenges to human activity modeling studies stem from the profound uncertainties associated with both the natural and human aspects of the complex and dynamic water systems (Wheater and Gober 2015, Blair and Buytaert 2016) as discussed in the following section.

6. Scientific evaluation of groundwater sustainability: uncertainty analysis

6.1. Uncertainty analysis in groundwater sustainability evaluation
The nature of scientific knowledge is that it is uncertain. While this calls for uncertainty analysis to help us to select groundwater policies that reflect what we know and manage the risk accordingly, in practice this 'has been hampered by ingrained ideas, inadequate training, and inadequate resources' (Kitanidis 2015). As a result, this topic generally does not receive adequate attention (which is reflected by the literature reviewed in this section). In fact, it is not uncommon to find that some local agencies request that researchers refrain from communicating uncertainty to them or in public meetings. Clarifying this sentiment, Leduc et al (2017) state: any uncertainty surrounding scientific knowledge has consequences for both authorities and stakeholders. In the worst case, it may be interpreted as lack of knowledge, and exploited as such to delay remediation works that could be costly financially or electorally. More generally, water managers are faced with the increasing complexity and fragility of socio-hydrosystems, while reliable information is often hard to come by. Water managers need to design long-term strategies to protect the collective interest and the sustainability of groundwater resources.
Such sentiments toward uncertainty analysis are prevalent in groundwater management. The aim of this section is to review literature that has incorporated uncertainty analysis into groundwater sustainability evaluation, to show that it is an essential component of groundwater sustainability evaluation for providing reliable information.
Reviewing the sources of uncertainty and uncertainty analysis methods is beyond the scope of this work. Rather we focus on two questions that receive less attention. The first question is why we need to conduct uncertainty analysis, for which we provide five reasons (section 6.2). The second question is what the typical questions are that uncertainty analysis tries to answer. For this we reviewed 33 articles from section 5.0 that discuss uncertainty, and inferred eight purposes for conducting an uncertainty analysis. While several review articles about groundwater sustainability have discussed uncertainty (e.g. Custodio 2002, Maimone 2004, Seward et al 2006, Molina et al 2012, Walton and Mclane 2013), to our knowledge this is the first article that provides a systematic review of the topic (section 6.3). We also discuss two components in a successful uncertainty analysis, which are multi-model uncertainty analysis, and adaptive management. Multi-model uncertainty analysis is the theme of the 2016 Darcy lecture (Ferré 2017a), and adaptive management is an integral policy component in several water codes.
As a note on terminology, while many studies attempt to classify different types, levels, and sources of uncertainty (e.g. Beven 2016, Guillaume et al 2016), we adopt a simple classification for the sake of clarity. We classify uncertainty as parametric uncertainty, conceptual or model uncertainty, and scenario uncertainty (Meyer et al 2007, Dai et al 2015). Parametric uncertainty refers to continuous or discrete variables that are model parameters. Conceptual or model uncertainty refers to alternative methods, mathematical structures, conceptualizations, assumptions, etc. Scenario uncertainty refers to a future event such that 'scenarios are images of the future, or alternative futures' (IPCC 2000). Take recharge as an example. Recharge can be represented as a model parameter with a continuous probability distribution using for example a recharge multiplier (e.g. Mustafa et al 2018); alternatively, multiple conceptual models can be developed and calibrated to represent different conceptualizations about recharge (e.g. Feng et al 2018, Ye et al 2008); and multiple scenarios can be developed to represent possible future recharge (e.g. Pholkern et al 2019). We also use the term uncertainty analysis generically to refer to discussing, identifying, describing, characterizing, resolving, prioritizing, quantifying, reducing, or communicating uncertainty.
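The recharge example can be sketched in code to show how the three classes of uncertainty enter an analysis at different places. This is a minimal sketch: the two recharge conceptualizations, the multiplier range, the precipitation scenarios, and all names are hypothetical assumptions and are not taken from the cited studies.

```python
import random

# Conceptual (model) uncertainty: two alternative recharge conceptualizations.
# Both functions and their coefficients are hypothetical.
def recharge_fraction(precip_mm):
    """Recharge as a fixed fraction of precipitation."""
    return 0.2 * precip_mm


def recharge_threshold(precip_mm):
    """Recharge occurs only for precipitation above a threshold."""
    return max(0.0, 0.4 * (precip_mm - 300.0))


conceptual_models = {"fraction": recharge_fraction, "threshold": recharge_threshold}

# Scenario uncertainty: alternative futures for annual precipitation (mm).
scenarios = {"dry future": 450.0, "baseline": 600.0, "wet future": 750.0}

# Parametric uncertainty: a recharge multiplier drawn from a continuous
# distribution (uniform between 0.8 and 1.2 is an assumption).
rng = random.Random(0)


def sample_recharge(model, precip_mm, n_draws=1000):
    return [rng.uniform(0.8, 1.2) * model(precip_mm) for _ in range(n_draws)]


results = {}
for model_name, model in conceptual_models.items():
    for scenario_name, precip in scenarios.items():
        draws = sample_recharge(model, precip)
        results[(model_name, scenario_name)] = (min(draws), max(draws))

for key, (low, high) in results.items():
    print(key, f"recharge between {low:.1f} and {high:.1f} mm")
```

Each combination of conceptual model and scenario yields its own range from the parametric sampling, which is why the three classes are usually reported separately rather than lumped into one number.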

Motivations for conducting uncertainty analysis in groundwater sustainability evaluation
Conducting uncertainty analysis in groundwater sustainability evaluation is needed for at least five reasons. The first reason is that uncertainty analysis is a useful learning tool. Section 6.3 presents the typical questions that motivate adding uncertainty analysis to a groundwater sustainability evaluation study. Second, uncertainty analysis is an essential component of a scientifically defensible model. This is especially required and useful when models or modeling results are contested by stakeholders or in court cases. Womble (2017) shows the role of uncertainty analysis in US courts citing as an example the Fifth Circuit endorsement for probabilistic analysis holding that 'just because a Monte Carlo simulation produces a range of outcomes, rather than one single numerical value, does not mean it is speculative. If anything, Monte Carlo provides greater certainty than the basic alternatives.' Third, understanding and addressing uncertainty is a policy requirement for policy frameworks that adopt the concept of adaptive management (section 6.5). For example, Rohde et al (2017) note that Australia's adaptive management framework aligns with California's Sustainable Groundwater Management Act legislation as both require local agencies to reconcile knowledge gaps and uncertainties through acquiring new information by monitoring programs, and amending planning and management actions. Fourth, communicating, discussing, resolving, and making strategic decisions about uncertainty with stakeholders can be a very effective participation tool (Henriksen and Barlebo 2008, Guillaume et al 2012, Guillaume and El Sawah 2014). This increases the legitimacy of models, modeling results, and the decisions made using the models (section 7.0). Finally, uncertainty analysis is a necessary component in an effective groundwater sustainability evaluation.
Regarding the last reason above, two examples from groundwater sustainability literature help illustrate this point by showing the impact of parametric, conceptual, and scenario uncertainty on model predictions. Concerning parametric uncertainty, Delottier et al (2017) illustrate the importance of presenting calibration uncertainty in studies employing the common calibration practice of history-matching. The study presents a realistic synthetic model, and uses the PEST suite (Doherty 2016) to perform repeated calibrations with different starting values, yielding different calibrated parameter values. The results show two runs with similar objective function values, yet with very different parameter values. This is a clear illustration of equifinality, which is the non-uniqueness characteristic of an ill-posed problem. Then Delottier et al (2017) show that high parameter uncertainty leads to a high predictive uncertainty such that sustainable yield estimates for the two runs are 157 m³ h⁻¹ and >700 m³ h⁻¹, respectively. Delottier et al (2017) show that even when the management model is well calibrated, it does not ensure that the model is reliable to make predictions for management purposes; furthermore, if such uncertainty is omitted this can lead to an unsustainable management policy. Delottier et al (2017) also show that using uncertainty analysis techniques, such as regularization based on expert knowledge and predictive uncertainty based on a linearized model (Doherty and Hunt 2009), can yield more reliable results. For example, the calibration uncertainty for sustainable yield (SY) is 139 < SY < 158 m³ h⁻¹, and the true reference value is 157 m³ h⁻¹. These results are obtained using the PREDUNC tool of the PEST suite (Doherty 2016), which is computationally efficient as it requires n + 1 simulations, n being the number of model parameters. 
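The n + 1 simulation count follows from linearization: one base run plus one perturbed run per parameter gives the sensitivities needed to propagate parameter uncertainty to a prediction. The sketch below illustrates only that idea using first-order propagation with independent parameters. It is not the PREDUNC algorithm, which additionally conditions on calibration data; the stand-in model, parameter values, and standard deviations are hypothetical assumptions.

```python
import math

def sustainable_yield(params):
    """Hypothetical stand-in for one groundwater model run that returns a
    sustainable yield prediction from [recharge, hydraulic conductivity]."""
    recharge, conductivity = params
    return 0.7 * recharge + 5.0 * math.log(conductivity)


def forward_difference_sensitivities(model, params, rel_step=1e-6):
    """Sensitivity of the prediction to each parameter by forward differences.
    Costs one base run plus one run per parameter, i.e. n + 1 runs."""
    base = model(params)
    runs = 1
    sensitivities = []
    for i, value in enumerate(params):
        step = rel_step * abs(value) if value != 0 else rel_step
        perturbed = list(params)
        perturbed[i] = value + step
        sensitivities.append((model(perturbed) - base) / step)
        runs += 1
    return base, sensitivities, runs


def linear_predictive_std(sensitivities, param_std):
    """First-order propagation assuming independent parameters:
    variance = sum over parameters of (sensitivity * standard deviation)^2."""
    return math.sqrt(sum((s * sd) ** 2 for s, sd in zip(sensitivities, param_std)))


params = [200.0, 10.0]        # assumed parameter estimates
param_std = [20.0, 2.0]       # assumed parameter standard deviations
base, sens, runs = forward_difference_sensitivities(sustainable_yield, params)
std = linear_predictive_std(sens, param_std)
print(f"{runs} model runs, prediction {base:.1f} +/- {std:.1f}")
```

Because the propagation is linear, its quality depends on how close the model is to linear over the parameter range of interest, which is the usual caveat attached to this class of methods.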
By addressing conceptual and scenario uncertainty, Pholkern et al (2019) show that sustainable yield in the study area will vary by 140%-400% by 2045; the study shows that the difference in the increase in sustainable yield across models (i.e. four conceptual models representing uncertainty in geology and boundary conditions) can be as large as 214% given all climate scenarios (i.e. RCP 2.6, 4.5 and 8.5), and across climate scenarios can be as large as 76% given all models. Omitting uncertainty in this case means under-characterizing the groundwater resources, which can affect the cost-benefit analysis of groundwater pumping (Harou and Lund 2008).

Purposes for conducting uncertainty analysis in groundwater sustainability literature

Examine model limitations.
Modeling is a process by which we communicate (what we think to be) our knowledge about the system, and uncertainty analysis is a process by which we communicate our incomplete knowledge about the system. In the groundwater sustainability literature, several studies identify and discuss multiple sources of uncertainty without addressing all or any of them (e.g. El-Kadi et al 2014, Hu et al 2010, Lathashri and Mahesha 2016, Urrutia et al 2018, Piscopo et al 2019). As such, uncertainty analysis is an excellent tool for modelers to understand and communicate model uncertainty and limitations. For example, Piscopo et al (2019) identify sources of uncertainty to caution that a proposed groundwater management plan is preliminary since the uncertainties of the system are still hardly quantifiable. Also, communicating to end-users what the model can and cannot do reduces the risk of model misuse. For example, Hugman et al (2013) caution that conceptualizing karstic aquifer systems as a single-continuum equivalent porous medium may result in significant uncertainty when simulating smaller-scale effects such as the locations of well fields. Even if uncertainty is not quantified, a qualitative analysis of uncertainty can be informative (Gillespie et al 2012) and helpful to mitigate uncertainty (Gallardo et al 2009). For example, Gillespie et al (2012) propose two conceptual models representing local recharge or intra-basin flow conceptualizations and discuss the limitations of the two models with implications for aquifer sustainability. As another example, Gallardo et al (2009) discuss the model limitations and suggest a safety factor of 20% to account for uncertainty.

Generate scenarios to promote discussion.
Generating different predictive, explorative, and normative scenarios (Börjeson et al 2006) is the simplest and most commonly used approach to uncertainty analysis. In the context of groundwater sustainability, these include pumping scenarios (e.g. Stigter et al 2009, Calderhead et al 2012, Hugman et al 2013, Lathashri and Mahesha 2016, Macewan et al 2017). While scenarios can be generated for multiple purposes, as discussed below, we first present the simplest case of generating scenarios, which is to demonstrate future possibilities and to generate discussion. For example, El-Kadi et al (2014) conduct a study in Jeju Island, South Korea to show that if a historical drought were to occur in the future, it would decrease sustainable yield by 16%, decrease spring discharge by 28%, and dry up 27% of springs in comparison to the baseline case. Several studies develop scenarios to advocate for adaptation measures and new policies. For example, Urrutia et al (2018) present recharge scenarios due to climate change and pumping scenarios due to increases in population and mining activities in the Atacama Desert in northern Chile to demonstrate the need to use alternative water resources such as desalination to minimize the impact of the combined effects of economic growth and climate change on the aquifer. Similarly, Stigter et al (2009) demonstrate the combined impact of climate change and water demand increases on aquifer discharge and risk of ecological degradation in Algarve, Portugal; this is to point out to stakeholders the need to prepare societal and technical tools to alleviate these impacts, and to broaden the current definitions of sustainability in the study region. Other studies additionally advocate for new approaches for decision support under uncertainty.
For example, Passarello et al (2014) develop land-use change scenarios for an urban area in Texas to demonstrate to stakeholders the influence of urbanization and the implications of these scientific uncertainties for policy and urban water management decisions. Similarly, Gober et al (2010) integrate climate change uncertainty into formal decision analysis for water planning to offer insights into water planning in Phoenix, Arizona, and to demonstrate the need for new approaches to decision making under uncertainty.

Provide quantitative estimates about uncertainty.
As Kitanidis (2015) states: 'The sooner we arrive at this realization [that all models are wrong] the better, so that we can either apply a model appropriately or move on to the next step, uncertainty quantification.' Quantifying the impact of parametric, model, and scenario uncertainty on model predictions is the most common task in uncertainty analysis. For example, Henriksen et al (2008) estimate the total exploitable groundwater resource of Denmark to be 1 × 10⁹ m³/year and provide an uncertainty estimate for each regional estimate, which ranges from ±10% to ±40%. Uncertainty is ubiquitous in the management models used to evaluate groundwater sustainability. Groundwater modelers are typically faced with (1) complex subsurface heterogeneity, (2) state variables and parameters that are scale, space, and time dependent, (3) data scarcity about subsurface geology, (4) uncertainty about top and inland boundary recharge and boundary conditions, and (5) computationally intensive numerical models that generally hinder uncertainty quantification. Despite these challenges, systematic investigation into uncertainty quantification and its impact on decision support has been limited in groundwater hydrology (Heße et al 2019). This is especially true in the groundwater sustainability literature. Advancements in uncertainty quantification in subsurface hydrology have been reviewed and discussed for model-data integration (Rajabi et al 2018), uncertainty of subsurface characterization (Scheidt et al 2018), and conceptual uncertainty (Enemark et al 2019). However, the state-of-the-art techniques and tools highlighted in these studies are not widely applied in groundwater sustainability numerical modeling studies.
Deterministic groundwater models are often used, and uncertainty in the model outputs is generally assessed using sensitivity analysis (Hu et al 2010, Huang et al 2012, Calderhead et al 2012, El-Kadi et al 2014, Pholkern et al 2019), and multiple deterministic conceptual models of the subsurface (Timani and Peralta 2015, Feng et al 2018, Pholkern et al 2019). Also, Lal and Datta (2019) use an ensemble surrogate model within a simulation optimization framework to address parametric uncertainty of hydraulic conductivity and porosity. Sources of parametric and conceptual uncertainty discussed in these articles include, among others, subsurface geology, hydraulic conductivity, anisotropy ratio, specific yield, recharge, boundary conditions and fluxes, pumping rates and locations, riverbed hydraulic conductance, porosity, and dispersivity. Qin et al (2013) consider uncertainty from using a coarse model grid. Delottier et al (2017)    can be adopted to reduce the computational burdens. Other alternatives include using parallel computing (e.g. Elshall et al 2015) or surrogate models (e.g. Zhang et al 2013). More case studies that use these and similar tools are needed to advance the science of groundwater sustainability evaluation.
Groundwater models are only one layer of uncertainty in water-ecology-human models. Uncertainty analysis in water-ecology-human systems has received less attention. While several articles have discussed the uncertainty of human-water systems (Reichert et al 2015), not much has been done in practice in the groundwater sustainability literature. Uncertainty of ecosystem services tied to groundwater sustainability is discussed in section 5.1.3 within the concept of adaptive management. The challenges and solutions of uncertainty quantification in ecosystem services are discussed by Hamel and Bryant (2017) and can be helpful to water-ecosystem models in hydrology. Regarding human activities, few of the reviewed studies consider several sources of uncertainty in their human-water models. These sources include weather factors (Valderrama et al 2011, Aistrup et al 2017), economic factors (Valderrama et al 2011, Susnik et al 2013), agent productivity parameters (Mulligan et al 2014), and human behaviors (Noel and Cai 2017). Using a phenomenological model, Guillaume et al (2012) conduct an uncertainty analysis of a dynamic coupled economic-groundwater model for groundwater sustainability evaluation. Sources of uncertainty considered in this study include allocation policies in future planning, processes of interest, rainfall variability, spatial pumping distribution, transmissivity and storativity of the aquifer, irrigation choices, agricultural price models of local conditions and crop yield parameters, relevant parameters for decisions, and model output uncertainty. Guillaume et al (2012) demonstrate how a variety of uncertainties in such a model can be addressed with a number of methods including propagation of scenarios and bounds on parameters, multiple models, block bootstrap time-series sampling, and robust linear regression for model calibration.
Guillaume et al (2012) also provide an uncertainty typology for coupled human-water models, which can help advance this under-researched area.
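Of the methods listed above, block bootstrap time-series sampling is perhaps the easiest to illustrate. The sketch below shows a generic moving-block bootstrap that resamples contiguous blocks of a record so that short-range persistence is retained within each block. It is not the procedure of Guillaume et al (2012); the rainfall values, block length, and number of replicates are hypothetical.

```python
import numpy as np

def block_bootstrap(series, block_len, rng):
    """Resample a time series by concatenating random contiguous blocks.

    Moving-block variant: block start positions are drawn uniformly, blocks
    may overlap in the original record, and the result is trimmed to the
    original length.
    """
    series = np.asarray(series)
    n = series.size
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and len(series)")
    n_blocks = -(-n // block_len)                # ceiling division
    starts = rng.integers(0, n - block_len + 1, size=n_blocks)
    blocks = [series[s:s + block_len] for s in starts]
    return np.concatenate(blocks)[:n]

# Hypothetical annual rainfall totals (mm), for illustration only.
rainfall = np.array([610., 540., 720., 480., 390., 650.,
                     700., 520., 450., 600., 800., 430.])
rng = np.random.default_rng(0)
replicates = np.array([block_bootstrap(rainfall, 3, rng) for _ in range(500)])

# Spread of the record-mean rainfall across the resampled records.
mean_interval = np.percentile(replicates.mean(axis=1), [5, 95])
print(mean_interval)
```

Each replicate can then be used as an alternative forcing series for the coupled model, so that the spread of model outputs reflects rainfall variability. The block length is a judgment call that trades off preserving persistence against the diversity of resampled records.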

Identify and prioritize sources of uncertainty.
Uncertainty analysis is a model diagnostic tool. For example, Tsai and Elshall (2013) develop a hierarchical Bayesian model averaging method that segregates uncertain model components; this is to comparatively evaluate the candidate propositions of each uncertain model component, to understand the individual contribution of each uncertain model component to the model prediction and variance, and to prioritize the contribution of each uncertain model component to the overall model uncertainty. Similarly, Dai et al (2017) develop a hierarchical sensitivity analysis to identify important system processes under conceptual and parametric uncertainty. These and similar methods serve as a learning tool to advance knowledge about the model. These methods mainly involve combinatorial design to represent the uncertain model components. In the groundwater sustainability literature, Calderhead et al (2012), for example, use multiple scenarios with combinatorial design to represent several uncertain model components, and show that the impact of climate change on recharge plays only a minor role in the occurrence of subsidence in Mexico City in comparison to pumping scenarios and groundwater export. Similarly, Pholkern et al (2019) in Northeast Thailand show that variable depths and thicknesses of the aquifer have a higher impact on sustainable yield estimates than model boundary conditions do. A main limitation of these and other similar studies (e.g. Hugman et al 2013, Unsal et al 2014, Zhao et al 2016) is that their conclusions are based on a comparative analysis of the results of different model components without accounting for the probability of these components and their interaction (e.g. Dai et al 2017).
Feng et al (2018) use an attribution approach to study interaction between climatic and human impacts on groundwater sustainability for a coastal aquifer in northern China, leading to several insights related to aquifer function that call for strict regulations on groundwater pumping.
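To make the idea of weighting alternative models by their probability concrete, the sketch below applies the standard single-level model averaging identities: the averaged prediction is the probability-weighted mean, and the total variance splits into a within-model part (parametric uncertainty) and a between-model part (conceptual uncertainty). This is a simplified analogue, not the hierarchical method of Tsai and Elshall (2013), and the probabilities, means, and variances are hypothetical.

```python
import numpy as np

def bma_summary(probs, means, variances):
    """Model-averaged mean with within- and between-model variance.

    total variance = sum_k p_k * var_k + sum_k p_k * (mean_k - mean)^2
    """
    probs = np.asarray(probs, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    if not np.isclose(probs.sum(), 1.0):
        raise ValueError("model probabilities must sum to 1")
    mean = float(np.sum(probs * means))
    within = float(np.sum(probs * variances))
    between = float(np.sum(probs * (means - mean) ** 2))
    return mean, within, between

# Three hypothetical conceptual models of the same aquifer.
probs = [0.5, 0.3, 0.2]            # model probabilities (assumed known here)
means = [150.0, 170.0, 120.0]      # each model's mean prediction
variances = [100.0, 225.0, 400.0]  # each model's parametric variance

mean, within, between = bma_summary(probs, means, variances)
share_conceptual = between / (within + between)
print(f"mean = {mean:.1f}, within = {within:.1f}, between = {between:.1f}")
print(f"share of variance from model choice = {share_conceptual:.2f}")
```

When the between-model share dominates, as in this made-up case, effort spent on discriminating between conceptual models is likely to reduce predictive uncertainty more than further refinement of parameters within any one model.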

Data-worth analysis.
The identified important sources of uncertainty may be used to facilitate a data-worth analysis, which aims at designing data collection plans such that the expected benefit of new information exceeds its cost. In hydrology this is generally done using a Bayesian framework to identify new data locations (e.g. Pham and Tsai 2015, Neuman et al 2012) or types (e.g. Wöhling et al 2015). We did not identify a similar study in the groundwater sustainability literature. However, Timani and Peralta (2015) in Utah use a multi-model simulation optimization approach to reconcile two disparate conceptual models that are contested among stakeholders, and show about a 25% difference in maximum perennial yield. With the simulation optimization procedure for the two disparate models, Timani and Peralta (2015) identify the field data most needed to resolve this conflict. Additionally, Li et al (2014) extend data-worth to information-worth analysis using a numerical groundwater model with different representations of information about the aquifer and its risk of contamination. Li et al (2014) assess the effectiveness of aquifer monitoring information in achieving more sustainable use, showing that pumping rates differ when risk information that synthesizes data on aquifer conditions is provided to the users, and that the level of information about the state of the aquifer also affects extraction behavior. The study highlights the importance of contamination data, showing that pumping is significantly reduced in experiments where contamination is possible compared to those where the pumping cost is the only factor discouraging groundwater use.
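A minimal sketch of the data-worth idea, under a linear-Gaussian assumption, is to rank candidate measurements by how much each would reduce the variance of a prediction of interest, and then to weigh that reduction against the measurement cost. The formula used is the standard Bayesian update for a single scalar observation. The covariance, sensitivities, noise levels, and costs below are hypothetical, and the cited studies use more elaborate frameworks.

```python
import numpy as np

def predictive_variance_reduction(prior_cov, pred_sens, obs_sens, obs_noise_var):
    """Reduction in prediction variance from one new scalar observation.

    Linear-Gaussian case: with prior parameter covariance C, prediction
    sensitivity y, and observation sensitivity h with noise variance r,
    the reduction is (y^T C h)^2 / (h^T C h + r).
    """
    C = np.asarray(prior_cov, dtype=float)
    y = np.asarray(pred_sens, dtype=float)
    h = np.asarray(obs_sens, dtype=float)
    cross = y @ C @ h
    return float(cross ** 2 / (h @ C @ h + obs_noise_var))

prior_cov = np.diag([4.0, 1.0])       # hypothetical prior parameter covariance
pred_sens = np.array([1.0, 2.0])      # sensitivity of the prediction of interest

# Two hypothetical candidate observation wells with different costs.
candidates = {
    "well_A": {"sens": [1.0, 0.0], "noise_var": 1.0, "cost": 2.0},
    "well_B": {"sens": [0.0, 1.0], "noise_var": 1.0, "cost": 1.0},
}
worth = {
    name: predictive_variance_reduction(prior_cov, pred_sens,
                                        c["sens"], c["noise_var"])
    for name, c in candidates.items()
}
best_absolute = max(worth, key=worth.get)
best_per_cost = max(worth, key=lambda name: worth[name] / candidates[name]["cost"])
print(worth, best_absolute, best_per_cost)
```

In this made-up case the well that reduces variance the most is not the one that does so most cheaply, which is the kind of trade-off a data-worth analysis is meant to expose.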
Identify robust plans or designs given uncertainty.
The reliability of a plan or design can change if uncertainty is introduced to the problem. For example, Chitsazan et al (2015) combine chance-constrained programming with Bayesian model averaging to assess the impact of geological structure uncertainty in groundwater quality control design in comparison to traditional chance-constrained programming; the study shows that considering parametric uncertainty alone overestimates the design reliability. Uncertainty analysis can help identify robust plans given uncertainty. Guillaume et al (2012) explain that uncertainty analysis provides an answer to the question 'What if a model assumption is wrong?', and hence allows stakeholders to choose a policy with an understanding of the possible adverse impacts, or one that will provide the desired outcome if the 'best assumption' is changed. In the groundwater sustainability literature, several studies use uncertainty analysis to identify robust plans given uncertainty (e.g. Guillaume et al 2012, Mulligan et al 2014, Uddameri et al 2014, Gohar et al 2019). For example, Uddameri et al (2014) use a fuzzy simulation optimization approach to identify a better policy to cope with the uncertainty regarding specifying desired future conditions due to incomplete understanding of the aquifer dynamics in South Texas. This fuzzy approach yielded lower estimates of groundwater availability in comparison to the crisp optimization scheme, as it accounts for stakeholders' uncertainty. Also, Mulligan et al (2014) compare two groundwater-use policies in California under productivity uncertainty, and use two modeling approaches to explore the effect of modeling assumptions on the projected performance of these policies.
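The effect of uncertainty on design reliability can be sketched with a sample-based chance constraint: instead of sizing a pumping rate on the mean aquifer response, choose the largest rate whose drawdown stays within a limit in a required fraction of sampled realizations. The sketch assumes a linear response (drawdown equals rate times an uncertain unit drawdown) and a hypothetical lognormal distribution; it is not the formulation of any study cited above.

```python
import numpy as np

def max_rate_for_reliability(unit_drawdown_samples, drawdown_limit, reliability):
    """Largest pumping rate meeting the drawdown limit with given reliability.

    Assumes drawdown = rate * unit_drawdown. The constraint must hold in at
    least ceil(reliability * n) of the n sampled realizations, so the k-th
    smallest unit drawdown is the critical realization.
    """
    samples = np.sort(np.asarray(unit_drawdown_samples, dtype=float))
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must be in (0, 1]")
    k = int(np.ceil(reliability * samples.size))
    critical = samples[k - 1]
    return float(drawdown_limit / critical)

rng = np.random.default_rng(42)
# Hypothetical uncertain drawdown per unit pumping rate.
unit_drawdown = rng.lognormal(mean=0.0, sigma=0.5, size=1000)
limit = 5.0                                        # allowable drawdown

rate_mean_based = limit / unit_drawdown.mean()     # design that ignores spread
rate_95 = max_rate_for_reliability(unit_drawdown, limit, 0.95)

# A small tolerance guards against round-off at the critical realization.
achieved = np.mean(rate_95 * unit_drawdown <= limit * (1 + 1e-12))
achieved_mean_based = np.mean(rate_mean_based * unit_drawdown <= limit)
print(rate_mean_based, rate_95, achieved, achieved_mean_based)
```

The mean-based rate is higher but violates the limit in a substantial share of realizations, which mirrors the point that ignoring a source of uncertainty tends to overstate design reliability.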

Identify critical models or scenarios given uncertainty.
Critical parameter realizations, critical models, or critical scenarios are those that have the most influence on the solution for the desired reliability level (Kourakos and Mantoglou 2008). In the groundwater sustainability literature, Wurl et al (2018) offer an analysis of the hydrological resilience of a water-limited arid ecosystem in northern Mexico, under future pumping scenarios and changing climate conditions; the study aims to recognize water insecurity scenarios and to define appropriate actions towards more sustainable groundwater use through involvement of local stakeholders. The identified critical scenario or model can then be used for further analysis. For example, Ostad-Ali-Askari et al (2019) evaluate several pumping and agricultural practice strategies to restore aquifer sustainability in the study area, given the identified critical scenario of climate change. Seward et al (2006) note that identifying which conceptual model is to be examined must be done in consultation with all the stakeholders.

Gain deeper understanding about the problem.
While most of the abovementioned studies use uncertainty analysis as a learning tool to learn more about the model, the model solution, and the problem addressed by the model, other uncertainty analysis methods are specifically tailored to provide deeper insights about the problem that the model addresses. For example, Susnik et al (2013) compare system dynamics modeling and object-oriented Bayesian network modeling to support groundwater management decisions in the Kairouan aquifer system, Tunisia. System dynamics modeling (e.g. Calderhead et al 2012, Susnik et al 2013, Fernald et al 2015) implicitly accounts for uncertainty (and probabilistic uncertainty can be incorporated into it), while Bayesian networks are mainly a probabilistic framework. By comparing these two modeling paradigms, Susnik et al (2013) show that system dynamics modeling is a cyclic approach that allows the user to discover potentially hidden dynamics in a system by simulating non-linear feedback processes, while the Bayesian network approach is acyclic and incorporates the variability and uncertainty in every single variable with probabilistic outputs for key variables. Susnik et al (2013) define a hidden dynamic as a behavior that emerges due to the interaction of all model components, which is not necessarily apparent from studying each model element independently. The study finds that the analyses of both models agree, indicating current overexploitation of the aquifer, and that pumping reduction offers the best solution to end aquifer overexploitation. However, the study notes that system dynamics modeling has the potential for stakeholder collaboration, while Bayesian networks can be overly complex, as understanding probability distributions may not be straightforward for stakeholders.

Multi-narrative solution
Estimating sustainable yield is difficult because our scientific knowledge about complex groundwater systems is uncertain and because societal preferences are difficult to elicit and may be conflicting (Reichert et al 2015). Thus, a fundamental question in water resources decision support is how to present scientific knowledge, which has stirred ample discussion in the literature (Kitanidis 2015, Wheater and Gober 2015, Nearing et al 2016, Beven 2016). We argue in favor of an epistemic modeling perspective (Christakos 2004, Williamson 2005, Elshall and Tsai 2014, Reichert et al 2015), which acknowledges that models describe an incomplete knowledge about nature and focuses on knowledge synthesis. Reichert et al (2015) define epistemic interpretations as using 'probabilities to quantify human knowledge or belief' as opposed to objective interpretations that 'use probabilities to describe features of the material world that are independent of humans.' In other words, probability is interpreted as an extension of Aristotelian logic, in which a proposition is merely either false or true, to a realm of inductive reasoning in the presence of uncertainty (Jaynes 1990, 2003). In addition, acknowledging incomplete knowledge entails that other valid alternative hypotheses exist. Thus, an epistemic perspective would naturally adopt a probabilistic multi-hypothesis modeling framework that uses probability as a means of inductive reasoning in the presence of uncertainty. As such, probability is interpreted as the direct measure of our degree of belief in a parameter estimate or a model, given data.
In practice, an epistemic multi-model perspective has several advantages:
1. From a model selection perspective, presenting a single understanding of the problem (a single model) increases the risk of a type I error, which is the rejection of a true null hypothesis, or a type II error, which is the non-rejection of a false null hypothesis.
2. From a model averaging perspective, a single model may fail to capture the crucial characteristics of the problem (Guillaume et al 2016).
3. From a model combination perspective, a probabilistic multi-model ensemble can potentially make better predictions than a multi-model ensemble (i.e. multiple deterministic models) or a single-model ensemble (i.e. a single model with multiple realizations). The latter two can potentially make better predictions than a single realization of a single model (i.e. a deterministic model).
4. From a transparency perspective, adopting a probabilistic multi-hypothesis modeling framework provides multiple storylines for the problem at hand, and thus gives modelers confidence and more room to express alternative opinions.
5. From an epistemic perspective, evaluating multiple models against observation data is a learning process about our science (e.g. Tsai 2014, Zhang et al 2014) and about the decision process (e.g. Tsai 2015, Wöhling et al 2015). Given observation data and a probabilistic multi-model ensemble, Elshall and Tsai (2014) show that bad ideas can be eliminated, and good ideas will remain. Additionally, the remaining good ideas can be averaged given their probability to increase prediction ensemble reliability, accuracy, and precision (Elshall et al 2018b). Note that this approach does not contradict the approach of using a single-model ensemble to account for parametric uncertainty with model error embedded in the likelihood function (Elshall et al 2019) or through other means such as error modeling (Xu et al 2017). In the former approach of using a probabilistic multi-model ensemble we try to improve the model structure, and in the latter approach of using a single-model ensemble with total error we try to improve the data model.
6. From an empirical perspective, an epistemic stance acknowledges upfront that our degree of belief in a parameter estimate or a model is conditional on the available data, and thus is subject to update as new data become available. This emphasizes that our solution is following a developmental path from an initial state rather than a teleological path toward a final state (Elshall and Tsai 2014).
7. From a stakeholder engagement perspective, with such an iterative process in which we keep updating our knowledge given possible alternatives, the objective of modeling would change from providing 'the answer' to building a 'knowledge partnership' between researchers and stakeholders (Guillaume et al 2016), in addition to providing multiple storylines through uncertainty analysis. Watson (2005) notes that we should go a step further to show the consequences of these different storylines through risk assessment or other means, to demonstrate the importance of uncertainty analysis to stakeholders.
8. From a communication perspective, using multiple models tends to create trust, as noted by Ferré (2017), since there may be a tendency to distrust scientists who present 'the answer' as it runs counter to our mutual experience of the inherent uncertainty of natural systems.
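The updating of model probabilities as data accumulate, referred to in the points above, can be sketched directly from Bayes' rule: each model's posterior probability is proportional to its prior probability times the likelihood of the observations under that model. The sketch below assumes independent Gaussian errors with a known standard deviation, and all heads, simulated values, and priors are hypothetical.

```python
import numpy as np

def gaussian_log_likelihood(observed, simulated, sigma):
    """Log-likelihood of observations under independent Gaussian errors."""
    resid = np.asarray(observed, dtype=float) - np.asarray(simulated, dtype=float)
    n = resid.size
    return float(-0.5 * np.sum((resid / sigma) ** 2)
                 - n * np.log(sigma * np.sqrt(2.0 * np.pi)))

def update_model_probabilities(prior_probs, log_likelihoods):
    """Posterior model probabilities via Bayes' rule, computed in log space."""
    log_post = np.log(np.asarray(prior_probs, dtype=float)) \
        + np.asarray(log_likelihoods, dtype=float)
    log_post -= log_post.max()                   # numerical stability
    post = np.exp(log_post)
    return post / post.sum()

sigma = 0.5                                      # assumed error std (m)
prior = np.array([1 / 3, 1 / 3, 1 / 3])          # three candidate models

# First batch of hypothetical head observations and model simulations.
obs1 = [10.2, 9.8, 10.5]
sims1 = [[10.0, 10.0, 10.4], [10.8, 9.1, 11.2], [12.0, 11.5, 12.3]]
loglik1 = np.array([gaussian_log_likelihood(obs1, s, sigma) for s in sims1])
post1 = update_model_probabilities(prior, loglik1)

# Second batch: the earlier posterior becomes the new prior.
obs2 = [9.9, 10.1]
sims2 = [[10.1, 10.0], [10.9, 9.3], [11.8, 12.1]]
loglik2 = np.array([gaussian_log_likelihood(obs2, s, sigma) for s in sims2])
post2 = update_model_probabilities(post1, loglik2)

# Same result as conditioning on both batches at once.
post_joint = update_model_probabilities(prior, loglik1 + loglik2)
print(post1.round(3), post2.round(3))
```

Models that fit poorly lose probability with each batch but are never assigned exactly zero, which is consistent with treating the current ranking as conditional on the available data rather than final.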
In summary, such careful uncertainty analysis, whether using a single-model ensemble, a multi-model ensemble, or both (a probabilistic multi-model ensemble), is particularly critical when managing an important common-pool groundwater resource.
In conclusion, without uncertainty analysis we are at a higher risk of arriving at incorrect conclusions, which can in turn lead to undesirable decisions. Uncertainty analysis is a useful learning tool to disentangle, understand, and improve model predictions. There is a gap between recent advances in uncertainty analysis and current practices. This gap is not necessarily due to a lack of available methods or to their computational cost. It can also be attributed to other factors such as the lack of educational resources. For example, with few exceptions (e.g. Caers 2011, Doherty 2015), we are not aware of a practical step-by-step textbook about uncertainty analysis in groundwater hydrology with a suite of tools and examples. Kitanidis (2015) notes that 'given the importance of this topic [of uncertainty quantification], it is somewhat surprising that this topic has not received more attention.' More attention is especially needed to transfer advances in uncertainty analysis to end-users. Also, Guillaume et al (2012) emphasize the importance of careful thinking about how to communicate uncertainty to end-users and to facilitate their use of the information to reduce decision risk. Yet this is another under-researched area that requires further attention.

Adaptive management
Adaptive management is an effective means of management when gaps in knowledge and uncertainty abound, as in the case of ecosystem services (section 5.3.5). This learning-by-doing approach is especially needed for sustainable groundwater management due to the often inadequate characterization of the involved groundwater resources, ecosystem services, and human activities. Adaptive management is also a means to account for our known unknowns and unknown unknowns. Maimone (2004) provides a detailed discussion of this topic, and argues that adaptive management is the only viable approach to dealing with knowledge uncertainty and the variability of societal attitudes towards groundwater resources. In addition, being aware of the knowledge limitations and uncertainties can advance an interconnected systems approach to groundwater management. For example, Sophocleous (2000) illustrates the concepts of adaptive management and the interconnected systems approach in Kansas, showing how this leads to the formation of local groundwater management districts, the adoption of minimum streamflow standards, the modification of safe yield policies in some districts, the implementation of integrated resource planning, and the application of sub-basin management in potential problem areas. Similarly, Levangie (2008) shows that moving from fixed to adaptive management led to the transition from a safe yield policy for water supply to a sustainable yield policy for water supply and environmental sustainability in the study area in Massachusetts.
Adaptive management is a policy instrument to address uncertainty, and can be planned ahead to reduce uncertainty. For example, Piscopo et al (2019) present an adaptive management workflow to evaluate groundwater sustainability for a hydrothermal area in Italy that starts from the available hydrogeological knowledge to consider the unknowns of the system. The workflow includes stakeholder participation, the development of an integrated plan subject to annual review along with the constant update of the numerical model, and a groundwater quantity and quality monitoring system for model update, validation, and uncertainty analysis. Piscopo et al (2019) note that the combination of monitoring and modeling will allow water resource managers and stakeholders to review the management policy based on what is known and unknown about the system, and to dynamically adapt any decision to the variable socioeconomic and environmental conditions. Similarly, Seward et al (2006) argue that our ability to predict the impacts of groundwater pumping on surface water and ecological systems is highly imperfect, and suggest that the way forward is to accept the complex, difficult-to-predict characteristics of aquifer systems, and to build management strategies around those characteristics that are adaptive rather than rigid command-and-control management. This practice is being implemented at different levels in many places worldwide (e.g. Sophocleous 2005, Allan 2008, Ross and Martinez-Santos 2010, Curran and Mascher 2016, Ha et al 2018b, Seward and Xu 2019, Thomas 2019). For example, the Water Resource Protection Plan of Hawaii (CWRM 2019b), which provides the sustainable yield estimates for the state of Hawaii (CWRM 2019a), is updated about every five years based on monitoring programs, advances in groundwater research and modeling in Hawaii, and participation, among other potential advances.

Participation and groundwater sustainability
The term participation here refers to any level of stakeholder involvement in the planning, modeling, and management of water resources. A stakeholder is a person or entity (e.g., water authorities, nongovernmental organizations, or community members) with an interest in or concern about the water resource in question. Stakeholder participation can add saliency, credibility, and legitimacy to scientific assessments, which may lead to more effective and readily adoptable water management decisions (Cash et al 2003, White et al 2010, Heink et al 2015). Saliency entails demand-driven science that provides what is needed at the time and place; credibility emerges from the technical merits and quality of science that is generally evaluated by peer and external review along with expert consensus; legitimacy refers to an inclusive, comprehensive, and fair process. For the products of science to gain legitimacy for policy implementation, they must emerge from an iterative, collaborative, and bidirectional exchange between stakeholders (Carr et al 2012, Brown et al 2015, Wheater and Gober 2015). Several examples in the literature show that legitimacy is the strongest predictor of whether the science products are used in decision making, with legitimacy stemming from how involved the stakeholders were in the scientific assessment (Mckenzie et al 2014, Bremer et al 2015, Posner et al 2016). Additionally, participation is shown to improve the credibility and saliency of science products, which can lead to more effective solutions being identified and adopted (Watson 2005). This is mainly because participation involves tapping into institutional and traditional knowledge, exchange of experiences, deeper understanding, consensus building, and raising commitment toward resource management, among other advantages (Carr et al 2012, Mays 2013, Castilla-Rho 2017).
The International Union for Conservation of Nature (IUCN) suggests that sustainable groundwater management requires that 'users participate in the design of governance, incentive schemes and management interventions-otherwise groundwater management will remain a top-down, technocratic activity with unsatisfactory results' (2016, p 20).
Participation is a basic component in the scientific evaluation of groundwater sustainability. Participation is an integral policy component in several groundwater regulations, such as the Australia water reform agenda (e.g. Tan (2019) show top-down administrative decisions to achieve given sustainable outcomes in Spain have resulted in partial failures, whereas stakeholder consensus can lead to better outcomes. Similarly, Knuppe and Pahl-Wostl (2011) develop an aquifer governance framework to analyze groundwater sustainability at a basin level in Spain, showing that conflict arises from one-way communication by official authorities and the exclusion of local stakeholders during the planning processes. On the other hand, evaluation of groundwater sustainability planning in Australia shows that the interaction between decision makers and the public has much to offer when applied to questions that have been developed collaboratively, allowing for implementation of findings. Moreover, participation is an indispensable means to reduce uncertainty in groundwater sustainability, as shown by Guillaume et al (2012). This can be equally true for transboundary aquifers. For example, Leduc et al (2017) state that dialogue between local stakeholders, water managers, and researchers seems to be the only way to avoid or alleviate the serious threats to Mediterranean groundwater resources. To enhance the sustainability of the High Plains aquifer in Kansas, Sophocleous (2012) recommends the formation of an interstate groundwater commission along the lines of the Delaware and Susquehanna River Basin Commissions in the U.S.
We mainly focus here on participation studies related to groundwater sustainability, even though communication with water authorities and community members can be done regularly during groundwater management research. Specifically, while there are several groundwater sustainability studies that involve participation (e.g. Rinaudo et al 2016), we focus on the case studies that are reviewed in section 5, which are either case-specific studies or cross-case studies. A case-specific study would generally include a hydrological modeling component. A cross-case study could be a research or review article for a certain geographic region covering multiple case studies, summarizing research results, or presenting success or failure stories. In the reviewed case studies in section 5, we identified 26 articles that discussed participation. In addition, from our record search (section 4) we identified 11 articles that provide conceptual discussion on participation within the context of groundwater sustainability, and a few case studies about social learning.

Levels of participation
Although identifying stakeholders and defining avenues for user participation are largely contextual (Carr et al 2012, Kusters et al 2017), participation in the identified 26 articles could be considered on three levels of engagement. The first level is demand-driven scientific assessment, in which research responds to users seeking assistance in solving real-world problems within a community or region. This would ensure that the science products such as management cases and future scenarios are designed according to the users' needs and priorities. Most of the reviewed modeling studies to evaluate groundwater sustainability (section 5) can be considered to be demand-driven, yet only a few studies explicitly mention that (Gallardo et al 2009, Sheng 2013, Alcala et al 2015). While Gallardo et al (2009) and Sheng (2013) mention that their research findings will be used by stakeholders for groundwater management, Alcala et al (2015) study the Amtoudi Oasis in southern Morocco/northern Sahara and find that low financial capability and technical feasibility in northern Sahara prevent the implementation of actions proposed in their study. At this level of participation, the stakeholders could be aware or unaware of the ongoing research, and could be interested or uninterested in the research.
The second level of participation is stakeholder engagement through collaborative model development and characterization of the consequences of alternative options (Brown et al 2015, Basco-Carrera et al 2017). We identified 14 studies that discussed some level of involvement between researchers and stakeholders for model or plan development, which is often referred to in the literature as collaborative modeling or participatory modeling. Using the Basco-Carrera et al (2017) classification, figure 12 shows the level of participation implemented or called for in these 11 articles. Additionally, for the purpose of illustration, figure 12 shows two articles (Sheng 2013, Alcala et al 2015) that do not involve any form of collaborative modeling or participatory modeling. The articles shown in figure 12 are either case-specific studies or cross-case studies. Participatory or collaborative modeling is important in groundwater sustainability evaluation. Refsgaard et al (2010) state that developing integrated information systems that include quality assurance and uncertainty information to facilitate active stakeholder involvement and learning is one of the four key scientific challenges facing sustainable groundwater management in Denmark. The Henriksen et al (2008) study in Denmark stresses the importance of collaborative modeling, stating that if stakeholders are engaged in a design process that involves making consensus decisions, then the chance that they will accept the research outcomes is significantly increased.
Figure 12. The levels of engagement between researchers and stakeholders that are discussed or adopted by studies related to groundwater sustainability, using the Basco-Carrera et al (2017) classification.
Similarly, Sophocleous (2010) discusses that unlike previous modeling efforts in Kansas, new models developed under the groundwater-availability modeling program have had substantial stakeholder involvement; this yielded highly successful outcomes for providing appropriate and publicly available tools for regional water planning, raising stakeholder awareness of groundwater modeling, and promoting the importance of groundwater management (van Kelley et al 2008, Sophocleous 2010). While the aforementioned studies mainly focus on key stakeholders such as the water managers and regulators, other studies widen the set of stakeholders to include public participation. The goal is to achieve social learning. Because sustainable groundwater management involves multiple interacting system processes, and uncertainty stems from that complexity, social learning lies at the intersection of public engagement, scientific assessments, and decision making (Gober 2018). This need was earlier echoed by Sophocleous (2000): 'It is imperative that the community at large participates in policy formulations and in judgments of what is to be sustained. Strong public education and outreach programs are needed to improve understanding of the nature, complexity, and diversity of groundwater resources, and to emphasize how this understanding must form the basis for operating conditions and constraints. This is the only way to positively influence, for the long term, the attitudes of the various stakeholders involved. Pressure from the community for better management of our natural resources will be the main driving force for most changes.' This is further emphasized by Sophocleous (2005) and Tuinstra and van Wensem (2014), who argue that sustainable groundwater management cannot be achieved without an aware and involved citizenry. These are citizens who are aware of the value of their water resources and their potential strengths, weaknesses, uses, and threats.
Social learning is a means to build capacity to learn and respond to ongoing and complex water systems problems. Typically, water users will put pressure on key stakeholders such as water regulators and managers, and key stakeholders will voice these concerns to researchers. To better achieve social learning, researchers can engage in bilateral information exchange with key stakeholders, community members, or both. For example, Fernald et al (2015) in New Mexico worked with the communities to develop an understanding of the sociohydrological system function using causal loop diagrams, which form the basis for modeling future scenarios to identify thresholds and tipping points of groundwater sustainability in the study area. Piscopo et al (2019) involved both key stakeholders and community members to evaluate groundwater sustainability of a hydrothermal area in Italy. To this end, Piscopo et al (2019) conducted consultations and interviews with water managers and groups of citizens who have diverse and conflicting preferences to determine the trade-off between thermal water use and spring discharge in that study area.
When participatory or collaborative modeling extends beyond key stakeholders, the selection of stakeholders is not a trivial task. Wurl et al (2018) studied groundwater sustainability for agricultural activities in the Valley of Santo Domingo, Mexico, where the stakeholders were selected following guidelines from IFC (2007) with the following criteria: '(1) they represent a particular community or an important subgroup of population of the Santo Domingo Valley; (2) they would provide technical knowledge and/or essential information to the process; (3) to ensure the coherence of the project; (4) to ensure the application of the project, (5) because they are holders of rights in the project area.' The selection of stakeholders is contextual, and it may not be limited to targeted stakeholders. For example, Jorgensen et al (2017) mention that public participation in groundwater management in Denmark primarily takes the form of information and consultation procedures, in which draft implementation plans, generally at a regional scale, are made publicly available to all citizens, with comments invited within a stipulated time. Jorgensen et al (2017) further state that in recent years, especially under the EU Water Framework Directive, Denmark has been testing different methods of stakeholder involvement to get citizen input and make citizens more directly engaged in, and committed to, groundwater management processes and decision making. Similarly, Seward (2010) voices the need for further research and practical testing to formulate a more structured approach to public participation and adaptive management to better operationalize the South Africa Water Act.

Participation methods and tools
Formulating more structured approaches to public participation is needed for effective scientific evaluation of groundwater sustainability. Because of the heterogeneity of people and their interests and perceptions, societal preferences are harder to tackle than individual preferences (Reichert et al 2015). This can be even more challenging, as noted by Leduc et al (2017), when water territories are too vast, individual interests too divergent; when there are individuals and firms that exploit resources for short-term profits; and when individual profit prevails over the need to preserve a common resource.
Elicitation of intersubjective societal preferences can be done through surveys, public comment and vote, interviews, and similar tools (e.g. Fernald et al 2015, Sanderson and Curtis 2016, Jorgensen et al 2017, Piscopo et al 2019, Rudestam et al 2018, Wurl et al 2018). Summary assessments can then be fed into agent-based models (Mulligan et al 2014, Wada et al 2017, Roobavannan et al 2018), which are models of local human behavior. For example, Tan et al (2012) discuss the challenges of adaptive management and social learning in Australia in groundwater sustainability planning. Studies also discuss building community confidence by using the best available science with tools such as agent-based participatory modelling, deliberative multi-criteria evaluation, social impact assessment, and groundwater visualization models, and with good practices in indigenous engagement (Mackenzie et al 2012, Jackson et al 2012). Interactive tools with high visual impact are reported to be consistently rated highly by both indigenous and non-indigenous community members and water planners. Yet, Tan et al (2012) observe that, due to the inherent politicized risks in water planning, it is safer and more easily manageable to follow current methods of public participation such as information-giving and allowing written submissions.
With respect to modeling studies, both Castilla-Rho (2017) and Guillaume and El Sawah (2014) discuss iterative stakeholder engagement throughout the model development process. Castilla-Rho (2017) focuses on participatory agent-based modeling as a proposed means by which to inform decision-making and to understand competing stakeholder objectives, similar to an informative game. In contrast, Guillaume and El Sawah (2014) offer an iterative methodology for engaging stakeholders throughout the groundwater model development process as a means of information exchange. The most comprehensive use of participation in modeling is described by Baldwin et al (2012), who use a similar process to engage stakeholders in decision-making in the Tiwi Islands of the Northern Territory of Australia. All the aforementioned studies discuss the importance of visualization in the successful use of modeling products for knowledge exchange, and employ the most comprehensive means by which to engage stakeholders.
Other studies do not necessarily use groundwater models or visualization tools as a means to engage stakeholders in decision-making. The method of engagement discussed by Manda and Klein (2014) includes qualitative analysis via interviews and archival data as a mediation process to solve an impasse in policy development. Comparably, Wurl et al (2018) approach stakeholder engagement through the lens of sociohydrological resilience, and use dialogue, public meetings, and surveys to assess resilience and involve stakeholders in scenarios for future aquifer management in the Santo Domingo Valley of Mexico. Rudestam et al (2018) adopted an approach that is based on ethnographic observation and interviews with groundwater users to elicit the social character, economic interactions, and dominant understandings of culture and community to define the relational values of the place. Molina et al (2012) develop a novel method of calculating a Social Sustainable Aquifer Yield (SSAY), which they demonstrate in the Jaen province of Spain. The SSAY calculation incorporates the average perception of the maximum aquifer exploitation expressed by stakeholders, which they acquire through a survey. In addition, other approaches can be employed in areas where participation is limited. For example, Faysse and Petit (2012) describe a case study in the Chaouia coastal region of Morocco where they propose a social learning process as a means of counteracting weak governance characterized by weak interactions between groundwater users and managers. The study looked at a situation where dialogue was initiated to understand whether social learning was an outcome. Although some barriers to communication were overcome, without a long-term outlook for dialogue, it is difficult to classify this case as social learning.
In summary, our systematic review shows that an increasing number of articles discuss stakeholder participation in the context of groundwater sustainability. More case studies of stakeholder engagement that are directly linked to model development and prediction are particularly needed. More case studies that implement recently developed conceptual frameworks (e.g. Guillaume and El Sawah 2014, Castilla-Rho 2017) for integrating participation with groundwater modeling are needed for proof-of-concept. Finally, Mitchell et al (2012), through their comprehensive literature review, note that 'upon critical analysis, we concluded that much of the literature identified during our research lacked an adequate foundation in social theory or was not based on sound research methods.' Thus, care should be taken to ground participation methods in social theory, for example by engaging with scholars in the social sciences.

Conclusions
As a dynamic policy instrument, sustainable groundwater management balances water use and development with a changing society, environment, and climate. This article discusses a collective approach to groundwater sustainability policy development and implementation, and recognizes that science alone rarely leads to direct policy outcomes, especially where scientific findings are contested. Even in the face of strong debate, science can help to inform policy, provided that studies are salient to the policy challenge, involve decision makers in the scientific process, and that results are communicated effectively and viewed as credible (Cash et al 2003). This article also shows that even when a well-designed policy is in place, as in the case of Hawaii, the science required to capture the dynamics and complexity of hydrogeology and its dependent ecological and human systems is only beginning to be established. Nevertheless, integrated transdisciplinary groundwater management approaches that closely tie science to policy (or vice versa) are rapidly emerging. However, there is a clear need for more transdisciplinary research and case studies addressing the effective development and implementation of groundwater sustainability policy based on multi-process modeling, multi-narrative solutions, and participation.
Addressing multi-process modeling requires continuous improvement of existing groundwater modeling frameworks (e.g. Henriksen et al , Guillaume et al 2012) to better incorporate ecosystem services and human activities. It additionally requires developing new groundwater frameworks (e.g. Castilla-Rho et al 2017, 2019) that align with emerging calls in the hydrology community to frame water security and sustainability beyond just the foci of water quality and quantity to better understand possible co-evolving scenarios between water systems, ecosystems, and society (Montanari et al 2013, Thompson et al 2013, Vogel et al 2015, Ceola et al 2016, Wada et al 2017). A toolbox to address groundwater sustainability at different levels is needed since there is no 'one-model-fits-all' solution. Regardless of the modeling framework, this article illustrates that the incorporation of natural, engineered, societal, and institutional systems into an integrated modeling framework is gradually evolving in the groundwater sustainability literature to keep up with emerging policies that call for these integrations. We review two hydrological modeling approaches for estimating groundwater sustainability, which address the debate over the relative reliability of phenomenological models in comparison to numerical models. Selection of the appropriate modeling approach is case-specific, depends on the available data, aquifer type, and sustainability factors of interest, and should be guided by the law of parsimony (Voss 2011a, 2011b). While numerical models are more useful and accurate, simple phenomenological models can be especially useful when there are insufficient site-specific data to develop a high-fidelity numerical model with more mathematical and geological realism. Our analysis shows that hydrological modeling, with respect to surface water and groundwater interaction, is more mature than ecosystem services modeling and human activities modeling.
While ecosystem services modeling and human activities modeling are both emerging, tools for managing groundwater-dependent ecosystems are not yet well defined, which calls for more adaptive management approaches.
Decisions on water resources will be made, whether accounting for the uncertainty of our scientific knowledge or not. One of the roles of science is to reduce errors and their cost. How much investment in scientific knowledge and monitoring is needed in a given case to reduce uncertainty depends on the current and future costs of these errors to humans and the environment. Addressing the inherent uncertainty associated with both the natural and societal aspects of complex and dynamic groundwater systems requires developing innovative multi-model approaches to provide multiple narratives about the problem solution (Ferré 2017b), and to effectively communicate uncertainty to end-users and stakeholders in a way that would help them to make better decisions (Guillaume et al 2012). This requires working with stakeholders through collaborative modeling and adaptive management to better characterize and sustainably manage the groundwater resources. While technical advancements in uncertainty analysis are still developing (especially with respect to handling the high computational cost of groundwater models, multi-disciplinary subsurface characterization and uncertainty quantification, and handling multifaceted uncertainty of water-ecology-human systems), existing methods and tools are not fully utilized in the groundwater sustainability literature. What seems to be lacking is the mainstreaming of these tools to end-users. Additionally, it is important to make end-users aware of the importance of uncertainty analysis and of the existence of these tools.
The degree of participation in the science-policy process may be the most critical piece, yet the most difficult to design and implement due to tight budgets, time constraints, or the absence of clear structured approaches. It is an essential and nontrivial component, not only to resolve conflicts, but also to identify the strengths, weaknesses, opportunities, and threats related to groundwater sustainability. Within the scientific community, greater collaboration among physical scientists, social scientists, groundwater managers, and policy makers is necessary to develop this aspect of the groundwater sustainability evaluation process. Such collaborative relationships between researchers and key stakeholders can generally be easier to establish than public participation. Testing and evaluating different methods for increasing public participation in groundwater management to achieve social learning is an active research area (e.g. Tan et al 2012, Jorgensen et al 2017). Obtaining public input and making citizens more engaged in groundwater management is particularly important, since sustainable groundwater management cannot be achieved without well-informed, perceptive, and involved citizens.

Acknowledgments
This work is funded by the U.S. National Science Foundation (NSF) Award # OIA-1557349. The fifth author is funded by U.S. NSF EAR 1828827. The authors are very grateful to Tom Gleeson, Anita Milman, and two anonymous reviewers, who helped to significantly improve the manuscript.

Supplement and data availability statement
The data that support the findings of this study are openly available (Elshall 2020). This includes the Jupyter Notebook that has the supplement data, method and code, which can also be accessed from https://github.com/aselshall/SYReview/blob/master/Supplement%20.ipynb