GEB v0.1: A large-scale agent-based socio-hydrological model – simulating 10 million individual farming households in a fully distributed hydrological model

Abstract. Humans play a large role in the hydrological system, for example, by extracting large amounts of water for irrigation, often resulting in water stress and ecosystem degradation. By implementing large-scale adaptation measures, such as the construction of irrigation reservoirs, water stress and ecosystem degradation can be reduced. Yet we know that many decisions, such as the adoption of more effective irrigation techniques or changing crop types, are made at the farm level by a heterogeneous farmer population. While these decisions are usually advantageous for an individual farmer or their community, aggregate effects of those decisions are also experienced downstream. Similarly, decisions made by other stakeholders, such as governments, often have basin-wide effects and affect farmers differently. Therefore, to fully comprehend how the human-natural water system evolves over time and space, and to explore which interventions are suitable to reduce water stress, it is important to consider human behaviour and feedbacks to the hydrological system simultaneously at the local household and large basin scales. To this end, we present the Geographical, Environmental and Behavioural model (GEB), a coupled agent-based hydrological model that simulates the behaviour and daily bi-directional interaction of up to ~12 million individual farm households with the hydrological system on a personal laptop. GEB is dynamically linked with the spatially distributed grid-based hydrological model CWatM at 30'' resolution (< 1 km at the equator). Because many smallholder farmer fields are much smaller than 1 × 1 km, CWatM was specifically adapted to implement dynamically sized hydrological response units (HRUs) at the farm level, providing each agent with an independently operated hydrological environment.
While the model could be applied anywhere globally at both large and small scales, we explore its implementation in the heavily managed Krishna basin in India, which encompasses ~8% of India's land area and ~12.1 million farmers. Here, we show how six combinations of storylines with endogenous and exogenous drivers of adaptation affect both the hydrological system and the farmer population.


Introduction
Water stress, defined as water demand exceeding water availability, is an increasing threat to human livelihoods through, for example, decreasing agricultural yields, insufficient water for drinking and sanitation, and degrading ecosystems (Ablo and Yekple, 2018; van Leeuwen et al., 2016; Porporato et al., 2001; Kummu et al., 2016). A growing number of regions are expected to experience severe water stress in the future, largely driven by an increasing population and climate change (Kummu et al., 2016; Veldkamp et al., 2015). Effective water management can help to reduce water stress but requires knowledge of the current status of water resources, socio-economic development, climate change, and the effects of interventions (Ibisch et al., 2016) on upstream and downstream water availability (Veldkamp et al., 2017). Therefore, hydrological models, which simulate the hydrological system, are a widely used tool to provide an integrative view of the system and to formulate effective policies.
Humans play a large role in the hydrological system. For example, governments and other organizations construct reservoirs (Biggs et al., 2007) and channels for inter-basin transfers (Gupta and van der Zaag, 2008), disrupting natural flows. Small-scale adaptations, such as groundwater pumping (R. and P., 2005), rainwater harvesting (Li et al., 2000), changing crop use (Kuil et al., 2018), and irrigation practices (Nouri et al., 2019b; Mollinga, 2003), are often realized at the individual or communal level. While these measures are usually beneficial for some, adverse effects can be experienced by other water users across different scales (Di Baldassarre et al., 2021). In addition, the costs and benefits of water stress-related interventions may vary throughout a heterogeneous farmer population.
To fully comprehend how water stress develops over time and space and to explore which interventions are suitable to reduce water stress, it is important to understand feedbacks in a coupled human-natural system simultaneously at the local household and large basin scales. For example, when water is extracted upstream, water availability downstream can be reduced (Veldkamp et al., 2017). Farmers at the head-end of a command area, for instance, can have access to a larger and more reliable water supply than tail-enders (Mollinga, 2003), which incentivizes head-end farmers to adopt water-intensive high-return crops, reducing water availability downstream (Wallach, 1984). Similarly, upstream farmers that invest in rainwater harvesting techniques reduce the amount of water available downstream (Bouma et al., 2011). Another example is increased groundwater use, where individual well users lower the groundwater table in a larger region (Llamas and Martínez-Santos, 2005). While some farmers have the resources to invest in deeper wells, other farmers are driven further into poverty (Batchelor et al., 2003).
While most hydrological models are well suited to simulate the hydrological system at a large scale, they treat small-scale human behaviour rather simplistically and homogeneously. In these models, humans often do not learn over time and do not change their adaptive behaviour under changing water risk (Aerts et al., 2018). In reality, agents adapt to changes in their environment and also respond to each other (Wens et al., 2020). For example, a water pricing tax imposed by the government has a direct influence on household water use, and farmers might construct wells in response to drought events. An agent-based model (ABM) appears to be an effective tool to simulate these complex heterogeneous behaviours and feedbacks. Therefore, the research realm of socio-hydrology has developed models that dynamically couple hydrological and agent-based models to better simulate the hydrological system as well as the behaviour of individual heterogeneous agents.
Using such a coupled model allows changes in the natural system (e.g., the effect of changes in climate) or in the human system (e.g., government policies or adaptation behaviour) to be tracked through the entire human-natural system. For example, drought events can accelerate adaptation behaviour, making farmers more resilient to the next drought. At the same time, such adaptation behaviour can negatively influence water storage (e.g., through increased groundwater extraction; Streefkerk et al., 2023). Simulating such feedbacks requires a coupled model.
In general, two approaches can be differentiated in how a hydrological component is added to an agent-based model: (i) adding a hydrological component that is itself agent-based (e.g., river segments are represented as agents that exchange water), or (ii) adding a traditional hydrological model, such as a fully distributed gridded model in which water flows from one grid cell to another based on the kinematic wave equation. In the first, agent-based approach, all environmental components, such as river segments, are simulated as agents. For example, Becu et al. (2003) simulate farmers, irrigation behaviour, and crop and vegetation dynamics. Their model uses a simple routing scheme that considers water abstraction and water diversions by canal managers. Another example is Huber et al. (2019), who created a basin-scale coupled model in which water flows downstream from one hydrological river agent to another, while other agents, such as farmers or water managers, can abstract water from the river. In this approach, the hydrological component is usually relatively simple, largely because authors typically build it from scratch. The second approach is to couple an agent-based model with a more traditional hydrological model by allowing the agents to interact with its water storage (Streefkerk et al., 2023). For example, the widely used MP-MAS (Schreinemachers and Berger, 2011; Arnold et al., 2015) is coupled to WASIM-ETH, a fully distributed hydrological model. Van Oel et al. (2010) published a larger coupled grid-based model that simulates the irrigation behaviour of individual farmers in a large basin using a grid size of 270 × 270 m. This approach also benefits from ongoing methodological progress in hydrological modelling (Bierkens, 2015). Large-scale hydrological models are run at increasingly high resolution, while other advances, such as HydroBlocks, allow grid cells to be effectively combined into hydrological response units (HRUs) while retaining the ability to accurately simulate the hydrological system (Chaney et al., 2016), including dynamic routing (Chaney et al., 2021).
Many agent-based models with a hydrological component have been released using these methods. Some of these models simulate water management decisions by groups of people, such as sectors or villages (e.g., Huber et al., 2019; Streefkerk et al., 2013). Other agent-based models represent single water users, such as a person or household (Schreinemachers and Berger, 2011; Wens et al., 2020; Becu et al., 2003; Arnold et al., 2015). These models are better suited to simulate individual adaptation pathways, which are often paramount in capturing the heterogeneity of the farmer population (e.g., Wens et al., 2020; Bert et al., 2011; Tamburino et al., 2020). Yet, to simulate the effect of a single agent on the hydrological system, at least one HRU per agent is required to properly represent system feedback at the individual level (Schreinemachers and Berger, 2011). Using a gridded model, this means that the grid cell size cannot be larger than the smallest farm. This requires long computation times and large computational resources, especially in regions with smallholder farms. So far, this has limited the ability of coupled models that capture the full heterogeneity of the agent population to be applied effectively at a large scale.
We propose to resolve this issue by simulating hydrological processes at field scale using dynamically sized HRUs within a grid cell, with each HRU representing a single farm. Each farm-level HRU can be individually operated by an agent. This way, each individual crop field is simulated as an HRU in addition to other land use types. Due to their dynamic size (e.g., one unit covering 50% of the cell and 10 units each covering 5% of the cell), CWatM can be run at a relatively coarse resolution, such as 30'' (<1 km at the equator), to simulate a large hydrological basin while still simulating small and individually operated farms. Because agents can directly interact with these units (their fields), we can, for the first time, investigate the interaction between small-scale individual behaviour and large basin-wide hydrological processes.
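To make the idea of dynamically sized HRUs concrete, the sketch below derives per-farm HRUs inside coarse grid cells from a finer-resolution farm-ownership raster, reproducing the "50% plus 10 × 5%" example. It assumes, for illustration only, a 20 × 20 subgrid per grid cell (the ratio between the 1.5'' farm map and the 30'' CWatM grid used for the Krishna application), a farm map in which -1 marks non-farm land, and a hypothetical function name `build_hrus` that is not part of GEB's API. For brevity, all non-farm land in a cell is lumped into a single HRU rather than one HRU per land use type.

```python
import numpy as np

SUBGRID = 20  # assumed subcells per grid-cell side (30'' grid / 1.5'' farm map)

def build_hrus(farm_map):
    """Split every grid cell into one HRU per farmer, plus one HRU for non-farm land.

    farm_map: 2-D integer array at subgrid resolution; -1 = not farmed,
              values >= 0 = farmer identifier.
    Returns flat arrays (hru_cell, hru_owner, hru_fraction), where hru_fraction
    is the HRU area as a fraction of its grid cell.
    """
    ny, nx = farm_map.shape[0] // SUBGRID, farm_map.shape[1] // SUBGRID
    hru_cell, hru_owner, hru_fraction = [], [], []
    for cy in range(ny):
        for cx in range(nx):
            block = farm_map[cy * SUBGRID:(cy + 1) * SUBGRID,
                             cx * SUBGRID:(cx + 1) * SUBGRID]
            owners, counts = np.unique(block, return_counts=True)
            for owner, count in zip(owners, counts):
                hru_cell.append(cy * nx + cx)
                hru_owner.append(owner)
                hru_fraction.append(count / block.size)
    return np.array(hru_cell), np.array(hru_owner), np.array(hru_fraction)

# One grid cell: half non-farm land, ten farms of 5 % each.
farm_map = np.full((SUBGRID, SUBGRID), -1)
farm_map[10:, :] = np.arange(10)[:, None]  # rows 10..19 belong to farmers 0..9
cell, owner, frac = build_hrus(farm_map)
print(len(owner), frac[owner == -1], frac[owner == 0])  # 11 [0.5] [0.05]
```

Because HRUs are formed per grid cell, a farm that straddles a cell boundary automatically yields one HRU in each cell it touches, which is the split-HRU case discussed for farms spanning multiple grid cells.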
Therefore, we present the Geographical, Environmental and Behavioural (GEB) model, named after Geb, the Egyptian god of the earth. The model is an ABM that is dynamically linked with a specifically adapted version of the Community Water Model (CWatM; Burek et al., 2020). GEB can simulate large-scale hydrological processes as well as the individual behaviour of more than 10 million individual farming households and their bi-directional interactions with the hydrological system. CWatM is used as a coupled model to simulate the hydrological cycle at a grid resolution of 30'' (< 1 km at the equator). Individual farmer households (~12.2 million on a normal laptop with 10 GB of free RAM) and reservoir operators are simulated as fully integrated agents that can dynamically interact with (i.e., respond to and influence) the water balance in CWatM. Through this coupling, each individual farmer can, at a daily timestep, decide to irrigate from various sources (i.e., surface water, reservoirs, or groundwater). Furthermore, farmers can decide to plant and harvest crops based on the available water in their environment, the status of their crops, their risk aversion, crop price, water price, weather conditions, etc. Moreover, farmers can adapt, for example, by investing in water-saving techniques, drilling boreholes, and changing crop type. All these decisions can be made at a daily timestep.
In this work, we describe how the open-source model is set up, followed by an example application of the model in the heavily managed Krishna basin, which encompasses ~257,000 km² or ~8% of India's land area. Here, we simulate the adaptive patterns to water stress of ~12.1 million farmers and show, through various artificial storylines, how adaptation through irrigation efficiency and crop choice can influence both individual farmers and the hydrological system. All model code is extensively documented at https://jensdebruijn.github.io/GEB/.

Model description
The GEB model is an open-source coupled hydrological and agent-based model, written in Python and jointly developed at the International Institute for Applied Systems Analysis (IIASA) and the Institute for Environmental Studies (IVM, VU Amsterdam). The agent-based model can simulate millions of individual farmers in addition to other agents that interact bidirectionally with the hydrological model CWatM (Burek et al., 2020). In this manner, GEB simulates the water cycle and how it interacts with the individual decision-making of farmers and water managers, such as crop management and growth, and irrigation and reservoir management in large (or small) basins, all at a daily timestep (Figure 1). The model can be adapted to run various scenarios (Figure 1), influencing either the ABM (e.g., provision of subsidies to farmers or the construction of additional reservoirs) or the water cycle (e.g., varying future climate scenarios). Further detail of the main model interlinkages between the human and hydrological components in the default setup is shown in the accompanying figure. The model consists of two dynamically integrated parts, which run synchronized at a daily timestep: a field-scale hydrological model (blue; see section 2.1 for details) and an agent-based model that simulates crop farmers and reservoir managers (orange; see section 2.2 for details). The interactions between model components and the behavioural rules can easily be adapted to other applications or study regions.
During each timestep, the model is forced by a daily set of meteorological data, considering the initial distribution of land use and crops. Potential evapotranspiration is determined for both cropland and non-cropland, followed by the determination of both the water availability and the potential demand. Potential irrigation demand for non-paddy irrigation is computed as the difference between the soil moisture content at field capacity in the root zone and the current soil moisture content, limited by the infiltration capacity. The potential irrigation demand for paddy-irrigated land is computed as the difference between the targeted water level in the paddies and the current water level. Here, reservoir operators can release water from their reservoirs based on considerations such as current demand and water levels. Then, after water consumption by the industrial, domestic, and livestock sectors, farmers can abstract irrigation water. The calculation of irrigation water consumption is removed from CWatM and is instead performed in the agent-based model, thus addressing the irrigation behaviour of individual agents. This irrigation behaviour is, in turn, determined by crop water requirements, the selection of crop types, and irrigation equipment. In future work, we will more accurately simulate farmer behaviour by including factors such as 1) current and historical experiences of water availability and water requirements, 2) agent characteristics (e.g., risk profile, household size), 3) agent assets (e.g., irrigation equipment, groundwater well), and 4) agent knowledge and regulations. These factors are not necessarily static over time, as agents can invest in management options (e.g., irrigation wells, drip irrigation equipment) or change crop types. Moreover, external bodies such as governments and NGOs can influence the behaviour of farmers and reservoir operators by imposing regulations, providing knowledge to the farmer population, or investing in the wider availability of assets (e.g., creating an irrigation reservoir). Knowledge can also be obtained from other (neighbouring) agents.
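The potential irrigation demand rules described above (root-zone deficit capped by infiltration capacity for non-paddy land, water-level deficit for paddies) can be sketched per HRU in vectorized NumPy. This is a minimal illustration rather than GEB's implementation: it assumes all quantities are water depths in metres on aligned per-HRU arrays, uses hypothetical variable names, and clips negative deficits to zero, which the text does not state explicitly.

```python
import numpy as np

def potential_irrigation_demand(is_paddy, soil_moisture, field_capacity,
                                infiltration_capacity, paddy_level, paddy_target):
    """Potential irrigation demand per HRU, all quantities as water depths (m).

    Non-paddy HRUs: root-zone deficit relative to field capacity, capped by the
    infiltration capacity. Paddy HRUs: deficit of the ponded water level relative
    to its target. Negative deficits are clipped to zero (an assumption).
    """
    non_paddy = np.minimum(np.maximum(field_capacity - soil_moisture, 0.0),
                           infiltration_capacity)
    paddy = np.maximum(paddy_target - paddy_level, 0.0)
    return np.where(is_paddy, paddy, non_paddy)

demand = potential_irrigation_demand(
    is_paddy=np.array([False, False, True]),
    soil_moisture=np.array([0.20, 0.10, 0.30]),
    field_capacity=np.array([0.25, 0.25, 0.30]),
    infiltration_capacity=np.array([0.10, 0.10, 0.10]),
    paddy_level=np.array([0.0, 0.0, 0.02]),
    paddy_target=np.array([0.0, 0.0, 0.05]),
)
# HRU 0: deficit 0.05 m; HRU 1: deficit 0.15 m capped at 0.10 m; HRU 2 (paddy): 0.03 m
```

Evaluating both branches for every HRU and selecting with `np.where` keeps the computation branch-free, which suits arrays with millions of HRUs.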
After the application of irrigation water, CWatM simulates infiltration, capillary rise within soils, groundwater recharge, surface routing, and groundwater flow (using MODFLOW). Here, CWatM again communicates with the reservoir operator agents to determine the amount of water released downstream. Then, as a new timestep is initiated, each farmer can decide to plant or harvest crops based on experience, assets, characteristics, knowledge, and regulations, updating the land use in CWatM accordingly.
The model is largely implemented in Python 3, with all computationally intensive parts implemented using compiled Python libraries such as NumPy (Harris et al., 2020) and Numba (Lam et al., 2015), and includes optional GPU vectorization of soil components through CuPy. The model can be run on all major platforms (i.e., Linux, Windows, and macOS). An optional model interface is extended from Mesa (Kazil Jackie and Masad, 2020; Figure 3). A high-level description of the technical model integration can be found in Appendix A, while further details can be found in the model documentation.

Simulating hydrological processes at field scale
Most hydrological models implement several different land use types (e.g., Burek et al., 2020; Sutanudjaja et al., 2018; Müller Schmied et al., 2021). In these models, soil processes in all land use types are simulated individually. Runoff and several other hydrological fluxes are then computed by aggregating to the grid cell level while considering the relative size of each land use type in a particular grid cell. In other words, each land use type within a grid cell is simulated as a hydrological response unit (HRU; Flügel, 1997; Chaney et al., 2016). Farmers usually occupy cropland land use types such as non-irrigated land, paddy-irrigated land, and non-paddy irrigated land (e.g., Alcamo et al., 2003; Burek et al., 2020; Sutanudjaja et al., 2018; Hanasaki et al., 2018). When a single land use type within a grid cell is occupied by multiple farmers, these farmers share an HRU (i.e., hydrological environment) and are thus simulated as a single unit.
This introduces an issue for agent-based models that focus on the implementation of heterogeneous decision-making at the field scale. For example, when two farmers share an HRU and farmer #1 decides to irrigate while farmer #2 does not, the soil moisture in the field of farmer #1 should increase relative to the soil moisture in the field of farmer #2. However, because both farmers share an HRU, the soil moisture in their fields cannot be simulated separately in the model.
The most straightforward solution is to run the model at a higher resolution, such that the smallest field is simulated as a single grid cell while larger fields are simulated as multiple grid cells. However, as farms smaller than 1 ha make up 72% of global farms (Lowder et al., 2016), this solution requires grid cells smaller than 100 × 100 m, which would demand an enormous amount of computational resources, making the approach infeasible in larger basins.
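For a sense of scale, a back-of-the-envelope count for the ~257,000 km² Krishna basin, using 1 km² as a rough upper bound on the size of a 30'' cell (so the second figure is a lower bound), gives:

$$
N_{100\,\mathrm{m}} = \frac{257\,000\ \mathrm{km^2}}{0.01\ \mathrm{km^2}} \approx 2.6 \times 10^{7}\ \text{cells},
\qquad
N_{30^{\prime\prime}} \gtrsim \frac{257\,000\ \mathrm{km^2}}{1\ \mathrm{km^2}} \approx 2.6 \times 10^{5}\ \text{cells},
$$

that is, roughly two orders of magnitude more grid cells at 100 m resolution than at 30''.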
As a solution, we simulate the field of each farmer as a single HRU and adapt CWatM to work with these HRUs (Figure 4). In this concept, cropland use types are further subdivided into dynamically sized HRUs based on the land ownership (or rent) of the agent (e.g., a farmer). These HRUs can be independently operated by agents in the ABM, such as farmers. In this manner, the land management decisions (e.g., crop planting date and irrigation) and soil processes (e.g., percolation, capillary rise, and evaporation) are independently simulated in an HRU for each farmer, thus allowing simulation of multiple independently operated farms within a single grid cell. These HRUs can also be split, allowing, for example, farmland expansion into other land use types and the sale of (part of) a farmer's land. Each crop farm that is owned by a farmer is thus an individual HRU. An exception is when a crop farm is spread across multiple grid cells, in which case it is represented by multiple HRUs across those grid cells. However, as these split HRUs are owned by a single farmer, management decisions still affect all of them and thus the entire farm. In addition, each land use type in a grid cell that is not operated by crop farmers, such as water areas, grasslands, and forests, is a separate HRU and thus operates independently from other land use types. While primarily "vertical" hydrological fluxes (e.g., infiltration, percolation) occur within HRUs, river discharge and groundwater flow are simulated at the grid cell level. Therefore, fluxes need to be converted between HRUs and grid cells. Figure 5 shows how this works in practice, similar to hydrological models that simulate multiple land use types in a single grid cell, such as CWatM and PCR-GLOBWB. Runoff is first determined per HRU and then aggregated to the grid cell level while considering the relative size of each HRU. Aggregated runoff is then added to discharge, followed by solving the kinematic wave equation at the grid cell level.
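The flux conversion between HRUs and grid cells described above amounts to an area-weighted sum in one direction and a simple lookup in the other. The sketch below shows this with `np.bincount`, assuming flat per-HRU arrays `hru_cell` (grid-cell index of each HRU) and `hru_fraction` (HRU area as a fraction of its cell); these names are illustrative and not GEB's API, and the subsequent kinematic wave routing is not shown.

```python
import numpy as np

def hru_to_grid(hru_values, hru_cell, hru_fraction, n_cells):
    """Area-weighted aggregation of an HRU flux (e.g. runoff depth) to grid cells."""
    return np.bincount(hru_cell, weights=hru_values * hru_fraction, minlength=n_cells)

def grid_to_hru(grid_values, hru_cell):
    """Broadcast a grid-cell value (e.g. a groundwater head) to every HRU in that cell."""
    return grid_values[hru_cell]

# Two grid cells: cell 0 has three HRUs (50 %, 25 %, 25 %), cell 1 has one HRU.
hru_cell = np.array([0, 0, 0, 1])
hru_fraction = np.array([0.5, 0.25, 0.25, 1.0])
runoff_mm = np.array([10.0, 4.0, 0.0, 2.0])  # runoff depth per HRU

cell_runoff_mm = hru_to_grid(runoff_mm, hru_cell, hru_fraction, n_cells=2)
print(cell_runoff_mm)  # [6. 2.]
```

Because fractions within a cell sum to one, the aggregated value is a cell-average depth, which can then be converted to a volume with the cell area before routing.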

Agents
The ABM currently has four types of agents: farmers, reservoir operators, a government, and an NGO. Farmers and reservoir operators directly interact with CWatM, while government and NGO agents only communicate with other agents (e.g., by providing subsidies to farmers). Below, we discuss the default decision-making process of agents, some of which might be altered in storylines (see Section 0). Once behavioural rules are determined, agent behaviour is relatively easy to adapt in the model.
Farmer agents are initialized according to a farm map at a 1.5'' resolution (i.e., 20 times higher resolution than the CWatM grid; <50 m at the equator). Given a population of farmers, the farm map specifies the farm owned by each farmer through a unique identifier per farmer. Correspondingly, NumPy C-style arrays with one element per farmer specify farmer characteristics. For example, a Boolean array with 12.20 million values specifies whether each of the 12.20 million farmers has access to groundwater in the initial timestep. Other characteristics include longitude, latitude, crop type, plant date, and harvest date. From then on, each farmer irrigates each HRU they occupy. Upstream agents, as determined by their elevation, are allowed to abstract water first on a "first-come, first-served" principle. As agents have no incentive to consider environmental flow conditions, these are not enforced. Farmer irrigation demand is determined by the difference between field capacity and soil moisture and is limited by infiltration capacity. If farmers have access to the right equipment for surface, reservoir, and groundwater irrigation, irrigation demand is then satisfied first from surface water (Eq. 1), then from reservoirs (Eq. 2), and finally from groundwater (Eq. 3). All sources are limited to the current water availability from streamflow in the farmer's grid cell, from reservoirs that supply the command area of that grid cell, and from groundwater in that grid cell. In addition, farmers only have access to water resources if they have the relevant irrigation equipment. When a farmer decides to irrigate, the water is subtracted from the relevant sources in CWatM and then applied to the land in the relevant HRU.
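The upstream-first, source-by-source allocation described above can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not GEB's implementation: availability is represented as plain volume arrays per grid cell (surface water, groundwater) and per command area (reservoirs), `command_area` is -1 for farmers outside any command area, and the function name is hypothetical. In GEB, the abstracted water is taken from CWatM's stores and applied to the farmer's HRU, which is not shown here.

```python
import numpy as np

def allocate_irrigation(demand, elevation, cell, command_area,
                        has_surface, has_reservoir, has_groundwater,
                        surface_avail, reservoir_avail, groundwater_avail):
    """Serve farmers upstream-first; each draws surface -> reservoir -> groundwater.

    surface_avail and groundwater_avail hold volumes per grid cell,
    reservoir_avail per command area; all three are reduced in place.
    command_area is -1 for farmers outside any reservoir command area.
    Returns an (n_farmers, 3) array of abstracted volumes per source.
    """
    abstracted = np.zeros((demand.size, 3))
    for i in np.argsort(-elevation, kind="stable"):  # highest elevation first
        remaining = demand[i]
        sources = (
            (0, has_surface[i], surface_avail, cell[i]),
            (1, has_reservoir[i] and command_area[i] >= 0, reservoir_avail, command_area[i]),
            (2, has_groundwater[i], groundwater_avail, cell[i]),
        )
        for k, has_access, avail, idx in sources:
            if remaining <= 0 or not has_access:
                continue  # no equipment/access, or demand already met
            take = min(remaining, avail[idx])
            avail[idx] -= take
            abstracted[i, k] = take
            remaining -= take
    return abstracted

# Two farmers in one cell; farmer 1 is upstream, only farmer 0 has a well.
surface = np.array([5.0]); reservoir = np.array([0.0]); groundwater = np.array([100.0])
out = allocate_irrigation(
    demand=np.array([4.0, 4.0]), elevation=np.array([100.0, 300.0]),
    cell=np.array([0, 0]), command_area=np.array([-1, -1]),
    has_surface=np.array([True, True]), has_reservoir=np.array([False, False]),
    has_groundwater=np.array([True, False]),
    surface_avail=surface, reservoir_avail=reservoir, groundwater_avail=groundwater)
# farmer 1 takes 4 from surface water; farmer 0 gets the remaining 1 plus 3 from groundwater
```

Serving farmers in order of decreasing elevation is what lets tail-end farmers run short when upstream demand is high, which is the head-end/tail-end asymmetry discussed in the Introduction.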
The farmer's initial crop choice and growing pattern (e.g., single or double cropping) are also loaded as an array.The planting and harvesting dates are dependent on crop type and growing pattern but, if required, could be dynamically determined by the agent, for example, on the basis of weather forecasts.Once the farmer decides to harvest their crop, the respective HRU is set to "barren land" in CWatM.Then, as the farmer decides to plant a new crop, the land use type is changed accordingly in CWatM (e.g., to "irrigated").During a model run, farmers can decide to switch to different crops (see Section 0).
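The switching of an HRU between "barren land" and a cropped land use type on planting and harvest dates can be sketched with per-HRU arrays. This sketch assumes fixed calendar dates (day numbers) per HRU, a single cropped land use code, and illustrative land-use codes that are not CWatM's actual numbering; dynamic, forecast-based planting decisions are not modelled.

```python
import numpy as np

# Illustrative land-use codes only; they are not CWatM's actual numbering.
BARREN, IRRIGATED = 0, 1

def update_crop_calendar(day, plant_day, harvest_day, land_use, growing):
    """Apply one day of harvesting and planting to per-HRU arrays (in place)."""
    harvest_now = growing & (harvest_day == day)
    land_use[harvest_now] = BARREN      # harvested fields revert to barren land
    growing[harvest_now] = False
    plant_now = ~growing & (plant_day == day)
    land_use[plant_now] = IRRIGATED     # planted fields switch to a cropland type
    growing[plant_now] = True
    return harvest_now, plant_now

plant_day = np.array([10, 50])
harvest_day = np.array([120, 200])
land_use = np.full(2, BARREN)
growing = np.zeros(2, dtype=bool)
for day in range(1, 151):
    update_crop_calendar(day, plant_day, harvest_day, land_use, growing)
# After day 150: HRU 0 was harvested on day 120 (barren); HRU 1 is still growing.
```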
Crop growth is differentiated into four growing periods (i.e., initial, development, mid-season, and late-season), in which crop factors are also based on the specifications of Siebert and Döll (2010). The crop yield ratio is determined based on the ratio between actual and potential evapotranspiration, also following the specifications in Siebert and Döll (2010). A farmer adaptation procedure is also run. Here, we assume farmers invest in an irrigation well when the potential additional profit is higher than the investment costs of the irrigation well.
The reservoir operator agents release a maximum percentage of the current reservoir water volume for irrigation purposes each day.Hence, as upstream farmers get to abstract water first, this can lead to limited access to reservoir water for farmers at the tail end of command areas.The amount of water released for other purposes (e.g., maintaining outflow, reducing water level) depends on the rating curve of the reservoir and relevant flood cushions (Burek et al., 2020).
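The daily irrigation release rule can be written in one line: release the command-area irrigation demand, but never more than a fixed fraction of the current storage. In this sketch the demand cap follows the model setup description for the Krishna basin, while the 1% fraction and the function name are illustrative assumptions; releases for other purposes (rating curve, flood cushions) are not included.

```python
import numpy as np

def irrigation_release(volume, command_area_demand, max_fraction):
    """Daily irrigation release per reservoir: the command-area demand, capped at
    max_fraction of the current reservoir volume."""
    return np.minimum(command_area_demand, max_fraction * volume)

volume = np.array([1.0e9, 2.0e8])   # m3, current storage of two reservoirs
demand = np.array([5.0e6, 5.0e6])   # m3/day, irrigation demand in each command area
release = irrigation_release(volume, demand, max_fraction=0.01)  # 1 %: illustrative
volume -= release
# Reservoir 0 is demand-limited (5e6 m3); reservoir 1 is volume-limited (2e6 m3).
```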
Finally, a government and an NGO agent exist within the model. These agents can communicate with farmers and reservoir operators. In the default settings, these agents take no action but can take action depending on the model settings (see Section 0).

Model integration
This section discusses the coupling between the ABM and the hydrological model, serving both as an explanation of GEB and as a guide for readers who wish to couple their own hydrological model to the ABM or vice versa. The ABM can be found in the GEB repository1, while the adapted version of CWatM can be found in the ABCWatM repository2. Because both models are written in Python, the coupling is performed by subclassing both models and synchronizing their timesteps while adapting functions within each model to communicate with the other. The configuration file (default: "GEB.yml") contains configuration parameters for both models, such as the start and end date of the simulation, which are later used in the respective models. Then, the shared data class ("self.data") is loaded. The data class loads a mask of the study area and a land use map and automatically creates the grid and HRUs. The data class also contains convenience functions to convert data between the grid and HRUs. An example is given in Figure 6. It is useful, but not strictly required, that all data for the grid cells (e.g., river discharge) and HRUs (e.g., soil moisture in the upper soil layer) used by the respective models are contained within this class, allowing easy access to these variables from both models. Then, the agent-based model is initialized with a set of agent attributes. For farmers, this usually consists of a raster map that indicates the area managed by a specific farmer, the locations of the farmers (e.g., the centre point of the field), and other attributes, such as crop type, cropping schedule, and irrigation status. Finally, CWatM is initialized as per Burek et al. (2020) while loading initial land use and crop parameters from the ABM.3
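The coupling pattern described above, subclassing a hydrological model and redirecting one of its internal steps to the ABM so that both advance together each day, can be illustrated with toy classes. Everything here is hypothetical: the class names, the `irrigation` hook, and the simplified soil-moisture physics are stand-ins for illustration and do not reproduce GEB's or CWatM's actual classes or methods.

```python
import numpy as np

class BaseHydrologyModel:
    """Toy stand-in for an off-the-shelf hydrological model (not CWatM's API)."""

    def __init__(self, n_hru):
        self.soil_moisture = np.full(n_hru, 0.20)   # m, per HRU
        self.field_capacity = np.full(n_hru, 0.30)  # m, per HRU

    def irrigation(self):
        """Hook: default model applies no irrigation."""
        return np.zeros_like(self.soil_moisture)

    def step(self):
        self.soil_moisture += self.irrigation()     # irrigation enters the soil
        self.soil_moisture *= 0.9                    # toy evapotranspiration loss


class Farmers:
    """Toy ABM: one HRU per farmer; only farmers with a well irrigate."""

    def __init__(self, has_well):
        self.has_well = np.asarray(has_well, dtype=bool)

    def decide_irrigation(self, hydrology):
        deficit = np.maximum(hydrology.field_capacity - hydrology.soil_moisture, 0.0)
        return np.where(self.has_well, deficit, 0.0)


class CoupledHydrologyModel(BaseHydrologyModel):
    """Subclass whose irrigation hook defers to the agent-based model."""

    def __init__(self, farmers):
        super().__init__(n_hru=farmers.has_well.size)
        self.farmers = farmers

    def irrigation(self):
        return self.farmers.decide_irrigation(self)


def run(model, n_days):
    for _ in range(n_days):   # one synchronized daily step for both components
        model.step()
    return model

model = run(CoupledHydrologyModel(Farmers([True, False])), n_days=1)
# HRU 0 is refilled to field capacity before losses (0.27 m); HRU 1 only dries (0.18 m).
```

Overriding a hook keeps the hydrological code path unchanged except at the points where agents act, which is one way to keep both models in lock-step without a separate message-passing layer.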

Model application
Here, we show the application of the model in the heavily managed Krishna basin in India. The main aim here is not to perform a fully realistic analysis of agent behaviour but rather to showcase the model by demonstrating its ability to simulate more than 10 million farmers and by showing how various artificial storylines influence model results for agents as well as river discharge.

The Krishna basin
With a size of roughly 8% of India's land area, the Krishna basin in India (Figure 7) is a complex socio-ecological system experiencing several sustainability and equity challenges, particularly related to water management. The basin is important for agricultural production while being exposed to floods, droughts, and dropping groundwater tables (Surinaidu et al., 2013). A large number of reservoirs with a total volume of approximately 42 billion m³ (~20% of annual rainfall) were built primarily for irrigation purposes. Farmers in a reservoir command area can access the reservoir water distributed through a system of canals.

Model setup
First, we delineated the study basin using the MERIT Hydro elevation map (Yamazaki et al., 2019), which was upscaled to 30'' using the Iterative Hydrography Upscaling method (Eilander et al., 2020a), by selecting all cells upstream of the Krishna river outlet. Other routing maps, such as river slope and width, were obtained similarly (Eilander et al., 2020b). Reservoir and lake footprints were obtained from the HydroLAKES dataset (Messager et al., 2016). If available, flood cushions and reservoir volumes were obtained from the Andhra Pradesh WRIMS4 database. If not available, flood cushions were assumed to be zero, while reservoir volumes were taken from the original HydroLAKES data. Reservoir command areas were obtained from the India Water Resources Information System (India-WRIS5) and subsequently linked manually to the previously obtained reservoirs using satellite imagery. Reservoir operator agents are assumed to release a maximum fraction of the current reservoir volume for irrigation, limited by the irrigation demand in the command area. Land use was obtained at 30 m resolution from GlobeLand30 (Jun et al., 2014), resampled to 1.5'', and mapped to CWatM land use types. Pixels classified as "water body" in GlobeLand30 and all cells with an upstream area of at least 100 km² were classified as "water covered area" in CWatM. All other input data were obtained from CWatM input maps at 5' resolution and downscaled to 30'' for CWatM input. The MODFLOW groundwater model is defined on an orthogonal grid at 1000 m resolution. Only one homogeneous unconfined aquifer layer is considered. One pumping well is set up in each MODFLOW cell to satisfy the water demand from farmers and other sectors.
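Two of the setup steps above, selecting all cells upstream of an outlet and flagging cells by upstream area, are standard operations on a drainage network. The sketch below assumes the flow-direction map has already been converted into a "downstream index" array (one entry per cell, -1 where water leaves the domain); this representation, the function names, and the toy threshold are illustrative assumptions rather than GEB's preprocessing code.

```python
import numpy as np

def upstream_cells(downstream, outlet):
    """Boolean mask of all cells draining to `outlet`, including the outlet.

    downstream[i] is the index of the cell that cell i drains into (-1 = none).
    """
    upstream_of = [[] for _ in range(downstream.size)]
    for i, d in enumerate(downstream):
        if d >= 0:
            upstream_of[d].append(i)
    mask = np.zeros(downstream.size, dtype=bool)
    stack = [outlet]
    while stack:                       # depth-first walk against the flow direction
        c = stack.pop()
        if not mask[c]:
            mask[c] = True
            stack.extend(upstream_of[c])
    return mask

def upstream_area(downstream, cell_area):
    """Upstream (contributing) area of every cell, for an acyclic drainage network."""
    indegree = np.zeros(downstream.size, dtype=int)
    for d in downstream:
        if d >= 0:
            indegree[d] += 1
    acc = cell_area.astype(float)      # astype returns a copy
    queue = [i for i in range(downstream.size) if indegree[i] == 0]  # headwaters
    while queue:                       # topological order: pass totals downstream once
        c = queue.pop()
        d = downstream[c]
        if d >= 0:
            acc[d] += acc[c]
            indegree[d] -= 1
            if indegree[d] == 0:
                queue.append(d)
    return acc

# Toy network: cells 0,1 -> 2 -> 4 (outlet) <- 3; cell 5 drains elsewhere.
downstream = np.array([2, 2, 4, 4, -1, -1])
cell_area_km2 = np.ones(6)
mask = upstream_cells(downstream, outlet=4)
acc = upstream_area(downstream, cell_area_km2)
water_covered = acc >= 3.0   # toy threshold; the text uses 100 km2 on real cells
```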
Water demand and consumption for the industrial, domestic, and livestock sectors are estimated using the approach developed by Wada et al. (2011) and are then downscaled to the HRUs by distributing the demands over cells with relevant land uses: grassland for livestock demands, and sealed area for industrial and domestic demands. The model was forced with GSWP3 (Dirmeyer et al., 2006), provided as part of ISIMIP3a.
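Distributing a grid cell's sector demand over the HRUs with the relevant land use can be sketched as an area-proportional split within each cell. The land-use codes and function name below are illustrative assumptions, and the sketch simply leaves demand unassigned in cells without any HRU of the target land use; how GEB handles that case is not described here.

```python
import numpy as np

# Illustrative land-use codes; not CWatM's actual numbering.
GRASSLAND, SEALED, CROPLAND = 0, 1, 2

def downscale_demand(cell_demand, hru_cell, hru_fraction, hru_land_use, target_land_use):
    """Split each grid cell's demand over its HRUs of one land-use type, by area.

    Demand in cells without any HRU of that type is not assigned in this sketch.
    """
    weight = np.where(hru_land_use == target_land_use, hru_fraction, 0.0)
    cell_weight = np.bincount(hru_cell, weights=weight, minlength=cell_demand.size)
    hru_cell_weight = cell_weight[hru_cell]
    share = np.divide(weight, hru_cell_weight, out=np.zeros_like(weight),
                      where=hru_cell_weight > 0)
    return share * cell_demand[hru_cell]

hru_cell = np.array([0, 0, 0, 1, 1, 1])
hru_fraction = np.array([0.2, 0.1, 0.7, 0.3, 0.1, 0.6])
hru_land_use = np.array([GRASSLAND, SEALED, CROPLAND, GRASSLAND, GRASSLAND, CROPLAND])
livestock_demand = np.array([100.0, 40.0])   # per grid cell, e.g. m3/day
livestock = downscale_demand(livestock_demand, hru_cell, hru_fraction,
                             hru_land_use, GRASSLAND)
# -> [100, 0, 0, 30, 10, 0]: all of cell 0 on its single grassland HRU; cell 1 split 3:1
```

Industrial and domestic demands would be handled the same way with `target_land_use=SEALED`.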

Model application
Here, we show the application of the model in the heavily managed Krishna basin in India, simulating the behaviour of more than 12 million farmers and the water system. With a size of roughly 8% of India's land area, the Krishna basin in India (Figure 6) is a complex socio-ecological system experiencing several sustainability and equity challenges, particularly related to water resources management. The basin is important for agricultural production while being exposed to floods, droughts, and dropping groundwater tables (Surinaidu et al., 2013). A large number of reservoirs with a total volume of approximately 42 billion m³ (~20% of annual rainfall) were built primarily for irrigation purposes. Farmers in a reservoir command area can access the reservoir water distributed through a system of canals. In addition, following the Indian Agriculture Census6,

Agents
The ABM has farmer agents (Section 3.1.1) and reservoir operator agents (Section 3.1.2), which can make autonomous decisions affecting the hydrological system as well as each other. Farmers and reservoir operators interact directly with CWatM.

Farmer initialization
First, an initial agent population needs to be generated with heterogeneous characteristics similar to those of the real population living in the basin. As with most agent-based models, and in particular large-scale models, we do not have specific information for every person. Therefore, we generate a synthetic population of farmers with statistically similar properties to the real population (e.g., income, irrigation type, household size). These statistics are based on available survey data combined with regional marginal statistics using the Iterative Proportional Fitting algorithm (IPF; Figure 7). For the implementation in the Krishna basin, the IPF algorithm reweights survey data from the Indian Human Development Survey (IHDS; Desai et al., 2005) such that the overall distribution of the adjusted survey data fits the marginal distributions of farm sizes and crop types. Next, the farmer population is randomly distributed within their subdistrict on farmland as specified in GlobeLand30 (Jun et al., 2014), at a 1.5'' resolution (i.e., 20 times higher resolution than the CWatM grid; <50 m at the equator). The smallest field size is thus approximately 0.25 ha.
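Iterative Proportional Fitting alternately rescales the survey weights so that the weighted totals match each set of marginals in turn. A minimal sketch of plain IPF, assuming each survey record carries one category index per attribute (e.g., farm-size class, crop type); it does not include the adjustment for farmers counted in multiple seasons described in Appendix B, and `ipf_weights` is a hypothetical name.

```python
import numpy as np


def ipf_weights(categories, marginals, initial_weights=None, max_iter=100, tol=1e-8):
    """Iterative proportional fitting of survey weights to marginal totals.

    categories: dict attribute -> int array with each record's category index
    marginals:  dict attribute -> target weighted total per category
    """
    n_records = len(next(iter(categories.values())))
    w = (np.ones(n_records) if initial_weights is None
         else np.asarray(initial_weights, dtype=float).copy())
    for _ in range(max_iter):
        max_change = 0.0
        for attr, cat in categories.items():
            target = np.asarray(marginals[attr], dtype=float)
            current = np.bincount(cat, weights=w, minlength=len(target))
            # Rescale records so this attribute's weighted totals match its target.
            factor = np.divide(target, current, out=np.ones_like(target), where=current > 0)
            w = w * factor[cat]
            max_change = max(max_change, float(np.abs(factor - 1.0).max()))
        if max_change < tol:  # all marginals were already matched in this sweep
            break
    return w


size_class = np.array([0, 0, 1, 1])
crop = np.array([0, 1, 0, 1])
w = ipf_weights({"size": size_class, "crop": crop},
                {"size": [60.0, 40.0], "crop": [70.0, 30.0]})
print(np.round(w, 6))  # [42. 18. 28. 12.]
```

The fitted weights can then be used to sample or replicate survey households when building the synthetic population.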

Cropping
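The generated farmer agents grow the following crops: bajra, groundnut, jowar, paddy rice, sugar cane, wheat, cotton, gram, maize, moong, ragi, sunflower and tur. Each crop has four growth stages ($L_1 \ldots L_4$), and $t$ denotes the number of days since the crop was planted. The crop factor ($K_c$) is then calculated using the following equation (Fischer et al., 2021):

$$
K_c =
\begin{cases}
K_{c1} & t < L_1 \\
K_{c1} + \dfrac{t - L_1}{L_2}\left(K_{c2} - K_{c1}\right) & L_1 \le t < L_1 + L_2 \\
K_{c2} & L_1 + L_2 \le t < L_1 + L_2 + L_3 \\
K_{c2} + \dfrac{t - (L_1 + L_2 + L_3)}{L_4}\left(K_{c3} - K_{c2}\right) & t \ge L_1 + L_2 + L_3
\end{cases}
\qquad (1)
$$

where $t$ is the number of days after the crop has been planted, $L_1 \ldots L_4$ are the lengths of the crop stages, and $K_{c1}$, $K_{c2}$ and $K_{c3}$ are the crop factors at the initial, mid-season and end of the growing period. At the harvest stage, actual yield ($Y_a$) is calculated using a reference yield ($Y_r$; Siebert and Döll, 2010), the water-stress reduction factor ($K_y$), and the ratio of actual ($ET_a$) to potential evapotranspiration ($ET_p$) over the entire growing period (Fischer et al., 2021):

$$
Y_a = Y_r \left(1 - K_y \left(1 - \frac{\sum ET_a}{\sum ET_p}\right)\right) \qquad (2)
$$

All crop-specific factors used in Equations 1 and 2 can be found in Appendix C.

7 https://agcensus.dacnet.nic.in/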
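Equations 1 and 2 translate directly into code. In the sketch below, the stage lengths, crop factors, yield-response factor and evapotranspiration totals are made-up illustrative numbers, not the values in Appendix C, and the function names are hypothetical.

```python
import numpy as np


def crop_factor(t, stage_lengths, kc1, kc2, kc3):
    """Crop factor K_c on day t after planting (Equation 1)."""
    l1, l2, l3, l4 = stage_lengths
    t = np.atleast_1d(np.asarray(t, dtype=float))
    conditions = [t < l1, t < l1 + l2, t < l1 + l2 + l3, t >= l1 + l2 + l3]
    choices = [
        np.full_like(t, kc1),                           # initial stage
        kc1 + (t - l1) / l2 * (kc2 - kc1),              # development: kc1 -> kc2
        np.full_like(t, kc2),                           # mid-season
        kc2 + (t - (l1 + l2 + l3)) / l4 * (kc3 - kc2),  # late season: kc2 -> kc3
    ]
    return np.select(conditions, choices)  # the first matching condition wins


def actual_yield(reference_yield, ky, et_actual, et_potential):
    """Actual yield from the ET ratio over the growing period (Equation 2)."""
    ratio = np.sum(et_actual) / np.sum(et_potential)
    return reference_yield * (1.0 - ky * (1.0 - ratio))


kc = crop_factor([0, 20, 40, 70, 80], (10, 20, 30, 20), 0.3, 1.2, 0.5)
ya = actual_yield(4.0, 1.25, et_actual=[30.0, 50.0], et_potential=[40.0, 60.0])
# kc is approximately [0.3, 0.75, 1.2, 0.85, 0.5]; ya is approximately 3.0
```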

Farmer income and expenses
After harvesting, farmers are assumed to sell their crops at the state-wise market price for that month. Historic monthly market prices are obtained from Agmarknet 8 for all crops except sugar cane. For sugar cane, which is delivered directly to sugar mills, farmers are assumed to receive the yearly indexed Fair and Remunerative Price (FRP), which is fixed by the government. Yearly cultivation costs (e.g., seeds, manure, labour, annual depreciation) per hectare per crop type are obtained from the Ministry of Agriculture and Farmers Welfare 9. Additional farmer income (e.g., from non-farm work) is obtained from the IHDS survey data. Similarly, living expenses are calculated from the daily consumption per capita in each household and the household size, both available from the IHDS survey. Finally, disposable income is calculated by subtracting expenses from income.
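A minimal sketch of the yearly disposable-income bookkeeping described above, assuming a single crop per year and daily non-farm income and consumption that are constant over the year. The `Farmer` fields and all numbers in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Farmer:
    area_ha: float
    yield_t_per_ha: float
    price_rs_per_t: float
    cultivation_cost_rs_per_ha: float
    non_farm_income_rs_per_day: float
    consumption_rs_per_capita_per_day: float
    household_size: int


def disposable_income(f: Farmer, days: int = 365) -> float:
    """Crop revenue minus cultivation costs, plus non-farm income, minus living expenses."""
    revenue = f.area_ha * f.yield_t_per_ha * f.price_rs_per_t
    cultivation_costs = f.area_ha * f.cultivation_cost_rs_per_ha
    other_income = f.non_farm_income_rs_per_day * days
    living_expenses = f.consumption_rs_per_capita_per_day * f.household_size * days
    return revenue - cultivation_costs + other_income - living_expenses


farmer = Farmer(area_ha=1.6, yield_t_per_ha=3.0, price_rs_per_t=20_000.0,
                cultivation_cost_rs_per_ha=25_000.0, non_farm_income_rs_per_day=100.0,
                consumption_rs_per_capita_per_day=50.0, household_size=5)
print(round(disposable_income(farmer)))  # 1250
```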

Irrigation
During the model run, when a farmer has irrigation equipment and is cultivating a crop, they irrigate the HRUs they own (Figure 8). Water is abstracted on a "first-come, first-served" basis, starting with upstream agents as determined by their elevation. As agents have no incentive to consider environmental flow conditions, these are not enforced. Farmer irrigation demand is determined by the difference between field capacity and soil moisture and is limited by the infiltration capacity. Depending on the equipment a farmer has for surface water, reservoir, and groundwater irrigation, irrigation demand is then satisfied (Eq. 3) first from surface water (Eq. 4), then from reservoirs (Eq. 5), and finally from groundwater (Eq. 6). Each source is limited by its current water availability: streamflow in the farmer's grid cell, the reservoirs that supply the command area of that grid cell, and groundwater in that grid cell. Farmers only have access to a water source if they have the relevant irrigation equipment. When a farmer irrigates, the water is subtracted from the relevant sources in CWatM and then applied to the relevant HRU.
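The abstraction order can be sketched as a loop over farmers sorted by elevation, each drawing from surface water, reservoirs and groundwater in turn. This is a simplified sketch under stated assumptions: reservoir water available to each grid cell is given as a per-cell amount (in the model it is the reservoirs supplying the cell's command area), irrigation efficiency and return flows are ignored, and `irrigate` and its arguments are hypothetical names.

```python
import numpy as np


def irrigate(elevation, area_m2, field_capacity, soil_moisture,
             infiltration_capacity, cell, equipment, available):
    """Satisfy each farmer's irrigation demand, upstream (higher) farmers first.

    equipment: bool array (n_farmers, 3): has [surface, reservoir, groundwater] equipment
    available: float array (n_cells, 3): water available per source in each cell (m3)
    Returns the abstraction per farmer and source (m3).
    """
    available = np.array(available, dtype=float)  # copy; reduced as water is taken
    # Demand depth (m): deficit to field capacity, limited by infiltration capacity.
    deficit = np.maximum(np.asarray(field_capacity) - np.asarray(soil_moisture), 0.0)
    demand = np.minimum(deficit, infiltration_capacity) * np.asarray(area_m2)
    abstraction = np.zeros((len(demand), 3))
    for i in np.argsort(-np.asarray(elevation)):
        remaining = demand[i]
        for source in range(3):  # surface water, then reservoir, then groundwater
            if equipment[i][source] and remaining > 0:
                take = min(remaining, available[cell[i], source])
                available[cell[i], source] -= take
                abstraction[i, source] = take
                remaining -= take
    return abstraction


abstraction = irrigate(
    elevation=np.array([10.0, 20.0]),
    area_m2=np.array([1000.0, 1000.0]),
    field_capacity=np.array([0.4, 0.4]),
    soil_moisture=np.array([0.3, 0.3]),
    infiltration_capacity=np.array([0.2, 0.2]),
    cell=np.array([0, 0]),
    equipment=np.array([[True, False, True], [True, False, False]]),
    available=np.array([[150.0, 0.0, 1000.0]]),
)
# The upstream farmer (index 1) takes 100 m3 of river water first; the downstream
# farmer gets the remaining 50 m3 and tops up 50 m3 from its well.
```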
The planting and harvesting dates depend on crop type and growing pattern. Once the farmer decides to harvest their crop, the respective HRU is set to "barren land" in CWatM. Then, as the farmer decides to plant a new crop, the land use type is changed accordingly in CWatM (e.g., to "irrigated").

Investing in irrigation wells
On the first day of each year, farmers can choose to invest in an irrigation well to improve their ability to irrigate their land.
Here, we use expected utility theory (e.g., Schrieks et al., 2021) to assess whether farmers make such an investment. Due to strong social network effects (e.g., Tripathi & Mishra, 2017), we assume that farmers assess the potential benefit of installing an irrigation well based on the profit of neighbouring farmers with an identical cropping pattern but with an existing irrigation well. More specifically, we first calculate each farmer's "profit ratio" [0-1], defined as the ratio of actual profit to potential profit given abundant water (i.e., actual evapotranspiration equals potential evapotranspiration). Each farmer without an irrigation well compares their profit ratio to that of 10 nearby farmers with an identical cropping pattern and an irrigation well. The benefit of installing an irrigation well is then calculated by multiplying the difference in profit ratio by the current crop price and reference yield. The cost of well installation is calculated as the cost of a loan for that amount, considering the loan duration in years, the current interest rate, and the yearly upkeep cost.
Finally, incremental profit (∆) is determined by subtracting costs from benefits. If ∆ is positive, the farmer invests in the irrigation well.
The interest rate is set to the lending interest rate in India for that year 10. Tube well installation and maintenance costs for 2008 are set at 146,000 Rs and 3,000 Rs/ha, respectively (Sharma et al., 2008), and corrected for inflation for other years.
Loan duration is set to 10 years, following the offerings of several major agricultural banks in India. Finally, it is determined whether current disposable income is sufficient to pay for the loan. All installed wells are assumed to be 30 m deep.
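The well-investment decision can be sketched in two parts: a neighbour-based benefit and a loan-based cost. Several details here are assumptions rather than the authors' implementation: the profit ratios of the 10 nearest peers are averaged, `reference_yield` is taken as the farm's total reference production, the yearly loan cost uses the standard annuity formula, the 2008 costs are used without inflation correction, and the 10% interest rate in the example is a placeholder. All function names are hypothetical.

```python
import numpy as np


def well_benefit(xy, profit_ratio, crop_pattern, has_well, price, reference_yield, k=10):
    """Expected yearly benefit of a well for each farmer that does not have one."""
    benefit = np.zeros(len(xy))
    for i in np.flatnonzero(~has_well):
        # Peers: farmers with the same cropping pattern that already own a well.
        peers = np.flatnonzero(has_well & (crop_pattern == crop_pattern[i]))
        if peers.size == 0:
            continue
        dist = np.linalg.norm(xy[peers] - xy[i], axis=1)
        nearest = peers[np.argsort(dist)[:k]]
        # Assumption: compare against the mean profit ratio of the k nearest peers.
        gain = profit_ratio[nearest].mean() - profit_ratio[i]
        benefit[i] = gain * price[i] * reference_yield[i]
    return benefit


def annual_loan_payment(principal, interest_rate, years):
    """Equal yearly instalment of an amortising loan (standard annuity formula)."""
    if interest_rate == 0:
        return principal / years
    return principal * interest_rate / (1 - (1 + interest_rate) ** -years)


def invests_in_well(benefit, area_ha, disposable_income, interest_rate,
                    install_cost=146_000.0, upkeep_per_ha=3_000.0, years=10):
    payment = annual_loan_payment(install_cost, interest_rate, years)
    incremental_profit = benefit - (payment + upkeep_per_ha * area_ha)
    # Invest only if profitable and the yearly instalment is affordable.
    return incremental_profit > 0 and disposable_income >= payment


xy = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]])
benefit = well_benefit(
    xy,
    profit_ratio=np.array([0.5, 0.9, 0.7]),
    crop_pattern=np.array([1, 1, 1]),
    has_well=np.array([False, True, True]),
    price=np.full(3, 60_000.0),
    reference_yield=np.full(3, 2.0),
    k=2,
)
# benefit[0] is approximately (0.8 - 0.5) * 60_000 * 2 = 36_000
print(invests_in_well(benefit[0], area_ha=1.6, disposable_income=50_000.0,
                      interest_rate=0.1))  # True
```

With a 10% rate over 10 years, the yearly instalment on 146,000 Rs is about 23,761 Rs, so in this example the well pays off only because the peers' profit ratios are considerably higher.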

Reservoir operators
The reservoir operator agents communicate daily with farmer agents in their reservoir command area and release the requested water for irrigation, capped at a maximum percentage of the current reservoir volume. As upstream farmers abstract water first, farmers at the tail end of command areas can have limited access to reservoir water. The amount of water released for other purposes (e.g., maintaining outflow, reducing water level) depends on the rating curve of the reservoir and the relevant flood cushions (Burek et al., 2020).
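A minimal sketch of the daily irrigation release for one command area, assuming all farmer requests for the day are known and the cap is a single fraction of the current volume; names and numbers are hypothetical. It also shows how tail-end farmers can be left short.

```python
import numpy as np


def reservoir_release(volume_m3, max_release_fraction, requests_m3, elevation):
    """Daily irrigation release for one command area, served upstream-first."""
    requests_m3 = np.asarray(requests_m3, dtype=float)
    # Release the requested water, capped at a fraction of the current volume.
    release = min(requests_m3.sum(), max_release_fraction * volume_m3)
    delivered = np.zeros_like(requests_m3)
    remaining = release
    for i in np.argsort(-np.asarray(elevation)):  # higher (upstream) farmers first
        delivered[i] = min(requests_m3[i], remaining)
        remaining -= delivered[i]
    return release, delivered


release, delivered = reservoir_release(1000.0, 0.1, [60.0, 60.0, 60.0], [30.0, 20.0, 10.0])
# release is 100 m3; delivered is [60, 40, 0], so the tail-end farmer receives nothing
```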

Creating farmer agents
As, to the best of our knowledge, no map of individual farms is available, we created a synthetic map of individual farms and corresponding agents in the area designated as cropland in GlobeLand30. The aim was not to create a fully accurate farm map but one that is statistically representative of the area. To do so, we obtained farm sizes at the district level from the Indian Agricultural Census 11. We then randomly generated farms (see example in Figure 4b) following the distribution of farm sizes, resulting in approximately 11 million individual farms and corresponding farmer agents. Individual farmer access to surface water irrigation was generated using a map of irrigated areas in 2010-2011 (Ambika et al., 2016). In addition, some irrigating farmers were assigned an irrigation well using the probability of having a well for each farm size class, as reported by the agricultural census at the district level 5.
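One simple way to generate contiguous synthetic farms that follow a farm-size distribution is seeded region growing on the cropland grid. This is an illustrative technique, not necessarily the procedure used for GEB: sizes are in 1.5'' pixels, `generate_farms` is a hypothetical name, and farms that run into non-cropland or other farms end up smaller than their sampled target size.

```python
from collections import deque

import numpy as np


def generate_farms(cropland, size_classes, size_probs, seed=0):
    """Partition cropland pixels into contiguous synthetic farms by region growing.

    cropland:     2-D bool array (True = cropland pixel)
    size_classes: possible farm sizes in pixels; size_probs: their probabilities
    Returns an int array with a farm id per pixel (-1 for non-cropland).
    """
    rng = np.random.default_rng(seed)
    farms = np.full(cropland.shape, -1, dtype=int)
    pixels = list(zip(*np.nonzero(cropland)))
    rng.shuffle(pixels)  # random farm seed locations
    farm_id = 0
    for start in pixels:
        if farms[start] != -1:
            continue
        target = rng.choice(size_classes, p=size_probs)
        farms[start] = farm_id
        queue, size = deque([start]), 1
        # Grow the farm into free neighbouring cropland until it reaches its target size.
        while queue and size < target:
            r, c = queue.popleft()
            for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
                nr, nc = r + dr, c + dc
                if (size < target and 0 <= nr < cropland.shape[0]
                        and 0 <= nc < cropland.shape[1]
                        and cropland[nr, nc] and farms[nr, nc] == -1):
                    farms[nr, nc] = farm_id
                    queue.append((nr, nc))
                    size += 1
        farm_id += 1
    return farms


cropland = np.ones((4, 4), dtype=bool)
cropland[0, 0] = False  # e.g., a built-up pixel
farms = generate_farms(cropland, size_classes=[2, 4], size_probs=[0.5, 0.5])
```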
Farmers are then assigned a crop and a planting and harvesting scheme (single, double, or triple cropping) based on their irrigation status and location using the MIRCA2000 dataset 12 (Siebert and Döll, 2010). Here, the MIRCA2000 crop areas are downscaled to the resolution of the farmer map, and the crop and cropping scheme are then randomly sampled based on the location of the farmer and the relative area of all crops within that grid cell.
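The sampling step can be sketched as a weighted random draw per farmer. The sketch assumes the downscaled crop areas are already available per grid cell as one category per crop and cropping scheme, and that the irrigated/rainfed split has been applied beforehand; `sample_crops` is a hypothetical name.

```python
import numpy as np


def sample_crops(farmer_cell, crop_area_per_cell, seed=0):
    """Sample a crop (category index) per farmer from the relative crop areas of its cell."""
    rng = np.random.default_rng(seed)
    crop_area_per_cell = np.asarray(crop_area_per_cell, dtype=float)
    crops = np.empty(len(farmer_cell), dtype=int)
    for i, cell in enumerate(farmer_cell):
        areas = crop_area_per_cell[cell]
        # Probability of each category is proportional to its area in the cell.
        crops[i] = rng.choice(len(areas), p=areas / areas.sum())
    return crops


crop_area = np.array([[0.0, 0.0, 5.0],
                      [1.0, 1.0, 0.0]])
crops = sample_crops(np.array([0, 0, 1, 1, 1]), crop_area)
# Farmers in cell 0 always get category 2; those in cell 1 get category 0 or 1.
```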

3.2 Calibration and validation
The model is calibrated against daily river discharge from the India Water Resources Information System (WRIS) at the Wadenepally station on the Krishna River, near the river outlet, roughly 60 km upstream of Vijayawada. Calibration is performed on several hydrological parameters (Burek et al., 2020), as well as the maximum amount of water released from a reservoir for irrigation on a given day, the normal reservoir outflow, and the irrigation return fraction, using the Kling-Gupta efficiency (KGE) as the objective function.
In the KGE, r is the correlation coefficient between monthly simulated and observed discharge. The genetic calibration algorithm first generates 60 parameter sets within a predefined range of plausible values (i.e., "the population"), and the model is subsequently run for each parameter set (i.e., "individual"). The 10 most optimal parameter sets are then combined (i.e., "mated") with a probability of 0.7 or altered (i.e., "mutated") with a probability of 0.3 to create 12 new parameter sets, for which the model is also run. This procedure is repeated for 10 iterations (i.e., "generations"), each time selecting the 10 most optimal parameter sets from all previous model runs, including the initial set. The most optimal parameter set is then re-run until the year 2019, and the KGE score is calculated for 2013-2019 for validation.
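A compact sketch of the calibration loop, using the KGE of Gupta et al. (2009) as the objective; whether this or a modified KGE variant is used is not stated in this section, so the formula is an assumption. The population size (60), number of parents (10), offspring per generation (12), crossover probability (0.7, hence mutation 0.3) and number of generations (10) follow the text, while the uniform crossover and Gaussian mutation operators are illustrative choices. In practice, `objective` would run the model with a parameter set and return the KGE of monthly discharge.

```python
import numpy as np


def kge(simulated, observed):
    """Kling-Gupta efficiency (Gupta et al., 2009); 1 is a perfect fit."""
    sim = np.asarray(simulated, dtype=float)
    obs = np.asarray(observed, dtype=float)
    r = np.corrcoef(sim, obs)[0, 1]   # correlation
    alpha = sim.std() / obs.std()     # variability ratio
    beta = sim.mean() / obs.mean()    # bias ratio
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def genetic_calibration(objective, lower, upper, pop_size=60, n_parents=10,
                        n_offspring=12, p_crossover=0.7, n_generations=10, seed=0):
    """Maximise `objective` over a box-constrained parameter space."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, dtype=float), np.asarray(upper, dtype=float)
    params = rng.uniform(lower, upper, size=(pop_size, len(lower)))
    scores = np.array([objective(p) for p in params])
    for _ in range(n_generations):
        # Parents: the best parameter sets from all runs so far, incl. the initial set.
        parents = params[np.argsort(scores)[-n_parents:]]
        children = []
        for _ in range(n_offspring):
            a, b = parents[rng.choice(n_parents, size=2, replace=False)]
            if rng.random() < p_crossover:
                # Uniform crossover: each parameter taken from either parent.
                child = np.where(rng.random(len(lower)) < 0.5, a, b)
            else:
                # Mutation: Gaussian perturbation of one parent, clipped to the bounds.
                child = np.clip(a + rng.normal(0.0, 0.1 * (upper - lower)), lower, upper)
            children.append(child)
        children = np.array(children)
        params = np.vstack([params, children])
        scores = np.concatenate([scores, [objective(c) for c in children]])
    best = np.argmax(scores)
    return params[best], scores[best]


# Toy objective standing in for a model run; the optimum is at (0.3, 0.3).
best, score = genetic_calibration(lambda p: -np.sum((p - 0.3) ** 2), lower=[0, 0], upper=[1, 1])
```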

Storylines
After calibration, we test how the model responds to exogenous and endogenous drivers of adaptation by running 3 × 2 simple storylines that combine them. Note again that these storylines are not meant as actual projections of reality but rather to showcase the model.
Exogenous drivers:
1. No adaptation: all farmers have an irrigation efficiency of 60%, meaning that 60% of irrigation water infiltrates the soil while 40% is return flow.
2. NGO adaptation: an NGO is assumed to disseminate knowledge about how to improve irrigation efficiency to 80% to 100,000 farmers. All farmers with high irrigation efficiency have a daily 1% probability of passing this knowledge on to another farmer within a 5 km range (i.e., to simulate a social network).
3. Government subsidies: the government provides subsidies to improve irrigation efficiency to 80% for 5% of farmers each year.
Endogenous drivers:
1. No crop switching: farmers keep the crops they initially grow.
2. Crop switching: each year, farmers with a field size of at least 0.5 hectares have a 30% probability of switching to sugar cane, a highly water-consumptive crop, if they were water-limited on fewer than half of the days during the previous growing season.
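The behavioural rules of these storylines are simple enough to sketch directly. The functions below are hypothetical and make a few assumptions not stated in the text: a knowledge-sharing farmer passes it to one random farmer within 5 km who does not yet have high efficiency, newly informed farmers only start sharing on the next day, subsidies go to farmers without high efficiency, and "water-limited on fewer than half of the days" is passed in as a precomputed fraction.

```python
import numpy as np


def diffuse_knowledge(xy_km, efficient, rng, p_share=0.01, radius_km=5.0):
    """One day of peer-to-peer spread of high irrigation efficiency (NGO storyline)."""
    updated = efficient.copy()
    for i in np.flatnonzero(efficient):  # only farmers informed before today share
        if rng.random() < p_share:
            dist = np.linalg.norm(xy_km - xy_km[i], axis=1)
            candidates = np.flatnonzero((dist <= radius_km) & ~updated)
            if candidates.size:
                updated[rng.choice(candidates)] = True
    return updated


def subsidise(efficient, rng, share=0.05):
    """Yearly subsidy: a fixed share of all farmers switches to high efficiency."""
    updated = efficient.copy()
    candidates = np.flatnonzero(~updated)
    n = min(round(share * len(efficient)), candidates.size)
    updated[rng.choice(candidates, size=n, replace=False)] = True
    return updated


def switch_to_sugarcane(field_ha, water_limited_fraction, crop, sugarcane, rng, p_switch=0.3):
    """Yearly crop-switching rule of the endogenous storyline."""
    eligible = (field_ha >= 0.5) & (water_limited_fraction < 0.5) & (crop != sugarcane)
    switch = eligible & (rng.random(len(crop)) < p_switch)
    return np.where(switch, sugarcane, crop)


rng = np.random.default_rng(0)
efficient = diffuse_knowledge(np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]]),
                              np.array([True, False, False]), rng, p_share=1.0)
# Only the farmer 1 km away can be reached; the one 10 km away is out of range.
```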
For all storylines, we use a spin-up period of 2012-2014 and show model output for 2014-2018. We should also note, however, that agent behaviour is rather simplistic and homogeneous in this conceptual model and should be made more realistic using empirical parameterization in future research. This requires data campaigns and stakeholder
states. The mean reservoir storage also decreases over time. However, inflation-adjusted profit decreases over time (panel H), as less water is available to farmers and crop market prices rise more slowly than general prices (e.g., an 83% increase in the average market price of the analysed crops in Maharashtra, compared with a 122% increase in general prices). Finally, panels I and J show how the number of irrigation wells and profit change for small and large farmers. Here, small farmers are the 50% of farmers with the smallest fields, and large farmers are the 50% with the largest fields. Smaller farmers have fewer irrigation wells at the beginning of the run (panel I) and are also less able to invest in irrigation wells due to their limited income compared to larger farmers (Section 3.1.1.4). This is also reflected in the profits: the relative increase in profits is approximately 2% higher for large farmers over the entire simulation period.
In another hypothetical scenario, "drip" (see Figure 11), the state of Maharashtra, one of the upstream states, provides subsidies that lead 20% of farmers to switch to drip irrigation each year, corresponding to a 90% irrigation efficiency (Brouwer and Heibloem, 1986). Here, yearly discharge at the basin outlet increases only slightly and only in some years (here: 2016), likely because most of the additional water is used downstream, given the large number of reservoirs and reservoir command areas downstream of Maharashtra. The mean combined reservoir storage is slightly higher in this scenario (panel G). Interestingly, the number of installed irrigation wells (panel C) also increases in Maharashtra, Karnataka and Telangana, likely because fewer wells fall dry. This means that the benefit of having an irrigation well increases (mostly in Maharashtra), and the means to invest in one also increase due to higher water availability and thus higher yields (mostly in Maharashtra and Karnataka).
In Karnataka and Telangana this even leads to a slight decrease in the water table.
While some adaptation options are considered here, additional adaptation options, such as crop switching and rainwater harvesting (Tamburino et al., 2020), could be considered at a later stage. Moreover, the model could be made more realistic by including factors such as threat appraisal (e.g., perception of drought risk), the coping appraisal of individual farmers (e.g., knowledge, information, and financial resources; Wens et al., 2020; Schrieks et al., 2021; Streefkerk et al., 2023), an extension of farmer networks (Wens et al., 2020), and collectives (Shah and Bhattacharya, 1993). In addition, model parameterization could benefit from the automatic delineation of small-holder fields using machine learning (Waldner and Diakogiannis, 2020) and the recognition of crop types at the field scale (Gumma et al., 2020) when such datasets become freely available in the future.

Storylines
Finally, Figure 10 shows the potential of the model to simulate the storylines (see the Storylines section of the methods). Panels D and E show the percentage of farmers that have adopted high irrigation efficiency and sugar cane, respectively. In the storylines without irrigation adaptation, all farmers remain at low irrigation efficiency, while adoption increases in the other storylines. A slightly higher percentage of farmers adopt sugar cane in the storyline with NGO adaptation (blue line) than in the storyline with government subsidies (orange line), because high irrigation efficiency is adopted faster in the NGO adaptation storyline. This increases water availability, so more farmers switch to sugar cane.
The effect of these storylines is also clearly visible in panel B, which shows the mean hydraulic head across the entire region.
A lower hydraulic head indicates a lower groundwater table. The most sustainable storyline, with the smallest drop in the groundwater table, is the one in which farmers do not switch crops but do increase their irrigation efficiency. In contrast, the storyline with both low irrigation efficiency and crop switching is the least sustainable. Similar, albeit less pronounced, effects can be observed in the discharge (panel A) and reservoir storage (panel C). Especially at the end of the dry season (see insets in panel C), reservoir storage is lower in the less sustainable storylines and vice versa. Similarly, the reservoirs fill up more quickly in the sustainable storylines. Although discharge differs little between the storylines, the insets of panel A show that discharge is slightly higher at the end of the dry seasons in the more sustainable storylines. Likewise, discharge peaks during the wet season are higher in the more sustainable storylines. If identical reservoir operation rules are used in all storylines, these higher peaks could lead to a slightly higher flood risk, although this could be averted by increasing the size of the flood cushions or by releasing more water downstream when the reservoir is nearly full. Future research is still required to further simulate the behaviour of farmers at a large scale. Future studies can also further investigate how governments or other organizations can affect the human-natural system.
Using the model, we quantitatively show how farmer behaviour and the hydrological system are intricately interwoven across both small and large scales. Changes in behaviour or investment in irrigation measures affect hydrology and other farmers locally, but also affect river discharge and farmers further downstream. These effects are visible in hydrological variables as well as in farmer behaviour and profit. Using a scenario in which drip irrigation is promoted in an upstream state, we show how the effects of policies can be assessed on local and large-scale processes across both the hydrological and human domains. This provides opportunities to study large- and small-scale socio-hydrological processes simultaneously in large river basins worldwide.
The agent-based model can also be coupled to other hydrological models, provided that they support the simulation of field-scale hydrological processes. Alternatively, at the cost of no longer fully capturing the heterogeneity of human processes, the ABM can be adapted to simulate aggregated agents within a grid cell, facilitating coupling to hydrological models that do not support field-scale simulation. Similarly, the adapted version of CWatM, which now simulates field-scale hydrological processes, can be coupled to other ABMs.
Future studies could include additional adaptation measures to make the model more realistic and further investigate how policies and infrastructural projects, such as reservoir construction and management, water/electricity pricing (Parween et al., 2021), water rights (Nouri et al., 2019a), or the enforcement of specific crop types (Wallach, 1984), can affect the human-natural system. As humans play a key role in the environment, the human component of GEB can also be central in coupling further models, for example economic models or land use change models (Dou et al., 2020), allowing researchers to investigate the land-water-food nexus. In addition, coupling GEB to a hydrodynamic model such as DIM (Farrag et al., 2021) would allow researchers to investigate the interactions between human behaviour and flood and drought risk (Ward et al., 2020). Finally, future scenarios such as climate change, population growth, and exogenous land use change can be integrated to project how the coupled human-natural system may change in the future.

Competing interests
The authors declare that they have no conflict of interest.

Figure 2 :
Figure 2: A schematic overview of GEB.


The land use classes in CWatM are then updated accordingly. Finally, the next timestep is initiated, starting with meteorological forcing, as described above.

Figure 3 :
Figure 3: Optional model interface. The model can be advanced one timestep at a time, and all model variables can be shown on a map. Here, the land use type is shown. The red dots represent farmer agents. Optional line charts can be added to show variables such as discharge and mean groundwater level over time. For visualization purposes only, a small subbasin northwest of Pune (Maharashtra) is shown here. Land use map derived from Jun et al. (2014).

Figure 4 :
Figure 4: In this figure, 3×3 grid cells are shown, delineated by horizontal and vertical black lines (30'' resolution). Panel a displays various land use types at a 20 times higher sub-grid resolution, panel b shows the crop farms owned by agents, and panel c shows the resulting HRUs. Each contiguous area of one colour in panel b represents a farm, while each contiguous area of one colour in panel c represents an HRU. One exception is non-crop land of the same land use type within a grid cell, which forms a single HRU (e.g., all rivers within a grid cell are one HRU). Crop farms owned by one farmer that cross grid cell boundaries are represented by multiple HRUs; see, for example, the crop farm in the centre of the red circle. Land use map derived from Jun et al. (2014).

Figure 5 :
Figure 5: Schematic overview of the implementation of farm-level HRUs. Here, a grid cell consists of four HRUs (see coloured 'land use types'): one water-covered area, two crop farms, and one grass-covered area. Runoff is determined per HRU and then aggregated, weighted by the relative size of the HRUs, to compute runoff for the entire grid cell.

Coupling to another hydrological or agent-based model can be achieved by adapting the model class. It is, however, required that the hydrological model can work with farm-level HRUs. In essence, the model class (1) loads the model configuration file (see below), (2) loads a shared data superclass, (3) initializes the ABM, (4) initializes the hydrological model, and (5) iteratively runs a timestep of both models.
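The five steps of the model class can be sketched as a small coupling class. Everything here is hypothetical: `SharedData`, `ToyABM`, `ToyHydrology` and `CoupledModel` are placeholders rather than GEB's actual classes, the configuration is passed as a JSON string to keep the example self-contained, and the order of the two sub-model steps within a timestep is an assumption.

```python
import json


class SharedData:
    """Hypothetical shared-data superclass: state visible to both sub-models."""

    def __init__(self):
        self.log = []  # stands in for shared arrays such as soil moisture or abstraction


class ToyABM:
    def __init__(self, model):
        self.model = model

    def step(self, t):
        # e.g., agents decide on irrigation and write abstractions to shared state
        self.model.log.append(("abm", t))


class ToyHydrology:
    def __init__(self, model):
        self.model = model

    def step(self, t):
        # e.g., the hydrological model applies abstractions and updates storages
        self.model.log.append(("hydrology", t))


class CoupledModel(SharedData):
    def __init__(self, config_text):
        self.config = json.loads(config_text)  # (1) load the model configuration
        super().__init__()                     # (2) initialize the shared data superclass
        self.abm = ToyABM(self)                # (3) initialize the ABM
        self.hydrology = ToyHydrology(self)    # (4) initialize the hydrological model

    def run(self):
        for t in range(self.config["n_timesteps"]):  # (5) step both models in turn
            self.abm.step(t)
            self.hydrology.step(t)


model = CoupledModel('{"n_timesteps": 2}')
model.run()
print(model.log)  # [('abm', 0), ('hydrology', 0), ('abm', 1), ('hydrology', 1)]
```

Because both sub-models only talk to each other through the shared state, swapping in another hydrological model or ABM in this sketch means replacing one class, which mirrors the coupling flexibility described above.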

Figure 6 :
Figure 6: A simplified code example of the conversion between hydrological response units (HRUs) and the grid.
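A minimal sketch of the two conversions, assuming each HRU stores the index of its grid cell and its area, and that every grid cell contains at least one HRU: grid-to-HRU conversion gives each HRU the value of its cell, while HRU-to-grid conversion takes an area-weighted mean (as for runoff in Figure 5). The function names are hypothetical, and this is not the code shown in the original figure.

```python
import numpy as np


def to_grid(hru_values, hru_cell, hru_area, n_cells):
    """Area-weighted mean of HRU values per grid cell (e.g., runoff depth)."""
    weighted = np.bincount(hru_cell, weights=hru_values * hru_area, minlength=n_cells)
    cell_area = np.bincount(hru_cell, weights=hru_area, minlength=n_cells)
    return weighted / cell_area  # assumes every cell contains at least one HRU


def to_hru(grid_values, hru_cell):
    """Give every HRU the value of the grid cell it lies in (e.g., precipitation)."""
    return np.asarray(grid_values)[hru_cell]


hru_cell = np.array([0, 0, 1])
hru_area = np.array([1.0, 3.0, 2.0])
print(to_grid(np.array([4.0, 0.0, 5.0]), hru_cell, hru_area, n_cells=2))  # [1. 5.]
print(to_hru(np.array([10.0, 20.0]), hru_cell))  # [10. 10. 20.]
```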

Figure 7 :
Figure 7: Outline of the Krishna Basin in India.© OpenStreetMap contributors 2022.Distributed under the Open Data Commons Open Database License (ODbL) v1.0.

Figure 6 :
Figure 6: Outline of the Krishna Basin in India.© OpenStreetMap contributors 2022.Distributed under the Open Data Commons Open Database License (ODbL) v1.0.
at subdistrict level based on the Indian Agriculture Census 7. Here, we consider all crops that are grown by at least 2% of the farmer population. Because farmers growing multiple crop types throughout the year are counted multiple times in the census, an adjusted version of IPF is used (Appendix B). In this manner, a heterogeneous population of 12.2 million farmers is generated with the following characteristics: household size, crop type in the kharif, rabi and summer seasons, irrigation type, daily non-farm income, and daily consumption per capita.

Figure 9
Figure 9 shows observed versus simulated discharge (m³/s) for the calibrated model. The KGE during the calibration period is 0.849810, while the KGE during the test period is 0.709834 (1 is optimal), showing good model performance during both periods.

Figure 9 :
Figure 9: Observed versus simulated discharge for the calibrated model.

Figure 10
Figure 10 shows irrigation from channels, reservoirs, and groundwater for all agents in the Krishna basin. While the data is difficult to show for individual farmers at this scale, the insets show the detailed heterogeneous irrigation quantities at the field scale for a small portion of the basin. On the larger scale, it is clearly visible that farmers along rivers and within reservoir command areas have better access to water for irrigation. Differences in irrigation quantities from the various sources are explained by the location of farmers (and hence their access to the various irrigation sources), crop types, irrigation equipment, etc. For example, farmers without an irrigation well cannot access groundwater and thus irrigate less, while farmers growing sugarcane are expected to irrigate more than other farmers.

Figure 11 :
Figure 11: Yearly simulation results for the baseline scenario and the scenario with investment in drip irrigation. In Figure 11, we show how several variables change over time for specific scenarios, with the aim of showing how the model behaves. In the "baseline" scenario, farmers with access to a reliable irrigation well have a 20% probability of switching to sugarcane, a crop that generally ensures a higher income but also uses a lot of water. Here, we define a reliable irrigation well as a well that has not fallen dry in the last 3 years. The resulting mean annual discharge is shown in panel A, while panel B shows that the number of farmers growing sugarcane in each state gradually increases over time. Similarly, the number of wells increases over time (panel C) due to investment in irrigation wells (see Section 3.1.1.4). Panels D and E show the amount of irrigation from groundwater and the total irrigation from all sources. Because the increasing number of irrigation wells, as

Figure 8 :
Figure 8: Observed versus simulated discharge for the calibrated model.
The model for the entire Krishna basin can be run on an above-average laptop. Model run time was ~10 seconds per daily timestep (i.e., ~1 hour for one year) using a single core of an AMD EPYC 7302, requiring no more than 7 GB of RAM and an 8 GB GTX 1070 GPU. Without a GPU, the run time was ~30 seconds per timestep, requiring 12 GB of RAM. Model run time and memory requirements scale near-linearly with basin size, assuming identical farm sizes: larger farms reduce the requirements, while smaller farms increase them. Here, the average farm size is 1.6 ha.

Figure 10 :
Figure 10: This figure shows several model outputs for six storylines.Panel A shows daily discharge, panel B shows the mean hydraulic head, panel C shows total reservoir storage, panel D shows the percentage of farmers with a high irrigation efficiency, and panel E shows the percentage of farmers that adopted sugar cane.Note that the y-axes for panels D and E are different.