Modelling boreholes in complex heterogeneous aquifers

Reliable estimates of the sustainable yield of supply boreholes are critical to ensure that groundwater resources are managed sustainably. Sustainable yields are dependent on the pumped groundwater level in a borehole, how this relates to vertical aquifer heterogeneity, and features of the borehole itself. This paper presents a 3D radial flow model (SPIDERR), based on the Darcy-Forchheimer equation, for simulating the groundwater level response in supply boreholes in unconfined, heterogeneous aquifers. The model provides a tool for investigating the causes of non-linear behaviour in abstraction boreholes, which can have a significant impact on sustainable yields. This is demonstrated by simulating a variable-rate pumping test in a Chalk abstraction borehole. The application suggests the non-linear response to pumping is due to a combination of factors: a reduction in well storage with depth due to changes in the borehole diameter, a reduction in hydraulic conductivity with depth, and non-Darcian flow.


Introduction
Groundwater is estimated to provide drinking water to more than 50% of the global population, with 2.5 billion people solely dependent on groundwater resources for their basic daily water needs. Groundwater also accounts for 43% of irrigation water worldwide, playing a crucial role in maintaining the livelihoods of some of the world's most vulnerable people and contributing to food security for many more (WWAP, 2015). However, it is estimated that 20% of the world's aquifers are over-exploited (Gleeson et al., 2012), a situation which is likely to escalate due to an increasing demand for water for domestic, agricultural, and industrial uses in the context of global population growth and climate change. To develop and use groundwater resources sustainably, aquifers (particularly those that are strategically important or vulnerable to over-abstraction and/or large seasonal groundwater level fluctuations) must be managed carefully within the context of the wider environment. There is a need to assess both the short-and long-term availability of groundwater to ensure an economically, socially, and environmentally acceptable balance is maintained between supply and demand without risk to the long-term supply. Over short, seasonal to annual, timescales this requires an understanding of the sustainable yield of individual abstraction boreholes, how yields change as groundwater levels decline seasonally (particularly during droughts), and the impact of abstraction on the surrounding environment, e.g. river flows and groundwater-dependent ecosystems. Over longer timescales, it is important to understand groundwater abstraction within the context of long-term average recharge to prevent continuous declines in groundwater levels, and consequent long-term reductions in aquifer and borehole yields.
The sustainable yield of a borehole reflects an interplay between regional-and local-scale processes. At the regional-scale, yields are largely influenced by the spatial and temporal distribution of recharge, groundwater storage, and aquifer transmissivity. At the local-scale, yields are critically dependent on the pumped water level in the borehole and how this relates to vertical aquifer heterogeneity, the nature and distribution of inflow horizons to the borehole, and features of the borehole itself, such as borehole storage, the depth of the pump and its pumping characteristics (Foster et al., 2017). In unconfined vertically heterogeneous aquifers, particularly fractured aquifers, which are estimated to cover more than 20% of the earth's surface (Krasny and Sharp, 2007), the groundwater level response in a pumped borehole can be highly non-linear. Dewatering of major fractures or inflow horizons can result in significant reductions to the yield of a borehole as the groundwater level in the borehole falls in response to pumping. Understanding this response and being able to accurately model it is, therefore, crucial for determining the sustainable yield of a borehole, particularly during droughts, and predicting how yields might change in response to future reductions in groundwater recharge (Doll, 2009)  The groundwater level response in a pumped borehole is typically tested by constant-or variable-rate pumping tests, carried out over periods of hours to days. Numerous analytical and numerical methods have been developed to analyse these tests under different hydrogeological conditions. Analytical solutions exist for solving for the groundwater level in a pumped borehole in confined (Theis, 1935) and unconfined (Neuman, 1972) aquifers with large diameter abstraction boreholes (Papadopulos and Cooper, 1967;Mathias and Butler, 2006) and non-linear well losses (Jacob, 1947), and in non-uniform or fractured aquifers (Barker and Herbert, 1982;Barker, 1988;Butler, 1988;Butler and Liu, 1993). These methods are generally based on a number of simplifying assumptions that may not be satisfied in hydrogeologically complex aquifers, which can be more accurately represented by numerical models. The most commonly applied numerical method for simulating flow to an abstraction borehole is based on a finite difference approximation to the governing flow equation in cylindrical coordinates, whereby specific discharge is calculated using Darcy's law. This method was applied in the earliest radial flow models (Rushton, 1974;Rushton and Booth, 1976;Rushton and Redshaw, 1979;Rathod & Rushton, 1984, 1991Sakthivadivel and Rushton, 1989) and in more recent developments such as COOMPuTe (Mansour et al., 2003) and MODFLOW (Samani et al., 2004;Langevin, 2008;Louwyck et al., 2012Louwyck et al., , 2014. The application of these models has highlighted the importance of representing several features in and around a borehole to accurately simulate the pumped water level in the borehole: confined and unconfined conditions, and particularly variations in the rest water level where hydraulic conductivity varies with depth (Rushton and Chan, 1976); aquifer heterogeneity in both the vertical and horizontal dimension (Connorton and Reed, 1978;Mansour et al., 2006Mansour et al., , 2011; development of vertical head gradients and vertical flow (Rushton and Howard, 1982;Rathod and Rushton, 1991); well storage, casing and screening; and seepage face development (Rushton and Singh, 1987;Sakthivadivel and Rushton, 1989;Rushton, 2006). A common observation during variable-rate pumping tests, which are designed to test the efficiency of a borehole, is the effect of well losses, or a non-linear increase in drawdown with the rate of abstraction. Well losses can be attributed to the effect of flow through gravel packs or well screens, the development of a seepage face, or to the development of non-Darcian flow in the aquifer itself. The radial flow models described above typically account for well losses by lowering the hydraulic conductivity in the immediate vicinity of the borehole, but assume that flow in the aquifer remains linear and can therefore be described by Darcy's law.
The significance of non-Darcian groundwater flow is well documented in the literature (Sen, 1987(Sen, , 1989(Sen, , 1990(Sen, , 2000Kohl et al., 1997;Wen et al., 2006;Qian et al., 2007). This is particularly the case where high flow velocities are likely to develop around abstraction boreholes in fractured aquifers. Non-Darcian groundwater flow is most commonly represented by Forchheimer's law, which includes a quadratic term to account for the observed non-linear relationship between head gradient and specific discharge as flow velocities increase (Forchheimer, 1901). Several analytical models based on Forchheimer's law are available in the literature (Şen, 1988;Ewing et al., 1999;Mathias et al., 2008;Mathias and Wen, 2015;Liu et al., 2017) and it has been applied using finite difference and mesh-free methods to simulate drawdown in 1D confined and leaky aquifers (Mathias and Todman, 2010;Wen et al., 2011Wen et al., , 2014. There are, however, no examples in the literature of numerical groundwater models that incorporate Forchheimer (non-Darcian) flow, aquifer heterogeneity, and the detailed features of an abstraction borehole, which have all previously been shown to influence drawdown in pumped boreholes. This represents a gap in our analytical tools to simulate and assess sustainable borehole yields in unconfined aquifers. This paper presents a 3D radial flow model -SPIDERR -based on the Darcy-Forchheimer equation, for simulating the pumped water level in an abstraction borehole. The model can represent confined and unconfined aquifers, vertical and horizontal aquifer heterogeneity, Darcian and non-Darcian flow, and key features of a borehole such as changes in diameter with depth, well casing, and screening. This is crucial if the simulation of sustainable borehole yields, which can reduce as groundwater levels fall in response to pumping in complex water table aquifers, is to be improved. As outlined above, a reliable estimation of borehole yields is key to managing groundwater resources in a safe and sustainable way. In the UK, this is also a statutory requirement for water companies, who must provide estimates of the sustainable yield of supply boreholes during droughts as part of the water resource management process. The model presented here integrates components of the work described above, explicitly representing all of the pertinent hydrogeological and borehole construction features that may be required to simulate the water level response in an abstraction borehole during pumping. The model is evaluated against an analytical solution for non-linear drawdown in a pumped borehole (Cooper and Jacob, 1946) and by reproducing two relatively simple step-drawdown tests from confined sandstone aquifers, as previously presented by Mathias and Todman (2010). The model is then applied to simulate a more complex groundwater level response from a pumping test in an abstraction borehole in the Chalk aquifer. This is the principal aquifer in the UK where borehole yields are often significantly affected by pumped water levels and vertical heterogeneity related to fracture development (Butler et al., 2009). The application to simulate a variable-rate pumping test in a Chalk borehole demonstrates that the model can be used as a tool to investigate the causes of non-linearity in the groundwater level response to pumping.
The ability of SPIDERR to simulate the short-term response in an abstraction borehole is a key step in evaluating the long-term sustainable yield of a borehole. The model has been developed as part of a wider study, which has produced a methodology to insert this boreholescale model into a regional groundwater flow model. This coupled representation of local and regional groundwater processes is critical for estimating the long-term sustainable yield of a borehole (Upton, 2014). We cite this wider study, which will be presented in a future paper, as it places the development of this model in context. The focus of this paper, however, is on the development of the radial flow model and its application to simulate drawdown during variable-rate pumping tests, which allows us to investigate the key factors influencing drawdown in complex heterogeneous aquifers.

Conceptualisation
The model is constructed on a three-dimensional cylindrical grid as shown in Fig. 1. This allows aquifer heterogeneity to be represented along the radial (r), cylindrical (Θ) and vertical (z) axes. Grid nodes are spaced logarithmically in the radial direction to provide refinement in the vicinity of the borehole where the hydraulic head, or in the case of an unconfined aquifer, the water table has greatest curvature.
The borehole is located at the centre of the radial grid and is represented by a single node with a specified radius. The borehole node can extend to the base of any model layer allowing the representation of fully or partially penetrating boreholes. The borehole can be connected to the first adjacent aquifer node, located on the well face, with a very high conductance term to represent an open borehole, a specified conductance to represent a well screen, or it can be completely disconnected from the aquifer node to represent the presence of a well casing. A gravel pack can be represented by modifying the hydraulic conductivity of the aquifer nodes around the borehole. The borehole node, to which pumping is applied, has a storage coefficient equal to unity to represent well storage. The depth of the pump is also specified such that no water can continue to be abstracted if the water level in the borehole falls below this depth.
The vertical dimension of the model is represented as a series of horizontal layers of varying thickness. Nodes transition between confined, unconfined, and inactive depending on the elevation of the groundwater head relative to the top and bottom of the layer (Fig. 1). Cell dewatering and rewetting often causes numerical instabilities in groundwater models, particularly when modelling highly non-linear systems. An upstream weighting approach, similar to that applied in MODFLOW-NWT (Niswonger et al., 2011), has been implemented in the radial flow model to help smooth the transition when cells dewater or rewet. In a single direction the flux between two cells is, therefore, based on the weighted harmonic mean of the hydraulic conductivity of the two cells and the up-gradient saturated thickness in that direction. This means that water cannot flow horizontally out of a dewatered cell, but it can still receive water from an up-gradient cell. All water entering a dewatered cell, either from an adjacent cell or from recharge, is automatically routed vertically to the active node below. This helps to smooth the transition as cells dewater and rewet thus reducing numerical instabilities.
Vertical flow from the water table and development of a seepage face are common features around abstraction boreholes in unconfined aquifers (Rushton and Howard, 1982;Rushton and Singh, 1987;Sakthivadivel and Rushton, 1989;Rathod and Rushton, 1991;Rushton, 2006). Flow across a seepage face is poorly quantified, particularly in fractured aquifers. It occurs across the interval between the water level in the borehole and the elevation at which the water table intersects the borehole wall. In the radial flow model, the groundwater head in the upper unconfined layer does not explicitly represent the elevation of the water table but an average of the head within that layer. In a singlelayer model, the use of the Dupuit approximation for unconfined flow assumes that all flow into the borehole is horizontal and occurs below the water level in the borehole (Fig. 2a). Vertical discretization allows the vertical component of flow to be incorporated (Fig. 2b) and a seepage face is approximated by allowing a layer to discharge to the borehole even after the water level in the borehole has fallen below the base of the layer. Under these conditions, the head at the node on the borehole wall, which represents the seepage face, is fixed at the elevation of the base of the layer and all flow into this node from the adjacent aquifer node is passed directly into the borehole. When this aquifer node also dewaters the layer becomes horizontally disconnected from the borehole and seepage flow stops. This representation is slightly different from the COOMPuTe radial flow model, which approximates seepage flow by representing the water table as a discrete numerical layer (Mansour et al., 2011). While the COOMPuTe model provides an approximate solution of the water table elevation, vertical discretization is still required to represent seepage flow and there are known issues during recovery whereby the water level in the borehole rises more quickly than the water table surface in adjacent aquifer nodes resulting in flow from the borehole into the aquifer.

Mathematical formulation
The radial flow model is based on the continuity equation for transient groundwater flow through a porous medium in cylindrical coordinates: where q is specific discharge [LT −1 ] in the radial (r), cylindrical (θ), and vertical (   yield [-] of the layer. Note that b = z t -z b for confined layers and b = h -z b for unconfined layers. Specific discharge in the radial and cylindrical dimensions is calculated by the Darcy-Forchheimer equation: where β is the non-linear Forchheimer parameter [L −1 T] and K r,θ is hydraulic conductivity [LT −1 ] in the radial and cylindrical directions. This reduces to Darcy's Law when β = 0. The numerical model is based a finite difference approximation to the continuity equation, which reduces to the following set of ordinary differential equations with respect to time, t: , , , max( , , , 1, , ) , , , , max( , , , 1, , ) This allows groundwater head to be approximated at each node in the radial (i), cylindrical (j) and vertical (k) directions, whereby r i is the radial distance from the borehole to the node. Mathias et al. (2008) show that the Darcy-Forchheimer equation can be solved for specific discharge to give (in the radial direction): Vertical flow across layer boundaries is calculated by: Where K z is the vertical hydraulic conductivity of a layer and Δz is the thickness of the layer. The vertical and horizontal hydraulic conductivities, Forchheimer parameter, specific storage and specific yield can be set at different values for each individual layer in the model to represent vertical heterogeneity in the aquifer. Horizontal heterogeneity can also be incorporated by specifying different conductivities and storage values at each node within a layer. If this is the case the harmonic mean is used to calculate the conductivity between adjacent nodes. The Darcy-Forchheimer radial flow model is coded using the MATLAB software package (Mathworks, 2012). The set of non-linear ordinary differential equations is solved using the stiff differential equation solver ode15s (Shampine and Reichlet, 1997). This is an implicit, variable-order solver, which uses adaptive time-stepping based on the error in computed head at each iteration.

Validation & testing
The model was evaluated against various analytical solutions that describe drawdown in simplified aquifers and was found to accurately reproduce the response in large diameter boreholes in confined and unconfined aquifers (Papadopulos and Cooper, 1967;Neuman, 1972). The model was also evaluated against a late-time approximation to the Forchheimer equation using the Cooper-Jacob solution (Cooper and Jacob, 1946) and the method of matched asymptotic expansion (Mathias et al., 2008). This provides a benchmark for testing the nonlinear behaviour of the SPIDERR model. A series of single layer, confined, and laterally extensive (10 km radius) models were set up with a hydraulic conductivity of 50 m day −1 , aquifer thickness of 10 m, specific storage of 10 −6 m −1 and abstraction rate of 1000 m 3 day −1 . Eight non-linear models were run with the Forchheimer parameter increasing from 0.05 to 1.5 m -1 day; these models were compared with an equivalent linear model in which the Forchheimer parameter was set to zero ( Table 1). The transmissivity and storage, thus the linear component of drawdown, were constant in all models and the variations in drawdown between the linear and non-linear models is therefore due only to variations in the Forchheimer parameter. The proportion of the total late-time drawdown attributable to the non-linear component of drawdown is shown in Table 1 for each test simulation.
The results of the linear and non-linear simulations were compared with the analytical model whereby the Cooper-Jacob non-linear parameter is calculated from the Forchheimer parameter according to the relationship presented by Mathias and Todman (2010) (see Table 1). The numerical model does not match the early-time data of the analytical model because this does not account for well storage, but there is agreement between the models at late times in all simulations (Fig. 3a).
Comparison of the cones of depression at the end of the simulation (1 day) shows that the influence of non-linearity decreases away from the borehole. The test simulations presented in Fig. 3 show that the relationship between the Forchheimer parameter and the distance from the borehole at which the influence of non-linear flow is no longer observed (defined as the distance at which the difference between the final cones of depression for the linear and non-linear models is < 1 cm and therefore largely unmeasurable) is approximately linear (Fig. 3b). The difference between the final cone of depression for the linear model and non-linear test 1, which has a Forchheimer parameter of 0.05 m -1 day, is negligible (< 1 cm) at a distance of 25 m from the borehole (Fig. 3c); for the final non-linear test, which has a Forchheimer parameter of 1.5 m -1 day, the cone of depression converges to the linear model at a distance of around 700 m from the borehole (Fig. 3d). Table 1 shows that the non-linear component of drawdown in test 8 accounts for more than 95% of the total drawdown, which is unlikely to be a realistic estimation. In the case studies presented by Mathias and Todman (2010), which simulate step-drawdown tests in four different aquifer types using a 1D Darcy-Forchheimer radial flow model, the non-linear component of drawdown accounts for between 1 and 60% of the total drawdown. For equivalent components of nonlinear losses the simulations presented above suggest that the impact of non-Darcian flow will be fairly localised around the abstraction borehole, with the biggest impact on drawdown observed within 10 m of the borehole. This has implications for the required location of an observation borehole to verify the non-linear component of drawdown, which would need to be located close to the abstraction borehole to be within the cone of influence of non-Darcian flow.
SPIDERR is further evaluated by reproducing two relatively simple step drawdown tests previously presented by Clark (1977) and van Tonder et al. (2001) and modelled by Mathias and Todman (2010) using a 1D Forchheimer model. These tests are from confined sandstone aquifers and are accurately reproduced by a 1D homogeneous confined SPIDERR model using the best-fit parameters identified by Mathias and Todman (2010) (Fig. 4).
The validation exercises described above demonstrate the ability of SPIDERR to represent linear and non-linear flow to boreholes in relatively simple homogeneous aquifers. Further evaluation of the model for simulating observed groundwater level data from more complex heterogeneous aquifers is provided in the following section through application to a variable-rate pumping test in a Chalk abstraction borehole.

Study area & pumping test data
The SPIDERR model represents a range of hydrogeological and borehole construction features that may control the groundwater level response in an abstraction borehole, providing a tool for simulating drawdown in complex heterogeneous aquifers where the assumptions of simpler analytical models are not met. This is illustrated in the following sections through application to a groundwater source in the Chalk aquifer in the UK, which displays a distinct non-linear response to pumping. The model is used as a tool to investigate the key factors influencing drawdown, and thus the yield of the borehole, in this strategically important, dual permeability aquifer.
The Chalk aquifer is designated a principal aquifer in the UK, accounting for up to 70% of the total public water supply in parts of southern England (Butler et al., 2012). The Chalk is a very fine-grained limestone, deposited over significant parts of England and northern Europe during a Cretaceous marine transgression. In southern England, the Chalk aquifer has a thickness of 200-400 m and is classified into several members based on widespread flint and marl units (Bristow et al., 1997). The Chalk matrix is highly porous (20-45%) but has very low permeability (< 10 −2 m day −1 ). Matrix blocks are typically bound by interconnected fractures. The fracture network, which is often enhanced by dissolution, is highly permeable (> > 10 −2 m day −1 ) and is the principal mechanism for groundwater flow in the saturated zone (Allen et al., 1997;MacDonald et al., 1998). Trends in fracture density are controlled by depth and location within a catchment and correspond with distinct vertical and lateral variations in transmissivity and storativity (Owen and Robinson, 1978). Fractures generally have a higher density at shallow depths and in valleys and dry valleys, and a lower density at greater depths and on the interfluves (Williams et al., 2006). The Chalk aquifer therefore displays two broad vertical hydraulic zones: (1) a high-permeability upper zone of active groundwater movement within which the water table fluctuates and fractures are enhanced by dissolution; (2) a deeper zone of slow regional-scale flow characterised by low, relatively uniform permeability. Lateral variations in transmissivity are also observed, with transmissivity estimates from pumping tests in valleys or dry valleys typically ranging from 500 m 2 day −1 to 2000 m 2 day −1 , and those on the interfluves suggesting transmissivities of approximately 50 m 2 day −1 (Allen et al., 1997). The variations in the properties of the Chalk described above have a strong influence on the yield of abstraction boreholes, which often vary non-linearly with groundwater level (Rushton and Chan, 1976;Butler et al., 2009;Mansour et al., 2011;Tamayo-Mas et al., 2018).
The pumping test analysed for this study is from a borehole commissioned by Thames Water Utilities Ltd in the early 2000s, located at the western extent of the Bean well field in the south-eastern part of the London Basin (Fig. 5). The Chalk in this region forms the dip slope of the North Downs and the regional hydraulic gradient generally slopes from the Chalk escarpment in the south to the River Thames in the north. The Chalk of the North Downs is known to have well-developed karstic systems (Allen et al., 1997). It is locally covered by the Palaeogene Thanet Sand Formation and overlying Lambeth Group, which largely comprise clays to fine-grained sandstones. It is largely unconfined due to the depth of the water table.
The   and Lewes Nodular Chalk Formations are present with respective thicknesses of 62 and 45 m. The Seaford Chalk is composed mainly of soft, blocky white chalk, which has a high flint content in its lower 10 m in this area. The Lewes Nodular Chalk is composed of hard nodular chalk, and is well-fractured in its upper 10 m. The rest water level was recorded at a depth of 60 m; the caliper log shows borehole enlargement due to acidisation between 60 and 80 m depth with an increased diameter of 380 mm, below which the diameter is closer to the drilled diameter of 248 mm. The interval between 60 and 90 m depth shows lower fluid temperature and electrical conductivity relative to those at lower depths; below 90 m depth the gamma and resistivity logs show a number of distinct marl bands, which coincide with higher fluid temperature and electrical conductivity (Buckley, 2003). The geophysical observations suggest that the majority of inflow to the borehole occurs above a depth of 90 m, corresponding with the fractured boundary between the Seaford and Lewes Nodular Chalk Formations (Fig. 6). The borehole was tested with a variable-rate pumping test, which was carried out after two phases of borehole development involving acid injection and air lifting. The test consisted of five 100-min steps at abstraction rates increasing from 10 to 30 L per second (ls −1 ) (0.86-2.6 ML per day). The time-drawdown data from the pumping test are shown in Fig. 7. There is evidence of non-linear behaviour during the test, particularly the final step, which shows a significant increase in drawdown relative to the increase in abstraction rate. The increased drawdown observed during step five occurs when the pumped groundwater level in the borehole falls below the boundary between the Seaford and Lewes Nodular Chalk Formations, which also coincides with the reduction in borehole diameter. As outlined in the introduction, non-linearity in the groundwater level response to pumping could be due to a number of factors, including aquifer heterogeneity, borehole construction and losses, and aquifer losses due to non-Darcian flow. These will be explored further in the following section.
Groundwater levels were not monitored in any observation boreholes during this pumping test, therefore analysis focuses solely on the abstraction borehole.

Methodology
SPIDERR is used to simulate the variable-rate test described above    both to evaluate the model's ability to reproduce drawdown in complex, heterogeneous aquifers, and to investigate the potential causes of nonlinearity in the groundwater level response in this Chalk abstraction borehole. In order to do this, the following hypotheses were tested with a series of simulations with increasing model complexity: 1. The non-linear response to pumping is due to neither vertical heterogeneity or non-Darcian flow and can be reproduced using a single-layer homogeneous model without non-Darcian flow; 2. The non-linear response to pumping is due only to non-Darcian flow and can be reproduced using a single-layer homogeneous model with non-Darcian flow (as for the validation runs in Section 2.3); 3. The non-linear response to pumping is due only to vertical heterogeneity and can be reproduced using a two-layer heterogeneous model without non-Darcian flow; 4. The non-linear response to pumping is due to vertical heterogeneity and non-Darcian flow and can be reproduced using a two-layer heterogeneous model with non-Darcian flow.
For each of the hypotheses described above, a single-or two-layer model was set up with logarithmically spaced nodes extending from the borehole to an outer boundary at a radial distance of 1000 m. A sensitivity analysis was carried out to test the impact of node refinement on model results, highlighting that grid refinement in the vicinity of the borehole has a significant impact on model results when non-Darcian flow is included. An optimum number of radial nodes was selected to balance accuracy and stability in the model results with model runtimes. For the linear and non-linear simulations, 33 nodes and 801 nodes were logarithmically spaced along the radial dimension, respectively. This has obvious implications for model run-times, however the impact is not significant for practical purposes (example model runtimes are shown in Table 2). Model results were also checked to ensure the outer boundary did not impact the simulated drawdowns.
The top of the model was set at the elevation of the rest water level, which corresponds approximately to the base of the cased section of the borehole. The total model thickness was 77.5 m, with the base of the model corresponding with the base of the borehole. This was deemed acceptable because the geophysical surveys indicate that flow into the lower 47 m of the borehole was minor, and it was not necessary, therefore, to represent the Chalk below this depth. In the two-layer model, the elevation of the layer boundary was informed by the downhole geophysics and pumping test data and set to coincide with the boundary between the Seaford and Lewes Nodular Chalk Formations, which is also the approximate depth at which the borehole narrows. The borehole was open throughout the full thickness of the model and was set at the enlarged diameter of 380 mm in the single layer models. In the two-layer models, vertical variation in the borehole structure was incorporated with the diameter set at 380 mm in the upper layer and 248 mm in the lower layer. The layer elevations and hydraulic heads prior to pumping were assumed constant across the model domain.
For each of the hypotheses listed above, the radial flow model was calibrated to the variable-rate test using a Monte Carlo method, whereby a simulation consists of multiple model runs using different parameter sets that are sampled from a predefined range. The modelled groundwater levels in the abstraction borehole from each run are evaluated against the observed pumping test data allowing a best-fit parameter set to be identified.
Parameter ranges for each simulation, as shown in Table 3, were based approximately on minimum and maximum published values of the Chalk in the London Basin (Allen et al., 1997). For the single layer simulations 1000 model runs were performed, with parameters sampled from the ranges displayed in Table 3 (Sim1 and Sim2). It is suggested that the number of model runs required to adequately sample the parameter space should increase by an order of magnitude per parameter when Monte Carlo sampling methods are applied for hydrological modelling (Beven, 2001). However, for the two-layer models this becomes computationally unfeasible. Consequently, parameters were sampled using the quasi-random Sobol sequencing method, which produces sequences that are uniformly distributed in an n-dimensional unit cube (Sobol, 1967;Sobol et al., 1992), to sample more efficiently the parameter space and reduce the number of runs required. An initial simulation of 10,000 runs was therefore carried out for the two-layer non-Darcian model (Sim4c) and this was compared with a 100,000 run simulation for the same model set-up. The order of magnitude increase in model runs did not produce an improved fit to the observed data and therefore 10,000 runs were performed for all two-layer simulations (Sim3 and Sim4a-c) using the parameter ranges shown in Table 3.
The modelled groundwater levels were evaluated against the observed data from the abstraction borehole using the Nash-Sutcliffe Efficiency (NSE). This is a measure of the residual variance relative to the observed data variance and is widely used to evaluate hydrological models. An NSE value of one indicates a perfect fit to the observed data, while a value of zero indicates that the observed mean is a better predictor than the model. NSE is sensitive to magnitude and will therefore be biased towards the late-time data. This was considered acceptable for this application because the late-time data captures the non-linear response of the borehole and aquifer. Fig. 8 shows the modelled and observed groundwater level in the abstraction borehole for the best-fit model(s) from each Monte Carlo simulation of the variable-rate pumping test. The single layer models (Simulations 1 and 2, shown in Fig. 8a and b) do not reproduce the pumped response in the borehole. This may be partly due to the failure of the model to represent vertical variations in the aquifer and borehole properties and the delayed yield process caused by movement of the water table at later times. Incorporating non-Darcian flow does not significantly improve the model performance. This allows hypotheses one and two to be disregarded, indicating that the non-linear response in the Chalk abstraction borehole must be partly attributable to vertical variations in the borehole structure (i.e. a reduction in well storage with depth due to the borehole narrowing) and/or aquifer heterogeneity.

Results
The simulation using a two-layer Darcian model (Simulation 3) produces three models with a NSE of 0.98 (Fig. 8c). This is slightly improved upon by introducing non-Darcian flow in the bottom layer (Simulation 4b), as is evident in Fig. 8e, which shows four models with a NSE of 0.99. Including non-Darcian flow in the upper layer, as is the case in Simulations 4a and 4c, does not appear to improve the model's ability to reproduce the observed groundwater level response in the borehole ( Fig. 8d and f).
The parameter sets for Simulations 3 and 4b, as evaluated by the NSE, are shown in Figs. 9 and 10, respectively, and the best-fit parameters are given in Table 4. In both the Darcian and non-Darcian models, hydraulic conductivity (in both the horizontal and vertical dimensions) are reasonably identifiable and are generally an order of magnitude higher in layer one than layer two. This suggests that the non-linear response in the abstraction borehole is not fully explained by a reduction in borehole storage with depth, but is partly due to vertical heterogeneity in the Chalk aquifer. However, the significant role that the reduction in borehole diameter with depth has on the model results  Upton, et al. Environmental Modelling and Software 118 (2019) 48-60 is demonstrated in Fig. 11, which shows a comparison of the four bestfit two-layer non-Darcian models, with equivalent parameterisations but a uniform borehole diameter of 350 mm (Fig. 11). This further highlights the importance of both factors (vertical variations in borehole storage and hydraulic conductivity) in controlling non-linearity in the borehole response.
Introducing non-Darcian flow in layer two results in slightly higher hydraulic conductivities in this layer compared to the Darcian model. This provides a better fit to the late-time behaviour of the final step, during which the rate of drawdown is more significantly overestimated in the Darcian model. The slight improvement in model performance for the non-Darcian model compared to the Darcian model, particularly during the fifth step of the test, indicates that the non-linear response in the borehole is most likely caused by an interplay of aquifer heterogeneity, changes in borehole storage with depth, and non-Darcian flow.
Figs. 9 and 10 show that the storage parameters are not identifiable in Simulations 3 or 4b. This is not surprising given the models are only evaluated against observations from the abstraction borehole, and not the wider aquifer. Comparing the modelled cone of depression for the best-fit non-Darcian models (Simulation 4b) at the end of each step of the variable-rate test, shows that modelled groundwater levels in the wider aquifer vary significantly despite similar levels of drawdown in the abstraction borehole (Fig. 12). This effect is not only seen in the immediate vicinity of the borehole, but up to a distance of 100 m from the borehole. The validation runs presented in Section 2.3 highlighted that non-Darcian flow can also have an impact on drawdown away from the borehole, with the distance of influence proportional to the Table 3 Parameter ranges for each simulation to test the hypotheses outlined above.

Table 4
Best-fit parameter sets, as evaluated by the NSE, for the three best two-layer Darcian models (Sim 3) and four best two-layer non-Darcian models (Sim4b). Forchheimer parameter. Fig. 10 shows that the Forchheimer parameter varies by an order of magnitude for the four best-fit models in Simulation 4b therefore variations in drawdown away from the borehole are likely due to variations in both the storage and Forchheimer parameters. Groundwater levels from observation boreholes would help to better constrain these parameters, which would help to develop a more complete understanding of the main causes of non-linear behaviour in the borehole itself.

Discussion
The above simulations demonstrate the ability of SPIDERR to simulate drawdown in an abstraction borehole in a complex, heterogeneous unconfined aquifer. This is particularly important for determining the sustainable yield of an abstraction borehole where yields are highly dependent on pumped water levels and may not be linearly related to abstraction. The application presented here demonstrates that the model can be used as a tool to investigate the potential causes of this non-linearity in the groundwater level response to pumping, particularly where there aren't sufficient observations to fully understand the factors influencing drawdown in and around a borehole.
Increasing the complexity in the model structure, and performing a Monte Carlo simulation to calibrate the model to pumping test data from the Chalk aquifer, allowed the relative importance of vertical heterogeneity, borehole structure, and non-Darcian flow to be assessed. In the case studied, it was found that changes in borehole storage with depth as well as vertical heterogeneity in the Chalk aquifer have a significant impact on the model results, while introducing non-Darcian flow in the bottom layer was found to slightly improve the fit of the model to the observed pumping test data.
In general, the best fit to the observed data was achieved with a higher hydraulic conductivity in layer 1 compared with layer 2, which is consistent with our understanding of vertical heterogeneity in the Chalk aquifer. Non-Darcian flow was found to provide a better fit to the data when incorporated in layer 2, but not layer 1, which may be consistent with a conceptual model where flow is concentrated in fewer fractures in layer 2 therefore the average hydraulic conductivity of the layer may be lower, but non-Darcian flow may be relatively more significant. These outcomes are consistent with our general understanding of flow in the chalk under pumped conditions (Butler et al., 2009), which can often occur through highly preferential pathways, where velocities in the vicinity of abstraction wells are often sufficiently large for turbulent conditions to occur (Mathias et al., 2007).
The modelled groundwater levels in the abstraction borehole were not sensitive to the storage parameter, however, this significantly affects drawdown in the wider aquifer. We also demonstrate that non-Darcian flow impacts drawdown away from the borehole, with the cone of influence of non-Darcain flow proportional to the Forchheimer parameter. This highlights the need for better observation data in the vicinity of an abstraction borehole to constrain potential causes of nonlinearity in the groundwater level response.

Conclusions
Pumping test analysis remains an important tool not only for obtaining in-situ upscaled aquifer parameters, but also for assessing the sustainable yield of a groundwater source. The SPIDERR model builds on previous modelling capabilities for simulating groundwater flow to boreholes by allowing Darcian and non-Darcian flow to be represented in unconfined, heterogeneous aquifers. As shown through application to a Chalk borehole, this provides a tool for investigating the potential Fig. 11. Comparison of equivalent model parameterisations with a uniform (dashed line) and non-uniform (solid line) borehole diameter. Pumping test data provided courtesy of Thames Water Utilities Limited. Fig. 12. Modelled cone of depression in layer 1 (dashed lines) and layer 2 (solid lines) for the four best-fit non-Darcian models at the end of (a) step 1, (b) step 2, (c) step 3, (d) step 4, and (e) step 5. causes of non-linear behaviour in abstraction boreholes, which can have a significant impact on the sustainable yield of a borehole. In this case, the model was able to reproduce the short-term response in the Chalk abstraction borehole using a two-layer non-Darcian model, suggesting that non-linearity at this source is due to a combination of factors: a reduction in well storage with depth due to changes in the borehole diameter following acidisation; a reduction in hydraulic conductivity with depth, which is consistent with our conceptual understanding of the hydrogeology of the Chalk; and non-Darcian flow.
The long-term response in an abstraction borehole is dependent on regional groundwater processes as well as the local-scale processes investigated in this paper. The SPIDERR model has been developed as part of a larger study, which allows this borehole-scale model to be incorporated into conventional groundwater models, such as MODFLOW and ZOOMQ3D. This allows the effect of both local and regional groundwater behaviour on pumped levels in abstraction boreholes to be simulated, which is critical for evaluating the long-term sustainable yield of an abstraction borehole. This multi-scale methodology will be presented in a future paper.

Software availability
Name of software: SPIDERR; Developer: Dr Kirsty Upton (kirlto@ bgs.ac.uk); Availability: On request; Program language: MATLAB. SPIDERR is jointly owned by Imperial College London and the British Geological Survey.

Declaration of interest
We wish to confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome.