Towards integrated flood inundation modelling in groundwater-dominated catchments

Traditionally in flood inundation modelling the contribution of groundwater is either neglected or highly simplified. Long-duration groundwater-induced events, such as those that occur across Chalk catchments of northern Europe, can, however, incur significant economic and social cost. We present a new methodology for integrated flood inundation modelling by coupling the 2D hydrodynamic model LISFLOOD-FP with the 3D finite-difference groundwater model ZOOMQ3D. We apply the model to two adjacent Chalk catchments in southern England, the Lambourn and Pang, over two flooding events, during the winters of 2000/01 and 2013/14. A dense network of monitoring boreholes reveals local-scale heterogeneities in the aquifer not captured by the model. However, we demonstrate through inundation extent and streamflow comparisons that, on a regional scale, groundwater levels are simulated sufficiently well to capture groundwater inundation extent. The role of the unsaturated zone is discussed and contrasted between the two events. Currently, predictive tools to simulate groundwater flood events are limited, and this new, computationally efficient methodology will help to fill this gap.


Introduction
The importance of groundwater flooding has only been widely acknowledged in the past two decades, subsequent to widespread groundwater flooding events in the UK and France in the winter of 2000/01 (Marsh and Dale, 2002;Pinault et al., 2005). Groundwater flooding can be defined as the emergence of groundwater at the ground surface away from perennial river channels but can also involve the rising of groundwater into man-made ground and subsurface assets, including the basements of buildings and other infrastructure such as sewers (Booth et al., 2016;Macdonald et al., 2008Macdonald et al., , 2012. In some settings, groundwater discharging at the land surface will flow downgradient causing surface inundation of property and land. Due to the relatively slow recession of groundwater levels subsequent to the main flood event, groundwater flooding can last much longer than pluvial or fluvial flooding, with substantial economic impacts (Cobby et al., 2009).
Although recent research has examined groundwater flooding in the context of urbanised riverine floodplains overlying permeable (unconsolidated) superficial deposits (Gotkowitz et al., 2014;Kreibich and Thieken, 2008;Macdonald et al., 2012), the primary focus has been on the Chalk (bedrock) outcrop in northern Europe. Major groundwater flood events occurred on the Chalk outcrop in the southern UK and northern France in the winters of 2000/01 (Marsh and Dale, 2002;Pinault et al., 2005) and 2013/14 (Ascott et al., 2017). In 2000/01 a sequence of vigorous frontal systems across the UK brought successive heavy pulses of rainfall, resulting in totals approximately twice that of the long-term seasonal average (Marsh and Dale, 2002). Combined with above-normal antecedent groundwater levels, due to two preceding wet winters, this rainfall caused groundwater flooding that started in December 2000 and in many areas lasted until June of 2001 (Finch et al., 2004;Hughes et al., 2011). In 2013/14, a similar series of storm events resulted in the wettest winter on record for the UK (since 1910; Kendon and McCarthy, 2015), again causing long-term groundwater flooding on the Chalk outcrop that lasted until the summer of 2014. In contrast to 2000/01, antecedent groundwater levels in 2013/14 were close to or below normal (Muchan et al., 2015). During both events, baseflow to streams substantially increased resulting in downstream fluvial flooding, and groundwater discharges into normally dry valleys caused the flooding of property, land and transport routes down-gradient (Ascott et al., 2017;Morris et al., 2018). Abnormally high groundwater levels also severely compromised the sewerage network, causing overflows of contaminated water onto road surfaces.
Modelling of a Chalk catchment (414 km 2 ) in southern UK by Brenner et al. (2018) showed projected climate change may lead to a reduction of exceedances of high groundwater level percentiles. However, analysis by Ascott et al. (2017) showed that in those areas of the Chalk where overlying clayey superficial deposits are absent, groundwater levels can be highly responsive in the short term to the more intense rainfall events projected to be more common in the future (Fowler and Ekström, 2009) (see Fig. 2b for spatial extent of superficials). Jimenez-Martinez et al. (2016) suggested that groundwaterinduced flooding in a Chalk aquifer in southern UK may be four times more frequent by 2040-2069 than in the period , and seven times more frequent by 2070-2099. Given these projections, and the potential for increased vulnerability due to forecasted population growth (Miller and Hutchins, 2017), there is a need for appropriate flood risk management tools that incorporate all key mechanisms relevant to flooding in groundwater-dominated catchments. These tools should allow the simulation of flows within the subsurface but also on the land surface.
Generally, flood inundation models are run over periods of days or weeks, and inputs from groundwater are neglected because they are not considered significant over these time scales. This is a reasonable assumption in most cases. However, on permeable floodplains, surface water-groundwater interaction can play an important role in flood behaviour (e.g. Burt et al., 2002;Clilverd et al., 2016;Doble et al., 2012;MacDonald et al., 2014). There are a limited number of studies that have taken a simplisticor perhaps pragmaticapproach towards modelling groundwater flooding. A few have derived maps of groundwater flood susceptibility, or groundwater flood extents at particular times, by spatially interpolating point groundwater level measurements or modelled observation borehole hydrographs (McKenzie et al., 2010;Upton and Jackson, 2011), but these provide little or no information on flood dynamics. Some others have used conventional groundwater modelling methods to simulate floodplain groundwater levels, which may rise above ground (Mansour et al., 2016;Abboud et al., 2018;Yu et al., 2019), or the spatio-temporal pattern of discharge to the land surface (Habets et al., 2010), but these do not simulate floodplain flows.
Most recently, Morris et al. (2018) calculated groundwater discharge rates along chalk valleys within the Pang and Lambourn catchments, UK, using a simple Darcian calculation. These estimated flows drove their JFLOW CA-based flood inundation model (Bradbrook, 2006) but had to be adjusted to generate better matches of simulated floodplain flow against targeted spot flow measurements. These were taken during the flood event at observed points of groundwater emergence and at a point near the perennial head of the river downstream. Conversely to the above-mentioned studies, this approach simulates groundwater-driven floodplain flow, but not the groundwater system itself. Consequently, for many applications it does not allow predictive modelling: for example, changing groundwater flood risk under projected future climate; the generation of improved flood hazard maps in groundwaterdominated catchments; or combining real-time observed groundwater levels with meteorological forecasts for winter groundwater flood prediction.
Whilst a number of models have been developed that simultaneously solve for surface and subsurface flows (e.g. Kollet and Maxwell, 2006;Li et al., 2016;Liang et al., 2007;Spanoudaki et al., 2009), these have not been widely applied to simulate flood inundation. This is likely a result of the high computational cost involved, in particular because groundwater events occur over much longer time scales. Some recent studies have, however, gone down this route to simulate integrated subsurface and floodplain exchanges. Bernard-Jannin et al. (2016) used the MOHID Land model (Braunschweig et al., 2004) to simulate a small area (~7 km 2 ) of the Garonne floodplain, France, over a 5-month period, and Glaser et al. (2020) applied HydroGeoSphere to a micro-catchment (0.42 km 2 ) in Luxembourg over a 2.5-year period. On a regional scale, Berezowski et al. (2019) modelled the Biebrza River floodplain (250 km 2 ), Poland, over a period of 2 years with HydroGeoSphere and Saksena et al. (2019) applied ICPR to a flood event in the Upper Wabash River basin (5842 km 2 ), Indiana, USA, over a 1-month period.
In settings where dynamic effects are less significant, such as on floodplains, reduced-complexity cellular automata (CA) models have often been used to simulate surface water flooding (Teng et al., 2017). These have become popular because, by neglecting some or all of the dynamic terms of the Saint-Venant equations for shallow water flow, significant computational savings can be made against more complex hydrodynamic models, enabling them to be applied to larger catchments (e.g. Dottori et al., 2016;Guidolin et al., 2016). For this reason, CA models are well suited to being coupled with groundwater models and run over the time scales necessary to capture long-duration groundwater floods over large areas. However, to the best of the authors' knowledge the only published studies that have incorporated surface watergroundwater exchange into a CA flood inundation model are those of Barkwith et al. (2015) and Litwin et al. (2020), both of which focus on landscape evolution and simplify the groundwater component. This gap was recognised by Teng et al. (2017), who undertook a review of flood inundation modelling, highlighting that surface water-groundwater interaction is an under-represented process in codes used for this purpose.
The challenges of applying integrated models of saturatedunsaturated groundwater and surface water flows to regional groundwater systems are considerable, predominantly relating to complex parameterisation and long model run times (see Barthel and Banzhaf, 2016 for review). Here we report on the development of a new integrated flood modelling framework that allows prediction of the extent and dynamics of groundwater-driven floodplain flows using a simplistic parameterisation and relatively short run times. This couples a 3D finitedifference model of saturated groundwater flow, ZOOMQ3D, to a CAbased model of 2D floodplain flow, LISFLOOD-FP. We apply this coupled model (ZOOMQ3D-LISFLOOD-FP) to the same Chalk groundwater system investigated by Mansour et al. (2016) and Morris et al. (2018), and simulate the groundwater flooding events of the winters of 2000/01 and 2013/14. We compare the results from the new model to a range of observations, including the measurements of floodplain flow made by Morris et al. (2018), but with a focus on detailed maps of the areas inundated by groundwater. Finally, we discuss the challenges of adequately capturing groundwater flood behaviour in Chalk catchments, which often exhibit significant subsurface heterogeneity, and then make recommendations about further development of the model.

Model code development
To simulate groundwater flow, we use the finite-difference code ZOOMQ3D (Jackson and Spink, 2004). This solves the governing equation of saturated Darcian groundwater flow on a three-dimensional Cartesian grid, which can be locally refined horizontally to increase resolution. The model simulates groundwater flows between vertical layers of variable thickness according to heads and a vertical conductance parameter. It has been applied in a number of studies to simulate Chalk groundwater systems using the equivalent porous medium approach (e.g. Jackson et al., 2011;Le Vine et al., 2016;Parker et al., 2016;Upton et al., 2020).
To simulate flow over the floodplain, we use the method implemented in the cellular, or 'storage cell', model of Bates et al. (2010). This is one of the set of numerical solutions used in the widely applied LISFLOOD-FP floodplain inundation modelling code (Bates and De Roo, 2000). Storage cell models have been developed as an alternative to models that solve the full shallow water equations in two dimensions, which can involve a significant computational overhead. This reduced complexity approach discretises the floodplain into cells and calculates the flux of water in each Cartesian direction analytically, reducing the computational overhead to lower than that of equivalent numerical solutions of the full shallow water equations. Storage cell models take advantage of the fact that flows over floodplains are typically slow and shallow, and gradients of the local free surface are very small. Consequently, the inertial, or dynamic wave, terms of the momentum equation (Eq. (1)) of the Saint-Venant equations (Chow et al., 1998) can be neglected; these are the first two terms of Eq. (1), describing the local and convective acceleration, respectively.

∂u ∂t
where u is the fluid velocity [L/T], h is the water depth [L], S 0 is the bed slope [-], S f is the friction slope [-], g is the acceleration due to gravity [L 2 /T], x is the coordinate direction [L], and t is the time [T]. However, Bates et al. (2010) incorporated a simple local acceleration term into the formulation of inter-cell fluxes to improve the stability of such schemes and their range of applicability. LISFLOOD-FP has been used to simulate flood inundation at high resolutions within complex urban settings (Neal et al., 2011;Sampson et al., 2012), across large areas (Dottori et al., 2016;Schumann et al., 2013), and over decadal to centennial time-scales (Barkwith et al., 2015;Coulthard et al., 2013;Liu and Coulthard, 2017). Considering that groundwater flood events can last from weeks to months, alongside the functionality, efficiency and applicability of the storage-cell model of Bates et al. (2010), we selected it as the approach to adopt in this study. Hereafter we refer to this one solution method as LISFLOOD-FP. The ZOOMQ3D-LISFLOOD-FP code provides a choice of two methods for simulating the exchange of flood water and groundwater: (1) as a Darcian flux, dependent on the difference between the groundwater level and surface water level, or the digital terrain model (DTM) elevation if the floodplain pixel is dry, and an associated conductance parameter; and (2) groundwater discharge to the land surface is simulated by subtracting the DTM surface elevation from groundwater levels. The second method was chosen in this study. As opposed to a flat floodplain, where the groundwater-surface water interaction is often affected by the deposition of fine alluvial sediments, in groundwater-dominated valleys such as those modelled in this study, large changes in groundwater level drive stream-source migration, meaning error in simulated groundwater levels can be assumed to outweigh any benefit in introducing a Darcian exchange mechanism. Moreover, there is thought to be a high degree of connectivity between the chalk and the land surface, meaning any conductivity value assigned to a Darcian exchange would be very high. Typically the resolution of floodplain model cells is one or two orders of magnitude higher than that of the groundwater model cells. Consequently, multiple DTM cells will be associated with one groundwater solution point. At the end of every groundwater model time step, heads on the groundwater model grid are interpolated onto DTM cell centres using thin plate splines (Meinguet, 1979), which were similarly applied by Parker et al. (2016) to Chalk groundwater level data ( Fig. 1). At each DTM cell where the interpolated groundwater level is above ground elevation, the difference between the two is multiplied by the aquifer specific yield to calculate an equivalent surface water depth. This volume of water is then applied uniformly in time over the multiple, much shorter time-steps (typically seconds to a minute) of the floodplain model, which is iterated forward in time to the end of the groundwater model time-step. The LISFLOOD-FP-based floodplain model simulates the flow of water already on the DTM and the groundwater discharge added to it. The groundwater level of the groundwater model grid node is corrected to account for the sum of the discharges onto the associated DTM cells, before ZOOMQ3D solves for the next time step. If there is surface water on a DTM cell that is connected to a groundwater grid node where the groundwater level is below the land surface at the start of a groundwater model time step, then the water is removed from the DTM pixel and added to groundwater storage.

Study area
The study area consists of two adjacent catchments, the Lambourn and Pang, located approximately 70 km west of London in the southern UK and covering an area of 405 km 2 (Fig. 2). The two rivers drain the Cretaceous Chalk of the Berkshire Downs. Regional groundwater flow is generally to the south-east and controlled by the levels of the Kennet and Thames rivers, the latter lying in the deeply incised Goring Gap. The Lambourn and Pang rivers act, therefore, as localised, seasonally dependent controls on groundwater flow . The base flow index of the river Lambourn is 0.97 at the outlet and in the Pang catchment it increases from 0.87 at the outlet to 0.95 at the highest gauge (Frilsham). In the southern part of the Pang catchment, the Chalk is confined below Palaeogene deposits, comprising clay, sand and gravel, which generate surface runoff and give the Pang a lower baseflow index than the Lambourn at their lowest gauging stations (Fig. 3).
The Chalk aquifer is a dual-porosity, dual-permeability medium. Its matrix is fine grained and contributes little permeability. Instead the transmissivity of the aquifer is controlled by the density and distribution of fractures, which provide almost all of the hydraulic conductivity and specific yield (MacDonald and Allen, 2001). Hydraulic conductivity is highest in the top ~50 m of the aquifer and in river valleys, where dissolution and Devensian periglacial processes have enlarged fractures (Allen et al., 1997;Williams et al., 2006;Younger, 1989). Water tables tend to be gently sloping and the unsaturated zone can be as thick as 120 m. Recharge occurs predominately through the matrix and is highly attenuated (Ireson et al., 2006(Ireson et al., , 2009, but can also occur rapidly through piston displacement in the matrix (Lee et al., 2006) or through the activation of fractures during high intensity rainfall events (Ireson et al., 2009;Ireson and Butler, 2011). This rapid recharge can be seen in rises of 10 to 20 m in the water table at interfluves at the beginning of the winter (Allen et al., 1997;Price et al, 2000). Such large fluctuations in the water table result in dynamic headwaters in Chalk rivers; the River Lambourn can double in length between dry and wet periods (Finch et al., 2004;Griffiths et al., 2006).
Land use in the Berkshire Downs is predominately grassland and arable farming. The topography comprises gently sloping hillsides with a minimum elevation of ~50 m above sea level (masl) in the lower reaches of the Pang, and a maximum of ~250 masl in the upper reaches of the Lambourn (Fig. 3). Long-term average precipitation  varies with topography from ~670 mm/year in the lowest area to ~780 mm/year at the interfluves (Tanguy et al., 2019). Long-term average potential evapotranspiration  is ~600 mm/year (Hough and Jones, 1997).

Observation data
Streamflow data were available from the National River Flow Archive (NRFA, 2020) for the Lambourn at Shaw and the Pang at Pangbourne, Bucklebury and Frilsham for the 2000/01 and 2013/14 events, and for the Winterbourne, a minor tributary of the river Lambourn, at Bagnor in 2013/14 (Fig. 3). Streamflow data at Pangbourne in 2013/14 were, however, excluded from the comparison, as two breaches in the riverbank led to significant volumes of water leaving the Pang and flowing into the neighbouring Sulham Brook. Information on the quality of streamflow data were obtained from the UK Environment Agency (referred to as EA hereafter) for the Bucklebury and Frilsham gauges and these are displayed with streamflows (see Fig. 5c, e, f); in particular, a significant amount of flow was also observed to bypass the Bucklebury gauge in 2014 (Pang Valley Forum, 2020). The National River Flow Archive state that measured high flows at Pangbourne should be treated with caution, as overspill can occur from the neighbouring Sulham Brook and submergence can be caused by the river Thames (NRFA, 2020). However, no specific information on data quality was found for Pangbourne for the 2013/14 event. In addition to gauged river flows, spot flow measurements taken on a single occasion in March 2014 at six locations (Morris et al., 2018) were made available (Fig. 3). Groundwater levels were obtained from the EA from observation boreholes across the two catchments. Flood extent maps of the 2014 event derived from aerial photography, and manually translated into shape files, were obtained from the EA. The exact date on which the photography was taken is unclear, but it was judged by the EA to be roughly coincident with maximum flood levels. The extent maps cover only usually dry valleys. West Berkshire Council (2014) created flood extent maps of the 2013/14 flood from information collected from flood wardens and local residents, but these relate primarily to individual properties that experienced flooding, excluding valleys/farmland. Inundation was recorded Fig. 3. Study area and location of river and spot gauging. Villages marked are those in which properties were damaged by flooding in 2014. Contains data from BlueSky International Limited (2020) and CEH rivers. asl, above sea level. on 12 January 2001 -~1 month after peak flow recorded at Pangbourne and Shawthrough aerial survey conducted by UKCEH (Finch et al., 2004). These data are available as aerial photographs for the Lambourn and were manually drawn onto maps and digitised for the Pang (Finch et al., 2004). West Berkshire Council (2014) also details the dates of fluvial flooding in the Pang as well as the dates of onset of flow in the Pang and Roden rivers at Compton (see Fig. 2), which are usually dry, even in winter.

Groundwater model
The groundwater model used in this study was first developed in order to quantify the sustainability of groundwater abstractions from the Chalk (Jackson et al., 2006b), and it has since been used to further understanding of this aquifer through a number of studies relating to the dynamics of Chalk groundwater catchments (Jackson, 2012;Parker et al, 2016), climate change impacts , coupled land surface-groundwater modelling (Le Vine et al., 2016), and borehole yield assessment (Upton et al., 2020). The model covers ~2600 km 2 of the Chalk aquifer at the north-west of the synclinal structure of the London Basin (Royse et al., 2012), and the river catchments of the Kennet, Pang and Wye, which are tributaries of the River Thames (Fig. 2). A full description of the model is provided by Jackson et al. (2011), and so only a summary is given here.
The model is divided into three layers, the total thickness of which increases from the north-west as the Chalk thickens and dips south-east into the London Basin, becoming confined by the Palaeogene deposits of the Lambeth Group and London Clay (Fig. 2). The bottom layer represents the more clayey, lower hydraulic conductivity Chalk formations at the base of the sequence. The upper two layers were defined to enable a specification of the vertical variation of the chalk hydraulic conductivity consistent with the conceptual model of the aquifer. Horizontally, the mesh consists of a coarse base grid of 2 km square cells, which are refined to 500 m square cells over the Lambourn and Pang catchments (Fig. 2).
Groundwater is simulated to discharge from the aquifer through rivers, and springs issuing along the scarp slope of the Chalk. Where LISFLOOD-FP is not being used to simulate surface water-groundwater interaction, rivers are modelled using a network of river nodes, which exchange water with the aquifer via a Darcian head-dependent flux, and a simplified representation of channels (Jackson and Spink, 2004). Discharge from scarp slope springs is similarly represented by Cauchytype groundwater head-dependent boundary conditions. Historical recharge rates over the period 1971-2015 were calculated using the distributed ZOODRM soil moisture accounting code (Mansour and Hughes, 2004) applied as described by Jackson et al. (2011). The recharge model uses precipitation, potential evapotranspiration, topography, land use and soil data to estimate potential recharge. Excess precipitationi.e. when the soil moisture store is full and potential evapotranspiration has been metis split into recharge and runoff according to runoff coefficients. Runoff is routed across the land surface and can re-infiltrate as run-on recharge. The runoff coefficients were calibrated by comparing the simulated surface runoff with the surface runoff component of river flow monitored at 16 river flow gauging stations across the model domain. Historical groundwater abstraction is also included in the groundwater flow model. We use an equivalent porous media approach and vary model hydraulic properties laterally, with transmissivity varying between approximately 50 m 2 /day beneath the interfluves and 5000 m 2 /day in the River Thames valley. Hydraulic conductivities in the top layer range from ~2 to ~100-1000 m/day, in the second layer from ~1 to ~50 m/day and are ~1 m/day in the bottom layer. The modelled specific yield of the Chalk varies between 0.5% across the interfluves to 3% in the valleys. Where the water table is in superficial alluvium or gravel within valleys, calibrated specific yield values are ~10%.
The specific model setup and parameterisation was based on that of Mansour et al. (2016), which was calibrated to groundwater levels in 207 boreholes and to baseflow at 20 gauging stations across the modelled area over the period 1971-2013. However, three adjustments were made to the model. In the earlier model the time step and stress period length were one day and one week respectively. The stress period was reduced to one day to improve the simulation of the dynamics of groundwater floodplain interaction over the two shorter periods modelled: 1 November 1999 to 31 December 2001, and 1 November 2012 to 31 December 2015. Second, hydraulic conductivity in the top layer of the aquifer was adjusted around the village of Upper Lambourn to improve model fit to new groundwater level observations not previously available; and, third, minor adjustments were made to specific yield in the middle of the Pang, as updates to the recharge model led to increases in river flow.

Coupled groundwater-floodplain model set up
Surface flows within the Lambourn and Pang catchments, derived from groundwater discharge to the land surface, were simulated on a 50 m grid produced by upscaling the BlueSky 5 m DTM (Bluesky International Limited, 2020). Elevation values for the 50 m grid cells were calculated by taking the minimum value of the BlueSky DTM in each grid cell. Grids of resolution 10-20 m were similarly derived from the BlueSky DTM for selected valleys in order to obtain finer resolution flood extents. LISFLOOD-FP uses an adaptive time step, the length of which is controlled by the Courant-Freidrichs-Lewy condition (Bates et al., 2010) to maintain stability of the explicit solution. The minimum and maximum time step lengths were set to 1 and 600 s, respectively. Manning's roughness was set to 0.02 m 1/3 s − 1 uniformly across the catchments. Consequently, we did not represent spatial variability in land surface roughness. However, it was found through a series of simulations in which Manning's n was varied between 0.02 and 0.2 m 1/ 3 s − 1 that the simulated flood extent and streamflow were insensitive to this parameter, in part due to the long time scales over which flooding occurs. Future development of the code to make use of parallel computation will facilitate the specification of a higher resolution DTM, and the representation of different land cover types with more appropriate Manning's n values (Werner et al., 2005). However, this was not within the scope of this study.

Groundwater levels
The root mean square error (RMSE) of simulated groundwater levels in the Lambourn ranges from 0.4 to 4.6 m over the first simulation period (Nov. 1999-Dec. 2001). In most (90%) locations the RMSE ranges from 0.4 to 6.5 m over the second simulation period (Nov. 2012-Dec. 2015, although at the top of the Great Shefford valley (Fig. 3) and at one borehole in the neighbouring valley, north of East Garston, the RMSE is between 6.8 and 10.2 m. RMSEs are higher at interfluves, where groundwater levels range by up to 20 m over the simulation period, and smaller in the valleys, where groundwater fluctuations are much smaller (1-5 m) (see examples in Fig. 4). Across the catchment, peak observed groundwater levels vary from 80 to 150 m. The accuracy of the model in simulating the peak groundwater level is shown in Fig. 4a, c and example hydrographs are given in Fig. 4b, d. In 2014, modelled peak groundwater levels tend to be too high in the Winterbourne valley, around Upper Lambourn and, in particular, at the top of the Great Shefford valley.
Simulated groundwater levels in the Pang have an RMSE of between 0.5 and 7.5 m during the first simulation period (Nov. 1999-Dec. 2001) and between 0.4 and 7.8 m during the second (Nov. 2012-Dec. 2015. The model captures the large changes of up to 20 m in groundwater level at the top of the catchment, although the peak is slightly overestimated at one location (see B4 in Fig. 4b). The only borehole for which there is a S.L. Collins et al. poor match to observations in 2000/01 is B3 to the eastern edge of the catchment, located far from the main valley and areas of flooding, and close to a major abstraction site. In 2013/14, there are observations available for a further two boreholes in the south east of the catchment and simulated groundwater levels are too low here. Further north and west in the catchment, where flooding occurred, there is a good match between simulated and observed groundwater levels (Fig. 4c, B8 in Fig. 4d).

Streamflow
Modelled streamflow comprises LISFLOOD-FP surface flows (base flow) and a small amount of runoff from ZOODRM. The Nash-Sutcliffe efficiency (NSE) for streamflow at Shaw, near the downstream end of the River Lambourn, was 0.65 during the 2000/01 event and 0.65 during the 2013/14 event (Fig. 5d, g). Modelled streamflows in the Lambourn are too flashy, with sharp rises and falls after periods of significant rainfall (Fig. 5d, g). The timing of the rising limb of the hydrograph is captured better in the 2000/01 event than in the 2013/2014 event, where simulated streamflows rise ~2 weeks before observed streamflows. The falling limb is also too steep for both events. During the 2013/ 14 event, flow in the River Winterbourne is significantly underestimated (Fig. 5h).
In the Pang, NSE values range between 0.45 and 0.61 during the 2000/01 event. As in the Lambourn, flows at Bucklebury and Frilsham are too flashy. In particular the large peak in late December 2000 observed at Pangbourne and simulated at all gauges of the Pang was not observed at Frilsham (Fig. 5a-c). Observed flows in February at Frilsham are higher than at Bucklebury further downstream suggesting overbank inundation may have occurred between the two gauges during the flood peak or that the gauging is unreliable. While the gauges at Bucklebury and Frilsham proved unreliable throughout much of the 2013/14 flood event, the few recorded flows at these gauges in January that did pass quality checks suggest the simulated flows at the beginning of the event are likely too high (Fig. 5e, f). This is supported by the observed onset of flow in the Pang and Roden rivers at Compton (Fig. 5i). Whereas the River Roden starts flowing 5 days after the observed onset of flow, the River Pang at Compton starts flowing ~1 month too early. The timing of the simulated peak flows, however, appears to be accurate, as it is consistent with reports of fluvial flooding in Hampstead Norreys and Pangbourne (West Berkshire Council, 2014) (dashed lines in Fig. 5e, f). The summer and winter preceding the 2013/14 event were drier than those preceding the 2000/01 event and this could have meant that recharge was delayed in 2013/14, despite very high rainfall. Table 1 presents a comparison of spot flow measurements and modelled surface flows. There is likely to be a high degree of uncertainty in these measurements, but they are a good, order-of magnitude guide for flows in those areas most affected by groundwater flooding. It can be assumed that those measured in channels are probably more accurate than those measured at road sides (full description in Morris et al., 2018). In the Upper Lambourn, there is a good match between the spot flow measurements and the modelled flows: percentage error is 0% and 20% at the two locations (Table 1, see Fig. 3 for locations). In the Great Shefford valley, modelled flows are too low: roughly three times too low at Great Shefford village and five times too low further up the valley. The measurements were taken during the recession of the flood, at which point the streamflow at Shaw is also underestimated (see Fig. 5g). In the Pang, the spot flow measurement taken at East Ilsley is well matched by simulated flow (6% error), but that taken in Compton on the River Roden is seven times larger than simulated flow. Simulated flow on the River Roden was an order of magnitude higher at the beginning of March (0.25 m 3 /s on 3 March 2014), but then declined rapidly.

Flood inundation
The model captures the fundamental behaviour of Chalk streams with the source moving up and down the catchment in winter and summer. Fig. 6a shows inundation at its peak in February 2014 with flow in normally dry valleys, such as the Great Shefford valley and Mile End as well as in the upper parts of the Pang catchment. Overall there is a good match between the modelled inundation and the flood extents maps. Fig. 6b provides a comparison between simulated inundation and flood extent derived from aerial photography carried out by UKCEH in the upper Pang in January 2001 (Finch et al., 2004). Inundation within villages is not captured by the aerial photography, but otherwise there is a good match between the simulated and observed extents. Simulated flood extents in 2001 also compare well with aerial photographs taken in usually dry valleys, such as the Great Shefford valley and Mile End (Fig. 6d, e) (Finch et al., 2004). The model performs very well in the upper Pang in 2014, which was particularly badly affected by groundwater flooding (Fig. 6c). However, the model fails to produce any inundation at the top of the Winterbourne valley, which was recorded by both the EA and West Berkshire Council, and produces significantly less inundation west of the village of Upper Lambourn than was recorded by the EA. Simulated groundwater levels at these two locations are higher than observed levels (Fig. 4c), suggesting the inundation could be caused by rapid flow in the unsaturated zone, for example through karst features, or by surface runoff from saturated soils. The model's underestimation of inundation in the Winterbourne valley is consistent with the model's underestimation of river flows at Bagnor (Fig. 5f).

Discussion
Accurately modelling groundwater levels in Chalk aquifers is difficult, particularly during flood conditions. It is for this reason that previous efforts to model flooding in Chalk have focussed on correlations of groundwater level and river flows (Bradford and Croker, 2007), linear regressions between rainfall and groundwater level (Adams et al., 2010), transfer functions (Jimenez-Martinez et al., 2015) and one-dimensional lumped parameter models (Upton and Jackson, 2011). As well as having dual porosity and dual permeability, an additional complexity of Chalk is its active karstic features and sinking streams, both of which have been identified in tracer tests in the lower parts of the Lambourn and Pang (Banks et al., 1995;Maurice et al., 2006). These features in the unsaturated zone likely become more important with high groundwater levels. However, owing to the complexity and computational demand of modelling the Chalk unsaturated zone, which can be up to 75 m thick in these catchments (Jackson et al., 2006a), to date only 1D models (Ireson et al., 2009;Mathias et al., 2006) and a 2D transect (Ireson and Butler, 2013) have been attempted.
This study sought to use an existing groundwater model setup to assess the performance of the coupled ZOOMQ3D-LISFLOOD-FP code in capturing floodplain inundation without undertaking significant additional groundwater model calibration. The groundwater model generally performs well during the 2000/01 flood event with regard to groundwater levels (Fig. 4a). This is to be anticipated, as these are the observation boreholes against which the groundwater model was originally calibrated . In 2013/14, data from a number of additional observation boreholes not included in the original calibration were available. The performance of the model against these new observations is more mixed (Fig. 4c). However, as shown in Fig. 4c, model performance can vary dramatically between observation boreholes in the same valley and even at the same location. This is illustrated by boreholes B5 and B6. Although these boreholes are just 4 m apart, observed groundwater levels in borehole B5 are ~8 m higher than in B6. Given the resolution of the groundwater model is 500 × 500 m, it would not be possible to accurately simulate both of these levels or, indeed, all observation boreholes around Upper Lambourn and at the top of the Winterbourne valley. As a test, a nested 100 × 100 m groundwater grid was incorporated over Upper Lambourn, but this had no effect on the simulated groundwater levels, suggesting that increasing just the resolution of the model does not improve model performance unless more information about heterogeneity is available and can be parameterised in the model or a re-calibration is performed.
Whereas a groundwater level in a highly heterogeneous aquifer is a local-scale observation affected by fractures or karstic features, streamflow and inundation extent are macro-scale observations representative of a whole valley or catchment. Despite some poor matches to observed groundwater levels in the Lambourn, streamflow and inundation extent are both simulated well (NSE 0.65 in both simulation periods; Fig. 6d, e). The estimated base flow contributions for the Lambourn at Shaw are 0.86 and 0.91 in December 2000 and February 2014, respectively. That streamflow is simulated well and the event was groundwater-driven, suggests the model 'on average' simulates groundwater levels sufficiently well to capture the flooding events.  (Fig. 6b, c), and the model correctly captures the timing of the main flood peak in the Pang in 2014. Streamflows are, however, too flashy in both catchments and during both events. Refining the resolution of the LISFLOOD-FP topographic model to 25 m had a negligible effect on simulated flows, suggesting the flashy behaviour of the model is not a result of the coarse representation of topography. While beyond the scope of this study, we recognise that some features of the model can be modified to improve model performance, but this will be undertaken in future work. First, a re-calibration of the groundwater model could be performed. It was originally developed as a water resources model to study the sustainability of groundwater abstractions (Jackson et al., 2006a(Jackson et al., , 2006b. As such, it was calibrated based on mean baseflow and mean seasonal peak and low flows, and it was run with monthly stress periods and time steps. We reduced both to daily because Jackson et al. (2011) showed that shorter time steps and stress periods were appropriate for high groundwater levels in this Chalk aquifer. However, it might be that the change in time step and stress period invalidates the previous calibration. A new calibration would also take into consideration new groundwater level data that has since become available. Second, another potential source of error could be that water is passed straight from the bedrock to the surface neglecting the properties of the soil zone. This was deemed appropriate given the high degree of connectivity between the chalk bedrock and the land surface. However, it could be investigated as to whether alluvium in the valleyalthough not extensiveis smoothing the emergence of groundwater. Third, the flashy hydrographs could be a result of failing to account for the buffering capacity of the unsaturated zone, which is not represented in the groundwater model. The groundwater model is driven by recharge from a soil moisture balance model, which is applied instantaneously to the groundwater table. Jackson et al. (2011) imposed a one-month delay on recharge in order to account for the unsaturated zone in the Lambourn and Pang, but they were using the model as a water resources model to investigate changes in average baseflow. As recharge is the only driver of the groundwater model, if we were to delay the recharge by one month, the streamflow hydrographs would simply be shifted by one month. It can be deduced from a visual inspection of Fig. 5 that this would be too much. Le Vine et al. (2016) used the land surface model JULES (Best et al., 2011) to provide recharge for the same groundwater model as was used in this study. The standard JULES configuration then (version 2.2) did not represent surface-water routing and so catchment runoff rates were averaged over 10-day intervals to compare to observations. Flows simulated by JULES at the Shaw gauging station on the River Lambourn were flashier than those in this study and baseflow too low (NSE of 0.39 for the period 1995-2007), suggesting their model generated too much surface runoff in the Lambourn. Fig. 7 shows a typical groundwater level (B4 in Fig. 4a) in the Pang in the months preceding the flood events as well as cumulative precipitation. Simulated groundwater level rises more quickly than observed because recharge is not attenuated, and this leads to erroneous rises in streamflow (e.g. peak in Dec. 2000 and rising flow in Jan. 2014). This effect is weaker in the 2000/01 event because 2000 was a wet year and wet antecedent soil moisture conditions can trigger rapid recharge through piston displacement or fracture activation. As 2013 was drier than 2000, recharge appears to have been slower and the observed groundwater level takes longer to 'catch up' with the simulated level. Simulated levels tend to peak at the right time (Fig. 4b, d), meaning the simple 1-month delay is inappropriate for modelling flood peaks and that any approach used to attenuate recharge would have to consider the depth to the water table.
In excluding the unsaturated zone, we make large gains in computational time. It is difficult to compare run time between studies, given that runtime is dependent on the event (i.e. amount of inundation) and computer hardware. Saksena et al. (2019) quote combined run and build times between 18 and 28 h for a 1-month simulation on an Intel i-7 system with a processing speed of 3.7 GHz. On a comparable system, ZOOMQ3D-LISFLOOD-FP simulations of the Lambourn and Pang for the 26-month and 38-month periods in 1999-2001 and 2012-15 were 2.5-3.5 and 4-6 h, respectively. Although the model of Saksena et al. (2019) was much larger in area, the number of cells in the model was on the same order of magnitude as the models in this study. Moreover, fully integrated models usually require the surface water and groundwater catchments to have the same extent, whereas in our ZOOM-Q3D-LISFLOOD-FP setup we make large computational savings by applying ZOOMQ3D to the entire aquifer and LISFLOOD-FP to specific catchments or even valleys within catchments. Future work will aim to diagnose some of the limitations of the model application presented here, and explore in more detail the trade off between complexity and computational burden in modelling groundwater flooding.

Conclusions
Whereas hydraulic flood inundation models are commonly run for periods of days or weeks, the long duration of groundwater flood events necessitates a simpler, more computationally efficient approach. We achieve this by coupling a reduced-complexity cellular automata model with a finite-difference groundwater model. The coupled model is capable of simulating flood inundation from complex subsurface flow systems. Very little attention has been given in the literature to the development of bespoke physics-based approaches to groundwater flood modelling. We thus envisage that this methodology will help fill a gap in predictive tools for groundwater flood modellers and water resources managers in simulating areally extensive groundwater flood events.
The new coupled model was applied to two Chalk catchments in southern England. We demonstrate the power of this coupled system to capture groundwater inundation in usually dry valleys by comparing simulated and observed flood extents. Where reliable observational data are available, the model also performs well in terms of streamflow. We highlight the probable role of the unsaturated zone in smoothing the flood peak in reality; and, while beyond the scope of this work, we suggest that streamflows could be improved by attenuating recharge before it is passed to the groundwater model. Parallelisation of the code would speed up the coupled model even further and make computationally demanding tasks, such as robust uncertainty analyses and automatic parameter estimation, more feasible.

Code and data availability
The code can be made available from the authors upon request. Rainfall (Tanguy et al., 2019) and potential evapotranspiration (Hough and Jones, 1997) are available from the NERC Environmental Information Data Centre. Groundwater level data and inundation extents can be obtained from the UK EA upon request. Streamflow data are available from the National River Flows Archive (https://nrfa.ceh.ac.uk/).

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.