Urban drainage models for green areas_ Structural differences and their effects on simulated runoff

Mathematical stormwater models are often used as tools for planning and analysing urban drainage systems. However, the inherent uncertainties of the models must be properly understood in order to make optimal use of them. One source of uncertainty that has received relatively little attention, particularly for increasingly popular green areas as part of urban drainage systems, is the mathematical model structure. This paper analyses the differences between three different widely-used models (SWMM, MOUSE and Mike SHE) when simulating rainfall runoff from green areas over a 26-year period. Eleven different soil types and six different soil depths were used to investigate the sensitivity of the models to changes in both. Important hydrological factors such as seasonal runoff and evapotranspiration, the number of events that generated runoff, and the initial conditions for rainfall events, varied significantly between the three models. MOUSE generated the highest runoff volumes, while it was rather insensitive to changes in soil type and depth. Mike SHE was mainly sensitive to changes in soil type. SWMM, which generated the least runoff, was sensitive to changes in both soil type and depth. Explanations for the observed differences were found in the descriptions of the mathematical models. The differences in model outputs could significantly impact the conclusions from studies on the design or analysis of urban drainage systems. The amount and frequency of runoff from green areas in all three models indicates that green areas cannot be simply ignored in urban drainage modelling studies.


Introduction
Stormwater management is a challenge for cities around the world and has traditionally been addressed using pipe-based drainage networks. These networks are costly to construct and maintain and may potentially exacerbate flooding and water quality issues downstream. Green infrastructure attempts to solve these problems by managing stormwater in a more natural way (see e.g. Eckart et al., 2017;Fletcher et al., 2013). Effective and efficient design and operation require the ability to analyse and predict the performance of such green urban drainage systems. This is often conducted using one of several simulation models available for urban drainage systems (see e.g. Elliott and Trowsdale, 2007). Such models are always simplifications of reality, which result in uncertainties associated with their use. Understanding the sources and describing the magnitude of these uncertainties is necessary if the models are to contribute to optimal management and design of stormwater systems (e.g. Deletic et al., 2012). A lack of understanding could lead to either over-dimensioned, unnecessarily expensive drainage systems, or under-dimensioned systems that lead to flooding and damages.
Uncertainties in urban drainage modelling arise from different sources that may be divided into three groups: model input uncertainties, calibration uncertainties and model structure uncertainties . The uncertainties related to input data and the calibration process have been studied to some extent, albeit more extensively for traditional pipe-based drainage systems than for green infrastructure. Dotto et al. (2014) showed that calibration can to some extent compensate for measurement errors. Following developments in natural hydrology, different methods for estimating parameter uncertainty have been applied to urban drainage modelling (Freni et al., 2009;Dotto et al., 2012), including mathematically formal Bayesian methods (e.g. Kleidorfer et al., 2009;Dotto et al., 2011Dotto et al., , 2014Wani et al., 2017) and mathematically informal methods such as GLUE (e.g. Thorndahl et al., 2008;Breinholt et al., 2013;Sun et al., 2013). Improvements in the calibration procedure have been proposed based on higher-resolution input data (Krebs et al., 2014;Petrucci and Bonhomme, 2014) or a more systematic way of distinguishing the parameter sets that provide an acceptable match between simulated T and observed values in order to estimate parametric uncertainty (Vezzaro et al., 2013).
Despite this growing body of research on urban drainage modelling, model structure uncertainty has received little attention. Although model results for green areas are sometimes compared (e.g. infiltration models in Duan et al., 2011), most comparisons do not explicitly discuss the applied model structures. A review by Elliott & Trowsdale (2007) mainly describes the practical and technical differences between a number of common urban drainage models in terms of model uses, temporal and spatial resolution, runoff generation, flow routing, representation of low-impact development facilities and user interface. This review does not consider the differences between the conceptual models used by each of the models (compare e.g. DHI, 2017a; Rossman and Huber, 2016). Understanding these differences and their implications is valuable for three reasons. First, these differences can be expected to give rise to differences in the results of modelling studies, depending on which model is utilised. It may be argued that models are usually calibrated to field measurements and that this will reduce or even eliminate the differences between their outcomes. However, this argument fails to consider that different calibrated models may predict similar outflow rates, while diverging in other aspects of the water balance (Kaleris and Langousis, 2016;Koch et al., 2016). For urban drainage models specifically, pipe leakage, infiltration and misconnections may also affect the measured flow rates, and these issues cannot be identified from outflow measurements alone. Such divergence could lead to conflicting outcomes if different models are used to forecast the performance of urban drainage systems, for example, examining the differences between various climate and/or development scenarios (see Karlsson et al. (2016) for an example of this in a natural catchment). Second, model calibration is often performed by adjusting only a subset of model parameters based on a sensitivity analysis. This kind of analysis is traditionally performed using one of the many experimental numerical techniques available, but it could also benefit from an improved theoretical understanding of the ways in which different parameters affect the model results. Third, models may be used to predict more extreme events than they were calibrated for, and understanding the ways the model might react to more extreme rainfall input can help in understanding the uncertainty involved.
The three reasons outlined above equally apply to models of traditional and green urban drainage systems. One of the aims of the increasing use of green infrastructure is to locally retain and store water for a longer period. Thus, from a theoretical perspective, models of such systems would also need to include slower in-soil processes such as infiltration, groundwater recharge and evapotranspiration (Fletcher et al., 2013;Salvadore et al., 2015). This was also demonstrated by Davidsen et al. (2018), although they only modelled infiltration capacity using Horton's equation and did not consider the fate of the water after it had infiltrated. The effect of including a more complete description of in-soil processes (more similar to descriptions used in models of natural hydrologic systems) on urban runoff processes at a fine temporal and spatial resolution has not yet been studied extensively.
In addition to the theoretical considerations in the previous two paragraphs, there are also experimental indications that model structure can play a significant role in urban drainage modelling studies. Fassman-Beck and Saleh (2018) found that the EPA Storm Water Management Model (SWMM) displayed some counter-intuitive behaviour when modelling a bioretention system. For green roof modelling in SWMM, Peng and Stovin (2017), for example, found that the green roof module requires further modifications. Leimgruber et al. (2018) found that water balance components were insensitive to most parameters involved in the process (which could mean that the model structure is unnecessarily complex) and Johannessen et al. (2019) raised concerns about the large variability in calibrated model parameters and their non-correspondence with material properties. Within runoff quality modelling, the validity of traditional build-up/wash-off models was shown to be limited to a subset of cases (Bonhomme and Petrucci, 2017;Sandoval et al., 2018).
Considering the above, there are indications that model structure uncertainty could considerably impact modelling outcomes, but it has received relatively limited attention in research on modelling green urban drainage. Thus, this paper aims to further the understanding of model structure uncertainties in urban drainage models that specifically address green areas. The research questions in this paper are: 1. What is the impact of model structure on the predicted runoff and evapotranspiration in green plots with different soil textures and depths in an urban context? 2. How can the differences between conceptual model structures be linked to differences in model results?
Answering these questions could ultimately contribute to a better understanding and applicability of urban drainage models for green infrastructure.

Models
The three models used in this paper were the storm water management model (SWMM, version 5.1.011, U.S. Environmental protection Agency), MOUSE RDII (hereafter referred to as 'MOUSE', release 2017, DHI -Danish hydrological Institute) and MIKE SHE (hereafter referred to as 'SHE', release 2017, DHI). (Note that the PCSWMM (CHI) interface was used for initial setup of the model, but the final model runs were performed with the standard EPA SWMM 5.1.011 executable.) A brief description of the conceptual models is given below and in Fig. 1. For additional details see the official documentation for each of the models. An overview of all model parameters may be found in Table 1. SWMM and MOUSE are semi-distributed models, where the study area may be divided into subcatchments of arbitrary shape and size. Parameter values are set for each subcatchment. SHE, on the other hand, is a fully distributed model that uses a regular rectilinear grid to divide the study area into a large number of cells, and different parameter values can be assigned to each cell. All three models contain three possible storage compartments for each catchment: surface, unsaturated zone (UZ) and saturated zone (SZ) storage. The ways in which water is moved between these three storages and out of the catchment vary between the models. All three models have methods to account for snowfall and snowmelt, but since snow periods were not considered in this study these methods are not described here.

SWMM
SWMM (Rossman and Huber, 2016) conceptualizes each catchment as a single soil column that is divided into an unsaturated and a saturated zone by a variable groundwater level. The surface storage is filled by precipitation (supplied as a time series by the user) and emptied by overland flow, infiltration and evapotranspiration. Overland flow is described using Manning's Formula, which, for the case of overland flow in a catchment, can be written as: where n is Manning's coefficient (which depends on the roughness of the surface, including vegetation), S is the slope of the surface, d is the depth of the water, and W the width of the flow. In ideal cases (as in this study), W is simply the width of the catchment, but in more complex cases it may be considered to be a flow routing parameter instead. The water depth d is replenished by rainfall (provided as input data to the model) and reduced by surface runoff and infiltration.
For infiltration, the Green-Ampt model (Green and Ampt, 1911) was selected (SWMM also supports the Horton and Curve Number methods). This model assumes that, initially, all precipitation will infiltrate, until a saturated layer develops at the top of the soil and the maximum infiltration rate becomes: where the infiltration rate f p is controlled by two soil hydraulic parameters (saturated hydraulic conductivity K sat and suction head ψ s ), the initial soil moisture deficit d at the start of the rainfall event and the cumulative infiltration volume F during the rainfall event. As the soil becomes more saturated the infiltration rate asymptotically approaches the saturated hydraulic conductivity. The rate of recovery of infiltration capacity during dry periods is calculated from: where dmax is the maximum possible moisture deficit, i.e. the difference between soil porosity and residual moisture content. Higher saturated hydraulic conductivity (K sat ) leads to a faster recovery of infiltration capacity. If rainfall occurs long enough after the previous rainfall (the time is calculated as = T r K 4.5 sat ), the initial moisture deficit d for the Green-Ampt calculation is set to the unsaturated zone moisture content. Thus, the recovery of infiltration capacity is initially based only on K sat , but after sufficient time has elapsed it is affected by evapotranspiration and also by percolation from the unsaturated to the saturated zone. From the unsaturated zone, water can percolate to the saturated zone at a rate calculated from: where s is the maximum soil water content (i.e. porosity), cur is the current unsaturated zone soil water content and HCO is a soil parameter representing the slope of the conductivity vs. soil water content curve. The percolation rate grows exponentially (up to the value of K sat ) as the water content in the unsaturated zone increases. Water can drain from the saturated zone according to a linear reservoir equation, which was used for consistency with MOUSE (see Section 2.2). Evapotranspiration takes place initially from surface water and after this is depleted from both the unsaturated and saturated zone simultaneously. A user-supplied factor (between 0 and 1) determines what fraction of potential evapotranspiration is assigned to the unsaturated zone, with the remainder being assigned to the saturated zone. In this study, this factor was set based on the root zone depth according to Shah et al. (2007). Their work provides values for the division of evapotranspiration between the saturated and unsaturated zones based on groundwater depth. Thus, the average groundwater table from an initial SWMM run was used to obtain an estimate of the upper zone evaporation factor (UEF). This was then iterated until there were no further changes in the average groundwater table. Evapotranspiration from the unsaturated zone takes place at the potential rate supplied as input data to the model. Evapotranspiration from the saturated zone is calculated using: where d u is the groundwater depth and DEL is the groundwater depth below which evapotranspiration is disabled. The potential evapotranspiration rate e max can either be provided by the user directly (as in this study) or calculated from temperature data using Hargreaves's method.

MOUSE
In MOUSE, each catchment consists of a surface storage (with current storage U), a root zone storage (with current storage L) and a saturated zone, which is described by the depth of the phreatic surface below the ground level, meaning that it is (theoretically) infinite in size. The surface storage is filled by rain (provided as time series by the user) and emptied (slowly) by evapotranspiration and interflow: where IF is interflow, CK IF is a user-supplied coefficient, the factor L/ L max is the fraction of the root zone storage that is currently filled, and T IF is a threshold for L/L max below which interflow is disabled. (Note that the value of L/L max can theoretically vary between 0 and 1, in contrast to the soil volumetric water content used by SWMM, which can vary only between the soil residual moisture content and the soil porosity.) Since CK IF has a typical value of 500 to 1000 h and the surface storage U is quite small, interflow is usually also small. Water in the surface storage cannot infiltrate into the soil. When surface water exceeds the maximum storage (U max , comparable to depression storage), the excess precipitation P n is divided into overland flow, groundwater recharge and infiltration into the unsaturated zone according to: where OF is overland flow, G is groundwater recharge, I is infiltration into the unsaturated zone, CQ OF is a user-supplied coefficient (between 0 and 1) and T OF and T G are threshold values which L/L max must exceed in order to activate overland flow and groundwater recharge, respectively. The values of OF, G and I are limited to positive values only. Higher values of L/L max indicate a more saturated root zone storage which increases overland flow and reduces infiltration into the root zone storage. Overland flow is routed through two consecutive identical linear reservoirs. Evapotranspiration is initially drawn from the surface storage (at the potential evapotranspiration rate), and then from the unsaturated zone according to: where E a and E p are the actual and potential evapotranspiration rates, respectively. Potential evapotranspiration rates are provided by the user. There is no percolation from the unsaturated to the saturated zone, but capillary flux does take place the opposite way. The saturated zone acts as a linear reservoir generating groundwater outflow (DHI, 2017a).

SHE
Mike SHE combines a number of physically-based methods to describe hydrological processes. Overland flow is described by the diffusive wave approximation of the Saint-Venant equations for 2D shallow water flow. As part of this, SHE uses Manning's coefficient to describe the friction that the flow experiences from the surface. In effect, this means that SHE uses Manning's equation like SWMM. However, this equation is applied to each grid cell individually, rather than on a catchment scale. Thus, surface water depths (and processes that depend on this, e.g. infiltration) can vary throughout the catchment area.
Although not used in this study, the user-supplied precipitation rates can also vary throughout the catchment area.
Water movement in the unsaturated zone (including infiltration) is described using the 1D Richards' Equation, which describes the vertical movement of water in the unsaturated zone. It requires a description of how well the soil retains water. In this study, the Van Genuchten Equation (van Genuchten, 1980) was used for this: where , s and r are the current, saturated and residual soil moisture content, respectively, ψ is the tension head (i.e. the attractive force between water and soil particles, calculated when SHE solves Richards' Equation), and α and n are soil-specific parameters. Higher values of these parameters are associated with looser soils (Schaap et al., 2001). Water movement in the saturated zone is described using a 3D Darcy method. In its simplest form, flow between two grid cells is calculated using: where Q is the flow rate, Δh is the difference in piezometric head (i.e. the water level in unconfined aquifers as used in this study), and C is the conductance. In SHE the conductance is calculated from the saturated hydraulic conductivity parameter of the soil, so the saturated zone flow is mainly controlled by this parameter. As a boundary condition, the groundwater level was fixed at the bottom of the soil profile on the downhill boundary and excess groundwater allowed to drain from the model at this point (DHI, 2017b).

Synthetic catchments
In order to examine how the different models react to changes in soil type and depth, the simulations combined eleven different soil types (from the standard classification by the U.S. Department of Agriculture) with six different depths (0.5, 1.0, … 3.0 m) for a total of 66 soil profiles. The eleven soil types were numbered: 01sand, 02loa-my_sand, 03sandy_loam, 04loam, 05silt_loam, 06sandy_clay_loam, 07clay_loam, 08silty_clay_loam, 09sandy_clay, 10silty_clay, 11clay. The soils are numbered in order of decreasing hydraulic conductivity. However, it should be noted that not all other soil hydrological properties follow the same pattern. The depth of the soil is appended when referring to single soil profiles, e.g. 11clay_3.0 for the 3-metre deep clay profile. Each soil profile is used for one catchment that is 240 m × 240 m, covered in grass, with a 2% gradient in one direction.
For each soil profile, parameters for all the models were set to typical values for the soils from literature and based on the models' official documentation (MOUSE: DHI, 2017a, SHE: DHI, 2017b; SWMM: Rossman and Huber, 2016). The sources of the different parameter values are listed in Table 1, while the actual parameter values can be found in Table S1 in the Supplementary information to this article.
Generic parameter values for the groundwater reservoirs were not available, so the parameter values for SWMM and MOUSE were selected based on results from SHE. Starting with an empty catchment, a SHE simulation for the saturated zone only was run with a constant rainfall rate and the groundwater outflow rate was obtained for each soil type. The time constant of a linear reservoir with the same rainfall input was then calibrated to this groundwater outflow data and the resulting reservoir constant was implemented in both SWMM and MOUSE. Although SWMM supports more complex formulations of the groundwater reservoir, these were not used in order to maintain a comparable setup in all three models.
In MOUSE, overland flow and groundwater recharge only take place if the relative root zone moisture content L/L max is above a certain threshold value. The reference manual states that "the threshold values should reflect the degree of spatial variability in the catchment characteristics, so that a small homogeneous catchment is expected to have larger threshold values than a large heterogeneous catchment" (DHI, 2017a). The catchments in this study are homogeneous. However, setting high values for the thresholds would result in the soil moisture having no impact on the infiltration capacity until the soil was very wet, which we considered unrealistic. Instead, the threshold values were set at 0.2 for overland flow and 0.4 for groundwater recharge, so that (in numerical experiments with constant rainfall) the infiltration rate develops in the same way as in Horton's infiltration model. In addition, MOUSE contains two flow routing time constants (for overland flow and interflow, respectively) that lack a clear way of determining them except via calibration. The overland flow constant was set so that the end of the runoff from MOUSE roughly matched the end of the runoff from SWMM for one rainfall event. The lack of a clear physical basis in this approach means that the shape of the surface runoff hydrograph cannot be compared directly. However, since only the runoff routing is affected (and not the runoff generation), total runoff volumes can still be compared.

Meteorological data
The precipitation input is a 26-year rain and snow record from Trondheim, Norway (5-minute temporal resolution, 0.1 mm tipping bucket, see Thorolfsson et al. (2003)). The potential evapotranspiration (PET) was first calculated using the FAO Penman-Monteith (Allen et al., 1998) formula with eleven years of radiation data from a different meteorological station located about 1 km away (MET Norway, 2018). A generalized formulation of Hargreaves's equation (Almorox and Grieser, 2015) was then calibrated to this PET using temperature data from the precipitation station. The calibrated formula was then used to estimate PET for the entire simulation period based on temperature from the precipitation station. It was not possible to use the Penman-Monteith PET directly, since the required solar radiation data was not available for the entire study period.

Simulation and analysis
The simulations were run as one continuous simulation from which the relevant results for different time periods were extracted. The models were compared on both seasonal and event basis. Since the focus of this article is on rain rather than snow it considers the typically snow-free period from April-October for seasonal comparisons. Rain events are defined with a minimum 3-hour antecedent dry weather period (ADWP), as well as a total depth of at least 2 mm and an average intensity of at least 0.01 mm/h (Hernebring, 2006). The first three hours after the end of each rain event (i.e. up until the earliest possible start of the next rainfall) are included in the analysis of the events. The first year of simulations was used as an initialization period and not included in the analysis.
In the event scale analysis, rainfall events that cause mass balance errors greater than 5% (of the event rainfall) in at least four soil profiles were excluded from the analysis. (The mass balance is calculated as the precipitation volume minus the volumes of outflows and storage increases.) This is because such errors would normally render the simulation outcomes unacceptable and the modeller would typically attempt to eliminate these errors by changing computational settings such as the time step control. Given the length and number of simulations carried out in this study, it was not feasible to further reduce the size of the time steps. In order not to contaminate the results with known erroneous mass balances, these needed to be excluded. It would be possible to exclude mass balances only for the soil profiles and events that had larger errors. However, this would mean that different soil profiles and models employ different sets of events, making comparison between them inaccurate. Thus, the entire rainfall event was excluded if four or more soil profiles had a mass balance error of more than 5% in any of the models. Events for which only a limited number of soil profiles (maximum three) caused mass balance errors were included to avoid over-limiting the set of events.

Total available soil storage
First the (theoretically) available soil storage volumes in the I. Broekhuizen, et al. Journal of Hydrology X 5 (2019) 100044 different models were compared, see Fig. 2. For SWMM and SHE, the maximum storage is calculated according to Eq. (13): where S max is the maximum water storage capacity (mm), d the soil depth (mm), sat (-) the saturated soil water content (porosity) and wp (-) the water content at wilting point. Here it is assumed that soil water content will never fall below wilting point. For MOUSE, the maximum storage (mm) is calculated according to Eq. (14): where d g is the depth (mm) at which capillary flux from SZ to UZ equals 1 mm d −1 , d r (mm) is the effective root depth, s y (-) is the specific yield, and fc (-) is the soil water content at field capacity, and L max is the maximum root zone storage (mm) calculated according to the model's reference manual (DHI, 2017a). It should be noted that the root zone storage is not expected to empty completely under moderate climate conditions, but since there is no clear minimum value of actual root zone storage, the entire value of L max is included in the calculation of total soil storage capacity. Fig. 2 shows that total storage capacity is usually larger in MOUSE for shallower soils, but similar to SWMM/SHE for deeper soils. Silt loam has a larger storage capacity in MOUSE throughout, which is attributable to the large difference between field capacity and wilting point for this soil. The results in this section show that even at the model setup stage there may be significant differences between the various models, although the magnitude of these differences depends on soil type and soil depth. There is, of course, uncertainty involved in estimating parameter values from the literature without calibration. However, it should be noted that it is common practice to directly estimate values for the parameters, since it may not be possible to estimate values for all parameters during model calibration. This can be the result of e.g. the calibration data not containing enough information to identify all parameters or the model structure being overly complex. Although model calibration should be performed whenever possible (since it provides the information that is most pertinent to the studied catchment), it may be unavoidable to base estimates for some parameters on literature values, and in that case understanding the effect of the chosen values is valuable in understanding the uncertainties in the study.

Seasonal runoff
For all soil profiles, the average percentage surface runoff from April-October was highest in MOUSE and lowest in SHE, except for some deeper soils for which SWMM had the lowest runoff (see Fig. 3). The difference between models for the same soil profile reached up to 37 percentage points (11clay_3.0), which was higher than the range between the different soil profiles in SHE (19 pp) and SWMM (31 pp), with only MOUSE showing more variation between soil profiles (50 pp). The sensitivity to changes in soil type (i.e. the change along the horizontal axis in Fig. 3) was the highest in MOUSE (although much of the change occurred in the first four soils) and the lowest in SHE. The sensitivity to changes in soil depth (i.e. along the vertical axis in Fig. 3) was highest in SWMM and lowest in SHE. SWMM and SHE showed a larger change in annual percentage runoff than MOUSE when the soil depth changed from 0.5 m to 1.0 m. The pattern visible in the graph for MOUSE is, as expected, similar to the pattern of the Overland Table 1 Sources of parameter values for all three models. The actual parameter values for all soil types and soil depths are provided as Supplementary material, since this table is impractically large for printed publication. "x" denotes parameters that were input directly into the models, "i" denotes parameters that were used to calculate other parameters.

Soil water content
Porosity (θ sat ) x x Rawls et al. (1983Rawls et al. ( , 1982 Wilting point (θ wp ) i Field capacity (θ fc ) i Suction head (Ψ s ) x Initial deficit (θ dmax ) x Porosity -field capacity, i.e. the maximum empty pore volume available when the soil cannot be drained further by gravity Root zone storage (L max ) x L max = (θ fc − θ wp ) × d r (DHI, 2017a)

Hydraulic conductivity
Saturated hydraulic conductivity (K sat ) x x Conductivity slope (HCO) x Directly from Table 5-9 in Rossman and Huber (2016). Van Genuchten α, n and L x Fitted to data from Rawls et al. (1983); also using Schaap et al. (2001) Evapotranspiration Effective root depth (d r ) i i Shah et al. (2007), grass land cover. Used for lower evap. depth (SWMM) and L max (MOUSE). Lower evap. depth (DEL) x Smallest value of effective root depth and soil depth Upper evap. Fraction (CET or UEF) x Fraction of subsurface evapotranspiration assigned to the unsaturated zone. Based on work on root depth and extinction depth by Shah et al. (2007).

Surface flow
Manning's coefficient (n) x x From  ) x Based on recommendations from the model's developer (DHI, 2017a), extreme values were set for shallowest and deepest sand and clay soils, then other soil profiles were interpolated based on field capacity and soil depth. TC overland flow (CK OF ) x See explanation in text TC interflow (CK IF ) x Threshold overland flow (T OF ) x See explanation in text Threshold groundwater recharge (T G ) x I. Broekhuizen, et al. Journal of Hydrology X 5 (2019) 100044 Coefficient (CQ OF ), see Table S1 in the Supplementary material. The cumulative distribution of seasonal (April-October) percentage runoff (25 seasons in total) allows examination of the inter-annual variation in runoff in the different models (see Fig. 4). MOUSE had the lowest variation between different years (i.e. the lines in the figure are close to vertical), while the variation was similar for SWMM and SHE. For all models, the inter-annual variation was higher in shallower soils than in deeper soils, but this effect was much weaker in MOUSE than in the other two models. For all three models, soil types that generated minimal runoff also had less inter-annual variation. The inter-model variation between MOUSE and SWMM or SHE was higher than the inter-annual variation in MOUSE. The large inter-model variation could significantly impact, for example, studies that address the effects of climate change or the effects of urbanization and green drainage infrastructure on urban water balances.
An explanation of the differences described above may be found in the approach to infiltration in the different models. In MOUSE, whenever the root zone moisture content L/L max is above a certain threshold and the surface storage is full, part of the precipitation becomes overland flow, based on the fixed coefficient CQ OF and current moisture  Table 1 for sources of parameter values. content L/L max . The surface storage has a limited capacity (2.5 mm) and empties only slowly so it will be full relatively often. Emptying of the root zone storage through evapotranspiration slows down as L/L max becomes lower (see Fig. 1), so it will only reach below the threshold during long, dry periods, which are rare in the climate of the study site (see Fig. 7). On the other hand, SWMM and SHE adjust the maximum infiltration rate based on the moisture content of the top layer of soil and when this is exceeded, surface runoff is generated. In other words, the different models predict different percentages of surface runoff, since they generate runoff in different circumstances: SWMM and SHE only generate runoff if rainfall intensity is higher than infiltration capacity (which depends on the soil type and the wetness of the top layer of the soil), whereas MOUSE will nearly always generate at least some runoff in a rainfall event. The higher inter-annual variation in SWMM and SHE can be explained by different factors. Firstly, as described earlier in this paragraph, the models are more sensitive to rainfall    Broekhuizen, et al. Journal of Hydrology X 5 (2019) 100044 properties and initial soil conditions, both of which may vary from year to year. Secondly, both SHE and SWMM divide the fixed depth of the soil profile into UZ and SZ by the variable groundwater table, which reflects the antecedent conditions on a relatively long time scale, and can therefore vary relatively strongly from year to year. In MOUSE the UZ and SZ are less well connected, and the groundwater elevation will only stop infiltration once it reaches the surface.

Seasonal evapotranspiration
Distribution plots of the annual percentage of evapotranspiration are shown in Fig. 5. SHE had higher evapotranspiration than the other two models, even exceeding 100% of rainfall in some cases. This may be attributed to water that is present in the soil following winter evaporating during the subsequent summer, i.e. the period analysed here. The absolute values for SHE were comparable to previously reported modelled values for the same region (Engen-Skaugen et al., 2005). It should also be noted that the most important parameters that influence evapotranspiration in SHE have well-established values (soil hydrological properties from Rawls et al. (1982); leaf area index and root depth from the model's developer (DHI, 2017c)) and were therefore not considered a major source of uncertainty in the model setup.
Of the other two models, SWMM had higher evapotranspiration than MOUSE, except for the deep sandy soils. Both SHE and MOUSE were insensitive to changes in soil type (from left to right in Fig. 5), but SWMM had higher evapotranspiration for clayey soils than for sandy soils. The same applied to sensitivity to soil depth, with only SWMM showing increased evapotranspiration for shallower soils. Although seemingly counterintuitive, this may be explained by SWMM dividing evapotranspiration by a fixed factor between the unsaturated and saturated zones. In this study this factor was set based on root zone depth (Section 2.1.1), which meant that a larger fraction of potential evapotranspiration was assigned to the unsaturated zone in shallower soils. The actual evapotranspiration from the unsaturated zone is equal to the apportioned part of the total, but reduced (based on groundwater depth) for the saturated zone, thus increasing evapotranspiration when the root zone covers a large part of the soil depth, as is the case in shallow soils.
One of the benefits and goals of green infrastructure is to retain water in place, rather than generating runoff. This restores or maintains a water balance which is closer to the natural water balance than would be the case with pipe-based drainage systems. Hydrological models can be used to assess the water balance of existing or proposed drainage systems. The results in this section and the previous section show that if models are used to help assess the water balance of a green infrastructure system, different models may find different results. From a hydrological perspective it is clear that this may influence the design of drainage systems. Increasing evapotranspiration through the increased use of green infrastructure may help reduce the urban heat island effect and its negative consequences (see e.g. Arnfield, 2003;Gill et al., 2007). Information from hydrological models may help inform decisions in urban planning processes, and the uncertainties of the results in these models should be properly understood in order to make optimal use of the information.

Mass balance errors in rainfall events
Although not a primary objective of this study, avoiding mass balance errors is important in any hydrological study and a brief overview is therefore included here. Fig. 6 shows cumulative distribution plots of the mass balance errors for all soil profiles and events. (Mass balance errors on the seasonal scale were negligible and are therefore not presented here.) A mass balance error-free model would show a vertical line at 0% event rainfall and mm, respectively, and it is clear from the plots that this was not the case for any of the three models. SWMM had the smallest mass balance errors and MOUSE the largest. In SWMM, most errors were negative (i.e. outflows larger than inflows + changes in storage), while SHE and MOUSE had mostly positive errors (i.e. some of the rainfall was unaccounted for in outflows and storage changes). As described in section 2.4, rainfall events were excluded from the analysis if one of the models caused a mass balance error of more than 5% for four or more soil profiles. In total, 1,192 rainfall events (out of 2,026 total) remained part of the comparison. For the individual models, SHE would retain 1,452 events, MOUSE 1,623 and SWMM 2,005. This shows  I. Broekhuizen, et al. Journal of Hydrology X 5 (2019) 100044 that the larger mass balance errors in MOUSE were concentrated in a more limited number of rainfall events than in SHE. The differences in mass balance error behaviour mean that the choice of one model over another can significantly affect the computational aspects of modelling studies, and it cannot be assumed (without verifying) that models are free of such computational errors.

Initial conditions for rainfall events
The distribution of UZ water content (relative to total soil pore volume for SHE and SWMM; relative to maximum root zone storage for MOUSE) is shown in Fig. 7. For SHE, it should be noted that soil infiltration capacity depends on moisture content in the top layer of the soil, but average SWC in the entire UZ is shown here in order to facilitate comparison with the other models. For SWMM, it is possible that if the antecedent dry time of an event is relatively short, then the initial moisture used in the Green-Ampt equation is not set equal to the value of UZ SWC, but is derived from the Green-Ampt recovery calculation that followed the previous event. However, it is not possible to obtain this value from the SWMM output files. From the figure, it is clear that initial conditions were most variable in SHE and least variable in SWMM. MOUSE was relatively insensitive to changes in both soil type and depth, both SWMM and SHE were sensitive to soil type, and SHE was also somewhat sensitive to soil depth. The higher variability and the presence of low values in SHE can be explained by the higher evapotranspiration (see section 3.3). The results in Fig. 7 show that care needs to be taken in determining the appropriate initial conditions for simulations of green areas, especially when using SHE or (to a lesser degree) MOUSE. Only for clayey soils in SWMM can it be considered appropriate to use a standard value for different events.
These results support previous findings that the infiltration capacity of green areas may be reduced due to initial soil wetness, even for sandy soils (Davidsen et al., 2018). Using Horton's equation for infiltration, Davidsen et al. found that the initial infiltration capacity ranged from 45% to 100% of the maximum. Although this figure cannot be compared directly with the relative UZ SWC from our results, the variations found here may be smaller or larger, depending on the model used. Fig. 8 shows the percentage of rainfall events that generate at least 1 mm of runoff for each soil profile and model. This revealed the same patterns as the seasonal percent runoff, i.e. MOUSE had the highest number of runoff events and SWMM had the lowest. However, SWMM predicted more runoff than SHE for shallow clayey soils. SWMM was most sensitive to changes in soil type (ranging from 1% of events for shallow sand to 56% for shallow clay), while MOUSE and SHE were sensitive to changing soil type for the three sand soils, but only somewhat sensitive to changes in the other soil types. MOUSE was insensitive to changes in soil depth (6-8% change between shallowest and deepest soil, except for sandy soils), SHE was somewhat sensitive (up to 33%) but not for clay soils, and SWMM was most sensitive to changes in soil depth (e.g. approx. 40% difference between deepest and shallowest soils). This last observation was unexpected since SWMM used the Green-Ampt model for infiltration in which the infiltration capacity becomes limited as the very top layer of soil becomes wetter, and this would be expected to give similar infiltration capacities (and therefore surface runoff) in shallow and deep soils. Davidsen et al. (2018) found no runoff from sandy soils when simulating infiltration capacity, but not the fate of infiltrated water (using Horton's equation). Here we found that runoff does occur for shallow sandy soils, but less for deeper sandy soils, suggesting that the long-term storage of water in the saturated and unsaturated zones can also lead to the generation of runoff. Our findings in this section and in Section 3.2 further underline that green areas may contribute to runoff in urban areas during many rainfall events and cannot be simply ignored in studies of catchment runoff. It should also be noted that actual runoff from urban green areas may be even higher or more frequent if the soil is compacted (see e.g. Gregory et al., 2006;Muthanna et al., 2018;Pitt et al., 2008).

Runoff during rainfall events
Distribution plots for the event percentage runoff are shown in Fig. 9. For most soil profiles, SWMM and SHE showed similar bi-modal distributions, i.e. events tended to generate either less than 10% or more than 60% runoff with relatively few events generating runoff between these percentages. This effect was clearer for larger rainfall events (compare the upper and lower panel of Fig. 9). (Although the distributions for SHE and SWMM appeared to be similar, the standard two-sample Kolmogorov-Smirnov test at the 95% significance level showed that they may not be considered to be samples from the same distribution.) This behaviour was not present in MOUSE, where interevent variation was smaller except for sand soil. For rainfall events larger than 10 mm (lower panel in Fig. 9) there are no events for which MOUSE does not predict any runoff (except for sandy soils), while SWMM and SHE predict some of these events even for clay soils. The observations above may be attributed to the mechanisms described in Section 3.2, i.e. the higher sensitivity of SWMM and SHE to varying rainfall properties and initial conditions and the larger influence of the groundwater table. In addition, the smaller inter-event variation in MOUSE can be explained by negative feedback effects where water infiltrating into the root zone will increase the root zone storage L/L max , which then lowers infiltration and groundwater recharge at the next time step. In a design context, it is important to note that the differences in the distributions found here would be expected to influence the return intervals of runoff events, and in this way lead to different models, resulting in different designs of urban drainage systems. In addition, this suggests that if the distribution of the observed event percentage runoff is similar to the unimodal distribution shown by MOUSE it may be more difficult to match this in a multi-event calibration with SHE or SWMM, and vice versa.

Sensitivity to rainfall intensity
The dependence of simulated surface runoff on rainfall intensity is visualized in Fig. 10. With the exception of SHE for sandy soil, there was no clear increase of event percentage runoff with increasing rainfall intensity for intense events with > 15 mm rain in 2 h. For such intense events the percentage runoff for clay soils would be expected to be closer to 100% than the figure shows. This may be attributed to the fact that these events are observed events where intensity will vary throughout the event, where the most intense two-hour period is used in the figure. This means that the events also can contain less intense periods where little runoff is generated. Only two intense events (one event with 15.4 mm in two hours and a total of 51.9 mm over 34 h; one event with 24.7 mm in two hours and a total of 46.8 mm over 24 h) had close to 100% runoff (in both SWMM and SHE). This shows that the variability of rainfall within events makes it challenging to analyse the relationship between rainfall intensity and runoff using observed rainfall data. For shallow sand soil, SHE showed a group of events in which surface runoff increased roughly linearly with rainfall intensity, which did not exist for the other models. This is likely caused by high groundwater levels in the soil. For clay soil, events with an intensity of less than 15 mm in two hours were split into two groups for SHE: one group with runoff of > 75%, and another group with less runoff, but both groups showed a slight increase in runoff for increasing rainfall intensities. For SWMM it was mainly the first group that was visible for shallow clay soil and mainly the second group that was visible for deep clay soil. For MOUSE there was no relation between rainfall intensity and runoff at all (for any of the soils), which may be attributed to the fact that MOUSE does not feature a limited infiltration rate like SHE and SWMM.

Conclusions
The objective of this study was to examine the differences in model structure of three models (SWMM, SHE and MOUSE) used for urban drainage and their impact on simulation outcomes. This comparison was made for varying soil types and varying soil depths. The differences found between the three different models are described below.
On a seasonal scale, the differences between the models are large (e.g. up to 33 percentage points for seasonal runoff for the same soil profile) compared to those between different soil types. MOUSE   Fig. 8. Percentage of rainfall events (n = 1192 events) that generated at least 1 mm of runoff. I. Broekhuizen, et al. Journal of Hydrology X 5 (2019) 100044 generates much more runoff than the other two models. Runoff in SHE and MOUSE is mainly sensitive to soil type, and less so to soil depth, while SWMM is sensitive to both. Inter-annual variation is higher in SHE and SWMM than in MOUSE. On an event basis, the inter-model variation of runoff generation is similar to the seasonal scale, while statistical distributions of event percentage runoff are different. While MOUSE predicts between 10% and 60% runoff for most events, SWMM and SHE mainly predict either less than 10% or more than 60% event runoff. There was considerable inter-event and inter-model variation in the initial conditions for rainfall events. The sensitivity to rainfall intensity for different soil types also varied between the three models.
The observed differences may be linked to differences in the model structures. In MOUSE, the relative root zone moisture content L/L max influences most of the important processes (runoff generation, infiltration, groundwater recharge), creating negative feedback loops which reduce the sensitivity of the model output to variations in soil type, soil depth and meteorological conditions. In SWMM and SHE, saturation of the top layer of soil can limit the maximum infiltration rate, making runoff generation more sensitive to rainfall characteristics and increasing inter-event and inter-annual variation in the model outputs.
The main practical implications of these findings are that model structure should be regarded as an influential source of uncertainty in urban drainage modelling. The differences in the simulation outcomes and in the responses of the models to changes in soil type and depth are large enough that they may significantly impact the outcome of various modelling studies. In the special case of model forecasts (e.g. studying the impacts of climate change or changing land use, where calibration is, per definition, impossible) the magnitude of the uncertainties arising from model structure should be explicitly considered in the analysis. In addition, the results from this paper further underline that green areas may contribute significantly to runoff in urban areas and should not be simply ignored.
Even if different models are first calibrated to, for example, runoff measurements, it cannot be expected that the results will be the same for other components of the water balance or in terms of inter-event or inter-annual variation. The differences that were observed in the distribution of the event percentage runoff may also mean that different models may encounter different challenges while calibrating them to multiple events, for example, different trade-offs between the different events might be necessary. If flow rate data is available, the values of the subset of parameters that are identifiable from flow rate data can be calibrated. After this, the approach shown in this paper can be used to estimate the range of model outcomes that is still possible for the different values of the parameters that were not identified from the flow Fig. 9. Distribution plots of the percentage runoff for the simulated rainfall events (n = 1192 events) The upper half of the figure shows events with total rainfall of less than 10 mm, the lower half of the figure shows events with more than 10 mm rainfall. The line for SWMM is heavier only to improve its visibility.
data. Finally, the results from this paper show that different models set up with the explicit goal of representing identical simple synthetic catchments may give strongly varying results. Thus, the direct comparison of parameter values between models may not be meaningful.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. I. Broekhuizen, et al. Journal of Hydrology X 5 (2019)