Hydrology: Regional Studies
Simulation of river flow in the Thames over 120 years: Evidence of change in rainfall-runoff response?

Study region: The Thames catchment in southern England, UK.
Study focus: Modelling with 124 years of rainfall, potential evaporation (PE), temperature and naturalised flow data. Daily rainfall-runoff flow simulation using current and three historic land cover scenarios to determine the stationarity of catchment response examined through three time-frames of analysis – annual, seasonal and flow extremes. The criterion of response stationarity is often assumed in climate change impact studies.
New hydrological insights: The generally close correspondence between observed and simulated flows using the same model parameter values for the whole period is indicative of the temporal stability of hydrological processes and catchment response, and the quality of the hydrometric data. Changes that have occurred are a decrease in flood peak response times, typically two to three days pre and post the early 1940s, from change in agricultural practices and channel conveyance, and an increase of about 15% in summer flow from increase in urban land cover between the first decade of the 20th and 21st centuries. The water balance was found to be sensitive to the PE data used, with care needed to avoid discontinuity between two parts of the data record using different methods for calculation. Long-term mean annual rainfall shows little change but contrasting patterns of variation in seasonal rainfall demonstrate a variable climate for which simulated flow is similar to observed flow.


Introduction
The Thames is a highly studied catchment partly due to the length of the flow record at its lowest gauging station which, beginning in 1883, is the longest continuous flow series in the UK (Marsh and Harvey, 2012). The value of the observed record is substantially enhanced by having a companion naturalised series which takes account of major abstractions in the lower reaches of the river. This series provides a record of catchment runoff for over 100 years, through many variations in climate, land-use and channel hydraulics which have occurred within the time-span.
Hydrological models for simulating river flow from rainfall are widely available with a broad range of structures and representation of hydrological processes and a similarly broad range of purposes (Todini, 2007). One purpose has been simulating impacts of change, for example, climate and land cover, on the flow regime but models are often run with the assumption that rainfall-runoff processes in the current or baseline time period are applicable under the changed conditions (e.g. Quilbé et al., 2008; Prudhomme et al., 2013). Many models require catchment-specific calibration of model parameters through optimising fit between observed and simulated flow over a particular period of data, but increasingly research has allowed models to be set up through development of generalised relationships between physical catchment properties and model parameter values which can be applied over national or larger areas (e.g. Bell et al., 2007; Hrachowitz et al., 2013; Skaugen and Onof, 2014). This generalised approach has advantages through parameter values being not so specifically related to a calibration time period with possibly a limited range of hydrological events, or biased to wet or dry conditions, with consequent implications for use of the model under different conditions (Wilby, 2005; Merz et al., 2011). While measures of model performance using parameter values determined through generalised relationships may be slightly lower than can be achieved with catchment-specific calibration, advantages are gained in terms of temporal and spatial stability of parameter values.
Lack of pre-1961 digitised climate data (in the UK) has up to now limited historic applications of hydrological models requiring continuous daily data but, with this situation beginning to change, simulation of river flow over long time periods (∼100 years) has become possible. A European monthly data-set from 1887 (HISTALP; Auer et al., 2007) was used by Kling et al. (2012) to evaluate historic modelling of the upper Danube before applying climate change scenarios. Monthly rainfall and evaporation data were used by Jones et al. (2006) in a regression-based reconstruction of flow records for 15 catchments in England and Wales from 1865 including the upper Thames. The availability of long daily data series for the Thames provides the opportunity for a much fuller examination of temporal variability of catchment rainfall-runoff response, over a range of time scales, than has been possible up to now. By definition, extremes of the flow regime are likely to have occurred only once, or not at all, in a 30- or 40-year record so use of a much longer data period may well allow examination of more extreme events than those with which a model was calibrated, or set up, for a catchment. However, methods of data measurement and resulting data quality have changed over time and need to be considered in interpretation of comparison between observed and simulated flow series. An existing generalised model, CLASSIC (Crooks and Naden, 2007), which had previously only been run from 1961, has been used for continuous simulation of flows in the Thames from 1890 to 2013. The model is used to provide a 'benchmark' flow series with which the observed series is compared. Time series of differences between the two series are analysed, over a range of time scales, to examine the following questions.
Are the same model parameter values, mostly determined from physical catchment properties, appropriate for the whole period including flow extremes; do patterns in the difference time series relate to changes in the catchment; is data quality an issue in interpretation of these differences? The use of a model allows investigation of temporal variation in the relationship between rainfall and river flow, questions about which cannot be easily answered by separate analysis of each observed data series because of the non-linearity between them.
One hydrological model has been used for this initial investigation into rainfall-runoff response for the Thames using a long daily rainfall set, to identify the main differences between observed and simulated river flow and their possible causes. Potential drivers of differences considered (after Harrigan et al., 2014) are consistency of the rainfall, PE and flow datasets, historic land cover scenarios and change in agricultural practices and river management. Uncertainty from hydrological model structure and parameterisation and from use of statistical methods in construction of datasets and analysis of results is not fully considered in this initial investigation.
Section 2 provides an overview of the Thames catchment and the hydrological model used for the simulation, with details of the hydrometric and land cover data in Section 3. Results comparing differences between observed and simulated flows are given in Section 4 for different time scales (annual, seasonal and monthly) and for flow extremes (drought and flood events). Discussion and conclusions around the key questions, including suggestions to further this initial investigation, follow in Sections 5 and 6.

The Thames catchment
The Thames is the largest catchment in the UK, with an area above the lowest gauging station of 9948 km². It is relatively low-lying, with a maximum altitude of 330 m but characterised by three bands of higher ground across the catchment corresponding with outcrops of permeable strata (one of Jurassic limestone and two of chalk; Fig. 1). The permeable strata underlie about 45% of the catchment and are separated by comparatively impermeable clay vales, providing distinctive regions of contrasting response to rainfall. Annual average rainfall of 706 mm is fairly evenly distributed through the year but, with around 65% lost to evaporation predominantly in the summer half of the year, a pronounced seasonal pattern is imposed on the flow regime. Catchment drainage is dominated by nine main tributaries (area >300 km²) whose differing drainage characteristics are determined by their underlying geology.

CLASSIC
Flow in the Thames has been simulated using CLASSIC (Climate and Land-use Scenario Simulation in Catchments), originally developed in the mid-1990s for estimating impacts of climate and land use change in three large catchments in Britain. The semi-distributed model (Crooks and Naden, 2007) uses a grid framework overlaid with a topographic catchment boundary to simulate flows through three constituent modules: soil-moisture accounting, soil-drainage and channel routing. The first two modules operate within each grid square while the third routes runoff directly from each grid square to the catchment outlet. Catchment discharge is given by the summation of routed flow from all grid squares. A snowmelt module can be run as a precursor to the soil-moisture accounting.
Parameter values in the soil-moisture accounting and soil-drainage modules are determined from physical properties of each grid square: land cover, soil type, slope and altitude. The generalised methodology for determining model parameter values, derived using hydrometric data between 1961 and 2001, has been shown to give good simulation of observed mean daily flow for catchments (>100 km²) across Britain. The two parameters in the channel routing module, for wave velocity and channel attenuation, are derived by calibration, normally optimising on Nash-Sutcliffe efficiency (Nash and Sutcliffe, 1970) but including manual adjustment if appropriate for specific purposes, for example fit of the flood frequency curve. The channel network in a catchment is represented through a network width function for each grid square, which is the number of channels at kilometre distances from the outlet, derived from the 50 m IHDTM (Integrated Hydrological Digital Terrain Model; Morris and Flavin, 1990). The snowmelt module is based on that of Bell and Moore (1999) and is usually applied using parameter values derived as suitable for general application across Britain (Crooks et al., 2009). Uncertainties associated with using general values in the snowmelt module are discussed in Kay and Crooks (2014).
Soil type, required to determine parameter values, is from the HOST system (Hydrology of Soil Types, Boorman et al., 1995). HOST groups soils into 29 classes based on the dominant features controlling movement of water through the soil. Determining characteristics are whether the soil is mineral or peat based, whether there is an underlying aquifer, and the presence or not of an impermeable layer within the top metre. In CLASSIC, response from each HOST class in a grid square is simulated separately. Given the high percentage of permeable strata in the Thames catchment it is important that this component is adequately represented in the rainfall-runoff model. CLASSIC contains three separate pathways through the soil-moisture accounting and soil-drainage modules, for permeable and semi-permeable substrate and urban/impermeable surfaces. Areas with HOST classes 1-3 (with soils overlying major aquifers) use the permeable pathway, with the remainder (classes 4-29) using the semi-permeable pathway. A regional response-time index has also been derived to allow for variation of drainage response times within HOST classes 1-3. In the Thames catchment, all areas of HOST 2 (Jurassic limestone of the Cotswolds) have the same drainage response time but the different regions of HOST 1 (chalk of the Chilterns, Berkshire Downs and North Downs) have separate response times. Additionally chalk areas can be modelled as two bands, conceptually representing upper and lower chalk, with individual response times. A further allowance is made for drift soils overlying permeable substrate where the drainage response is that of the underlying geology. In the Thames this occurs where soils of HOST class 18 overlie chalk (notably in the Chilterns) and manual adjustment has been made to model the drainage response, where this occurs, as that of chalk (HOST 1).
In CLASSIC the 25 land cover types are amalgamated into six groups; grass, deciduous woodland, coniferous woodland, arable, upland and urban. Historic land cover data are described in Section 3.5.
CLASSIC has been used to simulate flow in the Thames in a number of other studies including assessing the effect of land use change on floods (Crooks and Davies, 2001), event attribution for the Autumn 2000 floods (Kay et al., 2011) and regionalising the impacts of climate change on flooding in Britain (Prudhomme et al., 2013).

Data
Data required to run CLASSIC are daily rainfall and monthly potential evaporation (PE), with daily mean temperature required to use the snowmelt module. Data were set up for CLASSIC using a 20 km modelling grid, shown in Fig. 1. Details of the climate datasets are given in Sections 3.1-3.3 and for the observed river flow in Section 3.4. A summary of the time periods, resolution and sources of rainfall, PE, temperature and flow data used for 1890-2013 is given in Table 1. Details of land cover datasets are given in Section 3.5.

Rainfall data
CLASSIC is run using daily rainfall from CEH-GEAR (Centre for Ecology & Hydrology-Gridded Estimates of Areal Rainfall), available on a 1 km grid for Britain for 1890-2012 (Tanguy et al., 2014; Keller et al., 2015) and using provisional data to extend the record to 2013. Numbers of digitised raingauge records from which the gridded data series is derived have varied over the period; most raingauge records are digitised post-1961 with a maximum number in the 1970s. From 1961 the Thames catchment is covered by several hundred raingauges, but even in the latter half of the 19th century the catchment had a network of well-distributed gauges. For use in CLASSIC, the daily rainfall data are averaged from the 1 km rainfall grid to the 20 km modelling grid.
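The aggregation from the 1 km rainfall grid to the 20 km modelling grid can be illustrated with a minimal sketch. The function name `block_mean`, the synthetic 40 x 40 field and the use of a simple unweighted mean are assumptions made for illustration; the text does not describe how cells crossing the catchment boundary are treated.

```python
import numpy as np

def block_mean(grid_1km, factor=20):
    """Average a 2-D fine grid onto coarser squares of factor x factor cells."""
    rows, cols = grid_1km.shape
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of the block size")
    # Split each axis into (coarse index, position within coarse cell)
    blocks = grid_1km.reshape(rows // factor, factor, cols // factor, factor)
    # Unweighted mean over the fine cells in each coarse square (assumption)
    return blocks.mean(axis=(1, 3))

# Synthetic daily rainfall field (mm) on a 40 km x 40 km area at 1 km resolution
rain_1km = np.arange(40 * 40, dtype=float).reshape(40, 40)
rain_20km = block_mean(rain_1km)  # four 20 km model squares
```

Reshaping rather than looping keeps the operation vectorised, which matters when it is repeated for every day of a 124-year record.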
Uncertainty at the beginning of the CEH-GEAR record, from the small number of raingauges (at least 12) in operation at that time, was assessed using an alternative catchment rainfall dataset: the Environment Agency (formerly the Thames Conservancy) calculate catchment rainfall for the Thames to Teddington/Kingston from 12 well-distributed raingauges (although the gauges used have changed over time). The monthly 12-gauge series dates from 1883 (Bowen, 1965) and the daily series is from 1904 (Marsh and Harvey, 2012). Catchment monthly rainfall totals from the 12-gauge series were compared with those from CEH-GEAR for 1961-1974, and gave a correlation coefficient (R² value) of 0.997 but with slightly higher totals using the 12-gauge series. Hence a good estimate of rainfall should be obtained from CEH-GEAR despite the early limited number of raingauges, though uncertainty for daily rainfall distribution will be higher than for monthly totals.

Potential evaporation data
Monthly PE from 1961 is from MORECS (Met Office Rainfall and Evaporation Calculation System; Hough and Jones, 1997), available for a short grass cover on a 40 km grid. The four 20 km model grid squares within a 40 km MORECS box take the same PE as the 40 km box. As MORECS is only available from 1961, an alternative source of PE is required to simulate the earlier period. MORECS is based on Penman-Monteith equations, which require the climate variables temperature, wind speed, net radiation and humidity. Many alternative equations have been developed for estimating PE (Oudin et al., 2005) with the simplest requiring only one variable, usually temperature. A temperature-based equation was developed by Blaney and Criddle (1950) to estimate irrigation requirements in Western USA. A general form of this equation is given by Jensen et al. (1990):
$PE_d = p_d\,(aT + b)$    (1)
where p_d is mean daily percent of annual daytime hours for the latitude for day d, and T is mean air temperature. Values of the coefficients a and b were originally derived from climate variables but may be estimated by calibration for a specific region. Prudhomme and Kelvin (2012) calculated values of a and b for each month, using the least squares method, for 30 catchments in England (including the Thames), where T was taken as catchment average monthly temperature and PE was monthly catchment MORECS PE for 1962-2009. Catchment average monthly temperature was derived from a 5 km gridded monthly temperature data set generated by the UK Met Office (Perry and Hollis, 2005). Goodness of fit between calculated monthly Blaney-Criddle PE (BC) and MORECS PE is generally better in the summer and winter months (R² values of 0.44-0.77) and lower in spring and autumn (0.09-0.39), but tests using it as input in hydrological modelling showed biases within ±5% for mean monthly flow and flow quantiles Q1-Q95, when compared with modelling using MORECS PE (Prudhomme and Kelvin, 2012).
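The monthly calibration of the Blaney-Criddle coefficients can be sketched as an ordinary least-squares fit. This is a minimal illustration only: the linear form PE = p(aT + b) is taken here as an assumption about the general equation, the function names are hypothetical, and the temperature and PE values are synthetic rather than MORECS data.

```python
import numpy as np

def fit_blaney_criddle(temp, pe_ref, p):
    """Least-squares estimate of a and b in PE = p * (a * T + b).

    temp   : monthly mean temperatures for one calendar month over many years
    pe_ref : reference PE for the same months (MORECS in the text)
    p      : daytime-hours factor for that calendar month and latitude
    """
    temp = np.asarray(temp, dtype=float)
    pe_ref = np.asarray(pe_ref, dtype=float)
    # PE is linear in (a, b): PE = a * (p * T) + b * p
    design = np.column_stack([p * temp, np.full_like(temp, p)])
    (a, b), *_ = np.linalg.lstsq(design, pe_ref, rcond=None)
    return a, b

def blaney_criddle_pe(temp, p, a, b):
    """Evaluate the assumed Blaney-Criddle form for given coefficients."""
    return p * (a * np.asarray(temp, dtype=float) + b)

# Synthetic check: generate PE from known coefficients and recover them
p = 0.33
temps = np.array([14.0, 15.5, 16.2, 17.1])
pe_ref = p * (0.4 * temps + 7.0)
a, b = fit_blaney_criddle(temps, pe_ref, p)
```

Fitting each calendar month separately, as described in the text, gives twelve pairs of coefficients; the sketch shows a single month.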
The 5 km gridded monthly temperature data are available from 1910 and were used to calculate monthly PE for 1910-1960. As gridded temperature data are not available before 1910, an alternative source of temperature data is required to calculate PE for the years 1890-1909. Station data are available from the British Atmospheric Data Centre, and a station in Oxford (centrally located in the Thames catchment, Fig. 1) has daily maximum and minimum temperature data from 1853. These data were used to calculate a monthly mean temperature series for Oxford. Then, to allow for variation in temperature across the catchment, regression equations were calculated between Oxford and mean monthly temperature for 40 km grid boxes for 1910-1960 (the 40 km grid is the same as that for MORECS, and mean monthly temperature on the 40 km grid was calculated from the 5 km gridded temperature data). The regression equations (which all have R² values greater than 0.995) were then applied to pre-1910 monthly temperature data for Oxford to make mean monthly temperature data for 40 km boxes for 1890-1909. The mean monthly 40 km temperature data were then used in Eq. (1) to estimate monthly 40 km PE data for 1890-1909. The intention of using the BC method, calibrated against MORECS, is to generate a PE dataset for 1890-1960 which is as consistent as possible with the post-1960 MORECS data with which the generalised parameter method was developed. Use of other temperature-based methods, not calibrated against MORECS, could result in a modelling discontinuity in 1961. Alternatively, the same temperature-based method could be used throughout, but it was preferred to use actual MORECS data as it is likely to provide the best available PE data. A comparison of BC and MORECS mean monthly PE, for decades pre- and post-1961 with similar mean temperature, suggested that there was a small positive bias in the PE data generated using Eq. (1). A monthly bias correction, average −7%, was applied to all the pre-1961 BC PE data.
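The infilling of 40 km box temperatures from the Oxford record, and the subsequent percentage bias correction of PE, can be sketched as follows. The function names, the single regression per box and the numerical values are illustrative assumptions; the text gives only the average correction (−7%) and not the individual monthly factors.

```python
import numpy as np

def fit_box_regression(oxford_temp, box_temp):
    """Linear regression of a 40 km box temperature on Oxford temperature,
    fitted over the overlap period (1910-1960 in the text)."""
    slope, intercept = np.polyfit(oxford_temp, box_temp, deg=1)
    return slope, intercept

def infill_box_temperature(oxford_temp, slope, intercept):
    """Apply the fitted regression to earlier Oxford data (1890-1909)."""
    return slope * np.asarray(oxford_temp, dtype=float) + intercept

def apply_bias_correction(pe, correction):
    """Scale PE by a fractional correction, e.g. -0.07 for a 7% reduction."""
    return np.asarray(pe, dtype=float) * (1.0 + correction)

# Synthetic overlap data: the box is slightly cooler than Oxford
oxford_overlap = np.array([4.0, 8.0, 12.0, 16.0])
box_overlap = 0.95 * oxford_overlap - 0.3
slope, intercept = fit_box_regression(oxford_overlap, box_overlap)

box_1895 = infill_box_temperature([10.0], slope, intercept)
pe_corrected = apply_bias_correction([100.0], -0.07)
```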
Monthly PE for grass, for use with the 20 km CLASSIC modelling grid, was taken from the overlying 40 km MORECS grid for 1961-2013, calculated from 5 km gridded monthly temperature data for 1910-1960 (averaged to 20 km, see Section 3.3) and derived for a 40 km grid from mean monthly temperature data for Oxford for 1890-1909 (Table 1). PE rates for vegetation other than grass are estimated using monthly regression equations between MORECS PE for grass and MORECS PE for five land cover types: deciduous woodland, coniferous woodland, arable (winter and spring sown grain crops) and upland. The regression equations were calculated using daily PE data for the six land cover types, provided by the UK Met Office for a number of climate stations across Britain for 1985-1992. Equations for one of the climate stations, Stansted (approximately 65 km north-east of Kingston, Fig. 1) are used in simulation of the Thames. The monthly equations have mean R² values of 0.94 for deciduous woodland (minimum 0.84 in January), 0.91 for coniferous woodland (0.77 April), 0.95 for winter sown arable (0.72 June), 0.94 for spring sown arable (0.89 January) and 0.97 for upland (0.86 March). The arable land cover type is assumed to be 50% winter and 50% spring sown and is modelled with a seasonal growth cycle from bare ground in autumn to maximum growth in early summer followed by harvest in late summer. Urban (impermeable) surfaces are assumed to evaporate a maximum of 0.5 mm per day when this rate is equalled or exceeded by the rainfall.

Daily temperature data
Daily mean temperature data for use in the snowmelt module are from a 5 km grid of daily maximum and minimum temperature for 1961-2013 from the UK Met Office (Perry et al., 2009). The data are for the mid-point of each grid, the altitude of which is obtained from the 50 m IHDTM. The temperature for each 20 km modelling grid square is calculated using a lapse rate of 0.0059 °C m⁻¹ to change between the mid-point altitude of a 5 km temperature grid and the mid-point altitude of the overlying 20 km model grid, and then averaging the sixteen 5 km values. Temperature data pre-1961 are the average of the maximum and minimum daily data for Oxford for 1890-1960 (see summary in Table 1). The mean daily temperature data, for grid square or Oxford, are lapsed to 50 m elevation bands within each 20 km grid square during a model run.
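The lapse-rate adjustment and averaging described above can be sketched as below. The function names and the example altitudes are hypothetical, and the sign convention (temperature decreasing with increasing altitude) is an assumption, although it is the usual one.

```python
import numpy as np

LAPSE_RATE = 0.0059  # degC per metre, value quoted in the text

def lapse_to_altitude(temp, source_alt, target_alt, lapse_rate=LAPSE_RATE):
    """Adjust temperature from source altitude to target altitude (metres).
    Assumes temperature falls as altitude rises."""
    temp = np.asarray(temp, dtype=float)
    source_alt = np.asarray(source_alt, dtype=float)
    return temp - lapse_rate * (target_alt - source_alt)

def model_square_temperature(temps_5km, alts_5km, alt_20km):
    """Lapse the sixteen 5 km values to the 20 km mid-point altitude, then average."""
    adjusted = lapse_to_altitude(temps_5km, alts_5km, alt_20km)
    return float(adjusted.mean())

# Sixteen 5 km cells at 100 m and 10 degC, model square mid-point at 200 m
t20 = model_square_temperature(np.full(16, 10.0), np.full(16, 100.0), 200.0)
```

The same `lapse_to_altitude` step could be reused for the 50 m elevation bands applied during a model run.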

River flow data
Observed flow data are from the UK National River Flow Archive (NRFA) for station 39001 on the Thames. Measurement on the Thames began in 1811 at Teddington Weir, the tidal limit of the river, with the continuous record of mean daily flow dating from 1883. The weir is a complex barrage of gates and sluices with many hydraulic limitations and has undergone many structural changes over the period of the flow record (Marsh and Harvey, 2012). Hydraulic formulae either based on flow over a sharp-crested weir or on gate and sluice settings, broadly endorsed by current meter gauging, were used to estimate medium and low flows (<85 m³ s⁻¹ in 1942) but higher flows were generally derived from tail-water levels at Teddington lock using stage-discharge relationships (Anon, 1986). Hydrometric performance during flood events was said to be poor (McClean, 1936), while leakages and operation of lock gates resulted in underestimation of early low flows (Marsh and Hannaford, 2008). Measurement since 1974 has been by ultrasonic gauging at Kingston, 1 km upstream of Teddington, originally single-path but upgraded to multi-path in 1983. Calibration of the ultrasonic gauge endorsed the high flow rating which had been used following a major refurbishment of Teddington Weir in 1951 but adjustment was made to flows below 230 m³ s⁻¹, between 1951 and 1974, to allow for differences between the measurement methods (Anon, 1986). Daily mean flows at Teddington were derived from two level readings per day while those for Kingston are based on 15-min data. Hence post-1951 flow data are acknowledged to be more reliable and more homogeneous than earlier data (Marsh and Harvey, 2012). See summary in Table 1.
In this paper reference to observed flows means the daily naturalised flow record (available from the NRFA), which is the gauged flows to which major abstractions in the lower reaches of the river for London's water supply have been added. Current abstractions, which can exceed 50 m³ s⁻¹, are well recorded but there is more uncertainty for the early abstraction rates (average of <5 m³ s⁻¹ over the first 10 years of the Teddington record) (Marsh and Harvey, 2012). Use of the naturalised flow record for comparison of differences between observed and simulated flows is essential to ensure that changes in abstraction rates over the period of record are not, within the accuracy of the naturalisation method, a factor in the analysis. Although uncertainty about the data used for the naturalisation is higher for the first part of the record, abstraction rates are lower so the net impact of uncertainty on the homogeneity of the naturalised flow series is reduced.

Historic land cover data
Digitised land cover is available from three CEH databases, for years 1990 (Fuller, 1993), 2000 (Fuller et al., 2002) and 2007 (LCM07). Land cover is one aspect of the Thames catchment which has changed through the period being simulated. Each survey has used different methods to digitise information on land cover and differences between them are not necessarily representative of changes over time. The 1990 data-set has been used in previous modelling with CLASSIC and is representative for a longer period of the flow data than more recent surveys. However, for the Thames catchment the total area of urban land obtained using the 1990 survey is higher than that from LCM07. As this cannot reflect an actual change in land cover, and as the survey methods used for LCM07 are likely to give a more accurate record of the urban area than those of 1990, LCM07 is used for the current land cover. Comparison between observed and simulated flows for 1961-2013 using both land cover data-sets shows slightly higher correspondence using LCM07 (not shown).
Estimates of changes in land cover over the Thames catchment from 1870 to 1990 are given in Crooks and Davies (2001). These estimates were derived from statistics, available on a county basis, principally from a report by Sinclair (1993) for 1945-1990 and two sources for 1870-1945: the Land Utilisation Survey from the 1930s conceived by Stamp (1948) and annual returns on areas of arable land, grassland and rough grazing to the Ministry of Agriculture and Fisheries for 1870-1939, given in county reports of the Survey (e.g. Marshall, 1943). The changes can be summarised as two periods of gradual decline in arable land (1890-1940 and 1950-1990) with a rapid increase of over 100% in arable land during the 1940s to increase food production during the Second World War. Changes in arable land between 1890 and 1990 were combined with approximately opposite changes in grassland areas, but with a small increase in woodland and larger increase in urban areas. One of the objectives of this paper was to determine whether the changes in land cover (increase in urban over the whole time period and change from grass to arable in the 1940s) were accompanied by an associated signal in the relationship between observed and simulated flow.
To implement the evolution of land cover derived for the Thames catchment, ratios between current and historic percentages for five land cover types (grassland, woodland, upland/rough grazing, arable and urban) were calculated for three years typical of different land cover combinations: 1900, 1939 and 1950. The ratios were calculated from statistics for each of the eight counties covering the Thames catchment and area weighted to adjust the current percentages for each 20 km modelling grid square. It was assumed for this purpose that ratios between a historic year and 1990 could be applied directly to the LCM07 data (i.e. assuming no change in land cover between 1990 and 2007). It was also assumed that changes in woodland could be applied equally to areas of deciduous and coniferous woods in LCM07. Using ratios between time periods (rather than taking percentages directly from the county statistics) was helpful in overcoming differences in meaning of land cover names, particularly urban, when combining information from different sources. Percentage areas of the five land cover groups for the whole Thames catchment for years 1900, 1939 and 1950 and from LCM07 are given in Table 2.
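The ratio-based adjustment of current land cover can be sketched as below. This is a hedged illustration, not the procedure used in the study: the county labels, ratio values and weights are invented, the ratio is assumed to be historic area divided by 1990 area, and the final rescaling so that percentages sum to 100 is an assumption not stated in the text.

```python
def area_weighted_ratio(county_ratios, county_weights):
    """Combine county ratios (historic / 1990) for one grid square,
    weighting by the area of the square lying in each county."""
    total = sum(county_weights.values())
    return sum(county_ratios[c] * w for c, w in county_weights.items()) / total

def historic_cover(current_pct, ratios, renormalise=True):
    """Scale current percentages by historic ratios for one grid square.
    Land cover types without a ratio are left unchanged."""
    scaled = {k: v * ratios.get(k, 1.0) for k, v in current_pct.items()}
    if renormalise:  # assumption: rescale so the square still sums to 100%
        total = sum(scaled.values())
        scaled = {k: 100.0 * v / total for k, v in scaled.items()}
    return scaled

# Hypothetical grid square straddling two counties (areas in km2)
urban_ratio = area_weighted_ratio({"county_a": 0.5, "county_b": 1.0},
                                  {"county_a": 100.0, "county_b": 300.0})
square_1900 = historic_cover({"grass": 50.0, "arable": 30.0, "urban": 20.0},
                             {"urban": 0.5})
```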

Results
CLASSIC was run for 1890-2013, using current land cover percentages (from LCM07) and the same model parameter values throughout, to provide a benchmark flow series with which the observed flow series is compared. Parameter values for each 20 km grid square are derived using percentages of the six land cover groups and 29 HOST classes, together with the mean altitude and slope. The same parameter values in the snowmelt module are also used throughout the period. Although the benefit of using the snow module is marginal for the Thames, the module has been used as snowmelt was a major factor in the generation of the second highest flood event in the 124 year period. Differences between observed and benchmark flows are investigated for three time-frames (annual, sub-annual and flow extremes) to consider if change has affected dominant hydrological processes in the catchment. The model was then set up and run for the full period using the historic land cover scenarios for 1900, 1939 and 1950, and the three simulated flow series compared with observed flows and the benchmark series. The first year (1890) is used as a warm-up period in analysis of simulated flows.

Annual flow
Annual rainfall, PE, simulated actual evaporation (AE) and the difference between rainfall and AE, together with annual observed and simulated flow and the difference between them are shown in Fig. 2. Variation in the differences between annual PE and AE is indicative of climatic conditions in the summer; with a cool, wet summer AE is around 90% of PE but with a hot, dry summer, when PE is high, AE may be only 50% of PE. At an annual scale correspondence between observed and modelled flow indicates whether the simulated losses are realistic in the water balance of the catchment. From [...] prevalent in the early years of the record and generally the simulated annual flow exceeds the observed flow pre-1960. The overall bias in water balance is 5.2%. Average values of rainfall, PE, AE, modelled snowfall, observed and simulated flow and the difference between them, and observed percentage runoff (PR, observed flow/rainfall) are given in Table 3, for the complete period (1891-2013), two data sub-periods and four non-overlapping 30-year periods. The two data sub-periods are 1891-1960 (with lower raingauge density, PE estimated from temperature, and lower quality flow measurements) and 1961-2013 (with higher raingauge density, PE calculated from Penman-Monteith equations, and higher quality flow measurement).
Table 3: Annual rainfall, evaporation, snowfall and flow averaged over four 30-year periods, two data sub-periods (1891-1960 and 1961-2013)
Results from running the model with the same parameter values for 124 years show a decreasing difference between average 30-year observed and simulated flows (Table 3). Possible causes are physical changes in the catchment and poorer quality of data, pre-1961. Catchment average rainfall shows a small increase through the four 30-year periods, with the pattern repeated in the observed and modelled annual flow, despite a slight increase in the modelled AE. The observed PR also increases through the four 30-year periods. The average rainfall modelled as snow for each period shows the comparatively small amount compared to the total and that the lowest value is for the last 30-year period, reflecting the increase in annual temperature (Marsh and Harvey, 2012). Averaging the annual water balance over 30-year or longer time periods indicates some non-stationarity in the relationship between the observed and simulated flow series. However, an annual water balance may not be sensitive to factors operating at a seasonal or shorter time-scale and, as shown in Harrigan et al. (2014), it is at this finer scale that impacts of change in the catchment are more likely to be evident. To investigate further, differences between the two flow series are quantified over decades using appropriate seasonal and flow distribution measures.

Seasonal and monthly flows
A range of measures was selected to quantify the differences between the observed and simulated flow series for different aspects of the flow regime; these 'signature measures' (after Yilmaz et al., 2008) are defined in Appendix A. Overall differences are represented through the water balance (WB), Nash-Sutcliffe efficiency (NS) and root mean square error (RMSE), seasonal difference through mean monthly flow (MM) and over the flow range through differences in volume for three bands of flow duration percentiles (HFV for high flows (Q1-Q5), MFV for mid-range flows (Q33-Q66) and LFV for low flows (Q70-Q95)). Values of the measures are shown in Fig. 3 for decades starting in every year between 1891 and 2004. All measures show greater similarity between the observed and simulated flow series post-1950 and generally greater differences for the period before 1910. Values of NS for all decades are greater than 0.8 and for decades post-1900 the water balance bias is less than 10%; both thresholds used to indicate good agreement between observed and modelled hydrological behaviour (Harrigan et al., 2014). High flow differences (HFV) are all less than 5% apart from decades starting between 1903 and 1914, while mid-range and low flow volumes have a similar pattern characterised by a pronounced decrease in differences during the 1940s and 1950s. This decrease is consistent with the refurbishment of Teddington Weir in 1951 (which provided, in particular, greater accuracy of low flow measurement) and other changes which occurred during the 1940s, discussed in later sections. But greater differences between observed and simulated flows pre-1910 and smaller differences post-1950 are not unexpected given the differences in data quality.
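The signature measures can be illustrated with a short sketch. Appendix A is not reproduced in this section, so the formulas below are common textbook definitions and should be read as assumptions: the function names are hypothetical, biases are expressed as simulated minus observed as a percentage of observed, and the percentile bands are treated as exceedance percentiles (Q5 is the flow exceeded 5% of the time).

```python
import math


def water_balance_bias(obs, sim):
    """Percentage difference in total volume, simulated relative to observed."""
    return 100.0 * (sum(sim) - sum(obs)) / sum(obs)


def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus squared error over observed variance."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((s - o) ** 2 for o, s in zip(obs, sim))
    var = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / var


def rmse(obs, sim):
    """Root mean square error between paired observed and simulated flows."""
    return math.sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / len(obs))


def band_volume_bias(obs, sim, p_low, p_high):
    """Percentage volume difference within a band of the flow duration curve.

    Each series is ranked separately in descending order, and rank i (from 1)
    is given the exceedance percentile 100 * i / (n + 1) (assumed plotting
    position). Volumes are summed over ranks inside [p_low, p_high].
    """
    n = len(obs)
    obs_sorted = sorted(obs, reverse=True)
    sim_sorted = sorted(sim, reverse=True)
    ranks = [i for i in range(n) if p_low <= 100.0 * (i + 1) / (n + 1) <= p_high]
    vol_obs = sum(obs_sorted[i] for i in ranks)
    vol_sim = sum(sim_sorted[i] for i in ranks)
    return 100.0 * (vol_sim - vol_obs) / vol_obs


# Synthetic series for illustration: simulated flow is 10% too high everywhere.
obs = [float(q) for q in range(1, 101)]
sim = [1.1 * q for q in obs]
print(water_balance_bias(obs, sim))            # overall volume bias
print(band_volume_bias(obs, sim, 1.0, 5.0))    # HFV band, Q1-Q5
print(band_volume_bias(obs, sim, 70.0, 95.0))  # LFV band, Q70-Q95
```

Applying these functions to each ten-year window of daily flow in turn would give decadal series of the kind plotted in Fig. 3.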
Use of models for simulating impacts of climate change assumes that the same parameter values are valid under the changed conditions. Natural climate variation between 1890 and 2013 provides an opportunity to test this assumption. Variation in rainfall for the Thames catchment is shown in Fig. 4 for ten-year moving averages for the four seasons. How these seasonal oscillations combine has different hydrological impacts. Analysis of rainfall for England and Wales by Marsh et al. (2013) showed a considerable increase in 'winter' (November-April) rainfall for the 30 years from the mid-1970s, with 'summer' (May-October) rainfall totals prior to the 20th century normally exceeding those for the 'winter'. This pattern is evident for the Thames with summer (JJA) rainfall exceeding winter (DJF) at the beginning of the period in the early 1890s, but increase in 'winter' rainfall since the 1970s is largely a result of increase in the autumn (SON). Summer rainfall is highest in the 1900s (when it is the wettest season) and lowest through the 1970s and 1980s but increases again through the 1990s and 2000s. The increase in winter rainfall from its low point in the decade starting in 1890 to a maximum two decades later (when it is the wettest season) is a notable feature of these seasonal patterns. While winter and summer rainfall is often of consequence for flow extremes in large catchments, rainfall in spring and autumn is important in controlling the build-up and replenishment of soil-moisture deficits. A cyclical pattern is evident in spring (MAM) rainfall with its variation, combined with that of other seasons, providing a contrast between the 1950s with dry springs and wet summers and the 1970s with wet springs and dry summers. Autumn is, normally, the wettest season but a trough between the 1920s and 1990s produces a period in the 1960s when autumn is nearly the driest. 
Rainfall in autumn and winter is of particular importance in the Thames catchment for recharge to the chalk and limestone aquifers, which maintain the flow in the river through the summer and times of low rainfall.
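Ten-year moving averages of seasonal rainfall, as plotted in Fig. 4, can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names and the input layout are hypothetical, December is assigned to the winter of the following year, and each average is labelled by the first year of its ten-year window to match the "decade starting in" wording used in the text.

```python
SEASONS = {
    "DJF": (12, 1, 2),
    "MAM": (3, 4, 5),
    "JJA": (6, 7, 8),
    "SON": (9, 10, 11),
}


def seasonal_totals(monthly, season):
    """Sum monthly rainfall into seasonal totals for each complete season.

    monthly: dict {(year, month): rainfall_mm} (assumed layout).
    For DJF, December of the previous year is combined with January and
    February of the labelled year (assumption).
    """
    totals = {}
    for year in sorted({y for y, _ in monthly}):
        keys = [
            (year - 1, 12) if (season == "DJF" and m == 12) else (year, m)
            for m in SEASONS[season]
        ]
        if all(k in monthly for k in keys):
            totals[year] = sum(monthly[k] for k in keys)
    return totals


def moving_average(series, window=10):
    """Mean over `window` consecutive years, keyed by the first year."""
    out = {}
    for start in sorted(series):
        span = range(start, start + window)
        if all(y in series for y in span):
            out[start] = sum(series[y] for y in span) / window
    return out


# Invented data: 50 mm in every month from 1890 to 1901.
monthly = {(y, m): 50.0 for y in range(1890, 1902) for m in range(1, 13)}
winter = moving_average(seasonal_totals(monthly, "DJF"))
print(sorted(winter))  # first years of the complete ten-year windows
```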
The effect of these variations in seasonal rainfall is shown in Fig. 4 through observed and simulated decadal mean monthly flow for January, April, July and October (representing the four seasons). The low winter and spring rainfall at the beginning of the period caused persistent 'drought' conditions with generally depressed runoff until around 1910 (Marsh and Harvey, 2012); hence the low mean January and April flows pre-1900 and the lowest mean decadal flow of the whole period (57.94 m³ s⁻¹ for the decade starting in 1893). The much wetter conditions which followed resulted in the highest overall decadal mean flow (95.41 m³ s⁻¹ for the decade starting in 1910), despite the high autumn rainfall in the 1990s and 2000s. This seasonal variability is not a noticeable feature of the time-line of annual rainfall in Fig. 2 but the close similarity between the observed and simulated monthly decadal flows is evidence of the general stationarity of the catchment hydrological response. Simulated July flow is greater than observed in most decades, but particularly pre-1961, with a similar pattern for October for the pre-1951 period. The greater difference in July and October flows, compared to January and April, is partly a function of data quality at low flows, as differences are smaller post-1951, and may be partly related to changes in land cover. Temporal variation is less prominent in decadal flow percentiles (Fig. 4 for Q1, Q5, Q10, Q25, Q50 and Q90) with good correspondence between observed and simulated for the higher percentiles but simulated flows exceed observed for Q50 and Q90 pre-1951. The effect of using historic land cover scenarios on differences between observed and simulated flows is shown in Section 4.4.

Flow extremes
Differences between observed and simulated flow at the extremes of flow are used to examine inter-related issues of data quality and simulation at times of low and high rainfall, and how these factors may have changed over time and between different events. Although 120 years is noteworthy in terms of availability of data for hydrological simulation, it is not long in terms of low frequency (rare) events. Examples are given of severe droughts and floods in the period.

Droughts
While wet years are not necessarily synonymous with flood events, dry years have a high potential for drought conditions to develop. From Fig. 2 the driest years were 1921 and 1933, but dry winters are an important criterion for development of drought flows in the Thames through lack of recharge to the chalk and limestone aquifers, and the low rainfall in 1921, unlike 1933, did not extend to subsequent years. Examples of four 3-year catchment rainfall, effective rainfall, observed and simulated flow series, where the maximum winter flow was less than 200 m³ s⁻¹, are shown in Fig. 5 for 1892-94, 1933-35, 1943-45 and 1975-77. Similarly dry winters also occurred in 1991/92, 1996/97 and 2004-06. All these events, apart from that of 1943-45, are included in a review of major droughts affecting England and Wales by Marsh et al. (2007), who comment that lack of documented evidence about the impact of the 1943-45 drought is probably because it occurred during the Second World War. Simulated recession rates from decline of groundwater flow show good correspondence with observed rates in all events. There is little increase in groundwater flow through the winters of 1933/34, 1943/44 and 1975/76 and where there is some deviation between observed and simulated flow the difference resolves within a month or two.
The flows for 1892-94 are illustrative of the effect of comparatively low winter rainfall and high summer rainfall, prevalent in the years before 1910, but where the summer rainfall is still insufficient to exceed the evaporative demand. Timing of replenishment of soil moisture deficits and generation of simulated higher flows at the end of drought periods generally agrees well with observed flows, though simulated flows at the end of 1976 and the first part of 1977 are higher than observed. There is no evidence that rainfall on dry soils results in runoff before replenishment of the soil moisture deficit. Generation of summer peaks, when there is no effective rainfall, is through simulation of flow from urban or impermeable surfaces. Overestimation of simulated summer flow is evident in all examples, but particularly in the first three, which contributes to higher simulated mean monthly flows than observed in these months. While underestimation of observed flow is likely to be a factor, simulated flows have been generated using the 2007 land cover scenario, and differences are reduced with historic land cover scenarios (Section 4.4).

Floods
A chronology of floods in the Thames catchment is given in Griffiths (1983) which collates documentary information and quantitative evidence from over 1000 years. The majority of the evidence is obtained from water levels either recorded at the many locks along the Thames or marks on historic buildings. Water levels above and below locks have been recorded since the 1890s, and analysis by Crooks (1994) showed a greater number of extreme peak water levels in the period 1890-1940 than in the subsequent fifty years, though the number of flood events where the peak discharge exceeded bankfull had stayed relatively constant. Water levels are important for determining the extent of inundated land, but because they are influenced by hydraulic changes to the channel network and floodplain they are a less reliable indicator of changes in rainfall-runoff response than flow rates. Visual inspection of flow time-series shows a distinct difference in correspondence between observed and modelled flood hydrographs between the beginning and end of the period. The difference is most apparent in the timing of the observed and modelled peaks and subsequent recession, with observed peaks in the first decades being a few days later than the modelled peaks. More detailed investigation found that the change to similarity in timing of observed and simulated flood hydrograph shapes occurs in the early 1940s. This date is consistent with the time of change found by Crooks (1994), from an analysis of event duration of both water levels and river flows.
The three highest observed and simulated peaks pre-1940 are in 1894, 1915 and 1929 and are all rainfall-driven late autumn/winter events. The highest peaks post-1940 are in 1947, 1968 and 1974. Water levels in 1894 and 1947 are the highest recorded at most locks. Of these six flood events, only the peak in 1974 has a percentage difference between observed and simulated of less than 10%; all the other peaks are underestimated. Snow was a major cause of the flood in March 1947 (Stock, 1947), following the second coldest winter in the 20th century, and was a contributory factor in a number of floods in the flow record pre-1960 but has played a very minor role in flood events since then (Marsh and Harvey, 2012). In a longer historical context snowmelt was a more common mechanism in major floods including those of 1809, 1774, 1768 and 1593 (Griffiths, 1983). The event in September 1968 was the result of unusually heavy 2-day rainfall over South East England (Salter and Richards, 1974) which, for the Thames catchment, particularly affected the two lowest, relatively impermeable, tributaries (Wey and Mole). The observed flood hydrograph for this event can be simulated more realistically using routing parameter values with faster wave velocity and less attenuation for the relevant south-easterly grid boxes compared with those used for the whole catchment.
Observed and simulated hydrographs for events in 1894, 1915, 1929 and 1974 are shown in Fig. 6, with catchment rainfall and calculated effective rainfall for each event. There is considerable uncertainty about the value of the observed peak discharge in 1894. Investigation of the hydrology of the event by Marsh et al. (2005), including rainfall-runoff modelling, suggested a revised peak flow of 800 m³ s⁻¹ (compared with the original value of 1064 m³ s⁻¹ (20,236 mgd); Bowen, 1965), which is used here. The context of the event in November 1894 is of interest as it occurred during the depressed runoff period prior to 1910 and at the end of the 'drought' sequence for 1892-1894 shown in Fig. 5. The main differences between the observed and simulated hydrographs for the pre-1940 events are in the timing of the peak (typically three days earlier in the simulated flow), the generally more rounded shape of the observed hydrographs (though not the main peak in 1894) and the lower simulated peak flow. Earlier timing of the simulated peak was also found by Marsh et al. (2005) using rainfall-runoff modelling for six flood events between 1894 and 1933. The later timing of the observed peak and more rounded shape indicate a slower rainfall-runoff response than is typical of post-1940 events (e.g. 1974), compatible with changes in the hydraulic characteristics of the river and land drainage of the catchment which have occurred over the period. Land drainage in England and Wales became more economical during the second half of the 19th century, through cheaper methods for producing drainage tiles and the availability of loans, but little drainage was carried out in the agricultural depression which began in about 1890 and continued to the 1930s (Robinson, 1986). Extensive land drainage was undertaken during the Second World War in association with increased food production (Stock, 1951) [...] Drainage Division between 1951 and 1993 (Defra, 2002).
Hydraulic changes include channel clearance on the tributaries in the 1940s (Crooks and Davies, 2001), along with straightening and dredging of the main river, bed re-alignment and improvements in weir design (Marsh and Harvey, 2012). The greater percentage difference between observed and modelled peak flows in 1894 compared with events in 1915 and 1929 may reflect differences in data quality, measurement of extremes of both rainfall and flow, as well as differences in land management.
Observed and simulated flood frequency pre- and post-1940 was compared using a peaks-over-threshold (POT) method (Naden, 1992) extracting independent peak flows to give an average rate of three peaks per year (peaks are considered independent if the time between them is at least three times the average time to peak, and the flow rate between them has declined to less than two-thirds of the first peak flow). A generalised Pareto distribution is fitted to the POT series using probability weighted moments to give a flood frequency curve. Frequency is expressed as a return period, which is the average time between flows exceeding that magnitude. Flood frequency curves for the two periods, 1891-1940 and 1941-2013, are given in Fig. 7. There is good correspondence between observed and simulated curves at all frequencies for 1941-2013. For the earlier period, observed peaks below around 450 m³ s⁻¹ tend to have a higher simulated peak, while observed peaks above this value have a lower simulated peak. The threshold for an average of three peaks per year is 182 m³ s⁻¹ from the simulated flow series for both time periods, compared with 166 m³ s⁻¹ for the earlier period and 190 m³ s⁻¹ for the later period, from the observed series. This increase in discharge of high frequency events in the observed record is consistent with a significant increase in frequency of events above 250 m³ s⁻¹ found by Marsh and Harvey (2012), but which is combined with a lack of trend in the annual maxima series. A slower response time from un-drained land can result in lower flows as the movement of water to a channel is distributed over a longer time, as shown in the build-up to the main flood event in 1915 and 1929. But with further rainfall there is the potential for the peak to be increased, possibly with a higher percentage runoff through rain falling onto water-logged ground.
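The POT procedure described above can be sketched in outline. This is a simplified illustration and not the Naden (1992) implementation: the function names are hypothetical, the search for the threshold giving three peaks per year is not shown, the larger of two non-independent peaks is retained (an assumption), and the generalised Pareto distribution is written in the form F(y) = 1 - (1 - k y / alpha)^(1/k) with the standard probability weighted moment estimators for that form.

```python
import math


def independent_peaks(flow, threshold, min_sep_days):
    """Return (index, flow) pairs for independent peaks above `threshold`.

    Two peaks are treated as independent only if they are at least
    `min_sep_days` apart (three times the average time to peak in the text)
    and the flow between them falls below two-thirds of the earlier peak.
    Where two peaks are not independent the larger is kept (assumption).
    """
    peaks = []
    for i in range(1, len(flow) - 1):
        is_local_max = flow[i] >= flow[i - 1] and flow[i] > flow[i + 1]
        if flow[i] < threshold or not is_local_max:
            continue
        if peaks:
            j, q_prev = peaks[-1]
            trough = min(flow[j:i + 1])
            if i - j < min_sep_days or trough >= (2.0 / 3.0) * q_prev:
                if flow[i] > q_prev:
                    peaks[-1] = (i, flow[i])
                continue
        peaks.append((i, flow[i]))
    return peaks


def fit_gpd_pwm(exceedances):
    """Fit a generalised Pareto distribution by probability weighted moments.

    exceedances: peak flow minus threshold for each independent peak.
    Returns (alpha, k): scale and shape. With this sign convention a
    positive k gives a bounded upper tail; some libraries use the opposite
    sign for the shape parameter.
    """
    y = sorted(exceedances)
    n = len(y)
    a0 = sum(y) / n
    # Unbiased estimate of E[Y * (1 - F(Y))] from the ordered sample.
    a1 = sum(y[i] * (n - 1 - i) / (n - 1) for i in range(n)) / n
    k = a0 / (a0 - 2.0 * a1) - 2.0
    alpha = 2.0 * a0 * a1 / (a0 - 2.0 * a1)
    return alpha, k


def return_level(threshold, alpha, k, peaks_per_year, return_period_years):
    """Flow exceeded on average once every `return_period_years` years."""
    m = peaks_per_year * return_period_years
    if abs(k) < 1e-9:
        return threshold + alpha * math.log(m)
    return threshold + (alpha / k) * (1.0 - m ** (-k))
```

A typical use would be to extract peaks from the daily flow series with `independent_peaks`, subtract the threshold, fit with `fit_gpd_pwm`, and evaluate `return_level` at a range of return periods with `peaks_per_year=3` to draw a curve comparable to Fig. 7.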
Hydrograph shape for simulated flows is partly determined by the rainfall-response rate from the soil types in each grid square but also by the parameter values in the kinematic routing function used in CLASSIC to route the runoff from each grid square to the catchment outlet. The parameter values for wave velocity and attenuation were determined by calibration using Nash-Sutcliffe efficiency and fit between observed and simulated flood frequency curves for 1962-2000 as objective functions. The example flood hydrographs show that at the daily time-scale there has been a substantial change in the response of the catchment. The peak of the simulated flood events can be altered by changing the channel routing parameters but the shape of the observed pre-1940 flood hydrographs cannot be reproduced by recalibration of the routing parameters alone. This would require recalibration of the response-time parameters in the soil-drainage module, which implies that changes in land drainage as well as changes to channel morphology have contributed to changes in typical flood hydrograph shape.

Effect of change in land cover
Impacts of land cover change on river flow have been investigated and debated over many years with effects related to size of catchment, degree of change, and types of land cover which change. Potentially, land cover affects river flow partly through differences in evaporation and transpiration rates from different types of vegetation and differences in rooting depth, and partly through differences between vegetated and impermeable surfaces. Oudin et al. (2008) found that land cover made a small but important contribution in an international study of catchment water balance. Increase in annual runoff from a small UK catchment was observed after ploughing of upland grassland followed by a decrease in runoff with the growth of trees to maturity (Birkinshaw et al., 2014). An increase in annual runoff was also detected following tree felling (Robinson and Dupeyrat, 2005) accompanied by an increase in low flows. While impacts of land cover change are more likely to be evident in small catchments, Quilbé et al. (2008) found that the hydrological regime of a 6682 km² catchment was sensitive to changes between agricultural land and shrub over a 30-year period and Siriwardena et al. (2006) report an increase of 40% in annual runoff following substantial forest clearance of a large catchment (16,440 km²). Paired catchment studies are also a standard method for investigating how land cover (particularly grass and forest) affects runoff (e.g. Brown et al., 2005; Zhao et al., 2012). Renner et al. (2014) found that evaporative loss was affected by changes in tree growth due to air pollution, while Sawicz et al. (2014) concluded that inadequate information on land use change may limit ability to determine causes of hydrological change. Changes in flood frequency have been linked with changes in land use and land management (Brath et al., 2006; Harrigan et al., 2014) though O'Connell et al. (2007) found little evidence that local changes have a noticeable impact at a larger catchment scale. The sensitivity of the flood regime to land use change has been shown to decrease with increasing return period of the event. Changes through urbanisation, mainly the change from vegetated to impermeable surfaces and alterations to natural drainage, impact on the hydrological regime through lower evaporation (no storage of moisture in the soil) and change in rates of runoff (e.g. Miller et al., 2014).
Table 4. Signature measure values between observed flow and flow simulated using historic and current land cover (WB water balance bias; NS Nash-Sutcliffe efficiency; RMSE root mean square error; MM mean monthly difference; HFV, MFV and LFV high flow, mid flow and low flow biases; see Appendix A). Column headings: Period; Land cover.
CLASSIC was run for the full period, 1890-2013, using each of the three alternative land cover years, 1900, 1939 and 1950. Values of signature measures are given in Table 4 for specific decades, compared with those using LCM07. Arable land cover in the 1900 scenario was assumed to be 100% spring sown, but 50% winter and 50% spring sown for 1939 and 1950 (as with LCM07). The 1900 scenario was used for comparison for decades beginning in 1891, 1900 and 1911; the 1939 scenario for decades beginning in 1931 and 1941; and the 1950 scenario for decades beginning in 1941 and 1951. The reduced urban area with the alternative land cover scenarios gives an improvement in the overall water balance, mean monthly flows and volume of low flows, though mean monthly flow in the summer is still overestimated; there is also a small increase in NS.
It is likely that the hydrological effects of land characterised as urban have changed between the beginning and end of the 20th century, particularly through the differences in storm drainage. Runoff from impervious surfaces is most evident in the flow hydrograph in summer months when soil moisture deficits prevent runoff from vegetated surfaces. Visual inspection of hydrographs for May-September shows that simulated response to rainfall is broadly in line with observed flow, at least at the beginning of the timeline, with the 1900 land cover scenario, and at the end, with LCM07. However, it is possible that areas of urban cover for the middle part of the 20th century have been overestimated using the constructed land cover scenarios (Table 2), as the estimated urban area for 1939 is better for flow simulation up to the mid-1950s and that for 1950 is more appropriate for the mid-50s to mid-60s. The urban area from LCM07 appears appropriate from the late 1980s onwards. Differences between observed and simulated annual flow (bottom graph of Fig. 2) are reduced by an average of 10 mm for 1891-1930 with the 1900 scenario and an average of 7.5 mm for 1931-1969 with the 1939 scenario. Average flow for 2001-2010 for July-September is around 15% higher from increase in urban area since 1900, using the simulated flow series with the 1900 land cover scenario and LCM07. Probable underestimation of gauged low flows before 1951, particularly before 1910, contributes to higher simulated flows than observed even after allowing for change in land cover. As the urban area is only a small percentage of the total catchment area (Table 2) and is more concentrated in the lower part of the catchment it is unlikely to have a noticeable effect on high flows, though it may have a contributory impact where heavy rain over the lower part of the catchment combines with antecedent high flows from the upper catchment.
Impacts of changes of vegetated land cover through the first half of the 20th century are more difficult to detect partly because of the dominating effect of variation in the rainfall but also because the nature of the change, between grass and arable, has less difference in evaporation than changing, for example, between grass and trees. Compared with grass, arable crops have a seasonal cycle of growth, with a time in late summer or early autumn when growth is cut and the ground may be bare soil for a period. Agricultural practices have changed over the modelling period, in terms of both strains of grass and crops grown (including time of seed sowing), but also through changes in land management. All these factors may affect loss of water through evaporation. Comparison of observed and simulated flows using the different land cover scenarios, with emphasis on flows in the spring and autumn when the effect of cropping may be more evident, does not indicate that rainfall-runoff relationships have been affected by changes between grass and arable. However, the effect of such changes may be obscured by the comparatively dry conditions in the 1940s, including the drought of 1943-1945, and the lower flow data quality pre-1951.

Discussion
Use of a generalised rainfall-runoff model to simulate daily river flow in the Thames over 120 years has shown that, within the uncertainties imposed by model structure, model parameter values and changes in data quality, broad relationships between rainfall and runoff have changed little over the time period. Within this broad stationarity of response, differences between the observed and simulated flow series have determined aspects of the hydrological regime which have changed over time. The questions posed in Section 1 on use of stationary parameter values, relationship between differences and catchment changes, and data quality are discussed below.

Model and parameters
Flows have been simulated with one hydrological model using the same parameter values for generation of a benchmark flow series. Although the generalised parameter values were determined using data from the post-1961 period, the simulated flows compare well with observed flows over the previous 70 years, demonstrating the long-term stability of the parameter values. Smaller differences between observed and simulated flows in the early 1960s are generated with the 1890 run compared with a model run starting in 1961 (not shown), illustrating the importance of antecedent conditions over months or years in generation of flow in the Thames. Use of generalised parameters, based on physical catchment properties, helps to ensure the parameter values are independent of the time period from which they were determined. One feature of hydrological response which could not be reproduced with the current structure and parameter values of CLASSIC is hydrograph shape of the pre-1940 flood events with slower response time. Harrigan et al. (2014) found that models calibrated for a catchment without field drainage showed a large discrepancy compared with observed flows for the post-drainage period. For the Thames catchment much land drainage was implemented before the period of the flow record so impacts of change between non-drained and drained, apart from timing and shape of flood hydrograph, do not appear to be present in the flow regime.
Inter-annual variation in the observed seasonal cycle of runoff, through variation in patterns of seasonal rainfall combined with loss through evaporation, is replicated in the simulated flow over the whole period. Hydrological processes represented in CLASSIC, combined with the generalised method for determining parameter values, realistically simulate rainfall-runoff responses from climatic variation, including extremes of drought and flood. It is likely that trends in flow, which may be evident from long-term climatic oscillations, should also be reproduced by simulated flow. Simulation with hydrological models other than CLASSIC would allow uncertainty from model structure and parameterisation to be incorporated in interpretation of the results.

Changes to the hydrological regime
Two aspects of the rainfall-runoff regime of the Thames are shown, by comparison of the observed and simulated flow series, to have changed over the time period: the timing and shape of flood hydrographs, and the runoff from urban/impermeable surfaces.

Flood response
Simulation of the Thames suggests the main change in flood hydrograph shape occurred in the early 1940s, with peaks after this time occurring two to three days earlier than would previously have been the case. Improved land drainage and channel conveyance, with less retention of water in the headwaters of a catchment, can result in increased flood risk further downstream and Harrigan et al. (2014) report that field drainage in an Irish catchment contributed to increased annual mean and high flows. For the Thames the impact, post-1890, seems apparent just in the high flow response.
Flood frequency analysis suggests that there has been an increase in more frequently occurring peaks but a decrease in peak flows of rarer events. This result agrees with the significant increase in number of events over 250 m³ s⁻¹ found by Marsh and Harvey (2012), who also show a significant decrease in lock water levels over the period, which is probably related to the improved land drainage and increased channel conveyance. The relative changes to flood peaks (increase in higher frequency peaks and decrease in lower frequency peaks) have contributed to the strongly concave shape of the POT flood frequency distribution (Fig. 7); though the moderating effect from the high percentage of permeable bedrock in the catchment is also a contributory factor. It is possible that temporary conditions of impermeability, increasing the percentage runoff, contributed to the three highest flood peaks of the last 124 years: through poor drainage and water-logging in November 1894; frozen ground in March 1947; and intense convective rainfall over the relatively impermeable clay soils of the two lower tributaries (Mole and Wey) in September 1968. With subsequent flood alleviation works for these rivers (Foster and Harris, 1988), increase in temperature and changes in land drainage, such combinations of catchment conditions and meteorological events are less likely to recur.

Land cover
The appearance of the catchment, through changes in agricultural practice and spread of developed land, would be markedly different in 2013 compared to 1890, but these surface differences have had limited impact on rainfall-runoff response. However, much of the change in vegetation has been between grass and arable and it is possible that substantial increase in the wooded area would have more impact. Within the limitations of data quality and model structure, the only change from the historic land use scenarios that has an identifiable effect in the observed flow record is through the increase in urban development. Using land cover percentages for 2007 to simulate flow in the early part of the 20th century results in overestimation of summer flows, while use of the land cover scenario for 1900 provides a more reasonable simulation of small peaks in the observed flow series during periods when effective rainfall is zero. Comparing observed low flows with low flows simulated using the estimated historic urban land area suggests that the urban area in the 1939 and 1950 land cover scenarios (Table 2) has been overestimated. This is not unexpected given the assumptions that were required in combining information on land cover from different sources and different methods of classifying land as urban. Most of the urban area at the beginning of the observed flow record was in the region around, and immediately upstream of, the gauging station, with subsequent development more distributed throughout the catchment. As reported by Miller et al. (2014), impacts of changes from rural to urban may be greatest during the time of initial development and depend on the introduction of storm water drainage as much as the change from soil to impermeable surface. Evidence of impact of urbanisation is most noticeable in the summer but occurs at all times of (non-minimal) rainfall. 
As the most concentrated urban development is in the lowest part of the catchment, and given the size of the catchment relative to the urban area, shorter response time from these urban areas is not normally a factor in increased peak flows. But different combinations of rainfall distribution over the Thames catchment over multi-day events could result in urbanisation adversely affecting generated flood peaks.

Data quality
None of the data series is entirely consistent throughout the 124 years of flow simulation, but change in flow quality from refurbishment of the weir in 1951, combined with adjustment to allow for change in measurement methods and change in site, appears the dominant reason for increase in similarity between observed and simulated flows from that date (Fig. 3). Differences between observed and simulated flow duration for all quantiles post-1951 are less than 10%. Simulated overestimation of low flows in the early part of the record, even after allowing for reduced urban runoff, probably confirms that observed low flows, pre-1951, are underestimated. But close agreement between pre-1951 volumes of observed and simulated high flows (Q5-Q1) indicates good estimation of catchment rainfall and that measurement of high flows (though not necessarily extreme peaks) is of higher accuracy than perhaps thought at the time. However, realistic rainfall data for the Thames catchment for 1890-1960 does not imply that CEH-GEAR for other catchments in Britain will be similarly representative; spatial variation in raingauge density is likely to be a critical factor.
Evaporation from the catchment plays a major role in controlling the volume of annual runoff, and the simulated water balance is sensitive to both the PE data used to run the model and the calculated rates of AE. Simulation of the Thames using PE data from MORECS replicates observed mean monthly flows over a wide range of different combinations of monthly rainfall. It was found that using PE data derived from temperature data using the Blaney-Criddle (BC) formula, as for pre-1961 simulation, gave improved difference measures compared with just using mean monthly MORECS PE (1961 for pre-1961 years (not shown). However, given the low goodness-of-fit in the BC PE equations in spring and autumn, there is higher uncertainty in the PE data used pre-1961 compared with post-1961, which is higher again for the first two decades where temperature data are only available for one location. Sensitivity to the PE data was demonstrated using non-bias-corrected BC PE data, when a discontinuity was apparent in analysing time-series of differences between observed and simulated flows, pre- and post-1961. Although the differences between non-bias-corrected and bias-corrected BC PE data are only a few millimetres per month, because of the even balance between rainfall and evaporation in many months, the impact on runoff may be enhanced. Detecting the sensitivity of catchment runoff to differences in evaporation data relies on the use of suitable measures, such as those that quantify water balance issues over annual or sub-annual time periods. Values of NS in Table 4 show the measure is insensitive to running the model with different land cover scenarios. However, measures based on mean monthly flow and the flow duration curve show that differences between observed and simulated flows are affected by PE and may be reduced by allowing for appropriate changes in land cover. Oudin et al. (2005) concluded that rainfall-runoff models are insensitive to detailed PE but used NS and the overall water balance bias as the assessment criteria. Appropriate measures are required to detect and attribute the hydrological consequences of different land cover scenarios.

Conclusions
Apart from the changes discussed above, the Thames demonstrates a relatively stable relationship between rainfall and runoff over the last 120 years, with variations in rainfall, particularly variation between the four seasons, providing the dominant cause of variation in statistics of flow. This conclusion, in agreement with Kling et al. (2012) for the Danube, is important for use of hydrological models, with unchanged parameter values, in estimating impacts of change. Although aggregated changes in vegetation cover over the last hundred years have played a negligible role in rainfall-runoff response in the Thames catchment to its tidal limit, local changes may be more evident in response from sub-catchments. Future changes which could affect the water balance of the catchment include response of vegetation to increase in CO2, as changes to leaf area index and stomatal opening may limit the increase in evaporation which could otherwise occur with an increase in temperature (Rudd and Kay, 2015). Future climate change scenarios suggest an increase in winter and decrease in summer rainfall for the Thames (Bell et al., 2012), though rainfall in spring and autumn, in balance with evaporation, is critical in controlling the development and replenishment of soil moisture deficits and hence what effect increased winter rainfall has on the flow regime. Dry winters are the main cause of drought in the catchment, and the chalk and limestone aquifers are a vital component of the rainfall-runoff response in maintaining flows through the five months of the year when, on average, evaporation exceeds rainfall, and limiting the impact of high rainfall on peak discharges.
The daily CEH-GEAR dataset has greatly extended the potential for generation of long flow series for British rivers and exploration of a much wider range of hydrological events and extremes than was possible with monthly data (Jones et al., 2006). This initial simulation of flow, concurrent with the long observed Thames record, provides the background for more detailed study of detection and attribution of causes of change, including full uncertainty analyses (Merz et al., 2012), at a range of spatial scales within the catchment and nationally. Improvement in consistency of derivation of historic land cover scenarios through digitisation of land cover surveys would aid research into impacts of land cover change on the water balance. Further research is also required into hydrological model parameterisation for simulation of flow with and without land drainage, which may be important in historic flow reconstructions.
The signature measures between observed and simulated flow time series were selected to provide information about differences between the two series covering a range of aspects of the flow regime. Equations use the notation $Q$ for observed flow and $q$ for simulated flow; over-bars denote overall mean flows, subscripts $d$ and $m$ denote mean daily and mean monthly flow, $Q_n$ indicates daily flow exceeded $n$% of the time and $nday$ is the number of days in the time series.
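As an illustration of the exceedance notation, the flow exceeded n% of the time can be read from a ranked daily series. The sketch below is a minimal example only: the function name `exceedance_flow` and the plotting-position convention (linear interpolation between ranked values) are assumptions made here, not taken from the study.

```python
def exceedance_flow(flows, n):
    """Return the daily flow exceeded n% of the time (Q_n).

    Assumption: flows are ranked from highest to lowest and Q_n is
    found by linear interpolation along the ranked series. Other
    plotting-position conventions would give slightly different values.
    """
    if not flows:
        raise ValueError("flows must not be empty")
    if not 0 <= n <= 100:
        raise ValueError("n must be between 0 and 100")
    ranked = sorted(flows, reverse=True)
    # Fractional position in the ranked series for n% exceedance
    pos = n * (len(ranked) - 1) / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ranked) - 1)
    frac = pos - lower
    return ranked[lower] + frac * (ranked[upper] - ranked[lower])
```

With this convention a small n (for example 5) gives a high flow and a large n (for example 95) gives a low flow.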
The water balance bias (WB), expressed as a percentage difference between overall mean simulated and observed flow, indicates how well the balance between rainfall and evaporation agrees with the observed volume of flow over the time period, but may mask underlying errors in the water balance at an annual time scale.
The root mean square error (RMSE) is the mean daily difference between observed and simulated flows. Model efficiency (NS), introduced by Nash and Sutcliffe (1970), is a dimensionless measure which expresses the RMSE as a proportion of the variability in observed flows. A value of 1 indicates an exact fit between observed and simulated flows and a value of 0 that the model is only as good as using the mean flow. RMSE and NS are sensitive to differences in timing of high flows.

$$\mathrm{RMSE} = \sqrt{\frac{1}{nday}\sum_{d=1}^{nday}\left(q_d - Q_d\right)^2}$$
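The three measures described above (WB, RMSE and NS) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the study: the function names are hypothetical, WB is taken as positive when simulated flow exceeds observed flow, and NS is written in its standard Nash-Sutcliffe form.

```python
import math


def water_balance_bias(obs, sim):
    """WB: percentage difference between overall mean simulated and observed flow."""
    mean_obs = sum(obs) / len(obs)
    mean_sim = sum(sim) / len(sim)
    return 100.0 * (mean_sim - mean_obs) / mean_obs


def rmse(obs, sim):
    """Root mean square error of daily simulated flow against observed flow."""
    nday = len(obs)
    return math.sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / nday)


def nash_sutcliffe(obs, sim):
    """NS: 1 minus squared error divided by the variability of observed flow."""
    mean_obs = sum(obs) / len(obs)
    squared_error = sum((o - s) ** 2 for o, s in zip(obs, sim))
    squared_deviation = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - squared_error / squared_deviation
```

A simulation identical to the observations gives NS = 1, and a simulation equal to the observed mean flow on every day gives NS = 0, matching the interpretation given in the text.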
The following measures, as percentages, are derived from mean monthly flow, MM, and the flow duration curve. The latter are based on measures used by Yilmaz et al. (2008): HFV (bias in High Flow Volume), MFV (bias in Mid Flow Volume), and LFV (bias in Low Flow Volume). MM shows differences in seasonality of observed and simulated flow, while HFV, MFV and LFV capture differences in high, mid and low ranges of the flow regime. A positive value indicates simulated flow is greater than observed.
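A percentage bias in flow volume over one band of the flow duration curve might be computed as in the sketch below. It is a simplified illustration, not the formulation of Yilmaz et al. (2008) or of this study: the function name `fdc_volume_bias`, the simple volume-ratio form and the choice of band limits are all assumptions.

```python
def fdc_volume_bias(obs, sim, lower_pct, upper_pct):
    """Percentage bias in flow volume between two exceedance percentages.

    Both series are ranked from highest to lowest flow and the volumes
    lying between lower_pct and upper_pct exceedance are compared.
    A positive value means simulated volume is greater than observed.
    """
    if len(obs) != len(sim):
        raise ValueError("obs and sim must have the same length")
    obs_ranked = sorted(obs, reverse=True)
    sim_ranked = sorted(sim, reverse=True)
    n = len(obs_ranked)
    start = int(n * lower_pct / 100.0)
    stop = int(n * upper_pct / 100.0)
    if stop <= start:
        raise ValueError("band contains no values")
    obs_volume = sum(obs_ranked[start:stop])
    sim_volume = sum(sim_ranked[start:stop])
    return 100.0 * (sim_volume - obs_volume) / obs_volume
```

For example, `fdc_volume_bias(obs, sim, 1, 5)` would compare volumes between 1% and 5% exceedance; limits for the mid and low bands would need to be chosen to match the definitions actually used.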