Deﬁning the hundred year ﬂood: A Bayesian approach for using historic data to reduce uncertainty in ﬂood frequency estimates

This paper describes a Bayesian statistical model for estimating ﬂood frequency by combining uncertain annual maximum (AMAX) data from a river gauge with estimates of ﬂood peak discharge from various historic sources that predate the period of instrument records. Such historic ﬂood records promise to expand the time series data needed for reducing the uncertainty in return period estimates for extreme events, but the heterogeneity and uncertainty of historic records make them difﬁcult to use alongside Flood Estimation Handbook and other standard methods for generating ﬂood frequency curves from gauge data. Using the ﬂow of the River Eden in Carlisle, Cumbria, UK as a case study, this paper develops a Bayesian model for combining historic ﬂood estimates since 1800 with gauge data since 1967 to estimate the probability of low frequency ﬂood events for the area taking account of uncertainty in the discharge estimates. Results show a reduction in 95% conﬁdence intervals of roughly 50% for annual exceedance probabilities of less than 0.0133 (return periods over 75 years) compared to standard ﬂood frequency estimation methods using solely systematic data. Sensitivity analysis shows the model is sensitive to 2 model parameters both of which are concerned with the historic (pre-systematic) period of the time series. This highlights the importance of adequate consideration of historic channel and ﬂoodplain changes or possible bias in estimates of historic ﬂood discharges. The next steps required to roll out this Bayesian approach for operational ﬂood frequency estimation at other sites is also discussed.


Introduction
On 5-6 December 2015 Storm Desmond swept across northern Britain, leaving record rainfall and widespread flooding in its wake (Priestly, 2016). One of the worst affected places was Carlisle, where thousands of homes were flooded after newly completed flood defences were overtopped by rising flood waters (BBC, 2015a). While ministers insisted the defences could not be expected to cope with the ''completely unprecedented and unpredicted levels of rainfall" (ITV News, 2015) from Storm Desmond, local residents wondered why a £38million defence scheme had failed barely five years after it was built (BBC, 2010).
The Carlisle case illustrates many of the central challenges of quantifying the risk of flooding. The design of individual schemes, like the wider system for allocating resources for flood defence in England, requires estimates of the probability of flooding to support risk-based management . Quantitative assessments of flood frequency are also central to planning regulation (Porter and Demeritt, 2012), insurability standards and pricing (Krieger and Demeritt, 2015), and to the flood risk maps and management plans mandated by the EU Floods Directive (2007/60/EC). These and other flood risk management policy instruments all depend on so-called 'design floods', whose magnitude is defined in terms of nominal return-periods, like the 1 in 100 year flood. However estimating flood frequency is necessarily uncertain, and uncertainty in the design floods used for flood risk assessment can have major implications for multi-million pound decisions about whether and where new developments are permitted and what, if any, standard of protection will be provided to defend them from flooding.
One of the major sources of uncertainty in flood frequency analysis is the paucity of instrumental records. Although the 0.01 Annual Exceedance Probability (AEP) flood event has become something of an international default for design floods, mandated both by the EU Floods Directive and by the US National Flood Insurance Program, only a tiny portion of gauges provide continuous data for that long. Hydrologists have developed a number of approaches for dealing with the uncertainties arising from this deficit of instrumental records, but they still leave wide confidence http intervals, particularly for extreme flood events, whose frequency we are most concerned with estimating correctly.
To address that challenge, this paper develops a Bayesian model for supplementing instrumental readings of flood discharge from a river flow gauge with estimates derived from documentary records of historical floods, using the River Eden at Carlisle, UK as a case study. For this case study, we use the term 'historic data' to refer to estimates of flood discharge from any time prior to the introduction of the river flow gauge in 1967; whereas discharge readings from the systematic period since 1967 are referred to as 'gauge data'. Compared to the conventional frequentist approaches to estimating flood return periods, our Bayesian approach allows us to update our initial estimates of flood frequencies by incorporating other increasingly uncertain kinds of data and to quantify the total uncertainty involved in combining them through Monte Carlo methods of sensitivity analysis. Our data, analytical methods, and statistical models are described in Section 3. Then in Section 4 we contrast the results of our Bayesian model to the flood frequency estimation performed using the WINFAP-FEH software (WHS, 2014), which the Environment Agency (2012) recommends for use in official flood appraisals in the England and Wales. In Section 4 we also estimate the AEP for the recent extreme flood in Carlisle and use this example to evaluate the effect that an additional data point can make to the uncertainty in the flood frequency curve for sites of interest. The paper concludes with a discussion of the wider implications of its findings for flood risk management.

Uncertainty in flood frequency analysis
Flood frequency analysis involves both epistemic uncertainties, which arise from imperfect knowledge of the system being modelled, and aleatory or stochastic uncertainties, which, for the purposes of flood frequency analysis, can be thought of as truly random (Beven, 2008). Environmental modelling has advanced by minimising epistemic uncertainty through improved representation of natural processes whilst characterising aleatory uncertainties probabilistically (Merz and Thieken, 2009). In the context of flood frequency analysis, the aleatory uncertainties associated with the chaotic aspects of long term weather forecasting limit the goals of the analyst to that of establishing, through a flood frequency curve, the probabilities of floods of certain magnitudes rather than making predictions of when flood will actually occur. Consequently, an idealised flood frequency curve for a location would be an accurate representation of the flood probabilities in that location. In practice it is inconceivable that all epistemic uncertainty could be removed, even then, there would remain some sources of aleatory uncertainty (such as the nonstationarity of the system due to climate change) which would introduce error.
In this section we briefly review some of the sources of epistemic uncertainty that undermine the conventional statistical approaches for deriving probability distribution functions used to represent stochastic processes. We then introduce the concept of Bayesian approaches to analysis that allow the incorporation of heterogeneous sources of data into the statistical analysis with the overall aim of reducing the uncertainty in the probability distribution of flood frequencies.
2.1. Statistical approaches to flood frequency analysis Fisher and Tippett (1928) developed the first frequency curves for estimating the probability of extreme events from time series data, and the field of extreme value statistics is now well developed (e.g. Castillo et al., 2005;Coles et al., 2001;De Haan and Ferreira, 2006;Reiss and Thomas, 2007). For hydrological purposes, time series are typically divided into periods of 1 year 1 with the maximum discharge figure recorded in each year termed the annual maximum (AMAX). Strategies for selecting the most appropriate extreme value distribution (EVD) and fitting the distribution parameters to the data can be reviewed in texts such as Hosking and Wallis (1997), Beven andHall (2014), andParkes (2015).
In the UK the Flood Studies Report (FSR) (NERC, 1975) first prescribed standardised ways to estimate flood frequencies from the limited river and rainfall gauge data that was available. Subsequently, the Flood Estimation Handbook (FEH), the successor to the FSR (IH, 1999), described several ways of estimating flood frequency curves which, together with the related WINFAP-FEH software from WHS (2014), are heavily recommended by the Environment Agency (2012). The WINFAP-FEH software makes it relatively simple to derive a flood frequency curve for a river location, especially if there is an operational river flow gauge nearby (see Fig. 6 for an example).
Often, the dominant source of uncertainty in flood frequency analysis is the sampling error due to the shortness of the available time series (Apel et al., 2008;Kjeldsen et al., 2014a). The FEH advises that return period estimates from flood frequency curves derived from a single gauge should not be used to estimate flood return periods greater than half the length of the gauged record (WHS, 2009b). This 'rule of thumb' advice has serious consequences for practitioners: most gauge records in Britain are less than 40 years old (Kjeldsen et al., 2008) so the use of conventional statistical methods for flood frequency analysis is severely restricted. A widely used method for overcoming this limitation is to ''trade space for time" (Van Gelder et al., 2000) by combining the limited systematic gauge data at one site with data from 'hydrologically similar' sites elsewhere. This method is known as 'regional frequency analysis', and it is comprehensively assessed by Hosking and Wallis (1997). The WINFAP-FEH software supports regional frequency analysis with its 'pooled analysis' feature but, at present, the WINFAP-FEH software provides no method for estimating confidence limits for pooled analyses. Furthermore, concerns with the earlier methods of site pooling have led to alterations in the pooling method, referred to as 'enhanced single site analysis', such that greater weighting is now given to the gauged records at the site (WHS, 2009b). This suggests that issues of inter-site correlation and heterogeneity endemic to site pooling methods are not yet fully understood and will undermine any attempt at uncertainty estimation. Consequently, pooled analysis is not considered further in this paper. For further information see Hosking and Wallis (1997) or Kjeldsen and Jones (2006). Uncertainties also arise from the different methods of measuring discharge, whose accuracy varies considerably with the type of gauge, the geomorphology of the channel, and the level of water. Furthermore discharge measurement uncertainties increase during floods (Di Baldassarre and Montanari, 2010;IH, 1999;Neppel et al., 2010;Rosso, 1985). For the majority of stations in the UK, the water level is recorded every 15 min and the discharge is calculated indirectly from the stage-discharge relation using a rating curve (CEH, 2014). Estimates of the general errors in discharge estimates range from 3% to 5% of the discharge estimate (Cong and Xu, 1987) up to as much as 30% for extreme flows (Kuczera, 1996;Potter and Walker, 1981). Other sources give typical values in the range of 4-8% with estimates tending to cluster around 6% (see for example Leonard et al., 2000;Pappenberger et al., 2006 and sources therein). Measurement errors will contribute to the overall uncertainty of the flood frequency curve, but flood frequency curves derived from river flow gauges often do not take 1 The 'year' for hydrological purposes in the UK is defined as the 'water year' which begins on 1st October, deemed to be the time when groundwater storages are most usually low. account for this source of uncertainty (for example MacDonald et al., 2013;Miller et al., 2012). The model described in this paper explicitly considers the uncertainty in the gauged data (see Section 3.6).

Information from historic flood records
Where information about historic floods is available, researchers have sometimes tried to incorporate it into their analyses. While the original flood studies report (NERC, 1975) mentioned incorporating historical data (Stedinger and Cohn, 1986), the practice was unusual, with just a few isolated examples (Potter (1978), Sutcliffe (1978), McEwen (1987McEwen ( , 1990 and Acreman (1989)). But subsequently, flexible data fitting methods such as the Expected Moments Algorithm (EMA) (Cohn et al., 1997(Cohn et al., , 2001) that can accommodate forms of historical data sets are becoming standard practice in the US. In the UK, a report published by the Centre for Ecology and Hydrology in 2001 provided guidance for, and underlined the benefits of, incorporating historical data in flood frequency estimates (Bayliss and Reed, 2001). Soon thereafter the British Hydrological Society produced a database of historical hydrological records (Black and Law, 2004), which have been used in several studies (see, for example Black and Burns, 2002;Macdonald, 2013;Macdonald and Black, 2010;Macdonald et al., 2006;Williams and Archer, 2002). But concerns remain about historical data introducing biases that can degrade the accuracy of flood frequency analysis (Hosking and Wallis, 1986).
A recent review by Kjeldsen et al. (2014b) highlights the unrealised potential to reduce the uncertainty in flood risk estimates through careful use of historic data. Europe provides a wealth of archival resources for studying historical floods; Brázdil et al. (2006) cite numerous European studies spanning no fewer than 10 countries. In China too there is a long history of recording high water marks, which have been exploited by historical flood studies (see for example Li et al., 2013). Other historic information sources include articles or photographs from newspaper archives (MacDonald, 2014), high water marks carved onto bridges or other buildings (Macdonald, 2007), dated stones placed at high water marks of out of bank flows (Condie and Lee, 1982), diary or journal entries and even literary works (The Bristol Post, 2013).
An epigraphic mark or a description in a journal can give an idea of the maximum flood depth at a certain point, but this must then be converted into peak discharge estimate, typically by using a rating curve (see Section 2.1), though geomorphological changes to the channel may mean the current rating curve equation is less applicable to historic floods (see for example Archer et al., 2007). Issues of non-stationarity such as this are discussed further in Section 5.
Such historic flood discharge estimates are likely to contain considerable errors that must be accounted for. First, regardless of the source, all historic data must be subjectively evaluated to assess its reliability and the value it might add to existing flood frequency estimates. For example a recording of a single flood event from a journal of 1000 years ago may provide valuable insights about the potential impact of a similar flood on contemporary society (Brázdil et al., 2006) and assist in the estimation of the Probable Maximum Flood (see for example Acreman, 1989;Kirby and Moss, 1987) for a location. But that same source would be of little value in improving estimation of modern day flood frequencies if it doesn't cover a reasonably continuous time period. If, however, the historical flood record provides some quantifiable information about the likely flood magnitude and covers a continuous multidecade period such that the researcher can be reasonably confident that all significant floods in that period at that location are recorded, then the data can often be combined with the recent, instrumental time series to improve the estimation of flood frequencies.
As well as using historical data to provide estimates of peak flood discharge, historical data may also provide information on historical changes to the catchment, channel or hydrology that undermine assumptions of stationarity and affect flood magnitude and inundation extent (see for example Naulet et al. (2005), Razavi et al. (2015)). This aspect is discussed for the Carlisle case study in Section 3.3.
One important step in working with historic flood records is identifying the smallest flood event still significant enough to have been captured in the records. The estimated discharge associated with that event becomes the 'perception threshold' (Q 0 ) for the historical time series which can then be treated as 'censored sample' for statistical purposes . Furthermore, even if it is not possible to estimate the precise magnitude of floods above the threshold of perception, the historical time series can still be incorporated as a 'binomial-censored' sample whereby it is known that in the historical record of n years, Q 0 was exceeded k times (Stedinger and Cohn, 1986). Where it is possible to estimate the magnitude of historical floods, it is likely that the uncertainty in those peak discharge estimates would be greater than the instrumental record, and this must also be considered when estimating the uncertainty of the flood frequency curve (Viglione et al., 2013). Examples of studies where MCMC methods have been used in Bayesian models that incorporate 'censored samples' of historic floods include: O'Connell (2005), Parent and Bernier (2003), ; and Viglione et al. (2013).

Bayes theorem
The Reverend Thomas Bayes' (1702-1761) famous theorem relating conditional probabilities was published posthumously by Richard Price, a mathematician friend of Bayes in an essay that appeared in the Philosophical Transactions of the Royal Society (Bayes and Price, 1763;Bertsch-McGrayne, 2011). Bayes' rule can be considered very simplistically as a statement that ''by updating our initial belief with objective new information, we get a new and improved belief" (Bertsch-McGrayne, 2011). In mathematical form it is written: For any probability distribution with parameters h, such that p (h) is the prior distribution (or initial belief) and p(h|D) is the posterior distribution (new and improved belief) after having observed the data, D (new information). '(D|h) is the probability distribution or likelihood of the data (D) conditional on the parameters (h). The integral denominator in Eq. (1) is a normalisation constant ensuring the area under p(h|D) is unity. Consequently Bayes' theorem can be expressed more simply as: Notwithstanding the simplicity of Eqs.
(1) and (2), the integral denominator in Eq. (1) often has no analytical solution (Viglione et al., 2013). Furthermore as more than one source of uncertain observed data is used to infer multi-parameter distributions, the expression of the joint posterior distribution also tends to become intractable (Lunn et al., 2013). The steps required to address these problems are far from straightforward, which is perhaps why Bayesian approaches to flood frequency analysis are uncommon outside academic research (Kjeldsen et al., 2014b).
But advances in computing power now make it possible to run Markov Chain Monte Carlo (MCMC) simulations that produce a sample of the desired posterior distribution after a large number of steps. This brings Bayesian solutions within the reach of researchers without extensive specialist statistical expertise Robert and Casella, 2004), and there are many examples where researchers have proposed ways to combine several data sources for flood frequency analysis and manage the uncertainty within those data sources (see for example Chandler, 2013;Fernandes et al., 2010;Fill and Stedinger, 1998;Gaume et al., 2010;Kuczera, 1999;Neppel et al., 2010;O'Connell et al., 2002;Parent and Bernier, 2003;Payrastre et al., 2011Payrastre et al., , 2013Renard et al., 2006;Ribatet et al., 2007;Seidou et al., 2006;Viglione et al., 2013;Wood and Rodríguez-Iturbe, 1975).

Data and methods
Carlisle is a city in Cumbria, North West England with a population of approximately 100,000 and a long history as an important regional centre pre-dating the Roman occupation. Carlisle is located on the confluence of the rivers Eden, Caldew, and Petteril, all of which have flooded in recent decades. The fact that the city has been established for many centuries means there are substantial local archives charting the long history of flooding in the area. A river flow gauge was installed on the River Eden at Sheepmount (see Fig. 1 for location) in 1967. The gauge has been in operation for over 40 years recording both flow and water level data every 15 min (CEH, 2014). At the time of writing the most recent published gauged data is from 2012, giving a 46 year AMAX time series.
On 8 January 2005 Carlisle experienced its worst flood for over 200 years (Environment Agency, 2006), inundating much of the city (see Fig. 1). This event has been heavily analysed thanks to a unique set of accurate observations of peak flood depth and extent collected immediately after the flood (see for example Fewtrell et al., 2011;Horritt et al., 2010;Neal et al., 2012Neal et al., , 2009Parkes et al., 2013). As a result of the January 2005 event, an improved flood defence scheme was developed and completed in 2010 (BBC, 2010), protecting the city against 0.005 AEP floods (return period of 200 years) (Environment Agency, 2009).

River gauge data at Carlisle
The Flood Studies report (CEH, 2014) lists the gauge on the River Eden at Sheepmount as 'grade B', while a Flow Quality Comment document describes the gaugings there as showing less than 10% deviation from the rating curve. It characterises the high flow record as ''good to valid" although substantial bypass flow is also noted when the level is above bank (Environment Agency, 2011c). In a paper modelling the January, 2005 Carlisle flood, Horritt et al. (2010) set up a simple finite volume model (Horritt, 2004) in order to assess the performance of the Sheepmount gauge and the Cummersdale gauge on the River Caldew. They find the EA rating curve to be a good fit to both the within bank stagedischarge measurements (up to 1000 m 3 s À1 ) and for the January, 2005 event, although they recommend revising the peak flow from 1520 down to 1490 m 3 s À1 . This evidence suggests that gauge readings at Sheepmount can be relied upon to be within approximately ±10% even for out-of-bank events. The Bayesian model described in Section 3.5 takes account of this source of uncertainty and the sensitivity of the model to it can be seen in Fig. 5.

Historical data at Carlisle
The January 2005 flood was described at the time as the worst to affect Carlisle in over 200 years (Environment Agency, 2006), only to be exceeded by the December 2015 flood (Marsh, 2016).
But these are by no means the only destructive floods to have affected Carlisle; flood marks have been recorded on the Eden Bridge since its construction in 1815, and these can be cross referenced with newspaper reports and a staff gauge maintained by Carlisle City Council for nearly 100 years from 1850 (Environment Agency, 2006). Smith and Tobin (1979) where h is the measured (or estimated) stage, which for this site is AOD -6.9 m. Notes: 1. Level at Eden Bride derived from recurrence interval in Smith and Tobin (1979). For historical records, the water level at the Sheepmount gauging station is calculated as approximately 1.1 m lower than the estimated level at the Eden Bridge. 2. Level at Eden Bridge from Cumbria Floods Technical Report (Environment Agency, 2006). For historical records, the water level at the Sheepmount gauging station is estimated to be approximately 1.1 m lower than the estimated level at the Eden Bridge. This figure comes from hydraulic flood models of the January 2005 flood (see Parkes, 2015;Parkes et al., 2013). 3. From Sheepmount river gauge records (CEH, 2014). 4. These records are not used in the historical data set because they are not the maximum flood for the 'water year' which starts on 1st October.
The historical (pre-gauge) flood data available for Carlisle is in the form of a ''censored" time series where discharge estimates for all floods above a certain ''perception threshold" are assumed to be known (Stedinger and Cohn, 1986). Combining the gauged data from the Sheepmount gauge with the historical estimates (Table 1) produces the time series shown in Fig. 2. Fig. 2(a) displays the gauged and historical flood peaks as known values, whereas in reality both the gauged AMAX readings and the historical estimates are uncertain (although the gauge readings are likely to be more accurate than the historical estimates). The 'perception threshold' is inferred from the lowest historical flood peak. In this case the perception threshold is likely to remain fairly constant over the historical period because, in some sense, it is anchored by the presence of the Eden Bridge, which was constructed in 1815, and has provided a benchmark against which the height of subsequent floods have been marked by local residents (see Section 3.3). The smallest flood in the historical record is estimated to be 893 m 3 s À1 , suggesting a perception threshold of approximately 890 m 3 s À1 . Since the perception threshold is inferred from the highly uncertain historical flood record, its value is also uncertain. Fig. 3(b) shows the uncertainty associated with the gauge readings, historical flood estimates, and the perception threshold. Methods for deriving flood frequency curves should ideally incorporate all these sources of uncertainty when generating confidence limits for the EVD parameters. Fortunately a Bayesian analysis integrating the measurement errors and uncertainties can provide a full posterior distribution of the EVD parameters .

Land use and channel change
When using historic flood data to estimate contemporary flood frequency curves, it is often necessary to consider physical changes to the catchment, river channel or floodplain that may have affected the extent of historic flooding and thus the rating curve for the river at the study site.
The relationship between land-use change in the catchment and fluvial flood risk is not yet fully understood, but is thought to vary with the scale of the catchment under consideration (Pattison and Lane, 2012). Historical maps of the catchment show minimal land use change over the historical period; 200 years ago the uplands were devoted to sheep grazing with only a little tillage, much as they are today (see Vision Of Britain (2016) for historical maps and details on land use). Furthermore, for larger catchments, such as the River Eden at Carlisle, there is no evidence that local scale effects of land-use change aggregate downstream (O'Connell et al., 2007).
However, natural or man-made changes to the river channel in or near the study area are likely to have a significant impact on local flood levels (Neppel et al., 2010;Shaw et al., 2010, p. 128). For example, the construction of levees will reduce the available flood plain storage resulting in a higher water level at the channel for a given discharge. Similarly, bridges often narrow the channel and restrict flow. Changes to the river bed, from natural processes such erosion and deposition or anthropomorphic activities from dredging, are also likely to affect the stage-discharge relationship over time (Jalbert et al., 2011).
Quantifying the full effects of this non-stationarity in Carlisle is beyond the scope of this paper. However, a brief review of the literature, including historic maps and images (e.g. Fig. 4) of the area, provides some indicative guidance. Fig. 3(a) shows that before the historical period of this study, the River Eden at Carlisle comprised two branches separated by an area called 'The Sands'. By 1821 ( Fig. 3(b)) the Eden no longer had two branches and 'The Sands' was not an island as the southern branch of the Eden had been filled in (Cooke, 1829). The two bridges shown in the 1805 map (Fig. 3a) were replaced by the 'Eden Bridge', completed in 1815 and still standing (Eden Bridge, 1932). It is clear that the geomorphology of the Eden changed considerably between 1805 and 1821 and construction of the Eden Bridge in that period may also have altered the river's hydraulics.
From the evidence above it can be deduced that the infilling of the southern branch of the Eden and the removal of the earlier bridges over it occurred at some point between 1805 and 1815, when the new Eden Bridge was completed. The impact of these changes on flood levels and discharge is likely to be substantial and largely unquantifiable. Consequently the estimated discharge of 1025 m 3 s À1 for the 1809 flood is less certain than the discharge estimates for the subsequent historic floods. For this reason, the discharge estimate for the larger 1809 flood is not used directly; instead, the flood is given broad upper and lower bounds (see Section 3.5).
Although the bridge built in 1815 still stands, its form has changed. There was originally a second new bridge completed contemporaneously to the south of the five arches in existence today. Both bridges can be seen in Fig. 4. The second bridge spanned the earlier, southern branch of the River Eden that was now dry, but still available for channelling floodwater. The arches forming the second bridge were replaced by an embankment in 1969 to support a new road scheme (Cumberland News, 2010). The removal of the arches will have further restricted the flow of floodwater past the Eden Bridge in times of flood.
Similarly flood defences on the floodplain in Carlisle have been built and improved in several stages during the historical period. Smith and Tobin (1979) document the construction of embankments in the early nineteenth and twentieth centuries with further improvements in 1932. More significantly, major improvements to the flood defences were implemented following the 1968 flood, which affected more than 400 properties (Atkins, 2011;CEH, 2014). The removal of the additional bridge arches and the flood defence works carried out over the past century were probably designed to keep the flows in the river. This hypothesis is backed up by expert consultation with a local Flood Risk Management Advisor for the Environment Agency (Environment Agency FRMA, 2014 interview), who explained: ''I suppose over the years we have done a lot of work which will keep the flows in the rivers so flow that was a meter below in 1856 might have had the same flow as we've had now, but we don't know" There is no systematic information available about within-bank channel changes over the historical period, but it is likely that the practice of dredging to improve river navigation and, to a lesser extent, flood management would have been more common during the historical period than in recent decades, as is suggested by references to the practice in Carlisle in 1948and 1963(Carlisle History, 2014. This has been the trend across Britain as cost, environmental concerns, and the shift from river to road transportation have reduced dredging activities (Environment Agency, 2010).
Overall the effect of channel changes is likely to mean that discharge figures calculated for historical floods using the current rating curve formula for the Sheepmount river gauge will be low when compared to the AMAX time series from the gauge data. This would manifest itself as a systematic error/bias in the historic data.
In order to estimate the scale of this bias, simulations of the January 2005 flood were run using a digital elevation model (DEM) that had been altered to remove the post-1968 defences and incorporate the additional Eden Bridge arches. The simulations were run using a 2-D flood model conditioned with data from the January, 2005 flood. A full description of the model set-up, calibration, and performance can be found in Parkes et al. (2013) and Parkes (2015). The maximum water level at the Eden Bridge (in the approximate location of the historic flood marks) was 0.297 m lower when compared against comparable simulations run using the 2005 DEM. By increasing the estimated flood levels for the historic floods in Table 1 and recalculating the discharges using the rating Eq.

Selection of extreme value distribution (EVD)
Several different distributions have been proposed and used to characterise flood frequency curves (Cunnane, 1989). Official river agencies sometimes recommend or even mandate the use of particular distributions they consider most appropriate. In the United States, Bulletin 17B from the United States Water Resources Committee (Kirby and Moss, 1987;USWRC, 1982) recommends using the Log-Pearson Type III (LP3) distribution (Bobée and Ashkar, 1991;Vogel and McMartin, 1991). Conversely, in the UK the Flood Estimation Handbook (vol. 3) recommends the generalised logistic (GL or GLO) distribution (IH, 1999), which, together with the generalised extreme (GEV) distribution, is thought to be a satisfactory fit for almost all of the 602 gauged locations in the UK (Kjeldsen et al., 2008).
There is no purely logical reason for preferring any particular extreme value distribution (EVD) for modelling a time series of AMAX river gauge readings. Fisher and Tippett (1928) assert that for a sequence of independent, identically distributed random variables, the asymptotic distribution for the largest value in a block of n values will be of a GEV type as n tends to infinity. However, in the real world, the AMAX time series is not bound to eventually converge on a GEV distribution or any other mathematical function, especially when one considers the many factors that violate the 'independent, identically distributed' assumption. Most commonly used distributions have 3 parameters (often describing their location, scale and shape), giving them the flexibility to approximate a wide range of 'real' distributions which are obviously not bound to adhere asymptotically to a mathematical formula. This uncertainty about the relationship of the chosen EVD to the natural processes it is representing, is referred to as model uncertainty (Wood and Rodríguez-Iturbe, 1975).
One approach to managing model uncertainty is to create a composite distribution of several different EVDs, each allocated a weighting based on how well it fits the available data (Apel et al., 2008(Apel et al., , 2004Wood and Rodríguez-Iturbe, 1975). However the goodness of fit tests employed to establish the weightings for candidate EVDs will always be limited because they can only operate within the range of observed data (Keef, 2014, p. 199). Furthermore, as Fig. 6 suggests, the sampling uncertainty often masks the model uncertainty (Cunnane, 1987) such that it is not possible to directly estimate the effect of the model error (Kjeldsen et al., 2014a). For these reasons, we make no attempt to represent model uncertainty using a composite of different distribution types. Instead we selected a single distribution using the 'goodness of fit' test proposed by Hosking and Wallis (1997) and described in detail by Kjeldsen et al. (2008). Table 2 shows the results for the Sheepmount river gauge of the 'goodness of fit test' implemented using the WINFAP-FEH software. The test was run both for a single site analysis and a pooled analysis. Under the goodness of fit test, the distribution is considered an acceptable fit to the data if the absolute value of Z is less than 1.64 (Hosking and Wallis, 1997;Kjeldsen et al., 2008), which roughly corresponds to acceptance of the hypothesised distribution at a 90% confidence level (Hosking and Wallis, 1993). Only the GEV and Pearson Type III distributions have an acceptable fit for both single site and pooled analysis. Overall the GEV distribution gives the lowest absolute Z value so we selected it for use in our Bayesian analysis of flood frequency at the Sheepmount river gauge. Within extreme value theory, the GEV distribution is considered a sensible choice of distribution for modelling sequences of independent random variables (Chandler et al., 2014;Embrechts et al., 1997;Jenkinson, 1955) and is commonly used for flood frequency analysis (see for example Gaume et al., 2010;Haberlandt and Radtke, 2014;Kuczera, 1996;Naulet et al., 2005;Neppel et al., 2010).

Model design
To conduct the Bayesian analysis for this paper, we used the OpenBUGS software (Lunn et al., 2009). BUGS stands for Bayesian inference Using Gibbs Sampling, so the software uses an implementation of the Gibbs sampler (Geman and Geman, 1984), which is a special case of the Metropolis-Hastings algorithm (Hastings, 1970;Metropolis et al., 1953).
If just the gauged AMAX data were available, the likelihood function for gauged data, ' g (D|h), in Eq. (1) would be: In this case D is the AMAX time series (x 1 , x 2 , x 3 ,. . ., x s ) of gauged data of length s years and f X () is the probability distribution function (pdf) for X. Here the GEV distribution is used, which has parameters, h ¼ fl À location; r À scale; g À shapeg 2 and pdf given by: f ðx :; l; r; gÞ ¼ 1 shows uncertainty in discharge estimates and perception threshold, see Section 3.6 for a discussion of the error bars. 2 The notation for the location, scale, and shape parameters of the GEV distribution is not consistent across disciplines. For example Kuczera (1996)   For non-zero g and, when g = zero: f ðx :; l; r; gÞ ¼ When incorporating historic data, it is necessary to take account of the incomplete nature of the dataset. The concept of the 'perception threshold' described in Section 3.2 can be defined as 'the level above which all flood events were recorded in the historic period'. This can be expressed in terms of probabilities as: where X P is the perception threshold and Pr(recorded|X) is the probability of there being a recording of a historical flood in any year given the existence of the 'true' annual maximum flood, X. The assumption that no large flood events were missed during the historic period is justifiable in the case of Carlisle due to the relatively extensive local sources listed in Table 3. However, setting a value for the perception threshold isn't simply a case of choosing a value just below the smallest historical flood event, because of the uncertainty about the discharge estimates for those historic floods. Whereas Viglione et al. (2013) set the perception threshold to the upper bound of the smallest known historical flood, we have given the perception threshold itself an uncertain value in the model. During the historic period of length h years it is known there were k floods above the perception threshold, X P . For k 0 of the historical floods there are estimates of the peak discharge available denoted fy 1 ; y 2 ; . . . y k 0 g. For the other k 00 historical floods it can only be said the flood was somewhere between an upper and lower bound (note: k = k 0 + k 00 ). The upper bounds for these historic floods are denoted fy U1 ; y U2 ; . . . y Uk 00 g and the lower bounds fy L1 ; y L2 ; . . . y Lk 00 g. In this case there are 21 known historical floods in Carlisle since 1800, of which there is an estimate of peak discharge available for 20, so k 0 = 20 and k 00 = 1. For the remaining (h-k) years of the historical period it can only be said that the maximum flood was below X P . Assuming that annual floods are independent events, and following Stedinger and Cohn (1986) and  methods for representing censored data and the method of Viglione et al. (2013) for representing historical floods with upper and lower bounds, the likelihood function for the historical data, ' h (D|h) is: This contains terms for the h-k historical floods below the perception threshold, the k 0 floods for which an estimate of peak discharge is available, and the k 00 historical floods known to lie between upper and lower bounds. h k ¼ h! k!ðhÀkÞ! is the binomial coefficient, and F X () is the cumulative density function of the GEV distribution given by: Combining Eqs. (4) and (8) gives the overall likelihood function '(D|h): Error terms for historical (y j ) and gauged (x i ) discharge estimates are not shown.

Uncertainty in discharge estimates
Our Bayesian model for producing the probability distribution functions for the location, scale, and shape parameters of the GEV distribution needs to take account not only of the sampling uncertainty due to the limited data available but also any random and possibly systematic data errors in the gauged and historical flood estimates. Fortunately the MCMC can easily incorporate data uncertainty without much impact on execution time . Various approaches to representing the error in gauged and historical readings have been employed (see for example Neppel et al., 2010), such as a log-normally distributed error for gauged data (Kuczera, 1996) and a uniformly distributed error with upper and lower limits for historical flood estimates (Viglione et al., 2013). The approach for random errors taken here is to assume both gauged and historical discharge estimates have a normally distributed error term associated with them, whose standard deviation represents our confidence in the accuracy of the estimate. Error in the gauge readings is likely to increase with the magnitude of the flood (Di Baldassarre and Montanari, 2010;IH, 1999;Neppel et al., 2010;Rosso, 1985). By using a standard deviation for the error term which is a percentage of the gauge reading, error will scale in proportion to the estimated magnitude of the flood.
Historical estimates of floods are not necessarily affected by this bias. Instead, the uncertainty of historical estimates is affected by other factors such as the time since the event and the richness of the information source. We believe that modelling these factors explicitly would add unnecessary complexity to the model. Consequently the model uses the same standard deviation for the uncertainty in all historical discharge estimates rather than a percentage of the estimate (see Fig. 2b). Additionally the likelihood of a systematic bias in the historical discharge estimates should be considered and is discussed in the following section.

Model parameters and prior distributions
The OpenBUGS configuration settings and parameters used in our Bayesian model are described in Table 3. The first entries in Table 3 are the three prior distributions for the GEV model parameters (h ¼ fl; r; ggÞ. Kuczera (1999) recommends using informative priors based on a regional flood frequency analysis. Another approach is to assume prior independence of the GEV parameters using broad, uniform distributions for the location (l) and scale parameters (r), and then giving some sort of geophysically representative prior for the shape (g) parameter: a beta distribution in the case of Martins and Stedinger (2000) or a Gaussian distribution in the case of Neppel et al. (2010). For the UK there is no selfevidently representative prior distribution, so we used uniform (non-informative) priors for all three GEV parameters (see Robert and Casella, 2004 for a discussion on choice of priors).
The default value for the gauged data uncertainty is based on the reviewed literature (Environment Agency, 2011c;Horritt et al., 2010), where an error of ±10% is roughly equivalent to a standard deviation of 5%. The default value for the historical adjustment factor is based on the historical flood modelling described in Section 3.1. The uncertainty in the historical floods and the perception threshold are more subjective. We chose confidence limits thought to be wide enough to cover most uncertainties in the historical flood estimates.

Sensitivity analysis
In order to establish how the model behaviour changes as the parameters are altered, we performed a sensitivity analysis. The direct model output consists of the posterior estimates for the mean and standard deviation of the three GEV parameters. The flood frequency curve is then derived with confidence limits by running a Monte Carlo simulation sampling from the uncertain posterior estimates for the GEV parameters. For the sensitivity analysis in Fig. 5, the mean and 95% limits for the 0.01 AEP flood are used to show the effect on the flood discharge estimates of varying the model parameters. Fig. 5 shows the sensitivity of the model while varying five model parameters: Gauged data uncertainty; historical uncertainty; historical adjustment factor; perception threshold; and perception threshold uncertainty. The 'chain length' and 'burn-in steps' are settings required by the BUGS MCMC simulations; they were included in the sensitivity analysis to check model convergence. Increasing the chain length and burn-in stapes above the defaults made no difference to the model output suggesting satisfactory convergence.
All the graphs in Fig. 5 use the 0.01 AEP (100 year return period) flood as the key metric, which is towards the higher end of the flood frequency curve. The 0.01 AEP flood was chosen because it is a familiar measure in the UK, being used as the definition of the boundary between medium and high fluvial flood risk (Environment Agency, 2011b). This metric will be more influenced by the bigger floods in the time series, which explains why, as shown be Fig. 5(a and b), the model appears insensitive to the uncertainty both in the gauge readings and the historical flood estimates until they reach values towards the upper end of the scale. Extending the upper bounds of the gauged and historical time series beyond that which is realistic (as defined by the literature review and discussion with local Flood Risk Manager Advisor (Environment Agency FRMA, 2014)) has the effect of enhancing the influence of the medium sized floods on the upper end of the flood frequency curve.
Overall the model shows the greatest sensitivity to the historical adjustment factor (Fig. 5(c)) which shows the importance of selecting a realistic value for this parameter. It should be noted that since this factor is added directly to the historical flood estimates, this parameter combines with the historical uncertainty parameter to cover stochastic (random) and systematic (bias) error. Fig. 5(d) shows the model is sensitive to the value chosen for the perception threshold across all values; however there is no reason to select a value far from that derived from the historical flood data.  6. Flood frequency curve generated using single site analysis and showing annual maxima (AMAX) discharge data for the River Eden at the Sheepmount river gauge (blue +), plotted using WINFAP-FEH software from WHS (2009a). Three different extreme value distributions (EVDs) have been fitted using the method of L-moments: 1. Generalised Logistic (GL); 2. Generalised Extreme Value (GEV); 3. Log-Pearson Type III (LP3). Also shown are the 95% confidence limits for the fitted GEV curve based on balanced re-sampling (Gleason, 1988) of 4999 samples taken from the AMAX series. The data points from the AMAX series are plotted using the Gringorten plotting position formula (Gringorten, 1963). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.) Table 2 Goodness of fit test results for 4 candidate (3 parameter) distributions for the AMAX data from the Sheepmount river gauge in Carlisle, Cumbria. The test is designed such that a distribution is considered to have an adequate fit the absolute value of Z (|Z|) is less than 1.64 (Hosking and Wallis, 1997;Kjeldsen et al., 2008

Frequency analysis using gauged data only (single site analysis)
Initially we present a 'single site' flood frequency analysis performed with the WINFAP-FEH software using just the 46 years of annual maximum discharge time series data (Section 3.1) from the Sheepmount river gauge. The resulting flood frequency curve (referred to as 'single site analysis') is shown in Fig. 6 where the GEV, GL and LP3 distributions have been fitted to the data using the method of L-moments (Hosking, 1990). The 95% confidence intervals for the GEV distribution were calculated by the WINFAP-FEH software using a balanced re-sampling method (Gleason, 1988;WHS, 2009a), which is a commonly used type of 'bootstrap' procedure (Kjeldsen et al., 2014a). The confidence intervals represent the uncertainty of the curve fitting due to the length of the time series: the longer the time series, the narrower the confidence interval would be. Unfortunately, long and reliable time series of AMAX records from river gauges are uncommon in the UK; the 46 years recorded at Sheepmount is comparatively long, the average record length being roughly 35 years (Kjeldsen et al., 2008). The estimated AEP for the peak discharge (1516 m 3 s À1 ) measured during the January, 2005 Carlisle flood is 0.0055, or a return period of 180 years, with 95% confidence intervals of 0.0386 to less than 0.0001, or return periods from 34 to greater than 10,000 years (see Table 6).
Even the enormous width of these confidence limits may be unrealistically narrow; the 'bootstrap' method is often found to give lower uncertainty estimates than other, comparable methods using the same data (Hall et al., 2004;Kjeldsen et al., 2014a). This raises questions about how much protection is actually provided by flood defence designs. After the 2005 flood, defences in Carlisle were upgraded to protect against a similar scale event. But if the AEP of the 2005 flood really is 0.0386, then a flood at least as large might potentially be expected on average once every 34 years, which is a much less comforting thought than if the new defences provide a return period standard of protection greater than 10,000 years. However the width of the confidence intervals is so great that it is impossible to say one way or the other using the conventional statistical methods of flood frequency estimation recommended in the FEH.

Bayesian model results
Running the model with the default parameters gives the posterior distributions for the GEV model parameters described in Table 4. Fig. 7 shows the resulting flood frequency curve with confidence limits plotted using output from the Bayesian model. Fig. 8 transposes the flood frequency curves for the single site analysis using the WINFAP-FEH software (GEV distribution fitted using L-Moments) with the flood frequency curve generated using the Bayesian model. The advantages of the Bayesian model over the single site analysis presented in Section 4.1 can be seen in its narrower confidence limits, which are also evident in the figures for the confidence limits for the design floods shown in Table 5. The Bayesian model gives an AEP for the January, 2005 flood of 0.00382 (return period 262 years) with 95% confidence limits of 0.014 to 0.000246 (return periods: 70 years to 4060 years).

Incorporating the December 2015 flood in the Bayesian model
The results presented so far have only included data up to 2012, which at the time of writing was the extent of the AMAX time series available from CEH (2014). But the controversy over the recent Storm Desmond flooding of Carlisle in December 2015 has raised important questions about just how unprecedented that event was. At the time of writing two estimates for the peak discharge at the Sheepmount gauging station are available: 1680 m 3 s À1 (Marsh, 2016) and 1700 m 3 s À1 (Parry et al., 2016). These preliminary estimates for the 2015 flood can also be incorporated into our Bayesian model simply by treating the recent flood as an uncertain estimate of an historic event. This enables us to update Fig. 2(b) with Fig. 9. Note that, in producing Fig. 9, we have not applied the historical adjustment factor to the December 2015 flood estimate. Also it was necessary to extend the censored period to account for the years 2013, 2014 and 2015 3 which have not been published yet as part of the official instrumental record. The updated flood frequency curve at the Sheepmount gauging station is shown in Fig. 10. The effect of adding such a large flood to the dataset driving the model has a significant effect on the upper (low frequency event) end of the curve. Table 6 shows how the output of the Bayesian model changes as the new data is added and Fig. 11 compares the flood frequency curves from the Bayesian model both without and including the December 2015 flood.

Discussion
The magnitude of the flood flows affecting Carlisle in December 2015 was unprecedented, with an AEP and return period we estimate to be 0.00311 and 420 years respectively (95% confidence limits of 0.0119-0.000255 for AEP, 42 to greater than 10,000 years Historical adjustment factor Factor applied to estimates of historical flood discharge to take account of construction of flood defences and reduction in dredging activities.

Perception threshold
Minimum value of floods in the historic period included in the historic flood record. 900 m 3 s À1 740-1100 m 3 s À1 Perception threshold uncertainty Uncertainty (standard deviation of normally distributed perturbation) applied to perception threshold. Absolute value.  Agency, 2006). But there were substantial uncertainties about those return period estimates, and the December 2015 flood event raises questions about their validity. First, AEP estimates are sensitive to the inclusion of such extreme events in the sample used to calculate the frequency curve. Beyond mere 'hindsight bias' (Tversky and Kahneman, 1974), our analysis shows how accounting for the 2015 flood suggests that very large floods of this kind are likely to occur with greater frequency than previously thought. For example, we show that a flood in Carlisle of 1341 m 3 s À1 which had been expected to occur on average once every 100 years according to our Bayesian model (Table 5) is now expected on average once every 75 years when the model is run to include the December 2015 flood data (Fig. 11).
Second, in the wake of yet another winter of severe flooding in Britain, many people are asking if climate change is now causing such extreme flood events to occur more frequently (Met Office, 2015). Indeed the government has commissioned a review to see if its flood frequency estimates are taking appropriate account of climate change uncertainties (Defra, 2016). There is a substantial body of research into the impacts of long term climate change on river discharge (e.g. Arnell et al., 2014;Fowler and Kilsby, 2007;Hannaford, 2015;Ramsbottom et al., 2012;Reynard et al., 2009). Several studies have sought to attribute specific flood events in the UK to climate change (Kay et al., 2011;Pall et al., 2011), and similar attribution studies of Storm Desmond are under way (van Oldenborgh et al., 2015). The censored time series presented in Fig. 9 is certainly consistent with the hypothesis that extreme floods are becoming more frequent. However, the large uncertainties in the flood frequency curve derived for this paper emphasise the difficulties of attributing individual flood events to global-scale climate changes. This problem of detecting a climate change signal is further confounded by other sources of non-stationarity. Landuse and channel change were discussed in Section 3.3, and there is also evidence of non-stationarity caused by climate variability. For example Pattison and Lane (2011) find evidence that multidecadal cycles in the North Atlantic climate system are correlated with variations in flood frequencies in Cumbria.
Regardless of the cause, non-stationarity seems to have been accepted as a fact of life (Milly et al., 2008). To assist in understanding this, the use of proxy records from before the historical period can provide valuable information on non-stationarities. Paleoflood analysis typically involves the use of sedimentology or geomorphology techniques to search for evidence of deposits or erosion from previous events (see for example Baker, 2006;Benito et al., 2004;Li et al., 2013), but other proxy records such as tree ring chronology can also inform researchers on issues of climate and hydrologic stationarity (Razavi et al., 2015). In theory it would also be possible to extend our Bayesian model to take account of one or more paleoflood estimates derived from sediment analysis in the study area. This would be done by adding a 'prehistoric' period to join the 'gauged' and 'historic' periods. The 'prehistoric' period would consist of uncertain estimates for the paleofloods and a second prehistoric perception threshold derived from those paleoflood estimates. In practice, however, the model might require some redesign to deal with the length of the prehistoric period. Uncertainties about the impacts of climate change and nonstationarity in the catchment simply add to and compound the many other epistemic uncertainties in flood frequency estimation, which frustrate policymakers needing to prioritise scarce resources for flood risk management and communicate the standards of protection being provided to the public (Kuklicke and Demeritt, 2016). Chief amongst these uncertainties is the sampling error caused by the limited availability of data; symptomatic of this is the sensitivity of the high return period end of the flood frequency curve to the addition of a new extreme event, as demonstrated in Section 4.3. Our analysis shows that this sampling error can be reduced by using historic flood estimates. They are particularly important because they consist of estimates of previous extreme floods and those are the ones we are most interested in because they disproportionately influence the upper end of the flood frequency curve.
In this paper we have shown that these uncertainties can be substantially reduced by incorporating historical flood records into flood frequency analysis. The Bayesian model described in Section 3.5 shows the value of incorporating estimates of historic floods, bringing a reduction in 95% confidence limits of roughly 50% when compared to analysis that uses only the local gauged data (Section 4.2). In addition the model employed here can also incorporate subjective adjustment factors resulting from consultation with stakeholders with local hydrological knowledge. The ability to add possibly subjective, expert hydrological information to the Bayesian model (also see Viglione et al., 2013) can perhaps assuage the concern that flood frequency analysis is dominated by statistical analysis rather than hydrology (Singh and Strupczewski, 2002). This point was borne out during discussions with practitioners responsible for making flood risk estimates in areas they have worked in for many years (Environment Agency FRMA, 2014; Flood Risk Manager, 2014). The ability to incorporate their local expertise could potentially make the resulting estimates Fig. 8. River Eden at Sheepmount flood frequency curves with 95% confidence limits showing comparison of curves derived using the Bayesian model with the single site analysis using WINFAP-FEH software (GEV distribution fitted using L-Moments).

Table 5
Estimates of the discharge at the Sheepmount gauging station for certain design floods with 95% confidence intervals, comparison of single site analysis (GEV distribution fitted using L-Moments -Section 4.1) with results of Bayesian model. Also shown is the percentage reduction in the 95% confidence interval of the design floods achieved by implementing the Bayesian model. The results of running the Bayesian model using only the gauge data (no historic flood estimates) is included for comparative purposes.
AEP (return period, years) 0.5 (2) 0.0133 (75) 0.01 (100) 0.001 (1000) Mean Discharge (m 3 s À1 ) (95% confidence intervals) more 'physically-based' in the sense that they take account of the physical events and changes to the channel known to have occurred but otherwise difficult to quantify. This can be done within the Bayesian framework as long as it is communicated transparently how the subjective factors are incorporated in the model and how the model responds. As such flood risk estimates and their associated uncertainty ought to be more plausible insofar as they incorporate, in a systematic and rigorous way, more of the available information about the historical frequency and nature of flooding in the catchment.
But the software recommended by the Environment Agency does not support the use of unsystematic (non-gauged) data, so flood frequency estimates making direct use of historic data are currently confined to the academia. Discussions with practitioners responsible for making flood risk estimates in areas they have worked in for many years suggest flood risk managers do indeed take account of local expertise when making flood risk estimates (Environment Agency FRMA, 2014;Flood Risk Manager, 2014;Lane et al., 2011;Odoni and Lane, 2010), but the subjective process is unlikely to be done consistently or transparently if it is not part of the recommended protocol.
Bayesian approaches can overcome both these problems. They enable you to use historic data and to do so transparently so that the resulting impacts on the uncertainty are clear. The results presented in Section 4.2 show a reduction of over 50% in the confidence intervals for the 0.01 AEP flood when our Bayesian model is run using historic data combined with gauged data.
If we wanted to mainstream this approach, what would be the next steps? At the moment the model employed for this project has only been used for the data from Carlisle. But there is no reason  why the model would not work at other sites where there is a mixture of gauged and historic data either in the form of uncertain estimates of historic floods (censored historic time series), or just knowledge of large floods in the past with no estimate of the peak discharge (binomial censored time series) ). If the model proved adaptable to other sites, it could be packaged up with a 'user friendly' interface that would allow more widespread use among flood risk management community.
However, substantial changes are often required to successfully adapt academic software for commercial distribution (The Economist, 2016). As well as improving the user interface, error handling, and support infrastructure, one particular hindrance is that, when using Bayesian MCMC analysis, it can be unclear when the simulation is not converging on the posterior distribution (Lunn et al., 2013, p. 72). For all the model runs in this project, 20,000 iterations with 5000 taken as burn-in was sufficient to achieve convergence for the GEV parameter distributions. But this may not be the case for other locations with different sets of data. There are techniques for formally identifying convergence, for example Plummer et al. (2006), but none are completely reliable (Lunn et al., 2013, p. 75). Even though commercial statistical packages are starting to incorporate Bayesian analysis modules, (for example Stata, 2015), it is still difficult to envisage a software package for estimating flood frequencies using Bayesian analysis that could be used by practitioners with no knowledge of the underlying techniques.
Furthermore, for the Carlisle case study, our job as researchers was facilitated by the fact that the historical flood data had already been collated from local archives and published. In order to mainstream our Bayesian model, a practitioner would need to gather the following inputs: 1. A historic catalogue of flood events. A useful starting point may be the national database of historical records produced by the British Hydrological Society (Black and Law, 2004), but the practitioner may be required to supplement that data with records from local archives. 2. Historic changes to the catchment and channel that may have affected the intensity of floods and the stagedischarge relationship. This too may require searching local archives then testing the sensitivity of hydrological or hydraulic models to the changes. 3. Estimates of the peak discharge associated with historic flood events. The task of creating a censored time series of historical flood estimates would be daunting for a practitioner starting from scratch, who was unlikely to have been trained in the methods required for scouring local archives for records of flooding. 4. Perception threshold. The sensitivity analysis of our model (Section 3.8) shows that the identification of a realistic perception threshold is important when building a censored historical flood record. The estimation of the perception threshold was facilitated in the case of Carlisle due to the presence of the Eden Bridge which provided a benchmark for much of the historical period.
In the UK it may be feasible to identify a number of sites that do satisfy these criteria. A cursory review of previous studies in the UK suggest locations on the Rivers Tyne (Archer et al., 2007), Trent (Macdonald, 2013), Ouze in Sussex  and Yorkshire (Macdonald and Black, 2010) and Tay in Scotland (Macdonald et al., 2006) would be suitable for the use of our Bayesian model. Further research should utilise a framework such as that employed by Renard et al. (2013) to compare, across several case studies, this method of frequency analysis with others that can incorporate historical data, e.g. the EMA (Cohn et al., 2001).

Conclusion
While uncertainty is endemic to all estimates of flood risk, the Bayesian approach developed in this paper offers a method both for reducing its scale and managing it more transparently. In the case of Carlisle, incorporating historic data into our Bayesian model reduced the size of 95% confidence intervals by roughly 50% for annual exceedance probabilities of less than 0.0133 (return periods over 75 years) compared to standard flood frequency estimation methods using solely gauge data. Several other locations in the UK have been identified as possible additional case studies, but even if these localities do not show similar reductions in uncertainty from incorporating historic data, using a formal Bayesian framework still provides greater transparency in flood frequency estimation.
The two recent floods in Carlisle have highlighted the danger in not taking explicit account of uncertainty when planning defences and communicating their limits to residents. Although there is clearly great promise in using models that can incorporate all available data sources, more work is still required to mainstream the Bayesian approach tested here and to develop decisionsupport systems for helping making it available to users with no specialist knowledge of the statistical techniques behind the model.