Quantifying radar-rainfall uncertainties in urban drainage flow modelling

This work presents the results of implementing a probabilistic system to model the uncertainty associated with radar rainfall (RR) estimates and the way this uncertainty propagates through the sewer system of an urban area located in the north of England. The spatial and temporal correlations of the RR errors as well as the error covariance matrix were computed to build a RR error model able to generate RR ensembles that reproduce the uncertainty associated with the measured rainfall. The results showed that the RR ensembles provide important information about the uncertainty in the rainfall measurement that can be propagated through the urban sewer system, and that the measured flow peaks and flow volumes are often bounded within the uncertainty area produced by the RR ensembles. In 55% of the simulated events, the uncertainties in RR measurements can explain the uncertainties observed in the simulated flow volumes. However, there are also some events where the RR uncertainty cannot explain the whole uncertainty observed in the simulated flow volumes, indicating that there are additional sources of uncertainty that must be considered, such as the uncertainty in the urban drainage model structure, the uncertainty in the urban drainage model calibrated parameters, and the uncertainty in the measured sewer flows. © 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).


Introduction
The quantitative measurement and forecasting of precipitation is crucial for predicting and mitigating the effects of flood-producing storms. Real-time management of urban drainage systems requires measurements and forecasts of precipitation with high spatial and temporal resolutions (Verworn, 2002; Einfalt et al., 2004). For instance, a typical urban catchment of about 10 km² requires spatial and temporal resolutions of about 3 km and 5 min respectively (Berne et al., 2004). Some urban catchments have response times of less than a few hours, and weather radar is the key component to provide short-term precipitation measurements and forecasts. In fact, early studies highlighted that for urban hydrology it is desirable to have rainfall data with spatial and temporal resolutions of 1 km and 1 min respectively (Schilling, 1991). However, there are well-documented cases where rainfall with finer spatial resolution (e.g. 5 ha) is required in environments dominated by convective storms (Faures et al., 1995). The Foundation for Water Research described how fine resolution rainfall data are needed by the UK water industry and found that 5 ha subdivisions and 2-min intervals are necessary for modelling what happens at the street scale and at the scale of individual properties (WaPUG, 2004). This highlights the need for rainfall data with high spatial and temporal resolutions. Weather radars are able to provide rainfall measurements with high spatial (e.g. 1 km or lower) and temporal (e.g. 5 min or lower) resolutions. In fact, several studies have shown the potential of using high-resolution weather radar rainfall estimates in urban drainage flow modelling (Austin and Austin, 1974; Yuan et al., 1999; Han et al., 2000; Tilford et al., 2002; Smith et al., 2007; Kramer and Verworn, 2009; Gires et al., 2012; Schellart et al., 2012; Schellart et al., 2014; Goormans and Willems, 2013).
However, radar rainfall (RR) measurements are affected by various sources of error as discussed in different studies (e.g. Browning, 1978; Krajewski and Smith, 2002). Quality control and correction techniques can certainly improve the estimation of precipitation using weather radar (Harrison et al., 2000, 2009; Fulton et al., 1998). However, in spite of the significant progress in correcting and adjusting RR estimates, residual errors often remain (see e.g. Krajewski et al., 2010). This has significant consequences in the context of hydrological applications of weather radar, in particular when forecasting flash floods or extreme rainfall events in large river systems and small urban areas.
Currently, there is a significant amount of work on quantifying the errors in RR (Ciach et al., 2007; Villarini and Krajewski, 2009; Germann et al., 2009; Quintero et al., 2012; Dai et al., 2014a; Dai et al., 2014b). The knowledge of the uncertainties affecting RR measurements can be effectively used to build a hydro-meteorological forecasting system in a probabilistic framework. Rossa et al. (2011) summarised the progress made to quantify uncertainties in precipitation observation and forecasting and how these uncertainties propagate in hydrological and hydraulic models for flood forecasting and warning. Hydrological Ensemble Prediction Systems (HEPS) have been developed for research and operational purposes to assess the propagation of uncertainty into hydrological predictions, mainly implementing probabilistic rainfall predictions from Numerical Weather Prediction models (see Cloke and Pappenberger, 2009, for a review on the topic). The application of HEPS has recently been extended to account for the assessment of the propagation of uncertainty from RR forecasts (Schröter et al., 2011) and observations (Zappa et al., 2008; Liechti et al., 2013) into hydrological systems. Most of the case studies available in the literature deal with the propagation of RR uncertainty through large river catchments (see e.g. Borga, 2002; Vivoni et al., 2007; Collier, 2009; Zhu et al., 2013) and little is known about the way the RR uncertainty propagates to simulated flow peaks/volumes in urban drainage systems. Schellart et al. (2012) showed that for a small urban catchment, large differences are observed in the flow peaks simulated by radar and raingauges due to the inherent uncertainties from both rainfall estimates. Therefore, there is a need to quantify how much of the uncertainty observed in the simulated sewer flows can be explained by the uncertainty in the rainfall estimates and by the uncertainties associated with the sewer model (e.g. model structure, model parameters, model calibration). A preliminary study on the propagation of radar rainfall uncertainties into sewer flows was carried out by  and Rico-Ramirez and Liguori (2014), but using a limited number of events.
Probabilistic RR fields can be generated by adding perturbations to the original RR field. The perturbations should be able to represent the uncertainties in the RR measurements. There are two methods to characterise the RR uncertainties. The first method compares RR estimations directly with observations of rainfall on the ground such as those provided by raingauge measurements. In this approach, the perturbations can be obtained by looking at the spatial and temporal error characteristics of the RR measurements assessed with reference to a ground truth measurement. This approach provides a direct estimate of the overall uncertainty in RR estimations and includes all sources of uncertainty. For instance, Ciach et al. (2007) developed the so-called product-error-driven (PED) approach in which the relation between the true rainfall and the radar-measured rainfall can be described as the product of a systematic distortion function (representing the systematic bias) and a stochastic component (representing the random errors). Germann et al. (2009) developed a radar ensemble system where the uncertainties in the RR can be modelled by means of stochastic perturbations having the same spatial and temporal correlation as the radar residual errors. The second method to characterise the RR uncertainties uses static models (Krajewski and Ciach, 2003). In this method, the individual error sources in RR estimations are identified and quantified to model their individual error structures, which are superimposed to calculate the overall error structure. However, the individual error structures are correlated in a complex manner and the superposition is not trivial (Germann et al., 2009).
In this paper, we use the model proposed by Germann et al. (2009) to generate the ensemble RR fields. This system has been tested and implemented for studies on Alpine catchments (Liechti et al., 2013) and a similar approach has been implemented by Pegram et al. (2011). The probabilistic urban drainage system developed in this paper comprises a generator of probabilistic ensemble RR fields to represent the uncertainties in RR measurements and a rainfall-runoff and hydraulic model of the sewer network of an urban catchment. The objectives of this paper are to quantify the errors in RR measurements; to assess the benefit of using the ensemble RR fields to simulate sewer flows in an urban area; and to understand how the RR uncertainty propagates in the sewer flow simulations.
This paper is organised as follows. Section 2 presents the methodology and data sets used in this study. Section 3 describes the spatial and temporal correlations of the radar residual errors for the study area, whereas Section 4 describes the spatial and temporal correlations of the perturbations. Section 5 presents the results of applying the RR ensembles to an urban catchment in the north of England. Section 6 summarises the conclusions of this work.
Germann et al. (2009) proposed a model to generate the ensemble RR fields by modelling the radar residual errors. The residual errors are those errors in radar rainfall estimations that remain in the measurements even after different corrections have been applied to the data to remove the well-known sources of error in radar rainfall (e.g. ground clutter, attenuation, variation of the vertical reflectivity profile, bright band, etc.). The residual errors ($\varepsilon$) are calculated using radar ($R$) and raingauge ($G$) time series at specific locations. The perturbation fields ($\delta$) can be generated by computing the covariance matrix ($C$) of the residual errors, but excluding values with zero rainfall. The covariance matrix can be decomposed into lower and upper triangular matrices using the Cholesky decomposition, that is, $C = LL^T$. The matrix $L$ can be multiplied by a random vector $y$ with zero mean and unit variance to generate the perturbations $\delta$. The temporal correlation of the perturbations can be imposed by using a second order autoregressive model AR(2). The perturbations are added to, rather than multiplied with, the deterministic RR field because the analysis is carried out in the log domain. Note that in this model the perturbations are computed at the raingauge locations and the two-dimensional perturbation field can be obtained by interpolation.
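The generation of spatially correlated perturbations from the Cholesky factor can be illustrated with a minimal sketch. It is not the implementation used in this study: the 3-gauge covariance matrix holds hypothetical values, the helper names `cholesky` and `perturbation` are introduced here for illustration only, and the random vector is assumed to be Gaussian. The temporal AR(2) step and any mean term are left out.

```python
import math
import random


def cholesky(c):
    """Return lower-triangular L with c = L L^T (c symmetric positive definite)."""
    n = len(c)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                d = c[i][i] - s
                if d <= 0.0:
                    # The decomposition only exists for positive definite matrices.
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(d)
            else:
                low[i][j] = (c[i][j] - s) / low[j][j]
    return low


def perturbation(low, rng):
    """Draw y with zero mean and unit variance per gauge and return delta = L y."""
    y = [rng.gauss(0.0, 1.0) for _ in low]
    return [sum(low[i][k] * y[k] for k in range(i + 1)) for i in range(len(low))]


# Hypothetical 3-gauge error covariance matrix in dB^2 (illustrative values only).
C = [[4.0, 2.0, 1.0],
     [2.0, 5.0, 2.0],
     [1.0, 2.0, 6.0]]
L = cholesky(C)
delta = perturbation(L, random.Random(0))  # one perturbation vector, in dB
```

Because $y$ has unit variance and uncorrelated components, the covariance of $Ly$ is $LL^T = C$, which is why the perturbations inherit the spatial covariance of the residual errors.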

Methodology and data
The radar rainfall measurements come from the radar mosaic product of the UK Met Office (UKMO). This data set is provided through the British Atmospheric Data Centre (badc.nerc.ac.uk) with spatial and temporal resolutions of 1 km and 5 min respectively. The UK RR product is a composite of 18 C-band weather radars. However, only 3 radars are available within the study region (see Fig. 1): the Hameldon Hill radar in the west, the Ingham radar in the south east, and the High Moorsley radar in the north of the region. Note that the High Moorsley radar was installed in 2008. The RR product was quality-controlled by the UKMO using different correction algorithms (Harrison et al., 2000, 2009). The processing of radar data comprises identification of clutter and beam blockage, noise filtering, removal of spurious echoes often due to anomalous propagation of the radar beam, attenuation correction, conversion of radar reflectivity to precipitation rate, antenna pointing, corrections for variations in the vertical reflectivity profile, mean field bias adjustment, spatial averaging and conversion from polar to Cartesian coordinates. In addition, 229 tipping bucket raingauges (TBRs) provided by the Environment Agency with a 15-min temporal resolution were used in this analysis. Both radar and raingauge data were accumulated to produce time series with a temporal resolution of 60 min covering a period of four years between 2007 and 2010. Fig. 1 shows the study area with the locations of the urban area, radars and raingauges.
The quality control of raingauge data is crucial in this analysis and it is important to highlight that raingauge measurements are also prone to error. Some of the typical errors in raingauge measurements are related to gauge malfunctioning, blockages, wetting and evaporation, delays in rain delivery, underestimation during high rain rates, condensation errors, wind effects, timing errors (Upton and Rahimi, 2003), as well as sampling and random errors (Habib et al., 2001; Ciach, 2003). Therefore, it is necessary to identify any significant errors in raingauge measurements if they are used as ground truth. The raingauge data quality was therefore assessed by a nearest neighbour comparison, and those gauges showing significant deviation compared to the surrounding gauges were removed from the analysis (around 10% of the gauges). Additional errors in the comparisons between radar and raingauge measurements arise because raingauges provide point measurements, whereas RR measurements represent a larger volume in space. This results in representativeness errors (also known as sampling errors) because of the differences between the two very different sampling volumes (Kitchen and Blackall, 1992). In fact, part of the differences between radar and raingauge measurements could be explained by the sampling errors (Bringi et al., 2011). Ciach and Krajewski (1999) proposed a method to estimate the variance reduction factor (VRF), that is, the variance between radar and raingauge measurements (point-to-area variance) with respect to the variance of the raingauge measurements (point variance) (see Eq. (13) in Ciach and Krajewski, 1999). If the point-to-area variance is small compared to the point variance, then the VRF tends to zero. Following the VRF methodology and using the spatial correlation obtained with a dense raingauge network inside a radar pixel (see Fig. 8a in Bringi et al., 2011), it can be estimated that the VRFs for a raingauge located at the centre and at the corner of a 1 km² radar pixel are around 4% and 9% respectively (assuming 60-min rain accumulations). These errors are relatively small compared to other sources of error in RR and are therefore not taken into account in this study.
The rainfall-runoff processes on the urban area and the flow through the sewer network conduits were modelled in Infoworks CS. The model was provided by Yorkshire Water for research purposes. The urban model includes the main pipes of the sewer system as well as gullies and manholes. The urban area is located in the Pennine hills and covers 11.06 km². Most of the sewer system is combined, carrying both rainfall runoff and domestic waste water. The urban model consists of a distributed rainfall-runoff model in combination with a hydrodynamic pipe network model, built and simulated using the Infoworks CS software package. For a more detailed description of the urban model please refer to , Schellart et al. (2012), Schellart et al. (2014). The urban model was previously calibrated following current UK industrial practice (WaPUG, 2002) and therefore we can use this model as a surrogate of the urban catchment in order to study the propagation of RR uncertainty at specific locations within the sewer network. The instrumentation in the urban catchment includes 15 flow monitors, 7 depth monitors and 4 additional raingauges (see Figs. 1 and 2 in . These additional raingauges within the urban area were only used to simulate the flows in the drainage system. Table 1 shows a summary of the validation events used to test the RR ensembles. The rainfall depths were computed by averaging the rainfall from the gauges within the urban area for the duration of the event. Note that events 7 and 13 were larger than the rainfall events originally used for calibration of the Infoworks CS model (three events from an earlier monitoring campaign were used for model calibration, with rainfall depths ranging between 9.8 and 35 mm). Also, most of the events shown in Table 1 can comprise several storms, which is why some of them have a long duration. We tried to include a sufficient number of events to cover summer and winter storms.
In fact, the events were classified as either stratiform or convective following the algorithm proposed by Steiner et al. (1995) and using RR data. A simplified version of the algorithm of Steiner et al. (1995) was adopted here according to the size and duration of convective pixels for each storm event. A precipitation event is classified as convective if its convective pixels cover an area equal to or above 3% of the whole precipitation area during 3 h or more within the region shown in Fig. 1. Otherwise the event is classified as stratiform. The results indicate that most of the summer events were classified as convective whereas the winter events were classified as stratiform (see Table 1).
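The event-level rule above can be sketched as follows. This is a minimal illustration under stated assumptions: the per-pixel convective flag from the Steiner et al. (1995) algorithm is taken as given, the input is the fraction of the precipitation area flagged as convective in each radar scan, scans are 5 min apart, and the 3 h are counted cumulatively rather than consecutively (the text does not say which). The function name `classify_event` is hypothetical.

```python
def classify_event(convective_fraction, step_min=5,
                   area_threshold=0.03, min_hours=3.0):
    """Label an event from the per-scan convective area fraction.

    convective_fraction: one value per radar scan, equal to the convective
    pixel area divided by the whole precipitation area (0..1).
    Assumption: time above the threshold is accumulated over the event.
    """
    scans_above = sum(1 for f in convective_fraction if f >= area_threshold)
    hours_above = scans_above * step_min / 60.0
    return "convective" if hours_above >= min_hours else "stratiform"


# Hypothetical event: 36 scans (3 h) at 5% convective area, then 24 scans at 1%.
label = classify_event([0.05] * 36 + [0.01] * 24)
```

If the criterion is meant to apply to a continuous 3 h period, the count would have to be replaced by the longest run of scans above the threshold.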
A 10-min moving average digital filter was applied to the flow data to smooth the high-frequency fluctuations of the flow measurements in the sewer system. The filtering was necessary because the flow measurements were too noisy (see Fig. 2). Although the filtering slightly reduces the flow peaks, it has no significant effect on the comparisons of the flow volumes. The flow volumes were computed by integrating the flow from a flow monitor located just upstream of a main CSO (Combined Sewer Overflow) structure which serves the town centre (see Figs. 1 and 2 in . Note that in all the events (except events 12, 17 and 18), the main CSO is expected to have spilled to a smaller or larger extent, as the recorded water level in the CSO chamber was higher than the CSO spill weir level (see last column in Table 1). In fact, the event selection criteria were that good measured flow data were available for a particular event and that the main CSO had spilled.
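The smoothing and the volume integration can be sketched as below. The sampling interval of the flow monitors is not stated in the text, so the 2-min interval (giving a 5-sample window for 10 min) is an assumption, as are the centred window, the trapezoidal rule and the function names; the flow values are made up.

```python
def moving_average(flow, window):
    """Centred moving average; the window shrinks near the ends of the series."""
    half = window // 2
    out = []
    for i in range(len(flow)):
        lo, hi = max(0, i - half), min(len(flow), i + half + 1)
        out.append(sum(flow[lo:hi]) / (hi - lo))
    return out


def flow_volume(flow_m3s, dt_s):
    """Trapezoidal integration of flow (m^3/s) sampled every dt_s seconds, in m^3."""
    return sum(0.5 * (a + b) * dt_s for a, b in zip(flow_m3s, flow_m3s[1:]))


# Hypothetical noisy record sampled every 2 min, so 10 min spans 5 samples.
q = [0.10, 0.12, 0.09, 0.40, 0.15, 0.11, 0.10]
q_smooth = moving_average(q, window=5)
volume_m3 = flow_volume(q_smooth, dt_s=120.0)
```

The isolated spike in `q` is strongly damped in `q_smooth`, which illustrates why filtered peaks are slightly lower than raw ones.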

Spatial and temporal correlations of the rainfall data and residual errors
The spatial correlations of radar and raingauge measurements at the 15-min and 60-min time scales are shown in Figs. 3 and 4 respectively. The correlation was calculated using the Pearson correlation coefficient and assuming that the spatial correlation is isotropic. A spatial correlation function was fitted to the data using a two-parameter exponential model (Habib and Krajewski, 2002) given by $\rho(d) = \exp[-(d/R_0)^F]$, where $d$ represents the separation distance, $R_0$ is the correlation radius, and $F$ is the shape parameter. The results show that there is a good agreement between the spatial correlations computed individually from radar and gauge measurements. At the 15-min time scale, the correlation radius is 38 km for the radar data and 25 km for the gauge data, whereas at the 60-min time scale, the correlation radius increases to 61 km for the radar data and 52 km for the gauge measurements. The spatial correlation results are consistent with those of Villarini et al. (2008), who showed that the spatial correlation of rainfall measurements increases with larger accumulation times. Our results also indicate that the radar measurements are slightly more correlated in space than the gauge measurements. This is in part due to the point-to-area difference between raingauge and radar measurements. According to Cheng and Agterberg (1996), the spatial correlation function is a function of the spatial resolution. Since radar (areal measurement) and raingauge (point measurement) data represent rainfall measurements at different spatial scales, their correlation functions are consequently different.
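As a small illustration of the two-parameter exponential model, the sketch below evaluates $\rho(d)$ for a given correlation radius and shape parameter. The function name is hypothetical, and the shape value used with the 61 km radius is an arbitrary placeholder because the fitted shape parameters for Figs. 3 and 4 are not quoted in the text.

```python
import math


def spatial_correlation(d_km, r0_km, shape):
    """Two-parameter exponential model rho(d) = exp(-(d / R0) ** F)."""
    return math.exp(-((d_km / r0_km) ** shape))


# At d = R0 the model equals exp(-1) whatever the shape parameter is,
# so R0 can be read as the e-folding distance of the correlation.
rho_at_radius = spatial_correlation(61.0, 61.0, shape=0.7)  # shape is a placeholder

# Correlation at 10 km for R0 = 24 km and F = 0.58, the values quoted later
# in the text for the measured residual errors.
rho_10km = spatial_correlation(10.0, r0_km=24.0, shape=0.58)
```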
The residual errors ($\varepsilon$) were computed as $\varepsilon = 10\log(G/R)$, where $R$ and $G$ represent the radar and gauge rainfall time series respectively for a given location and using 60-min accumulations. The error model was built with hourly data in order to compute a more robust spatial correlation (and therefore a more robust covariance matrix), to smooth any potential timing errors between radar and gauge measurements and to minimise the errors introduced by the fact that the gauges are point measurements integrated in time whereas radar provides areal instantaneous rainfall measurements. However, even though the covariance was computed with hourly data, the ensembles were generated every 5 min, making them more suitable for urban applications, by taking advantage of the temporal correlation of the errors as shown at the end of this section. The mean error between radar and raingauges was computed using the expected value of the residual error, that is $E\{\varepsilon\}$. The results are shown in Fig. 5 for the years 2007-2010. Note that the errors were computed at the raingauge locations and they were interpolated using the bi-harmonic spline interpolation described in Sandwell (1987) for display purposes. Positive errors indicate underestimation by the radar when compared to raingauges, whereas negative errors indicate radar overestimation. A mean error of +/−3 dB is equivalent to radar underestimation/overestimation respectively by a factor of 2. In general terms, the radar tends to overestimate the rain rates with the exception of the north east region, which shows radar underestimation in 2007/2008, and the north west region, which also shows radar underestimation but in 2008.
The right side of Fig. 6 shows the mean temporal correlation of the errors for the year 2007. The mean correlations at 1 h, 2 h and 3 h are 0.18, 0.12 and 0.07 respectively. The years 2008-2010 also showed a similar pattern, with correlations in the range 0.16-0.20 at 1 h, 0.10-0.14 at 2 h and 0.06-0.09 at 3 h.
The temporal correlation shown in Fig. 6 is slightly lower than that reported by Germann et al. (2009) in the Alps.
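The residual error in decibels and its mean can be sketched for a single gauge location as follows. The hourly values are hypothetical and the function names are introduced here for illustration. The logarithm is taken in base 10, which is what the stated equivalence between 3 dB and a factor of 2 implies, and pairs with zero rainfall are skipped as described for the covariance calculation.

```python
import math


def residual_errors_db(gauge, radar):
    """epsilon = 10 log10(G / R) for pairs where both accumulations are positive."""
    return [10.0 * math.log10(g / r)
            for g, r in zip(gauge, radar)
            if g > 0.0 and r > 0.0]


def mean_error_db(gauge, radar):
    """Sample estimate of E{epsilon} at one gauge location."""
    eps = residual_errors_db(gauge, radar)
    return sum(eps) / len(eps)


# Hypothetical hourly accumulations (mm): the radar reads half of the gauge.
gauge = [2.0, 4.0, 0.0, 1.0]
radar = [1.0, 2.0, 0.5, 0.5]
eps = residual_errors_db(gauge, radar)   # the zero-gauge hour is dropped
bias_db = mean_error_db(gauge, radar)    # about +3 dB: radar underestimation by 2
```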
In the UK, most of the convective rainfall events occur during the summer or at the end of the spring, whereas most of the stratiform or frontal rainfall events occur during the autumn and winter. The different characteristics of these events may produce slightly different error characteristics. This is shown in Fig. 7, where the spatial and temporal correlation of the error was computed for spring/summer and autumn/winter separately. The results show that on average the spatial correlation of the error is slightly larger during the autumn/winter ($R_0 = 24$ km and $F = 0.61$) than during the spring/summer ($R_0 = 20.5$ km and $F = 0.56$). Similar findings are observed for the temporal correlation of the error (see Fig. 7 right).
The RR errors for the year 2007 will be used to calibrate the ensemble generator, whereas the year 2008 will be used for the validation using rainfall and flows from the urban catchment.

Spatial and temporal correlations of the perturbations
As shown in the previous section, the residual errors are correlated in space and time (see Fig. 6). Therefore, the RR ensemble model should be able to capture not only the covariance of the residual errors, but also their spatial and temporal correlations. To demonstrate this, the RR ensemble model was calibrated with the year 2007 and the spatial and temporal correlations of the generated perturbations ($\delta$) are shown in Fig. 8. As shown, the spatial correlation of the perturbations is in close agreement with the spatial correlation of the measured errors, which is shown in the same figure with the solid line (i.e. $\rho(d) = \exp[-(d/24)^{0.58}]$, see also Fig. 6).
The temporal correlations of the perturbations at 1 h and 2 h are also in close agreement with the temporal correlations of the measured errors, which are 0.18 and 0.12 for 1 h and 2 h respectively. Beyond 2 h, the temporal correlation of the perturbations decreases more rapidly than the measured one shown in Fig. 6 because a second order autoregressive model was used to impose the temporal correlation of the perturbations and only the correlations at 1 h and 2 h were considered. Note that a smaller number of raingauges was used to compute the covariance matrix to generate the perturbations. This was done to ensure that the covariance matrix $C$ can be decomposed into lower and upper triangular matrices ($C = LL^T$) using the Cholesky decomposition. Initial tests were carried out to decompose the covariance matrix using all valid gauges (around 200) without any success and for this reason a smaller number of gauges was used (44 gauges). However, as shown in Fig. 8, the perturbations are still able to capture the spatial and temporal correlations of the measured errors.
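The faster decay beyond 2 h can be checked with a short calculation. The sketch below uses the standard Yule-Walker relations for a generic AR(2) process $x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \text{noise}$; this sign convention and the function names are assumptions and may differ from the formulation in Germann et al. (2009). Only the measured lag-1 and lag-2 correlations quoted above (0.18 and 0.12) are used as input.

```python
def ar2_coefficients(r1, r2):
    """Yule-Walker solution for AR(2) given lag-1 and lag-2 correlations."""
    denom = 1.0 - r1 * r1
    phi1 = r1 * (1.0 - r2) / denom
    phi2 = (r2 - r1 * r1) / denom
    return phi1, phi2


def implied_correlation(r1, r2, max_lag):
    """Correlation function implied by the fitted AR(2), for lags 0..max_lag."""
    phi1, phi2 = ar2_coefficients(r1, r2)
    rho = [1.0, r1]
    for _ in range(2, max_lag + 1):
        rho.append(phi1 * rho[-1] + phi2 * rho[-2])
    return rho


# Measured mean temporal correlations of the 2007 errors at 1 h and 2 h.
rho = implied_correlation(0.18, 0.12, max_lag=3)
```

With these inputs the model reproduces 0.18 and 0.12 at lags of 1 h and 2 h by construction, while the implied correlation at 3 h is about 0.036, which is below the measured value of 0.07 and consistent with the behaviour described above.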
By comparing the covariance of the residual errors and the covariance of the perturbations we can assess whether the ensemble generator is able to reproduce the covariance of the RR errors. This is shown in Fig. 9 using one month's worth of hourly perturbations and 25 ensemble members. The results clearly indicate that the perturbations are also able to reproduce the covariance of the residual errors.
Finally, the question is how many ensemble members are needed to represent the uncertainties in RR estimates. The number of ensemble members depends on how close the expected values of the perturbations are to the mean residual errors shown in Fig. 5. Some simulations are shown in Fig. 10 for different numbers of ensemble members. As shown, if the number of ensemble members is too small, the expected value of the perturbations is far from the mean of the residual errors. On the other hand, a large number of ensemble members (e.g. 1000) seems to reproduce well the mean of the residual errors, but may be unrealistic for use in real-time urban drainage flow modelling given that the hydraulic models take some time to perform a single simulation. A number of ensemble members between 25 and 100 could be a good compromise between having sufficient members to represent the uncertainties in RR estimates and being able to use them in real-time flood forecasting in urban areas. Therefore, 100 ensemble members were generated per event.

Application of the rainfall ensembles in urban flow modelling
The covariance of the RR ensembles was calibrated with radar and raingauge data from the year 2007 as discussed in previous sections. However, the perturbations produced in this way are valid at the raingauge locations. Therefore, they need to be interpolated to produce a distributed perturbation field $\delta$ that can be added to the original RR field $R$. We used the bi-harmonic spline interpolation described in Sandwell (1987) (also known as V4 interpolation in Matlab) because this method produces smooth perturbation fields whereas other techniques such as linear or nearest neighbour interpolation may show discontinuities. The perturbations were added in the log domain to the original RR field to produce the ensemble rainfall fields $U$, that is, $10\log(U) = 10\log(R) + \delta$ (see Germann et al., 2009).
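Adding a perturbation in the log domain is equivalent to multiplying the rain rate by $10^{\delta/10}$ when the logarithm is taken in base 10. The sketch below applies this to a flattened list of pixels; the function name and the sample values are hypothetical, and the interpolation of $\delta$ to the radar grid is assumed to have been done already.

```python
def perturb_rainfall(rain, delta_db):
    """Return U with 10 log10(U) = 10 log10(R) + delta, i.e. U = R * 10**(delta/10).

    In this multiplicative form, pixels with zero rainfall stay at zero.
    """
    return [r * 10.0 ** (d / 10.0) for r, d in zip(rain, delta_db)]


# Hypothetical rain rates (mm/h) and perturbations (dB) for three pixels.
rain = [2.0, 0.0, 5.0]
delta = [3.0, 3.0, -3.0]
member = perturb_rainfall(rain, delta)  # roughly doubles, keeps zero, roughly halves
```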
The rainfall events to validate the RR ensemble generator were selected from the year 2008 (see summary of events in Table 1). Ensemble RR fields were generated for all these events to provide the input to the rainfall-runoff and hydrodynamic model of the sewer network for the urban area. The number of ensemble members was fixed to 100 per event with spatial and temporal resolutions of 1 km and 5 min respectively, which are suitable for applications in urban areas. An example of the original RR field and five ensemble members is shown in Fig. 11.
Note that to simulate every event shown in Table 1, we ran the urban sewer model for at least 4 months prior to the event to obtain a better representation of the initial state of the catchment. In this way, the Infoworks model can calculate the antecedent precipitation index (API), which is an input to the runoff model for the impermeable areas included in the model. For all the warm-up model runs, we used the measured rainfall from the gauges available within the urban area. Fig. 12 shows the results for some of the events described in Table 1. The figure shows the measured and simulated flows using raingauges, radar and radar ensembles. For event 2, the simulated flows with radar and raingauges are very similar, but both underestimate the main flow peak. However, some of the RR ensembles are able to capture this large peak. For event 10, there are two large peaks at around 2200 h. The simulated flows with the raingauges seem to better capture the first peak whereas the simulated flow with the radar seems to better capture the second peak. The radar ensembles, however, are able to capture both flow peaks. For event 15, there is a relatively small peak at around 1700 h. The raingauges are able to simulate this flow peak better than the radar, but the radar ensembles are able to capture well the measured flow. Some extreme ensembles, however, produce larger peaks than expected. Fig. 13 shows the results for events 13 and 19. For event 13, the simulated flows with raingauges, radar and radar ensembles underestimate the measured flow for the whole event. Event 13 has a much larger rainfall depth (57.3 mm) than the events used for model calibration (between 9.8 and 35 mm), and therefore the Infoworks model is unable to simulate the larger flow peaks observed in this event. In this case, it is likely that additional flows coming from the surrounding permeable areas increase the total measured flow.
This is a well-known problem with urban drainage models, because flows from surrounding permeable areas are not included in the urban drainage model (e.g. Bailey and Margetts, 2008). For event 19, the simulated flows with radar or raingauges underestimate all three flow peaks, although the raingauges do a slightly better job than the radar rainfall in simulating the first and second flow peaks. The radar ensembles are also unable to capture the second flow peak and fail to capture the recession curve of the third flow peak. These two events indicate that there are additional sources of uncertainty that cannot be explained with the uncertainty in the RR measurement alone, for instance uncertainties due to the urban model calibration, uncertainties due to the urban model structure and uncertainties in the flow measurements. There are considerable uncertainties associated with the way Infoworks CS simulates rainfall runoff from permeable areas (e.g. Schellart et al., 2010) or delayed impermeable area response (e.g. Terry and Margetts, 2003). The modelling of these additional sources of uncertainty is outside the scope of this paper, but this shows the potential use of the method introduced in this paper to assist in the analysis of uncertainty in urban drainage models, as it can help to distinguish the uncertainty in flow simulations caused by the radar rainfall input uncertainty.
Fig. 12. Flow simulations for events 2, 10 and 15 from top to bottom respectively. The light grey area represents the simulated flow with all RR ensembles ($Q_E$), whereas the dark grey area represents their 15% and 85% percentiles; $Q_m$ is the measured flow; $Q_G$ is the simulated flow using raingauge measurements; $Q_R$ is the simulated flow using RR.
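The shaded bands described in the caption of Fig. 12 can be derived from the ensemble flow simulations as sketched below. This is only an illustration: the linear-interpolation percentile is one of several common definitions and is an assumption here, the function names are hypothetical, and the three short flow series are made-up numbers standing in for the 100 simulated members.

```python
def percentile(values, p):
    """Linear-interpolation percentile (p between 0 and 100)."""
    s = sorted(values)
    pos = (len(s) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def ensemble_bands(members, p_low=15.0, p_high=85.0):
    """Per time step: (minimum, lower percentile, upper percentile, maximum).

    members: list of flow series, one per ensemble member, all the same length.
    """
    return [(min(step), percentile(step, p_low), percentile(step, p_high), max(step))
            for step in zip(*members)]


# Hypothetical flows (m^3/s) from three ensemble members at two time steps.
members = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
bands = ensemble_bands(members)
```

A measured flow value is bounded by the ensemble at a given time step when it lies between the first and last entries of the corresponding tuple.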
To summarise the results from all the events shown in Table 1, we computed the total flow volume per event. The results are shown in Fig. 14. The figure shows the measured flow volume as well as the flow volume simulated with raingauges, radar and radar ensembles. The box plots summarise the results from the radar ensembles, with the ensemble median as the central mark, the 25th and 75th percentiles shown by the edges of the box and the whiskers representing the most extreme ensemble members. The results indicate that the flow volume simulated with the raingauge (or radar) measurements does not necessarily agree with the measured flow volume, except for event 18 where the flow volume simulated with raingauges matched the measured flow volume. This highlights the fact that uncertainties in the input rainfall measurements produce uncertainties in the simulated sewer flows and therefore the sewer flow simulations should be represented in a probabilistic rather than a deterministic way in order to highlight the uncertainty due to the input rainfall measurement. The radar ensembles, on the other hand, are able to capture the total flow volume for 11 out of 20 events (i.e. 55%; events 1, 2, 7, 8, 9, 10, 11, 14, 16, 17 and 20). This result indicates that in 55% of the simulated events, the uncertainties in the RR measurements can explain the uncertainties in the simulated flow volumes. For the remaining 9 events, in 6 of them the flow volumes from the ensembles overestimate the measured flow volume (events 3, 4, 5, 6, 12 and 15), whereas in the other 3 events the ensemble flow volumes underestimate the measured flow volume (events 13, 18 and 19). There were also five events (events 3, 4, 6, 15 and 19) where the measured flow volume is very close to the simulated flow volume from the most extreme ensemble members. There are cases such as event 13, where neither the raingauges nor the radar ensembles were able to capture the measured flow volume (around 42,000 m³ for event 13 from Table 1).
The results also indicate that in 60% of the events (events 1, 2, 5, 6, 8, 10, 11, 12, 15, 16, 17, 19), the flow volume simulated with radar is very close (within 6%) to the flow volume simulated with raingauges.

Summary and conclusions
The residual errors from RR measurements were modelled following the approach proposed by Germann et al. (2009). The RR error model takes into account the error covariance matrix computed with radar rainfall and raingauge measurements for the year 2007 for the study area shown in Fig. 1 in the UK. The important assumption made when calibrating the error model was that raingauge measurements represent the true areal rainfall, even though they are point measurements, whereas RR estimations represent a larger volume in space (1 km² in this analysis). The spatial correlation of the residual errors was reproduced with the covariance matrix, whereas the temporal correlation of the residual errors was imposed by using a second-order autoregressive model. The RR error model generates the perturbations that can be added (in the log domain) to the RR measurements in order to generate an ensemble of RR estimations able to represent the uncertainty in the rainfall measured by radar. The results showed that the perturbations are able to capture the spatial and temporal correlations of the residual errors (see Fig. 8). The covariance of the perturbations is also able to reproduce the covariance of the residual errors (see Fig. 9). The number of ensemble members needed to reproduce the RR errors was found to be between 25 and 100. A larger number of ensemble members may be impractical for real-time urban drainage flow modelling applications. Therefore, 100 ensemble members were used in this analysis. A large number of rainfall events from the year 2008 (20 events, see Table 1) was used to simulate the propagation of RR errors into flow peaks and flow volumes in the drainage system of an 11 km² urban area. The results showed that in many cases the flow peaks are bounded by the uncertainty area produced by the RR ensembles (see Fig. 12). However, there are also cases where the ensembles were unable to capture the flow peaks. At urban scales, the non-linearity of rainfall can be magnified and therefore the second-order (statistical moment) approximation might be insufficient. A higher-order model may be necessary to capture small-scale rainfall dynamics well (Schertzer et al., 2013; Wu et al., 2015). In 55% of the simulated events, the uncertainties in the RR measurements are able to explain the uncertainties in the simulated flow volumes (see Fig. 14). There are also some events where the uncertainties in the RR measurement cannot explain all the uncertainties observed in the simulated flow volumes. This highlights the fact that there are additional sources of uncertainty that must be considered, such as the uncertainty in the urban drainage model structure, the uncertainty in the urban drainage model calibrated parameters, and the uncertainty in the measured sewer flows. Nevertheless, the proposed methodology makes it possible to distinguish the uncertainty in the flow simulations caused by the uncertainty in the radar rainfall measurements.

Fig. 13. Flow simulations for events 13 and 19 from top to bottom respectively. The light grey area represents the simulated flow with all RR ensembles (Q_E), whereas the dark grey area represents the range between their 15% and 85% percentiles; Q_m is the measured flow; Q_G is the simulated flow using raingauge measurements; Q_R is the simulated flow using RR.
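The ensemble generation summarised in this section can be sketched as follows. This is a simplified illustration and not the exact formulation of Germann et al. (2009) or of this study: it assumes that the perturbations are expressed in dB, that spatial correlation is imposed through a Cholesky factor of the error covariance matrix, and that the same second-order autoregressive coefficients apply to every pixel. All names, coefficients and the 3-pixel covariance matrix are hypothetical.

```python
import numpy as np

def ar2_innovation_std(phi1, phi2):
    """Innovation std for which a stationary AR(2) process has unit variance."""
    var = (1 + phi2) * ((1 - phi2) ** 2 - phi1 ** 2) / (1 - phi2)
    if abs(phi2) >= 1 or var <= 0:
        raise ValueError("phi1, phi2 do not define a stationary AR(2) process")
    return np.sqrt(var)

def generate_perturbations(mean, cov, phi1, phi2, n_steps, n_members, rng, spin_up=50):
    """Perturbation fields of shape (n_members, n_steps, n_pixels), in dB."""
    mean = np.asarray(mean, dtype=float)
    chol = np.linalg.cholesky(cov)            # cov = chol @ chol.T
    total = n_steps + spin_up
    noise = rng.standard_normal((n_members, total, mean.size))
    noise *= ar2_innovation_std(phi1, phi2)
    y = np.zeros_like(noise)
    for t in range(total):                    # temporally correlated, unit-variance noise
        y[:, t] = noise[:, t]
        if t >= 1:
            y[:, t] += phi1 * y[:, t - 1]
        if t >= 2:
            y[:, t] += phi2 * y[:, t - 2]
    y = y[:, spin_up:]                        # discard the start-up transient
    return mean + y @ chol.T                  # impose mean and spatial covariance

def perturb_radar(rr, delta_db):
    """Add the perturbation in the log domain: 10*log10(ens) = 10*log10(rr) + delta."""
    return rr * 10.0 ** (delta_db / 10.0)

# Hypothetical 3-pixel example with 100 ensemble members.
rng = np.random.default_rng(42)
mean_db = np.zeros(3)
cov_db = np.array([[4.0, 2.0, 1.0], [2.0, 4.0, 2.0], [1.0, 2.0, 4.0]])
rr = rng.gamma(shape=2.0, scale=1.5, size=(24, 3))   # synthetic radar rainfall
delta = generate_perturbations(mean_db, cov_db, 0.6, 0.2,
                               n_steps=24, n_members=100, rng=rng)
ensemble = perturb_radar(rr, delta)
print(ensemble.shape)  # (100, 24, 3)
```

Scaling the innovations so that the autoregressive noise keeps unit variance is one way to let the Cholesky factor alone set the spatial covariance of the perturbations. Each ensemble member would then be used as rainfall input to the drainage model to obtain one simulated hydrograph.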