Sampling design optimisation for rainfall prediction using a nonstationary geostatistical model.

The accuracy of spatial predictions of rainfall by merging rain-gauge and radar data is partly determined by the sampling design of the rain-gauge network. Optimising the locations of the rain-gauges may increase the accuracy of the predictions. Existing spatial sampling design optimisation methods are based on minimisation of the spatially averaged prediction error variance under the assumption of intrinsic stationarity. Over the past years, substantial progress has been made to deal with non-stationary spatial processes in kriging. Various well-documented geostatistical models relax the assumption of stationarity in the mean, while recent studies show the importance of considering non-stationarity in the variance for environmental processes occurring in complex landscapes. We optimised the sampling locations of rain-gauges using an extension of the Kriging with External Drift (KED) model for prediction of rainfall ﬁelds. The model incorporates both non-stationarity in the mean and in the variance, which are modelled as functions of external covariates such as radar imagery, distance to radar station and radar beam block- age. Spatial predictions are made repeatedly over time, each time recalibrating the model. The space-time averaged KED variance was minimised by Spatial Simulated Annealing (SSA). The methodology was tested using a case study predicting daily rainfall in the north of England for a one-year period. Results show that (i) the proposed non-stationary variance model outperforms the stationary variance model, and (ii) a small but signiﬁcant decrease of the rainfall prediction error variance is obtained with the optimised rain-gauge network. In particular, it pays off to place rain-gauges at locations where the radar imagery is inaccurate, while keeping the distribution over the study area suﬃciently uniform.


Introduction
Accurate information about the space-time distribution of rainfall is essential for hydrological modelling. Rain-gauge rainfall measurements are generally accurate and have high temporal resolution, but they typically have a low spatial density, which may cause large errors in interpolated maps given the high spatial variability of rainfall. In contrast, weather radar imagery provide a full spatial coverage of the rainfall field in combination with high temporal resolution. However, radar-derived rainfall predictions experience complex spatio-temporal disturbances and can be inaccurate, especially in mountainous regions.
Over the past years, many statistical techniques have been used to combine the strengths of the two measurement devices, such as Bayesian techniques ( Todini, 2001 ), spatial logistic regression ( Fuentes et al., 2008 ), radar bias correction ( Seo and Breidenbach, 2002;Sinclair and Pegram, 2005 ) and copulas ( Vogl et al., 2012 ). There is also a wide range of geostatistical prediction methods that combine rain-gauge measurements with radar imagery, such as kriging with external drift (KED) ( Velasco-Forero et al., 2005 ) and co-kriging ( Sideris et al., 2014 ). Provisions to address non-normality have also been employed, e.g. Box-Cox, square root and normal-score transformation. Besides, various techniques for parameter estimation are available, such as Least Squares and (Restricted) Maximum Likelihood estimation. Velasco-Forero et al. (2009) and Schiemann et al. (2011) make use of a non-parametric correlogram to derive a rainfall field from radar imagery, dealing with anisotropy and temporal variation of the rainfall struc-ture. Goudenhoofdt and Delobbe (2009) showed that geostatistical merging methods gave the best results for rainfall prediction in the Walloon region in Belgium, although the performance was dependent on the network configuration. For a more detailed review of radar-gauges merging techniques, we refer to Goudenhoofdt and Delobbe (2009) , Nanding et al. (2015) and Jewell and Gaussiat (2015) .
Few studies focus on the sampling design of the rain-gauge network. For example, Pardo-Igúzquiza (1998) derives the optimal network design by minimising an objective function based on prediction accuracy combined with monetary costs, Barca et al. (2008) explore the optimal location of new monitoring stations by minimising the mean shortest distances. Spatial optimisation of the gauge network in radar-gauge merging studies remains largely unexplored.
In sampling design for spatial prediction of rainfall by ordinary kriging (OK), using the average OK variance as a minimisation criterion leads to spreading of the locations in geographic space. However, for mapping with the help of covariates as in KED, we also need to spread the locations in feature (i.e., covariate) space. By selecting locations such that the covariate space is fully covered, uncertainty about the regression coefficients is minimised. Brus and Heuvelink (2007) showed that minimising the spatially averaged KED variance achieves a proper balance between optimisation in geographic and feature space. Heuvelink et al. (2012) extended this to a space-time kriging case and minimised the spacetime averaged KED variance to optimise static as well as dynamic sampling designs.
In this study we only consider static designs, i.e. we assume that the rain-gauge locations do not change over time. This is because it is impractical to move rain-gauges in an operational context. Our objective is to optimise the static rain-gauge sampling design such that it minimises the space-time averaged prediction error variance. We use a geostatistical model in which both the mean and the standard deviation are assumed to be a linear combination of covariates. The model parameters (regression coefficients and correlogram parameters) are estimated from the rain-gauge data using Restricted Maximum Likelihood. We optimise the raingauge locations with Spatial Simulated Annealing (SSA). The model is tested in a case study in the north of England for daily rainfall mapping in the year 2010.

Case study and data
The study area is located in the United Kingdom, north-east of the city of Manchester. The area is 27 874 km 2 in size and contains several hydrological catchments of different sizes and shapes. Two rainfall datasets are used in this study, rain-gauges and radarderived rainfall maps.
The area is covered by a network of 229 tipping bucket raingauges from the Environment Agency (EA). The data originally provided by the EA are at 15-min resolution and were aggregated to daily sums. We checked the quality of the rain-gauge data and reduced the number of gauges to 185, by excluding gauges with anomalies, such as an excessive number of missing values. The locations of the remaining 185 gauges are shown in Fig. 1 .
The radar composite imagery is obtained from the MetOffice NIMROD system. The system makes use of three radars (Hameldon Hill, Ingham and High Moorsley) shown in Fig. 1 . The preprocessing of the weather radar data includes removal of nonmeteorological echoes (e.g. ground clutter, ground echoes due to anomalous propagation), correction for antenna pointing, correction for beam blockage, rain attenuation correction, vertical reflectivity profile correction and rain-gauge adjustment ( Harrison et al., 2009 ). The radar rainfall product is available with a spatial and temporal resolution of 1 km and 5 min, respectively ( Met Office, 2003 ). The radar data set contains several missing 5 min periods and therefore a nowcasting model was used to interpolate missing periods for a maximum of 3 h. Next the 5-min resolution images were aggregated to daily sums.
Besides these two rainfall datasets, the following covariate maps were used: • Digital Elevation Model (DEM) ( Fig. 3 a) at 50 m resolution from the SRTM (Shuttle Radar Topography Mission), see Farr et al. (2007) . The elevation ranges from 6 m to 926 m above sea level (a.s.l.) with an average of 159 m a.s.l. • Radar beam blockage map at 1 km resolution ( Fig. 3 b). The radar beam blockage maps were generated for each radar station using the DEM and a ground clutter model described in Rico-Ramirez et al. (2009) . The individual beam blockage maps were combined to produce a single map with 1 km resolution for the 0.5 °radar scan inclination. When merging overlapping areas, priority was given to the lowest beam blockage value. The blockage maps represent the degree of deviation from the 0.5 °radar inclination due to topographic obstacles. Values are expressed in percentages from 0 to 100. Their mean is 4.8%. • Distance from nearest radar stations map at 1 km resolution ( Fig. 3 c). Values are expressed in km and vary from 0 (radar station location) to 102.6 km. The mean is 51.3 km.

Model definition
Daily rainfall as measured by rain-gauges Z t ( s ) at any location s in the study area A and time (day) t ∈ T is modelled by: where m t = { m t (s ) | s ∈ A} is a spatial trend, σ t the spatial standard deviation and ε t a zero-mean, unit variance, normally distributed, second-order stationary and spatially correlated residual at time t .
Note that ε t may be correlated in space, whereas we assume that ε t and ε t are uncorrelated if t = t . Both the trend and the standard deviation are modelled as linear combinations of covariates: where the β tk and κ tl are regression coefficients and the f tk and g tl are covariates. We assume that f t0 (s ) = g t0 (s ) = 1 for all t and s , so that β t 0 is an intercept and κ t 0 is a space-invariant constant contribution to the standard deviation. Note that the covariates may vary in space and time. Note also that the space-time model effectively consists of a set of separate spatial models, one for each day of the year. Temporal correlation is not modelled in this case study. We consider the situation that Z t has been measured at n locations s i (i = 1 , . . . , n ; s i ∈ A ) . The measurements z t ( s i ) are treated as realisations of Z t ( s i ) and prediction is done for Z t at new, unobserved locations s 0 . Stacking the z t ( s i ) in a (column) vector z t and changing to matrix notation yields: where F t is the n × (K + 1) matrix of "spatial trend" covariates at the observation locations, β t is the (K + 1) vector of trend coefficients, ε t is the n -vector of standardised residuals with correlation matrix R t and H t is an n × n diagonal matrix defined by: where G t is the n × (L + 1) matrix of "standard deviation" covariates at the observation locations and κ t is an (L + 1) vector of standard deviation regression coefficients. Note that while ε t has correlation matrix R t , the stochastic component H t ε t of Eq. (4) has variance-covariance C t = H t R t H t . The parameters of the model defined by Eq. (4) are the β t , κ t and the parameters of a model for the spatial autocorrelation of the standardised residuals. We assume an isotropic exponential correlogram r t (h ) = r t0 { exp(− h a t ) } (where h > 0 is the Euclidean distance between two points, by definition r t (0) = 1 ), thus introducing two more parameters, namely the micro-scale correlation r t 0 and the spatial correlation length parameter a t . Note that parameter r t 0 equals one minus the nugget-to-sill ratio. For notational convenience from here on we drop the subscript t .

Parameter estimation
For each day, two subsets of model parameters must be estimated, the spatial trend regression coefficients β and all parameters of the stochastic part of the model, Given the estimation of β is straightforward and can be done analytically by Generalised Least Squares (GLS) ( Lark and Webster, 2006 ). However, estimation of is more difficult. We used a Restricted (or Residual) Maximum Likelihood (REML) approach for this. Similar to Maximum Likelihood, REML aims to find the vector of parameters for which the observed data yield the highest probability density (i.e., likelihood). In our case the model contains a spatial trend (fixed effect), and so the likelihood must be computed from the probability distribution of the model residuals, which can be computed from the observations if the spatial trend is known. This implies that the likelihood depends on the regression coefficients, which are unknown and also must be estimated. The solution to this problem, proposed by Patterson and Thompson (1971) is to detrend the data by multiplying the data by a projection matrix (see also Lark and Cullis, 2004 ). After detrending the data and estimating by minimising the negative restricted log-likelihood given in Eq. (6) below, the estimate of β is obtained by substituting the REML estimates of in the GLS equations. The negative restricted log-likelihood function is given by ( Webster and Oliver, 2007 ): where I is an identity matrix and Q is defined as: and: Next matrix C is obtained by substituting the optimised and used to estimate β using GLS ( Marchant et al., 2009 ):

Kriging
In KED, predictions at new locations are made by: where ˆ ε (s 0 ) is the kriged standardised residual. Ignoring estimation errors in ˆ κ allows us to use a standard result from universal kriging ( Webster and Oliver, 2007 ) which yields: where f 0 is a (K + 1) vector of trend covariates at the prediction location and c 0 is an n vector of covariances between the residuals at the observation and prediction locations. Note that these are covariances of the (unstandardised) residuals σ t · ε t and thus depend on the standard deviation covariates g l , their associated (estimated) regression coefficients κ l and the correlogram of ε. Recall that since we use separate, independent models for each day, only rainfall observations of that day are used to predict Z ( s 0 ).
The variance of the prediction error is given by ( Cressie, 2015 ): where σ 2 ( s 0 ) is the variance of Z ( s 0 ). The first two terms on the right-hand side of Eq. (12) quantify the prediction error variance of the residuals, while the last term is the estimated spatial trend error variance. Note that here we take uncertainty about the β k into account, whereas uncertainty about the κ l and correlogram parameters r 0 and a is ignored. Taking the latter uncertainties into account is not an easy task and beyond the scope of this work.

Optimising the rain-gauge locations
We suppose that, due to budget constraints, the number of rain-gauges n is fixed. The aim is to find the optimal locations of the rain-gauges for predicting daily rainfall for a given time period. In order to do this, a criterion is needed that defines the performance of a given sampling design and that allows to compare designs. It makes sense to use the spatially averaged kriging variance as a criterion, because this provides an appropriate summary measure of the prediction accuracy ( Brus and Heuvelink, 2007 ). In our case, where a static rain-gauge network must be optimised for a longer period of time, in addition we should also average the criterion over time. This results in the following minimisation criterion: A closer look at the kriging variance Eq. (12) and hence the criterion Eq. (13) shows that it only depends on the sampling locations s i , the correlogram r and the spatial trend and standard deviation covariates. This implies that the configuration can be optimised before the observations are taken, provided that the model and covariance structure are known. Recall that in this study we only consider static designs, i.e. we assume that the rain-gauge locations do not change over time. In theory, with a finite number of possible rain-gauge locations N derived from discretising the study area A , we could try all N n combinations, and choose the one that minimises the criterion. However, finding the optimal gauge network in this way is practically impossible given the exorbitant number of possible combinations, even with a coarse discretisation of the study area. A solution to this problem is to use a spatial numerical search algorithm. We used Spatial Simulated Annealing (SSA), as proposed in Van Groenigen and Stein (1998) .
Spatial simulated annealing is an iterative optimisation algorithm in which a sequence of new possible sampling locations is generated. A new sampling location is derived by selecting randomly one sampling location and shifting it in a random direction over a random distance. Each time a new possible location is generated, the criterion ( Eq. (13) ) is calculated for the new candidate design and compared with the criterion value of the current design. The new location is always accepted if the criterion becomes smaller. If the criterion becomes larger the new location is sometimes accepted, namely with probability: where temp is a control parameter accounting for the number of remaining iterations, called the temperature. It decreases from a positive starting value to zero as the number of iterations increases. Eq. (14) shows that, given temp , the larger the increase of the criterion, the smaller the probability of accepting a worse design. Also, the smaller temp , i.e. the larger the number of iterations already done, the smaller the acceptance probability of a worse sample. The temp parameter is kept constant during a set of m iterations, called a "chain", after which it is decreased to a value α * temp , with α < 1. This process repeats itself until the total number of planned iterations have been completed. Parameter α should be chosen such that the acceptance probability is close to one in the first chain and approximating zero during the final stage of iterations. At first worsening designs are accepted to be able to escape from local minima, but towards the end only designs that improve the criterion are accepted. We refer to Heuvelink et al. (2010) for a more detailed explanation of the numerical optimisation algorithm used in this study.

Application to the case study
Parameter estimation We chose one covariate for the spatial trend (radar image) and three for the standard deviation (DEM, distance from nearest radar station and radar beam blockage). These were chosen after consulting experts on rainfall mean and radar uncertainty. The standard deviation covariates were multiplied by the radar image of each day to obtain a standard deviation that is proportional to the amount of rainfall and to avoid having a positive standard deviation where no rainfall is detected by the radar. All covariates were projected to the British National Grid system and resampled to a spatial resolution of 500 m × 500 m. The spatial trend covariate was standardised for each individual day, using time-specific means and variances. The standard deviation covariates were not standardised to avoid negative values. Negative values might lead to singularity of the covariance matrix, as explained below. Because of extreme values in the radar imagery (likely anomalies) the upper 0.1% of the radar image values were bounded to the 99.9% quantile and inspected visually to ensure that inconsistent values were detected and corrected. Rainfall measurements were not transformed prior to modelling. Averaging to daily values and including trend covariates removed much of the skewness, as confirmed by a post-hoc analysis of the residuals.
The global optimum for the log-likelihood function was obtained using differential evolution ( Storn and Price, 1997 ) as implemented in the R package DEoptim ( Ardia et al., 2015 ). We fixed the convergence threshold at 10 −10 . Calculations were done using parallel computing on an eight cores computer and estimation of the parameters took approximately 15 h for the whole year.
The correlogram and standard deviation parameters were bounded prior to estimation. The corresponding upper and lower limits are given in Table 1 . The limits were chosen based on physical reasoning and theoretical restrictions, e.g. the correlation length parameter a was not allowed to be greater than one-third of the extent of the study area and the micro-scale correlation parameter r 0 was forced between 0 and 1. The intercept for the standard deviation κ 0 was bounded with a lower bound set to a small positive value to avoid singularity problems that would occur if the standard deviation were too close to zero. For the same reason all other standard deviation coefficients were restricted to nonnegative values. In addition, whenever a proposal combination of model parameters generated by a DEoptim iteration produced a near-singular C matrix, such combination was rejected. We will discuss the singularity issue more extensively in the Discussion. The calibration was performed for 315 days of the year 2010. Days with no rainfall or excessive missing data were excluded.
Kriging prediction Predictions were made with global kriging using all observations. Since no standard implementation is available for non-stationary variance kriging, we developed our own code. We speeded up the algorithm by inverting matrices using Cholesky decomposition and by using parallel computing.
Simulated annealing For SSA we used the R package spsann ( Samuel-Rosa, 2016 ). The maximum distance that points could  move was set to half the extent of the study area, the actual distance was drawn from a uniform distribution between zero and the maximum distance. The maximum distance in which the raingauges can be moved becomes smaller as the number of iterations increases and converges to zero at the end of the process. The initial temperature was set to 0.1 with a cooling parameter α of 0.8.
The maximum number of chains was set to 140 whereas the number of iterations within a chain was set to the number of observations, so that the total number of iterations is 185 × 140 = 25 900. The process stops if no improvement is made after 100 chains or when the maximum number of chains is reached. The prediction error variance was evaluated on a coarse grid (3 km × 3 km) to avoid excessive computing time. We used a Linux server 4.4.0-38generic Ubuntu SMP with 48 cores, the total processing time for the SSA was approximately 580 h. and a . Recall that the trend covariate radar image was standardised in order to be able to compare its estimated regression coefficients with the intercept coefficients. The trend coefficients associated with the radar-rainfall map are nearly always positive ( β 1 > 0 for 92% of the days), indicating a positive effect of radar rainfall. This is as expected, since radar rainfall and rain-gauge rainfall are positively correlated (their Pearson correlation coefficient is about 0.96). Note also that the distributions of the trend coefficient estimates are positively skewed. This can be explained from the skew distribution of the rainfall (see Fig. 2 ). Since the radar image covariate was standardised the trend coefficient estimates are likely to be large during days with high rainfall, in particular the trend intercept. This is confirmed by the scatter plots shown in Fig. 11 ).   Table 1 for associated covariates. Summary statistics are provided in Table 3 and daily values in Fig. 8 for β, Fig. 9 for κ and in Fig. 10 for r 0 and a .

Parameter estimation
The estimates of the regression coefficients associated with the standard deviation covariates are always greater or equal to zero because zero was taken as a lower bound (except for κ 0 , which has a lower bound of 0.0 0 01). It appears that the lower bounds for κ 1 , κ 2 and κ 3 are a real restriction because the estimates are often almost equal to their lower bounds. For a few days estimates of κ 1 , κ 2 and κ 3 are pushed to their upper bounds. This occurs mainly when the rainfall amount is close to zero ( Fig. 11 ), which is not surprising because these are days where the standard deviation covariates are small. Note also that the contribution of each covariate to the standard deviation cannot be inferred by direct comparison of the coefficient estimates because the standard deviation covariates were not standardised.
The boxplots of the correlogram parameters in Fig. 4 show that for most days there is significant spatial correlation in the residuals. The micro-scale correlation parameter is symmetrically distributed around 0.50 whereas the correlation length parameter has a skew distribution with a median of about 10 km. For the exponential model this indicates a correlation up to about 30 km, which is not very large given the extent of the study area. Table 3 provides summary statistics of all parameter estimates. Fig. 5 shows an example of three successive days with radar image, spatial trend ( Eq. (2) ), prediction ( Eq. (11) ), standard deviation of residuals ( Eq. (3) ) and prediction error standard deviation (kriging standard deviation in Eq. (12) ), as obtained using the initial rain-gauge network design. For all three days the spatial pattern of the predicted rainfall is very similar to that of radar rainfall, illustrating the strong effect of radar rainfall on the final prediction. This is confirmed by the high β 1 estimates for February 15 and February 16. The β 1 estimate for February 14 is much smaller, but this is because the average rainfall was low on that day (recall that the radar map was standardised while the rainfall data were not). Note that February 15 and 16 show an underestimation of the actual rainfall accounted for by β 0 ( Table 2 ).

Kriging
The spatial pattern of the standard deviation maps is correlated with the radar rainfall map for February 14 and February 15. For February 16, from the two rainfall events of the predicted map (south-west and north-west), only one appears in the standard deviation map. This can be explained from the Elevation and Distance from nearest radar station covariate maps ( Fig. 3 ), which have low values in the south-west and high values in the north-west. The effect is even stronger in the kriging standard deviation map, because the rain-gauge density is higher in the south-west. The maps show also that the rainfall pattern may change dramatically over the course of one day. This confirms that the temporal correlation at daily scale is not be very strong. Fig. 6 shows the decrease of the prediction error variance as the sampling design is perturbed during SSA. The graph shows that several worsening designs are accepted at the beginning. After this initial phase the prediction error variance steadily decreases. After about 10,0 0 0 iterations, no substantial further reduction is achieved, suggesting that the algorithm reached a nearly optimum       7. Inital (left) and optimised (right) rain-gauge network with associated density of rain-gauges. Density is calculated using a Gaussian kernel as defined in Baddeley and Turner (2005) , using a standard deviation of 10 km. Values are expressed in rain-gauge per grid cell (500 m × 500 m).

Optimisation
design, as was confirmed by running the algorithm again and obtaining a similar pattern (results not shown). Note that a marked decrease is observed at the very end of the process. We explain the cause of this in the Discussion. Overall the criterion drops from 4.41 to 4.15, which represents an improvement of about 5.8 %. Fig. 7 shows the initial and optimised sampling locations of the rain-gauges with the associated spatial sampling density. The optimised design has a fairly uniform distribution of rain-gauges with a higher density in the north-west and a lower density in a large band from north-est to south-est and in the south-west. The optimised sampling network also puts rain-gauges towards the bound-ary of the study area. This is a well-known effect reported in Brus and Heuvelink (2007) and Van Groenigen et al. (1999) .

Discussion
For the three example dates, the trend and kriging prediction maps have a very similar pattern to that of radar rainfall. The trend is taken as a linear function of the radar image. The trend map and the kriging prediction map are nearly the same. This shows that the kriging step does not add much, which is because the residual variance is small and the residual spatial correlation is often weak.  Apparently, the radar signal is an important covariate and explains a large part of the rainfall spatial variation. This is not a surprising result that has been reported in many previous studies (e.g. Verworn and Haberlandt, 2011 ). The importance of radar rainfall is also confirmed by the trend regression coefficients ( Fig. 4 ), which are large for the radar covariate. Temporal correlation in daily rainfall is weak and ignored in this example study, but might become more important in case of modelling at a finer time scale, such as required in urban hydrology applications ( Muthusamy et al., 2016 ). Increase of temporal correlation would imply that rainfall at a previous time step becomes a significant covariate. In such case, a more elegant approach might be to replace spatial kriging as employed here by space-time kriging ( Gräler et al., 2016;Heuvelink et al., 2015 ).   . 11. Cross-correlation matrix between parameters and daily averaged rainfall from rain-gauges.
The standard deviation maps in Fig. 5 show that the radar image is also an important covariate to help explain the residual variance, but that other covariates, such as elevation and distance to the radar station, are important too. This is most obvious from the maps for February 16, where there are significant differences between the radar image and standard deviation maps, particularly in the south-west part of the study area. Comparison of the spatial standard deviation of the residuals and kriging standard deviation maps also shows that mapping does benefit from spatial interpolation of the residual: the kriging standard deviation is substantially smaller than the spatial standard deviation, particularly in areas with high rain-gauge density and high rainfall.
Comparison of the estimated trend and standard deviation coefficients with the amount of rainfall in Fig. 11 reveals that the upper bounds of the standard deviation coefficients are reached when very little rainfall is recorded. This can be explained from the fact that the standard deviation covariates are small when the radar rainfall is small, and hence any residual variation has to be represented by increasing the coefficient estimates. In contrast, the trend parameters are nearly always high during days of heavy rainfall. This is also as expected because the trend covariates were standardised and higher rainfall and higher rainfall variation is then modelled through higher trend regression coefficients.
Overall, the micro-scale correlation and correlation length parameters of the correlogram is insensitive to the amount of rainfall and shows a seasonal (winter/summer) pattern. In summer there is a stronger micro-scale correlation and a larger correlation distance than in winter. This may be related to rainfall type which varies by season, i.e. frontal weather systems in winter and convective rainfall in summer. The currently used correlogram is assumed isotropic due to computational simplicity, but one might consider relaxing this assumption as daily rainfall often exhibits significant anisotropy ( Gyasi-Agyei, 2016 ). Fig. 7 shows that after optimisation rain-gauges are placed fairly uniformly over the study area, but that some parts have up to four times higher sampling density than other parts. The high density areas are those that have on average large residual and kriging standard deviation. Since radar rainfall maps vary day by day and their annual average is nearly constant in space, high density sampling areas are correlated with the other standard deviation covariates, notably elevation and distance from nearest radar station. Elevation turns out to be most important, as indicated in Table 4 that shows the Pearson correlation coefficients between rain-gauge density and standard deviation covariates. Distance from nearest radar station is least important, which comes as a surprise but might be explained from the fact that the training data (i.e., rainfall data from the initial network) do not cover the distance from the nearest radar station feature space entirely, and hence the relationship between distance from nearest radar station and residual variation may be difficult to detect.
The decrease of the space-time average prediction error variance that results from optimising the network is relatively modest (i.e. 5%). It is smaller than that obtained in similar studies (e.g. Baume et al., 2011;Wang et al., 2014 ). The main reason for the modest decrease is that we imposed a static design that must do well for all days of the year. On the long run it performs better than the initial design but there will be days where the initial design (or any other design, for that matter) will do better, simply because prediction error variance is relatively large in those parts of the study area where there is substantial rain, and these vary day by day. If sampling design optimisation were applied to dynamic designs then a stronger reduction of the criterion would have been achieved, but clearly this is not a realistic option. The costs of moving rain-gauges would be too high, and moreover it is difficult to predict ahead of time where areas of high rainfall intensity will be. Even though the current 5% improvement is modest, it does improve the accuracy of the resulting maps. Alternatively, optimising the sampling design could also be used to reduce sampling costs. We evaluated this by optimising a sampling design that uses only 90% (166) of the rain-gauges used in this study. This resulted in a slightly smaller criterion value than that of the initial design using 185 rain-gauges. Thus, a 10% reduction in the num-ber of rain-gauge stations can be achieved without accuracy loss, provided these are placed optimally.
For this case study, the criterion decreases at the very end of the SSA iteration process when we expect it to stabilise ( Fig. 6 ). This can be explained by the coarse prediction grid that we used for calculating the space-time average kriging variance. The distance over which sampling locations are shifted becomes smaller than the grid mesh towards the end of the iteration process. The algorithm then moves points to the centre of grid cells, since these are the points for which the kriging variance is computed. We tested this hypothesis by computing the mean shortest distance from gauge locations to grid cell centres. For the optimal design it was only 10% of the expected mean shortest distance for a random design (which is 1150 m in case of a 3 km × 3 km grid). The final drop of the criterion is thus an artefact caused by using a coarse prediction grid. It can be eliminated by using a fine prediction grid, but this would increase computing time. Considering the extent of the study area, the artefact has no serious consequences for the optimal design, since these final shifts are relatively small.
Even with a coarse prediction grid the SSA algorithm took a lot of computing time. Alternative numerical optimisation methods could be tried (e.g. genetic algorithms ( Behzadian et al., 2009 ), Particle Swarm Optimisation ( Jarboui et al., 2007 ) or metaheuristic search (e.g. NSGAII ( Deb et al., 2003 )), but another option is to reduce the computing time during each SSA iteration step. The bulk of the work is associated with solving the kriging system, which we did by Cholesky decomposition of the covariance matrix ( Section 2.6 ). However, since each SSA iteration step only involves moving one station and hence only one row and column of the covariance matrix are changed, computations could be speeded up by using block inversion ( Heesterman, 1983 ). This would become particularly attractive when the number of rain-gauges n is large. However, it might conflict with the use of parallel computing solutions.
This study optimised a spatial sampling design for a case in which spatial variation was characterised by a non-stationary variance model. To the best of our knowledge this has not been done before, but it was important because it is not realistic to assume that rainfall spatial variation is stationary. We verified this by calculating and comparing the log-likelihood and Akaike Information Criterion (AIC) ( Akaike, 2011 ) for a stationary variance and nonstationary variance model. The stationary variance model was obtained from the non-stationary variance model by setting parameters κ 1 , κ 2 and κ 3 to zero. Parameters were optimised using REML as before. The log-likelihood and AIC results are shown in Table 5 . They clearly show that the non-stationary variance model is more suitable. The non-stationary variance model had a larger log-likelihood for 305 out of 315 days, while the AIC was smaller for 257 out of the 315 days. The use of a non-stationary variance model did pose some additional problems, though. We used a model in which the standard deviation is a linear combination of multiple covariates. In this respect, we extended the work of Lark (2009) or Hamm et al. (2012) by using a more complex variance component. Kriging is sensitive to a near-singular covariance matrix and different approaches may be used to avoid it ( Marchant and Lark, 2007a;Marchant et al., 2009 ). In our case, we initially avoided near-singularity during parameter estimation by rejecting parameter combinations suggested by the differential evolution algorithm if they lead to a reciprocal condition number ( Golub and Van Loan, 2012 ) smaller than 0.2. However, this did not completely solve the problem because during SSA optimisation, new network designs are tried. It then happened that near-singularity problems were introduced at this stage, while they did not occur for the initial network. We therefore imposed further restrictions on the standard deviation regression coefficients, by requiring that none can be negative and that the intercept must be greater or equal than a small positive threshold, as explained in Section 2.6 . This solved the near-singularity problem, but at the expense of restricting the parameter search space. Alternatively, a solution might be to model the log-transformed standard deviation as a linear function of covariates. Finally, we should note that while our approach included uncertainty in the trend regression coefficients, we ignored uncertainty about the standard deviation regression coefficients and correlogram parameters. The KED variance given in Eq. (12) may therefore underestimate the "true" uncertainty. In principle uncertainty about the covariance parameters can be included, such as by using a geostatistical approach ( Diggle and Ribeiro, 2007;Marchant and Lark, 2007b;Zhu and Stein, 20 06;Zimmerman, 20 06 ), but it is not obvious that the improved uncertainty assessment outweighs the substantial increase of computational complexity.

Conclusions
We extended geostatistical interpolation of rainfall data by employing a non-stationary variance KED model to predict rainfall by merging rain-gauge and radar data. We optimised the rain-gauge sampling design by minimising the space-time average KED variance for a study area in England. The main conclusions are: • Geostatistical prediction of rainfall from rain-gauge data and radar data benefits from a model that incorporates nonstationarity in the mean and variance. This model matched real-world observations better than a stationary variance model, as shown by likelihood and Akaike Information Criterion statistics. Estimation of non-stationary variance parameters is hampered by (near-)singularity problems. This particular problem should be investigated more closely. • In our case study we made spatial interpolation repeatedly over time, without accounting for the temporal structure. Temporal correlation of daily rainfall is small, but it might increase if a smaller temporal support is used, such as for hourly or 10-min average rainfall. Space-time kriging might then be a more attractive approach. • The standard deviation of rainfall residuals, i.e. rain-gauge rainfall minus a trend mainly derived from radar rainfall, is positively correlated with radar rainfall, elevation, distance to radar station and beam blockage. In our case study the standard deviation depended more on elevation than on distance to radar station, which in turn was more important than beam blockage. Future studies may show whether this is a consistent finding or case dependent. • Geostatistical optimisation of a rain-gauge network is feasible and yields plausible designs. The optimal design aims for a fairly uniform spatial distribution of the gauges, with an increased density in areas where the residual variance is large. In our case study this was in areas with high elevation, far from radar stations and near the study area boundary. The sampling density in densely sampled parts of the study area was four times higher than in sparsely sampled parts. Further work could include field accessibility in a multi-objective optimisation procedure ( Stumpf et al., 2016 ).
• Optimisation of the rain-gauge network leads to only a modest improvement of the space-time average prediction error variance. This is a consequence of using a static design, which cannot increase sampling density of subareas with heavy rainfall, because these subareas vary from day to day. Nonetheless, the achieved improvement is relevant and implies that savings on data collection costs could be achieved without compromising on prediction accuracy.