Comparative analysis of different techniques for spatial interpolation of rainfall data to create a serially complete monthly time series of precipitation for Sicily, Italy

https://doi.org/10.1016/j.jag.2011.01.005

Abstract

The availability of good and reliable rainfall data is fundamental for most hydrological analyses and for the design and management of water resources systems. In practice, however, precipitation records often contain missing values, mainly because raingauges malfunction during specific time periods. This is an important issue in practical hydrology because it affects the continuity of rainfall data and ultimately influences the results of hydrologic studies that use rainfall as input. Many methods to estimate missing rainfall data have been proposed in the literature and, among these, most are based on spatial interpolation algorithms.

In this paper different spatial interpolation algorithms have been evaluated to produce a reasonably good continuous dataset bridging the gaps in the historical series. The algorithms used are deterministic methods such as inverse distance weighting, simple linear regression, multiple regression, geographically weighted regression and artificial neural networks, and geostatistical models such as ordinary kriging and residual ordinary kriging. In some of these methods, the elevation information, provided by a Digital Elevation Model, has been added to improve estimation of missing data. These algorithms have been applied to the mean annual and monthly rainfall data of Sicily (Italy), measured at 247 raingauges.

Optimization of different settings of the various interpolation methods has been carried out using a subset of the available rainfall dataset (modeling set) while the remaining subset (validation set) has been used to compare the results obtained by the different algorithms.

Validation results indicate that the univariate methods, neglecting the information of elevation, are characterized by the largest errors, which decrease when the elevation is taken into account. The ordinary kriging of residuals from linear regression between precipitation and elevation, which has provided the best performance at annual and monthly scale, has been used to complete the precipitation monthly time series in Sicily.

Highlights

► Univariate methods: the best performance obtained with the ordinary kriging method.
► Univariate methods improved by the introduction of the elevation information.
► Residual kriging application improves the accuracy of underlying deterministic methods.
► Residual kriging application increases, unfortunately, the bias of the deterministic ones.
► Morphology cannot be neglected when interpolation of climatic variables is carried out.

Introduction

The availability of a reliable source of rainfall and climate data is a fundamental prerequisite for the modeling of a wide variety of hydrological and environmental processes. While the nature and the structure of hydrological and environmental models may vary, most of them need a precipitation dataset that is complete and reliable on a temporal and spatial basis. Unfortunately, measurements of hydrological variables (e.g. rainfall, temperature, streamflow) can suffer from systematic errors, random errors and gaps (missing data) (Larson and Peck, 1974; Vieux, 2001) and, among these, the missing data problem is probably the most important one.

Generally there are two different approaches to treat missing data or data gaps: one approach consists of using only continuous records, ignoring the prior (or subsequent) events, while the other consists of ignoring the gaps, assuming that the data form one continuous series of records. With the former approach many data are wasted and correct statistical inference cannot be made, whereas the latter approach reduces the period of recorded events and overestimates the likelihood of occurrence of extreme events. Moreover, the use of a dataset affected by missing data can result in errors that exhibit temporal and spatial patterns (Stooksbury et al., 1999). A valid alternative to the above-mentioned approaches consists of filling the gaps in the rainfall time series by estimating the missing values. The reconstruction of serially incomplete data records has been the subject of a large number of scientific works in which numerous techniques for estimating missing data values have been implemented and compared.

Generally, two families of methods can be used for estimating rainfall data: traditional weighting methods and data-driven methods. Regression, artificial neural networks and time series analysis belong to the latter. The former comprise a class of spatial interpolation techniques which, in turn, can be classified in two categories: deterministic methods, such as inverse-distance weighting and non-linear interpolation (e.g. spline techniques), and stochastic interpolation methods of the kriging family.

Many papers have been dedicated to the comparison between deterministic and stochastic approaches to reconstruct daily records using spatial interpolation algorithms to estimate missing data. Eischeid et al. (2000) used six different methods of spatial interpolation to create a serially complete daily temperature and precipitation dataset for the United States. Each of these six methods has been compared by month for each station and the one with the highest correlation with the station where the missing data have to be estimated was chosen as the method used to replace any missing data at that location. Jeffrey et al. (2001) derived a comprehensive archive of Australian rainfall and climate data using a thin plate smoothing spline to interpolate daily climate variables and ordinary kriging to interpolate daily and monthly rainfall.

Among all the spatial interpolation methods, inverse distance weighting (IDW) is probably the most commonly used for estimating missing data in hydrology and the geographical sciences. Several variants of IDW have been derived and adopted by researchers, mainly differing in their weighting schemes. The success of the IDW method depends primarily on the existence of positive spatial autocorrelation (Griffith, 1987; Vasiliev, 1996), because data from locations near one another in space are more likely to be similar than data from locations remote from one another (Tobler, 1970). Unfortunately this condition does not always hold, which makes the choice of the weighting parameters somewhat arbitrary.

Another significant issue related to the IDW method is the arbitrary selection of the neighboring observation points used to estimate missing data at a point of interest. Starting from these limitations, Teegavarapu and Chandramouli (2005) introduced several conceptual improvements to the traditional inverse distance weighting method, suggesting six different versions of spatial interpolation algorithms. Their results suggested that conceptual revisions of the IDW method can improve the estimation of missing precipitation records by changing the procedure used to estimate the weighting parameters and by using surrogate measures for the distances on which the method relies.
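As an illustration of the two choices discussed above (the weighting exponent and the neighborhood), a minimal IDW sketch in Python follows. It is not the implementation used in the paper: the function name `idw_estimate`, the default exponent of 2, the optional `k` nearest-neighbor cutoff and the three gauges are all assumptions made for the example.

```python
import numpy as np

def idw_estimate(xy_obs, z_obs, xy_target, power=2.0, k=None):
    """Inverse distance weighting estimate at one target location.

    power: exponent applied to the distance (assumed default of 2).
    k:     if given, only the k nearest gauges are used (assumed option).
    """
    xy_obs = np.asarray(xy_obs, dtype=float)
    z_obs = np.asarray(z_obs, dtype=float)
    d = np.linalg.norm(xy_obs - np.asarray(xy_target, dtype=float), axis=1)
    # A target that coincides with a gauge returns that gauge's value.
    if np.any(d == 0.0):
        return float(z_obs[d == 0.0][0])
    if k is not None:
        nearest = np.argsort(d)[:k]
        d, z_obs = d[nearest], z_obs[nearest]
    weights = 1.0 / d**power
    return float(np.sum(weights * z_obs) / np.sum(weights))

# Three hypothetical gauges (coordinates in km, rainfall in mm).
gauges = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
rain = [600.0, 800.0, 700.0]
estimate = idw_estimate(gauges, rain, (5.0, 5.0))
```

Both `power` and `k` are free parameters here, which is exactly the arbitrariness noted in the text; in practice they would be tuned on a modeling subset of the gauges.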

Other works used artificial neural networks (ANNs) to infill missing data in climatic time series. Among these, Coulibaly and Evora (2007) compared six different types of ANN approaches for infilling missing daily total precipitation and daily extreme temperature series. The evaluation of the accuracy of the different models for infilling data gaps, carried out using daily precipitation from 15 weather validation stations, highlighted the Multi Layer Perceptron (MLP) as the most effective for infilling missing daily precipitation values. Demyanov et al. (1998) proposed a two-step spatial interpolation method named direct neural network residual kriging: the first step is a data-driven approach that estimates the large-scale spatial structure using an ANN, while the second step is the analysis of residuals carried out using a geostatistical method; final estimates are produced as the sum of the ANN estimates and the ordinary kriging estimates of the residuals.

Several authors used elevation to improve the spatial prediction of rainfall. Martinez-Cob (1996), using three different geostatistical methods (ordinary kriging, co-kriging with elevation and modified residual kriging) to interpolate precipitation and reference evapotranspiration at annual scale, found that co-kriging was superior for precipitation interpolation, reducing estimation uncertainty by 18.7% and 24.3% compared with ordinary kriging and modified residual kriging, respectively. Goovaerts (2000) applied spatial interpolation methods to annual and monthly rainfall observations measured at available raingauges using two different groups of algorithms: three multivariate geostatistical algorithms that incorporate a digital elevation model into the spatial prediction of rainfall (simple kriging with varying local means, kriging with an external drift, colocated cokriging) and three univariate techniques (the Thiessen polygon, inverse square distance, ordinary kriging) which do not take elevation into account. The comparison among these methods pointed out that the three multivariate geostatistical algorithms gave the lowest errors in rainfall prediction. Diodato and Ceccarelli (2005) applied three interpolation methods (IDW, linear regression and co-kriging) to the rainfall recorded in a region of 1400 km² in Southern Italy with elevation ranging from 400 m to 1100 m, concluding that the best method is co-kriging, since it is able to take into account several properties of the landscape, such as elevation. Elevation can also be taken into account by Hierarchical Bayesian models (Banerjee et al., 2004), which are among the most promising stochastic spatial interpolation methods, are widely used for the estimation and modeling of climatic spatial data, and can estimate precipitation at ungauged sites.
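One way to combine elevation with a geostatistical method is to regress rainfall on elevation and then apply ordinary kriging to the regression residuals. The Python sketch below illustrates that idea under stated assumptions: the exponential variogram and its parameters (`nugget`, `sill`, `range_km`) are chosen by hand rather than fitted, and the five gauges, their elevations and all function names are hypothetical. It is not the implementation used in any of the cited works.

```python
import numpy as np

def exp_variogram(h, nugget, sill, range_km):
    """Exponential semivariogram with gamma(0) = 0 (parameters are assumed)."""
    h = np.asarray(h, dtype=float)
    model = nugget + (sill - nugget) * (1.0 - np.exp(-3.0 * h / range_km))
    return np.where(h > 0.0, model, 0.0)

def ordinary_kriging(xy, values, xy0, variogram):
    """Ordinary kriging estimate at xy0 using the variogram form of the system."""
    n = len(values)
    pair_dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    # Left-hand side: variogram matrix bordered by the unbiasedness constraint.
    lhs = np.ones((n + 1, n + 1))
    lhs[:n, :n] = variogram(pair_dist)
    lhs[n, n] = 0.0
    rhs = np.ones(n + 1)
    rhs[:n] = variogram(np.linalg.norm(xy - xy0, axis=1))
    weights = np.linalg.solve(lhs, rhs)[:n]  # last unknown is the Lagrange multiplier
    return float(weights @ values)

def residual_kriging(xy, elev, rain, xy0, elev0, variogram):
    """Linear trend on elevation plus ordinary kriging of the residuals."""
    xy, elev, rain = (np.asarray(a, dtype=float) for a in (xy, elev, rain))
    slope, intercept = np.polyfit(elev, rain, 1)
    residuals = rain - (intercept + slope * elev)
    trend0 = intercept + slope * elev0
    return float(trend0 + ordinary_kriging(xy, residuals, np.asarray(xy0, dtype=float), variogram))

# Five hypothetical gauges: coordinates (km), elevation (m), rainfall (mm).
xy = np.array([[0.0, 0.0], [20.0, 5.0], [10.0, 25.0], [30.0, 30.0], [5.0, 15.0]])
elev = np.array([50.0, 300.0, 800.0, 1200.0, 450.0])
rain = np.array([480.0, 560.0, 820.0, 1010.0, 650.0])
variogram = lambda h: exp_variogram(h, nugget=0.0, sill=2500.0, range_km=40.0)
estimate = residual_kriging(xy, elev, rain, [12.0, 12.0], 500.0, variogram)
```

Because the variogram is zero at zero distance, the sketch reproduces the observed rainfall at a gauge location; a real application would fit the variogram to the empirical residual semivariances instead of fixing it.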

One of the main limitations of the spatial interpolation methods used to fill the climatic time series is that they neglect the spatio-temporal structure of the time series. In order to overcome this problem, models handling the spatial and temporal dependence simultaneously have been developed and used. A practical use of these kinds of models is given by Gneiting (2002) who proposed general classes of nonseparable stationary covariance functions for spatio-temporal random processes. The author used a covariance model with a readily interpretable space–time interaction parameter to analyze climatic data in Ireland. Another interesting work that used spatio-temporal modeling by adopting Bayesian inference has been carried out by Gelfand et al. (2005). The authors viewed climatic data as a time series of spatial processes and worked in the setting of dynamic models, achieving a class of dynamic models for such data (precipitation from monitoring stations in Colorado, USA).

In this paper, only the spatial dependence structure of the rainfall data is used to reconstruct missing rainfall data; the spatio-temporal dependence is neglected. In particular, building on previous work (Bono et al., 2005), different algorithms for the spatial interpolation of rainfall data are presented and applied to the annual and monthly average rainfall data of Sicily (Italy) measured at 247 raingauges. These models are then compared with each other through a validation procedure in order to choose the reconstruction procedure for the historical data that leads to the best results, that is, the model characterized by the lowest bias and the greatest accuracy on the validation set.
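The validation procedure described above can be sketched as follows. The excerpt does not say how the gauges are split or which statistics measure bias and accuracy, so this Python example assumes a random split with a 30% validation fraction, the mean error as the bias measure and the root mean square error as the accuracy measure; the function names and sample values are hypothetical.

```python
import numpy as np

def split_gauges(n_gauges, validation_fraction=0.3, seed=0):
    """Randomly split gauge indices into modeling and validation sets.

    The 30% validation fraction is an assumption, not a value from the paper.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_gauges)
    n_validation = int(round(validation_fraction * n_gauges))
    return order[n_validation:], order[:n_validation]

def validation_scores(observed, estimated):
    """Mean error (bias) and root mean square error (accuracy)."""
    observed = np.asarray(observed, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    errors = estimated - observed
    return {
        "mean_error": float(errors.mean()),
        "rmse": float(np.sqrt((errors**2).mean())),
    }

# 247 gauges, as in the study; the rainfall values below are made up.
modeling_idx, validation_idx = split_gauges(247)
scores = validation_scores([700.0, 500.0, 900.0], [710.0, 490.0, 930.0])
```

A method with a mean error close to zero is unbiased on the validation set, while a lower RMSE indicates higher accuracy; the two need not improve together, as the highlights note for residual kriging.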

Case study

This study has been carried out for the largest island in the Mediterranean Sea: Sicily, Italy, which extends over an area of 25,700 km². The mean annual precipitation over Sicily is about 715 mm (period 1921–2004), with rainfall concentrated in the winter period. July and August are usually rainless. There is a strong spatial variability of precipitation, ranging from an average of 400 mm in the South-Eastern part to an average of 1300 mm in the North-Eastern part.

Precipitation dataset

Interpolation algorithms

The problem analyzed here is to provide the estimate ẑ(x0) of the rainfall variable z at an ungauged location x0 using rainfall data at gauged sites. Denoting with {z(xi), i = 1, 2, …, N} the precipitation dataset measured at the N sites xi, two different classes of interpolators have been used here: univariate methods and elevation-aided interpolation (EAI) methods. While the former take into account only the data and spatial coordinates xi, the latter use also supplementary data as elevation q(xi) of

Analysis of results

In this section the results obtained using different interpolation methods are analyzed and compared. This comparison is initially carried out using the average annual rainfall data. The results provided by this kind of analysis will be used to limit the number of trials carried out on the dataset concerning the average monthly precipitation. Finally, on the basis of the best average annual and monthly estimation methods, respectively, the reconstruction of rainfall data corresponding to the

Conclusions

As mentioned above, the aim of this study has been to compare various methods, based on spatial interpolation, for estimating the missing data in precipitation records.

From the comparison of these methods, it has been observed that, among the univariate methods, the best performance has been obtained with the ordinary kriging method. In fact the geostatistical methods, such as kriging, unlike the simpler methods such as inverse distance weighting, take into account most

References (36)

  • Bishop, C., 1995. Neural Networks for Pattern Recognition.
  • Bono, E., La Loggia, G., Noto, L., 2005. Spatial interpolation methods based on the use of elevation data. Geophysical...
  • Brunsdon, C., et al., 2001. Spatial variations in the average rainfall-altitude relationship in Great Britain: an approach using geographically weighted regression. International Journal of Climatology.
  • de Smith, M.J., et al., 2006. Geospatial Analysis: A Comprehensive Guide.
  • Demyanov, V., et al., 1998. Neural network residual kriging application for climatic data. Journal of Geographic Information and Decision Analysis.
  • Dennis, J., et al., 1983. Numerical Methods for Unconstrained Optimization and Nonlinear Equations.
  • Diodato, N., et al., 2005. Interpolation processes using multivariate geostatistics for mapping of climatological precipitation mean in the Sannio Mountains (Southern Italy). Earth Surface Processes and Landforms.
  • Eischeid, J., et al., 2000. Creating a serially complete, national daily time series of temperature and precipitation for the western United States. Journal of Applied Meteorology.