Comparative analysis of different techniques for spatial interpolation of rainfall data to create a serially complete monthly time series of precipitation for Sicily, Italy

doi:10.1016/j.jag.2011.01.005

International Journal of Applied Earth Observation and Geoinformation

Volume 13, Issue 3, June 2011, Pages 396-408

https://doi.org/10.1016/j.jag.2011.01.005

Abstract

The availability of good and reliable rainfall data is fundamental for most hydrological analyses and for the design and management of water resources systems. In practice, however, precipitation records often contain missing values, mainly because raingauges malfunction for specific time periods. This is an important issue in practical hydrology because it affects the continuity of rainfall data and ultimately influences the results of hydrologic studies which use rainfall as input. Many methods to estimate missing rainfall data have been proposed in the literature and, among these, most are based on spatial interpolation algorithms.

In this paper different spatial interpolation algorithms have been evaluated to produce a reasonably good continuous dataset bridging the gaps in the historical series. The algorithms used are deterministic methods such as inverse distance weighting, simple linear regression, multiple regression, geographically weighted regression and artificial neural networks, and geostatistical models such as ordinary kriging and residual ordinary kriging. In some of these methods, the elevation information, provided by a Digital Elevation Model, has been added to improve estimation of missing data. These algorithms have been applied to the mean annual and monthly rainfall data of Sicily (Italy), measured at 247 raingauges.

Optimization of different settings of the various interpolation methods has been carried out using a subset of the available rainfall dataset (modeling set) while the remaining subset (validation set) has been used to compare the results obtained by the different algorithms.

Validation results indicate that the univariate methods, neglecting the information of elevation, are characterized by the largest errors, which decrease when the elevation is taken into account. The ordinary kriging of residuals from linear regression between precipitation and elevation, which has provided the best performance at annual and monthly scale, has been used to complete the precipitation monthly time series in Sicily.

Highlights

► Univariate methods: the best performance obtained with the ordinary kriging method.
► Univariate methods improved by the introduction of the elevation information.
► Residual kriging application improves the accuracy of underlying deterministic methods.
► Residual kriging application increases, unfortunately, the bias of the deterministic ones.
► Morphology cannot be neglected when interpolation of climatic variables is carried out.

Introduction

The availability of a reliable source of rainfall and climate data is a fundamental prerequisite for the modeling of a wide variety of hydrological and environmental processes. While the nature and the structure of hydrological and environmental models may vary, most of them need a precipitation dataset that is complete and reliable on a temporal and spatial basis. Unfortunately, measurement of hydrological variables (e.g. rainfall, temperature, streamflows, etc.) can suffer from systematic, random errors and gaps (missing data) (Larson and Peck, 1974, Vieux, 2001) and, among these, the missing data problem is probably the most important one.

Generally there are two different approaches to treat the missing data or data gaps: one possible approach consists of using only continuous records, ignoring the prior (or subsequent) events, while another approach suggests ignoring the gaps, assuming that the data are one continuous series of records. With the former approach many data are wasted and correct statistical inference cannot be made whereas the latter approach reduces the period of recorded events and overestimates the likelihood of occurrence of extreme events. On the other hand, the use of the dataset prone to missing data can result in errors that exhibit temporal and spatial patterns (Stooksbury et al., 1999). A valid alternative to the above mentioned approaches consists of filling the gaps in the rainfall time series by estimating the missing values. The reconstruction of serially incomplete data records has been the subject of a large number of scientific works where numerous techniques for estimating missing data values have been implemented and compared.

Generally traditional weighting and data-driven methods can be used for estimating rainfall data; while regression, artificial neural networks and time series analysis belong to the latter, the former methods are given by a class of spatial interpolation techniques which, in turn, can be classified in two categories: the deterministic methods such as inverse-distance weighting and non-linear interpolation such as spline techniques, and the stochastic interpolation methods of the kriging family.

Many papers have been dedicated to the comparison between deterministic and stochastic approaches to reconstruct daily records using spatial interpolation algorithms to estimate missing data. Eischeid et al. (2000) used six different methods of spatial interpolation to create a serially complete daily temperature and precipitation dataset for the United States. Each of these six methods has been compared by month for each station and the one with the highest correlation with the station where the missing data have to be estimated was chosen as the method used to replace any missing data at that location. Jeffrey et al. (2001) derived a comprehensive archive of Australian rainfall and climate data using a thin plate smoothing spline to interpolate daily climate variables and ordinary kriging to interpolate daily and monthly rainfall.

Among the spatial interpolation methods, inverse distance weighting (IDW) is probably the most commonly used for estimating missing data in hydrology and the geographical sciences. Several variants of IDW have been derived and adopted by researchers, differing mainly in their weighting schemes. The success of the IDW method depends primarily on the existence of positive spatial autocorrelation (Griffith, 1987; Vasiliev, 1996), that is, on data from locations near one another being more likely to be similar than data from locations remote from one another (Tobler, 1970). Unfortunately this condition does not always hold, which introduces arbitrariness into the choice of the weighting parameters.
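The basic IDW estimate described above can be sketched as follows. This is a minimal illustration, not the implementation used in the paper: the function name `idw_estimate`, the exponent `power=2.0` (inverse square distance) and the three gauge records are assumptions chosen only for the example.

```python
import math


def idw_estimate(target, stations, power=2.0):
    """Inverse distance weighting estimate at `target` = (x, y).

    `stations` is a list of (x, y, value) tuples. The exponent `power`
    is an assumed setting; the source does not state the value it used.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in stations:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            # The target coincides with a gauge: return its value directly.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical gauges: (x in km, y in km, monthly rainfall in mm).
gauges = [(0.0, 0.0, 60.0), (10.0, 0.0, 80.0), (0.0, 10.0, 100.0)]
estimate = idw_estimate((5.0, 0.0), gauges)
print(round(estimate, 2))
```

Because the weights depend only on distance, the estimate always lies between the smallest and largest neighbouring values, which is one reason the choice of exponent and of neighbourhood matters so much for this method.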

Another significant issue related to the IDW method is the arbitrary selection of neighborhood points of observations for the estimation of missing data at a point of interest. Beginning from these limitations, Teegavarapu and Chandramouli (2005) introduced several conceptual improvements to the traditional inverse distance weighting method, suggesting six different versions of spatial interpolation algorithms. The results obtained suggested that the conceptual revisions of the IDW method can improve estimation of missing precipitation records by changing the procedure to estimate the weighting parameters and surrogating measures for distances used in the same method.

Other works used artificial neural networks (ANNs) to infill missing data in climatic time series. Among these, Coulibaly and Evora (2007) compared six different types of ANN for infilling missing daily total precipitation and daily extreme temperature series. The evaluation of the accuracy of the different models for infilling data gaps, carried out using daily precipitation from 15 weather validation stations, highlighted the Multi Layer Perceptron (MLP) as the most effective for infilling missing daily precipitation values. Demyanov et al. (1998) proposed a two-step spatial interpolation method named direct neural network residual kriging: the first step is a data-driven approach that estimates the large-scale spatial structure using an ANN, while the second step is the analysis of the residuals using a geostatistical method; final estimates are produced as the sum of the ANN estimates and the ordinary kriging estimates of the residuals.
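The two-step structure of residual kriging can be written compactly. The notation below is an assumption introduced for illustration ($m$ for the large-scale trend model, $r$ for the residuals, $\lambda_i$ for the ordinary kriging weights); it is not taken from the cited works.

$$
r(x_i) = z(x_i) - m(x_i), \qquad
\overset{\circ}{z}(x_0) = m(x_0) + \sum_{i=1}^{N} \lambda_i \, r(x_i), \qquad
\sum_{i=1}^{N} \lambda_i = 1
$$

Here $m(\cdot)$ is fitted in the first step (an ANN in Demyanov et al., 1998, or a regression on elevation in other variants), and the weights $\lambda_i$ are obtained in the second step from the variogram of the residuals, with the unit-sum constraint being the usual ordinary kriging unbiasedness condition.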

Different authors used the elevation in order to improve the spatial prediction of rainfall. Martinez-Cob (1996), using three different geostatistical methods (ordinary kriging, co-kriging with elevation and modified residual kriging) to interpolate precipitation and reference evapotranspiration at annual scale, found that co-kriging was superior for precipitation interpolation, reducing estimation uncertainty by 18.7% and 24.3% compared with ordinary kriging and modified residual kriging, respectively. Goovaerts (2000) applied spatial interpolation methods to annual and monthly rainfall observations measured at available raingauges using two different groups of algorithms: three multivariate geostatistical algorithms that incorporate a digital elevation model into the spatial prediction of rainfall (simple kriging with varying local means, kriging with an external drift, colocated cokriging) and three univariate techniques (the Thiessen polygon, inverse square distance, ordinary kriging) which do not take the elevation into account. The comparison among these methods showed that the three multivariate geostatistical algorithms gave the lowest errors in rainfall prediction. Diodato and Ceccarelli (2005) applied three interpolation methods (IDW, linear regression and co-kriging) to the rainfall recorded in a region of 1400 km² in Southern Italy with elevation ranging from 400 m to 1100 m, concluding that the best method is co-kriging since it is able to take into account several properties of the landscape, such as the elevation. The elevation can also be taken into account by hierarchical Bayesian models (Banerjee et al., 2004), which are among the most promising stochastic spatial interpolation methods, are widely used for the estimation and modeling of climatic spatial data, and can estimate precipitation at ungauged sites.
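The simplest way of bringing elevation into the estimate is a linear regression of rainfall on elevation, whose residuals can then be interpolated spatially. The sketch below fits such a line by ordinary least squares; the function name `fit_line` and the four station records are hypothetical and serve only to show the arithmetic.

```python
def fit_line(elevations, rainfall):
    """Ordinary least squares fit of rainfall = intercept + slope * elevation."""
    n = len(elevations)
    mean_q = sum(elevations) / n
    mean_z = sum(rainfall) / n
    s_qq = sum((q - mean_q) ** 2 for q in elevations)
    s_qz = sum((q - mean_q) * (z - mean_z)
               for q, z in zip(elevations, rainfall))
    slope = s_qz / s_qq
    intercept = mean_z - slope * mean_q
    return intercept, slope


# Hypothetical stations: elevation in m, mean annual rainfall in mm.
elev = [100.0, 300.0, 500.0, 700.0]
rain = [560.0, 640.0, 760.0, 840.0]

intercept, slope = fit_line(elev, rain)

# Residuals are what a second, spatial step (for example kriging) would
# interpolate before adding the regression trend back.
residuals = [z - (intercept + slope * q) for q, z in zip(elev, rain)]
print(intercept, slope, residuals)
```

With an intercept in the model the residuals sum to zero over the fitting stations, so any bias in the final estimate comes from the spatial step or from applying the fit to stations that were not used to build it.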

One of the main limitations of the spatial interpolation methods used to fill the climatic time series is that they neglect the spatio-temporal structure of the time series. In order to overcome this problem, models handling the spatial and temporal dependence simultaneously have been developed and used. A practical use of these kinds of models is given by Gneiting (2002) who proposed general classes of nonseparable stationary covariance functions for spatio-temporal random processes. The author used a covariance model with a readily interpretable space–time interaction parameter to analyze climatic data in Ireland. Another interesting work that used spatio-temporal modeling by adopting Bayesian inference has been carried out by Gelfand et al. (2005). The authors viewed climatic data as a time series of spatial processes and worked in the setting of dynamic models, achieving a class of dynamic models for such data (precipitation from monitoring stations in Colorado, USA).

In our paper, only the spatial dependence structure of the rainfall data is used to reconstruct missing rainfall data, thus neglecting the spatio-temporal dependence. In particular, building on previous work (Bono et al., 2005), different algorithms for the spatial interpolation of rainfall data are presented and applied to annual and monthly average rainfall data of Sicily (Italy) measured at 247 raingauges. These models are then compared with each other through a validation procedure in order to choose the reconstruction procedure for the historical data that leads to the best results, that is, the model characterized by the lowest bias and the greatest accuracy on the validation set.
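A validation procedure of this kind can be sketched in a few lines. Everything specific here is an assumption made for illustration: the 25% validation fraction, the random split, and the use of mean error as the bias measure and root mean square error as the accuracy measure are not stated in this excerpt.

```python
import math
import random


def split_stations(station_ids, validation_fraction=0.25, seed=0):
    """Randomly split station identifiers into (modeling, validation) lists."""
    shuffled = list(station_ids)
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


def mean_error(observed, estimated):
    """Bias: average of (estimated - observed); zero for an unbiased method."""
    return sum(e - o for o, e in zip(observed, estimated)) / len(observed)


def rmse(observed, estimated):
    """Accuracy: root mean square of the estimation errors."""
    squared = sum((e - o) ** 2 for o, e in zip(observed, estimated))
    return math.sqrt(squared / len(observed))


# 247 gauges, as in the study; the identifiers themselves are hypothetical.
modeling, validation = split_stations(range(247))

# Hypothetical observed and estimated rainfall (mm) at two validation gauges.
bias = mean_error([100.0, 200.0], [110.0, 190.0])
accuracy = rmse([100.0, 200.0], [110.0, 190.0])
print(len(modeling), len(validation), bias, accuracy)
```

The toy numbers show why both measures are needed: the two errors cancel, giving zero bias, while the root mean square error is still 10 mm.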


Case study

This study has been carried out for the largest island in the Mediterranean Sea: Sicily, Italy, which extends over an area of 25,700 km². The mean annual precipitation over Sicily is about 715 mm (period 1921–2004), with rainfall concentrated in the winter period. July and August are usually rainless. There is a strong spatial variability of precipitation, ranging from an average of 400 mm in the South-Eastern part to an average of 1300 mm in the North-Eastern part.

Precipitation dataset

Interpolation algorithms

The problem analyzed here is to provide the estimate $\overset{\circ}{z}$ of the rainfall variable $z$ at an ungauged location $x_0$ using rainfall data at gauged sites. Denoting by $\{z(x_i),\ i = 1, 2, \ldots, N\}$ the precipitation dataset measured at the $N$ sites $x_i$, two different classes of interpolators have been used here: univariate methods and elevation-aided interpolation (EAI) methods. While the former take into account only the data and the spatial coordinates $x_i$, the latter also use supplementary data such as the elevation $q(x_i)$ of

Analysis of results

In this section the results obtained using different interpolation methods are analyzed and compared. This comparison is initially carried out using the average annual rainfall data. The results provided by this kind of analysis will be used to limit the number of trials carried out on the dataset concerning the average monthly precipitation. Finally, on the basis of the best average annual and monthly estimation methods, respectively, the reconstruction of rainfall data corresponding to the

Conclusions

As mentioned above, the aim of this study has been to compare various methods, based on spatial interpolation, for estimating the missing data in precipitation records.

From the comparison of these methods, it has been observed that, among the univariate methods, the best performance has been obtained with the ordinary kriging method. In fact the geostatistical methods, such as kriging, unlike the simpler methods such as inverse distance weighting, take into account most

References (36)

P. Coulibaly et al. Comparison of neural network methods for infilling missing daily weather records. Journal of Hydrology (2007)

P. Goovaerts. Using elevation to aid the geostatistical mapping of rainfall erosivity. Catena (1999)

P. Goovaerts. Geostatistical approaches for incorporating elevation into the spatial interpolation of rainfall. Journal of Hydrology (2000)

S.J. Jeffrey et al. Using spatial interpolation to construct a comprehensive archive of Australian climate data. Environmental Modelling and Software (2001)

A. Martinez-Cob. Multivariate geostatistical analysis of evapotranspiration and precipitation in mountainous terrain. Journal of Hydrology (1996)

M. Moller. A scaled conjugate gradient algorithm for fast supervised learning. Neural Networks (1993)

R. Teegavarapu et al. Improved weighting methods, deterministic and stochastic data-driven models for estimation of missing precipitation records. Journal of Hydrology (2005)

D.F. Andrews. A robust method for multiple linear regression. Technometrics (1974)

S. Banerjee et al. Hierarchical Modeling and Analysis for Spatial Data (2004)

R. Battiti. 1st-order and 2nd-order methods for learning between steepest descent and Newton method. Neural Computation (1992)

C. Bishop. Neural Networks for Pattern Recognition (1995)

Bono, E., La Loggia, G., Noto, L., 2005. Spatial interpolation methods based on the use of elevation data. Geophysical...

C. Brunsdon et al. Spatial variations in the average rainfall-altitude relationship in Great Britain: an approach using geographically weighted regression. International Journal of Climatology (2001)

M.J. de Smith et al. Geospatial Analysis: A Comprehensive Guide (2006)

V. Demyanov et al. Neural network residual kriging application for climatic data. Journal of Geographic Information and Decision Analysis (1998)

J. Dennis et al. Numerical Methods for Unconstrained Optimization and Nonlinear Equations (1983)

N. Diodato et al. Interpolation processes using multivariate geostatistics for mapping of climatological precipitation mean in the Sannio Mountains (Southern Italy). Earth Surface Processes and Landforms (2005)

J. Eischeid et al. Creating a serially complete, national daily time series of temperature and precipitation for the western United States. Journal of Applied Meteorology (2000)
