Convergence in distribution of the -deviations of the kernel-type variogram estimators with applications
Introduction
Kriging techniques allow researchers to reconstruct a phenomenon over the whole observation region from a finite set of data, with applications in a wide range of areas, such as hydrology, atmospheric science and geology. However, these procedures demand an appropriate characterization of the second-order structure of the underlying random process, which can be addressed through the variogram or the covariance function.
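Let $\{Z(s) : s \in D \subset \mathbb{R}^d\}$ be a spatial random process, where $D$ denotes the observation region. We will focus our attention on the variogram of an intrinsically stationary random process, so the following conditions are assumed:
- (I1)
$E[Z(s)] = \mu$, for all $s \in D$ and some constant $\mu$.
- (I2)
$\mathrm{Var}[Z(s) - Z(s')] = 2\gamma(s - s')$, for all $s, s' \in D$ and some function $\gamma$, which is referred to as the semivariogram ($2\gamma$ is the variogram).
- (I2′)
$\mathrm{Var}[Z(s) - Z(s')] = 2\gamma(\|s - s'\|)$, for all $s, s' \in D$ and some function $\gamma$, where $\|\cdot\|$ denotes the Euclidean norm.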
For approximation of the variogram, nonparametric procedures may be used in a first step, providing us with the empirical estimator and more robust alternatives, studied in Matheron (1963), Cressie (1993) or Genton (1998). On the other hand, kernel-type approaches can be derived by adapting the Nadaraya–Watson estimation or the local linear method to the spatial setting, as suggested in Hall and Patil (1994) or in García-Soidán et al. (2003), respectively. Alternative mechanisms for approximating the semivariogram include the constant and the variable nearest-neighbor estimators, given in Yu and Mateu (2002), which yield generalizations of the Matheron and the Nadaraya–Watson semivariograms, respectively.
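As a point of reference for the estimators mentioned above, the empirical (Matheron) semivariogram can be sketched as follows. This is a minimal illustration under isotropy, not the authors' code: the function name `empirical_semivariogram`, the distance bins and the tolerance `tol` are assumptions introduced here.

```python
import math


def empirical_semivariogram(coords, values, lags, tol):
    """Method-of-moments (Matheron) semivariogram on distance bins.

    For each lag t in `lags`, averages (Z(s_i) - Z(s_j))^2 / 2 over the
    pairs whose separation distance lies within `tol` of t.
    Returns one estimate per lag (None when a bin contains no pairs).
    """
    n = len(values)
    sums = [0.0] * len(lags)
    counts = [0] * len(lags)
    for i in range(n):
        for j in range(i + 1, n):  # each unordered pair is visited once
            dist = math.dist(coords[i], coords[j])
            sq_diff = (values[i] - values[j]) ** 2
            for k, t in enumerate(lags):
                if abs(dist - t) <= tol:
                    sums[k] += sq_diff
                    counts[k] += 1
    return [s / (2.0 * c) if c > 0 else None for s, c in zip(sums, counts)]
```

With irregularly spaced locations, many bins may hold few or no pairs, which is one motivation for the kernel-type estimators discussed in the same paragraph.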
The performance of different nonparametric semivariogram estimators is analyzed in Menezes et al. (2006), where they are compared in a numerical study covering a range of dependence situations. Nevertheless, the above-mentioned approaches are not necessarily valid for direct application to spatial prediction. In fact, they typically fail to fulfill the conditional negative definiteness property, which is satisfied by the theoretical semivariogram and must also be required of its estimators in order to guarantee a solution of the kriging equations. We can cope with this problem by first choosing a valid parametric family and then selecting within it the variogram that best fits the data, as described in Cressie (1993), or by resorting to the extension in Shapiro and Botha (1991) to a broad class of valid variograms that does not depend on a small number of parameters.
A parametric estimator may be attractive at first because of its simplicity and validity. One of its main drawbacks, however, is the choice of the parametric model, which is typically addressed through graphical diagnostics and can be inconclusive because some variogram models have similar shapes. An alternative is developed in Gorsich and Genton (2000), based on the fact that, unlike the variogram models themselves, their derivatives are often quite different; hence, an estimate of the first derivative of the variogram may help to select among different models.
The current study aims to obtain global measures for the deviations of the Nadaraya–Watson variogram, involving the -norm. These results will provide the basis for developing applications, such as testing the goodness of fit of a variogram model, which requires solving the general contrast:
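A goodness-of-fit contrast of this kind is commonly written as follows. The parametric family $\{\gamma_\theta\}$ and the parameter space $\Theta$ are notation assumed here for illustration, so the expression should be read as a generic sketch rather than as the exact display of the paper.

```latex
H_0 : \gamma \in \{\gamma_\theta : \theta \in \Theta\}
\quad \text{versus} \quad
H_1 : \gamma \notin \{\gamma_\theta : \theta \in \Theta\}
```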
In the statistics literature, an approach for the aforementioned goal was proposed in Maglione and Diblasi (2004), specifically designed for Gaussian random processes. A more general option is analyzed in Crujeiras et al. (2010), based on constructing tests for the spectral density of spatial processes observed on a regular grid. However, an alternative methodology could be applied, one that mimics the goodness of fit tests suggested for curve estimation with independent data, such as those given in Fan (1994) or Härdle and Mammen (1993) for the density or the regression settings. With this aim, the convergence in distribution of the -deviations of the Nadaraya–Watson estimators of the variogram must be established. In addition, the approximation of the model parameters and the critical points should be addressed, together with the selection of the bandwidth in the variogram estimators and the threshold involved in each of the statistics considered. Then, the adequacy of a parametric variogram model could be tested through the kernel-type estimation, and this procedure would be valid for intrinsically stationary random processes under a stochastic sampling design. In the current work, we will focus on the estimation of the model parameters and the resulting critical points, and we will provide some ideas for the choice of the other elements, namely the thresholds and the bandwidth parameters.
For specifying the parameters of the variogram model, different alternatives have been proposed, based on any of the following criteria: maximum likelihood, minimum norm quadratic, minimum variance or least squares; see Cressie (1993), Müller (1999), Stein (1999) and references therein. We suggest proceeding through the least squares approaches, since they require the fewest distributional assumptions about the random process and their consistency follows from the results in Lahiri et al. (2002). To derive our results, we will deal with the general variogram, corresponding to the anisotropic setting, as well as with the specific case of isotropy. Additional tests provided in the literature, such as the one described in Maity and Sherman (2012), allow us to check whether or not the isotropic condition is satisfied, and thus to decide on this issue before selecting the appropriate goodness of fit test to be applied in each case.
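The least squares idea can be illustrated with a small sketch. It assumes an isotropic exponential model without nugget, $\gamma_\theta(t) = c\,(1 - e^{-t/a})$, chosen here only as an example; the function names, the grid over the range parameter and the weights are hypothetical and not taken from the paper. For a fixed range `a` the sill `c` enters linearly, so it has a closed-form weighted least squares solution, and the range is then selected over the grid.

```python
import math


def exponential_model(t, sill, range_par):
    """Exponential semivariogram without nugget (illustrative model choice)."""
    return sill * (1.0 - math.exp(-t / range_par))


def fit_exponential_wls(lags, gamma_hat, weights, range_grid):
    """Weighted least squares fit of the exponential model.

    Minimizes sum_k w_k * (gamma_hat_k - sill * f_a(t_k))^2, with
    f_a(t) = 1 - exp(-t / a). The sill is profiled out in closed form
    for each candidate range a in `range_grid`.
    Returns (sill, range, loss) for the best candidate, or None.
    """
    best = None
    for a in range_grid:
        f = [1.0 - math.exp(-t / a) for t in lags]
        denom = sum(w * fi * fi for w, fi in zip(weights, f))
        if denom == 0.0:
            continue
        sill = sum(w * g * fi for w, g, fi in zip(weights, gamma_hat, f)) / denom
        loss = sum(w * (g - sill * fi) ** 2
                   for w, g, fi in zip(weights, gamma_hat, f))
        if best is None or loss < best[2]:
            best = (sill, a, loss)
    return best
```

Setting all weights to 1 gives ordinary least squares; other weights (for instance, proportional to the number of pairs contributing to each lag) give a weighted variant. The sketch does not constrain the sill to be positive, which a practical implementation would need to check.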
Regarding the critical points, they can be approximated from the normal limit distribution that could be established for the functionals considered, although this procedure is not recommended in general due to its slow rate of convergence, which is particularly problematic when the sample size is small. A second strategy might consist of deriving the aforementioned values from the resulting asymptotic functionals; however, their dependence on unknown terms would require the approximation of additional parameters and, consequently, could increase the errors of the final estimates. Therefore, we propose solving this problem by appealing to resampling techniques. A discussion of the accuracy of the bootstrap approximations for independent random variables can be found in Bose and Babu (1991). With spatial data, the dependence structure of the stochastic process calls for a bootstrap approach specifically designed for this setting. When the underlying distribution of the random process is assumed to be known, the parametric bootstrap methodology can be easily extended to the spatial setting by estimating the unknown parameters of the distribution model from the observed data. In addition, nonparametric resampling approaches could be used, such as that introduced in García-Soidán et al. (2014).
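The parametric bootstrap described above can be sketched as follows. The sketch makes strong assumptions that are not part of the source: a Gaussian, second-order stationary process with exponential covariance $C(t) = c\,e^{-t/a}$, distinct sampling locations, and a user-supplied `statistic(coords, values)` callable. All function and parameter names are hypothetical.

```python
import math
import random


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def simulate_gaussian_field(coords, sill, range_par, mean, rand):
    """Draw one Gaussian sample at `coords` with covariance sill * exp(-dist / range_par)."""
    n = len(coords)
    cov = [[sill * math.exp(-math.dist(coords[i], coords[j]) / range_par)
            for j in range(n)] for i in range(n)]
    lower = cholesky(cov)  # assumes distinct locations, so cov is positive definite
    noise = [rand.gauss(0.0, 1.0) for _ in range(n)]
    return [mean + sum(lower[i][k] * noise[k] for k in range(i + 1))
            for i in range(n)]


def bootstrap_critical_value(coords, statistic, sill, range_par, mean,
                             alpha, n_boot, seed=0):
    """Empirical (1 - alpha) quantile of the statistic over bootstrap replicates.

    Each replicate is simulated from the model with the estimated parameters
    (sill, range_par, mean) and the statistic is recomputed on it.
    """
    rand = random.Random(seed)
    stats = sorted(
        statistic(coords, simulate_gaussian_field(coords, sill, range_par, mean, rand))
        for _ in range(n_boot)
    )
    index = max(0, min(n_boot - 1, math.ceil((1.0 - alpha) * n_boot) - 1))
    return stats[index]
```

In a goodness-of-fit setting, `statistic` would typically re-estimate the model parameters on each simulated sample before measuring the deviation, so that the bootstrap replicates mimic the full testing procedure.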
This paper is organized as follows. Section 2 introduces the main hypotheses that will be imposed throughout this work. The properties of the -deviations of the kernel-type variograms are studied in Section 3, where we also address the estimation of the model parameters. Section 4 presents some applications of these properties, focused on checking the validity of a parametric variogram. To illustrate their performance in practice, numerical studies with simulated and real data have been carried out, whose results are described in Sections 5 and 6, respectively. The main conclusions are summarized in Section 7.
Main hypothesis
Suppose that $n$ data, $Z(s_1), \ldots, Z(s_n)$, have been collected at the respective spatial locations $s_1, \ldots, s_n$. The Nadaraya–Watson semivariogram is a kernel-type estimator obtained as a weighted average of the squared differences $(Z(s_i) - Z(s_j))^2$, given by expression (2), where $K$ represents a $d$-variate symmetric density and $h$ is the bandwidth parameter.
Estimator (2) can be adapted for its specific use in the isotropic setting, leading us
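To make the kernel-weighted average concrete, the following sketch computes a Nadaraya–Watson-type semivariogram in the isotropic setting. It assumes the usual ratio form, with kernel weights evaluated at the scaled difference between the target lag and each pairwise distance; the exact expressions of the paper are not reproduced here, and the Gaussian kernel and all names are illustrative choices.

```python
import math


def gaussian_kernel(u):
    """Standard Gaussian density, used here as a univariate symmetric kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nw_semivariogram(coords, values, t, h, kernel=gaussian_kernel):
    """Kernel-weighted average of squared differences at distance lag t.

    Each pair (i, j), i != j, receives weight K((t - ||s_i - s_j||) / h);
    the estimate is the weighted mean of (Z(s_i) - Z(s_j))^2, divided by 2.
    """
    num = 0.0
    den = 0.0
    n = len(values)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            w = kernel((t - math.dist(coords[i], coords[j])) / h)
            num += w * (values[i] - values[j]) ** 2
            den += w
    if den == 0.0:
        raise ValueError("all kernel weights are zero; try a larger bandwidth h")
    return num / (2.0 * den)
```

A larger bandwidth `h` smooths over more pairs and reduces variance at the cost of bias, which is why the bandwidth selection is treated as a separate issue in the text.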
Properties of the -deviations of the kernel variograms
The kernel-type estimation provides consistent estimators for the dependence structure of a stationary random process, as proved in Hall and Patil (1994), by requiring appropriate conditions. With similar arguments, the convergence rates of (2), (3) have been derived in García-Soidán (2007), under hypotheses H1–H7 for equal to 1 and 2, respectively, where even the asymptotic normality of the Nadaraya–Watson semivariograms has been established. The resulting variance of the kernel estimators
Applications
Next, some potential applications of the results provided in Section 3 are outlined, focused on checking the adequacy of a selected variogram. With this aim, the critical points of the resulting tests must be approximated, as described below.
First, as an immediate consequence of Theorems 3.1 and 3.2, the following contrast can be performed: for a fixed parameter , at an approximate level . This issue demands substituting for in , defined in
Numerical studies with simulated data
In this section, we describe the results of the simulation studies carried out to illustrate the behavior of our proposals, when applied to check the adequacy of a parametric variogram. With this aim, the statistic given below will be considered: under different scenarios.
Bear in mind that represents the analogue of , obtained by omitting (see Remark 4.1), as well as by considering these terms: a generic weight function , some estimator of the
Assessment of the variogram model in practice
To exemplify the application of our proposals in practice, we have considered the Meuse data set, provided with the R package sp, which is described in Bivand et al. (2008). It contains measurements of heavy metal pollutants in the topsoil of a flood plain along the Meuse river, near the village of Stein (Netherlands). Fig. 3 displays a map of the region, taken from Hengl (2009), where 155 spatial locations were considered.
We focused our attention on the concentrations of
Conclusions
This research is focused on deriving the asymptotic distribution of the deviations of the Nadaraya–Watson variograms, for both the anisotropic and the isotropic settings. These results can be applied to construct goodness of fit tests for the dependence structure of an intrinsic random process, although they require the specification of the model parameters. The latter issue has been addressed through the ordinary and the weighted least squares, because we have proved that the statistics
Acknowledgments
The first author’s work has been partially supported by the Spanish National Research and Development Program project [TEC2015-65353-R], by the European Regional Development Fund (ERDF), and by the Galician Regional Government under project GRC 2015/018 and under agreement for funding AtlantTIC (Atlantic Research Center for Information and Communication Technologies).
References (31)
- et al. The asymptotic distribution of REML estimators. J. Multivariate Anal. (1993)
- et al. Local linear regression estimation of the variogram. Statist. Probab. Lett. (2003)
- et al. On asymptotic distribution and asymptotic efficiency of least squares estimators of spatial variogram parameters. J. Statist. Plann. Inference (2002)
- et al. Breaking the curse of dimensionality in nonparametric testing. J. Econometrics (2008)
- et al. Testing for spatial isotropy under general designs. J. Statist. Plann. Inference (2012)
- Least-squares fitting from the variogram cloud. Statist. Probab. Lett. (1999)
- et al. Variogram fitting with a general class of conditionally nonnegative definite functions. Comput. Statist. Data Anal. (1991)
- et al. Cross-validatory bandwidth selections for regression estimation based on dependent data. J. Statist. Plann. Inference (1998)
- et al. Asymptotic properties of discrete Fourier transforms for spatial data. Sankhyā Ser. A (2009)
- et al. Applied Spatial Data Analysis with R (2008)
- Accuracy of the bootstrap approximation. Probab. Theory Related Fields
- Fitting variogram models by weighted least squares. J. Int. Assoc. Math. Geol.
- Statistics for Spatial Data
- Goodness-of-fit tests for the spatial spectral density. Stoch. Environ. Res. Risk Assess.
- On the estimation of parameters of variograms of spatial stationary isotropic random processes, Research Report 2