Estimation and prediction of Gaussian processes using generalized Cauchy covariance model under fixed domain asymptotics

We study estimation and prediction of Gaussian processes with covariance model belonging to the generalized Cauchy (GC) family, under fixed domain asymptotics. Gaussian processes with this kind of covariance function provide separate characterization of fractal dimension and long range dependence, an appealing feature in many physical, biological or geological systems. The results of the paper are classified into three parts. In the first part, we characterize the equivalence of two Gaussian measures with GC covariance functions. Then we provide sufficient conditions for the equivalence of two Gaussian measures with Matérn (MT) and GC covariance functions and of two Gaussian measures with Generalized Wendland (GW) and GC covariance functions. In the second part, we establish strong consistency and asymptotic distribution of the maximum likelihood estimator of the microergodic parameter associated with the GC covariance model, under fixed domain asymptotics. The last part focuses on optimal prediction with the GC model: we give conditions for asymptotically efficient prediction and asymptotically correct estimation of the mean square error using a misspecified GC, MT or GW model, under fixed domain asymptotics. Our findings are illustrated through two simulation studies. The first compares the finite-sample behavior of the maximum likelihood estimator of the microergodic parameter of the GC model with the given asymptotic distribution. The second compares the finite-sample behavior of the prediction and its associated mean square error when the true model is GC and the prediction is performed using the true model and a misspecified GW model.


Introduction
Two fundamental steps in geostatistical analysis are estimating the parameters of a Gaussian stochastic process and predicting the process at new locations. In both steps, the covariance function plays a central role. For instance, mean square error optimal prediction at an unobserved site depends on the knowledge of the covariance function. Since a covariance function must be positive definite, practical estimation generally requires the selection of a parametric family of covariances and the estimation of its parameters.
The maximum likelihood (ML) estimation method is generally considered the best method for estimating the parameters of covariance models. Nevertheless, the study of the asymptotic properties of ML estimation is complicated by the fact that more than one asymptotic framework can be considered when observing a single realization (Zhang and Zimmerman, 2005). The increasing domain asymptotic framework corresponds to the case where the sampling domain increases with the number of observed data and where the distance between any two sampling locations is bounded away from 0. The fixed domain asymptotic framework, sometimes called infill asymptotics (Cressie, 1993), corresponds to the case where more and more data are observed in some fixed bounded sampling domain.
General results for the asymptotic properties of the ML estimator, under the increasing domain asymptotic framework and some mild regularity conditions, are given in Mardia and Marshall (1984) and Bachoc (2014). Specifically, they show that ML estimates are consistent and asymptotically Gaussian, with asymptotic covariance matrix equal to the inverse of the Fisher information matrix.
Under fixed domain asymptotics, no general results are available for the asymptotic properties of ML estimation. Yet, some results have been obtained when assuming the covariance belongs to the MT (Matérn, 1960) or GW (Gneiting, 2002) models. Both families allow for a continuous parameterization of the smoothness of the underlying Gaussian process, the GW family being additionally compactly supported. Specifically, when the smoothness parameter is known and fixed, not all parameters can be estimated consistently when d = 1, 2, 3, with d the dimension of the Euclidean space. Instead, the ratio of variance and scale (to the power of a function of the smoothness parameter), sometimes called the microergodic parameter, is consistently estimable. This follows from results given in Zhang (2004) for the MT model and Bevilacqua et al. (2017) for the GW model.
Asymptotic results for ML estimation of the microergodic parameter of the MT model can be found in Zhang (2004), Du et al. (2009) and Wang and Loh (2011) when the scale parameter is assumed known and fixed. Kaufman and Shaby (2013) give strong consistency and asymptotic distribution of the ML estimator of the microergodic parameter when estimating jointly the scale and the variance parameters. By means of a simulation study, they show that the asymptotic approximation is considerably improved in this case, even for large sample sizes. Similar results for the microergodic parameter of the GW model can be found in Bevilacqua et al. (2017).
In terms of prediction, under fixed domain asymptotics, Stein (1988, 1990) provides conditions under which optimal predictions under a misspecified covariance function are asymptotically efficient, and mean square errors converge almost surely to their targets. Stein's conditions translate into the fact that the true and the misspecified covariances must be compatible, that is, the induced Gaussian measures are equivalent (Skorokhod and Yadrenko, 1973; Ibragimov and Rozanov, 1978). A weaker condition, based on the ratio of spectral densities, is given in Stein (1993).
In this paper we study ML estimation and prediction of Gaussian processes, under fixed domain asymptotics, using the GC covariance model. The GC family of covariance models was proposed in Gneiting and Schlather (2004) and studied in depth in Lim and Teo (2009). It is particularly attractive because Gaussian processes with such a covariance function allow for any combination of fractal dimension and Hurst coefficient, an appealing feature in many physical, biological or geological systems (see Gneiting et al. (2012), Gneiting and Schlather (2004) and the references therein).
In particular, we offer the following results. First, we characterize the equivalence of two Gaussian measures with covariance functions belonging to the GC family and sharing the same smoothness parameter. A consequence of this result is that, as in the MT and GW covariance models, when the smoothness parameter is known and fixed, not all parameters can be estimated consistently under fixed domain asymptotics. Then we give sufficient conditions for the equivalence of two Gaussian measures where the state of truth is represented by a member of the MT or GW family and the other Gaussian measure has a GC covariance model.
We then assess the asymptotic properties of the ML estimator of the microergodic parameter associated with the GC family. Specifically, for a fixed smoothness parameter, we establish strong consistency and asymptotic distribution of the ML estimator of the microergodic parameter, assuming the scale parameter fixed and known. Then, we generalize these results to the case where the variance and the scale parameter are jointly estimated by ML.
Finally, using results in Stein (1988) and Stein (1993), we study the implications of our results on prediction, under fixed domain asymptotics. One remarkable implication is that when the true covariance belongs to the GC family, asymptotically efficient prediction and asymptotically correct estimation of the mean square error can be achieved using a compatible compactly supported GW covariance model.
The remainder of the paper is organized as follows. In Section 2 we review some results about the MT, GW and GC covariance models. In Section 3 we first characterize the equivalence of Gaussian measures under the GC covariance model. Then we give sufficient conditions for the equivalence of two Gaussian measures with MT and GC covariance models and of two Gaussian measures with GW and GC covariance models. In Section 4 we establish strong consistency and asymptotic distribution of the ML estimator of the microergodic parameter of the GC model, under fixed domain asymptotics. Section 5 discusses the consequences of our results in terms of prediction, under fixed domain asymptotics. Section 6 provides two simulation studies. The first shows how well the given asymptotic distribution of the microergodic parameter applies to finite-sample cases, when estimating a GC covariance model with ML under fixed domain asymptotics. The second compares the finite-sample behavior of the prediction when using two compatible GC and GW models, when the true model is GC. The final section discusses the consequences of our results and open problems for future research.

Matérn, Generalized Wendland and Generalized Cauchy covariance models
This section describes the main features of the three covariance models involved in the paper.
We denote by $\{Z(s), s \in D\}$ a zero mean Gaussian stochastic process on a bounded set $D$ of $\mathbb{R}^d$, with stationary covariance function $C : \mathbb{R}^d \to \mathbb{R}$. We consider the family $\Phi_d$ of continuous mappings $\phi : [0, \infty) \to \mathbb{R}$ with $\phi(0) > 0$, such that
$$\mathrm{cov}\left(Z(s), Z(s')\right) = C(s' - s) = \phi(\|s' - s\|),$$
with $s, s' \in D$, and $\|\cdot\|$ denoting the Euclidean norm. Gaussian processes with such covariance functions are called weakly stationary and isotropic. Schoenberg (1938) characterized the family $\Phi_d$ as being scale mixtures of the characteristic functions of random vectors uniformly distributed on the spherical shell of $\mathbb{R}^d$, with any probability measure $F$:
$$\phi(r) = \int_0^{\infty} \Omega_d(r\xi)\, F(d\xi), \quad r \ge 0,$$
with $\Omega_d(r) = r^{-(d-2)/2} J_{(d-2)/2}(r)$, where $J_\nu$ is the Bessel function of the first kind of order $\nu$. The family $\Phi_d$ is nested, with the inclusion relation $\Phi_1 \supset \Phi_2 \supset \ldots \supset \Phi_\infty$ being strict, and where $\Phi_\infty := \bigcap_{d \ge 1} \Phi_d$ is the family of mappings $\phi$ whose radial version is positive definite on any $d$-dimensional Euclidean space.
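The MT function, defined as:
$$M_{\nu,\alpha,\sigma^2}(r) = \sigma^2 \frac{2^{1-\nu}}{\Gamma(\nu)} \left(\frac{r}{\alpha}\right)^{\nu} K_\nu\!\left(\frac{r}{\alpha}\right), \quad r \ge 0,$$
is a member of the family $\Phi_\infty$ for any positive values of $\alpha$ and $\nu$. Here, $K_\nu$ is a modified Bessel function of the second kind of order $\nu$, $\sigma^2$ is the variance and $\alpha$ a positive scaling parameter.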
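We also define $\Phi_d^b$ as the family that consists of members of $\Phi_d$ being additionally compactly supported on a given interval $[0, b]$, $b > 0$. Clearly, their radial versions are compactly supported over balls of $\mathbb{R}^d$ with radius $b$. The GW correlation function is defined, for $\kappa > 0$, as (Gneiting, 2002):
$$\varphi_{\mu,\kappa,\beta,\sigma^2}(r) = \begin{cases} \dfrac{\sigma^2}{B(2\kappa, \mu + 1)} \displaystyle\int_{r/\beta}^{1} u \left(u^2 - (r/\beta)^2\right)^{\kappa - 1} (1 - u)^{\mu}\, du, & 0 \le r < \beta, \\ 0, & r \ge \beta, \end{cases} \qquad (2.1)$$
where $B$ denotes the beta function, $\sigma^2$ is the variance and $\beta > 0$ is the compact support.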
Equivalent representations of (2.1) in terms of Gauss hypergeometric function or Legendre polynomials are given in Hubbert (2012). Closed form solutions of integral (2.1) can be obtained when κ = k with k ∈ N, the so called original Wendland functions (Wendland, 1995), and, using some results in Schaback (2011), when κ = k + 0.5, the so called missing Wendland functions.
The parameters $\nu > 0$ and $\kappa \ge 0$ are crucial for the differentiability at the origin and, as a consequence, for the degree of differentiability of the associated sample paths in the MT and GW models. In particular, for a positive integer $k$, the sample paths of a Gaussian process are $k$ times differentiable if and only if $\nu > k$ in the MT case and if and only if $\kappa > k - 1/2$ in the GW case.
The smoothness of a Gaussian process can also be described via the Hausdorff or fractal dimension of a sample path. The fractal dimension $D \in [d, d+1)$ is a measure of roughness for non-differentiable Gaussian processes, with higher values indicating rougher surfaces. For a given covariance function $\phi \in \Phi_d$, if $1 - \phi(r) \sim r^{\chi}$ as $r \to 0$ for some $\chi \in (0, 2]$, then the sample paths of the associated random process have fractal dimension $D = d + 1 - \chi/2$. Here $\chi$ is the so-called fractal index that governs the roughness of sample paths of a stochastic process. In the case of the MT model, $\chi = 2\nu$, so $D = d + 1 - \nu$ if $0 < \nu < 1$ and $D = d$ otherwise (Adler, 1981; Gneiting et al., 2012). Thus the MT model permits the full range of allowable values for the fractal dimension. In the case of the GW family, $\chi = 2\kappa + 1$, so that $D = d + 0.5 - \kappa$ if $0 \le \kappa < 0.5$ and $D = d$ otherwise. Thus the GW model does not cover the full range of allowable values for the fractal dimension.
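The mapping from fractal index to fractal dimension described above can be sketched in a few lines. This is only an illustration, not code from the paper: the function names are hypothetical, and capping the fractal index at 2 is our way of encoding the "D = d otherwise" case for differentiable sample paths.

```python
def fractal_dimension(d, chi):
    """Fractal dimension D = d + 1 - chi / 2 for a fractal index chi in (0, 2].

    Values of chi above 2 are capped at 2, which gives D = d.
    """
    if d < 1 or chi <= 0:
        raise ValueError("need d >= 1 and chi > 0")
    return d + 1 - min(chi, 2.0) / 2.0


def matern_fractal_dimension(d, nu):
    # Matern model: fractal index chi = 2 * nu
    return fractal_dimension(d, 2.0 * nu)


def gw_fractal_dimension(d, kappa):
    # Generalized Wendland model: fractal index chi = 2 * kappa + 1
    return fractal_dimension(d, 2.0 * kappa + 1.0)


# The roughest GW paths (kappa = 0) only reach D = d + 0.5,
# while the MT model approaches D = d + 1 as nu decreases to 0.
for nu in (0.05, 0.5, 1.5):
    print("MT", nu, matern_fractal_dimension(2, nu))
for kappa in (0.0, 0.25, 1.0):
    print("GW", kappa, gw_fractal_dimension(2, kappa))
```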
Long-memory dependence can be defined through the asymptotic behavior of the covariance function at infinity. Specifically, for a given covariance function $\phi \in \Phi_d$, if the power law $\phi(r) \sim r^{-\varepsilon}$ as $r \to \infty$ holds for some $\varepsilon \in (0, 1)$, the stochastic process is said to have long memory with Hurst coefficient $H = 1 - \varepsilon/2$. The MT and GW covariance models do not possess this feature.
A celebrated family of members of $\Phi_\infty$ is the GC class (Gneiting and Schlather, 2004), defined as:
$$C_{\delta,\lambda,\gamma,\sigma^2}(r) = \sigma^2 \left(1 + \left(\frac{r}{\gamma}\right)^{\delta}\right)^{-\lambda/\delta}, \quad r \ge 0,$$
where the conditions $\delta \in (0, 2]$ and $\lambda > 0$, $\gamma > 0$, $\sigma^2 > 0$ are necessary and sufficient for $C_{\delta,\lambda,\gamma,\sigma^2} \in \Phi_\infty$. The parameter $\delta$ is crucial for the differentiability at the origin and, as a consequence, for the degree of differentiability of the associated sample paths. Specifically, for $\delta = 2$ they are infinitely differentiable, and they are not differentiable for $\delta \in (0, 2)$.
The GC family represents a breaking point with respect to earlier literature based on the assumption of self similarity, since it decouples the fractal dimension and the Hurst effect.
Specifically, the sample paths of the associated stochastic process have fractal dimension $D = d + 1 - \delta/2$ for $\delta \in (0, 2)$, and if $\lambda \in (0, 1)$ the process has long memory with Hurst coefficient $H = 1 - \lambda/2$. Thus, $D$ and $H$ may vary independently of each other (Gneiting and Schlather, 2004; Lim and Teo, 2009).
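A small sketch may help to see how the GC model decouples the two quantities. It assumes the usual parameterization of Gneiting and Schlather (2004), $C(r) = \sigma^2 (1 + (r/\gamma)^{\delta})^{-\lambda/\delta}$, so that the covariance decays like $r^{-\lambda}$ and $H = 1 - \lambda/2$ for $\lambda \in (0, 1)$. The function names are hypothetical and chosen for this illustration only.

```python
def gc_covariance(r, delta, lam, gamma, sigma2=1.0):
    """Generalized Cauchy covariance at distance r >= 0 (assumed parameterization)."""
    if not (0.0 < delta <= 2.0 and lam > 0.0 and gamma > 0.0 and sigma2 > 0.0):
        raise ValueError("need 0 < delta <= 2 and lam, gamma, sigma2 > 0")
    return sigma2 * (1.0 + (r / gamma) ** delta) ** (-lam / delta)


def gc_fractal_dimension(d, delta):
    # D = d + 1 - delta / 2; delta = 2 gives D = d (smooth sample paths)
    return d + 1.0 - delta / 2.0


def gc_hurst(lam):
    # Long memory is only present for lam in (0, 1); then H = 1 - lam / 2
    if not 0.0 < lam < 1.0:
        return None
    return 1.0 - lam / 2.0


# Same delta, different lam: the fractal dimension is unchanged
# while the Hurst coefficient varies.
for lam in (0.2, 0.8):
    print(gc_fractal_dimension(1, 1.2), gc_hurst(lam))
```

Here $\delta$ alone drives the fractal dimension and $\lambda$ alone drives the Hurst coefficient, which is the decoupling discussed in the text.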
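Fourier transforms of radial versions of members of $\Phi_d$, for a given $d$, have a simple expression, as reported in Stein (1999) and Yaglom (1987). For a member $\phi$ of the family $\Phi_d$, we define its isotropic spectral density as
$$\widehat{\phi}(z) = \frac{z^{1-d/2}}{(2\pi)^{d/2}} \int_0^{\infty} u^{d/2} J_{d/2-1}(uz)\, \phi(u)\, du, \quad z \ge 0, \qquad (2.3)$$
and throughout the paper we use the notation $\widehat{C}_{\delta,\lambda,\gamma,\sigma^2}$, $\widehat{M}_{\nu,\alpha,\sigma^2}$ and $\widehat{\varphi}_{\mu,\kappa,\beta,\sigma^2}$ for the spectral densities associated with $C_{\delta,\lambda,\gamma,\sigma^2}$, $M_{\nu,\alpha,\sigma^2}$ and $\varphi_{\mu,\kappa,\beta,\sigma^2}$. A well-known result about the spectral density of the Matérn model is the following:
$$\widehat{M}_{\nu,\alpha,\sigma^2}(z) = \frac{\Gamma(\nu + d/2)}{\pi^{d/2}\, \Gamma(\nu)} \frac{\sigma^2 \alpha^{d}}{(1 + \alpha^2 z^2)^{\nu + d/2}}, \quad z \ge 0.$$
Define the function ${}_1F_2$ as:
$${}_1F_2(a; b, c; z) = \sum_{k=0}^{\infty} \frac{(a)_k\, z^k}{(b)_k (c)_k\, k!}, \quad z \in \mathbb{R},$$
which is a special case of the generalized hypergeometric functions ${}_qF_p$ (Abramowitz and Stegun, 1970), with $(q)_k = \Gamma(q + k)/\Gamma(q)$ for $k \in \mathbb{N} \cup \{0\}$ being the Pochhammer symbol. The spectral density of $\varphi_{\mu,\kappa,\beta,\sigma^2}$ for $\kappa \ge 0$ is given in Bevilacqua et al. (2017).
For two given functions $g_1(x)$ and $g_2(x)$, by $g_1(x) \asymp g_2(x)$ we mean that there exist two constants $c$ and $C$ such that $0 < c < C < \infty$ and $c\, g_2(x) \le g_1(x) \le C\, g_2(x)$ for each $x$.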
The next result follows from Lim and Teo (2009) and describes the spectral density of the GC covariance function and its asymptotic behaviour.

Equivalence of Gaussian measures with Generalized Cauchy, Matérn and Generalized Wendland covariance models
Equivalence and orthogonality of probability measures are useful tools when assessing the asymptotic properties of both prediction and estimation for stochastic processes. Denote by $P_i$, $i = 0, 1$, two probability measures defined on the same measurable space $\{\Omega, \mathcal{F}\}$. $P_0$ and $P_1$ are called equivalent (denoted $P_0 \equiv P_1$) if $P_1(A) = 1$ for any $A \in \mathcal{F}$ implies $P_0(A) = 1$ and vice versa. On the other hand, $P_0$ and $P_1$ are orthogonal (denoted $P_0 \perp P_1$) if there exists an event $A$ such that $P_1(A) = 1$ but $P_0(A) = 0$. Gaussian measures are completely characterized by their mean and covariance function.
We write P (ρ) for a Gaussian measure with zero mean and covariance function ρ. It is well known that two Gaussian measures are either equivalent or orthogonal on the paths of {Z(s), s ∈ D} (Ibragimov and Rozanov, 1978).
Let $P(\rho_i)$, $i = 0, 1$, be two zero mean Gaussian measures with isotropic covariance function $\rho_i$ and associated spectral density $\widehat{\rho}_i$, $i = 0, 1$, as defined through (2.3). Using results in Skorokhod and Yadrenko (1973) and Ibragimov and Rozanov (1978), Stein (2004) has shown that, if for some $a > 0$, $\widehat{\rho}_0(z) z^{a}$ is bounded away from 0 and $\infty$ as $z \to \infty$, and for some finite and positive $c$,
$$\int_c^{\infty} z^{d-1} \left\{ \frac{\widehat{\rho}_1(z) - \widehat{\rho}_0(z)}{\widehat{\rho}_0(z)} \right\}^2 dz < \infty, \qquad (3.1)$$
then for any bounded subset $D \subset \mathbb{R}^d$, $P(\rho_0) \equiv P(\rho_1)$ on the paths of $Z(s)$, $s \in D$.
For the remainder of the paper, we denote by $P(M_{\nu,\alpha,\sigma^2})$, $P(\varphi_{\mu,\kappa,\beta,\sigma^2})$ and $P(C_{\delta,\lambda,\gamma,\sigma^2})$ a zero mean Gaussian measure induced by a MT, GW and GC covariance function, respectively. The following Theorem is due to Zhang (2004). It characterizes the compatibility of two MT covariance models sharing a common smoothness parameter $\nu$.
The following Theorem is a generalization of Theorem 4 in Bevilacqua et al. (2017) and it characterizes the compatibility of two GW covariance models sharing a common smoothness parameter $\kappa$. We omit the proof since the result can be obtained using the same arguments.
Theorem 3. For a given $\kappa \ge 0$, let $P(\varphi_{\mu_i,\kappa,\beta_i,\sigma_i^2})$, $i = 0, 1$, be two zero mean Gaussian measures ...
The first relevant result of this paper concerns the characterization of the compatibility of two GC functions sharing a common smoothness parameter.
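Proof. Let us start with the sufficient part of the assertion. From Theorem 1, Point 3, we know that $z^{\delta + d}\, \widehat{C}_{\delta,\lambda_0,\gamma_0,\sigma_0^2}(z)$ is bounded away from 0 and $\infty$ as $z \to \infty$. In order to prove the sufficient part, we need to find conditions such that, for some positive and finite $c$,
$$\int_c^{\infty} z^{d-1} \left\{ \frac{\widehat{C}_{\delta,\lambda_1,\gamma_1,\sigma_1^2}(z) - \widehat{C}_{\delta,\lambda_0,\gamma_0,\sigma_0^2}(z)}{\widehat{C}_{\delta,\lambda_0,\gamma_0,\sigma_0^2}(z)} \right\}^2 dz < \infty.$$
We proceed by direct construction and, using Theorem 1, Point 2, we obtain the behaviour of $\widehat{C}_{\delta,\lambda_i,\gamma_i,\sigma_i^2}(z)$ as $z \to \infty$, with $i = 0, 1$.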
Moreover, since $\delta < 2$, the condition $\delta > d/2$ can be satisfied only for $d = 1, 2, 3$. The sufficient part of our claim is thus proved. The necessary part follows the arguments in the proof of Zhang (2004).
An immediate consequence of Theorem 4 is that, for a fixed $\delta \in (d/2, 2)$, the parameters $\lambda$, $\gamma$ and $\sigma^2$ cannot be estimated consistently. Nevertheless, the microergodic parameter $\sigma^2 \lambda / \gamma^{\delta}$ is consistently estimable. In Section 4, we establish the asymptotic properties of ML estimation associated with the microergodic parameter of the GC model.
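One way to build intuition for the microergodic parameter is through the behaviour of the covariance near the origin. Assuming the GC form $C(r) = \sigma^2 (1 + (r/\gamma)^{\delta})^{-\lambda/\delta}$, a first-order expansion gives $\sigma^2 - C(r) \approx (\sigma^2 \lambda/\gamma^{\delta})\, r^{\delta}/\delta$ as $r \to 0$, so two GC models with the same $\delta$ and the same microergodic parameter agree to first order at short distances. The sketch below checks this numerically. It is an illustration only, not the argument used in the proof; the parameter values and function names are hypothetical.

```python
import math


def gc_variogram(r, delta, lam, gamma, sigma2):
    """sigma2 - C(r) for the assumed GC form, computed stably for small r."""
    x = (r / gamma) ** delta
    # sigma2 * (1 - (1 + x) ** (-lam / delta)), written with log1p / expm1
    return -sigma2 * math.expm1(-(lam / delta) * math.log1p(x))


def microergodic(delta, lam, gamma, sigma2):
    return sigma2 * lam / gamma ** delta


def matching_variance(delta, lam0, gamma0, sigma2_0, lam1, gamma1):
    """Variance giving model 1 the same microergodic parameter as model 0."""
    return microergodic(delta, lam0, gamma0, sigma2_0) * gamma1 ** delta / lam1


delta = 1.2  # illustrative value, lies in (d/2, 2) for d = 1, 2
lam0, gamma0, sigma2_0 = 1.0, 0.1, 1.0
lam1, gamma1 = 5.0, 0.2
sigma2_1 = matching_variance(delta, lam0, gamma0, sigma2_0, lam1, gamma1)

for r in (1e-2, 1e-4, 1e-6):
    ratio = gc_variogram(r, delta, lam0, gamma0, sigma2_0) / gc_variogram(
        r, delta, lam1, gamma1, sigma2_1
    )
    print(r, ratio)  # the ratio approaches 1 as r decreases
```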
The second relevant result of this paper gives sufficient conditions for the compatibility of a GC and a MT covariance model.
Remark I: As expected, compatibility between GC and MT covariance models is achieved only for a subset of the parametric space of $\nu$ that leads to non-differentiable sample paths, and in particular for $d/4 < \nu < 1$, $d = 1, 2, 3$.
The following sufficient conditions, given in Bevilacqua et al. (2017), concern the compatibility of a MT and a GW covariance model.
Remark II: As expected, compatibility between GC and GW covariance models is achieved only for a subset of the parametric space of $\kappa$ that leads to non-differentiable sample paths, and in particular for $0 \le \kappa < 1/2$ when $d = 1, 2$ and for $1/4 \le \kappa < 1/2$ when $d = 3$.

Asymptotic properties of the ML estimation for the Generalized Cauchy model
We now focus on the microergodic parameter $\sigma^2 \lambda / \gamma^{\delta}$ associated with the GC family. The following results establish the asymptotic properties of its ML estimator. In particular, we shall show that the microergodic parameter can be estimated consistently, and then assess the asymptotic distribution of the ML estimator.
Let $D \subset \mathbb{R}^d$ be a bounded subset of $\mathbb{R}^d$ and let $S_n = \{s_1, \ldots, s_n \in D \subset \mathbb{R}^d\}$ denote any set of distinct locations. Let $Z_n = (Z(s_1), \ldots, Z(s_n))'$ be a finite realization of $Z(s)$, $s \in D$, a zero mean stationary Gaussian process with a given parametric covariance function $\sigma^2 \phi(\cdot\,; \tau)$, with $\sigma^2 > 0$, $\tau$ a parameter vector and $\phi$ a member of the family $\Phi_d$, with $\phi(0; \tau) = 1$.
In order to prove consistency and asymptotic Gaussianity of the ML estimator of the microergodic parameter, we first consider an estimator that maximizes the Gaussian log-likelihood (4.1) with respect to $\sigma^2$ for a fixed arbitrary scale parameter $\gamma > 0$, obtaining the following estimator:
$$\widehat{\sigma}_n^2(\gamma) = Z_n' R_n(\gamma)^{-1} Z_n / n.$$
Here $R_n(\gamma)$ is the correlation matrix coming from the GC family $C_{\delta,\lambda,\gamma,1}$. The following result offers some asymptotic properties of the ML estimator $\widehat{\sigma}_n^2(\gamma) \lambda / \gamma^{\delta}$ of the microergodic parameter, both in terms of consistency and asymptotic distribution. The proof is omitted since it follows the same steps as in Bevilacqua et al. (2017) and Wang and Loh (2011).
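The fixed-scale estimator can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes one-dimensional locations ($d = 1$), the GC correlation $(1 + (r/\gamma)^{\delta})^{-\lambda/\delta}$, and the closed form $\widehat{\sigma}_n^2(\gamma) = Z_n' R_n(\gamma)^{-1} Z_n / n$ obtained by maximizing the Gaussian log-likelihood over $\sigma^2$. All function names are hypothetical, and a plain Cholesky factorization is written out so that the block depends only on the standard library.

```python
import math


def gc_correlation(r, delta, lam, gamma):
    # Assumed GC correlation: (1 + (r / gamma) ** delta) ** (-lam / delta)
    return (1.0 + (r / gamma) ** delta) ** (-lam / delta)


def cholesky(a):
    """Lower-triangular L with L L' = a, for a symmetric positive definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def forward_solve(low, b):
    """Solve low w = b for a lower-triangular matrix low."""
    w = []
    for i, row in enumerate(low):
        w.append((b[i] - sum(row[k] * w[k] for k in range(i))) / row[i])
    return w


def sigma2_hat(z, locs, delta, lam, gamma):
    """Variance estimate for a fixed scale: z' R(gamma)^{-1} z / n."""
    corr = [[gc_correlation(abs(s - t), delta, lam, gamma) for t in locs] for s in locs]
    w = forward_solve(cholesky(corr), z)  # w'w equals z' R^{-1} z
    return sum(v * v for v in w) / len(z)


def microergodic_hat(z, locs, delta, lam, gamma):
    # Plug-in estimate of sigma^2 * lam / gamma ** delta for the fixed gamma
    return sigma2_hat(z, locs, delta, lam, gamma) * lam / gamma ** delta
```

For instance, `microergodic_hat(z, locs, 0.75, 0.75, gamma)` would return the plug-in estimate for data `z` observed at sorted or unsorted distinct locations `locs` in $[0, 1]$. For large $n$ a dense factorization like this one costs on the order of $n^3$ operations.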
Remark III: The results in Theorem 9 are restricted to the case $d = 1$ and $\delta = \lambda \in (1/2, 1]$. In order to generalize this result we need to find more general conditions in Lemma 1. Specifically, we need conditions on $\lambda$ and $\delta$ such that $\gamma^{\delta}\, \widehat{C}_{\delta,\lambda,\gamma,\sigma^2}(z)$ is a non-decreasing function of $\gamma$. Unfortunately the problem is not simple, and we are studying the solution in an ongoing paper. The same problem associated with the GW family has been solved in Porcu et al. (2017).
Theorem 10 is still valid interchanging the role of the correct model with the wrong model. For instance point 3 can be rewritten as follows.

Simulations and illustrations
The main goals of this section are twofold: on the one hand, we compare the finite sample behavior of the ML estimation of the microergodic parameter of the GC model with the asymptotic distributions given in Theorems 8 and 9. On the other hand, we compare the finite sample behavior of MSE prediction of a zero mean Gaussian process with GW covariance model, using a compatible GC covariance model (Theorem 11).
For the first goal we have considered 4000 points uniformly distributed over $[0, 1]$ and then we randomly select a sequence of $n = 500, 1000, 2000, 4000$ points. For each $n$ we simulate, using Cholesky decomposition, 500 realizations from a zero mean Gaussian process with GC model, and then we estimate the parameters with ML. For the GC covariance model $C_{\delta,\lambda,\gamma_0,\sigma_0^2}$ we fix $\sigma_0^2 = 1$ and, in view of Theorem 9, we fix $\delta = \lambda = 0.75$. Then we fix $\gamma_0$ such that the practical range of the GC model is 0.3, 0.6 and 0.9, that is, $\gamma_0 = 0.0059, 0.01183, 0.0178$ respectively. For a given correlation, by practical range $x$ we mean that the correlation is approximately lower than 0.05 when $r > x$.
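The link between practical range and scale parameter can be made explicit. Assuming the GC correlation $(1 + (r/\gamma)^{\delta})^{-\lambda/\delta}$ and setting it equal to 0.05 at $r = x$ gives $\gamma = x / (0.05^{-\delta/\lambda} - 1)^{1/\delta}$. The sketch below evaluates this expression; the function name is hypothetical.

```python
def gc_scale_from_practical_range(x, delta, lam, level=0.05):
    """Scale gamma such that the assumed GC correlation equals `level` at distance x."""
    # Solve (1 + (x / gamma) ** delta) ** (-lam / delta) = level for gamma
    return x / (level ** (-delta / lam) - 1.0) ** (1.0 / delta)


for x in (0.3, 0.6, 0.9):
    print(x, gc_scale_from_practical_range(x, 0.75, 0.75))
```

With $\delta = \lambda = 0.75$ the expression reduces to $\gamma = x / 19^{4/3}$, which agrees with the values of $\gamma_0$ reported above to the stated precision.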
For each simulation, we consider $\delta$ and $\lambda$ as known and fixed, and we estimate with ML the variance and scale parameters, obtaining $\widehat{\sigma}_i^2$ and $\widehat{\gamma}_i$, $i = 1, \ldots, 1000$. In order to estimate, we first maximize the profile log-likelihood (4.3) to get $\widehat{\gamma}_i$. Then, we obtain $\widehat{\sigma}_i^2 = z_i' R_n(\widehat{\gamma}_i)^{-1} z_i / n$, where $z_i$ is the data vector of simulation $i$. Optimization was carried out using the R (R Development Core Team, 2016) function optimize, where the parametric space was restricted to the interval $[\varepsilon, 10\gamma_0]$ and $\varepsilon$ is slightly larger than machine precision, about $10^{-15}$ here.
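The one-dimensional search over the scale parameter can be illustrated with a plain golden-section search. This is a simplified stand-in, not the routine used in the paper: R's optimize combines golden-section search with successive parabolic interpolation, whereas the sketch below implements only the golden-section part. The objective used in the demonstration is a toy unimodal function, not the profile log-likelihood (4.3), and all names are hypothetical.

```python
import math

INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0  # reciprocal of the golden ratio


def golden_section_max(f, lower, upper, tol=1e-8):
    """Maximize a unimodal function f on [lower, upper] by golden-section search."""
    a, b = lower, upper
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:
            # The maximum lies in [a, d]; the old c becomes the new d
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:
            # The maximum lies in [c, b]; the old d becomes the new c
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    return (a + b) / 2.0


# Toy objective with its maximum at gamma0, searched over [eps, 10 * gamma0]
gamma0 = 0.0059
eps = 1e-15


def toy_objective(gamma):
    return -(math.log(gamma) - math.log(gamma0)) ** 2


gamma_hat = golden_section_max(toy_objective, eps, 10.0 * gamma0)
print(gamma_hat)
```

In an actual study, `toy_objective` would be replaced by the profile log-likelihood evaluated at a candidate scale. Golden-section search only guarantees the global maximizer when the objective is unimodal on the interval.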
As expected, the best approximation is achieved overall when using the true scale parameter, i.e., $x = \gamma_0$. In the case of $x = \widehat{\gamma}_i$, the sample distribution converges to the asymptotic distribution given in Theorem 9 when increasing $n$, even if the convergence seems to be slow. Note that, for a fixed $n$, when increasing the practical range the convergence to the standard Gaussian distribution is faster. In particular, for $n = 4000$ and practical range equal to 0.9, the asymptotic distribution given in Theorem 9 is a satisfactory approximation of the sample distribution. When using scale parameters that are too small or too large with respect to the true scale parameter ($x = 0.75\gamma_0, 1.25\gamma_0$), the convergence to the asymptotic distribution given in Theorem 8 is very slow. These results are consistent with Kaufman and Shaby (2013) and Bevilacqua et al. (2017). When generating confidence intervals for the microergodic parameter, we strongly recommend jointly estimating the variance and scale parameters and using the asymptotic distribution given in Theorem 9.
For instance, if $\delta = 1.2$ and $\gamma_1$ is such that the practical range is equal to 0.3, then $\beta_1^* = 0.204$. Figure 1, top left part, compares the GW and GC covariance models in this case. The right part compares the GW and GC covariance models under the same setting but with $\delta = 1.8$.
Figure 1. Top part: GC and GW covariance models (one of the two shown as a dotted line). In both cases $\gamma_1$ is chosen such that the practical range is 0.3 and $\beta_1^*$ is computed using the equivalence condition. Bottom part: two realizations from two Gaussian random processes with covariances as shown in the top left part ($C_{1.2,5,\gamma_1,1}$ on the left and $\varphi_{2.1,0.1,\beta_1^*,1}$ on the right).
The realizations shown in Figure 1 are obtained using Cholesky decomposition and they share the same Gaussian simulation. It is apparent that the two realizations look very similar.
It is interesting to note that the speed of convergence is clearly affected by the magnitude of $\delta$. In particular, for $\delta = 1.8$ the convergence of both ratios is slower, especially for $U_1(x\beta_1^*)$, $x = 1, 0.5, 2$. For instance, when the practical range is equal to 0.3, $n = 1000$ is not sufficient to attain convergence for $\bar{U}_1(x\beta_1^*)$, $x = 1, 0.5, 2$.

Concluding Remarks
In this paper we studied estimation and prediction of Gaussian processes with covariance models belonging to the GC family, under fixed domain asymptotics. Specifically, we first characterize the equivalence of two Gaussian measures with GC models and then we establish strong consistency and asymptotic distribution of the ML estimator of the associated microergodic parameter. We then give sufficient conditions for the equivalence of two Gaussian measures with GW and GC models and of two Gaussian measures with MT and GC models, and we study the consequences of these results on prediction under fixed domain asymptotics.
One remarkable consequence of our results on optimal prediction is that the optimal mean square error prediction of a Gaussian process with a GC model can be achieved asymptotically using a GW model, under suitable conditions.
Then, under fixed domain asymptotics, a misspecified GW model can be used for optimal prediction when the true covariance model is GC or MT. GW is an appealing model from a computational point of view: covariance functions with compact support lead to sparse matrices, as in covariance tapering (Furrer et al., 2006; Kaufman et al., 2008), which is a very accessible and scalable approach. Well established and implemented algorithms for sparse matrices can then be used when estimating the covariance parameters and/or predicting at unknown locations (e.g., Furrer and Sain, 2010).
As outlined in Section 1, the parameter δ is crucial for the differentiability at the origin and, as a consequence, for the degree of differentiability of the associated sample paths.