Effective sample size for spatial regression models

Abstract: We propose a new definition of effective sample size. Although the recent works of Griffith (2005, 2008) and Vallejos and Osorio (2014) provide a theoretical framework to address the reduction of information in a spatial sample due to spatial autocorrelation, the asymptotic properties of the estimators have not been studied in those works or in earlier ones. In addition, the concept of effective sample size has been developed primarily for spatial regression processes with a constant mean. This paper introduces a new definition of effective sample size for general spatial regression models that is coherent with previous definitions. The asymptotic normality of the maximum likelihood estimator is obtained under an increasing domain framework. In particular, the conditions under which the limiting distribution holds are established for the Matérn covariance family. Illustrative examples accompany the discussion of the limiting results, including some cases where the asymptotic variance has a closed form. The asymptotic normality leads to an approximate hypothesis test that establishes whether there is redundant information in the sample. Simulation results support the theoretical findings and provide information about the behavior of the power of the suggested test. A real dataset in which a transect sampling scheme was used is analyzed to estimate the effective sample size when a spatial linear regression model is assumed.


Introduction
This paper introduces a new definition of effective sample size for spatial regression processes. By definition (Cressie, 1993, p. 16), the effective sample size (ESS), denoted in this paper by n*, is the number of equivalent independent observations associated with a spatial sample of size n. The correct computation of the ESS is important because of possibly duplicated information in the data, and it has many implications for the subsequent analysis and inference of spatial models (Clifford et al., 1989; Dutilleul, 1993; Dutilleul et al., 2008). A similar problem was addressed by Box (1954a,b) for approximating the distribution of a quadratic form; this approach was later used by Clifford et al. (1989) to assess the spatial association between two spatial processes. Directly related to the content of this paper is the work of Griffith (2005), who defined the effective sample size for georeferenced data for simple and multiple means and applied the methodology to soil samples in Syracuse, NY (Griffith, 2008). Alternatively, Faes et al. (2009) proposed another way of computing the ESS based on the Fisher information quantity, which is applicable to linear models with replicates. In a Bayesian context, model selection procedures often depend explicitly on the sample size of the experiment; extensions of the Bayesian information criterion (BIC) for non-iid vectors require a definition of effective sample size that also applies in such cases (Berger et al., 2014). Although results about the ESS are still dispersed across the literature, standard books on spatial statistics have mentioned the problem of how many uncorrelated samples provide the same precision as correlated observations (Cressie, 1993; Haining, 1990; Schabenberger and Gotway, 2005; Griffith and Paelinck, 2011; Dale and Fortin, 2009). Some applications involving the computation of the ESS can be found in Thiébaux and Zwiers (1984), de Gruijter and ter Braak (1990), and Cogley (1999).
Practical guidelines for ESS determination can be found in Lenth (2001).
Any given formula to quantify the reduction of information in a spatial sample should depend on the covariance matrix. Consequently, the estimation of the ESS and the posterior asymptotics rely on the estimation of the covariance matrix, a subject that has been extensively studied in the literature; however, in the context of spatial statistics, there are no formal treatments of the estimation and limiting properties of the ESS. Vallejos and Osorio (2014) proposed a definition of ESS and developed several examples to show the behavior of that definition. The restricted maximum likelihood (REML) estimation of the ESS was addressed as an application of the conditions stated in Cressie and Lahiri (1993) for the estimation of the variance components of a spatial linear process. If the random field {Y(s) : s ∈ D ⊂ R^d} has been observed on the set of n spatial sites D_n = {s_1, s_2, ..., s_n} ⊂ D, then a model of the form Y = μ1 + ε, where Y = (Y(s_1), ..., Y(s_n))', ε = (ε(s_1), ..., ε(s_n))' ∼ N(0, R(θ)), θ ∈ R^q, 1 = (1, ..., 1)', and μ ∈ R, was assumed to define the ESS as

n* = 1'R(θ)^{-1}1,    (1.1)

which is the Fisher information quantity with respect to μ.
Here, we consider a regression of a response Y on predictors x_j = (x_j(s_1), x_j(s_2), ..., x_j(s_n))', j = 1, ..., p, such that X = (x_1, x_2, ..., x_p), leading to the spatial linear model

Y = Xβ + ε,    (1.2)

where β is a parameter vector of size p and ε ∼ N(0, R(θ)). The information matrix of β in (1.2) is I(β) = X'R(θ)^{-1}X. To obtain a formula for the effective sample size of Y, several issues should be addressed in advance; for example, how to reduce the Fisher information matrix of β to a single number lying in the interval [1, n], which is a desirable property, among others established by Vallejos and Osorio (2014). One way to generalize (1.1) is to consider a weighted version of the ESS of the form (1.3), where the v_i are suitable weights to be determined. Combining Equations (1.1) and (1.3), we can build a definition of the ESS of Y through

n* = n · tr(X'R(θ)^{-1}X) / tr(X'X),    (1.4)

where tr(·) denotes the trace operator. This definition is consistent with that given in Vallejos and Osorio (2014) in the sense that for X = 1 we recover (1.1). However, for X = (1, x_2) and X̃ = (1, x_3), with x_3 = kx_2 and k ∈ R \ {0}, the effective sample sizes (1.4) for X and X̃, say n*_1 and n*_2, are affected by the scales of the variables, because n*_1 = n*_2 if and only if k^2 = 1. In addition, the simple intraclass and Toeplitz correlation structures, described respectively by R(ρ) = (1 − ρ)I + ρJ with J = 11', −1/(n−1) < ρ < 1, and R(ρ)_{ij} = ρ^{|i−j|} for i ≠ j and R(ρ)_{ii} = 1, with |ρ| < 1 (the correlation function of an AR(1) process), show a nondecreasing pattern as a function of ρ when x_2 is a random sequence in the interval [0, 1], as shown in Figure 1. The behavior of the effective sample size is similar for other choices of x_2. This is a considerable difference from the constant mean case, in which the patterns of both correlation structures decrease for increasing values of ρ ∈ [0, 1].
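The scale dependence of (1.4) is easy to verify numerically. The following sketch (with a hypothetical covariate and the intraclass correlation structure) evaluates (1.4) for the design X = (1, x_2) and for the rescaled design (1, kx_2):

```python
import numpy as np

def ess_weighted(X, R):
    """ESS of Eq. (1.4): n * tr(X' R^{-1} X) / tr(X' X)."""
    n = X.shape[0]
    num = np.trace(X.T @ np.linalg.solve(R, X))
    return n * num / np.trace(X.T @ X)

n, rho, k = 20, 0.5, 3.0
R = (1 - rho) * np.eye(n) + rho * np.ones((n, n))   # intraclass correlation

rng = np.random.default_rng(1)
x2 = rng.uniform(size=n)
X1 = np.column_stack([np.ones(n), x2])
X2 = np.column_stack([np.ones(n), k * x2])          # same covariate, rescaled by k

n1, n2 = ess_weighted(X1, R), ess_weighted(X2, R)
print(n1, n2)   # the two values differ: (1.4) depends on the scale of x2
```

With R = I the formula returns n exactly, but rescaling a single covariate by k ≠ ±1 changes the value, which motivates the normalized definition introduced in Section 2.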
On the basis of model (1.2), we first provide a new definition of effective sample size as a function of the correlation structure of a process that does not depend on the scale of the variables. For most correlation structures used in spatial statistics, the ESS decreases when the correlation increases. We then present several examples and preliminary results that support the given definition. In particular, the problem of computing the ESS for variables that are normal after a logarithm transformation is addressed via Taylor approximations of the elements of the correlation structure. We next turn to the limiting distribution of the maximum likelihood (ML) estimator of θ under an increasing domain framework. The conditions given by Mardia and Marshall (1984) for the asymptotic distribution of the ML estimator of θ are established, and the limiting distribution of the ESS is obtained using the delta method. In particular, these conditions are particularized for a Gaussian random field with the Matérn covariance, as well as for CAR and directional CAR processes. The asymptotic variance in each case is derived. An approximate hypothesis test for the null hypothesis H_0: n* = n is developed. Although, under a suitable definition of the ESS, Σ(θ) = σ²I implies n* = n, this test does not exactly correspond to a test of Σ(θ) = σ²I and cannot be compared with existing proposals, mainly due to the lack of replicates in the spatial case. However, for an alternative hypothesis of the form H_1: n* < n, the procedure admits testing for information reduction due to the presence of autocorrelation. Simulation studies and a real data example support the theoretical findings and provide closer insight into the estimation of the effective sample size. The broader impact of this research lies in its contribution to understanding the reduction of information due to autocorrelation in a spatial statistics context.
The remainder of this paper is organized as follows. In Section 2, we present the definition of effective sample size for spatial regression models, accompanied by several illustrative examples for models commonly used in spatial statistics. More examples illustrating what the ESS looks like for SAR, multivariate CAR, and partitioned normal models are provided in Appendix A. In Section 3, we present three propositions that characterize the ESS. In Section 4, we develop the asymptotics for the ML estimation of the ESS for an increasing domain. In Section 5, the hypothesis testing for H_1: n* < n is presented with illustrative examples. The performance of the asymptotic methods is explored via two Monte Carlo simulation studies in Section 6. The estimation of the effective sample size for a macroalgae dataset collected off the central coast of Chile is discussed in Section 7. We conclude the paper with a discussion in Section 8. We relegate all technical proofs to Appendix B. The programs that were used to analyze the data, as well as the dataset, can be obtained from http://spatialpack.mat.utfsm.cl.

The effective sample size
To avoid the scale problem mentioned in Section 1, we consider a spatial regression model such as (1.2) with X = (1, x_2, ..., x_p) such that x_j'x_j = n for all j = 2, ..., p, where x_j = (x_j(s_1), x_j(s_2), ..., x_j(s_n))'. Then, the effective sample size of Y is defined as

n* = tr(X'R(θ)^{-1}X)/p.    (2.1)

This definition is consistent with that given in Vallejos and Osorio (2014) because if X = 1 in (2.1), then n* = 1'R(θ)^{-1}1. If R(θ) = I, then n* = n. Now, we explore how the ESS appears for different correlation structures.
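A minimal numerical sketch of definition (2.1), with columns rescaled so that x_j'x_j = n, illustrates the two consistency properties just mentioned (the design matrix and correlation values below are hypothetical):

```python
import numpy as np

def ess(X, R):
    """ESS of definition (2.1): tr(X' R(theta)^{-1} X) / p with x_j' x_j = n."""
    n, p = X.shape
    Xn = X / np.sqrt((X**2).sum(axis=0) / n)   # enforce the normalization
    return np.trace(Xn.T @ np.linalg.solve(R, Xn)) / p

n, rho = 25, 0.4
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(n), rng.uniform(size=n)])
R = (1 - rho) * np.eye(n) + rho * np.ones((n, n))   # intraclass correlation

print(ess(X, np.eye(n)))          # R = I gives n* = n
print(ess(np.ones((n, 1)), R))    # X = 1 recovers 1' R^{-1} 1 = n / (1 + (n-1) rho)
```

The second check uses the known closed form 1'R^{-1}1 = n/(1 + (n−1)ρ) for the intraclass structure.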
Because x_j'x_j/n − x̄_j² ≥ 0 and x_j'x_j = n for all j = 2, ..., p, we have that 0 ≤ 1 − x̄_j² ≤ 1; therefore, 0 ≤ a ≤ p − 1 < p. Expanding (2.2) in terms of a, we obtain (2.3). Clearly, n*_IC in (2.3) is a convex function if and only if −1/(n−1) < ρ < 1. Moreover, n*_IC is the sum of two convex functions defined on the interval (−1/(n−1), 1), one strictly increasing and tending to infinity as ρ → 1⁻, and the other strictly decreasing and tending to infinity as ρ → −1/(n−1)⁺. Thus, there exists a unique critical point on the interval (−1/(n−1), 1), obtained by solving the corresponding equation, which is a global minimum of n*_IC.

Example 2. Suppose that Y ∼ N(Xβ, R(ρ)), where R(ρ) is the AR(1) correlation structure and the columns of X have Euclidean norm equal to √n. Then n*_AR takes the form given in (2.6). To prove that n*_AR in (2.6) is convex, note that by the triangle and Young's inequalities we have a + b ≥ a − |b|, a − b ≥ a − |b|, and a − |b| ≥ −c. Then a + b + c ≥ 0 and a − b + c ≥ 0. Therefore, the relevant function is convex if and only if ρ < 1. In consequence, n*_AR is a convex function if and only if −1 < ρ < 1. Moreover, n*_AR corresponds to the sum of two convex functions defined on the interval (−1, 1); similarly to the case developed in Example 1, there exists only one critical point of n*_AR in the interval (−1, 1), corresponding to a global minimum, which can be obtained by solving equation (2.7). A simple calculation shows that if b = −2(n − 1), the only solution of (2.7) in the interval (−1, 1) is ρ_0 = 0, and thus n*_AR(0) = n. This means that the lowest possible value of n*_AR is n. However, this case is valid only for very specific covariates.
In fact, consider X = (1, x_2) with an appropriately chosen covariate x_2. A model for which the effective sample size is greater than n is described in Equation (3.2).
The ESS for some covariance structures is explored next.

Model
The correlation function can be any of the correlation functions listed in Table 1. Then, the elements of R(θ) follow from the covariance parameters σ², τ², and φ. For illustrative purposes, we plotted the ESS versus φ for n = 100, where the sites s_1, s_2, ..., s_100 were generated from uniform random variables in the region [0, 1] × [0, 1], with σ² = 1 and τ² = 0.1. Figure 2 shows that the ESS decays as a function of φ ∈ (0, 1), as expected.
The ESS for other related processes is provided in Appendix A.
We want to emphasize that the examples described above depend strongly on the knowledge we have about the covariance parameters. This has been addressed in Vallejos and Osorio (2014). As an illustration, consider the intraclass correlation model (2.9), in which the observation at each site is the sum of a common component Z_0 and a site-specific component Z(s), where Z_0 is a random variable with variance ρ, Z(s) is a sequence of uncorrelated (pure nugget) variables with variance (1 − ρ), and Z_0 and Z(s) are independent. To obtain any information about μ, it is necessary to have some knowledge about the variance of Z_0. Moreover, the estimation of μ becomes even more problematic when model (2.9) has an additional variance parameter σ².

Preliminary results
From definition (2.1), the following propositions for the ESS can be obtained.

Proposition 1.
Let R(θ), θ ∈ R^q, be a correlation matrix, and let X be a matrix of order n × p as in (2.1), n ≥ 1. Then, n* > 1.
There is no upper bound for the ESS in the general case, although computational experiments provide empirical evidence in favor of the upper bound n. The next result provides the maximum number of covariates allowed in a spatial regression process for the intraclass correlation structure, and it also provides an upper bound for the ESS.
An immediate consequence of Proposition 2 is an equivalence that provides a range of correlations for a fixed number of covariates to be included in a regression model. For instance, if p = 2, ρ ≤ 1/2 − 1/(2(n−1)) → 1/2 as n → ∞. This helps to explain the behavior shown in Figure 1(a). It should be stressed that there are examples where the effective sample size is greater than n. Consider the model (3.2), where R(ρ) is the AR(1) correlation structure, X = (1, x_2) with covariate x_2 = (x_1, ..., x_n)', and x_i = (−1)^i. Then the constants used in Example 2 are a = n − 2, b = −2(n − 1), and c = n. Therefore, a + b + c = 0 and a − b + c = 4(n − 1), and the effective sample size follows from Equation (2.6). Because in this case b = −2(n − 1), from Example 2 we have that n*_AR ≥ n for all ρ ∈ (−1, 1), and n*_AR = n if and only if ρ = 0. For models with constant mean, there are specific correlation structures (with negative autocorrelation) for which the effective sample size is greater than n (Richardson, 1990). For models with nonconstant mean, the example described in (3.2) alerts us that the properties of the effective sample size also depend on the structure of the covariates.
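The alternating-covariate example can be checked numerically. The sketch below evaluates definition (2.1) under the AR(1) correlation with x_i = (−1)^i; for this design, direct algebra with the tridiagonal AR(1) inverse gives n* = (n + (n − 2)ρ²)/(1 − ρ²), which exceeds n for every ρ ≠ 0:

```python
import numpy as np

def ess(X, R):
    n, p = X.shape
    Xn = X / np.sqrt((X**2).sum(axis=0) / n)
    return np.trace(Xn.T @ np.linalg.solve(R, Xn)) / p

n = 30
idx = np.arange(n)
R_ar = lambda rho: rho ** np.abs(idx[:, None] - idx[None, :])   # AR(1) correlation
X = np.column_stack([np.ones(n), (-1.0) ** (idx + 1)])          # x_i = (-1)^i

for rho in (0.0, 0.5, 0.9):
    print(rho, ess(X, R_ar(rho)))   # n* >= n, with equality only at rho = 0
```

The intercept loses information as ρ grows, but the alternating covariate gains more than the intercept loses, which is why the total exceeds n.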
In practice, the normality assumption might not be satisfied. One way to address this problem is a transformation approach. Box and Cox (1964) proposed the transformation

Z(s) = ((Y(s) + δ_2)^{δ_1} − 1)/δ_1, if δ_1 ≠ 0, and Z(s) = log(Y(s) + δ_2), if δ_1 = 0,

where Y(·) is the original variable and δ = (δ_1, δ_2)' is an unknown parameter vector to be estimated to achieve normality of Z(·). Given the vector of spatial observations (Z(s_1), ..., Z(s_n))', δ can be estimated by maximizing the corresponding likelihood function. Alternatively, an estimate of δ can be obtained by finding the value that maximizes the correlation between Φ^{-1}((i − 0.5)/n) and Z_{(i)}, where Φ^{-1} is the inverse of the cumulative distribution function of Z(s_i) and Z_{(i)} is the order statistic associated with Z(s_i), for i = 1, ..., n (Kutner et al., 2004).
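The percentile-correlation approach is easy to sketch. The one-parameter version below (a simple power transform Z = Y^δ on hypothetical skewed data, rather than the paper's two-parameter Box-Cox family) maximizes the correlation between the ordered transformed values and the normal percentiles Φ^{-1}((i − 0.5)/n):

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

def delta_by_normal_correlation(y, lo=0.01, hi=2.0):
    """Pick delta in Z = Y**delta maximizing corr(normal percentiles, sorted Z)."""
    n = len(y)
    q = norm.ppf((np.arange(1, n + 1) - 0.5) / n)   # Phi^{-1}((i - 0.5)/n)
    def neg_corr(delta):
        z = np.sort(y ** delta)                     # order statistics Z_(i)
        return -np.corrcoef(q, z)[0, 1]
    res = minimize_scalar(neg_corr, bounds=(lo, hi), method="bounded")
    return res.x, -res.fun

rng = np.random.default_rng(7)
y = rng.lognormal(sigma=1.0, size=200)              # skewed positive data
delta, corr = delta_by_normal_correlation(y)
print(delta, corr)
```

For lognormal data the optimizer pushes δ toward small values (a near-log transform), and the maximized correlation is close to 1, mirroring the value 0.9878 reported for the macroalgae data in Section 7.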
Observe that R_Y = K_m ⊙ R_Z, where ⊙ denotes the Hadamard product and K_m is the matrix with elements k_ij. In particular, K_1 = 11'. Thus, for m = 1, R_Y = R_Z.

Estimation and asymptotics under increasing domain
It is assumed that D_n is a non-random set satisfying ‖s − t‖ ≥ γ > 0 for all s, t ∈ D_n, s ≠ t. This ensures that the sampling region increases as n → ∞. The estimation of θ and β can be carried out by ML, maximizing the log-likelihood function (4.1), from which the gradient vector and Hessian matrix are obtained. Let F_n denote the Fisher information matrix with respect to β and θ.
For a covariance function σ(·, ·; θ) that is twice differentiable on Θ with continuous second derivatives, Mardia and Marshall (1984) provided sufficient conditions on Σ and X such that the limiting distribution of (β̂', θ̂')' is normal, as stated in the following result. Theorem 1. Let λ_1 ≤ ··· ≤ λ_n be the eigenvalues of Σ, and let those of Σ_i = ∂Σ/∂θ_i and Σ_ij = ∂²Σ/∂θ_i∂θ_j be λ^i_k and λ^{ij}_k, k = 1, ..., n, ordered such that |λ^i_1| ≤ ··· ≤ |λ^i_n| and |λ^{ij}_1| ≤ ··· ≤ |λ^{ij}_n| for i, j = 1, ..., q. Suppose that conditions (i)-(iv) hold as n → ∞. For the application of Theorem 1, the matrix X can be chosen in such a way that condition (iv) is satisfied. When the process Y(·) has a stationary covariance function on R^d with σ(s_i, s_j; θ) = σ(s_i − s_j; θ) and D_n represents a regular but not necessarily rectangular grid with a fixed distance between any pair of locations, conditions (i) and (ii) are satisfied if σ_k, σ_{k,i}, and σ_{k,ij} are absolutely summable over Z^d for all i, j = 1, ..., q, where σ_k, k ∈ Z^d, is the covariance for lag k = (k_1, ..., k_d) of the lattice, σ_{k,i} = ∂σ_k/∂θ_i, and σ_{k,ij} = ∂²σ_k/∂θ_i∂θ_j (Mardia and Marshall, 1984). These conditions are established next for a particular parametrization of the Matérn covariance function.
Theorem 2. Let {Y(s) : s ∈ D ⊂ R^d} be a covariance stationary Gaussian process sampled on a regular lattice of fixed spacing, with the Matérn covariance function. Then conditions (i)-(iii) of Theorem 1 hold.

Next, we establish the asymptotic normality of the ML estimate of the effective sample size (2.1).

Theorem 3. Let X be a normalized design matrix as in Equation (2.1), and let θ̂ be the ML estimator of θ as in Theorem 1.

Let {Y(s) : s ∈ D ⊂ R^d} be a random field observed on n sites such that E[Y] = Xβ. Assume that there are two valid covariance functions, σ_1(·, ·; θ_1), θ_1 ∈ R^{q_1}, and σ_2(·, ·; θ_2), θ_2 ∈ R^{q_2}. Then, we can state the following result.
It is not immediate to establish condition (iii) of Theorem 1. It suffices to prove that lim_{n→∞} |A_n| ≠ 0, where A_ll is obtained from θ_l, l = 1, 2, while A_12 is obtained from tr(Σ^{-1}Σ_iΣ^{-1}Σ_j) for i = 1, ..., q_1 and j = q_1 + 1, ..., q_1 + q_2. By hypothesis, lim_{n→∞} |A_ll| = a_l ≠ 0, l = 1, 2. It follows that lim_{n→∞} |A_n| ≠ 0. However, it is not straightforward to establish any of these conditions in a general case.

Hypothesis testing
As a consequence of the limiting distribution of the ESS established in Theorem 3, an approximate hypothesis test for the ESS is constructed. Consider the null hypothesis H_0: g(θ) = n_0 versus one of the following three alternative hypotheses: H_1: g(θ) ≠ n_0, H_1: g(θ) < n_0, or H_1: g(θ) > n_0, where 1 < n_0 < n. When n_0 = n, this hypothesis testing problem is relevant because it leads to the case Σ = σ²I. For the Matérn covariance function, the parameter φ controls the range of spatial association, such that if φ = 0, then Σ = σ²I. Thus, the test about the parameter φ, H_0: φ = 0 versus H_1: φ > 0, implies the test about the effective sample size H_0: g(θ) = n versus H_1: g(θ) < n. For a given size α, the critical regions C_α for the three alternative hypotheses are stated accordingly, where z_α denotes the upper α quantile of the standard normal distribution and σ²_g = var(g(θ̂)). The following two examples illustrate the computation of σ²_g for a Gaussian random field with the exponential covariance function and for the CAR and DCAR processes.
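A sketch of the resulting z-test follows; the numbers used below are hypothetical, not taken from the paper's examples:

```python
from scipy.stats import norm

def ess_test(ess_hat, n0, sigma_g, alpha=0.05, alternative="less"):
    """Approximate test of H0: g(theta) = n0 using the limiting normality of the
    ML estimate of the ESS; sigma_g is the estimated sd of g(theta_hat)."""
    z = (ess_hat - n0) / sigma_g
    if alternative == "less":                    # H1: n* < n0
        return z, z < -norm.ppf(1 - alpha)
    if alternative == "greater":                 # H1: n* > n0
        return z, z > norm.ppf(1 - alpha)
    return z, abs(z) > norm.ppf(1 - alpha / 2)   # H1: n* != n0

z, reject = ess_test(38.7, 64.0, 9.0)            # hypothetical: n0 = 64 sites
print(z, reject)   # z is about -2.81, so H0 is rejected at the 5% level
```

For the one-sided alternative H_1: n* < n_0, the rejection region is z < −z_α, matching the first critical region above.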

Numerical experiments
In this section, we conducted two Monte Carlo simulation studies. The first is designed to assess the performance of the ML estimates of the ESS for different correlation structures and sample sizes. The second explores the power function of the test stated in Section 5 as a function of the sample size. The first experiment involves the generation of 500 replicates from a Gaussian random field sampled on a regular lattice of size n = r × r, for r = 8, 16, 32, where the sites s_1, ..., s_n are of the form s_i = (x_i, y_i). Each replicate was generated from the model with cov(ε(s_i), ε(s_j)) = σ²ρ(‖s_i − s_j‖, φ), for four correlation functions ρ(·, φ) given in Table 1, with β_0 = 2, β_1 = 0.1, σ² = 2, and φ = 1. The spherical correlation was not included because it does not satisfy the conditions of Theorem 1. Table 2 displays the simulation estimates and root mean square errors (in parentheses) for the parameters β_0, β_1, σ², and φ. In addition, n*, n*/n, and their estimates were included to make the estimates comparable as n varies; n*/n represents the proportion of equivalent independent observations associated with a spatial sample of size n. There is empirical evidence to support the consistency of the estimates of n* and n*/n.

Table 2: Monte Carlo estimates based on 500 replicates. The true parameters are β_0 = 2, β_1 = 0.1, σ² = 2, and φ = 1; the numbers in brackets are the root mean square errors of the simulation estimates. (Columns: Covariance, n, n*, n*/n, β̂_0, β̂_1, σ̂², φ̂, n̂*, n̂*/n.)

In practice, the size of the grid could be considerably large. We explored this by performing the same simulation experiment for r = 64, 128, and 256 (not shown here), obtaining the same results with respect to the consistency of the estimates; however, the computational cost increased considerably. In Figure 3, we show the average user time that the system takes to compute the ML estimate of the ESS, based on 100 replicates. A break in the slope of the curves for n ≥ 40 × 40 is observed, regardless of the type of covariance function used. In the Gaussian case, there are 32 observations for which the user time was greater than three hours for the grid sizes 40 × 40 and 48 × 48, making it not comparable with the other covariance functions. In practice, the computation of the effective sample size can be performed on a regular computer for grid sizes of less than 32 × 32.
Our simulation experiments were developed on an HP ProLiant DL360 Gen9 server (E5-2630v3, 2.4 GHz, 8-core 1P, 16 GB-R, P440ar, 500 W PS Base SAS). The second simulation study is designed to explore the power of the hypothesis testing problem H_0: n* = n versus H_1: n* < n. The true value of n* is denoted by n*_0. Based on 500 replicates of a Gaussian random field, the empirical power function was computed for r = 8, 16, 32. For r = 8, n*_0 = 38.67, 60.41, 20.69, 15.26; for r = 16, n*_0 = 142.02, 240.02, 67.82, 45.84; and for r = 32, n*_0 = 142.02, 240.02, 67.82, 45.84, for the exponential, Gaussian, and Matérn (m = 1, 2) covariance functions, respectively. In contrast to the usual hypothesis testing setup, the value of n*_0 varies. The effect of this behavior is shown in Figure 4. For large values of n, the power function becomes small, regardless of the covariance structure used. In all cases, the Gaussian covariance function has the worst behavior in terms of power, whereas the Matérn model with m = 2 has the best.

Real data analysis
We analyzed the macroalgae dataset from Acosta et al. (2016), which consists of 427 observations of the density of Lessonia trabeculata (the scientific name of the macroalgae in the study) per 20 m². This dataset was collected by the Fisheries Research Institute (IFOP, by its Spanish acronym) in a protected area near Quintero, Chile. In Acosta et al. (2016), the effective sample size computations were obtained under the assumption of a constant-mean Gaussian process, using 26 line transects perpendicular to the coastline to study species located at depths of no more than 20 m, because the experts are knowledgeable of the existence of species of different sizes. In addition to the density, the depth was measured at 144 georeferenced sites (see Figure 5). We examined the effective sample size for the subset of 144 sites for which both the density and the depth are available. Acosta et al. (2016) found a high asymmetry (Fisher asymmetry coefficient) in the original dataset, so a transformation was applied, where δ is the value that maximizes the correlation (0.9878) between the percentiles of the normal distribution and the order statistics Z_{(i)}, obtaining δ = 0.588 (Acosta et al., 2016, Appendix 2). The Fisher asymmetry coefficient for the 144 transformed observations is −0.00176.
Let Z(·) be a random field and assume that Z = (Z(s_1), ..., Z(s_144))' is a Gaussian random vector such that E[Z] = Xβ, where X is the design matrix and the covariance function is one of those presented in Table 1. A preliminary model might be E[Z(s_i)] = β_0 + β_1(x_i − x̄) + β_2(y_i − ȳ) + β_3 d(s_i), where x̄ = Σ x_i/n, ȳ = Σ y_i/n, and d(s_i) is the depth at site s_i, i = 1, ..., 144. For each of the covariance structures listed in Table 1, the ML estimates of the parameters were computed; the overall F test led to a significant model, with a p-value less than 0.05 in each case. However, the t-tests on single regression coefficients yielded nonsignificant estimates for β_1 and β_2 and significant estimates for β_3, regardless of the covariance function used. We therefore consider the reduced trend model

E[Z(s_i)] = β_0 + β_1 d(s_i).    (7.1)

The overall F test, AIC, mean square error, n̂*, and the ML estimates for the parameters of model (7.1) are presented in Table 3. The Gaussian covariance structure minimized the AIC and the MSE. For this case, the effective sample size of the transformed variable is 27.5, which equals the estimate obtained by using the Taylor approximation of order 1. In Acosta et al. (2016), the obtained ESS value was 66.1 for the Gaussian covariance, but the sample size in that case was 427; therefore, the results are not comparable. The modeling of the density of Lessonia trabeculata constitutes a valuable advance in the design of IFOP protocols that can be applied in the future when new datasets are collected in the same study area, for this or other variables of interest, under conditions similar to those described in this paper, where the inclusion of covariates plays an important role in constructing the spatial regression model.

Discussion
This paper extended the known methodology for effective sample size computations to general spatial regression models. The approach, equipped with powerful computational machinery, is appropriate for large spatial datasets, and it provides formulas for a number of spatial processes. We conjecture that n* in (2.1) is increasing in n, as suggested by a comprehensive simulation study that we performed; however, at this stage, we can only provide a proof under additional hypotheses.
The definition of the ESS can easily be extended to more general models. For instance, for a nonlinear model of the form (Golub and Pereyra, 1973)

Y = X(α)β + ε,    (8.1)

where ε ∼ N(0, R(θ)), β ∈ R^p, α ∈ R^r is a conditionally linear parameter (Soo and Bates, 1992), X(·) is a design matrix normalized by columns to norm √n, and R(·) is as before, it is straightforward to obtain an analogous expression, where X_i(α) = ∂X(α)/∂α_i, i = 1, ..., r. After simple algebra, we obtain a decomposition in terms of n*_β = tr(X'(α)R^{-1}(θ)X(α))/p and a corresponding quantity n*_α, which are the effective sample sizes associated with the linear and conditionally linear parameters, respectively, and λ = np/(np + Σ_{i=1}^r ‖X_i(α)β‖²). The asymptotic results support the inference when using ML estimates for processes defined on rectangular grids. The conditions of Theorem 1 are not easy to establish for a particular covariance function. The results and examples developed here highlight the limitations of the technique. For a nonlinear model such as (8.1), the conditions of Theorem 1 are not sufficient to guarantee the asymptotic normality of the ML estimate of (β', α', θ')'. A hypothesis ensuring that the Fisher information matrix converges to a nonsingular matrix is needed. Such a condition is assumed for the Fisher information matrix associated with the mean of a nonlinear model in Crujeiras and Van Keilegom (2010).
The hypothesis testing discussed in Section 5 provides a framework for testing whether there is redundant information in the spatial sample of size n due to autocorrelation. Although the test is simple, for more sophisticated models, the computation of the asymptotic variance is a challenging problem.
We make specific recommendations regarding the calculation of the effective sample size in practice.
(a) The computation of the ESS requires the estimation of the covariance function. To select a good isotropic model, classical cross-validation techniques can be used to choose among competing models. However, Stein (1999) noted that making parametric estimates of semivariograms match the empirical one is a considerable flaw of classical geostatistics (Gelfand et al., 2010).

(b) At a first stage, we suggest addressing the hypothesis testing problem H_0: n* = n versus H_1: n* < n to elucidate whether there is a reduction of information. Then other values of n_0 (e.g., n/2 or n/3) can be used to quantify the percentage of reduction with respect to the original sample size. For a single-mean spatial regression model, the ESS might be greater than n when there is negative autocorrelation (Richardson, 1990). Hence, the alternative hypothesis H_1: n* > n is not very meaningful as a description of a conventional situation in spatial statistics.

A.1. ESS for SAR models
Consider a SAR model of the form Y = Xβ + ε with ε = Bε + ν, where B is a fixed matrix of spatial interactions, the columns of X have been normalized, E[ν] = 0, and Σ_ν = diag(σ²_1, ..., σ²_n). Then the ESS is expressed in terms of Σ, whose ii-th element is denoted σ_ii. If B = ρW, where W is a contiguity matrix and Σ_ν = σ²I, the effective sample size n*_SAR is a quadratic function of ρ with a positive coefficient of ρ²; thus, n*_SAR is decreasing when ρ ≤ ρ_0, where ρ_0 is the minimizer of the quadratic.
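A numerical sketch under this reading of the model (errors satisfying (I − ρW)ε = ν with B = ρW; the path-graph contiguity matrix and the covariate below are hypothetical):

```python
import numpy as np

def sar_ess(W, rho, X, sigma2=1.0):
    """ESS for Y = X beta + eps with (I - rho W) eps = nu, nu ~ N(0, sigma2 I):
    Sigma = sigma2 [(I - rho W)'(I - rho W)]^{-1}, rescaled to a correlation matrix."""
    n = W.shape[0]
    A = np.eye(n) - rho * W
    Sigma = sigma2 * np.linalg.inv(A.T @ A)
    d = np.sqrt(np.diag(Sigma))
    R = Sigma / np.outer(d, d)                    # correlation matrix
    p = X.shape[1]
    Xn = X / np.sqrt((X**2).sum(axis=0) / n)
    return np.trace(Xn.T @ np.linalg.solve(R, Xn)) / p

n = 20
W = np.zeros((n, n))
for i in range(n - 1):                            # path-graph contiguity
    W[i, i + 1] = W[i + 1, i] = 1.0
W = W / W.sum(axis=1, keepdims=True)              # row standardization

X = np.column_stack([np.ones(n), np.linspace(-1.0, 1.0, n)])
for rho in (0.0, 0.4, 0.8):
    print(rho, sar_ess(W, rho, X))
```

At ρ = 0 the errors are uncorrelated and the ESS equals n; for positive ρ the induced positive spatial correlation reduces the ESS, consistent with the decreasing behavior described above.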

A.2. ESS for multivariate CAR models
Suppose that Φ = (Φ_1', Φ_2', ..., Φ_n')', where each Φ_i is a vector of size m × 1. A multivariate CAR (MCAR) process is described by Gelfand and Vounatsou (2003) through full conditionals in which each B_ij and Σ_i are of size m × m. Brook's lemma provides the joint density of Φ, where Γ is a block diagonal matrix with blocks Σ_i, and B is nm × nm with (i, j)-th block B_ij. Given the contiguity matrix W = (w_ij), one defines the joint structure in terms of a parameter ρ, which can be regarded as an autocorrelation parameter that stabilizes the covariance matrix (Banerjee et al., 2004, Sections 3.3 and 7.4). This model is consequently called MCAR(ρ, Σ).

Proof of Theorem 2.
Lemma 1. Let P_m(y) = Σ_{i=0}^m a_i y^i be a polynomial of degree m ∈ N. Then, for d < ∞, the corresponding integral over R^d has a closed form in terms of the gamma function Γ(·).
Proof. Applying polar coordinates on R^d, with r = ‖x‖, one obtains the stated identity, where c_d = 2π^{d/2}/Γ(d/2) represents the surface area of the unit sphere S^{d−1}.
Lemma 2. For d < ∞ and m ≥ 0 fixed, Proof. The result is obtained by emulating the proof of Lemma 1.
Proof of Theorem 2 (continued). We assume that X is suitably chosen such that lim (X'X)^{-1} = 0 holds, and hence condition (iv) in Theorem 1 is satisfied. The proof of conditions (i)-(iii) is divided into two parts.