An asymptotic theory for spectral analysis of random fields

For a general class of stationary random fields, we study asymptotic properties of the discrete Fourier transform (DFT), the periodogram, and parametric and nonparametric spectral density estimators under an easily verifiable short-range dependence condition expressed in terms of functional dependence measures. We allow irregularly spaced data indexed by a subset Γ of $\mathbb{Z}^d$, and our asymptotic theory requires only minimal restrictions on the index set Γ. Asymptotic normality is derived for kernel spectral density estimators and for the Whittle estimator of a parameterized spectral density function. We also develop asymptotic results for a covariance matrix estimator.


Introduction
Analysis of irregularly spaced data has been attracting considerable attention from researchers in various fields, ranging from environmental science to economics. The origin of irregular data can in fact be traced to limit theorems for random fields with continuous parameter, where the sets of integration grow to infinity in the Van Hove sense; see, for example, Ivanov and Leonenko (2012). Broadly speaking, there are two different approaches to dealing with irregularly spaced spatial data. The classical and more popular Kriging or interpolation approach (Cressie (1988)) is parametric in nature. A nonparametric, or frequency domain, approach was considered by Fuentes (2007); it revolves around the assumption that the sampled locations are fixed and not random. Vidal-Sanz (2009) also considered nonparametric estimation of spectral densities for second-order stationary random fields on a d-dimensional lattice; in that paper, the author proposed modified estimator classes with improved bias convergence rate. In more recent work, Bandyopadhyay et al. (2015) formulated a spatial frequency domain empirical likelihood method for irregularly spaced data. Other works in the frequency domain can be found in Hall and Patil (1994), Bandyopadhyay and Lahiri (2009) and the references therein.
Spectral domain methods to approximate the Gaussian likelihood for irregularly spaced datasets were proposed by Matsuda and Yajima (2009) where the sampled locations are assumed random with a particular distribution having a continuous density function. Their non-parametric and parametric estimators of the spectral density function of the underlying random fields are similar to those in classical time series analysis. The parametric spectral density was estimated by minimizing the Whittle likelihood while the non-parametric spectral density estimator was a spectral window estimator, and they studied the asymptotic properties of those estimators.
In spatial data analysis one usually deals with irregularly spaced data. To set the notation, let $(\Gamma_n)_{n \ge 1}$ be a sequence of finite subsets of $\mathbb{Z}^d$ representing the sampling locations or design points. Our goal here is to work with an asymptotic regime that imposes minimal restrictions on the sampling set and its boundary:

Assumption 1. (Asymptotic regime) Let $\Gamma_n = \{L_{n,1}, \ldots, L_{n,n}\} \subset \mathbb{Z}^d$ be the set of sampling locations, where the only requirement on the choice of $\Gamma_n$ is that $|\Gamma_n| \to \infty$.
For simplicity, from now on we write $L_k = L_{n,k}$. It is instructive to compare the above regime with the setup of Matsuda and Yajima (2009), where each sampling location $t_i$ is obtained from a randomly generated $d$-dimensional vector $u_i = (u_{i,1}, \ldots, u_{i,d})$ by $t_{ij} = A_j u_{ij}$ for $j = 1, 2, \ldots, d$. They further assumed that the coefficients $A_j$ and the sample size $n_k$, if expressed as functions of $k$, satisfy the condition $|S_k|/n_k \to 0$ as $k \to \infty$, where $|S_k|$ is the area of the rectangle $[0, A_1] \times \cdots \times [0, A_d]$. Other examples with special constraints on the sampling set can be found in Jones (1962), Neave (1970a), Neave (1970b), Parzen (1963) and Clinger and Van Ness (1976). A related study with irregularly spaced observations over an increasing spatio-temporal domain can be found in Li et al. (2008). Similar to Matsuda and Yajima (2009), they also viewed the spatial locations at which the data are observed as random in number and location, generated from a homogeneous two-dimensional Poisson process. In contrast, our theory requires only a minimal condition on the set $\Gamma_n$, which is an attractive property in spatial applications where the underlying observation domains can be quite irregular.
For a given $\Gamma_n$, the discrete Fourier transform (DFT) is defined by
$$J_n(\theta) = \frac{1}{\sqrt{n}} \sum_{j=1}^{n} X_{L_j} e^{\imath L_j^\top \theta}, \tag{1.1}$$
where $\imath = \sqrt{-1}$ and $\theta = (\theta_1, \ldots, \theta_d) \in \mathbb{R}^d$. The periodogram of the data is defined by
$$I_n(\theta) = |J_n(\theta)|^2 = \frac{1}{n} \Big| \sum_{j=1}^{n} X_{L_j} e^{\imath L_j^\top \theta} \Big|^2. \tag{1.2}$$
We will study aspects of both parametric and nonparametric estimators of the spectral density functions of stationary random fields. We begin by deriving asymptotic results for the DFT of irregularly spaced random fields, and then study the asymptotic properties of the spectral density estimates. As pointed out earlier, an important feature of our approach is that we impose no restriction on the index set $\Gamma_n$ other than the natural requirement $|\Gamma_n| \to \infty$.
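To make the definitions concrete, the following minimal sketch (not part of the original development; the names and the sampling scheme are illustrative) computes (1.1) and (1.2) for irregularly spaced data:

```python
import numpy as np

def dft_irregular(X, L, theta):
    """Discrete Fourier transform (1.1) for irregularly spaced data.

    X     : (n,) array of observations X_{L_1}, ..., X_{L_n}
    L     : (n, d) integer array of sampling locations in Z^d
    theta : (d,) frequency vector in R^d
    """
    n = len(X)
    phases = L @ np.asarray(theta)            # L_j^T theta for each j
    return np.sum(X * np.exp(1j * phases)) / np.sqrt(n)

def periodogram(X, L, theta):
    """Periodogram (1.2): squared modulus of the DFT."""
    return np.abs(dft_irregular(X, L, theta)) ** 2

# Example: 200 locations sampled irregularly from a 30 x 30 grid (d = 2)
rng = np.random.default_rng(0)
grid = np.array([(i, j) for i in range(30) for j in range(30)])
L = grid[rng.choice(len(grid), size=200, replace=False)]
X = rng.standard_normal(200)                  # placeholder random field
I_theta = periodogram(X, L, theta=(0.5, 1.0))
```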
The paper is structured as follows: Section 2 presents the setup, assumptions and some preliminary results regarding short-memory stationary random fields. The discrete Fourier transform of the data and its asymptotic properties are presented in Section 3, while the Whittle likelihood and the parametric spectral density estimator are discussed in Section 4. In Section 5, we present the nonparametric estimators of the spectral density and the covariance function and discuss their properties (e.g., consistency, asymptotic normality). We discuss the estimation of covariance matrices in the irregular setup in Section 6. All proofs are provided in the Appendix.

Short-range dependent random fields
We shall consider a very general class of stationary random fields which are functions of independent and identically distributed (iid) random variables. In related works, Whittle (1954) considered two-dimensional linear auto-regression fields and Besag (1974) discussed stationary auto-normal processes and proposed estimation methods and goodness-of-fit tests applicable to spatial Markov schemes defined over a rectangular lattice. Other noteworthy examples may be found in Guyon (1982) and Kashyap (1984). Our setup is general enough to include the most common linear and nonlinear processes.
Assumption 2. (Data-generating mechanism) The random field $(X_i)_{i \in \mathbb{Z}^d}$ admits the representation
$$X_i = g\big( (\varepsilon_{i-s})_{s \in \mathbb{Z}^d} \big), \qquad i \in \mathbb{Z}^d, \tag{2.1}$$
where $(\varepsilon_j)_{j \in \mathbb{Z}^d}$ are iid random variables and $g$ is a measurable function such that $X_i$ is well-defined.
Throughout the paper, we work with short-range dependent stationary processes. We use the idea of coupling (Wu (2005)) to define dependence measures.
Definition 2.1. (Functional dependence measure) Let $(\varepsilon'_j)_{j \in \mathbb{Z}^d}$ be an iid copy of $(\varepsilon_j)_{j \in \mathbb{Z}^d}$, and set $\varepsilon^*_j = \varepsilon_j$ if $j \neq 0$ and $\varepsilon^*_0 = \varepsilon'_0$. For $X_i \in L^p$, the functional dependence measure is
$$\delta_{i,p} = \| X_i - X_i^* \|_p, \qquad \text{where } X_i^* = g\big( (\varepsilon^*_{i-s})_{s \in \mathbb{Z}^d} \big).$$
Also let $p' = \min\{2, p\}$ and define $\Delta_p = \sum_{i \in \mathbb{Z}^d} \delta_{i,p}$; the short-range dependence condition we work with is $\Delta_p < \infty$. Two concrete examples of such processes are given next.

Example 1. (Linear process) Let
$$X_i = \sum_{s \in \mathbb{Z}^d} a_s \varepsilon_{i-s},$$
where $(\varepsilon_j)_{j \in \mathbb{Z}^d}$ are iid random variables with mean 0 and $\varepsilon_0 \in L^p$, $p \ge 2$, and the $a_s$ are real coefficients with $\sum_{s \in \mathbb{Z}^d} a_s^2 < \infty$. In this case $X_i - X_i^* = a_i(\varepsilon_0 - \varepsilon'_0)$, so the functional dependence measure takes the simple form $\delta_{i,p} = |a_i| \, \| \varepsilon_0 - \varepsilon'_0 \|_p$.

Example 2. (Spatial autoregressive scheme) Let $N \subset \mathbb{Z}^d$ be a finite set with $0 \notin N$. Consider the spatial process in the form of the nonlinear autoregressive scheme
$$X_i = G\big( (X_{i-j})_{j \in N};\ \varepsilon_i \big),$$
where the function $G$ is such that there exist nonnegative numbers $a_j$, $j \in N$, with $\sum_{j \in N} a_j < 1$ and the following holds: for all $(x_{-j})_{j \in N}$ and $(x'_{-j})_{j \in N}$,
$$\big| G\big( (x_{-j})_{j \in N};\ \varepsilon_0 \big) - G\big( (x'_{-j})_{j \in N};\ \varepsilon_0 \big) \big| \le \sum_{j \in N} a_j |x_{-j} - x'_{-j}|.$$
Then, following the argument in Shao and Wu (2004), we have $\delta_{i,p} = O(\rho^{|i|})$ for some $0 < \rho < 1$.

Asymptotic theory of the DFT
In this section, under some regularity conditions on the data-generating process, we study the asymptotic distribution of the DFT. This is a key ingredient for developing the spectral analysis of stationary processes. Peligrad and Wu (2010) proved a central limit theorem for the Fourier transform of a stationary process in the regularly spaced case. Other work on the bias and variance of periodogram estimates in a regular setup can be found in Pukkila (1979) and Lin and Liu (2009).

Asymptotic normality of the DFT
We establish the asymptotic normality of the DFT defined in (1.1) using the Cramér-Wold device. To this end, instead of the DFT we consider the more general expression
$$W_n = \sum_{j \in \Gamma_n} c_j X_j, \qquad |c_j| \le 1, \tag{3.1}$$
and wish to find the asymptotic joint distribution of $(Y_n(\theta), Z_n(\theta))$, where
$$Y_n(\theta) = \sum_{j=1}^{n} X_{L_j} \cos(L_j^\top \theta) \quad \text{and} \quad Z_n(\theta) = \sum_{j=1}^{n} X_{L_j} \sin(L_j^\top \theta)$$
are the cosine and sine transforms of the data. As mentioned earlier, we are interested in linear combinations $a Y_n(\theta) + b Z_n(\theta)$ (without loss of generality, we can assume that $a^2 + b^2 = 1$), which are of the form (3.1). For $k \ge 1$, let us use $N_k(0, \Sigma)$ to denote the $k$-variate normal distribution with mean vector 0 and variance-covariance matrix $\Sigma$.
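For later reference, the asymptotic normality result (3.3) established in the Appendix can be summarized as follows (a paraphrase consistent with the proof of Theorem 3.1, not its verbatim statement): let $W_n$ be as in (3.1) with $v_n^2 = \mathrm{Var}(W_n)$, and suppose $\Delta_2 < \infty$; then
$$L\Big( \frac{W_n}{\sqrt{|\Gamma_n|}},\ N\Big(0, \frac{v_n^2}{|\Gamma_n|}\Big) \Big) \to 0 \quad \text{as } n \to \infty, \tag{3.3}$$
where $L(\cdot, \cdot)$ denotes the Levy distance; in particular, if $v_n^2/|\Gamma_n| \to \sigma^2 > 0$, then $W_n/\sqrt{|\Gamma_n|}$ converges in distribution to $N(0, \sigma^2)$.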
The above theorem gives the joint asymptotic distribution of the discrete Fourier transform. For the regular setup, we can further show that the two coordinates $(Y_n(\theta), Z_n(\theta))$ are asymptotically independent, and the same holds at two different frequencies.

Bias of the periodogram
It is known that for regularly observed stationary time series the periodogram is an asymptotically unbiased estimator of the spectral density function. This is not the case for irregular spatial data. In this section, we provide an expression for the bias of the periodogram.
For the process $(X_i)$ given in (2.1), assume that the mean is 0 and define the covariance function $\gamma_k = E(X_0 X_k)$, $k \in \mathbb{Z}^d$. Define the spectral density
$$f(\theta) = \sum_{k \in \mathbb{Z}^d} \gamma_k e^{\imath k^\top \theta}, \qquad \theta \in [-\pi, \pi]^d. \tag{3.5}$$
In the literature the version scaled by $(2\pi)^{-d}$ is also widely used; in this paper we use the form (3.5). Recall that $\Gamma_n = \{L_1, \ldots, L_n\}$, and let $m_k = \#\{(i,j) : L_i - L_j = k\}$.
We define the location adjusted spectral density function by
$$f_J(\theta) = \sum_{k \in \mathbb{Z}^d} \frac{m_k}{n} \gamma_k e^{\imath k^\top \theta}. \tag{3.6}$$
The bias of $I_n(\theta)$ is
$$B_n(\theta) = f(\theta) - E\, I_n(\theta) = \sum_{k \in \mathbb{Z}^d} \Big( 1 - \frac{m_k}{n} \Big) \gamma_k \cos(k^\top \theta).$$
If, for any fixed $k$, $m_k/n \to 1$ (which holds for the regular rectangular index set), then by the Lebesgue dominated convergence theorem, $B_n(\theta) \to 0$ as $n \to \infty$. However, the same cannot be said for the irregular spatial case. In particular, if for each $k \in \mathbb{Z}^d$ the ratio $m_k/n$ approaches a constant $r_k$ as $n \to \infty$, then the asymptotic bias is given by $B(\theta) = \sum_{k \in \mathbb{Z}^d} (1 - r_k) \gamma_k \cos(k^\top \theta)$.
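To see where the factor $m_k/n$ in (3.6) comes from, expand the squared modulus in (1.2) under the zero-mean assumption and group the $n^2$ pairs $(i,j)$ by their lag $k = L_i - L_j$, of which there are exactly $m_k$:
$$E\, I_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n} \gamma_{L_i - L_j}\, e^{\imath (L_i - L_j)^\top \theta} = \sum_{k \in \mathbb{Z}^d} \frac{m_k}{n}\, \gamma_k\, e^{\imath k^\top \theta} = f_J(\theta),$$
so the periodogram is an unbiased estimator of $f_J(\theta)$ rather than of $f(\theta)$.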

Whittle likelihood and parametric spectral estimate
In this section we shall discuss a parametric estimator of the spectral density function $f(\theta)$. In what follows, we write the spectral density $f(\theta)$ (where $\theta \in \mathbb{R}^d$) as $f_\alpha(\theta)$ for a certain parameter vector $\alpha \in \mathbb{R}^p$. Since the spectral density governs the covariance function of a stationary process, $\gamma_k$ will also be a function of $\alpha$. Therefore, the location adjusted spectral density function $f_J(\theta)$ and the bias $B_n(\theta)$ discussed in the previous section are functions of both $\theta$ and $\alpha$; we denote them by $f_{J,\alpha}(\theta)$ and $B_{n,\alpha}(\theta)$, respectively.
A widely used approach to estimate the unknown parameter $\alpha$ is to minimize the (negative) Whittle likelihood, an approximation to the Gaussian log-likelihood, which is of the form
$$\int_D \Big\{ \log f_\alpha(\theta) + \frac{I_n(\theta)}{f_\alpha(\theta)} \Big\}\, d\theta. \tag{4.1}$$
Dahlhaus and Künsch (1987) developed an asymptotic theory for the Whittle estimator for regularly spaced time series data using a bias-adjusted periodogram in place of $I_n(\theta)$. We adopt their technique to correct for the bias of the periodogram. The Whittle likelihood for irregularly spaced data is
$$p_n(\alpha) = \int_D \Big\{ \log f_{J,\alpha}(\theta) + \frac{I_n(\theta)}{f_{J,\alpha}(\theta)} \Big\}\, d\theta. \tag{4.2}$$
We denote the Whittle estimator that minimizes $p_n(\alpha)$ by $\hat\alpha_n$ and study the asymptotic behavior of this estimator. Suppose the parameter space for $\alpha$ is $A$. Let the true value of the parameter be $\alpha_0$, and suppose that the Whittle estimator $\hat\alpha_n$ exists in the parameter space $A$ for all $n$. Before stating the theorem for the Whittle estimator, in addition to the asymptotic regime (Assumption 1) and the nonlinear form of the stationary random field (Assumption 2), we list a few more assumptions:

Assumption 3. The parameter space $A \subset \mathbb{R}^p$ is compact, and $D \subset \mathbb{R}^d$ is symmetric and compact, such that the spectral density function (now denoted $f_{J,\alpha}(\theta)$ and defined on $A \times D$) is twice differentiable with respect to $\alpha$, with first- and second-order derivatives continuous for $\theta \in D$.

Assumption 5. If $\nabla$ denotes the first-order derivative of a function, then $\nabla \{f_{J,\alpha}(\theta)\}^{-1}$ exists and is continuous on $A \times D$.
Finally, let us use $p(\alpha)$ to denote the limit of the Whittle likelihood $p_n(\alpha)$ defined by (4.2). Note that $f_{J,\alpha}(\theta)$, and consequently $p(\alpha)$, depends on the index set $\Gamma_n$. Also, for any $\Gamma_n$, we have $p(\alpha) > p(\alpha_0)$ for any $\alpha \neq \alpha_0$.
We now discuss the connection with the regularity assumptions. If the functional dependence measure $\delta_{i,p}$ satisfies the summability condition $\sum_{i \in \mathbb{Z}^d} |i|^2 \delta_{i,2} < \infty$, then the spectral density function is bounded and twice partially differentiable. Assumptions 3 to 5 are similar to the ones used in Matsuda and Yajima (2009). Theorem 4.1 below concerns the asymptotic properties of the Whittle estimator.
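As a practical aside before the theorem, the minimization of (4.2) can be discretized over a frequency grid; the following is a minimal sketch (illustrative names only; `f_J_alpha` is a user-supplied parametric family, `thetas` is a user-chosen grid in $D$, and `periodogram` is the routine sketched in the introduction):

```python
import numpy as np
from scipy.optimize import minimize

def whittle_objective(alpha, thetas, I_vals, f_J_alpha):
    """Discretized Whittle likelihood (4.2): average over a frequency grid of
    log f_{J,alpha}(theta) + I_n(theta) / f_{J,alpha}(theta)."""
    f_vals = np.array([f_J_alpha(alpha, th) for th in thetas])
    return np.mean(np.log(f_vals) + I_vals / f_vals)

def whittle_fit(X, L, f_J_alpha, alpha_init, thetas):
    """Minimize the discretized Whittle likelihood over alpha."""
    I_vals = np.array([periodogram(X, L, th) for th in thetas])
    res = minimize(whittle_objective, x0=np.atleast_1d(alpha_init),
                   args=(thetas, I_vals, f_J_alpha), method="Nelder-Mead")
    return res.x
```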
Theorem 4.1. Let $f_{J,\alpha}(\theta)$ be the location adjusted spectral density function (3.6), denote the true value of the parameter $\alpha$ by $\alpha_0$, and assume that the Whittle estimator $\hat\alpha_n$ exists in the parameter space $A$ for all large $n$. Then, under Assumptions 1-5, the estimator $\hat\alpha_n$ satisfies the following ($L(\cdot, \cdot)$ is the Levy distance): (a) $\hat\alpha_n \to \alpha_0$ in probability; (b) the Levy distance between the law of $\sqrt{n}(\hat\alpha_n - \alpha_0)$ and a multivariate normal distribution, whose covariance matrix depends on $\alpha_0$, tends to 0.

From a practical point of view, it is not possible to use the above theorem directly to find a confidence set for the true value of the parameter $\alpha_0$, since the covariance matrix in the theorem depends on $\alpha_0$. Next, we describe a subsampling procedure to find a confidence set for $\alpha$. For an irregularly spaced stationary random field, a subsampling method will not work in most cases; hence we describe the method only for a regular random field.
For the following discussion, assume that we have data $(X_i)_{i \in I}$ from a random field indexed by a rectangle $I$ in $\mathbb{Z}^d$. For convenience, since we consider regularly spaced data, assume that $I = \{1, 2, \ldots, l\}^d$, and suppose the Whittle likelihood estimator for these data is $\hat\alpha_l$.
To use the subsampling procedure, we consider smaller blocks from $I$. For a point $t = (t_1, \ldots, t_d)$, $\hat\alpha_{l,t,b}$ will denote the Whittle likelihood estimator based on the data $(X_i)_{i \in I_{t,b}}$, where $I_{t,b} = \prod_{i=1}^{d} \{t_i + 1, \ldots, t_i + b\}$. Naturally, these estimates can be obtained for all $t$ such that $I_{t,b} \subset I$; let us use $Q_{l,b}$ to denote the set of all such $t$. We are going to show that an adjusted empirical distribution of the Whittle likelihood estimators over all possible blocks is essentially an approximation to the limiting distribution of $\hat\alpha_l$ described in the previous theorem. To do that, we look at indicators of Borel sets. For any Borel set $A \subset \mathbb{R}^p$, let us use $F(A)$ to denote the limiting value of $P[c_l(\hat\alpha_l - \alpha_0) \in A]$, where $c_l$ is an appropriate scaling constant (see Remark 1). Analogously, $F_b(A)$ denotes the limiting value of $P[c_b(\hat\alpha_{l,t,b} - \alpha_0) \in A]$ for the subsamples. We consider the following empirical distribution of the subsampled Whittle likelihood estimators:
$$L_{l,b}(A) = \frac{1}{|Q_{l,b}|} \sum_{t \in Q_{l,b}} \mathbf{1}\big\{ c_b (\hat\alpha_{l,t,b} - \hat\alpha_l) \in A \big\}. \tag{4.4}$$
The following theorem proves the consistency of this approximation and helps us determine a confidence set for $\alpha_0$.
Theorem 4.2. Under the assumptions above, $L_{l,b}(A)$ in the above definition converges to $F(A)$ in probability, for each Borel set $A$ whose boundary has mass zero under $F(\cdot)$.
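A sketch of the block subsampling step for $d = 2$ (hypothetical names; `fit` stands for refitting the Whittle estimator on a block, and the scaling constants $c_l$, $c_b$ are left abstract, as in Remark 1):

```python
import numpy as np
from itertools import product

def subsample_draws(X_grid, b, c_b, fit):
    """Centered, scaled block estimates whose empirical law is (4.4).

    X_grid : (l, l) array of regularly spaced observations (d = 2)
    b      : block side length
    c_b    : scaling constant for blocks of side b
    fit    : function mapping a data block to a Whittle estimate
    """
    l = X_grid.shape[0]
    alpha_full = fit(X_grid)                  # full-sample estimate
    draws = []
    for t1, t2 in product(range(l - b + 1), repeat=2):
        block = X_grid[t1:t1 + b, t2:t2 + b]
        draws.append(c_b * (fit(block) - alpha_full))
    return np.array(draws)

# A confidence set for alpha_0 is obtained by inverting empirical
# quantiles of `draws`, which approximate the limit law F.
```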

Non-parametric estimator of spectral density function
In this section, we study nonparametric kernel spectral density estimators of $f$ and their large-sample properties, such as consistency and asymptotic normality. For a symmetric kernel function $K(\cdot)$ and a bandwidth $B_n$, we define the kernel spectral density estimator
$$f_n(\theta) = \frac{1}{n} \sum_{s=1}^{n} \sum_{t=1}^{n} K\Big( \frac{L_s - L_t}{B_n} \Big) X_{L_s} X_{L_t} e^{\imath (L_s - L_t)^\top \theta}. \tag{5.1}$$
For the consistency of the above estimator, we choose the kernel $K$ to satisfy Condition 1. This condition readily implies that $\kappa = \int_{-\infty}^{\infty} K^2(x)\, dx < \infty$, a quantity that will be needed later. A simple choice satisfying the required properties is the rectangular kernel $K(x) = \mathbf{1}_{\{|x| \le 1\}}$. The following theorem asserts the consistency of the nonparametric estimator above.
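A direct $O(n^2)$ implementation of (5.1) might look as follows (a sketch; for a vector lag the rectangular kernel is applied through the max norm, an assumption made here for concreteness):

```python
import numpy as np

def rect_kernel(u):
    """Rectangular kernel I{|u| <= 1}, with |.| taken as the max norm."""
    return 1.0 if np.max(np.abs(u)) <= 1 else 0.0

def kernel_spectral_estimate(X, L, theta, B_n, K=rect_kernel):
    """Kernel spectral density estimator (5.1), direct O(n^2) form."""
    n = len(X)
    theta = np.asarray(theta, dtype=float)
    f_hat = 0.0 + 0.0j
    for s in range(n):
        for t in range(n):
            lag = L[s] - L[t]
            w = K(lag / B_n)
            if w != 0.0:
                f_hat += w * X[s] * X[t] * np.exp(1j * (lag @ theta))
    return f_hat.real / n      # real by symmetry of K and the lag pairs
```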
One can apply Corollary 5.1 to construct confidence intervals for the mean μ based on irregularly spaced data X L1 , . . . , X Ln . Letf n (θ) be defined as f n (θ) in (5.1) with X j therein replaced by X j −X n . Given 0 < α < 1, the (1 − α)th confidence intervals for the mean μ isX n ± z 1−α/2 f n (0)/n. Next, we discuss the asymptotic distribution of the non-parametric spectral density estimator in (5.1). The proof of the theorems are given in the Appendix.
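Using the sketch above, this interval can be computed as follows (again a sketch; `norm.ppf` supplies the standard normal quantile $z_{1-\alpha/2}$):

```python
import numpy as np
from scipy.stats import norm

def mean_confidence_interval(X, L, B_n, alpha=0.05):
    """CI for mu: X_bar +/- z_{1-alpha/2} * sqrt(f_tilde_n(0) / n)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    f0 = kernel_spectral_estimate(X - X.mean(), L,
                                  theta=np.zeros(L.shape[1]), B_n=B_n)
    half = norm.ppf(1 - alpha / 2) * np.sqrt(max(f0, 0.0) / n)
    return X.mean() - half, X.mean() + half
```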

Theorem 5.2. Let Condition 1 be satisfied and define $\omega(\theta) = 2$ if $\theta/\pi \in \mathbb{Z}^d$ and $\omega(\theta) = 1$ otherwise. Then $\sqrt{n/B_n^d}\, \big( f_n(\theta) - E f_n(\theta) \big)$ converges in distribution to $N\big( 0,\ \omega(\theta)\, \kappa\, f_J^2(\theta) \big)$.
Remark 2. An interesting observation is that the variance of the estimate $f_n(\theta)$ is asymptotically equal to $E(f_n(\theta))^2$ multiplied by an appropriate scaling term. Therefore, by a variance stabilizing transformation and the delta method, we can take the logarithm of the estimate to obtain an asymptotically pivotal quantity. This result can be used to form a confidence interval for $E(f_n(\theta))$ of the form $\exp\big( \log f_n(\theta) \pm z_{1-\alpha/2} \sqrt{\kappa B_n^d / n} \big)$.

Estimation of covariance matrices
In spatial statistics, a fundamentally important problem is to estimate the covariance matrix of the data. It is useful in many aspects of multivariate analysis, including principal component analysis, linear discriminant analysis and graphical modeling, and one can infer dependence structures among variables by estimating the associated covariance matrices. In this section, we discuss the estimation of the covariance matrix of irregularly spaced data $(X_{L_1}, \ldots, X_{L_n})$. To judge the quality of a matrix estimate, we use the operator norm.
For an estimate of the covariance matrix, we use the $\ell^2$ operator norm and give an upper bound for it. Recall that the $\ell^2$ norm, or spectral radius, of a matrix $A$ is defined as $\rho(A) = \max_{|x|=1} |Ax|$. Let $\Sigma_n = (\gamma_{L_i - L_j})_{1 \le i,j \le n}$ be the covariance matrix to be estimated. Its natural estimator $\hat\Sigma_n$ is defined by $\hat\Sigma_n = (\hat\gamma_{L_i - L_j})_{1 \le i,j \le n}$, where $\hat\gamma_k$ is the sample covariance at lag $k$ averaged over the $m_k$ available pairs; here $m_k = \#\{(i,j) : L_i - L_j = k\}$. Based on the known inconsistency results for the periodogram estimate, it can be shown that $\hat\Sigma_n$ is not a consistent estimator (Wu and Pourahmadi (2009)). Instead, we use nonparametric kernel-based regularization, similar to what was used for the spectral density. More precisely, we define the following estimator of the covariance matrix:
$$\hat\Sigma_{n,B_n} = \Big( K\Big( \frac{L_i - L_j}{B_n} \Big)\, \hat\gamma_{L_i - L_j} \Big)_{1 \le i,j \le n}, \tag{6.1}$$
where $K$ is a kernel function and $B_n$ is a bandwidth sequence satisfying appropriate conditions, as discussed below. For this banded covariance matrix estimator, we prove the following.

Theorem 6.1. Suppose $\{X_t\}_{t \in \mathbb{Z}^d}$ is a stationary process and each $X_i \in L^p$ for some $p \in (2, 4]$. If $m_k \asymp n$, the kernel $K$ satisfies Condition 1, and the bandwidth $B_n$ satisfies $B_n \to \infty$ and $B_n/n^\delta \to 0$ for some $\delta < (1 - 2/p)/d$, then the estimator $\hat\Sigma_{n,B_n}$ in equation (6.1) is consistent, and the spectral radius $\rho(\hat\Sigma_{n,B_n} - E(\hat\Sigma_{n,B_n}))$ has convergence rate $O_P(B_n^d n^{2/p-1})$.
Remark 3. Note that the above covariance matrix estimator is not necessarily nonnegative definite. If we define the matrix $K_{n,B_n} = \big( K((L_i - L_j)/B_n) \big)_{1 \le i,j \le n}$, then the covariance matrix estimator can be written as $\hat\Sigma_{n,B_n} = \hat\Sigma_n \circ K_{n,B_n}$, where $\circ$ is the Hadamard or Schur product, formed by element-wise multiplication of matrices. By the Schur product theorem, since $\hat\Sigma_n$ is already nonnegative definite, the Schur product will be nonnegative definite whenever $K_{n,B_n}$ is. One particular example is the triangular kernel $K(u) = \max(0, 1 - |u|)$, which leads to a nonnegative definite weight matrix $K_{n,B_n}$. Thus, using this kernel function gives a nonnegative definite covariance matrix estimator $\hat\Sigma_{n,B_n}$.
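A sketch of the banded estimator (6.1) with the triangular kernel of Remark 3 (the Euclidean norm of the lag inside the kernel and the zero-mean raw products in $\hat\gamma_k$ are assumptions made here for concreteness):

```python
import numpy as np

def banded_covariance(X, L, B_n):
    """Banded covariance estimator (6.1): Sigma_hat o K_{n,B_n} (Schur product).

    Assumes the field has mean zero, so gamma_hat_k averages raw products.
    """
    n = len(X)
    X = np.asarray(X, dtype=float)
    # gamma_hat_k: average of X_{L_i} X_{L_j} over the m_k pairs with lag k
    pairs = {}
    for i in range(n):
        for j in range(n):
            pairs.setdefault(tuple(L[i] - L[j]), []).append(X[i] * X[j])
    gamma_hat = {k: np.mean(v) for k, v in pairs.items()}
    Sigma = np.empty((n, n))
    W = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            lag = L[i] - L[j]
            Sigma[i, j] = gamma_hat[tuple(lag)]
            W[i, j] = max(0.0, 1.0 - np.linalg.norm(lag) / B_n)  # triangular kernel
    return Sigma * W       # elementwise (Hadamard/Schur) product
```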

A simulation study
In this section, we assess empirically the impact of the sampling index set $\Gamma_n$ on the parameter estimation procedure. To that end, we consider an isotropic spatial autoregressive (AR) model in a two-dimensional setting, similar to the isotropic spatial AR models discussed by Azomahou (2009) and Lavancier (2011). More precisely, in our spatial AR model each observation depends on its four neighbors through
$$X_{i,j} = c\, (X_{i-1,j} + X_{i+1,j} + X_{i,j-1} + X_{i,j+1}) + \varepsilon_{i,j}, \tag{7.1}$$
where the parameter $c$ gauges the strength of dependence of an observation on its four neighbors, and the innovations $\varepsilon_{i,j}$ are generated independently from a standard normal distribution. Note that this model falls within the general framework (2.1). For more detailed information on such models, see (Cressie, 2015, Chapter 6).
We start with a full $n$-by-$n$ grid from which we generate sample data of different sizes. To generate the full data, we write the observations in vector form $X$, so that the above model becomes $X = AX + e$, and thereby $X = (I - A)^{-1} e$. Here $A$ is an $n^2 \times n^2$ sparse matrix (whose non-zero elements equal the parameter of the model) and $e$ is a vector whose components are iid standard normal. In our simulation study, we generate a dataset on a $75 \times 75$ grid. We then take samples of varying sizes, estimate the parameter $c$ using the Whittle likelihood approach, and calculate the mean squared error (MSE) of the estimates. For each sample size, we also choose a regular grid and compare the performance of the regular setup to that of the irregular setup. The parameters and the setup of the simulation are described below:
• We used the parameter value $c = 0.2$.
• The innovations $\varepsilon_{i,j}$ are generated independently from a $N(0,1)$ distribution.
• The grid consists of $75 \times 75$ points, and we take samples of different sizes (between $10^2$ and $40^2$). These samples are first chosen randomly from the whole grid, to ensure that the data are irregularly spaced; we then choose a regular grid of similar size to compare performance.
• For each sample size, the experiment is repeated multiple times, and the mean squared error of the parameter estimate (relative to the true value $c = 0.2$) is calculated for the regular and irregular data.
For the irregular setup, the Whittle likelihood is defined using the location adjusted spectral density, as in equation (4.2). In addition, we perform the same analysis with the Whittle likelihood defined in the original way (4.1). The results are shown in Table 1 below. They confirm that the estimation performance in the irregular setup is comparable to the regular setup when the sample size exceeds $20^2$, whereas the regular setup outperforms the irregular one for smaller sample sizes. On the other hand, when we run the simulation with the original definition of the Whittle likelihood, the results are not on par with the location adjusted version: the mean squared error is very large for small sample sizes, but it decreases steadily as larger samples are taken. For sample size $40^2$ it is about 0.0081, which indicates that, as more samples are taken, the location adjusted spectral density approaches the true spectral density. This is expected and follows what was discussed in Section 4.
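A sketch of the data-generation step described above (sparse assembly of $A$ and a sparse solve; names are illustrative):

```python
import numpy as np
from scipy.sparse import identity, lil_matrix
from scipy.sparse.linalg import spsolve

def simulate_spatial_ar(n=75, c=0.2, seed=0):
    """Generate X = (I - A)^{-1} e on an n x n grid for model (7.1)."""
    rng = np.random.default_rng(seed)
    N = n * n
    idx = lambda i, j: i * n + j
    A = lil_matrix((N, N))
    for i in range(n):
        for j in range(n):
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ii, jj = i + di, j + dj
                if 0 <= ii < n and 0 <= jj < n:   # neighbors inside the grid
                    A[idx(i, j), idx(ii, jj)] = c
    e = rng.standard_normal(N)
    X = spsolve((identity(N) - A).tocsr(), e)
    return X.reshape(n, n)

# Irregular sample of size m^2 from the 75 x 75 grid, as in the study
X_full = simulate_spatial_ar()
rng = np.random.default_rng(1)
m = 20
flat = rng.choice(75 * 75, size=m * m, replace=False)
L = np.column_stack(np.unravel_index(flat, (75, 75)))  # sampled locations
X_sample = X_full.reshape(-1)[flat]
```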

Proofs of theorems
Proof of Theorem 3.1. It is worth mentioning that this proof is similar to that of Theorem 1 in El Machkouri et al. (2013); for completeness, we give a detailed proof here.
At first, assume that $\liminf_n v_n^2/n > 0$. Then there exist a constant $c_0 > 0$ and $n_0 \in \mathbb{N}$ such that $n/v_n^2 \le c_0$ for any $n \ge n_0$.
Let $(m_n)_{n \ge 1}$ be a sequence of positive integers going to infinity and denote $\tilde X_j = E(X_j \mid \mathcal{F}_{m_n}(j))$, where $\mathcal{F}_{m_n}(j) = \sigma(\varepsilon_{j-s};\ |s| \le m_n)$. So there exists a measurable function $h$ such that $\tilde X_j = h(\varepsilon_{j-s};\ |s| \le m_n)$. Similarly, for the coupled process (see Definition 2.1), we can write $\tilde X_j^* = h(\varepsilon_{j-s}^*;\ |s| \le m_n)$. In order to prove the theorem, we will need the following results.
• Using Lemma 8.1, for any $n \in \mathbb{N}$ and any $p \ge 2$, the approximation error $\|W_n - \tilde W_n\|_p$ is bounded in terms of the tail sum $\Delta_p^{(m_n)} := \sum_{|j| > m_n} \delta_{j,p}$. (8.1)
• If $\Delta_p < \infty$, then for any fixed $p \ge 2$, $\Delta_p^{(m_n)} \to 0$ as $n \to \infty$. To this end, note that since $\lim_{n \to \infty} \delta_{j,p}^{(m_n)} = 0$ for each $j$ and $\sum_{j \in \mathbb{Z}^d} \delta_{j,p} = \Delta_p < \infty$, the dominated convergence theorem yields $\lim_{n \to \infty} \Delta_p^{(m_n)} = 0$.
• Define $\tilde W_n = \sum_{j \in \Gamma_n} c_j \tilde X_j$. Then, using the last two results, we can write $\|W_n - \tilde W_n\|_2 = O(\sqrt{n}\, \Delta_2^{(m_n)})$, and hence, using the assumption mentioned earlier, $(W_n - \tilde W_n)/v_n \to 0$ in $L^2$. (8.2)
• We will also use the following central limit theorem, due to Heinrich (1988).
Theorem 8.1. Let $(\Gamma_n)_{n \ge 1}$ be a sequence of finite subsets of $\mathbb{Z}^d$ with $|\Gamma_n| \to \infty$ as $n \to \infty$, and let $(m_n)_{n \ge 1}$ be a sequence of positive integers. For each $n \ge 1$, let $\{U_n(j),\ j \in \mathbb{Z}^d\}$ be an $m_n$-dependent random field with $E(U_n(j)) = 0$ for all $j \in \mathbb{Z}^d$. Also assume that $E\big( \sum_{j \in \mathbb{Z}^d} U_n(j) \big)^2 \to \sigma^2$ as $n \to \infty$, where $\sigma^2$ is finite. Then $\sum_{j \in \mathbb{Z}^d} U_n(j)$ converges in distribution to $N(0, \sigma^2)$ if there exists a finite constant $c > 0$ such that $\sum_{j \in \mathbb{Z}^d} E(U_n^2(j)) \le c$ for any $n \ge 1$, and if for any $\epsilon > 0$ the Lindeberg-type term $L_n(\epsilon)$ of Heinrich (1988) tends to 0.

We now apply the above results to prove our theorem. First, define $U_n(j) := c_j \tilde X_j / v_n$, where $c_j$ is the same as the coefficient of $X_j$ in $W_n$. This definition satisfies the criteria that $\{U_n(j),\ j \in \mathbb{Z}^d\}$ is an $m_n$-dependent random field with $E(U_n(j)) = 0$ for all $j \in \mathbb{Z}^d$. Further, using our assumptions, the approximation error satisfies $\|W_n - \tilde W_n\|_2^2 / v_n^2 = O\big( \Delta_2^{(m_n)} \Delta_2\, (n/v_n^2) \big) \to 0$, and hence, as $n \to \infty$, $E\big( \sum_{j \in \mathbb{Z}^d} U_n(j) \big)^2 \to 1$. On the other hand, $\sum_{j \in \mathbb{Z}^d} E(U_n^2(j))$ is uniformly bounded. Finally, let $\epsilon > 0$ be fixed. Since $|c_j| \le 1$, the Lindeberg term can be controlled, and in order to ensure that $L_n(\epsilon) \to 0$, we define the sequence $(m_n)_{n \ge 1}$ to grow to infinity slowly enough. Hence, applying Theorem 8.1 and using (8.2), we derive that $W_n/v_n$ converges in distribution to $N(0,1)$, which proves the required result (3.3) under the assumption $\liminf_n v_n^2/n > 0$.

If this condition fails, we can extract a subsequence $n' \to \infty$ such that (8.4) holds for some $l \in [0, \infty]$. Furthermore, if $v_n^2/|\Gamma_n|$ does not converge to 0, we can extract a further subsequence $n''$ along which $\liminf_{n''} (v_{n''}^2/|\Gamma_{n''}|) > 0$, implying asymptotic normality along that subsequence; this can be shown using the method adopted previously, and it contradicts (8.4). Consequently, $v_n^2/|\Gamma_n|$ converges to 0, and thus $W_n/\sqrt{|\Gamma_n|}$ converges to 0 in probability, implying that the Levy distance between $W_n/\sqrt{|\Gamma_n|}$ and $N(0, v_n^2/|\Gamma_n|)$ goes to 0, again contradicting (8.4). This completes the proof of the first part of Theorem 3.1; the second part is a corollary that follows easily from the first part.

Proof of Theorem 4.1, part (a).
So, for any $\alpha \neq \alpha_0$, we get
$$p(\alpha) - p(\alpha_0) = \int_D \Big\{ \log \frac{f_{J,\alpha}(\theta)}{f_{J,\alpha_0}(\theta)} + \frac{f_{J,\alpha_0}(\theta)}{f_{J,\alpha}(\theta)} - 1 \Big\}\, d\theta > 0,$$
because the integrand is 0 for $\alpha = \alpha_0$ and otherwise always positive. Observe that $p_n(\alpha) \to p(\alpha)$ in probability, in view of Lemma 8.4 and the assumptions mentioned in Section 4. Thus, there exists a positive constant $C_\alpha$ such that
$$\lim_{n \to \infty} P\big[\, p_n(\alpha) - p_n(\alpha_0) > C_\alpha \,\big] = 1. \tag{8.7}$$
Now, consider any two $\alpha_1, \alpha_2$ such that $\|\alpha_1 - \alpha_2\| < \delta$ for some small positive constant $\delta$, fixed a priori. Then, using the continuity of the functions involved, we get
$$|p_n(\alpha_1) - p_n(\alpha_2)| \le \delta \Big( K_1 + K_2 \int_D I_n(\theta)\, d\theta \Big), \tag{8.8}$$
where $K_1, K_2$ are constants. Let us denote the above bound by $K_n(\delta)$ and observe that one can always find $\delta$ such that
$$\lim_{n \to \infty} P\big[\, K_n(\delta) < C_\alpha \,\big] = 1. \tag{8.9}$$
Now, for any particular $\alpha_1$, consider all points $\alpha_2$ such that $\|\alpha_1 - \alpha_2\| < \delta$. Then, using (8.8) and (8.9), we can say that for large $n$, $|p_n(\alpha_1) - p_n(\alpha_2)| \le C_{\alpha_1}$ with probability going to 1. Combining this with (8.7) for $\alpha = \alpha_1$, we get that for any $\alpha_1$,
$$\lim_{n \to \infty} P\Big[ \inf_{\alpha_2 :\, \|\alpha_1 - \alpha_2\| < \delta} p_n(\alpha_2) > p_n(\alpha_0) \Big] = 1. \tag{8.10}$$
For an $\alpha_1$, let us denote the sets of the form $\{\alpha_2 : \|\alpha_1 - \alpha_2\| < \delta\}$ by $S(\alpha_1)$. To remain consistent with our notation, for $\alpha_0$ let us consider $S(\alpha_0) = \{\alpha_2 : \|\alpha_0 - \alpha_2\| < \gamma\}$, and observe that the above result is true for any possible value of $\gamma$. Now, for the whole parameter space $A$, the collection of subsets $\{S(\alpha) : \alpha \in A\}$ forms an open cover, and since $A$ is compact, it has a finite subcover, say $\{S(\alpha_i) : i = 0, 1, 2, \ldots, m$, with $\alpha_i \in A$ for all $i\}$. Since $A = \cup_{i=0}^{m} S(\alpha_i)$, we get from (8.10) that as $n \to \infty$, $p_n(\alpha_0) < \inf_{A \setminus S(\alpha_0)} p_n(\alpha)$ with probability going to 1. Therefore, as $n \to \infty$, the minimizer will always lie in $S(\alpha_0)$, and since we can fix $\gamma$ as small as we like, we can say that $\lim_{n \to \infty} P\big[ |\hat\alpha_n - \alpha_0| < \gamma \big] = 1$, so that $\hat\alpha_n \stackrel{P}{\longrightarrow} \alpha_0$. Hence, the Whittle likelihood estimate is consistent.

Proof of Theorem 4.1, part (b).
Since $\hat\alpha_n$ is the minimizer of the Whittle likelihood, we start with the Taylor series expansion
$$0 = \nabla p_n(\hat\alpha_n) = \nabla p_n(\alpha_0) + \nabla^2 p_n(\bar\alpha_n)\, (\hat\alpha_n - \alpha_0),$$
where $\bar\alpha_n$ lies between $\hat\alpha_n$ and $\alpha_0$, and $\nabla g$ and $\nabla^2 g$ denote the first-order derivative and the Hessian of a function $g$. We consider the two terms on the right-hand side separately.

For the second term, let $h_{\alpha_0}(\theta) = \nabla \{f_{J,\alpha_0}(\theta)\}^{-1}$; note that this is non-stochastic, but depends on $n$ and the true value of $\alpha$. In order to find the asymptotic distribution of $\nabla p_n(\alpha_0)$, we make use of Lemma 8.3. Observe that $I_n(\theta)$ can be written as a quadratic form in the data, so the stochastic term in the integral defining $\nabla p_n(\alpha_0)$ can be written as a quadratic form $P_n$ with coefficients $\beta_{n,j-k}$. Consider the asymptotic distribution of $c^\top P_n$ for some real-valued vector $c \in \mathbb{R}^p$. Note that the coefficients $c^\top \beta_{n,j-k}$ do not depend on $\theta$. Hence, comparing this expression with that of Lemma 8.3, we can set $b_{n,j} = a_{n,j} = c^\top \beta_{n,j}$ and fix $\theta = 0$ in that lemma. Based on our assumptions, the resulting integrals are finite and depend only on $\alpha_0$ (through $h_{\alpha_0}(\cdot)$). Let us denote the matrix in the middle of the resulting expression by $n\Sigma_n^2$. Using the assumptions in Section 4, one can easily check that (8.27) and the other conditions required to apply the lemma are satisfied. Applying it, and since the conclusion holds for every $c$, we conclude that $L\big( nP_n(\theta) - nE[P_n(\theta)],\ N(0,\, f_{J,\alpha_0}^2(0) \cdot 2n\Sigma_n^2) \big) \to 0$, where $L(\cdot, \cdot)$ denotes the Levy distance. Combining this with the expression obtained for $\partial p_n(\alpha_0)/\partial \alpha$ gives the asymptotic distribution of $\nabla p_n(\alpha_0)$.

For the first term, we need the second-order derivative of the integrand in the expression of the Whittle likelihood. We have already proved that $\hat\alpha_n$ is consistent for $\alpha_0$; using this, along with the assumptions in Section 4 and Lemma 8.4, the remainder in $\nabla^2 p_n(\bar\alpha_n)$ goes to 0 as $n \to \infty$, and hence $\nabla^2 p_n(\bar\alpha_n)$ converges in probability to its deterministic limit. Finally, the central limit theorem follows from (8.13), (8.15) and (8.16).

Proof of Theorem 4.2.
We first set some notation. Define the following empirical distribution for the $\hat\alpha_{l,t,b}$, centered at the true value:
$$L^0_{l,b}(A) = \frac{1}{|Q_{l,b}|} \sum_{t \in Q_{l,b}} \mathbf{1}\big\{ c_b (\hat\alpha_{l,t,b} - \alpha_0) \in A \big\}.$$
Because of the assumptions on $b$, we can easily say that $P(E_{l,b,\epsilon}) \to 1$ as $n \to \infty$ for any $\epsilon > 0$, where $E_{l,b,\epsilon}$ denotes the event that the centering error $c_b |\hat\alpha_l - \alpha_0|$ is at most $\epsilon$. Hence, with probability tending to one,
$$L^0_{l,b}(A^{-\epsilon}) \le L_{l,b}(A) \le L^0_{l,b}(A^{+\epsilon}),$$
where $A^{\pm\epsilon}$ denote the $\epsilon$-contraction and $\epsilon$-enlargement of $A$. Now, if we can prove that $L^0_{l,b}(A) \to F(A)$ for any Borel set $A$ whose boundary has measure zero under $F(\cdot)$, then we get that $F(A^{-\epsilon})$ and $F(A^{+\epsilon})$ bound the limit of $L_{l,b}(A)$. Letting $\epsilon \to 0$ along values such that $A^{\pm\epsilon}$ are Borel sets whose boundaries have measure zero under $F(\cdot)$, we can then conclude that $L_{l,b}(A)$ is a good approximation for $F(A)$.
In order to show that $L^0_{l,b}(A) \to F(A)$ in probability, note that $E[L^0_{l,b}(A)] \to F(A)$; thus we only have to show that $\mathrm{Var}(L^0_{l,b}(A)) \to 0$ as $n \to \infty$.
We use an $m$-dependence approximation to prove the required result. Here we deal with the case $d = 2$ for convenience, but the proof holds for any dimension. Write $(i_1, i_2)$ or $(j_1, j_2)$ for the two-dimensional indices $i, j$. Define the sigma-field $\mathcal{F}_{i_1 - m, i_1} = \sigma(\varepsilon_j : j \in \mathbb{Z}^d,\ i_1 - m \le j_1 \le i_1)$ and write $\mathcal{F}_{i_1}$ for $\mathcal{F}_{-\infty, i_1}$. The $m$-dependence approximation, for $m \ge 0$, is then defined by
$$\tilde X_i = E(X_i \mid \mathcal{F}_{i_1 - m, i_1}). \tag{8.18}$$
Now define $\tilde L^0_{l,b}(A)$ analogously to $L^0_{l,b}(A)$, where $\tilde\alpha_{l,t,b}$ is the Whittle likelihood estimator based on the $\tilde X_t$'s. Below, for notational convenience, we split the variance into two parts, $A_1$ and $A_2$, according to whether the two blocks involved are close together or far apart. It is easy to note that $A_1 = O(m^{1/d} q^{-1})$, so it goes to 0 as $n \to \infty$ for any fixed $m$. On the other hand, $A_2$ is exactly equal to 0, since for any $s$, $\tau_{s-t} = 0$ for all $t \in R^c_{s,m}$. The theorem is then proved in view of the fact that $\mathrm{Var}(L^0_{n,b}(A)) = \mathrm{Var}(\tilde L_{n,b}(A)) + o(1)$.

Proof of Theorem 5.1. This proof revolves around the notion of $m$-dependent processes, as defined by (8.18). Observe that, in this theorem, we are dealing with the quantity $\sum_{t=1}^{n} \sum_{s=1}^{n} a_{n, L_s - L_t} X_{L_s} X_{L_t}$, where $a_{n,r} = K(r/B_n) e^{\imath r^\top \theta}$.
Based on the assumptions, it is easy to observe that $\sum_{r \in \mathbb{Z}^d} |a_{n,r}|^2 = O(B_n)$. Since we intend to approximate using the $m$-dependent process, we consider the analogous quantity defined with the $\tilde X_{L_t}$'s. So define $Y_t = \tilde X_{L_t} \sum_{s=1}^{t-k} a_{n, L_s - L_t} \tilde X_{L_s}$, and decompose the quadratic form accordingly. (8.20) The idea is to prove the required result in three steps. First, we consider the last term in the above expression. Observe that the sequence $\{Y_{t+rk}\}_{r \ge 0}$ consists of $L^p$ martingale differences whenever $|k| > m$; for convenience, take $|k| = 2m$. Using Lemma 8.2, we can then write the last term in (8.20) using the sequences of martingale differences and bound it accordingly (here $N_j$ denotes the maximum possible index in that sequence, which is of order $n$). Let $\tilde\gamma_k = E(\tilde X_0 \tilde X_k)$ and denote the last term in (8.20) by $W_n/n$. We now consider the first two terms together, in view of the fact that for any $l$, $\| n^{-1} \sum_t \tilde X_t \tilde X_{t+l} - \tilde\gamma_l \|_{p/2} \to 0$ as $n \to \infty$. Observe that in the first two terms we are essentially combining all terms of the form $\tilde X_{L_t} \tilde X_{L_s}$ with $|s - t| \le k$. Combining this with the bound $\|W_n/n\|_p = O\big[ (m B_n / n)^{1/2} \big]$ and our assumption that $B_n/n \to 0$ as $n \to \infty$, we conclude that the approximation error is negligible. Now define $S_n(\theta) = \sum_{j=1}^{n} X_{L_j} e^{\imath L_j^\top \theta}$, and let $\tilde S_n(\theta)$ be defined analogously with the $\tilde X_{L_j}$'s.
So, from the assumptions and using Lemma 8.2, we can say that $\| S_n(\theta) - \tilde S_n(\theta) \|_p = \Theta_{m,p}\, O(n^{1/2})$ and $\| S_n(\theta) \|_p + \| \tilde S_n(\theta) \|_p = O(n^{1/2})$. Now, if $\hat K$ is the Fourier transform of $K$, we can write $f_n(\theta)$ with the help of $S_n$ in the form
$$n f_n(\theta) = \int \hat K(u)\, \big| S_n(B_n^{-1} u + \theta) \big|^2\, du,$$
and we use a similar definition for $\tilde f_n(\theta)$ based on $\tilde S_n$. The following bound can then be obtained.
We already have (8.21) for the second term. Now, since $\Theta_{m,p} \to 0$ as $m \to \infty$, from (8.22) we can say that both the first and the last terms on the right-hand side of the above inequality go to 0, which completes the proof.
Proof of Theorem 5.2. This theorem is a particular case of the general quadratic forms of stationary random fields, and we make use of Lemma 8.3 to prove it. Note that $n f_n(\theta)$ is of the form $S_n$ in that lemma with $b_{n,j} = K(j/B_n)$. Using the given conditions, one can show that the conditions of the lemma are satisfied, and hence the asserted asymptotic normality holds. Then, in view of the fact that $E(f_n(\theta)) = f_J(\theta) + o(1)$, Theorem 5.1 and a simple application of Slutsky's theorem complete the proof.
Proof of Theorem 6.1. First, we define a new covariance matrix estimator using the actual (unknown) mean of the process. Suppose each $X_i$ has mean $\mu$, and define a new banded covariance matrix estimator $\hat\Sigma^0_{n,B_n}$ by replacing $\bar X_n$ with $\mu$ in the construction of $\hat\Sigma_{n,B_n}$. The proof mainly uses the Gershgorin circle theorem, which states that every eigenvalue of a complex $n \times n$ matrix $A$ lies within at least one of the Gershgorin discs $D_i\big( a_{ii}, \sum_{j \neq i} |a_{ij}| \big)$, where the $a_{ij}$ are the elements of the matrix $A$. Thus, if $\lambda_i$, for $i = 1, \ldots, n$, denote the eigenvalues of $A$, then the spectral radius satisfies
$$\rho(A) = \max_i |\lambda_i| \le \max_{1 \le i \le n} \sum_{j=1}^{n} |a_{ij}|.$$
Let us now consider the spectral radius of the matrix $\hat\Sigma^0_{n,B_n} - E(\hat\Sigma^0_{n,B_n})$; its $(i,j)$-th element is $K\big( (L_i - L_j)/B_n \big) \big( \hat\gamma^0_{L_i - L_j} - E\,\hat\gamma^0_{L_i - L_j} \big)$. Below, we use $J_n$ and $J_{n,B_n}$ to denote the following two sets: $J_n = \{ k : \text{there exist } 1 \le i, j \le n \text{ satisfying } L_i - L_j = k \}$ and $J_{n,B_n} = \{ k : |k| \le B_n \text{ and there exist } 1 \le i, j \le n \text{ satisfying } L_i - L_j = k \}$.
Also, let us denote $\bar J_{n,B_n} = J_n \setminus J_{n,B_n}$. Then, using the above-mentioned property of the spectral radius, $\rho\big( \hat\Sigma^0_{n,B_n} - E(\hat\Sigma^0_{n,B_n}) \big)$ can be bounded by a maximal row sum of terms of the form $|\hat\gamma^0_k - E\,\hat\gamma^0_k|$ over $k \in J_{n,B_n}$. Note that if we relabel $X_i - \mu$ as $Y_i$, the above expression essentially deals with the autocovariance function and its estimate for a zero-mean stationary process. Thus, if each $X_i \in L^p$ for some $p \in (2, 4]$, then based on the results obtained by Wu and Pourahmadi (2009) and using the assumption $m_k \asymp n$, the above bound is of order $O_P(B_n^d n^{2/p-1} \Theta_{0,p}^2)$. If we further assume that the process is $p$-stable (see Definition 2.2), so that $\Theta_{0,p}$ is finite, the bound is of order $O_P(B_n^d n^{2/p-1})$. Now, for the covariance matrix estimator $\hat\Sigma_{n,B_n}$, we can write
$$\rho\big( \hat\Sigma_{n,B_n} - E(\hat\Sigma_{n,B_n}) \big) \le \rho\big( \hat\Sigma_{n,B_n} - \hat\Sigma^0_{n,B_n} \big) + \rho\big( \hat\Sigma^0_{n,B_n} - E(\hat\Sigma^0_{n,B_n}) \big) + \rho\big( E(\hat\Sigma_{n,B_n}) - E(\hat\Sigma^0_{n,B_n}) \big).$$
We have already obtained the rate for the second term; for the first and third terms we follow a similar procedure. For the first term, using the Gershgorin circle theorem, we again get a bound given by a sum of terms of the form $|\hat\gamma_k - \hat\gamma^0_k|$ over the set $J_{n,B_n}$. Since $\hat\gamma_k - \hat\gamma^0_k$ is $O_P(n^{-1})$, this bound is of order $O_P(B_n^d n^{-1})$, which is smaller than the bound obtained for the second term. Similar results hold for the third term.
Clearly, the overall bound for the spectral radius of $\hat\Sigma_{n,B_n} - E(\hat\Sigma_{n,B_n})$ is $O_P(B_n^d n^{2/p-1})$.

Proofs of lemmas
Lemma 8.1 (due to El Machkouri et al. (2013)). Following the aforementioned notation, consider $\Gamma_n$ and let $(\alpha_i)_{i \in \Gamma_n}$ be a family of real numbers. Then, for any $p \ge 2$,
$$\Big\| \sum_{i \in \Gamma_n} \alpha_i X_i \Big\|_p \le \Big( 2p \sum_{i \in \Gamma_n} \alpha_i^2 \Big)^{1/2} \Delta_p.$$

Lemma 8.2. Let $X_i \in L^p$ for $p > 1$, $E(X_k) = 0$, $\alpha_1, \alpha_2, \ldots \in \mathbb{C}$, $p' = \min\{2, p\}$, $A_n = \big( \sum_{k=1}^{n} |\alpha_{L_k}|^{p'} \big)^{1/p'}$ and $C_p = 18 p^{3/2} (p-1)^{-1/2}$. Then $\big\| \sum_{k=1}^{n} \alpha_{L_k} X_{L_k} \big\|_p \le C_p A_n \Delta_p$; analogous bounds hold for the $m$-dependence approximation $\tilde X$ and for the difference $X - \tilde X$, with $\Delta_p$ replaced by the corresponding tail sum.

Proof. Let $\tau : \mathbb{Z} \to \mathbb{Z}^d$ be a bijection. For any $i \in \mathbb{Z}$ and any $j \in \mathbb{Z}^d$, define the projection operator $\mathcal{P}_i X_j := E(X_j \mid \mathcal{F}_i) - E(X_j \mid \mathcal{F}_{i-1})$, where $\mathcal{F}_i = \sigma(\varepsilon_{\tau(l)};\ l \le i)$. Also define $T_j \mathcal{F}_i = \sigma(\varepsilon_{\tau(l)-j};\ l \le i)$. It is easy to note that $\| \mathcal{P}_i X_j \|_p \le \| X_{j - \tau(i)} - X^*_{j - \tau(i)} \|_p$, and hence we get the inequality $\| \mathcal{P}_i X_j \|_p \le \delta_{j - \tau(i), p}$. Then, noting that $X_i = \sum_{j \in \mathbb{Z}} \mathcal{P}_j X_i$ for all $i \in \mathbb{Z}^d$, we can make use of the Burkholder inequality and the Cauchy-Schwarz inequality to get the first two results; the proof here is similar to that of Proposition 1 in El Machkouri et al. (2013). The third result is a direct consequence of the first two.

Lemma 8.3. Let $\beta_j \in \mathbb{R}$ with $\beta_j = \beta_{-j}$, and $\alpha_j = \beta_j e^{\imath j^\top \theta}$, where $\theta \in [-\pi, \pi]^d$. Also, for any $\theta$, define $\omega(\theta) = 2$ if $\theta/\pi \in \mathbb{Z}^d$, and $\omega(\theta) = 1$ otherwise. Consider the quadratic form $S_n = \sum_{1 \le s, t \le n} \alpha_{L_s - L_t} X_{L_s} X_{L_t}$, where we assume that $E(X_0) = 0$, $X_0 \in L^4$ and $\Theta_{0,4} < \infty$. In addition, let $\zeta_n^2 = \sum_{1 \le t \le n} \beta_{L_t}^2$, and assume that $\max_{1 \le t \le n} \beta_{L_t}^2 = o(\zeta_n^2)$ and $n^d \zeta_n^2 = O(\sigma_n^2)$. Now, $n \tilde S_n$ is obtained using the $m$-dependent approximations, and so, for fixed $m$, the variance of $\tilde S_n$ goes to 0. Combining the above, it is straightforward to show that the variance of $\int_D I_n(\theta) f(\theta)\, d\theta$ (which is the same as that of $S_n$) goes to 0 as $n \to \infty$.