Needlet-Whittle Estimates on the Unit Sphere

We study the asymptotic behaviour of needlets-based approximate maximum likelihood estimators for the spectral parameters of Gaussian and isotropic spherical random fields. We prove consistency and asymptotic Gaussianity, in the high-frequency limit, thus generalizing earlier results by Durastanti et al. (2011) based upon standard Fourier analysis on the sphere. The asymptotic results are then illustrated by an extensive Monte Carlo study.

the spectral parameters (e.g., the spectral index ) of isotropic Gaussian random fields defined on the unit sphere S 2 . Under Gaussianity, it was indeed possible to establish consistency and a central limit theorem allowing for feasible inference, under broad conditions on the behaviour of the angular power spectrum. These results, as many others concerning statistical inference on spherical random fields, have recently found strong motivations arising from applications, especially in a Cosmological framework (see for instance [40] and the references therein). From the technical point of view, the asymptotic framework is rather different from usual, as it is based on observations collected at higher and higher frequencies on a fixed-domain (the unit sphere): even consistency of the angular power-spectrum estimator under such circumstances becomes non-trivial, and largely open for research, see also [39].
The procedure considered in [13] is based upon spherical harmonics and classical Fourier analysis on the sphere. Despite its appealing properties, it must be stressed that in many practical circumstances spherical harmonics may suffer serious drawbacks, due to their lack of localization in real space. This can make their implementation infeasible, due to the presence of unobserved regions on the sphere (as is commonly the case for Cosmological applications), and it may exclude the possibility of separate estimation on different hemispheres, as considered for instance by [6], [51]. In view of these issues, it is natural to investigate the possibility to extend Whittle-like procedures to a spherical wavelet framework, so as to exploit the double-localization properties (in real and harmonic space) of such constructions. This is the purpose of this paper, where we shall focus in particular on spherical needlets.
As mentioned earlier, the asymptotic framework we are considering here is rather different from usual: we assume we are observing a single realization of an isotropic field, the asymptotics being implemented with respect to the higher and higher resolution level data becoming available. In view of this, our paper is to some extent related to the growing area of fixed-domain asymptotics (see for instance [3], [24], [37], [55]); on the other hand, as for [13] some of the techniques exploited here are close to those adopted by [52], where semiparametric estimates of the long memory parameter for covariance stationary processes are analyzed. In terms of angular power spectrum behaviour, we shall also allow for semiparametric models where only the high-frequency/small-scale behaviour of the random field is constrained. In particular, we consider both full-band and narrow-band estimates, the latter entailing a slower rate of convergence but allowing for unbiased estimation under more general circumstances.
The plan of the paper is as follows: in Section 2, we will recall briefly some well-known background material on needlet analysis for spherical isotropic random fields; in Section 3 we will introduce and motivate the Whittle-like minimum contrast estimators. In Section 4 we shall establish the asymptotic properties of these estimators, in particular weak consistency and Gaussianity, while in Section 5 we present results on narrow band estimates. Some Monte Carlo evidence on performance and comparisons with the procedures in [13] are collected in Section 6, while some auxiliary technical results are collected in the Appendix.

Spherical Random Fields and Angular Power Spectrum
It is a well-known fact in Fourier analysis that the set of spherical harmonics {Y lm : l ≥ 0, m = −l, ..., l} represents an orthonormal basis for the space L 2 S 2 , the class of square-integrable functions on the unit sphere (see for instance [2], [56], [26], [40], for more details, and [36], [38] for extensions). Spherical harmonics are defined as the eigenfunctions of the spherical Laplacian ∆ S 2 , e.g. ∆ S 2 Y lm = −l(l + 1)Y lm , see again [56], [57] and [40] for more discussion and analytic expressions. The spherical needlets ( [45], [46]) are defined as where {ξ jk } is a set of cubature points on the sphere, indexed by j, the resolution level index, and k, the cardinality of the point over the fixed resolution level, while λ jk > 0 is the weight associated to any ξ jk (see also e.g. [7]) and [40]). Let N j denote the number of cubature points for a given level j; as discussed by ( [45], [46]), cubature points and weights can be chosen to satisfy where by a ≈ b, we mean that there exists c 1 , c 2 > 0 such that c 1 a ≤ b ≤ c 2 a. In the sequel, for notational simplicity we shall assume that there exists a positive constant c B such that N j = c B B 2j , for all scales j. In practice, cubature points and weights can be identified with those evaluated by common packages such as HealPix (see for instance [5], [11], [23]). Viewing L l ( x, y ) = l m=−l Y lm (x) Y lm (y) as a projection operator, (1) can be considered a weighted convolution with a weight function b (·) , chosen so that the following properties holds (see [45], [46]): for fixed B > 1, b (·) has compact support in B −1 , B and therefore b l B j has compact support in l ∈ B j−1 , B j+1 ; this implies that needlets have compact support in the harmonic domain. Moreover, b (·) ∈ C ∞ (0, ∞), which is pivotal to prove the following quasi-exponential localization property (see [45]): for any M = 1, 2, ... there exists c M > 0 such that for any x ∈ S 2 , Finally, we have the so-called partition of unity property: for l > B which allows the establishment of the following reconstruction formula (see again [45]): for f ∈ L 2 S 2 , we have, in the L 2 sense: where Consider now a zero-mean, isotropic Gaussian random field T : S 2 × Ω → R; we recall also that for every g ∈ SO (3) and x ∈ S 2 , a field T (·) is isotropic if and only if where the equality holds in the sense of processes. It is again a standard fact (see e.g. [26], [40]) that the following spectral representation holds: Note that this equality holds in both the L 2 S 2 × Ω, dx ⊗ P and L 2 (P) senses for every fixed x ∈ S 2 . For an isotropic Gaussian field, the spherical harmonics coefficients a lm are Gaussian complex random variables such that where the angular power spectrum {C l , l = 1, 2, 3, ...} fully characterizes the dependence structure under Gaussianity. Characterizations of the spherical harmonics coefficients under Gaussianity and isotropy are discussed for instance by [4], [40]; here we simply recall that: Hence, given a realization of the random field, an estimator of the angular power spectrum can be defined as: the so-called empirical angular power spectrum. It is immediately observed that Now recall that the needlet coefficients can be written as where Remark 1 It should be noted that in this paper we consider as observations directly the needlet coefficients, rather than their measurements on actual data. While this is clearly a simplifying assumption, we believe it can be heuristically justified, at least as a first order approximation, as follows. The data collection procedure for astrophysical experiments can be described as consisting of continuous averages around pointing directions, obtained as Heuristically, the kernel K(., .) represents the spatial effect of the measuring antenna; in the astrophysical literature, it is usually labelled a beam function, which we take to be radially symmetric, so that it can be expanded in terms of Legendre polynomials as Standard computations then show that the observed spherical harmonic coefficients are related to the intrinsic ones by the simple modulation factor a obs l,m = h l a l,m . For our purposes, the factor h l can be incorporated in the asymptotic behaviour of the angular power spectrum C l , for which we allow some flexibility, see Conditions 1-4 below.
A more realistic experimental set-up would allow also for the presence of masked or unobserved regions, e.g. we can consider the observed field where M (x) is the mask function, e.g. the indicator function of the set where observations are actually collected. However, this more general setting does not really pose extra-difficulties; indeed, defining it was proven in [6] that for all coefficients outside an arbitrarily small neighbourhood around the masked regions. Heuristically, this result is stating that for coefficients centred outside the masked regions, the presence of missing observations is asymptotically negligible. This is indeed a major advantage of the needlet analysis, when compared to standard spherical harmonics transforms.
As in [5], we introduce the following regularity condition on the angular power spectrum: Condition 1 (Regularity) The random field T (x) is Gaussian and isotropic with angular power spectrum such that: where c −1 0 ≤ G(l) ≤ c 0 , α 0 > 2, for all l ∈ N ,and for every r ∈ N there exist c r > 0 such that: for u ∈ (0, ∞) .
Condition 1 requires some form of regular variation on the tail behaviour of the angular power spectrum C l . For instance, in the CMB framework the so-called Sachs-Wolfe power spectrum (i.e. the leading model for fluctuations of the primordial gravitational potential) takes the form (7), the spectral index α 0 capturing the scale invariance properties of the field itself (α 0 is expected to be close to 2 from theoretical considerations, a prediction so far in good agreement with observations, see for instance [10] and [34]). In particular, this Condition will be necessary to prove needlet coefficients (3) to be asymptotically uncorrelated (see [5]). For asymptotic results below, we shall need to strengthen Condition 1 as in [13], imposing in particular Condition 2 Condition 1 holds, and moreover As we shall show, Condition 2 is sufficient to establish consistency for the estimator we are going to define. We shall also consider two further assumptions, 3 (which implies 2), to derive asymptotic Gaussianity, and 4 (which implies 3) to provide a centered limiting distribution, see also [13] for related assumptions.
Under Condition 1 we have: more details can be found the Appendix, Proposition 13. As mentioned before, in [5] it is proven, in view of (8), that: Lemma 2 Under Condition 1, there exists M > 0 such that: As a direct consequence of this lemma, needlets coefficients at any finite distance are asymptotically uncorrelated, and hence asymptotically independent in the Gaussian case.
Following (6) and [45], we have easily that: where, by equation (5): The following Lemma provides the asymptotic behaviour of V ar k β 2 jk .
Lemma 3 Under Condition 7, we have For the sake of notational simplicity, in the sequel we shall write σ 2 , τ 2 + , τ 2 − (omitting the dependence on α 0 , B) whenever this does not entail any risk of confusion.
Proof. Simple calculations based on (5) lead to: and, for j 1 < j 2 , Finally, we have: as claimed.

A Needlet Whittle-like approximation to the likelihood function
Our aim in this Section is to discuss heuristically a needlet Whittle-like approximation for the log-likelihood of isotropic spherical Gaussian fields, and to derive the corresponding estimator. We start from the assumption that needlet coefficients can be evaluated exactly, i.e. without observational or numerical error, up to resolution level J L . This is clearly a simplified picture, analogous to what we assumed in [13] for the case of spherical harmonic coefficients; however in the wavelet case the assumption can be considered much more realistic. Indeed, it is shown for instance in [5] that the effect of masked or unobserved regions is asymptotically negligible, in view of the localization properties of the needlet transform. Hence we believe our results provide a useful guidance also for realistic experimental situations. Needless to say, the maximal observed scale J L grows larger and larger when more sophisticated experiments are undertaken: indeed J L is a monotonically increasing function of the maximal observed multipole L. The latter is for instance in the order of 500/600 for data collected from W M AP and 1500/2000 for those from P lanck. In terms of our following discussion, it is harmless to envisage that B J L +1 = L. The analysis of frequency-domain approximate maximum likelihood estimators based on spherical harmonics is described in [13], while narrow-band, wavelet-based maximum likelihood estimators over R can be found in [44].
To motivate heuristically our objective function, consider the vector of coef- Under the hypothesis of isotropy and Gaussianity for T , we have that In view of Lemma 2 and equation (2), it is to some extent natural to consider the approximation where I Nj denotes the N j × N j identity matrix. We stress, however, that the present argument is merely heuristic -indeed, for instance, elements on the first diagonal do not converge to zero. The approximation however motivates the introduction of the pseudo-likelihood function: and the corresponding log-likelihood as: , up to an additive constant. The full (pseudo-)likelihood is obtained by combining together all scales j, so that .
Let us now introduce the following: Our objective function will hence be written compactly as: More precisely, in view of Condition 2 and the discussion in the previous Section, the following Definition seems rather natural: Remark 2 To ensure that the estimator exists, as usual we shall assume throughout this paper that the parameter space is a compact subset of R 2 ; more precisely we take α ∈ A = [a 1 , a 2 ] , 2 < a 1 < a 2 < ∞, and G ∈ Γ = [γ 1 , γ 2 ] , 0 < γ 1 < γ 2 < ∞. This is little more than a formal requirement that is standard in the literature on (pseudo-)maximum likelihood estimation.
We can rewrite in a more transparent form the previous estimator following an argument analogous to [52], i.e. "concentrating out" the parameter G. Indeed, the previous minimization problem is equivalent to consider It is readily seen that: and because Hence G (α) maximizes R J L (G, α) for any given value of α. It remains to compute α = arg min where

Asymptotic properties
In this Section we investigate the asymptotic properties of the estimators α J L and G J L . We start by studying their asymptotic consistency: in order to achieve this result, we will apply a technique developed by [8] and [52], see also [13].
Theorem 4 Under Condition 2, as J L → ∞ we have: Proof. Let us write: It is easy to see that: The proof is then completed with the aid of the auxiliary Lemmas 7, 8 which we shall discuss below. In particular The previous probability is bounded by, for any δ > 0 Pr inf For α 0 −α = 2 or α 0 −α > 2 the same result is obtained by dividing ∆R J L (α, α 0 ) by, respectively log log B J L or log B J L and then resorting again to Lemmas 7,

By adding and subtracting
is defined as (29) in Proposition 13, we obtain: By Proposition 13 we have that in view of the consistency of α J L . As far as G B is concerned, we obtain, for a sufficiently small δ > 0: and hence, in view of Finally, in view of (30), Hence, because for a sufficiently small δ > 0: as claimed.
Here we present the auxiliary results we shall need on G and its derivatives. We introduce where: Also, let: The first result concerns the behaviour of expected value and variance of the estimator G (α 0 ) computed in α 0 , the second regards the uniform convergence in probability of the ratio between G and G, and their k-th order derivatives G k , G k .
Lemma 5 Let G (α) be as in (10). Under Condition 7, we have where I 0 (B) is defined by (28) in Proposition 13 and Proof. By (9), we obtain that while from Lemma 3 and Proposition 13 (see also the proof of Lemma 10), we have Lemma 6 Under Condition 2 we have for n = 0, 1, 2: Proof. Under Condition 3, we observe that: Kj (α) U j,n (α) .
Then we have: In view of Proposition 13 and equations (35) and (36) in Corollary 14, described in the Appendix, we have Then, Also, by Markov inequality and Lemma 3 , we have that, for all j: Under Condition 2 we have: . From (37), it is easy to see that the second term in the last equation is O B −J L , while for the first term we follow the same procedure already described, also considering that, by (37): We can establish now the asymptotic behaviour of U J L (α, α 0 ), for which we have the following Proof. By recalling N j = c B B 2j (see (2)) and (27), we observe that, Durastanti et al./Needlet-Whittle Estimates on the Unit Sphere 20 using (39) in Proposition 15, described in the Appendix, with J 1 = 1 and s = 2. Now, for α − α 0 > −2: Hence, combining (15) and (14) we obtain

Now consider the function
it is readily seen that for x > −2, l(x) is a continuous function such that whence l(0) = 0 is the unique minimum, and l(x) > 0 for all x = 0. The first part of the proof is hence concluded.
Take now α − α 0 < −2; we have Finally, for α 0 − α = 2 we obtain as claimed. Now we look at T J L (α, α 0 ). From (5), we can prove the following: Proof. From (12) and (13), we have that By applying Chebichev's inequality, we have and from Slutzky's lemma: On the other hand, in view of Lemma 6, so the proof is complete. The second main result to be achieved is a Central Limit Theorem for the estimator α J L , which will be investigated by exploiting some classical argument on asymptotic Gaussianity for extremum estimates, as recalled for instance by [47], see also [13]. We shall in fact establish the following Theorem 9 Let α J L = arg min α∈A R L (α).
a) Under Condition 2 we have: b) Under Condition 3 we have: where m = κI (B, α 0 + 1, α 0 ) log B (B + 1) ; c) Under Condition 4 we have: where Proof. By a standard Mean Value Theorem argument and consistency, for each J L there exists α J L ∈ (α 0 − α, α 0 + α) such that, with probability one: where S J L (α) is the score function corresponding to R J L (α), given by: where G(α), G 1 (α), G 2 (α) are respectively the estimate of G and its first and second derivatives, as in Lemma 6. In order to establish the Central Limit Theorem, we analyze the fourth order cumulants, observing that this statistics belong to the second order Wiener chaos with respect to a Gaussian white noise random measure (see [48]). Let In the Appendix, Lemma 17 shows that: Central Limit Theorem follows therefore from results in [48]. The proofs of (17) and (18) are completed by combining the following Lemmas 10 and 11. Observe that under Condition 3, the only difference between (16) and (17) concerns the possibility to estimate analytically the bias term.
The following result concerns the behaviour of S J L (α).
Lemma 10 Under Condition 3, we have: while under Condition 4 we have: Proof. We have that: where we recall that for Lemma 6: Then we will study the behaviour of Under Condition 3, simple calculations, in view of (38), (35) in Corollary 14 and (40) in Proposition 15, lead to Moreover we obtain: In view of Proposition 13 and Lemma 3, we obtain: and, likewise, On the other hand, by Lemma 5, we have: Finally we have: Hence, following Corollary 16 in the Appendix and equation (39), we have: Finally we have: as claimed.
The following Lemma regards instead the behaviour of Q J L (α).

Lemma 11
Under Condition 2, we have: Proof. By using 6, we obtain: Q J L (α) can be rewritten as the sum of three terms: where: The next step consists in showing that: Using Corollary 14, Q num 2 (α) can be written as: It remains to study Q den 2 (α) ; by using (13) and (15), we have: Using Corollary 14, we write the numerator Q num 1 (α) as: ; by applying (43) we have: It remains to study Q den 1 (α) ; by using again (27) and (15):

Durastanti et al./Needlet-Whittle Estimates on the Unit Sphere
For the consistency of α L , for α L ∈ [α 0 − α L , α 0 + α L ], we have To investigate the efficiency of needlet estimates, fix B J L = L/B, so that the highest frequency covers the multipoles l = L/B 2 + 1, ..., L; observe that, under Condition 4 while parametric estimates based upon standard Fourier analysis (see [13]) yield For any given value of B, the asymptotic variance D (B, α 0 ) can be evaluated numerically by means of (19) and a plug-in method, where α 0 is replaced by its consistent estimate α J L . In practice, though, this is not really needed for the values of B which are commonly in use, i.e. B 1.1/1.5. In fact, using A standard choice for the function b(.) (see [5], [40]) is provided by For this choice of b(.), analytical and numerical approximations allow to show that lim Summing up, the variance of the needlet likelihood estimator is very close to the "optimal" value (e.g. 8) which was found by [13] for the Fourier-based method. Some numerical results to validate this claim are provided in Table 1 for a range of values of B and α 0 . These numerical results are confirmed with remarkable accuracy by the Monte Carlo evidence reported in Section 6 below.  Remark 3 It is shown in [5] how needlet coefficients are asymptotically unaffected by the presence of masked or unobserved regions, provide they are centred outside the mask. It is then possible to argue that the asymptotic results presented here remain unaltered in case of a partially observed sphere, up to a normalization factor representing the so-called sky fraction, i.e. the effective number of available observations. This is a major advantage when compare to standard Fourier analysis techniques -in the latter case, asymptotic theory can no longer be entertained in the case of partial observations. Again, for brevity's sake we do not develop a formal argument here; proofs, however, can be routinely performed starting from the inequality valid for every integer M > 0, some constant C M > 0, for G denoting the unobserved region, see again [5], [6].

Narrow-band estimates
As discussed in the previous Section, under Condition 3, asymptotic inference is made impossible by the presence of the nuisance parameter m. It is possible to get rid of this parameter, however, by considering narrow-band estimates focussing only on the higher tail of the power spectrum. The details are similar to the approach pursued in analogous circumstances in [13]. We start from the following Definition 3 The Narrow-Band Needlet Whittle estimator for the parameters ϑ = (α, G) is provided by or equivalently: For notational simplicity B J1 is defined as an integer (if this isn't the case, modified arguments taking integer parts are completely trivial). For definiteness, we can take for instance g (J L ) = J −3 L . Theorem 12 Let α J L ;J1 defined as in (23). Then under Condition 3 we have Proof. Because the proof is very similar to the full band case, we put in evidence here just the main differences. Consider:
On the other hand, simple calculations on Proposition 15 lead to Consider now Q J L ;J1 (α), which we rewrite as Following a procedure similar to Lemma 11, we have where s = 2 1 + α−α0 2 . Following (25), we obtain Finally, we obtain and, for the consistency of α, we have Thus as claimed. Finally we can see that Remark 4 A careful inspection of the proof reveals that the asymptotic result could be alternatively presented as

Numerical Results
In this Section we provide some numerical evidence to support the asymptotic results discussed earlier. More precisely, using the statistical software R, for given fixed values of L, α 0 , B and G 0 and the alternative conditions discussed in the previous Section, we sample random values for the angular power spectra C l , and evaluate the corresponding needlet coefficients β jk ; we implement standard and narrow-band estimates with both standard Fourier (as described in [13]) and needlet methods. We start by analyzing the simplest model, i.e. the one corresponding to Condition 4. Here we fixed G 0 = 2. In Figure 1, the first column reports the distribution of Fourier estimates of α L − α 0 normalized by a factor √ 2L/4, while the second column reports the distribution of α J L − α 0 normalized by the factor D(α 0 , B) − 1 2 B J L . In Table 2, we report the sample means and variances for different values of L and α 0 , while in Table 3 we report the corresponding Shapiro-Wilk test of Gaussianity results. Figure 2 describes graphically the behavior of normalized distributions of estimates of α 0 in both classical Fourier and needlet analysis, full band and narrow band, with κ = 1, under Condition 3. Table 4 provides sample means and variances for different values of α 0 , with = 8 √ 2, L = 1024, L 1 = 724, and the results of the corresponding Shapiro-Wilk test of Gaussianity. Overall, we believe this numerical evidence to be very encouraging; in particular, we stress how the asymptotic expression reported earlier provide extremely good approximations for the Monte Carlo estimates of the standard deviation. where Proof. Recalling that N j = c B B 2j from (2), simple calculations lead to (26): by applying the Lagrange Mean Value Theorem we obtain which is a non-zero, finite positive real number. Obviously: Because by construction 0 < C min ≤ b 2 (ξ) ≤ C max < ∞, we have Hence, fixing C min /C max ≤ C I ≤ C max /C min , we obtain (30). We recall moreover that I (B, α, α) = 1.
As far as K j,1 (α) is concerned, we prove (31). In fact: Now, we have, by applying again Lagrange Mean Value Theorem: Similarly, we obtain for (32): which is trivially strictly positive and bounded. Hence we have: We now provide some further auxiliary results on the function K j (α) ; these results are exploited in the proofs for consistency and elsewhere.
Lemma 17 Let A j and B j be defined as in (20) and (21). As J L → ∞ Proof. It is readily checked (see also [13]) that cum C l , C l , C l , C l = O l −3 l −4α0 .
Let us compute:

45
Then we have As in [13], the proof can be divided into 5 cases, corresponding respectively to where we have used 20, 21. We have for instance