Large scale reduction principle and application to hypothesis testing

Consider a non-linear function $G(X_t)$ where $X_t$ is a stationary Gaussian sequence with long-range dependence. The usual reduction principle states that the partial sums of $G(X_t)$ behave asymptotically like the partial sums of the first term in the expansion of $G$ in Hermite polynomials. In the context of the wavelet estimation of the long-range dependence parameter, one replaces the partial sums of $G(X_t)$ by the wavelet scalogram, namely the partial sum of squares of the wavelet coefficients. Is there a reduction principle in the wavelet setting, that is, is the asymptotic behavior of the scalogram for $G(X_t)$ the same as that for the first term in the expansion of $G$ in Hermite polynomials? The answer is negative in general. This paper provides a minimal growth condition on the scales of the wavelet coefficients which ensures that the reduction principle also holds for the scalogram. The results are applied to testing the hypothesis that the long-range dependence parameter takes a specific value.

Contents

1. Introduction
2. Long-range dependence and the multidimensional wavelet scalogram
3. Reduction principle at large scales
4. Critical exponent
5. Examples
5.1. $G$ is even
5.2. $G$ is odd
5.3. $I_0 = \emptyset$ and $q_0 \ge 2$
5.4. $I_0 = \emptyset$, $q_0 = 1$ and $\delta(q_1) > 0$
6. Application to wavelet statistical inference

1. Introduction

Let $X = \{X_t\}_{t\in\mathbb{Z}}$ be a centered stationary Gaussian process with unit variance and spectral density $f(\lambda)$, $\lambda \in (-\pi,\pi)$. Such a stochastic process is said to have short memory or short-range dependence if $f(\lambda)$ is bounded around $\lambda = 0$, and long memory or long-range dependence if $f(\lambda) \to \infty$ as $\lambda \to 0$. We suppose that $\{X_t\}_{t\in\mathbb{Z}}$ has long memory with memory parameter $0 < d < 1/2$, that is,
$$f(\lambda) = |1 - e^{-i\lambda}|^{-2d} f^*(\lambda), \quad \lambda \in (-\pi, \pi], \qquad (1)$$
where the short-range part $f^*$ of the spectral density is a bounded spectral density which is continuous and positive at the origin. The parameter $d$ is also called the long-range dependence parameter. A standard assumption in the semi-parametric setup is a Hölder-type regularity condition on $f^*$ at the origin, where $\beta$ is some smoothness exponent in $(0,2]$. This hypothesis is semi-parametric in nature because the function $f^*$ plays the role of a "nuisance function".
Consider now a process $\{Y_t\}_{t\in\mathbb{Z}}$ such that, for some $K \ge 0$,
$$\Delta^K Y_t = G(X_t), \quad t \in \mathbb{Z},$$
where $(\Delta Y)_t = Y_t - Y_{t-1}$, $\{X_t\}_{t\in\mathbb{Z}}$ is Gaussian with spectral density $f$ satisfying (3), and $G$ is a function such that $E[G(X_t)] = 0$ and $E[G(X_t)^2] < \infty$. While the process $\{Y_t\}_{t\in\mathbb{Z}}$ is not necessarily stationary, its $K$-th difference $\Delta^K Y_t$ is stationary. Nevertheless, as in Yaglom (1958), one can speak of the "generalized spectral density" of $\{Y_t\}_{t\in\mathbb{Z}}$, which we denote $f_{G,K}$. It is defined as
$$f_{G,K}(\lambda) = |1 - e^{-i\lambda}|^{-2K} f_G(\lambda),$$
where $f_G$ is the spectral density of $\{G(X_t)\}_{t\in\mathbb{Z}}$. Note that $G(X_t)$ is the output of a non-linear filter $G$ with Gaussian input. Depending on the Hermite expansion of $G$ and the value of $d$, the time series $Y$ may be long-range dependent (see Clausel et al. (2012) for more details). We aim at developing efficient estimators of the memory parameter of such non-linear time series.
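To fix ideas, the model above can be simulated numerically. The sketch below is illustrative only and is not part of the paper's framework: it approximates a unit-variance Gaussian long-memory sequence $X_t$ by a truncated ARFIMA(0,$d$,0) moving-average expansion (the function names and the truncation scheme are our own choices), and then applies a non-linear $G$, here $G = H_2$, which corresponds to $\Delta^K Y = G(X)$ with $K = 0$.

```python
import numpy as np

def arfima0d0(n, d, trunc=2000, rng=None):
    """Approximate ARFIMA(0,d,0) sample via a truncated MA(inf) expansion.

    psi_k = Gamma(k+d) / (Gamma(d) Gamma(k+1)), computed recursively:
    psi_0 = 1, psi_k = psi_{k-1} * (k-1+d) / k.
    """
    rng = np.random.default_rng(rng)
    psi = np.empty(trunc)
    psi[0] = 1.0
    for k in range(1, trunc):
        psi[k] = psi[k - 1] * (k - 1 + d) / k
    eps = rng.standard_normal(n + trunc)
    # x_t = sum_k psi_k eps_{t-k}, keeping only fully-overlapping outputs
    x = np.convolve(eps, psi, mode="full")[trunc - 1: trunc - 1 + n]
    return x / np.sqrt(np.sum(psi ** 2))  # unit variance for the truncated model

# G(x) = H_2(x) = x^2 - 1: centered, Hermite rank q0 = 2
d = 0.4
X = arfima0d0(10_000, d, rng=0)
Y = X ** 2 - 1.0
```

The truncation level controls how well the hyperbolic decay of the MA coefficients (and hence the long memory) is captured.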
Since the 1980's many methods for the estimation of the memory parameter have been developed. Let us cite the Fourier methods developed by Fox and Taqqu (1986) and Robinson (1995a,b). Since the 1990's, wavelet methods have become very popular. The idea of using wavelets to estimate the memory parameter of a time series goes back to Wornell and Oppenheim (1992) and Flandrin (1989a,b, 1991, 1999). See also Bardet (2002), Bardet et al. (2008), Bardet et al. (2000). As shown in Flandrin (1992), Veitch and Abry (1999) and Bardet (2000) in a parametric context, the memory parameter of a time series can be estimated using the normalized limit of its scalogram (21), that is, the average of squares of its wavelet coefficients computed at a given scale. It is well known that, for Gaussian or linear time series, the wavelet-based estimator of the memory parameter is consistent and asymptotically Gaussian (see Moulines et al. (2007) for a general framework in the Gaussian case and Roueff and Taqqu (2009b) for the linear case). This result is particularly important for statistical purposes since it provides confidence intervals for the wavelet-based estimator of the memory parameter.
The application of wavelet-based methods to the estimation of the memory parameter of non-Gaussian stochastic processes has received much less attention in the literature. See Abry et al. (2011) for some empirical studies. Bardet and Tudor (2010) consider the case of the Rosenblatt process, which is a non-Gaussian self-similar process with stationary increments living in the second Wiener chaos, that is, it can be expressed as a double iterated integral with respect to the Wiener process. In this case, the wavelet-based estimator of the memory parameter is consistent but satisfies a non-central limit theorem. More precisely, conveniently renormalized, the scalogram, which is a sum of squares of wavelet coefficients, converges to a Rosenblatt variable and thus admits a non-Gaussian limit. Surprisingly, this result also holds for a time series of the form $H_{q_0}(X_t)$ where $X_t$ is Gaussian with unit variance and $H_{q_0}$ denotes the $q_0$-th Hermite polynomial with $q_0 \ge 2$ (see Clausel et al. (2014)).
The general case $G(X_t)$ is expected to derive from the case $G = H_{q_0}$. Namely, one could expect that some "reduction theorem" analogous to the one of Taqqu (1975) holds. Recall that the classical reduction theorem of Taqqu (1975) states that if $G(X)$ is long-range dependent then the limit, in the sense of finite-dimensional distributions, of $\sum_{k=1}^{[nt]} G(X_k)$, adequately normalized, depends only on the first term $c_{q_0} H_{q_0}/q_0!$ in the Hermite expansion of $G$. The reduction principle then states that there exist normalization factors $a_n \to \infty$ as $n \to \infty$ such that
$$\frac{1}{a_n} \sum_{k=1}^{[nt]} G(X_k) \quad\text{and}\quad \frac{1}{a_n}\, \frac{c_{q_0}}{q_0!} \sum_{k=1}^{[nt]} H_{q_0}(X_k)$$
have the same non-degenerate limit as $n \to \infty$. A reduction principle was established in Clausel et al. (2012), Theorem 5.1, for the wavelet coefficients of a non-linear time series of the form $G(X_t)$. In applications, the wavelet coefficients are not used directly but only through the scalogram. For example, Faÿ et al. (2008) use the scalogram to compare Fourier and wavelet estimation methods of the memory parameter. The difficulty is that the scalogram is a quadratic function of the wavelet coefficients involving not only the number of observations but also the scale at which the wavelet coefficients are computed. In practice, however, the scalogram is easy to obtain and one can take advantage of the structure of sample moments to investigate statistical properties. Its use is well illustrated numerically in Abry et al. (2011), who consider a number of statistical applications. The following is a natural question: Does a reduction principle hold for the scalogram?
In Clausel et al. (2013) we illustrated, through different large classes of examples, that the reduction principle for the scalogram does not necessarily hold and that the asymptotic limit of the scalogram may even be a Hermite process of order greater than 2. It is then important to find sufficient conditions for the reduction principle to hold. In this case, the normalized limit of the scalogram of the time series $G(X_t)$ is the same as that of the time series $c_{q_0} H_{q_0}(X)/q_0!$ studied in Clausel et al. (2014), and therefore is asymptotically Gaussian if $q_0 = 1$ and a Rosenblatt random variable if $q_0 \ge 2$. In Theorem 3.2, we prove that the reduction principle holds at large scales, namely if
$$n_j \ll \gamma_j^{\nu_c} \quad\text{as } j \to \infty, \qquad (6)$$
that is, if the number of wavelet coefficients $n_j$ at scale $j$ (typically $N 2^{-j}$, where $N$ is the sample size) does not grow as fast as the scale factor $\gamma_j$ (typically $2^j$) to the power $\nu_c$, as the sample size $N$ and the scale index $j$ go to infinity. The critical exponent $\nu_c$ depends on the function $G$ under consideration and may take the value $\nu_c = \infty$ for some functions, in which case the reduction principle holds without any particular growth condition on $\gamma_j$ and $n_j$ besides $n_j \to \infty$ and $\gamma_j \to \infty$ as $j \to \infty$.

The paper is organized as follows. In Section 2, we introduce long-range dependence and the scalogram. The main Theorem 3.2, which states that under Condition (6) the reduction principle holds, is stated in Section 3, with the critical exponent $\nu_c$ given in Section 4 and examples provided in Section 5. Section 6 contains statistical applications. The decomposition of the scalogram in Wiener chaos is described in Section 7. That section contains Theorem 7.2, on which Theorem 3.2 is based. Several proofs are in Section 8. Section 9 contains technical lemmas. The integral representations are described in Appendix A and the wavelet filters are given in Appendix B. Appendix C describes the multiscale wavelet inference setting.
For the convenience of the reader, in addition to providing a formal proof of a given result, we sometimes describe in a few lines the idea behind the proof.

2. Long-range dependence and the multidimensional wavelet scalogram
The centered Gaussian sequence X = {X t } t∈Z with unit variance and spectral density (3) is long-range dependent because d > 0 and hence its spectrum explodes at λ = 0.
The long-memory behavior of a time series $Y$ of the form (4) is well known to depend on the expansion of $G$ in Hermite series. Recall that if $E[G(X_0)] = 0$ and $E[G(X_0)^2] < \infty$ for $X_0 \sim N(0,1)$, then $G(X)$ can be expanded in Hermite polynomials, that is,
$$G(X) = \sum_{q \ge 1} \frac{c_q}{q!}\, H_q(X). \qquad (7)$$
One sometimes refers to (7) as an expansion in Wiener chaos. The convergence of the infinite sum (7) is in $L^2(\Omega)$, the coefficients are $c_q = E[G(X) H_q(X)]$, and
$$H_q(x) = (-1)^q\, e^{x^2/2}\, \frac{d^q}{dx^q}\, e^{-x^2/2}$$
are the Hermite polynomials. These Hermite polynomials satisfy $H_0(x) = 1$, $H_1(x) = x$, $H_2(x) = x^2 - 1$, and one has
$$E[H_q(X) H_{q'}(X)] = q!\, \mathbf{1}_{\{q = q'\}}.$$
Observe that the expansion (7) starts at $q = 1$, since $c_0 = E[G(X_0)] = 0$ by assumption. Denote by $q_0 \ge 1$ the Hermite rank of $G$, namely the index of the first non-zero coefficient in the expansion (7). Formally,
$$q_0 = \min\{q \ge 1 : c_q \ne 0\}. \qquad (10)$$
One has then
$$G(X) = \sum_{q \ge q_0} \frac{c_q}{q!}\, H_q(X).$$
In the special case where $G = H_q$, whether $\{H_q(X_t)\}_{t\in\mathbb{Z}}$ is also long-range dependent depends on the respective values of $q$ and $d$. We show in Clausel et al. (2012) that the spectral density of $\{H_q(X_t)\}_{t\in\mathbb{Z}}$ behaves like $|\lambda|^{-2\delta_+(q)}$ as $\lambda \to 0$, where
$$\delta(q) = qd - \frac{q-1}{2} \quad\text{and}\quad \delta_+(q) = \max(\delta(q), 0). \qquad (12)$$
We will also let $\delta_+(0) = \delta(0) = 1/2$. For $q \ge 1$, $\delta_+(q)$ is the memory parameter of $\{H_q(X_t)\}_{t\in\mathbb{Z}}$. It is a non-increasing function of $q$. Therefore, since $0 < d < 1/2$, long-range dependence of $\{H_q(X_t)\}_{t\in\mathbb{Z}}$ requires $\delta(q) > 0$, that is, $d$ must be sufficiently close to $1/2$. Specifically, for long-range dependence,
$$q < \frac{1}{1 - 2d}.$$
From another perspective, $\delta(q) \le 0$ if $q \ge 1/(1-2d)$, and thus $\{H_q(X_t)\}_{t\in\mathbb{Z}}$ is short-range dependent if $q \ge 1/(1-2d)$. Recall that the Hermite rank of $G$ is $q_0 \ge 1$, that is, the expansion of $G(X_t)$ starts at $q_0$. We always assume that $\{H_{q_0}(X_t)\}_{t\in\mathbb{Z}}$ has long memory, that is,
$$q_0 < \frac{1}{1 - 2d}. \qquad (16)$$
The condition (16), with $q_0$ defined as the Hermite rank (10), ensures that $\{Y_t\}_{t\in\mathbb{Z}} = \{\Delta^{-K} G(X_t)\}_{t\in\mathbb{Z}}$ is long-range dependent with long memory parameter
$$d_0 = K + \delta(q_0). \qquad (17)$$
More precisely, we have the following result, which also determines a Hölder condition on the short-range part of the spectral density. This condition involves $q_0$ defined in (10) and, if $G$ is not reduced to $c_{q_0} H_{q_0}/(q_0!)$, it also involves the index of the second non-vanishing Hermite coefficient, denoted by
$$q_1 = \min\{q > q_0 : c_q \ne 0\}.$$
If there is no such $q_1$ we let $\delta_+(q_1) = 0$ in (18).
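The exponents $\delta(q)$ and $\delta_+(q)$ are easy to compute numerically. The helper below is a small illustration of ours (not code from the paper); it uses $\delta(q) = qd - (q-1)/2$, one standard convention consistent with $\delta(0) = 1/2$ and with the long-memory threshold $q < 1/(1-2d)$ stated above.

```python
def delta(q, d):
    """Memory exponent of H_q(X_t) when X has memory parameter d:
    delta(q) = q*d - (q - 1)/2, so that delta(0) = 1/2 and delta(1) = d."""
    return q * d - (q - 1) / 2.0

def delta_plus(q, d):
    """delta_+(q) = max(delta(q), 0)."""
    return max(delta(q, d), 0.0)

def is_long_memory(q, d):
    """H_q(X) is long-range dependent iff delta(q) > 0, i.e. q < 1/(1 - 2d)."""
    return delta(q, d) > 0
```

For instance with $d = 0.3$ one has $1/(1-2d) = 2.5$, so $H_2(X)$ is long-range dependent while $H_3(X)$ is not.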
Theorem 2.1. Let $Y$ be defined as above. Then the generalized spectral density $f_{G,K}$ of $Y$ can be written as
$$f_{G,K}(\lambda) = |1 - e^{-i\lambda}|^{-2 d_0} f_G^*(\lambda),$$
where $d_0$ is defined by (17) and $f_G^*$ is bounded, continuous and positive at the origin. Moreover, for any $\zeta > 0$ satisfying
$$\zeta \le \min\bigl(\beta,\, 2(\delta(q_0) - \delta_+(q_1))\bigr) \quad\text{and, if } q_0 \ge 2,\ \zeta < 2\delta(q_0), \qquad (18)$$
there exists a constant $C > 0$ such that
$$|f_G^*(\lambda) - f_G^*(0)| \le C\, |\lambda|^{\zeta}. \qquad (19)$$

Proof. See Section 8.1.
Idea behind the proof of Theorem 2.1. Starting with the regularity of the nuisance function f * in (1), one derives that of f * Hq and, more generally, that of f * G , taking advantage of the fact that the terms in the expansion of G(X) in Hermite polynomials are uncorrelated.
Remark 2.1. The exponent $\zeta$ in (18) will affect the bias of the mean of the scalogram (see (53)). The higher $\zeta$, the lower the bias. Since in (18) $\zeta$ is required to satisfy a non-strict and a strict inequality (if $q_0 \ge 2$), we cannot provide an explicit expression for $\zeta$. However, in most cases one has $q_0 = 1$ or $\delta_+(q_1) > 0$, and hence one can set $\zeta = \min(\beta,\, 2(\delta(q_0) - \delta_+(q_1)))$, which then satisfies both inequalities in (18).
Our estimator of the long memory parameter of $Y$ is defined from its wavelet coefficients, denoted by $\{W_{j,k},\ j \ge 0,\ k \in \mathbb{Z}\}$, where $j$ indicates the scale index and $k$ the location. These wavelet coefficients are defined by
$$W_{j,k} = \sum_{t \in \mathbb{Z}} h_j(\gamma_j k - t)\, Y_t, \qquad (20)$$
where $h_j$ is the wavelet filter at scale index $j$ and $\gamma_j \uparrow \infty$ as $j \uparrow \infty$ is a sequence of non-negative decimation factors applied at scale index $j$. The properties of the memory parameter estimator are directly related to the asymptotic behavior of the scalogram $S_{n_j,j}$, defined by
$$S_{n_j,j} = \frac{1}{n_j} \sum_{k=0}^{n_j - 1} W_{j,k}^2, \qquad (21)$$
as $n_j \to \infty$ (large sample behavior) and $j \to \infty$ (large scale behavior). More precisely, we will study the asymptotic behavior of the centered sequence
$$\overline{S}_{n_{j+u},j+u} = S_{n_{j+u},j+u} - E\bigl(S_{n_{j+u},j+u}\bigr),$$
adequately normalized as $j, n_j \to \infty$. There are two perspectives. One can consider, as in Clausel et al. (2012), that the wavelet coefficients $W_{j+u,k}$ are processes indexed by $u$ taking a finite number of values. A second perspective consists in replacing the filter $h_j$ in (20) by a multidimensional filter $h_{\ell,j}$, $\ell = 1, \dots, m$, and thus replacing $W_{j,k}$ in (20) by a vector of wavelet coefficients (see Appendix C for more details). We adopted this second perspective in Clausel et al. (2014, 2013) and we also adopt it here since it allows us to compare our results to those obtained in Roueff and Taqqu (2009b) in the Gaussian case.
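As an illustration of (20)-(21), the sketch below computes a scalogram with a concrete choice of filter. This is our own simplified stand-in, not the filters actually used in the paper: it takes the Haar filter at scale index $j$ and dyadic decimation $\gamma_j = 2^j$.

```python
import numpy as np

def haar_filter(j):
    """Haar wavelet filter at scale index j (2^j ones followed by 2^j
    minus-ones), normalized to unit l2 norm. A stand-in for the generic
    filters h_j of (20)."""
    h = np.concatenate([np.ones(2 ** j), -np.ones(2 ** j)])
    return h / np.sqrt(2 ** (j + 1))

def wavelet_coeffs(y, j):
    """W_{j,k} = sum_t h_j(t) y_{gamma_j k + t}, with gamma_j = 2^j."""
    h = haar_filter(j)
    # correlation of y with h, then decimation by gamma_j = 2^j
    full = np.convolve(y, h[::-1], mode="valid")
    return full[:: 2 ** j]

def scalogram(y, j):
    """S_{n_j,j}: average of squared wavelet coefficients at scale j, cf. (21)."""
    w = wavelet_coeffs(y, j)
    return np.mean(w ** 2)
```

For white noise with unit variance the scalogram is close to 1 at every scale, since the filters have unit norm; long memory shows up as growth of the scalogram with $j$.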
We use boldfaced symbols $\mathbf{W}_{j,k}$ and $\mathbf{h}_j$ to emphasize the multivariate setting and let
$$\mathbf{h}_j = \{h_{\ell,j},\ \ell = 1, \dots, m\}, \qquad \mathbf{W}_{j,k} = \{W_{\ell,j,k},\ \ell = 1, \dots, m\}. \qquad (23)$$
We will then study the asymptotic behavior of the centered multivariate scalogram
$$\overline{\mathbf{S}}_{n_j,j} = \frac{1}{n_j} \sum_{k=0}^{n_j-1} \bigl( \mathbf{W}_{j,k}^2 - E[\mathbf{W}_{j,k}^2] \bigr), \qquad (24)$$
adequately normalized as $j \to \infty$, where, by convention, in this paper the square of a vector is taken entrywise,
$$\mathbf{W}_{j,k}^2 = [W_{1,j,k}^2, \dots, W_{m,j,k}^2]^T. \qquad (25)$$
The squared Euclidean norm of a vector $x = [x_1, \dots, x_m]^T$ will be denoted by $|x|^2 = x_1^2 + \cdots + x_m^2$ and the $L^2$ norm of a random vector $X$ is denoted by
$$\|X\|_2 = \bigl(E[|X|^2]\bigr)^{1/2}. \qquad (26)$$
We now summarize the main assumptions of this paper in the following set of conditions.
Assumptions A. $\{\mathbf{W}_{j,k},\ j \ge 1,\ k \in \mathbb{Z}\}$ are the multidimensional wavelet coefficients defined by (23), where
(i) $\{X_t\}_{t\in\mathbb{Z}}$ is a stationary Gaussian process with mean 0, variance 1 and spectral density $f$ satisfying (3);
(ii) $G$ is a real-valued function whose Hermite expansion (7) satisfies condition (16), namely $q_0 < 1/(1-2d)$, and whose coefficients in the Hermite expansion satisfy a summability condition for every $\lambda > 0$;
(iii) the wavelet filters $(h_j)_{j \ge 1}$ and their asymptotic Fourier transform $h_\infty$ satisfy the standard conditions (W-1)-(W-3) with $M$ vanishing moments (see Appendix B for details).
We shall prove that, provided that the number of vanishing moments of the wavelet is large enough, these assumptions yield the following general bound for the centered scalogram.
Theorem 2.2. Suppose that Assumptions A hold with $M \ge K + \delta(q_0)$. Then, for any two diverging sequences $(\gamma_j)$ and $(n_j)$, a general bound on $\|\overline{\mathbf{S}}_{n_j,j}\|_2$ holds as $j \to \infty$.

Proof. Theorem 2.2 is proved in Section 8.2.
Idea behind the proof of Theorem 2.2. One decomposes $\overline{\mathbf{S}}_{n_j,j}$ further into terms $\mathbf{S}^{(q,q',p)}_{n_j,j}$ as in (64) and applies, in part, the bounds related to Proposition 7.1.
It is important to note that Theorem 2.2 holds whatever the relative growth of (γ j ) and (n j ) but it only provides a bound. This bound will be sufficient to derive a consistent estimator of the long memory parameter K + δ(q 0 ), see Theorem 6.1 below.
Obtaining a sharp rate of convergence of the centered scalogram and its asymptotic limit is of primary importance in statistical applications, but this can be quite a complicated task. We exhibit several cases in Clausel et al. (2014, 2013) that underline the wide diversity of the asymptotic behavior of the centered scalogram. In general the nature of the limit depends on the relative growth of $(\gamma_j)$ and $(n_j)$. We will show, however, that if $n_j \ll \gamma_j^{\nu_c}$, where $\nu_c$ is a critical exponent, then the reduction principle holds. In this case, the limit will be either Gaussian or expressed in terms of the Rosenblatt process, which is defined as follows.
Definition 2.1. The Rosenblatt process of index $d$ with
$$1/4 < d < 1/2 \qquad (29)$$
is the continuous time process
$$Z_d(t) = \int_{\mathbb{R}^2}'' \frac{e^{it(u_1+u_2)} - 1}{i(u_1 + u_2)}\, |u_1|^{-d}\, |u_2|^{-d}\, dW(u_1)\, dW(u_2), \quad t \in \mathbb{R}. \qquad (30)$$
The multiple integral (30) with respect to the complex-valued Gaussian random measure $W$ is defined in Appendix A. The double prime on $\int_{\mathbb{R}^2}''$ indicates that one does not integrate on the diagonal $u_1 = u_2$. The integral is well defined when (29) holds because then it has finite $L^2$ norm. This process is self-similar with self-similarity parameter $H = 2d \in (1/2, 1)$, that is, for all $a > 0$, $\{Z_d(at)\}_{t\in\mathbb{R}}$ and $\{a^H Z_d(t)\}_{t\in\mathbb{R}}$ have the same finite-dimensional distributions; see Taqqu (1979). When $t = 1$, $Z_d(1)$ is said to have the Rosenblatt distribution. This distribution is tabulated in Veillette and Taqqu (2013).

3. Reduction principle at large scales
We shall now state the main results and discuss them. They are proved in the following sections. We use $\xrightarrow{L}$ to denote convergence in law. The following result, involving the case $G = \frac{c_{q_0}}{q_0!} H_{q_0}$, is proved in Theorem 3.2 of Clausel et al. (2014) and will serve as reference.

Theorem 3.1. Suppose that Assumptions A(i) and A(iii) hold with $M \ge K + \delta(q_0)$, where $\delta(\cdot)$ is defined in (12). Assume that $Y$ is a non-linear time series such that $\Delta^K Y = \frac{c_{q_0}}{q_0!} H_{q_0}(X)$, with $q_0 \ge 1$ and $q_0 < 1/(1-2d)$. Define the centered multivariate scalogram $\overline{\mathbf{S}}_{n,j}$ related to $Y$ by (22) and let $(n_j)$ and $(\gamma_j)$ be any two diverging sequences of integers.
(a) Suppose $q_0 = 1$ and that $(\gamma_j)$ is a sequence of even integers. Then the convergence (31) holds as $j \to \infty$, where $\Gamma$ is the $m \times m$ matrix with entries given by (32).
(b) Suppose $q_0 \ge 2$. Then the convergence (33) holds as $j \to \infty$, where $Z_d(1)$ is the Rosenblatt process in (30) evaluated at time $t = 1$, $f^*(0)$ is the short-range spectral density at zero frequency in (1), and where, for any $p \ge 1$, $\mathbf{L}_p$ is the deterministic $m$-dimensional vector $[L_p(h_{\ell,\infty})]_{\ell=1,\dots,m}$ with finite entries defined by (34) for any $g : \mathbb{R} \to \mathbb{C}$.
Thus Theorem 3.1 states that in the case $G = H_{q_0}$, $q_0 \ge 1$, the limit of the scalogram is either Gaussian or has a Rosenblatt distribution. Our main result, Theorem 3.2, states that beyond this simple case the limits continue to be either Gaussian or Rosenblatt under fairly general conditions involving $n_j$ and $\gamma_j$, namely that $n_j \ll \gamma_j^{\nu_c}$ as $j \to \infty$, where $\nu_c$ is a positive (possibly infinite) critical exponent given in Definition 4.1; see Section 4 for details.
Theorem 3.2. Suppose that Assumptions A hold with $M \ge K + \delta(q_0)$ and that Condition (35) holds. Define the centered multivariate scalogram $\overline{\mathbf{S}}_{n,j}$ related to $Y$ by (22). Let $(n_j)$ be any diverging sequence of integers such that, as $j \to \infty$,
$$n_j \ll \gamma_j^{\nu_c}, \qquad (36)$$
where $\nu_c$ is given in Definition 4.1 below. Then the following limits hold, depending on the value of $q_0$.
(a) If $q_0 = 1$ and $\gamma_j$ is even, then the convergence (31) holds.
(b) If $q_0 \ge 2$, then the convergence (33) holds.
Proof. We shall prove in Theorem 7.2, see (70), that, under Conditions (35) and (36), $\overline{\mathbf{S}}_{n_j,j}$ can be reduced to a dominating term $\mathbf{S}^{(q_0,q_0,q_0-1)}_{n_j,j}$ in the sense of the $L^2$ norm (26). This dominating term depends only on the term $c_{q_0} H_{q_0}(X)/(q_0!)$ of the expansion of $G(X)$. We can then apply Theorem 3.1 to conclude.

This result extends Theorem 3.1 stated above, where $G$ was restricted to $G = \frac{c_{q_0}}{q_0!} H_{q_0}$. While extending the result to a much more general function $G$, Theorem 3.2 involves two additional conditions. Condition (35) is merely here to avoid logarithmic corrections, see Remark 3.2 below. Condition (36) is restrictive only when $\nu_c$ is finite, in which case it imposes a minimal growth of the analyzing scale $\gamma_j$ with respect to that of $n_j$. We say that the reduction principle holds at large scales. The main interest of having a reduction principle is to conclude that the same asymptotic analysis is valid as in the case $G = \frac{c_{q_0}}{q_0!} H_{q_0}$.

Remark 3.1. In practice such a result can be used as follows: if $d$ and $G$ are both known, $\nu_c$ can be evaluated numerically. We then get a practical condition, albeit asymptotic, for the reduction principle. See Section 6.3 for an application.
Remark 3.2. The values $d = 1/2 - 1/(2q)$, $q \ge 1$, constitute boundary values which already appear in the classical reduction theorem, see Taqqu (1975). These boundary values also exist in our context. If $d = 1/2 - 1/(2q)$ for some $q \ge 1$, one gets similar results but with logarithmic terms. In fact, one can show that if one drops the restriction (35), then the conclusion of Theorem 3.2 holds if
1) $n_j \ll \gamma_j^{\nu_c} (\log \gamma_j)^{-4}$;
2) for any $\varepsilon > 0$, $\log n_j = o(\gamma_j^{\varepsilon})$ and $\log \gamma_j = o(n_j^{\varepsilon})$ as $j \to \infty$.
The technical condition 2) is very weak and condition 1) is the same as (36) up to a logarithmic correction. We assume (35) for simplicity of the exposition.
Remark 3.3. We provided in Clausel et al. (2013) several examples for which different limits are obtained. In these examples (36) does not hold, and consequently different terms in the Wiener chaos decomposition of the scalogram dominate and provide different limits. Since the limits are not the same as when $G = H_{q_0}$, the reduction principle does not hold in these cases.

4. Critical exponent
The precise description of the critical exponent given below involves a number of sequences, in particular the subsequence of Hermite coefficients $c_q$, $q \ge 1$, that are non-vanishing. We denote this subsequence by $\{c_{q_\ell}\}_{\ell \in L}$, where $(q_\ell)_{\ell \in L}$ is a (finite or infinite) increasing sequence of integers such that
$$\{q_\ell,\ \ell \in L\} = \{q \ge 1 : c_q \ne 0\}.$$
Thus the indexing set $L$ is a set of consecutive integers starting at 0 with the same cardinality as the set of non-vanishing coefficients. We set
$$I_0 = \{\ell \in L : q_{\ell+1} = q_\ell + 1\},$$
that is, $q_\ell$ and $q_{\ell+1}$ take consecutive values when $\ell \in I_0$. The set $I_0$ could be either empty (there are no consecutive values of $q_\ell$) or not empty. Then we set
$$\ell_0 = \min(I_0). \qquad (39)$$
When $\ell_0$ is finite (that is, $I_0$ is not empty), $q_{\ell_0}$ is the smallest index $q$ such that two Hermite coefficients $c_q$, $c_{q+1}$ are non-zero. We define similarly, for any $r \ge 0$,
$$I_r = \{\ell \in L : q_{\ell+1} = q_\ell + r + 1\},$$
which involves the terms distant by $r + 1$. Finally, we extend the definition of $\ell_0$ in (39) to any $r \ge 0$ by
$$\ell_r = \min(I_r). \qquad (41)$$
Thus $r \in R$ describes the gaps $r + 1$ for which $H_{r+1}(X_t)$ is long-range dependent; by (12), this means $\delta(r+1) > 0$, that is, $r + 1 < 1/(1-2d)$. Finally, the critical exponent $\nu_c$ is defined from these quantities using the expression for $\delta(q)$ in (12); see Definition 4.1. We illustrate these quantities in the following example.
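The combinatorial objects $I_r$, $\ell_r$ and $R$ can be computed directly from the set of non-vanishing Hermite indices. The helper below is our own illustrative sketch; in particular, the membership rule used for $R$ (gaps $r$ with $I_r \ne \emptyset$ and $\delta(r+1) = (r+1)d - r/2 > 0$) is our reading of the description above.

```python
def gap_structure(q_seq, d):
    """q_seq: increasing list of indices of non-vanishing Hermite coefficients.

    Returns (I, ell, R):
      I[r]   = { l : q_{l+1} - q_l = r + 1 }  (gaps of size r + 1),
      ell[r] = min I[r],
      R      = gaps r with I_r non-empty such that H_{r+1}(X) is
               long-range dependent, i.e. delta(r+1) = (r+1)*d - r/2 > 0.
    """
    I = {}
    for l in range(len(q_seq) - 1):
        r = q_seq[l + 1] - q_seq[l] - 1
        I.setdefault(r, set()).add(l)
    ell = {r: min(s) for r, s in I.items()}
    R = {r for r in I if (r + 1) * d - r / 2.0 > 0}
    return I, ell, R
```

For example, with non-vanishing coefficients at $q \in \{1, 2, 5\}$, the gaps are $1$ (so $I_0 = \{0\}$) and $3$ (so $I_2 = \{1\}$); whether $2 \in R$ then depends on the sign of $\delta(3)$.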
The value $\nu_c = \infty$ is the simplest case, since then the reduction principle holds whatever the respective growth rates of the diverging sequences $(n_j)$ and $(\gamma_j)$ are. This happens, for instance, when there are no consecutive non-zero coefficients ($I_0 = \emptyset$) and either $q_0 = 1$ and $d \le 1/4$, or $q_0 \ge 2$ (which implies $d > 1/4$).

5. Examples
In this section we examine some specific cases of functions $G$. We always assume that $G$ satisfies Assumption A(ii).

5.1. $G$ is even. If $G$ is an even function, then $q_0 \ge 2$ and $I_0 = \emptyset$ because the Hermite expansion has only even terms. Hence $\nu_c = \infty$ and the reduction principle applies for any diverging sequences $(n_j)$ and $(\gamma_j)$.
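The parity argument can be checked numerically. The sketch below (an illustration of ours, not code from the paper) computes $c_q = E[G(X) H_q(X)]$ by Gauss-Hermite quadrature for the probabilists' Hermite polynomials and reads off the Hermite rank; for an even $G$ every odd coefficient vanishes, so $q_0 \ge 2$ and $I_0 = \emptyset$.

```python
import numpy as np
from numpy.polynomial import hermite_e as He

def hermite_coeff(G, q, deg=60):
    """c_q = E[G(X) He_q(X)], X ~ N(0,1), via Gauss-Hermite(E) quadrature.

    hermegauss integrates against exp(-x^2/2); dividing by sqrt(2*pi)
    turns that into the standard normal expectation."""
    x, w = He.hermegauss(deg)
    Hq = He.hermeval(x, [0] * q + [1])  # evaluate He_q
    return np.sum(w * G(x) * Hq) / np.sqrt(2.0 * np.pi)

def hermite_rank(G, qmax=10, tol=1e-8):
    """Index q0 of the first non-zero Hermite coefficient, cf. (10)."""
    for q in range(1, qmax + 1):
        if abs(hermite_coeff(G, q)) > tol:
            return q
    return None
```

For $G(x) = x^2 - 1 = H_2(x)$ one recovers $c_1 = 0$, $c_2 = E[H_2^2] = 2$ and rank 2; for the odd function $G(x) = x^3$ the rank is 1.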

5.2. $G$ is odd.
If $G$ is an odd function, then we have again $I_0 = \emptyset$ since the Hermite expansion has no even terms. But unlike the even case, we may have $q_0 = 1$. If that is not the case, then $q_0 \ge 3$, so that $\nu_c = \infty$ and the reduction principle applies for any diverging sequences $(n_j)$ and $(\gamma_j)$. If $q_0 = 1$ and $d \le 1/4$, we find again $\nu_c = \infty$. If $q_0 = 1$ and $d > 1/4$, the formula for the exponent $\nu_c$ is more involved and takes various possible forms; see Section 5.4 for one of the possible cases, namely $I_0 = \emptyset$, $q_0 = 1$ and $\delta(q_1) > 0$.

5.3. $I_0 = \emptyset$ and $q_0 \ge 2$. This corresponds to the class studied in Section 3.1 of Clausel et al. (2013), with the additional condition $\delta(q_{\ell_0} + 1) > 0$ (see (3.3) in this reference). Using this additional condition, we have $\delta(q_{\ell_0}) > 0$ since $\delta(q)$ is decreasing. Hence $\delta_+(q_{\ell_0}) = \delta(q_{\ell_0})$, and $\nu_c$ takes an explicit value. This value of $\nu_c$ corresponds to the exponent $\nu$ defined in (3.4) and appearing in Theorem 3.1 of Clausel et al. (2013). This theorem shows that if the opposite condition to (36) holds, namely $\gamma_j^{\nu_c} \ll n_j$, then the reduction principle does not apply, since the limit is Gaussian instead of Rosenblatt. We say that the reduction principle does not apply at small scales. In Theorem 3.2, the reduction principle is proved even when $\delta_+(q_{\ell_0} + 1) = 0$, but whether the reduction principle applies or not at small scales, namely if $\gamma_j^{\nu_c} \ll n_j$, remains an open question.
5.4. $I_0 = \emptyset$, $q_0 = 1$ and $\delta(q_1) > 0$. The expansion of $G$ contains $H_1$ but does not contain any two consecutive polynomials. This corresponds to the class studied in Section 3.2 of Clausel et al. (2013) (see (3.8) in this reference). The exponent $\nu_c$ simplifies as follows. First observe that $\delta(q_1) > 0$ implies $\delta_+(q_1 - 1) > 0$, so that $q_1 \in R$, and also $\delta(2) > 0$ and hence $d > 1/4$. We thus need to focus on the term of $\nu_c$ in Definition 4.1 involving the min. Using (12), the first term in the min takes the value (47), which corresponds to the exponent $\nu_2$ in (3.10) of Clausel et al. (2013). Now focus on the second term in the min. Take any $r \in R$ and consider $\ell_r$ defined in (41). Note that $q_{\ell_r}$ is the smallest Hermite polynomial index of the expansion of $G$ such that the next one appears after a gap equal to $r + 1$. There are only two possibilities: (a) either $q_{\ell_r} = q_0 = 1$, (b) or $q_{\ell_r} \ge q_1$. In case (a), we have $r + 1 = q_{\ell_r + 1} - q_{\ell_r} = q_1 - 1$, and we obtain the value (48), which corresponds to the exponent $\nu_1$ in (3.10) of Clausel et al. (2013). In case (b), using $r + 1 \ge 2$ (since $I_0 = \emptyset$) and $q_{\ell_r} \ge q_1$, we recover the value which already appeared in (47). Therefore, combining (47) and (48) with Definition 4.1 of $\nu_c$ for $q_0 = 1$ and $d > 1/4$, we get
$$\nu_c = \min(\nu_1, \nu_2),$$
using the definitions in (3.10) of Clausel et al. (2013). Hence the reduction principle established in Theorem 3.2 under the condition $n_j \ll \gamma_j^{\nu_c}$ corresponds to the cases $n_j \ll \gamma_j^{\nu_1}$ and $n_j \ll \gamma_j^{\nu_2}$ of Theorems 3.3 and 3.5 in Clausel et al. (2013), respectively. These two theorems further show that, when the additional condition $\delta(q_1) > 0$ holds, the reduction principle does not hold under the opposite condition $\gamma_j^{\nu_c} \ll n_j$, illustrating the fact that the reduction principle may not hold at small scales.
6. Application to wavelet statistical inference

6.1. Wavelet inference setting. Suppose that we observe a sample $Y_1, \dots, Y_N$ of $Y$. Recall that $Y$ has long memory parameter $d_0 = K + \delta(q_0)$. In this section, we assume that we are given a unidimensional wavelet filter $g_j$ satisfying Assumptions (W-1)-(W-3) in Appendix B (see also (128) and (133)). Then one can derive the wavelet estimator
$$\hat d_0 = \sum_{i=0}^{p} w_i \log\bigl(\hat\sigma^2_{j+i}\bigr), \qquad (49)$$
where $w_0, \dots, w_p$ are well chosen weights and $(\hat\sigma^2_{j'})_{j \le j' \le j+p}$ denotes the multiscale scalogram obtained from $Y_1, \dots, Y_N$ (see Appendix C for more details). In this setting, $(\gamma_j)$ and $(n_j)$ are specified as
$$\gamma_j = 2^j, \qquad n_j = 2^{-j} N. \qquad (51)$$
As usual in this setting, the asymptotics are to be understood as $N \to \infty$ with a well chosen diverging sequence $j = j_N$ such that
$$j_N \to \infty \quad\text{and}\quad n_{j_N} = 2^{-j_N} N \to \infty, \qquad (52)$$
and thus $(n_j)$ diverges as $N \to \infty$. We refer to (Moulines et al., 2007, Theorem 1) for the asymptotic behavior of the mean of the scalogram,
$$E[\hat\sigma^2_j] = C\, 2^{2 j d_0} \bigl(1 + O(2^{-j\zeta})\bigr), \qquad (53)$$
where $C$ is a positive constant and $\zeta$ is an exponent satisfying the conditions of Theorem 2.1. This relation follows from Theorem 2.1, provided that $M \ge d_0 - 1/2$. Choosing weights such that $\sum_i w_i = 0$ and $\sum_i i\, w_i = 1/(2 \log 2)$ then yields
$$\sum_{i} w_i \log E[\hat\sigma^2_{j+i}] = d_0 + O(2^{-j\zeta}). \qquad (54)$$

6.2. Consistency. We now state a consistency result.

Theorem 6.1. Suppose that Assumptions A hold with $M \ge K + \delta(q_0)$ and that (52) holds. Then $\hat d_0$ converges in probability to $d_0$ as $N \to \infty$.
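The weight conditions $\sum_i w_i = 0$ and $\sum_i i\, w_i = 1/(2\log 2)$ can be realized by rescaled linear-regression contrasts. The sketch below is our own illustration of such an estimator (the exact weights used in the paper may differ); it assumes $E[\hat\sigma^2_j] \approx C\, 2^{2jd_0}$, so that the weighted sum of $\log\hat\sigma^2_{j+i}$ recovers $d_0$.

```python
import numpy as np

def regression_weights(p):
    """Weights w_0..w_p with sum w_i = 0 and sum i*w_i = 1/(2 log 2):
    the centered linear-regression contrasts of i = 0..p, rescaled."""
    i = np.arange(p + 1, dtype=float)
    w = i - i.mean()                      # guarantees sum w_i = 0
    w /= (w @ i) * 2.0 * np.log(2.0)      # rescale so sum i*w_i = 1/(2 log 2)
    return w

def estimate_d0(sigma2):
    """d0_hat = sum_i w_i log(sigma2_{j+i}), cf. (49).

    If sigma2_{j} = C * 2**(2*j*d0) exactly, the constant C cancels
    (sum w_i = 0) and the slope normalization returns d0."""
    w = regression_weights(len(sigma2) - 1)
    return float(w @ np.log(sigma2))
```

On a noise-free sequence $\hat\sigma^2_j = C\, 2^{2jd_0}$ the estimator returns $d_0$ exactly, which is a convenient sanity check of the weight normalization.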
Proof. By (49), we have
$$\hat d_0 = \sum_i w_i \log E[\hat\sigma^2_{j+i}] + \sum_i w_i \log\left(1 + \frac{\hat\sigma^2_{j+i} - E[\hat\sigma^2_{j+i}]}{E[\hat\sigma^2_{j+i}]}\right).$$
The numerators in the last ratio are the components of $\overline{\mathbf{S}}_{n_j,j}$ by (50). By Theorem 2.2 and (51), these ratios tend to zero in probability. Hence, with (55) and (53), we get that the second sum above is $o_P(1)$. Applying (54) then yields $\hat d_0 = d_0 + o_P(1)$ as $N \to \infty$. The result then follows from (52).
Remark 6.1. We note that this consistency result applies without any knowledge of G or β.
6.3. Hypothesis testing. Consider again a sample $Y_1, \dots, Y_N$ of $Y$ and suppose now that $G$ is known and has Hermite rank $q_0$. Denote by $\tilde d_0$ the estimator that would be obtained instead of $\hat d_0$ if $G$ were replaced by $c_{q_0} H_{q_0}/(q_0!)$. We shall apply Theorem C.1 and Theorem C.2 of Appendix C. Theorem C.1 (case $q_0 = 1$) derives from Theorem 2 of Roueff and Taqqu (2009a) and Theorem C.2 (case $q_0 \ge 2$) derives from Theorem 4.1 of Clausel et al. (2014). We obtain the following: for conveniently chosen diverging sequences $j = (j_N)$, there exists some renormalization sequence $(u_N)$ such that, as $N \to \infty$, the convergences (57) and (58) hold, where $U(d, K, q_0)$ is a centered Gaussian random variable if $q_0 = 1$ and a Rosenblatt random variable if $q_0 \ge 2$. The precise distribution of $U(d, K, q_0)$ is given in Theorems C.1 and C.2. Besides the chosen wavelet, the distribution of $U$ only depends on $d$, $K$ and $q_0$.
As an application of the reduction principle in this setting, we use (57) to define a statistical test procedure which applies to a general $G$. Let $d_0^*$ be a given possible value for the true unknown memory parameter $d_0$ of $Y$ and consider the hypotheses
$$H_0 : d_0 = d_0^* \quad\text{against}\quad H_1 : d_0 \ne d_0^*.$$
Here $\bar K$ denotes a known maximal value for the true (possibly unknown) integration parameter $K$. So, to ensure that the number $M$ of vanishing moments satisfies $M \ge d_0$, it suffices to impose $M > \bar K$. Since $G$ is assumed to be known, for the given value $d_0^*$ one can define the parameters $d^*$, $K^*$ and $\nu_c^*$ as $d$, $K$ and $\nu_c$ with $d_0$ replaced by $d_0^*$. Let $\alpha \in (0,1)$ be a level of confidence. Define the statistical test $\delta_s$ based on comparing $u_N |\hat d_0 - d_0^*|$ to the appropriate quantile of the distribution of $U(d^*, K^*, q_0)$. The following theorem provides conditions for the test $\delta_s$ to be consistent with asymptotic level of confidence $\alpha$, namely that its power goes to 1 and its type I error goes to $\alpha$ as $N$ goes to $\infty$.
Theorem 6.2. Suppose that Assumptions A(i),(ii) hold with $M > \bar K$ and that the unidimensional wavelet filter $g_j$ satisfies Assumptions (W-1)-(W-3). Assume additionally that (35) holds. Let $j = (j_N)$ be a diverging sequence such that (52) holds. Suppose moreover that, as $N \to \infty$, Condition (60) holds, and that there exists a positive exponent $\zeta$ satisfying (18) and Condition (61), with $u_N$ defined as in (58). Then, if (36) is satisfied, $\delta_s$ is a consistent test with asymptotic level of confidence $\alpha$.
Remark 6.2. Observe that the different conditions that have to be simultaneously satisfied by $(j_N)$ can be reformulated in terms of a single exponent $\zeta'$. In particular, one can easily check that, since $\nu_c^*$ and $\zeta$ are both positive, so is $\zeta'$. Hence these conditions are not incompatible.
Idea behind the proof of Theorem 6.2. Condition (60) states that $n_j \ll \gamma_j^{\nu_c^*}$ and will ensure that the reduction principle holds under $H_0$. Condition (61) will ensure that the bias is negligible under $H_0$. These conditions will allow us, through Relation (109), to transfer the problem to the case $G(x) = \frac{c_{q_0}}{q_0!} H_{q_0}(x)$, which was treated in Clausel et al. (2014).

7. Decomposition in Wiener chaos
As in Clausel et al. (2012) and Clausel et al. (2013), we need the expansion of the scalogram into Wiener chaos. The wavelet coefficients can be expanded in the following way:
$$W_{j,k} = \sum_{q \ge q_0} W^{(q)}_{j,k},$$
where $W^{(q)}_{j,k}$ is a multiple integral of order $q$. Then, using the same convention as in (25), we obtain a corresponding expansion of $\mathbf{W}_{j,k}^2$, where the convergence of the infinite sums holds in the $L^1(\Omega)$ sense. Each $W^{(q)}_{j,k}$ is a multiple integral and consequently so is $\overline{\mathbf{S}}_{n_j,j}$ in (24). (Basic facts about multiple integrals and Wiener chaos are recalled in Appendix A.)
In Proposition 4.2 of Clausel et al. (2013), we gave the following explicit expression of the Wiener chaos expansion of the scalogram.
Proposition 7.1. For all $j$, $\{\mathbf{W}_{j,k}\}_{k\in\mathbb{Z}}$ is a weakly stationary sequence. Moreover, for any $j \in \mathbb{N}$, $\overline{\mathbf{S}}_{n_j,j}$ can be expanded into Wiener chaos as in (64), where, for all $q, q' \ge 1$ and $0 \le p \le \min(q,q')$, $\mathbf{S}^{(q,q',p)}_{n_j,j}$ is of the form (65), and where the infinite sums converge in the $L^1(\Omega)$ sense. The function $g$ in (65) is defined through (66)-(69): $f$ denotes the spectral density of the underlying Gaussian process $X$ and, for any integer $n$, $D_n$ denotes the normalized Dirichlet kernel; for $\xi_1, \xi_2 \in \mathbb{R}$, the kernel takes one form if $p \ne 0$ and another if $p = 0$. The random summand $\mathbf{S}^{(q,q',p)}_{n_j,j}$ is expressed in (65) as a Wiener-Itô integral of order $q + q' - 2p$, and $q + q' - 2p$ will be called the order of $\mathbf{S}^{(q,q',p)}_{n_j,j}$. The limits involved in Theorem 3.1 are those given by the term $\mathbf{S}^{(q_0,q_0,q_0-1)}_{n_j,j}$, as proved in Propositions 5.3 and 5.4 of Clausel et al. (2013). A sufficient condition for the reduction principle is that the other terms are negligible with respect to this term. Theorem 3.2 is then a direct consequence of the following main result.

Theorem 7.2. Assume that Assumptions A hold with $M \ge K + \delta(q_0)$, where $\delta(\cdot)$ is defined in (12), and that (35) holds. Define the centered multivariate scalogram $\overline{\mathbf{S}}_{n,j}$ related to $Y$ by (22). Suppose that $(\gamma_j)$ and $(n_j)$ are any diverging sequences of integers. Then Condition (36) implies, as $j \to \infty$, the convergence (70).

Proof. Theorem 7.2 is proved in Section 8.4.
Idea behind the proof of Theorem 7.2. One uses the expansion (64). The norms of the relevant terms are bounded in Proposition 7.3. We then deduce bounds for the difference $\|S_{n_j,j} - S^{(q_0,q_0,q_0-1)}_{n_j,j}\|_2$ in Proposition 7.4. The main task in the proof of Theorem 7.2 is to show that these bounds are negligible compared to the leading term $\|S^{(q_0,q_0,q_0-1)}_{n_j,j}\|_2$, whose asymptotic behavior is also given in Proposition 7.4.
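Schematically, the comparison underlying this argument is the Minkowski (triangle) inequality in $L^2(\Omega)$ applied to the chaos expansion of Proposition 7.1 (a sketch in our notation):

```latex
\Bigl\| S_{n_j,j} - S^{(q_0,q_0,q_0-1)}_{n_j,j} \Bigr\|_2
\;\le\;
\sum_{\substack{q,\, q' \ge 1,\; 0 \le p \le \min(q,q') \\ (q,q',p) \ne (q_0,q_0,q_0-1)}}
\Bigl\| S^{(q,q',p)}_{n_j,j} \Bigr\|_2 ,
```

and the reduction principle holds once the right-hand side is negligible with respect to $\|S^{(q_0,q_0,q_0-1)}_{n_j,j}\|_2$.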
Our results are based on $L^2(\Omega)$ upper bounds of the terms $\|S^{(q,q',p)}_{n_j,j}\|_2$ established in Proposition 5.1 of Clausel et al. (2013). To recall this result, we introduce some notation.
We first recall Proposition 5.1 of Clausel et al. (2013) where Part (i) corresponds to p ≥ 1 and Part (ii) to p = 0.
Proposition 7.4. Assume that Assumptions (A) hold with $M \geq K + \delta_+(q_0)$ and suppose that Condition (35) holds. Let $(n_j)$ and $(\gamma_j)$ be any diverging sequences. Then there exists a positive constant $C$ such that, for all $j \geq 1$, where we denote Moreover, the two following assertions hold: (i) If $q_0 = 1$, as $j \to \infty$, where $C$ is a positive constant. (ii) If $q_0 \geq 2$, as $j \to \infty$, where $C$ is a positive constant.
Proof. By Proposition 7.1 and (81), applying the Minkowski inequality, we have The bound (79) implies that By Lemma 8.6 of Clausel et al. (2013), the last two displays yield (80). We now prove (82) and (83). First consider the case where $q_0 = 1$. The asymptotic equivalence (82) is related to the convergence (31) and follows from its proof, see e.g. Moulines et al. (2007). Since $q_0 = 1$, we have $c_1 \neq 0$. Moreover, in Condition (W-3) on the wavelet filters recalled in Appendix B, the $h_{\ell,\infty}$ are non-identically zero functions which are continuous, being locally uniform limits of continuous functions. Therefore $\sum_\ell \Gamma^2_{\ell,\ell} > 0$ and we get (82). Now consider the case where $q_0 \geq 2$. The bound (83) is then related to Theorem 3.2(b), where the weak convergence is stated, and follows from its proof, see Clausel et al. (2014).

8. Proofs
8.1. Proof of Theorem 2.1. The generalized spectral density $f_{G,K}$ of $Y$ is related to the spectral density $f_G$ of $G(X)$ by (5). By definition of $d_0$, the result will then follow if we prove the existence of a bounded function $f^*_G$ satisfying (84) and all the properties stated in Theorem 2.1. We now prove (84). To this end, we consider the following decomposition of $G(X)$ as the sum of two uncorrelated processes, The proof of Proposition 6.2 in Clausel et al. (2012) shows that $G_2(X)$ admits a bounded spectral density $f_{G_2}$. We first consider the case where $G_1$ reduces to the term $c_{q_0} H_{q_0}/q_0!$. Since the two processes $H_{q_0}(X)$ and $G_2(X)$ are uncorrelated, one has We can then set Let us check that $f^*_G$ has the properties stated in the theorem. Relation (84) follows from the definition of $f^*_G$ and $f^*_{H_{q_0}}$. To prove the other properties stated in Theorem 2.1, we distinguish the two cases $q_0 = 1$ and $q_0 \geq 2$. If $q_0 = 1$, then $f_{H_{q_0}} = f$, $f^*_{H_{q_0}} = f^*$, and for any $\zeta \leq \beta$ one has for some $C > 0$. If $q_0 \geq 2$, Lemma 9.1 yields that there exists a bounded function $f^*_{H_{q_0}}$ such that (86) holds; moreover, for any $\zeta \in (0, 2\delta(q_0))$ such that $\zeta \leq \beta$, one has for some $C > 0$. In any case, the boundedness of $f_{G_2}$ and the properties of $f^*_{H_{q_0}}$ (equation (85) if $q_0 = 1$, or (86) and (87) if $q_0 \geq 2$) then imply that (19) holds in the case $G_1 = c_{q_0} H_{q_0}/q_0!$, that is, if $\delta_+(q_1) = 0$.
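The decomposition used here can be sketched with the standard Hermite-expansion notation (a sketch; $c_q = E[G(X) H_q(X)]$ and $q_0$ is the Hermite rank of $G$, and we display the special case where $G_1$ reduces to a single term):

```latex
% Hermite expansion of G and the induced decomposition:
G(X_t) = \sum_{q \ge q_0} \frac{c_q}{q!}\, H_q(X_t)
  = \underbrace{\frac{c_{q_0}}{q_0!}\, H_{q_0}(X_t)}_{G_1(X_t)}
  \;+\; \underbrace{\sum_{q > q_0} \frac{c_q}{q!}\, H_q(X_t)}_{G_2(X_t)} .
% The processes H_q(X) are pairwise uncorrelated across q, so the
% spectral densities add:
f_G(\lambda) = \frac{c_{q_0}^2}{(q_0!)^2}\, f_{H_{q_0}}(\lambda) + f_{G_2}(\lambda) .
```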
• If $d \leq 1/4$, the sup over $A_2$ can be restricted to $A_{22}$. This gives
• If $d > 1/4$, the sup over $A_2$ has to be performed over $A_{21}$ and $A_{22}$. This gives
Optimization on $q'$. We only need to consider $A_j$ since $B_j$ corresponds to $q' = q$. For $A_j$, optimizing on $q'$ means optimizing on $\ell'$ in the sup of (98). We know from Lemma 9.3 that, for each $\ell$, $\alpha(q_\ell, q_{\ell'}, q_\ell)$ is non-decreasing and $\beta'(q_\ell, q_{\ell'}, q_\ell)$ is non-increasing as $\ell'$ increases; hence the $\sup_{\ell,\ell'}$ is achieved when $\ell' = \ell + 1$, and thus $\alpha(q_\ell, q_{\ell'}, q_\ell) < 1/2$ implies where $J_d$ is defined in (44). When $J_d = \emptyset$, the sup in (96) is taken over the empty set; we use the convention $\sup_\emptyset(\cdots) = 0$.
Optimization on q. We deal separately with the cases (a) d ≤ 1/4.
8.4.2. Proof of Theorem 7.2 in the case q 0 ≥ 2. In this case, we need to show that (36) implies (92).
8.5. Proof of Theorem 6.2. The fact that the test $\delta_s$ is consistent follows directly from the consistency statement in Theorem 6.1 and the fact that $(u_N)$ is diverging.
To show that the test $\delta_s$ has asymptotic confidence level $\alpha$, it suffices to show that (108) holds when $H_0$ is true. We first observe that, under the conditions on $j = (j_N)$ of the theorem, the convergence (57) involving $\hat{d}_0$ holds, see Clausel et al. (2014). The computations of Section 5 in Clausel et al. (2014) allow us to specify (56) as where $L$ is the linear form whose weights $(w_i)$ have been defined in Section 6.1. The same linearization holds for $\hat{d}_0 - d_0$ with $S_{n_j,j}$ replaced by $S^{(q_0,q_0,q_0-1)}_{n_j,j}$, so, by subtracting, we get Using (60), which corresponds to (36) under $H_0$, we can apply Theorem 7.2, so that With (109), we get and $u_N 2^{-\zeta j} = o(1)$. By the definition of $u_N$ in (57), and since $\gamma_j = 2^j$, $d_0 = K + \delta(q_0)$ and $\delta(1) = d$, the asymptotic equivalences (82) and (83) in Proposition 7.4 can be written as The bound $u_N \|S^{(q_0,q_0,q_0-1)}_{n_j,j}\|_2 = O(1)$ follows under $H_0$. Finally, the bound $u_N 2^{-\zeta j} = o(1)$ follows from the bias negligibility condition (61). Hence we get (108), which concludes the proof.
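Schematically, under $H_0$ the argument combines three bounds (a summary in our notation, using the linearization (56); the labels under the braces point to the facts invoked above):

```latex
u_N \bigl(\hat{d}_0 - d_0\bigr)
  = \underbrace{u_N\, L\bigl(S^{(q_0,q_0,q_0-1)}_{n_j,j}\bigr)}_{O_P(1)\ \text{by (82), (83)}}
  + \underbrace{u_N \Bigl[ L\bigl(S_{n_j,j}\bigr)
      - L\bigl(S^{(q_0,q_0,q_0-1)}_{n_j,j}\bigr) \Bigr]}_{o_P(1)\ \text{by Theorem 7.2 and (60)}}
  + \underbrace{O\bigl(u_N\, 2^{-\zeta j}\bigr)}_{o(1)\ \text{by (61)}} .
```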

9. Technical lemmas
The next lemma gives an explicit expression of the spectral density of $H_q(X)$ for $q < 1/(1-2d)$ and is a refined version of Lemma 4.1 in Clausel et al. (2012). It is used in the proof of Theorem 2.1.
Lemma 9.1. Let $q \geq 2$ be an integer. The spectral density $f_{H_q}$ of $H_q(X)$ is given by (110), where $f$ denotes the spectral density of $X$. The function $f^*_{H_q}$ defined by (111) is bounded on $\lambda \in (-\pi, \pi)$ and, for any $\zeta \in (0, 2\delta(q))$ such that $\zeta \leq \beta$, where $\beta$ has been defined in (2), the bound (112) holds for some $L > 0$.
Proof. The explicit expression (110) of $f_{H_q}$ has already been given in Lemma 4.1 of Clausel et al. (2012). In the same lemma, we also showed that $f^*_{H_q}$ defined by (111) is a bounded function. We thus only need to prove that (112) holds for some $L > 0$. We prove the result by induction on $q$.
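The classical formula behind (110), which the lemma refines, expresses $f_{H_q}$ as a convolution power of $f$ (a standard computation via Mehler's formula; the exact form of (110) is not reproduced here):

```latex
% By Mehler's formula, the covariance of H_q(X) is
\operatorname{Cov}\bigl(H_q(X_0), H_q(X_k)\bigr) = q!\, r(k)^q,
\qquad r(k) = \operatorname{Cov}(X_0, X_k),
% so the spectral density of H_q(X) is the q-fold periodic
% self-convolution of f:
f_{H_q}(\lambda) = q!\, \underbrace{(f \star \cdots \star f)}_{q\ \text{times}}(\lambda),
\qquad (f \star g)(\lambda) = \int_{-\pi}^{\pi} f(\lambda - \mu)\, g(\mu)\, \mathrm{d}\mu .
```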
Lemmas 9.2 to 9.4 are used in the proof of Theorem 7.2.
Lemma 9.5. Consider a sequence $\{q_\ell, \ell \in L\}$ with $L$ a set of consecutive integers starting at 0. Let $\nu_c(d)$ be as in Definition 4.1 for all $d \in (\frac{1}{2}(1 - 1/q_0), 1/2)$, so that (16) holds. Then the following assertions hold: Proof. We first consider the case $q_0 \geq 2$. In this case, either $I_0 = \emptyset$ and $\nu_c(d) = \infty$, or $I_0 \neq \emptyset$ and $\nu_c$ is a continuous function taking values Hence we obtain (ii).
Since the two arguments in the min are decreasing functions of $d$ over $d > \frac{1}{2}(1 - 1/p)$, we conclude that (b) holds. This completes the proof of the lemma.

Appendix A. Integral representations
It is convenient to use an integral representation in the spectral domain to represent the random processes (see for example Major (1981); Nualart (2006)). The stationary Gaussian process $\{X_k, k \in \mathbb{Z}\}$ with spectral density (3) can be written as This is a special case of where $W(\cdot)$ is a complex-valued Gaussian random measure satisfying, for any Borel sets $A$ and $B$, The integral (116) is defined for any function $g \in L^2(\mathbb{R})$ and one has the isometry The integral $I(g)$, moreover, is real-valued if We shall also consider multiple Itô--Wiener integrals where the double prime indicates that one does not integrate on the hyperdiagonals $\lambda_i = \pm \lambda_j$, $i \neq j$. The integrals $I_q(g)$ are handy because we will be able to expand our non-linear functions $G(X_k)$ introduced in Section 1 in multiple integrals of this type. These multiple integrals are defined for $g \in L^2(\mathbb{R}^q, \mathbb{C})$, the space of complex-valued functions defined on $\mathbb{R}^q$ satisfying Hermite polynomials are related to multiple integrals as follows: if $X = \int_{\mathbb{R}} g(x)\, \mathrm{d}W(x)$ with $E(X^2) = \int_{\mathbb{R}} |g(x)|^2\, \mathrm{d}x = 1$ and $g(-x) = \overline{g(x)}$, so that $X$ has unit variance and is real-valued, then

Appendix B. The wavelet filters

The sequence $\{Y_t\}_{t\in\mathbb{Z}}$ can be formally expressed as The study of the asymptotic behavior of the scalogram of $\{Y_t\}_{t\in\mathbb{Z}}$ at different scales involves multidimensional wavelet coefficients of $\{G(X_t)\}_{t\in\mathbb{Z}}$ and of $\{Y_t\}_{t\in\mathbb{Z}}$. To obtain them, one applies a multidimensional linear filter $h_j(\tau) = (h_{j,\ell}(\tau))$, $\tau \in \mathbb{Z}$, at each scale index $j \geq 0$. We shall characterize below the multidimensional filters $h_j(\tau)$ by their discrete Fourier transform: The resulting wavelet coefficients $W_{j,k}$, where $j$ is the scale index and $k$ the location, are defined as where $\gamma_j \uparrow \infty$ as $j \uparrow \infty$ is a sequence of non-negative scale factors applied at scale index $j$, for example $\gamma_j = 2^j$. We do not assume that the wavelet coefficients are orthogonal nor that they are generated by a multiresolution analysis.
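Our reading of the definitions just recalled can be sketched as follows (a sketch; the exact indexing in (121) is an assumption, consistent with Appendix C where $\gamma_j = 2^j$, and $\Delta$ denotes the difference operator $(\Delta y)_t = y_t - y_{t-1}$, so $\Delta^{-K}$ is a formal $K$-fold integration):

```latex
% Formal K-fold integration of G(X) and the wavelet coefficients:
Y_t = \Delta^{-K} \bigl( G(X) \bigr)_t ,
\qquad
W_{j,k} = \sum_{\tau \in \mathbb{Z}} h_j(\gamma_j k - \tau)\, Y_\tau ,
\qquad k \in \mathbb{Z},\ j \ge 0 .
```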
Our assumptions on the filters $h_j = (h_{j,\ell})$ are as follows:

(W-1) Finite support: For each $\ell$ and $j$, $\{h_{j,\ell}(\tau)\}_{\tau\in\mathbb{Z}}$ has finite support. Further, there exists some $A > 0$ such that, for any $j$ and any $\ell$, one has

(W-2) Uniform smoothness: There exist $M \geq K$, $\alpha > 1$ and $C > 0$ such that for all $j \geq 0$ and $\lambda \in [-\pi, \pi]$, By $2\pi$-periodicity of $\hat{h}_j$ this inequality can be extended to $\lambda \in \mathbb{R}$ as where $\{\lambda\}$ denotes the element of $(-\pi, \pi]$ such that $\lambda - \{\lambda\} \in 2\pi\mathbb{Z}$.

(W-3) Asymptotic behavior: There exist a sequence of phase functions $\Phi_j : \mathbb{R} \to (-\pi, \pi]$ and some non-identically zero function $h_\infty$ such that locally uniformly on $\lambda \in \mathbb{R}$.

In (W-3), locally uniformly means that for all compact sets $K \subset \mathbb{R}$, It implies in particular that $h_\infty$ is continuous over $\mathbb{R}$.
A more convenient way to express the wavelet coefficients $W_{j,k}$ than (121) is to incorporate the linear filter $\Delta^{-K}$ into the filter $h_j$ and to denote the resulting filter by $h^{(K)}_j$, so that where $\hat{h}^{(K)}_j$ is the discrete Fourier transform of $h^{(K)}_j$; see Clausel et al. (2013) for more details.
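Since $\Delta$ has transfer function $1 - e^{-\mathrm{i}\lambda}$, the incorporated filter presumably satisfies the following relations (a sketch consistent with the text above, not the paper's exact displays):

```latex
% Transfer function of the K-fold integrated filter and the resulting
% expression of the wavelet coefficients:
\hat{h}^{(K)}_j(\lambda) = \bigl(1 - e^{-\mathrm{i}\lambda}\bigr)^{-K} \hat{h}_j(\lambda),
\qquad
W_{j,k} = \sum_{\tau \in \mathbb{Z}} h^{(K)}_j(\gamma_j k - \tau)\, G(X_\tau) .
```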
Appendix C. The multiscale wavelet inference setting

We state here two theorems that are used in Section 6 to derive statistical properties of the estimator of the memory parameter $d_0$. This parameter is estimated using univariate multiscale wavelet filters $g_j$. Since Theorem 3.2 applies to multivariate filters $h_j$, which define the multivariate scalogram $S_{n,j}$, we explain in this appendix the connection between these two perspectives.
We first give some details about the definition of the estimator of the memory parameter. We use dyadic scales here, as in the standard wavelet analysis described in Moulines et al. (2007), where the univariate wavelet coefficients are defined as which corresponds to (20) with γ j = 2 j and with (g j ) denoting a sequence of filters that satisfies (W-1)-(W-3) with m = 1. In the case of a multiresolution analysis, g j can be deduced from the associated mirror filters.
The number $n_j$ of wavelet coefficients available at scale $j$ is related both to the number $N$ of observations $Y_1, \dots, Y_N$ of the time series $Y$ and to the length $T$ of the support of the wavelet $\psi$. More precisely, one has where $[x]$ denotes the integer part of $x$ for any real $x$. Details about the above facts can be found in Moulines et al. (2007); Roueff and Taqqu (2009a). The univariate scalogram is an empirical measure of the distribution of the "energy of the signal" along scales, based on the $N$ observations $Y_1, \dots, Y_N$. It is defined as and is identical to $S_{n_j,j}$ defined in (21). The wavelet spectrum is defined as where the last equality holds for $M \geq K$ since in this case $\{W_{j,k}, k \in \mathbb{Z}\}$ is weakly stationary.
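A sketch of these quantities in the standard notation of Moulines et al. (2007) (both the normalization of the empirical variance and the exact expression of $n_j$ are assumptions recalled from that reference, not reproduced from the displays here):

```latex
% Empirical and population wavelet variances at scale j:
\hat{\sigma}_j^2 = \frac{1}{n_j} \sum_{k=0}^{n_j - 1} W_{j,k}^2 ,
\qquad
\sigma_j^2 = E\bigl[\hat{\sigma}_j^2\bigr] = E\bigl[W_{j,0}^2\bigr]
\quad (M \ge K) ,
% with the number of available coefficients at scale j given by
n_j = \bigl[\, 2^{-j}(N - T + 1) - T + 1 \,\bigr] .
```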
To define our wavelet estimator of the memory parameter $d_0$, we are given weights $w_0, \dots, w_p$ such that $\sum_{i=0}^p w_i = 0$ and $\sum_{i=0}^p i\, w_i = \frac{1}{2 \log 2}$.
We then set $\hat{d}_0$ as follows. To derive statistical properties of this estimator, we apply Theorem 3.2 using a sequence of multivariate filters $(h_j)_{j \geq 0}$ related to the family of univariate filters $g_j$ in a way indicated below.
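As a numerical illustration of the weight constraints and the resulting log-regression estimator, here is a minimal sketch (the function names and the least-squares choice of weights are ours; the paper only requires $\sum_i w_i = 0$ and $\sum_i i\, w_i = 1/(2\log 2)$):

```python
import math

def regression_weights(p):
    """Weights w_0, ..., w_p satisfying sum(w_i) = 0 and
    sum(i * w_i) = 1 / (2 log 2), via the ordinary least-squares
    contrast on the scale index (one admissible choice among many)."""
    n = p + 1
    ibar = sum(range(n)) / n
    # Centering the scale indices makes the weights sum to zero.
    c = [i - ibar for i in range(n)]
    s2 = sum(ci * i for ci, i in zip(c, range(n)))  # equals sum(c_i^2)
    # Rescale so that sum(i * w_i) = 1 / (2 log 2).
    return [ci / (s2 * 2.0 * math.log(2.0)) for ci in c]

def wavelet_estimator(scalogram, weights):
    """d-hat = sum_i w_i * log(sigma2_{j+i}), where `scalogram` lists
    the empirical wavelet variances at scales j, j+1, ..., j+p
    (hypothetical inputs)."""
    return sum(w * math.log(s) for w, s in zip(weights, scalogram))
```

If the wavelet variances scale exactly as $\sigma_j^2 = C\, 2^{2dj}$, the two constraints make the estimator return $d$ regardless of $C$ and of the base scale $j$.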
We first give an example and consider the case $p = 1$. To investigate the asymptotic properties of $\hat{d}_0$, we then have to study the joint behavior of $W_{j-u,k}$ for $u = 0, 1$. Recall that $j - 1$ is a finer scale than $j$. Following the framework of Roueff and Taqqu (2009a), we consider the multivariate coefficients $\mathbf{W}_{j,k} = (W_{j,k}, W_{j-1,2k}, W_{j-1,2k+1})$, since, in addition to the wavelet coefficients $W_{j,k}$ at scale $j$, there are twice as many wavelet coefficients at scale $j - 1$, the additional coefficients being $W_{j-1,2k}$ and $W_{j-1,2k+1}$. These coefficients can be viewed in this case as the output of a three-dimensional filter $h_j$ defined as $h_j(\tau) = (g_j(\tau), g_{j-1}(\tau), g_{j-1}(\tau + 2^{j-1}))$. These three entries correspond to $(u, v)$ below equal to $(0,0)$, $(1,0)$ and $(1,1)$, respectively, in the general case below.
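The stacking of coefficients across the two scales in this $p = 1$ example can be sketched as follows (a hypothetical helper; `W_j` and `W_jm1` are plain lists of univariate wavelet coefficients at scales $j$ and $j-1$):

```python
def multivariate_coefficients(W_j, W_jm1):
    """Stack univariate coefficients at scales j and j-1 into the
    three-dimensional vectors (W_{j,k}, W_{j-1,2k}, W_{j-1,2k+1}):
    each coefficient at scale j is paired with the two coefficients
    at the finer scale j-1 covering the same time range."""
    n = min(len(W_j), len(W_jm1) // 2)
    return [(W_j[k], W_jm1[2 * k], W_jm1[2 * k + 1]) for k in range(n)]
```

For instance, with two coefficients at scale $j$ and four at scale $j-1$, the helper returns two three-dimensional vectors.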
We now indicate the asymptotic behavior of the univariate multiscale scalogram in the case $G = H_{q_0}$, since it will be needed in Section 6. We state the results separately for $q_0 = 1$ and for $q_0 \geq 2$.
Proof. We first observe that the proof of formula (4.5) in Theorem 4.1 of Clausel et al. (2014) remains valid in the case $q_0 = 1$. This yields (135). We now prove the convergence (136). To do so, we adapt the corresponding proof of Theorem 4.1 of Clausel et al. (2014), done for $q_0 \geq 2$. From Clausel et al. (2014) (see equality (9.5)), we have where we denoted the entries of the multivariate scalogram $S_{n_j,j}$ in (22) by $[S_{n_j,j}(\ell)]_{\ell=1,\dots,m}$.
We then recover (137) which concludes the proof.
The case q 0 ≥ 2 has been considered in (Clausel et al., 2014, Theorem 4.1). We recall it here.