Small Deviations in $L_2$-norm for Gaussian Dependent Sequences

Let $U=(U_k)_{k\in\mathbb{Z}}$ be a centered Gaussian stationary sequence satisfying some minor regularity condition. We study the asymptotic behavior of its weighted $\ell_2$-norm small deviation probabilities. It is shown that \[ \ln \mathbb{P}\left( \sum_{k\in\mathbb{Z}} d_k^2 U_k^2 \leq \varepsilon^2\right) \sim - M \varepsilon^{-\frac{2}{2p-1}}, \qquad \textrm{ as } \varepsilon\to 0, \] whenever \[ d_k\sim d_{\pm} |k|^{-p}\quad \textrm{for some } p>\frac{1}{2} \, , \quad k\to \pm\infty, \] using the arguments based on the spectral theory of pseudo-differential operators by M. Birman and M. Solomyak. The constant $M$ reflects the dependence structure of $U$ in a non-trivial way, and marks the difference with the well-studied case of the i.i.d. sequences.


Introduction
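Let $(Y(t))_{t\in T}$ be a centered Gaussian process defined on some parametric measure space $(T,\mu)$. Many studies have been devoted to the asymptotic behavior of its small deviation probabilities
\[ \mathbb{P}\left( \int_T |Y(t)|^2\, \mu(dt) \leq \varepsilon^2 \right) \]
as $\varepsilon\to 0$, see e.g. [9,11,12,16,22,23,24], to mention just a small sample. By the Karhunen-Loève expansion (see for instance [1, Section 1.4]),
\[ \int_T |Y(t)|^2\, \mu(dt) = \sum_{k=0}^{\infty} d_k^2 X_k^2, \]
where $(X_k)_{k\geq 0}$ is a standard Gaussian i.i.d. sequence and $d_k^2$ are the eigenvalues of the covariance operator of $Y$. Hence the small deviation probability may be written as
\[ \mathbb{P}\left( \sum_{k=0}^{\infty} d_k^2 X_k^2 \leq \varepsilon^2 \right), \qquad \textrm{as } \varepsilon\to 0. \]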
Sharp evaluation of these asymptotics is available when the limiting behavior of the eigenvalues $d_k^2$ is understood well enough. Moreover, a considerable number of results are also known for the case where $(X_k)$ is an i.i.d. non-Gaussian sequence, see e.g. [9,26,27]. The importance of small deviation probabilities in a broader context and the wide spectrum of their applications are described in the surveys [18,19]; for an extensive up-to-date bibliography see [20].
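As a sanity check of this reduction, consider the classical example of the Wiener process $W$ on $[0,1]$ with Lebesgue measure (an illustration chosen here, not an example taken from the text above). Its Karhunen-Loève eigenvalues are explicit, and the well-known logarithmic small deviation asymptotics can be read off:

```latex
% Illustration (assumed example): Wiener process W on [0,1], Lebesgue measure.
\[
  \int_0^1 W(t)^2\,dt \;=\; \sum_{k=0}^{\infty} d_k^2 X_k^2,
  \qquad
  d_k = \frac{1}{\pi\,(k+\tfrac12)} \sim \frac{1}{\pi k},
  \quad k\to\infty,
\]
% so the eigenvalues decay like d_k^2 ~ (\pi k)^{-2}, and
\[
  \ln \mathbb{P}\left(\int_0^1 W(t)^2\,dt \le \varepsilon^2\right)
  \sim -\frac{1}{8}\,\varepsilon^{-2},
  \qquad \varepsilon\to 0 .
\]
```

Here the weights decay like $k^{-p}$ with $p=1$, and the exponent of $\varepsilon$ equals $-\frac{2}{2p-1} = -2$, in agreement with the general power law stated in the abstract.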
In this paper, we move in a different direction and examine the asymptotic behavior of the small deviation probabilities of dependent sequences, that is,
\[ \mathbb{P}\left( \sum_{k\in\mathbb{Z}} d_k^2 U_k^2 \leq \varepsilon^2\right), \qquad \textrm{as } \varepsilon\to 0, \tag{1.1} \]
for some stationary centered Gaussian random sequence $U=(U_k)_{k\in\mathbb{Z}}$ that is dependent and only satisfies some mild regularity condition. The motivation for looking at the small deviation problem (1.1) under dependence is twofold. First, it is an interesting mathematical question in its own right. The existing literature on small deviation probabilities for sums of random variables has been strictly confined to the i.i.d. framework, so the dependent case is still an open field of research. Second, there are several potential statistical applications where this extension could be useful. In the functional statistics literature, it is well known that the convergence rates of nonparametric estimators depend upon the asymptotics of the associated small deviation probabilities, see e.g. [10], [21] and references therein. Yet in many practical situations where the functional variable of interest is discrete-valued, a strict independence assumption between the coordinate variables is too restrictive, so the applicability of the existing small deviation results is limited and the asymptotics of (1.1) should be understood. We refer the reader to [14] for more details.
Consider a random vector $Z\in\ell_2(\mathbb{Z})$ defined by its coordinates $Z_k = d_k U_k$, $k\in\mathbb{Z}$, where the positive coefficients $d_k$ satisfy the assumption
\[ d_k\sim d_{\pm} |k|^{-p}\quad \textrm{for some } p>\frac{1}{2}, \qquad k\to \pm\infty, \tag{1.2} \]
where at least one of the numbers $d_{\pm}$ is strictly positive. This assumption is typical of the literature on small deviations of Gaussian processes and related matters; see for example [16,17,25]. We are interested in the asymptotics of the small deviation probabilities
\[ \mathbb{P}\left( \|Z\|_2 \leq \varepsilon \right) = \mathbb{P}\left( \sum_{k\in\mathbb{Z}} d_k^2 U_k^2 \leq \varepsilon^2\right), \qquad \textrm{as } \varepsilon\to 0. \tag{1.3} \]
In particular, one wonders to what extent these asymptotics are the same as those for the i.i.d. Gaussian sequence having the same variance as $U_k$. One example of a mild dependence structure is linear regularity (in the sense of [6, Chapter VII, p.248] and [15, Chapter 17, p.303]). We say that a stationary sequence $U=(U_k)_{k\in\mathbb{Z}}$ is linearly regular if
\[ \bigcap_{m\in\mathbb{Z}} H_m = \{0\}, \]
where $H_m$ denotes the closed linear span of $\{U_k\}_{k\leq m}$. It is a type of asymptotic independence condition which roughly means that the distant past has no significant influence on the process.
When the process is Gaussian, linear regularity is implied by mixing-type conditions, a popular class of dependence notions under which probabilistic results have been extensively developed in the literature; see e.g. [5] and [8] for the precise definitions and a comprehensive review.
By a consequence of the Wold decomposition theorem, any stationary linearly regular Gaussian sequence admits a causal moving average representation (cf. [6, Chapter VII, Theorem 13]):
\[ U_k = \sum_{m=0}^{\infty} a_m X_{k-m}, \]
where $(a_m)\in\ell_2$ and $(X_j)_{j\in\mathbb{Z}}$ is an i.i.d. standard Gaussian sequence. It follows that many popular dependent processes, such as strongly mixing sequences, do have such representations.
In the sequel we shall consider a more general assumption than causality, and postulate that
\[ U_k = \sum_{m\in\mathbb{Z}} a_m X_{k-m}, \]
where $(a_m)\in\ell_2(\mathbb{Z})$, and $(X_j)$ is i.i.d. standard Gaussian as above. In fact, this representation exists iff the stationary sequence $(U_k)$ has a spectral density (cf. Remark 2.1 below) but we will not develop this point of view any further. Our main result, Theorem 1.1, establishes the logarithmic asymptotics (1.5) of the small deviation probabilities (1.3), of order $\varepsilon^{-\frac{2}{2p-1}}$, with the constants specified in (1.6). The power term in the logarithmic small deviation asymptotics is the same as that in the i.i.d. case (characterized by $a_m = a_0 1_{\{m=0\}}$), but the constant $C$ in front of it depends on the sequence $(a_m)$ in a nontrivial way, no matter how weak the linear dependence in $(U_k)$ is (in other words, how fast $a_m$ decays).
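To make the role of the coefficients $(a_m)$ concrete, the covariance of a two-sided moving average $U_k=\sum_m a_m X_{k-m}$ can be computed directly. The sketch below assumes real coefficients; the parameter $\theta$ in the example is a hypothetical choice made only for illustration.

```latex
% Covariance of U_k = \sum_m a_m X_{k-m}, with (X_j) i.i.d. standard Gaussian.
% Only the terms with k-m = k+n-l, i.e. l = m+n, survive.
\[
  \mathbb{E}\,U_k U_{k+n}
  = \sum_{m,l\in\mathbb{Z}} a_m a_l\, \mathbb{E}\,X_{k-m}X_{k+n-l}
  = \sum_{m\in\mathbb{Z}} a_m a_{m+n}.
\]
% Example (hypothetical): a_0 = 1, a_1 = \theta, all other a_m = 0.
\[
  \mathbb{E}\,U_k^2 = 1+\theta^2, \qquad
  \mathbb{E}\,U_k U_{k\pm 1} = \theta, \qquad
  \mathbb{E}\,U_k U_{k+n} = 0 \ \ (|n|\ge 2).
\]
% i.i.d. case: a_m = a_0 1_{\{m=0\}} gives E U_k U_{k+n} = a_0^2 1_{\{n=0\}}.
```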

Remark 1.3
We do not know whether the extra assumption on $(a_m)$ for $p<1$ is essential or purely technical.

Remark 1.4
For sharper results on small deviations, one would need sharper spectral asymptotics (the so-called two-term asymptotics). This seems to be a much harder problem in general.

Remark 1.5
A similar technique can be applied to the study of weighted $L_2$-norm small deviations for continuous-time stationary processes. This will be done elsewhere.

Proof of Theorem 1.1
Recall that we have a random vector $Z=(d_kU_k)\in\ell_2(\mathbb{Z})$ and a random vector with independent coordinates $X=(X_j)$, $j\in\mathbb{Z}$. It follows from the definitions that
\[ Z = DAX, \]
where $D$ is the diagonal matrix with elements $d_{kj} = d_k 1_{\{k=j\}}$ and $A$ is the Toeplitz matrix with elements $a_{kj} = a_{k-j}$. Therefore, the covariance operator of $Z$, which maps $\ell_2(\mathbb{Z})$ into $\ell_2(\mathbb{Z})$, can be expressed as
\[ K_Z = (DA)(DA)^* = DAA^*D, \]
and by the Karhunen-Loève expansion (see [1, Section 1.4]),
\[ \|Z\|_2^2 = \sum_{n\in\mathbb{N}} \lambda_n \xi_n^2, \]
where $(\xi_n)_{n\in\mathbb{N}}$ is an i.i.d. standard Gaussian sequence and $(\lambda_n)_{n\in\mathbb{N}}$ are the eigenvalues of $K_Z$. We remark that the small deviations (1.3) depend heavily on the asymptotic behavior of $\lambda_n$. In particular, if we can show that
\[ \lambda_n \sim C\, n^{-2p}, \qquad n\to\infty, \tag{2.7} \]
then (1.5) will follow from [9, p.67] or [28], and [23]. The decay rate for $\lambda_n$ would then be the same as that of $d_n^2$, and the constant $C$ in front of the power rate would depend on the sequence $(a_m)$ in a non-trivial way, cf. (1.6).
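The relation between $Z$, $D$, $A$ and $X$ described above can be unpacked entry-wise. This is a routine verification written out for convenience, assuming real coefficients and the moving average form $U_k=\sum_j a_{k-j}X_j$.

```latex
% Coordinates of Z in terms of the i.i.d. sequence X
\[
  Z_k = d_k U_k = d_k \sum_{j\in\mathbb{Z}} a_{k-j} X_j = (DAX)_k .
\]
% Covariance entries: E X_i X_j = 1_{\{i=j\}} collapses the double sum
\[
  (K_Z)_{kl} = \mathbb{E}\, Z_k Z_l
  = d_k d_l \sum_{j\in\mathbb{Z}} a_{k-j}\, a_{l-j}
  = (D A A^{*} D)_{kl} .
\]
% i.i.d. case: A = a_0 I, so K_Z = a_0^2 D^2 is diagonal
% with eigenvalues a_0^2 d_k^2, k \in \mathbb{Z}.
```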
Therefore it now remains to prove the eigenvalue asymptotics (2.7), and to specify the constant C.
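For orientation, here is a heuristic sketch of why power decay of the eigenvalues produces the exponent $-\frac{2}{2p-1}$. It is not the argument of the cited references: it combines the Laplace transform of $\sum_n\lambda_n\xi_n^2$ with an exponential Tauberian principle, and the shorthand $S$ and $B$ is introduced here only for this illustration. We assume $\lambda_n\sim C n^{-2p}$ with $p>\frac12$.

```latex
% Heuristic sketch; S := \sum_n \lambda_n \xi_n^2 with \lambda_n ~ C n^{-2p}.
\[
  \ln \mathbb{E}\, e^{-tS}
  = -\frac12 \sum_{n} \ln\bigl(1+2t\lambda_n\bigr)
  \sim -\frac12 \int_0^\infty \ln\bigl(1+2tC x^{-2p}\bigr)\,dx
  = -B\, t^{\frac{1}{2p}}, \qquad t\to\infty,
\]
% using \int_0^\infty \ln(1+y^{-2p})\,dy = \pi / \sin(\pi/(2p)):
\[
  B = \frac{(2C)^{\frac{1}{2p}}}{2}\cdot\frac{\pi}{\sin\frac{\pi}{2p}} .
\]
% Tauberian step: optimize  t\varepsilon^2 - B t^{1/(2p)}  over t > 0.
\[
  \ln \mathbb{P}\bigl(S\le \varepsilon^2\bigr)
  \sim -\,\frac{2p-1}{2}
  \left(\frac{\pi}{2p\,\sin\frac{\pi}{2p}}\right)^{\frac{2p}{2p-1}}
  C^{\frac{1}{2p-1}}\; \varepsilon^{-\frac{2}{2p-1}},
  \qquad \varepsilon\to 0 .
\]
```

For $p=1$ and $C=\pi^{-2}$ the right-hand side reduces to $-\frac18\varepsilon^{-2}$, the classical value for $\int_0^1 W(t)^2\,dt$. The exact normalization of the constants in (1.5) and (1.6) should be taken from the theorem itself rather than from this sketch.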
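To this end, we pass to the Fourier representation and identify $\ell_2(\mathbb{Z})$ with the space $L_2[0,2\pi]$ (with normalized Lebesgue measure) through the basis $e_k(x)=e^{ikx}$, $k\in\mathbb{Z}$. Notice that in this space $A$ becomes the multiplication operator $Af = af$ related to the function
\[ a(x) = \sum_{m\in\mathbb{Z}} a_m e^{imx}, \]
while $D$ becomes the convolution operator
\[ (Df)(x) = \frac{1}{2\pi}\int_0^{2\pi} \mathcal{D}(x-y) f(y)\,dy, \qquad \mathcal{D}(x) = \sum_{k\in\mathbb{Z}} d_k e^{ikx}. \tag{2.8} \]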

Remark 2.1
Interestingly, $|a(\cdot)|^2$ is the spectral density of the stationary sequence $(U_k)$.
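The claim of the remark, and the passage from the Toeplitz matrix $A$ to a multiplication operator, can be checked by a direct computation. The sketch assumes the normalization $a(x)=\sum_m a_m e^{imx}$ on $[0,2\pi]$ with normalized Lebesgue measure $dx/2\pi$ and real coefficients $a_m$; with other Fourier conventions the formulas change by constant factors or by the reflection $x\mapsto -x$. The parameter $\theta$ is hypothetical.

```latex
% Toeplitz matrix -> multiplication operator: for f(x) = \sum_j f_j e^{ijx},
\[
  \sum_{k\in\mathbb{Z}} \Bigl(\sum_{j\in\mathbb{Z}} a_{k-j} f_j\Bigr) e^{ikx}
  = \sum_{j\in\mathbb{Z}} f_j e^{ijx} \sum_{k\in\mathbb{Z}} a_{k-j} e^{i(k-j)x}
  = a(x)\, f(x).
\]
% Fourier coefficients of |a|^2 reproduce the covariances of (U_k)
\[
  |a(x)|^2 = \sum_{m,l\in\mathbb{Z}} a_m a_l\, e^{i(m-l)x},
  \qquad
  \frac{1}{2\pi}\int_0^{2\pi} |a(x)|^2 e^{inx}\,dx
  = \sum_{m\in\mathbb{Z}} a_m a_{m+n}
  = \mathbb{E}\,U_k U_{k+n}.
\]
% Example (hypothetical): a_0 = 1, a_1 = \theta, all other a_m = 0.
\[
  |a(x)|^2 = |1+\theta e^{ix}|^2 = 1 + 2\theta\cos x + \theta^2 .
\]
```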
In our spectral analysis, we will first slightly reinforce condition (1.2) by assuming that $(d_k)$ is exactly equal to the non-isotropic power function
\[ d_k = d\!\left(\tfrac{k}{|k|}\right) |k|^{-p}, \qquad k\neq 0, \tag{2.9} \]
where $d(\pm 1) = d_{\pm}$ are two constants and $d_0 = 0$. In the sequel, our main argument will be a reduction of the operator $A^*D$ to a special case of the pseudo-differential operators ($\Psi$DO) studied by M. Birman and M. Solomyak (hereafter BS) in [2,3], see also [7].
The following exposition provides an interpretation of [2] and [3] adapted to our case. The aim of the BS papers is the spectral analysis of the following operator (in their notation)
\[ (\mathbf{F}u)(x) = b(x)\int_{\mathbb{R}^m} F(x, x-y)\, c(y)\, u(y)\, dy. \]
Here and elsewhere, by spectral analysis of an operator we understand the study of the asymptotic behavior of its singular values.
In our case the space dimension is $m=1$, and we can assume that the function $F$ depends only on the second argument, i.e.
\[ (\mathbf{F}u)(x) = b(x)\int_{\mathbb{R}} F(x-y)\, c(y)\, u(y)\, dy. \tag{2.10} \]
The kernel $F(\cdot)$ in [2] is of a specific Fourier transform form, namely, up to a normalizing factor,
\[ F(x) = \int_{\mathbb{R}} e^{ix\xi}\, \zeta(\xi)\, d(\xi)\, d\xi. \tag{2.11} \]
Here $\zeta(\cdot)$ is any smooth function that vanishes in a neighborhood of zero and equals one in a neighborhood of infinity, while $d(\cdot)$ in the one-dimensional case is a homogeneous function as in (2.9) but considered in continuous time, i.e., in the notation of BS,
\[ d(\xi) = d\!\left(\tfrac{\xi}{|\xi|}\right) |\xi|^{-\alpha}, \tag{2.12} \]
where $d(\pm 1) = d_{\pm}$ are two constants. For us, the homogeneity index $\alpha$ in (2.12) is $p$. Notice immediately that the "mysterious" formula (2.11) is, apart from the inessential smoothing term $\zeta$, a version of our former kernel definition (2.8) for continuous time.
BS consider the operator $\mathbf{F}$ either on $\mathbb{R}^m$ or on a cube. The latter means that the weights $b$ and $c$ in (2.10) are supported by a cube. In our case the weight function $b(\cdot)$ from (2.10) is $a(\cdot)$, and the function $c(\cdot)$ is the indicator of the interval $[0,2\pi]$, which plays the role of a cube. Moreover, the index $\mu = \frac{m}{\alpha}$ used by BS for the description of the singular values behavior is $\frac{1}{p}$ in our notation. Notice that [2] distinguishes three cases $\mu>1$, $\mu=1$ and $\mu<1$, which in our notation are $p\in(\frac12,1)$, $p=1$ and $p>1$, respectively. The weight size restrictions in [2] are $b\in L_{q_1}$, $c\in L_{q_2}$. Our assumptions give $q_1=2$ for $p\geq 1$ and $q_1=\frac{r}{r-1}>2$ for $p<1$ (the latter fact is due to the Hausdorff-Young inequality, see, e.g. [13, §8.5]). Without loss of generality we can suppose $\frac{r}{r-1}<\frac{2}{2p-1}$: since $\ell_r\subset\ell_s$ for $r\leq s$, the exponent $r$ may be increased towards 2, and this bound guarantees $q_2>2$ in the relation between $q_1$ and $q_2$ used below. Further, $q_2\geq 1$ may be taken arbitrarily.
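A quick numerical illustration of this exponent bookkeeping may help. The values $p=\frac34$ and $r=\frac32$ are hypothetical, and the relation $\frac{1}{q_1}+\frac{1}{q_2}=p$ is the one used below in the case $p\in(\frac12,1)$.

```latex
% Hypothetical values: p = 3/4 and (a_m) \in \ell_r with r = 3/2.
% Hausdorff-Young: (a_m) \in \ell_r, 1 \le r \le 2, gives a(\cdot) \in L_{r/(r-1)}.
\[
  \mu = \frac1p = \frac43 \in (1,2), \qquad
  q_1 = \frac{r}{r-1} = 3 > 2,
\]
\[
  \frac{1}{q_2} = p - \frac{1}{q_1} = \frac34 - \frac13 = \frac{5}{12},
  \qquad q_2 = \frac{12}{5} > 2 .
\]
% q_2 > 2 holds here because q_1 = 3 < 2/(2p-1) = 4.
```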
The main results of BS are stated in Theorems 1 and 2 of [2]. Let us first check the weight assumptions of Theorem 1 in [2].
If $p=1$, then $\mu=1$ and Theorem 1(c) applies with $q_1=2$ and any $q_2>2$. This case is relevant to the Wiener process and its relatives such as the Brownian bridge, the OU-process, etc.
If $p\in(\frac12,1)$, then $2>\mu>1$, and Theorem 1(a) applies with $q_1>2$ and $q_2>2$ chosen from the relation $\frac{1}{q_2} = p-\frac{r-1}{r}$, as required in Theorem 1(a). Theorem 2 in [2] is disregarded because it requires some extra assumptions and only applies to the case of infinite $q_1$ or $q_2$.
Now let us proceed to follow the BS result. They denote the singular values of $\mathbf{F}$ by $s_n(\mathbf{F})$ and study the corresponding distribution function $N_{\mathbf{F}}(s) := \#\{n : s_n(\mathbf{F})\geq s\}$ and its asymptotics at zero. This is indeed an equivalent setting because, for any constant $c>0$,
\[ s_n(\mathbf{F}) \sim c\, n^{-1/\mu}, \quad n\to\infty \qquad \Longleftrightarrow \qquad N_{\mathbf{F}}(s) \sim c^{\mu} s^{-\mu}, \quad s\to 0. \tag{2.13} \]
Next, BS introduce the following notations
\[ \Delta_\mu(\mathbf{F}) := \limsup_{s\to 0}\, s^{\mu} N_{\mathbf{F}}(s), \qquad \delta_\mu(\mathbf{F}) := \liminf_{s\to 0}\, s^{\mu} N_{\mathbf{F}}(s). \tag{2.14} \]
In [2] BS prove that $\Delta_\mu = \delta_\mu$ and find the common value for the upper and the lower limit. Namely, they introduce the "operator symbol" $G(s,\xi)$, see formula (14) of [2]. In the one-dimensional case the symbol is a scalar. Further, formula (18) of [2] yields this common value explicitly in our case (recall that $\mu=\frac1p$).
Now we compare the spectral behavior of the operator of our interest $A^*D$ with that of the operator $\mathbf{F}$ in (2.10), assuming that the parameters $d_{\pm}$ in (2.9) coincide with their counterparts $d_{\pm}$ in (2.12), and substituting $b = a\cdot 1_{[0,2\pi]}$ and $c = 1_{[0,2\pi]}$ in (2.10).
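Using the equivalence in (2.13), we obtain the corresponding power asymptotics, of order $n^{-p}$, for the singular values $s_n(A^*D)$. Since $\lambda_n = s_n^2(A^*D)$ by the definition of singular values, it follows that
\[ \lambda_n \sim C\, n^{-2p}, \qquad n\to\infty, \]
as required in (2.7), and the conclusion for small deviations follows.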
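The passage from the counting function to the eigenvalues, and the way the constant depends on $a(\cdot)$ and $d_{\pm}$, can be illustrated by a heuristic phase-volume (Weyl-type) count. This is only a sketch under assumed normalizations ($a(x)=\sum_m a_m e^{imx}$, symbol $a(x)d(\xi)$ on $[0,2\pi]\times\mathbb{R}$); it is not the formula of [2] quoted verbatim, the symbol $\Delta$ is shorthand introduced here, and the resulting constant should be compared with (1.6) rather than taken as a substitute for it.

```latex
% Heuristic: N(s) \approx (2\pi)^{-1} meas{ (x,\xi) : |a(x)| d(\xi) \ge s },
% with d(\xi) = d_\pm |\xi|^{-p} for \pm\xi > 0.
\[
  N(s) \approx \frac{1}{2\pi}\int_0^{2\pi}
  \left[\Bigl(\frac{|a(x)|\,d_+}{s}\Bigr)^{1/p}
      + \Bigl(\frac{|a(x)|\,d_-}{s}\Bigr)^{1/p}\right] dx
  = \Delta\, s^{-1/p},
  \qquad
  \Delta = \frac{d_+^{1/p}+d_-^{1/p}}{2\pi}\int_0^{2\pi}|a(x)|^{1/p}\,dx .
\]
% Inverting the counting function (N(s_n) \approx n):
\[
  s_n \sim \Delta^{p}\, n^{-p}, \qquad
  \lambda_n = s_n^2 \sim \Delta^{2p}\, n^{-2p}.
\]
% Sanity check, i.i.d. case a(x) = a_0: the eigenvalues are a_0^2 d_k^2, k \in \mathbb{Z}, and
\[
  \#\{k : |a_0|\, d_k \ge s\}
  \sim |a_0|^{1/p}\bigl(d_+^{1/p}+d_-^{1/p}\bigr)\, s^{-1/p}
  = \Delta\, s^{-1/p}.
\]
```

In this heuristic the dependence structure enters only through the integral of $|a(x)|^{1/p}$, which is consistent with the statement that the power rate is unaffected while the constant changes.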
So far, the result of the theorem is obtained only for the homogeneous coefficients (2.9). However, since any finite number of terms in the sequence $(d_k)$ is irrelevant for small deviation probability asymptotics, by monotonicity of the quadratic form $\sum_{k\in\mathbb{Z}} d_k^2 U_k^2$ in $(d_k)$, it follows that (1.5) also holds for any $(d_k)$ satisfying (1.2).
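The comparison step can be sketched as follows. The notation $\delta$, $N$ and $M(d_-,d_+)$ (the constant in the logarithmic asymptotics for homogeneous coefficients with parameters $d_\pm$) is introduced here only for illustration, and the sketch assumes that $M$ depends continuously on $(d_-,d_+)$.

```latex
% For every \delta > 0 there is N such that, for |k| > N
% (with d_+ for k > 0 and d_- for k < 0),
\[
  (d_\pm-\delta)_+\,|k|^{-p} \;\le\; d_k \;\le\; (d_\pm+\delta)\,|k|^{-p}.
\]
% Upper bound: dropping terms and decreasing weights can only increase the probability.
\[
  \mathbb{P}\Bigl(\sum_{k\in\mathbb{Z}} d_k^2U_k^2\le\varepsilon^2\Bigr)
  \le
  \mathbb{P}\Bigl(\sum_{|k|>N} (d_\pm-\delta)_+^2\,|k|^{-2p}\,U_k^2\le\varepsilon^2\Bigr).
\]
% The lower bound is analogous with (d_\pm+\delta), once the finitely many
% terms with |k| \le N are discarded as irrelevant. Taking logarithms:
\[
  -M(d_-+\delta,\, d_++\delta)
  \le \liminf_{\varepsilon\to 0}\,
      \varepsilon^{\frac{2}{2p-1}}
      \ln\mathbb{P}\Bigl(\sum_{k\in\mathbb{Z}} d_k^2U_k^2\le\varepsilon^2\Bigr)
  \le \limsup_{\varepsilon\to 0}\,(\cdots)
  \le -M\bigl((d_--\delta)_+,\,(d_+-\delta)_+\bigr),
\]
% and letting \delta \to 0 gives the asymptotics with the constant M(d_-, d_+).
```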