Large Deviations for Weighted Sums of Stretched Exponential Random Variables

We consider the probability that a weighted sum of $n$ i.i.d. random variables $X_j$, $j = 1, \dots, n$, with stretched exponential tails is larger than its expectation, and determine the rate of its decay under suitable conditions on the weights. We show that the decay is subexponential, and identify the rate function in terms of the tails of $X_j$ and the weights. Our result generalizes the large deviation principle given by Kiesel and Stadtm\"uller [8] as well as the tail asymptotics for sums of i.i.d. random variables provided by Nagaev [10, 11]. As an application of our result, motivated by random projections of high-dimensional vectors, we consider the case of random, self-normalized weights that are independent of the sequence $\{X_j\}_{j \in \mathbb N}$, identify the decay rate for both the quenched and annealed large deviations in this case, and show that they coincide. As another example, we consider weights derived from kernel functions that arise in non-parametric regression.


Introduction
Let $\{X_j\}_{j\in\mathbb N}$ be a sequence of independent and identically distributed (i.i.d.) random variables on a probability space $(\Omega, \mathcal F, \mathbb P)$ with values in $\mathbb R$ and with finite expectation $m := \mathbb E[X_1] < \infty$. For $n \in \mathbb N$, let $S_n := \sum_{j=1}^n X_j$ denote the partial sum and $\bar S_n := S_n/n$ the empirical mean. The strong law of large numbers implies that $\bar S_n \to m$ almost surely. Cramér's Theorem on large deviations tells us that, if the $X_j$ have finite exponential moments, that is, there exists $t > 0$ such that
$$M(t) := \mathbb E\big[e^{tX_1}\big] < \infty, \qquad (1.1)$$
then, for every $x > m$,
$$\lim_{n\to\infty} \frac 1n \log \mathbb P\big(\bar S_n \ge x\big) = -\Lambda^*(x),$$
where $\Lambda^*(x) := \sup_{t \ge 0}\{tx - \log M(t)\} > 0$. We will refer to this case as the "light-tailed" case. It is well known that if $M(t) = +\infty$ for all $t > 0$, the probabilities $\mathbb P(\bar S_n \ge x)$ decay more slowly than exponentially. The reason is that, in contrast to when (1.1) holds, a "deviation" of the type $\bar S_n \ge x$ is produced by the event that just one of the random variables takes a large value. For instance, if there are $r \in (0,1)$ and $c > 0$ such that $\mathbb P(X_1 \ge t) = c\exp(-t^r)$ for $t$ large enough, then, for every $x > m$,
$$\lim_{n\to\infty} \frac{1}{n^r} \log \mathbb P\big(\bar S_n \ge x\big) = -(x - m)^r. \qquad (1.2)$$
The result in (1.2) goes back to [10], and it will also follow from our main result, Theorem 1. Cramér's Theorem was generalized by [8] to weighted sums of i.i.d. random variables; see Section 2 below for a precise statement of their results. Our main result, Theorem 1, gives a corresponding statement for weighted sums of i.i.d. random variables with stretched exponential tails. One motivation to consider weighted sums, which is elaborated upon in Section 5.1, comes from random projections of high-dimensional vectors, which are of relevance in asymptotic geometric analysis [5,9] and data analysis [2]. Another motivation stems from statistics (kernel functions, moving averages), considered for the light-tailed case in [8], since stretched exponential random variables arise in many applications; see Section 5.2 for an example.

This article is organized as follows: We first present the result and the regularity conditions from [8] in Section 2.
Our main result, Theorem 1, is given in Section 3, and its proof is presented in Section 4. Finally, in Section 5.1, we give an application to random weights, and in Section 5.2, we consider weights derived from kernel functions that arise in non-parametric regression.
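Before turning to the weighted setting, the one-big-jump mechanism behind (1.2) can be checked in a quick simulation. The following sketch (illustrative only, not part of the paper; all parameter choices are ours) samples Weibull variables with shape $r = 1/2$, whose upper tail is exactly $\exp(-t^r)$, and compares the share of the sum carried by the largest summand on deviation events with its typical share.

```python
import numpy as np

# Illustrative sketch (not from the paper): for i.i.d. variables with
# stretched exponential tail P(X >= t) = exp(-t^r), r in (0, 1), large
# deviations of the empirical mean are typically produced by a single
# large summand ("one big jump").
rng = np.random.default_rng(0)
r, n, trials, x = 0.5, 30, 100_000, 4.0  # E[X_1] = Gamma(1 + 1/r) = 2 for r = 1/2

# numpy's weibull(a) has P(X >= t) = exp(-t^a): exactly a stretched exponential tail.
X = rng.weibull(r, size=(trials, n))
means = X.mean(axis=1)
dev = means >= x                      # deviation events {empirical mean >= x}

p_hat = dev.mean()                    # crude estimate of P(mean >= x)
share_all = (X.max(axis=1) / X.sum(axis=1)).mean()
share_dev = (X[dev].max(axis=1) / X[dev].sum(axis=1)).mean()
print(p_hat, share_all, share_dev)   # largest summand dominates on deviation events
```

Conditioned on a deviation, the maximal summand carries a markedly larger fraction of the total than it does typically, in line with the heuristic above.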

The Light-Tailed Case
For $n \in \mathbb N$, let $\{a_j(n)\}_{j\in\mathbb N}$ be a sequence of real numbers, which we will call weights. For $n \in \mathbb N$, define the weighted sum
$$S_n := \sum_{j=1}^n a_j(n) X_j \qquad (2.3)$$
and the measure $\mu_n$ on $\mathcal B(\mathbb R)$, the set of Borel sets in $\mathbb R$, as
$$\mu_n(A) := \mathbb P\big(S_n \in A\big), \qquad A \in \mathcal B(\mathbb R). \qquad (2.4)$$
When the $\{X_j\}_{j\in\mathbb N}$ have finite exponential moments, that is, when the moment generating function $M(t)$ defined in (1.1) is finite for all $t \in \mathbb R$, a large deviation principle for the sequence of weighted sums $\{S_n\}_{n\in\mathbb N}$ was established in [8] under suitable assumptions on the weights; see Assumption A below. The "classical" case of Cramér's theorem corresponds to $a_j(n) = 1/n$, $j = 1, \dots, n$, $n \in \mathbb N$.
Now, let $\Lambda$ denote the cumulant (or log moment) generating function of $X_1$, and let $\{c_\nu\}_{\nu\in\mathbb N}$ be the sequence of coefficients that arise in the power series expansion of $\Lambda$:
$$\Lambda(t) := \log M(t) = \sum_{\nu=1}^\infty \frac{c_\nu}{\nu!}\, t^\nu.$$
Also, for $t > 0$, let $\chi(t) := \sum_{\nu=1}^\infty \frac{s_\nu c_\nu}{\nu!}\, t^\nu$, and let $\chi^*$ denote its Legendre-Fenchel transform:
$$\chi^*(x) := \sup_{t \ge 0}\{tx - \chi(t)\}.$$
It was shown in [8] that, under Assumption A, the sequence of measures $\{\mu_n\}_{n\in\mathbb N}$ on $\mathcal B(\mathbb R)$ defined in (2.4) satisfies a large deviation principle with speed $n$ and rate function $\chi^*$. Recall that this means that, for every $A \in \mathcal B(\mathbb R)$,
$$-\inf_{x \in A^\circ} \chi^*(x) \le \liminf_{n\to\infty} \frac 1n \log \mu_n(A) \le \limsup_{n\to\infty} \frac 1n \log \mu_n(A) \le -\inf_{x \in \bar A} \chi^*(x),$$
where $A^\circ$ and $\bar A$, respectively, represent the interior and the closure of the set $A$.
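The Legendre-Fenchel transform can be evaluated numerically by maximizing over a grid. The sketch below (our own illustration; the function name `legendre_fenchel` is not from the paper) recovers the classical Cramér rate $x^2/2$ for standard Gaussian summands, for which $\Lambda(t) = t^2/2$.

```python
import numpy as np

def legendre_fenchel(chi, x, tgrid):
    """Numerical Legendre-Fenchel transform chi*(x) = sup_t {t*x - chi(t)}."""
    t = np.asarray(tgrid)
    return np.max(t * x - chi(t))

# Sanity check in the classical case a_j(n) = 1/n with X_1 ~ N(0, 1),
# where chi reduces to Lambda(t) = t^2/2, whose transform is x^2/2.
chi = lambda t: 0.5 * t ** 2
tgrid = np.linspace(0.0, 10.0, 100_001)
val = legendre_fenchel(chi, 2.0, tgrid)
print(val)  # close to 2.0 = 2^2/2, attained at t = 2
```

In practice one would also check that the maximizer lies in the interior of the grid, so that the supremum is not truncated.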
Remark 2.1. In fact, [8] provides a more general result that considers an infinite sum and refers to a general scale within the regularity conditions (cf. Assumption A); that is, they prove large deviations for the family of weighted sums of the form $A(\lambda) := \sum_{j=1}^\infty a_j(\lambda) X_j$, where $\lambda \in I$ and either $I = \mathbb N$ or $I = [0, \infty)$.
Our goal will be to relax the finiteness assumption (2.7) on the moment generating function M (·).

Main Result
In order to present our large deviation result for weighted sums of stretched exponential random variables, we will use slightly different assumptions on the weights from those used in [8]. We will restrict our considerations to non-negative weights. As we show in Lemma 3.3 below, in this case our assumptions are weaker than those used in [8]. Examples of weight sequences that satisfy both Assumption A and Assumption B include Valiron means (see [8]) as well as kernel functions (see Section 5.2).
Theorem 1. Let $\{X_j\}_{j\in\mathbb N}$ be a sequence of i.i.d. random variables and let $m := \mathbb E[X_1]$. Suppose that all moments of $X_1$ are finite,
$$\mathbb E\big[|X_1|^\nu\big] < \infty \quad \text{for every } \nu \in \mathbb N, \qquad (3.11)$$
and that there exist a constant $r \in (0,1)$, slowly varying functions $b, c_1, c_2 : (0, \infty) \to (0, \infty)$ and a constant $t^* > 0$ such that, for $t \ge t^*$,
$$c_1(t)\exp\big(-b(t)t^r\big) \le \mathbb P\big(X_1 \ge t\big) \le c_2(t)\exp\big(-b(t)t^r\big). \qquad (3.12)$$
For every $n \in \mathbb N$, let $\{a_j(n)\}_{j\in\mathbb N}$ be a sequence of non-negative numbers that satisfy Assumption B with associated constants $s_1, s \in \mathbb R$, and let $\{S_n\}_{n\in\mathbb N}$ be the sequence of weighted sums defined in (2.3). Then, for every $x > s_1 m$,
$$\lim_{n\to\infty} \frac{1}{b(n)n^r} \log \mathbb P\big(S_n \ge x\big) = -\Big(\frac{x - s_1 m}{s}\Big)^r. \qquad (3.13)$$

Remark 3.1. The non-negativity assumption on the weights could be relaxed only if one had more information about the lower tail of the $\{X_j\}$, that is, about the probabilities $\mathbb P(X_1 \le -t)$ for $t > 0$. Consider the following example: $a_j(n) = 1/n$ for $j = 1, \dots, \lfloor 2n/3 \rfloor$ and $a_j(n) = -1/n$ for $j = \lfloor 2n/3 \rfloor + 1, \dots, n$ (where, for $z \in \mathbb R$, $\lfloor z \rfloor$ represents the greatest integer less than or equal to $z$). Then Assumption B is satisfied with $s_1 = 1/3$ and $s = 1$. Take i.i.d. random variables $\{X_j\}_{j\in\mathbb N}$ with mean $m$ that satisfy (3.11) and (3.12) and, in addition, satisfy $\mathbb P(X_1 \le -t) = \exp(-t^\alpha)$ for some $\alpha$ with $0 < \alpha < r$ and $t$ large enough. Then, for every $x > m/3$, it can be shown that
$$\lim_{n\to\infty} \frac{1}{n^\alpha} \log \mathbb P\big(S_n \ge x\big) = -\Big(x - \frac m3\Big)^\alpha. \qquad (3.14)$$
Indeed, to show (3.14), for any $\varepsilon > 0$, first write
$$\mathbb P\big(S_n \ge x\big) \le \mathbb P\bigg(\frac 1n \sum_{j=1}^{\lfloor 2n/3 \rfloor} X_j \ge \frac{2m}{3} + \varepsilon\bigg) + \mathbb P\bigg(-\frac 1n \sum_{j=\lfloor 2n/3 \rfloor + 1}^{n} X_j \ge x - \frac{2m}{3} - \varepsilon\bigg).$$
Then, applying Theorem 1 twice, first to $\{X_j\}_{j\in\mathbb N}$ and then to $\{-X_j\}_{j\in\mathbb N}$, both times with $a_j(n) = 1/n$, $j \in \mathbb N$, and recalling that $\alpha < r$, we infer that, as $n \to \infty$,
$$n^{-\alpha} \ln \mathbb P\big(S_n \ge x\big) \le -\Big(x - \frac m3 - \varepsilon\Big)^\alpha + o(1).$$
Sending $\varepsilon \to 0$, we see that (3.14) holds with $\le$ instead of equality. To show the opposite inequality in (3.14), write
$$\mathbb P\big(S_n \ge x\big) \ge \mathbb P\bigg(\frac 1n \sum_{j=1}^{\lfloor 2n/3 \rfloor} X_j \ge \frac{2m}{3} - \varepsilon\bigg)\, \mathbb P\bigg(-\frac 1n \sum_{j=\lfloor 2n/3 \rfloor + 1}^{n} X_j \ge x - \frac{2m}{3} + \varepsilon\bigg).$$
The first probability on the right-hand side goes to $1$ due to the law of large numbers. Once again applying Theorem 1 to $\{-X_j\}_{j\in\mathbb N}$ with $a_j(n) = 1/n$, $j \in \mathbb N$, for the second term on the right-hand side, and then letting $\varepsilon \to 0$, we obtain (3.14) with $\ge$ instead of equality.
Together, both inequalities prove (3.14). However, we cannot recover α from the assumptions in Theorem 1.
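For non-negative weights, the decay exponent in Theorem 1 depends on the weights only through $s_1$ and $s$. A small helper (our own sketch, using finite-$n$ proxies for the limits $s_1$ and $s$ appearing in Assumption B) evaluates it:

```python
import numpy as np

def theorem1_rate(a, m, x, r):
    """Decay exponent -((x - s1*m)/s)^r from Theorem 1, with finite-n
    proxies s1 ~ sum_j a_j(n) and s ~ n * max_j a_j(n) (Assumption B)."""
    a = np.asarray(a, dtype=float)
    s1 = a.sum()
    s = a.size * a.max()
    assert x > s1 * m, "Theorem 1 concerns x above the limiting mean s1*m"
    return -(((x - s1 * m) / s) ** r)

# Classical Nagaev case a_j(n) = 1/n: s1 = s = 1 and the exponent is -(x - m)^r.
n = 1000
rate = theorem1_rate(np.full(n, 1.0 / n), m=1.0, x=2.0, r=0.5)
print(rate)  # -(2 - 1)^0.5 = -1.0
```

For the weights of Remark 3.1 one would additionally need the lower-tail exponent $\alpha$, which, as noted above, cannot be recovered from the assumptions of Theorem 1.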

Remark 3.2.
For the same reason as in the last remark, namely that the only assumption on the lower tail of $\{X_j\}_{j\in\mathbb N}$ is (3.11), we cannot strengthen the conclusion of Theorem 1 to a full large deviation principle without imposing further assumptions. For $x < s_1 m$, the decay of $\mathbb P(S_n \le x)$ is determined by the lower tail of the $\{X_j\}$. For example, if the $\{X_j\}_{j\in\mathbb N}$ are bounded below, Cramér's Theorem implies that $\mathbb P(S_n \le x)$ decays exponentially in $n$. If, on the other hand, $\mathbb P(X_1 \le -t) = \exp(-t^\alpha)$ with $0 < \alpha < r$, then, as in Remark 3.1, we can show $-\infty < \lim_{n\to\infty} n^{-\alpha} \log \mathbb P(S_n \le x) < 0$.
Stretched exponential distributions have been proposed as a complement to the frequently used power law distributions to model many naturally occurring heavy-tailed distributions. Any distribution that satisfies (3.12) and is bounded below also satisfies (3.11). A concrete example is the Weibull distribution with shape parameter lying in the interval $(0,1)$. Before proceeding to the proof of Theorem 1, let us comment on the relationship between Assumptions A and B. In fact, for a non-negative sequence of weights, Assumption B is weaker than Assumption A; see Lemma 3.3. To see that it is strictly weaker, consider the sequence of weights defined by $a_j(n) = n^{-1} + n^{-(1+\varepsilon)}$, $j = 1, \dots, n$, for some $\varepsilon \in (0, \tfrac 12)$, for which it is easy to show that Assumption B holds, but (A.2) cannot be satisfied.
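For the weight sequence $a_j(n) = n^{-1} + n^{-(1+\varepsilon)}$ just mentioned, the quantities entering Assumption B are easy to evaluate numerically (an illustrative check of ours): both $\sum_j a_j(n) = 1 + n^{-\varepsilon}$ and $n\, a_{\max}(n) = 1 + n^{-\varepsilon}$ tend to $1$, so $s_1 = s = 1$.

```python
import numpy as np

# The weights a_j(n) = 1/n + n^{-(1+eps)} from the text: the Assumption B
# quantities sum_j a_j(n) and n * max_j a_j(n) both equal 1 + n^{-eps} -> 1.
eps = 0.25
for n in (10, 1000, 100_000):
    a = np.full(n, 1.0 / n + n ** -(1.0 + eps))
    print(n, a.sum(), n * a.max())
```

The convergence is of order $n^{-\varepsilon}$, which is slow for small $\varepsilon$ but sufficient for Assumption B.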

Proof of Theorem 1
We will prove a slightly stronger statement than Theorem 1: namely, we show in Section 4.2 that if the first inequality in (3.12) is satisfied, then the lower bound
$$\liminf_{n\to\infty} \frac{1}{b(n)n^r} \log \mathbb P\big(S_n \ge x\big) \ge -\Big(\frac{x - s_1 m}{s}\Big)^r \qquad (4.19)$$
holds, and in Section 4.3 that if the second inequality in (3.12) is satisfied, then the upper bound
$$\limsup_{n\to\infty} \frac{1}{b(n)n^r} \log \mathbb P\big(S_n \ge x\big) \le -\Big(\frac{x - s_1 m}{s}\Big)^r \qquad (4.20)$$
holds. First, in Section 4.1, we summarize some relevant properties of slowly varying functions. Throughout the section, the notation $f(x) \sim g(x)$ as $x \to \infty$ for two functions $f, g : \mathbb R \to \mathbb R$ means that $\lim_{x\to\infty} f(x)/g(x) = 1$. Also, given a set $A$, $\mathbb 1_A$ will denote the indicator function of $A$, which equals $1$ on $A$ and $0$ on its complement.

4.1. Properties of Slowly Varying Functions. We will need the following preliminaries on slowly varying functions. Proposition 4.1 corresponds to Proposition 1.3.6 in [1], and Lemma 4.2 refers to (1.4) in [6].

Proposition 4.1. Let $\ell : (0,\infty) \to (0,\infty)$ be slowly varying. Then, as $x \to \infty$: (i) $\log \ell(x)/\log x \to 0$; (ii) $x^{-\delta}\ell(x) \to 0$ for every $\delta > 0$; (iii) $x^{\delta}\ell(x) \to \infty$ for every $\delta > 0$.

Lemma 4.2 (Representation Theorem). A function $\ell : (0,\infty) \to (0,\infty)$ is slowly varying if and only if there exist $a > 0$, $\bar\eta \in \mathbb R$ and bounded measurable functions $\eta(\cdot)$ and $\varepsilon(\cdot)$ with $\eta(x) \to \bar\eta$, $\varepsilon(x) \to 0$ as $x \to \infty$ such that, for $x \ge a$, $\ell$ can be written in the form
$$\ell(x) = \exp\Big\{\eta(x) + \int_a^x \frac{\varepsilon(t)}{t}\,dt\Big\}.$$
As a direct consequence of Lemma 4.2, we have the following result.

Lemma 4.3. Let $\ell$ be slowly varying and let $\{t(n)\}_{n\in\mathbb N}$ be a sequence with $t(n) \sim cn$ as $n \to \infty$ for some constant $c > 0$. Then $\ell(t(n)) \sim \ell(n)$ as $n \to \infty$.
4.2. The Lower Bound. Fix $x > s_1 m$ and $\varepsilon > 0$, and let $j^*(n)$ denote an index at which the maximal weight $a_{\max}(n) := \max_{1 \le j \le n} a_j(n)$ is attained. If $X_{j^*(n)} \ge t_1(n)$ and $\sum_{j \ne j^*(n)} a_j(n)(X_j - m) \ge -\varepsilon$, then $S_n \ge x$, so that, by independence,
$$\mathbb P\big(S_n \ge x\big) \ge \mathbb P\big(X_{j^*(n)} \ge t_1(n)\big)\, \mathbb P\Big(\sum_{j \ne j^*(n)} a_j(n)(X_j - m) \ge -\varepsilon\Big),$$
where $t_1(n) = t_1^\varepsilon(n)$ is defined by
$$t_1(n) := \frac{x + \varepsilon - m \sum_{j \ne j^*(n)} a_j(n)}{a_{\max}(n)}.$$
Applying the lower bound of (3.12) with $t = t_1(n)$, we obtain
$$\mathbb P\big(S_n \ge x\big) \ge c_1(t_1(n)) \exp\big\{-b(t_1(n))(t_1(n))^r\big\}\, \mathbb P\Big(\sum_{j \ne j^*(n)} a_j(n)(X_j - m) \ge -\varepsilon\Big). \qquad (4.24)$$
Note that, by Assumption B, $t_1(n) \sim \big(\frac xs - \frac{s_1}{s}m + \frac{\varepsilon}{s}\big) n$ as $n \to \infty$. Since $c_1(\cdot)$ and $b(\cdot)$ are slowly varying functions, Lemma 4.3 implies that $c_1(t_1(n)) \sim c_1(n)$ and $b(t_1(n)) \sim b(n)$ as $n \to \infty$. Moreover, for some fixed $\delta \in (0, r)$, we can write
$$\frac{\log c_1(n)}{b(n)n^r} = \frac{\log c_1(n)}{\log n} \cdot \frac{\log n}{n^\delta} \cdot \frac{1}{b(n)n^{r-\delta}},$$
and the right-hand side goes to zero as $n \to \infty$ by properties (i) and (iii) of Proposition 4.1. Furthermore, since the $\{X_j\}$ have finite second moments by (3.11), and (B.2) implies that $\sum_{j=1, j \ne j^*(n)}^n a_j(n)^2 \le n (a_{\max}(n))^2 \to 0$ as $n \to \infty$, it follows that $\sum_{j \ne j^*(n)} a_j(n)(X_j - m)$ converges to $0$ in $L^2$. In turn, this implies that $\lim_{n\to\infty} \mathbb P\big(\sum_{j \ne j^*(n)} a_j(n)(X_j - m) \ge -\varepsilon\big) = 1$. Thus, taking logarithms on both sides of (4.24), dividing by $b(n)n^r$, and sending first $n \to \infty$ and then $\varepsilon \downarrow 0$, we obtain the lower bound (4.19).

4.3. The Upper Bound. Let $t_2(n) := n\big(\frac xs - \frac{s_1}{s}m\big)$. Then we can write
$$\mathbb P\big(S_n \ge x\big) \le A_1^n + A_2^n, \qquad (4.25)$$
where, for $n \in \mathbb N$,
$$A_1^n := \mathbb P\Big(\max_{1 \le j \le n} X_j \ge t_2(n)\Big), \qquad A_2^n := \mathbb P\Big(\sum_{j=1}^n a_j(n) X_j^{(n)} \ge x\Big), \qquad X_j^{(n)} := X_j \mathbb 1_{\{X_j < t_2(n)\}}.$$
The union bound and the upper tail bound for $X_1$ in (3.12) imply that
$$A_1^n \le n\, \mathbb P\big(X_1 \ge t_2(n)\big) \le n\, c_2(t_2(n)) \exp\big\{-b(t_2(n))(t_2(n))^r\big\}.$$
Since $b$ is slowly varying, $b(t_2(n)) \sim b(n)$ as $n \to \infty$, and properties (i) and (iii) of Proposition 4.1 show that $\lim_{n\to\infty} \log n/(b(n)n^r) = \lim_{n\to\infty} \log c_2(t_2(n))/(b(n)n^r) = 0$. Together with the last display, this implies that
$$\limsup_{n\to\infty} \frac{1}{b(n)n^r} \log A_1^n \le -\Big(\frac{x - s_1 m}{s}\Big)^r. \qquad (4.26)$$
Next, we turn to $A_2^n$. Applying the exponential Chebyshev inequality with the positive real parameter $\beta_\zeta(n)/s$, we obtain
$$A_2^n \le \exp\Big\{-\frac{\beta_\zeta(n)}{s}\, x + \sum_{j=1}^n \Lambda_j^\zeta(n)\Big\},$$
where, for $j = 1, \dots, n$, $n \in \mathbb N$ and $\zeta > 0$, we define
$$\beta_\zeta(n) := \zeta\, b(t_2(n))\, n^r \qquad (4.28)$$
and
$$\Lambda_j^\zeta(n) := \log \mathbb E\Big[\exp\Big\{\frac{\beta_\zeta(n)}{s}\, a_j(n)\, X_j^{(n)}\Big\}\Big]. \qquad (4.30)$$
We now show that the upper bound (4.20) is satisfied if the following proposition holds.

Proposition 4.4. For every $\zeta < \big(\frac xs - \frac{s_1}{s}m\big)^{r-1}$,
$$\limsup_{n\to\infty} \frac{1}{b(n)n^r} \sum_{j=1}^n \Lambda_j^\zeta(n) \le \zeta\, \frac{s_1 m}{s}.$$
Indeed, since $\beta_\zeta(n)/(b(n)n^r) \to \zeta$, Proposition 4.4 and the exponential Chebyshev bound above yield, for every such $\zeta$,
$$\limsup_{n\to\infty} \frac{1}{b(n)n^r} \log A_2^n \le -\zeta\, \frac{x - s_1 m}{s},$$
and letting $\zeta \uparrow \big(\frac xs - \frac{s_1}{s}m\big)^{r-1}$ gives $\limsup_{n\to\infty} (b(n)n^r)^{-1} \log A_2^n \le -\big(\frac{x - s_1 m}{s}\big)^r$. Together with (4.25) and the analogous bound (4.26) for $A_1^n$, we obtain the upper bound (4.20). Thus, to prove the upper bound, it only remains to prove Proposition 4.4.
We use techniques similar to those in [7].
Proof of Proposition 4.4. Fix $\zeta < \big(\frac xs - \frac{s_1}{s}m\big)^{r-1}$ and denote $\beta_\zeta(n)$ and $\Lambda_j^\zeta$ simply by $\beta(n)$ and $\Lambda_j$. For the fixed $r \in (0,1)$, we also choose $k \in \mathbb N$ such that $r < k/(k+1)$. Then, by the definition (4.30) of $\Lambda_j$, the estimates $\log x \le x - 1$ for $x > 0$ and
$$e^x - 1 \le x + \frac{1}{2}x^2 + \frac{1}{6}x^3 + \dots + \frac{1}{(k+1)!}x^{k+1}e^x,$$
the finiteness of the moments of $X_j$ due to (3.11), and the facts that $\beta(n)/(b(n)n^r) \to \zeta$ and $\sum_{j=1}^n a_j(n) \to s_1$ as $n \to \infty$, we have
$$\limsup_{n\to\infty} \frac{1}{b(n)n^r} \sum_{j=1}^n \Lambda_j \le \zeta\, \frac{s_1 m}{s} + B_0,$$
where $B_0$ collects the contributions of the terms of order $\nu \ge 2$:
$$B_0 := \limsup_{n\to\infty} \frac{1}{b(n)n^r} \sum_{j=1}^n \Bigg(\sum_{\nu=2}^{k} \frac{1}{\nu!}\, \mathbb E\bigg[\Big(\frac{\beta(n)a_j(n)}{s} X_j^{(n)}\Big)^{\nu}\bigg] + \frac{1}{(k+1)!}\, \mathbb E\bigg[\Big(\frac{\beta(n)a_j(n)}{s} X_j^{(n)}\Big)^{k+1} e^{\frac{\beta(n)a_j(n)}{s} X_j^{(n)}}\bigg]\Bigg).$$
To complete the proof of Proposition 4.4, it suffices to show that $B_0 = 0$. In this regard, we distinguish between the cases $X_j^{(n)} < t^*$ and $X_j^{(n)} \ge t^*$, and denote by $B_1(n)$ and $B_2(n)$ the contributions to the quantity inside the limit superior that come from the events $\{X_j^{(n)} < t^*\}$ and $\{X_j^{(n)} \ge t^*\}$, respectively. Since $\beta(n)a_{\max}(n)/s \sim \zeta\, b(n)n^{r-1} \to 0$ as $n \to \infty$, the terms with $X_j^{(n)} < t^*$ can be controlled using the moment bounds from (3.11). Combined with (4.32), and recalling that $a_{\max}(n) := \max_{1 \le j \le n} a_j(n)$, this shows that $B_1(n) \to 0$ as $n \to \infty$. Now, to bound $B_2(n)$, first note that, by Hölder's inequality, for any $\varepsilon > 0$ we have
$$\mathbb E\bigg[\big(X_j^{(n)}\big)^{\nu}\, e^{\frac{\beta(n)a_j(n)}{s} X_j^{(n)}}\, \mathbb 1_{\{X_j^{(n)} \ge t^*\}}\bigg] \le \mathbb E\Big[\big(X_j^{(n)}\big)^{\nu \frac{1+\varepsilon}{\varepsilon}}\Big]^{\frac{\varepsilon}{1+\varepsilon}}\, \mathbb E\bigg[e^{(1+\varepsilon)\frac{\beta(n)a_j(n)}{s} X_j^{(n)}}\, \mathbb 1_{\{X_j^{(n)} \ge t^*\}}\bigg]^{\frac{1}{1+\varepsilon}}. \qquad (4.36)$$
Due to the finiteness of the moments of $X_1$ assumed in (3.11), the limit in (4.34) yields a uniform control of the first factor on the right-hand side of (4.36). When combined with (4.33) and (4.36), to prove the convergence of $B_2(n)$ to zero, it clearly suffices to show that
$$\limsup_{n\to\infty}\, \mathbb E\bigg[e^{(1+\varepsilon)\frac{\beta(n)a_{\max}(n)}{s} X_1^{(n)}}\, \mathbb 1_{\{X_1^{(n)} \ge t^*\}}\bigg] < \infty \qquad (4.37)$$
for every $\zeta < (1+\varepsilon)^{-1}\big(\frac xs - \frac{s_1}{s}m\big)^{r-1}$; the claim then follows as $\varepsilon \to 0$. To derive an upper bound for the expectation in (4.37), we will use the following integration-by-parts formula.

Lemma 4.5 (Integration by parts).
For any random variable $X$ on a probability space $(\Omega, \mathcal F, \mathbb P)$ and any $\alpha > 0$, $a, b \in \mathbb R$ with $a < b$, the following relation holds:
$$\mathbb E\big[e^{\alpha X}\mathbb 1_{\{a < X \le b\}}\big] = \alpha \int_a^b e^{\alpha z}\,\mathbb P(X > z)\,dz - e^{\alpha b}\,\mathbb P(X > b) + e^{\alpha a}\,\mathbb P(X > a).$$
Recalling that $X_j^{(n)} = X_j \mathbb 1_{\{X_j < t_2(n)\}}$ and applying Lemma 4.5 with $\alpha = \alpha_n := (1+\varepsilon)\beta(n)a_{\max}(n)/s$, $a = t^*$ and $b = t_2(n)$, we deduce that
$$\mathbb E\big[e^{\alpha_n X_1}\mathbb 1_{\{t^* < X_1 \le t_2(n)\}}\big] = \alpha_n \int_{t^*}^{t_2(n)} e^{\alpha_n z}\,\mathbb P(X_1 > z)\,dz - e^{\alpha_n t_2(n)}\,\mathbb P\big(X_1 > t_2(n)\big) + e^{\alpha_n t^*}\,\mathbb P\big(X_1 > t^*\big). \qquad (4.38)$$
Since $b(n)n^r \to \infty$, the second term on the right-hand side of (4.38) converges to $0$ by (4.35), and the third term is bounded by $e^{\alpha_n t^*} \to 1$. Now, let $\zeta^* := \zeta\big(\frac xs - \frac{s_1}{s}m\big)$. Inserting the upper bound (3.12) on the tail of $X_1$, substituting $y := (t_2(n))^{-1} z$ and recalling the definition of $\beta(n)$ from (4.28), we see that the first term on the right-hand side of (4.38) is bounded above by
$$\alpha_n t_2(n) \int_{t^*/t_2(n)}^{1} I_n(y)\,dy, \qquad (4.39)$$
where the integrand $I_n(\cdot)$ is given by
$$I_n(y) := c_2(t_2(n)y)\, \exp\big\{\alpha_n t_2(n) y - b(t_2(n)y)(t_2(n)y)^r\big\}$$
for $y \in (0,1]$. Since $b(\cdot)$ is slowly varying and condition (B.2) holds, we see that the coefficient $\alpha_n t_2(n)$ in front of the integral in (4.39) satisfies $\alpha_n t_2(n)/(b(n)n^r) \to (1+\varepsilon)\zeta^*$ as $n \to \infty$. It now remains to show that, for every $\zeta^* < (1+\varepsilon)^{-1}\big(\frac xs - \frac{s_1}{s}m\big)^r$, the integral in (4.39) stays bounded as $n \to \infty$.
By the assumption that $b(\cdot)$ is slowly varying and since $r < 1$, for any fixed $y \in (0,1]$ and any $\zeta^* < (1+\varepsilon)^{-1}\big(\frac xs - \frac{s_1}{s}m\big)^r$, it follows that $I_n(y) \to 0$ as $n \to \infty$. Therefore, we need to examine the lower limit of integration $y_n := t^*/t_2(n)$ and show that $I_n(y_n)$ stays bounded as $n \to \infty$. Recalling that $t_2(n) = n\big(\frac xs - \frac{s_1}{s}m\big)$ and $\zeta^* = \zeta\big(\frac xs - \frac{s_1}{s}m\big)$, note that
$$I_n(y_n) = c_2(t^*)\, \exp\Big\{(1+\varepsilon)\,\zeta\, b(t_2(n))\, n^r\, a_{\max}(n)\, \frac{t^*}{s} - b(t^*)(t^*)^r\Big\}.$$
Since $n a_{\max}(n) \sim s$, $b(t_2(n)) \sim b(n)$ and $n^{r-1} b(n) \to 0$ as $n \to \infty$, it follows that $\limsup_{n\to\infty} I_n(y_n)$ is finite.
Thus, we have shown that $B_2(n)$ converges to zero as $n \to \infty$ and hence that $B_0 = 0$. This completes the proof of Proposition 4.4, and hence the upper bound (4.20) and Theorem 1 follow.
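The integration-by-parts identity of Lemma 4.5 can be sanity-checked numerically (an illustrative check of ours, with all parameter choices hypothetical) for $X \sim \mathrm{Exp}(1)$, where $\mathbb P(X > z) = e^{-z}$ and the left-hand side is available in closed form.

```python
import numpy as np

alpha, a, b = 0.3, 0.5, 4.0

# LHS: E[e^{alpha X} 1_{a < X <= b}] for X ~ Exp(1), in closed form:
# int_a^b e^{alpha z} e^{-z} dz = (e^{(alpha-1)a} - e^{(alpha-1)b})/(1 - alpha).
lhs = (np.exp((alpha - 1.0) * a) - np.exp((alpha - 1.0) * b)) / (1.0 - alpha)

# RHS of Lemma 4.5: alpha * int_a^b e^{alpha z} P(X > z) dz plus boundary
# terms, with the integral evaluated by the midpoint rule.
N = 200_000
h = (b - a) / N
z = a + h * (np.arange(N) + 0.5)
integral = np.sum(np.exp((alpha - 1.0) * z)) * h
rhs = alpha * integral - np.exp((alpha - 1.0) * b) + np.exp((alpha - 1.0) * a)
print(lhs, rhs)  # the two sides agree up to quadrature error
```

Here $e^{\alpha b}\mathbb P(X > b) = e^{(\alpha-1)b}$ and $e^{\alpha a}\mathbb P(X > a) = e^{(\alpha-1)a}$, so the boundary terms simplify accordingly.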

Example 1: Random Weights.
We consider a sequence of strictly positive i.i.d. random variables $\{\theta_j\}_{j\in\mathbb N}$ on $(\Omega, \mathcal F, \mathbb P)$, independent of $\{X_j\}_{j\in\mathbb N}$, and assume that they are $\mathbb P$-almost surely uniformly bounded, that is, their essential supremum is finite. We then consider the self-normalized weighted sums
$$S_n := \frac{1}{\sum_{i=1}^n \theta_i} \sum_{j=1}^n \theta_j X_j, \qquad n \in \mathbb N.$$
We prove a large deviation theorem for the sequence of random weighted sums $\{S_n\}_{n\in\mathbb N}$, both in the "quenched" (i.e., conditioned on the weight sequence $\{\theta_j\}_{j\in\mathbb N}$) and the "annealed" (i.e., averaged over the weight sequence) cases. Note that $S_n$ can be viewed as a random projection of the data $\{X_i\}$. Random projections have attracted much interest in recent research in applied mathematics as an important tool in data analysis and dimensionality reduction [2], as well as in asymptotic geometric analysis [5,9].
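A quick simulation of the self-normalized sum (an illustrative sketch of ours; the uniform weights and the Weibull data are hypothetical choices satisfying the boundedness and stretched exponential assumptions) shows the law-of-large-numbers behavior around which the large deviations are studied:

```python
import numpy as np
from math import gamma

rng = np.random.default_rng(1)
n = 10_000
theta = rng.uniform(0.5, 1.0, size=n)   # strictly positive, bounded weights
X = rng.weibull(0.7, size=n)            # stretched exponential tail with r = 0.7

# Self-normalized random projection S_n = sum_j theta_j X_j / sum_i theta_i.
S_n = float((theta * X).sum() / theta.sum())
m = gamma(1.0 + 1.0 / 0.7)              # E[X_1] for this Weibull distribution
print(S_n, m)                            # S_n concentrates near m
```

Estimating the quenched and annealed deviation probabilities themselves would require importance sampling, since the events in question are subexponentially rare.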
We now turn to the proof of (5.44). Note that we have $\frac 1n \sum_{i=1}^n \theta_i \to \mathbb E[\theta_1]$, $\mathbb P$-almost surely, and the probability of a deviation of this average decays exponentially in $n$, due to Cramér's Theorem (recall that the $\{\theta_i\}$ are uniformly bounded). We will now show that the quenched decay rate in (5.43) coincides with the annealed rate in (5.44).

Remark 5.1. The equality of the quenched and annealed rate functions in (5.43) and (5.44), respectively, is characteristic of our regime; it is in sharp contrast to the case of light-tailed random variables $X_j$, that is, random variables $X_j$ satisfying (1.1). In the light-tailed case, $\mathbb P\big(S_n \ge x \mid \theta_1, \theta_2, \dots\big)$ and $\mathbb P\big(S_n \ge x\big)$ both decay exponentially in $n$, but the rate functions will in general not be the same. This was one of the motivations for the present paper and will be treated in forthcoming work.

Example 2: Kernel Functions.
In non-parametric regression, kernels are frequently used as weighting functions; they are an important tool to smooth data. Applications include the approximation of probability density functions and conditional expectations. Given a kernel function $k$, we consider the weighted sums
$$S_n = \sum_{j=1}^n a_j(n) X_j = \frac 1n \sum_{j=1}^n k\Big(2 \cdot \frac{j - n/2}{n}\Big) X_j, \qquad n \in \mathbb N.$$
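As a concrete instance (our choice; the kernel $k$ is left generic above), take the Epanechnikov kernel $k(u) = \tfrac34(1-u^2)$ on $[-1,1]$, a standard choice in non-parametric regression. The induced weights are non-negative, with $\sum_j a_j(n) \to \tfrac12 \int_{-1}^1 k(u)\,du = \tfrac12$ and $n\, a_{\max}(n) \to k(0) = \tfrac34$, so the Assumption B constants are $s_1 = 1/2$ and $s = 3/4$.

```python
import numpy as np

def kernel_weights(n, k):
    """Weights a_j(n) = k(2 * (j - n/2) / n) / n, j = 1, ..., n."""
    j = np.arange(1, n + 1)
    return k(2.0 * (j - n / 2) / n) / n

# Epanechnikov kernel, truncated to zero outside [-1, 1].
epanechnikov = lambda u: 0.75 * np.clip(1.0 - u ** 2, 0.0, None)

a = kernel_weights(1000, epanechnikov)
s1_proxy = a.sum()          # -> (1/2) * int_{-1}^{1} k(u) du = 0.5
s_proxy = 1000 * a.max()    # -> max_u k(u) = k(0) = 0.75
print(s1_proxy, s_proxy)
```

The resulting decay exponent from Theorem 1 for $x > m/2$ is then $-\big((x - m/2)/(3/4)\big)^r$.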