On normal domination of (super)martingales

Let (S_0, S_1, . . . ) be a supermartingale relative to a nondecreasing sequence of σ-algebras (H_{≤0}, H_{≤1}, . . . ), with S_0 ≤ 0 almost surely (a.s.) and differences X_i := S_i − S_{i−1}. Suppose that for every i = 1, 2, . . . there exist H_{≤(i−1)}-measurable r.v.'s C_{i−1} and D_{i−1} and a positive real number s_i such that C_{i−1} ≤ X_i ≤ D_{i−1} and D_{i−1} − C_{i−1} ≤ 2s_i a.s. Then, for all natural n and all functions f satisfying certain convexity conditions, Ef(S_n) ≤ Ef(sZ), where s := √(s_1^2 + · · · + s_n^2) and Z ∼ N(0, 1). In particular, this implies P(S_n ≥ x) ≤ c_{5,0} P(sZ ≥ x) for all x ∈ R, where c_{5,0} = 5!(e/5)^5 = 5.699 . . . . Results for max_{0≤k≤n} S_k in place of S_n and for concentration of measure also follow.

Since P(Z ≥ x) ∼ (1/(x√(2π))) e^{−x^2/2} as x → ∞, a factor 1/x is "missing" here. The apparent cause of this deficiency is that the class of the exponential moment functions f(x) = e^{λx} (λ > 0) is too small (and so is the class of the power functions f(x) = |x|^p).
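This asymptotic, and the factor of roughly x√(2π) lost by the plain exponential bound e^{−x^2/2}, can be checked numerically. A minimal sketch of our own (not part of the argument), using P(Z ≥ x) = erfc(x/√2)/2:

```python
import math

def normal_tail(x):
    # P(Z >= x) for Z ~ N(0, 1), via the complementary error function
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def tail_asymptotic(x):
    # Leading-order approximation (1 / (x * sqrt(2*pi))) * exp(-x^2 / 2)
    return math.exp(-x * x / 2.0) / (x * math.sqrt(2.0 * math.pi))

for x in (2.0, 4.0, 6.0):
    exact = normal_tail(x)
    # ratio -> 1 as x grows, confirming the stated asymptotics
    print(x, exact / tail_asymptotic(x))
    # the bare exponential bound exp(-x^2/2) overshoots by roughly x*sqrt(2*pi)
    print(x, math.exp(-x * x / 2.0) / exact)
```

Already at x = 6 the exact tail and the asymptotic agree to within a few percent, while the bare exponential overshoots by more than an order of magnitude.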
A similar result for the case when α = 1 and β = 0 is contained in the book by Shorack and Wellner (1986) (33), pages 797-799.
Remark 1.3. Typically, a log-concave tail function q(x) := P(η ≥ x) of a r.v. η with sup supp η = ∞ will satisfy the regularity condition (1.10). It follows from the special case r = ∞ of (24, Theorem 4.2) that the constant factor c_{α,β} in (1.5) and (1.8) is optimal not only over all pairs (ξ, η) satisfying (1.4), but also for every given r.v. η whose tail function q satisfies condition (1.10) and over all r.v.'s ξ satisfying (1.4). In particular, this is true when η has a normal or exponential distribution. (This remark was prompted by an anonymous referee's question.)

Remark 1.4. As follows from (24, Remark 3.13), a useful point is that the requirement of log-concavity of the tail function q(x) := P(η ≥ x) in Theorem 1.2 can be relaxed by replacing q with any (e.g., the least) log-concave majorant of q. However, then the optimality of c_{α,β} is not guaranteed.
Note that c_{3,0} = 2e^3/9, which is the constant factor in (1.3). Bobkov, Götze, and Houdré (2001) (4) (BGH) obtained a simpler proof of inequality (1.3), but with a constant factor 12.0099 . . . in place of 2e^3/9 = 4.4634 . . . . To compare the tails of S_n := ε_1 a_1 + · · · + ε_n a_n and Z, BGH used a more direct method, based on the Chapman-Kolmogorov identity for the Markov chain (S_n) (rather than on comparison of generalized moments). Such an identity was used, e.g., in Pinelis (2000) (26) to disprove a conjecture by Graversen and Peškir (1998) relating P(ε_1 a_1 + · · · + ε_n a_n ≥ x) to the values x that the r.v. (1/√n)(ε_1 + · · · + ε_n) takes on with nonzero probability. In this paper, we obtain upper bounds on generalized moments and tails of supermartingales with bounded, possibly asymmetric differences. These bounds are substantially more precise than the corresponding exponential ones and appear to be new even for sums of independent r.v.'s.
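The two constant factors being compared can be reproduced directly; a quick numerical check of our own:

```python
import math

# c_{3,0} = 2e^3/9, the constant factor in (1.3)
c_3_0 = 2.0 * math.exp(3.0) / 9.0

# c_{5,0} = 5!(e/5)^5, the constant factor in the main result of this paper
c_5_0 = math.factorial(5) * (math.e / 5.0) ** 5

print(c_3_0)  # 4.4634..., vs the BGH factor 12.0099...
print(c_5_0)  # 5.6990...
```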

Domination by normal moments and tails
Throughout, unless specified otherwise, let (S_0, S_1, . . . ) be a supermartingale relative to a nondecreasing sequence (H_0, H_1, . . . ) of σ-algebras, with S_0 ≤ 0 almost surely (a.s.) and differences X_i := S_i − S_{i−1}, i = 1, 2, . . . . Let E_j and Var_j denote the conditional expectation and variance, respectively, given H_j. The following theorem is the basic result of this paper.
The proofs of this and other statements (wherever a proof is necessary) are deferred to Section 5.
By virtue of Theorem 1.2, one has the following corollary under the conditions of Theorem 2.1.
for all f ∈ F^(5) and all n = 1, 2, . . . , one has Ef(S_n) ≤ c_{5,β} Ef(sZ). (2.4) In particular, the corresponding tail bounds hold for all real x, and c_{5,β} is the least, and hence the best, possible constant factor (recall (1.2)). It can be shown that this value, 5, cannot be replaced by 4. It may be possible to replace 5 by some number α in the interval (4, 5). However, in view of the proof of Lemma 5.1.2, it appears that the proof for a (non-integer!) α ∈ (4, 5) in place of 5 would be very difficult, if attainable at all, and its benefits would not be very significant; indeed, for any α ∈ (4, 5) the factor c_{α,0} would lie in the rather narrow interval (c_{4,0}, c_{5,0}) ≈ (5.119, 5.699); that is, the constant factor c_{5,0} cannot be significantly reduced. Cf. Remark 2.5 below. However, the following improvement of the bound in (1) may in certain instances be even more significant.
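The three numerical values quoted in this paper are consistent with the formula c_{α,0} = Γ(α + 1)(e/α)^α, an interpolation we assume here by extrapolating from c_{3,0} = 2e^3/9 and c_{5,0} = 5!(e/5)^5. Assuming that formula, the narrowness of the interval (c_{4,0}, c_{5,0}) is easy to confirm:

```python
import math

def c_alpha_0(alpha):
    # Assumed interpolating formula c_{alpha,0} = Gamma(alpha + 1) * (e/alpha)^alpha;
    # it reproduces c_{3,0} = 2e^3/9 and c_{5,0} = 5!(e/5)^5 at the integer points
    return math.gamma(alpha + 1.0) * (math.e / alpha) ** alpha

print(c_alpha_0(4.0), c_alpha_0(5.0))  # approximately 5.119 and 5.699

# any alpha in (4, 5) would give a factor inside this narrow interval
for a in (4.1, 4.5, 4.9):
    assert c_alpha_0(4.0) < c_alpha_0(a) < c_alpha_0(5.0)
```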
Theorem 2.6. Suppose that for every i = 1, 2, . . . there exist a positive H_{i−1}-measurable r.v. D_{i−1} and a positive real number ŝ_i such that conditions (2.9)-(2.10) hold a.s. Let ŝ := √(ŝ_1^2 + · · · + ŝ_n^2). Then one has all the inequalities (2.3)-(2.8), only with s replaced by ŝ.
Remark 2.7. Theorem 2.1 may be considered as a special case of Theorem 2.6. Indeed, it can be seen from the proofs of these two theorems (see Lemma 5.1.1 in this paper and Lemma 3.1 in (29)) that one may assume without loss of generality that the supermartingales (S_i) in Theorems 2.1 and 2.6 are actually martingales with S_0 = 0. Therefore, to deduce Theorem 2.1 from Theorem 2.6, it is enough to observe that, for any r.v. X and constants c < 0 and d > 0, one has implication (2.12). In turn, implication (2.12) follows from (14) (say), which reduces the situation to that of a r.v. X taking on only two values. Alternatively, in light of the duality result (24, (4)), it is easy to give a direct proof of (2.12), using that EX = 0 and P(c ≤ X ≤ d) = 1. However, instead of deducing Theorem 2.1 from Theorem 2.6, we shall go in the opposite direction, proving Theorem 2.6 based on Theorem 2.1.
Thus, Theorem 2.1 is seen as the main result of this paper.
Remark 2.8. The set of conditions (2.9)-(2.10) is equivalent to one stated in terms of positive σ and d_0. This follows simply because of the inequalities X_i ≤ D_{i−1}. Thus, in the case when Var_{i−1} X_i < D_{i−1}^2 a.s., conditions (2.9)-(2.10) represent an improvement of the condition D_{i−1}^2 ∨ Var_{i−1} X_i ≤ ŝ_i^2 a.s. considered in (2; 3). In a certain variety of cases, this improvement may be even more significant than the improvement of the constant factor from 427 to 5.699 . . . before the probability sign.
On the other hand, it can be shown that the value σ_*(d_0, σ^2) is equal or close to the optimal value s = s_*(d_0, σ^2), which is the smallest value s ≥ 0 satisfying the corresponding inequality; this can be seen to hold even when u is quite small. From the "right-tail" bounds stated above, "two-tail" ones immediately follow. That (S_0, S_1, . . . ) in Theorems 2.1 and 2.6 is allowed to be a supermartingale (rather than only a martingale) makes it convenient to use the simple but powerful truncation tool. (Such a tool was used, for example, in (20; 21) to prove limit theorems for large deviation probabilities based only on precise enough probability inequalities, without using Cramér's transform, the standard device in the theory of large deviations.) Thus, for instance, one has the following corollary from Theorem 2.6: for all real x, the bounds (2.14) on P(S_n ≥ x) and P(max_{0≤k≤n} S_k ≥ x) hold. These bounds are much more precise than the exponential bounds in (10; 9; 19). Next, for the given α and x > t, let (S_n) be a martingale or, more generally, a submartingale, and assume, moreover, that α > 1. Then, for any natural n,
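As a concrete illustration of such tail bounds, consider a Rademacher sum, where one may take C_{i−1} = −1, D_{i−1} = 1, and s_i = 1, so that s = √n and the tail comparison reads P(S_n ≥ x√n) ≤ c_{5,0} P(Z ≥ x). Both sides can be computed exactly; a numerical sketch of our own:

```python
import math

def rademacher_tail(n, x):
    # Exact P(eps_1 + ... + eps_n >= x * sqrt(n)) for i.i.d. signs eps_i = +/-1,
    # using eps_1 + ... + eps_n = 2K - n with K ~ Binomial(n, 1/2)
    return sum(math.comb(n, k) for k in range(n + 1)
               if 2 * k - n >= x * math.sqrt(n)) / 2.0 ** n

def normal_tail(x):
    # P(Z >= x) for Z ~ N(0, 1)
    return 0.5 * math.erfc(x / math.sqrt(2.0))

c_5_0 = math.factorial(5) * (math.e / 5.0) ** 5  # = 5.699...

n = 30
for x in (0.5, 1.0, 1.5, 2.0, 2.5):
    lhs = rademacher_tail(n, x)
    rhs = c_5_0 * normal_tail(x)
    assert lhs <= rhs  # the normal-domination tail bound, here with s = sqrt(n)
    print(x, lhs, rhs)
```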

Concentration inequalities for separately Lipschitz functions
Definition 4.1. Let us say that a real-valued function g of n (not necessarily real-valued) arguments is separately Lipschitz if it satisfies a Lipschitz type condition in each of its arguments: for all i and all x_1, . . . , x_n, x̃_i,

|g(x_1, . . . , x_{i−1}, x_i, x_{i+1}, . . . , x_n) − g(x_1, . . . , x_{i−1}, x̃_i, x_{i+1}, . . . , x_n)| ≤ ρ_i(x_i, x̃_i), (4.1)

where ρ_i(x_i, x̃_i) depends only on i, x_i, and x̃_i. Let the radius of the separately Lipschitz function g be defined as r := √(r_1^2 + · · · + r_n^2), where the r_i are given by (4.2).

The concentration inequalities given in this section follow from the martingale inequalities given in Section 2. The proofs here are based on the improvements given in (20) and (32). Papers (36), (20), and (32) deal mainly with separately Lipschitz functions g of the form g(x_1, . . . , x_n) = x_1 + · · · + x_n, where the x_i's are vectors in a normed space; however, it was already understood there that the methods would work for much more general functions g; see e.g. (32, Remark 1). In a similar fashion, various concentration inequalities for general functions g were obtained in (17; 18) and (1).
Suppose that a r.v. Y with a finite mean can be represented as a real-valued Borel function g of independent (not necessarily real-valued) r.v.'s X_1, . . . , X_n: Y = g(X_1, . . . , X_n).

Theorem 4.2. If g is separately Lipschitz with a radius r > 0, then
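To make Definition 4.1 concrete: g(x_1, . . . , x_n) = max(x_1, . . . , x_n) is separately Lipschitz with ρ_i(x_i, x̃_i) = |x_i − x̃_i|. A small randomized check of condition (4.1), our own illustration (the helper names are not from the paper):

```python
import random

def g(xs):
    # A simple separately Lipschitz function of real arguments
    return max(xs)

def check_condition_4_1(n_args, trials=2000, seed=0):
    # Check |g(..., x_i, ...) - g(..., y_i, ...)| <= |x_i - y_i|, i.e. (4.1)
    # with rho_i(x, y) = |x - y|, on randomly sampled points
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(n_args)]
        ys = xs[:]
        i = rng.randrange(n_args)
        ys[i] = rng.uniform(-1.0, 1.0)
        if abs(g(xs) - g(ys)) > abs(xs[i] - ys[i]) + 1e-12:
            return False
    return True

print(check_condition_4_1(5))  # True: max is 1-Lipschitz in each argument
```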

where Z ∼ N(0, 1). In particular, the corresponding tail bounds hold for all real x. The next proposition shows how to obtain good upper bounds on Ξ_i(x_1, . . . , x_{i−1}, x_i) and EΞ_i(x_1, . . . , x_{i−1}, X_i)^2, to be used in Theorem 4.4.
Proposition 4.6. If g is separately Lipschitz, so that (4.1) holds, then the inequalities (4.10) hold for all i and all x_1, . . . , x_i; here it is assumed that the function ρ_i is measurable in an appropriate sense, and, for the second inequality in (4.10), it is also assumed that an appropriately defined expectation EX_i exists, for all i. If, moreover, the function g is convex in each of its arguments, then a further bound holds for all i and all x_1, . . . , x_i.

Remark 4.7. We do not require that ρ_i be a metric. However, the smallest possible ρ_i, which is the supremum of the left-hand side of (4.1) over all x_1, . . . , x_{i−1}, x_{i+1}, . . . , x_n, is necessarily a pseudo-metric. Note also that, for r_i defined by (4.2), the corresponding relation holds for all x_i, provided e.g. the additional conditions that (i) ρ_i(x_i, x̃_i) = ‖x_i − x̃_i‖_i for some seminorms ‖·‖_i and all i, x_i, and x̃_i; (ii) X_i is symmetrically distributed; and (iii) x_i belongs to the support of the distribution of X_i.
Concerning exponential bounds for sums of independent B-valued r.v.'s and for martingales in 2-smooth spaces, see (23).
The separately-Lipschitz condition (4.1) is obviously equivalent to the ℓ^1-like Lipschitz condition |g(x_1, . . . , x_n) − g(x̃_1, . . . , x̃_n)| ≤ ρ_1(x_1, x̃_1) + · · · + ρ_n(x_n, x̃_n) for all x_1, . . . , x_n, x̃_1, . . . , x̃_n (provided that each ρ_i is the smallest possible and hence a pseudo-metric, as indicated in Remark 4.7). A particular case of the ℓ^1-like pseudo-metric is the Hamming distance, widely used especially in combinatorics and computer science (17; 18). The upper bounds presented in this section are substantially more precise than exponential bounds such as the ones found in (17; 18); cf. Remark 2.4.
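In the Hamming case, ρ_i(x_i, x̃_i) = 1 if x_i ≠ x̃_i and 0 otherwise, so the ℓ^1-like condition says that g changes by at most the number of changed coordinates. A minimal exhaustive check, for an illustrative g of our own choosing:

```python
import itertools

def hamming(xs, ys):
    # The l^1-like pseudo-metric built from rho_i(x, y) = 1 if x != y else 0
    return sum(1 for a, b in zip(xs, ys) if a != b)

def g(xs):
    # Illustrative choice: the number of zero coordinates; changing one
    # coordinate changes g by at most 1, so g is 1-Lipschitz for hamming
    return sum(1 for a in xs if a == 0)

points = list(itertools.product((0, 1), repeat=4))
ok = all(abs(g(xs) - g(ys)) <= hamming(xs, ys) for xs in points for ys in points)
print(ok)  # True
```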

Proofs for Section 2
Let us first observe that Theorem 2.1 can be easily reduced to the case when (S n ) is a martingale. This is implied by the following lemma, which is obvious and stated here for the convenience of reference.
Then X̃_i is H_i-measurable, C̃_{i−1} and D̃_{i−1} are H_{i−1}-measurable, and the corresponding relations hold.

Proof of Theorem 2.1. The proof is similar to the proof of Theorem 2.1 in (29), but based on the crucial Lemma 5.1.2 below, in place of Lemma 3.2 in (29). Also, one has to refer here to Lemma 5.1.1 instead of Lemma 3.1 in (29). Indeed, by Lemma 5.1.1, one may assume that E_{i−1} X_i = 0 for all i. Let Z_1, . . . , Z_n be independent standard normal r.v.'s, which are also independent of the X_i's, and let R_i := X_1 + · · · + X_i + s_{i+1} Z_{i+1} + · · · + s_n Z_n.
Let Ẽ_i denote the conditional expectation given X_1, . . . , X_{i−1}, Z_{i+1}, . . . , Z_n. Note that, for all i = 1, . . . , n, one has Ẽ_i X_i = E_{i−1} X_i = 0; moreover, R_i − X_i = X_1 + · · · + X_{i−1} + s_{i+1} Z_{i+1} + · · · + s_n Z_n is a function of X_1, . . . , X_{i−1}, Z_{i+1}, . . . , Z_n. Hence, by Lemma 5.1.2, for any f ∈ F one has Ẽ_i f(R_i) ≤ Ẽ_i f(R_{i−1}), whence Ef(S_n) ≤ Ef(R_n) ≤ Ef(R_0) = Ef(sZ) (the first inequality here follows because S_0 ≤ 0 a.s. and any function f in F is nondecreasing).

Lemma 5.1.2. Let X be a r.v. with EX = 0 and c ≤ X ≤ d a.s., where c ≤ 0 ≤ d and d − c ≤ 2s, and let Z ∼ N(0, 1). Then Ef(X) ≤ Ef(sZ) for all f ∈ F.

Proof. This proof is rather long. Let X_{c,d} be the set of all r.v.'s X such that EX = 0 and c ≤ X ≤ d a.s. Without loss of generality (w.l.o.g.), f = f_t for some t ∈ R, where f_t(x) := (x − t)_+^5. In view of (14) (say), for any given real t, a maximum of Ef_t(X) over all r.v.'s X in X_{c,d} is attained when X takes on only two values, say a and b, in the interval [c, d].
Since the function f_t is convex, it then follows that w.l.o.g. a = c and b = d. Indeed, one can prove that Ef_t(σZ) is non-decreasing in σ > 0 by an application of Jensen's inequality. Moreover, by rescaling, w.l.o.g. d − c = 2. In other words, one then has X = 2r with probability 1 − r and X = 2r − 2 with probability r, for some r ∈ [0, 1]. Now the right-hand side of inequality (5.1) can be written as R(t) := P(t)φ(t) − Q(t)(1 − Φ(t)), where P(t) := 8 + 9t^2 + t^4 and Q(t) := t(15 + 10t^2 + t^4), and its left-hand side as L(r, t) := (1 − r)(2r − t)_+^5 + r(2r − 2 − t)_+^5. For t = 0, one has a corresponding identity, with Q_1(r, t) a polynomial in r and t. Therefore, the critical points of Q_2 in the interior int B of the domain B are the solutions (r, t) of the system of polynomial equations d(r, t) = 0, ∂_r Q_1(r, t) = 0.
Hence, R(t) − L(r, t) < 0 for each r ∈ (0, 1) and all t < 0 with large enough |t|. Since R(t) − L(r, t) is decreasing in t on B, one has R(t) − L(r, t)

It remains to consider Case 2: (r, t) ∈ C. Here, letting v := 2r − t, one has 0 ≤ v ≤ 2 and, by (5.3), the corresponding representation. Let us use here the notation introduced in the above consideration of Case 1. Then it follows that Q_2 has no critical points in int C.
Next, with v > 0, on the boundaries r = 1 and t = 2r of C one has Q_2 = −120 < 0. The boundary t = 2r − 2 of C is common with B, and it was shown above that Q_2 < 0 on that boundary as well.
Hence, just as on B, one has that L(r, t) ≤ R(t) on C^−. Moreover, since R(t) − L(r, t) is decreasing in t on C^+, one has R(t) − L(r, t) > 0 on C^+ and hence L(r, t) < R(t) on C^+.
One concludes that L(r, t) ≤ R(t) on the entire set C.
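The inequality L(r, t) ≤ R(t) can also be spot-checked numerically. Below we assume (our reading of the proof, consistent with the polynomials P and Q above) that f_t(x) = (x − t)_+^5, so that L(r, t) = (1 − r)(2r − t)_+^5 + r(2r − 2 − t)_+^5 and R(t) = E(Z − t)_+^5 = P(t)φ(t) − Q(t)(1 − Φ(t)):

```python
import math

def phi(t):
    # standard normal density
    return math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)

def phi_bar(t):
    # P(Z >= t) for Z ~ N(0, 1)
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def pos(x):
    return x if x > 0.0 else 0.0

def R(t):
    # Closed form for E(Z - t)_+^5, with P(t) = 8 + 9t^2 + t^4 and
    # Q(t) = t(15 + 10t^2 + t^4); at t = 0 this gives 8*phi(0)
    P = 8.0 + 9.0 * t * t + t ** 4
    Q = t * (15.0 + 10.0 * t * t + t ** 4)
    return P * phi(t) - Q * phi_bar(t)

def L(r, t):
    # E f_t(X) for the two-point X: X = 2r w.p. 1 - r, X = 2r - 2 w.p. r,
    # so EX = 0 and d - c = 2 (assumed extremal form f_t(x) = (x - t)_+^5)
    return (1.0 - r) * pos(2.0 * r - t) ** 5 + r * pos(2.0 * r - 2.0 - t) ** 5

ok = all(L(i / 50.0, -5.0 + j / 10.0) <= R(-5.0 + j / 10.0) + 1e-9
         for i in range(51) for j in range(101))
print(ok)  # True: L(r, t) <= R(t) over the grid
```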
Proof of Theorem 2.6. This proof is similar to the proofs of Theorem 2.1 in (29) and of Theorem 2.1 of this paper, but is based on the following lemma instead of Lemma 3.2 in (29): with Z ∼ N(0, 1), one has, for all f ∈ F^(5), Ef(X) ≤ Ef(sZ). (5.6)

Proof. In view of (1.2), one has F^(5) ⊆ F^(2). Therefore, by Lemma 3.2 in (29), one may assume without loss of generality that here X = d · X_a, where a = σ^2/d^2. Now it is seen that Lemma 5.
Hence, using Fubini's theorem and then Hölder's inequality, one obtains the desired bound.
Proof of Corollary 3.10. The second inequality in (3.10) follows by Propositions 3.6 and 3.8. Equalities (3.11) follow from the definitions. The first two equalities in (3.12) follow by Proposition 3.9, while the third equality in (3.12) follows from the definition.

Proofs for Section 4
The proofs here are based on the improvements given in (20) and (32). For a r.v. Y as in Theorem 4.2, consider the martingale expansion Y − EY = ξ_1 + · · · + ξ_n with the martingale-differences ξ_i := E_i Y − E_{i−1} Y, where E_i and Var_i denote, respectively, the conditional expectation and variance given the σ-algebra (say H_i) generated by (X_1, . . . , X_i). For each i, pick an arbitrary non-random x_i and introduce the r.v.
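The martingale expansion can be seen at work on a small discrete example, with conditional expectations computed by exact enumeration. A sketch of our own, assuming the standard Doob differences ξ_i = E_i Y − E_{i−1} Y (the function g and the fair-coin X_i's are illustrative):

```python
import itertools

def g(xs):
    # Illustrative Borel function of three binary arguments
    return max(xs) + 0.5 * sum(xs)

n = 3

def cond_exp(prefix):
    # E[g(X_1, ..., X_n) | X_1, ..., X_i] at X_1..X_i = prefix,
    # for i.i.d. fair-coin X_j, by exhaustive enumeration
    m = n - len(prefix)
    vals = [g(prefix + rest) for rest in itertools.product((0, 1), repeat=m)]
    return sum(vals) / len(vals)

# xi_i := E_i Y - E_{i-1} Y telescopes to Y - EY along every path
ok_telescope = all(
    abs(sum(cond_exp(xs[:i]) - cond_exp(xs[:i - 1]) for i in range(1, n + 1))
        - (g(xs) - cond_exp(()))) < 1e-12
    for xs in itertools.product((0, 1), repeat=n)
)

# each xi_i has zero conditional mean given X_1, ..., X_{i-1}
ok_mdiff = all(
    abs(0.5 * (cond_exp(p + (0,)) + cond_exp(p + (1,))) - cond_exp(p)) < 1e-12
    for i in range(n)
    for p in itertools.product((0, 1), repeat=i)
)
print(ok_telescope, ok_mdiff)  # True True
```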