About Doob’s inequality, entropy and Tchebichef

In this paper we give upper bounds on the tail or the quantiles of the one-sided maximum of a nonnegative submartingale in the class $L\log L$ or the maximum of a submartingale in $L^p$. Our upper bounds involve the entropy in the case of nonnegative martingales in the class $L\log L$ and the $L^p$-norm in the case of submartingales in $L^p$. Starting from our results on entropy, we also improve the so-called bounded differences inequality. All the results are based on optimal bounds for the conditional value at risk of real-valued random variables.


1 Introduction
This paper is motivated by the following question. Let $(M_k)_{0\le k\le n}$ be a real-valued submartingale in $L^1$. Define $M_n^* = \max(M_0, M_1, \ldots, M_n)$. How can one bound the tail or the quantiles of $M_n^*$ under additional integrability conditions on the submartingale?
In order to explain our results, we need the definition of the quantile function of a random variable $X$ and some basic properties of this function.

Definition 1.1. Let $X$ be a real-valued random variable. The tail function $H_X$ is defined by $H_X(t) = \mathrm{IP}(X > t)$. The quantile function $Q_X$ is the càdlàg inverse of $H_X$.
The basic property of $Q_X$ is: $x < Q_X(u)$ if and only if $H_X(x) > u$. This property ensures that $Q_X(U)$ has the same distribution as $X$ for any random variable $U$ with the uniform distribution over $[0,1]$.

Definition 1.2. The median $\mu(X)$ of a real-valued random variable $X$ is defined by $\mu(X) = Q_X(1/2)$.
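Before going on, here is a minimal numerical sketch (ours, not part of the paper) of Definitions 1.1 and 1.2: an empirical version of the càdlàg inverse $Q_X$, together with a check that $Q_X(U)$ has the same distribution as $X$. The helper `make_Q` and the exponential test law are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_Q(sample):
    """Return the cadlag inverse Q_X of the empirical tail function
    H_X(t) = P(X > t), i.e. Q_X(u) = inf{x : H_X(x) <= u}.
    (Illustrative helper, not from the paper.)"""
    s = np.sort(sample)
    n = len(s)
    def Q(u):
        k = min(max(n - 1 - int(np.floor(n * u)), 0), n - 1)
        return s[k]
    return Q

x = rng.exponential(size=100_000)      # X ~ Exp(1), an arbitrary test law
Q = make_Q(x)
qu = np.array([Q(u) for u in rng.uniform(size=5_000)])  # Q_X(U), U uniform

print(x.mean(), qu.mean())             # same law: both means ~ 1
print(np.median(x), Q(0.5))            # median mu(X) = Q_X(1/2) ~ log 2 = 0.693
```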
Let us now recall Doob's maximal inequalities. Below we assume that the random variables $M_0, M_1, \ldots, M_n$ are nonnegative. The first inequality is in fact due to Ville (1939):
$$x\,\mathrm{IP}(M_n^* \ge x) \le \mathrm{IE}(M_n) \quad\text{for any } x > 0. \qquad (1.1)$$
It is generally attributed to Doob (1940). Assume now that the random variable $M_n$ is in the class $L\log L$ of real-valued random variables $X$ such that $\mathrm{IE}(|X|\log^+|X|) < \infty$. Applying Ville's inequality to the submartingale $(M_k\log^+ M_k)_{0\le k\le n}$, one immediately gets that, for any $x > 1$,
$$\mathrm{IP}(M_n^* \ge x) \le \frac{\mathrm{IE}(M_n\log^+ M_n)}{x\log x}. \qquad (1.2)$$
This inequality proves that the tail of $M_n^*$ is at most of the order of $(x\log x)^{-1}$ as $x \uparrow \infty$. Nevertheless, first the upper bound tends to $\infty$ as $x \downarrow 1$, even under the normalization condition $\mathrm{IE}(M_n) = 1$ and, second, the quantities involved here fail to be homogeneous. Therefore, it seems clear that the above upper bound can be improved.
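The following Monte Carlo sketch (ours, not the paper's; the martingale is an arbitrary product of mean-one lognormal factors) illustrates (1.1) and (1.2), and shows the blow-up of the $L\log L$ bound as $x \downarrow 1$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps, sigma = 20, 200_000, 0.2

# A positive martingale with IE(M_n) = 1: products of i.i.d. mean-one
# lognormal factors (an arbitrary test case, not from the paper).
factors = rng.lognormal(mean=-sigma**2 / 2, sigma=sigma, size=(reps, n))
M = np.cumprod(factors, axis=1)
M_star = np.maximum(1.0, M.max(axis=1))   # include M_0 = 1

for x in [1.5, 2.0, 4.0, 8.0]:
    lhs = (M_star >= x).mean()
    ville = 1.0 / x                        # (1.1) with IE(M_n) = 1
    llogl = np.mean(M[:, -1] * np.log(np.maximum(M[:, -1], 1.0))) / (x * np.log(x))
    print(f"x={x}: P(M*_n >= x) = {lhs:.4f} <= Ville {ville:.4f}, LlogL {llogl:.4f}")
```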
We now recall the known results for nonnegative submartingales in the class $L\log L$. For a real-valued random variable $X$ in $L^1$, define
$$\tilde Q_X(u) = \frac{1}{u}\int_0^u Q_X(s)\,ds. \qquad (1.3)$$
In mathematical finance, $\tilde Q_X$ is called the conditional value at risk of $X$. Clearly $Q_X \le \tilde Q_X$. Blackwell and Dubins (1963) proved that
$$Q_{M_n^*} \le \tilde Q_{M_n}. \qquad (1.4)$$
Later Dubins and Gilat (1978) proved the optimality of (1.4). For a nonnegative random variable $X$, $\tilde Q_X$ is known as the Hardy-Littlewood maximal function associated with $X$. Hardy and Littlewood (1930, Theorem 11) proved that $\int_0^1 \tilde Q_X(u)\,du \le c\,\mathrm{IE}(X\log^+ X) + 1$ for some universal positive constant $c$, which gives an alternative proof of Doob's $L\log L$ inequality, up to the constant. The above inequality is usually called the $L\log L$ inequality of Hardy and Littlewood. Gilat (1986, Theorem 3) proved that the two-parameter inequality
$$\int_0^1 \tilde Q_X(u)\,du \le c\,\mathrm{IE}(X\log X) + d \qquad (1.5)$$
holds for any $c > 1$ and any $d \ge e^{-1}c^2(c-1)^{-1}$. In particular, if $c = e/(e-1)$ then (1.5) holds true with $d = e/(e-1)$. Using (1.4), it follows that (1.2) holds true with $\mathrm{IE}(M_n\log M_n)$ instead of $\mathrm{IE}(M_n\log^+ M_n)$. The martingale counterpart of (1.5) may be found in Osekowski (2012, Theorem 7.7). Curiously (1.2) and (1.5) fail to be homogeneous, since they are not invariant under the multiplication of the submartingale or the random variable $X$ by a constant factor, so that one can have some doubts about their optimality.

Starting from Doob's inequality and introducing the entropy of $M_n$, Harremoës (2008) improved Gilat's result. For a nonnegative real-valued random variable $X$ such that $\mathrm{IE}(X) > 0$ and $\mathrm{IE}(X\log^+ X) < \infty$, define the entropy $H(X)$ of $X$ by
$$H(X) = \mathrm{IE}(X\log X) - \mathrm{IE}(X)\log\mathrm{IE}(X), \qquad (1.6)$$
with the convention $0\log 0 = 0$. Defining a suitable function $g$, Harremoës (2008) obtained a tail inequality (1.8) for $M_n^*$ involving the entropy of $M_n$; Harremoës (2008, Theorem 4) also proved that (1.8) is tight. It appears here that the entropy is the adequate quantity for nonnegative submartingales in the class $L\log L$.

In the present paper we will obtain estimates for the tails or the quantiles of $M_n^*$ involving entropy. In order to get these estimates, we give a covariance inequality in Section 2. Next, in Section 3, we derive upper bounds on the tail function of $M_n^*$ from (1.4) and this covariance inequality. We also prove that our main inequality is sharp for positive martingales with given entropy and expectation.
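The two quantities at the heart of this paper, the conditional value at risk $\tilde Q_X$ of (1.3) and the entropy $H(X)$ of (1.6), are straightforward to approximate from a sample; the following sketch (ours, on an arbitrary lognormal test law) may help fix ideas:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.lognormal(mean=0.0, sigma=0.8, size=200_000)   # nonnegative test law

def Q_tilde(sample, u):
    """Conditional value at risk (1.3): (1/u) * int_0^u Q_X(s) ds,
    i.e. the mean of the upper u-fraction of the distribution."""
    s = np.sort(sample)[::-1]
    return s[: max(int(np.ceil(u * len(s))), 1)].mean()

def entropy(sample):
    """H(X) = IE(X log X) - IE(X) log IE(X), with 0 log 0 = 0, as in (1.6)."""
    m = sample.mean()
    xs = np.where(sample > 0, sample, 1.0)     # enforce the 0 log 0 = 0 convention
    return np.mean(sample * np.log(xs)) - m * np.log(m)

u = 0.1
print(np.quantile(x, 1 - u), "<=", Q_tilde(x, u))   # Q_X(u) <= Q_tilde_X(u)
# For this lognormal law, H(X) = (s^2/2) e^{s^2/2} with s = 0.8, i.e. ~ 0.441:
print(entropy(x))
```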
Assume now that the random variable $M_n$ fulfills the stronger moment condition $\mathrm{IE}|M_n|^p < \infty$ for some $p > 1$. For any real $y$, let $y_+ = \max(0, y)$. By the Ville inequality applied to the nonnegative submartingale $((M_k - a)_+^p)_{0\le k\le n}$,
$$\mathrm{IP}(M_n^* \ge a + x) \le x^{-p}\,\mathrm{IE}\big((M_n - a)_+^p\big) \quad\text{for any } x > 0 \text{ and any real } a. \qquad (1.9)$$
Setting $a = \mathrm{IE}(M_n)$ in the above inequality, we obtain a deviation inequality for $M_n^*$ around $\mathrm{IE}(M_n)$. This inequality proves that the tail of $M_n^*$ is at most of the order of $x^{-p}$ as $x \uparrow \infty$. However, the upper bound tends to $\infty$ as $x \downarrow 0$. Recall now the Tchebichef-Cantelli inequality (see Tchebichef (1874) and Cantelli (1932), Inequality (19), p. 53): for any real-valued random variable $X$ in $L^2$ and any positive $x$,
$$\mathrm{IP}\big(X \ge \mathrm{IE}(X) + x\big) \le \frac{\sigma^2}{\sigma^2 + x^2}, \quad\text{where } \sigma^2 = \mathrm{Var}(X).$$
We refer to Savage (1961) for a review of probability inequalities of the Tchebichef type with a complete bibliography. This inequality is equivalent to the upper bound
$$Q_X(u) \le \mathrm{IE}(X) + \sigma\sqrt{(1-u)/u} \quad\text{for any } u \in\, ]0,1[. \qquad (1.10)$$
For instance (1.10) ensures that $\mu(X) \le \mathrm{IE}(X) + \sigma$. In Section 4, we give a maximal version of (1.10) for submartingales in $L^p$. In the special case $p = 2$ our result yields
$$Q_{M_n^*}(u) \le \mathrm{IE}(M_n) + \sigma\sqrt{(1-u)/u}, \quad\text{where } \sigma^2 = \mathrm{Var}(M_n),$$
which is an extension of the above bound to maxima of submartingales. We then apply our results to martingales in $L^p$ for $p$ in $]1,2]$ and we compare the so-obtained upper bounds on $Q_{M_n^*}$ with the upper bounds that can be derived from the minimization of (1.9) with respect to $a$. These upper bounds are based on von Bahr-Esseen type inequalities. In particular, in order to make a fair comparison of the results that can be derived from (1.9) with the extension of the Tchebichef-Cantelli inequality to martingales in $L^p$, we prove a one-sided von Bahr-Esseen type inequality in the Annex.
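For the reader's convenience, here is a minimal numerical sketch (ours, with an arbitrary gamma test law) of the Tchebichef-Cantelli inequality and of its quantile form (1.10):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.gamma(shape=2.0, scale=1.0, size=500_000)   # arbitrary L^2 test variable
m, sigma = x.mean(), x.std()

for t in [0.5, 1.0, 2.0, 4.0]:
    lhs = (x >= m + t).mean()
    cantelli = sigma**2 / (sigma**2 + t**2)
    print(f"t={t}: P(X >= IE(X)+t) = {lhs:.4f} <= {cantelli:.4f}")

# Quantile form (1.10): Q_X(u) <= IE(X) + sigma * sqrt((1-u)/u);
# with u = 1/2 this gives the median bound mu(X) <= IE(X) + sigma.
u = 0.5
print(np.quantile(x, 1 - u), "<=", m + sigma * np.sqrt((1 - u) / u))
```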
In Section 5, we obtain extensions of (1.15) to martingales in $L^p$ for some $p > 2$. For instance, if $p = 4$, our result yields an inequality (1.17) involving the quantities $\sigma$ and $L_4$ defined in (1.13), which improves (1.14) in the case $p = 4$.
To conclude this paper, we consider sub-Gaussian martingales. As pointed out by Ledoux (1996), entropy methods have interesting applications to concentration inequalities. In Section 6, we apply the results of Section 3 to sub-Gaussian martingales. With this aim in view, we introduce the notion of entropic sub-Gaussian random variables. We then prove that entropic sub-Gaussian random variables satisfy more precise tail inequalities than the usual sub-Gaussian random variables. Finally, in Section 7, we apply the results of Section 6 to the so-called bounded differences inequality.

2 A covariance inequality involving entropy
Throughout this section, $X$ is a nonnegative real-valued random variable. We assume that $\mathrm{IE}(X\log^+ X) < \infty$ and $\mathrm{IE}(X) > 0$. The main result of this section is the covariance inequality below.
Theorem 2.1. Let $X$ be a nonnegative random variable satisfying the above conditions and let $\eta$ be a real-valued random variable with finite Laplace transform on a right neighborhood of $0$. Then, for any $b > 0$ such that $\mathrm{IE}(e^{b\eta}) < \infty$,
$$\mathrm{IE}(X\eta) \le b^{-1}\big(H(X) + \mathrm{IE}(X)\log\mathrm{IE}(e^{b\eta})\big).$$

Proof.
A shorter proof can be given using the duality formula for the entropy (see Boucheron et al. (2013), Section 4.9, for this formula). However, a self-contained proof is more instructive (see Remark 2.1 below). Define the two-parameter family of functions
$$\varphi_{a,b}(x) = b^{-1}x\log(x/a), \quad a > 0,\ b > 0,$$
with the convention $0\log 0 = 0$. Clearly $xy \le \varphi_{a,b}(x) + \varphi_{a,b}^*(y)$, where $\varphi_{a,b}^*$ is the Legendre transform of $\varphi_{a,b}$. Next, the function $x \mapsto xy - \varphi_{a,b}(x)$ takes its maximum at the point $x = ae^{by-1}$, from which
$$\varphi_{a,b}^*(y) = (a/b)\,e^{by-1}.$$
Taking the expectation in the above inequality,
$$\mathrm{IE}(X\eta) \le b^{-1}\,\mathrm{IE}\big(X\log(X/a)\big) + (a/(be))\,\mathrm{IE}(e^{b\eta}). \qquad (2.5)$$
Let us now minimize the upper bound. Differentiating the upper bound with respect to $a$, we get that the optimal value of $a$ is $a = e\,\mathrm{IE}(X)/\mathrm{IE}(e^{b\eta})$. Choosing this value in (2.5), we get that
$$\mathrm{IE}(X\eta) \le b^{-1}\big(\mathrm{IE}(X\log X) - \mathrm{IE}(X)\log\mathrm{IE}(X) + \mathrm{IE}(X)\log\mathrm{IE}(e^{b\eta})\big),$$
which implies Theorem 2.1.
Remark 2.1. Notice that the proof of Theorem 3 in Gilat (1986) is based on the inequality $X\eta \le \varphi_{1,b}(X) + \varphi_{1,b}^*(\eta)$, where $b = 1/c$ and $\eta = \log(1/u)$. The minimization with respect to $a$ is omitted there, which leads to a suboptimal inequality. The same defect appears in the proof of Theorem 7.7 in Osekowski (2012).
Recall now the well-known upper bound
$$\mathrm{IE}(M_n^*) \le \int_0^1 \tilde Q_{M_n}(u)\,du = \mathrm{IE}\big(Q_{M_n}(U)\log(1/U)\big), \qquad (2.8)$$
which is a direct byproduct of (1.4). If $\mathrm{IE}(M_n) = 1$, from (2.8) and Theorem 2.1 applied with $X = Q_{M_n}(U)$ and $\eta = \log(1/U)$, we get that, for any $b$ in $]0,1[$,
$$\mathrm{IE}(M_n^*) \le b^{-1}\Big(H(M_n) + \log\frac{1}{1-b}\Big).$$
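As a sanity check, the covariance inequality of Theorem 2.1 (in the duality form stated above) can be tested by simulation; the code below (ours, with an arbitrary lognormal $X$ normalized so that $\mathrm{IE}(X) = 1$ and a correlated Gaussian $\eta$) is a minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(4)

# Monte Carlo check of Theorem 2.1 (as reconstructed above):
# IE(X eta) <= (H(X) + IE(X) log IE(e^{b eta})) / b for any b > 0.
z = rng.normal(size=400_000)
x = np.exp(0.5 * z - 0.125)        # X lognormal with IE(X) = 1
eta = 0.7 * z + 0.2                # correlated with X, finite Laplace transform

H = np.mean(x * np.log(x))         # H(X) = IE(X log X) since IE(X) = 1
lhs = np.mean(x * eta)             # exact value is 0.55 here
for b in [0.25, 0.5, 1.0, 2.0]:
    rhs = (H + np.log(np.mean(np.exp(b * eta)))) / b
    print(f"b={b}: IE(X eta) = {lhs:.4f} <= {rhs:.4f}")
```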
3 Bounds on the tail of $M_n^*$ involving entropy

The main result of this section is the upper bound below on the conditional value at risk of $X$. This upper bound has a variational formulation. From it we will then derive explicit upper bounds on the tail function of $M_n^*$.

Theorem 3.1. Let $X$ be a nonnegative random variable such that $\mathrm{IE}(X) = 1$ and $H(X) = H$ for some $H$ in $]0,\infty[$. Let $\tilde Q_X$ be defined by (1.3). Then, for any $z > 1$:
(a) $\tilde Q_X(1/z) \le \psi_H(z)$, where $\psi_H(z) = \inf_{t>0}\, t^{-1}\big(H + \log\big(1 + z^{-1}(e^{tz} - 1)\big)\big)$.
(b) Another formulation of $\psi_H$ is $\psi_H(z) = \inf_{c>1}\, z\,(\log c)^{-1}\big(H + \log\big(1 + (c-1)/z\big)\big)$.
(c) Furthermore $\psi_H(z) = z$ for any $z \le e^H$ and $\psi_H(z) < z$ for any $z > e^H$.

(d) Conversely, for any $H$ in $]0,\infty[$ and any $z > 1$, there exists a nonnegative random variable $Y$ with $\mathrm{IE}(Y) = 1$ and $H(Y) = H$ such that $\tilde Q_Y(1/z) = \psi_H(z)$.

Proof. We start with the proof of (a). Let $U$ be uniformly distributed over $[0,1]$. From Theorem 2.1 applied to the random variables $Q_X(U)$ and $B = z\,\mathbf{1}_{zU\le 1}$, noticing that $\mathrm{IE}\big(Q_X(U)B\big) = \tilde Q_X(1/z)$ and $\mathrm{IE}(e^{tB}) = 1 + z^{-1}(e^{tz} - 1)$, we get that, for any $t > 0$,
$$\tilde Q_X(1/z) \le t^{-1}\Big(H + \log\big(1 + z^{-1}(e^{tz} - 1)\big)\Big),$$
which implies (a). To prove (b), it is enough to set $t = z^{-1}\log c$ in the definition of $\psi_H$. Then $e^{zt} = c$, which gives (b).
To prove (c) and (d), we separate two cases. If $H \ge \log z$, then $\psi_H(z) \ge z$ by Theorem 3.1(b); since moreover $\psi_H(z) \le \lim_{t\uparrow\infty} t^{-1}\big(H + \log(1 + z^{-1}(e^{tz}-1))\big) = z$, we get $\psi_H(z) = z$. Assume now that $H < \log z$, and set $B(t) = \log\big(1 + z^{-1}(e^{tz} - 1)\big)$ and $f(t) = t^{-1}(H + B(t))$. Then $B$ is infinitely differentiable, strictly convex and has the asymptotic expansion $B(t) = tz - \log z + o(1)$ as $t\uparrow\infty$. It follows that $g : t \mapsto tB'(t) - B(t)$ is continuous, strictly increasing and satisfies $\lim_{t\downarrow 0} g(t) = 0$ and $\lim_{t\uparrow\infty} g(t) = \log z > H$. Hence there exists a unique $t_0 > 0$ such that $g(t_0) = H$, and $f$ has a minimum at $t = t_0$. Furthermore, since $f'(t_0) = 0$, $\psi_H(z) = f(t_0) = B'(t_0)$, which gives (d) and completes the proof of Theorem 3.1.
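For completeness, $\psi_H$ is easy to evaluate numerically from the variational formula in (a); the sketch below (ours) also illustrates item (c), namely $\psi_H(z) = z$ up to $z = e^H$ and $\psi_H(z) < z$ beyond:

```python
import numpy as np

def psi(H, z, ts=np.logspace(-3, 3, 20_000)):
    """psi_H(z) = inf_{t>0} t^{-1} (H + log(1 + (e^{tz} - 1)/z)),
    evaluated by grid search (a sketch of the variational formula in (a))."""
    with np.errstate(over="ignore"):
        vals = (H + np.log1p(np.expm1(ts * z) / z)) / ts
    return float(np.min(vals))

H = 1.0
for z in [1.5, 2.0, np.exp(H), 4.0, 10.0]:
    print(f"z={z:6.3f}:  psi_H(z)={psi(H, z):7.4f}")
# psi_H(z) ~ z up to e^H = e; beyond z = e^H, psi_H(z) < z, i.e. the entropy
# bound starts to improve on Ville's inequality.
```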
Remark 3.1. For any nonnegative random variable $X$ and any positive $\alpha$, $\tilde Q_{\alpha X} = \alpha\tilde Q_X$ and $H(\alpha X) = \alpha H(X)$. Hence Theorem 3.1(a) implies that, for any nonnegative random variable $X$ such that $\mathrm{IE}(X) > 0$ and $H(X) < \infty$,
$$\tilde Q_X(1/z) \le \mathrm{IE}(X)\,\psi_{H(X)/\mathrm{IE}(X)}(z).$$

Remark 3.2. From (1.4) and the above remark, Theorem 3.1 applied to a nonnegative submartingale $(M_k)_{0\le k\le n}$ yields
$$Q_{M_n^*}(1/z) \le \mathrm{IE}(M_n)\,\psi_{H(M_n)/\mathrm{IE}(M_n)}(z).$$
Moreover, by Dubins and Gilat (1978), for any law $\mu$ there exists a continuous-time martingale $(M_t)_{t\in[0,1]}$ such that $M_1$ has the law $\mu$ and $M_1^* = \sup\{M_t : t \in [0,1]\}$ has the Hardy-Littlewood maximal distribution associated with $\mu$, which means that $Q_{M_1^*} = \tilde Q_{M_1}$. Hence Theorem 3.1 provides an optimal upper bound, at least for continuous-time martingales, which shows that Ville's inequality cannot be improved if $z \le e^H$.
We now give upper bounds on the tail function of $M_n^*$. For an integrable random variable $X$, let $\tilde H_X$ denote the tail function of the Hardy-Littlewood maximal distribution associated with the law of $X$. By definition, if $U$ is a random variable with the uniform distribution over $[0,1]$, then $\tilde H_X$ is the tail function of the random variable $\tilde Q_X(U)$. From (1.4), $H_{M_n^*} \le \tilde H_{M_n}$. Hence it is enough to bound $\tilde H_{M_n}$ from above. Thus the upper bound below on $\tilde H_X$ will be the main ingredient for proving maximal inequalities.
Theorem 3.2. Let $X$ be a nonnegative random variable such that $\mathrm{IE}(X) = 1$ and $H(X) = H$ for some $H$ in $]0,\infty[$. For any positive $v$, let $L_v^*$ be defined by (3.10), with $L_v^*(y) = +\infty$ for $y > 1$, and define the nonnegative function $h$ by (3.11). Then $\tilde H_X$ satisfies the upper bound stated below in terms of $L_v^*$ and $h$.

Proof. For any positive $v$, define the Bernoulli type random variable $\xi$ associated with $v$, and let $L^*$ be the Legendre-Fenchel dual of the convex and increasing function $L$ defined in (3.14). With the above notations, if $\psi_H$ is the function already defined in Theorem 3.1(a), applying Theorem 3.1(a) with $z = 1/p$ and noticing that $\tilde Q_X(p) = x + 1$ (thanks to the continuity of $\tilde Q_X$), we get the announced bound, which concludes the proof of Theorem 3.2.
We now derive explicit upper bounds on $\tilde H_X$ from Theorem 3.2.
Theorem 3.3. Under the assumptions of Theorem 3.2, the explicit upper bounds (a) and (b) below hold on $\tilde H_X$.

From Theorem 3.3 and (3.9), we immediately get the corollary below.
Corollary 3.1. With the same notations as in Theorem 3.3, for any $x > 0$, the bounds of Theorem 3.3 hold with $H_{M_n^*}$ in place of $\tilde H_X$.

Remark 3.4. The above bounds have the same structure as the Tchebichef-Cantelli inequality. Note that the first upper bound in (a) is asymptotically equivalent to it.

Proof of Theorem 3.3. Let $x > 0$. We start by proving (3.17). To this end, we differentiate $\varphi_v$ twice (see Bercu et al. (2015), page 34), then differentiate again. Next, integrating the resulting inequality and using the initial condition $\varphi_v'(0) = p(1+v) = v$, and finally integrating twice more with the initial conditions $\varphi(0) = \varphi'(0) = 0$, we get (3.17).
In the above inequality, $h_0 - h_1 > 0$. Solving this quadratic inequality with respect to $x$ gives the first part of (a). The second part of (a) follows by noting that $4H(h_0 - h_1) > 0$.
If furthermore $x \le \sqrt{2H}$, then $h_0 \le H$. From the resulting inequality and the fact that $2h_0 - h_1 > 0$, we get (b), which completes the proof.
Numerical comparisons. To conclude this section, we compare Corollary 3.1 with the usual tail inequalities for maxima of martingales. Here we assume that $(M_k)_{0\le k\le n}$ is a positive martingale such that $\mathrm{IE}(M_n) = 1$. Then, by the Ville inequality,
$$\mathrm{IP}(M_n^* \ge z) \le 1/z \quad\text{for any } z > 1.$$
Next, let $h$ be defined by (3.11): the Ville inequality applied to the nonnegative submartingale $(h(M_k - 1))_{0\le k\le n}$ yields a second tail bound, which implies a weaker explicit inequality.
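The following Monte Carlo sketch (ours, with an arbitrary small-volatility positive martingale and the variational formula of Theorem 3.1(a) as stated above) illustrates how the entropy bound improves on Ville's inequality once $z > e^H$:

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps, sig = 10, 200_000, 0.1

# Positive martingale with IE(M_n) = 1: small-volatility lognormal products
# (an arbitrary test case for the comparison, not the paper's example).
M = np.cumprod(rng.lognormal(-sig**2 / 2, sig, size=(reps, n)), axis=1)
M_star = np.maximum(1.0, M.max(axis=1))
H = np.mean(M[:, -1] * np.log(M[:, -1]))        # entropy of M_n (~0.05 here)

def psi(H, z, ts=np.logspace(-3, 3, 20_000)):   # Theorem 3.1(a), grid search
    with np.errstate(over="ignore"):
        return float(np.min((H + np.log1p(np.expm1(ts * z) / z)) / ts))

for z in [4.0, 10.0, 40.0]:
    x = psi(H, z)   # Q_{M*_n}(1/z) <= psi_H(z), hence IP(M*_n > x) <= 1/z
    print(f"x={x:.3f}: empirical {np.mean(M_star > x):.5f} <= 1/z={1/z:.5f}"
          f"  (Ville at the same level x: {1/x:.5f})")
```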

4 Tchebichef type inequalities
At the present time the Tchebichef-Cantelli inequality has not yet been extended to random variables in $L^p$ for arbitrary $p > 1$. In this section we give an extension of this inequality to the Hardy-Littlewood maximal distribution associated with the law of a random variable $X$ in $L^p$. Next we apply this result to submartingales in $L^p$. So, let $(M_k)_{k\in[0,n]}$ be a submartingale in $L^p$. From Gilat and Meilijson (1988), the nonnegativity assumption can be dropped in (1.4). Hence, in order to bound $Q_{M_n^*}$, it is enough to bound $\tilde Q_X$ from above for a random variable $X$ in $L^p$.

Theorem 4.1. Let $p$ be any real in $[1,\infty[$ and let $X$ be a real-valued random variable in $L^p$. Let $\tilde Q_X$ be defined by (1.3). Then:
(a) for any $z > 1$, $\tilde Q_X(1/z)$ satisfies an upper bound of Tchebichef-Cantelli type, which in the case $p = 2$ reads $\tilde Q_X(1/z) \le \mathrm{IE}(X) + \sigma\sqrt{z-1}$, where $\sigma$ is the standard deviation of $X$;
(b) conversely, for any $p > 1$ and any $z > 1$, there exists a random variable $X$ in $L^p$ for which equality holds in (a);
(c) if furthermore $X$ has a symmetric law, then, for any $z > 2$, $\tilde Q_X(1/z) = \tilde Q_{|X|}(2/z)$, so that (a) applies to $|X|$.

Since $Q_X \le \tilde Q_X$, (a) implies (1.10) when $p = 2$. For $p > 1$, the upper bound tends to $\mathrm{IE}(X)$ as $z \downarrow 1$, which proves that Theorem 4.1 is efficient for any value of $z$.
Corollary 4.1. The bound of Theorem 4.1(a) applies to $Q_{M_n^*}$ for any submartingale $(M_k)_{k\in[0,n]}$ in $L^p$ (4.1). If furthermore $M_n$ has a symmetric law, then the improved bound of Theorem 4.1(c) also applies. In particular, (4.1) ensures that the median of $M_n^*$ is at most $\mathrm{IE}(M_n) + \|M_n - \mathrm{IE}(M_n)\|_1$. Note however that (4.1) is an immediate consequence of Ville's inequality applied to the submartingale $((M_k - \mathrm{IE}(M_n))_+)_{0\le k\le n}$.
We now minimize the upper bound with respect to b.
Then $f$ is strictly convex, and $1/(q-1) = p-1$. Consequently the critical point $b_0$ exists, and choosing $b = b_0$ in the upper bound gives Theorem 4.1(a). We now prove Theorem 4.1(b). Let $X$ be a suitably chosen Bernoulli random variable: a direct computation then shows that equality holds in (a), which completes the proof of Theorem 4.1(b).
To prove (c), it suffices to prove that, for any real-valued random variable $X$ in $L^1$ with a symmetric law,
$$\tilde Q_X(1/z) = \tilde Q_{|X|}(2/z) \quad\text{for any } z > 2, \qquad (4.5)$$
and next to apply (a) to the random variable $|X|$. Now, for any symmetric random variable $X$ and any positive $x$, $H_{|X|}(x) = 2H_X(x)$, which implies that $Q_X(s) = Q_{|X|}(2s)$ for any $s < 1/2$. Hence, for any $z > 2$,
$$\tilde Q_X(1/z) = z\int_0^{1/z} Q_X(s)\,ds = z\int_0^{1/z} Q_{|X|}(2s)\,ds = \frac{z}{2}\int_0^{2/z} Q_{|X|}(t)\,dt = \tilde Q_{|X|}(2/z),$$
which proves (4.5). Hence Theorem 4.1(c) holds true.
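The identity (4.5) and the underlying relation $Q_X(s) = Q_{|X|}(2s)$ are easy to check numerically for a symmetric law; the following sketch (ours) uses a Gaussian sample and empirical versions of $Q$ and $\tilde Q$:

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(size=1_000_000)        # an arbitrary symmetric law

def Q(sample, u):                      # upper quantile Q_X(u) = inf{t : P(X > t) <= u}
    return np.quantile(sample, 1.0 - u)

def Q_tilde(sample, u):                # conditional value at risk (1.3)
    s = np.sort(sample)[::-1]
    return s[: max(int(np.ceil(u * len(s))), 1)].mean()

for z in [3.0, 5.0, 10.0]:
    print(Q(x, 1 / z), Q(np.abs(x), 2 / z))            # Q_X(1/z) = Q_|X|(2/z)
    print(Q_tilde(x, 1 / z), Q_tilde(np.abs(x), 2 / z))  # identity (4.5)
```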
We now apply Corollary 4.1 to martingales in $L^p$ for $p$ in $]1,2]$ and we compare the so-obtained upper bound with the upper bound that can be derived from (1.9). So, let $(M_k)_{k\in[0,n]}$ be a martingale in $L^p$. Let $X_k = M_k - M_{k-1}$ and
$$\Delta_p = \mathrm{IE}|X_1|^p + \cdots + \mathrm{IE}|X_n|^p. \qquad (4.6)$$
By Proposition 1.8 in Pinelis (2015),
$$\mathrm{IE}|M_n - M_0|^p \le K_p\,\Delta_p. \qquad (4.7)$$
As shown by Pinelis (2015), for $p < 2$ the constant $K_p$ is strictly larger than 1. Furthermore this constant is decreasing with respect to $p$ and tends to 2 as $p \downarrow 1$. From (4.7) and Corollary 4.1, we get Theorem 4.2 below for martingales.
Proof. We prove Theorem 4.2 in the case $\Delta_p = 1$. The general case follows by dividing the random variables $X_k$ by $\Delta_p^{1/p}$. Let $U$ be a random variable with uniform law over $[0,1]$. Since $Q_{(M_n+t)_+}(U)$ has the same law as $(M_n+t)_+$ and $(Q_{M_n}(s) + t)_+ = Q_{(M_n+t)_+}(s)$,
$$\tilde Q_{M_n}(1/z) \le -t + \tilde Q_{(M_n+t)_+}(1/z) \quad\text{and}\quad \tilde Q_{(M_n+t)_+}(1/z) \le z^{1/p}\,\big\|(M_n+t)_+\big\|_p$$
by the Hölder inequality. Noticing that $1 - 1/q = 1/p$, the two above inequalities together with (4.8) imply that
$$Q_{M_n^*}(1/z) \le -t + z^{1/p}\big(t^p + 1\big)^{1/p} \quad\text{for any } t \ge 0.$$

5 Cantelli type inequalities
Let $p$ be any real strictly greater than 1 and let $X$ be a centered random variable in $L^{2p}$. In this section we give an extension of the Cantelli inequality to the Hardy-Littlewood maximal distribution associated with $|X|$. Next we apply this result to martingales in $L^{2p}$. Let us start with our extension of Cantelli's inequality.
Theorem 5.1. Let $p$ be any real in $]1,\infty[$ and let $X$ be a real-valued random variable in $L^{2p}$ such that $\mathrm{IE}(X) = 0$. Set $\sigma^2 = \mathrm{IE}(X^2)$. Then, for any $z > 1$, $\tilde Q_{|X|}(1/z)$ satisfies a Cantelli type upper bound (a). Next, let $a$ be any positive real and let $z_p > 1$ be the unique solution of the associated extremal equation. Then, for any $z \ge z_p$, there exists a symmetric random variable $X$ in $L^{2p}$ for which equality holds in (a). If furthermore $X$ has a symmetric law, then the bound of (a) can be improved (b).

Now, recall that, if $(M_k)_{k\in[0,n]}$ is a martingale in $L^1$, then $(|M_k|)_{k\in[0,n]}$ is a submartingale in $L^1$. Hence, from Theorem 5.1 and (1.4) we immediately get the corollary below.
Corollary 5.1. The bounds of Theorem 5.1 apply to $Q_{M_n^*}$ for any martingale $(M_k)_{k\in[0,n]}$ in $L^{2p}$ (a). If furthermore $M_n$ has a symmetric law, then, for any $z > 2$, the improved bound holds (b).

Remark 5.1. From (1.14) applied with $p = 2$, one gets the bound (5.3). The latter inequality coincides with (1.17) if and only if $z = L_4$. Hence (1.17) is strictly more efficient than (5.3) for $z \ne L_4$. Consequently (1.17) is more efficient than (5.2) and (5.3) for any value of $z$.
Proof of Theorem 5.1. By the Jensen inequality, $\sigma^{2p} = \big(\mathrm{IE}(X^2)\big)^p \le \mathrm{IE}(X^{2p})$.

We now apply Corollary 5.1 to sums of independent random variables. Here it will be convenient to introduce a fourth order condition on the random variables.
Definition 5.1. A real-valued random variable $X$ in $L^4$ is said to be sub-Gaussian at order 4 if $X$ satisfies $\|X - \mathrm{IE}(X)\|_4^4 \le 3\,\|X - \mathrm{IE}(X)\|_2^4$.
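Numerically, the condition of Definition 5.1 is a bound on the kurtosis ratio $\mathrm{IE}(X - \mathrm{IE}X)^4/(\mathrm{Var}\,X)^2 \le 3$, with equality for Gaussian laws. A quick sketch (ours, with illustrative test laws):

```python
import numpy as np

rng = np.random.default_rng(7)

def kurtosis_ratio(sample):
    """IE (X - IE X)^4 / (Var X)^2; 'sub-Gaussian at order 4' means ratio <= 3."""
    c = sample - sample.mean()
    return np.mean(c**4) / np.mean(c**2) ** 2

print(kurtosis_ratio(rng.normal(size=10**6)))               # ~ 3 (equality case)
print(kurtosis_ratio(rng.choice([-1.0, 1.0], size=10**6)))  # 1: Rademacher
print(kurtosis_ratio(rng.exponential(size=10**6)))          # 9: fails the condition
```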
Let $X_1, X_2, \ldots$ be a sequence of independent centered random variables in $L^4$. Suppose furthermore that these random variables are sub-Gaussian at order 4. Let
$$M_0 = 0 \quad\text{and}\quad M_k = X_1 + X_2 + \cdots + X_k \ \text{ for } k > 0. \qquad (5.4)$$
Then
$$\mathrm{IE}(M_n^4) \le 3\big(\mathrm{IE}(M_n^2)\big)^2, \qquad (5.5)$$
which shows that $L_4 \le 3$ if the random variables $X_k$ are sub-Gaussian at order 4. Hence Corollary 5.1(a) and (5.2) imply the proposition below.
The above upper bound is equivalent to $2x^{-4}$ as $x \uparrow \infty$. Under the same conditions, (5.3) yields the less efficient upper bound $3x^{-4}$.
Assume now that the random variables $X_1, X_2, \ldots$ are symmetric. By (1.4) and (4.5), $Q_{M_n^*}(1/z) \le \tilde Q_{M_n}(1/z) = \tilde Q_{|M_n|}(2/z)$ for any $z > 2$. Then the above inequality and Corollary 5.1(b) imply the proposition below.
Proposition 5.2. Let $X_1, X_2, \ldots$ be a sequence of independent symmetric random variables in $L^4$. Suppose furthermore that these random variables are sub-Gaussian at order 4. Let the martingale $(M_k)_{0\le k\le n}$ be defined by (5.4). Then the Cantelli type bound of Corollary 5.1(b) applies to $Q_{M_n^*}$.

Let now $\eta_1, \eta_2, \ldots$ be a sequence of Bernoulli random variables with law $b(p)$ for some $p < 1/2$. Let $a_1, a_2, \ldots, a_n$ be a finite sequence of real numbers. Set
$$X_k = a_k(\eta_k - p) \quad\text{for any } k > 0. \qquad (5.9)$$
From an elementary inequality, one can control the fourth moments of these random variables. Now, let $\ell_p$ denote the log-Laplace transform of $\eta_1 - p$. Hoeffding (1963, Section 4) proved a sharp upper bound on $\ell_p$.
From this upper bound and usual arguments on exponential martingales, one gets the tail inequalities (5.13) and (5.14). For $n \ge 8$, (5.14) is asymptotically more efficient than (5.13). Below we give the numerical values of the upper bounds (5.13) and (5.14) when $n = 5^4 = 625$, for some integer values of $z$.

6 Entropic sub-Gaussian random variables
In this section, we are interested in sub-Gaussian random variables. For any real-valued random variable $X$ with a finite Laplace transform on $\mathrm{IR}$, define
$$\ell_X(t) = \log\mathrm{IE}\big(e^{tX}\big) \quad\text{for any real } t. \qquad (6.1)$$
Let $b$ be any positive real. The random variable $X$ is said to be sub-Gaussian with parameter $b$ if $X$ has a finite Laplace transform on $\mathrm{IR}$ and
$$\ell_X(t) \le t\,\mathrm{IE}(X) + b^2t^2/2 \quad\text{for any } t > 0. \qquad (6.2)$$
This property implies that the variance of $X$ is bounded by $b^2$. Our aim in this section is to improve the well-known equivalent inequalities
$$\mathrm{IP}(X \ge x) \le \exp\big(-x^2/(2b^2)\big) \quad\text{and}\quad Q_X(u) \le b\sqrt{2|\log u|}, \qquad (6.3)$$
valid for any centered sub-Gaussian random variable $X$ with parameter $b$. We refer to Boucheron et al. (2013, Section 2.3) for an introduction to sub-Gaussian random variables with a proof of (6.3) and to Bobkov et al. (2006) for estimates of the sub-Gaussian constant.
In order to improve (6.3), we consider here a slightly stronger condition on the moment-generating function.
Definition 6.1. Let $b$ be any positive real. A real-valued random variable $X$ is said to be entropic sub-Gaussian with parameter $b$ if $X$ has a finite Laplace transform on $\mathrm{IR}$ and
$$t\,\ell_X'(t) - \ell_X(t) \le b^2t^2/2 \quad\text{for any } t > 0.$$
We denote the collection of such random variables by $G_E(b)$.
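As a quick illustration (ours, not the paper's): for a standard normal variable, $\ell_X(t) = t^2/2$ and $t\ell_X'(t) - \ell_X(t) = t^2/2$, so the defining inequality holds with equality for $b = 1$; for a Rademacher variable, $\ell_X(t) = \log\cosh t$ and $t\tanh t - \log\cosh t \le t^2/2$. The sketch below checks both numerically:

```python
import numpy as np

t = np.linspace(1e-3, 5.0, 1000)

# Rademacher: l(t) = log cosh t, so t l'(t) - l(t) = t tanh t - log cosh t.
h_rademacher = t * np.tanh(t) - np.log(np.cosh(t))
print(np.all(h_rademacher <= t**2 / 2))   # True: entropic sub-Gaussian, b = 1

# Standard normal: l(t) = t^2/2, so t l'(t) - l(t) = t^2/2 (equality case).
h_gauss = t * t - t**2 / 2
print(np.allclose(h_gauss, t**2 / 2))     # True
```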
If $X$ belongs to $G_E(b)$, then $X - \mathrm{IE}(X)$ satisfies (6.2) with the same parameter $b$ (see Ledoux (1996), pages 69-70). However the class $G_E(b)$ does not contain all the sub-Gaussian random variables with parameter $b$, and thus there is some hope to improve (6.2) for entropic sub-Gaussian random variables with parameter $b$. Theorem 6.1 below is a progress in this direction.

Theorem 6.1. Let $X$ belong to $G_E(1)$, let $p < 1/2$, set $v = p/(1-p)$ and let $L_v^*$ be defined by (3.10). Then the upper bounds (a) and (b) below hold. Furthermore the upper bound of (b) is strictly less than $\min\big(1/v,\ 2|\log p|\big)$.
Applying (1.4), we immediately derive from Theorem 6.1 the corollary below for sub-Gaussian martingales.
Proof of Theorem 6.1. We start by proving (b). Let $X$ be any random variable in the class $G_E(1)$ and let $\lambda$ be any positive real. Define the random variable $Y_\lambda$ from $X$ by
$$Y_\lambda = \exp\big(\lambda X - \ell_X(\lambda)\big). \qquad (6.4)$$
By the Jensen inequality applied to the convex function $x \mapsto e^{\lambda x}$, $\tilde Q_{Y_\lambda}(u) \ge \exp\big(\lambda\tilde Q_X(u) - \ell_X(\lambda)\big)$, which is equivalent to
$$\tilde Q_X(u) \le \lambda^{-1}\big(\ell_X(\lambda) + \log\tilde Q_{Y_\lambda}(u)\big). \qquad (6.5)$$
By definition $\mathrm{IE}(Y_\lambda) = 1$. Hence we may apply Theorem 3.1(a) with $z = 1/p$ to $Y_\lambda$. Using also (3.15), we then get (6.6). Since $X$ is entropic sub-Gaussian with parameter 1, it follows that $H_\lambda := H(Y_\lambda) = \lambda\,\ell_X'(\lambda) - \ell_X(\lambda) \le \lambda^2/2$. Hence, from (6.6) and the monotonicity of $\psi_H$ with respect to $H$, we get an upper bound on $\log\tilde Q_{Y_\lambda}(p)$. Combining this upper bound, (6.5) and the fact that an entropic sub-Gaussian random variable is sub-Gaussian with the same parameter, we get a quantile bound valid for any positive $\lambda$. Let $x$ be any real in $]0,1[$. Taking $\lambda = 2L_v^*(x)$ in this bound, we obtain (6.10). Since this upper bound is valid for any $x$ in $]0,1[$, it implies the first part of (b). Now, if $p < 1/2$, then $v = p/(1-p) < 1$. Therefore we can choose $x = 1 - v$ in (6.10); for this choice of $x$, we get the second part of (b). We now prove that
$$\frac{2|\log v|}{1 - v^2} < \min\Big(\frac{1}{v},\ 2\log\big(1 + \tfrac{1}{v}\big)\Big), \qquad (6.12)$$
which implies the last statement of Theorem 6.1, since $1/p = 1 + 1/v$. First, $\log(1/v) < (1 - v^2)/(2v)$ for any $v$ in $]0,1[$; this inequality ensures that $2|\log v|/(1 - v^2) < 1/v$. Second, starting from the elementary inequalities $\log(1/v) < (1-v)/v$ and $\log(1+v) \ge v/(1+v)$, we get $v^2\log(1/v) < v(1-v) \le (1 - v^2)\log(1+v)$, whence $|\log v|/(1 - v^2) < \log(1 + 1/v)$, which completes the proof of (6.12).
We now prove (a). If $X$ is entropic sub-Gaussian with parameter 1, then the variance of $X$ is at most 1. Consequently, by Theorem 4.1 applied with $p = 2$, $\sigma = 1$ and $z = 1/p$, we get (a). It remains to prove that there exists some random variable $X$, entropic sub-Gaussian with parameter 1, fulfilling the equality in (a) of Theorem 6.1. To prove this fact, we will use the lemma below.

Lemma 6.1. Let $X$ be a real-valued random variable with a finite Laplace transform on $\mathrm{IR}$. If $\ell_X''(t) \le b^2$ for any positive $t$, then $X$ is entropic sub-Gaussian with parameter $b$.

Proof of Lemma 6.1. We start by noticing that, for any random variable $X$ with a finite Laplace transform, $\big(t\,\ell_X'(t) - \ell_X(t)\big)' = t\,\ell_X''(t)$. Therefrom, if $\ell_X''(t) \le b^2$ for any positive $t$, then $t\,\ell_X'(t) - \ell_X(t) \le b^2t^2/2$, so that $X$ is entropic sub-Gaussian with parameter $b$.

7 A more efficient bounded differences inequality
This section is devoted to the bounded differences inequality, sometimes called McDiarmid's inequality (see McDiarmid (1989), Corollary 6.10). Let $E^n = E_1\times E_2\times\cdots\times E_n$ and let $X = (X_1, \ldots, X_n)$ be a random vector in $E^n$ with independent components. Let $f : E^n \to \mathrm{IR}$ be a bounded measurable function. For all $1 \le k \le n$, denote by $F^{(k)}$ the $\sigma$-algebra generated by $X_1, \ldots, X_n$ except $X_k$, that is $F^{(k)} = \sigma(X_1, \ldots, X_{k-1}, X_{k+1}, \ldots, X_n)$.
Assume that for each $1 \le k \le n$, there exist two $F^{(k)}$-measurable bounded random variables $A_k$ and $B_k$ such that $A_k \le f(X) \le B_k$, and set $C_n = \|B_1 - A_1\|_\infty^2 + \cdots + \|B_n - A_n\|_\infty^2$. Then, for any positive $x$,
$$\mathrm{IP}\big(f(X) \ge \mathrm{IE} f(X) + x\big) \le \exp\big(-2x^2/C_n\big). \qquad (7.2)$$
This inequality is often called the bounded differences inequality. We now recall an improvement of this inequality, due to Bercu et al. (2015): instead of assuming a uniform bound on each oscillation, they only assume a bound $D_n$ on the sum of squares $(B_1 - A_1)^2 + \cdots + (B_n - A_n)^2$. If $Z = f(X)$, by Theorem 2.62 in Bercu et al. (2015), for any positive $x$,
$$\mathrm{IP}\big(Z \ge \mathrm{IE}(Z) + x\big) \le \exp\big(-2x^2/D_n\big). \qquad (7.3)$$
Of course, this inequality is equivalent to the quantile inequality
$$Q_Z(p) \le \mathrm{IE}(Z) + \sqrt{D_n|\log p|/2} \quad\text{for any } p \in\, ]0,1[. \qquad (7.4)$$
The proof of the above inequality is based on the entropy method, which has been widely developed by Ledoux (1996). In particular Bercu et al. (2015, page 56) prove that the random variable $Z$ is entropic sub-Gaussian with parameter $\sqrt{D_n}/2$. Consequently Theorem 6.1 yields the new, more efficient concentration inequality below (Theorem 7.1), which bounds $Q_Z(p)$ for any $p < 1/2$. Conversely, for any $z > 1$, there exists a random variable $Z$ satisfying the conditions of Theorem 7.1 such that (7.8) holds. One can see that (7.8) is better than (7.6) for $z = 20$ and almost equivalent for $z = 16$. However $D_n$ is often strictly less than $C_n$.

8 Annex

The Annex is devoted to a one-sided von Bahr-Esseen type inequality, stated for a suitable class $F$ of functions $f$. In particular, for any $p$ in $]1,2]$, any $t \ge 0$ and any martingale $(M_k)_{0\le k\le n}$ in $L^p$ such that $M_0 = 0$,
$$\mathrm{IE}\big((M_n + t)_+^p\big) \le t^p + \mathrm{IE}\big(|X_1|^p\big) + \cdots + \mathrm{IE}\big(|X_n|^p\big). \qquad (8.4)$$
Taking then the expectation in (8.5), we get (8.4), which ends the proof.
Remark 8.1. This result cannot be derived from the von Bahr-Esseen inequality for absolute moments of Pinelis (2015), since the constant for absolute moments is strictly larger than 1.
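To conclude, here is a small Monte Carlo sketch (ours, not the paper's) of the bounded differences inequality (7.2) in the simplest case where $f$ is a sum of bounded independent coordinates, so that $C_n = n$; the function $f$ and the uniform test law are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(8)
n, reps = 50, 100_000

# Z = f(X) with f(x) = sum of coordinates in [0, 1]: each oscillation
# B_k - A_k equals 1, hence C_n = n in (7.2).
X = rng.uniform(size=(reps, n))
Z = X.sum(axis=1)
C_n = float(n)

for x in [3.0, 6.0, 9.0]:
    emp = (Z >= Z.mean() + x).mean()
    bound = np.exp(-2 * x**2 / C_n)
    print(f"x={x}: P(Z >= IE(Z)+x) = {emp:.2e} <= exp(-2x^2/C_n) = {bound:.2e}")
```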