Optimal binomial, Poisson, and normal left-tail domination for sums of nonnegative random variables

Let $X_1,\dots,X_n$ be independent nonnegative random variables (r.v.'s), with $S_n:=X_1+\dots+X_n$ and finite values of $s_i:=E X_i^2$ and $m_i:=E X_i>0$. Exact upper bounds on $E f(S_n)$ for all functions $f$ in a certain class $\mathcal{F}$ of nonincreasing functions are obtained, in each of the following settings: (i) $n,m_1,\dots,m_n,s_1,\dots,s_n$ are fixed; (ii) $n$, $m:=m_1+\dots+m_n$, and $s:=s_1+\dots+s_n$ are fixed; (iii) only $m$ and $s$ are fixed. These upper bounds are of the form $E f(\eta)$ for a certain r.v. $\eta$. The r.v. $\eta$ and the class $\mathcal{F}$ depend on the choice of one of the three settings. In particular, $(m/s)\eta$ has the binomial distribution with parameters $n$ and $p:=m^2/(ns)$ in setting (ii) and the Poisson distribution with parameter $\lambda:=m^2/s$ in setting (iii). One can also let $\eta$ have the normal distribution with mean $m$ and variance $s$ in any of these three settings. In each of the settings, the class $\mathcal{F}$ contains, and is much wider than, the class of all decreasing exponential functions. As corollaries of these results, upper bounds on the left-tail probabilities $P(S_n\le x)$ that are optimal in a certain sense are presented, for any real $x$. In fact, more general settings than the ones described above are considered. Exact upper bounds on the exponential moments $E\exp\{hS_n\}$ for $h<0$, as well as the corresponding exponential bounds on the left-tail probabilities, were previously obtained by Pinelis and Utev. It is shown that the new bounds on the tails are substantially better.

Exponential upper bounds for $S_n$ go back at least to Bernstein. As the starting point here, one uses the multiplicative property of the exponential function together with the condition of independence of $X_1,\dots,X_n$ to write
$E\,e^{hS_n}=\prod_{i=1}^n E\,e^{hX_i}$   (1.1)
for all real $h$. Then one bounds each factor $E\,e^{hX_i}$, thus obtaining an upper bound (say $M_n(h)$) on $E\,e^{hS_n}$, uses the Markov inequality to write
$P(S_n\ge x)\le e^{-hx}\,E\,e^{hS_n}\le B_n(h,x):=e^{-hx}M_n(h)$
for all real $x$ and all nonnegative real $h$, and finally tries to minimize $B_n(h,x)$ in $h\ge0$ to obtain an upper bound on the tail probability $P(S_n\ge x)$.
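To make this recipe concrete, here is a minimal numerical sketch (not from the paper) for independent Bernoulli summands; the distribution, the function names, and the use of SciPy are illustrative assumptions only.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def chernoff_right_tail_bound(mgf_factors, x, h_max=50.0):
    """Minimize B_n(h, x) = exp(-h*x) * prod_i E exp(h*X_i) over h >= 0.

    mgf_factors: callables h -> E exp(h*X_i), one per summand.
    Returns an upper bound on P(S_n >= x)."""
    def B(h):
        M = np.prod([mgf(h) for mgf in mgf_factors])  # M_n(h)
        return np.exp(-h * x) * M                     # B_n(h, x)

    res = minimize_scalar(B, bounds=(0.0, h_max), method="bounded")
    return min(1.0, res.fun)  # h = 0 already gives the trivial bound 1

# Illustrative example: n Bernoulli(p) summands, so E exp(h*X_i) = 1 - p + p*e^h.
n, p = 30, 0.2
factors = [lambda h: 1 - p + p * np.exp(h)] * n
print(chernoff_right_tail_bound(factors, x=12.0))
```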
This approach was used and further developed in a large number of papers, including notably the well-known work by Bennett [2] and Hoeffding [11]. Pinelis and Utev [22] offered a general approach to obtaining exact bounds on the exponential moments $E\,e^{hS_n}$, with a number of particular applications.
Exponential bounds were obtained in more general settings as well, where the r.v.'s X 1 , . . . , X n do not have to be independent or real-valued. It was already mentioned by Hoeffding at the end of Section 2 in [11] that his results remain valid for martingales.
However, the classes of exponential functions $e^{h\,\cdot}$ and absolute power functions $|\cdot|^p$ are too narrow, in that the resulting bounds on the tails are not as good as one could get in certain settings. It is therefore natural to consider wider classes of moment functions and then try to choose the best moment function in such a wider class, in order to obtain a better bound on the tail probability. This approach was used and developed in [9,10,23,25,4,31], in particular. The main difficulty one needs to overcome when working with such, not necessarily exponential, moment functions is the lack of the multiplicative property (1.1).
In some settings, the bounds can be improved if it is known that the r.v.'s X 1 , . . . , X n are nonnegative; see e.g. [13,6,12,19]. However, in such settings the focus has usually been on bounds for the right tail of the distribution of S n . There has been comparatively little work done concerning the left tail of the distribution of the sum S n of nonnegative r.v.'s X 1 , . . . , X n .
One such result was obtained in [22]. Suppose indeed that the independent r.v.'s $X_1,\dots,X_n$ are nonnegative. Also, suppose here that
$m:=E X_1+\dots+E X_n>0$ and $s:=E X_1^2+\dots+E X_n^2<\infty$.   (1.2)
Then [22, Theorem 7] provides exponential upper bounds on the left-tail probability $P(S_n\le x)$ for any $x\in(0,m]$ (in fact, these inequalities were stated in [22] in the equivalent form, for the non-positive r.v.'s $-X_1,\dots,-X_n$). These upper bounds on the tail probability $P(S_n\le x)$ were based on exact upper bounds on the exponential moments of the sum $S_n$, which can be written as follows:
$E\,e^{hS_n}\le\exp\big\{\tfrac{m^2}{s}\,\big(e^{hs/m}-1\big)\big\}$ for all $h\le0$.
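Given such an exponential-moment bound, a left-tail bound follows by minimizing $e^{-hx}\,E\,e^{hS_n}$ over $h\le0$. Below is a minimal numerical sketch of that step, assuming only the Poisson-type moment bound displayed above; the parameter values, function name, and the use of SciPy are illustrative.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def exponential_left_tail_bound(m, s, x, h_min=-50.0):
    """Bound P(S_n <= x) for nonnegative X_i with sum of means m and sum of
    second moments s, by minimizing exp(-h*x) * exp(lam*(exp(h*s/m) - 1))
    over h <= 0, where lam = m**2/s (the moment bound displayed above)."""
    lam = m ** 2 / s

    def bound(h):
        return np.exp(lam * (np.exp(h * s / m) - 1.0) - h * x)

    res = minimize_scalar(bound, bounds=(h_min, 0.0), method="bounded")
    return min(1.0, res.fun)

# Illustrative numbers: m = 10, s = 20, left-tail point x = 4.
print(exponential_left_tail_bound(10.0, 20.0, 4.0))
```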

Summary and discussion
Let $X_1,\dots,X_n$ be nonnegative real-valued r.v.'s. In general, we shall no longer assume that $X_1,\dots,X_n$ are independent; instead, a more general condition, described in the definition below, will be assumed. Moreover, condition (1.2) will be replaced by a more general one.

Definition 2.1. Given any $\mathbf m=(m_1,\dots,m_n)$ and $\mathbf s=(s_1,\dots,s_n)$ in $[0,\infty)^n$, let us say that the r.v.'s $X_1,\dots,X_n$ satisfy the $(\mathbf m,\mathbf s)$-condition if, for some filter $(\mathcal A_0,\dots,\mathcal A_n)$ of sigma-algebras and each $i\in\overline{1,n}$, the r.v. $X_i$ is $\mathcal A_i$-measurable, $E(X_i\,|\,\mathcal A_{i-1})\le m_i$, and $E(X_i^2\,|\,\mathcal A_{i-1})\le s_i$. Given any nonnegative $m$ and $s$, let us also say that the $(m,s)$-condition is satisfied if the $(\mathbf m,\mathbf s)$-condition holds for some $\mathbf m=(m_1,\dots,m_n)$ and $\mathbf s=(s_1,\dots,s_n)$ in $[0,\infty)^n$ such that $m_1+\dots+m_n\le m$ and $s_1+\dots+s_n\le s$.
The following comments are in order.
• Any independent r.v.'s $X_1,\dots,X_n$ satisfy the $(\mathbf m,\mathbf s)$-condition if $E X_i\le m_i$ and $E X_i^2\le s_i$ for each $i\in\overline{1,n}$; if at that (2.2) holds, then the $(m,s)$-condition holds as well.
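To spell out that verification (a sketch under the conditional-moment form of the $(\mathbf m,\mathbf s)$-condition in Definition 2.1): take $\mathcal A_0:=\{\emptyset,\Omega\}$ and $\mathcal A_i:=\sigma(X_1,\dots,X_i)$ for $i\in\overline{1,n}$. Then $X_i$ is $\mathcal A_i$-measurable and, by independence,
$E(X_i\,|\,\mathcal A_{i-1})=E X_i\le m_i$ and $E(X_i^2\,|\,\mathcal A_{i-1})=E X_i^2\le s_i$,
so that the $(\mathbf m,\mathbf s)$-condition holds.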
and introduce the class of functions
$\mathcal F^{k:j}_+:=\{g\in S^j\colon g^{(i)}\text{ is nondecreasing for each }i\in\overline{k-1,j}\}$   (2.5)
and, finally, the "reflected" class $\mathcal F^{k:j}_-:=\{f\colon f(-\,\cdot)\in\mathcal F^{k:j}_+\}$. It is clear that the class $\mathcal F^{k:j}_-$ gets narrower as $j$ increases (with a fixed $k$), and it gets wider as $k$ increases (with a fixed $j$).
As an example, the function $x\mapsto a+bx+c\,e^{-\lambda x}$ belongs to $\mathcal F^{k:j}_-$ for any $a\in\mathbb R$, $b\le0$, $c\ge0$, $\lambda\ge0$ (and any natural $k$ and $j$ such that $k\le j+1$). Also, given any $a\in\mathbb R$, $b\le0$, $c\ge0$, and $w\in\mathbb R$, the function $x\mapsto a+bx+c\,(w-x)_+^\alpha$ belongs to $\mathcal F^{k:j}_-$ for any real $\alpha\ge j$ (and any natural $k$ and $j$ such that $k\le j+1$); here and elsewhere, as usual, $x_+:=\max(0,x)$ and $x_+^\alpha:=(x_+)^\alpha$ for $x\in\mathbb R$. Note also that the classes $\mathcal F^{k:j}_-$ are convex cones; that is, any linear combination with nonnegative coefficients of functions belonging to any one of these classes belongs to the same class.
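As a quick check of the first of these examples (a spelled-out step, modulo the smoothness requirement in the definition of $S^j$): the reflection of $x\mapsto a+bx+c\,e^{-\lambda x}$ is $g(x):=a-bx+c\,e^{\lambda x}$, and, for $b\le0$, $c\ge0$, $\lambda\ge0$,
$g'(x)=-b+c\lambda e^{\lambda x}\ge0$ and $g^{(i)}(x)=c\lambda^{i}e^{\lambda x}$ for $i\ge2$,
so that $g$ and all of its derivatives are nondecreasing; hence $g\in\mathcal F^{k:j}_+$, that is, the original function lies in $\mathcal F^{k:j}_-$.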

Remark 2.3.
It is not difficult to see that, if a function $f$ is in the class $\mathcal F^{k:j}_-$, then the shifted and/or rescaled function $x\mapsto f(bx+a)$ is also in the same class, for any constants $a\in\mathbb R$ and $b\ge0$. That is, these classes of functions are shift- and scale-invariant. Now we are ready to state the main result of this paper.

Theorem 2.4.
(I) Let $X_1,\dots,X_n$ be any nonnegative r.v.'s satisfying the $(\mathbf m,\mathbf s)$-condition for some $\mathbf m$ and $\mathbf s$ in $(0,\infty)^n$, so that (2.3) holds. Then both inequalities hold for all $f\in\mathcal F^{1:2}_-$.

The following conditions will also be used:
$E(X_i\,|\,\mathcal A_{i-1})=m_i$ for all $i$,   (2.12)
$E(X_i^2\,|\,\mathcal A_{i-1})=s_i$ for all $i$,   (2.13)
$m_1+\dots+m_n=m$,   (2.14)
$s_1+\dots+s_n=s$,   (2.15)
and also the conditions
the $X_i$'s are bounded or $f\le p$ for some quadratic polynomial $p$,   (2.16)
$E\,X_i^3<\infty$ for all $i$.

(IV) Inequality (2.10) holds if any one of the following two conditions holds: condition (2.16) or the condition that $E\,X_i^3<\infty$ for all $i$. This remark can be verified similarly to Theorem 2.4.

The necessary proofs will be given in Section 3.
Obviously, the r.v.'s $Y_{m_1,s_1},\dots,Y_{m_n,s_n}$ in (2.7) satisfy the $(\mathbf m,\mathbf s)$-condition. So, inequality (2.7) is exact, in the sense that, given any natural $n$ and any $\mathbf m$ and $\mathbf s$ in $(0,\infty)^n$ such that (2.3) holds, the right-hand side of (2.7) is the exact upper bound on its left-hand side. Similarly, given any natural $n$ and any $m$ and $s$ in $(0,\infty)$ such that (2.4) holds, inequality (2.8) is exact.

Let now positive $m$ and $s$ vary so that $m^2/s\to\infty$, which is the case e.g. when $0<m_1=m_2=\dotsb$, $0<s_1=s_2=\dotsb$, conditions (2.14) and (2.15) hold, and $n\to\infty$. At that, fix any real $\kappa$ and let $w=m+\kappa\sqrt s$. Let $L_{m,s;w}:=E f_{w,2}\big(\tfrac sm\,\Pi_{m^2/s}\big)$, which is, according to Proposition 2.6, the exact upper bound on $E f_{w,2}(S_n)$ given $m$ and $s$. Then $L_{m,s;w}/s\to E f_{\kappa,2}(Z)$. This convergence is justified, since the r.v.'s $f_{\kappa,2}\big(\big(\tfrac sm\,\Pi_{m^2/s}-m\big)/\sqrt s\big)$ are uniformly integrable (as e.g. in [5, Theorem 5.4]).

Let $\eta$ denote an arbitrary real-valued r.v. Recalling that for any natural $\alpha$ and any $w\in\mathbb R$ the function $f_{w,\alpha}$ belongs to $\mathcal F^{1:\alpha}_-$ and applying the Markov inequality, one sees that Theorem 2.4 immediately implies

Corollary 2.7. Let $X_1,\dots,X_n$ be any nonnegative r.v.'s satisfying the $(m,s)$-condition for some $m$ and $s$ in $(0,\infty)$, so that (2.4) holds. Then
$P(S_n\le x)\le P_3(\Sigma_{n;m,s};x)$ for natural $n$   (2.19)
and
$P(S_n\le x)\le P_3(\Sigma_{\infty;m,s};x)$;   (2.20)
here and in what follows, $x$ is an arbitrary real number (unless otherwise indicated), $(m/s)\,\Sigma_{n;m,s}$ has the binomial distribution with parameters $n$ and $m^2/(ns)$, $(m/s)\,\Sigma_{\infty;m,s}$ has the Poisson distribution with parameter $m^2/s$, and $P_\alpha(\eta;x):=\inf_{w>x}E f_{w,\alpha}(\eta)/(w-x)^\alpha$ for any real $\alpha>0$. Also, the upper bound $P_3\big(m+Z\sqrt s;x\big)$ on $P(S_n\le x)$ can be somewhat improved, to $P_2\big(m+Z\sqrt s;x\big)$.

The computation of $P_\alpha(\eta;x)$ is described (in a somewhat more general setting) in [25, Theorem 2.5]; for normal $\eta$, similar considerations were given already in [24, page 363]; those descriptions are given for the right tail of $\eta$, so that one will have to make the reflection $x\mapsto-x$ to apply those results. An elaboration of [25, Theorem 2.5] is presented in [28, Proposition 3.2]. Concerning fast and effective calculations of the positive-part moments $E\,X_+^\alpha$, see [29]. In [3], one can find specific details on the calculation of $P_\alpha(\eta;x)$ for $\alpha\in\{1,2,3\}$ and $\eta$ with a distribution belonging to a common particular family, such as the binomial and Poisson ones.
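For concreteness, here is a small numerical sketch (illustrative only, not part of the paper) of the dominating r.v.'s appearing above; it assumes the two-point form of $Y_{m,s}$ (value $s/m$ with probability $m^2/s$, value $0$ otherwise) suggested by the binomial form of $\Sigma_{n;m,s}$, and all names are ad hoc.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_Y(m, s, size):
    """Assumed two-point r.v. Y_{m,s}: value s/m with probability m**2/s, else 0;
    then E Y = m and E Y**2 = s (this requires m**2 <= s)."""
    return np.where(rng.random(size) < m ** 2 / s, s / m, 0.0)

def sample_Sigma_n(n, m, s, size):
    """Sigma_{n;m,s} = (s/m) * Binomial(n, m**2/(n*s)), as in Corollary 2.7."""
    return (s / m) * rng.binomial(n, m ** 2 / (n * s), size)

def sample_Sigma_inf(m, s, size):
    """Sigma_{inf;m,s} = (s/m) * Poisson(m**2/s), as in Corollary 2.7."""
    return (s / m) * rng.poisson(m ** 2 / s, size)

m, s, n, N = 3.0, 10.0, 25, 10 ** 6
for name, sample in [("Y_{m,s}        ", sample_Y(m, s, N)),
                     ("Sigma_{n;m,s}  ", sample_Sigma_n(n, m, s, N)),
                     ("Sigma_{inf;m,s}", sample_Sigma_inf(m, s, N))]:
    print(name, "mean ~", round(sample.mean(), 3),
          " variance ~", round(sample.var(), 3))
# All three have mean m; Y_{m,s} has second moment s (variance s - m**2),
# Sigma_{inf;m,s} has variance s, and Sigma_{n;m,s} has variance s - m**2/n.
```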
Let us present here some of those results, which will be useful in this context. Take any real $\alpha>1$ and any r.v. $\eta$ such that $E\,\eta_-^\alpha<\infty$; then $E\,\eta$ exists in $(-\infty,\infty]$. Let $x_*=x_*(\eta):=\inf\operatorname{supp}(\eta)$, where $\operatorname{supp}(\eta)$ denotes the support set of (the distribution of) the r.v. $\eta$. Then, by [28, Proposition 3.2], the function $\gamma$ is continuous and nondecreasing on the interval $(x_*,\infty)$; in particular, $w_x$ is the only root in $(x_*,\infty)$ of equation (2.27). In particular, the upper bound $P_\alpha(\eta;x)$ on the left-tail probability $P(\eta\le x)$ is exact for certain $x$, in particular at $x=x_*$. Thus, to evaluate $P_\alpha(\eta;x)$ for any real $x$, it is enough to find $w_x$ (that is, to solve equation (2.27)) for any $x\in(x_*,E\,\eta)$. This is especially easy to do if the r.v. $\eta$ takes values in a lattice, which is the case when $\eta$ is $\Sigma_{n;m,s}$ or $\Sigma_{\infty;m,s}$, as in Corollary 2.7. Again by [28, Proposition 3.2], $P_\alpha(a+b\eta;\,a+bx)=P_\alpha(\eta;x)$ for all real $x$ and $a$ and all $b\in(0,\infty)$. So, the calculation of $P_\alpha(\eta;x)$ for $\eta$ equal to $\Sigma_{n;m,s}$ or $\Sigma_{\infty;m,s}$ reduces to the situation when the r.v. $\eta$ is integer-valued with $x_*=x_*(\eta)=0$; assume for now that this is the case. In view of (2.19) and (2.20), assume also that $\alpha=3$.
Then, by (2.26) and in view of (2.27), for each $x\in(x_*,E\,\eta)=(0,E\,\eta)$ one finds $w_x$ as the only root in the interval $(j_x,j_x+1]$ of the quadratic equation (2.28), where $j_x:=\min\{j\in\overline{0,\infty}\colon a_j(j+1)^2-2b_j(j+1)+c_j\ge0\}$. If $a_{j_x}\ne0$ then, by (2.26) and (2.28), $w_x$ is the greater of the roots of the above quadratic equation.
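As a brute-force cross-check of this recipe (illustrative only; it assumes the representation $P_\alpha(\eta;x)=\inf_{w>x}E f_{w,\alpha}(\eta)/(w-x)^\alpha$ used above and direct minimization in place of solving (2.28)), one can minimize the ratio interval by interval on the lattice:

```python
import numpy as np
from scipy.stats import poisson
from scipy.optimize import minimize_scalar

def P3_integer_lattice(pmf_values, x):
    """Brute-force P_3(eta; x) = inf_{w > x} E(w - eta)_+^3 / (w - x)^3 for an
    integer-valued eta >= 0 with P(eta = i) = pmf_values[i].

    The ratio is smooth on each lattice interval (j, j+1], so it is minimized
    interval by interval and the overall minimum is returned."""
    i = np.arange(len(pmf_values))
    p = np.asarray(pmf_values)

    def ratio(w):
        return np.sum(np.clip(w - i, 0.0, None) ** 3 * p) / (w - x) ** 3

    best = 1.0  # letting w -> infinity gives the trivial bound 1
    for j in range(max(0, int(np.floor(x))), len(pmf_values) + 1):
        lo, hi = max(j, x) + 1e-9, float(j + 1)
        if lo >= hi:
            continue
        res = minimize_scalar(ratio, bounds=(lo, hi), method="bounded")
        best = min(best, res.fun)
    return best

# Illustrative use: with m = s = lam, eta = Sigma_{inf;lam,lam} = Poisson(lam);
# the left-tail point x = 1 matches the comparison with [8] discussed below.
lam = 2.5
pmf = poisson.pmf(np.arange(0, 60), lam)
print("P_3 bound:", P3_integer_lattice(pmf, x=1.0),
      " true left tail:", poisson.cdf(1, lam))
```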
The interesting paper [8] presents, for any given $n\in\overline{0,\infty}\cup\{\infty\}$ and $\lambda\in(1,\infty)$, the exact upper bound (say $B_{n,\lambda}$) on $P(S\le1)$ under the condition that $S=\sum_{i=1}^n X_i$, where the $X_i$'s are independent r.v.'s such that $0\le X_i\le1$ for all $i\in\overline{1,n}$ and $E\,S=\lambda$. For $\lambda\in[0,1]$, the exact upper bound $B_{n,\lambda}$ is trivial and equals 1; indeed, let $X_1$ take values 0 and 1 with probabilities $1-\lambda$ and $\lambda$, respectively, and let $X_i=0$ for all $i\in\overline{2,n}$. Note that the conditions $0\le X_i\le1$ for all $i$ and $E\,S=\lambda$ imply $\sum_i E X_i=\lambda$ and $\sum_i E X_i^2\le\lambda$, which corresponds to the $(m,s)$-condition with $m=s=\lambda$. So, it makes sense to compare the bound $P_3(\Sigma_{n;\lambda,\lambda};1)$ in (2.19)-(2.20) with $B_{n,\lambda}$. Graphs of these two bounds and their ratio in the case $n=\infty$ are shown in Figure 1. The calculations of $P_3(\Sigma_{\infty;\lambda,\lambda};1)$ here were done in accordance with the above description, containing formulas (2.25)-(2.29); it takes less than 0.3 sec with Mathematica on a standard laptop to produce either of the two graphs in Figure 1. It can be seen that the bound $P_3(\Sigma_{\infty;\lambda,\lambda};1)$ is not much greater than the optimal bound $B_{\infty,\lambda}$, especially when $\lambda$ is close to either 1 or $\infty$; the corresponding comparisons for finite $n$ look similar. On the other hand, our bounds $P_3(\Sigma_{n;m,s};x)$ hold under much more general conditions: (i) for all $x\in\mathbb R$, rather than just for $x=1$; (ii) assuming only the $(m,s)$-condition (on the sums of the first and second moments of the $X_i$'s), rather than requiring all the $X_i$'s to be bounded by the constant 1 (which latter also coincides with the value of $x$ chosen in [8]); (iii) assuming the more general dependence conditions.

By [28, Proposition 3.5], $P_\alpha(\eta;x)$ is nonincreasing as $\alpha$ increases from 0 to $\infty$; thus, the bounds $P_\alpha(\eta;x)$ improve on the so-called exponential bounds $P_\infty(\eta;x)$. In particular, one obtains the exponential bounds in (2.31) and (2.33) on $P(S_n\le x)$, including the bound $e^{-z^2/2}$ with $z:=(m-x)/\sqrt s$. For independent $X_i$'s, but without the additional restriction (2.38), the exponential upper bounds in (2.31) and (2.33) on $P(S_n\le x)$, as well as the exact upper bound $E f\big(\tfrac sm\,\Pi_{m^2/s}\big)$ on $E f(S_n)$ for $f(x)\equiv e^{hx}$ with $h<0$, were essentially obtained in [22, Theorem 7]. Note two mistakes concerning the latter result: (i) in the proof in [22], $\psi(u)$ should be replaced by $\psi(hu)$, and (ii) what is presented as the proof of Theorem 7 in [22] is in fact that of Theorem 8 therein, and vice versa. Results of [22] seem yet relatively unknown, as the bound $e^{-z^2/2}$ on $P(S_n\le x)$ appeared later in [16].
It is seen that the bound $P_3(\Sigma_{n;m,s};x)$ is close to the true tail probability $P(\Sigma_{n;m,s}\le x)$, especially for $\lambda=10$ and $n=11$, with a zero error at the left end-point $-\sqrt\lambda$ of the range of each of the r.v.'s $(\Sigma_{n;m,s}-m)/\sqrt s$, which is in accordance with part (iv)(b) of the mentioned [28, Proposition 3.2]. In the latter case ($\lambda=10$ and $n=11$), the bound $P_3(\Sigma_{n;m,s};x)$ is over 8 times better near the left end-point of the range than the "normal" exponential bound $e^{-z^2/2}$. However, $P_3(\Sigma_{n;m,s};x)$ may be slightly greater for $z$ near 0 than the "normal" better-than-exponential bound $P_2\big(m+Z\sqrt s;x\big)$; this is due to the fact that the class $\mathcal F^{1:2}_-$ is somewhat richer than $\mathcal F^{1:3}_-$.
The proof of Lemma 3.1 will be given at the end of this section. Note that $F_{n,f}$ is a function of $n$ points $P_1,\dots,P_n$ in $\mathbb R^2$, rather than of $n$ real arguments. If the latter were the case, then Lemma 3.1 together with the well-known Muirhead lemma (see e.g. [15, Lemma 2.B.1]) would immediately imply the Schur-concavity and hence (3.3). However, no appropriate "multidimensional" analogue of the Muirhead lemma seems to exist. Indeed, if one defines the "multivariate" majorization by means of doubly stochastic matrices (in accordance with the Hardy-Littlewood-Polya characterization; see e.g. [15, Theorem 2.B.2]), then the analogue of the Muirhead lemma fails to hold. For example, take $n=3$ and consider the doubly stochastic $3\times3$ matrices, say $A$ and $B_t$ for some $t\in[0,1]$, that transform any triple $\tau:=(Q_1,Q_2,Q_3)$ of points in $\mathbb R^2$ to (say) $\bar\tau:=\big(\tfrac{Q_1+Q_2}2,\tfrac{Q_1+Q_3}2,\tfrac{Q_2+Q_3}2\big)$ and $\tau_t:=\big((1-t)Q_1+tQ_2,\,tQ_1+(1-t)Q_2,\,Q_3\big)$, respectively; matrices such as $B_t$ are referred to as T-transform matrices, all of which can be written as $C^{-1}B_tC$ for some $t\in[0,1]$ and some permutation matrix $C$; see e.g. [15, Section 2.B]. Then, if the points $Q_1,Q_2,Q_3$ are not collinear, already after one application of any matrix $B_t$ with $t\in(0,1)$ to $\tau$ one will never be able to get from $\tau_t$ to $\bar\tau$ via any chain of T-transforms.

$\dots\le F_{k+1,f}(P_k,P_{k+1},P_k,\dots,P_k)$   by Lemma 3.1 with $t=\tfrac1{k+1}$
$=E\,F_{k,g_{k+1}}(P_k,P_k,\dots,P_k)$   by the definition of $g_{k+1}$
$\le E\,F_{k,g_{k+1}}(P_{k+1},\dots,P_{k+1})$   by induction and (3.4)
$=F_{k+1,f}(P_{k+1},\dots,P_{k+1})$   by the definition of $g_{k+1}$.
This completes the proof of (2.8), modulo Lemma 3.1. Next, $\Sigma_{n;m,s}$ converges in distribution to $\tfrac sm\,\Pi_{m^2/s}$ as $n\to\infty$. So, the right-hand side of (2.8) is not only nondecreasing in $n$ but also convergent to the right-hand side of (2.9) as $n\to\infty$ (for $f=f_{w,3}$). Thus, (2.9) follows.
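As a quick numerical illustration of this convergence step (not part of the paper; the total-variation metric and the parameter values are chosen only for illustration), one can check that Binomial$(n,\lambda/n)$, i.e. the law of $(m/s)\Sigma_{n;m,s}$ with $\lambda=m^2/s$, approaches Poisson$(\lambda)$ as $n$ grows:

```python
import numpy as np
from scipy.stats import binom, poisson

def tv_binom_vs_poisson(n, lam, k_max=200):
    """Total-variation distance between Binomial(n, lam/n) and Poisson(lam)."""
    k = np.arange(k_max + 1)
    return 0.5 * np.sum(np.abs(binom.pmf(k, n, lam / n) - poisson.pmf(k, lam)))

lam = 4.0  # plays the role of m**2/s
for n in (10, 100, 1000, 10000):
    print(n, tv_binom_vs_poisson(n, lam))
# The distance shrinks roughly like 1/n, illustrating that Sigma_{n;m,s}
# converges in distribution to (s/m) * Pi_{m^2/s}.
```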
Therefore, to finish the proof of inequality (3.7) and thus that of Lemma 3.1, it remains to verify the following lemma.
Proof of Lemma 3.2. For each $k$, $D_k$ is a polynomial, and the conditions that define the case $C_k$ are polynomial (in fact, affine) inequalities. So, the verification that $D_k$ is nonnegative in each of the cases $C_k$ can be done in a completely algorithmic manner, due to the well-known Tarski theory [38,14,7]. This theory is implemented in the Mathematica command Reduce[], which was used here to carry out the verification. This completes the proof of Lemma 3.2, which appears no less reliable than computations done "by hand"; cf. e.g. the views of Okounkov [17, page 35], Voevodsky [35], and Odlyzko [18] on computer-assisted proofs.
However, as Okounkov [17] notes in his interview, "perhaps we should not be dependent on commercial software here". Indeed, details of the execution of the Mathematica command Reduce[] are not open to examination.
Therefore, in addition to the above proof, an alternative proof of Lemma 3.2 is provided in the next section, which relies, instead of the Mathematica command Reduce, on the package Redlog. Yet another proof of Lemma 3.2 is given in Section 5 of the arXiv version [33] of this paper. That proof, which is very long, uses only standard tools of calculus and also such a standard tool of algebra as the resultant.

Alternative proof of Lemma 3.2
Recall that, for each $k\in\{0,1,2,3\}$, $D_k$ is a polynomial in $a,b,p,q$. For each $k\in\{0,1,2,3\}$, in the case $(C_k)$, the quadruple $(a,b,p,q)$ belongs to the set $\Omega_k$. For each $k\in\{0,1,2,3\}$, let $\bar\omega_k$ denote the topological closure of $\omega_k$, so that $\bar\omega_k$ is defined by the system of non-strict inequalities corresponding to the strict inequalities defining the set $\omega_k$.
We shall use notation such as the following:
$D_{k;p=\delta}:=D_k\big|_{p=\delta}$, $D_{k;q=\epsilon}:=D_k\big|_{q=\epsilon}$, $D_{k;p=\delta,q=\epsilon}:=D_k\big|_{p=\delta,q=\epsilon}$;   (4.3)
sometimes in such notation we shall use, instead of $D_k$, a modified version $\widetilde D_k$ of $D_k$, which differs from $D_k$ by a factor that is manifestly positive in the corresponding context.
Unfortunately, for polynomials in several variables the mentioned package Redlog is either much slower than Mathematica (as in the cases of the polynomials $D_0$ and $D_3$) or unable to complete the verification without further human preprocessing. To verify the nonnegativity of the polynomials $D_1$ and $D_2$ with Redlog, each of these two verification problems has to be reduced, by a human, to a series (or rather a tree) of simpler problems, as presented below.

Lemma 4.1. In the case $(C_1)$, the polynomial $D_1$ in $a,b,p,q$ is nonnegative for all $p$ and $q$ in $(0,1)$; that is, $D_1\ge0$ for all $(a,b,p,q)\in\Omega_1$.
To complete the proof of Lemma 4.1, it remains to consider the subcase $q=1$. Expanding $D_{1;q=1}$ in powers of $p$ leads to a certain quadratic polynomial $\psi(p)$, together with the quantities $A$, $d_1$, $d_2$, and discr. Note that discr equals in sign the discriminant of the quadratic polynomial $\psi(p)$. Therefore, $\mathrm{discr}>0$ if and only if $\psi(p)$ takes both positive and negative values as $p$ varies from $-\infty$ to $\infty$. Using Redlog, we see that (i) $A\ge0$ ($\approx0.16$ sec execution time); (ii) the conjunction of the conditions $\mathrm{discr}>0$, $d_1<0$, and $b<1/2$ never takes place over the set $\omega_1$ ($\approx3.1$ sec execution time); and (iii) the conjunction of the conditions $\mathrm{discr}>0$, $d_2>0$, and $b>1/2$ never takes place over the set $\omega_1$ ($\approx25.5$ min execution time). This completes the proof of Lemma 4.1.

Lemma 4.2.
In the case $(C_2)$, the polynomial $D_2$ in $a,b,p,q$ is nonnegative for all $p$ and $q$ in $(0,1)$; that is, $D_2\ge0$ for all $(a,b,p,q)\in\Omega_2$.