On the probability that a binomial variable is at most its expectation

Consider the probability that a binomial random variable Bi(n, m/n) with integer expectation m is at most its expectation. Chvátal conjectured that for any given n, this probability is smallest when m is the integer closest to 2n/3. We show that this holds when n is large.


1. Introduction
Consider the probability P(Bi(n, p) ≤ np) that a binomial random variable Bi(n, p) is less than or equal to its mean. (We slightly abuse notation, and let Bi(n, p) denote both the binomial distribution and a binomial random variable.) By the central limit theorem, unless n or p(1 − p) is small, this probability is close to 1/2; in fact, the Berry–Esseen theorem [2], [5] (see also e.g. [8, Theorem 7.6.1]) shows that P(Bi(n, p) ≤ np) = 1/2 + O((np(1 − p))^{−1/2}). (See also the explicit bounds in [4], [7], [14], [16], [17].) In the case when np = m is an integer, Neumann [12] showed that the mean np is also a (strong) median, i.e., P(Bi(n, p) < np) < 1/2 < P(Bi(n, p) ≤ np). (See [9], [10], and [11] for other proofs.) It follows that for any fixed n ≥ 1, the probability P(Bi(n, p) ≤ np), regarded as a function of p ∈ [0, 1], oscillates around 1/2, with upward jumps at each m/n and monotone decrease between the jumps. See Figure 1 for an example.
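Neumann's strong-median property above is easy to verify numerically; the following minimal Python sketch (binom_cdf is our helper, not notation from the paper) checks it for one n by direct summation of the binomial probabilities.

```python
# Numerical illustration of the strong-median property: when the mean
# np = m is an integer, P(Bi(n,p) < np) < 1/2 < P(Bi(n,p) <= np).
from math import comb

def binom_cdf(n: int, p: float, k: int) -> float:
    """P(Bi(n, p) <= k), by direct summation."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n = 30
for m in range(1, n):            # p = m/n makes the mean an integer
    p = m / n
    assert binom_cdf(n, p, m - 1) < 0.5 < binom_cdf(n, p, m), m
print("strong-median property holds for n = 30, m = 1, ..., 29")
```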
Consider again the case when np = m is an integer, illustrated by the local maxima in Figure 1. Vašek Chvátal (personal communication) made the following conjecture, based on numerical experiments.

Conjecture 1.1 (Chvátal). For any fixed n ≥ 2, as m ranges over {0, . . . , n}, the probability q_m := P(Bi(n, m/n) ≤ m) is smallest when m is the integer closest to 2n/3.
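Chvátal's conjecture can be explored directly by brute force; here is a small Python sketch (the helper q follows the definition of q_m above) that finds the minimizing m for a few values of n, all within the range n ≤ 1000 where the conjecture has been verified numerically.

```python
from math import comb

def q(n: int, m: int) -> float:
    """q_m = P(Bi(n, m/n) <= m), by direct summation."""
    p = m / n
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(m + 1))

for n in [10, 30, 100, 300]:
    m_star = min(range(n + 1), key=lambda m: q(n, m))
    # The minimizer is the integer closest to 2n/3; note 2n/3 mod 1 is
    # never 1/2, so round() is unambiguous here.
    assert m_star == round(2 * n / 3), (n, m_star)
print("minimizer equals round(2n/3) for n in {10, 30, 100, 300}")
```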
The purpose of the present paper is to show that this conjecture holds for large n. Moreover, at least for large n, the probabilities q_m are inverse unimodal, i.e., have no other local minimum. (The latter property was partly proved by Rigollet and Tong [16, (29)], who proved, for any n, that q_m decreases for m ≤ n/2. We conjecture that the inverse unimodality also holds for all n.)

Theorem 1.2. There exists n_0 such that Conjecture 1.1 is true for every n ≥ n_0. Moreover, still for n ≥ n_0, the difference q_{m+1} − q_m is negative when m + 1/2 < 2n/3 and positive when m + 1/2 > 2n/3.

[Figure 1: the minimum of P(Bi(n, p) ≤ np) (at m = 20) and the maximum of P(Bi(n, p) < np) (at m = 10) are marked with dots.]

Remark 1.3. By symmetry, i.e., considering n − Bi(n, m/n), it follows that, for large n at least, the probability P(Bi(n, m/n) < m) is largest for the integer m closest to n/3.

Remark 1.4. For general p, the value of P(Bi(n, p) ≤ np) is asymptotically given by

P(Bi(n, p) ≤ np) = 1/2 + (1/2 − {np} + (1 − 2p)/6)(2πnp(1 − p))^{−1/2} + o((np(1 − p))^{−1/2}), (1.2)

at least provided np(1 − p) ≥ log² n; this is a consequence of Theorem 3.2 and (5.1) (with k = 1). Cf. Figure 1. See also the explicit related bound by Doerr [4, Lemma 8].
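The leading term of the asymptotics in (1.2) can be tested numerically at an integer mean (where {np} = 0); in the sketch below, the explicit constant (1/2 + (1 − 2p)/6)(2πnp(1 − p))^{−1/2} is our reconstruction of that term, checked against the exact probability for one moderate n.

```python
# Sketch: exact q_m versus the first-order lattice-Edgeworth approximation
#   P(Bi(n, m/n) <= m) ~ 1/2 + (1/2 + (1-2p)/6) / sqrt(2*pi*n*p*(1-p)),
# where p = m/n (the explicit constant is our reconstruction).
from math import comb, sqrt, pi

def q(n: int, m: int) -> float:
    """Exact q_m = P(Bi(n, m/n) <= m), by direct summation."""
    p = m / n
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(m + 1))

def q_approx(n: int, m: int) -> float:
    """First-order approximation at the integer mean m = np."""
    p = m / n
    return 0.5 + (0.5 + (1 - 2 * p) / 6) / sqrt(2 * pi * n * p * (1 - p))

n = 500
for m in [n // 3, n // 2, 2 * n // 3]:
    assert abs(q(n, m) - q_approx(n, m)) < 5e-3, (m, q(n, m), q_approx(n, m))
```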
Remark 1.5. In principle, it should be possible to calculate all constants in our proof explicitly, and thus find an explicit value for n_0; the conjecture could then be verified completely (assuming that it holds) by checking all smaller n numerically. However, we do not believe that this is practical. Presumably, other methods, completely different from ours, are needed to show Chvátal's conjecture in general. (We have verified the conjecture numerically for n ≤ 1000.)

Our proof of Theorem 1.2 is based on the version for integer-valued random variables, found by Esseen [6], of the asymptotic Edgeworth expansion for probabilities in the central limit theorem. This expansion is usually stated for a single probability distribution, but we need to check that the estimates hold uniformly for Bi(n, p) with p in some range; hence we discuss this expansion in some detail in Section 3. In particular, we state in Theorem 3.2 the result that we need in a general form, and prove it in Section 4. We return to the binomial probabilities in Section 5, and prove Theorem 1.2 in Section 6.

Remark 1.6. See Brown, Cai and DasGupta [3] for another aspect, with statistical implications, of the oscillations of binomial probabilities; they too use Edgeworth expansions with more than one term.

2.1. General notation. C and c denote unimportant constants, in general different at different occurrences. The constants may depend on some given parameters, given by the context; we sometimes write e.g. C_k to emphasize that C depends on a parameter k, but this is omitted when not necessary.
2.2. Special notation. We introduce here the notation needed for the expansions in the following sections. See further Esseen [6, Chapters III–IV] and Petrov [15, §VI.1].
Let X be a random variable. (X will be regarded as given. Most quantities defined below depend on the distribution of X, although for simplicity we do not show this in the notation.) Denote its mean by µ := E X, its central absolute moments by β_j := E|X − µ|^j, and its cumulants by γ_j (when they exist, i.e., when E|X|^j < ∞). Also, let σ² = β_2 = γ_2 = Var X be the variance of X, and define the scale-invariant quantities

λ_j := γ_j/σ^j and Λ_j := β_j/σ^j. (2.1)

Each cumulant γ_j, j ≥ 2, can be expressed as a polynomial in central moments of orders 2, . . . , j, and it follows, using also Hölder's inequality, that |γ_j| ≤ C_j β_j, and thus

|λ_j| ≤ C_j Λ_j. (2.4)

Furthermore, Hölder's inequality also easily yields, for 2 ≤ j ≤ k,

Λ_j ≤ Λ_k^{(j−2)/(k−2)}. (2.5)

Define polynomials P_j(u), j ≥ 1, by expanding the formal power series

exp( Σ_{j=3}^∞ λ_j u^j/j! · n^{−(j−2)/2} ) = 1 + Σ_{j=1}^∞ P_j(u) n^{−j/2}. (2.6)

Note that P_j(u) is a polynomial of degree 3j; moreover,

P_j(u) = Σ_{r=j+2}^{3j} p_{jr} u^r, (2.7)

where each coefficient p_{jr} is a polynomial in λ_k, k = 3, . . . , j + 2. In particular, P_j(u) is well defined provided E|X|^{j+2} < ∞. Furthermore, p_{jr} = 0 unless r − j is even. Let Φ(x) be the distribution function of the standard normal distribution, so that Φ′(x) = φ(x) := (2π)^{−1/2} e^{−x²/2}. Define Q_j(x) as the function obtained from P_j(u) by replacing each power u^r by (−d/dx)^r Φ(x) = −H_{r−1}(x)φ(x), where H_r(x) are the Hermite polynomials (in the normalization natural in probability theory, i.e., orthogonal w.r.t. the standard normal distribution); see e.g. [13, (18.5.9)] (there denoted He_r(x)) or [15, p. 127].
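For concreteness, the probabilists' Hermite polynomials He_r can be generated from the standard three-term recurrence He_{r+1}(x) = x He_r(x) − r He_{r−1}(x); a self-contained sketch (coefficient lists in increasing degree; the helper is ours, not from the paper):

```python
# Probabilists' Hermite polynomials via the recurrence
#   He_{r+1}(x) = x*He_r(x) - r*He_{r-1}(x),
# each polynomial represented by its coefficient list [c0, c1, ...].
def hermite_e(r: int) -> list:
    he = [[1.0], [0.0, 1.0]]            # He_0 = 1, He_1 = x
    for k in range(1, r):
        prev, cur = he[k - 1], he[k]
        nxt = [0.0] + cur[:]            # multiply He_k by x
        for i, c in enumerate(prev):    # subtract k * He_{k-1}
            nxt[i] -= k * c
        he.append(nxt)
    return he[r]

# He_3(x) = x^3 - 3x and He_4(x) = x^4 - 6x^2 + 3
assert hermite_e(3) == [0.0, -3.0, 0.0, 1.0]
assert hermite_e(4) == [3.0, 0.0, -6.0, 0.0, 1.0]
```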
Define also periodic functions ψ_r, r ≥ 1, by their Fourier series

ψ_r(x) := Σ_{j≠0} e^{2πijx}/(2πij)^r. (2.11)

Note that for r ≥ 2, the series (2.11) converges absolutely and defines a continuous periodic function with period 1. However, for r = 1, the series is only conditionally convergent; in fact, it is the Fourier series of 1/2 − {x}, where {x} := x − ⌊x⌋ is the fractional part of x. (It follows from standard results that the series converges for every x, but we do not really need this.) Hence, ψ_1(x) has a jump 1 at every integer. For later convenience, we redefine ψ_1 to be right-continuous; thus we define

ψ_1(x) := 1/2 − {x}, (2.12)

noting that for r = 1, (2.11) holds only for non-integer x. Note also that, for any r ≥ 1 and x ∈ ℝ,

ψ_r(x) = −B_r({x})/r!, (2.13)

where B_r(·) denotes the Bernoulli polynomials; in particular, ψ_r(0) = −B_r/r!, where B_r denotes the Bernoulli numbers. Recall that B_1 = −1/2, B_2 = 1/6 and B_{2j+1} = 0 for j ≥ 1.
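For r = 1 the Fourier series reduces to the classical sawtooth expansion 1/2 − {x} = Σ_{j≥1} sin(2πjx)/(πj) at non-integer x; a quick numerical sketch (function names are ours):

```python
from math import sin, pi

def psi1(x: float) -> float:
    """The right-continuous sawtooth psi_1(x) = 1/2 - {x} (period 1)."""
    return 0.5 - x % 1.0

def psi1_fourier(x: float, terms: int = 20000) -> float:
    """Partial sum of the Fourier series of psi_1."""
    return sum(sin(2 * pi * j * x) / (pi * j) for j in range(1, terms + 1))

for x in [0.1, 0.25, 0.7, 1.3, -0.4]:    # stay away from the integers
    assert abs(psi1(x) - psi1_fourier(x)) < 1e-3
```

The convergence is only pointwise and slow near the integers (Gibbs phenomenon), which reflects the jump of size 1 noted above.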
We prefer the choice of signs in our definition, but this is only a matter of taste.
(We prefer this, weaker, version for our generalization in Theorem 3.2.)

Theorem 3.1 (Esseen [6]). Let X, X_1, X_2, . . . be i.i.d. integer-valued random variables with span 1, and let S_n := Σ_{i=1}^n X_i. Let k ≥ 1 be an integer and suppose that E|X|^{k+2} < ∞. With the notation above, define Ψ_{n,k}(x) as the expansion (3.2), consisting of Φ(x), the Edgeworth terms n^{−j/2} Q_j(x), 1 ≤ j ≤ k, and the oscillating lattice terms built from the functions ψ_r, and define the remainder R_{n,k}(x) by

P(S_n ≤ nµ + xσ√n) = Ψ_{n,k}(x) + R_{n,k}(x). (3.3)

Then R_{n,k}(x) = o(n^{−k/2}) as n → ∞, uniformly in x. (3.4)

Theorem 3.1 is stated for a single distribution. We want to apply it to X ∼ Be(p), but we then need uniform estimates for all p, or at least for a large range. It is no surprise that the proof of Theorem 3.1 yields such uniformity under suitable conditions, including some uniform moment estimates. For Be(p), the case p ∈ [p_1, p_2] for a compact interval [p_1, p_2] ⊂ (0, 1) does not cause any difficulties, but we can go beyond that. We will show the following extension of Theorem 3.1 in Section 4.

Theorem 3.2. Suppose that E|X|^{k+3} < ∞, that X satisfies the conditions (3.6) and (3.7) below, and that n ≥ 2 is an integer with σ√n ≥ log n. Then, uniformly in x,

|R_{n,k}(x)| ≤ C_k Λ_{k+3} n^{−(k+1)/2}. (3.8)

Remark 3.3. We defined ψ_1(x) to be right-continuous so that the expansions Ψ_{n,k}(x), Ψ*_{n,k}(x) and the corresponding remainder terms are right-continuous, which enables us to use them directly at integer points.

Remark 3.4. The full expansion (3.2) is sometimes convenient (for example in the proof), but it is often more convenient to modify (3.2) by dropping the redundant terms; we thus define a reduced expansion Ψ*_{n,k}(x) and a modified remainder term R̃_{n,k}(x) by

P(S_n ≤ nµ + xσ√n) = Ψ*_{n,k}(x) + R̃_{n,k}(x). (3.10)

It follows from (3.1) that in Theorem 3.1, this changes Ψ_{n,k}(x), and thus R_{n,k}(x), by some terms which are O(n^{−m/2}) for some m ≥ k + 1, so (3.4) and (3.5) still hold for R̃_{n,k}(x). With only a little more effort, it can be verified that the same holds for Theorem 3.2; it follows from Lemma 4.1 below that the removed terms are all dominated by Λ_{k+3} n^{−(k+1)/2}, so (3.8) still holds. (This uses also that σ√n is bounded below by the assumptions, and the fact that β_i ≤ C_{i,j} β_j when 2 ≤ i ≤ j and X is integer-valued.)

Remark 3.5. Note that Theorem 3.1 is only for random variables with span 1, and a uniform version for a family of random variables therefore requires some uniform condition preventing the variables from being too close to variables with larger span.
We use condition (3.6), which is convenient and turns out to be sufficient; it can obviously be replaced by more general conditions.

Remark 3.6. The assumption (3.7) in Theorem 3.2 is annoying but not a very serious restriction. Note that the right-hand side of (3.8) is, since X is integer-valued, at least C(σ√n)^{−k−1}. Hence, if (3.7) is violated, (3.8) would, even if true, only give a weak bound. We do not know whether (3.7) really is needed. It is possible that Theorem 3.2 could be proved without this assumption, using the alternative method of proof in [6, Section IV.4], but we have not pursued this.

4. Proof of Theorem 3.2
Lemma 4.1. Suppose that ℓ ≥ 1 and E|X|^{ℓ+2} < ∞. Then, for every m ≥ 0 and all i_1, . . . , i_m ≥ 1 with Σ_k i_k = ℓ,

|∏_{k=1}^m λ_{i_k+2}| ≤ C_ℓ Λ_{ℓ+2};

in particular, each coefficient of P_ℓ is bounded by C_ℓ Λ_{ℓ+2}.

Proof. It follows from (2.6) that each coefficient of P_ℓ is a linear combination of products ∏_{k=1}^m λ_{i_k+2} with Σ_k (i_k + 2) = r and Σ_k i_k = ℓ. By (2.4) and (2.5), each such product is bounded by C ∏_k Λ_{ℓ+2}^{i_k/ℓ} = C Λ_{ℓ+2}.

Lemma 4.2. If X is a random variable with P(X = 0) ≥ a and P(X = 1) ≥ a for some a ≥ 0, then

|E e^{itX}| ≤ 1 − aπ^{−2}t² ≤ e^{−aπ^{−2}t²}, |t| ≤ π. (4.7)

Proof.
The assumption implies that E e^{itX} − a − ae^{it} is the Fourier transform of a positive measure with mass 1 − 2a. Hence, if |t| ≤ π,

|E e^{itX}| ≤ (1 − 2a) + a|1 + e^{it}| = 1 − 2a + 2a cos(t/2) ≤ 1 − aπ^{−2}t²,

using cos u ≤ 1 − 2u²/π² for |u| ≤ π/2, and (4.7) follows, since also 1 − y ≤ e^{−y}.
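The bound in Lemma 4.2 (in the reconstructed form |E e^{itX}| ≤ 1 − aπ^{−2}t² for |t| ≤ π) can be sanity-checked numerically for Bernoulli variables, where the modulus of the characteristic function is explicit; a sketch (helper name ours):

```python
from math import cos, pi, sqrt

def abs_cf_bernoulli(p: float, t: float) -> float:
    """|E e^{itX}| for X ~ Be(p), i.e. |(1-p) + p*e^{it}|."""
    # the max(...) guards against a tiny negative value from rounding
    return sqrt(max(0.0, 1 - 2 * p * (1 - p) * (1 - cos(t))))

for p in [0.1, 0.3, 0.5, 0.8]:
    a = min(p, 1 - p)                 # P(X=0) >= a and P(X=1) >= a
    for k in range(-100, 101):
        t = pi * k / 100              # grid over [-pi, pi]
        assert abs_cf_bernoulli(p, t) <= 1 - a * t * t / pi**2 + 1e-12
```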
Proof of Theorem 3.2. We follow the proofs of Theorems 3 and 4 in Esseen [6] (with d = 1 and thus t_0 = 2π) and mention only the main differences. Note that [6] considers centred variables, so X_i there is our X_i − µ. Let

T_1 := n^{(k+1)/2}/Λ_{k+3}, (4.12)

which replaces T_{3n} in [6]. The second inequality in (3.8) is trivial, by the definition (2.1). We may also assume that Λ_{k+3} n^{−(k+1)/2} ≤ 1, and thus T_1 ≥ 1, since otherwise it follows from Lemma 4.1 and (2.5) that each term in (3.2) is bounded by CΛ_{k+3} n^{−(k+1)/2}, and thus (3.8) holds trivially.
In the range |t| ≤ T_1, we have the corresponding estimate of [6] (with T_{3n} improved to our T_1/4, which can be proved in the same way). Hence the "main term" satisfies the estimate (4.14). Furthermore, if |t| ≤ πσ√n, then the assumption (3.6) and Lemma 4.2 yield (4.15), and hence (4.16). The integral ∫_{T_1}^{πσ√n} |g_0(t)|/|t| dt satisfies the same estimate. Consequently, by (4.14) and (4.16), we obtain (4.17). The same arguments also yield (4.18). Using the estimates (4.17) and (4.18), the rest of the proof is essentially the same as in [6]. One of the terms, generalizing I_k on [6, pp. 58–59], is (4.19), where we use (4.18). This term exists for all integers j ≠ 0 with (2|j| − 1)πσ√n < T = n^{(k+1)/2}, and thus the sum of these terms has to be bounded; this is the reason for our assumption log n ≤ σ√n, which leads to an estimate CΛ_{k+3} n^{−(k+1)/2} for (4.19) too. The remaining terms give no problems.
Remark 5.1. For P(S_n < nµ), we have the same formulas with ψ_1(x) replaced by its left-continuous version ψ_1(x−). (All other functions that appear are continuous.) In (5.2), this means that the sign is changed for the terms with r = 1; all other terms remain the same.
6. Proof of Theorem 1.2

We now prove our main result.
Proof of Theorem 1.2. The main idea of the proof is to estimate q_{m+1} − q_m using the estimates above, in particular (5.6), but the details will differ for different ranges of m. We write p_m := m/n. We sometimes tacitly assume that n is large enough. C_1, C_2, C_3 will denote some large constants.
Recall h_1(p) and h_3(p) given in (5.7)–(5.8). A simple differentiation yields (6.1). Note that h_1′(p) = 0 for p = 2/3, with h_1′(p) < 0 for 0 < p < 2/3 and h_1′(p) > 0 for 2/3 < p < 1; this is the fundamental reason for the behaviour shown in Theorem 1.2, although we also have to treat error terms. There is no need to calculate h_3′(p) exactly; it suffices to note from (5.8) the bound (6.2). We treat several cases separately.

Case 1: 2 log² n ≤ m ≤ n − 2 log² n. Both p_m and p_{m+1} satisfy (5.4); hence Theorem 3.2 applies and yields (5.6) for both. By subtraction and the mean value theorem, we obtain (6.3)–(6.4), for some p′_m, p″_m ∈ [p_m, p_{m+1}], recalling that p_{m+1} − p_m = n^{−1} and using (6.1)–(6.2).

Case 1a: 2 log² n ≤ m ≤ n/2. In this subcase, (6.4) yields q_{m+1} − q_m < 0, provided n, and thus np_m = m, is large enough.

Case 1b: n/2 ≤ m ≤ 2n/3 − C_1. In this subcase, (6.4) similarly yields q_{m+1} − q_m < 0, provided C_1 is chosen large enough.

Case 1c: 2n/3 + C_2 ≤ m ≤ n − 2 log² n. Similar arguments as in the two preceding subcases yield q_{m+1} − q_m > 0.
Case 1d: 2n/3 − C_1 ≤ m ≤ 2n/3 + C_2. This is the most delicate case, since p_m − 2/3 = O(1/n) and the three terms in (6.3) are all of the same order. We thus expand one step further and use Theorem 3.2 with k = 6; this yields, again using (5.2), the expansion (6.7), where we note that h_5(p) is a differentiable function of p. (It can easily be calculated, but we do not need this.) Taylor expansions yield, recalling h_1′(2/3) = 0, the expansions (6.8)–(6.10). The formula (6.8) holds for m + 1 too, and thus (6.11) follows. Since 1/15 < 1/6, it follows from (6.11) that if m ≤ (2n − 2)/3, then q_{m+1} − q_m < 0, and if m ≥ (2n − 1)/3, then q_{m+1} − q_m > 0, for large n.

Case 2: m < 2 log² n. As said in the introduction, Rigollet and Tong [16, (29)] showed that (for every n) q_{m−1} ≥ q_m for m ≤ n/2, and their proof actually gives q_{m−1} > q_m. Alternatively (for large n), we can argue using Poisson approximation as in the next case; we omit the details.

Case 3: m > n − 2 log² n. Define q′_m := P(Bi(n, m/n) < m). By symmetry, 1 − q_m = q′_{n−m}; hence the claim q_m < q_{m+1} is equivalent to q′_{m−1} < q′_m for m < 2 log² n.
We use Poisson approximation of the binomial distribution. It is well known, see e.g. [1, Theorem 2.M], that the total variation distance between Bi(n, p) and Po(np) is less than p, and thus, in particular,

|q′_m − P(Po(m) < m)| ≤ d_TV(Bi(n, p_m), Po(np_m)) < p_m. (6.14)

We estimate P(Po(m) < m) by Theorem 3.2 (or Theorem 3.1) applied to X ∼ Po(1). This yields, using (5.2) and Remark 5.1, the estimate (6.15). This shows that q′_m > q′_{m−1} for C_3 < m < 2 log² n and n large. (In fact, for C_3 < m < cn^{2/5}.) In the remaining subcase m ≤ C_3, q′_{m−1} < q′_m follows from (6.14) and Lemma 6.1 below.
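The total variation bound used in (6.14) can be illustrated numerically; a sketch (with the Poisson tail truncated, which only lowers the computed distance slightly; function name ours):

```python
from math import comb, exp, factorial

def dtv_binom_poisson(n: int, p: float) -> float:
    """(Truncated) total variation distance between Bi(n, p) and Po(np)."""
    lam = n * p
    total = 0.0
    for k in range(n + 50):              # Poisson tail beyond n+50 is tiny here
        pb = comb(n, k) * p**k * (1 - p)**(n - k) if k <= n else 0.0
        pp = exp(-lam) * lam**k / factorial(k)
        total += abs(pb - pp)
    return total / 2

for n in [20, 50, 100]:
    for m in [1, 2, 5]:
        assert dtv_binom_poisson(n, m / n) < m / n   # d_TV < p, as in [1]
```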
These cases cover all m, which completes the proof of Theorem 1.2.
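The case analysis above can be illustrated for one moderate n; this sketch checks the minimizer and the sign pattern of q_{m+1} − q_m away from the delicate region near 2n/3 (n = 300 lies within the numerically verified range, while the theorem itself is only claimed for large n):

```python
from math import comb

def q(n: int, m: int) -> float:
    """q_m = P(Bi(n, m/n) <= m), by direct summation."""
    p = m / n
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(m + 1))

n = 300                                   # 2n/3 = 200 exactly
qs = [q(n, m) for m in range(n + 1)]
assert min(range(n + 1), key=qs.__getitem__) == 200
assert all(qs[m + 1] < qs[m] for m in range(190))       # decreasing below 2n/3
assert all(qs[m + 1] > qs[m] for m in range(210, n))    # increasing above 2n/3
```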
The proof used the following lemma, of independent interest. It gives two Poisson versions of the inequality for the binomial distribution by Rigollet and Tong [16] mentioned above. We use their method of proof; the final step uses that t^{m+1}e^{−t} is increasing for t ≤ m + 1.