Majority Bootstrap Percolation on $G(n,p)$

Majority bootstrap percolation on a graph $G$ is an epidemic process defined in the following manner. First, an initially infected set of vertices is selected. Then, step by step, the vertices that have more infected than non-infected neighbours become infected. We say that percolation occurs if eventually all vertices of $G$ become infected. In this paper we study majority bootstrap percolation on the Erd\H{o}s--R\'enyi random graph $G(n,p)$ above the connectivity threshold. Perhaps surprisingly, the results obtained for small $p$ are comparable to the results for the hypercube obtained by Balogh, Bollob\'as and Morris (2009).


Introduction
Classical bootstrap percolation, also called $r$-neighbour bootstrap percolation, is a deterministic process on a graph. First, a subset of the vertices of a graph $G$ is infected. Then, at each time step, the infection spreads to every vertex with at least $r$ infected neighbours. This process is a cellular automaton of the type first introduced by von Neumann in [13]. This particular model was introduced by Chalupa, Leath and Reich in [6], where $G$ was taken to be the Bethe lattice.
A standard way of choosing the initially infected vertices is to infect each vertex independently with probability $p$. The probability that the entire graph eventually becomes infected is increasing in $p$. It is therefore sensible to study the quantity $p_c = \inf\{p : \mathbb{P}_p(G \text{ is eventually entirely infected}) \ge c\}$, in particular the critical probability $p_{1/2}$ and the size of the critical window $p_{1-\epsilon} - p_{\epsilon}$.
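To make these definitions concrete, the following Python sketch (the function names are ours and purely illustrative) simulates $r$-neighbour bootstrap percolation on an arbitrary graph and estimates $\mathbb{P}_p(\text{percolation})$ by Monte Carlo; $p_{1/2}$ can then be located by scanning or bisecting over $p$.

\begin{verbatim}
import random

def bootstrap(adj, infected, r):
    """Run r-neighbour bootstrap percolation to completion.
    adj: list of neighbour lists; infected: initially infected vertices."""
    infected = set(infected)
    frontier = set(infected)
    while frontier:
        # Only neighbours of newly infected vertices can change state.
        candidates = {u for v in frontier for u in adj[v]} - infected
        frontier = {u for u in candidates
                    if sum(w in infected for w in adj[u]) >= r}
        infected |= frontier
    return infected

def percolation_prob(adj, p, r, trials=200):
    """Monte Carlo estimate of P_p(the whole graph becomes infected)."""
    n = len(adj)
    hits = sum(
        len(bootstrap(adj, [v for v in range(n)
                            if random.random() < p], r)) == n
        for _ in range(trials))
    return hits / trials
\end{verbatim}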
A natural setting for this problem is the finite grid $[n]^d$, and many of the results on bootstrap percolation concern this graph. The first to study it were Aizenman and Lebowitz in [1], who showed that in $2$-neighbour bootstrap percolation with $d$ fixed we have $p_{1/2} = \Theta\big((\log n)^{1-d}\big)$.
The $r$-neighbour bootstrap percolation process has also been studied on the random regular graph by Balogh in [3], and on the Erdős–Rényi random graph $G(n,p)$ by Janson, Łuczak, Turova and Vallier in [8].
In majority bootstrap percolation a vertex becomes infected if a majority of its neighbours are. In [2] Balogh, Bollobás and Morris studied this process on the hypercube and showed that if the vertices of the $n$-dimensional hypercube are infected independently with probability
$$q = \frac12 - \frac12\sqrt{\frac{\log n}{n}} + \lambda\,\frac{\log\log n}{\sqrt{n\log n}},$$
then, with high probability, percolation occurs (i.e., all vertices eventually become infected) if $\lambda > \frac12$ and does not occur if $\lambda \le -2$. In this paper we shall study majority bootstrap percolation on the Erdős–Rényi random graph $G(n,p)$ above the connectivity threshold. We will see that for small $p$ our results are in fact comparable to the results for the hypercube in [2], noting that the degree of each vertex in the $n$-dimensional hypercube (which has $2^n$ vertices) is equal to $n$.

Main Results
In this section we shall state our main results and discuss two different ways of selecting the initially infected set. The proofs of these theorems (in Section 3 and Section 4) use inequalities that are described separately in Section 5.
For a graph $G$ with some subset $I_0 \subseteq V(G)$ of initially infected vertices, the majority bootstrap process on $G$ is defined by setting
$$I_{t+1} = I_t \cup \Big\{v \in V(G) : |N(v) \cap I_t| \ge \tfrac12\,|N(v)|\Big\},$$
where $N(v)$ is the neighbourhood of $v$. For a finite graph $G$ this process terminates, with $I_{T+1} = I_T$ for some $T$. Denote by $I = I_T$ the set of eventually infected vertices.
We shall look at the case $G = G(n,p)$, the random graph on $n$ vertices in which each edge is included independently with probability $p$. Typically $p := p(n) \to 0$ as $n \to \infty$, but we follow the standard convention of simply writing $p$ also for functions depending on $n$. Our initial setup is slightly different from that for the hypercube mentioned above: instead of infecting each vertex independently with some probability $q$, we shall infect a uniformly random set of vertices of size $m := m(n)$.
In the normal setup for the majority bootstrap process on $G(n,p)$, we would first choose the edges of $G(n,p)$ and then choose an initially infected set $I_0$ uniformly from $[n]^{(m)}$. As these two choices are independent, we shall equivalently set $I_0 = [m]$ and then choose the edges of $G(n,p)$. This is the $MB(n,p\,;m)$ process. We now introduce some notation that shall be used throughout. We use the standard asymptotic little-$o$ notation, always as $n$ or $N$ tends to infinity; that is, for a sequence $(b_n)$ we say that $b_n = o(a_n)$ if $b_n/a_n \to 0$ as $n \to \infty$. We set $d = \frac{np}{1-p}$; thus $d$ is roughly the average degree in $G(n,p)$ for $p = o(1)$. We denote the binomial distribution with parameters $n$ and $p$ by $B(n,p)$, and shall sometimes abuse this notation by writing $B(n,p)$ also for a random variable with this distribution. We reserve $m$ for the size of $I_0$ and shall always assume that it has the form given in Theorem 1 below, for some constant $\lambda$. We say that an event $E_n$ holds with high probability if $\mathbb{P}(E_n) \to 1$ as $n \to \infty$. Let $\omega(n)$ denote some arbitrary positive function that is increasing and unbounded as $n$ tends to infinity. The inequalities below are only claimed to be true for $n$ large enough. For the $MB(n,p\,;m)$ process, define $P_m(G(n,p)) = \mathbb{P}(I = [n])$.
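For illustration (this sketch is ours and not part of the proofs), one run of the $MB(n,p\,;m)$ process can be simulated directly from the definition: sample the edges of $G(n,p)$, set $I_0 = [m]$, and iterate the majority rule until the infected set stabilises.

\begin{verbatim}
import random

def mb_process(n, p, m, seed=None):
    """One run of MB(n, p; m); returns the eventual infected set I."""
    rng = random.Random(seed)
    adj = [[] for _ in range(n)]
    for u in range(n):                 # sample G(n, p)
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    infected = set(range(m))           # I_0 = [m]
    while True:
        newly = {v for v in range(n)
                 if v not in infected
                 and 2 * sum(w in infected for w in adj[v]) >= len(adj[v])}
        if not newly:
            return infected            # I_{T+1} = I_T: process has stabilised
        infected |= newly
\end{verbatim}

Note that an isolated vertex satisfies $0 \ge 0$ and is therefore infected at the first step, in line with the remark after the proof of Corollary 2 below.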
We shall now state the main result of this paper.

Theorem 1. Fix some number $\epsilon > 0$ and assume that for $n$ large enough $(1+\epsilon)\log n \le p(1-p)n$. If the initially infected set $I_0$ has size
$$m = \frac{n}{2} - \frac{n}{2}\sqrt{\frac{\log d}{d}} + \lambda\,\frac{n \log\log\log d}{\sqrt{d \log d}},$$
then, with high probability, the $MB(n,p\,;m)$ process percolates if $\lambda > \frac12$ and does not percolate if $\lambda < 0$.

Our second result concerns a more natural setup, in which each vertex is initially infected independently with probability $q$. For $p \ll \frac{(\log\log\log n)^2}{\log n}$ the result above still holds in this setting with $q = m/n$.
More formally, define $MB'(n,p\,;q)$ to be the process in which the graph $G(n,p)$ is chosen and each vertex is initially infected independently with probability $q$; the infection then spreads by the majority bootstrap percolation rule. For the process $MB'(n,p\,;q)$ define $P'_q(G(n,p)) = \mathbb{P}(I = [n])$.
Corollary 2. Fix some number $\epsilon > 0$ and assume that for $n$ large enough $(1+\epsilon)\log n \le p(1-p)n$. If $p \ll \frac{(\log\log\log n)^2}{\log n}$ and $q = m/n$, with $m$ as in Theorem 1, then the conclusions of Theorem 1 continue to hold. If instead $p \gg \frac{(\log\log\log n)^2}{\log n}$ and
$$q = \frac12 - \frac12\sqrt{\frac{\log d}{d}} + \frac{\theta}{\sqrt n},$$
then $P'_q(G(n,p)) \to \Phi(2\theta)$, where $\Phi(x)$ denotes the distribution function of the standard normal random variable.
Proof. As each vertex is infected independently, $|I_0|$ has distribution $B(n,q)$. Thus, with high probability, it holds that $\big||I_0| - qn\big| \le \omega(n)\sqrt{q(1-q)n}$. If $p \ll \frac{(\log\log\log n)^2}{\log n}$, then $\frac{n\log\log\log d}{\sqrt{d\log d}} \gg \sqrt n$ and the result follows from Theorem 1.
If $p \gg \frac{(\log\log\log n)^2}{\log n}$, then for each fixed $\delta > 0$ the Central Limit Theorem gives a lower bound on $P'_q(G(n,p))$, using in the final step that $P_{\lfloor qn + (\delta-\theta)\sqrt n\rfloor}(G(n,p)) \to 1$ for $p \gg \frac{(\log\log\log n)^2}{\log n}$ by Theorem 1. A similar argument gives the matching upper bound.

When $p$ is smaller than the connectivity threshold, $G(n,p)$ contains isolated vertices. Due to the way we define the $MB(n,p\,;m)$ process, any uninfected isolated vertex becomes infected in the first time step, so this is not an obstruction to complete percolation. However, once $p$ drops below $\frac{\log n}{2n}$, then, with high probability, $G(n,p)$ contains isolated edges, and neither endpoint of an isolated edge becomes infected if both endpoints are initially uninfected. This means that $P_m(G(n,p)) \to 0$ unless $m = n - o(n)$.
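The constant $\frac{\log n}{2n}$ can be read off from a short first-moment computation (sketched here for $p = \frac{c\log n}{n}$ with $c$ constant): the expected number of isolated edges is
$$\binom n2\,p\,(1-p)^{2(n-2)} \;\sim\; \frac{n^2 p}{2}\,e^{-2pn} \;=\; \frac c2\,n^{1-2c}\log n \;\to\; \infty \qquad \text{for } c < \tfrac12,$$
and a routine second-moment argument then shows that isolated edges indeed exist with high probability in this range.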
Remark 3. Preliminary versions of this paper (including the same results) were included in the PhD thesis of Kettle [11] and in the PhD thesis of Juškevičius [9]. There is also a recent study by Stefánsson and Vallier [12] on this subject using completely different methods from those used in this paper (but similar to the ones used by Janson, Łuczak, Turova and Vallier in [8]); they establish the first-order asymptotics of the thresholds, $m \sim \frac n2$ in Theorem 1 above and, correspondingly, $q \sim \frac12$ in Corollary 2 above.

Upper Bound
As $G$ is finite, the $MB(n,p\,;m)$ process will eventually terminate with some set $I \subseteq [n]$ of infected vertices. If we do not infect the whole graph, that is, if $I \neq [n]$, then we can say something about the structure of $I$. We shall call a proper subset $S \subsetneq [n]$ closed if every vertex $v \in [n]\setminus S$ has fewer than half of its neighbours in $S$. Recall that $I_0$ is the set of initially infected vertices and that $v \in I_{t+1}$ if either $v \in I_t$ or at least half of the neighbours of $v$ lie in $I_t$; in particular $I_t \subseteq I_{t+1}$. If the majority bootstrap process does not percolate, let $T$ be such that the process has stabilized, i.e., $I = I_T = I_{T+1} \neq [n]$. Then $I$ is a closed set, and thus the set $I_0$ of initially infected vertices must be contained in a closed set. We shall show that, if $\lambda > \frac12$, then, with high probability, $I_0$ is contained in no closed set; we do this in three stages. Lemma 5 will show that, with high probability, the graph $G(n,p)$ has no large closed sets. After that we shall bound the expected number of medium-sized closed sets that contain $I_0$; by Markov's inequality it will then follow that, with high probability, there are no medium-sized closed sets containing $I_0$. But before proving these two facts, we shall show that, with high probability, the number of infected vertices after one time step, $|I_1|$, is large, so that $I_0$ can rarely be contained in a small closed set. We assume throughout that for some fixed $\epsilon > 0$ it holds for $n$ large enough that $(1+\epsilon)\log n \le p(1-p)n$; however, for some of the results below it is enough to assume that $p(1-p)n = \omega(n)$.

Lemma 4. In the $MB(n,p\,;m)$ process, with high probability, $|I_1 \setminus I_0| \ge \frac{(n-m)r}{2}$, where $r$ denotes the probability that a fixed vertex outside $I_0$ is infected at time one.
Proof. For $i \in [n]\setminus I_0$, denote by $A_i$ the event that vertex $i$ is infected at time one, that is, the event that $i$ has fewer neighbours in $[n]\setminus I_0$ than it does in $I_0$. The events $A_i$ are identically distributed and very weakly correlated, but not independent. Let $X$ be the number of vertices infected at the first step of the process, so that $X = |I_1 \setminus I_0| = \sum_{i \in [n]\setminus I_0} \mathbf{1}(A_i)$. We shall use Chebyshev's inequality to bound the probability that $X$ is small.
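In outline, with $r = \mathbb{P}(A_i)$ and $r' = \mathbb{P}(A_i \cap A_j)$ for $i \neq j$ as introduced below, the second-moment step has the shape
$$\mathrm{Var}(X) \;\le\; (n-m)\,r + (n-m)^2\big(r' - r^2\big), \qquad \mathbb{P}\Big(X \le \tfrac12\,\mathbb{E}X\Big) \;\le\; \frac{4\,\mathrm{Var}(X)}{(\mathbb{E}X)^2},$$
since $\mathbb{E}X = (n-m)r$; the work in the proof lies in showing that $r' - r^2$ is small enough for the right-hand side to be $o(1)$.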
For $p \gg \frac1n$, we have $p(n-2m-1) = \omega(n)\sqrt{p(1-p)n}$ and $p(n-2m-1)^2 = o\big(n\sqrt{p(1-p)n}\big)$. Applying the bound from Proposition 21 with $N = \frac{n-1}{2}$, $S = \frac{n-1-2m}{2}$ and $h = p(n-2m-1)$, we obtain a lower bound (1) on $r$, where in the second line we use the standard asymptotic relation for the variance. Let us calculate the variance of $X$. Let $r' = \mathbb{P}(A_i \cap A_j)$, this being the same for any $i \neq j$; we obtain the expansion (2), where the first term in (2) is the sum over $i = j$ and the second term is the sum over $i \neq j$. Let $B_{ij}$ and $\overline{B_{ij}}$ be the events that $ij$ is, or is not, an edge in $G$, respectively; in (3) and (4) we express $\mathbb{P}(A_j \mid B_{ij})$ and $\mathbb{P}(A_j \mid \overline{B_{ij}})$ in terms of independent random variables $B(m,p)$ and $B(n-m-2,p)$. Note that $\mathbb{P}(A_j \mid B_{ij}) \le \mathbb{P}(A_j \mid \overline{B_{ij}})$, hence we may bound $r'$ from above, where the last equality follows from (3) and (4).
The second term is much smaller than the first, and so (for $n$ large enough) we obtain (5). We are now able to bound the probability that $X$ is small: from (2) and Chebyshev's inequality, and then from (5) and (1), this probability is $o(1)$, and so we have, with high probability, that $|I_1 \setminus I_0|$ is at least $\frac{(n-m)r}{2}$. Using (1) we get the claimed bound for large $n$, which completes the proof.
We now show that $G(n,p)$ contains no large closed sets, by a simple edge-counting comparison.
Lemma 5. Suppose that for some fixed $\epsilon > 0$ we have $(1+\epsilon)\log n \le p(1-p)n$ for $n$ large enough. Then, with high probability, $G(n,p)$ contains no closed set of size greater than $\frac n2 + \frac{7n}{2\sqrt d}$.

Proof. Let $S$ be a closed set and write $s = |S|$. In order for the set $S$ to be closed, each vertex $v \in [n]\setminus S$ has to have the majority of its neighbours outside $S$; in other words, we must have $|N(v) \cap S| < |N(v)\setminus S|$. Summing over the vertices in $[n]\setminus S$, the number of edges from $S$ to $[n]\setminus S$ must be fewer than twice the number of edges in $[n]\setminus S$. By Proposition 24, every set of size $n-s$ has at most $p\binom{n-s}{2} + 2(n-s)\sqrt{p(1-p)(n-s)}$ edges with probability at least $1 - 4^{-(n-s)}$, and by Proposition 25 every set $S$ of size $s$ has at least $ps(n-s) - 3(n-s)\sqrt{p(1-p)s}$ edges between it and its complement with probability at least $1 - 4^{-(n-s)}$. Comparing these two bounds shows that, with high probability, no set $S$ of size $\frac n2 + \frac{7n}{2\sqrt d} \le s < \frac{4n}{5}$ is closed.

If $s \ge \frac{4n}{5}$ and $p(1-p)n \ge 4\log n$, then we know from Proposition 26 (applied to the complement of $S$) that with probability at least $1 - n^{-\frac{n-s}{120}}$ there does not exist a closed set of size $s$ in $G(n,p)$; the result follows as $\sum_{i \ge 1} n^{-\frac{i}{120}} = o(1)$. If $n - n^{27/28} \ge s \ge \frac{4n}{5}$ and $5\log n \ge p(1-p)n \ge (1+\epsilon)\log n$, then we know from Corollary 27 that with probability at least $1 - n^{-\frac{n-s}{120}}$ there does not exist a closed set of size $s$ in $G(n,p)$.
If $s \ge n - n^{27/28}$ and $5\log n \ge p(1-p)n \ge (1+\epsilon)\log n$, then we know from Proposition 29 that with probability at least $1 - n^{-\frac{n-s}{120}}$ every set $S^{c} := [n]\setminus S$ of size $n-s$ has at most $2(n-s)$ edges, and so contains a vertex $v_{S^{c}}$ with at most $4$ neighbours inside $S^{c}$. By Proposition 28 we have that, with high probability, the minimum degree of $G(n,p)$ is at least $9$, so $v_{S^{c}}$ has more neighbours in $S$ than outside it; hence $v_{S^{c}}$ would become infected if all of $S$ were infected, and so $S$ is not closed.
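The double-counting step behind all three cases is worth recording explicitly: writing $e(A,B)$ for the number of edges between $A$ and $B$, and $e(A)$ for the number of edges inside $A$, a closed set $S$ satisfies
$$e\big(S, [n]\setminus S\big) \;=\; \sum_{v \notin S} |N(v) \cap S| \;<\; \sum_{v \notin S} |N(v)\setminus S| \;=\; 2\,e\big([n]\setminus S\big),$$
which is the inequality tested against the edge-count estimates of Section 5.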
Lastly, we turn to bounding the expected number of medium-sized closed sets containing $I_0$. We shall therefore want a bound on the probability that a set $S$ whose size lies in a particular range is closed. To do this we shall pick a test set $T$ of a suitable size and bound the probability that none of the vertices in $T$ is infected by $S$.

Lemma 6. Fix $\epsilon > 0$ and take any set of vertices $S$ in $G(n,p)$ of size $s \le |S| < \frac{2n}{3}$. Then for $n$ large enough the following bound on the probability that $S$ is closed holds.

Proof. Let $S$ be a set of vertices such that $s \le |S| \le \frac{2n}{3}$, and consider a test set $T$ of vertices outside $S$. We shall condition on the edge set of $T$, since once we have done so the events $F_v$, that $v$ is not infected by $S$, for the vertices $v \in T$, are independent.
Denote by $E = E(T)$ the edge set of $T$, and write $d_E(v)$ for the degree of a vertex $v \in T$ when $T$ has edge set $E$. Conditioning on $E$ we obtain (6), where $\mathbb{P}(E)$ is the probability of a particular edge set $E \subseteq \{0,1\}^{\binom t2}$, and where, for a vertex $v \in T$ with $d_E(v) = y$, $f(y) = \mathbb{P}\big(B(s,p) < B(n-s-t,p) + y\big)$ is the probability that $v$ is not infected by $S$. The rest of the proof shall be spent bounding (6). The degrees of vertices in $T$ are heavily concentrated around $pt$, and we shall expand $f$ around $pt$ to show that (6) is not much larger than $f(pt)^t$.
We have by Corollary 13 that $f$ is log-concave, and so for any integers $x$ and $y$ with $f(y) \neq 0$,
$$f(x) \;\le\; f(y)\left(\frac{f(y+1)}{f(y)}\right)^{x-y}. \qquad (7)$$
There is no dependence on $E$ other than through the degrees, and so, writing $\frac{f(y+1)}{f(y)} = 1 + a$, we bound (7) using the inequalities $1 + w \le e^w$ and $\frac{1}{1+w} \le 1 - w + w^2$, obtaining (8). Let us write $z = \mathbb{P}\big(B(s,p) = B(n-s-t,p) + y\big)$ to ease the notation; thus $f(y+1) = f(y) + z$. By Proposition 23 applied with $N = \frac{n-t+T}{2}$, $S = \frac{n-2s-t+T}{2}$ and $T = \frac{\lceil pt\rceil}{p}$, and noting that $0 \le T - t < p^{-1}$, we obtain an upper bound (9) on $z$. The second term in (9) is much smaller than the first, so, as $6 < 2\pi$ and $t\log d = o(n)$, we get a simpler bound for $n$ large enough. We can rewrite $f(y)$ as a probability involving the independent variables $B(s,p)$ and $B(n-s-t,p)$; for $p \gg \frac1n$ we have the usual asymptotic relation for the variances, and so, using Proposition 21 with $(N,S,h) = \big(\frac{n-t}{2}, \frac{n-2s-t}{2}, p(n-2s)+y-pt\big)$, we obtain for $n$ large enough a lower bound on $1 - f(y)$; the second inequality follows from the same reasoning used in (9) and the fact that $e^6 > 2\pi e^4$. We can also apply Proposition 22 to get a lower bound on $f(y)$ for $n$ large enough; here the bound on $1 - f(y)$ is in fact $o(1)$, being within a constant factor of the bound in (10).
We are now able to get a good upper bound on $a$. Substituting these bounds into (8), we get for $n$ large enough a bound in which the second term in the exponential is much larger than the first, and the claimed estimate follows for $n$ large enough.

We shall now bound the expected number of closed sets in this medium-sized range that contain $I_0$; this is also a bound on the probability that $I_0$ is contained in such a medium-sized closed set.

Proposition 7.
Assume the standing assumptions above and choose some $\epsilon > 0$. Then the expected number of closed sets in $G(n,p)$ in the medium range of sizes from Lemma 6 that contain $I_0$ is $o(1)$.

Proof. Let $S$ be a set of size $s$ in our range; for each such $s$ there are at most $\binom ns$ possible closed sets that can contain $I_0$. By Lemma 6 the expected number of closed sets is, for $n$ large enough, less than a quantity which is $o(1)$ because $(\log\log d)^{\epsilon}$ is unbounded.
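Schematically, the first-moment bound used here is
$$\mathbb{E}\,\#\big\{\text{closed } S \supseteq I_0 : |S| = s\big\} \;\le\; \binom{n-m}{\,s-m\,}\,\max_{|S| = s}\,\mathbb{P}(S \text{ is closed}),$$
summed over the medium range of $s$; Lemma 6 makes the probability factor small enough to beat the binomial coefficient, and Markov's inequality then converts the expectation bound into a with-high-probability statement.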
Corollary 8. Fix some number $\epsilon > 0$ and assume that for $n$ large enough $(1+\epsilon)\log n \le p(1-p)n$. If the initially infected set $I_0$ has size
$$m = \frac{n}{2} - \frac{n}{2}\sqrt{\frac{\log d}{d}} + \lambda\,\frac{n\log\log\log d}{\sqrt{d\log d}},$$
then for $\lambda > \frac12$, with high probability, the $MB(n,p\,;m)$ process percolates.
Proof. We have from Lemma 4 that, with high probability, $I_0 = [m]$ is contained in no small closed set. Using Markov's inequality, it follows from Proposition 7 applied with $\epsilon = \lambda - \frac12$ that, with high probability, $I_0$ is contained in no medium-sized closed set. Finally, we have from Lemma 5 that, with high probability, $I_0$ is contained in no closed set of size greater than $\frac n2 + \frac{7n}{2\sqrt d}$. Hence, for $\lambda > \frac12$, with high probability, $I_0$ is not contained in any closed set in $G(n,p)$, and so the process percolates.

Lower Bound
In this section we shall prove the lower bound of Theorem 1, in the form of the following result.
Lemma 9. Fix some number $\epsilon > 0$ and assume that for $n$ large enough $(1+\epsilon)\log n \le p(1-p)n$. If the initially infected set $I_0$ has size
$$m = \frac{n}{2} - \frac{n}{2}\sqrt{\frac{\log d}{d}} + \lambda\,\frac{n\log\log\log d}{\sqrt{d\log d}},$$
then, for $\lambda < 0$, with high probability, the $MB(n,p\,;m)$ process does not percolate.
In fact, to prove Lemma 9 we shall show, as might be expected, that, with high probability, the $MB(n,p\,;m)$ process terminates with $I$ (the set of eventually infected vertices) only slightly larger than $|I_0| = m$. We shall do this by bounding the expected number of sets of a given size whose vertices could be the first to be infected.
Proof. We say that a set of vertices $T$ percolates if all of its vertices eventually become infected. For $T \subseteq I \setminus I_0$ we can order the vertices of $I_0 \cup T$ by the time they are infected; that is, take any ordering such that a vertex from $I_j$ precedes any vertex from $I_{j'}$ whenever $j < j'$. Notice that for each $v \in T$ at least half of its neighbours (in the whole graph) lie in the set of its predecessors in this order. Our strategy is to show that if $\lambda < 0$ then, with high probability, there is no percolating set $T$ of a particular size, and thus the $MB(n,p\,;m)$ process does not percolate.
Set $t = |T|$ and denote by $E = E(T)$ the edge set of $T$; write $d_E(i)$ for the degree within $T$ of a vertex $i \in T$. We want to bound the probability that $T$ percolates. To do so, we modify the infection rule within $T$ so that the vertices inside $T$ consider their neighbours in $T$ to be already infected, regardless of their real state at any particular time step. This assumption only increases the probability and, more importantly, makes the infection events for the vertices in $T$ independent, because these events now depend only on how many edges each vertex has to $I_0$ and to $V(G) \setminus (I_0 \cup T)$. Conditioning on $E$ and then taking the expectation gives the bound (11), for independent random variables $B(m,p)$ and $B(n-m-t,p)$.
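Written out, the bound just derived (our rendering of (11)) reads
$$\mathbb{P}(T \text{ percolates}) \;\le\; \mathbb{E}_{E}\left[\,\prod_{i \in T} \mathbb{P}\Big(B(m,p) + d_E(i) \;\ge\; B(n-m-t,p)\Big)\right],$$
where the expectation is over the edge set $E$ of $T$, and the two binomials count the neighbours of $i$ in $I_0$ and in $V(G)\setminus(I_0 \cup T)$ respectively.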
Denote $g(x) = \mathbb{P}\big(B(m,p) + x \ge B(n-m-t,p)\big)$. Due to the log-concavity of $g$ (Corollary 13) we have, for integers $x, y$, that $g(x) \le g(y)\left(\frac{g(y+1)}{g(y)}\right)^{x-y}$.
Using the latter inequality with $x = d_E(i)$ and $y = \lceil pt\rceil$, we can bound (11) by (12). Substituting $\frac{g(y+1)}{g(y)} = 1 + a$ and using the elementary inequality $\frac{1}{1+a} \le 1 - a + a^2$, we bound the latter expression further. By definition, $g(y)$ can be written in terms of $X_1 = B(m,p)$, with mean $\mu_1$, and $X_2 = B(n-m-t, 1-p)$, with mean $\mu_2$. Setting $t = \frac{n(\log\log d)^{\lambda}}{\sqrt{d\log d}}$ and using Proposition 22 with $N = \frac{n-t}{2}$, $S = \frac{n-2m-t}{2}$ and $h = p(n-2m-t) - y$ to bound $g(y)$, we obtain (13) for $n$ large enough when $\lambda < 0$. We can also bound $g(y)$ from below by Proposition 21 when $\lambda < 0$ (and $n$ is large enough). By the definition of $g$ we have that $g(y+1) = g(y) + \mathbb{P}\big(B(m,p) + y + 1 = B(n-m-t,p)\big)$.
Let us write $z = \mathbb{P}\big(B(m,p) + y + 1 = B(n-m-t,p)\big)$ for convenience. We shall now obtain an upper bound for $z$. Using Proposition 23 with $T = \frac{y+1}{p}$, $N = \frac{n-t+T}{2}$ and $S = N - m$, we obtain (14). The first term is much larger than the second, and so we obtain a simpler inequality for $n$ large enough. We have that $a = \frac{z}{g(y)}$, and so from (13) and (14), for $n$ large enough, we can bound $a$. We can now bound the expression in (12), for $n$ large enough, where the second inequality follows since the exponent in the exponential is $o(1)$. The expected number of sets of size $t$ that percolate is therefore, for $n$ large enough, bounded above by $\big(e^6(\log\log d)^{\lambda}\big)^t = o(1)$. Hence, with high probability, percolation does not occur for $\lambda < 0$.

Inequalities
We begin this section with some remarks on the log-concavity of the distribution function of the binomial distribution. These results are standard (see for example [10]), but we prove them for completeness.

Proposition 11. The distribution of a sum of independent Bernoulli random variables is log-concave; that is, if $X_i$ are independent Bernoulli random variables with means $p_i$, then for any $k$ we have
$$\mathbb{P}\Big(\sum_{i=1}^n X_i = k-1\Big)\,\mathbb{P}\Big(\sum_{i=1}^n X_i = k+1\Big) \;\le\; \mathbb{P}\Big(\sum_{i=1}^n X_i = k\Big)^2.$$

Proof. We proceed by induction on $n$, the base case $n = 1$ being trivial as one of the terms on the left-hand side of the inequality is zero. Otherwise, conditioning on $X_{n+1}$ and writing $f_{n,k} = \mathbb{P}\big(\sum_{i=1}^n X_i = k\big)$, we expand both sides. The inequality follows as $f_{n,k-2}\,f_{n,k+1} \le f_{n,k-1}\,f_{n,k}$ is implied by the induction hypothesis.
Proposition 12. The cumulative distribution function of a discrete non-negative log-concave random variable $X$ is log-concave; that is, for all $k$,
$$\mathbb{P}(X \le k-1)\,\mathbb{P}(X \le k+1) \;\le\; \mathbb{P}(X \le k)^2.$$

Proof. Setting $r_i = \mathbb{P}(X = i)$, we get from the log-concavity of $X$ (cf. Proposition 11) that
$$(r_0 + \dots + r_{k-1})\,r_{k+1} \;\le\; (r_1 + \dots + r_k)\,r_k + r_k r_0,$$
and the claim follows on expanding $\mathbb{P}(X \le k)^2 - \mathbb{P}(X \le k-1)\,\mathbb{P}(X \le k+1)$. When $X$ is the sum of $n$ independent Bernoulli random variables, we can rewrite $X = n - Y$, where $Y$ is also a sum of $n$ independent Bernoulli random variables, and so Proposition 12 remains true with the event $\{X \le k\}$ replaced by $\{X < k\}$, $\{X > k\}$ or $\{X \ge k\}$.
Corollary 13. The cumulative distribution of the sum or difference of independent binomial random variables is log-concave.
Proof. Sums and differences of independent binomial random variables are also sums of independent Bernoulli random variables plus a constant, and so are log-concave.
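As a quick numerical sanity check of Corollary 13 (this snippet is ours, with illustrative parameter choices), one can verify log-concavity of the cumulative distribution function of a difference of independent binomials directly:

\begin{verbatim}
from math import comb

def pmf(n, p, k):
    """P(B(n, p) = k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k) if 0 <= k <= n else 0.0

def cdf_diff(n1, n2, p, x):
    """F(x) = P(B(n1, p) - B(n2, p) <= x)."""
    return sum(pmf(n1, p, k1) * pmf(n2, p, k2)
               for k1 in range(n1 + 1)
               for k2 in range(n2 + 1) if k1 - k2 <= x)

n1, n2, p = 30, 40, 0.3
F = [cdf_diff(n1, n2, p, x) for x in range(-n2, n1 + 1)]
# Log-concavity of the cdf: F(x)^2 >= F(x-1) * F(x+1) for all x.
assert all(F[i] ** 2 >= F[i - 1] * F[i + 1] - 1e-12
           for i in range(1, len(F) - 1))
\end{verbatim}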
A substantial part of this section is now taken up with providing tight bounds, up to a constant factor, on binomial probabilities and their sums.
Corollary 15. Suppose $p(1-p)n = \omega(n)$ and $k = pn + h$, where $0 < h = o\big((p(1-p)n)^{2/3}\big)$.

Proof. For $h$ in this range we have $k = \omega(n)$ and $n - k = \omega(n)$, and so the inequality follows from Proposition 14.
Corollary 17. Suppose $p(1-p)n = \omega(n)$ and $k \ge pn + h$.

Proof. For $h$ in this range the inequality follows from Proposition 16, which can be applied as $h(1-p)n = \omega(n)$.
Proof. This proof follows that of Theorem 1.3 in [5]. For $m \ge pn + h$ we bound the ratio $\mathbb{P}(B(n,p) = m+1)/\mathbb{P}(B(n,p) = m)$ and sum the resulting geometric series, which gives the stated bound.

Proposition 19. Suppose $p(1-p)n = \omega(n)$.

Proof. Due to the unimodality of the binomial distribution, its probability mass function is decreasing away from the mean. We can apply Corollary 15, as $h = o\big((p(1-p)n)^{2/3}\big)$, and the claimed estimate follows; this is greater than the stated bound.

We shall also want a weaker but more general bound than Proposition 18, due to Bernstein [4].
Lemma 20. Let $X_1, \dots, X_n$ be independent zero-mean random variables and suppose that $|X_i| \le M$ for all $i$. Then for all positive $t$,
$$\mathbb{P}\left(\sum_{i=1}^n X_i > t\right) \;\le\; \exp\left(-\frac{t^2/2}{\sum_{i=1}^n \mathbb{E}X_i^2 + Mt/3}\right).$$

Proof. For a proof see [7].
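For instance, in the form in which Lemma 20 is applied in Propositions 24 and 25 below: centring the $\binom s2$ edge indicators of a fixed $s$-set $S$ (each centred indicator is bounded by $M = 1$ and has variance $p(1-p)$) gives
$$\mathbb{P}\left(\Big|\,e(S) - p\binom s2\Big| > u\right) \;\le\; 2\exp\left(-\frac{u^2/2}{\,p(1-p)\binom s2 + u/3\,}\right).$$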
So far in this section we have discussed well-known deviation inequalities for standard binomial distributions. We now present some analogous results for sums of binomial random variables with different parameters $p$, which we have not been able to find in the literature.

Proposition 21. Suppose that $p(1-p)N = \omega(N)$, and let $X_1 = B(N-S, p)$, with mean $\mu_1$ and variance $\sigma_1^2$, and $X_2 = B(N+S, 1-p)$, with mean $\mu_2$ and variance $\sigma_2^2$, be independent.

Proof. The conditions on $S$ and $h$ imply that $S = o(N)$. Set $z$ and $l$ accordingly; these satisfy $h > 2lz$. We bound $\mathbb{P}(X_1 + X_2 \ge \mu_1 + \mu_2 + h)$ from below by summing over the disjoint regions in (15). These regions are disjoint since if $X_1 < \mu_1 + \frac h2 - (i+1)z$ and $X_2 < \mu_2 + \frac h2 + (i+1)z$, then $X_1 + X_2 < \mu_1 + \mu_2 + h$. For each $i$ the region specified is an isosceles right-angled triangle with axis-parallel legs of length $z$, and so there are at least $\lfloor z\rfloor(\lfloor z\rfloor - 1)/2$ pairs of integer values $x_1, x_2$ which $X_1, X_2$ can take while still satisfying all three relations in (15). We have that $h > 2lz$, and so if $X_1, X_2$ satisfy all three relations in (15), then $X_1 \ge \mu_1$ and $X_2 \ge \mu_2$.
As we are only considering the region in which $X_1, X_2$ are larger than their means, we can bound the sum in (15) from below by (16). We have that $p(1-p)(N-S) = \omega(N)$ and $h + lz = o\big((p(1-p)(N-S))^{2/3}\big)$, and so we can apply Corollary 15 to bound the quantity in (16) from below. Expanding this out, and noticing that $\lfloor z\rfloor = z(1+o(1))$, we get a lower bound (17) for the sum in (16), where the approximations for $\lfloor z\rfloor$ and $\sigma_1, \sigma_2$ have been absorbed into the $o(1)$ in the exponential term. We have that $4i^2 + 4i + 2 \le 6l^2$ and $l^2z^2 \le 2p(1-p)N$, and so, using the bounds in the statement of the proposition, the sum in (17) is at least the stated quantity; the last inequality follows because $l > \frac{h}{2\sqrt{2p(1-p)N}}$.

Proposition 22. Suppose that $p(1-p)N = \omega(N)$, and let $X_1 = B(N-S, p)$, with mean $\mu_1$ and variance $\sigma_1^2$, and $X_2 = B(N+S, 1-p)$, with mean $\mu_2$ and variance $\sigma_2^2$, be independent.
Proof. The conditions on $S$ and $h$ imply that $S = o(N)$. Set $z = \frac{2Np(1-p)}{h}$ and $l = \frac{h^2}{4Np(1-p)}$. We bound $\mathbb{P}(X_1 + X_2 \ge \mu_1 + \mu_2 + h)$ from above by covering the region where this inequality holds by the three families of regions in (19), (20) and (21); we shall bound these three summands separately. Again, because $h > 2lz$, we are only considering the range in which $X_1$ and $X_2$ are greater than their means. Firstly, for each pair $i, j$ there are at most $\lceil z\rceil^2$ points inside the specified region, which bounds the product inside the sum of (19). We have that $p(1-p)(N \pm S) = \omega(N)$, and so we can apply Corollary 17 to bound the sum in (19), obtaining (22). We can simplify this by noting that $|i-j| \le \sqrt2\,\sqrt{i^2+j^2}$ and $|i^2-j^2| \le i^2+j^2$. As we also have that $Np(1-p)/2 < z^2 l \le Np(1-p)$, the inner sum appearing in (22) is bounded by (23). A point $(i,j)$ in the plane with integer coordinates in the relevant region also satisfies $|i-j| < \sqrt{21tl}$, for if $|i-j| \ge \sqrt{21tl}$, then $i^2 + j^2 \ge \frac{21tl}{2}$. Therefore the number of integer points $(i,j)$ satisfying both $\frac{i^2+j^2}{4l} < t$ and $-1 \le i+j < t$ is suitably small, which allows us to bound (23) crudely. The latter sum is less than $50\sqrt l$, and so the sum in (19) is bounded above by (24). Secondly, we bound the probability (20): by Proposition 18 it is at most (25). Similarly, the probability in (21) is at most (26). As $50\sqrt{2\pi} + \frac{4}{3\sqrt\pi} < e^3$, the sum of our three bounds (24), (25) and (26) is at most the stated bound.
Proposition 23. Suppose, in addition to the assumptions above, that $T = o(N)$. Then, for $N$ large enough, the following estimate holds for independent random variables $Z_1 = B(N-S, p)$, with mean $\mu_1$ and variance $\sigma_1^2$, and $Z_2 = B(N+S-T, p)$, with mean $\mu_2$ and variance $\sigma_2^2$.
Proof. Let $\varphi(i)$ be the probability that $Z_1 = Z_2 + pT = pN + i$, and denote the ratio between successive values of $\varphi(i)$ by $\psi(i)$. Computing this ratio in (27), we see that $\psi$ is a decreasing function of $i$. By noting that $e^{x - x^2} \le 1 + x \le e^x$ for $x \ge -\frac12$, we can bound $\psi$ for $i = o(p(1-p)N)$ (when $N$ is large enough): we apply $e^{x-x^2} \le 1+x$ to the terms in the numerator of (27) and $1+x \le e^x$ to the terms in the denominator to get a lower bound on $\psi$, and we apply $1+x \le e^x$ to the terms in the numerator of (27) and $e^{x-x^2} \le 1+x$ to the terms in the denominator to get an upper bound on $\psi$. Substituting $i = \pm\frac{pS}{2}$, we get (for $N$ large enough) that $\psi$ is greater than $1$ at $i = -\frac{pS}{2}$ and less than $1$ at $i = \frac{pS}{2}$. Consequently (for $N$ large enough), the maximum value of $\varphi$ occurs between these two values.
We have that $(N+S-T) - Z_2$ is distributed as $B(N+S-T, 1-p)$, with mean $\mu_2'$ and variance $(\sigma_2')^2$. By Corollary 17 we get a bound valid for $|i| \le \frac{pS}{2}$; this is maximized when $i = \frac{pST - 2pS^2}{2N - T}$, where it takes the stated value. We also obtain the bounds (28) and (29) (for $N$ large enough). Putting this all together and applying (28) and (29), we obtain the result for $N$ large enough.

We end with some propositions about the number of edges in and between sets in $G(n,p)$.
Proposition 24. Suppose that $p(1-p)n = \omega(n)$. If $n$ is large enough, then for all $t > \frac n5$, with probability at least $1 - 4^{-t}$, every set in $G(n,p)$ of size $t$ has at most $p\binom t2 + 2t\sqrt{p(1-p)t}$ edges.
Proof. The expected number of sets of size $t$ with more than $p\binom t2 + 2t\sqrt{p(1-p)t}$ edges can be bounded by Lemma 20 and the fact that $\binom nt \le \left(\frac{en}{t}\right)^t$. As $p(1-p)t = \omega(n)$, if $n$ is large enough then for all $t > \frac n5$ the error term simplifies; substituting this in, the expected number of sets of size $t$ with more than $p\binom t2 + 2t\sqrt{p(1-p)t}$ edges is (for $n$ large enough) at most $4^{-t}$.

Proposition 25. Suppose that $p(1-p)n = \omega(n)$. If $n$ is large enough, then for all $t$ in the range $\frac n5 < t \le \frac n2$, with probability at least $1 - 4^{-t}$, every set in $G(n,p)$ of size $t$ has at least $pt(n-t) - 3t\sqrt{p(1-p)(n-t)}$ edges between it and its complement.
Proof. The expected number of sets of size $t$ with fewer than $pt(n-t) - 3t\sqrt{p(1-p)(n-t)}$ edges to the complement can, by Lemma 20 and the fact that $\binom nt \le \left(\frac{en}{t}\right)^t$, be bounded as follows.
Substituting this in, we have that the expected number of sets $T$ with a small number of edges between $T$ and $[n]\setminus T$ is (for $n$ large enough) less than $\exp\big(t(\log 5 + 1) - 4t\big) < 4^{-t}$.
Proposition 26. Suppose that $p(1-p)n \ge 4\log n$. If $n$ is large enough, then for all $t \le \frac n5$, with probability at least $1 - n^{-\frac t{120}}$, for every set $T$ in $G(n,p)$ of size $t$ there are at least twice as many edges between $T$ and $[n]\setminus T$ as there are in $T$.
Proof. The expected number of sets $T$ of size $t$ such that there are fewer than twice as many edges between $T$ and $[n]\setminus T$ as there are in $T$ is
$$\binom nt\,\mathbb{P}\Big(B\big(t(n-t), p\big) < 2B\big(\tbinom t2, p\big)\Big),$$
for independent random variables $B(t(n-t), p)$ and $B\big(\binom t2, p\big)$. We can rewrite this as
$$\binom nt\,\mathbb{P}\Big(2B\big(\tbinom t2, p\big) - pt(t-1) - B\big(t(n-t), p\big) + pt(n-t) > pt(n-2t+1)\Big).$$
For $n^{24/25} \le t \le \frac n5$, using the inequality $\binom nt \le \left(\frac{en}{t}\right)^t$, the required bound follows.

Corollary 27. Suppose that $pn \ge \log n$. If $n$ is large enough, then for all $t$ satisfying $n^{24/25} \le t \le \frac n5$, with probability at least $1 - n^{-\frac t{120}}$, for every set $T$ in $G(n,p)$ of size $t$ there are at least twice as many edges between $T$ and $[n]\setminus T$ as there are in $T$.
Proof. By the exact same reasoning as in Proposition 26, the expected number of sets $T$ of size $t$ with fewer than twice as many edges between $T$ and $[n]\setminus T$ as there are in $T$ is (for $n$ large enough) at most $n^{-\frac t{120}}$.

Proposition 28. For every fixed $\epsilon > 0$ and $p \ge \frac{(1+\epsilon)\log n}{n}$, with high probability, the minimal degree of $G(n,p)$ is greater than $8$.

The relevant expectation is at most twice the first term, therefore
$$\binom nt\,\mathbb{P}\Big(B\big(\tbinom t2, p\big) \ge 2t\Big) \;\le\; \binom nt\cdot 2\left(\frac{5et\log n}{4n}\right)^{2t} \;\le\; 2\,\frac{e^{3t}\,25^t\,(\log n)^{2t}\,t^t}{16^t\,n^t} \;\le\; \left(\frac{C(\log n)^2}{n^{1/30}}\right)^t,$$
and so the expected number of sets $T$ in $G(n,p)$ of size $t$ with at least $2t$ edges is (for $n$ large enough) at most $n^{-\frac t{120}}$.