Distinct degrees and homogeneous sets

In this paper we investigate the extremal relationship between two well-studied graph parameters: the order of the largest homogeneous set in a graph $G$ and the maximal number of distinct degrees appearing in an induced subgraph of $G$, denoted respectively by $\hom (G)$ and $f(G)$. Our main theorem improves estimates due to several earlier researchers and shows that if $G$ is an $n$-vertex graph with $\hom (G) \geq n^{1/2}$ then $f(G) \geq \big ( {n}/{\hom (G)} \big )^{1 - o(1)}$. The bound here is sharp up to the $o(1)$-term, and asymptotically solves a conjecture of Narayanan and Tomon. In particular, this implies that $\max \{ \hom (G), f(G) \} \geq n^{1/2 -o(1)}$ for any $n$-vertex graph $G$, which is also sharp. The above relationship between $\hom (G)$ and $f(G)$ breaks down in the regime where $\hom (G) < n^{1/2}$. Our second result provides a sharp bound for the number of distinct degrees in biased random graphs, i.e. on $f\big (G(n,p) \big )$. We believe that the behaviour here determines the extremal relationship between $\hom (G)$ and $f(G)$ in this second regime. Our approach to lower bounding $f(G)$ proceeds via a translation into an (almost) equivalent probabilistic problem, which is effective for arbitrary graphs and may be of independent interest.


Introduction
The focus of this paper is on the extremal relationship between the order of the largest homogeneous set in a graph G and the maximal number of distinct degrees which appear in some induced subgraph of G. More precisely, let hom(G) denote the order of the largest homogeneous set in G, i.e. the largest clique or independent set, and let f(G) denote the maximal number of distinct degrees appearing in an induced subgraph of G. In this context, Erdős, Faudree and Sós [14] noticed that the random graph G(n, 1/2) has f(G(n, 1/2)) = Ω(n^{1/2}) with high probability. They conjectured that this property must be shared by Ramsey graphs: if G is an n-vertex graph with hom(G) = O(log n) then f(G) = Ω(n^{1/2}). Bukh and Sudakov confirmed this conjecture in [9] with an elegant and influential argument. Furthermore, they noted that there still appeared to be some flexibility here: (a) Although f(G(n, 1/2)) = Ω(n^{1/2}) forms a natural lower bound, they observed that it did not have a matching upper bound, as they proved that f(G(n, 1/2)) = O(n^{2/3}) whp.
It was later shown by Conlon, Morris, Samotij and Saxton [10] that in fact f(G(n, 1/2)) = Ω(n^{2/3}) whp, thus matching the upper bound given in (a). Recently, Jenssen, Keevash, Long and Yepremyan [19] proved that the same lower bound applies in the Ramsey context, giving a tight bound for the original Ramsey question of Erdős, Faudree and Sós.
In [26], Narayanan and Tomon solved the conjecture from (b) above, proving that in fact f(G) = Ω((n/hom(G))^{1/2}) for all n-vertex graphs G. They also provided an interesting construction, which suggested a tight bound between these parameters: if k ≤ n^{1/2} then the n-vertex k-partite Turán graph T (see e.g. [6]) satisfies both hom(T) = n/k and f(T) = k. Narayanan and Tomon conjectured that a similar dependence must hold in general: if G is an n-vertex graph satisfying hom(G) ≥ n^{1/2} then f(G) = Ω(n/hom(G)). Supporting their conjecture, the authors proved that indeed f(G) = Ω(n/hom(G)) when hom(G) = Ω(n/log n). Jenssen et al. [19] extended this to the range hom(G) ≥ n^{9/10}, noting that there were significant obstacles to reaching hom(G) ≥ n^{1/2}.
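For small n the two parameters can be checked by brute force. The sketch below (an illustrative check with helper names of our own choosing, not code from the paper) verifies hom(T) = n/k and f(T) = k for the 3-partite Turán graph on 12 vertices, where k = 3 ≤ n^{1/2}.

```python
from itertools import combinations

def distinct_degrees(adj, S):
    # Number of distinct degrees in the subgraph induced on S.
    return len({sum(1 for w in adj[v] if w in S) for v in S})

def f(adj):
    # f(G): maximum number of distinct degrees over all induced subgraphs.
    V = list(adj)
    return max(distinct_degrees(adj, set(S))
               for r in range(1, len(V) + 1)
               for S in combinations(V, r))

def hom(adj):
    # hom(G): order of the largest clique or independent set.
    V = list(adj)
    best = 1
    for r in range(2, len(V) + 1):
        for S in combinations(V, r):
            pairs = list(combinations(S, 2))
            if all(u in adj[v] for u, v in pairs) or \
               all(u not in adj[v] for u, v in pairs):
                best = max(best, r)
    return best

def turan(n, k):
    # Complete k-partite Turán graph on n vertices (parts of size n/k).
    part = {v: v % k for v in range(n)}
    return {v: {w for w in range(n) if w != v and part[w] != part[v]}
            for v in range(n)}

T = turan(12, 3)  # parts of size 4; k = 3 <= 12**0.5
print(hom(T), f(T))  # → 4 3
```

Here f(T) = 3 is attained by an induced subgraph that takes 1, 2 and 3 vertices from the three parts respectively, so the three part sizes (and hence the three degrees) are distinct.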
Our main result here confirms the Narayanan-Tomon conjecture up to a logarithmic loss. As an immediate corollary of Theorem 1.1 we obtain the following result, which strengthens the bounds of Bukh and Sudakov [9] and of Narayanan and Tomon [26]. Again, note that the n^{1/2}-partite Turán graph on n vertices shows that this bound is essentially sharp. However, as discussed below, there is a large and varied collection of graphs which are close to this extremal value.
Remark. As f(G) = f(Ḡ) for any graph G, where Ḡ denotes the complement of G, the random variables f(G(n, p)) and f(G(n, 1 − p)) follow identical distributions, so Theorem 1.3 determines the behaviour of f(G(n, p)) for all p ∈ [0, 1].
Together with the known estimates on the homogeneous number of sparse random graphs, Theorem 1.3 suggests a natural extremal relationship between hom(G) and f (G) when the hypothesis of Theorem 1.1 fails, i.e. when hom(G) < n 1/2 . We further examine this relationship in the concluding remarks in Section 7.
Our proofs of both Theorem 1.1 and Theorem 1.3 build upon earlier approaches from [9] and [19], but there are many extra challenges in this regime, which require several key new ingredients and ideas. For instance, although Turán graphs give an example of n-vertex graphs G with hom(G) = n^{1/2} and f(G) = Θ(n^{1/2}), there are several very different looking graphs which exhibit (essentially) the same behaviour, including the random graph G(n, n^{−1/2}). One interesting class of examples, which was also given by Narayanan and Tomon, comes from 'iterated' Turán graphs: take b < n^{1/2} vertex-disjoint copies of the (n^{1/2}/b)-partite Turán graph on n/b vertices. This example shows that there are many distinct extremal situations. Furthermore, one can combine such examples together to get similar behaviour. One can also 'iterate' again, a process leading to graphs with limited neighbourhood diversity, a key parameter in many earlier approaches. One of our results below (see Theorem 3.2) allows us to prove lower bounds on f(G) even without diversity control.
The above example also highlights a more significant challenge, which arises naturally for this problem. To find a large set U of vertices with distinct degrees in general, these iterated graphs show that sometimes we must first find sets U_i of distinct degrees locally in smaller graphs and then combine the results into a larger set U = ∪_i U_i. Combining such sets together can work very well for iterated graphs, but even small changes to the structure here can break the condition: at an extreme, it could be that the sets U_i have distinct degrees in G[V_i] for i = 1, 2 with V_1 and V_2 disjoint, but that all vertices of U_1 ∪ U_2 have the same degree when combined in G[V_1 ∪ V_2]. We avoid this kind of difficulty by moving to a more general probabilistic setting, where we instead find probability distributions with certain well-controlled small ball probabilities.
Lastly, our approach in Sections 3 and 4 applies quite generally to the problem of lower bounding f(G) in an arbitrary graph; see Theorem 3.2 and Lemma 3.4 below.
The paper is organised as follows. In the next section we present a number of tools which will be required in our proof. In Section 3 we present a probabilistic analogue of the problem of finding many distinct degrees in a graph. In Section 4 we extend this approach to a more robust variant and develop a variety of tools and estimates for studying the distinct degree problem. In Section 5 we prove Theorem 1.1 as follows: we first deal with a slightly weaker version in Section 5.1, which applies when hom(G) ≥ n^{3/5+o(1)}, and then build upon this in Section 5.2 to prove Theorem 1.1. In Section 6 we present the proof of Theorem 1.3. Finally, in Section 7 we conclude with a discussion of the case when hom(G) < n^{1/2}.
Notation. Given a graph G and u, v ∈ V(G), we write u ∼ v if u and v are adjacent vertices in G and u ≁ v if they are not. The neighbourhood of u is given by N_G(u) = {v ∈ V(G) : u ∼ v} and given S ⊂ V(G) we let N^S_G(u) := N_G(u) ∩ S; we will omit the subscript G when the graph is clear from the context. We write d^S_G(u) = |N^S_G(u)|. Given a vertex u, we will also represent the neighbourhood of u by a vector u ∈ {0, 1}^{V(G)} defined so that u_v = 1 if and only if u ∼ v. Given a set S ⊂ V and a vector u ∈ R^V, we will denote the projection of u onto the coordinate set S by proj_S(u), i.e. for any v ∈ S we have proj_S(u)_v = u_v. Given u, v ∈ V(G) we write div_G(u, v) for the symmetric difference N(u) △ N(v). Thus |div_G(u, v)| is simply the Hamming distance between the vectors u and v.
We will write Ḡ for the complement of the graph G. It is easy to note that for any graph G we have hom(G) = hom(Ḡ) and f(G) = f(Ḡ). Given n ∈ N and p ∈ (0, 1), the Erdős-Rényi random graph G(n, p) is the n-vertex graph in which each edge is included in the graph with probability p independently of every other edge. We say that an event that depends on n occurs with high probability (whp) if its probability tends to 1 as n → ∞.
Throughout this paper we will omit floor and ceiling signs when they are not crucial, for the sake of clarity of presentation.

Tools
In this short section we introduce some tools required for the rest of the paper. We will use the following version of Turán's theorem (see for example Chapter 6 in [6]).
Theorem 2.1. Let G be an n-vertex graph with average degree d. Then G has an independent set of size at least n/(d + 1).
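For illustration, the bound of Theorem 2.1 is achieved by the standard minimum-degree greedy procedure (a classical argument, not the one used in this paper); a minimal sketch:

```python
def greedy_independent_set(adj):
    # Repeatedly take a minimum-degree vertex and discard its neighbours;
    # this yields an independent set of size at least n/(d+1), where d is
    # the average degree (the Caro-Wei bound).
    live = {v: set(adj[v]) for v in adj}
    indep = []
    while live:
        v = min(live, key=lambda u: len(live[u]))
        indep.append(v)
        dead = live[v] | {v}
        for u in dead:
            live.pop(u, None)
        for u in live:
            live[u] -= dead
    return indep

# 5-cycle: n = 5, average degree d = 2, so the theorem guarantees an
# independent set of size at least 5/3, hence at least 2 vertices.
c5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
I = greedy_independent_set(c5)
print(len(I))  # → 2
```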
Secondly, we require the following 'anticoncentration' theorem for the Littlewood-Offord problem, which is due to Erdős [12]: Theorem 2.2 (Erdős-Littlewood-Offord). Let S be a set of n real numbers of absolute value at least 1. Then, for each α ∈ R, there are at most $\binom{n}{\lfloor n/2 \rfloor} = \Theta(2^n n^{-1/2})$ subsets of S whose sum of elements lies in the interval [α, α + 1).
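As a toy check of Theorem 2.2 (with helper names of our own), the following brute force counts subset sums falling in a unit half-open window; taking all elements equal to 1 makes the binomial bound tight.

```python
from itertools import combinations
from math import comb

def max_window_count(values, width=1.0):
    # Maximum number of subset sums of `values` lying in any
    # half-open window [a, a + width).
    sums = sorted(sum(c) for r in range(len(values) + 1)
                  for c in combinations(values, r))
    best, j = 0, 0
    for i in range(len(sums)):
        while sums[i] - sums[j] >= width:
            j += 1
        best = max(best, i - j + 1)
    return best

n = 12
bound = comb(n, n // 2)  # the Erdos-Littlewood-Offord bound; 924 for n = 12
print(max_window_count([1.0] * n), bound)                           # → 924 924
print(max_window_count([1 + 0.37 * i for i in range(n)]) <= bound)  # → True
```

With all a_i = 1 every subset of size 6 has sum in [6, 7), so the window count equals C(12, 6) = 924 exactly; for generic values the count is far smaller.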
We now give a probabilistic interpretation of the previous theorem, which can also be found in [19]. For the sake of completeness, we include a proof of this result. Theorem 2.3. Fix non-zero parameters a_1, a_2, …, a_n ∈ R and p_1, p_2, …, p_n ∈ [0.1, 0.9]. Suppose that X_1, X_2, …, X_n are independent Bernoulli random variables with X_i ∼ Be(p_i). Then for every x ∈ R we have P(a_1X_1 + a_2X_2 + ⋯ + a_nX_n = x) = O(n^{−1/2}). Proof. We may write each X_i as X_i = Z_i + (1 − 2Z_i)W_iY_i, where Z_i = z_i ∈ {0, 1} is a constant, and W_i ∼ Be(w_i) and Y_i ∼ Be(0.5) are independent random variables. We want to make this choice so that each w_i ≥ 0.2 and we can do this by letting z_i = 0, w_i = 2p_i if p_i ≤ 1/2 and by letting z_i = 1, w_i = 2(1 − p_i) if p_i > 1/2. We now condition on any choice C of the W_i's and Z_i's so that the set I = {i ∈ [n] : W_i = 1} satisfies |I| > n/10. Conditioned on C, the sum a_1X_1 + ⋯ + a_nX_n equals a constant plus ∑_{i∈I} ±a_iY_i, and by Theorem 2.2 (eventually with a scaling argument) the conditional probability that this hits any fixed value is at most O(|I|^{−1/2}) = O(n^{−1/2}). On the other hand, let W = (W_1 + W_2 + ⋯ + W_n)/n and note that |I| = nW. Moreover, E[|I|] = ∑_i w_i ≥ n/5 and Var(|I|) ≤ n. By Chebyshev's Inequality we get P(|I| ≤ n/10) ≤ O(n^{−1}).
The conclusion now follows by combining these two results in the Law of Total Probability.
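The O(n^{−1/2}) decay in Theorem 2.3 can be verified exactly for small n by convolving the Bernoulli laws. The sketch below is a toy check with a_i = 1 (the theorem allows arbitrary non-zero a_i); all names are ours.

```python
from math import sqrt

def max_point_mass(ps):
    # Exact law of X = X_1 + ... + X_n with X_i ~ Be(p_i), by convolution;
    # returns the largest probability P(X = x) over all x.
    dist = {0: 1.0}
    for p in ps:
        new = {}
        for s, q in dist.items():
            new[s] = new.get(s, 0.0) + q * (1 - p)
            new[s + 1] = new.get(s + 1, 0.0) + q * p
        dist = new
    return max(dist.values())

def ps(n):
    # Probabilities cycling through 0.1, 0.3, 0.5, 0.7, 0.9.
    return [0.1 + 0.2 * (i % 5) for i in range(n)]

for n in (25, 100):
    print(n, max_point_mass(ps(n)), 2 / sqrt(n))
```

Doubling n twice visibly halves the maximal point mass, consistent with the n^{−1/2} rate.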
The following optimization results will be very useful along the way.
This gives us the first part, whereas the second one is just f (0) > f (a−t).
Proof. We may assume that a ≤ b. Now let x := a + b and a := tx with 0 < t ≤ 1/2. Upon dividing by x, the inequality we need to prove rewrites as one with left-hand side 2t + t log_2(tx). The map f : (0, ∞) → R given by f(y) = y log_2 y is convex and can be continuously extended to f(0) = 0. Therefore the left-hand side of our last inequality above is convex, so we only need to check that the inequality holds for t = 0 and t = 1/2, which can be easily seen.
Finally, we require some classic concentration inequalities. See e.g. Appendix A in [4]. Theorem 2.6 (Chernoff Inequality). Let X be a random variable with binomial distribution and let µ = E[X]. Then, for 0 ≤ δ ≤ 1, the following inequalities hold: P(X ≥ (1 + δ)µ) ≤ exp(−δ²µ/3) and P(X ≤ (1 − δ)µ) ≤ exp(−δ²µ/2).
The following bound will be useful for larger deviations.
Theorem 2.8 (Hoeffding's Inequality). Let X_1, X_2, …, X_n be independent random variables such that a_i ≤ X_i ≤ b_i for each i ∈ [n]. Then given t > 0, the random variable S_n = X_1 + ⋯ + X_n satisfies: P(|S_n − E[S_n]| ≥ t) ≤ 2 exp(−2t²/∑_{i=1}^n (b_i − a_i)²).

Degrees and distributions on the continuous cube

Recasting the problem

Given a graph G and a probability vector p = (p_v)_{v∈V(G)} ∈ [0.1, 0.9]^{V(G)} we will write G(p) to denote the probability space on the set of induced subgraphs of G, determined by including each vertex v ∈ V(G) independently with probability p_v. Equivalently, given S ⊂ V(G), the induced subgraph G[S] is selected with probability ∏_{v∈S} p_v · ∏_{v∈V(G)\S} (1 − p_v). Abusing notation slightly, we will usually write G(p) to denote a random graph G[S] ∼ G(p).
Throughout the paper, given a vertex u ∈ V(G), we will write u ∈ {0, 1}^{V(G)} to denote the neighbourhood vector of u, given by u_v = 1 if and only if u ∼ v. Note that, considering the standard inner product on R^{V(G)}, given by x · y = ∑_{v∈V(G)} x_v y_v, this notation leads us to the useful representation d^S(u) = u · 1_S, so that E[d_{G(p)}(u)] = u · p. (1) Our first lemma shows that two vertices whose expected degrees (under the distribution G(p)) are separated are unlikely to have the same degree in an induced subgraph selected according to G(p).
Proof. Set W := div(u, v) and T := |W|; by hypothesis T ≥ 2. Letting X := d_{G(p)}(u) − d_{G(p)}(v), this random variable can be written as X = ∑_{w∈W} ±X_w where X_w ∼ Be(p_w) are independent Bernoulli random variables. We seek to upper bound P(d_{G(p)}(u) = d_{G(p)}(v)) = P(X = 0).
As E[X] ≥ D by our hypothesis, one gets by Hoeffding's Inequality that P(X = 0) ≤ 2 exp(−D²/4T), since X is a sum of T independent random variables bounded in [−1, 1]. On the other hand, by Theorem 2.3 we get P(X = 0) = O(T^{−1/2}). Thus: The map x → 1/√x is decreasing on (0, ∞), whereas x → exp(−D²/4x) is increasing, and their intersection point satisfies the equation x^{−1/2} = exp(−D²/4x). This gives x = Θ(D²/log(D)) and we get the conclusion by substituting this into (2).
In analogy with f(G), define f_p(G) to be the maximum order of a 1-separated set in G(p), i.e. a set of vertices whose expected degrees under G(p) are pairwise separated. The next result shows that a lower bound for f(G) follows from a lower bound for f_p(G).
Theorem 3.2. Given a graph G and a probability vector p ∈ [0.1, 0.9]^{V(G)} with f_p(G) ≥ 2, the following relation holds: Proof. First note that f(G) ≥ 1 for every non-empty graph G, therefore we may assume that L := f_p(G) ≥ C for some absolute constant C. As above, we will write G[S] to denote a random induced subgraph G[S] ∼ G(p). Let U ⊂ V(G) be a 1-separated set in G(p) with U = {u_1, u_2, …, u_L}, where the vertices are ordered by increasing expected degree in G(p).
≥ j − i ≥ 2, and so we can apply Lemma 3.1 to obtain that: where c > 0 is an absolute constant. Here we used that √(log x)/x is decreasing for x ≥ 2. Now consider a random graph H on U_1 = {u_3, u_6, …, u_{3⌊L/3⌋}}, where we place an edge between two vertices if they have the same degree in G[S] ∼ G(p). The expected number of edges in H is given by: It follows by Markov's inequality that P(e(H) ≤ cL log^{3/2}(L)/3) ≥ 2/3.
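The collision-graph step above can be mirrored computationally: vertices are joined in H when their sampled degrees coincide, and an independent set of H is exactly a set of vertices with pairwise distinct degrees. A schematic sketch with made-up degree data (all names ours):

```python
from itertools import combinations

def collision_graph(deg):
    # Join two vertices when they have the same degree in the sample.
    adj = {v: set() for v in deg}
    for u, v in combinations(list(deg), 2):
        if deg[u] == deg[v]:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def distinct_degree_set(deg):
    # Keep one representative per degree value: an independent set of the
    # collision graph, hence a set of vertices with distinct degrees.
    seen, keep = set(), []
    for v, d in deg.items():
        if d not in seen:
            seen.add(d)
            keep.append(v)
    return keep

deg = {"a": 3, "b": 1, "c": 3, "d": 2, "e": 1}   # hypothetical sampled degrees
H = collision_graph(deg)
U = distinct_degree_set(deg)
print(sorted(U), H["a"])  # → ['a', 'b', 'd'] {'c'}
```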

Moving to distributions
The message we get from Theorem 3.2 is that a lower bound on f(G) for any graph G (up to logarithmic factors) follows from a lower bound on max_p f_p(G), where the maximum ranges over probability vectors p ∈ [0.1, 0.9]^{V(G)}. This second quantity can be perceived as a continuous relaxation of f(G), which trivially corresponds to maximizing over {0, 1}^{V(G)}. However, from our point of view the second solution space is considerably richer, and in particular will allow different behaviours to be blended in a way that is not possible with vectors from the discrete cube; for example, one can take convex combinations of vectors in [0, 1]^{V(G)}.
Although we would like to lower bound f(G), this quantity turns out to be just too rigid for certain inductive steps which we want to carry out later. Instead, we introduce a generalised parameter, defined in terms of probability distributions on [0.1, 0.9]^{V(G)}, which turns out to be more robust in this respect.
Let G be a graph and let D be a probability distribution on [0.1, 0.9]^{V(G)}. Given distinct vertices u, v ∈ V(G) and a set S ⊂ V(G), we define: This quantity can be viewed as a small ball probability, a measurement of how likely the expected degrees in S of two vertices u, v ∈ V(G) are to differ in G(p) by an (almost) fixed amount. Given sets U, S ⊂ V(G), we also set: Given another set V ⊂ V(G) we can also write: We will sometimes suppress the superscript when S = V(G), e.g. bad_D(U) = bad^{V(G)}_D(U). Lastly, let us remark that in (3) we do not need D to be defined on all coordinates of [0.1, 0.9]^{V(G)}; it is enough for D to be defined on [0.1, 0.9]^T for any vertex set T with S ⊆ T ⊆ V(G), as can be seen by looking at the right-hand side of (3).
The following lemma shows that a lower bound on f p (G) (and on f (G) by Theorem 3.2) follows by finding a large subset U ⊂ V (G) such that bad D (U ) is bounded by |U |.
Proof. To see this, select p ∼ D and let Y denote the random set of pairs of vertices of U whose degrees in G(p) nearly coincide: Note that: It follows that there is a choice of p ∈ [0.1, 0.9]^{V(G)} such that |Y(p)| ≤ α|U|. Viewing the pairs in Y(p) as the edges of a graph J on the vertex set U, again by Turán's theorem we can find an independent set in this graph of order at least |U|/(1 + α). By definition of J, this gives a lower bound on f_p(G), as required.
From Theorem 3.2, the quantity f(G) is (essentially) lower bounded by f_p(G). To close this subsection, and complete the circle, we show that this also holds in the reverse direction. In particular, up to logarithmic factors the quantities f(G) and max_p f_p(G) are of the same order of magnitude.

Lemma 3.4. Let G be a graph and let
Proof. To see this, let s ∈ {0, 1}^{V(G)} denote the indicator vector of the set S and let 1 denote the all-ones vector.
Select α uniformly at random in [−0.4, 0.4] and consider the random vector: Write D for the resulting probability distribution on [0.1, 0.9]^{V(G)}. Given p, by (1) we get: is uniformly distributed over an interval of length at least 0.8(j − i). By definition (3), this then gives:

Building distributions for distinct degrees
From the previous section, via Theorem 3.2 and Lemma 3.3, we know that in order to find many distinct degrees in a graph G it suffices to find a large set U ⊂ V (G) and a probability distribution D such that bad D (U ) is small. In this section we will collect a number of results together, which will be used in combination to exhibit such distributions D.
From the 'iterated' graph examples discussed in Section 1 we saw that occasionally we must first find distinct degree sets U_i in graphs G[S_i] where the {S_i}_i are disjoint, and then combine these sets together so that ∪_i U_i will have distinct degrees in G[∪_i S_i]. Unfortunately, it is also not hard to see that vertices within U_i can easily agree in degree in the resulting union graph, even if we move from sets U_i to vectors p_i as in Section 3.
While working with fixed sets or vectors can cause difficulties, our first lemma shows that the setting of distributions allows more flexibility here: we can combine distributions while maintaining 'bad' control. This flexibility was the key motivation for working in this more generalised setting (indicated in subsection 3.2).
Proof. To see this, take c ∈ R and define X to be the random variable: It suffices to prove that For p ∼ D, we denote by p_i and q_i its projections onto V_i and W_i respectively; these are mutually independent.
It is easy to see that: Conditioned on any choice of q_i, the corresponding term becomes a constant, and therefore we obtain that: as p_i and q_i are independent. The claim follows.

Our second lemma gives a simple situation in which we can obtain 'bad' control. Let G be a graph and let S ⊂ V(G). Let U_S denote the uniformly constant distribution on [0.1, 0.9]^S, given by selecting α ∈ [0.1, 0.9] uniformly at random and setting p = α1_S ∈ [0.1, 0.9]^S.
Proof. First note that by Lemma 4.1 we have bad D (u, v) ≤ bad S U S (u, v) and so it suffices to upper bound this second quantity.
Taking c ∈ R, we seek to upper bound the probability of the corresponding event. To analyse this, note that: Since p = α1_S, where α is selected uniformly at random from [0.1, 0.9], this gives: As α varies uniformly over the interval [0.1, 0.9], the quantity above varies uniformly over an interval of length at least 0.8D, which gives the required bound on the probability. We next seek to provide 'bad' control for a set by blending neighbourhood structures together. Let G be a graph, let U, S ⊂ V(G), where U := {u_1, …, u_k}, and let β ∈ [0, 0.4]. We now let B_β(U, S) denote the blended probability distribution on [0.1, 0.9]^S, which is defined as follows. First independently select α_i ∈ [−β, β] uniformly at random for each i ∈ [k] and set: Having made these choices, the distribution then returns p, a truncated version of p′, where: Our final lemma in this section provides 'bad' control for blended distributions in certain well-behaved situations. Given D > 0, γ ∈ [0, 1] and sets U and S as above we say that: Then for all u, v ∈ U one has: Proof. Suppose U = {u_1, u_2, …, u_{k+1}} and that for each i ∈ [k + 1], given the vector p′ on R^S from (4), we define the random vector q_i on R^S by: The key observation is that q_i is independent of α_i. We will slightly abuse notation by writing p for both a vector in [0.1, 0.9]^{V(G)} and its projection proj_S(p) onto the coordinate set S. We can do this without much worry since D is a product distribution. By (3), to prove the lemma it will suffice to show that: We say the set Y_{i,j} is naughty if it contains a naughty vertex and we let F_{i,j} denote this event. By the law of total probability we get that: these are uniform independent random variables, as the coordinates u_{i,v} are non-zero when v ∼ u_i. Thus by Hoeffding's Inequality we get: where the final inequality uses the degree bound above. To compute the conditional probability of E_{i,j}(c) given that F_{i,j} fails, we condition on any choice of α := (α_l)_{l≠i} such that F_{i,j} does not hold.
Given such a choice, let us first see that: So none of the Y_{i,j}-coordinates of p′ get truncated; recall also that α_i is independent of F_{i,j}. Given a choice of α, consider now the following expression as a map of α_i: Having conditioned on α above, note that the event E_{i,j}(c) holds only if f(α_i) lies in an interval of length 2. However, as α_i increases, the contribution from each coordinate of p to the inner product on the right-hand side of (6) is non-decreasing. Furthermore, the contribution of each of the Y_{i,j}-coordinates is exactly α_i, since none of these coordinates were truncated from p′, as we have conditioned on F_{i,j} not holding. It follows that for ε > 0: where g_{ε,v} ≥ 0 for all v ∈ V(G) and g_{ε,v} = ε for v ∈ Y_{i,j}. Therefore, conditioned on α as above, if E_{i,j}(c) occurs then α_i lies in an interval of length 4/D. This happens with probability at most 2β^{−1}D^{−1}, and the result in (5) quickly follows from the law of total probability. Before we end this section, we define a simple but convenient distribution. Given a graph G and a set S ⊂ V(G), let T_S denote the trivial probability distribution on S, which is simply the distribution on [0.1, 0.9]^S that selects the constant vector p_0 = (1/2)·1_S with probability 1.

The Narayanan-Tomon conjecture
In this section we will prove Theorem 1.1, our approximate version of the Narayanan-Tomon conjecture. From Theorem 3.2 and Lemma 3.3 it will suffice to prove the following theorem.
Theorem 5.1. Let n ∈ N and k ≥ 1 with n ≥ 20000k², and suppose that G is an n-vertex graph with hom(G) ≤ n/25k. Then there is a set U ⊂ V(G) and a probability distribution D on [0.1, 0.9]^{V(G)} such that: The proof will split into two regimes. The first deals with the case where n = Ω(k^{5/2}) and the more difficult second case focuses on the regime k² ≤ n = O(k^{5/2}).
To begin, we first present a quick application of Lemma 4.3 that guarantees 'bad' control for a set which is Ω(k 3/2 )-diverse.
Lemma 5.2. Let G be an n-vertex graph and suppose that there is a set Therefore U is both (k^{3/2} + k)-diverse and 1-balanced to S. We let D := B_β(U, S), where we set β^{−1} := 56(k + 1) log(k + 1), and apply Lemma 4.3 to obtain for all i ≠ j that: As the map f : [1, ∞) → R given by f(x) := 16 log²(x + 1) − 4√14 · log^{1/2}(x + 1) − 1 is increasing and positive at 1, we can now easily deduce that for all i ≠ j one has: By summing over all i ≠ j in [k + 1] we finally deduce that: as required.

The case when n = Ω(k^{5/2})
The next result controls 'bad' under the assumption that G has bounded maximum degree.
Proof. Set k = ⌈x⌉, noting that x ≤ k < 2x. We will first select U = {u 1 , . . . , u k+1 } step by step over a series of rounds. To do so, we are going to select a 'control' set Y i for each u i ∈ U , so that u i is strongly joined to Y i , but any u j ̸ = u i in U with j > i is quite weakly joined to Y i . This property will allow us to separate the expected degrees of vertices in U and build the distribution D.
We inductively construct vertex sets U_i = {u_1, u_2, …, u_i}, V_i and Y_i for i ∈ [k] so that: To begin, we set U_0 = Y_0 = ∅ and V_0 := V(G). Suppose now i ∈ [k] and that we have found U_{i−1}, V_{i−1} and {Y_j}_{j<i} as above and wish to find these sets for i. We look at G_i := G[V_{i−1}] and see that it must have a vertex u_i with d_{G_i}(u_i) ≥ 2k. If not, then ∆(G_i) ≤ 2k − 1 and so by Turán we obtain an independent set in G_i of size at least |V_{i−1}|/2k ≥ (n − 5(i − 1)∆(G))/(2k) > (n − 5x∆(G))/(4x) ≥ n/5x, contradicting the hom(G) condition from the hypothesis. We now let U_i := U_{i−1} ∪ {u_i} and we pick a subset Y_i ⊂ N_{G_i}(u_i) of size 2k. We then define the set Observe that by construction (i)-(iv) hold above, and it just remains to show (v).
We bound |Z i | by double counting the number of edges between Z i and Y i . From each z ∈ Z i there are at least k/2 edges going to Y i , hence e(Y i , Z i ) ≥ k|Z i |/2. However, from each y ∈ Y i there are at most ∆(G) edges going to Z i , thus e(Y i , Z i ) ≤ 2k∆(G). It follows that |Z i | ≤ 4∆(G) and so |Z j ∪ Y j | ≤ |Z j | + 2k ≤ 4∆(G) + 2k ≤ 5∆(G), as required.
To complete the proof of the lemma, we set i := k and take u_{k+1} ∈ V_k, which is non-empty. By using (i)-(iv) above we get disjoint sets U = U_k ∪ {u_{k+1}} = {u_1, …, u_{k+1}} and {Y_j}_{j∈[k]} such that: For each i ∈ [k] we let D_i denote the uniformly constant distribution on Y_i, i.e. D_i := U_{Y_i}. Taking Y_0 := V(G) \ (∪_i Y_i), we also let D_0 := T_{Y_0} denote the trivial distribution on Y_0 (as defined at the end of Section 4). Lastly, we take D to be the product distribution D := ∏_{i∈[0,k]} D_i on [0.1, 0.9]^{V(G)}. Note that from Lemma 4.1, equation (7) and Lemma 4.2, for all i < j we obtain that: It follows that bad_D(U) ≤ $\binom{k+1}{2}$ · (2/k) = k + 1 = |U|, as desired.
We are now in a position to prove Theorem 5.1 for n = Ω(k 5/2 ).
Theorem 5.4. Let n ∈ N and x ≥ 1 with n ≥ 1000x^{5/2}. Suppose that G is an n-vertex graph with hom(G) ≤ n/20x. Then there is a probability distribution D on [0.1, 0.9]^{V(G)} and a vertex set Proof. Let k := ⌈x⌉. We will prove the theorem by induction on |V(G)|. To start with, observe that there is nothing to prove when k ≤ 4, as we can set D to be any distribution on [0.1, 0.9]^{V(G)} and the requirements are trivially satisfied by any (k + 1)-vertex set U, since bad_D(u, v) ≤ 1 for any pair u, v of vertices; such a set U exists as k + 1 ≤ 1000x^{3/2}. In particular, this proves that the theorem holds in the smallest possible case, when n = 1000 (where x must equal 1). We now proceed with the induction step and assume that k > 4.
Let V_0 be a largest vertex set of G such that |div(u, v)| ≥ 2k^{3/2} for all distinct u, v ∈ V_0. If |V_0| ≥ k + 1 then we are done by Lemma 5.2; otherwise assume that V_0 = {v_1, v_2, …, v_L} for some L ≤ k, and for each i ∈ [L] define the set The proof splits into two cases: It is easy to see that there are at most 3k² vertices of G that do not lie in a set V_j of size at least 3k. Moreover, Case II: We pick a subset V of V_j of size 3k such that v_j ∈ V. Next, we set X_1 := N(v_j) \ V and X_2 := V(G) \ (V ∪ N(v_j)). By the choice of v_j note that both |X_1|, |X_2| ≥ 10k^{3/2} − 3k. Our aim is to show that most vertices in X_1 have big degree in V, whereas most vertices in X_2 have small degree in V. This will allow us to separate the distinct degrees we get in G[X_1] from those we get in G[X_2].
To do this, we double count the non-edges in G between X_1 and V. Recall X_1 ⊂ N(v_j) and for each v ∈ V we have |div(v, v_j)| ≤ 2k^{3/2}, so each v ∈ V gives at most 2k^{3/2} non-edges from itself to X_1. Hence the number of non-edges between X_1 and V is at most (3k)(2k^{3/2}) = 6k^{5/2}. It follows that there are at most 6k^{3/2} vertices of X_1 that are connected to fewer than 2k vertices in V. Thus, if we let Y_1 := {u ∈ X_1 : d^V_G(u) ≥ 2k}, then Similarly, we double count the edges in G between X_2 and V to see that there are at most 6k^{5/2} of them. It follows that there are at most 6k^{3/2} vertices of X_2 that are connected to more than k vertices in V. Therefore, if we let Y_2 := {u ∈ X_2 : d^V_G(u) ≤ k}, then we can also see that To complete the proof, we apply the induction hypothesis to both Y_1 and Y_2. For i ∈ {1, 2} let Thus, for i ∈ {1, 2}, provided x_i ≥ 1 holds, we can apply the induction hypothesis to G[Y_i] to find a probability distribution D_i on [0.1, 0.9]^{Y_i} and a set U_i ⊂ Y_i satisfying |U_i| ≥ x_i + 1 and: Also note that if instead x_i < 1 above, then as |Y_i| = t_i ≥ 10, we can take any set U_i ⊂ Y_i of order ⌈x_i⌉ + 1 = 2 and any distribution D_i on [0.1, 0.9]^{Y_i}, so (8) holds in all cases.
We can also assume that max{|U 1 |, |U 2 |} < k + 1, as otherwise taking U to simply be one of these sets proves the theorem.
and v ∈ Y_2, by definition of Y_1 and Y_2. It then follows from Lemma 4.2 that for all such vertices we have: As n ≥ 1000x^{5/2} and |Y_3| ≤ 15k^{3/2}, we can now lower bound the size of U: which gives |U| ≥ x + 1. Finally, we are able to estimate bad_D(U) as follows: The final three inequalities here respectively follow from Lemma 4.1, then from (8) and (9), and lastly from max{|U_1|, |U_2|} < k + 1 and Lemma 2.5. This completes the proof.

The case when n = O(k^{5/2})
Before we move to the case when n = O(k^{5/2}), we present two results which will allow us to pass to a large induced subgraph which is reasonably regular. Comparable results, with a different range of parameters, were proved by Alon, Krivelevich and Sudakov in [3] (Section 2). The next lemmas follow their approach.
Lemma 5.5. Every n-vertex graph G contains an induced subgraph H of order at least n/3 such that ∆(H) ≤ 2 log 2 n · d(H).
Proof. We set G 0 := G and for i = 0 to i = log 2 n we repeat the following algorithm: first set n i := |V (G i )|, ∆ i := ∆(G i ) and d i := d(G i ). Then, if ∆ i ≤ 2d i log 2 n we simply stop the process. Otherwise we repeatedly delete from G i all vertices of degree at least d i log 2 n to create a new graph G i+1 . Let H be the graph we obtain after we complete the algorithm.
If H was created because at some point ∆_i ≤ 2d_i log_2 n then we are done. Otherwise H was obtained after log_2 n iterations, and at each step i we have ∆_{i+1} ≤ d_i log_2 n and 2d_i log_2 n ≤ ∆_i. Thus we see that ∆_{i+1} ≤ ∆_i/2. It follows inductively that ∆(H) ≤ ∆(G) · 2^{−log_2 n} < n · n^{−1} = 1. We then get that ∆(H) = d(H) = 0, which also completes the proof.
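The deletion process of Lemma 5.5 can be rendered directly in code. The sketch below is our own illustrative implementation (the function name and test graph are ours); by the argument above, the returned graph always satisfies the stated degree condition.

```python
from math import log2

def regularise(adj):
    # Lemma 5.5's process: while the maximum degree exceeds
    # 2*log2(n)*d(H), delete every vertex of degree >= d(H)*log2(n).
    n = len(adj)
    L = log2(n)
    H = {v: set(adj[v]) for v in adj}
    for _ in range(int(L) + 2):
        if not H:
            break
        d = sum(len(N) for N in H.values()) / len(H)   # average degree
        if max(len(N) for N in H.values()) <= 2 * d * L:
            break
        drop = {v for v in H if len(H[v]) >= d * L}
        for v in drop:
            H.pop(v)
        for v in H:
            H[v] -= drop
    return H

# A 64-vertex star: the centre is deleted in one round, and the 63
# isolated leaves satisfy the degree condition trivially.
star = {0: set(range(1, 64))}
star.update({v: {0} for v in range(1, 64)})
H = regularise(star)
print(len(H), 0 in H)  # → 63 False
```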
Lemma 5.6. Every n-vertex graph G contains an induced subgraph H that is of order at least n/30 log 2 n with ∆(H) ≤ 5 log 2 n · δ(H).
Proof. By the previous lemma we can find an induced subgraph G_0 of G of order m ≥ n/3 such that ∆(G_0) ≤ 2 log_2 n · d(G_0). We now perform the following algorithm: starting with i = 0, let d_i := d(G_i) and delete a vertex v of G_i if 5d_{G_i}(v) < 2d_0; let G_{i+1} be the resulting graph and increment i. Note that at each step we remove from G_i at most 2d_i/5 edges per deleted vertex, which implies that (d_i)_{i≥0} is an increasing sequence. Therefore we stop before deleting all the vertices, and we let H be the resulting graph.
We are interested in finding sets that have many diverse pairs of vertices, as they will give us the freedom required to select vertices with distinct degrees. We thus make the following definition. Definition 5.7. Given a graph G and ε > 0, its diversity graph J_ε(G) is the graph on V(G) with an edge between vertices u and The following theorem is the main component of our proof in this case. We note that our earlier results from Subsection 5.1 will be crucial here.
Proof. We first note that if k is small then there is nothing to prove, so we can assume k > 2^40. Moreover, together with the hypothesis this gives: 20 + 4 log_2 k ≤ 2 log_2 n ≤ 20 + 5 log_2 k ≤ 5.5 log_2 k ≤ k/100. (10)
Next, we apply Lemma 5.6 to find an induced subgraph H of G of order m ≥ n/(30 log_2 n) with ∆(H) ≤ 5 log_2 n · δ(H). From now on we will only work with this subgraph H. Notice that ∆(H) ≥ k/(10 log_2 n), as otherwise by Turán, combined with (10), we find an independent set in G of order at least m/(∆ + 1) ≥ n/(3k + 30 log_2 n) > n/4k, contradicting the hypothesis.
Our proof splits according to the sizes of $S_1$ and $S_2$.
We will show that in this scenario we can take the desired set $U \subset S_1$. We select a set $W \subset S_1$ by including every element of $S_1$ independently with probability $p := 8k/|S_1|$.
We now claim that each of the following events holds with probability at least $3/4$:

Combining the above bounds gives $\mathbb P(A_{\mathrm i}) + \mathbb P(A_{\mathrm{ii}}) + \mathbb P(A_{\mathrm{iii}}) \le 3/4$. Therefore, by the union bound, we can choose a set $W \subset S_1$ that satisfies all of the conditions (i)-(iii).
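The union-bound step here is the standard one; writing $A_{\mathrm i}, A_{\mathrm{ii}}, A_{\mathrm{iii}}$ for the failure events of the three conditions, our verification reads:

```latex
\[
  \mathbb P\bigl(A_{\mathrm i} \cup A_{\mathrm{ii}} \cup A_{\mathrm{iii}}\bigr)
  \;\le\; \mathbb P(A_{\mathrm i}) + \mathbb P(A_{\mathrm{ii}}) + \mathbb P(A_{\mathrm{iii}})
  \;\le\; \tfrac34 \;<\; 1,
\]
% so with positive probability none of the failure events occurs, and some
% realisation W of the random set satisfies (i)-(iii) simultaneously.
```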
To complete the proof in this case, we choose a subset $U \subset U_0$ of size $10^{-3} k \log_2^{-2} k$. By summing over all pairs of distinct vertices in $U$, it immediately follows, as required, that:

Our first step here is to find a set $W \subset S_2$ and, for each vertex $w \in W$, two sets $S_w, T_w \subset V(H)$ with the following properties:

for all $w \in W$. With these sets in hand, our set $U$ will (roughly) be of the form $U = \bigcup_{w \in W} U_w$, where each $U_w$ is a set produced by applying Theorem 5.4 to $S_w$, while the sets $T_w$ will be used to reduce 'bad' control between vertices in distinct $U_w$.
As the diagram suggests, our partition is guided by the neighbourhoods of vertices in the set $W = \{w_i\}_i$. The high '$J$-degree' of vertices in $S_2$ guarantees a strong clustering behaviour, so that each vertex $w_i$ has a large set $T_{w_i}$ of vertices which behave very similarly. These sets can be used to obtain 'bad' control between vertices in distinct $S_{w_i}$.
We now proceed with the details. To begin, select a set $W_0 \subset S_2$ by including each element independently with probability $p := 1/(8\Delta)$, where $\Delta := \Delta(H)$. For each $w \in W_0$ we set:

We then let $W \subset W_0$ be the set $W := \{w \in W_0 : |S_w| \ge |N_H(w)|/2\}$. Lastly, each $w \in W$ is also an element of $S_2$, by definition, so we have $d_J(w) \ge m/(600k) > t$. We take $T_w$ to be an arbitrary subset of $N_J(w)$ of size $t$.
Having specified the sets, it remains to show that properties (i)-(iv) hold for our choices with positive probability. To see this, note that (ii) holds by the definition of $S_w$ and $W$. Property (iii) also always holds, as

which by the definition of $S_w$ implies $w = w'$. Lastly, (iv) holds immediately by construction.
It only remains to prove that (i) holds with positive probability. To see this, note that given $w \in S_2$ and $v \in N(w)$ we have:

Thus $\mathbb P(v \notin S_w \mid w \in W_0) \le 1 - e^{-1/4} \le 1/4$, and so $\mathbb E\bigl[|N(w) \setminus S_w| \,\big|\, w \in W_0\bigr] \le |N(w)|/4$. It follows from Markov's inequality that:

We can now further deduce that:

Thus we can fix a choice of $W$ so that (i), and hence all of (i)-(iv), are satisfied.
Our current aim is to find distinct expected degrees in each subgraph $G[S_w]$ with $w \in W$ by appealing to Theorem 5.4, and to use the control sets $\{T_w\}_{w \in W}$ to control the degrees between the different sets, so that we can find our required set $U$ inside $\bigcup_{w \in W} S_w$.
To proceed with this, first observe that the sets $\{S_w\}_{w \in W}$ are pairwise disjoint, since for distinct $w, w' \in W$ we have $S_w \cap S_{w'} \subset S_w \cap N_H(w') = \emptyset$ by (ii) and (iii).
Next, notice that the sets $\{T_w\}_{w \in W}$ are also pairwise disjoint. Indeed, suppose there is some $v \in T_{w_1} \cap T_{w_2}$ for distinct $w_1, w_2 \in W$, and assume $|N(w_1)| \le |N(w_2)|$. Let $S_v := S_{w_2} \cap N(v)$ and $\overline S_v := S_{w_2} \setminus N(v)$. As $v \sim w_1$ in $J$, $N_H(w_1) \cap S_{w_2} = \emptyset$ and $S_{w_2} \subset N_H(w_2)$, we immediately see that

We want to ensure that vertices of $S_w$ have high degree in $T_w$, whereas their degree in $T_{w'}$ with $w' \ne w$ is low. Given $w \in W$ we define:

If we count the non-edges between $S_w$ and $T_w$, we see there are at least $(t/3)|L_w|$ of them, whereas there are at most $t \cdot \varepsilon|N(w)|$, since each vertex of $T_w$ is connected to $w$ in $J$. It follows that $|L_w| \le 3\varepsilon|N(w)| \le 3\varepsilon \cdot 2|S_w| \le |S_w|/8$, using (ii) above and that $\varepsilon = 1/48$. Similarly, by double counting the edges between $\bigcup_{w' \ne w} S_{w'}$ and $T_w$, we obtain $|R_w| \le |S_w|/8$.
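The double count bounding $|L_w|$ can be written in one line (our verification):

```latex
\[
  \frac{t}{3}\,|L_w|
  \;\le\; \#\{\text{non-edges between } S_w \text{ and } T_w\}
  \;\le\; t\,\varepsilon|N(w)|
  \;\Longrightarrow\;
  |L_w| \;\le\; 3\varepsilon|N(w)| \;\le\; 6\varepsilon|S_w| \;=\; \frac{|S_w|}{8},
\]
% using |N(w)| \le 2|S_w| from (ii) and \varepsilon = 1/48.
```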
Discarding elements if necessary, we may assume that $|S'_w| > 1$ for all $w \in W$. As the sets $\{S_w\}_{w \in W}$ are pairwise disjoint, the same holds for $\{S'_w\}_{w \in W}$. From our bounds above we find that:

The second inequality here comes from $|L_w|, |R_w| \le |S_w|/8$, whereas the third one uses that:

The final inequality above comes from (ii). Continuing with the previous expression, using that $\delta(H) \ge \Delta/(5\log_2 n)$ and that, by property (i), $|W| \ge |S_2|/(16\Delta) \ge m/(32\Delta)$, we obtain:

We are now in a good position to find the desired set $U$. To do this, our main aim is to apply Theorem 5.4 to each graph $G[S'_w]$. With this in mind, for each $w \in W$ let $k_w := |S'_w| k/n$, and note that $|S'_w|/k_w = n/k$. Also recall that $|S'_w| \le |N_H(w)| \le \Delta \le 4nk^{-1/3}$, hence:

We also have $\hom(G[S'_w]) \le \hom(G) \le n/(12k) = |S'_w|/(12k_w)$. Thus for each $w \in W$, provided $k_w \ge 1$, we can apply Theorem 5.4 to $G[S'_w]$ to obtain a set $U_w \subset S'_w$ with $|U_w| \ge k_w + 1$ and a probability distribution $\mathcal D_w$ on $[0.1, 0.9]^{S'_w}$ such that:

As in the proof of Theorem 5.4, if $k_w \le 1$ then any set $U_w \subset S'_w$ of size $2 \ge k_w + 1$ trivially satisfies (12), so the above computations all make sense.
We now set $U := \bigcup_{w \in W} U_w$ and $S' := \bigcup_{w \in W} S'_w$. Our distribution $\mathcal D$ will again be a product distribution, with the $\mathcal D_w$ as its factors. For each $w \in W$ we also take $\mathcal E_w$ to denote the uniform distribution on $T_w$, given by $\mathcal E_w := \mathcal U_{T_w}$, and set $T := \bigcup_{w \in W} T_w$. We note that given distinct $w, w' \in W$ and $u \in U_w$, $u' \in U_{w'}$ we have $d_H^{T_w}(u) \ge 2t/3 \ge d_H^{T_w}(u') + t/3$. Therefore, by the choice of $\mathcal E_w$ and from Lemma 4.2 we find that:

We also set $R := V(G) \setminus (S' \cup T)$ and let $\mathcal F$ denote the trivial distribution on $R$, i.e.
To complete the proof, we are only left to lower bound $|U|$ and upper bound $\mathrm{bad}_{\mathcal D}(U)$. For the lower bound, using (11) and that $\log_2 n \le 2\sqrt 2 \log_2 k$ from (10), we obtain:

For the upper bound on $\mathrm{bad}_{\mathcal D}(U)$, we have:

The first inequality here follows from Lemma 4.1, the second is immediate from the definition of $\mathrm{bad}^S_{\mathcal D}(U, V)$, whereas the third one holds by (12) and (13). Choose a smallest subset $W' := \{w_1, \ldots, w_M\}$ of $W$ so that $|\bigcup_{w \in W'} U_w| \ge t/9$. If $W' = \{w'\}$ for some $w' \in W$ then we are done by simply taking

Otherwise we can assume that the sequence $U_i := U_{w_i}$ is non-increasing in size with $i$, i.e. that $|U_1| \ge |U_2| \ge \cdots \ge |U_M|$. Setting $U_{<i} := \bigcup_{j<i} U_j$, we immediately see from our choice of $W'$ that $|U_{<i}| \le t/9$. Our bound on $\mathrm{bad}_{\mathcal D}(U)$ from above thus gives:

For each $i \ge 2$ we have $|U_i| \le |U_{<i}|$ from the ordering, and so by applying Lemma 2.5 we get $|U_{<i}| \cdot f(|U_{<i}|) + |U_i| \le |U_{<i+1}| \cdot f(|U_{<i+1}|)$. Repeatedly applying this as $i$ increases gives us $\mathrm{bad}_{\mathcal D}(U) \le |U| \cdot f(|U|)$, noting that $U_{<M+1} = U$. This completes the proof.
Let us remark, by combining the two cases in the proof above, that $|U| \ge 2^{-25} k \log_2^{-2}(k+1)$. We are finally able to prove Theorem 5.1. The proof is very similar to that of Theorem 5.4 but, as the details are involved, for completeness we will go through it with care.
Proof of Theorem 5.1. We will prove a slightly more convenient statement, namely that given the hypothesis there is a set $U \subset V(G)$ and a distribution $\mathcal D$ on $[0.1, 0.9]^{V(G)}$ such that $|U| \ge 2^{-26}(k + k^{3/4})\log_2^{-2}(k+1)$ and $\mathrm{bad}_{\mathcal D}(U) \le 8|U|\log_2|U|$. We will prove this by induction on $|V(G)|$. Note that the theorem trivially holds in the first case where the hypothesis applies, when $n = 20000$ and $k = 1$ (taking $U$ to be any set of size $1$ and $\mathcal D$ the trivial distribution). Also, as in Theorem 5.8, when $k$ is small there is nothing to prove, so we can assume $k \ge 2^{25}$.
The proof splits into the two already familiar cases. We have seen that there are at most $3k^2$ vertices of $G$ that do not lie in a set $V_j$ of size at least $3k$. Moreover, . By the pigeonhole principle we then find a set $V \subset V(G)$ with $|V| \ge (n - 3k^2)/2 \ge 12n/25$ and $\Delta(G[V]) \le nk^{-1/3} + 2k^{3/2} < 4|V|k^{-1/3}$; otherwise we look at $\overline G$. Moreover, $|V| \ge 8000k^2$ and $\hom(G[V]) \le \hom(G) \le n/(25k) \le |V|/(12k)$, hence we can apply Theorem 5.8 to $G[V]$ to obtain a distribution $\mathcal D_1$ on $[0.1, 0.9]^V$ and a vertex set $U \subset V$ with $\mathrm{bad}_{\mathcal D}(U) \le 8|U|\log_2|U|$ and $|U| \ge 2^{-25} k \log_2^{-2}(k+1) > 2^{-26}(k + k^{3/4})\log_2^{-2}(k+1)$. We also take $\mathcal D_0 := \mathcal T_{V(G)\setminus V}$ to be the trivial distribution on the set $V(G) \setminus V$, and let $\mathcal D$ :

As in Theorem 5.4, pick a subset $V$ of $V_j$ of size $3k$ such that $v_j \in V$, then set $X_1 := N(v_j) \setminus V$ and $X_2 := V(G) \setminus (V \cup N(v_j))$. We note that both $|X_1|, |X_2| \ge nk^{-1/3} - 3k$. The same double counting argument from Theorem 5.4 works here to give us the sets

To complete the proof we will apply the induction hypothesis to both $Y_1$ and $Y_2$. Let $t_i := |Y_i|$ for $i \in \{1, 2\}$ and set $k_i := k(t_i/n) \le k$. This gives us $\hom$

Therefore, for $i = 1, 2$ we can apply the induction hypothesis to $G[Y_i]$ to find a probability distribution $\mathcal D_i$ on $[0.1, 0.9]^{Y_i}$ and a set $U_i \subset Y_i$ which satisfies $|U_i| \ge 2^{-26}(k_i + k_i^{3/4})\log_2^{-2}(k_i + 1)$ and:

Let us remark that if $k_i < 2^{33}$ then any set $U_i \subset Y_i$ of size $2 \ge 2^{-25} k_i \log_2^{-2}(k_i + 1)$ trivially satisfies (14), as already noted several times before, so the above computations all make sense.
We can also assume that $\max\{|U_1|, |U_2|\} \le k$, as otherwise the theorem follows immediately by just taking $U$ to equal one of these sets.

Distinct degrees in random graphs
In this section we will study $f(G(n,p))$, the number of distinct degrees which can be found in an induced subgraph of the Erdős-Rényi random graph $G(n,p)$. Our results extend the estimates for the case of constant $p$ due to Bukh and Sudakov [9] and to Conlon, Morris, Samotij and Saxton [10]. We restate Theorem 1.3 for the reader's convenience.

(i) $f(G(n,p)) = \Theta\bigl(\sqrt[3]{pn^2}\bigr)$ for $p \in [n^{-1/2}, 1/2]$;
(ii) $f(G(n,p)) = \Theta\bigl(\Delta(G(n,p))\bigr)$ for $p \le n^{-1/2}$.
Although the estimation of $f(G(n,p))$ is quite natural in itself, we believe, as discussed in the concluding remarks, that the behaviour for $p \in [n^{-1/2}, 1/2]$ essentially determines the extremal relationship between $\hom(G)$ and $f(G)$ beyond the range of the Narayanan-Tomon conjecture, when $\hom(G) < n^{1/2}$. As a result, our calculations will focus on case (i) of Theorem 1.3. The next subsection contains the proof of the upper bound on $f(G(n,p))$ in this case, whereas the second subsection contains the more difficult lower bound. In the final subsection we briefly indicate how to approach the case $p \le n^{-1/2}$.

Upper bound on $f(G(n,p))$
In this subsection we prove the upper bound on $f(G(n,p))$. Our approach closely follows that of Bukh and Sudakov (see Proposition 2.4 in [9]), but we include the complete details, as the estimates are more involved in the sparse case.

Proposition 6.1. Given $n \in \mathbb N$ and $p \in [n^{-1/2}, 1/2]$, one has $f(G(n,p)) = O\bigl(\sqrt[3]{pn^2}\bigr)$ whp.
Proof. Suppose $G \sim G(n,p)$ has a subset $A \subset V(G)$ of size $a$ such that $G[A]$ has $8b$ distinct degrees, where $b = 16\sqrt[3]{pn^2}$. As at most $6b - 1$ of our distinct degrees can lie in the interval $(pa - 3b, pa + 3b)$, either there are at least $b$ vertices of $A$ with degree at least $pa + 3b$, or at least $b$ vertices with degree at most $pa - 3b$.
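The counting behind this dichotomy is simple; spelling it out (our verification): since the degrees are distinct, the interval $(pa - 3b, pa + 3b)$ contains at most $6b - 1$ of them, so at least

```latex
\[
  8b - (6b - 1) \;=\; 2b + 1
\]
```

of the $8b$ degrees lie outside it, and by pigeonhole at least $b + 1 \ge b$ of these lie on the same side of the interval. Picking one vertex per degree then gives the claimed $b$ vertices.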
We will assume first that we are in the former case, as this is the more intricate one. Let

Letting $F$ denote the event that there are sets $A$ and $B$ which satisfy (17), it suffices to show that $\mathbb P(F) = o(1)$. To see this, first suppose that $16p(a - b) \ge b$,

where the final inequality uses that $a - b \le a \le n$. Therefore, the union bound implies that the event $F$ can happen with probability at most:

This tends to zero as $n \to \infty$, as $b \ge 16\sqrt[3]{pn^2}$. Now suppose instead that $16p(a - b) < b$. As $e(A \setminus B, B)$ has a binomial distribution, we have:

Recalling that $p \ge n^{-1/2}$ and that $b \ge 16\sqrt[3]{pn^2} \ge 16\sqrt[3]{n^{-1/2} n^2} \ge 16\sqrt n$, using the union bound we find that the event $F$ occurs with probability at most:

Hence, it follows again that $\mathbb P(F) = o(1)$ in this second case.

$\le 1$ to get:

where the last inequality follows as $a - b \le a \le n$. We have seen before that the union bound gives probability at most $2^n \cdot n^b \cdot \exp(-b^3/2pn)$ for such a set $B$ to exist, and we have shown in the previous case that this probability tends to $0$ as $n \to \infty$.

Lower bound on $f(G(n,p))$
We now focus on proving our sharp lower bound for f (G(n, p)). Before we start, we will present a few results that will help us along the way.
Given $D > 0$ and a graph $G$, we call a set

Proposition 6.2. If $p \gg \log n/n$ then all vertices of $G(n,p)$ have degree of order $np$ whp. In particular, whp they all have degree less than $2np$.
Proof. Let $u$ be a vertex of $G(n,p)$. Then $d_{G(n,p)}(u) \sim \mathrm{Bin}(n-1, p)$, so we can apply Chernoff's inequality with $\delta = 3\sqrt{\log n/(np)}$ to get $\mathbb P\bigl(|d(u) - np| \ge 3\sqrt{np \log n}\bigr) \le 2n^{-9/2}$. The result now follows by the union bound. The last part is a consequence of the fact that $\delta \le 1$.
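This degree concentration is easy to observe empirically. The sketch below is our own illustration, not part of the paper's argument; the parameters `n`, `p` and the 200-sample experiment are arbitrary choices satisfying $np \gg \log n$:

```python
import math
import random

random.seed(7)
n, p = 3000, 0.2  # illustrative parameters (our choice)

# The degree of a fixed vertex of G(n, p) is distributed as Bin(n - 1, p).
def sample_degree():
    return sum(random.random() < p for _ in range(n - 1))

# Measure deviations of 200 sampled degrees from the mean np.
devs = [abs(sample_degree() - n * p) for _ in range(200)]
bound = 3 * math.sqrt(n * p * math.log(n))  # the deviation 3*sqrt(np log n)

# Chernoff predicts that essentially no sample exceeds the bound.
print(max(devs) < bound)
```

With these parameters the bound $3\sqrt{np\log n} \approx 208$ is roughly nine standard deviations, so all samples fall comfortably inside it.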
Proof. Let us first notice that $|\mathrm{div}(u,v)| \sim \mathrm{Bin}(n-2, q)$ for all distinct $u, v \in V(G)$, where $q := 2p(1-p)$. This holds as every vertex of $\mathrm{div}(u,v)$ has to be in $N(u) \setminus N(v)$ or in $N(v) \setminus N(u)$, and these two possibilities are disjoint events, each of which happens with probability $p(1-p)$.
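Spelling out the two disjoint events for a fixed third vertex $x \notin \{u, v\}$ (our expansion):

```latex
\[
  \mathbb P\bigl(x \in \mathrm{div}(u,v)\bigr)
  \;=\; \mathbb P(x \sim u,\; x \not\sim v) + \mathbb P(x \not\sim u,\; x \sim v)
  \;=\; p(1-p) + (1-p)p \;=\; 2p(1-p) \;=\; q,
\]
% independently over the n - 2 choices of x, whence
% |div(u, v)| ~ Bin(n - 2, q).
```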
Note that $q \ge p \gg \log n/n$, and so we can apply Chernoff's inequality with $\delta = 3\sqrt{\log n/(nq)}$ to obtain $\mathbb P\bigl(|\mathrm{div}(u,v)| \le nq - 3\sqrt{nq \log n}\bigr) \le n^{-9/2}$. By the union bound, we therefore deduce that $|\mathrm{div}(u,v)| > nq(1 - \delta)$ for all $u, v \in V(G)$ whp. Since $q \ge p$ and $\delta \to 0$, we get that $G(n,p)$ is $p(n-1)$-diverse whp in this case.

Lemma 6.4. Let $n \in \mathbb N$ and $p \in [n^{-1/2}, 1/2]$, suppose $G \sim G(n,p)$, and let $V(G) := U \sqcup S$ be a vertex partition such that $\sqrt n \le 4|U| \le pn$. Then, with high probability, there is a subset $W \subset S$ such that $U$ is $pn/3$-diverse to $W$ and $d^U_G(w) \le 10p|U|$ for all $w \in W$.
Definition 6.5. Let $G$ be an $n$-vertex graph and let $0 < p \le 1/2$. We call a set $U$ of vertices of

We now expose the randomness in $G(n,p)$ and obtain a fixed graph $G$. According to Proposition 6.2 and Lemma 6.4, we may assume that $G$ contains a $p$-convenient set $U$ of size $\sqrt[3]{pn^2}/4$. At this point, the reader may already have noticed that the $p$-convenient conditions fit very well with those of Lemma 4.3. Indeed, the set $U$ is $pn/3$-diverse to $W$ and $10p$-balanced, so the hypothesis of the lemma is satisfied. The most natural thing to do now would be to apply the lemma with the blended distribution $\mathcal B_\beta(U, W)$. However, in order to obtain a set of $\Theta(|U|)$ degrees, we would like the first term on the RHS of (5) to be of order $|U|^{-1}$, which forces $\beta := \Theta(|U|/pn)$. This would make the second term on the RHS of (5) a constant, so it seems that we cannot get the desired 'bad' control. One can obtain weaker bounds on $f(G(n,p))$ by altering the parameters here, but there is an unavoidable loss as things stand.
There is, though, a way around this issue. In Lemma 4.3 we solved the problem of coordinates lying outside $[0.1, 0.9]$ by dealing with each pair of vertices $\{u_1, u_2\} \subset U$ individually. However, in certain situations it is possible to show that many vertices $u_1 \in U$ are simultaneously good for all pairs $\{u_1, u_2\} \subset U$. The crucial twist here is that the diversity term $D := pn/3$ satisfies $D = \Omega(\Delta(G))$. This allows us to guarantee that a fixed vertex $u_1 \in U$ is likely to have no neighbours in $\mathrm{div}(u_1, u_2)$ whose coordinates are 'outliers', simultaneously for all $u_2 \in U$. The approach here builds upon that of Jenssen, Keevash, Long and Yepremyan [19].
A slight change in our notation will be convenient below. Given a graph $G$ with vertex partition $V(G) = U \sqcup W$ and a probability vector $\mathbf p = (p_w)_{w \in W} \in [0,1]^W$, we write $G(\mathbf p)$ to denote the probability space on the set of induced subgraphs of $G$ that contain $U$, where for each vertex set $S \subset W$, the induced subgraph $G[U \cup S]$ is selected with probability $\prod_{v \in S} p_v \prod_{v \in W \setminus S}(1 - p_v)$.

Proposition 6.6. Let $n \in \mathbb N$, $p \in [n^{-1/2}, 1/2]$ and let $G$ be an $n$-vertex graph with a $p$-convenient set $U \subset V(G)$ of size $\sqrt[3]{pn^2}/4$. Then there is a vector $\mathbf p \in [0.1, 0.9]^{V(G)\setminus U}$ and a set $U' \subset U$ with $|U'| \ge |U|/500$ so that $\mathbb E[d_{G(\mathbf p)}(u_1)] - \mathbb E[d_{G(\mathbf p)}(u_2)] \ge 1$ for all distinct $u_1, u_2 \in U'$.
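The model $G(\mathbf p)$ is straightforward to sample. The following sketch is our own illustration on a hypothetical toy graph (the 5-cycle, the partition and the probabilities are invented for the example, not taken from the paper):

```python
import random

random.seed(0)

# Sample from G(p): keep all of U, keep each w in W independently with
# probability prob[w], and return the induced subgraph on the kept vertices.
def sample_G_of_p(adj, U, W, prob):
    """adj: dict vertex -> set of neighbours; prob: dict w -> p_w."""
    kept = set(U) | {w for w in W if random.random() < prob[w]}
    return {v: adj[v] & kept for v in kept}

# Toy host graph: a 5-cycle with U = {0, 1} and W = {2, 3, 4}.
adj = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
H = sample_G_of_p(adj, U=[0, 1], W=[2, 3, 4], prob={2: 0.5, 3: 0.5, 4: 0.5})
print(sorted(H))  # the vertices of U are always retained
```

Note that, as in the definition, the set $U$ appears in every outcome; only the $W$-side is random.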
Proof. We may assume that $n$ is large enough that all asymptotic bounds hold. First set $\beta := |U|/(5pn) < 0.1$, and let $S \subset V(G) \setminus U$ be such that $U$ is $pn/3$-diverse to $S$ and $d^U_G(v) \le 10p|U|$ for all $v \in S$. As in the proof of Lemma 4.3, for each $u \in U$ define the random vector $\mathbf q^u$ on $\mathbb R^S$ by $\mathbf q^u := \mathbf p' - \alpha_u \cdot \mathrm{proj}_S(u)$, where $\mathbf p'$ is the vector from before the truncation in the definition of the blended distribution $\mathcal B_\beta(U, S)$. Recall we do this so that $\mathbf q^u$ is independent of $\alpha_u$.
We call a vertex $u \in U$ good if there are at most $d^S_G(u)/25$ coordinates $v \in S \cap N(u)$ such that $q^u_v \notin [0.2, 0.8]$. Let $U_g \subset U$ denote the set of good vertices. We claim that $\mathbb P(|U_g| \ge |U|/2) > 1/2$. To prove this, take $u \in U$ and note that $q^u_v$ is a sum of at most $10p|U|$ uniform independent random variables, thus by Hoeffding's inequality:
\[
  \mathbb P\bigl(q^u_v \notin [0.2, 0.8]\bigr)
  \;\le\; 2\exp\Bigl(-\frac{50 \cdot 0.09 \cdot p^2 n^2}{40\,p\,|U|^3}\Bigr)
  \;=\; 2e^{-7.2} \;<\; \frac{1}{100}.
\]
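The constant $2e^{-7.2}$ appearing here can be checked numerically; the arithmetic below is our verification, assuming $|U| = \sqrt[3]{pn^2}/4$ as in Proposition 6.6, so that $|U|^3 = pn^2/64$ and the exponent collapses to a constant independent of $n$ and $p$:

```python
import math

# With |U|^3 = p n^2 / 64 the exponent 50*0.09*p^2 n^2 / (40 p |U|^3)
# simplifies to 50 * 0.09 * 64 / 40, a pure constant.
exponent = 50 * 0.09 * 64 / 40
bound = 2 * math.exp(-exponent)
print(round(exponent, 10), bound < 1 / 100)
```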
We deduce that the expected number of coordinates $v \in S \cap N(u)$ with $q^u_v \notin [0.2, 0.8]$ is at most $d^S_G(u)/100$. By Markov's inequality, the vertex $u$ fails to be good with probability less than $1/4$. Therefore, the expected number of vertices $u \in U$ that are not good is at most $|U|/4$, and the claim follows from a further application of Markov's inequality.
We now set $T := V(G) \setminus (S \cup U)$ and let $\mathcal T_T$ denote the trivial distribution on $T$. Take $\mathcal D$ to be the product distribution $\mathcal B_\beta(U, S) \times \mathcal T_T$ on $[0.1, 0.9]^{S \cup T}$. Given distinct vertices $u_1, u_2 \in U$, we let $E_{u_1,u_2}$ denote the event that $\bigl|\mathbb E_{\mathbf p \sim \mathcal D}[d_{G(\mathbf p)}(u_1)] - \mathbb E_{\mathbf p \sim \mathcal D}[d_{G(\mathbf p)}(u_2)]\bigr| \le 1$. Moreover, since $U$ is $pn/3$-diverse to $S$, we know that either $|N^S_G(u_1) \setminus N^S_G(u_2)| \ge pn/6$ or $|N^S_G(u_2) \setminus N^S_G(u_1)| \ge pn/6$. We set $m_S(u_1, u_2) := u_1$ in the first case and $m_S(u_1, u_2) := u_2$ in the second.
Our next claim is that $\mathbb P\bigl(E_{u,u'} \mid m_S(u, u') \in U_g\bigr) \le 120|U|^{-1}$ for all $u \ne u'$ in $U$. To prove it, we may assume that $u = m_S(u, u')$. As $u \in U_g$, at most $2pn/25$ vertices in $N_G(u)$ correspond to coordinates $v$ with $q^u_v \notin [0.2, 0.8]$. Therefore, we can find a subset $Y \subset N^S_G(u) \setminus N^S_G(u')$ of size $pn/12$ such that $q^u_v \in [0.2, 0.8]$ for all $v \in Y$. Since $p'_v = q^u_v + \alpha_u u_v$ and $|\alpha_u| < 0.1$, we deduce that no $Y$-coordinate of $\mathbf p'$ gets truncated when creating $\mathbf p \sim \mathcal B_\beta(U, S)$. Condition now on any choice of $\alpha := (\alpha_w)_{w \ne u}$ such that $u \in U_g$, and note that $\alpha_u$ is independent of it.
By viewing the following expression (when $\mathbf p \sim \mathcal D$) as a function of $\alpha_u$:

we observe that $E_{u,u'}$ holds only if, conditioned on $\alpha$, this difference lies in an interval of length $2$. The same argument as in Lemma 4.3 shows that $E_{u,u'} \mid \alpha$ happens with probability at most $24(pn\beta)^{-1} = 120|U|^{-1}$. The claim follows from the law of total probability.
Proof. We may assume $n$ is sufficiently large. Let $H$ be a random induced subgraph selected according to $G(\mathbf p)$ and define for it the following sets:

Our first claim is that $\mathbb P(|B| \ge |U'|/2) \ge 1/2$. To prove it, we start by estimating $|B|$. For any $u \in U'$ we have $\mathrm{Var}\bigl[d_{G(\mathbf p)}(u)\bigr] = \sum_{v \sim u} p_v(1 - p_v) \le pn/2$, thus Chebyshev's inequality implies that $\mathbb P(u \notin B) \le 1/4$. It follows that $\mathbb E[|U' \setminus B|] \le |U'|/4$, so by Markov's inequality we get $\mathbb P(|U' \setminus B| \ge |U'|/2) \le 1/2$, which is equivalent to our claim.
We now want to estimate $|J|$. First note that the separation of the expected degrees within $U'$ implies that $|P| \le 2|U'|\sqrt{2pn}$. Each pair $\{u, u'\}$ belongs to $J$ with probability $\mathbb P\bigl(d_H(u) - d_H(u') = 0\bigr)$, which we claim is $O(1/\sqrt{pn})$. This holds because $d_H(u) - d_H(u') = \sum_v \xi_v X_v$, where the sum is taken over all $v \in \mathrm{div}(u, u') \setminus U$, $\xi_v \in \{-1, 1\}$, and $X_v \sim \mathrm{Be}(p_v)$ indicates whether $v$ is picked as a vertex of $H$. As $U$ is $pn/3$-diverse to some subset $S \subset V(G) \setminus U$, we deduce that $|\mathrm{div}(u, u') \setminus U| \ge |N^S_G(u) \triangle N^S_G(u')| \ge pn/3$, so we can apply Theorem 2.3 to prove the claim. Therefore $\mathbb E[|J|] \le |P| \cdot \max$

It follows by Markov's inequality that $\mathbb P\bigl(|J| = O(|U'|)\bigr) > 1/2$, so together with the first claim we deduce that both $|J| = O(|U'|)$ and $|B| \ge |U'|/2$ happen with positive probability. The end of the proof follows the same idea as before: fix a choice of $H$ for which this happens; by Turán's theorem, the graph $J[B]$, obtained by placing edges between the vertices of $B$ which have equal degree in $H$, has an independent set of size $\Omega(|U'|)$. This set must consist of vertices with distinct degrees in $H$, since if $u, u' \in B$ and $d_H(u) = d_H(u')$ then $\{u, u'\} \in P$ and so $\{u, u'\} \in J$, which is an edge in $J[B]$.
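The phenomenon being proved, namely that a typical $G(n,p)$ already exhibits many distinct degrees, is easy to observe empirically. The sketch below is our own illustration with arbitrary small parameters; it counts the distinct degrees of the whole vertex set, which is a lower bound for $f(G(n,p))$ since $G$ is an induced subgraph of itself:

```python
import random

random.seed(42)
n, p = 400, 0.3  # illustrative parameters; the theorem concerns p in [n^{-1/2}, 1/2]

# Sample G(n, p) and record only the degree sequence.
deg = [0] * n
for u in range(n):
    for v in range(u + 1, n):
        if random.random() < p:
            deg[u] += 1
            deg[v] += 1

distinct = len(set(deg))
print(distinct)  # Theorem 1.3(i) predicts f(G(n,p)) = Theta((p n^2)^{1/3})
```

Already this crude count is of the same order as the theorem's $\sqrt[3]{pn^2} \approx 36$ for these parameters.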
With all these ingredients, we are finally able to prove the following: