Almost sure behavior of the zeros of iterated derivatives of random polynomials

Let $Z_1,\, Z_2,\dots$ be independent and identically distributed complex random variables with common distribution $\mu$ and set $$ P_n(z) := (z - Z_1)\cdots (z - Z_n)\,. $$ Recently, Angst, Malicet and Poly proved that the critical points of $P_n$ converge in an almost-sure sense to the measure $\mu$ as $n$ tends to infinity, thereby confirming a conjecture of Cheung-Ng-Yam and of Kabluchko. In this short note, we prove that for any fixed $k\in \mathbb{N}$, the empirical measure of the zeros of the $k$th derivative of $P_n$ converges to $\mu$ in the almost-sure sense, as conjectured by Angst-Malicet-Poly.


Introduction
Let $\mu$ be a probability measure on $\mathbb{C}$ and let $(Z_j)_{j \ge 1}$ be independent and identically distributed (i.i.d.) complex random variables with distribution $\mu$. Define the sequence of random polynomials $(P_n)_{n \ge 1}$ via
$$P_n(z) := (z - Z_1)\cdots(z - Z_n)\,. \tag{1}$$
Pemantle and Rivin [18] introduced this model and conjectured that the critical points of $P_n$ are close to the roots of $P_n$. More rigorously, for each fixed $k \in \mathbb{N}$ let $\nu_n^{(k)}$ be the empirical measure of the zeros of $P_n^{(k)}$,
$$\nu_n^{(k)} := \frac{1}{n-k} \sum_{\rho\,:\,P_n^{(k)}(\rho) = 0} \delta_\rho\,,$$
where $\delta_y$ denotes the point mass at $y$ and the zeros are counted with multiplicity. Pemantle and Rivin conjectured that in the case $k = 1$ the measure $\nu_n^{(1)}$ converges weakly to $\mu$, and they proved this under the assumption that $\mu$ has finite $1$-energy. Kabluchko [10] proved Pemantle and Rivin's conjecture in full, showing that $\nu_n^{(1)} \to \mu$ in probability as $n \to \infty$. Kabluchko's result was extended by Byun, Lee and Reddy [3], who proved that for each fixed $k \in \mathbb{N}$ the measure $\nu_n^{(k)}$ converges weakly to $\mu$ in probability. Recently, the authors showed that the same holds if $k$ grows slightly slower than logarithmically in $n$ [15]. The works [3] and [15] on convergence of higher derivatives follow the same general strategy as Kabluchko's original proof [10], with much of the new work going into an anti-concentration estimate. For more references on this model and adjacent models, see the works [1,4,5,8,9,11,12,17,19,20] and the references therein.
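As a quick numerical illustration of the model (not part of the paper's argument), one can sample i.i.d. roots, differentiate, and compare the zeros of $P_n^{(k)}$ with the samples; here $\mu$ is taken to be uniform on a square, an arbitrary illustrative choice.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Simulate the model: sample i.i.d. roots Z_1,...,Z_n from mu (here:
# uniform on a square, an arbitrary choice of mu), form P_n, and
# compute the zeros of its k-th derivative.
rng = np.random.default_rng(0)
n, k = 40, 2
z = rng.uniform(-1, 1, n) + 1j * rng.uniform(-1, 1, n)

coeffs = P.polyfromroots(z)      # coefficients of P_n (low degree first)
dcoeffs = P.polyder(coeffs, k)   # coefficients of P_n^{(k)}
rho = P.polyroots(dcoeffs)       # the n - k zeros of P_n^{(k)}

# Theorem 1.1 says the empirical measure of rho approaches mu; as a
# cheap sanity check, the first moments agree exactly by Vieta's
# formulas: sum(rho) = (n - k)/n * sum(z), i.e. the means are equal.
print(abs(rho.mean() - z.mean()))
```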
Cheung-Ng-Yam [4] and, independently, Kabluchko (see [15]) conjectured that in fact $\nu_n^{(1)}$ should converge weakly to $\mu$ almost surely and not just in probability. This was recently proven by Angst, Malicet and Poly [2] for all probability measures $\mu$. Angst, Malicet and Poly also conjectured [2] that the almost-sure convergence of $\nu_n^{(k)}$ should hold for each fixed $k \in \mathbb{N}$. In this short note, we confirm their conjecture.
Theorem 1.1. For each fixed $k \in \mathbb{N}$, almost surely with respect to $\mathbb{P}$, the sequence of empirical measures $\nu_n^{(k)}$ converges weakly to $\mu$ as $n$ tends to infinity.
Our proof of Theorem 1.1 takes inspiration from Angst, Malicet and Poly's proof of the $k = 1$ case [2]; the new ingredient is to handle a non-linear, high-dimensional, multivariate anti-concentration problem via a decoupling approach. We first outline the general shape of the Angst-Malicet-Poly approach, as well as where our contribution comes into play, in Section 1.1. We then prove our anti-concentration estimate in Section 2 and complete the proof of Theorem 1.1 in Section 3.
1.1. Outline of the Angst-Malicet-Poly strategy and our contribution. The main engine behind Angst-Malicet-Poly is a simple-yet-powerful fact about probability measures on the Riemann sphere. To set up their lemma, write $\overline{\mathbb{C}}$ for the Riemann sphere, let $\mathcal{M} := \{\psi(z) = \frac{az+b}{cz+d}\}$ be the set of Möbius transformations, and let $\lambda_{\mathcal{M}}$ be the measure on $\mathcal{M}$ inherited by taking the complex Lebesgue measure on the tuple $(a, b, c, d)$. Define $\log^- z = |\log z|\,\mathbf{1}_{z \in [0,1]}$. Their main engine is the following lemma [2, Lemma 2.7]:

Lemma 1.2. Let $m_1$ and $m_2$ be two probability measures on $\overline{\mathbb{C}}$ so that for $\lambda_{\mathcal{M}}$-almost-every $\psi \in \mathcal{M}$ we have
$$\int_{\overline{\mathbb{C}}} \log^-|\psi(z)|\, dm_1(z) \le \int_{\overline{\mathbb{C}}} \log^-|\psi(z)|\, dm_2(z)\,. \tag{4}$$
Then $m_1 = m_2$.

An appealing aspect of this lemma is that it requires only a one-sided bound; further, the space of probability measures on $\overline{\mathbb{C}}$ is compact, and so it will suffice to verify Equation (4) where $m_1$ is an arbitrary cluster point $\nu_\infty$ of the sequence $(\nu_n^{(k)})_{n \ge 1}$ and $m_2$ is the measure $\mu$. The route towards establishing Equation (4) begins at Jensen's formula. Letting $C(0,1)$ denote the unit circle, one may apply Jensen's formula to the ratio
$$S_n := \frac{P_n^{(k)}}{k!\, P_n}$$
to bound $\log|S_n(\psi^{-1}(0))|$ by a maximum of $\log|S_n|$ over the circle $\psi^{-1}(C(0,1))$ together with sums of $\log^-|\psi(\cdot)|$ over the roots $\{\rho\}$ of $P_n^{(k)}$ and the roots $\{\zeta\}$ of $P_n$ (see Fact 3.1). Our task then is to control the right-hand side. The term with the maximum is fairly straightforward to control almost-surely (Lemma 3.2), and it is the term $\log|S_n(\psi^{-1}(0))|$ that is more challenging. In particular, if we set $a = \psi^{-1}(0)$ then
$$S_n(a) = \sum_{1 \le i_1 < \cdots < i_k \le n}\ \prod_{j=1}^k \frac{1}{a - Z_{i_j}}\,.$$
In the case of $k = 1$, this is precisely a sum of i.i.d. random variables, and so, depending on the distribution of $Z$ and the choice of $a$, we may have that $\mathbb{P}(S_n(a) = 0) = \Theta(n^{-1/2})$. Since we seek almost sure statements, this is too large to apply Borel-Cantelli. To get around this issue, Angst-Malicet-Poly look instead at triples of Möbius transformations. For most such triples $(\psi_1, \psi_2, \psi_3)$, the vector $(\psi_1^{-1}(0), \psi_2^{-1}(0), \psi_3^{-1}(0))$ consists of three distinct complex numbers, say $(a, b, c)$. The vector $(S_n(a), S_n(b), S_n(c))$ now behaves like a sum of i.i.d. three-dimensional random vectors. In particular, a sufficiently general version of Erdős's solution to the Littlewood-Offord problem shows that the probability that all coordinates of $(S_n(a), S_n(b), S_n(c))$ are simultaneously small decays like $O(n^{-3/2})$, which is now summable. An application of Borel-Cantelli will allow one to deduce that almost-surely, for generic triples of Möbius transformations and all large enough $n$, at least one of the values $S_n(\psi_j^{-1}(0))$, $j \in \{1, 2, 3\}$, is at least, say, $1$ in modulus. Working with a given cluster point $\nu_\infty$ of $(\nu_n^{(k)})_{n \ge 1}$ and applying the Jensen's formula bound (Fact 3.1) together with an application of the law of large numbers to handle the sum over $\{\zeta\}$ will prove Equation (4).
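For intuition on the $\Theta(n^{-1/2})$ rate, recall Erdős's bound for the one-dimensional Littlewood-Offord problem: for nonzero $a_1, \dots, a_n$ and independent uniform signs $\varepsilon_j$, $\mathbb{P}(\sum_j \varepsilon_j a_j = 0) \le \binom{n}{\lfloor n/2 \rfloor} 2^{-n}$. A quick exact computation (illustrative only, not from the paper) shows this extremal atom probability decays like $\sqrt{2/(\pi n)}$:

```python
from math import comb, pi, sqrt

# Erdos's Littlewood-Offord bound: for nonzero a_1,...,a_n and i.i.d.
# uniform signs eps_j, P(sum eps_j a_j = 0) <= C(n, n/2) / 2^n. The
# extremal case a_j = 1 attains it and decays like sqrt(2/(pi n)).
def atom_probability(n):
    assert n % 2 == 0
    return comb(n, n // 2) / 2 ** n

for n in (100, 400, 1600):
    print(n, atom_probability(n), sqrt(2 / (pi * n)))
```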
The main challenge in adapting this approach to fixed $k \in \mathbb{N}$ is the anti-concentration estimate. In particular, for fixed $k \ge 2$, handling the quantity $\mathbb{P}(S_n(a) = 0)$ is a non-linear anti-concentration problem, and major open problems remain in this arena. As an example, one expects that for each $k \ge 2$ one has $\mathbb{P}(S_n(a) = 0) = O(n^{-1/2})$, but this is only known up to subpolynomial factors [14]. Furthermore, we need to consider vectors of such quantities. Roughly, for each fixed $k$, we need to take $L$ large enough so that for distinct complex numbers $(z_1, \dots, z_L)$ the probability that $|S_n(z_j)| < 1$ simultaneously for all $j \in [L]$ is summable in $n$. To handle this quantity, we use a decoupling approach for anti-concentration. This approach was introduced by Costello-Tao-Vu [7] in their study of random symmetric matrices and anti-concentration of quadratic forms (see also the survey [16]). The intuition here is to tackle multilinear anti-concentration problems by comparing them to linear anti-concentration problems, at the cost of decreasing the rate of decay of the resulting bounds. For us, we need to apply a decoupling lemma to the vector $(S_n(z_1), \dots, S_n(z_L))$ and handle all coordinates simultaneously in order to obtain a high-dimensional but linear anti-concentration problem. Our main new contribution is the following lemma:

Lemma 1.3. Suppose $\mu$ does not have finite support and let $k \in \mathbb{N}$. Then for $L = 2^{k+2}k$ and all pairwise distinct complex numbers $(z_1, \dots, z_L)$ we have
$$\mathbb{P}\big(|S_n(z_j)| < 1 \ \text{for all } j \in [L]\big) \le C n^{-2}\,,$$
where $C > 0$ depends on $k$ and $\mu$.
We note that increasing $L$ yields an increase in the exponent of $n$ on the right-hand side; we only need the right-hand side to be summable in $n$. Further, it is plausible that the right-hand side in Lemma 1.3 should be of the order $n^{-L/2}$; however, even in the case of $k = 2$ and $L = 1$ this is a non-trivial instance of a significant open problem known as the quadratic Littlewood-Offord problem (see [6,7,13,14]). Since we only need summability, the sub-optimal bounds attained by decoupling will be strong enough provided we take $L$ large as in Lemma 1.3.
We note that this approach differs fundamentally from the anti-concentration approach of our previous work [15]. In [15] we deduced our anti-concentration estimate from a powerful theorem of Meka-Nguyen-Vu [14], which in turn is proven by a sophisticated Gaussian comparison argument.
The decoupling approach and the proof of Lemma 1.3 are handled in Section 2. We then prove Theorem 1.1 in Section 3, where we import the necessary tools and adapt ideas from [2].

1.2. Notation. Throughout, the random variables $(Z_j)_{j \ge 1}$ are defined on a common probability space $(\Omega, \mathcal{F}, \mathbb{P})$. The random polynomials we consider are defined by $P_n(z) = (z - Z_1)\cdots(z - Z_n)$; the measure $\mu_n$ is the empirical measure of $P_n$ and the measure $\nu_n^{(k)}$ is the empirical measure of $P_n^{(k)}$.

Anticoncentration via decoupling
The goal of this section is to prove Lemma 1.3; we begin with the abstract decoupling lemma of Costello, Tao and Vu [7].
Lemma 2.1. Let $(Y_1, \dots, Y_r)$ be a collection of random variables taking values in an arbitrary measurable space and let $E = E(Y_1, \dots, Y_r)$ be an event depending on these variables. Let $(Y'_1, \dots, Y'_r)$ be an independent copy of the collection $(Y_1, \dots, Y_r)$. Then, writing $Y_j^0 = Y_j$ and $Y_j^1 = Y'_j$,
$$\mathbb{P}\big(E(Y_1, \dots, Y_r)\big)^{2^r} \le \mathbb{P}\Big(\bigcap_{\alpha \in \{0,1\}^r} E\big(Y_1^{\alpha_1}, \dots, Y_r^{\alpha_r}\big)\Big)\,.$$

The strategy will be to apply this lemma and subsequently take linear combinations of various versions of $S_n(z_j)$ in order to obtain a linear inequality rather than a multilinear inequality. We then will need a high-dimensional (linear) anti-concentration statement, which is stated in [2]. A random vector $(X_1, \dots, X_d) \in \mathbb{C}^d$ is non-degenerate if there do not exist complex numbers $\alpha_j$, not all zero, and $\beta$ so that $\sum_{j=1}^d \alpha_j X_j - \beta = 0$ almost surely. This non-degeneracy assumption asserts that $(X_1, \dots, X_d)$ is genuinely $d$-dimensional.
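To make the shape of Lemma 2.1 concrete, here is a small exact check of the decoupling inequality for $r = 2$ with fair bits and a toy event (purely illustrative; not from the paper):

```python
from itertools import product

# Exact check of the decoupling inequality of Lemma 2.1 for r = 2:
# with fair bits Y1, Y2 and independent copies Y1', Y2', and the toy
# event E(Y1, Y2) = {Y1 == Y2}, the lemma asserts
#   P(E)^4 <= P(E(Y1,Y2) & E(Y1,Y2') & E(Y1',Y2) & E(Y1',Y2')).
def E(y1, y2):
    return y1 == y2

p_E = sum(E(a, b) for a, b in product((0, 1), repeat=2)) / 4
hits = sum(
    E(y1, y2) and E(y1, y2p) and E(y1p, y2) and E(y1p, y2p)
    for y1, y1p, y2, y2p in product((0, 1), repeat=4)
)
p_dec = hits / 16
print(p_E ** 4, p_dec)  # 0.0625 0.125 -- the inequality holds strictly here
```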
Proposition 2.2 ([2]). Let $(X_n)_{n \ge 1}$ be a sequence of i.i.d. non-degenerate random vectors taking values in $\mathbb{C}^d$ and set $S_n = \sum_{j=1}^n X_j$. Then there is a constant $C$ depending on $d$, $r$ and the law of $X_1$ so that for all $n$ we have
$$\sup_{w \in \mathbb{C}^d} \mathbb{P}\big(\|S_n - w\| \le r\big) \le C n^{-d/2}\,.$$

We are now ready to set up the decoupling approach. Recall that $(Z_j)_{j \ge 1}$ are the i.i.d. samples from $\mu$ giving the roots of the polynomials $(P_n)_{n \ge 1}$. For fixed $n$ and $k$, partition $[n]$ into $k$ disjoint sets $R_1, \dots, R_k$ with $\lfloor n/k \rfloor \le |R_j| \le \lceil n/k \rceil$. For $j \in [k]$ define $Y_j = (Z_i)_{i \in R_j}$. We now think of the rational function $S_n(z)$ as a function not only of $z$ but also of the quantities $(Y_j)_{j \in [k]}$, and so we write $S_n(z; Y_1, \dots, Y_k)$ when we want to make this dependence explicit.
Applying Lemma 2.1 to the event $\{|S_n(z_j)| < 1 \text{ for all } j \in [L]\}$, with the groups $(Y_1, \dots, Y_k)$, shows
$$\mathbb{P}\big(|S_n(z_j)| < 1 \ \forall j \in [L]\big)^{2^k} \le \mathbb{P}\Big(\bigcap_{\alpha \in \{0,1\}^k} \big\{|S_n(z_j;\, Y_1^{\alpha_1}, \dots, Y_k^{\alpha_k})| < 1 \ \forall j \in [L]\big\}\Big)\,. \tag{7}$$
The main use of the decoupling is in the following combinatorial lemma.
Lemma 2.3. Fix $z \in \mathbb{C}$, write $Y_j^0 = Y_j$ and $Y_j^1 = Y'_j$, and set
$$h := \sum_{\alpha \in \{0,1\}^k} (-1)^{\alpha_1 + \cdots + \alpha_k}\, S_n(z;\, Y_1^{\alpha_1}, \dots, Y_k^{\alpha_k})\,.$$
Then
$$h = \prod_{j=1}^k \sum_{i \in R_j} \Big(\frac{1}{z - Z_i} - \frac{1}{z - Z'_i}\Big)\,,$$
where $Z'_i$ denotes the coordinate of $Y'_j$ corresponding to $i \in R_j$. In particular, on the event that $|S_n(z;\, Y_1^{\alpha_1}, \dots, Y_k^{\alpha_k})| < 1$ for all $\alpha \in \{0,1\}^k$, the product above is at most $2^k$ in modulus.

Proof. The triangle inequality shows that $|h| \le \sum_{\alpha \in \{0,1\}^k} |S_n(z;\, Y_1^{\alpha_1}, \dots, Y_k^{\alpha_k})|$, so the final claim follows from the product identity. To prove the identity, expand each $S_n(z;\, Y_1^{\alpha_1}, \dots, Y_k^{\alpha_k})$ as a sum over index sets $\{i_1, \dots, i_k\}$ of products $\prod_{j=1}^k (z - W_{i_j})^{-1}$, where $W_i$ is the copy of $Z_i$ determined by $\alpha$. Swapping the sums, we claim that the total contribution of each index set $\{i_1, \dots, i_k\}$ vanishes unless the set meets every block, in which case it factors. To see this, assume first that $\{i_1, \dots, i_k\} \cap R_j = \emptyset$ for some $R_j$; then when we sum over $\alpha_j$, the sign changes but the quantity does not, thus giving $0$. Otherwise, we must have that each $R_j$ contains exactly one value from $\{i_1, \dots, i_k\}$, and thus the sum factors as stated. This shows the identity.

We now use Lemma 2.3 to identify a high-dimensional but linear anti-concentration event lurking in the right-hand side of Equation (7).
Corollary 2.4. On the event that $|S_n(z_i;\, Y_1^{\alpha_1}, \dots, Y_k^{\alpha_k})| < 1$ for all $i \in [L]$ and all $\alpha \in \{0,1\}^k$, there is some $\ell \in [k]$ and a set $I \subseteq [L]$ with $|I| \ge L/k$ so that for all $i \in I$ we have
$$\Big|\sum_{m \in R_\ell} \Big(\frac{1}{z_i - Z_m} - \frac{1}{z_i - Z'_m}\Big)\Big| \le 2\,.$$

Proof. Apply Lemma 2.3 to note that for each $i \in [L]$ we can associate some $\ell_i \in [k]$ for which we have $\big|\sum_{m \in R_{\ell_i}} \big(\frac{1}{z_i - Z_m} - \frac{1}{z_i - Z'_m}\big)\big| \le 2$. Since there are only $k$ choices for each $\ell_i$, the pigeonhole principle shows that at least $L/k$ values of $i$ must share the same value of $\ell_i$.
With an eye towards applying Proposition 2.2, we confirm non-degeneracy of the summands appearing in Corollary 2.4:

Fact 2.5. Suppose $\mu$ does not have finite support. Let $L \in \mathbb{N}$ and let $z_1, z_2, \dots, z_L$ be pairwise distinct complex numbers. Let $Z$ and $Z'$ be independent samples from $\mu$. Then the vector
$$\Big(\frac{1}{z_1 - Z} - \frac{1}{z_1 - Z'},\ \dots,\ \frac{1}{z_L - Z} - \frac{1}{z_L - Z'}\Big)$$
is non-degenerate.

Proof. The proof is similar to the case appearing in [2]. Seeking a contradiction, suppose that this random vector is degenerate, so that there are complex numbers $\alpha_j$, not all zero, and $\beta$ with
$$\sum_{j=1}^L \alpha_j \Big(\frac{1}{z_j - Z} - \frac{1}{z_j - Z'}\Big) = \beta \quad \text{almost surely.}$$
Reveal $Z'$, and set $\beta' := \beta + \sum_{j=1}^L \frac{\alpha_j}{z_j - Z'}$, which implies that almost-surely in $Z$ we have
$$\sum_{j=1}^L \frac{\alpha_j}{z_j - Z} = \beta'\,.$$
Clearing denominators, this implies that $Z$ is the zero of a nonzero polynomial of degree at most $L$, which contradicts our assumption that $\mu$ does not have finite support.
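The contradiction at the end of this proof can be phrased in linear-algebra terms: over more than $L$ distinct values of $Z$, the only affine relation among the functions $1/(z_j - Z)$ is the trivial one. A small numerical sketch of this (an illustration, not part of the proof):

```python
import numpy as np

# Fact 2.5 reduces to: for pairwise distinct z_1,...,z_L, no nontrivial
# affine relation sum_j alpha_j / (z_j - w) = const can hold at L + 1
# distinct points w (clearing denominators would give a nonzero
# polynomial of degree <= L with L + 1 roots). Equivalently, the
# bordered Cauchy matrix below is invertible.
z = np.array([1 + 1j, -1 + 2j, 2 - 1j, -2 - 2j])  # pairwise distinct z_j
w = np.array([0, 3, -3, 3j, -3j])                 # L + 1 distinct points w
L = len(z)
M = np.column_stack([1.0 / (z[None, :] - w[:, None]), np.ones(len(w))])
print(np.linalg.matrix_rank(M))  # L + 1 = 5: only the trivial relation
```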
We are now ready to prove Lemma 1.3.
Proof of Lemma 1.3. By Equation (7) and Corollary 2.4, together with a union bound over the choices of $\ell \in [k]$ and $I \subseteq [L]$, there is a set $R_\ell$ with $|R_\ell| \ge \lfloor n/k \rfloor$ and a set $I$ with $|I| \ge L/k$ so that
$$\mathbb{P}\big(|S_n(z_j)| < 1 \ \forall j \in [L]\big)^{2^k} \le k\, 2^L\, \mathbb{P}\Big(\Big|\sum_{m \in R_\ell} \Big(\frac{1}{z_i - Z_m} - \frac{1}{z_i - Z'_m}\Big)\Big| \le 2 \ \forall i \in I\Big)\,. \tag{8}$$
We now apply Proposition 2.2 (noting that the non-degeneracy condition is guaranteed by Fact 2.5) to bound
$$\mathbb{P}\Big(\Big|\sum_{m \in R_\ell} \Big(\frac{1}{z_i - Z_m} - \frac{1}{z_i - Z'_m}\Big)\Big| \le 2 \ \forall i \in I\Big) \le C |R_\ell|^{-|I|/2} \le C' n^{-L/(2k)}\,, \tag{9}$$
where $C'$ depends on $L$ and $\mu$. Recalling $L = 2^{k+2}k$, we have $n^{-L/(2k)} = n^{-2^{k+1}}$, and so combining Equation (7) with Equation (9) and taking $2^k$-th roots completes the proof.
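The exponent bookkeeping behind the choice $L = 2^{k+2}k$ can be sketched as follows, under the outline's accounting (a $2^k$-th root lost to decoupling, $L/k$ coordinates retained by pigeonhole, and a linear small-ball rate of $n^{-d/2}$ in $d$ coordinates); this is an illustration of the arithmetic, not a restatement of the proof:

```python
from fractions import Fraction

# Exponent bookkeeping for L = 2^(k+2) * k: decoupling costs a 2^k-th
# root, pigeonhole retains d = L/k coordinates, and the linear estimate
# gives rate n^(-d/2). The final exponent is independent of k.
for k in range(1, 8):
    L = 2 ** (k + 2) * k
    d = Fraction(L, k)            # surviving coordinates: 2^(k+2)
    exponent = d / 2 / 2 ** k     # exponent of (n^(-d/2))^(1/2^k)
    assert exponent == 2          # final bound ~ C n^(-2), summable in n
print("final exponent is 2 for every k")
```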

Main Lemmas and Proof of Theorem 1.1
Following the Angst-Malicet-Poly strategy outlined in Section 1.1, recall that we have set $P_n(z) = (z - Z_1) \cdots (z - Z_n)$ to be our random polynomial and $S_n = \frac{P_n^{(k)}}{k!\, P_n}$. We begin with the following basic consequence of Jensen's formula, proven in [2, Prop. 2.2].
Fact 3.1. Let $S = Q/P$ where $P$ and $Q$ are two polynomials, neither of which has $0$ as a root. Then for each Möbius transformation $\psi \in \mathcal{M}$ we have
$$\log|S(\psi^{-1}(0))| \le \max_{z \in \psi^{-1}(C(0,1))} \log|S(z)| - \sum_{\rho} \log^-|\psi(\rho)| + \sum_{\zeta} \log^-|\psi(\zeta)|\,,$$
where $\{\rho\}$ enumerates the roots of $Q$ up to multiplicity and $\{\zeta\}$ enumerates the roots of $P$.
Recalling that Möbius transformations map circles to circles, we aim to handle the max term in Fact 3.1 first.Set C(a, r) to be the circle of radius r centered at a.

Lemma 3.2. There is a set $E \subset \mathbb{C} \times \Omega \times (0, \infty)$ of full measure under $\lambda_{\mathbb{C}} \otimes \mathbb{P} \otimes \lambda_{\mathbb{R}}$ so that for each $(a, \omega, r) \in E$ we have $\max_{z \in C(a,r)} \log|S_n(z)| = o(n)$ as $n \to \infty$.
Proof. First consider the case $a = 0$. Note the bound (11) on $\max_{z \in C(0,r)} \log|S_n(z)|$, and note that for $r$ satisfying Equation (10) the strong law of large numbers implies that $\mathbb{P}$-almost-surely we have (12), where the implicit constant depends on the instance $\omega \in \Omega$ as well as $r$. Combining Equation (11) with Equation (12) shows that for $\lambda_{\mathbb{R}}$-almost-every $r > 0$, $\mathbb{P}$-almost-surely we have
$$\max_{z \in C(0,r)} \log|S_n(z)| = o(n)\,. \tag{13}$$
To show the same for arbitrary $a \in \mathbb{C}$, simply replace $Z$ with the random variable $Z - a$ to reduce to the case $a = 0$. This shows that for every $a \in \mathbb{C}$ there are sets $\Omega_a, U_a$ with $\mathbb{P}(\Omega_a^c) = \lambda_{\mathbb{R}}(U_a^c) = 0$ so that for all $\omega \in \Omega_a$ and $r \in U_a$, the triple $(a, \omega, r)$ satisfies the conclusion of the Lemma. An application of the Fubini-Tonelli theorem completes the proof.
An application of Lemma 1.3 will allow us to handle the remaining term on the right-hand side of Fact 3.1.

Lemma 3.3. For each $k \in \mathbb{N}$, set $L = 2^{k+2}k$. Then there is a set $E \subset \Omega \times \mathbb{C}^L$ with $\mathbb{P} \otimes \lambda_{\mathbb{C}}^{\otimes L}(E^c) = 0$ so that for all $(\omega, z_1, \dots, z_L) \in E$ and all $n$ large enough there is at least one $j \in [L]$ so that $|S_n(z_j)| \ge 1$ (with the convention $|S_n(z)| = +\infty$ if $z$ is a pole of $S_n$).
Proof. For each $L$-tuple $(z_1, \dots, z_L)$ of pairwise distinct points, Lemma 1.3 and the Borel-Cantelli lemma show that almost surely, for sufficiently large $n$, we have $|S_n(z_j)| \ge 1$ for at least one $j \in [L]$. Since the set of $L$-tuples that fail to be pairwise distinct has measure $0$ under $\lambda_{\mathbb{C}}^{\otimes L}$, an application of the Fubini-Tonelli theorem completes the proof.
Finally, the strong law of large numbers will control the sum over $\{\zeta\}$:

Lemma 3.4. There is a set $E \subset \Omega \times \mathcal{M}$ with $\mathbb{P} \otimes \lambda_{\mathcal{M}}(E^c) = 0$ so that for all $(\omega, \psi) \in E$ we have
$$\lim_{n \to \infty} \frac{1}{n} \sum_{j=1}^n \log^-|\psi(Z_j)| = \int \log^-|\psi(z)|\, d\mu(z)\,.$$

We are now ready to verify the assumptions of Lemma 1.2 for cluster points of our sequences of random measures.
Lemma 3.5. Suppose $\mu$ does not have finite support. There is a set $E \subset \Omega \times \mathcal{M}$ with $\mathbb{P} \otimes \lambda_{\mathcal{M}}(E^c) = 0$ so that the following holds: for each $(\omega, \psi) \in E$, every cluster point $\nu_\infty$ of the sequence of empirical probability measures $(\nu_n^{(k)})_{n \ge 1}$ satisfies
$$\int \log^-|\psi(z)|\, d\nu_\infty(z) \le \int \log^-|\psi(z)|\, d\mu(z)\,.$$

Proof. Set $L = 2^{k+2}k$. By Lemmas 3.2, 3.3 and 3.4, there is a set $\Omega_1 \subset \Omega$ with $\mathbb{P}(\Omega_1) = 1$ so that for each $\omega \in \Omega_1$ and $\lambda_{\mathcal{M}}^{\otimes L}$-almost-every tuple $(\psi_1, \dots, \psi_L)$ the following three items hold:
(1) For each $j \in [L]$, $\max_{z \in \psi_j^{-1}(C(0,1))} \log|S_n(z)| = o(n)$.
(2) For each $n$ large enough, there is some $j \in [L]$ with $|S_n(\psi_j^{-1}(0))| \ge 1$.
(3) For each $j \in [L]$, $\frac{1}{n} \sum_{i=1}^n \log^-|\psi_j(Z_i)| \to \int \log^-|\psi_j(z)|\, d\mu(z)$.
We note that for Item 2 we use the fact that for $\lambda_{\mathcal{M}}^{\otimes L}$-almost-every tuple $(\psi_j)_{j=1}^L$, the values $(\psi_j^{-1}(0))_{j=1}^L$ are pairwise distinct. Fix an instance $\omega \in \Omega_1$ and a tuple $(\psi_1, \dots, \psi_L)$ for which the above three items hold. Combining Jensen's formula (Fact 3.1) along with Item 1 and Item 2, we see that for each $n$ sufficiently large there is some $j \in [L]$ for which we have (14). For a cluster point $\nu_\infty$, there exists a subsequence $(\nu_{n_i})_{i \ge 1}$ converging to $\nu_\infty$. By truncating $\log^-$ and applying the monotone convergence theorem, we see that for each $j \in [L]$ we have (15). Thinning our subsequence further so that Item 2 holds for a single $j \in [L]$ along all of $(n_i)_{i \ge 1}$, we combine Equations (14), (15) and (16) to obtain (17). Writing $F \subset \mathcal{M}$ for the set of Möbius transformations for which the desired inequality fails, Equation (17) shows that $\lambda_{\mathcal{M}}^{\otimes L}(F^L) = 0$, where $F^L$ is the $L$-fold cartesian product of $F$ with itself; this implies that $\lambda_{\mathcal{M}}(F) = 0$, as desired.
Proof of Theorem 1.1. First note that if $\mu$ has finite support, then the theorem follows immediately from the strong law of large numbers. As such, we will assume throughout that $\mu$ does not have finite support. By Lemma 3.5, there is a set $\Omega_1$ with $\mathbb{P}(\Omega_1) = 1$ so that for each $\omega \in \Omega_1$ and $\lambda_{\mathcal{M}}$-almost-all $\psi \in \mathcal{M}$, every cluster point $\nu_\infty$ of $(\nu_n^{(k)})_{n \ge 1}$ satisfies
$$\int \log^-|\psi(z)|\, d\nu_\infty(z) \le \int \log^-|\psi(z)|\, d\mu(z)\,.$$
By Lemma 1.2, this implies that each such cluster point satisfies $\nu_\infty = \mu$. This shows that each subsequence of $(\nu_n^{(k)})_{n \ge 1}$ contains a further subsequence that converges to $\mu$, thus showing that for each $\omega \in \Omega_1$ the sequence $\nu_n^{(k)}$ converges to $\mu$.
Here $\mu_n = \frac{1}{n} \sum_{j=1}^n \delta_{Z_j}$ denotes the empirical measure of $P_n$, and $\nu_n^{(k)}$ the empirical measure of $P_n^{(k)}$. We will make use of the ratio $S_n = \frac{P_n^{(k)}}{k!\, P_n}$. We set $\lambda_{\mathbb{R}}$ to be the Lebesgue measure on $\mathbb{R}$ and $\lambda_{\mathbb{C}}$ to be the Lebesgue measure on $\mathbb{C}$; we write $\overline{\mathbb{C}}$ for the Riemann sphere. We denote by $\mathcal{M} = \{\psi(z) = \frac{az+b}{cz+d}\}$ the set of Möbius transformations and endow $\mathcal{M}$ with the measure $\lambda_{\mathcal{M}}$ induced by taking the Lebesgue measure $\lambda_{\mathbb{C}}^{\otimes 4}$ on the tuples $(a, b, c, d)$ defining the Möbius transformations. We write $\log z = \log^+ z - \log^- z$ where
$$\log^- z = \begin{cases} |\log z|, & 0 \le z \le 1,\\ 0, & z \ge 1,\end{cases} \qquad \log^+ z = \begin{cases} 0, & 0 \le z \le 1,\\ \log z, & z \ge 1,\end{cases}$$
with the convention $\log^- 0 = +\infty$. We write $C(a, r)$ for the circle centered at $a \in \mathbb{C}$ of radius $r$. For $m \in \mathbb{N}$ we write $[m] = \{1, 2, \dots, m\}$.
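The splitting $\log z = \log^+ z - \log^- z$ above can be sketched in a few lines (an illustration of the notation only):

```python
import math

# log z = log^+ z - log^- z, with log^- z = |log z| on (0, 1],
# log^- z = 0 for z >= 1, and the convention log^- 0 = +infinity.
def log_minus(z):
    if z == 0:
        return math.inf
    return abs(math.log(z)) if z <= 1 else 0.0

def log_plus(z):
    return math.log(z) if z >= 1 else 0.0

for z in (0.25, 1.0, 4.0):
    assert math.isclose(log_plus(z) - log_minus(z), math.log(z), abs_tol=1e-12)
```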