Stein’s method via induction

Applying an inductive technique for Stein and zero bias couplings yields Berry-Esseen theorems for normal approximation in two new examples. The conditions of the main results do not require that the couplings be bounded. Our two applications, one to the Erdős-Rényi random graph with a fixed number of edges, and one to the Jack measure on tableaux, demonstrate that the method can handle non-bounded variables with non-trivial global dependence, and can produce bounds in the Kolmogorov metric with the optimal rate.


Introduction
We present new Berry-Esseen theorems for sums Y of possibly dependent variables by combining both the Stein and zero bias couplings of Stein's method with the inductive technique of Bolthausen (1984) originally developed for the combinatorial central limit theorem. We apply these results to obtain normal approximations in the Kolmogorov metric for two new examples.
In both applications the bounds obtained are of the optimal order in the problem size. This work is a broad extension and continuation of Ghosh (2009), which applied induction and the zero bias coupling to the combinatorial central limit theorem where the random permutations are involutions, and of Goldstein (2013), which used the size bias coupling to study degree counts in the Erdős-Rényi random graph; the inductive method considered here is inspired by Bolthausen (1984), but goes ultimately back to Bergström (1944).
At the center of Stein's method is the characterization that Z is a standard normal random variable if and only if E{Zf(Z)} = E{f′(Z)} (1.1) for all locally absolutely continuous functions f for which the above expectations exist. Given a standardized variable W whose distribution is to be compared to that of Z, and a test function h on which to evaluate the difference Eh(W) − Eh(Z), one solves the Stein equation for f. The difference Eh(W) − Eh(Z) may then be evaluated by substituting W for w and taking expectation on the left hand side of (1.1), rather than the right. One explanation of why the expectation of the left hand side may be simpler to compute, or bound, than that of the right is that it depends only on the distribution of W, whereas the right also depends on that of Z. In particular, on the left hand side one may apply couplings of W to auxiliary random variables having properties that allow for convenient manipulations.
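In its standard form (see, for instance, Chen, Goldstein and Shao (2011)), this evaluation proceeds through the Stein equation; the following display is a routine restatement of the step just described, not a new result.

```latex
% Stein equation for a test function h, with f = f_h its bounded solution:
f'(w) - w f(w) = h(w) - E h(Z).
% Substituting W for w and taking expectations yields
E h(W) - E h(Z) = E\{ f'(W) - W f(W) \},
% so bounding the right hand side bounds the distance from W to the normal
% as measured by h.
```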
In Theorem 1.1 we present results for situations in which one can form a Stein coupling as defined by Chen and Röllin (2010). Following the treatment there, we say that the triple (W, W′, G) of random variables is a Stein coupling when E{Gf(W′) − Gf(W)} = E{Wf(W)} (1.2) for all functions f for which the expectations above exist. It is not difficult to see that the canonical exchangeable pair coupling of Stein (1986) and the size bias coupling of Goldstein and Rinott (1996) are both special cases of Stein couplings. Indeed, recall that for λ ∈ (0, 1] we say (W, W′) is a λ-Stein pair if (W, W′) is exchangeable and E{W′|W} = (1 − λ)W. (1.3) In this case, it is easily verified that (1.2) is satisfied with G = (W′ − W)/(2λ) for all functions f for which the quantities above exist.
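For completeness, here is a sketch of the standard verification that a λ-Stein pair yields a Stein coupling with the choice G = (W′ − W)/(2λ); only exchangeability and (1.3) are used.

```latex
E\{Gf(W') - Gf(W)\}
  = \tfrac{1}{2\lambda} E\{(W'-W)\bigl(f(W') - f(W)\bigr)\}
  = -\tfrac{1}{\lambda} E\{(W'-W) f(W)\}
  % using the antisymmetry E\{(W'-W)f(W')\} = -E\{(W'-W)f(W)\},
  % which follows from exchangeability of (W, W'); then, conditioning on W,
  = -\tfrac{1}{\lambda} E\{\bigl(E\{W'\mid W\} - W\bigr) f(W)\}
  = E\{W f(W)\},
  % since E\{W'\mid W\} - W = -\lambda W by (1.3); this is (1.2).
```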
In Stein's method in general, simplification occurs when a coupling of W to an appropriate W′ can be achieved in such a way that the difference W′ − W is almost surely bounded, or bounded uniformly in the size of the problem. However, in many situations appropriately bounded couplings may be difficult to construct, whereas unbounded couplings arise naturally. Hence Theorems 1.1 and 1.2, which do not impose restrictive boundedness conditions, may be applied to produce new results in a variety of examples.
General Framework. Let (Θ, T) and (Ω, F) be two measurable spaces, the parameter space and the sample space, respectively. All random variables are understood to be real valued measurable functions from the product space (Θ × Ω, T ⊗ F). The distribution of a random variable X is determined by a parameter θ ∈ Θ through a given transition kernel P_θ from Θ to Ω. That is, for each θ ∈ Θ, P_θ[·] is a probability measure on (Ω, F), and for each A ∈ F, the map P_·[A] is T-measurable. Depending on context and emphasis, we may also write X as X(θ, ω) or X_θ(ω), so that, for instance, E_θX = ∫_Ω X(θ, ω)P_θ[dω].
These measurability conditions are needed to assure the measurability of mappings that appear later, such as the mean µ_θ and variance σ²_θ of Y, and Y_{Ψ(θ,ω)}(ω), which represents the value of Y at the parameter used in the inductive step. These conditions will not always be invoked explicitly below; we illustrate their use by showing in the Appendix, Section 4, that this latter variable in particular is measurable.
Our goal is to obtain bounds on the Kolmogorov distance between the standardized version W of a random variable Y and the normal distribution in terms of the parameter θ. Theorems 1.1 and 1.2 below yield a bound of the form C/r_θ for r_θ a positive 'rate' function of θ and C a constant not depending on θ.
As noted, one main step our method requires is to couple W to a random variable W′ satisfying either the Stein coupling relation (1.2) or the zero bias coupling relation (1.4). In order to apply induction, we identify a subset Θ₀ ⊂ Θ in Condition (G1), consisting of the 'nicely behaved' parameters; its complement plays the role of the base case, on which the bound C/r_θ may be trivial. For our bound to be informative, it is necessary that the rate function r_θ be unbounded on Θ₀.
For the induction step, we also introduce a sub-σ-algebra F_θ that, roughly speaking, captures the information about the changes needed to construct W′ from W (or, equivalently, Y′ from Y); the coarser F_θ is, the better the normal approximation will be. A certain tension arises here, as F_θ must be large enough to contain the variables describing the changes from Y to Y′, but small enough that the conditional distribution of Y given F_θ is sufficiently close to its original one.
Conditional on F_θ, the variable Y may no longer have its original distribution, but induction is viable when one can identify within Y another variable V whose distribution is similar to that of the original Y; when the parameter space Θ is ordered, V typically has a smaller parameter. For a successful induction, the parameter of the smaller problem should not stray too far from that of Y. There is some leeway here, as it suffices to have control over an event F_{θ,1}, as specified in Condition (G4). Intuitively, the event F_{θ,1} should contain the bulk of the support of the variables that generate F_θ, and not their extremes. For instance, for the Erdős-Rényi graph problem considered, F_θ contains the label and degree of a chosen vertex on which the coupling is based, and F_{θ,1} is an event on which its degree is 'not too large'.
Relaxing the condition that the difference D = W′ − W be bounded, we control the magnitude of this difference by its moments. Moreover, we upper bound |D| by D̄, and in the case of a Stein coupling, also |G| by Ḡ, where these majorizing variables are required to be F_θ-measurable; we are able to handle exceptional or boundary cases as these upper bounds are only required to hold on F_{θ,1}. We will also require the existence of a random variable B that bounds the absolute difference |Y − V|, and which is not 'too large'. See Conditions (G3), (G4) and (G6) for the case of Stein couplings. There is also some leeway in that the distribution of V, conditionally on F_θ, only needs to be close to that of Y on an event F_{θ,2} ∈ F_θ. Precisely, for the Stein coupling case, with similar remarks also applying to zero bias couplings, we impose in Condition (G5) that L_θ(V | F_θ) = L_{Ψ_θ}(Y) on F_{θ,2}, (1.5) where Ψ_θ is the (typically random) parameter capturing the conditional distribution of the embedded variable V. For clarification, by (1.5) we mean that L_θ(V | F_θ)(ω) = L_{Ψ(θ,ω)}(Y) for all ω ∈ F_{θ,2}.
With the help of V , a recursive inequality for a bound on the distance between W and the normal can be produced. Before attempting to apply the methods presented in this article, it is advisable that a user first 'test the waters' by constructing a Stein or zero-bias coupling and proving a normal approximation for a smooth metric such as the Wasserstein distance; see Chen and Röllin (2010), or Goldstein (2007), respectively. Once this goal has been achieved, the sigma-algebra F θ will typically arise naturally from the coupling construction, and one may then proceed to identify a suitable variable V whose conditional distribution given F θ is within the same class of distributions determined by Θ and close to that of Y .
For instance, in occupancy problems, a Stein coupling or zero-bias coupling typically involves moving around a small number of balls among a small number of urns, and V will typically again represent an occupancy problem, but on fewer balls and fewer urns.

Abstract approximation theorems
We now state the conditions required for our main results. The rate function r_θ is assumed to be a positive function, measurable in θ, a condition satisfied in all natural examples, including the ones considered here. The mean µ_θ = E_θY and variance σ²_θ = Var_θ(Y) are measurable by the conditions in our General Framework. To avoid repetition, the distribution of random variables indicated after θ ∈ Θ has been fixed is with respect to L_θ(·). The random variable Z will always denote a standard normal.
The variable Y denotes the unstandardized random variable of interest. Theorem 1.1 shows that the following set of conditions is sufficient for the Kolmogorov distance between the standardized version W of Y and the normal to be bounded by C/r_θ for some universal constant C.
(G1) Let r_θ be a positive measurable function, let r be a positive number, and let
Θ₀ = {θ ∈ Θ : r_θ > r}. (1.6)
Assume that r is chosen such that Var_θ Y > 0 for all θ ∈ Θ₀.
(G4) For each θ ∈ Θ₀, let F_θ ⊂ F be a sub-σ-algebra. Let Ḡ and D̄ be random variables such that, for each θ ∈ Θ₀, the mappings Ḡ(θ, ·) and D̄(θ, ·) are F_θ-measurable and such that, on some event F_{θ,1}, which need not be in F_θ, we have |G| ≤ Ḡ and |D| ≤ D̄, and moreover
sup_{θ∈Θ₀} r²_θ E_θ{ḠD̄(1 − I_{F_{θ,1}})} < ∞ and sup_{θ∈Θ₀} r_θ E_θ{ḠD̄²} < ∞. (1.8)
(G5) Let Ψ be a Θ-valued random element such that, for each θ ∈ Θ₀, Ψ(θ, ·) is F_θ-measurable. Let V be a random variable and, for each θ ∈ Θ₀, let F_{θ,2} ∈ F_θ be such that (1.5) holds. Conditions (G6) and (G7) below involve, in (1.12) and (1.13), bounds of the form sup_{θ∈Θ₀} ess sup(·), where the essential suprema are taken with respect to P_θ.
Theorem 1.1. If Conditions (G1)-(G7) hold, then there exists a constant C, not depending on θ, such that
sup_{z∈ℝ} |P_θ[W ≤ z] − P[Z ≤ z]| ≤ C/r_θ for all θ ∈ Θ. (1.14)
Theorem 1.1 extends Theorem 1.1 in Goldstein (2013), which produces a Kolmogorov bound equivalent, up to constants, to the bound in Chen and Röllin (2010) for the Wasserstein distance to the normal for bounded size bias couplings. In addition, the bound produced by Bartroff and Goldstein (2013) by an application of Theorem 1.1 of Goldstein (2013) to counts in a multinomial occupancy model was shown there to be of optimal order by the lower bound (1.6) of Englund (1981); see also (1.7) of Bartroff and Goldstein (2013). The bound of Theorem 1.2 of Goldstein (2013), which also uses Theorem 1.1 of that same work, for degree counts in the Erdős-Rényi random graph, can be shown to be optimal up to constant factors in the same manner.
When higher moments exist, a number of the conditions of the theorem may be verified using simpler expressions obtained via standard inequalities. For instance, using f(w) = w and that Var_θ(W) = 1 in (1.2) shows that E_θ(GD) = 1; hence, applying the Cauchy-Schwarz inequality to the first expression in (1.7) in Condition (G3) above, followed by a consequence of the conditional variance formula, we obtain (1.15), where H is any σ-algebra with respect to which W is measurable.
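In display form, the first step of this remark reads:

```latex
% Setting f(w) = w in the Stein coupling identity (1.2), and using
% that W is standardized with E W = 0 and E W^2 = 1:
E\{G W' - G W\} = E\{W \cdot W\} = \operatorname{Var}_\theta(W) = 1,
% and since D = W' - W, the left hand side equals E_\theta\{GD\},
% giving E_\theta(GD) = 1.
```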
We now state a parallel result for zero bias couplings.
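For the reader's convenience, we recall the standard form of the zero bias relation referred to as (1.4): for a mean zero, variance one random variable W, a variable W* has the W-zero bias distribution when

```latex
E\{W f(W)\} = E\{f'(W^*)\}
% for all locally absolutely continuous f for which both sides exist;
% comparing with the characterization (1.1), W and W^* share a single
% distribution exactly when W is standard normal.
```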
Assume again that r is chosen such that Var_θ Y > 0 for all θ ∈ Θ₀.
(Z2) Let W = (Y − µ_θ)/σ_θ whenever σ_θ > 0, and W = 0 otherwise. Let W* be defined on Ω such that, for each θ ∈ Θ₀, the variable W* has the W-zero bias distribution as in (1.4) with respect to P_θ.
(Z3) For each θ ∈ Θ₀, let F_θ be a sub-σ-algebra of F, let D = W* − W, and let D̄ be a random variable such that D̄(θ, ·) is F_θ-measurable, and let F_{θ,1} be an event, which need not be in F_θ, on which |D| ≤ D̄ and such that
sup_{θ∈Θ₀} r²_θ E_θ{|D|(1 − I_{F_{θ,1}})} < ∞ and sup_{θ∈Θ₀} r_θ E_θ{D̄|W| + D̄} < ∞. (1.16)
(Z4) Let V be a random variable, and let Ψ be a Θ-valued random element such that, for each θ ∈ Θ₀, Ψ(θ, ·) is F_θ-measurable. For each θ ∈ Θ₀, let F_{θ,2} be an event in F_θ such that L_θ(V | F_θ) = L_{Ψ_θ}(Y) on F_{θ,2}. Many of the conditions of Theorem 1.2, as for Theorem 1.1, can be shown to be satisfied using inequalities on moments. The proofs of Theorems 1.1 and 1.2 appear in Section 4.

Applications
We apply Theorems 1.1 and 1.2 to obtain new results in two examples; the proofs are deferred to Sections 2 and 3.
The first example invokes Theorem 1.1 for Stein couplings for the normal approximation of the number Y of isolated vertices in the Erdős-Rényi graph G ∼ ER(n, m) on n vertices, having exactly m edges, distributed uniformly at random. This model is related to the one where edges between each pair of vertices are chosen independently with some fixed probability, but in the model we consider the indicators that vertices are isolated exhibit a non-trivial global dependence, since the total number of edges is fixed. In fact, while in the model with independent edges these indicators are positively correlated, the effect of the global dependence in ER(n, m) is stronger, resulting in a negative correlation; see the proof of Lemma 2.5. Related work was done by Kordecki (1987) on the number of isolated vertices in the Erdős-Rényi graph model, although his general framework is not applicable here. The boundedness of the second derivative of the solution to the Stein equation on page 132 of that work is shown only at the points where the second derivative exists whereas, in order to perform the Taylor expansion on page 135, it needs to hold everywhere; we were thus not able to reproduce his final results. In addition, the fixed number of edges model does not appear to satisfy the condition on page 134 of his work. We also mention the work of Goldstein (2013), which considered vertex degrees in general, though only for the independent edge model. Theorem 1.3 provides the following bound on the Kolmogorov distance between the standardized version of Y and the normal.
Theorem 1.3. Let Y count the number of isolated vertices in the Erdős-Rényi graph G ∼ ER(n, m) on n vertices, having exactly m edges, distributed uniformly at random. Then, with µ_{n,m} and σ²_{n,m} the mean and variance of Y, letting W = (Y − µ_{n,m})/σ_{n,m} when σ_{n,m} > 0 and zero otherwise, with
Θ = {(n, m) : n ≥ 3, 0 < m < binom(n, 2)}, (1.20)
there exists a universal constant C > 0 such that, for all (n, m) ∈ Θ,
sup_{z∈ℝ} |P[W ≤ z] − P[Z ≤ z]| ≤ C/r_{n,m}, where r_{n,m} = σ³_{n,m}/(µ_{n,m}(1 + m²/n²)) when σ_{n,m} > 0, and r_{n,m} = 0 otherwise. (1.21)
Remark 1.4. In order to better understand the bounds obtained in Theorem 1.3, we now discuss in more detail the different regimes at which m and n can tend to infinity. To this end, denote by a(n) ∼ b(n) that lim a(n)/b(n) = 1, and by a(n) ≍ b(n) that lim inf a(n)/b(n) > 0 and lim sup a(n)/b(n) < ∞. By Lemma 2.7, if n and m tend to infinity so that max{m/n², m²/n³} → 0, then µ_{n,m} ∼ ne^{−2m/n} and σ²_{n,m} ∼ nϕ(2m/n) for ϕ(x) = e^{−x}(1 − e^{−x}(1 + x)).
For m ≍ n, the central domain, it follows that r_{n,m} ≍ σ_{n,m}, and moreover, in the special case where m ∼ cn, µ_{n,m} ∼ ne^{−2c} and σ²_{n,m} ∼ ne^{−2c}(1 − e^{−2c}(1 + 2c)).
Regarding lower bounds, Englund (1981, Section 6) shows a lower bound of matching order for the standardized number of occupied cells in a uniform occupancy model with n balls and m boxes; Englund's argument holds without changes for any random variable with finite variance supported on the integers, and so also applies to the number of isolated vertices in our model.
Hence, since in the central domain r_{n,m} ≍ σ_{n,m}, the rate function is of optimal order. If m → ∞ and m/n → 0, the left domain, say, then r_{n,m} ≍ σ_{n,m} m²/n² ≍ m³/n^{5/2}, using 1 − e^{−x}(1 + x) ∼ x²/2 as x → 0 for the first relation, and σ²_{n,m} ≍ m²/n for the second. In this case, Englund's lower bound is not achieved, since r_{n,m} = o(σ_{n,m}). Nonetheless, the bound is informative as long as r_{n,m} → ∞, which is the case as long as m/n^{5/6} → ∞, such as when m = cn^α for c > 0 and 5/6 < α < 1.
If m/n → ∞, the right domain, using σ²_{n,m} ≍ ne^{−2m/n} for the second relation, we have r_{n,m} ∼ σ_{n,m} n²/m² ≍ e^{−m/n} n^{5/2}/m², so Englund's lower bound is not attained. However, r_{n,m} tends to infinity when m ≤ αn log n for 0 < α < 1/2.
In the second example, we use the zero bias coupling constructed in Fulman and Goldstein (2011, Theorem 3.1) in Theorem 1.2 to give a bound on the normal approximation of the content Y of a Young tableaux under the Jack_α measure over a range of large α. In more detail, we recall that a partition of a positive integer n can be represented as a vector Λ = (λ_1, . . . , λ_p) of non-increasing, positive integers summing to n, where p is the number of parts of the partition. For instance, Λ = (4, 2, 1) corresponds to a partition of n = 7 with p = 3. In turn, the partition Λ can be represented by a tableaux with p rows of equal sized boxes, whose j-th row is of length λ_j, such as in (1.23).
The Jack_α measure on tableaux, defined for α > 0, recovers the Plancherel measure when specializing to the case α = 1. Under Jack_α, see Fulman (2004) for instance, the probability of a partition Λ of n is given by
P[Λ] = α^n n! / ∏_x (αa(x) + l(x) + 1)(αa(x) + l(x) + α), (1.22)
where the product is over all boxes x in the partition, a(x) denotes the number of boxes in the same row of x and to the right of x (the "arm" of x), and l(x) denotes the number of boxes in the same column of x and below x (the "leg" of x). For each tableaux representing a partition of n we may define the α-content of the box x in row i and column j by c_α(x) = α(j − 1) − (i − 1), as depicted in the tableaux (1.23) for the partition (4, 2, 1) of 7.
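To make the α-content concrete, here is a minimal Python sketch (function and variable names are ours, for illustration only) listing the α-contents of the boxes of (4, 2, 1) via the formula c_α(x) = α(j − 1) − (i − 1) just given.

```python
# Sketch: alpha-contents of a partition's boxes, using
# c_alpha(x) = alpha*(j - 1) - (i - 1) for the box in row i, column j.
from fractions import Fraction

def alpha_contents(partition, alpha):
    """Return the alpha-contents of all boxes, row by row."""
    return [[alpha * j - i for j in range(row_len)]
            for i, row_len in enumerate(partition)]

if __name__ == "__main__":
    alpha = Fraction(2)                   # any alpha > 0; Fraction keeps it exact
    rows = alpha_contents((4, 2, 1), alpha)
    for row in rows:
        print([str(c) for c in row])      # first row: 0, alpha, 2*alpha, 3*alpha
    total = sum(c for row in rows for c in row)
    print(total)                          # the unstandardized sum behind (1.24)
```

For α = 1 this recovers the usual contents j − i of the Plancherel case; for (4, 2, 1) the sum equals 7α − 4.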
Here we study the distribution of the standardized sum of the α-contents over all boxes in the tableaux, that is,
W = W_{n,α} = (α n(n − 1)/2)^{−1/2} Σ_{x∈Λ_n} c_α(x), (1.24)
where the partition Λ_n of n is sampled from the Jack_α measure in (1.22). Fulman (2004) proved an O(n^{−1/4}) bound for the error in the Kolmogorov metric for the normal approximation of W, subsequently improved, using martingales, to O(n^{−1/2+ε}) for any ε > 0, and then to O(n^{−1/2}) using Bolthausen's inductive approach and Stein's method, though without an explicit constant. Hora and Obata (2007) prove a central limit theorem, with no error bound, for W_{n,α} using quantum probability. Fulman and Goldstein (2011) prove the bound (1.25) in the Wasserstein metric d_1, where Z is a standard normal variable. In addition to providing explicit constants, this bound also highlights the role of α. A natural question it raises is whether a bound in the Kolmogorov metric can be shown with this same dependence on α. A few weeks before the current work was posted, Chen, Raič and Thành (2020+, Theorem 1.1) proved a bound which achieves this goal, with an explicit constant, to within a logarithmic factor.
Here, given any ε ∈ (0, 1), we show that in the 'large α' region α ≥ n^{1+ε} this log factor may be removed, resulting in a bound having the same α dependence as (1.25). That is, as α ≥ n over the region we consider, the ratio between the right hand sides of (1.25) and (1.26) is bounded away from zero and infinity. This same result, with an explicit constant, was also achieved by Chen, Raič and Thành (2020+, Proposition 4.1) by a different approach. We do not consider ε > 1, as Theorem 3.1 below shows that this case is degenerate.
Theorem 1.5. For W as given in (1.24) with Λ_n sampled according to the Jack_α measure, for every ε ∈ (0, 1) there exists a constant C depending only on ε such that
sup_{z∈ℝ} |P[W ≤ z] − P[Z ≤ z]| ≤ C√α/n for all n ≥ 2 and α ≥ n^{1+ε}. (1.26)
We remark that, by applying the reasoning at the end of the proof of Theorem 4.1 of Fulman and Goldstein (2011), the result holds also for α ≤ n^{−1−ε} when replacing the α on the right hand side by 1/α. In the computations that follow, C without subscript will denote a universal constant whose value may change from line to line, and for n a non-negative integer, [n] will denote the set {1, . . . , n}.

Isolated vertices in the Erdős-Rényi random graph
In this section we prove Theorem 1.3. We begin by reviewing Construction 2A of Chen and Röllin (2010) for Stein couplings. Let X = (X_1, . . . , X_n) be a collection of mean zero random variables, and let I be a random index uniformly distributed over [n],
independent of X. Let W = Σ_{i∈[n]} X_i and suppose that for each i = 1, . . . , n there exists W_i such that E{X_i|W_i} = 0. (2.1) Then, with G = −nX_I, the triple (W, W_I, G) is a Stein coupling. To verify the claim, first note that
E{Gf(W_I)} = −nE{X_I f(W_I)} = −Σ_{i∈[n]} E{X_i f(W_i)} = −Σ_{i∈[n]} E{E{X_i|W_i} f(W_i)} = 0.
On the other hand,
E{Gf(W)} = −nE{X_I f(W)} = −Σ_{i∈[n]} E{X_i f(W)} = −E{Wf(W)},
so (1.2) holds.

Isolated vertices in ER(n, m)
Consider the Erdős and Rényi (1960) model ER(n, m), in which a graph on n vertices with exactly m edges is chosen uniformly at random. We remark that though there may be a choice of couplings for a given situation, the coupling we have chosen will work for the more general problem where Y is a sum of functions h_v of the degree d_v of vertex v. For instance, the size bias coupling will work, as in Goldstein (2013), for counting the number of vertices having specified degrees, but not in this greater generality. With N = binom(n, 2), let E_n(1), . . . , E_n(N) be a fixed enumeration of the two element subsets of [n]. (2.2) Let π be a uniformly chosen random permutation of [N]. We will describe the construction of a graph G(m, π), determined by m and π, that has distribution ER(n, m). As n is determined by N, and hence by π, n may be omitted in the notation for the graph; the same principle will be applied without comment for like quantities that appear later.
We construct G(m, π) as follows. For each {v, w} ⊂ [n] with v < w, connect vertices v and w with an edge if and only if π⁻¹(i) ≤ m, where i is the index in the enumeration (2.2) corresponding to the pair {v, w}. Clearly this construction results in a graph with m edges, namely those with labels {π(1), . . . , π(m)}. (2.4)
We now verify the conditions of Theorem 1.1 with Θ and r_{n,m} as given in (1.20) and (1.21). Since our definition of r_{n,m} in (1.21) implies that r_{n,m} = 0 whenever σ²_{n,m} = 0, the condition that σ²_{n,m} > 0 on Θ₀ is satisfied. Note that by Lemma 2.8,
n ≥ n₀ and m₀ ≤ m ≤ c₀n^{3/2} whenever (n, m) ∈ Θ₀. (2.6)
Let W = (Y − µ_{n,m})/σ_{n,m} when σ_{n,m} > 0, and set W = 0 otherwise. Assume (n, m) ∈ Θ₀. Let Σ = (σ_1, . . . , σ_n) be a collection of uniform random permutations of [N], with π, σ_1, . . . , σ_n mutually independent. The purpose of the following algorithm is to take the graph G(m, π) as input and to construct, for a given vertex v, a graph G_v(m, π, σ_v) in which the edges incident to v have been removed and relocated uniformly among the remaining vertices, and which can be closely coupled to G(m, π). We first describe the algorithm in words: initialise counters k and i that respectively record the number of edges successfully relocated and the index of a candidate edge for possible addition to the new graph; for the given vertex v ∈ [n], begin with G(m, π) and relocate the d_v(m, π) edges incident to v uniformly by, incrementing i when needed, adding E_n(σ_v(i)) as a new edge when it is not incident to v (Step 6) and does not connect two vertices that are already connected (Step 7). The counter k records the number of edges successfully relocated, and the set L_v(m, π, σ_v) records the indices of the added edges.
Algorithm 1.
1. Input the graph G(m, π), a vertex v ∈ [n] and the permutation σ_v.
2. Let G be the graph obtained from G(m, π) by removing vertex v and all edges incident to it, and let L_v ← ∅.
3. Let k ← 0 and i ← 0.
4. If k = d_v(m, π), stop and output G_v(m, π, σ_v) ← G.
5. Let i ← i + 1.
6. If v ∈ E_n(σ_v(i)), then return to Step 5.
7. If E_n(σ_v(i)) is an edge in G(m, π), then return to Step 5.
8. In G connect the vertices in E_n(σ_v(i)) by an edge, and moreover, let L_v ← L_v ∪ {σ_v(i)}.
9. Let k ← k + 1.
10. Return to Step 4.
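A minimal Python sketch of the construction of G(m, π) and of Algorithm 1 (function names and representation choices are ours; the enumeration (2.2) is realized as a fixed list of vertex pairs):

```python
# Sketch of the coupling construction: sample G(m, pi) and run Algorithm 1.
import random
from itertools import combinations

def sample_graph(n, m, rng):
    """G(m, pi): the first m pairs under a uniform permutation of all N pairs."""
    pairs = list(combinations(range(n), 2))   # a fixed enumeration, as in (2.2)
    rng.shuffle(pairs)                        # the uniform permutation pi
    return set(pairs[:m])                     # edges with labels pi(1), ..., pi(m)

def relocate(n, edges, v, rng):
    """Algorithm 1: remove v's edges and relocate them; return (G_v, added)."""
    incident = {e for e in edges if v in e}   # the d_v(m, pi) edges at v
    g = set(edges) - incident                 # Step 2: drop v and its edges
    sigma_v = list(combinations(range(n), 2))
    rng.shuffle(sigma_v)                      # the permutation sigma_v
    added, i = set(), 0
    while len(added) < len(incident):         # Step 4: until all are relocated
        e = sigma_v[i]                        # Step 5: next candidate pair
        i += 1
        if v in e or e in edges:              # Steps 6 and 7: blocked candidates
            continue
        g.add(e)                              # Step 8: add the new edge
        added.add(e)                          # record it (the set L_v, as pairs)
    return g, added

if __name__ == "__main__":
    rng = random.Random(0)
    n, m, v = 10, 12, 3
    edges = sample_graph(n, m, rng)
    g_v, L_v = relocate(n, edges, v, rng)
    print(len(edges), len(g_v), len(L_v))     # m, m and d_v respectively
```

The loop terminates provided enough unblocked pairs exist, which is the content of the condition m ≤ (n − 1)(n − 2)/2 noted next.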
It is not difficult to see that the algorithm will succeed in redistributing the edges incident on v if and only if m ≤ (n − 1)(n − 2)/2, which is guaranteed by our choice of Θ₀. Note that, given m, π and σ_v, the construction of G_v(m, π, σ_v) from G(m, π) is deterministic, and hence will always result in the same graph G_v(m, π, σ_v). Note also that, although G_v(m, π, σ_v) has only n − 1 vertices, we keep the labeling from the original graph G(m, π). Since the potential locations to which the d_v(m, π) edges are relocated are sampled uniformly at random without replacement, (2.7) holds. With V a uniformly chosen vertex from [n], independent of π, σ_1, . . . , σ_n, and recalling the notation in (2.4), let W′ and G be given by (2.8) and (2.9). Since the distribution of G_v(m, π, σ_v) is the same regardless of the value of d_v(m, π), we conclude that I_v(m, π) − µ_{n,m}/n and Y_v(m, π, σ_v) are independent, so (2.1) holds, implying that (W, W′, G) is a Stein coupling.

Condition (G3).
In what follows, consider a fixed (n, m) ∈ Θ₀, and drop the subscript θ in the expectations that follow. As W is a function of (π, Σ), using (1.15) we obtain (2.10). Now, from (2.8) and (2.9), splitting the resulting sum into two and using Var(X + Y) ≤ 2 Var X + 2 Var Y as in (2.11) and (2.12), we are led to consider
f_m(π, Σ) = Σ_{v∈[n]} I_v(m, π)B_v(m, π, σ_v) and g_m(π, Σ),
as there. Note that f_m(π, Σ) and g_m(π, Σ) are deterministic functions of m, π and Σ. Applying Lemma 2.1 and using the notation there, we obtain (2.13), since all differences arising from the first sum in (2.11) cancel except the one yielding (2.14). Let Hyp(N, m, n) count the number of white balls among m draws from an urn with N balls, n of which are white and N − n black. Note that the marginal distribution of the degree of any vertex in G(m, π) is Hyp(N, m, n − 1), and hence has mean 2m/n, since the graph's m edges are uniformly sampled among all N possibilities, and exactly n − 1 of them are associated with a specific vertex. Hence, applying Lemma 2.2, (2.13) and (2.14), we obtain (2.15), where we recall that C denotes a universal constant whose value may change from line to line. Thus, as µ_{n,m} ≤ n, we obtain (2.16).
Bounding R_{g,2}. In order to bound R_{g,2}, with τ_{ij} the transposition of i and j, note first that g_m is a function of the graph G(m, π) and Σ, and that, by (2.3), the graph G(m, π) obtained from π does not change when swapping edge with edge or non-edge with non-edge. Hence, averaging over τ_j, a transposition of j and a uniformly chosen index in {j, . . . , N}, yields a corresponding average. By exchangeability, the expectation on the right hand side is constant for j ≤ m and i ≥ m + 1; hence, for such i and j, the displayed bound holds. Here, we have first applied the inequality (x + y)² ≤ 2x² + 2y², followed by (2.17) with m replaced by m + 1 applied to the first expectation in the expression that results, using that g_{m+1}(πτ_{1,m+1}, Σ) = g_{m+1}(π, Σ), and πτ_{1,m+1} =_d π applied to the second expectation, where =_d denotes equality in distribution. Hence (2.18) follows. Now, recalling that L_v(m, π, σ_v) is the set of indices of edges to which the edges incident to vertex v were relocated, let N_v(m, π, σ_v), as in (2.19), be the set of vertices that received at least one additional edge when redistributing those edges. Also, let M_v(m, π, σ_v), as in (2.20), be the set of vertices w ≠ v such that {v, w} is an edge in G(m, π) and w does not receive a redistributed edge. A vertex w ≠ v will have this same effect if w is isolated in G(m, π) but then has an edge attached to it in the redistribution of the removed edges of v. On the other hand, a vertex w ≠ v will decrease this difference by one when w is connected to v, has degree 1 in G(m, π), and does not have such an edge reattached.
Hence, this difference is given by (2.22). For the first term in (2.22), we have used that for any vertex w ∈ [n] we can only have I_w(m, π) ≠ I_w(m + 1, π) when w is an endpoint of the additional edge determined by π(m + 1), that is, when w ∈ E_n(π(m + 1)). For the second term in (2.22) we have used a similar observation. Moving now to the third term in (2.22), if v ∉ E_n(π(m + 1)), vertex v has the same degree in both G(m, π) and G(m + 1, π), and if also π(m + 1) ∉ L_v(m, π, σ_v), then Algorithm 1 will redistribute the edges incident to v to the same available pairs of vertices in both cases; indeed, note that between the two cases m and m + 1, Step 7 changes only if σ_v(i) = π(m + 1) for one of the indices i tested there, which is equivalent to π(m + 1) ∈ L_v(m, π, σ_v). Therefore, if L_v(m, π, σ_v) ≠ L_v(m + 1, π, σ_v), we must either have v ∈ E_n(π(m + 1)) or π(m + 1) ∈ L_v(m, π, σ_v). Now, if v ∈ E_n(π(m + 1)), then the degree of v in G(m + 1, π) is one more than its degree in G(m, π), so L_v(m + 1, π, σ_v) will contain one more edge than L_v(m, π, σ_v). And if π(m + 1) ∈ L_v(m, π, σ_v), then |L_v(m, π, σ_v) △ L_v(m + 1, π, σ_v)| = 2, since π(m + 1) will be found blocked when forming G_v(m + 1, π, σ_v) and a new non-edge has to be found. Hence the stated bound holds. For the fourth term in (2.22) we apply the corresponding bound. Finally, for the last term, similarly as for the third, if both v ∉ E_n(π(m + 1)) and π(m + 1) ∉ L_v(m, π, σ_v), then under these conditions the set of vertices adjacent to v does not change with the addition of edge m + 1, and moreover the associated sets agree. If v ∈ E_n(π(m + 1)), then v has one more neighbour in G(m + 1, π) than in G(m, π), and so L_v(m + 1, π, σ_v) will contain one more edge than L_v(m, π, σ_v). In this case, M_v(m, π, σ_v) and M_v(m + 1, π, σ_v) can differ by at most three elements. Indeed, they may only differ by the additional neighbour in G(m + 1, π), and by at most two existing neighbours of v in G(m, π) which were not assigned an edge in L_v(m, π, σ_v), but were so assigned in L_v(m + 1, π, σ_v).
Therefore,
R_{f,2} ≤ Cµ_{n,m} m/n ≤ Cµ_{n,m}(1 + m²/n²). (2.33)
Combining the bounds (2.15), (2.16), (2.32) and (2.33) as in (2.12), and then recalling (2.10), we obtain the claimed bound. Recalling (1.21) and noting that σ²_{n,m} ≤ µ_{n,m} by Lemma 2.5, the first condition in (1.7) holds. Next, it clearly suffices to verify the second condition in (1.7) of (G3) with D replaced by the absolute upper bound obtained in (2.14); splitting the resulting expression to be bounded into two terms, we arrive at (2.35). Now, let a ≥ 1. Using the given form (2.8) of G, we obtain a bound where, for the final inequality, we used that D̄ = 1/σ_{n,m} when I_V = 1 and that EI_V = µ_{n,m}/n on the first summand, and Lemma 2.2 on the second; using (x + y)² ≤ 2(x² + y²) twice, we obtain (2.36). Lemma 2.9 yields that the first term is bounded by a constant. For the second term, by removing all edges from the n-th vertex and relocating them among the remaining vertices, we obtain a coupling of ER(n, m) and ER(n − 1, m) which yields |Y_{n−1,m} − Y_{n,m}| ≤ 1 + 2d_n, so that (2.38) holds. Using that r_{n,m} in (1.21) is lower bounded by r, which is at least 1 by Lemma 2.8, and that µ_{n,m} ≥ σ²_{n,m} by Lemma 2.5, yields σ_{n,m} ≥ 1 + (m/n)², and using also (2.38), we conclude (2.39). For the corresponding second term of (2.36), with a = 2 and the additional factor of |W|, using Cauchy-Schwarz and EW² = 1, we arrive at the choice of t(n, m) in (2.43). Indeed, for m ≤ n the bound follows using that 2 log m ≤ (1/4 − 4/344)m for m ≥ 28, while for n ≤ m one verifies, for n ≥ 344, that 4√n + 2 log n ≤ n/4. Now, bounding |D| by D̄ as given in (2.34), writing F as short for F_{n,m,1}, and using that P[I_V = 0] = 1 − µ_{n,m}/n in the final inequality, we obtain the required estimate.
Since V cannot be both isolated and have positive degree, we have I_V(1 − I_F) = 0 almost surely, and so the first term is zero. Applying Cauchy-Schwarz to the second term and then invoking Lemma 2.2 yields (2.44). By Lemma 2.2 with γ = 2m/n, the mean of d_1(m, π), we have a tail bound for any t > γ; as the final expression trivially upper bounds the left hand side for t ≤ γ as well, the bound holds for all t ≥ 0. Hence, with t(n, m) as in (2.43), by (2.44) and recalling r_{n,m} in (1.21), we obtain the desired estimate, where we have used that σ²_{n,m} ≤ min{µ_{n,m}, 2m} via Lemma 2.5, and trivially µ_{n,m} ≤ n, for the second inequality, thus showing that the first condition in (1.8) is satisfied.
From (2.36) with a = 2 it follows that sup_{(n,m)∈Θ₀} r_{n,m} E_{n,m}{ḠD̄²} < ∞, thus showing that the second condition in (1.8) is also satisfied.

Condition (G5).
Denote by G_{emb,V} the "embedded" graph obtained by removing vertex V and all its incident edges; we keep the original vertex labeling. As the remaining m − d_V(m, π) edges are uniformly distributed over the remaining n − 1 vertices, conditionally on F_{n,m} in (2.41), the resulting graph has the conditional distribution given in (2.46) almost surely; this identity is again to be understood up to labeling. In particular, we may let Ψ = (n − 1, m − d_V(m, π)).
Clearly Ψ is F_{n,m}-measurable. Now set F_{n,m,2} = F_{n,m,1} as in (2.42), which is also clearly F_{n,m}-measurable. Condition (1.10) is then clearly equivalent to the first condition in (1.8), which was verified in (2.45).

Condition (G6). Let B = (d_V(m, π) + 1)/σ_{n,m}, which is clearly F_{n,m}-measurable. Moreover, σ⁻¹_{n,m}|Y − V| ≤ B, since removing any edge connected to vertex V can make at most one vertex, other than V, isolated; the additional term of one accounts for the case when vertex V is isolated. Since B ≤ D̄, as given in (2.34), by setting a = 3 in (2.36) we obtain the required bound.

Condition (G7).
We verify the stronger conditions that (1.12) and the second bound of (1.13) hold when taking the larger supremum obtained by removing the intersection with {Ψ ∈ Θ₀}. This stronger version of (1.12) is an immediate consequence of Lemma 2.9.
As this same lemma shows that the ratios in (1.13) involving means and variances are bounded by a constant, it is only required to bound the ratios of the remaining factor.
The claim then follows by applying the Cauchy-Schwarz inequality to the remaining factor.

Lemma 2.2 (Tail and moment bounds for the hypergeometric distribution). Let H have the hypergeometric distribution Hyp(N, m, n), equal in distribution to the number of white balls among m draws from an urn with N balls, n of which are white and N − n black. Let γ = EH = nm/N. Then, for any t > 0,
P[H ≥ γ + t] ≤ exp(−t²/(2γ + t)). (2.47)
Moreover, for any k ≥ 1, there is a constant C_k, independent of γ, such that EH^k ≤ C_k(γ^k + 1).
Proof. To construct a bounded size bias coupling, index the white balls by [n], and write H = Σ_{i=1}^n I_i, where I_i is the indicator that the i-th white ball is sampled. Construct H^s with the H-size biased distribution by sampling a random index J uniformly from [n], independently of I_1, . . . , I_n; if I_J = 1, set H^s = H, and otherwise independently and uniformly select a ball from the sample and swap it with the J-th white ball. It is easy to see that H^s has the size bias distribution; see, for instance, Lemma 2.1 of Goldstein and Rinott (1996). Moreover, H^s = H + 1 if a sampled black ball was swapped with the J-th white ball, and H^s = H otherwise. Hence |H^s − H| ≤ 1, and the tail bound (2.47) follows readily from Theorem 1.1 of Ghosh and Goldstein (2011). Now, it is straightforward to check that t²/(2γ + t) ≥ (t − 1)/(γ + 1) whenever t ≥ 1 and γ > 0, so that P[H ≥ γ + t] ≤ exp(−(t − 1)/(γ + 1)). Hence, H − γ − 1 is stochastically dominated by an exponential random variable X with mean γ + 1, and in particular EH^k ≤ E(γ + 1 + X)^k, from which the second claim easily follows.
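A simulation sketch of this coupling (function names are ours); the final lines empirically check the size bias identity E{Hg(H)} = EH · E{g(H^s)} with g(x) = x:

```python
# Sketch: the white-ball swap coupling for Hyp(N, m, n) described above.
import random

def sample_coupled(N, n, m, rng):
    """Return (H, H_s): the count of white balls and its size biased version."""
    sample = set(rng.sample(range(N), m))       # balls 0..n-1 are white
    H = sum(1 for b in sample if b < n)
    J = rng.randrange(n)                        # uniform white ball, independent
    if J in sample:
        return H, H                             # I_J = 1: leave the sample alone
    out = rng.choice(sorted(sample))            # swap a uniform sampled ball for J
    swapped = (sample - {out}) | {J}
    return H, sum(1 for b in swapped if b < n)  # H + 1 iff removed ball was black

if __name__ == "__main__":
    rng = random.Random(1)
    N, n, m, reps = 20, 8, 10, 100_000
    pairs = [sample_coupled(N, n, m, rng) for _ in range(reps)]
    gamma = n * m / N                           # EH
    lhs = sum(h * h for h, _ in pairs) / reps   # E[H g(H)] with g(x) = x
    rhs = gamma * sum(hs for _, hs in pairs) / reps
    print(round(lhs, 3), round(rhs, 3))         # close, up to Monte Carlo error
```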
A bound similar to (2.47) can be obtained from Greene and Wellner (2017, Corollary 1), with better constants, but under additional conditions on the parameters of the hypergeometric distribution.
Proof. Vertex v is isolated if and only if none of the n − 1 edges that connect v to another vertex is included in the set of m edges selected. Likewise, distinct vertices v and w are both isolated if and only if none of a particular set of (n − 2) + (n − 2) + 1 edges is selected. Hence, the first claim is equivalent to an inequality between ratios of binomial coefficients. Expanding the binomial coefficients and canceling common factors yields an equivalent form in terms of the falling factorial (n)_k = n(n − 1) · · · (n − k + 1), and pairing up the k-th factors of the falling factorials, it suffices to show the inequality holds termwise. Expanding both sides of the k-th term of each side and simplifying yields the condition N + 2n ≤ n² + 1 + k, which holds for all k ≥ 0 since n ≥ 3.
Proof. Since the distribution of each individual degree is Hyp(N, m, n − 1), and as the hypothesis of Lemma 2.3 holds due to the restriction assumed on m, it follows from that lemma that the claimed estimate holds, yielding the upper bound in (2.49). Since, under the assertions on m and n, the corresponding lower estimate also holds, we obtain the lower bound in (2.49).
In order to prove the upper and lower bounds on the variance, we use the fact that Var(W) = E{G(W′ − W)} when (W, W′, G) is a Stein coupling for a mean zero random variable W; this identity follows immediately upon setting f(x) = x in (1.2). Now recall (2.8), (2.9) and (2.21), that N_v(m, π, σ_v) in (2.19) is the set of vertices that receive at least one edge when forming G_v(m, π, σ_v), and that M_v(m, π, σ_v) in (2.20) is the set of all vertices w ≠ v such that {v, w} is an edge in G(m, π) and w does not receive a redistributed edge. As the sets N_v(m, π, σ_v) and M_v(m, π, σ_v) are empty when I_v(m, π) = 1, and recalling that I_{w,1}(m, π) = I[d_w(m, π) = 1], we have (2.52). Now consider the first sum in (2.52). Note that when d_1(m, π) = k, of the potential N edges, n − 1 have vertex 1 as an endpoint, and an additional m − k edges remain in G(m, π) and are not redistributed. Hence (2.53) holds. To arrive at the hypergeometric expression in the sum in the last equality, giving the conditional probability that vertex 2 is incident on any of the k redistributed edges that were removed from vertex 1 when making the new graph, note that the total number of edges available is reduced from N first by n − 1, as vertex 1 has been removed, and also by the m − k edges that were part of the original graph and are not changed. Of these remaining edges, n − 2 are incident on vertex 2, which is one fewer than their original number of n − 1, due to the removal of vertex 1.
Similarly, using the second moment expression from (2.29),
E[d_1(m, π)² | d_2(m, π) = 0] = m(n − 2)(N + mn − 2n − 3m + 3) / ((N − n)(N − n + 1)),
and so from (2.54) we obtain the corresponding lower bound on the expected sum over w ∈ N_1(m, π, σ_1). Now, for the first term in the brackets, we have used (2.51) for the last inequality, and for the second term in the brackets, we have again used (2.51) for the last inequality. Hence, together with the upper bound (2.55), we arrive at (2.57), where M_{c,1}(m, π, σ_1) = {w : {w, 1} ∈ G(m, π), w ∈ N_1(m, π, σ_1)}. Taking expectation of the first sum on the right hand side of (2.57) and noting that the distributions of the degrees in the graph are hypergeometric, we obtain the expected sum over w with {w, 1} ∉ G(m, π).
From this equality, and using the assertions on m and n, we obtain the bound in terms of µ_{n,m}/n. Now take expectation of the second sum of (2.57). We arrive at the first hypergeometric expression in the sum in the last equality by the same reasoning as that given following (2.53); the remaining two expressions in the sum follow by similar, and simpler, means. Now, for the first and last terms, using Lemma 2.3 for the upper bound, and then using in the final inequality that n² ≤ 4(N − m − n + 2), which holds via the assumption that m ≤ n²/4 − 3n/2, and that n² ≤ 4(N − n), which holds as n ≥ 6, true by assumption, we obtain (2.59). Using the estimates from (2.58) and (2.59) in the difference (2.57), and then applying that result and (2.56) in (2.52), yields the claim.
(2.63) Hence, with the first inequality in (2.60) holding with n₀ replaced by 27, and taking c₀ ≤ 1, Lemma 2.6 can be invoked, from which (2.61) now follows for any C₀ ≥ 8.
Since (2.64) is equivalent to the inequality y < e^x − x − 1, which in turn is satisfied if y ≤ x²/2, since x²/2 < e^x − x − 1 for all x > 0, we arrive at the sufficient condition
78m(m + n)/n³ ≤ 2m²/n²,
which is equivalent to 39 ≤ m(1 − 39/n). This inequality holds whenever both n ≥ 78 and m ≥ 78.
We now proceed to bound the ratio between the upper and lower bounds of (2.50), say σ̄²_{n,m} and σ̲²_{n,m}, respectively. We use the identity there, with x = 2m/n, when min(n, m) ≥ 312. If 2m/n > 1, and so x > 1, we simply use the lower bound, and for any positive c₀ we can take n₀ large enough for the needed estimate. Hence, writing O(·) with the understanding that the implied bound holds with universal constants, recalling (2.66), and using Lemma 2.6 to bound µ_{n,m}/n in its numerator, we find that R₃ = O(R₂). In the case 2m/n ≤ 1, the first bound in (2.68) follows. In the case 2m/n > 1, applying (2.63), the second bound in (2.68) is shown. Now, using that σ̄²_{n,m} ≥ σ̲²_{n,m}, and taking c₀ small enough and m₀ large enough to guarantee that 0 ≤ a < 1 and −1 < b < 1, we obtain the upper and lower bounds from which the estimate (2.62) follows.
Lemma 2.8. Let r_{n,m} be defined as in (1.21). For any integers n₀ and m₀, and any positive constant c > 0, there exists r ≥ 1 such that r_{n,m} > r implies
n ≥ n₀ and m₀ ≤ m ≤ cn^{3/2}. (2.69)
Proof. We will show that r_{n,m} ≤ r for r = max{n₀^{1/2}, (2m₀)^{3/2}, 1/c², 1} if (2.69) is violated. First note that µ_{n,m}(1 + m²/n²) ≥ σ²_{n,m} by Lemma 2.5, so that r_{n,m} ≤ σ_{n,m}. Hence, if n < n₀, using µ_{n,m} ≤ n we obtain r_{n,m} ≤ n^{1/2} < n₀^{1/2}, and if m < m₀, using σ²_{n,m} ≤ 2m from Lemma 2.5 we obtain r_{n,m} ≤ (2m)^{1/2} < (2m₀)^{3/2}. Finally, if m > cn^{3/2}, then similarly r_{n,m} ≤ n^{5/2}/m² < 1/c².
For the ratio of means, the relevant expression is bounded by a constant via m ≤ c₀n^{3/2}, as in (2.5), and, here using that d ≤ n/4, we see it is also so bounded. For the ratios of variances, for 0 ≤ d ≤ min{n, m}/4, factor the ratio into four terms R₁, R₂, R₃ and R₄. We show that these four terms, and their reciprocals, can be uniformly bounded over the range of the supremum in (2.70). Since (2.60) holds for n₀ and m₀, we can apply Lemma 2.7, and also (2.5) for the first and final bounds, and obtain (2.71). Next, since m ≥ 2m₀ by (2.5) and d ≤ m/4, we have that m − d ≥ 3m/4 ≥ 3m₀/2 ≥ m₀. Since n ≥ 2n₀, again by (2.5), we have that n − 1 ≥ n₀, and since m ≤ (c₀/2)n^{3/2} by (2.5) and (n/(n − 1))^{3/2} ≤ 2 for n ≥ 3, we have that m − d ≤ m ≤ c₀(n − 1)^{3/2}. It follows that (m − d, n − 1) also satisfies the hypotheses of Lemma 2.7. Using the lower bound on n from (2.5), we have 1/(n − 1)³ ≤ 2/n³, and also from (2.6) that C₀(2/m + 2m²/n³) ≤ 1/2, so, also using d ≤ m/4 for the second and second to last inequalities, (2.72) follows. Hence, (2.71) and (2.72) imply that R₁ and R₂, together with their reciprocals, are bounded. Clearly, 1 ≤ R₃ ≤ 2 for n ≥ 2. Lastly, note that by (2.6), and by (2.5), which gives c ≤ 1, and also using d ≤ n/4, the bound (2.73) holds. It follows that 1/e^y remains bounded on Θ₀, and therefore, to show that R₄ is bounded, it suffices to show that the remaining ratio is bounded. But this ratio remains bounded from above, and away from 1, as d ≤ m/4 implies the needed inequality. The reciprocal 1/R₄ is bounded similarly, using that (2.73) shows that e^y is bounded.

Jack measure on tableaux
We now turn to the study of the distribution of the standardized sum of the α-contents over all boxes in a tableaux whose shape is determined by the partition Λ_n of n, that is, W as given in (1.24), where the partition Λ_n is sampled from the Jack_α measure in (1.22), as described in detail in the introduction; see (1.23) for an illustration of c_α(x), where x ∈ Λ₇.
Our bound is based on the zero bias construction in Fulman and Goldstein (2011), which itself depends on an exchangeable pair constructed using Kerov's growth process, a sequential procedure for growing a random partition distributed according to the Jack_α measure.
The state of Kerov's growth process at time n = 1, 2, . . . is a partition of n, starting at time 1 with the unique partition (1) of 1. To describe its transition rule from time n − 1 to time n for n ≥ 2, given a box x in the diagram of a partition Λ_n of n, let a(x) denote the number of boxes in the same row of x and to the right of x (the "arm" of x), and let l(x) denote the number of boxes in the same column of x and below x (the "leg" of x), as in (1.22). Now, for Λ_{n−1} a partition of n − 1 obtained from Λ_n by removing a single corner box, consider the quantities determined by C_{Λ_n/Λ_{n−1}}, the union of columns of Λ_n that intersect Λ_n − Λ_{n−1}, and R_{Λ_n/Λ_{n−1}}, the union of rows of Λ_n that intersect Λ_n − Λ_{n−1}. If at stage n − 1 the state of the process is the partition Λ_{n−1}, a transition to the partition Λ_n occurs with the probability given in terms of these quantities. It is shown in Kerov (1994) that if Λ_{n−1} is distributed according to the Jack_α measure on partitions of n − 1, then the partition Λ_n obtained by this process at time n has the Jack_α distribution.
In the proof of Theorem 3.1 of Fulman and Goldstein (2011), a variable having the zero bias distribution of W was constructed as follows. Fix n and α, let Λ_k be the state of Kerov's growth process at time k, and set T as in (3.3). With dF(t|Λ_{n−1}) the conditional distribution of T given Λ_{n−1}, constructing the pair (T†, T‡) on the same space as Λ_{n−1} according to (3.4), and letting U ∼ U[0, 1] be independent of V, T† and T‡, the variable in (3.5) has the W-zero bias distribution. In fact, the joint distribution on the right hand side of (3.4) can be achieved by running Kerov's growth process twice, conditionally independently given Λ_{n−1}. As shown in Fulman and Goldstein (2011), the resulting variables, say T and T′, yield the crucial exchangeable Stein pair in (1.3) via (3.3). Again by Fulman and Goldstein (2011), both the conditional mean and variance of T given Λ_{n−1} do not depend on Λ_{n−1}; specifically,
E{T|Λ_{n−1}} = 0 and E{T²|Λ_{n−1}} = 2/n. (3.6)
It is essentially for this reason that we may construct W* as in (3.5), using V; for details, see Fulman and Goldstein (2011).
Proof of Theorem 1.5. We verify the conditions of Theorem 1.2.
Condition (Z2). The variable Y, given in (3.1), is easily seen to satisfy the needed conditions, and the construction of the zero bias variable W* is outlined above in (3.3), (3.4) and (3.5).
where λ₁ and λ′₁ respectively denote the length of the first row and of the first column of the tableaux Λ_{n−1} produced by Kerov's growth process at time n − 1. Clearly D̄ is F_{n,α}-measurable.
(3.13) In what follows, we think of (n, α) ∈ Θ₀ as fixed and suppress the subscript in E_{n,α}.
Turning to the moment conditions, we claim that (3.14) holds; the computation leading to (3.15) provides the needed bound.
To verify the first condition in (1.16), apply the Cauchy-Schwarz inequality, (3.8) and (3.14) to obtain (3.16). To control P[F^c_{n,α,1}], with m = n − 1, we apply the inequality from the proof of Lemma 6.6 in Fulman (2004). Using that α ≥ n^{1+ε} ≥ m^{1+ε} in the third inequality, we obtain the needed estimate; substitution into (3.16) now verifies the first condition in (1.16).
The next result shows that the case when α is taken larger than that in Theorem 1.5 is degenerate; the boundary case ε = 1 is left unresolved.
Proof. Note that for all boxes x in the tableaux with λ′₁ = n we have a(x) = 0, and l(x) takes all values between 0 and n − 1. Hence, from the Jack_α measure as given in (1.22),
P_{n,α}[λ′₁ = n] = α^n n! / ∏_{l=0}^{n−1}(l + 1)(l + α) = ∏_{l=0}^{n−1} α/(l + α) ≥ exp(−n²/α).
Substituting the lower bound on α_n into this inequality yields P_{n,α_n}[λ′₁ = n] ≥ exp(−n^{1−ε}) → 1 as n → ∞.
A bound valid for all n ≥ 2 and α > 0 corresponds to the rate function in (3.18). This rate function is equivalent to the one we take in (3.8) for the 'large α' parameter set (3.7), as there n ≤ α and 1/√n is dominated by √α/n. Directly extending the arguments used here to cover the 'small α' regime requires that (3.16) hold for some choice of F_{n,α,1}. In particular, (3.14) shows that E_{n,α}D² ≤ C/r²_{n,α}, with r_{n,α} as in (3.18). Hence, taking this route, one needs to specify F_{n,α,1} as an appropriate restriction on Λ_{n−1} that satisfies P_{n,α}[F^c_{n,α,1}] < C/r²_{n,α}, and which gives rise to a bounding D̄ of the right order. If in this case B may be taken to be D̄, as in (Z5) above, then D̄ needs to be of order 1/r_{n,α}.

Proof of Theorems 1.1 and 1.2
The proofs of Theorems 1.1 and 1.2 ultimately rely on obtaining information about the solution to a certain recursive inequality. In its simplest form, closely related to the argument in Bolthausen (1984), this inequality becomes
a_n ≤ qa_{n−1} + c for n ≥ 2, with a_1 = 1, (4.1)
for some 0 < q < 1 and c > 0. In this simple case, it is not difficult to solve the corresponding equality explicitly to yield
a_n = q^{n−1} + c(1 − q^{n−1})/(1 − q) for n ≥ 1.
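Spelling out the uniform boundedness used below: with B := max{1, c/(1 − q)}, induction on n gives a_n ≤ B for all n ≥ 1.

```latex
% Base case: a_1 = 1 \le B. Inductive step: if a_{n-1} \le B, then by (4.1),
a_n \le q a_{n-1} + c \le qB + (1-q)B = B,
% using c \le (1-q)B, which holds by the definition of B.
```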
What is important here is not the exact form of the solution, but rather that a_n is uniformly bounded over n ≥ 1. We show below that this property holds in greater generality when we replace n on the left hand side of (4.1) by a generic parameter θ ∈ Θ, and average over a random parameter on the right hand side.
which is clearly impossible.
Proof of Theorem 1.1. Throughout the proof, C denotes a constant that does not depend on θ and may change from formula to formula. Note first that by Condition (G1) the bound (1.14) trivially holds for every θ ∈ Θ \ Θ₀ by taking C = r. Therefore we need only show that (1.14) holds for all θ ∈ Θ₀. Recalling the definition in (4.6), fix ε > 0, whose exact value is to be chosen later, and for z ∈ ℝ define
h_{z,ε}(x) = 1 for x ≤ z, h_{z,ε}(x) = 0 for x ≥ z + ε, with h_{z,ε} linear on [z, z + ε].
Let f_{z,ε} be the unique bounded solution to the Stein equation
f′_{z,ε}(x) − xf_{z,ε}(x) = h_{z,ε}(x) − Eh_{z,ε}(Z).
Using a standard smoothing inequality, see, e.g., the proof of Theorem 5.1 in Chen, Goldstein and Shao (2011), we obtain (4.7). For ease of notation, we drop the indices z and ε from f.
The maps (θ, ω) → (Ψ(θ, ω), ω) and (θ, ω) → Y_{Ψ(θ,ω)}(ω) are measurable, the first as each component is measurable, and the second being a composition of measurable maps.
Next, we show that if f(θ, ω) is measurable and P_θ-integrable for all θ ∈ Θ, then θ → ∫_Ω f(θ, ω)dP_θ(ω) is a measurable function of θ. Indeed, the collection M of subsets E of Θ × Ω for which the integral of f(θ, ω) = I_E(θ, ω) is measurable with respect to θ forms a monotone class. The class M contains the rectangles A × B with A and B measurable, as the integral of their indicator is I_A(θ)P_θ[B], a product of measurable functions of θ. Hence M contains the algebra of all finite disjoint unions of such rectangles and therefore, by the Monotone Class Theorem, the σ-algebra these rectangles generate, that is, the product σ-algebra. Given a non-negative integrable function f(θ, ω), standard arguments using an approximating sequence of simple functions from below, in concert with the Monotone Convergence Theorem, yield the measurability of the integral of f(θ, ω); the result then extends to real valued functions by decomposing any given integrable function into its positive and negative parts.