On the rate of normal approximation for Poisson continuum percolation

It is known that the number of points in the largest cluster of a percolating Poisson process restricted to a large finite box is asymptotically normal. In this note, we establish a rate of convergence for this result. As each point in the largest cluster is determined by points as far away as the diameter of the box, known results in the literature on normal approximation for Poisson functionals cannot be applied directly. To disentangle the long-range dependence of the largest cluster, we proceed in three steps. First, we use the fact that the second largest cluster has a comparatively shorter range of dependence to restrict the range of dependence. Second, we apply a recently established result in Chen, Röllin and Xia (2021) to obtain a Berry-Esseen type bound for the normal approximation of the number of points belonging to clusters that have a restricted range of dependence. Finally, we estimate the gap between this quantity and the number of points in the largest cluster.


Introduction and the main result
Let $\mathbb{R}^m$ be the $m$-dimensional Euclidean space equipped with the Euclidean norm $\|\cdot\|$. For each $A, B \subset \mathbb{R}^m$, we define $d(A, B) = \inf\{\|x - y\| : x \in A, y \in B\}$, where $\inf \emptyset := \infty$. We write $d(\{x\}, B) =: d(x, B)$ for simplicity. Given $r > 0$, we define $B(A, r) = \{y \in \mathbb{R}^m : d(y, A) < r\}$ and write $B(\{x\}, r) = B(x, r)$, so that $B(x, r)$ is simply a ball of radius $r$ with its centre at $x$. We say that a Borel set $A \subset \mathbb{R}^m$ is connected with radius $r$ if for any $x_1, x_2 \in A$, there exist a finite positive integer $k \ge 2$ and $\{y_1 := x_1, y_2, \dots, y_{k-1}, y_k := x_2\} \subset A$ such that $B(y_i, r) \cap B(y_{i+1}, r) \neq \emptyset$ for all $i = 1, \dots, k - 1$. We use $\mathrm{card}(A)$ or $|A|$ to denote the cardinality of the set $A$ and, for convenience, we use the terms cardinality and size interchangeably.
Definition 1.1. For a Borel set $S \subset \mathbb{R}^m$, a subset $A \subset S$ is called a cluster of $S$ with radius $r$ if $A$ is connected with radius $r$ and $B(A, r) \cap B(S \setminus A, r) = \emptyset$.
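For a finite point set, Definition 1.1 can be made concrete: two open balls of radius $r$ intersect exactly when their centres are at distance less than $2r$, so the clusters are the connected components of the graph that links such pairs. The following Python sketch illustrates this under that reading; the function name `clusters` and the toy configuration are assumptions made for illustration and do not come from the paper.

```python
import math
from itertools import combinations


def clusters(points, r=1.0):
    """Return the clusters (with radius r) of a finite point set.

    Two points are linked when their open balls of radius r intersect,
    i.e. when their Euclidean distance is strictly less than 2r.
    """
    parent = list(range(len(points)))

    def find(i):
        # Find the root of i, halving the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) < 2 * r:
            parent[find(i)] = find(j)

    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    # Largest cluster first.
    return sorted(groups.values(), key=len, reverse=True)


# Toy configuration (an assumption): a chain of three points and one far point.
pts = [(0.0, 0.0), (1.5, 0.0), (3.0, 0.0), (10.0, 10.0)]
sizes = [len(c) for c in clusters(pts, r=1.0)]
print(sizes)  # [3, 1]
```

The pairwise loop is quadratic in the number of points, which is adequate for small illustrations; a grid or tree-based neighbour search would be a natural replacement for larger samples.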
For fixed $r > 0$, we say that a point process $X$ percolates with radius $r$ if $X$ almost surely contains a unique infinite cluster with radius $r$. Let $\mathcal{P}_\lambda$ be the homogeneous Poisson point process on $\mathbb{R}^m$ with rate $\lambda > 0$. For any $r > 0$ and $m \ge 2$, it is well known that there exists $0 < \lambda_c(r, m) < \infty$ such that $\mathcal{P}_\lambda$ percolates if and only if $\lambda > \lambda_c(r, m)$; see [Zuev and Sidorenko (1985a), Zuev and Sidorenko (1985b)]. Since $\mathcal{P}_\lambda$ is scale invariant, it is enough to consider only $r = 1$, and we write $\lambda_c := \lambda_c(m) := \lambda_c(1, m)$. From now on, any cluster with radius 1 is simply referred to as a cluster. Determining the exact value of $\lambda_c(m)$ remains an open problem, although for $m = 2$, a sharp estimate was given in [Balister, Bollobás and Walter (2005)]. There is also a vast literature on more general continuum percolation since [Gilbert (1961)] initiated the study, where the point processes are not necessarily homogeneous Poisson and each ball $B(x, r)$ can be replaced by a random shape centred at $x$; we refer the reader to [Meester and Roy (1996)] for a comprehensive overview.
In practice, any physical system is finite and the percolation phenomenon is examined through growing observation windows, hence it is of practical interest to study the asymptotic behaviour of the cardinality of the largest cluster inside a growing window in $\mathbb{R}^m$. The statistical behaviour of the largest cluster in a large finite observation window under both the regimes $\lambda < \lambda_c$ and $\lambda > \lambda_c$ has been thoroughly investigated in [Penrose (2003), Penrose and Pisztora (1996)], and here we briefly summarise some results for the case $\lambda > \lambda_c$. Let $\Gamma_n := [-n/2, n/2]^m$ be such a window and $N_n$ be the number of points in the largest cluster in $\mathcal{P}_\lambda^n := \mathcal{P}_\lambda \cap \Gamma_n$. It was shown in [Penrose (2003), Chapter 10] that when $m \ge 2$, $n^{-m} N_n \to \lambda p(\lambda)$ in probability as $n \to \infty$, where $0 < p(\lambda) < 1$ is the probability that the infinite cluster contains the origin. Furthermore, with probability tending to one as $n \to \infty$, the size of the second largest cluster in $\mathcal{P}_\lambda^n$ is of the exact order $\Theta((\ln n)^{m/(m-1)})$, thus establishing the uniqueness of the largest cluster. Large deviation estimates for the size, volume, and diameter of the largest cluster were provided in [Penrose and Pisztora (1996)]. The result that is most pertinent to our work here is the central limit theorem for $N_n$ established in [Penrose (2003), Theorem 10.22], which holds for $m \ge 2$ and $\lambda > \lambda_c$. Let $B_n^2 = \mathrm{Var}(N_n)$. [Penrose (2003), Theorem 10.22] and the errata in [Penrose (2010)] showed that, for $m \ge 2$, there exists a constant 0 < σ
Our main result below complements this central limit theorem by providing a convergence rate in the Kolmogorov distance.
Remark 1.3. The logarithmic factor in (1.2) seems unavoidable because percolation is a long-range dependent structure and differs significantly from the geometric structures studied in [Schulte and Yukich (2023)]. However, we suspect that the dependence on the dimension in $(\ln n)^{2m}$ is due to the choice of the window size that we use to construct another score function with local dependence, and it is not clear whether one can reduce or remove the dependence on $m$ with a smaller window size or another method.
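The quantities introduced above, namely the window $\Gamma_n$, the restricted process $\mathcal{P}_\lambda \cap \Gamma_n$, the size $N_n$ of its largest cluster and the size of its second largest cluster, can be explored numerically. The Python sketch below is only an illustration under stated assumptions: the helper names, the seed and the parameter values ($\lambda = 1$, $n = 12$, $m = 2$) are hypothetical choices, with $\lambda$ picked on the assumption that it exceeds $\lambda_c(2)$, and nothing here reproduces a computation from the paper.

```python
import math
import random
from itertools import combinations


def sample_poisson_points(lam, n, m, rng):
    """Sample a homogeneous Poisson process of rate lam on [-n/2, n/2]^m."""
    mean = lam * n ** m
    # Poisson(mean) count: unit-rate exponential arrivals before time `mean`.
    count, t = 0, rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    # Given the count, the points are independent and uniform on the window.
    return [tuple(rng.uniform(-n / 2, n / 2) for _ in range(m))
            for _ in range(count)]


def cluster_sizes(points, r=1.0):
    """Sizes of the clusters with radius r, largest first (union-find)."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        # Open balls of radius r meet iff the centres are closer than 2r.
        if math.dist(points[i], points[j]) < 2 * r:
            parent[find(i)] = find(j)

    counts = {}
    for i in range(len(points)):
        root = find(i)
        counts[root] = counts.get(root, 0) + 1
    return sorted(counts.values(), reverse=True)


rng = random.Random(0)        # fixed seed, an arbitrary choice
lam, n, m = 1.0, 12, 2        # illustrative values, not from the paper
points = sample_poisson_points(lam, n, m, rng)
sizes = cluster_sizes(points)
N_n = sizes[0] if sizes else 0
second = sizes[1] if len(sizes) > 1 else 0
print(len(points), N_n, second, N_n / n ** m)
```

Repeating the experiment over many seeds and increasing $n$ would give an empirical view of the law of large numbers for $n^{-m} N_n$ and of the much smaller second largest cluster, although a single small window such as this one says little about the asymptotic regime.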
The proof of the central limit theorem in [Penrose (2003)] hinges on a martingale argument, while here we rely on Stein's method [Chen, Goldstein and Shao (2011)] to deduce the convergence and the rate in Theorem 1.2. Besides Stein's method, one may also consider other tools such as stabilisation [Penrose and Yukich (2001), Penrose and Yukich (2005)], the Malliavin-Stein technique via the Wiener-Itô expansion [Peccati et al. (2010)] and second order Poincaré inequalities [Last, Peccati and Schulte (2016)]. In fact, using these tools, a variety of central limit theorems have been developed for random quantities of the form $\sum_{x \in \mathcal{P}_\lambda \cap A} \xi(x, \mathcal{P}_\lambda)$, where $A \subset \mathbb{R}^m$ is a bounded Borel set and $\xi(x, \mathcal{P}_\lambda)$ is a score function that measures the contribution of $x$ with respect to $\mathcal{P}_\lambda$; see, for example, [Barbour and Xia (2001), Penrose and Yukich (2001), Penrose and Yukich (2003), Penrose and Yukich (2005)].
To obtain rates of convergence, the existing literature on normal approximation generally requires the score functions to have short-range dependence, which, loosely speaking, is the condition that the score function $\xi(x, \mathcal{P}_\lambda)$ depends only on points that are not too far away from $x$. For instance, [Bhattacharjee and Molchanov (2022), Penrose (2007), Schulte (2016), Xia and Yukich (2015), Schulte and Yukich (2023)] require the score function $\xi(x, \mathcal{P}_\lambda)$ to be determined by the points of $\mathcal{P}_\lambda$ in a region near $x$ or in a ball $B(x, R)$ with a random radius $R$ such that $\mathbb{P}(R > t)$ decreases as the reciprocal of a polynomial or an exponential function of $t$ as $t \to \infty$. In our case, with the long-range dependence of the points in the percolation, $\mathbb{P}(R = \Theta(n)) \approx 1$, so the score function under consideration here does not fit into the framework of such literature.
Strategy of the proof. To disentangle the long-range dependence, we use the characteristic of the second largest cluster to construct a suitable score function $\xi'(x, \mathcal{P}_\lambda)$ that takes the value one if $x$ belongs to a 'local' cluster that is typically larger than the second largest cluster, apply [Chen, Röllin and Xia (2021), Corollary 3.2] to obtain a Berry-Esseen type bound for the normal approximation of the sum $N'_{\theta,n}$ of these score functions, and then bound the gap between $N_n$ and $N'_{\theta,n}$.

The proof of Theorem 1.2
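To represent $N_n$ as the sum of appropriate score functions, for any $\mathcal{X} \subset \mathbb{R}^m$, we write $\mathcal{C}(\mathcal{X})$ for the set of all clusters of $\mathcal{X}$, and for $x \in \mathcal{X}$, we let $C(x, \mathcal{X})$ be the cluster of $\mathcal{X}$ containing $x$ and write $C_0(\mathcal{X})$ for the largest cluster of $\mathcal{X}$ if it is unique. Furthermore, define the score function of the point configuration $\mathcal{P}_\lambda^n$ at $x$ as $\xi(x, \mathcal{P}_\lambda^n) := \mathbf{1}\{x \in C_0(\mathcal{P}_\lambda^n)\}$. The score function collects the points in the largest cluster in $\mathcal{P}_\lambda^n$, and $N_n = \sum_{x \in \mathcal{P}_\lambda^n} \xi(x, \mathcal{P}_\lambda^n)$.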
To tackle the long-range dependence, we first observe that the typical size of the second largest cluster in $\mathcal{P}_\lambda^n$ is no more than $c(\ln n)^{m/(m-1)}$ for a constant $c > 0$ not depending on $n$. Next, for each $x \in \mathbb{R}^m$, we take the cube $A_{x,\theta,1}$ with centre $x$ and edge length $2\theta \ln n$, set $A_{x,\theta,2} = A_{x,\theta,1} \cap \Gamma_n$, and show that the event that $x$ is in the largest cluster of $\mathcal{P}_\lambda^n$ is essentially the same as the event that $C(x, \mathcal{P}_\lambda \cap A_{x,\theta,2})$ is the largest cluster in $\mathcal{P}_\lambda \cap A_{x,\theta,2}$. The latter characterisation ensures that the corresponding score function has short-range dependence, so the tools of normal approximation for sums of locally dependent score functions can be applied. For the size of the second largest cluster, the following lemma is a direct consequence of modifying (10.56) and (10.58) in the proof of [Penrose (2003), Theorem 10.18]; note that the proof itself is an application of [Penrose and Pisztora (1996), Theorem 2].
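The localisation step described above can be sketched in code. The Python function below evaluates, for a point $x$ of a finite configuration, whether the cluster of $x$ among the points in the cube of edge length $2\theta \ln n$ centred at $x$, intersected with $\Gamma_n$, is the largest cluster there. The name `local_score`, the treatment of ties (the largest local cluster is required to be unique), the use of a closed cube and the toy configuration are all assumptions made for illustration rather than details taken from the paper.

```python
import math
from itertools import combinations


def local_score(x, points, theta, n, r=1.0):
    """Hypothetical helper: 1 if x lies in the unique largest cluster of
    the points inside the cube of edge 2*theta*ln(n) centred at x and cut
    to the window [-n/2, n/2]^m, else 0. Assumes x is one of `points`
    and lies in the window."""
    half = theta * math.log(n)  # half of the edge length 2*theta*ln(n)
    # Points in the cube centred at x, intersected with the window.
    local = [p for p in points
             if all(abs(pc - xc) <= half and abs(pc) <= n / 2
                    for pc, xc in zip(p, x))]
    parent = list(range(len(local)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(local)), 2):
        # Open balls of radius r meet iff the centres are closer than 2r.
        if math.dist(local[i], local[j]) < 2 * r:
            parent[find(i)] = find(j)

    counts = {}
    for i in range(len(local)):
        root = find(i)
        counts[root] = counts.get(root, 0) + 1
    best = max(counts.values())
    unique = list(counts.values()).count(best) == 1
    return int(unique and counts[find(local.index(x))] == best)


# Toy configuration (an assumption): a three-point chain, an isolated
# point, and a distant five-point chain.
pts = [(0.0, 0.0), (1.5, 0.0), (3.0, 0.0), (0.0, 4.0),
       (30.0, 0.0), (31.5, 0.0), (33.0, 0.0), (34.5, 0.0), (36.0, 0.0)]
print(local_score((0.0, 0.0), pts, theta=1.0, n=100))  # 1
print(local_score((0.0, 4.0), pts, theta=1.0, n=100))  # 0
```

In this toy configuration the point at the origin receives local score 1 even though the largest cluster of the whole configuration is the distant five-point chain. Discrepancies of this kind between the local and the global score are what the proof has to show are rare.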
Lemma 2.2. There exists a constant θ > 0 such that
Proof. Let θ := θ(λ) denote the probability that there is an unbounded cluster $D$ such that $B(D, 1)$ intersects the ball of unit volume centred at the origin $0 \in \mathbb{R}^m$. Furthermore, let $E_1$ be the event that the largest cluster $C_0(\mathcal{P}_\lambda^n)$ is the unique cluster such that |C 0 (P λ n )| ≥ 0.5λ θn m and with diameter at least $0.5n$, where the diameter of a subset $A \subset \mathbb{R}^m$ is $\sup\{\|x - y\| : x, y \in A\}$. Then, [Penrose and Pisztora (1996), Theorem 2] states that there exist constants $k_3 > 0$ and $n_1 > 0$ such that
For any $x \in \Gamma_n$, let $E_{1,x}$ be the counterpart of $E_1$ with $\mathcal{P}_\lambda^n$ replaced with $\mathcal{P}_\lambda^n \cup \{x\}$. Since the extra point $x$ does not reduce the largest cluster, (2.2) implies that
Let $k_1$, $k_2$ and $k_3$ be as in (2.1) and (2.2), E 2 := {|C (2) 1) } and $\theta = 11m/k_3$. In addition, for any $x \in \mathcal{P}_\lambda^n$, let $E_{0,x}$ be the event that the largest cluster in $\mathcal{P}_\lambda \cap A_{x,\theta,2}$ is unique, |C 0 (P λ ∩ A x,θ,2 )| ≥ 0.5λ θ(θ ln n) m and it is of diameter at least $0.5\theta \ln n$. We claim that (2.4) or equivalently {ξ(x,
We first consider the case where ) must have diameter at least $\theta \ln n - 1$. On the event $E_{0,x} \cap E_1$, the only cluster with diameter at least $0.5\theta \ln n$ is $C_0(\mathcal{P}_\lambda \cap A_{x,\theta,2})$, and so $C(x, \mathcal{P}_\lambda \cap A_{x,\theta,2}) = C_0(\mathcal{P}_\lambda \cap A_{x,\theta,2})$ and ξ ′ (x, θ,
We turn to the other case where $x \in \mathcal{P}_\lambda^n$ belongs to $C_0(\mathcal{P}_\lambda^n \cap A_{x,\theta,2})$ but not $C_0(\mathcal{P}_\lambda^n)$, i.e. $\{\xi(x, \mathcal{P}_\lambda^n) = 0, \xi'(x, \theta, \mathcal{P}_\lambda^n) = 1\}$. On the event $E_{0,x} \cap E_1 \cap E_2$, the second largest cluster C (2) 0 (P λ n ) has at most $k_1 (\ln n)^{m/(m-1)}$ points, while the largest cluster in $A_{x,\theta,2} \cap \mathcal{P}_\lambda$ has at least 0.5λ θ(θ ln n) m points, hence if $\xi'(x, \theta, \mathcal{P}_\lambda^n) = 1$ and $\xi(x, \mathcal{P}_\lambda^n) = 0$, then 1) , which leads to a contradiction. This concludes the proof of (2.4).
Lemma 2.3. With $\theta$ as in the proof of Lemma 2.2, we have
Proof. For the first claim, below we apply the Cauchy-Schwarz inequality in the second inequality and Lemma 2.2 in the last inequality to get
Likewise, we have
The third claim follows from (1.1) and the second claim. It can also be directly obtained from [Xia and Yukich (2015), Lemma 4.6] and the fact that the score function $\xi'$ is locally dependent.
We now establish the error bound of the normal approximation to $N'_{\theta,n}$. Let $B'_{\theta,n}$ be the standard deviation of $N'_{\theta,n}$.
With these preparations, we are now ready to prove Theorem 1.2.