Distribution of components in the k-nearest neighbour random geometric graph for k below the connectivity threshold

Let S_{n,k} denote the random geometric graph obtained by placing points inside a square of area n according to a Poisson point process of intensity 1 and joining each such point to the k=k(n) points of the process nearest to it. In this paper we show that if Pr(S_{n,k} connected)>n^{-\gamma_1} then the probability that S_{n,k} contains a pair of 'small' components 'close' to each other is o(n^{-c_1}) (in a precise sense of 'small' and 'close'), for some absolute constants \gamma_1>0 and c_1>0. This answers a question of Walters. (A similar result was independently obtained by Balister.) As an application of our result, we show that the distribution of the connected components of S_{n,k} below the connectivity threshold is asymptotically Poisson.


Introduction
In this paper, we study the distribution of small connected components below the connectivity threshold in the k-nearest neighbour model. We prove that they lie far apart, and are approximately distributed like a Poisson process in both a numerical and a spatial sense.

Definitions
Let S_n denote the square [0, √n]^2 of area n, and let ||·|| denote the usual Euclidean norm in R^2. Let k = k(n) be a nonnegative integer parameter. Scatter points at random inside S_n according to a Poisson process of intensity 1. (Equivalently, scatter N ∼ Poisson(n) points uniformly at random inside S_n.) This gives us a random geometric vertex set P. The k-nearest neighbour graph on P, S_{n,k} = S_{n,k}(P), is obtained by placing an undirected edge between every vertex of P and each of the k vertices of P nearest to it.
The Poisson law of the random pointset P coupled with our deterministic definition of S_{n,k} gives rise to a probability measure on the space of k-nearest neighbour graphs with vertices in S_n, which we refer to as the k-nearest neighbour model S_{n,k}.
As this model has some inherent randomness, rather than proving deterministic statements about S_{n,k} we are mostly interested in establishing that properties hold for 'typical' k-nearest neighbour graphs. Formally, given a property Q of k-nearest neighbour graphs (i.e. a sequence of subsets Q_n of the set of all geometric graphs on S_n) and a sequence k = k(n) of nonnegative integers, we say that S_{n,k(n)} has property Q with high probability (whp) if lim_{n→∞} P(S_{n,k(n)} ∈ Q_n) = 1.
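The construction of S_{n,k} described above can be sketched in a few lines of Python. This is only an illustrative brute-force sketch for small n, not code from the paper: the function names (`sample_poisson`, `poisson_points`, `knn_graph`, `components`) are hypothetical, and ties in distance, which occur with probability zero, are broken arbitrarily.

```python
import math
import random


def sample_poisson(mean, rng):
    """Sample Poisson(mean) by counting rate-1 exponential arrivals in [0, mean)."""
    count, total = 0, rng.expovariate(1.0)
    while total < mean:
        count += 1
        total += rng.expovariate(1.0)
    return count


def poisson_points(n, rng):
    """N ~ Poisson(n) points placed uniformly in the square [0, sqrt(n)]^2."""
    side = math.sqrt(n)
    num_points = sample_poisson(n, rng)
    return [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(num_points)]


def knn_graph(points, k):
    """Undirected edges {i, j}: j is among the k nearest points to i, or vice versa."""
    edges = set()
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def components(num_vertices, edges):
    """Connected components (as lists of vertex indices) via union-find."""
    parent = list(range(num_vertices))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        parent[find(u)] = find(v)
    groups = {}
    for v in range(num_vertices):
        groups.setdefault(find(v), []).append(v)
    return list(groups.values())
```

For example, `components(len(pts), knn_graph(pts, k))` with `pts = poisson_points(200, random.Random(1))` lists the components of one sample of S_{200,k}; the quadratic-time neighbour search is a simplicity choice and would need a spatial index for large n.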

Previous work on the k-nearest neighbour model
An important motivation for the study of the connectivity properties of the k-nearest neighbour model comes from the theory of ad-hoc wireless networks: suppose we have some radio transmitters (nodes) spread out over a large area and wishing to communicate using multiple hops, and that each transmitter can adjust its range so as to ensure two-way radio contact with the k nodes nearest to it. The connectivity of this radio network is modelled using S_{n,k} in a natural way. As a result, the model has received a significant amount of attention, though many questions remain. (See e.g. the recent survey paper of Walters [12].) Elementary arguments show that there exist constants c_l ≤ c_u such that for k(n) ≤ c_l log n, S_{n,k} is whp not connected, while for k(n) ≥ c_u log n, S_{n,k} is whp connected. A bound of c_u ≤ 5.1774 was obtained by Xue and Kumar [14], using a substantial result of Penrose [10] for the related Gilbert disc model [8]. A bound of c_u ≤ 3.5897 could also be read out of earlier work of Gonzáles-Barrios and Quiroz [9]. These results were substantially improved by Balister, Bollobás, Sarkar and Walters in a series of papers [3,4,5] in which they established inter alia the existence of a critical constant c_⋆, with 0.3043 < c_⋆ < 0.5139, such that for c < c_⋆ and k ≤ c log n, S_{n,k} is whp not connected, while for c > c_⋆ and k ≥ c log n, S_{n,k} is whp connected. Building on their work, Walters and the author [7] recently proved that the transition from whp not connected to whp connected is sharp in k: there is an absolute constant C > 0 such that if S_{n,k} is connected with probability at least ε > 0 and n is sufficiently large, then for k′ ≥ k + C log(1/ε), S_{n,k′} is connected with probability at least 1 − ε.
As part of their results, Balister, Bollobás, Sarkar and Walters [3] showed that when 0.3 log n < k < 0.6 log n whp all of the following hold:
(Strictly speaking, they only showed that S_{n,k} has at most one giant component; that such a component exists follows from the study of percolation in the k-nearest neighbour graph: see e.g. [2].) Refining their techniques, Walters [13] showed that around the connectivity threshold, there are no 'small' components (of diameter O(√(log n))) lying 'close' to the boundary of S_n (within distance O(log n)), and used this to improve the upper bound on c_⋆ to 0.4125. Towards the end of his paper [13], Walters asked a number of questions about the properties of small components of S_{n,k} below the connectivity threshold, with the aim of completing the picture we currently have and perhaps improving the upper bound on c_⋆.

Results of the paper
In this paper we contribute to this project by proving that the 'small' components are whp far apart (or more precisely, not 'close' together) and that they are distributed like a Poisson point process inside S n . This answers one of Walters's questions.
To state our results formally, we must ascribe a precise meaning to 'small' and 'close'. We recall a result of Balister, Bollobás, Sarkar and Walters to do this.
Definition 1. Let λ = max(λ(0.3, 0.6, 2), e^2). A component shall be deemed small if it has diameter less than λ√(log n). Two small components shall be deemed close if they contain two points lying at a distance less than 8λ√(log n) from one another.
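To make Definition 1 concrete, here is a minimal Python sketch that classifies components as small and finds close pairs. It assumes each component is given as a list of (x, y) points; the numerical value of λ is not specified in this text, so `lam` is a free parameter, and the function names are hypothetical.

```python
import math
from itertools import combinations


def diameter(pts):
    """Largest pairwise Euclidean distance (0 for a single point)."""
    return max((math.dist(p, q) for p, q in combinations(pts, 2)), default=0.0)


def small_components(comps, lam, n):
    """Components of diameter less than lam * sqrt(log n)."""
    threshold = lam * math.sqrt(math.log(n))
    return [c for c in comps if diameter(c) < threshold]


def close_pairs(comps, lam, n):
    """Pairs of small components containing two points at distance
    less than 8 * lam * sqrt(log n) from one another."""
    threshold = 8 * lam * math.sqrt(math.log(n))
    small = small_components(comps, lam, n)
    return [(a, b) for a, b in combinations(small, 2)
            if min(math.dist(p, q) for p in a for q in b) < threshold]
```

Theorem 2 below asserts that, under its hypothesis, `close_pairs` returns an empty list except with probability o(n^{-c_1}).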
Our main result is the following:
Theorem 2. There exist absolute constants γ_1 > 0 and c_1 > 0 such that if P(S_{n,k} connected) > n^{-γ_1} then P(S_{n,k} contains a pair of small, close components) = o(n^{-c_1}).
This answers Question 1 in Walters [13]. Balister has independently obtained a similar result (personal communication).
Remark 1. Theorem 1 of [7] showed that, for n large enough, we need to increase k by at most a constant times log(1/ε) for S_{n,k} to go from having an ε chance of being connected to having a 1 − ε chance of being connected. If there is a 'bluntness' converse to this result, i.e. if we need to increase k by at least a (smaller) constant times log(1/ε) for this transition to occur, then there must be some constant δ with 0 < δ < c_⋆ such that for all k = k(n) with k(n) > (c_⋆ − δ) log n we have P(S_{n,k} connected) > n^{-γ_1}. In particular Theorem 2 is not vacuous since there is a range of k for which
(For example, k = ⌊c log n⌋ for any c with c_⋆ − δ < c < c_⋆ will do.)
Next we turn to the distribution of the small components of S_{n,k}. Let X = X_{n,k} denote the number of small connected components of S_{n,k}. (Since there is whp a unique non-small connected component [3], X is whp the number of components of S_{n,k} minus 1.) Also, given ν ≥ 0 and A ⊆ N ∪ {0}, let Po_ν(A) denote the probability that a Poisson random variable with parameter ν takes a value inside A.
As an application of Theorem 2, we prove:
Theorem 3. There exist absolute constants γ_2 > 0 and c_2 > 0 such that if k = k(n) is an integer sequence with P(S_{n,k} connected) > n^{-γ_2} for all n, then, writing ν = ν(n) for − log(P(S_{n,k} connected)), we have
Corollary 4. Let k(n) be an integer sequence. Suppose there is a subsequence (k(n_i))_{i∈N} such that P(S_{n_i,k(n_i)} connected) → e^{-ν} for some constant ν ≥ 0. Then the law of X_{n_i,k(n_i)} converges in distribution to Poisson with parameter ν:
We also prove a spatial analogue of Theorem 3: not only is the number of small components (approximately) Poisson distributed, but their spatial location is (approximately) distributed according to a Poisson point process inside S_n. We defer a precise statement of this result, Theorem 28, until Section 4.

Structure of the paper
We follow the strategy introduced by Balister, Bollobás, Sarkar and Walters in [4] and developed in [7]: we prove Theorem 2 by looking at local events. In Section 2, we prove a local result, Theorem 5, which can be thought of as an analogue of the local sharpness result Lemma 12 in [7].
Theorem 5 is then used in Section 3 together with local-global correspondence results from [7] to prove the global result Theorem 2.
Finally in the last section we use a form of the Chen-Stein Method [6,11] due to Arratia, Goldstein and Gordon [1] together with our results from the first two sections to prove Theorems 3 and 28 on the distribution of small components.

Proof of the local theorem
In this section we consider the connectivity of the k-nearest neighbour random geometric graph model on a local scale (i.e. within a region of area O(log n)).
Pick n sufficiently large. Let M = max(160⌈λ⌉, 50). (We remark that this is similar to but slightly larger than the choice of M in [7].) We consider a Poisson point process of intensity 1 in the box
We place an undirected edge between every point and its k nearest neighbours to obtain the graph U_{n,k}. Let us define two families of events related to the connectivity of U_{n,k}:
Definition 2. Let A_k be the event that U_{n,k} has a connected component wholly contained inside the central subsquare (1/2)U_n. Let B_k be the event that U_{n,k} has at least two connected components wholly contained inside the central subsquare (1/2)U_n.
Figure 1: The square grid Γ, the point x and the quarter-disc and complete disc considered in the proof of Lemma 6.
Our aim is to prove:
Theorem 5. There exist constants c_3, c_4 > 0 such that for all integers k with k ∈ (0.3 log n, 0.6 log n), we have
This says that on a local scale it is far less likely that we have two small connected components close together than just one small connected component on its own. Our proof strategy is as follows: we show that whp if B_k occurs then there must be a large empty region inside U_n to which many points can be added without joining up all the components of U_{n,k} which are contained inside (1/2)U_n. This ensures that A_k still occurs for the new pointset, and can be exploited to show that A_k is much more likely than B_k.
To prove Theorem 5, we shall need a simple lemma on the concentration of Poisson random variables.
Lemma 6. There exist constants λ_1 and λ_2 such that for all integers k with k between 0.3 log n and 0.6 log n, the probability that there is any point x ∈ U_n (not necessarily in the pointset arising from the Poisson point process) such that the ball of radius λ_1√(log n) about x contains at least k vertices of U_{n,k} or that the ball of radius λ_2√(log n) about x contains fewer than k vertices of U_{n,k} is o(n^{-2}). Moreover, λ_2 can be chosen to be less than λ.
Remark 2. This says that the probability that U_{n,k} contains a pair of vertices not joined by an edge and lying within distance λ_1√(log n) of each other, or a pair of vertices joined by an edge and lying at distance at least λ_2√(log n) from one another, is o(n^{-2}), i.e. a negligible quantity.
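The kind of Poisson concentration behind Lemma 6 can be checked numerically. The sketch below compares exact Poisson tail probabilities with the standard Chernoff-type bound e^{-µ}(eµ/k)^k, which bounds P(Po(µ) ≤ k) for 0 < k ≤ µ and P(Po(µ) ≥ k) for k ≥ µ. The bound is a textbook fact rather than the estimate used in the paper, and the function names and the sample values in the demo are illustrative assumptions.

```python
import math


def poisson_pmf(mu, j):
    """P(Po(mu) = j) for mu > 0, computed in log-space for stability."""
    return math.exp(-mu + j * math.log(mu) - math.lgamma(j + 1))


def lower_tail(mu, k):
    """P(Po(mu) < k), summed term by term."""
    return sum(poisson_pmf(mu, j) for j in range(k))


def chernoff_bound(mu, k):
    """exp(-mu) * (e * mu / k)^k for k > 0 (see the lead-in for its range of validity)."""
    return math.exp(-mu + k * (1 + math.log(mu / k)))


if __name__ == "__main__":
    n = 10 ** 6
    k = int(0.6 * math.log(n))   # a value of k in the range of Lemma 6
    mu = 20 * math.log(n)        # expected count in a region of area 20 log n (illustrative)
    print(lower_tail(mu, k), chernoff_bound(mu, k), n ** -2)
```

Since the expected number of points in a region equals its area, taking the radius constant large enough makes the lower tail polynomially small in n, which is the mechanism Lemma 6 exploits.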
Proof of Lemma 6. We first show that there is a constant λ_2 ≤ λ such that the probability that there is some point in U_n with fewer than 0.6 log n vertices of U_{n,k} within distance λ_2√(log n) of itself is o(n^{-2}).
Let Γ denote the intersection of the integer grid Z^2 with U_n. Let λ′_2 = 2 e 3 π and consider any x ∈ Γ. Since M > 50, at least one of the four standard quadrant quarter discs of radius λ′_2√(log n) about x is contained inside U_n; call this quarter disc D. The probability that D contains fewer than 0.6 log n vertices of U_{n,k} is at most
Thus the probability that Γ contains any point with fewer than 0.6 log n vertices of U_{n,k} within distance λ′_2√(log n) is less than |Γ| × (0.6 log n)/n^4 = o(n^{-2}). Since every point in U_n lies at distance at most √2 from a point in Γ, this proves our claim for λ_2 = λ′_2 + 1, say. This is at most e^2, which is at most λ, as required.
Now let us show that there is a constant λ_1 such that the probability that there is some point in U_n having more than 0.3 log n vertices of U_{n,k} within distance
and consider any x ∈ Γ. Let D denote the intersection of the disc of radius λ′_1√(log n) about x with U_n. The probability that D contains more than 0.3 log n vertices of U_{n,k} is at most
Thus the probability that Γ contains any point with more than 0.3 log n vertices of U_{n,k}
Now every point in U_n lies at distance at most √2 from a point in Γ, so this proves our claim for
Proof of Theorem 5. Let us assume that there is no point in U_n with more than k vertices of U_{n,k} within distance λ_1√(log n) of itself or fewer than k vertices of U_{n,k} within distance λ_2√(log n) of itself. We shall denote by C the set of pointsets we are thus excluding. By
We consider a perfect tiling of U_n into tiles of area (log n)/N^2, for some (large) constant N. Explicitly, we shall choose
where N_1, N_2 and N_3 are constants, each of which will appear at one of the stages of our argument. The choice of N ≥ N_2 ensures that the inequality
holds. (Both inequalities clearly hold for all sufficiently large N, and it is easily checked by solving two quadratic equations that the values of N_2 and N_3 given above will do.)
Given a pointset P ⊂ U n , write U n,k (P) for the k-nearest neighbour graph on P.
Definition 3. For each tile Q, let B_k(Q) be the event that the pointset P resulting from the Poisson process on U_n has the following properties:
(i) P ∈ B_k (i.e. U_{n,k}(P) has at least two connected components contained inside (1/2)U_n),
(ii) Q contains no point of P, and
(iii) for any set of points B ⊂ Q, the pointset P ∪ B lies in A_k (i.e. U_{n,k}(P ∪ B) has at least one connected component wholly contained inside (1/2)U_n).
The key step in the proof of Theorem 5 is to show:
In other words, if B_k occurs we can (except in a negligible proportion of cases) find an empty tile to which we can add many points and still have A_k occurring in the resulting pointset. This can be thought of as an analogue of the key Lemma 7 in [7].
Proof of Theorem 7. Let P be a pointset for which B k \ C occurs. Say that a tile is empty if it contains no point of P. Let X, Y be the vertex sets of two small connected components of U n,k (P) witnessing B k . (At least two such components must exist, though there could potentially be more.) Let a be a vertex in X ∪ Y nearest to the bottom side of U n . Without loss of generality, we may assume that a ∈ X. Let E be the horizontal line through a. Then all points of X and Y must lie on or above E. Now a lies in some tile, Q a say.
We consider the tiles directly below Q_a. Since N ≥ N_1 > √5/λ_1, the topmost of these tiles must be empty. There are two cases to consider. Either all tiles directly below Q_a are empty, in which case we let Q denote the one among them which is incident with the boundary of U_n; or there is some nonempty tile directly below Q_a, in which case we let Q′ be the topmost of these nonempty tiles, and let Q denote the (empty) tile directly above it.
Proof. Let B ⊂ Q be a nonempty set of points in Q, and let P′ = P ∪ B. Our claim is that P′ ∈ A_k. To establish this, it is enough to show that there are no edges from Y to Y^c in U_{n,k}(P′) (as then Y will be a connected component of U_{n,k}(P′) contained inside (1/2)U_n). Since the only edges in U_{n,k}(P′) that are not also edges of U_{n,k}(P) have at least one end in Q, it suffices to show that no vertex of Y is joined by an edge of U_{n,k}(P′) to a point b ∈ B. We split into two cases.
First of all, suppose Q is incident with the boundary of U_n. Since P ∉ C and λ_2 < M/4 (since λ_2 ≤ λ ≤ M/160), we know that no point in Q can have any of its k nearest neighbours inside (1/2)U_n and vice versa. Thus there are no edges between Q and Y ⊆ (1/2)U_n in U_{n,k}(P′), and P′ ∈ A_k as required.
We now turn to the less trivial case where Q is not incident with the boundary of U n . Then the tile Q ′ directly below Q is nonempty: there exists c ∈ P ∩ Q ′ .
Let R denote the distance between c and its k th nearest neighbour in P.
Thus the distance between b and its k-th nearest neighbour in P′ will be at most R + (√5/N)√(log n). Also, ac is not an edge of the k-nearest neighbour graph on P, log n for all b ∈ B. Now, let b ∈ B ⊆ Q and suppose d is a point lying above E such that d is one of the k nearest neighbours of b in P′. Our earlier choice of N ≥ N_2 ensures the following holds:
Remark 3. By our assumption on P (namely our assumption that P ∉ C), this implies that ad is an edge in U_{n,k}(P). Since a ∈ X it follows that d ∉ Y.
Proof of Claim 1. This is an exercise in Euclidean geometry. (See Figure 2.) Let e be the foot of the perpendicular to E which goes through b. As b lies in a tile directly below a's tile, we have ||a − e|| ≤ (1/N)√(log n). It follows by Pythagoras's Theorem that
Now the angle bed is obtuse. Hence,
Finally, we have
Substituting this into the above yields
which by our choice of N ≥ N_2 is less than λ_1√(log n).
On the other hand, suppose d is a point lying above the line E such that b is one of the k nearest neighbours of d in P′. Then our earlier choice of N ≥ N_3 ensures the following holds:
Remark 4. This implies that a is one of the k nearest neighbours of d in U_{n,k}(P′), and hence also in U_{n,k}(P). As a ∈ X it follows that d ∉ Y.
Proof of Claim 2. This is again an exercise in Euclidean geometry. (See Figure 2.) Since b is among the k nearest neighbours of d, it follows that ||b − d|| ≤ λ_2√(log n) (since d ∈ P and P ∉ C). Similarly, as ac is not an edge of U_{n,k}(P), we must have that ||a − c|| ≥ λ_1√(log n). Let e be the foot of the perpendicular to the line E which goes through b. Since the triangle bed is obtuse, we have ||e − d|| ≤ ||b − d||. As b lies in a tile directly below a's tile, we have ||a − e|| ≤ (1/N)√(log n). Also, as c lies in a tile directly below b's tile we have
Using again the fact that the triangle bed is obtuse, we have
Finally we have
where the last line follows from the fact that ||e − d|| ≤ ||b − d|| ≤ λ_2√(log n). Now our choice of N (more specifically our choice of N ≥ N_3) guarantees that
so that by comparing (1) and (2) we have ||a − d|| < ||b − d|| as claimed.
As remarked, Claim 1 tells us that if d is one of b's k nearest neighbours in P′ then d is one of a's k nearest neighbours in P (since P ∉ C), and in particular d ∉ Y. On the other hand, Claim 2 tells us that if b is one of d's k nearest neighbours in P′ then a is one of d's k nearest neighbours in P, so that again d ∉ Y. Combining these two claims, we see that there are no edges between B and Y in U_{n,k}(P′), and hence that Y is a connected component of U_{n,k}(P′) contained inside (1/2)U_n. We thus have P′ ∈ A_k, as claimed.
For every P ∈ B k \ C, we have thus shown there exists a tile Q such that P ∈ B k (Q), proving Theorem 7.
Having established Theorem 7, the rest of the proof of Theorem 5 is straightforward. We can consider a Poisson process of intensity 1 on U_n as the union of two independent Poisson processes on U_n \ Q and Q respectively. Call the corresponding random pointsets P and B respectively. The event B_k(Q) can then be considered as a product event
where B̃_k(Q) is an event depending only on the points inside U_n \ Q.
We then have
where the last line follows from property (iii) of B_k(Q) (which stated that if B_k(Q) occurs then no matter what points we add to Q, the event A_k will occur in the resulting modified pointset). Now
This concludes the proof of Theorem 5 with c_3 = (MN)^2 and c_4 = 1/N^2.

Proof of the global theorem
In this section we prove our global result, Theorem 2. It will be convenient to prove something slightly stronger than Theorem 2 (albeit with a more cumbersome statement), namely:
Theorem 9. There exist constants γ′_1 and c′_1 > 0 such that for every ε with 0 < ε ≤ 1/2, all n > ε^{-1/γ′_1} and all integers k ∈ (0.3 log n, 0.6 log n), if P(S_{n,k} connected) ≥ ε holds, then P(S_{n,k} contains a pair of small, close components) < log(1/ε) n^{-c′_1}.
Proof of Theorem 2 from Theorem 9. The proof of Theorem 5 of [3] establishes that there exists a constant γ > 0 such that for any k ≤ 0.3 log n, the probability that S_{n,k} is connected is o(n^{-γ}). Also, Theorem 13 of [3] shows that for k ≥ 0.6 log n, the probability that S_{n,k} contains any small component is o(n^{-(0.6 log 7 − 1)}). Thus Theorem 2 is immediate from Theorem 9 together with an appropriate choice of the constants γ_1 and c_1.
Proof of Theorem 9 from Theorem 5. Let B be the event that S n,k contains a pair of small, close components. We need a few results from [7] relating local connectivity to global connectivity.
Lemma 10 (Lemma 2 of [7]). For any n and any integer k with 0.3 log n < k < 0.6 log n, the probability that U_{n,k} contains an edge of length at least M√(log n)/8 is O(n^{-6}).
(Note that our choice of M is slightly larger than in [7]; however the proof of Lemma 2 in that paper only used the fact that M > 30, and so holds in the present setting also.) To do away with boundary effects, we shall restrict our attention to 'most' of S_n. Let
The nice feature of T_n is that it is not very close to any part of the boundary of S_n. The following is an easy corollary of Theorem 1 of [13]:
Lemma 11. There is a positive constant 0 < c_5 < 2 such that if k > 0.3 log n then the probability that S_{n,k} contains any small component not wholly contained inside T_n is O(n^{-c_5}).
As in [4,7], we now define two covers of T_n by copies of U_n. The independent cover C_1 of T_n is obtained by covering T_n with copies of U_n with disjoint interiors. The dominating cover C_2 of T_n is obtained from C_1 by replacing each square V ∈ C_1 by the twenty-five translates V + (i M ), i, j ∈ {0, ±1, ±2}. By construction, we have C_1 ⊆ C_2 and the copies of (1/4)U_n corresponding to elements of C_2 cover the whole of T_n. Also |C_2| < 25n/(M^2 log n). We shall write 'A_k occurs in C_i' as a convenient shorthand for 'there is a copy V of U_n in C_i for which the event corresponding to A_k occurs', and similarly for B_k. We shall need the following results from [7]:
Lemma 12 (Lemma 5 in [7]). For all n ∈ N and all integers k with 0.3 log n < k < 0.6 log n, and c_5 as given by Lemma 11,
Lemma 13 (Lemma 6 in [7]). There exists a constant γ′_1 > 0 such that for all ε with 0 < ε ≤ 1/2, all integers n > ε^{-1/γ′_1} and all integers k with k ∈ (0.3 log n, 0.6 log n), if P(S_{n,k} connected) ≥ ε holds then P(A_k) ≤ (eM^2 log n / n) log(1/ε).
Similarly to Lemma 12, we have
Lemma 14. For all n ∈ N and all integers k with 0.3 log n < k < 0.6 log n, and c_5 as given by Lemma 11,
Proof. This is an easy modification of the proof of Lemma 12.
Now, fix ε with 0 < ε ≤ 1/2. Suppose P(S_{n,k} connected) ≥ ε. Provided n > ε^{-1/γ′_1}, we have by Lemma 13 that P(A_k) < (eM^2 log n / n) log(1/ε).

The distribution of the small connected components
In this section, we use Theorems 2 and 5 together with a form of the Chen-Stein Method due to Arratia, Goldstein and Gordon [1] to show that the small components in S n,k are asymptotically Poisson distributed in a spatial as well as a numerical sense.
The Arratia, Goldstein and Gordon result essentially tells us that if the presence of one small component in a subregion of area O(log n) does not greatly increase the chance of having other small components in the same subregion, then the number of small components is Poisson distributed (just as we would expect it to be if small components were rare events occurring independently at random inside S n ).
We thus proceed in two stages. First of all we find a good approximation to the distribution of the small components of S n,k using local events. This requires us to use Theorem 2, amongst other things. Then we adapt our local result Theorem 5 to show that the local events we define are negatively correlated -to be more precise, they are independent if sufficiently far apart, and negatively dependent otherwise. An application of the theorem of Arratia, Goldstein and Gordon concludes the proof.

Local approximation
We set up some counting functions for the small components. Given a point x ∈ S_n, let V_n(x) be the square of area (4λ√(log n))^2 centred at x.
(Recall that λ is the constant given by Definition 1 in the introduction; with probability 1 − o(n^{-2}) there are no edges of length greater than or equal to λ√(log n) and not more than one component of diameter greater than λ√(log n) in S_{n,k}.) Also let V_{n,k}(x) be the k-nearest neighbour graph on the set of points placed inside V_n(x) by the Poisson point process on S_n.
Definition 4. Let Γ be the grid Γ = {x ∈ Z^2 : V_n(x) ⊆ S_n}. Given x ∈ Γ, let the local counting function Y(x) = Y_{n,k}(x) be the random variable taking the value 1 if there is a connected component H in V_{n,k}(x) such that H has diameter less than λ√(log n) and x is the (almost surely unique) member of Z^2 closest to the (almost surely unique) bottom-most vertex of H, and taking the value 0 otherwise. Pick x ∈ Γ and set p = p(n, k) := P(Y(x) = 1).
Note that P(Y (x) = 1) = P(Y (x ′ ) = 1) for all x, x ′ ∈ Γ, so that the definition of p is independent of x.
Definition 5. Given x ∈ Γ, let the global counting function X(x) = X_{n,k}(x) be the random variable taking the value 1 if there is a connected component H in S_{n,k} such that H has diameter less than λ√(log n) and x is the (almost surely unique) member of Z^2 closest to the (almost surely unique) bottom-most vertex of H, and taking the value 0 otherwise.
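The labelling used in Definitions 4 and 5 can be illustrated with a short Python sketch: each component is assigned the point of Z^2 closest to its bottom-most vertex. The sketch assumes components are given as lists of (x, y) points and that ties (a probability-zero event) do not occur; the function names are hypothetical.

```python
import math
from collections import Counter


def lattice_label(component):
    """Point of Z^2 closest to the bottom-most vertex of the component."""
    bx, by = min(component, key=lambda p: p[1])  # vertex with smallest y-coordinate
    return (math.floor(bx + 0.5), math.floor(by + 0.5))


def label_counts(small_comps):
    """How many of the given components receive each lattice label."""
    return Counter(lattice_label(c) for c in small_comps)
```

The indicator X(x) equals 1 exactly when `label_counts` gives x a positive count; a count of 2 or more at some x would mean that two small components share a label, which requires their bottom-most vertices to be within distance √2 of each other.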
We shall show that whp the small components of S_{n,k} are counted exactly by Σ_x X(x), and then that whp X(x) = Y(x) for all x ∈ Γ. For this we need some easy lemmas.
Lemma 15. Suppose S_{n,k} contains no edge of length greater than λ√(log n) and that x ∈ Γ is such that V_{n,k}(x) contains no edge of length greater than λ√(log n). Then X(x) = Y(x).
Proof. This is a minor modification of Lemma 4 of [7].
Lemma 16. Suppose S n,k has at most one non-small component, no two small components close together and no small component close to the boundary of S n . Then there is (almost surely) a one-to-one correspondence between the small components of S n,k and the x ∈ Γ for which X(x) = 1.
Proof. Almost surely every connected component of S n,k is counted by at most one X(x).
Since there are no small components close to the boundary, it follows that every small component is counted by at least one X(x). Finally since there are no small components close together, every X(x) counts at most one component.

Definition 6. We collect the bad events into a single event D. Let D be the event that any of the following occur:
(i) S_{n,k} contains an edge of length at least λ√(log n);
(ii) there is some x ∈ Γ for which V_{n,k}(x) contains an edge of length at least λ√(log n);
(iii) S_{n,k} contains at least two components of diameter at least λ√(log n);
(iv) S_{n,k} contains a small component H such that the point of Z^2 closest to the bottom-most vertex of H does not lie in Γ;
(v) S_{n,k} contains at least two components of diameter less than λ√(log n) lying within distance less than 8λ√(log n) of each other;
(vi) S_{n,k} contains a small component H such that there is more than one element of Z^2 closest to a bottom-most vertex of H;
(vii) there is some x ∈ Γ for which V_{n,k}(x) contains a small component H such that there is more than one element of Z^2 closest to a bottom-most vertex of H.
Our two previous lemmas have the following corollary: Corollary 17. Suppose that D does not occur. Then there is a one-to-one correspondence between the small components of S n,k and the x for which Y (x) = 1.
How large can this bad set D be? All but (v) were shown whp not to occur in [7] (up to some trivial changes of constants); so all we need to do is apply Theorem 2.
Lemma 18. There exists a constant c_6 > 0 such that if k is an integer with k ∈ (0.3 log n, 0.6 log n) and P(S_{n,k} connected) ≥ n^{-γ_1}, then
Proof. Provided we choose c_6 small enough, this is immediate from the properties of λ given in Definition 1 (properties (i) and (iii)), Lemma 10 applied n times (property (ii); note that we use the fact that λ_2 ≤ λ here), Lemma 11 (property (iv)), Theorem 2 (property (v)) and the fact that almost surely no point of the Poisson process falls on the midpoint of two members of Γ and no two points of the Poisson process fall on the same horizontal line (properties (vi) and (vii)).

Global approximation
We now study the distribution of Y := Σ_{x∈Γ} Y(x). Our aim is to show that its law is Poisson-like. To achieve this we use the Chen-Stein Method [6,11] in a form due to Arratia, Goldstein and Gordon [1].
Informally this says that provided that mutually dependent pairs of random variables Y (x) and Y (x ′ ) are not likely to both equal 1 and that there are not too many such pairs, then the distribution of Y is approximately Poisson. To state Arratia, Goldstein and Gordon's theorem precisely, we need some definitions and notation.
Let x ∈ Γ. We can consider a Poisson point process on S n as the union of independent Poisson point processes on V n (x) and S n \ V n (x). By definition Y (x) is independent of the point process on S n \ V n (x).
Set Γ_x to be the set of y ∈ Γ for which V_n(x) ∩ V_n(y) ≠ ∅; this can be thought of as the set of possible dependencies for Y(x). Define
Now let µ be the mean of Y, and recall from the introduction that Po_µ(A) is the probability that a Poisson random variable with parameter µ takes a value inside the set A. Then the following holds:
Theorem 19 (Arratia, Goldstein and Gordon [1]).
Thus provided we can obtain good upper bounds on b_1 and b_2, we will be close to done. That b_1 is small will follow from our assumption that the probability of S_{n,k} being connected is not too small. Let c_8 > 0 and γ_2 > 0 be strictly positive constants chosen sufficiently small to satisfy c_8 < min(1, c_4, c_6) and γ_2 ≤ min(γ_1, c_8/2) respectively.
Proof of Lemma 20. Suppose that P(S_{n,k} connected) > n^{-γ} with γ ≤ min(γ_1, c_6/2). Now
We have P(S_{n,k} connected) > n^{-γ} by hypothesis and P(D) = o(n^{-c_6}) by Lemma 18, so that
Now by definition Y(x) and Y(x′) are independent random variables for any x,
Since there is a set of at least n/(1000λ^2 log n) members of Γ which are 4√2 λ√(log n) separated, we have
Combining this with the lower bound for P(Y = 0) obtained above, we get
for some suitable constant c_7 > 0 (since γ ≤ c_6/2). In particular, µ = |Γ|p < c_7 (log n)^2.
We now turn our attention to b_2. Here we will need to use a variant of Theorem 5:
Lemma 22. There exists a constant c′_3 > 0 such that if k ∈ (0.3 log n, 0.6 log n) and x, x′ are points in Γ with x′ ∈ Γ_x, then
Proof. Lemma 22 does not follow directly from Theorem 5 since instead of working with a nice square U_n, we begin instead with the union of two intersecting squares V_n(x) and V_n(x′). However only a slight modification of our argument in the proof of Theorem 5 is needed to establish Lemma 22. We consider a translate U′_n of U_n such that the centre subsquare
We consider the k-nearest neighbour graph U′_{n,k} on the points placed inside U′_n by our Poisson process on S_n. We can define events A_k(x) and B_k(x, x′) corresponding to {Y(x) = 1} and {Y(x) = Y(x′) = 1} in a natural way: for y ∈ {x, x′} let A_k(y) be the event that U′_{n,k} contains a small connected component H such that y is the unique element of Z^2 closest to a bottom-most point of H, and let B_k(x, x′) be the event that both of these happen,
Figure 3: The intersecting squares V_n(x) and V_n(x′), and the translate U′_n with its centre subsquare (1/2)U′_n featuring in the proof of Lemma 22.
By following exactly the proof of Theorem 5, we get
Now all we have to show is that A_k(x) and A_k(x′) are essentially the same events as {Y(x) = 1} and {Y(x′) = 1} respectively. This is straightforward from Lemma 4 of [7], Lemma 15 and Lemma 10, which taken together show
so that Lemma 22 follows with c′_3 = 2c_3.
We are now able to bound b 2 from above: Lemma 23. Suppose k ∈ (0.3 log n, 0.6 log n) and P(S n,k connected) > n −γ 2 .

Putting it together
We now put together the results of the previous sections to prove Theorem 3.
Corollary 24 tells us that Y is approximately Poisson with parameter µ = EY , while Corollary 17 tells us that whp Y counts exactly the small components of S n,k . Our aim is to show that the number of small components X of S n,k is approximately Poisson with parameter ν = ν(n, k), where ν(n, k) := − log P(S n,k connected).
Thus what we have left to show is that Po µ and Po ν are approximately the same probability measure. We do this in two stages: first we prove µ and ν are almost equal, and then use that to show Po µ and Po ν are approximately the same.
It readily follows that Po µ and Po ν are close to being the same measure. Theorem 3 then follows from the triangle inequality and appropriately small choices of the constants c 2 > 0 and γ 2 > 0.
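The step from 'µ and ν are almost equal' to 'Po_µ and Po_ν are almost the same measure' can be checked numerically. The sketch below computes sup_A |Po_µ(A) − Po_ν(A)|, which equals half the sum of the absolute differences of the two probability mass functions, and compares it with the standard bound |µ − ν|. That bound is a textbook fact and not necessarily the estimate used in the paper; the function names and the truncation rule are illustrative assumptions.

```python
import math


def poisson_pmf(mu, j):
    """P(Po(mu) = j), with Po(0) taken to be the point mass at 0."""
    if mu == 0:
        return 1.0 if j == 0 else 0.0
    return math.exp(-mu + j * math.log(mu) - math.lgamma(j + 1))


def tv_distance(mu, nu, cutoff=None):
    """sup_A |Po_mu(A) - Po_nu(A)|, computed as half the l1 distance of the pmfs.

    The sum is truncated at `cutoff`; the default leaves out only a
    negligible amount of tail mass for moderate parameters.
    """
    if cutoff is None:
        m = max(mu, nu)
        cutoff = int(m + 20 * math.sqrt(m + 1) + 50)
    return 0.5 * sum(abs(poisson_pmf(mu, j) - poisson_pmf(nu, j))
                     for j in range(cutoff + 1))
```

For instance, `tv_distance(3.0, 3.01)` is below 0.01, in line with the bound |µ − ν|; a polynomially small gap between µ and ν therefore gives a polynomially small distance between the two Poisson laws.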
Proof of Theorem 3. Let X = X n,k denote the number of small components in S n,k . Assume P(S n,k connected) > n −γ 2 .
The proof of Theorem 5 of [3] establishes the existence of a constant γ > 0 such that for k ≤ 0.3 log n we have P(S n,k connected) = o(n −γ ).
Provided γ_2 < γ and n is sufficiently large, this together with our assumption on the connectivity of S_{n,k} guarantees k > 0.3 log n. Also Theorem 13 of [3] shows that for k ≥ 0.6 log n the probability that S_{n,k} contains any small component
Thus picking c_2 and γ_2 to satisfy 0 < c_2 < min(c_6, 0.6 log 7 − 1, c_8/2) and 0 < γ_2 < min(γ, γ_1, c_8/2) we are done.

Process version
Let us conclude this paper by showing as promised that the locations of the small components are approximately distributed like a Poisson process.
Let X and Y be the |Γ|-dimensional vectors X = (X(x)) x∈Γ and Y = (Y (x)) x∈Γ respectively. We define two Poisson processes on Γ.
Arratia, Goldstein and Gordon [1] gave the following process version of their Poisson approximation theorem: which is known as the total variation distance between the distributions of Y and Z.
Let us now apply Theorem 27 to prove a process version of Theorem 3: Theorem 28. Suppose P(S n,k connected) > n −γ 2 . Then D(X, Z ′ ) = o(n −c 2 ).
Proof. Suppose P(S_{n,k} connected) > n^{-γ_2}. As in the proof of Theorem 3, we may assume 0.3 log n < k < 0.6 log n provided n is sufficiently large. We know by Corollary 17 that if D does not occur then X = Y. Theorem 27 and the bounds on b_1 and b_2 we obtained in Lemmas 21 and 23 tell us D(Y, Z) is at most o(n^{-c_2}). We know by Lemma 25 that ν = µ + o(n^{-c_8/2}). Dividing through by |Γ| we get that
from which it is easy to show that D(Z, Z′) = o(n^{-c_2}): running exactly the same argument as in Corollary 26 but with p and p′ instead of µ and ν, and using (5) instead of Lemma 25, we get that sup
= o(n^{-c_2}) by Lemma 18, since c_2 < c_6.
What Theorem 28 effectively says is that the location of the small components of S n,k inside S n is approximately a Poisson point process. Thus the distribution of the small components is approximately Poisson in both a numerical and in a spatial sense.

Acknowledgements
I am grateful to Paul Balister, Oliver Riordan and Mark Walters for helpful comments and discussions, which helped greatly improve the presentation of this article.
I would also like to thank the anonymous referee for his/her careful work on the paper.