The Distribution of the Number of Isolated Nodes in the 1-Dimensional Soft Random Geometric Graph

We study the number of isolated nodes in a soft random geometric graph whose vertices constitute a Poisson process on the torus of length L (the line segment [0,L] with periodic boundary conditions), and where an edge is present between two nodes with a probability which depends on the distance between them. Edges between distinct pairs of nodes are mutually independent. In a suitable scaling regime, we show that the number of isolated nodes converges in total variation to a Poisson random variable. The result implies an upper bound on the probability that the random graph is connected.


Introduction
Random geometric graphs (RGGs) were introduced in [Gilbert, 1961] as a model for communication networks with limited connection range and have subsequently been widely used to model networks with a spatial element; see, e.g., [Penrose, 2003] and references therein. In Gilbert's model, nodes or vertices are randomly distributed within some space, typically $\mathbb{R}^d$ or a bounded subset of it, and an edge is placed between two nodes if their mutual distance is smaller than a specified threshold. We shall refer to this model as the hard RGG, as the connection probability has a hard cutoff as a function of the distance between nodes. A generalisation is to allow the edge between a pair of nodes separated by distance $r$ to be present with some probability $H(r)$, independent of all other edges. This model has been termed the soft RGG [Penrose, 2016], the random connection model (RCM) [Meester and Roy, 1996], or a Waxman graph [Waxman, 1988]. In this work we refer to such graphs as soft RGGs and term $H(\cdot)$ the connection function. The hard RGG is a special case, obtained by setting $H(r) = 1$ for $r \le r_c$ and $H(r) = 0$ otherwise; $r_c > 0$ is a parameter of the model.
The one-dimensional version of this model is motivated by vehicular ad-hoc networks (VANETs), which are expected to be essential for autonomous vehicles; these will be fitted with on-board radios to transmit information such as location, velocity, and hazard warnings between vehicles. The road is represented by a line, the nodes represent vehicles, and an edge between two nodes indicates that the two vehicles can communicate directly. One key question is connectivity: when is every vehicle in the network able to communicate with every other, via either a single- or a multi-hop path? A necessary condition for full connectivity is the absence of isolated nodes, namely nodes that are not connected to any other node in the graph. In the 2-D and 3-D versions of the soft RGG model, it has been shown in a suitable asymptotic regime that the soft RGG is connected if and only if there are no isolated nodes [Mao and Anderson, 2012; Penrose, 2016]. It was further shown in [Mao and Anderson, 2012] that the number of isolated nodes in a 2-D soft RGG can be well approximated by a Poisson distribution. The result was extended in [Penrose, 2016] to dimension three and greater; see also [Ganesh and Xue, 2007] for an analogous result for small-world networks. There has been little work to date on the 1-D model, which differs in important respects from those in two or more dimensions. In particular, the dominant reason for disconnection in 1-D hard RGGs is the presence of uncrossed gaps rather than of isolated nodes. Soft RGGs in 1-D were studied in [Wilsher et al., 2020], where it was shown that isolated nodes dominate uncrossed gaps as a cause of disconnection. The threshold for the emergence of isolated nodes was established in [Wilsher et al., 2020], and a Poisson limit was conjectured for the number of isolated nodes at the threshold (Conjecture 4.1). We prove this conjecture here.
We now describe the model studied in this paper and our assumptions. We consider a sequence of networks indexed by a parameter $L \in \mathbb{R}_+$, which tends to infinity along the sequence. The nodes or vertices of the network constitute a Poisson point process (PPP) of unit intensity on $[0, L]$, which we denote $\mathcal{P}_L$. If two nodes are located a distance $r$ apart, the edge between them is present with probability $H_L(r) = H(r/R_L)$, independent of all other edges; here, $H : \mathbb{R}_+ \to [0, 1]$ is a given connection function, and $R_L > 0$ is a scaling parameter to be specified.
In order to avoid inessential technicalities associated with the boundaries, we shall study the model with periodic boundary conditions. In other words, we turn $[0, L]$ into a circle or torus by identifying 0 with $L$. We denote the circular distance on $[0, L]$ by $\rho_L$, i.e., $\rho_L(x, y) = \min\{|x - y|, L - |x - y|\}$. Finally, we define the connection probabilities to be $h_L(x, y) = H_L(\rho_L(x, y))$. We denote by $G_{h_L}(\mathcal{P}_L)$ the graph with vertex set $\mathcal{P}_L$ and independent edges generated with connection probabilities $h_L$. This is the graph model we study. We believe that the same results hold if $[0, L]$ is treated as a line segment rather than a circle, and that this can be established by analysing boundary effects separately, as in [Dettmann and Georgiou, 2016].
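The model just described can be simulated directly. The following is a minimal sketch, not the authors' code: the function names, the exponential connection function and the use of Python's standard `random` module are all illustrative assumptions. It samples a unit-intensity Poisson process on $[0, L]$ through exponential gaps, draws each edge independently with probability $H(\rho_L(x, y)/R_L)$, and counts the nodes left without an edge.

```python
import math
import random


def circular_distance(x, y, L):
    """Circular distance rho_L(x, y) on the torus [0, L]."""
    d = abs(x - y)
    return min(d, L - d)


def sample_ppp(L, rng):
    """Unit-intensity Poisson point process on [0, L], built from
    independent Exp(1) gaps between consecutive points."""
    points = []
    t = rng.expovariate(1.0)
    while t < L:
        points.append(t)
        t += rng.expovariate(1.0)
    return points


def count_isolated(points, L, R, H, rng):
    """Draw every edge independently with probability H(rho_L / R)
    and return the number of nodes with no incident edge."""
    degree = [0] * len(points)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            r = circular_distance(points[i], points[j], L)
            if rng.random() < H(r / R):
                degree[i] += 1
                degree[j] += 1
    return sum(1 for d in degree if d == 0)


def exponential_H(u):
    """Illustrative connection function H(u) = exp(-u); an assumption
    made for this sketch, not a choice taken from the paper."""
    return math.exp(-u)
```

A call such as `count_isolated(sample_ppp(200.0, rng), 200.0, 2.0, exponential_H, rng)` with `rng = random.Random(1)` returns one realisation of the number of isolated nodes; the pairwise loop is quadratic in the number of nodes, which is acceptable only for moderate $L$.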
We make the following assumptions about the connection function.
Assumptions: Let $\|H\|_1$ and $\|H\|_2$ denote the $L^1$ and $L^2$ norms of $H$:
$$\|H\|_1 = \int_0^\infty H(x)\,dx, \qquad \|H\|_2 = \Big( \int_0^\infty H(x)^2\,dx \Big)^{1/2}.$$
We assume that $\|H\|_1 < \infty$ and $\|H\|_2^2 < \|H\|_1$.
The first assumption says that $H$ is integrable, and is required for the mean number of neighbours of a node to remain bounded as $L$ tends to infinity. We expect this to hold in real-world networks. Next, observe from the definition that $\|H\|_2^2 \le \|H\|_1$, since $H(x) \in [0, 1]$ for all $x$, as it is a probability. Hence, the second assumption, which asserts that this inequality is strict, is a mild one. It is satisfied whenever the set $\{x \in \mathbb{R}_+ : 0 < H(x) < 1\}$, on which the connection probability is strictly between 0 and 1, has positive Lebesgue measure. Nevertheless, the assumption excludes the connection function of a hard RGG, which is $\{0, 1\}$-valued.
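As a quick numerical illustration of the two assumptions, the sketch below approximates $\|H\|_1$ and $\|H\|_2^2$ with a midpoint rule for three example connection functions. The examples, the integration range and the step count are assumptions chosen for illustration; only the hard cutoff is a case discussed in the text.

```python
import math


def norms(H, upper=50.0, steps=100_000):
    """Midpoint-rule approximations of ||H||_1 and ||H||_2^2 on [0, upper].
    Assumes H is negligible beyond `upper`."""
    dx = upper / steps
    l1 = 0.0
    l2_squared = 0.0
    for k in range(steps):
        h = H((k + 0.5) * dx)
        l1 += h * dx
        l2_squared += h * h * dx
    return l1, l2_squared


def exponential_H(r):
    return math.exp(-r)        # ||H||_1 = 1, ||H||_2^2 = 1/2


def gaussian_H(r):
    return math.exp(-r * r)    # ||H||_1 = sqrt(pi)/2


def hard_H(r):
    return 1.0 if r <= 1.0 else 0.0   # hard RGG with cutoff r_c = 1


def satisfies_assumptions(H, tol=1e-6):
    """True if ||H||_2^2 is smaller than ||H||_1 by more than `tol`."""
    l1, l2_squared = norms(H)
    return math.isfinite(l1) and l2_squared < l1 - tol
```

Under these choices the exponential and Gaussian-type functions satisfy the strict inequality, while the hard cutoff gives $\|H\|_2^2 = \|H\|_1$ and is excluded, in line with the remark above.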
We now state our main result, which resolves a conjecture in [Wilsher et al., 2020].
Theorem 1. Fix $\tau \in \mathbb{R}_+$ and take $R_L = \ln(\tau L)/(2\|H\|_1)$. Let $N_{iso}$ denote the number of isolated nodes in the soft RGG $G_{h_L}(\mathcal{P}_L)$. Its dependence on $L$ and $\tau$ has been suppressed for notational convenience. As $L$ tends to infinity, the random variable $N_{iso}$ converges in total variation distance to a Poisson random variable with mean $1/\tau$. In particular, $P(N_{iso} = 0)$ tends to $e^{-1/\tau}$.
As the soft RGG is disconnected if there are any isolated nodes, an immediate corollary of the theorem is that the probability that $G_{h_L}(\mathcal{P}_L)$ is connected is asymptotically bounded above by $e^{-1/\tau}$, in the scaling regime considered in the theorem. This upper bound would be tight if isolated nodes were the dominant cause of disconnection in this random graph model, as conjectured in [Wilsher et al., 2020] under the mild additional condition that the connection function has unbounded support. Resolving this conjecture remains an open problem, as does extending the analysis of this paper to point process models other than the Poisson process.
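To see why this scaling gives a limiting mean of $1/\tau$, the sketch below evaluates the expected number of isolated nodes for the illustrative choice $H(r) = e^{-r}$ (an assumption, with $\|H\|_1 = 1$). It relies on the standard Poisson-process computation that a node on the torus is isolated with probability $\exp(-2R_L \int_0^{L/2R_L} H(r)\,dr)$, so the expected count is $L$ times this quantity. The function names are hypothetical.

```python
import math


def scaling_radius(tau, L, h1_norm):
    """R_L = ln(tau * L) / (2 * ||H||_1); requires tau * L > 1."""
    return math.log(tau * L) / (2.0 * h1_norm)


def expected_isolated_exponential(tau, L):
    """Expected number of isolated nodes on the torus of length L for
    H(r) = exp(-r), using L * exp(-2 R_L * int_0^{L/(2 R_L)} H(r) dr)."""
    R = scaling_radius(tau, L, 1.0)
    a = L / (2.0 * R)
    integral = 1.0 - math.exp(-a)   # closed form of int_0^a exp(-r) dr
    return L * math.exp(-2.0 * R * integral)


def limiting_probability_no_isolated(tau):
    """Limit of P(N_iso = 0) stated in Theorem 1."""
    return math.exp(-1.0 / tau)
```

For example, `expected_isolated_exponential(2.0, 1000.0)` is numerically indistinguishable from $1/\tau = 0.5$, whereas for small $L$ the value lies slightly above $1/\tau$ because the integral is cut off at $L/2R_L$.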

Proofs
Denote the total variation distance between probability distributions $\mu$ and $\nu$ on $\mathbb{R}$ by $d_{TV}(\mu, \nu)$. With a slight abuse of notation, we shall write $d_{TV}(X, Y)$ for random variables $X$ and $Y$ to denote the total variation distance between their probability distributions. The following bound on the total variation distance between random variables $X$ and $Y$ defined on the same probability space is well known and elementary:
$$d_{TV}(X, Y) \le P(X \ne Y). \tag{1}$$
The proof of Theorem 1 proceeds through a sequence of lemmas. Our first result approximates the number of isolated nodes in our model with the number in a model in which the connection function is truncated. This step is not needed if the connection function has bounded support.
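Proof. Let $\tilde h_L$ denote the connection function obtained by truncating $h_L$ at distance $R_L^{1+1/\alpha}$, and let $\tilde N_{iso}$ denote the number of isolated nodes in $G_{\tilde h_L}(\mathcal{P}_L)$. We can couple $G_{h_L}(\mathcal{P}_L)$ and $G_{\tilde h_L}(\mathcal{P}_L)$ by first generating $G_{h_L}(\mathcal{P}_L)$, and then removing any edges of length at least $R_L^{1+1/\alpha}$. Observe that $\tilde N_{iso} \ge N_{iso}$, since removing edges cannot reduce the number of isolated nodes. Therefore, it follows from Markov's inequality that
$$P(\tilde N_{iso} \ne N_{iso}) = P(\tilde N_{iso} - N_{iso} \ge 1) \le E[\tilde N_{iso}] - E[N_{iso}]. \tag{2}$$
Using the expression for the expected number of isolated nodes in a soft RGG derived in [Wilsher et al., 2020, eqn. (9)], we get
$$E[N_{iso}] = L \exp\Big( -2R_L \int_0^{L/2R_L} H(r)\,dr \Big).$$
Now, $L/2R_L$ tends to infinity as $L$ tends to infinity, and so $\int_0^{L/2R_L} H(r)\,dr$ tends to $\|H\|_1 = \int_0^\infty H(r)\,dr$. Hence, it follows from the above that
$$\lim_{L \to \infty} E[N_{iso}] = \frac{1}{\tau}. \tag{3}$$
Similarly, since $R_L = \ln(\tau L)/(2\|H\|_1)$ tends to infinity with $L$, we have
$$\lim_{L \to \infty} E[\tilde N_{iso}] = \frac{1}{\tau}. \tag{4}$$
It follows from (3) and (4) that $E[\tilde N_{iso}] - E[N_{iso}]$ tends to zero as $L$ tends to infinity. The claim of the lemma, namely that $d_{TV}(N_{iso}, \tilde N_{iso})$ tends to zero, now follows from (1) and (2).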
Henceforth, we shall work with $G_{\tilde h_L}(\mathcal{P}_L)$, the soft RGG with truncated connection function. We shall use the Chen-Stein method for Poisson approximation described in [Barbour et al., 1992]; as it requires a discrete index set, we fix $m \in \mathbb{N}$ and discretise the torus $[0, L]$ into $mL$ segments of width $1/m$. (Assume for convenience that $L$ is an integer. Otherwise, we need one segment of a different width, which does not fundamentally alter the analysis.) Denote the $i$th segment by $A_i$, where $i$ takes values in the index set $\Gamma = \{1, 2, \ldots, mL\}$. Denote by $\mathcal{P}_L$ and $\mathcal{I}$ the vertex set and the set of isolated nodes of the random graph $G_{\tilde h_L}(\mathcal{P}_L)$, and by $\mathcal{P}_L(A_i)$ and $\mathcal{I}(A_i)$ the subsets of these nodes lying within $A_i$. Define
$$J_i = \mathbf{1}\{|\mathcal{P}_L(A_i)| = 1\}, \qquad I_i = \mathbf{1}\{|\mathcal{P}_L(A_i)| = 1,\ |\mathcal{I}(A_i)| = 1\}, \qquad W_{m,L} = \sum_{i \in \Gamma} I_i, \tag{5}$$
for $i \in \Gamma$. Denote the centre of the segment $A_i$ by $x_i$. (Although $A_i$, $I_i$, $J_i$, and $x_i$ all implicitly depend on $m$, this dependence is suppressed for notational convenience.) Our next result states that the number of isolated nodes is well approximated by $W_{m,L}$ when $m$ is large.
Lemma 2. Let $I_i$ denote the indicator that the $i$th segment of $[0, L]$ contains exactly one node, and that node is isolated in $G_{\tilde h_L}(\mathcal{P}_L)$. Let $W_{m,L}$ denote the sum of these indicators, as defined in (5). Let $\tilde N_{iso}$ denote the total number of isolated nodes in $G_{\tilde h_L}(\mathcal{P}_L)$. Then, for any fixed $L > 0$,
$$\lim_{m \to \infty} d_{TV}(\tilde N_{iso}, W_{m,L}) = 0.$$
Proof. Notice that the random variable $W_{m,L}$ is exactly the same as the number of isolated nodes unless there is a segment containing two or more nodes. Now, the number of nodes in a segment of length $1/m$ has a Poisson distribution with mean $1/m$. Hence, it follows from (1) and the union bound that
$$d_{TV}(\tilde N_{iso}, W_{m,L}) \le P(\tilde N_{iso} \ne W_{m,L}) \le mL \Big( 1 - e^{-1/m}\Big(1 + \frac{1}{m}\Big) \Big) \le \frac{L}{2m},$$
which tends to zero as $m$ tends to infinity, as claimed.
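The union-bound step in this proof is easy to check numerically. The sketch below (function names are hypothetical) computes $mL \cdot P(\mathrm{Poisson}(1/m) \ge 2)$, the bound on the probability that some segment holds two or more nodes, and compares it with the elementary estimate $L/(2m)$, which follows from $1 - e^{-x}(1 + x) \le x^2/2$ for $x \ge 0$.

```python
import math


def prob_two_or_more(mean):
    """P(Poisson(mean) >= 2) = 1 - exp(-mean) * (1 + mean).
    expm1 is used to limit cancellation error when mean is small."""
    return -math.expm1(-mean) - mean * math.exp(-mean)


def discretisation_bound(m, L):
    """Union bound over the m * L segments of width 1/m."""
    return m * L * prob_two_or_more(1.0 / m)


def simple_bound(m, L):
    """The cruder estimate L / (2 m)."""
    return L / (2.0 * m)
```

For fixed $L$, both quantities decay like $1/m$, which is why $m$ is sent to infinity before $L$ in the lemmas that follow.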
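In light of the above lemma, it suffices to establish a Poisson approximation for the random variables $W_{m,L}$. To this end, we define the "neighbourhood" of an index $i \in \Gamma$ by
$$B_i = \{ j \in \Gamma : \rho_L(x_i, x_j) \le 3R_L^{1+1/\alpha} \}. \tag{6}$$
Writing $p_i = E[I_i]$ and $p_{ij} = E[I_i I_j]$, we also define the quantities
$$b_1 = \sum_{i \in \Gamma} \sum_{j \in B_i} p_i p_j, \qquad b_2 = \sum_{i \in \Gamma} \sum_{j \in B_i,\, j \ne i} p_{ij}, \qquad b_3 = \sum_{i \in \Gamma} E\Big| E\big[ I_i - p_i \,\big|\, I_j,\ j \in \Gamma \setminus B_i \big] \Big|. \tag{7}$$
We shall use the following result on Chen-Stein approximation. Let $W_{m,L} = \sum_{i \in \Gamma} I_i$ and let $b_1$, $b_2$, $b_3$ be defined as in (7). Then,
$$d_{TV}\big( W_{m,L}, \mathrm{Po}(E[W_{m,L}]) \big) \le b_1 + b_2 + b_3.$$
Thus, in order to establish a Poisson approximation for $W_{m,L}$, we need to bound the terms $b_1$, $b_2$, $b_3$.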
Lemma 4. Let $B_i$ be defined as in (6) and $b_1$, $b_2$, $b_3$ as in (7). Then, $b_3 = 0$ for all $m$ sufficiently large.
Proof. The connection function $\tilde h_L$ is defined by truncating $h_L$ at $R_L^{1+1/\alpha}$. Hence, the event for which $I_i$ is the indicator depends only on nodes within distance $R_L^{1+1/\alpha}$ of the segment $A_i$. By the same reasoning, the isolation or otherwise of nodes at distance greater than $2R_L^{1+1/\alpha}$ from this segment is independent of the Poisson point process in the set $\{x : \min_{y \in A_i} \rho_L(x, y) \le R_L^{1+1/\alpha}\}$. Hence, $I_i$ is jointly independent of $I_j$ for all $j$ such that $\min_{x \in A_i, y \in A_j} \rho_L(x, y) > 2R_L^{1+1/\alpha}$. In particular, if $m$ is large enough that $1/m < R_L^{1+1/\alpha}$, then $I_i$ is independent of the collection of random variables $\{I_j, j \in \Gamma \setminus B_i\}$. Hence, $b_3 = 0$ for all $m$ sufficiently large.
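Lemma 5. Let $B_i$ be defined as in (6) and $b_1$, $b_2$, $b_3$ as in (7). Then, $b_1$ tends to zero as we let $m$ tend to infinity, followed by $L$.
Proof. Observe from the definition of $I_i$ that
$$p_i = E[I_i] = P(J_i = 1)\, P(v \text{ is isolated} \mid J_i = 1), \tag{8}$$
where $v \in \mathcal{P}_L(A_i)$ exists and is unique since $J_i = 1$. Say $v$ is located at $(i/m) + x$, where $0 \le x < 1/m$. Now, the set of nodes to which $v$ is connected forms an inhomogeneous Poisson process on $[0, L] \setminus A_i$, of intensity $\tilde h_L(i/m + x, y)$; $v$ is isolated if and only if this set is empty. Thus, by translation invariance of the connection function, the probability that $v$ is isolated is given by
$$\exp\Big( -\int_{[0,L] \setminus A_i} \tilde h_L(i/m + x, y)\,dy \Big) = \exp\Big( -\int_0^L \tilde h_L(x, y)\,dy + \kappa_x \Big), \qquad \text{where } 0 \le \kappa_x = \int_{A_i} \tilde h_L(i/m + x, y)\,dy \le \frac{1}{m},$$
since $\tilde h_L$ is bounded above by 1. We have not made the dependence of $\kappa_x$ on $m$ and $L$ explicit in the notation. Suppose $L$ is large enough that $L/2R_L \ge R_L^{1+1/\alpha}$. Then, using the translation invariance of $\tilde h_L$ once more, we have
$$\int_0^L \tilde h_L(x, y)\,dy = 2 \int_0^{R_L^{1+1/\alpha}} H(r/R_L)\,dr = 2R_L \int_0^{R_L^{1/\alpha}} H(y)\,dy,$$
which does not depend on $x$. Substituting in (8), and using $P(J_i = 1) = e^{-1/m}/m$ together with $\kappa_x \le 1/m$, we get that
$$p_i \le \frac{1}{m} \exp\Big( -2R_L \int_0^{R_L^{1/\alpha}} H(y)\,dy \Big).$$
Substituting in the definition of $b_1$, and noting that $p_i$ does not depend on $i$ and that $|B_i| \le 6mR_L^{1+1/\alpha} + 1$, we see that
$$b_1 = mL\,|B_i|\,p_i^2 \le L \Big( 6R_L^{1+1/\alpha} + \frac{1}{m} \Big) \exp\Big( -4R_L \int_0^{R_L^{1/\alpha}} H(y)\,dy \Big).$$
Now, $\int_0^{R_L^{1/\alpha}} H(y)\,dy$ tends to $\|H\|_1$ as $L$, and hence $R_L$, tends to infinity. Since $2R_L\|H\|_1 = \ln(\tau L)$, the exponential factor decays like $(\tau L)^{-2(1 - o(1))}$, while $R_L^{1+1/\alpha}$ grows only polylogarithmically in $L$. So, it follows that
$$\lim_{L \to \infty} \limsup_{m \to \infty} b_1 = 0.$$
This completes the proof of the lemma.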
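Lemma 6. Let $B_i$ be defined as in (6) and $b_2$ as in (7). Then, $b_2$ tends to zero as we let $m$ tend to infinity, followed by $L$.
Proof. Consider two nodes located at $x, y \in [0, L]$. The set of all other nodes to which at least one of them has an edge constitutes an inhomogeneous PPP on $[0, L]$, with intensity $\varphi_{\tilde h_L}(\cdot, \{x, y\})$ given by
$$\varphi_{\tilde h_L}(z, \{x, y\}) = 1 - (1 - \tilde h_L(x, z))(1 - \tilde h_L(y, z)) = \tilde h_L(x, z) + \tilde h_L(y, z) - \tilde h_L(x, z)\,\tilde h_L(y, z). \tag{9}$$
Fix $i, j \in \Gamma$, $i \ne j$, and condition on the event that $J_i = 1$ and $J_j = 1$, i.e., that there is a unique point of the PPP, $\mathcal{P}_L$, on each of the segments $A_i$ and $A_j$. Denote their positions by $x$ and $y$. The set of nodes to which these might be connected, besides each other, constitutes a PPP on $[0, L] \setminus (A_i \cup A_j)$ with intensity $\varphi_{\tilde h_L}$ defined above. Hence, the probability that both these nodes are isolated is given by
$$(1 - \tilde h_L(x, y)) \exp\Big( -\int_{[0,L] \setminus (A_i \cup A_j)} \varphi_{\tilde h_L}(z, \{x, y\})\,dz \Big).$$
Now, by well-known properties of Poisson point processes, the unique points of the homogeneous PPP on the segments $A_i$ and $A_j$ are independently and uniformly distributed within them. Hence,
$$p_{ij} = e^{-2/m} \int_{A_i} \int_{A_j} (1 - \tilde h_L(x, y)) \exp\Big( -\int_{[0,L] \setminus (A_i \cup A_j)} \varphi_{\tilde h_L}(z, \{x, y\})\,dz \Big)\,dy\,dx.$$
Substituting in the definition of $b_2$, we obtain that
$$b_2 \le e^{-2/m} \int_0^L \int_{\{y : \rho_L(x, y) \le 3R_L^{1+1/\alpha} + 1/m\}} \exp\Big( -\int_{[0,L] \setminus (A_i(x) \cup A_j(y))} \varphi_{\tilde h_L}(z, \{x, y\})\,dz \Big)\,dy\,dx,$$
where we write $A_i(x)$ and $A_j(y)$ to denote the segments in which $x$ and $y$ lie; allowing them to lie in the same segment yields an upper bound, as does dropping the $(1 - \tilde h_L(x, y))$ term. Now,
$$\int_{[0,L] \setminus (A_i(x) \cup A_j(y))} \varphi_{\tilde h_L}(z, \{x, y\})\,dz \ge \int_0^L \varphi_{\tilde h_L}(z, \{x, y\})\,dz - \frac{2}{m},$$
since $\varphi_{\tilde h_L}$ is bounded above by 1. Hence, we conclude using the translation invariance of $\tilde h_L$ that
$$\limsup_{m \to \infty} b_2 \le L \int_{\{y : \rho_L(0, y) \le 3R_L^{1+1/\alpha}\}} \exp\Big( -\int_0^L \varphi_{\tilde h_L}(z, \{0, y\})\,dz \Big)\,dy. \tag{10}$$
Substituting for $\varphi_{\tilde h_L}$ from (9), we have
$$\int_0^L \varphi_{\tilde h_L}(z, \{0, y\})\,dz = \int_0^L \tilde h_L(0, z)\,dz + \int_0^L \tilde h_L(y, z)\,dz - \int_0^L \tilde h_L(0, z)\,\tilde h_L(y, z)\,dz. \tag{11}$$
Now, by the Schwarz inequality,
$$\int_0^L \tilde h_L(0, z)\,\tilde h_L(y, z)\,dz \le \|\tilde h_L(0, \cdot)\|_2\, \|\tilde h_L(y, \cdot)\|_2. \tag{12}$$
As $\tilde h_L(y, \cdot)$ is just a circular shift of $\tilde h_L(0, \cdot)$, we also have
$$\|\tilde h_L(y, \cdot)\|_1 = \|\tilde h_L(0, \cdot)\|_1, \qquad \|\tilde h_L(y, \cdot)\|_2 = \|\tilde h_L(0, \cdot)\|_2. \tag{13}$$
Substituting (12) and (13) into (11), we get
$$\int_0^L \varphi_{\tilde h_L}(z, \{0, y\})\,dz \ge 2\|\tilde h_L(0, \cdot)\|_1 - \|\tilde h_L(0, \cdot)\|_2^2. \tag{14}$$
Now, for $L$ large enough that $L/2R_L \ge R_L^{1+1/\alpha}$,
$$\|\tilde h_L(0, \cdot)\|_1 = 2R_L \int_0^{R_L^{1/\alpha}} H(r)\,dr, \tag{15}$$
$$\|\tilde h_L(0, \cdot)\|_2^2 = 2R_L \int_0^{R_L^{1/\alpha}} H(r)^2\,dr \le 2R_L \|H\|_2^2. \tag{16}$$
Substituting (15) and (16) in (14), we get
$$\int_0^L \varphi_{\tilde h_L}(z, \{0, y\})\,dz \ge 2R_L \Big( 2\int_0^{R_L^{1/\alpha}} H(r)\,dr - \|H\|_2^2 \Big).$$
Since $R_L = \ln(\tau L)/(2\|H\|_1)$, we now obtain from (10) that
$$\limsup_{m \to \infty} b_2 \le 6L R_L^{1+1/\alpha}\, (\tau L)^{-\gamma_L}, \qquad \gamma_L = \frac{2\int_0^{R_L^{1/\alpha}} H(r)\,dr - \|H\|_2^2}{\|H\|_1}.$$
As $L$ tends to infinity, $\gamma_L$ tends to $2 - \|H\|_2^2/\|H\|_1$, which is strictly greater than 1 by the assumption that $\|H\|_2^2 < \|H\|_1$, while $R_L^{1+1/\alpha}$ grows only polylogarithmically in $L$. Hence the right-hand side tends to zero, which completes the proof of the lemma.
We now complete the proof of Theorem 1. By Lemmas 4, 5 and 6 and the Chen-Stein bound, $d_{TV}(W_{m,L}, \mathrm{Po}(E[W_{m,L}]))$ tends to zero as we let $m$ tend to infinity, followed by $L$. Combining this with the truncation step, Lemma 2 and the triangle inequality, the same holds for $d_{TV}(N_{iso}, \mathrm{Po}(E[W_{m,L}]))$. It is straightforward to check that $E[W_{m,L}] = mLp_i$ tends to $1/\tau$. Finally, a straightforward calculation shows that, if a sequence $\lambda_n$ converges to $\lambda$, then $\mathrm{Po}(\lambda_n)$ converges to $\mathrm{Po}(\lambda)$ in total variation distance. Hence, invoking the triangle inequality once more, we obtain that $d_{TV}(N_{iso}, \mathrm{Po}(1/\tau)) \to 0$ as $m, L \to \infty$.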
This completes the proof of the theorem.