Concentration Bounds for Geometric Poisson Functionals: Logarithmic Sobolev Inequalities Revisited

We prove new concentration estimates for random variables that are functionals of a Poisson measure defined on a general measure space. Our results are specifically adapted to geometric applications, and are based on a pervasive use of a powerful logarithmic Sobolev inequality proved by L. Wu (2000), as well as on several variations of the so-called Herbst argument. We provide several applications, in particular to edge counting and more general length power functionals in random geometric graphs, as well as to the convex distance for random point measures recently introduced by M. Reitzner (2013).

1. Introduction

1.1. Overview. Let η be a Poisson random measure over some measurable space (X, X) such that X is countably generated, and assume that η has a σ-finite intensity µ. Let F = F(η) be a real-valued functional of η having finite expectation. In this paper, we are interested in proving several novel estimates for the upper and lower tails P(F ≥ EF + r) and P(F ≤ EF − r), r > 0, that are well adapted for geometric applications, with particular emphasis on quantities appearing in the modern theory of random geometric graphs; see e.g. [30].
Our techniques are based on several variations of the so-called Herbst argument (see e.g. [3,4,25,26]), basically consisting in using a logarithmic Sobolev inequality (or, alternatively, an integration by parts formula) in order to deduce a differential inequality involving the moment generating function u ↦ K(u) := E[e^{uF}]; solving the inequality then yields an upper bound on K(u), implying in turn a tail estimate for F by means of Markov's inequality. The main insight developed in the present paper is that, by carefully combining the Mecke formula for Poisson point processes (see Section 2) with logarithmic Sobolev inequalities such as the one in Theorem 1.1 below, one can deduce bounds on K(u) involving quantities of a fundamental geometric nature. As discussed below, other approaches to concentration via the Herbst argument on the Poisson space (see e.g. [7,16,43]) do not yield conditions that are amenable to geometric analysis.

1.2. Logarithmic Sobolev inequalities and a motivating example. Our starting point is the following powerful Theorem 1.1, proved by Wu in [43] (see also [8]), and extending previous breakthrough findings contained in [1,2]. Such a result involves two objects: (i) the entropy of a random variable Z > 0 with EZ < ∞, that is defined as Ent(Z) := E(Z log Z) − E(Z) log(EZ), and (ii) the difference (or add-one cost) operator DF, that is defined for any x ∈ X as

D_x F := F(η + δ_x) − F(η),

where δ_x denotes the Dirac mass at x ∈ X.

Theorem 1.1 (See Corollary 2.3 in [43]). For all λ ∈ R satisfying E(e^{λF}) < ∞ we have

Ent(e^{λF}) ≤ E[ e^{λF} ∫_X ψ(λ D_x F) dµ(x) ],    (1.1)

where ψ(z) = z e^z − e^z + 1.
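As a sanity check of (1.1) (our own illustration, not part of [43]): for F = η(X) with µ(X) = m one has F ~ Poisson(m) and D_x F ≡ 1, so both sides of (1.1) can be evaluated by summation against the Poisson distribution, and in this special case the inequality is in fact an equality. The function names in the following sketch are ours.

```python
import math

def poisson_pmf(n, m):
    # P(N = n) for N ~ Poisson(m)
    return math.exp(-m) * m ** n / math.factorial(n)

def psi(z):
    # psi(z) = z e^z - e^z + 1, as in Theorem 1.1
    return z * math.exp(z) - math.exp(z) + 1.0

def wu_both_sides(m, lam, nmax=100):
    """For F = eta(X) with mu(X) = m (so F ~ Poisson(m) and D_x F = 1),
    evaluate Ent(e^{lam F}) and the right-hand side m * psi(lam) * E[e^{lam F}]
    of (1.1), both via truncated series over the Poisson distribution."""
    EZ = sum(math.exp(lam * n) * poisson_pmf(n, m) for n in range(nmax))
    EZlogZ = sum(lam * n * math.exp(lam * n) * poisson_pmf(n, m) for n in range(nmax))
    entropy = EZlogZ - EZ * math.log(EZ)
    rhs = m * psi(lam) * EZ
    return entropy, rhs
```

For a Poisson random variable the two sides coincide, which shows that (1.1) is sharp in this elementary case.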
A typical way of applying (1.1) to concentration estimates (via the Herbst argument) is demonstrated e.g. in [43, Proposition 3.1], where it is proved that, if DF and ∫_X (DF)² dµ are almost surely bounded by positive constants β and c, respectively, then the upper tail of F is bounded by the function

r ↦ exp( −(r/(2β)) log(1 + βr/c) ),    r > 0.
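For later comparison, this bound and its Gaussian limit are elementary to evaluate numerically; the following sketch (with names of our choosing) does so.

```python
import math

def wu_upper_tail(r, beta, c):
    # r -> exp(-(r / (2 beta)) * log(1 + beta r / c)): the upper-tail bound
    # under DF <= beta and int_X (DF)^2 dmu <= c, as in [43, Proposition 3.1]
    return math.exp(-(r / (2.0 * beta)) * math.log(1.0 + beta * r / c))

def gaussian_tail(r, c):
    # the beta -> 0 limit of the bound above: exp(-r^2 / (2c))
    return math.exp(-r * r / (2.0 * c))
```

Letting beta shrink, `wu_upper_tail` converges to `gaussian_tail`, which is the Gaussian upper-tail behaviour discussed next.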
Letting β → 0, one deduces from this estimate that, if DF ≤ 0 and ∫_X (DF)² dµ ≤ c, then

P(F ≥ EF + r) ≤ exp(−r²/(2c)),    r > 0,

that is: such a concentration result only captures a Gaussian behaviour for the upper tail in the case of a non-increasing functional F such that ∫_X (DF)² dµ is deterministically bounded. Apart from the monotonicity requirement on F, a crucial limitation of a result of this kind is that, in most examples where F is a quantity arising in stochastic geometry (for instance, F is an edge-counting statistic such as the ones considered in Section 6 below), the quantity ∫_X (DF)² dµ does not admit any meaningful geometric interpretation, roughly because averaging DF over the deterministic measure µ completely cancels the special role played by those points in X that belong to the support of η. One should contrast such a situation with the following statement, that will be proved later on as a special case of Corollary 3.3 (such a result also implies the already quoted Proposition 3.1 in [43]):

Proposition 1.2. Assume that there exists a finite constant c > 0 such that, almost surely,

V+ := ∫_X ((D_x F)_−)² dµ(x) + ∫_X ((F(η) − F(η − δ_x))_+)² dη(x) ≤ c,

where (u)_− and (u)_+ stand for the negative and positive part of u ∈ R, respectively. Then,

P(F ≥ EF + r) ≤ exp(−r²/(2c)),    r > 0.

Proposition 1.2 is particularly interesting when F is non-decreasing, that is, when DF ≥ 0. Indeed, in this case one has that V+ = ∫_X ((F(η) − F(η − δ_x))_+)² dη(x) and, since the role of µ is now immaterial, the relation V+ ≤ c can in principle be verified by means of arguments of a purely geometric or combinatorial nature. For instance, we implement this strategy in Proposition 8.3 below, where we use Proposition 1.2 in order to deduce a novel intrinsic proof of the Gaussian upper tail behaviour of the convex distance for point processes, as recently introduced by Reitzner in [31].
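To see how V+ becomes computable for a non-decreasing geometric functional, take F to be the number of edges of the random geometric graph on η with radius ρ (a toy setting of our own choosing, anticipating Section 6): removing a point x deletes exactly deg(x) edges, so F(η) − F(η − δ_x) = deg(x), and the η-integral in V+ equals the sum of squared degrees. The following brute-force sketch checks this identity on a simulated sample.

```python
import math, random

def poisson_rvs(m, rng):
    # Knuth's method for a Poisson(m) variate
    L, k, p = math.exp(-m), 0, 1.0
    while p > L:
        k += 1
        p *= rng.random()
    return k - 1

def edge_count(pts, rho):
    # number of pairs of points at distance <= rho
    return sum(1 for i in range(len(pts)) for j in range(i + 1, len(pts))
               if math.dist(pts[i], pts[j]) <= rho)

def v_plus(pts, rho):
    # V+ = sum over points x of (F(eta) - F(eta - delta_x))^2, via removal
    F = edge_count(pts, rho)
    return sum((F - edge_count(pts[:i] + pts[i + 1:], rho)) ** 2
               for i in range(len(pts)))

rng = random.Random(0)
pts = [(rng.random(), rng.random()) for _ in range(poisson_rvs(60.0, rng))]
rho = 0.1
# the removal cost of a point equals its degree, so V+ = sum of squared degrees
degrees = [sum(1 for j in range(len(pts)) if j != i and math.dist(pts[i], pts[j]) <= rho)
           for i in range(len(pts))]
v1, v2 = v_plus(pts, rho), sum(d * d for d in degrees)
```

In particular, a deterministic bound on degrees translates immediately into a deterministic bound on V+, which is exactly the kind of combinatorial verification alluded to above.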
As anticipated, the principal aim of this paper is to prove a large collection of statements with the same flavour as Proposition 1.2 (see Section 3), and then to apply them to random variables arising in the theory of random geometric graphs.

1.3. Plan. Our work is organised as follows. After some preliminary facts discussed in Section 2, the subsequent Section 3 contains the statements of our main concentration estimates. Our results involve random variables having a form similar to the quantity V+ introduced above, and largely generalise Proposition 1.2. Proofs are detailed in Section 4. Section 5 presents several applications of the results of Section 3 to Poisson U-statistics of arbitrary order, as defined in the seminal reference [32] (see also [6,9,12,20,21,29,37,22]). As a by-product of our analysis, in Proposition 5.7 we also establish a new characterisation of square-integrable Poisson U-statistics.
The results of Section 5 are specialised in Section 6 to the case of edge-counting statistics associated with general random geometric graphs. Several careful comparisons with the existing literature (in particular [11,33]) are presented.
Section 7 contains further estimates on U-statistics of order two, that are proved by adapting some techniques introduced in [17,34]. Geometric applications to edge-length functionals are discussed in detail.
As anticipated, in Section 8 we apply the estimates of Section 3, in order to deduce a novel intrinsic proof of the concentration estimates for the convex distance for random point measures established in [31]. Such a fundamental object generalises to the framework of random point processes the celebrated convex distance introduced by Talagrand in [41]; see [33,22] for several applications. We stress that the problem of finding an intrinsic proof of the striking concentration results from [31] has been one of the main motivations for elaborating the theory developed in the present paper.
1.4. Further remarks on the literature. The inequalities obtained in this paper (as well as some techniques exploited in the proofs) are very close in spirit to those appearing in the seminal references [3,4,26], where the so-called entropy method (roughly corresponding to a combination of the Herbst argument and of logarithmic Sobolev inequalities; see e.g. [25]) is developed in the framework of functions of finite vectors of independent random elements. We recall that the results from [3,4,26] typically apply to random variables of the form F = f(X_1, ..., X_n), where X = (X_1, ..., X_n) is a vector of independent random elements and f is some deterministic measurable function, and are based on a pervasive use of random difference operators of the type

∆_i f(X) = f(X) − f(X_1, ..., X_{i−1}, X'_i, X_{i+1}, ..., X_n),    i = 1, ..., n,

where X' = (X'_1, ..., X'_n) is an independent copy of X. By inspection of the results presented below, it is not difficult to show that, in the case where the intensity µ of the Poisson measure η is finite and non-atomic, some versions of the main results of the present paper could be obtained by implementing the following rough strategy: (i) Select a sequence of measurable partitions {B^n_1, ..., B^n_{k(n)} : n ≥ 1} of X, in such a way that k(n) → ∞ and max_{i=1,...,k(n)} µ(B^n_i) → 0, as n → ∞. (ii) Consider a random variable F = F(η) and represent it in the form F = f_n(X_{n,1}, ..., X_{n,k(n)}), where X_{n,i} is defined as the restriction of η to the set B^n_i, and f_n is some appropriate measurable mapping. (iii) For a fixed n, prove a concentration estimate for F(η) by applying the results from [3,4,26] to f_n(X_{n,1}, ..., X_{n,k(n)}), in particular by considering an independent copy of (X_{n,1}, ..., X_{n,k(n)}) defined in terms of an independent Poisson measure η' on X with intensity µ.
(iv) Let n → ∞, and recover a bound involving quantities related to the add-one cost operator DF described above, by exploiting the fact that independent Poisson measures with non-atomic intensities have almost surely disjoint supports.
Apart from the fact that this approach only works with finite intensity measures without atoms, some investigations in this direction have convincingly shown us that (to the best of our understanding), in order for the step described at Point (iv) to take place in a meaningful way, one should systematically add to our statements some additional technical assumptions that are indeed not required if one implements the direct approach based on the Mecke formula that is systematically adopted in this paper. An analogous phenomenon can be observed for instance in [15, Theorem 4], where a weaker version of the Poincaré inequality on the Poisson space is deduced by means of a discretisation procedure similar to the one outlined above, together with the classical Efron–Stein inequality. In view of these remarks, we decided not to directly exploit the connection with the entropy method on product spaces in the proofs of our main results.
Another collection of results that is relevant for our paper is contained in references [7,16], where the authors obtain concentration estimates by applying integration by parts techniques, in particular by using the properties of the so-called Ornstein–Uhlenbeck semigroup associated with a given Poisson measure; see also [40]. As in the already discussed examples from [43], the estimates contained in these references have an equally problematic geometric interpretation, since they involve integrals of add-one cost operators with respect to the underlying intensity measure µ. Moreover, in order to exploit some probabilistic representation of the Ornstein–Uhlenbeck semigroup, one also has to work on extended probability spaces. It is a natural question to ask whether the Mecke formula could be combined with some of the estimates from [7,16] in order to obtain concentration inequalities that are adapted to a geometric framework. We prefer to think of this issue as a separate problem, and leave it open for further research.

2. Framework
For the rest of the paper, we shall denote by (X, X, µ) a σ-finite measure space, such that the σ-field X is countably generated and µ(X) > 0. We write η to indicate a Poisson point process on (X, X). This means that η = {η(A) : A ∈ X 0 } is a collection of random variables, defined on some probability space (B, B, P) and indexed by the elements of X 0 = {A ∈ X : µ(A) < ∞}, such that the following properties are satisfied: (i) for every fixed A ∈ X 0 , η(A) is a Poisson random variable with parameter µ(A), and (ii) for every collection of pairwise disjoint A 1 , ..., A n ∈ X 0 , one has that the random variables η(A 1 ), ..., η(A n ) are stochastically independent.
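Properties (i) and (ii) can be probed by simulation in the simplest case X = [0, 1] with µ equal to m times Lebesgue measure (an illustration of ours, not part of the framework above): a sample of η is obtained by throwing a Poisson(m) number of i.i.d. uniform points, and counts on disjoint sets then have the prescribed Poisson means and are uncorrelated.

```python
import math, random

def sample_eta(m, rng):
    """Poisson process on [0,1] with intensity measure m * Lebesgue:
    a Poisson(m) number of i.i.d. uniform points (Knuth's method for the count)."""
    L, k, p = math.exp(-m), 0, 1.0
    while p > L:
        k += 1
        p *= rng.random()
    return [rng.random() for _ in range(k - 1)]

def count_stats(m=10.0, trials=20000, seed=1):
    """Empirical means of eta(A), eta(B) for A = [0,1/2), B = [1/2,1],
    together with their empirical covariance."""
    rng = random.Random(seed)
    na = nb = nab = 0.0
    for _ in range(trials):
        pts = sample_eta(m, rng)
        a = sum(1 for x in pts if x < 0.5)   # eta(A), mu(A) = m/2
        b = len(pts) - a                     # eta(B), mu(B) = m/2
        na += a; nb += b; nab += a * b
    na /= trials; nb /= trials; nab /= trials
    return na, nb, nab - na * nb
```

With m = 10, both empirical means are close to µ(A) = µ(B) = 5 and the covariance is close to 0, in accordance with properties (i) and (ii).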
As usual, we interpret η as a random element in the space N = N(X) of integer-valued σ-finite measures ξ on X, equipped with the smallest σ-field N making the mappings ξ ↦ ξ(B) measurable for all B ∈ X; see e.g. [36] or [23]. The standard notation x ∈ η is shorthand to indicate that the point x is an element of the support of η. We write η̂ for the compensated random (signed) measure η − µ. We shall write F = F(η) to indicate that a given random variable F can be written in the form F = f(η), P-a.s., for some measurable function f : N → R; such a function f (which is uniquely determined by F up to sets of P-measure zero) is customarily called a representative of F, and F is called a Poisson functional.

Remark 2.1 (Some conventions).
In what follows, we will use the notations F and F(η) interchangeably when there is no ambiguity. Also, for any ξ ∈ N, the notation F(ξ) refers to f(ξ), where f is a fixed representative of F. Finally, we observe that, in the statements of some of our main results, we will often work under the assumption that the add-one cost operator D_x F(ξ) verifies a given property P (for instance, D_x F(ξ) ≤ 0) for every x ∈ X and every ξ ∈ N: this requirement means of course that there exists a representative f of F such that the quantity f(ξ + δ_x) − f(ξ) verifies P for every x ∈ X and every ξ ∈ N.
We will systematically use the following standard convention: for every ξ ∈ N and every x ∈ X, the measure ξ − δ_x is understood as ξ minus the Dirac mass at x whenever ξ({x}) ≥ 1, and as ξ itself otherwise. (2.3)

A result that we shall use on several occasions (and that in some sense represents the backbone of our approach) is the following well-known Slivnyak–Mecke formula: for every m ≥ 1 and every non-negative measurable function H on N × X^m, one has that

E[ ∫_{X^m} H(η, x_1, ..., x_m) dη^{(m)}(x_1, ..., x_m) ] = ∫_{X^m} E[ H(η + δ_{x_1} + ... + δ_{x_m}, x_1, ..., x_m) ] dµ^m(x_1, ..., x_m),    (2.4)

where δ_x stands for the Dirac mass at x and η^{(m)} is the m-th factorial measure of η, that is, the point process on X^m obtained by restricting η^m to m-tuples of points of η with pairwise distinct indices. A standard proof of the fundamental relation (2.4) can be found e.g. in [36, Theorem 3.2.5 and Corollary 3.2.3], in the case of a non-atomic intensity µ. The result extends straightforwardly to the case of a general σ-finite measure µ; see e.g. [23]. When specialised to the case m = 1, relation (2.4) is known as the Mecke formula, and boils down to the following identity: for every non-negative measurable function H on N × X, one has that

E[ ∫_X H(η, x) dη(x) ] = ∫_X E[ H(η + δ_x, x) ] dµ(x).    (2.5)

For the rest of the paper, for every integer k ≥ 1 and every real p > 0, we will write L^p(µ^k) := L^p(X^k, X^⊗k, µ^k), and also use the shorthand notation L^p(µ^1) = L^p(µ). In Section 5.4, the symbol L^2(µ^0) is used to denote the real line R, endowed with the usual Euclidean inner product.
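The case m = 1 of the Mecke formula can be probed numerically (our own illustration). With X = [0, 1], µ = m·Lebesgue and the particular choice H(η, x) = η(X), the left-hand side is E[η(X)²], while the right-hand side evaluates exactly to m(m + 1); the following Monte Carlo sketch compares the two.

```python
import math, random

def poisson_rvs(m, rng):
    # Knuth's method for a Poisson(m) variate
    L, k, p = math.exp(-m), 0, 1.0
    while p > L:
        k += 1
        p *= rng.random()
    return k - 1

def mecke_lhs(m, trials=40000, seed=2):
    """Monte Carlo estimate of E[ int H(eta, x) deta(x) ] for H(eta, x) = eta(X),
    which equals E[N^2] with N = eta(X) ~ Poisson(m)."""
    rng = random.Random(seed)
    return sum(poisson_rvs(m, rng) ** 2 for _ in range(trials)) / trials

def mecke_rhs(m):
    # int_X E[H(eta + delta_x, x)] dmu(x) = m * (E[N] + 1) = m * (m + 1)
    return m * (m + 1.0)
```

This recovers the well-known identity E[N²] = m² + m for a Poisson(m) random variable, here obtained as an instance of (2.5).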

3. Deviation Inequalities for Poisson Functionals
In the following, we are going to develop new tools for proving deviation inequalities for Poisson functionals. Our approach is an adaptation of the entropy method for product space functionals that was investigated in particular in [4].
The heart of the method we are about to present is the modified logarithmic Sobolev inequality stated below. For the remainder of the section, we consider a Poisson functional F. As above, we will use the difference (or add-one cost) operator DF, defined for any (x, ξ) ∈ X × N by

D_x F(ξ) := F(ξ + δ_x) − F(ξ),

where δ_x denotes the Dirac mass at x ∈ X. To shorten notation, for β ∈ R we write

D^{≤β}_x F(ξ) := D_x F(ξ) 1{D_x F(ξ) ≤ β},    x ∈ X, ξ ∈ N.

In the same spirit, the operator D^{≥β}_x F(ξ) is defined analogously, and so are the operators D^{>β}_x F and D^{<β}_x F (with strict inequalities). The following observation is derived by combining Wu's modified logarithmic Sobolev inequality for Poisson point processes (1.1) with the Mecke formula (2.5).

Proposition 3.1. Let I ⊆ X × N be measurable and let λ ∈ R be such that E(e^{λF}) < ∞. Then

Ent(e^{λF}) ≤ E[ ∫_X 1_I(x, η) ψ(λ D_x F(η)) e^{λF(η)} dµ(x) + ∫_X 1_{I^c}(x, η − δ_x) φ(−λ D_x F(η − δ_x)) e^{λF(η)} dη(x) ],

where ψ(z) = z e^z − e^z + 1 and φ(z) = e^z − z − 1.
For any β ∈ R we define the random variables

V+_β := ∫_X (D^{≤β}_x F(η))² dµ(x) + ∫_X (D^{>β}_x F(η − δ_x))² dη(x),

V−_β := ∫_X (D^{≥−β}_x F(η))² dµ(x) + ∫_X (D^{<−β}_x F(η − δ_x))² dη(x).

Note that we will write V+ = V+_0 and V− = V−_0. The notation is in correspondence with [4], where the entropy method for product spaces was investigated. The upcoming result can be regarded as a generalised analogue of [4, Theorem 2] for the Poisson space. The proof is similar to the product space version, where Proposition 3.1 now takes the role of the logarithmic Sobolev inequality. The generalisation is achieved using arguments similar to those in the proof of [43, Proposition 3.1]. To get prepared for the presentation of the theorem, for β ∈ R and z > 0, we introduce the functions Φ_β(z) and Ψ_β(z), defined in terms of the functions φ and ψ of Proposition 3.1. Note that we will frequently use the fact that these functions are non-decreasing.
Theorem 3.2. Assume that F is not necessarily integrable and that one of the following conditions is satisfied:
Then, the relation E exp(λV+_β/θ) < ∞ implies that E|F| < ∞ and E exp(λF) < ∞. Also, the relation E exp(λV−_β/θ) < ∞ implies that E|F| < ∞ and E exp(−λF) < ∞.

With the above results, the methods for deriving deviation inequalities presented in [4] naturally carry over to the Poisson space. In the following we present some variations of these techniques that will be used for the applications later on. In the case when V+_β and V−_β are almost surely bounded by a constant for some β ≠ 0, the above entropy inequalities yield exponential tails for the random variable F(η). If V+ and V− are almost surely bounded (i.e. β = 0), we even obtain Gaussian tails. Note that Wu's deviation inequality [43, Proposition 3.1] is implied by the following more general results.

Corollary 3.3. Assume that F satisfies V+_β ≤ c almost surely. Then F is integrable and the following statements hold: (i) If either condition (i) or (ii) of Theorem 3.2 is satisfied, then for all r ≥ 0,

P(F ≥ EF + r) ≤ exp( −(r/(2|β|)) log(1 + |β|r/c) ).
(ii) If β = 0, that is, if V+ ≤ c holds almost surely, then for all r ≥ 0,

P(F ≥ EF + r) ≤ exp(−r²/(2c)).

We continue with the corresponding version for the lower tail. This corollary is obtained in the same way as the above one, where inequality (3.7) is used instead of (3.6). The proof is therefore omitted.
Corollary 3.4. Assume that F satisfies V−_β ≤ c almost surely. Then F is integrable and the following statements hold: (i) If either condition (i) or (ii) of Theorem 3.2 is satisfied, then for all r ≥ 0,

P(F ≤ EF − r) ≤ exp( −(r/(2|β|)) log(1 + |β|r/c) ).

(ii) If β = 0, that is, if V− ≤ c holds almost surely, then for all r ≥ 0,

P(F ≤ EF − r) ≤ exp(−r²/(2c)).

The following result is useful to obtain deviation inequalities under less restrictive boundedness conditions on V+.
Corollary 3.5. Assume that F ≥ 0 and that there is a random variable G ≥ 0 and an α ∈ [0, 2) such that almost surely V+ ≤ G F^α. Let θ > 0 and λ ∈ (0, 2/θ) be such that E exp(λG/θ) < ∞. Then EF^{1−α/2} < ∞, and the conclusion of Theorem 3.2 applies to the random variable F^{1−α/2}. In the case when the random variable G in the above corollary is just a constant, we obtain the following deviation inequality for the upper tail.
Corollary 3.6. Assume that F ≥ 0 and that for some α ∈ [0, 2) and c > 0 we have almost surely V+ ≤ c F^α. Then F is integrable and for all r ≥ 0,

P(F ≥ EF + r) ≤ exp( −((EF + r)^{1−α/2} − (EF)^{1−α/2})² / (2c) ).

The next result is a variation of Corollary 3.5 for Poisson functionals that are not necessarily non-negative. This is the Poisson space analogue of [4, Theorem 5].
Theorem 3.7. Assume that the Poisson functional F is integrable and that for some a > 0 and b ≥ 0 we have almost surely V+ ≤ aF + b. Then for any λ ∈ (0, 2/a) we have E(e^{λF}) < ∞ and

log E(e^{λ(F−EF)}) ≤ λ²(aEF + b)/(2 − aλ).

Moreover, for any r ≥ 0,

P(F ≥ EF + r) ≤ exp( −r²/(2(aEF + b) + ar) ).

We continue with a result that applies whenever F is non-decreasing and V− is non-decreasing and integrable. In this case, the random variable F has a Gaussian lower tail.

Theorem 3.8. Assume that DF ≥ 0, DV− ≥ 0 and EV− < ∞. Then F = F(η) is integrable and for all r ≥ 0 we have

P(F ≤ EF − r) ≤ exp(−r²/(2EV−)).

Remark 3.9. Assume that the Poisson functional F is non-decreasing. A sufficient condition for the assumption DV− ≥ 0 in the above theorem is that the second iteration of the difference operator of F is non-negative. Indeed, if DF ≥ 0 and DDF ≥ 0, then for every x ∈ X the functional ξ ↦ D_x F(ξ) is non-negative and non-decreasing, so that D(D_x F)² is non-negative as well, thus yielding DV− ≥ 0.

We conclude this section with a result that deals with the situation when F is non-decreasing and the difference operator DF is bounded. In this case, we obtain a deviation inequality for the lower tail by controlling the random variable V+. This is a Poisson space analogue of [27, Theorem 13].
Theorem 3.10. Assume that F ≥ 0 and that for some a > 0 we have, almost surely, 0 ≤ DF ≤ 1 and V+ ≤ aF. Then F is integrable and for any r ≥ 0 we have

P(F ≤ EF − r) ≤ exp( −r²/(2 max(a, 1) EF) ).

4. Proofs
We begin with the proof of the crucial logarithmic Sobolev type inequality, namely Proposition 3.1, that is the foundation of our techniques.
The following lemma will be used on several occasions in the upcoming proofs.

Lemma 4.1. Let β ∈ R. If EV+_β < ∞ or EV−_β < ∞, then E|F| < ∞.
Proof. The proof uses a truncation argument that is standard in this context; see e.g. the proof of [43, Proposition 3.1]. The statement for V−_β is proved in the same way as the one for V+_β. Consider for any n ∈ N the truncation F_n := min(max(F, −n), n).
This would contradict the fact that F_n → F in probability. We see that the family {EF_n}_{n∈N} is bounded. Together with sup_{n∈N} Var(F_n) < ∞, this also implies sup_{n∈N} E(F_n²) < ∞. Thus, the family {F_n}_{n∈N} is uniformly integrable. In particular, as desired, we have E|F| < ∞.
We continue with the proof of Theorem 3.2. As in the proof of the product space version [4, Theorem 2], we also need [26, Lemma 11]. This result states that, for any λ > 0 and any two random variables X and Y satisfying E(e^{λX}), E(e^{λY}) < ∞, we have

E(λX e^{λY}) ≤ E(e^{λY}) log E(e^{λX}) + Ent(e^{λY}).    (4.8)

Proof of Theorem 3.2. We prove (3.6). We only deal with the case β ≠ 0, whereas the case β = 0 can be obtained by similar arguments. To prove the desired inequality we adapt the proof of [4, Theorem 2] and combine it with arguments from the proof of [43, Proposition 3.1]. Let φ and ψ be as in Proposition 3.1. Then ψ(z)/z² and φ(z)/z² are non-decreasing. Hence, for any u ∈ (0, λ] we have
Together with ψ(0) = φ(0) = 0 this gives
Hence, taking
Moreover, taking X = V+_β/θ and Y = F, it follows from (4.8) that
Invoking the definition of the entropy, it follows from the last two displays that
Since by assumption Φ_β(u)θ ≤ Φ_β(λ)θ < 1, the latter inequality is equivalent to
Defining h(u) = (1/u) log E(e^{uF}) and g(u) = log E(e^{uV+_β}), the above estimate can be restated as follows:
for any u ∈ (0, λ]. Since lim_{u→0+} h(u) = EF, integration from 0 to λ gives
It is a well-known fact that the logarithm of a moment generating function is convex; hence g is convex on the interval [0, λ/θ]. In particular, we have for any u ∈ (0, λ] that
In the case β < 0, we bound the integral on the right-hand side by Φ_β(λ) = Ψ_β(λ). This works since Φ_β(u)/u is non-decreasing. In the case β > 0, the integral can be explicitly computed and one obtains
This proves inequality (3.6). Repeating the above reasoning for −F instead of F, where the set I is replaced by its complement, proves inequality (3.7).
To prove the second part of the theorem, assume that one of the conditions (i) to (iii) is satisfied. For n ∈ N consider the truncated random variables
We will now conclude that if E exp(λV+_β/θ) < ∞, then F is integrable and the family of random variables
converges in probability to exp(λ(F − EF)) and is uniformly integrable. Thus, it follows that E exp(λ(F − EF)) < ∞ and hence also E exp(λF) < ∞. Integrability of F follows from Lemma 4.1, since the assumption E(exp(λV+_β/θ)) < ∞ implies that EV+_β < ∞. By dominated convergence, integrability of F now implies the convergence in probability of the sequence in (4.10). To prove the uniform integrability, first observe that if (i), (ii) or (iii) holds, then
Also note that we can choose ν > 1 such that Φ_β(νλ)νθ < 1. Then E(e^{λνF_n}) < ∞ for all n ∈ N, so it follows from (3.6) that
Denoting the map x ↦ x^ν by Λ, the above inequality yields
By the theorem of de la Vallée Poussin, this implies uniform integrability of the family in (4.10).
Repeating the above reasoning for −F instead of F where inequality (3.7) is used instead of (3.6) proves the corresponding statement for V − β .
Proof of Corollary 3.3. It follows from Lemma 4.1 that F is integrable. Theorem 3.2 yields an upper bound on log E(e^{λ(F−EF)}).
Markov's inequality now gives, for any λ > 0,
Optimizing in λ yields the desired deviation bounds.
Proof of Corollary 3.5. Here we adapt and combine the proofs of [4, Theorem 8 and Theorem 9]. For α = 0, the statement follows directly from Theorem 3.2, so let α ∈ (0, 2). Let γ = 1 − α/2. Then, on the event {F ≠ 0}, we have
Hence, the above expression does not exceed
Quite similarly one obtains that on the event {F ≠ 0},
Hence, it follows that on the event
Moreover, it is easy to check that on the event {F = 0, V+ ≤ GF^α}, one has that V+(F^γ) = 0 = V+. Therefore, by virtue of the assumption that almost surely V+ ≤ GF^α, it follows that almost surely V+(F^γ) ≤ G. Applying Theorem 3.2 to the random variable F^γ yields the result.
Proof of Corollary 3.6. For α = 0, the statement follows directly from Corollary 3.3 (ii), so let α ∈ (0, 2). Let γ = 1 − α/2. Continuing in the same way as in the proof of Corollary 3.5 yields that almost surely V+(F^γ) ≤ c. We conclude that Corollary 3.3 (ii) applies to F^γ. So F^γ is non-negative and has an exponentially decaying upper tail. Thus, by virtue of [19, Lemma 3.4], all moments of F^γ exist. In particular, F is integrable. As pointed out in [4, p. 1588], we can now write
We continue with the proof of Theorem 3.7. To get prepared for this, we first establish the following lemma.
Lemma 4.2. Let n ∈ N and consider F_n = min(max(F, −n), n) and V+(n) = V+(F_n). Then for any real number b ≥ 0, almost surely
Proof. It is easy to see that V+(n) ≤ V+. Hence, the desired statement holds on the event {F = F_n}. If F ≠ F_n, then either F_n = n < F or F_n = −n > F. The latter case implies V+(n) = 0 and F, F_n ≤ 0, hence the desired statement holds. So consider the case F_n = n < F and let A = F/n. Then the desired inequality is equivalent to
To prove this, it suffices to conclude
We prove (4.11). If F(η − δ_x) > n, then
Hence,
This proves (4.11), and analogously one obtains (4.12). The result follows.
Proof of Theorem 3.7. For the case when F is bounded, we adapt the proof of [4, Theorem 5]. Here we can argue in the same way as in the beginning of the proof of Theorem 3.2 to obtain, for any u ∈ (0, λ],
Invoking the assumption on V+ yields
With h(u) = (1/u) log E(e^{uF}) this can be rearranged as
Integrating this from 0 to λ gives
Noting that aλ < 2 and rearranging the above inequality, we obtain the result for the bounded case. For the unbounded case, consider for any n ∈ N the truncated random variables
It follows from the assumptions and Lemma 4.2 that almost surely
Therefore, almost surely V+(n) ≤ aF_n + b, so the result holds for all F_n. By dominated convergence, the sequence EF_n is convergent, hence bounded above by some constant C. Moreover, we can choose a ν > 1 such that νλ < 2/a. Thus, since we already proved that the result applies to all the F_n, we conclude
By the theorem of de la Vallée Poussin, this implies that the family of random variables
is uniformly integrable. Continuing as in the proof of Theorem 3.2 gives
We note again that the result is already proved for the F_n and that EF_n → EF as n → ∞. This concludes the proof of the first inequality. The deviation inequality now follows using the inequality we just proved together with Markov's inequality and [4, Lemma 11].
For the proof of Theorem 3.8, we will need the following Harris-type (FKG) inequality for Poisson functionals, which is essentially contained in [18].

Lemma 4.3. Let F and G be bounded Poisson functionals such that F is non-increasing and G is non-decreasing. Then E(FG) ≤ E(F)E(G).

It was also remarked in [18] that under conditions like F, G ≥ 0 or EF², EG² < ∞, the above result easily extends to unbounded functionals by monotone convergence. For our purpose we need the following extension.
Corollary 4.4. Let F, G ≥ 0 be Poisson functionals. Assume that F is bounded and G is integrable. Moreover, assume that F is non-increasing and G is non-decreasing. Then E(FG) ≤ E(F)E(G).

Proof. Since F is bounded, it follows from EG < ∞ that also E(FG) < ∞. Now consider for any n ∈ N the truncations G_n = min(G, n). Then we have almost surely G_n → G and FG_n → FG as n → ∞. By monotone convergence, EG_n → EG and E(FG_n) → E(FG) as n → ∞. It follows from Lemma 4.3 that for any n ∈ N, E(FG_n) ≤ E(F)E(G_n). The result follows.
The following proof is inspired by ideas from the proof of [4,Theorem 6].
Proof of Theorem 3.8. For any n ∈ N consider the truncations
Then the F_n are again non-decreasing. Let λ < 0. It follows from Proposition 3.1 with I = X × N that for any u ∈ [λ, 0) we have
Since ψ(−z) ≤ (1/2)z² for z ≥ 0, the right-hand side of the above expression does not exceed a quantity that, in turn, can be upper bounded by (u²/2)E(e^{uF_n} V−). Now, since F_n is non-decreasing and u < 0, the functional e^{uF_n} is non-increasing and bounded. Moreover, by assumption the functional V− is non-decreasing and EV− < ∞. Hence, by Corollary 4.4 we have E(e^{uF_n} V−) ≤ E(e^{uF_n}) E(V−). It follows that
Integrating from λ to 0 yields
Since EV− < ∞, by Lemma 4.1 we have E|F| < ∞. Thus, applying the theorem of de la Vallée Poussin similarly as in the proof of Theorem 3.2, we conclude that the inequality in the last display also holds for the random variable F. Using Markov's inequality and optimizing in λ yields the result.
To prove the statement of Theorem 3.10 for bounded F , we adapt the proof of the product space version [27,Theorem 13]. To extend the result to unbounded F , Lemma 4.2 is used similarly as it was done in the proof of Theorem 3.7.
Proof of Theorem 3.10. First consider the case when F is bounded. Let λ < 0 and u ∈ [λ, 0). Then by Proposition 3.1 with I = ∅ we have
Moreover, since 0 ≤ DF ≤ 1, we have −uD_x F(η − δ_x) ≤ −u and, since the map z ↦ φ(z)/z² is increasing, this implies
Dividing by u²E(e^{uF}) and integrating from λ to 0 yields
Similarly as in the proof of Theorem 3.7, the above inequality can be extended to the case when F is unbounded. Here one should notice that, according to Corollary 3.6, the condition V+ ≤ aF guarantees E|F| < ∞. It was pointed out in [27] that, for any λ < 0, Markov's inequality now gives

P(F ≤ EF − r) ≤ E(e^{λ(F−EF)}) e^{λr} ≤ exp( (λ² max(a, 1)/2) EF + λr ).
Optimizing in λ concludes the proof.

5. Applications to U-Statistics

5.1. General remarks. The aim of the present section is to investigate the concentration properties of Poisson U-statistics. For this purpose, we need to specialize the very general framework adopted so far. Throughout this section, the intensity measure µ on the space X is assumed to be non-atomic, that is, {x} ∈ X and µ({x}) = 0 for every x ∈ X. This assumption is equivalent to the fact that the Poisson process η on X is simple, meaning that almost surely η({x}) ≤ 1 for all x ∈ X. It is common practice in this setting to identify the simple point process η with its support, which now corresponds to a random subset of X. Plainly, the integral of a map f : X → R with respect to η is now exactly given by the (possibly infinite) sum Σ_{x∈η} f(x). For every ξ ∈ N, we write [ξ] := {x ∈ X : ξ({x}) > 0} for the support of ξ. Then, since η is simple, we have that almost surely F = f([η]). It follows that another representative of F is given by the mapping ξ ↦ f([ξ]). Therefore, without loss of generality, we can assume that F(ξ) = F([ξ]) for all ξ ∈ N; that is, given an arbitrary functional ξ ↦ F(ξ), in this section we will systematically select a representative of F that only depends on ξ via the mapping ξ ↦ [ξ]. With this convention, one has that D_x F(ξ) = D_x F([ξ]), and also that D_x F(ξ) = 0 whenever ξ({x}) > 0. Finally, we observe that, again by virtue of the above convention and in accordance with the content of Remark 2.1, the fact that the quantity D_x F(ξ) verifies some property P for every x ∈ X and every ξ ∈ N is equivalent to the fact that P is verified for all (x, ξ) ∈ X × N such that ξ charges each singleton with a mass at most equal to 1.
We now recall some relevant definitions. Let f : X^k → R_{≥0} be a symmetric measurable map and define the functional

S_f(ξ) := Σ_{(x_1,...,x_k) ∈ ξ^k_≠} f(x_1, ..., x_k),    ξ ∈ N,    (5.13)

where ξ^k_≠ denotes the collection of k-tuples of distinct points in the support of ξ. A (Poisson) U-statistic F of order k with kernel f is a random variable such that almost surely F = S_f(η). According to the Slivnyak–Mecke formula (2.4), the expectation of a U-statistic F is given by

EF = ∫_{X^k} f(x_1, ..., x_k) dµ^k(x_1, ..., x_k);

see e.g. [32, Section 3] for more details, as well as for an introduction to U-statistics with kernels that may have arbitrary sign.
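By the Slivnyak–Mecke formula, the expectation of a U-statistic is the integral of its kernel against µ^k. As a concrete test case of our own choosing: for edge counting on the flat torus [0,1)² with µ = m·Lebesgue, take k = 2 and f(x, y) = ½·1{d(x, y) ≤ ρ} (the ½ compensates for ordered pairs), so that EF = m²πρ²/2. The Monte Carlo sketch below checks this value.

```python
import math, random

def poisson_rvs(m, rng):
    # Knuth's method for a Poisson(m) variate
    L, k, p = math.exp(-m), 0, 1.0
    while p > L:
        k += 1
        p *= rng.random()
    return k - 1

def torus_dist(p, q):
    # geodesic distance on the flat torus [0,1)^2 (avoids boundary effects)
    return math.sqrt(sum(min(abs(a - b), 1.0 - abs(a - b)) ** 2 for a, b in zip(p, q)))

def edge_count(pts, rho):
    return sum(1 for i in range(len(pts)) for j in range(i + 1, len(pts))
               if torus_dist(pts[i], pts[j]) <= rho)

def mean_edge_count(m, rho, trials=800, seed=3):
    """Monte Carlo estimate of E[F] for the edge-count U-statistic."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        pts = [(rng.random(), rng.random()) for _ in range(poisson_rvs(m, rng))]
        total += edge_count(pts, rho)
    return total / trials

# EF = int f dmu^2 with f(x, y) = (1/2) 1{d(x,y) <= rho}: m^2 * pi * rho^2 / 2
exact = lambda m, rho: m * m * math.pi * rho * rho / 2.0
```

With m = 50 and ρ = 0.1 the exact value is roughly 39.27, and the empirical mean agrees within Monte Carlo error.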

5.2. Choice of a representative. In order to apply results from Section 3 to a Poisson U-statistic F with kernel f ≥ 0, we first need to choose a suitable representative of F as defined in Section 2. Whenever the considered U-statistic F is almost surely finite, we can choose as a representative of F the map f : N → R defined by
In order to avoid technical problems arising from the choice of this representative, we will often assume that a given U-statistic with kernel f is well-behaved. By this we mean that there exists a measurable set B ⊆ N with P(η ∈ B) = 1 such that
If F is well-behaved, then we will choose as a representative of F the map
Note that by virtue of (2.3) and (5.13), the above choices of a representative imply F(ξ) = F([ξ]) for all ξ ∈ N, which is consistent with Remark 5.1. Finally, note that U-statistics arising in typical applications (in particular, all U-statistics considered in this paper) are usually well-behaved in the sense described above.

5.3. General results. We will use an explicit expression for the difference operator of a U-statistic that was established in [32]. The following result gathers together several results from [32, Lemma 3.3 and Theorem 3.6], in a form that is adapted to our setting.
Proposition 5.2. Let the above assumptions and notation prevail, let F be a U-statistic with non-negative kernel f, and let S_f be as in (5.13). Then, for any ξ ∈ N and x ∈ ξ, the difference operator admits an explicit expression in terms of the local version of F, which, for any ξ ∈ N and every x ∈ ξ such that ξ({x}) = 1, is defined in (5.14) via the configuration ξ \ x, where ξ \ x is shorthand for the set obtained by deleting x from the support of ξ.

As a direct consequence of the above result, together with our canonical choices of a representative described in Section 5.2, we obtain:

Corollary 5.3. Let F be a U-statistic with non-negative kernel f. Then the following statements hold. (i) If F is almost surely finite, then there exists a measurable set B ⊆ N with P(η ∈ B) = 1 such that, for any ξ ∈ B and x ∈ ξ, the local version F(x, ξ) is finite. (ii) If F is well-behaved, then there exists a measurable set B ⊆ N with P(η ∈ B) = 1 such that the following holds: (a) for any ξ ∈ B, x ∈ ξ and z ∈ X, the local versions F(x, ξ) and F(z, ξ + δ_z) are finite; (b) for any ξ ∈ B^c, x ∈ ξ and z ∈ X, the corresponding relations hold as well.

The previous Corollary 5.3 implies that, if F is an almost surely finite U-statistic with kernel f ≥ 0, then the resulting bound holds almost surely; if F is in addition well-behaved, then the corresponding almost sure identity holds as well. We therefore have the following consequences of Corollary 3.6 and Theorem 3.8.
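A minimal illustration (with hypothetical points and radius) of the add-one-cost operator D_z F(ξ) = F(ξ + δ_z) − F(ξ) for the edge-counting U-statistic: adding a point z creates exactly one new edge per existing point lying within distance ρ of z.

```python
import math

def edge_count(points, rho):
    """Number of edges of the disk graph with radius rho on a finite set."""
    n = len(points)
    return sum(0 < math.dist(points[i], points[j]) <= rho
               for i in range(n) for j in range(i + 1, n))

def add_one_cost(points, z, rho):
    # D_z F(xi) = F(xi + delta_z) - F(xi)
    return edge_count(points + [z], rho) - edge_count(points, rho)

pts = [(0.0, 0.0), (0.4, 0.0), (2.0, 0.0)]
rho = 0.5
z = (0.2, 0.1)
cost = add_one_cost(pts, z, rho)
near = sum(0 < math.dist(p, z) <= rho for p in pts)  # neighbours of z
assert cost == near
```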
Corollary 5.4. Consider an almost surely finite U-statistic F of order k with non-negative kernel f. Assume that, for some α ∈ [0, 2) and c > 0, the corresponding bound holds almost surely. Then F is integrable and the stated tail estimate holds for all r > 0.

Corollary 5.5. Consider a well-behaved U-statistic F of order k with non-negative kernel f, and assume that V < ∞, with V as in (5.15). Then the stated lower tail estimate holds for all r > 0.

Proof. We have EV^−(F) = k² EV < ∞ and, since F is well-behaved, it follows from Corollary 5.3 (ii) that D_x F(ξ) ≥ 0 for any (x, ξ) ∈ X × N. The result therefore follows from Theorem 3.8 together with Remark 3.9, once we have proved that DDF ≥ 0. According to [32], and since F is well-behaved, for any (z, x, ξ) ∈ X × X × N, the second iteration of the difference operator either satisfies D_z D_x F(ξ) = 0 or can be written as a sum of local terms whose right-hand side is non-negative since f ≥ 0.

5.4. Computing V in formula (5.15). We will now provide a direct proof that condition (5.15) is equivalent to F being a square-integrable U-statistic, and show that one can obtain a rather explicit expression for V in terms of a family of auxiliary kernels built from f.

Definition 5.6. Let f be a symmetric element of L^1(µ^k), for some k ≥ 1. For i = 1, ..., k, we define the kernels f_i by the integral formula (5.16) whenever the integral on the right-hand side is well defined, and we set f_i(y_1, ..., y_i) = 0 otherwise.
Observe that, since f is in L^1(µ^k), the class of those (y_1, ..., y_i) such that the integral on the right-hand side of (5.16) is not defined has µ^i-measure equal to zero, for every i = 1, ..., k. Plainly, each f_i is a symmetric mapping from X^i into R with f_i ∈ L^1(µ^i), for every i = 1, ..., k, and f_k = f by definition.
The upcoming result provides new necessary and sufficient conditions for the square-integrability of U-statistics. Although the investigations in the present paper (and hence also the result below) are restricted to U-statistics with non-negative kernels, we stress that this assumption is not needed in the forthcoming proof. Thus, after appropriately adapting the notion of a well-behaved U-statistic to kernels with arbitrary sign, the presented characterization of square-integrable U-statistics also applies when the kernels are not necessarily non-negative.
Proposition 5.7 (Characterization of square-integrable U-statistics). Consider a well-behaved U-statistic F of order k ≥ 1, with non-negative kernel f ∈ L^1(µ^k). Then, the following assertions are equivalent: (i) F is square-integrable; (ii) f_i ∈ L^1(µ^i) ∩ L^2(µ^i) for every i = 1, ..., k, where the kernels f_i have been introduced in Definition 5.6; (iii) V < ∞, where V is defined in (5.15).

If any one of conditions (i), (ii) and (iii) is verified, then relation (5.17) holds.
Proof. [Step 1: (i) → (ii), (iii)] According to [32, Theorem 3.6], if F is a U-statistic as in the statement and F is square-integrable, then necessarily f_i ∈ L^1(µ^i) ∩ L^2(µ^i) for every i = 1, ..., k, and moreover F admits a chaotic representation in terms of multiple Wiener-Itô integrals I_i of order i, with respect to the compensated Poisson measure η̂ = η − µ (see e.g. [28, Chapter 5] for definitions). Exploiting the standard orthogonality properties of multiple integrals, one also obtains the first relation in (5.17). Combining [23, Theorem 3.3] with the previous discussion, one further infers that, if F is square-integrable, then a version of the add-one cost operator DF is given by (5.18), where I_{i−1}(f_i(x, ·)) indicates a multiple Wiener-Itô integral of order i−1, with respect to η̂ = η − µ, of the kernel f_i(x, ·) : X^{i−1} → R, obtained from f_i (see Definition 5.6) by setting one of the variables in its argument equal to x; observe that, as usual, the right-hand side of (5.18) is implicitly set equal to zero on the exceptional set of those x ∈ X such that f_i(x, ·) ∉ L^2(µ^{i−1}) for at least one i ∈ {1, ..., k}. Exploiting once again the orthogonality properties of multiple integrals, the conclusion (as well as the explicit expression for k²V = E ∫_X (D_x F)² dµ(x) appearing in (5.17)) follows from an application of the Fubini theorem.
[Step 2: (ii) → (i)] Assume that, for every i = 1, ..., k, f_i ∈ L^2(µ^i) ∩ L^1(µ^i). Then, according to [40, Theorem 4.1], the multiple integral I_i(f_i) is a well-defined square-integrable random variable. The previous discussion then yields the desired representation of F, where we have used the fact that the sum over i = j, ..., k of (k choose i)(i choose j)(−1)^{i−j} equals one if j = k, and vanishes otherwise. It follows that F is square-integrable, since it is equal to a finite sum of square-integrable random variables.
[Step 3: (iii) → (ii)] If V < ∞, then there exists a measurable set B ⊂ X with µ(B^c) = 0 such that E(D_x F)² < ∞ for every x ∈ B. Using [32, Lemma 3.5 and Theorem 3.6], together with the fact that, since F is well-behaved, D_x F is the (well-behaved) U-statistic of order k − 1 defined in Proposition 5.2, we immediately deduce that, for x ∈ B, one has (adopting the same notation as in Step 1) f_i(x, ·) ∈ L^2(µ^{i−1}). The conclusion follows by using once again (5.19) and the Fubini theorem.
Remark 5.8. According, e.g., to [29, Lemma 3.1], the condition E ∫_X (D_x F)² dµ(x) < ∞ is equivalent to F belonging to the domain of the Malliavin derivative associated with η. This is consistent with the fact that square-integrable U-statistics have a finite Wiener-Itô chaotic expansion, and therefore automatically belong to the domain of the Malliavin derivative.
An application of Proposition 5.7 to the estimation of lower tails for edge-counting in random geometric graphs (involving in particular U-statistics of order k = 2) is presented in Section 6.3.

6. Applications to edge counting
In this section, we let η denote a Poisson point process on (R^d, B(R^d)), with intensity given by a Borel measure µ (in particular, µ(K) < ∞ for every compact set K). We also assume again that µ has no atoms, that is, µ({x}) = 0 for every x ∈ R^d. For a fixed ρ > 0, we shall consider the graph G (often called the Gilbert graph, or the disk graph, with radius ρ associated with η) obtained as follows: the vertex set of G is given by the points in the support of η, and two vertices x, y are linked by an edge (in symbols, x ↔ y) whenever 0 < ‖x − y‖ ≤ ρ (in particular, G has no loops). For technical reasons clarified below, we will assume for the rest of the section that the following condition on µ is verified, where B(x, ρ) denotes the closed ball of radius ρ centered at x. Relation (6.20) is verified whenever µ(R^d) < ∞, but such a finiteness condition is not necessary for (6.20) to hold. Note that, if µ is Borel and (6.20) is in order, then the mapping x → µ(B(x, ρ)) is necessarily bounded. To see this, choose γ > 0 such that the ball B(0, ρ) can be written as a union of 1/γ many sets with diameter less than ρ. Then the pigeonhole principle yields that for any y ∈ R^d we can choose a set C_y ⊂ B(y, ρ) satisfying: (i) C_y ⊆ B(x, ρ) for all x ∈ C_y, and (ii) µ(C_y) ≥ γµ(B(y, ρ)). Combining (i) and (ii) with (6.20) then yields the claimed bound. Originally introduced in 1959 by Gilbert in the seminal work [13], the disk graph G is the archetypical example of a random geometric graph. Since then, the study of such an object has been at the center of a formidable collective effort, both at a theoretical and at an applied level. We refer the reader to the fundamental monograph [30] for a detailed overview of the literature on Gilbert graphs up to the year 2003.
In this section, we will provide new concentration estimates for the random variable N corresponding to the number of edges of G. It is immediately seen that N is a Poisson U-statistic of order 2 with non-negative kernel f(x, y) = (1/2) 1{‖x − y‖ ≤ ρ}. In particular, the Slivniak-Mecke formula (2.4), together with a standard use of the Fubini theorem, yields that assumption (6.20) is actually equivalent to the integrability of N. We also see that assumption (6.20) implies that N < ∞ almost surely, yielding in turn that N is well-behaved.
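As a numerical sanity check of the Slivniak-Mecke computation of EN (not part of the argument), one can simulate a homogeneous process on the unit torus, where the formula gives EN = t²πρ²/2 exactly; the intensity t, the radius ρ and the torus setting are illustrative choices made so that boundary effects vanish.

```python
import numpy as np

rng = np.random.default_rng(0)
t, rho, reps = 200.0, 0.05, 200
counts = []
for _ in range(reps):
    n = rng.poisson(t)                      # Poisson number of points
    pts = rng.random((n, 2))                # uniform on the unit torus
    d = np.abs(pts[:, None, :] - pts[None, :, :])
    d = np.minimum(d, 1.0 - d)              # coordinate-wise torus metric
    dist = np.hypot(d[..., 0], d[..., 1])
    # subtract the n diagonal zeros, then halve the ordered pairs
    counts.append((np.count_nonzero(dist <= rho) - n) // 2)

mecke = t**2 * np.pi * rho**2 / 2           # (1/2) t^2 * area of B(0, rho)
```

The empirical mean of the edge counts should agree with the Mecke prediction up to Monte Carlo error.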
The above considerations yield that there exists a constant C ≥ 0 such that, for all r large enough along any subsequence diverging to infinity, the corresponding inequality holds. Dividing this inequality by I(r) and letting r diverge to infinity gives lim inf_{r→∞} γ(r) log(γ(r)/m)/I(r) ≥ 1. The conclusion is obtained by observing that, as r → ∞, γ(r) log(γ(r)/m) ∼ (r/2)^{1/2} log r.
The following statement is an elementary consequence of Proposition 6.1.
Corollary 6.2. Let r → I(r) be a positive mapping verifying (6.21), and assume that there exist constants a, b > 0 such that, as r → ∞, I(r) ∼ b r^a. Then, necessarily, a ≤ 1/2.

6.2. Deviation inequalities for the upper tail. We will now deal with bounds on the upper tail of N. We start by observing that, for every x ∈ η, the local version N(x, η), as defined in (5.14), is exactly given by the quantity deg(x)/2, where deg(x) = #{y ∈ η : x ↔ y} is the degree of the vertex x. Our aim in what follows is to show that, for some constant c > 0, one has almost surely Σ_{x∈η} deg(x)² ≤ c N^{3/2} (6.24). Hence, Theorem 3.6 yields the deviation inequality (6.25) for the upper tail. Observe that the right-hand side of (6.25) has the form exp(−I(r)), where I(r) ∼ r^{1/2}/(2c) as r → ∞. According to Corollary 6.2, the power 1/2 for r is optimal in this situation. We will see in Section 7.1 that, by adopting an alternative approach, the rate of decay of I(r) can indeed be improved by the square root of a logarithmic factor. In the plane R², one has for example that p = 3. The picture below illustrates the situation described in the proof of the upcoming lemma.

Lemma 6.3. Let T_ξ and N_ξ denote the number of triangles and edges in G_ξ, respectively. Then T_ξ admits a lower bound in terms of the squared right-degrees of the vertices; this inequality also holds for the left-degree in place of the right-degree.

Proof. For the rest of the proof, we write G, T and N, without the subscript ξ, to simplify the notation. Without loss of generality, we can assume that ξ only contains a finite number of non-isolated points of the graph; otherwise N = ∞ and the estimate in the statement is trivially satisfied. Let x ∈ ξ and denote by T(x) the number of those triangles {x, y, z} in G incident to x and such that x_1 < min(y_1, z_1), where a_1 indicates the first coordinate of a given vector a ∈ R^d. Moreover, for i = 1, . . . , p, let n_i denote the number of elements of ξ contained in B_i + x.
Then, since any two vertices contained in the same B_i + x yield an edge, and thus a triangle incident to x in the sense described above, we obtain a lower bound on T(x) in terms of the n_i. In view of the relation Σ_i n_i = deg_r(x), the right-hand side can be further bounded from below. Observe that in the sum Σ_{x∈ξ} T(x) each triangle is counted at most once, so that Σ_{x∈ξ} T(x) ≤ T. Also, in the sum Σ_{x∈ξ} deg_r(x) each edge is counted at most once, so this sum is at most N.
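The two counting facts at the end of the proof (each edge is picked up at most once by the right-degrees, and each triangle at most once via its leftmost vertex) can be checked on simulated configurations; the sample size and radius below are illustrative.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
pts = rng.random((40, 2))          # generic points: first coordinates distinct
rho = 0.25
n = len(pts)
adj = [[0 < np.linalg.norm(pts[i] - pts[j]) <= rho for j in range(n)]
       for i in range(n)]

N = sum(adj[i][j] for i in range(n) for j in range(i + 1, n))
T = sum(adj[i][j] and adj[j][k] and adj[i][k]
        for i, j, k in itertools.combinations(range(n), 3))

# Right-degree of x: neighbours with strictly larger first coordinate.
deg_r = [sum(adj[i][j] and pts[j][0] > pts[i][0] for j in range(n))
         for i in range(n)]
# T(x): triangles at x whose two other vertices lie strictly to the right.
T_x = [sum(adj[i][j] and adj[i][k] and adj[j][k]
           and min(pts[j][0], pts[k][0]) > pts[i][0]
           for j, k in itertools.combinations(range(n), 2))
       for i in range(n)]

assert sum(deg_r) == N     # each edge counted once, via its left endpoint
assert sum(T_x) <= T       # each triangle counted at most once
```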
We now deduce a bound on Σ_{x∈ξ} deg(x)² for any countable point configuration ξ.

Corollary 6.4. Let ξ ⊂ R^d be countable, let G_ξ be the disk graph (with some arbitrary radius ρ > 0) associated with ξ, and denote the number of edges of G_ξ by N_ξ. Then the corresponding bound on Σ_{x∈ξ} deg(x)² holds.

Proof. First we observe that, without loss of generality, it can be assumed that the first coordinates of all elements in ξ are distinct. Indeed, the combinatorial structure of G_ξ is invariant under rotations of the set ξ, and the assumption in question can be achieved by rotating ξ with respect to a direction a ∈ R^d, ‖a‖ = 1, satisfying (x − y)/‖x − y‖ ≠ ±a for all distinct x, y ∈ ξ. Such a direction exists since the set of directions {(x − y)/‖x − y‖ : (x, y) ∈ ξ²≠} is countable, and hence a strict subset of the set of all directions {a ∈ R^d : ‖a‖ = 1}. So, it can be assumed that the elements of ξ have distinct first coordinates. In particular, for any x ∈ ξ, we have deg(x) = deg_r(x) + deg_l(x). Thus, by Lemma 6.3, the sum Σ_{x∈ξ} deg(x)² can be bounded in terms of the number T_ξ of triangles in G_ξ. The result follows by using an estimate taken from [35], where a bound on T_ξ in terms of N_ξ was proven.

The next statement is one of the main achievements of the present section.
Theorem 6.5. (i) Let ξ ⊂ R^d be countable, let G_ξ be a disk graph with arbitrary radius ρ associated with ξ, and denote the number of edges of G_ξ by N_ξ. Then Σ_{x∈ξ} deg(x)² ≤ c N_ξ^{3/2}, with c = (8/3) p + √((32/9) p² + 4p) − 1.
(ii) Now let η be the Poisson measure on R^d with non-atomic intensity µ considered in this section, and denote by G_η = G the random disk graph (with arbitrary radius ρ) associated with η. Let N_η = N be the number of edges of G. Then relation (6.24) holds almost surely, with c = (8/3) p + √((32/9) p² + 4p) − 1. In particular, the tail estimate (6.25) is verified. Proof.
[Proof of (i)] By Corollary 6.4, we have a bound on Σ_{x∈ξ} deg(x)² involving the number of triangles. Observe moreover that, among all graphs with N_ξ edges, the star, i.e. the graph with deg(x) = N_ξ for one vertex x and deg(y) = 1 for all other vertices y, maximises the sum of the squared degrees, so that Σ_{x∈ξ} deg(x)² ≤ N_ξ² + N_ξ. Dividing both bounds by N_ξ^{3/2}, the first is monotonically decreasing in N_ξ, while (N_ξ + 1)/N_ξ^{1/2} is monotonically increasing in N_ξ; hence the minimum of the two functions is always less than or equal to their common value at the intersection point. Computing this value yields the result.
[Proof of (ii)] This follows directly from part (i) of the statement.
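The extremality of the star used in the proof of (i) amounts to the elementary bound Σ_x deg(x)² ≤ m² + m for any graph with m edges, with equality for the star; this can be verified by brute force over all graphs on five labelled vertices (the vertex count is an illustrative choice).

```python
import itertools

# All graphs on 5 labelled vertices: subsets of the 10 possible edges.
edges = list(itertools.combinations(range(5), 2))
worst = 0   # max over all graphs of sum(deg^2) - (m^2 + m)
for bits in range(1 << len(edges)):
    chosen = [e for i, e in enumerate(edges) if bits >> i & 1]
    m = len(chosen)
    deg = [0] * 5
    for u, v in chosen:
        deg[u] += 1
        deg[v] += 1
    # Star with m edges: one vertex of degree m, m leaves of degree 1,
    # giving sum(deg^2) = m^2 + m.
    worst = max(worst, sum(d * d for d in deg) - (m * m + m))
```

The quantity `worst` never exceeds zero, i.e. no graph beats the star value m² + m.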
6.3. Deviation inequalities for the lower tail. We now focus on lower tails. To this end, we introduce the constant K := sup_{x∈R^d} µ(B(x, ρ)) (which is finite by the discussion at the beginning of this section), and we define the parameter v as v := 2(K + 1) EN.
Our main estimate is the following.

Theorem 6.6. For every r > 0, one has the estimate (6.26).

Proof. We will freely use the notation and definitions introduced in Section 5.4. Since N is a U-statistic of order 2 with kernel f(x, y) = (1/2) 1{‖x − y‖ ≤ ρ}, by virtue of Proposition 5.7 it is sufficient to show that, in this case, 4V ≤ v. To this end, observe first that f_1(x) = µ(B(x, ρ)) and f_2 = f. Plugging the resulting upper bounds into the definition of V yields the inequality 4V ≤ v, and therefore the desired conclusion.
6.4. Comparison with the literature. We shall now briefly compare the concentration inequalities for edge counting presented above with results already existing in the literature. To the best of our knowledge, the only concentration inequalities known so far that apply to edge counting in random disk graphs over Poisson point configurations are those established in [33] and [11]. Both papers deal with the case where the intensity measure µ is finite.
Comparison with [11]. We shall use some computations from [33], where it is explained how the general results about stabilizing functionals from [11] apply to random graph statistics. In the special case of edge counting, using the notation and assumptions of the present section, one deduces from [33, Proposition 5.1] an estimate of the type P(|N − EN| ≥ r) ≤ exp(−I_0(r)), where I_0(r) ∼ a r^{1/3} as r → ∞, for some positive constant a. As far as the asymptotic behaviour of r → I_0(r) is concerned, this result is worse than the estimates that one can obtain from (6.25) (where the argument of the exponential bound on the upper tail is asymptotic to −r^{1/2}/(2c), and therefore optimal in the sense of Corollary 6.2), and also worse than those given by (6.26) (where we have proved a Gaussian upper bound on the lower tail).
Comparison with [33]. Writing m for a median of the law of N, in [33, Theorem 5.2] one can find an estimate of the type P(|N − m| ≥ r) ≤ exp(−I_1(r)), where I_1(r) ∼ b r^{1/2} as r → ∞, for some positive constant b. Note that, as r → ∞, the upper tail bound determined by I_1 has the same order as our estimate (6.25), whereas the lower tail estimate is worse than our Gaussian upper bound (6.26). We also stress that [33, Theorem 5.2] has a different nature than our results, since it gives a concentration inequality around the median rather than around the expectation. This might be a drawback for applications, since the median of the edge count is harder to deal with than the expectation, which can be easily expressed using the Slivniak-Mecke formula. Finally, a major advantage of our results over those presented in [33, Theorem 5.2] is that the latter only applies to disk graphs built over finite intensity measure Poisson processes, whereas our tail estimates merely require that the number of edges is almost surely finite.

6.5. Consistency with the CLT. In the following, we compare the deviation inequality for the upper tail with a CLT that was proven in [32]. Let η_1 be a Poisson point process in R^d with intensity measure µ_1 and fix some radius ρ > 0. For any n ∈ N, let η_n be a Poisson point process with intensity measure µ_n = nµ_1, and denote the number of edges in the corresponding random geometric graph by N_n. Assume that EN_1 < ∞ (and hence EN_n < ∞ for all n). Then, by [32, Theorem 5.2], the sequence of random variables N_n satisfies a central limit theorem, i.e. (N_n − EN_n)/√(VN_n) converges in distribution to a standard Gaussian, where VN_n stands for the variance of N_n. Therefore, as n → ∞, the sequence of probabilities P(N_n ≥ EN_n + √(VN_n) r), n ≥ 1, converges to the standard Gaussian tail at r. According to the next result, the asymptotic behavior of the upper tail deviation inequality is consistent with this Gaussian tail.
Theorem 6.7. Let c > 0 be a constant satisfying (6.24). Then there exist a constant C > 0 and a sequence (x_n)_{n∈N} with x_n → ∞ as n → ∞ such that the desired comparison holds for any n ∈ N.

Proof. It was pointed out in [32] that there are constants α, β > 0 controlling the asymptotics of EN_n and VN_n. Let A = √2Cc. Then the desired inequality is equivalent to an explicit inequality in r. Now, choose C > 0 such that 4A < α^{1/2} β^{−3/4}, and consider the equality corresponding to the inequality in the above display. For any n ∈ N, let x_n be the (unique) positive solution of this equality in case such a solution exists, and let x_n = 0 otherwise. Then, since the right-hand side of the last display is increasing in r, the desired inequality holds for all r ∈ [0, x_n]. Moreover, the left-hand side converges as n → ∞, from which it follows that x_n → ∞ as n → ∞.

7. Another look at U-Statistics of order two
In this section, we develop a different approach to deviation inequalities for the upper tail of U-statistics of order 2, partially inspired by the results from [17, 34]. Throughout this section, we let the assumptions of Section 5 prevail; in particular, the intensity µ of η is a non-atomic positive measure on (X, X). We begin by generalizing [34, Theorem 3] to Poisson processes with possibly non-finite intensity measure.

Theorem 7.1. Assume that EG < ∞. Then, for any λ > 0, the stated bound holds, where φ(λ) = e^λ − λ − 1.
Proof. First note that, by monotone convergence, we can assume without loss of generality that |J| < ∞. For each n ∈ N, let G_n = min(G, n). Then E(e^{λG_n}) < ∞, and hence Proposition 3.1 with I = ∅ applies. Consider some realization of η. Since we assumed |J| < ∞, the relevant supremum is attained at some j* ∈ J.

From this we obtain
Since φ(−λz) ≤ φ(−λ)z for λ > 0 and 0 ≤ z ≤ 1, it follows from the above considerations that the entropy term can be bounded accordingly. Continuing in the same way as in the proof of [26, Theorem 10] gives the desired bound for G_n. Now, since λ > 0 and EG < ∞, monotone convergence allows us to let n → ∞, and invoking (7.28) yields the result.
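The pointwise bound φ(−λz) ≤ φ(−λ)z used above follows from the convexity of φ together with φ(0) = 0; a grid check (the parameter ranges are chosen for illustration):

```python
import math

def phi(x):
    # phi(x) = e^x - x - 1, convex with phi(0) = 0
    return math.exp(x) - x - 1

# Convexity gives phi(-lam*z) = phi(z*(-lam) + (1-z)*0) <= z*phi(-lam)
# for z in [0, 1]; record the smallest gap over a grid.
gap = min(z * phi(-lam) - phi(-lam * z)
          for lam in (0.01, 0.1, 0.5, 1.0, 2.0, 5.0, 10.0)
          for z in (j / 100 for j in range(101)))
```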
We continue with an analytic lemma that is used in the proof of the upcoming theorem, but which might be of independent interest in similar situations; its proof is inspired by arguments from [17, 34].

Lemma 7.2. For every z ≥ 0, there exists λ > 0 such that λz − e^{λ²} + 1 ≥ z^{3/2} √(log(z + 1)) / (4√z + 8).
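A numerical probe of a bound of this type; the exact form used below, namely sup_{λ>0}(λz − e^{λ²} + 1) ≥ z^{3/2}√(log(z + 1))/(4√z + 8), is an assumption of this sketch.

```python
import math

def sup_lhs(z, grid=4000, lam_max=10.0):
    # lower bound for sup_{lam > 0} (lam*z - exp(lam^2) + 1) by grid search
    return max(lam * z - math.exp(lam * lam) + 1
               for lam in (i * lam_max / grid for i in range(1, grid + 1)))

def rhs(z):
    return z ** 1.5 * math.sqrt(math.log(z + 1)) / (4 * math.sqrt(z) + 8)

# the grid maximum already dominates the right-hand side on a range of z
margin = min(sup_lhs(z) - rhs(z)
             for z in (0.01, 0.1, 0.5, 1.0, 2.0, 5.0, 10.0, 100.0, 1000.0))
```

Since the grid search only underestimates the true supremum, a positive margin is conservative evidence for the bound.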

Theorem 7.3. Let F be a U-statistic of order 2 with non-negative kernel f, and assume that, for some constant c > 0 and a random variable G with EG < ∞, one has almost surely sup_{y∈η} Σ_{x∈η\y} f(y, x) ≤ cG. Then, for any r > 0, the stated bound holds, where χ denotes the corresponding rate function.

Proof. The assumptions imply the required almost sure bound on the difference operator. By Corollary 3.5 and Theorem 7.1, this gives, for any λ > 0, a bound on the moment generating function. Let r > 0. Then, using the above computation and Markov's inequality, we obtain a bound for any λ > 0. Hence, writing z = (√(EF + r) − √(EF))/√(4cEG) and substituting λ by λ/√(4c), we obtain the corresponding inequality. The result now follows from Lemma 7.2.

7.1. Length power functionals. As an application, we now focus on length power functionals in random geometric graphs. The resulting estimates contain as a special case the edge counting statistics studied in Section 6. We will see in particular that one can take advantage of the upper tail estimate stated in Theorem 7.3 in order to provide an alternative bound to (6.25), one displaying a strictly faster rate of decay in r. As in Section 6, we consider a Poisson measure on R^d with σ-finite and non-atomic Borel intensity measure µ. We let ρ > 0 be some radius and consider again the disk graph G(η) associated with η. For any α ∈ [0, 1], the length power functional L^(α) is the U-statistic of order 2 with kernel f^(α)(x, y) = (1/2) ‖x − y‖^α 1{0 < ‖x − y‖ ≤ ρ}. Note that L^(0) = N, the number of edges in G(η), and L^(1) is just the (total edge) length of the graph. One easily sees that sup_{y∈η} Σ_{x∈η\y} f^(α)(y, x) ≤ (ρ^α/2) sup_{y∈η} deg(y) ≤ ρ^α N almost surely, and it is straightforward to check that, if EN < ∞, then the expectation of the right-hand side of this inequality is finite. We stress also that, if EN < ∞, then L^(α) is trivially well-behaved for every α ∈ [0, 1]; see Section 5.2. The following consequence of Theorem 7.3, containing the tail estimate (7.29), therefore holds.
Remark 7.5. The right-hand side of (7.29) has the form exp(−I(r)), where I(r) ∼ b √(r log r) for some b > 0, as r → ∞. Such a rate of decay is better than the one we can deduce from [33, Proposition 5.1] (which is indeed a translation of the results from [11]), which applies to the case where µ is a multiple of the restriction of the Lebesgue measure to a convex body and implies an upper bound of the form exp(−I_0(r)), with I_0(r) ∼ b_0 r^{1/3}. Our result also provides a rate of decay that is faster than the one appearing in [33, Theorem 5.5], where the bound has the form exp(−I_1(r)), with I_1(r) ∼ b_1 r^{1/2}. It is remarkable that, in the case α = 0, and as far as the rate of decay (as r → ∞) is concerned, the estimate (7.29) is also strictly better than (6.25), and that this comes only at the cost of somewhat more complicated constants. Finally, we observe that the asymptotic relation I(r) ∼ b √(r log r) is consistent with Proposition 6.1.

7.2. Length in more general graph models. In the following, we consider a slightly more general model of random geometric graphs. For this, let η be a Poisson point process on R^d with σ-finite and non-atomic Borel intensity measure µ, and let ρ : R^d → R+ be given by ρ(x) = (‖x‖ + 1)^{−γ} (7.30), for some γ > 0. We define H(η) as the graph with vertex set η and an edge between vertices x, y ∈ η whenever 0 < ‖x − y‖ ≤ ρ(x) + ρ(y). Note that the graph H = H(η) is obtained by implementing the following two-step procedure: (a) for every x ∈ η, draw the closed ball B(x, ρ(x)), centered at x and with radius ρ(x), and (b) connect two distinct points x, y ∈ η by an edge if and only if B(y, ρ(y)) ∩ B(x, ρ(x)) ≠ ∅.
In other words, H is the intersection graph of the balls centered at the points of η, with (decaying) radii ρ(x), x ∈ η. We will see that this model allows for situations where H has almost surely infinitely many edges but nevertheless finite total length. Interestingly, even if there are infinitely many edges, the length can have an exponentially decaying upper tail. Before we analyse the concentration properties of the length, we present an illustration of what the considered graph might look like in the plane.
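A minimal computational sketch of the model; the radius profile ρ(x) = (‖x‖ + 1)^{−γ} and the four-point configuration below are assumptions made for illustration.

```python
import itertools
import math

def radius(x, gamma):
    # decaying radius rho(x) = (|x| + 1)^(-gamma), assumed form
    return (math.hypot(*x) + 1) ** (-gamma)

def edges_and_length(points, gamma):
    """Intersection graph H: x ~ y iff 0 < |x - y| <= rho(x) + rho(y),
    i.e. iff the balls B(x, rho(x)) and B(y, rho(y)) intersect."""
    edges, length = [], 0.0
    for x, y in itertools.combinations(points, 2):
        d = math.dist(x, y)
        if 0 < d <= radius(x, gamma) + radius(y, gamma):
            edges.append((x, y))
            length += d
    return edges, length

pts = [(0.0, 0.0), (0.5, 0.0), (3.0, 0.0), (3.2, 0.0)]
E, L = edges_and_length(pts, gamma=1.0)
```

Here the two nearby pairs are joined (radii near the origin are large, radii far away are small), illustrating how edges become short far from the origin.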

Figure 2. A realisation of the intersection graph H(η)
Let L be the length of H. Then L is a U-statistic of order 2 with kernel f_L(x, y) = (1/2) ‖x − y‖ 1{0 < ‖x − y‖ ≤ ρ(x) + ρ(y)}. Moreover, if µ guarantees that almost surely L < ∞, then L is even well-behaved. As a consequence of Theorem 7.3, we obtain the following result.

Corollary 7.6. Assume that EG < ∞ and EL < ∞. Then, for any r ≥ 0, the corresponding tail bound holds, where χ is defined as in Theorem 7.3.
Proof. Let x, y ∈ R^d be such that ‖x − y‖ ≤ ρ(x) + ρ(y). Then we have |‖x‖ − ‖y‖| ≤ 2, since ρ is bounded from above by 1. Hence, ρ(y) ≤ 3^γ ρ(x), and therefore ‖x − y‖ ≤ (3^γ + 1)ρ(x). It follows that the local version of L satisfies the corresponding bound. For x, y ∈ R^d, we define g_x(y) = ρ(x) 1{‖x − y‖ ≤ (3^γ + 1)ρ(x)}. Then the above reasoning gives that, almost surely, the required domination holds. We see that Theorem 7.3 applies to L whenever EL, EG < ∞, and this concludes the proof.
Next, we will prove a sufficient condition for the finiteness of the expectations appearing in Corollary 7.6 in the case where the Poisson process η is homogeneous, that is, when the intensity measure of η has the form tλ, where λ is the Lebesgue measure and t > 0.
Proposition 7.7. Assume that η is a homogeneous Poisson point process on R^d with intensity tλ, t > 0. Let L be the length of H and define the random variable G as in (7.31). Then G and L are integrable, provided that condition (7.34) holds.

Proof. First observe, quite similarly to (7.32), that ρ(x) ≤ (3^γ + 2)^γ ρ(x̄) for any x ∈ R^d and any x̄ ∈ B(x, (3^γ + 1)ρ(x)). Hence, writing c = 3^γ + 1 and c′ = (3^γ + 2)^γ, we obtain a pointwise domination, and it follows that EG < ∞ as soon as the expectation of the corresponding dominating functional is finite. Using the Slivniak-Mecke formula (2.4), the expectation of the latter expression equals an explicit integral involving κ_d, the Lebesgue measure of the unit ball in R^d. So we have EG < ∞ provided that (7.34) holds. This condition also guarantees EL < ∞ (to see this, just perform a computation similar to the one above, using the estimate (7.33) instead of (7.35)).
We claimed above that H can have almost surely infinitely many edges while still EL < ∞. The following result, when combined with Proposition 7.7, substantiates this claim.
Proposition 7.8. Assume that η is a homogeneous Poisson point process in R^d with intensity t > 0. Denote by N the number of edges in H. Then almost surely N = ∞, provided that condition (7.36) holds.

Remark 7.9. The phenomenon described above is remarkable, since it allows for situations where the U-statistic is (as opposed to the edge counting statistics) almost surely an infinite series, i.e. we have almost surely f_L(x, y) > 0 for infinitely many (x, y) ∈ η²≠. Intuitively, one might expect that in this situation strong concentration properties for L are more difficult to establish. However, as we have seen above, our method works without problems and yields exponential tail bounds for the graph length, regardless of whether finitely or infinitely many edges are present.
Proof of Proposition 7.8. For all x ∈ N^d, define the cube Q_x and pack disjoint translated copies of Q_x into the cube x + [0, 1]^d, denoting these copies by Q¹_x, . . . , Q^{r(x)}_x. Observe that the diameter of each Q^i_x satisfies diam(Q^i_x) = ρ(x + 1), so that any two distinct vertices of η within the same cube are connected by an edge. Now, the η(Q^i_x) are independent Poisson random variables. Thus, by the second Borel-Cantelli lemma, the right-hand side of the corresponding display is almost surely infinite provided the associated series of probabilities diverges. The expectation λ_x of η(Q^i_x) admits an explicit expression. Since 1 − e^{−z} − ze^{−z} ≥ (1/4)z² for z ∈ [0, 1], and since λ_x → 0 as ‖x‖ → ∞, the series in question is infinite under a divergence condition that is easily seen to be implied by (7.36).
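The elementary inequality 1 − e^{−z} − ze^{−z} ≥ z²/4 on [0, 1] invoked in the proof can be confirmed on a grid:

```python
import math

# P(Poisson(z) >= 2) = 1 - e^{-z} - z e^{-z}; the proof bounds this from
# below by z^2/4 on [0, 1].  Record the smallest gap over a grid.
worst_gap = min(1 - math.exp(-z) - z * math.exp(-z) - z * z / 4
                for z in (j / 1000 for j in range(1001)))
```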

8. Concentration for the convex distance in Poisson-based models
The convex distance for product spaces introduced by M. Talagrand in [41] has proved to be a very useful tool in the context of concentration inequalities; see e.g. [10, Chapter 11], [39, Chapter 6], [42, Chapter 2] and the references therein. In the recent paper [31] by M. Reitzner, this notion has been adapted to models based on Poisson point processes with finite intensity measure. For both the product space and the Poisson space versions, the method of using the convex distance to establish concentration properties is based on an isoperimetric inequality. First applications of this method to Poisson-based models are worked out in [33, 22], where concentration inequalities for Poisson U-statistics are presented. The proof of the convex distance isoperimetric inequality in [31] uses an approximation of the Poisson process by binomial processes. The goal of this section is to give an alternative proof of this inequality. Apart from slightly worse constants, we entirely recover Reitzner's result [31, Theorem 1.1] with the tools developed in the present work. In particular, we only use methods from Poisson process theory, thus answering the question raised in [31] of whether such a direct proof is possible. Moreover, the assumptions on the space X required for our results are less restrictive than in [31], where only locally compact second countable Hausdorff spaces are considered. The upcoming presentation is based on [5] and [4], where the convex distance for product spaces is recovered using the entropy method.
8.1. Convex distance for Poisson processes. To introduce the convex distance for Poisson point processes, let N_fin ⊂ N denote the space of finite integer-valued measures on X, equipped with the σ-algebra N_fin obtained by restricting N to N_fin. We will write ξ(x) = ξ({x}) whenever ξ ∈ N_fin and x ∈ X, in order to simplify notation. For any two measures ξ, ν ∈ N_fin, we define the measure ξ \ ν accordingly, where x ∈ ξ indicates that x belongs to the support of ξ. The convex distance d_T(ξ, A) is now defined, for any measurable set A ∈ N_fin and ξ ∈ N_fin, via a supremum ranging over all measurable maps u : X → R such that ‖u‖_ξ ≤ 1, where ‖·‖_ξ denotes the 2-norm with respect to the measure ξ. The following result gives an alternative characterization of the convex distance, which will be crucial for our proof of the isoperimetric inequality later on; here, and for the rest of the section, we use the corresponding shorthand notation. In the proof, one first establishes equation (8.38); we then aim at applying Sion's minimax theorem [38, Corollary 3.3]. To get prepared for this, first note that the supremum in (8.38) can obviously be performed with respect to those functions u : X → R satisfying u(x) = 0 whenever x ∉ ξ. Note also that these functions form a finite-dimensional real vector space (whose dimension is given by #{x ∈ ξ}), which will be denoted by U. So the supremum is actually taken over the set U_{≤1} of those u ∈ U with ‖u‖_ξ ≤ 1, which is a convex and compact subset of U. Denote by Q the finite set of maps q : X → N_0 satisfying q(x) ≤ ξ(x) for all x ∈ ξ and q(x) = 0 whenever x ∉ ξ. Moreover, define the map I by I : A → Q, ν → (x → (ξ(x) − ν(x))_+).
Then, for any ζ ∈ M(A) and x ∈ ξ, the corresponding identity holds. Since both U_{≤1} and M_I(A) are compact, the suprema and infima above are in fact maxima and minima.

8.2. Convex distance inequality.
In what follows, we give the announced new proof of the convex distance inequality for Poisson point processes. The result we aim to prove is the following:

Theorem 8.2. Let η be a Poisson point process on X with finite intensity measure µ, and let A ∈ N_fin be arbitrary. Then P(η ∈ A) E(e^{d_T(η,A)²/10}) ≤ 1 (8.39).
In particular, for any r ≥ 0, P(η ∈ A) P(d_T(η, A) ≥ r) ≤ e^{−r²/10}. Note that in [31, Theorem 1.1] (under more restrictive assumptions on the space (X, X, µ)), an inequality stronger than (8.39) is proved, where the constant 1/10 is replaced by 1/4. To prepare for the proof of Theorem 8.2, we first establish the following result, which is interesting in its own right, since it states in particular that the variance of the convex distance is bounded by 1.

Proposition 8.3. Let η be a Poisson point process on X with finite intensity measure µ. Then, for any A ∈ N_fin, the corresponding bound holds almost surely. In particular, V d_T(η, A) ≤ 1.
Proof. For this proof, we adapt arguments from the proof of [4, Proposition 13]. According to Proposition 8.1, we can choose a map û : X → R with ‖û‖_ξ ≤ 1 and a probability measure ζ̂ on A satisfying the saddle-point identity. Then, for any z ∈ ξ, the corresponding estimate holds. Choose some ζ̄ ∈ M(A) that achieves the minimum on the above right-hand side.
As a final ingredient for the upcoming proof of the convex distance inequality, we derive the following consequence of the Cauchy-Schwarz inequality.
Lemma 8.4. Let ξ ∈ N_fin and consider the measure space (X, X, ξ). Then, for any measurable map h : X → R, identity (8.40) holds. Indeed, by the Cauchy-Schwarz inequality, the left-hand side of (8.40) is less than or equal to the right-hand side; moreover, taking u = h/‖h‖_ξ shows that the right-hand side is less than or equal to the left-hand side.
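For a finite configuration, this is a weighted Cauchy-Schwarz statement, sup over ‖u‖_ξ ≤ 1 of the ξ-weighted pairing with h equals ‖h‖_ξ, and it can be checked numerically; the five-atom configuration, its multiplicities and the random search below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
# finite configuration xi: 5 atoms with integer multiplicities xi(x)
mult = np.array([1, 2, 1, 3, 1], dtype=float)
h = rng.normal(size=5)

h_norm = np.sqrt(np.sum(mult * h**2))       # |h|_xi, 2-norm w.r.t. xi

# sup over |u|_xi <= 1 of sum_x xi(x) u(x) h(x), via random candidates ...
best = 0.0
for _ in range(2000):
    u = rng.normal(size=5)
    u /= np.sqrt(np.sum(mult * u**2))       # normalise so that |u|_xi = 1
    best = max(best, np.sum(mult * u * h))

# ... and via the maximiser u = h / |h|_xi predicted by Cauchy-Schwarz
attained = np.sum(mult * (h / h_norm) * h)
```

Every random candidate stays below ‖h‖_ξ, while the predicted maximiser attains it.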
Moreover, since η ∈ A implies d_T(η, A) = 0, it follows from (ii) with r = E d_T(η, A)² that the corresponding bound holds. So, the result follows once we have proven (8.41) and (8.42). To prove (8.42), first observe that d_T(·, A) is a non-decreasing functional. Using this and Proposition 8.3, we compute the required estimate.