Large and moderate deviations for kernel–type estimators of the mean density of Boolean models

Abstract: The mean density of a random closed set with integer Hausdorff dimension is a crucial notion in stochastic geometry; indeed, it is a fundamental tool in a large variety of applied problems, such as image analysis, medicine and computer vision. Hence the estimation of the mean density is a problem of interest from both a theoretical and a computational standpoint. Nowadays different kinds of estimators are available in the literature; here we focus on a kernel-type estimator, which may be considered a generalization of the traditional kernel density estimator of random variables to the case of random closed sets. The aim of the present paper is to establish asymptotic properties of such an estimator in the context of Boolean models, a broad class of random closed sets. More precisely, we prove large and moderate deviation principles, which allow us to derive the strong consistency of the estimator of the mean density as well as asymptotic confidence intervals. Finally, we underline the connection of our theoretical findings with the classical literature on density estimation of random variables.


Introduction
The mean density of lower dimensional random closed sets, such as fiber processes and surfaces of full dimensional random sets, is an important quantity which arises in different scientific fields. As a consequence, its evaluation and estimation have attracted growing interest during the last decades [6,19]. Recent areas of application include pattern recognition and image analysis [40,28], computer vision [42], medicine [1,8,15,16,17], and material science [14]. We recall that, given a probability space (Ω, F, P), a random closed set Θ in R^d is a measurable map Θ : (Ω, F, P) → (F, σ_F), where F denotes the class of the closed subsets of R^d, and σ_F is the σ-algebra generated by the so-called Fell topology, or hit-or-miss topology, that is, the topology generated by the set system {F_G : G ∈ G} ∪ {F^C : C ∈ C}, with F_G := {F ∈ F : F ∩ G ≠ ∅} and F^C := {F ∈ F : F ∩ C = ∅}, where G and C are the systems of the open and compact subsets of R^d, respectively (e.g., see [36]). We say that a random closed set Θ : (Ω, F) → (F, σ_F) satisfies a certain property (e.g., Θ has Hausdorff dimension n) if Θ satisfies that property P-a.s.; throughout the paper we shall deal with countably H^n-rectifiable random closed sets, having denoted by H^n the n-dimensional Hausdorff measure.
A random closed set Θ_n of locally finite n-dimensional Hausdorff measure H^n induces a random measure μ_{Θn}(A) := H^n(Θ_n ∩ A), A ∈ B_{R^d}, and the corresponding expected measure is defined as

E[μ_{Θn}](A) := E[H^n(Θ_n ∩ A)], A ∈ B_{R^d},  (1)

where B_{R^d} is the Borel σ-algebra of R^d. (The important issue of the measurability of the random variable μ_{Θn}(A) has been addressed in [5,45].) Whenever the measure E[μ_{Θn}] is absolutely continuous with respect to the d-dimensional Hausdorff measure H^d, its density (i.e. its Radon-Nikodym derivative) with respect to H^d is called the mean density of Θ_n and, according to the notation of previous works (e.g., see [18,20]), is denoted by λ_{Θn}.
It is worth mentioning that, while the estimation of the mean density in stationary settings has been widely studied in the literature (see, e.g., [6,23]), the non-stationary case has been addressed only recently and, to the best of our knowledge, a general density estimation theory for random sets is still missing. The aim of the present paper is to contribute to this area. As a matter of fact, the problem of the local and global approximation of λ_{Θn} for non-stationary random sets has been tackled by the authors in [2,18,19,20,44]. More specifically, given an i.i.d. random sample Θ_n^{(1)}, ..., Θ_n^{(N)} of size N for the random closed set Θ_n, the authors have provided two different kinds of estimators for the mean density of Θ_n: the so-called "Minkowski content"-based estimator, introduced in [43] through the notion of the Minkowski content of a set (see, e.g., [3]), and the so-called kernel-type estimator, introduced in [10] and denoted here by λ^{κ,N}_{Θn} (for its precise definition see Eq. (6) below). We refer to [10] for a discussion of similarities and differences between them; we mention here that, even if the evaluation of λ^{κ,N}_{Θn}(x) is a non-trivial issue for very general random sets, it has been shown in [11] that it approaches the true value of λ_{Θn}(x) much faster than the "Minkowski content"-based estimator.
We point out that the importance of the estimator λ^{κ,N}_{Θn}(x) arises in the general theory of random sets, because it may be regarded as a generalization of the classical kernel density estimator of random variables to the case of random sets (see also Section 6); this is the reason why we shall refer to λ^{κ,N}_{Θn}(x) as the "kernel-type" estimator (or, briefly, kernel density estimator), and why its investigation plays a pivotal role in the whole theory of random sets, providing a unifying approach to density estimation. While the asymptotic properties of the "Minkowski content"-based estimator, as well as asymptotic confidence intervals and central limit theorems, have been studied in [13], no analogous results are available yet for the kernel-type estimator of the mean density. Hence the main aim of the present paper is the investigation of large and moderate deviation principles for λ^{κ,N}_{Θn}(x) for a large class of random closed sets, known as Boolean models, leaving extensions to more general classes to subsequent works. The analysis we carry out is much in the spirit of [31,35], where similar results were proved for kernel estimators of random variables. Even though Boolean models do not cover the whole variety of random sets, they are usually considered, as stated in [4], basic random set models in stochastic geometry. So the present paper may be seen as a first step in extending large and moderate deviation principles for kernel density estimators of random variables to the case of kernel-type estimators of the mean density of random sets. The theorems we are going to prove are interesting in their own right; in addition, they provide tools to derive asymptotic normality and strong consistency of kernel-type estimators, which in turn are useful to determine asymptotic confidence intervals as well.
The paper is organized as follows. In Section 2, we describe the general framework of Boolean models that we handle in this paper; besides, we briefly recall the results on stochastic geometry and large deviation theory that are needed in the sequel. Large and moderate deviation principles for the kernel-type estimator of the mean density are presented in Section 3, namely in Theorem 2 and Theorem 3, respectively. These theorems are the basic building blocks to derive statistical properties of such an estimator. Indeed, we are able to prove its strong consistency and to derive asymptotic confidence intervals (see Section 4). Some noteworthy examples of Boolean models are discussed in Section 5. Finally, Section 6 contains a discussion of relevant connections with the literature and paves the way for future developments of the present work. For the reader's convenience, the proofs of the main theorems, and some related technical lemmas, are deferred to Appendix A.

Preliminaries and notations
This section gathers some basics on stochastic geometry and large deviations, which are necessary to understand our main results. The treatment is clearly not exhaustive; thus, throughout the paper we provide references for those readers who wish to go deeper into the results we recall.

Point processes, intensity measure and Boolean models
Roughly speaking, a point process, denoted here by Φ, is a locally finite collection {ξ_i}_{i∈N} of random points; more formally, Φ is a random counting measure, that is, a measurable map from a probability space (Ω, F, P) into the space of locally finite counting measures on R^d. Throughout the paper we will deal with simple point processes, that is, Φ({x}) ≤ 1 ∀x ∈ R^d, P-a.s.
The measure Λ(A) := E[Φ(A)] on B_{R^d} is called the intensity measure of Φ; whenever it is absolutely continuous with respect to H^d, its density is called the intensity of Φ.
Marked point processes may be regarded as a generalization of point processes. They are collections of random points ξ_i in R^d, each one associated with a mark K_i, which usually belongs to a complete and separable metric space (c.s.m.s.) K. Hence the resulting collection of random points Φ = {(ξ_i, K_i)}_{i∈N} is a point process on R^d × K, with the property that the unmarked process {ξ_i}_{i∈N} is itself a point process in R^d. A common assumption (e.g., see [33]) is that there exist a measurable function f : R^d × K → R_+ and a probability measure Q on K such that Λ(d(x, K)) = f(x, K) dx Q(dK). We also recall that point processes can be considered on quite general metric spaces. In particular, a point process in C^d, the class of compact subsets of R^d, is called a particle process (see [4] and references therein). It is well known that, by means of a center map, a particle process can be transformed into a marked point process Φ on R^d with marks in C^d, by representing any compact set C as a pair (x, Z), where x may be interpreted as the "location" of C and Z := C − x as its "shape" (or "form"). In this case the marked point process Φ = {(X_i, Z_i)} is also called a germ-grain model, and every random closed set Θ in R^d can be represented as a germ-grain model by means of a suitable marked point process. In a large variety of applications the random sets Z_i are uniquely determined by a suitable random parameter S ∈ K. Typical examples include: unions of random balls, where K = R_+ and S is the radius of a ball centered at the origin; segment processes in R², in which K = R_+ × [0, 2π] and S = (L, α), where L and α are the random length and orientation of the segment attached to the origin, respectively.
In order to be consistent with the notation used in previous works (e.g., [44,10]), we shall consider random sets Θ_n described by marked point processes Φ in R^d with marks in a suitable mark space K, so that Z = Z(S) is a random set containing the origin and

Θ_n = ⋃_{(ξ_i, S_i) ∈ Φ} (ξ_i + Z(S_i)).

Whenever Φ is a marked Poisson point process, Θ_n is said to be a Boolean model. Since we are going to consider Boolean models here, we also recall that a marked Poisson point process in R^d with marks in K may be seen as a Poisson point process on R^d × K with intensity measure Λ if Λ(· × K) is continuous and locally bounded.
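The germ-grain construction just described is easy to simulate. The following toy sketch (function names, the planar window and the uniform radius law are our own illustrative choices, not taken from the paper) draws one realization of a Boolean model of balls: germs from a homogeneous Poisson process on a rectangular window, marks given by i.i.d. radii; germs falling outside the window, i.e. edge effects, are neglected here.

```python
import math
import random

def sample_poisson(mu, rng):
    """Sample a Poisson(mu) variate by inversion (sequential search)."""
    x, p = 0, math.exp(-mu)
    s, u = p, rng.random()
    while u > s:
        x += 1
        p *= mu / x
        s += p
    return x

def sample_boolean_balls(lam, window, radius_law, rng):
    """One realization of a planar Boolean model whose grains are balls.

    Germs: homogeneous Poisson process of intensity `lam` on the rectangle
    [0, window[0]] x [0, window[1]]; marks: i.i.d. radii drawn from
    `radius_law`.  Returns the germ-grain representation as a list of
    (center, radius) pairs."""
    n = sample_poisson(lam * window[0] * window[1], rng)
    return [((rng.uniform(0, window[0]), rng.uniform(0, window[1])),
             radius_law(rng)) for _ in range(n)]

rng = random.Random(42)
grains = sample_boolean_balls(0.1, (10.0, 10.0),
                              lambda r: r.uniform(0.2, 0.5), rng)
```

The union of the balls (center + grain) is the realization of Θ_n; the list of pairs is the marked-point-process representation {(ξ_i, S_i)}.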
For an exhaustive treatment of point processes we refer to [24,25], and to [34] for an elegant presentation of Poisson processes. Further, we mention [36,37,38,39] for a unified theory on germ-grain models.

Basics on large and moderate deviations
The theory of large deviations is concerned with the asymptotic estimation of probabilities of rare events, providing asymptotic computations of small probabilities on an exponential scale. Assume that (X, X) is a Polish space equipped with its Borel σ-algebra. The large deviation principle characterizes the asymptotic behavior of a family of probability measures {μ_N}_{N≥1} on (X, X), as N goes to infinity, in terms of a rate function. A rate function is a lower semicontinuous map J* : X → [0, +∞], i.e. such that the level sets {x : J*(x) ≤ α} are closed for every α ≥ 0; J* is said to be a good rate function if the level sets are compact. The set {x : J*(x) < +∞} is called the effective domain of J*. Let v_N be a velocity, namely a function such that v_N → +∞ as N → ∞.
F. Camerlenghi and E. Villa

A family of probability measures {μ_N}_{N≥1} is said to satisfy a Large Deviation Principle (LDP) with rate function J* and velocity v_N if and only if, for any A ∈ X,

−inf_{y ∈ A°} J*(y) ≤ liminf_{N→∞} (1/v_N) log μ_N(A) ≤ limsup_{N→∞} (1/v_N) log μ_N(A) ≤ −inf_{y ∈ Ā} J*(y),  (3)

where A° and Ā are the interior and the closure of A, respectively, and with the convention that the infimum over the empty set equals +∞. We say that a sequence of random variables satisfies the LDP when the sequence of measures induced by these variables satisfies the LDP.
The Gärtner-Ellis Theorem [26, Theorem 2.3.6] is the main tool to prove large deviation results. For our purposes, we consider the case X = R^m, with m ≥ 1, and X = B_{R^m}. In what follows, a · b := Σ_{j=1}^m a_j b_j denotes the scalar product between two generic vectors a = (a_1, ..., a_m) and b = (b_1, ..., b_m) of R^m. We also recall that a convex function f : R^m → (−∞, ∞] is said to be essentially smooth (see e.g. Definition 2.3.5 in [26]) if the interior D° of its effective domain is non-empty, f is differentiable throughout D°, and f is steep, i.e. |∇f(x_k)| → +∞ whenever {x_k} ⊂ D° converges to a boundary point of D°. The term moderate deviations is used when, for a sequence {a_N} of positive numbers satisfying the conditions in (4), where w_N → ∞ as N → ∞, a LDP holds for suitable centered random variables with speed v_N = 1/a_N and the same quadratic rate, which does not depend on the choice of {a_N}. Moderate deviations may be employed to obtain the weak convergence to a centered Normal distribution, whose variance is determined by a suitable application of the Gärtner-Ellis Theorem (e.g., see also [9]). This will be clarified in Section 4, where we shall apply the LDP and the MDP to show that, for every x ∈ R^d, the kernel estimator λ^{κ,N}_{Θn}(x) of λ_{Θn}(x) is strongly consistent and asymptotically Normal, respectively.
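The Gärtner-Ellis Theorem identifies the rate function as the Legendre-Fenchel transform J*(y) = sup_t {ty − J(t)} of the limiting scaled log-Laplace functional J. As a purely illustrative sketch (the Poisson log-Laplace J(t) = λ(e^t − 1) is our own toy choice, not an object from the paper), the transform can be computed numerically over a grid and compared with its closed form y log(y/λ) − y + λ:

```python
import math

def legendre_fenchel(J, y, t_grid):
    """Numerical Legendre-Fenchel transform J*(y) = sup_t (t*y - J(t)),
    approximated by a supremum over a finite grid of t values."""
    return max(t * y - J(t) for t in t_grid)

lam = 2.0
J = lambda t: lam * (math.exp(t) - 1.0)       # log-Laplace of Poisson(lam)
t_grid = [i / 1000.0 for i in range(-5000, 5001)]

y = 3.0
numeric = legendre_fenchel(J, y, t_grid)
exact = y * math.log(y / lam) - y + lam       # closed-form transform
```

The grid supremum underestimates the true supremum by at most O(step^2) near the (interior) maximizer t = log(y/λ), so the two values agree up to the grid resolution.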

Notations and assumptions
To fix the notation, b_n denotes the volume of the unit ball in R^n, and B_r(x) is the closed ball centered at x ∈ R^d with radius r > 0. For any A ⊂ R^d and r > 0, the Minkowski enlargement of A at size r is denoted by A_⊕r := {x ∈ R^d : dist(x, A) ≤ r}. For further definitions and properties of rectifiable sets we refer to [3,29,30].
In the sequel, we will say that Θ_n satisfies a certain property if such a property is satisfied for P-almost every ω ∈ Ω; in particular, Θ_n will be a Boolean model driven by a Poisson point process Φ in R^d × K with intensity measure Λ(d(x, s)) = f(x, s) dx Q(ds), satisfying the following assumptions: (A1) for any s ∈ K, Z(s) is a countably H^n-rectifiable and compact subset of R^d, satisfying suitable uniform regularity bounds for some γ, γ̃ > 0 independent of s; (A2) for any s ∈ K, H^n(disc(f(·, s))) = 0, where disc(f(·, s)) is the set of discontinuity points of f(·, s), and f(·, s) is locally bounded. These assumptions may seem somewhat technical at first glance, but they are natural hypotheses fulfilled by a wide class of germ-grain models, and their meaning has been extensively discussed in [10,44]; indeed, for the reader's convenience, we use here the same labels (A1) and (A2) introduced in [10] and in [44], respectively. We also recall that assumption (A1) guarantees (see Remark 4 and Proposition 5 in [44]) that the measure E[μ_{Θn}] defined in (1) is locally bounded and absolutely continuous, with density

λ_{Θn}(x) = ∫_K ∫_{x−Z(s)} f(y, s) H^n(dy) Q(ds), for H^d-a.e. x ∈ R^d.  (5)

In order to define the kernel density estimator of the mean density, we recall that a multivariate kernel is a probability density function κ : R^d → R which is radially symmetric. Summing up, throughout the paper, unless otherwise specified, we suppose the validity of the following:
• {Θ_n^{(i)}}_{i≥1} is a sequence of i.i.d. random closed sets distributed as Θ_n.
• κ is a continuous kernel with compact support supp(κ) ⊂ B R (0), and such that κ(x) ≤ M , for all x ∈ R d and for some M > 0.
The kernel-type estimator λ^{κ,N}_{Θn}(x) of the mean density λ_{Θn}(x) at a point x ∈ R^d is defined as follows [10]:

λ^{κ,N}_{Θn}(x) := (1/N) Σ_{i=1}^N (κ_{r_N} * μ_{Θn^{(i)}})(x),  (6)

where * stands for the usual convolution product, while κ_{r_N}(y) := κ(y/r_N)/r_N^d, y ∈ R^d. It can be shown (see [10, Corollary 7]) that if the bandwidth r_N is such that r_N → 0 and N r_N^{d−n} → +∞ as N → ∞, then λ^{κ,N}_{Θn}(x) is an asymptotically unbiased and weakly consistent estimator of λ_{Θn}(x) for H^d-a.e. x ∈ R^d.

The notion of approximate tangent space shall appear in the expression of the rate function both in the LDP and in the MDP stated in Theorem 2 and Theorem 3, respectively. Such a notion is borrowed from geometric measure theory and is recalled below for the reader's convenience. Denoting by G_n the set of unoriented n-dimensional subspaces of R^d, and by C_c(R^d; R) the space of all real-valued continuous functions with compact support in R^d, we recall that a H^n-rectifiable compact set A ⊂ R^d admits an approximate tangent space π_x A ∈ G_n at H^n-a.e. x ∈ A, in the sense that

lim_{r↓0} r^{−n} ∫_A φ((y − x)/r) H^n(dy) = ∫_{π_x A} φ(y) H^n(dy) for all φ ∈ C_c(R^d; R).  (7)

By Theorem 2.83 and Proposition 1.62 in [3], π_x A exists for H^n-a.e. x ∈ A; moreover, (7) holds for any bounded Borel measurable function φ : R^d → R with compact support such that H^n|_{π_x A}(disc(φ)) = 0. For the sake of simplicity, we have assumed that κ is continuous: this allows us to directly apply Eq. (7) in the sequel. We refer to [10, Remark 9] for a more detailed discussion of the non-continuous case.
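The convolution structure of the estimator can be made concrete numerically. The following sketch (for d = 2, n = 1; function names and the quadrature scheme are ours) assumes the convolution form (1/N) Σ_i (κ_{r_N} * H^1⌞Θ^{(i)})(x) with the scaling κ_r(y) := κ(y/r)/r^2 — our reading of Eq. (6), to be checked against [10] — and approximates the H^1 line integral along a curve, given as a polyline, by the midpoint rule.

```python
import math

def kernel_uniform(z):
    """Radially symmetric kernel on B_1(0) in R^2: kappa = 1_{B_1(0)} / pi."""
    return 1.0 / math.pi if math.hypot(z[0], z[1]) <= 1.0 else 0.0

def h1_convolution(polyline, x, r, kernel=kernel_uniform, step=0.01):
    """Approximate (kappa_r * H^1|_C)(x) for a curve C given as a polyline,
    with the assumed scaling kappa_r(y) := kappa(y / r) / r^2 (d = 2).
    The line integral is discretized by the midpoint rule with arc-length
    step `step`."""
    total = 0.0
    for a, b in zip(polyline, polyline[1:]):
        seg_len = math.hypot(b[0] - a[0], b[1] - a[1])
        m = max(1, int(seg_len / step))
        for j in range(m):
            t = (j + 0.5) / m
            y = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
            total += (kernel(((x[0] - y[0]) / r, (x[1] - y[1]) / r)) / r ** 2
                      * seg_len / m)
    return total

def mean_density_estimate(samples, x, r):
    """Average of the convolutions over the N i.i.d. set samples."""
    return sum(h1_convolution(c, x, r) for c in samples) / len(samples)

# sanity check: a segment of length 1 through x, contained in B_r(x) with
# r = 0.5, gives (kappa_r * H^1)(x) = 1 / (pi r^2) = 4 / pi
line = [(-0.5, 0.0), (0.5, 0.0)]
val = mean_density_estimate([line, line, line], (0.0, 0.0), 0.5)
```

The sanity check reflects the heuristic behind the estimator: for this uniform kernel, (κ_r * H^1⌞C)(x) equals H^1(C ∩ B_r(x))/(π r²), the local length content per unit area.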

Large and moderate deviations for the kernel-type estimator
In this section we state large and moderate deviation principles for the kernel density estimator defined in (6), by deferring their proof to the Appendix. Such results will be useful to derive statistical properties and confidence intervals for the involved estimator, as we will see in Section 4.

Theorem 2 (LDP). Let Θ_n and κ be as in the Assumptions. Then, for H^d-a.e. x ∈ R^d, the sequence of kernel estimators {λ^{κ,N}_{Θn}(x)}_{N≥1} satisfies a LDP with speed v_N = N r_N^{d−n} and good rate function J*_x.

Theorem 3 (MDP). Let Θ_n and κ be as in the Assumptions, and let {b_N}_{N≥1} be a sequence of positive real numbers satisfying the conditions in (9). Then the corresponding sequence of suitably centered and rescaled estimators satisfies a LDP with speed v_N = b_N²/(N r_N^{d−n}) and good rate function of quadratic form y ↦ y²/(2 C_Var(x)), where C_Var(x) is the quantity defined in (10).

Statistical properties and confidence intervals
In the previous section we stated large and moderate deviation principles for the kernel estimator of the mean density of random closed sets; these results allow us to derive useful statistical properties of such an estimator. Indeed, proceeding along the same lines as [12, Remark 2], we can show how an estimate of the rate of convergence of λ^{κ,N}_{Θn}(x) to λ_{Θn}(x) follows as a byproduct of Theorem 2, and that an immediate application of the Borel-Cantelli lemma then leads to a strong consistency result.

Proposition 4 (Convergence rate). Let Θ_n and κ be as in the Assumptions, and let Γ*_δ := inf_{y : |y − λ_{Θn}(x)| ≥ δ} J*_x(y), where J*_x has been defined in Theorem 2. Then, for any δ > 0 and any η ∈ (0, Γ*_δ), there exists N_0 such that

P(|λ^{κ,N}_{Θn}(x) − λ_{Θn}(x)| ≥ δ) ≤ exp(−(Γ*_δ − η) N r_N^{d−n}) for all N ≥ N_0.

Proof. It is known that, when the Gärtner-Ellis Theorem (see Theorem 1) applies, the rate function J*(y) uniquely vanishes at y = y_0, where y_0 := ∇J(0). Setting C_δ := {y : |y − y_0| ≥ δ} for any δ > 0, we have that inf_{y ∈ C_δ} J*(y) > 0, since J* is non-negative and uniquely vanishes at y_0. Therefore, as a consequence of the large deviation upper bound in (3) for the closed set C_δ, we have

limsup_{N→∞} (1/v_N) log P(Z_N ∈ C_δ) ≤ −inf_{y ∈ C_δ} J*(y).  (11)

By virtue of Theorem 2, the previous bound holds true for Z_N = λ^{κ,N}_{Θn}(x) and v_N = N r_N^{d−n}. Besides, using equations (5) and (29), it is easily seen that in our setup y_0 := J'_x(0) = λ_{Θn}(x). Hence, in view of these remarks and (11), one concludes that for all η with 0 < η < Γ*_δ there exists N_0 such that the claimed bound holds for every N ≥ N_0.

Corollary 5 (Strong consistency). Let Θ_n and κ be as in the Assumptions; then λ^{κ,N}_{Θn}(x) is a strongly consistent estimator of λ_{Θn}(x).

Proof. Let H := Γ*_δ − η, with Γ*_δ defined as in Proposition 4 and η ∈ (0, Γ*_δ). Then H is a positive quantity independent of N, and Σ_{N≥1} exp(−H N r_N^{d−n}) < +∞. Thus the result follows from Proposition 4 and a standard application of the Borel-Cantelli lemma.
At the end of Section 2.2, we mentioned that the term moderate deviations is used when, for a sequence {a_N} of positive numbers satisfying the conditions in (4), a LDP holds for suitable centered random variables with speed v_N = 1/a_N. If we choose w_N = N r_N^{d−n}, we may observe that by Theorem 3 we are in the case a_N = N r_N^{d−n}/b_N², with b_N satisfying the conditions in (9). Moreover, we also mention that the cases a_N = 1/w_N (so here b_N = N r_N^{d−n}) and a_N = 1 (so here b_N = (N r_N^{d−n})^{1/2}) should correspond to the convergence to zero and to the weak convergence to a centered normal distribution, respectively, of the associated centered random variables. This is in accordance with the corollary above and with the proposition below.
Proposition 6 (Asymptotic Normality). Let Θ_n and κ be as in the Assumptions. Then the sequence (N r_N^{d−n})^{1/2} (λ^{κ,N}_{Θn}(x) − E[λ^{κ,N}_{Θn}(x)]) converges weakly, as N → +∞, to the Normal distribution N(0, C_Var(x)), where C_Var(x) is the quantity defined in (10).
Proof. One can proceed as in the proof of Theorem 3 with b_N = (N r_N^{d−n})^{1/2}, noticing that the proof remains valid even if the first condition in (9) is violated. As a consequence, one is able to show the stated convergence, which is tantamount to saying that the sequence converges weakly to the normal distribution N(0, C_Var(x)) as N → +∞.
We conclude the investigation of the statistical properties of λ^{κ,N}_{Θn}(x) by providing asymptotic confidence intervals for λ_{Θn}(x), relying on Proposition 6. To this end we have to choose a specific bandwidth r_N, which is taken to be the optimal bandwidth determined in [10]. Here we recall some useful results in this direction.
We remind that the best choice of r_N should be the one which minimizes the mean square error (MSE), given by

MSE_N(x) := E[(λ^{κ,N}_{Θn}(x) − λ_{Θn}(x))²] = Var(λ^{κ,N}_{Θn}(x)) + (E[λ^{κ,N}_{Θn}(x)] − λ_{Θn}(x))².

The minimization of the MSE is a quite challenging problem, which cannot be solved in closed form even in the simplest case of kernel density estimators of random variables. Hence one looks for an r_N which minimizes the asymptotic mean square error (AMSE). For Θ_n and κ as in the Assumptions, the following asymptotic approximation of the variance may be deduced from the proof of Theorem 8 in [10]:

Var(λ^{κ,N}_{Θn}(x)) = C_Var(x)/(N r_N^{d−n}) + o(1/(N r_N^{d−n})),  (12)

where C_Var(x) is the quantity defined in (10). As far as the asymptotic approximation of the bias is concerned, further differentiability assumptions on f are required. To fix the notation (the same used in [10], for the reader's convenience), in the sequel α := (α_1, ..., α_d) will denote a multi-index of N_0^d; we will further define |α| := α_1 + ... + α_d;

besides, for all s ∈ K, we will put D^{(α)}(s) := disc(D^α_y f(·, s)).
From now on we assume that f(·, s) is at least twice differentiable with respect to its first argument, and that the following assumption is fulfilled for any |α| = 2: (A2bis) for any s ∈ K, H^n(D^{(α)}(s)) = 0 and D^α_y f(y, s) is locally bounded. An asymptotic approximation of the bias has been proved in [10, Theorem 8]:

E[λ^{κ,N}_{Θn}(x)] − λ_{Θn}(x) = C_Bias(x) r_N² + o(r_N²),  (13)

where C_Bias(x) depends on the second-order derivatives D^α_y f, |α| = 2. From (12) and (13) one obtains the asymptotically optimal bandwidth r^{o,AMSE}_N in (14), for H^d-a.e. x ∈ R^d, provided that C_Bias(x) ≠ 0. (For a discussion of the case C_Bias(x) = 0 we refer to [10].)

Proposition 7. Let Θ_n and κ be as in the Assumptions, and such that (A2bis) is fulfilled. If r_N is the asymptotic optimal bandwidth r^{o,AMSE}_N in (14), then the asymptotic normality result below holds.

Proof. First of all note the decomposition in (15); the first term in (15) converges weakly to the standard normal distribution as N → +∞, by Proposition 6. Let us then examine the non-random term in (15).

Proof. Thanks to Proposition 7, the claim follows directly.

The asymptotic confidence intervals derived in the present section are based on the assumption C_Bias(x) ≠ 0, which fails for stationary Boolean models. However, in such a situation the kernel estimator is unbiased (see [10]), and Proposition 6 immediately gives the following:

The previous proposition is the basic building block to determine asymptotic confidence intervals for stationary Boolean models as well. Indeed, proceeding along the same lines as in the proof of Corollary 8, an asymptotic confidence interval for λ_{Θn} of level α is obtained. Note that here r_N can be any bandwidth.
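The way an asymptotic normality statement such as Proposition 6 translates into a plug-in confidence interval can be sketched in a few lines. The helper below (the function name and the numerical inputs are ours, for illustration only) builds the two-sided interval λ̂ ± z_{1−α/2} (C_Var/(N r^{d−n}))^{1/2}, matching the variance scaling in (12); in the point-process case of Section 5.2 one may plug in C_Var(x) = ||κ||²₂ λ(x).

```python
from statistics import NormalDist

def asymptotic_ci(lam_hat, c_var_hat, N, r, d, n, alpha=0.05):
    """Plug-in asymptotic confidence interval of level 1 - alpha, based on
    the normal limit sqrt(N r^(d-n)) (lam_hat - lam) -> N(0, C_Var):
        lam_hat  +/-  z_{1-alpha/2} * sqrt(C_Var / (N r^(d-n)))."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half = z * (c_var_hat / (N * r ** (d - n))) ** 0.5
    return (lam_hat - half, lam_hat + half)

# illustrative numbers: n = 0, d = 2, a plug-in variance estimate of 0.35
lo, hi = asymptotic_ci(lam_hat=1.8, c_var_hat=0.35, N=500, r=0.2, d=2, n=0)
```

Note that this sketch ignores the bias term: it is adequate when C_Bias(x) = 0 (e.g., the stationary case), while with the AMSE-optimal bandwidth a bias correction as in Proposition 7 would be needed.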

Noteworthy examples
Here we discuss some relevant examples of Boolean models: a Boolean segment process, Poisson point processes and Matérn cluster processes.

A Boolean segment process
As a simple example of the applicability of the previous results, we discuss the Boolean segment process already introduced in [10]. Let n = 1 and assume that Θ_1 is an inhomogeneous Boolean model of segments in R² with random length L and uniform orientation, so that the mark space is K = R_+ × [0, 2π]. For all s = (ℓ, α) ∈ K, let Z(s) := {(u, v) ∈ R² : u = τ cos α, v = τ sin α, τ ∈ [0, ℓ]} be the segment with length ℓ ∈ R_+ and orientation α ∈ [0, 2π]. Denoting by P_L(dℓ) the probability law of the random length L, we assume that E[L³] < +∞. Finally, the segment process Θ_1 is driven by the marked Poisson process Φ in R² × K having intensity measure Λ(d(y, s)) = f(y) dy Q(ds), where f(y) = f(y_1, y_2) = y_1² + y_2² and Q(ds) = (1/(2π)) dα P_L(dℓ). We consider the kernel κ(z) = 1_{B_1(0)}(z)/π, which is not continuous; nevertheless, the theory developed here applies to this kernel thanks to [10, Remark 9]. More precisely, λ^{κ,N}_{Θ_1}(x) is computed explicitly in this setting, and in [10] the corresponding C_Var(x) and the asymptotic optimal bandwidth are derived. Hence Proposition 7 and Corollary 8 apply with the previous specifications of λ^{κ,N}_{Θ_1}(x), r_N and C_Var(x).
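The segment process above is easy to simulate and to feed into the estimator. In the sketch below (our own illustration: the exponential length law is a convenient choice satisfying E[L³] < +∞, the window and seed are arbitrary, and germs outside the window are neglected), the uniform kernel κ = 1_{B_1(0)}/π reduces the estimate at x — under our reading of Eq. (6) — to the sample average of H¹(Θ^{(i)} ∩ B_r(x))/(π r²).

```python
import math
import random

def sample_poisson(mu, rng):
    x, p = 0, math.exp(-mu)
    s, u = p, rng.random()
    while u > s:
        x += 1
        p *= mu / x
        s += p
    return x

def length_in_ball(p, q, c, r):
    """H^1 measure of the segment [p, q] inside the closed ball B_r(c)."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    fx, fy = p[0] - c[0], p[1] - c[1]
    A = dx * dx + dy * dy
    if A == 0.0:
        return 0.0
    B = 2.0 * (fx * dx + fy * dy)
    C = fx * fx + fy * fy - r * r
    disc = B * B - 4.0 * A * C
    if disc <= 0.0:
        return 0.0
    sq = math.sqrt(disc)
    t0 = max(0.0, (-B - sq) / (2.0 * A))
    t1 = min(1.0, (-B + sq) / (2.0 * A))
    return max(0.0, t1 - t0) * math.sqrt(A)

def sample_segment_process(window, f_max, f, rng):
    """Inhomogeneous Boolean segment process on window = (x_lo, x_hi, y_lo, y_hi):
    germs by thinning a homogeneous Poisson process of rate f_max against f,
    exponential lengths (illustrative choice for P_L), uniform orientations."""
    x_lo, x_hi, y_lo, y_hi = window
    area = (x_hi - x_lo) * (y_hi - y_lo)
    segments = []
    for _ in range(sample_poisson(f_max * area, rng)):
        g = (rng.uniform(x_lo, x_hi), rng.uniform(y_lo, y_hi))
        if rng.random() * f_max <= f(g):          # thinning step
            ell, alpha = rng.expovariate(1.0), rng.uniform(0.0, 2.0 * math.pi)
            segments.append((g, (g[0] + ell * math.cos(alpha),
                                 g[1] + ell * math.sin(alpha))))
    return segments

def lam_hat(samples, x, r):
    """Average of H^1(Theta^(i) ∩ B_r(x)) / (pi r^2) over the N samples."""
    return sum(length_in_ball(p, q, x, r)
               for seg_list in samples for (p, q) in seg_list) \
        / (len(samples) * math.pi * r * r)

rng = random.Random(1)
f = lambda y: y[0] ** 2 + y[1] ** 2
samples = [sample_segment_process((-1.0, 3.0, -1.0, 3.0), 18.0, f, rng)
           for _ in range(200)]
estimate = lam_hat(samples, x=(1.0, 1.0), r=0.3)
```

The geometric helper `length_in_ball` clips a segment to a disk by solving the quadratic |p + t(q − p) − c|² = r² in t ∈ [0, 1].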

Poisson point processes
Let Ψ be a Poisson point process in R^d with a continuous intensity λ_Ψ. We recall that Ψ may be seen as a particular Boolean model with Hausdorff dimension n = 0 and mean density λ_Ψ, by choosing K = R^d as mark space, Z(s) = {s} as trivial typical grain, and Λ(d(y, s)) := λ_Ψ(y) dy δ_0(s) ds. As expected, λ_{Θ_0} coincides with λ_Ψ, and λ^{κ,N}_{Θ_0}(x) ≡ λ^{κ,N}_Ψ(x) coincides with the classical kernel estimator of λ_Ψ(x) defined by

λ^{κ,N}_Ψ(x) := (1/(N r_N^d)) Σ_{i=1}^N Σ_{ξ ∈ Ψ^{(i)}} κ((x − ξ)/r_N).

In particular, by observing that π_y(x − Z(s)) = {0}, we get an explicit expression for the rate function. Hence we can specialize the large and moderate deviation principles for λ^{κ,N}_Ψ(x) by a direct application of Theorem 2 and Theorem 3, respectively: the sequence {λ^{κ,N}_Ψ(x)}_{N≥1} satisfies a LDP with speed v_N = N r_N^d and good rate function J*_x given in (16); moreover, a MDP holds whenever {b_N}_{N≥1} is a sequence of positive real numbers satisfying the conditions in (9), with the quadratic rate function in (17). Finally, as a direct consequence of Proposition 6, it follows that the sequence (N r_N^d)^{1/2} (λ^{κ,N}_Ψ(x) − E[λ^{κ,N}_Ψ(x)]) converges weakly, as N → +∞, to the normal distribution N(0, ||κ||²₂ λ_Ψ(x)).
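The n = 0 specialization is simple to try numerically. The following sketch (our illustration: the intensity λ(y) = 4y on [0, 1], the uniform kernel and all parameter values are arbitrary choices) simulates i.i.d. copies of an inhomogeneous Poisson process on the line by thinning and applies the estimator in its classical KDE form, whose expected value at x = 0.5 is the local average of λ, here exactly λ(0.5) = 2.

```python
import math
import random

def sample_poisson(mu, rng):
    x, p = 0, math.exp(-mu)
    s, u = p, rng.random()
    while u > s:
        x += 1
        p *= mu / x
        s += p
    return x

def sample_inhom_ppp(lam, lam_max, a, b, rng):
    """Inhomogeneous Poisson process on [a, b] by thinning a homogeneous one."""
    pts = [rng.uniform(a, b)
           for _ in range(sample_poisson(lam_max * (b - a), rng))]
    return [y for y in pts if rng.random() * lam_max <= lam(y)]

def intensity_kde(samples, x, r):
    """Kernel-type estimator for n = 0 (d = 1), uniform kernel
    kappa(u) = 1_{[-1,1]}(u) / 2:
        (1/(N r)) * sum_i sum_{xi in Psi^(i)} kappa((x - xi)/r)."""
    N = len(samples)
    return sum(0.5 for pts in samples for y in pts if abs(x - y) <= r) / (N * r)

rng = random.Random(7)
lam = lambda y: 4.0 * y                  # illustrative intensity on [0, 1]
samples = [sample_inhom_ppp(lam, 4.0, 0.0, 1.0, rng) for _ in range(2000)]
est = intensity_kde(samples, x=0.5, r=0.1)   # target: lam(0.5) = 2
```

Since λ is linear around x = 0.5, the local averaging introduces no bias here, and the Monte Carlo fluctuation of `est` is of order (C_Var/(N r))^{1/2} ≈ 0.07 for these parameters.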

Matérn cluster processes
Clustering is a fundamental operation on point processes, well known in stochastic geometry, which allows one to construct new point processes (see [23] for a more exhaustive treatment). The clustering operation consists in replacing each point x of a given point process Φ_p, called the parent point process, by a cluster N_x of points, called daughter points. Each cluster N_x is itself a point process, assumed to have a finite mean number of points. The point process given by the union of all the clusters N_x is called a cluster point process.
Let us assume that the parent point process Φ_p is a homogeneous Poisson point process with intensity λ_p, and that the clusters are of the form N_{x_i} = N_i + x_i for each x_i ∈ Φ_p, where the sequence {N_i}_i is independent of Φ_p, and independent and identically distributed as N_0 (the representative cluster, centered at 0). Assuming that the number of points of N_0 is distributed according to a Poisson random variable with parameter n_c, and that these points are independently and uniformly distributed in the ball B_R(0), where R is a further parameter of the model, the resulting cluster point process is called a Matérn cluster process. It follows that Φ has constant intensity λ_Φ = λ_p n_c, and may be regarded as a Boolean model Θ_0 with dimension n = 0, underlying Poisson point process Φ_p, and typical grain Z_0 := N_0 given by a Poisson point process restricted to B_R(0), whose intensity equals λ_{N_0}(y) = n_c 1_{B_R(0)}(y)/(b_d R^d). The resulting Boolean model Θ_0 ≡ Φ is driven by a marked Poisson point process in R^d × S having intensity measure Λ(d(ξ, η)) = λ_p dξ Q(dη), where the mark space coincides with K := S, the space of all sequences of points in R^d, and Q is the probability distribution of N_0. Note that assumptions (A1) and (A2) are trivially fulfilled; as a consequence, all the previous results on λ^{κ,N}_{Θ_0}(x) ≡ λ^{κ,N}_Φ(x) hold in this context. A LDP follows from Theorem 2; more specifically, one can observe that the general expression for J*_x appearing in the statement of that theorem simplifies in the context of Matérn cluster processes. Hence one can see that the same large deviation principle (LDP) stated in Section 5.2 for a Poisson point process Ψ holds for the Matérn cluster process Φ as well, replacing the intensity λ_Ψ with λ_Φ in the expression (16) for the rate function J*_x.
In a similar vein one can prove the validity of the MDP stated in Section 5.2 for the Matérn cluster process Φ, where again the intensity λ Ψ is replaced with λ Φ in (17).
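The Matérn cluster construction described above is straightforward to simulate; the following sketch (in d = 2, with window, parameter values and seed chosen purely for illustration) draws parents on an enlarged window so that clusters centered just outside the unit square still contribute daughters inside it, and checks empirically that the mean number of points per unit area is λ_p n_c.

```python
import math
import random

def sample_poisson(mu, rng):
    x, p = 0, math.exp(-mu)
    s, u = p, rng.random()
    while u > s:
        x += 1
        p *= mu / x
        s += p
    return x

def sample_matern_cluster(lam_p, n_c, R, rng):
    """Matérn cluster process observed on the unit square [0, 1]^2 (d = 2).

    Parents: homogeneous Poisson of intensity lam_p on [-R, 1+R]^2; each
    parent gets a Poisson(n_c) number of daughters, placed independently and
    uniformly in the ball B_R(parent).  Returns daughters in [0, 1]^2."""
    side = 1.0 + 2.0 * R
    daughters = []
    for _ in range(sample_poisson(lam_p * side * side, rng)):
        px, py = rng.uniform(-R, 1.0 + R), rng.uniform(-R, 1.0 + R)
        for _ in range(sample_poisson(n_c, rng)):
            # uniform point in the disk: radius R * sqrt(U), uniform angle
            rad, ang = R * math.sqrt(rng.random()), rng.uniform(0.0, 2.0 * math.pi)
            dx, dy = px + rad * math.cos(ang), py + rad * math.sin(ang)
            if 0.0 <= dx <= 1.0 and 0.0 <= dy <= 1.0:
                daughters.append((dx, dy))
    return daughters

# empirical check of the intensity lam_p * n_c (here 5 * 4 = 20)
rng = random.Random(3)
counts = [len(sample_matern_cluster(5.0, 4.0, 0.1, rng)) for _ in range(200)]
mean_count = sum(counts) / len(counts)
```

The `R * sqrt(U)` radius transform is what makes the daughters uniform in the disk rather than concentrated near its center.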

Discussion and concluding remarks
We have proved large and moderate deviation principles for kernel-type estimators of the mean density of Boolean models. Thanks to these results, we have been able to derive the consistency of the estimator, as well as asymptotic confidence intervals. Theorems 2 and 3 are connected with classical results concerning the kernel-type estimator of the density function of an absolutely continuous random variable, due to [31,35]. Here we want to pinpoint the connection with the classical literature in view of future developments. More specifically, let X be a random variable taking values in R^d with probability density function f_X, and let X_1, ..., X_N be a random sample for X. The kernel density estimator f^N_X(x) of f_X(x) at a point x ∈ R^d is traditionally [27,32,41] defined as

f^N_X(x) := (1/(N r_N^d)) Σ_{i=1}^N κ((x − X_i)/r_N).

The scaling parameter r_N, known as the bandwidth, determines the smoothness of the estimator, and it has to be chosen such that r_N → 0 and N r_N^d → ∞, in order to obtain an asymptotically unbiased and weakly consistent estimator f^N_X(x). The kernel density estimator λ^{κ,N}_{Θn}(x) of λ_{Θn}(x) defined in Eq. (6) may be seen as the natural extension of f^N_X(x) to the case of very general random geometric objects in R^d of Hausdorff dimension n > 0, i.e. not necessarily Boolean models. See also [10, Section 3.3.1].
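The classical estimator just recalled can be sketched in a few lines; in the example below (our own illustration: the Epanechnikov kernel, the standard normal sample and the fixed bandwidth are arbitrary choices, not prescriptions from the paper) the estimate at x = 0 is compared against the true density value 1/√(2π) ≈ 0.3989.

```python
import math
import random

def kde(sample, x, r):
    """Classical kernel density estimator (d = 1)
    f^N_X(x) = (1/(N r)) sum_i kappa((x - X_i)/r),
    with the Epanechnikov kernel kappa(u) = 0.75 (1 - u^2) 1_{|u| <= 1}."""
    N = len(sample)
    return sum(0.75 * (1.0 - u * u)
               for xi in sample
               for u in [(x - xi) / r] if abs(u) <= 1.0) / (N * r)

rng = random.Random(11)
sample = [rng.gauss(0.0, 1.0) for _ in range(5000)]
est = kde(sample, x=0.0, r=0.3)   # true density value: 1/sqrt(2*pi)
```

With N = 5000 and r = 0.3, the statistical fluctuation (of order (f(x) R(κ)/(N r))^{1/2} ≈ 0.013) dominates the smoothing bias, so the estimate lands close to the target.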
Large and moderate deviation principles for kernel density estimators of f_X have been investigated in [31,35] with different techniques; in particular, in [31] the author establishes pointwise, as well as uniform, moderate and large deviation principles for the sequence f^N_X, even for more general kernel functions κ. We recall here the pointwise results for large and moderate deviations given in [31, Proposition 3.1] and [31, Proposition 2.1], respectively, specialized to our notation and assumptions on κ: the sequence {f^N_X(x)}_{N≥1} satisfies a LDP with speed v_N = N r_N^d and rate function given in (18); moreover, for suitable sequences {b_N}_{N≥1} of positive real numbers, a MDP holds with rate function given in (19). If Theorems 2 and 3 were true for a general germ-grain model (not only for Boolean models), then the results concerning random variables just recalled would follow as a particular case. Indeed, a random variable X ≡ Θ_0 can be seen as a trivial germ-grain process driven by the marked point process Φ = {(X, s)} in R^d with mark space K = R^d, consisting of one point (X) only, with grain Z(s) := s and intensity measure Λ(d(y, s)) = f(y) dy δ_0(s) ds. With these choices, equation (5) implies that λ_{Θ_0}(x) = f(x), i.e. the mean density of X is precisely its probability density function, and the expressions in (18) and (19) follow (formally) by replacing Λ(d(y, s)) = f(y) dy δ_0(s) ds and Θ_n = X in Theorem 2 and Theorem 3, respectively, in an analogous way as we did in Section 5.2. Note that the further term t f_X(x) appearing in (18) is due to having considered now the centered random variables.
Hence we may ask whether the theorems obtained for Boolean models extend to more general random closed sets, e.g. germ-grain models. In such a case, as just observed, the results of [31,35] would follow as particular cases of a more general theory. Otherwise, if the extension is not possible, the independence properties of the underlying Poisson point process would appear to be essential in obtaining such expressions. This problem remains open and requires techniques different from the ones employed here, which rely mainly on the availability of the Laplace functional of a Poisson point process.
Finally, it is worth underlining that the theoretical results proved in this paper and in [12] may be useful in many applications, for example to determine confidence intervals for the estimators. Future work in this direction will focus on simulation studies of the kernel-type estimator, in comparison with other estimators such as the "Minkowski content"-based estimator mentioned in the Introduction.

A.1. Proof of Theorem 2
Before proving Theorem 2, we provide two technical lemmas. For the sake of simplifying the notation, we introduce the function h(ξ, s) in (20), and we shall write h_N(ξ, s) when r = r_N in that definition.

Proof. First of all we recall (see [34, pg. 28]) that if Ψ is a Poisson point process on X with intensity measure μ, then, for any measurable function g : X → R with suitable integrability properties,

E[exp(ϑ ∫_X g dΨ)] = exp(∫_X (e^{ϑ g(x)} − 1) μ(dx))

for any complex number ϑ for which the right-hand side is well defined. By observing that min{h(ξ, s), 1} ≤ h(ξ, s), and that 1/r^n ≤ 1/r^d if r ≤ 1, we obtain the required integrability. We recall that κ is a kernel with supp(κ) ⊆ B_R(0) and that Z(s) ⊆ Ξ(s), and we notice that a corresponding bound holds whenever ỹ ∈ Z(s). Thus we may write the claimed chain of inequalities. Finally, Lemma 3 in [44] guarantees that the event that different grains of Θ_n overlap in a subset of R^d of positive H^n-measure has null probability; therefore we may conclude, and this is the assertion.
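The Poisson Laplace-functional identity E[exp(∫ g dΨ)] = exp(∫ (e^g − 1) dμ) invoked above is easy to check numerically. The sketch below (our own sanity check, with ϑ = 1, a homogeneous intensity on [0, 1] and the bounded test function g(x) = −x chosen for convenience; the formal statement in [34] includes integrability conditions omitted here) compares a Monte Carlo estimate of the left-hand side with the closed-form right-hand side.

```python
import math
import random

def sample_poisson(mu, rng):
    x, p = 0, math.exp(-mu)
    s, u = p, rng.random()
    while u > s:
        x += 1
        p *= mu / x
        s += p
    return x

rng = random.Random(5)
rate, reps = 3.0, 20000
g = lambda x: -x                                  # bounded test function on [0, 1]

acc = 0.0
for _ in range(reps):
    pts = [rng.random() for _ in range(sample_poisson(rate, rng))]
    acc += math.exp(sum(g(x) for x in pts))       # exp of the integral of g dPsi
mc = acc / reps

# RHS: exp( rate * int_0^1 (e^{g(x)} - 1) dx ) = exp(-3/e) for g(x) = -x
rhs = math.exp(rate * ((1.0 - math.exp(-1.0)) - 1.0))
```

The Monte Carlo mean agrees with the analytic value up to fluctuations of order reps^{-1/2}.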
Lemma 11. Let $\Theta_n$ and $\kappa$ be as in the Assumptions. If $r < \min\{1, 1/(2R)\}$, the following bound holds for any $s \in K$, $t \in \mathbb{R}$, $w \in \mathbb{R}^d$, and $\mathcal{H}^n$-a.e. $y \in x - Z(s)$: … where … Proof. First of all consider the case $t \le 0$. … The case $t > 0$ is less trivial, and we employ the Taylor series expansion of the exponential. … Finally, the integrability condition (25) easily follows: … Proof of Theorem 2. The proof relies on the Gärtner–Ellis theorem. First of all we will show that …; then we observe that $J$ is a smooth function defined on $\mathbb{R}$, hence satisfying the assumptions of the Gärtner–Ellis theorem. Since $(\Theta_n^{(i)})_{i \in \mathbb{N}}$ is a sequence of i.i.d. random sets, for $N$ sufficiently large so that $r_N < 1$ … We now multiply and divide the above integrand by …; then, by suitable changes of variable, the following chain of equalities holds: … Denoting by $D_f(s)$ the set of discontinuity points of $f(\cdot, s)$ for any $s \in K$, assumption (A2) implies $\mathcal{H}^n(D_f(s)) = 0$; therefore we can see that … for any $s \in K$, $w \in \mathbb{R}^d$, and $\mathcal{H}^n$-a.e. $y \in x - Z(s)$.
Thus (29) follows by a simple application of the dominated convergence theorem, whose validity is guaranteed by Lemma 11.
To conclude the proof we observe that $J$ satisfies the assumptions of Theorem 1. More precisely, as a byproduct of the application of the dominated convergence theorem, $J(t) < +\infty$ for any $t \in \mathbb{R}$. Finally we show that $J$ is differentiable on $\mathbb{R}$, with … for any $t_0 \in \mathbb{R}$. In order to prove this, fix $t_0 \in \mathbb{R}$ and $\delta > 0$ sufficiently small; following [7, Theorem 16.8], we need to show that the integrand … is bounded from above, for any $t \in (t_0 - \delta, t_0 + \delta)$, by an integrable function. To this end, the definition of approximate tangent space and arguments similar to those in (27) give …; hence the claim follows.
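For the reader's convenience, we recall the one-dimensional form of the Gärtner–Ellis theorem on which the proof above rests (assuming Theorem 1 is this classical statement):

```latex
% Gärtner--Ellis theorem, one-dimensional form. Let (Z_N)_N be real random
% variables and (v_N)_N positive speeds with v_N \to \infty. Assume that
J(t) := \lim_{N \to \infty} \frac{1}{v_N}
        \log \mathbb{E}\big[e^{\,v_N\, t\, Z_N}\big]
% exists and is finite for every t \in \mathbb{R}, and that J is
% differentiable on \mathbb{R}. Then (Z_N)_N satisfies a LDP with speed v_N
% and good rate function given by the Legendre transform
I(x) = \sup_{t \in \mathbb{R}} \{\, t x - J(t) \,\}.
```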

A.2. Proof of Theorem 3
Before proving Theorem 3 we need some useful lemmas.
where the summation is extended over all positive integers $q_1, \ldots, q_n$ satisfying $q_1 + \cdots + q_n = n$. It is worth noticing that the summation … is known as the Bell number, which equals the number of partitions of a set of $k$ objects into nonempty disjoint subsets (see [21, pg. 292]). By [21, pg. 97] we have the following representation: … The Stirling numbers of the second kind satisfy a useful recurrence relation: … Proof. With the notation introduced in (20), the same argument as at the end of the proof of Lemma 10, together with standard combinatorial arguments, shows that for any $k \ge 3$: … where the sum runs over all vectors $(q_1, \ldots, q_i)$ of positive integers such that $q_1 + \cdots + q_i = k$. Moreover we have used the fact that the marked point process is simple.
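The representation of the Bell numbers and the recurrence for the Stirling numbers invoked above are presumably the following standard identities, stated here for completeness (Dobinski's formula for the representation):

```latex
% Dobinski's formula for the Bell numbers:
B_k = \frac{1}{e} \sum_{m \ge 0} \frac{m^k}{m!},
% and the classical recurrence for the Stirling numbers of the second kind:
S(k, i) = i\, S(k-1, i) + S(k-1, i-1),
\qquad \text{with } B_k = \sum_{i=1}^{k} S(k, i).
```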
Since there are $i!$ possible permutations of the points $\xi_1, \ldots, \xi_i$, we can write … where $\nu_{[i]}$ is the $i$-th factorial moment measure of $\Phi$ (e.g., see [23]) and $\tau$ is the function defined in (31). By the end of the proof of Lemma 12 we know that … whenever $r$ is sufficiently small, i.e. $r \le \min\{1, 1/(2R)\}$; therefore … and so … Recalling the definition of the Bell numbers $B_k$ (see (35)), we have … Now we consider the summation in (36) for any $t_0 > 0$. … Finally, the previous bound for the expectation yields
$$ r^{d-n}\, \frac{1}{e} \sum_{m \ge 0} \frac{e^{t_0 C m}}{m!} = r^{d-n} \exp\{e^{t_0 C} - 1\}, $$
and the right-hand side turns out to be $O(r^{d-n})$, which implies the assertion.
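The step combining the Bell-number bound with Dobinski's formula $B_k = e^{-1}\sum_{m \ge 0} m^k/m!$ (assuming this is the representation invoked) can be sketched as an exchange of summations, legitimate since all terms are nonnegative:

```latex
\sum_{k \ge 0} \frac{(t_0 C)^k}{k!}\, B_k
  = \frac{1}{e} \sum_{k \ge 0} \frac{(t_0 C)^k}{k!} \sum_{m \ge 0} \frac{m^k}{m!}
  = \frac{1}{e} \sum_{m \ge 0} \frac{1}{m!} \sum_{k \ge 0} \frac{(t_0 C m)^k}{k!}
  = \frac{1}{e} \sum_{m \ge 0} \frac{e^{t_0 C m}}{m!}
  = \exp\{e^{t_0 C} - 1\}.
```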
Finally we recall that the discrete version of the Hölder inequality can be written as follows:
$$ \sum_{i=1}^n x_i y_i \le \Big(\sum_{i=1}^n x_i^p\Big)^{1/p} \Big(\sum_{i=1}^n y_i^q\Big)^{1/q}, \tag{37} $$
where $x_i, y_i \ge 0$ for any $i = 1, \ldots, n$, and $p, q > 1$ are such that $1/p + 1/q = 1$. By specializing (37) with $n = 2$, $y_i = 1$ for $i = 1, 2$, $p = k$ and $q = k/(k-1)$, it directly follows that
$$ (x_1 + x_2)^k \le 2^{k-1}\big(x_1^k + x_2^k\big) $$
for any $x_1, x_2 > 0$ and any integer $k \ge 1$ (the case $k = 1$ being trivial).
We are now ready to prove the theorem.

Proof of Theorem 3. Let us define
and $J_x(t) := \lim \ldots$, after proving that the limit exists and is finite for any $t \in \mathbb{R}$; then the good rate function will turn out to be … as a direct application of Theorem 1.
First of all observe that, for any $t \in \mathbb{R}$, … In order to bound the term $R(N)$ appearing in the previous equation, we note that for any real-valued random variable $X$ the following inequality holds: … where the last inequality follows from a standard application of the Hölder inequality, namely $(\mathbb{E}|X|)^k \le \mathbb{E}|X|^k$. Hence, if $X = \int_{\Theta_n} \kappa\big(\frac{x-y}{r_N}\big)\,\mathcal{H}^n(dy)$, the remainder term $R(N)$ in (40) … By assumption $\{\Theta_n^{(i)}\}_{i \in \mathbb{N}}$ is a sequence of i.i.d. random closed sets distributed as $\Theta_n$; therefore

$$ \mathrm{Var}\Big(\int_{\Theta_n} \kappa\Big(\frac{x-y}{r_N}\Big)\,\mathcal{H}^n(dy)\Big) = N r_N^{2d}\, \mathrm{Var}\big(\widehat\lambda^{\kappa,N}_{\Theta_n}(x)\big) \overset{(12)}{=} N r_N^{2d} \cdots $$
As a consequence the rate function is given by … and the assertion follows.
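The moment bound used above to control the remainder $R(N)$ can be sketched as follows (a sketch; the exact display in the proof may differ in form). For a nonnegative random variable $X$, a Taylor expansion of the exponential together with the inequality $(\mathbb{E}|X|)^k \le \mathbb{E}|X|^k$ (Hölder, or equivalently Jensen applied to the convex map $z \mapsto z^k$) gives:

```latex
\Big| \mathbb{E}\big[e^{tX}\big] - 1 - t\,\mathbb{E}[X] \Big|
  \le \sum_{k \ge 2} \frac{|t|^k\, \mathbb{E}\big[X^k\big]}{k!},
\qquad
\big(\mathbb{E}[X]\big)^k \le \mathbb{E}\big[X^k\big] \quad \text{for every integer } k \ge 1.
```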