Asymptotic properties of the maximum pseudo-likelihood estimator for stationary Gibbs point processes including the Lennard-Jones model

This paper presents asymptotic properties of the maximum pseudo-likelihood estimator of a vector $\Vect{\theta}$ parameterizing a stationary Gibbs point process. Sufficient conditions, expressed in terms of the local energy function defining a Gibbs point process, to establish strong consistency and asymptotic normality results of this estimator depending on a single realization, are presented.These results are general enough to no longer require the local stability and the linearity in terms of the parameters of the local energy function. We consider characteristic examples of such models, the Lennard-Jones and the finite range Lennard-Jones models. We show that the different assumptions ensuring the consistency are satisfied for both models whereas the assumptions ensuring the asymptotic normality are fulfilled only for the finite range Lennard-Jones model.


Introduction
These last years, much attention has been paid to spatial point pattern data, and especially to models and methodologies for fitting them, see Møller (2008) for a recent overview of this topic and Daley and Vere-Jones (1988), Stoyan et al. (1987) Møller and Waagepetersen (2003) or Illian et al. (2008) for more general information. For spatial point pattern data, the reference model is the Poisson point process modelling a random configuration of points with no interaction between points. In particular, this leads to the independence of any two random sub-configurations lying in two non-overlapping domains. A way to introduce dependence is to consider the class of Gibbs models. In a bounded domain, a Gibbs point process is defined through its probability measure having a Radon-Nykodym derivative with respect to a Poisson point process measure proportional to e −V (ϕ) where V (ϕ) corresponds to the energy function (i.e. a cost function expressed in terms of interactions) of the configuration of points ϕ. The definition of Gibbs models in R d is essential when dealing with asymptotic properties of estimators based on a point process observed in a domain aimed at converging towards R d . The extension of this definition is not so straightforward. The probability measure of a Gibbs point process in R d has to be defined by specifying its conditional density (indirectly expressed in terms of the energy function V (ϕ)), see e.g. Preston (1976) or Section 2 for more details.
The class of Gibbs point processes is extremely rich. The energy function can penalize points, pairs or triplets of points (see e.g. Baddeley and Turner (2000)). More sophisticated models can also be obtained by considering interactions based on the Delaunay or the k−nearest neighbor graphs (Bertin et al. (1999b,c)), Voronoï tessellations (Dereudre and Lavancier (2009)) or random sets (Kendall et al. (1999), Dereudre (2009)).
Following the definition of a parametric Gibbs point process, the natural question of efficiently estimating the parameters arises. Many proposals have tried to estimate the energy function from an available point pattern data. The most well-known method is the use of the likelihood function, see e.g. Møller and Waagepetersen (2003) and the references therein. The main drawback of this approach is that the likelihood function contains an unknown scaling factor whose value depends on the parameters. This parametric normalizing constant is difficult to calculate from a practical point of view. From a theoretical one, it also makes asymptotic results more complicated to obtain. An alternative approach relies on the use of the pseudo-likelihood function. The idea originated from Besag (1974) in the study of lattice processes. Besag et al. (1982) further considered this method for pairwise interaction point processes, and Jensen and Møller (1991) extended the definition of the pseudo-likelihood function to the general class of marked Gibbs point processes. The construction of the pseudo-likelihood function is based on the conditional densities which spare the computation of the scaling factor.
Our paper deals with asymptotic properties of the maximum pseudo-likelihood estimator. In order to underline our theoretical improvements, let us discuss the two main different papers discussing this topic: • In Billiot et al. (2008), we obtain consistency and asymptotic normality for exponential family models of Gibbs point processes, that is, on models with energy functions that are linear in terms of the parameters. Moreover, we concentrate on models such that the local energy function is local and stable. The locality of the local energy expresses that the energy to insert a point x into ϕ, that is, V (x|ϕ) = V (ϕ ∪ x) − V (ϕ), depends only on the points of ϕ falling into some ball with a fixed radius whereas the stability of the local energy (property referred as the local stability) asserts that V (x|ϕ) is bounded from below by a finite negative constant. The paper Billiot et al. (2008) extends several papers (Jensen and Møller (1991), Jensen and Künsch (1994)) and includes a large class of examples of practical interest: area-interaction point process, Multi-Strauss marked point process based on the complete graph or the k-nearest-neighbors graph, or the Geyer's triplet point process to name a few.
• Another work has been undertaken by Mase. The consistency for non necessarily stable local energy functions (actually for superstable and lower regular ones introduced by Ruelle (1970)) is obtained in Mase (1995) for specific models with only two parameters -the chemical potential and the inverse temperature-which can be viewed as particular exponential family models. Mase (2000) extended his work to the context of marked point processes and provided asymptotic normality by adding the assumption of finite range.
Based on this literature, the main goal of this paper is to derive asymptotic properties similar to the ones presented before (consistency and asymptotic normality) but in a more general framework. We provide asymptotic results for general Gibbs point processes with non (necessarily) linear and non (necessarily) stable local energy functions. The characteristic example we have in mind is the Lennard-Jones model. This model, from statistical physics, is a stationary pairwise interaction Gibbs point process where the local energy to insert a point x into a configuration ϕ is parameterized as follows: for θ = (θ 1 , θ 2 , θ 3 ) ∈ R 3 with θ 2 , θ 3 > 0 Let us notice that Mase (1995) could only propose the estimation of θ 1 and θ 2 with known θ 3 . The Lennard-Jones model is of great interest from several points of view. From a physical point of view, this model arises when theoretically modelling a pair of neutral atoms or molecules subject to two distinct forces in the limit of large separation and small separation: an attractive force at long ranges (van der Waals force, or dispersion force) and a repulsive force at short ranges (the result of overlapping electron orbitals, referred to as a Pauli repulsion from the Pauli exclusion principle). In this literature, the parameters θ 2 and θ 3 are often referred to as the depth potential and the (finite) distance at which the interparticle potential is zero. From a probabilistic point of view, this model constitutes the main example of superstable, regular and lower regular energies studied in Ruelle (1970) where the author proves the existence of ergodic measures for such models. Finally, from a statistical point of view, this model has been considered by several authors, see e.g. Ogata and Tanemura (1981), Goulard et al. (1996) for fitting spatial point patterns arising in forestry. In particular, let us note that, in Goulard et al. (1996), the model is fitted by using the maximum pseudo-likelihood method. As the authors do not endeavour to justify the theoretical performances of the procedure, the result proposed in Section 4 of this paper fills this gap.
The rest of the paper is organized as follows. Section 2 introduces some background and notation on Gibbs point processes (general definitions, examples). The maximum pseudo-likelihood method and asymptotic results of the derived estimator are proposed in Section 3. For general Gibbs point processes, sufficient conditions, expressed in terms of the local energy function to establish strong consistency and asymptotic normality results of this estimator are presented. While no general condition on the model is assumed to obtain the consistency, the characteristic finite range of the local energy function is required to establish the asymptotic normality. For the sake of simplicity, Section 3 (and the resulting proofs) would concentrate on non-marked Gibbs point processes. However, as we have shown in our seminal paper Billiot et al. (2008), no real mathematical difficulty occurs with the introduction of marks. Section 4 focuses on the Lennard-Jones model. We show that the general assumptions described in Section 3 are fulfilled for this model. Proofs have been postponed until Section 5.

Background and notation
For the sake of simplicity, we consider Gibbs point processes in dimension d = 2.

General notation, configuration space
Subregions of R 2 will typically be denoted by Λ or ∆ and will always be assumed to be Borel with positive Lebesgue measure. We write Λ ⋐ R 2 if Λ is bounded. Λ c denotes the complementary set of Λ inside R 2 . The notation |.| will be used without ambiguity for different kind of objects. For a countable set J , |J | represents the number of elements belonging to J ; For Λ ⋐ R 2 , |Λ| is the volume of Λ; For a vector x ∈ R 2 , |x| corresponds to its uniform norm while x is simply its euclidean norm. For all x ∈ R 2 , ρ > 0 and i ∈ Z 2 , let B(x, ρ) := {y ∈ R 2 , |y − x| < ρ} and (i, ρ) := B(i, ρ) ∩ Z 2 .
A configuration is a subset ϕ of R 2 which is locally finite in that ϕ Λ := ϕ ∩ Λ has finite cardinality N Λ (ϕ) := |ϕ Λ | for all Λ ⋐ R 2 . The space Ω of all configurations is equipped with the σ-algebra F that is generated by the counting variables N Λ (ϕ) with Λ ⋐ R 2 . Finally, let T = (τ x ) x∈R 2 be the shift group, where τ x : Ω → Ω is the translation by the vector −x ∈ R 2 .

Gibbs point processes
Our results will be expressed for general stationary Gibbs point processes. Since we are interested in asymptotic properties, we have to consider these point processes acting on the infinite volume R 2 . Let us briefly recall their definition.
A point process Φ is a Ω-valued random variable, with probability distribution P on (Ω, F ). The most prominent point process is the (homogeneous) Poisson process with intensity z > 0. Recall that its probability measure π z is the unique probability measure on (Ω, F ) such that the following holds for Λ ⋐ R 2 : (i) N Λ is Poisson distributed with parameter z|Λ|, and (ii) conditionally to N Λ = n, the n points in Λ are independent with uniform distribution on Λ, for each interger n ≥ 1. For Λ ⋐ R 2 , let us denote by π z Λ the marginal probability measure in Λ of the Poisson process with intensity z.
In this article, we focus on stationary point processes on R 2 , i.e. with T -invariant probability measure. For any Λ ⋐ R 2 , we therefore consider V Λ (.; θ)) to be T -invariant, i.e. V Λ (τ x ϕ; θ) = V Λ (ϕ; θ) for any x ∈ R 2 . Furthermore, we assume that the family of energies is hereditary, which means that for any Λ ⋐ R 2 , ϕ ∈ Ω, and x ∈ Λ: In such a context, a Gibbs measure is usually defined as follows (see Preston (1976)).
Definition 1 A probability measure P θ on Ω is a Gibbs measure for the family of energies (V Λ (.; θ)) Λ⋐R 2 if for every Λ ⋐ R 2 , for P θ -almost every outside configuration ϕ Λ c , the law of P θ given ϕ Λ c admits the following density with respect to π z Λ : Without loss of generality, the intensity of the Poisson process, z is fixed to 1 and we simply write π and π Λ in place of π 1 and π 1 Λ . In the previous definition, we implicitly assume the consistency of the family (f Λ (.|.; θ)) Λ⋐R 2 : for any ∆ ⊂ Λ ⋐ R 2 .
A sufficient condition to directly fulfill this basic ingredient is to assume the compatibility of the family (V Λ (.)) Λ⋐R 2 : for every ∆ ⊂ Λ ⋐ R 2 , the function ϕ → V Λ (ϕ; θ) − V ∆ (ϕ; θ) from Ω into R ∪ {+∞} is measurable and only depends on ϕ Λ c . The existence of a Gibbs measure on Ω which satisfies these conditional specifications is a difficult issue. We refer the interested reader to Ruelle (1969); Preston (1976); Bertin et al. (1999a); Dereudre (2005); Dereudre et al. (2010) for the technical and mathematical development of the existence problem. The minimal assumption of our paper is then: [Mod-E]: Our data consist in the realization of a point process Φ with Gibbs measure P θ ⋆ , where θ ⋆ ∈Θ, Θ is a compact subset of R p and, for any θ ∈ Θ, there exists a stationary Gibbs measure P θ for the family (V Λ (.; θ)) Λ⋐R 2 .
In the rest of this paper, the reader has mainly to keep in mind the concept of local energy defined as the energy required to insert a point x into the configuration ϕ and expressed for any From the compatibility of the family of energies, the local energy does not depend on Λ. Our asymptotic normality result will require the following locality property assumption.

Example : Lennard-Jones models
Let us present the main example studied in this paper. We call LJ-type model the stationary pairwise interaction point process defined for some D ∈]0, +∞] by As a direct consequence, the local energy function is expressed as where θ = (θ 1 , θ 2 , θ 3 ) ∈ R × (R + ) 2 . The cases D = +∞ and D < +∞ respectively correpond to the Lennard-Jones model (briefly presented in the introduction) and the Lennard-Jones model with finite range. Ruelle (1970) has proved the existence of an ergodic measure for superstable, regular and lower regular potentials. The Lennard-Jones model (including the finite range one) is known to be the characteristic example of such a family of models for which Ruelle managed to prove the existence of ergodic measures for any θ ∈ R × (R + ) 2 . In order to ensure [Mod-E], it is required to assume that θ ⋆ 2 , θ ⋆ extended and Jensen and Møller (1991) (Theorem 2.2) obtained a general expression for Gibbs point processes. Using our notation and up to a scalar factor, the pseudo-likelihood defined for a configuration ϕ and a domain of observation Λ is denoted by P L Λ (ϕ; θ) and given by It is more convenient to define and work with the log-pseudo-likelihood, denoted by LP L Λ (ϕ; θ) The point process is assumed to be observed in a domain Λ n ⊕ D = ∪ x∈Λn B(x, D) for some D < +∞. For the asymptotic normality result, it is also assumed that D ≥ D and that Λ n ⊂ R 2 can be decomposed into ∪ i∈In ∆ i where I n = (0, n) and for i ∈ Z 2 , ∆ i = ∆ i ( D) is the square centered at i with side-length D. As a consequence, as n → +∞, Λ n → R 2 such that |Λ n | → +∞ and |∂Λ n | |Λ n | → 0.

Asymptotic normality of the MPLE
For establishing the asymptotic normality of the MPLE we need to assume the four additional following assumptions: [N2] There exists a neighbourhood V(θ ⋆ ) of θ ⋆ such that for all ϕ ∈ Ω, V (0|ϕ; θ) is twice continuously differentiable in θ ∈ V and, for all j, k = 1, . . . , p and θ ∈ V(θ ⋆ ), [N3] There exists A 1 , . . . , A ℓ , ℓ ≥ p events in Ω such that: The assumptions [N3] and [N4] will ensure (see Section 5 for more details) that the matrices U (2) (θ ⋆ ) and Σ( D, θ ⋆ ) respectively defined by and where Σ( D, θ ⋆ ) is defined by (4). In addition under the assumption [N4] where for some θ and any configuration ϕ, the matrix Σ n (ϕ; θ) is defined by In the following the assumption Note that [C3] is obvious since g LJ (r, ·) is continuous. For the integrability type assumptions, the following Lemma will be widely used.
Lemma 4 Let Φ be a stationary pairwise interaction Gibbs point process assumed to be superstable, regular and lower regular. For i = 1, 2, define H i (x|ϕ) = y∈ϕ g i (||x − y||) with g i a continuous function. Assume that there exists ε > 0 such that there exists a positive and decreasing function g(·) such that g ε (r) := g 2 (r)−ε|g 1 (r)| ≥ −g(r) for all r > 0 and +∞ 0 rg(r)dr < +∞. Then for all k ≥ 0, Proof. For all finite configuration ϕ Now, the assumptions ensure that g ε is lower regular in the Ruelle sense. We may now apply the same argument as in Lemma 3 of Mase (1995) to prove the integrability of the random variable e −Hε(0|Φ) . Before verifying the different assumptions, let us denote by Since Θ is a compact set of R × (]0, +∞[) 2 , then θ inf > 0 and θ sup < +∞.

Assumption [C1]
The first part is a direct application of Lemma 4. For the second part, one has to prove that for which satisfies the assumptions of Lemma 4 as soon as ε < θ ⋆ 3 θ3 12 θ ⋆ 2 θ2 , that is, as soon as ε < θ inf θ sup

Assumption [C2]
Let us denote for n ≥ 1, C n = B(0, n) \ B(0, n − 1) and define for m, n ≥ 1 the following configuration sets In order to prove [C2], we need the following Lemma.
There exists a constant k = k(R) such that for all n ≥ ⌈R⌉, sup x∈Cn g LJ (||x||; θ ⋆ ) ≤ kn −6 . Therefore, which leads to the result since the previous series is convergent. Let θ ∈ Θ \ θ ⋆ and consider the following configuration sets defined for k ≥ 1 and for η small enough by where D is any positive real for the Lennard-Jones model and corresponds to the range of the function g LJ (·) for the finite range Lennard-Jones model. There exists m ≥ 1 such that for all η > 0 and for k = 2, 4 For the Lennard-Jones model, according to Lemma 5 one has, for D large enough, Hence for η small enough, and for both models where for any ϕ k ∈ A k (η) (k = 2, 4), there exists a positive function f k (η) converging towards zero as η → 0 such that |f k (ϕ k )| is bounded by f k (η). Now, we have For η small enough, we have, for any ϕ k ∈ A k (η) (k = 2, 4), For the finite range Lennard-Jones model, Z ′ (ϕ 2 , ϕ 4 , D; θ, θ ⋆ ) = 0. For the Lennard-Jones model, according to Lemma 5, one has for D large enough Hence for η small enough, and for both models leading to θ 2 θ 6 3 = θ ⋆ 2 θ ⋆ 3 6 . By considering the combination √ 2D(0|ϕ 2 ; θ) − D(0|ϕ 4 ; θ) and using similar arguments as previously, one obtains: θ 2 θ 12 3 = θ ⋆ 2 θ ⋆ 3 12 . By computing the ratio of the two last equations, one obtains θ 3 = θ ⋆ 3 and then θ 2 = θ ⋆ 2 .

Assumption [C4]
For all ϕ ∈ Ω and for any θ ∈ Θ, V LJ (0|ϕ; θ) is clearly differentiable in θ. First, note that [C4] is trivial for j = 1. For j = 2, 3, let us define: Our aim will be to prove that for j = 2, 3 and for all k > 0 In particular, the Assumption [C4] corresponds to (10) with k = 2. Let us notice that for all ϕ ∈ Ω and for all θ ∈ Θ . Let us also underline that for j = 2, 3 Therefore, by defining V inf j (0|ϕ) := x∈ϕ g inf j (||x||), the result (10) will be ensured by proving According to Lemma 4, in order to prove this, let us denote by g j,ε (·) the function defined for j = 2, 3, for some ε > 0 and for r > 0 by g j,ε (r) = g inf j (r) − ε g inf (r) . On the one hand, one has which satisfies the assumptions of Lemma 4 as soon as ε < θ inf . On the other hand , which satisfies the assumptions of Lemma 4 as soon as ε < θ inf /12, which ends the proof.

Assumption [N1]
Let us present two auxiliary lemmas.
Lemma 7 Using the same notation and under the same assumptions of Lemma 6, assume that there exists g min such that for all r > 0 and any θ ∈ Θ, g(r; θ) ≥ g min , then Proof. The proof is immediate since Let k = 1, . . . , 3 and let λ 1 , . . . , λ k , k positive integers such that k i=1 λ i = 3 and define the random variable 1/k by using Hölder's inequality and the stationarity of the process. The result is then a simple consequence of (10) and Lemma 4.
On the one hand, for k large enough and for η small enough, we may have On the other hand, we have for k large enough Therefore for k large enough and for η small enough, we have which leads to y 1 = 0.
5 Annex: proofs of Theorems 1 and 2 Let us start by presenting a particular case of the Campbell Theorem combined with the Glötz Theorem that is widely used in our future proofs.
Corollary 8 Assume that the point process Φ with probability measure P is stationary. Let Λ ⋐ R 2 , ϕ ∈ Ω and let g be a function satisfying g(x, ϕ) = g(0, τ x ϕ) for all x ∈ R 2 . Define f (ϕ) = g(0, ϕ)e −V (0|ϕ) and assume that f ∈ L 1 (P ). Then, Proof. see Corollary 3 of Billiot et al. (2008) Let us now present a version of an ergodic theorem obtained by Nguyen and Zessin (1979) and widely used in this paper. Let ∆ 0 be a fixed bounded domain Theorem 9 (Nguyen and Zessin (1979)) Let {H G , G ∈ B b } be a family of random variables, which is covariant, that for all x ∈ R 2 , H τxG (τ x ϕ) = H G (ϕ), for a.e. ϕ and additive, that is for every disjoint G 1 , G 2 ∈ B b , Let I be the sub-σ-algebra of F consisting of translation invariant (with probability 1) sets. Assume there exists a nonnegative and integrable random variable Y such that |H G | ≤ Y a.s. for every convex G ⊂ ∆ 0 . Then, for each regular sequence G n → R 2 .

Proof of Theorem 1
Due to the decomposition of stationary measures as a mixture of ergodic measures (see Preston (1976)), one only needs to prove Theorem 1 by assuming that P θ ⋆ is ergodic. From now on, P θ ⋆ is assumed to be ergodic. The tool used to obtain the almost sure convergence is a convergence theorem for minimum contrast estimators established by Guyon (1992). We proceed in three stages.
The modulus of continuity of the contrast process defined for all ϕ ∈ Ω and all η > 0 by Let us start to write W n ϕ, 1 k ≤ W 1,n ϕ, 1 k + W 2,n ϕ, 1 k with Let k ≥ 1 and let θ, θ ′ ∈ Θ such that ||θ − θ ′ || ≤ 1 k , then under the assumption [C1] and from Theorem 9 and Corollary 8, we have P θ ⋆ −almost surely as n → +∞ Under Assumption [C4], one may apply the mean value theorem in R p as follows: there exist .

This leads, under Assumption [C4]
, to the following inequality In such a way, one may also prove that Hence, for all k ≥ 1 and for all θ, θ ′ ∈ Θ such that ||θ − θ ′ || ≤ 1 k there exists n 0 (k) ≥ 1 such that for all n ≥ n 0 (k), we have Since γ 1 and γ 2 are independent of θ and θ ′ , we have for all n ≥ n 0 (k) Finally, since for P θ ⋆ −a.e. ϕ, the expected result (18) is proved. Conclusion step. The Steps 1, 2 and 3 ensure the fact that we can apply Property 3.6 of Guyon (1992) which asserts the almost sure convergence for minimum contrast estimators.

Proof of Theorem 2
Step 1. Asymptotic normality of U (1) n (Φ; θ ⋆ ) The aim is to prove the following convergence in distribution as n → +∞ where the matrix Σ( D, θ ⋆ ) is defined by (4).
The idea is to apply to U (1) n (Φ; θ ⋆ ) a central limit theorem obtained by Jensen and Künsch (1994), Theorem 2.1. The following conditions have to be fulfilled to apply this result. For all j = 1, . . . , p Condition (i) : From the stationarity of the process, it is sufficient to prove that Recall that for any configuration ϕ

LP L
(1) Denote respectively by G 1 (ϕ) and G 2 (ϕ) the first and the second right-hand term of (20) and by Let us define for any ϕ, the measure µ ϕ := x∈ϕ δ x . From the definition of Gibbs point processes, Since π is a Poisson process, and therefore, by introducing ψ : Now, from Campbell Theorem (applied to the Poisson measure π) where π ! x stands for the reduced Palm distribution of the Poisson point process. Since from Slivnyak-Mecke Theorem (see e.g. Møller and Waagepetersen (2003)), π = π ! x , one can obtain Condition (ii) : For any bounded domain ∆ one may write for j = 1, . . . , p

LP L
(1) The assumption [N1] ensures the integrability of the first right-hand term. For the second one, note that The result is obtained by using the assumption [N1] and iterated versions of Corollary 8.
Condition (iii): let us start by noting that from the assumption [Mod-L], the vector LP L (1) ∆i (ϕ; θ ⋆ ) depends only on ϕ ∆j for j ∈ (i, 1). Let E i,j := E LP L Let j ∈ I n ∩ (i, 1) c , since LP L (1) ∆i (ϕ; θ ⋆ ) is a measurable function of ϕ ∆ c i , we have by using condition (i): Denote by I n the following set I n = I n ∩ (∪ i∈∂In (i, 1)) .

Conclusion
Step Under the assumptions [Mod] and [Ident], and using Steps 1 and 2, one can apply a classical result concerning asymptotic normality for minimum contrast estimators e.g. Proposition 3.7 of Guyon (1992) in order to obtain (5).
It remains to prove (6). This may de done in two different steps. The first one consists in verifying the positive definiteness of the matrix Σ( D, θ ⋆ ). The proof is strictly similar to the one of Billiot et al. (2008) (p. 261) except that the assumption [SDP] is now simply replaced by the more general one assumption [N4]. Now, the convergence in probability of Σ n (Φ; θ n (Φ)) towards Σ( D, θ ⋆ ) is obtained by applying Proposition 9 of Coeurjolly and Lavancier (2010).