Maximum pseudolikelihood estimator for exponential family models of marked Gibbs point processes

This paper is devoted to the estimation of a vector $\bm {\theta}$ parametrizing an energy function of a Gibbs point process, via the maximum pseudolikelihood method. Strong consistency and asymptotic normality results of this estimator depending on a single realization are presented. In the framework of exponential family models, sufficient conditions are expressed in terms of the local energy function and are verified on a wide variety of examples.


Introduction
The class of Gibbs point processes is interesting because it allows us to introduce and study interactions between points through the modelling of an associated energy function. Historical aspects of the mathematical theory are covered briefly in Kallenberg (1983). When the energy function is parametrized, one among many other methods of estimation, is the maximization of the pseudolikelihood. Baddeley and Turner (2000) dealt with some practical aspects of such a parametric method and gave a survey on asymptotic results. They noticed that some classical examples (such as the area-interaction model, the Multi-Strauss model, the 2-type Strauss model) do not satisfy the assumptions of the existing asymptotic normality results. This paper aims at filling this gap.
Many proposals tried to estimate the energy function from the available point pattern data generated by some marked Gibbs point processes. If the energy belongs to a parametric family model, the most well-known methodology is the use of the likelihood function, see e.g. Møller and Waggepetersen (2003) and the references therein. The main drawback of this approach is that the likelihood function contains an unknown scaling factor whose value depends on the parameters and which is difficult to calculate. An alternative approach relies on the use of the pseudolikelihood. This idea originated from Besag (1974) in the study of lattice processes. Besag et al. (1982) further considered this method for pairwise interaction point processes, while Jensen and Møller (1991) generalized it to the general class of marked Gibbs point processes. A general review of the problem of statistical inference on spatial point processes including the Takacs-Fiksel method (a parametric method based on a characteristic property of marked Gibbs point processes using Palm measure) and non-parametric methods can be found in the recent monograph of Møller and Waggepetersen (2003).
In order to underline our theoretical contributions, let us present the different papers discussing asymptotic properties of the maximum pseudolikelihood estimator. The first work was done by Jensen and Møller (1991). The authors obtained consistency for exponential family models of marked point processes. They mainly considered inhibition and hard-core models. They notably applied their results on the marked Strauss process. Jensen and Künsch (1994) and Mase (1995) focused on specific models with two parameters -the chemical potential and the inverse temperature-which can be viewed as particular exponential family models. Jensen and Künsch (1994) obtained an asymptotic normality result by first assuming the inhibition or hard-core property and then the finite range property. Mase (1995) established consistency for the class of superstable and lower regular potentials introduced by Ruelle (1970). Mase (2000) extended his work to the context of marked point processes and provided asymptotic normality by adding the assumption of finite range. Our goal is to deal with most classical models (see Baddeley and Turner (2000), Møller and Waggepetersen (2003) and Bertin et al. (1999b)) that could be interesting for practical purposes. They have been put into three categories according to their validity with respect to the previous existing works: 1) Overlap area point process 2) Multi-Strauss marked model, Strauss disc type process. 3) area-interaction, Geyer's triplet process, k-nearest-neighbour multi-Strauss marked model. Let us notice first that all the examples belong to the exponential family and satisfy the local stability and finite range properties. Due to the parametrization proposed by Jensen and Künsch (1994) and Mase (2000), they only consider the first category. By considering the exponential family, Jensen and Møller (1991) include a larger class of models. However, the required inhibition or hard-core type assumptions are only satisfied for examples 1 and 2. Examples 3 are only locally stable. What remains to be established is consistency for examples 3 and asymptotic normality for examples 2 and 3. In this paper, a general framework is proposed taking into consideration the previous remarks. Results are obtained using the general theory on minimum contrast estimators, e.g. Guyon (1995).
Section 2 introduces some background on marked Gibbs point processes. Our models are defined in Section 3. In the same spirit as Bertin et al. (1999a), providing existence results of stationary Gibbs states, assumptions on these models are expressed in terms of the local energy function. We also describe examples of interest of this work. Section 4 presents the pseudolikelihood method and our main results requiring two additional assumptions. The first one is an identifiability condition ensuring strong consistency. The second one is related to the definiteness of the asymptotic covariance matrix of the maximum pseudolikelihood estimator. These two assumptions allow us to derive a practical asymptotic normality result. They are verified for all the considered examples in Section 5. It is the belief of the authors that these assumptions are not restrictive since they should be true for every well-parametrized model. Proofs have been postponed until Section 6.

Background on marked Gibbs point processes
For the sake of simplicity, the framework of this paper is restricted to twodimensional marked Gibbs point processes. All the results must remain valid in the general d-dimensional (d ≥ 1) case. Define B 2 the Borel σ-algebra on R 2 , B 2 b the set of bounded Borel susbsets of R 2 and λ 2 the Lebesgue measure on R 2 .
Denote also by Å, M and λ Ñ the mark space and its corresponding σ-algebra and probability measure. Let Ë := R 2 × Å, B := B 2 ⊗ M and µ := λ 2 ⊗ λ Ñ denote respectively the state space and its corresponding σ-algebra and measure.
For shortness, let us denote x m = (x, m) for any x ∈ R 2 and any mark m ∈ Å and |Λ| := λ 2 (Λ) for any Λ ∈ B 2 . In addition, |I| designates the number of elements of some countable set I, Λ c is the complementary of some set Λ in R 2 and || · || is the ℓ 2 -norm. Let us define for all Let Ω denote the set of so-called configurations -of marked points-ϕ := {x mi i } i∈I where I is a subset of N and ((x i , m i )) i∈I is a sequence of elements of Ë.
In particular, any element ϕ ∈ Ω has the following representation ϕ = i∈I δ x m i i as an integer-valued measure on Ë such that for every F ∈ B 2 b , ϕ(F ) ∈ N, where δ x m is the Dirac measure at some element x m ∈ Ë. The subset of Ω with elements ϕ satisfying |ϕ| := ϕ(Ë) < +∞ is denoted by Ω f . The space Ω is equipped with the σ-algebra F generated by the family of sets ϕ ∈ Ω : ϕ(F ) = n with n ∈ N and F ∈ B 2 b . For every F ∈ B 2 and ϕ ∈ Ω represented as ϕ = i∈I δ x m i i , one introduces ϕ F := i∈I,x m i i ∈F δ x m i i which can be viewed as the configuration of marked points of ϕ in F . Furthermore, for every Λ ∈ B 2 b , ϕ Λ conveniently denotes ϕ Λ×Å .
A marked point process is a Ω-valued random variable, denoted by Φ, with probability distribution P on ( Ω, F ). The intensity measure N P of P is defined as a measure on B 2 such that for any F ∈ B 2 b : In the stationary case, N P (F ) = ν P λ 2 (F ) where ν P is called the intensity of P . A marked Gibbs point process is usually defined using a family of local specifications with respect to a weight process (often a stationary marked Poisson process with distribution Q and intensity λ Q = 1). Let Λ be a bounded region in R 2 . For such a process, given some configuration ϕ Λ c on Λ c , the conditional probability on Λ is of the form, for any F ∈ F: with the partition function Let us define the subset of all admissible configurations and denote by Ω f := Ω f ∩ Ω. Whereas the finite energy function V (ϕ) (for any ϕ ∈ Ω f ) measures the cost of any configuration, the local energy V (ψ|ϕ) (for any ϕ, ψ ∈ Ω f ) represents the energy required to add the points of ψ in ϕ: Let us notice that when ψ is a singleton {x m }, we denote by a slight abuse V (x m |ϕ) instead of V ({x m }|ϕ). It is well-known that the collection of probability kernels (Π Λ ) Λ∈B 2 b satisfies the set of compatibility and measurability conditions which define a local specification in the Preston's sense (Preston (1976)). The main condition is the consistency: Notice that some conditions are needed to ensure the existence of a probability measure P related to any local energy V and any weight process that satisfies the so-called Dobrushin-Lanford-Ruelle (D.L.R.) equations: For the general theory of Gibbs point processes, the reader may refer to Kallenberg (1983); Stoyan et al. (1995) and the references therein. For some finite configuration ϕ (resp. some set G) and for all x ∈ R 2 , ϕ x (resp. G x ) denotes the configuration ϕ (resp. the set G) translated of x. Finally, in this work a non-marked point process can be viewed as a particular case of marked point processes with Å = {0}.

Definitions and examples of marked Gibbs models
The framework of this paper is restricted to stationary marked Gibbs point processes based on an energy function invariant by translation, V (ϕ; θ), parametrized by some θ ∈ Θ, where Θ is some compact set of R p . The model is also assumed to belong to an exponential family, i.e.
where v(ϕ) = (v 1 (ϕ), . . . , v p (ϕ)) is the vector of sufficient statistics. The local energy is then expressed as Our models satisfy the general condition [Mod] described by the following statements: [Mod:S] Stability of the local energy: there exists K ≥ 0 such that for all [Mod:L] Locality of the local energy: there exists D ≥ 0 such that for all where B(x, r) denotes the ball centered at x ∈ R 2 with radius r > 0. Å × Ω f : Indeed, let I 1 and I 2 be the partition of {1, . . . , p} such that for any i ∈ I 1 (resp.  Bertin et al. (1999a)). • the local stability implies the Ruelle-bound correlation function (see Ruelle (1970)) leading to: E Φ B(0,D) β < +∞ for any β > 0.
• A key-ingredient of our proofs is the following one: for any α > 0, any θ ∈ Θ and any i = 1, . . . , p Let us now present some examples. Except the one based on the k-nearestneighbour graph, all examples are really classical and can be found e.g. in Baddeley and Turner (2000) and Møller and Waggepetersen (2003). For each example, we present the model through the sufficient statistics, the set of the parameters values (including Θ) for which the model is defined in the litterature and then we verify that [Mod] is satisfied. This proves in particular the existence of stationary Gibbs states in R 2 .
First of all, note that when v i (0 m |ϕ) := 1, then v i obviously satisfies . Recall that for a non-marked point process Å = {0}.

Multi-Strauss marked point process
The finite energy of this process is defined by p m 1 ,m 2 < +∞. In particular, the vector θ could be ordered as follows: (p m1,m2 − 1). One then derives the expression of the local energy where for convenience θ m1,m2 i and θ m2,m1 i denote the same parameter.

k−nearest-neighbour multi-Strauss marked point process
• This marked point process is defined similarly as the multi-Strauss marked point process except that the complete graph P 2 (ϕ) in (3.4) is replaced by the k-nearest-neighbour graph (k ≥ 1).

Geyer's triplet interaction point process
• This non-marked point process is defined for p = 3 and R > 0 by Note that v 3 (ϕ) represents the number of triangles of ϕ with edges of lengths lower than R.

Area interaction point process
• This model is the one-type marginal of the two-type Widom-Rowlinson model. Let p = 2 and R > 0 Note that v 2 (ϕ) represents the area of the union of discs of radius R centered at the points. Alternatively, Remark 1. Jensen and Møller (1991)  Remark 2. Note that unlike the Multi-Strauss marked point process, neither inhibition nor hard-core assumption is required for the k−nearest-neighbour multi-Strauss marked point process since its local energy is naturally stable.
Remark 3. Concerning the Geyer's triplet process, the case θ 3 = 0 is not considered since it is a particular case of a multi-Strauss marked point process.
Remark 4. Through these different examples, one may note that some parameters are assumed to be known: for example, the parameters D m1,m2 i for the multi-Strauss marked point process, the hard-core parameter δ, the parameter M max for the Strauss type disc process, the parameter R for the Geyer's triplet process or the area interaction point process, . . . . Their estimations could be investigated by using ad hoc methods.

Pseudolikelihood
As specified in the introduction, the idea of maximum pseudolikelihood is due to Besag (1974) who first introduced the concept for Markov random fields in order to avoid the normalizing constant. This work was then widely extended and Jensen and Møller (1991) (Theorem 2.2) obtained a general expression for marked Gibbs point processes. With our notation and up to a scalar factor, the pseudolikelihood defined for a configuration ϕ and a domain of observation Λ is denoted by P L Λ (ϕ; θ) and given by It is more convenient to define (and work with) the log-pseudolikelihood, denoted by LP L Λ (ϕ; θ).
Our data consist in the realization of a point process with energy function V (·; θ ⋆ ) satisfying [Mod]. Thus, θ ⋆ is the true parameter to be estimated and it is assumed that θ ⋆ ∈Θ. The Gibbs measure will be denoted by P θ ⋆ . Moreover the point process is assumed to be observed in a domain Λ n ⊕ D ∨ = ∪ x∈Λn B(x, D ∨ ) for some D ∨ ≥ D. For the asymptotic normality result, it is also assumed that Λ n ⊂ R 2 can de decomposed into ∪ i∈In ∆ i where I n = (0, n) and for i = (i 1 , i 2 ) ∈ Z 2 , ∆ i = ∆ i ( D) for some D > 0 fixed from now on. As a consequence, as n → +∞, Λ n → R 2 such that |Λ n | → +∞ and |∂Λ n | |Λ n | → 0.

Asymptotic results of the MPLE
This section provides consistency and asymptotic normality of the maximum pseudolikelihood estimator. Let us first consider the following assumption [Ident] Identifiability condition: there exists A 1 , . . . , A ℓ , ℓ ≥ p events of Ω and A Ñ 1 , . . . , A Ñ ℓ events of M such that:
For the next result consider

Theorem 2. Under the assumptions [Mod] and [Ident]
, we have, for any fixed D, the following convergence in distribution as n → +∞ (4.4) In addition under the assumption [SDP] 5) where for some θ and any configuration ϕ, the matrix Σ n (ϕ; D ∨ , D, θ) is defined by (4.6) Remark 5. Let us underline that the scaling that yields to asymptotic normality corresponds to the usual parametric rate. Indeed, in the d-dimensional case one would obtain (4.7) The assumption [SDP] may be rewritten for all k = 1, . . . , ℓ and for all ϕ k ∈ A k and ϕ 0 ∈ A 0 :

Back to examples
where for any configuration ϕ ∈ Ω and ϕ 0 ∈ A 0 Concerning this assumption, we choose D > D in all our examples.
For any ϕ n ∈ A n (η), we have where K comes from the local stability property. Let us also remark that Therefore by combining these arguments, for every ε > 0, we have for n large enough By choosing ε = |y2|g2(η) 2 , this leads to y 2 = 0. Then, for every ε ′ > 0 we may obtain for n large enough By choosing ε ′ = |y 1 |/2, this leads to y 1 = 0.

Assumption [SDP]
Let us first introduce the following sets for any η > 0 and d > 0 For any i ∈ {2, . . . , p m1,m2 }, when η is small enough, the couple of points ). We now derive the following events First of all note that for any ϕ 0 ∈ A 0 , y T LP L We leave the reader to check that for every ε > 0 there exists η > 0 small enough such that y T L(ϕ m1,m2 Therefore for every ε > 0 we have for η small enough By choosing ε = |y m1,m2 p m 1 ,m 2 |, this leads to y m1,m2 p m 1 ,m 2 = 0. By iterating this argument, we obtain that for any m 1 , m 2 ∈ {1, . . . , M }, y m1,m2 2 = · · · = y m1,m2 p m 1 ,m 2 = 0. It remains to prove that y 1,1 1 = · · · = y M,M 1 = 0. For this, consider the following configuration set indexed by n ≥ 1 For any ϕ m1 n ∈ A m1 n , we have Hence for every ε > 0 we have for n large enough by using the local stability property |y m1,m1 By choosing ε = |y m1,m1 1 |/2, this leads to y m1,m1 1 = 0.

k−nearest-neighbour multi-Strauss marked point process
Assumption [Ident] (resp. [SDP]) is proven without any change in the proof of the multi-Strauss marked point process for every k ≥ 1 (resp. k ≥ 2). The proof of [SDP] for k = 1 is omitted.

Assumption [Ident]
By considering which is clearly injective since det V = 1.
We leave the reader to chek that for j = 1, . . . , p and for any ϕ n ∈ A n (η) where K comes from the local stability property. Let us also remark that Therefore by combining these two arguments, for every ε > 0, we have for n large enough |y 3 | 2 = y 3 2 + y T (L(ϕ n ; θ ⋆ ) − R(ϕ n )) n(n − 1)(n − 2) ≤ ε.

Tools
Let us start by presenting a particular case of the Campbell Theorem combined with the Glötzl Theorem that is widely used in our future proofs.
Corollary 3. Assume that the (marked) point process Φ with probability measure P is stationary. Let Λ be a bounded Borel set, let ϕ ∈ Ω and let g be a function satisfying g(x m , ϕ x ) = g(0 m , ϕ) for all x m ∈ Ë. Define M a random variable with its distribution λ Ñ and f (m, ϕ) = g(0 m , ϕ)e −V (0 m |ϕ) and assume that f ∈ L 1 (λ Ñ ⊗ P ). Then, where ν P (·) is the Radon-Nikodym derivative of N P with respect to µ.
Let us now present a version of an ergodic theorem obtained by Nguyen and Zessin (1979) and widely used in this paper. Let D > 0 and denote by ∆ 0 the following fixed domain Theorem 4 (Nguyen and Zessin (1979)). Let {H G , G ∈ B 2 b } be a family of random variables, which is covariant, that for all x ∈ R 2 , H Gx (ϕ x ) = H G (ϕ), for a.e. ϕ and additive, that is for every disjoint G 1 , G 2 ∈ B 2 b ,

Let I be the sub−σ−algebra of F consisting of translation invariant (with probability 1) sets. Assume there exists a nonnegative and integrable random variable
for each regular sequence G n → R 2 .

Proof of Theorem 1
Due to the decomposition of stationary measures as a mixture of ergodic measures (see Preston (1976)), one only needs to prove Theorem 1 by assuming that P θ ⋆ is ergodic. From now on, P θ ⋆ is assumed to be ergodic. The tool used to obtain the almost sure convergence is a convergence theorem for minimum contrast estimators established by Guyon (1995). We proceed in three stages.
By noticing that the function t → e t − (1 + t) is nonnegative and is equal to zero if and only if t = 0, one concludes that the random variable and equals to zero when θ = θ ⋆ . Before ending this step, note that for any ϕ, U n (ϕ; ·) and K(·, θ ⋆ ) are clearly continuous functions.
the expected result (6.6) is proved. Conclusion step. The Steps 1, 2 and 3 ensure the fact that we can apply a consistency result on minimum contrast estimators, see Guyon (1995).

Proof of Theorem 2
Step 1. Asymptotic normality of U (1) n (Φ; θ ⋆ ) The aim is to prove that for any fixed D, the following convergence in distribution as n → +∞ where the matrix Σ( D, θ ⋆ ) is defined by (4.4).
The idea is to apply to U (1) n (Φ; θ ⋆ ) a central limit theorem obtained by Jensen and Künsch (1994), Theorem 2.1. The following conditions have to be fulfilled to apply this result. For all j = 1, . . . , p + 1 Condition (i): From the stationarity of the process, it is sufficient to prove that Recall that for any configuration ϕ

LP L
(1) Denote respectively by G 1 (ϕ) and G 2 (ϕ) the first and the second right-hand term of (6.8) and by From the definition of Gibbs point processes, ). Since Q is a Poisson process we can write Now, from Campbell Theorem (applied to the Poisson measure Q) Since from Slivnyak-Mecke Theorem, Q = Q ! x , one can obtain Condition (ii): For any bounded domain ∆ and any configuration ϕ, one may write for j = 1, . . . , p + 1

LP L
(1) , both right-hand terms are integrable with respect to P θ ⋆ , which implies that for any domain ∆ and in particular for ∆ i , Condition (iii): let us start by noting that the vector LP L (1) by using the stationarity of the process. From definitions, we can obtain By using condition (i), one may assert that for any Denote by I n the following set We now obtain Using the stationarity and the definition of the domain Λ n , one obtains as n → +∞ and as n → +∞. Hence as n → +∞ Step 2. Domination of U (2) n (Φ; θ) in a neighborhood of θ ⋆ and convergence of U (2) n (Φ; θ ⋆ ) Let j, k = 1, . . . , p, recall that U (2) n (ϕ; θ) j,k is defined for any configuration ϕ by Using the local stability property it comes From [Mod], one can apply Theorem 4 (Nguyen and Zessin (1979)). It follows that there exists n 0 such that for all n ≥ n 0 Note that from Theorem 4 (Nguyen and Zessin (1979)), for all θ (and in particular θ = θ ⋆ ), U (2) n (·; θ) converges almost surely as n → +∞ towards U (2) (θ) which is the (p, p) matrix with elements U (2) (θ) j,k = E v j (0 m |Φ)v k (0 m |Φ)e −V (0 m |ϕ;θ) , for j, k = 1, . . . , p.

Conclusion
Step Under the assumptions [Mod] and [Ident], and using Steps 1 and 2, one can apply a classical result concerning asymptotic normality for minimum contrast estimators, see Guyon (1995) in order to obtain (4.3). Now, the result (4.5) is proved in three substeps: (i) We first prove that the matrix Σ(θ ⋆ ) = Σ(D, θ ⋆ ) is a symmetric definite positive matrix. From Equation (6.9), it is sufficient to prove that the matrix Var(|Λ n (D)| −1/2 LP L (1) Λn(D) (Φ; θ ⋆ )) is definite positive for n large enough. Let y ∈ R p \ {0}, the aim is to prove that V := y T Var |Λ n (D)| −1/2 LP L Note that for i = j ∈ I n , Cov y T LP L Following the proof of Step 1, condition (iii) one may prove that the second right-hand term tends to 0 as n → +∞. Therefore by using the stationarity, we have for n large enough (1) Λ (ϕ 0 ; θ ⋆ ) j is assumed to be injective, this leads to y = 0 and hence to some contradiction. Therefore, when the variables Φ ∆ k (D) , 1 ≤ |k| ≤ 2 D D are fixed to ∅, the variable y T LP L (1) Λ (Φ; θ ⋆ ) is almost surely not a constant. Hence, Σ(θ ⋆ ) is a symmetric definite positive matrix.