Spatial logistic regression and change-of-support in Poisson point processes

In Geographical Information Systems, spatial point pattern data are often analysed by dividing space into pixels, recording the presence or absence of points in each pixel, and fitting a logistic regression. We study weaknesses of this approach, propose improvements, and demonstrate an application to prospective geology in Western Australia. Models based on different pixel grids are incompatible (a ‘change-of-support’ problem) unless the pixels are very small. On a fine pixel grid, a spatial logistic


Introduction
This paper investigates techniques used in Geographical Information Systems (GIS) to analyse spatial point pattern data. Figure 1 shows a motivating example, a map of the spatial locations of gold deposits and other geological features in a survey region. The goal of analysis is to predict the occurrence of gold deposits using other covariates. Further details are given in Section 3. In statistical science, spatial point pattern data such as Figure 1 are usually analysed by applying spatial point process methods [29,59,43] to the exact spatial coordinates of the data points. However in GIS practice, the first step in analysing a spatial point pattern is usually to discretise it.
Logistic regression is commonly used in GIS to predict the probability of occurrence of mineral deposits [1,20], archaeological finds [69,49], landslides [21,35] and other events which can be treated as points at the scale of interest. The study region is divided into pixels; in each pixel the presence or absence of any data points is recorded; then logistic regression [58,31,42] is used to predict the probability of the presence of a point as a function of predictor variables. Pixel-based spatial logistic regression for point events was developed in geology by Agterberg [1] on the suggestion of Tukey [74]. It was later independently rediscovered in archaeology [69,39,49,50] and is now standard in GIS.
The implications of logistic regression are not well understood in the GIS literature; some writers describe it as a "nonparametric" technique [51, p. 24], and the interpretation of the fitted parameters is widely held to be obscure. Nonetheless, it is clear that a pixel-based logistic regression is approximately equivalent to a Poisson point process model for the original data points.
Pixel-based logistic regression is not necessarily worse than other techniques, because all point process methods involve numerical approximations. The likelihood of a spatial Poisson point process is not generally available in closed form, so approximations are unavoidable. Spatial discretization has the advantage of converting the point process to a generalized linear model. Lewis [54] and Brillinger [16,18,17] showed that the likelihood of a general point process in one-dimensional time, or a Poisson point process in higher dimensions, can be usefully approximated by the likelihood of logistic regression for the discretised process. Asymptotic equivalence was established in [15]. This makes it practicable to fit spatial Poisson point process models of general form to point pattern data [14,22,8,9] by enlisting the efficient and reliable software already developed for logistic regression. Approximation of a stochastic process by a generalized linear model is now commonplace in applied statistics [55,56,57].
Pixel discretisation is not fundamentally objectionable, but it tends to obscure the stochastic model. GIS literature is unclear about the precise meaning and implications of pixel-based logistic regression; the interpretation of a fitted model; prediction from the model; and the relationship between models fitted using different pixel sizes.
Hence, there are immediate benefits for GIS users in recognising that pixelbased logistic regression is essentially equivalent to a Poisson point process. A wide variety of statistical tools for inference, model selection, validation and prediction is available for Poisson processes. This will be outlined in a companion paper [7].
On the other hand, it appears to be unknown precisely how much efficiency is lost by discretising a point pattern onto a pixel grid. There are also unanswered questions about the relationship between different models and the optimal choice of fitting technique. Further practical algorithmic problems are encountered in fitting pixel-based logistic regressions; GIS manuals suggest pragmatic solutions, without casting much light on the problems.
Thus, what seems to be missing is a detailed study of pixel-based logistic regression from the standpoints of stochastic modelling and statistical methodology. This is the aim of the present paper. Specifically we examine the implications of pixel-based logistic regression as a stochastic model; the relevance of pixel size; the precise connection to Poisson point process models; and the performance of parameter estimators and algorithms. Our major findings are summarised in the next section.

Pixel size
Of course, counting points inside pixels is a special case of aggregating spatial data into discrete geographical areas. This has been studied extensively in epidemiology and in ecological and environmental statistics [32,79,78,77]. However, most epidemiological research deals with geographical areas that are predetermined regions of appreciable size. In pixel-based regression, the pixel grid is chosen arbitrarily, and is usually very fine, which leads to a different set of methodological and practical problems.
When the pixel grid can be chosen arbitrarily, important questions include the equivalence of models fitted using different pixel grids (the "modifiable area unit problem" [61] or "change-of-support" [36,11,24]), the relation between discrete and continuous spatial models ("ecological fallacy" [66]) and bias due to aggregation over pixels ("ecological bias" [77,78] or aggregation bias [27,3]). GIS literature seems unclear on these questions, except to acknowledge that the interpretation of the fitted probabilities and model parameters clearly depends on the pixel size. The interpretation of fitted parameters is often said to be obscure [82, p. 175] and is typically based only on the sign of the slope parameters [35, pp. 405-407].
One of our unexpected findings is that it may be impossible to reconcile two spatial logistic regression models that were fitted to the same spatial point pattern data using different pixel grids (Section 5.1). Two such models are logically incompatible except in simple cases. Thus, there is no physical process which satisfies a logistic regression model whenever it is discretised on any pixel grid. The implication is that two research teams who apply spatial logistic regression to the same data, but using different pixel sizes, may obtain results that cannot be reconciled. A spatial logistic regression, fitted to data from a fullyexplored survey region, cannot always be extrapolated to make predictions in a prospective exploration region.

Poisson point process
In the GIS community the accepted practical remedy for such inconsistency is to take a very fine pixel grid. As the pixel size tends to zero, spatial logistic regressions with the same form of linear predictor are asymptotically equivalent, provided the linear predictor includes an intercept term.
However, the limit of spatial logistic regressions as pixel size tends to zero is a Poisson point process. Essentially this follows from the assumption of independent responses in logistic regression. A more precise result is given in Section 6 using distributional approximation. Furthermore, the use of the logistic link implies that the intensity of the limiting Poisson point process is a loglinear function of the spatial covariates, termed a modulated Poisson process by Cox [23]. Thus, contrary to the general view in the GIS literature, there is a clear physical meaning for the model parameters of spatial logistic regression, when pixels are small (cf. [10]). The Poisson model allows prediction of several quantities of interest, as we discuss in a companion paper on applications [7].
Thus, spatial logistic regression is physically meaningful only when pixels are extremely small, in which case it is equivalent to a Poisson point process with loglinear intensity. In the remainder of the paper we assume a loglinear Poisson point process model holds (for point events in the continuous spatial domain) and investigate optimal parameter estimation. In particular we compare the statistical performance of spatial logistic regression with the theoretically optimal method.

Estimation
We find that, in many cases, logistic regression using a fine grid of pixels is practically equivalent to the theoretical ideal of maximum likelihood estimation using the exact spatial coordinates of the data points. However, statistical performance and optimality depend very strongly on the regularity of the spatial covariates as functions of spatial location. If the covariates are piecewise constant inside each pixel, then the pixel presence/absence data conform exactly to a complementary log-log regression, i.e. binary regression with the link g(p) = log(− log(1 − p)) instead of the logit link g(p) = log(p/(1 − p)). Complementary log-log regressions on different pixel grids are compatible. Logistic regression estimates of the parameters have bias of order O(a) and efficiency of order 1 − O(a) where a is the pixel area. Alternatively if the covariates are smooth functions, then the pixel presence/absence data do not conform exactly to any generalized linear model. Logistic regression and complementary log-log regression estimates have bias of order O(δ) and efficiency of order 1 − O(δ) where δ is pixel diameter. If the covariates are not smooth, logistic regression and complementary log-log regression estimates may converge poorly. All three scenarios are relevant in practice, but only the last one causes difficulty.
However, using a very fine grid of pixels may be computationally expensive and can cause numerical instability in model-fitting algorithms. In a fine pixel grid, almost all pixels do not contain a data point so that the overwhelming majority of responses are zeroes. Taylor series approximation of the loglikelihood may be inaccurate, leading to numerical instability and the Hauck-Donner effect [40]. This is likely if one covariate is highly influential [75, p. 198], intuitively because this approaches a scenario where the MLE does not exist [72].
To avoid such problems, a strategy popular in the GIS literature is to randomly sub-sample those pixels with value 0, while retaining all the pixels with value 1, and to apply logistic regression (with an offset term) to the selected pixels. Conditional on the sample, logistic regression is optimal (due to the relation between loglinear Poisson and logistic regressions) but subsampling inflates the bias and variance.
We propose a new algorithmic strategy for increasing efficiency and reducing computational load. This is applicable to the case where a covariate is piecewise smooth on each of several subdomains D 1 , . . . , D m whose boundaries are very irregular. Take a coarse pixel grid and subdivide each pixel S j into sub-pixels S jk = S j ∩ D k . Treating the sub-pixels (S jk ) as a new pixel grid, apply logistic regression or complementary log-log regression with the appropriate offset. This strategy can greatly increase efficiency.

Plan of paper
The plan of the paper is as follows. The geological example in Figure 1 is discussed in detail in Section 3. Basic definitions are stated in Section 4. Incompatibility of logistic regressions using different pixel sizes is studied in Section 5. Poisson approximation results are given in Section 6. In the remainder of the paper we assume a loglinear Poisson point process model. Section 7 discusses ('exact') maximum likelihood for this model based on observation of the exact location of each point. Section 8 discusses maximum likelihood based on discretised observations on a pixel grid. In special cases, the discretised observations conform exactly to a generalized linear model, as explained in Section 9. Otherwise, spatial logistic regression is an approximation, as discussed in Section 10. The bias incurred by this approximation is calculated in Section 11. Two synthetic examples are analysed completely in Section 12. The GIS subsampling strategy is analysed in Section 13. Our new strategy of splitting pixels is presented in Section 14. The geological example (Section 3) is analysed in full in Section 15. Conclusions and recommendations are summarised in Section 16. Figure 1 shows the spatial locations of gold deposits and associated geological features in the Murchison area of Western Australia. The data were extracted from a large scale (1:500,000) study of the Murchison area by the Geological Survey of Western Australia [80]. The study region is contained in a 330 × 400 kilometre rectangle. At this scale, gold deposits are point-like. Gold deposits in this region should occur only in greenstone bedrock. 'Outcrop' is bedrock that is exposed at the surface and therefore observable in a geological survey. Geological faults can be observed reliably only within the greenstone outcrop region. However, some faults have been extrapolated (by geological "interpretation") beyond the outcrop.

Data
Prospective geology aims to predict the occurrence of valuable deposits, or at least to identify areas of higher 'prospectivity', where a deposit is relatively likely. Surveys such as Figure 1 can be used as training data for learning about relationships between the abundance of deposits and geological covariates such as geochemistry and the proximity of other features. The appropriate relationship is effectively a point process model. A fitted model can then be applied in a new exploration area to predict the spatially varying intensity of deposits.
The ancient bedrock of Western Australia has been heavily eroded and leached, removing much geologically useful information from the surface. This militates against the use of geomagnetic or gravitational measurement, alluvial chemistry and other techniques. It is in this context that traditional surveys like Figure 1 are important.
The Murchison data have been analysed by various methods [19,33,37,46]. The consensus is that there is strong evidence that the gold deposit pattern has higher intensity at locations closer to the faults.

Traditional analysis
To apply spatial logistic regression, we divided the study region into an N × N grid of pixels. For each pixel j, we recorded the binary response y j = 1 if at least one gold deposit fell inside the pixel, and y j = 0 otherwise. The two covariates were the greenstone indicator g j (equal to 1 if the centre of pixel j is inside the greenstone area, and 0 otherwise) and the fault distance d j (the distance in kilometres from the centre of pixel j to the nearest fault). We then fitted the two logistic regression models where p j = P{Y j = 1}. Table 1 shows the fitted coefficients obtained using different pixel resolutions N . The standard iteratively-reweighted least squares fitting algorithm in the R system [63,75] reported numerical exceptions when fitting model (2), but not with model (1).
Having fitted a model, the usual practice in geological applications is to display a pixel image of the fitted probabilitiesp j as shown in Figure 2. Direct interpretation of the fitted parameters from spatial logistic regression is widely regarded as difficult in GIS literature [82, p. 175]. The intercept clearly depends strongly on pixel size; usually the other parameters are only given a qualitative interpretation based on the sign of the coefficient (cf. [35, pp. 405-407] ).
The estimates of the slope parameter γ 1 in (2) are approximately independent of pixel size, but the estimates of factor effects β 1 , β 2 in (1) change by up to 50% as pixel size is reduced. This effect is discussed in Section 5. The negative values of β 2 , γ 1 are consistent with the interpretation that gold deposits are more abundant near a fault. Further analysis is reported in Section 15. Table 1 Estimates of the parameters in the spatial logistic regression models (1) and (2) for the Murchison data, obtained using different pixel grids. The parameters β 0 , β 1 , γ 0 are dimensionless while β 2 , γ 1 have units km −1

Definitions
The following definitions will be used throughout the paper. The study area is a compact region W ⊂ R d (usually d = 2) in which we observe a realisation of a spatial point process X. A pixel grid G is a partition of W into disjoint measurable sets S j ⊂ W of area (or d-dimensional volume) a j and diameter δ j , for j = 1, . . . , J. The typical pixel grid is a rectangular array in which all pixels have the same area; pixels of unequal area may be necessary on a curved surface (such as the Earth or celestial sphere). Irregular geographical areas can be regarded as generalised pixels, so that our results also apply to aggregated point counts in spatial epidemiology. Let N j = N (X ∩ S j ) be the number of points of X falling in pixel S j , and Y j = I{N j > 0} the indicator of the event that at least one point of X falls in S j . Write µ j = EN j for the expected number of points in S j , and p j = EY j = P{N j > 0} for the probability that at least one point of X falls in S j .
A spatial logistic regression model for X with respect to G assumes that the random variables Y 1 , . . . , Y n are independent with for j = 1, . . . , n, where β is a k-dimensional vector parameter and z j is a kdimensional vector of covariates associated with pixel S j . The general theory does not specify how the pixel covariate values z j should be determined when an artificial pixel grid G is superimposed on the domain W in continuous space. Suppose that Z : W → R k is a vector-valued function providing the values Z(u) of a k-dimensional covariate at any location u ∈ W . The computationally simplest way to define z j is to choose a representative or 'central' point c j ∈ S j for each pixel S j , and to read the covariate value at this location, Alternatively, for suitable types of covariate Z, the pixel value z j may be defined as the mean value over the pixel, where again a j is the area (or d-dimensional volume) of the pixel. Statistical properties are influenced by the spatial regularity of the function Z. We will consider three different cases. In the first and most optimistic case Z is constant within each pixel, so that (4) and (5) Examples include distance transforms (e.g. Z(u) is the distance from u to the nearest geological fault), geographical coordinates, and kernelsmoothed geochemical assay values. In the third case, Z is a non-Lipschitz function, typified by indicators of very irregular spatial domains such as exposed rock areas.

Incompatibility of models based on different pixel grids
Here we show that spatial logistic regressions at two different pixel scales can be mathematically inconsistent. Let G = {S j , j = 1, . . . , J} and H = {T k , k = 1, . . . , K} be two pixel grids on W . For convenience, assume that H is a refinement of G, obtained by subdividing each pixel S j in G into exactly r subpixels from H, where K(j) is the set of indices k such that T k ⊂ S j . Now let X be a point process on W , and assume that X satisfies a spatial logistic regression with respect to both grids G and H, with linear predictors of the same form, say where p j = P{N (X ∩ S j ) > 0} and q k = P{N (X ∩ T k ) > 0} are the presence probabilities for the pixels in G and H respectively. The k-dimensional vector covariate z(S j ) associated with a pixel S j is assumed to be determined either by pixel averaging (5) or sampling (4) of a covariate function Z : W → R k . The canonical parameters α, β are also k-dimensional vectors.
The independence of pixel responses in H, and the decomposition (7), imply that since both sides give the probability of having no points in S j . This key equation identifies the scaling property of the pixel probabilities: the quantity log(1 − p j ) is additive over pixels. It also implies that the three equations (8), (9) and (10) could be inconsistent in general.
Lemma 1. Assume that the covariate function Z is constant within pixels of G, i.e. (6) holds.
(a) Suppose there are at least three pixels S j , S j ′ , S j ′′ in G such that p j , p j ′ , p j ′′ are distinct values and z j , z j ′ , z j ′′ are collinear vectors. Then the three equations (8), (9) and (10) are logically inconsistent. (b) Suppose that {Z(u) : u ∈ W } is a basis for R k . Then (8), (9) and (10) are consistent.
For the proof, see Appendix A. 1. Lemma 1 has several implications in practice. First, two research teams who apply spatial logistic regression to the same data, but using different pixel sizes, may obtain results that cannot be reconciled. Second, there does not necessarily exist a random phenomenon in continuous space which we can rely upon to satisfy the assumptions of spatial logistic regression after it is discretised on a pixel raster of arbitrary scale. Third, a spatial logistic regression, fitted to data from a fully-explored survey region, can be extrapolated to make predictions in a prospective exploration region, only if the covariate data for the prospective region are converted to the same pixel grid size as the original survey data. Example 1. Consider spatial logistic regression with an intercept and a single slope parameter, log(p/(1 − p)) = β 0 + β 1 ζ. Thus Z(u) = (1, ζ(u)) where ζ : W → R is a real-valued spatial covariate. Assume ζ is constant within each pixel of G, but takes at least three different values on W . Then Lemma 1(a) applies, and spatial logistic regressions on two grids G and H are logically incompatible.
Inconsistency does not arise when the covariate Z is binary-or factor-valued, provided the model is not overdetermined.

Example 2.
Let the covariate T (u) take categorical values (e.g. representing soil type or rock type) with possible categories 1, 2, . . . , K. Effectively this divides the domain W into subregions W k = {u ∈ W : T (u) = k}. Assume that T is constant within pixels of G. Then a spatial logistic regression (8) asserts that the presence probability is constant across pixels within a sub-domain W k . The canonical covariate is Z(u) = (t 1 (u), . . . , t K (u)) where t k (u) = I{T (u) = k} is the dummy variable associated with category k. The values of Z are the standard unit basis of R K . Hence Lemma 1(b) applies, and two spatial logistic regressions (8) and (9) are logically consistent.
It can be shown [38] that logistic regressions on different pixel grids are incompatible.

Approximate compatibility on fine grids
Equations (8), (9) and (10) become asymptotically equivalent when pixels are small, under certain conditions. Heuristically, as pixel size tends to zero, the presence probabilities p j tend to zero, and the left side of (8) is log(p j /(1−p j )) ∼ log p j where ∼ denotes asymptotic equivalence. The relation (69) for conversion between two pixel grids becomes Here r is the number of pixels in grid H into which each pixel in G is subdivided.
where a j , b k are the areas (or ddimensional volumes) of pixels S j ∈ G and T k ∈ H respectively. Hence the following adjusted versions of the spatial logistic regressions (8) and (9) are approximately equivalent, with the same coefficient vector β: In practical applications, this requires that the linear predictor of the model include either an intercept term which can absorb the adjustment log a j , or a separate offset term log a j . A proof of asymptotic equivalence is given in Section 6.4.

Poisson process approximation
A spatial logistic regression on a fine pixel grid is approximately a Poisson point process. Intuitively this is clear, since spatial logistic regression assumes independence between pixels, and the presence probabilities p j in a fine grid are small. The classical Poisson limit theorem is almost enough to establish this. In Section 6.3 we obtain a stronger result giving a precise bound on the distance between the distribution of the two point processes in a quite general setting where we do not assume any particular model for the presence probabilities. A limit theorem for the special case of the logistic model (3) is then given as a consequence in Section 6.4.

Point processes
It is useful to represent a spatial point pattern x in W ⊂ R d as a finite measure x = n i=1 δ xi where x i ∈ W are the individual points (i = 1, . . . , n) and δ u is the Dirac measure which puts mass 1 at location u ∈ R d . We shall allow a point pattern x to have multiple points at the same location. A point pattern without multiple points is called simple. The number of points of x falling in a set A ⊆ W will be written as x(A) in the general case, and as N (x ∩ A) if x is simple. Denote by N the set of totally finite point measures (i.e. finite point patterns with multiplicity) on W , equipped with the (standard) vague topology and its Borel σ-algebra [44, Sections 4.1 and 15.7]. This is the minimal σ-algebra which renders X → X(A) measurable for every compact A ⊆ W . Thus point processes are interpreted as random point measures.

A metric between distributions of point processes
That is, in the most interesting case where x(W ) = y(W ) > 0, the d 1 -distance between x and y is the average distance between points under an optimal pairing. It can be seen that d 1 ≤ 1 is a metric that metrises the weak topology, which is the same as the vague topology, since W is compact. Now let P(N) be the space of probability measures on N, that is, the space of point process distributions. Define the d 2 -distance between two distributions P, Q ∈ P(N) by where Lemma 2. d 2 is a metric on P(N) with values ≤ 1 that metrises convergence in distribution; that is, for point processes X, For the proof see [70] and [83]. See [71] for an equivalent metricd 2 ≤ d 2 .

A general bound for Poisson process approximation of the observed process
Let λ be any finite measure on W . Then if the expectation measure of X is λ, this yields See Appendix A.2 for the proof. Note that we do not assume any particular model for the pixel probabilities p j = P{N j > 0}, only that the Y j are independent. This result applies equally to logistic regression, complementary log-log regresssion and other models mooted in the GIS literature. Moreover, the proof technique is flexible enough to allow for pixel values that are weakly dependent at the cost of additional small terms in the upper bound.
From the above result we obtain a Poisson limit theorem for a quite general sequence of point processes whose discretisations exhibit independent pixel values on successively finer pixel grids.
It is asymptotically infinitesimal if the pixel diameters δ m,j = diam(S m,j ) and d-dimensional volumes a m,j = |S m,j | satisfy δ m = max 1≤j≤Jm δ m,j → 0 and hence a m = max 1≤j≤Jm a m,j → 0 as m → ∞.
m∈N be an asymptotically infinitesimal, nested sequence of partitions of W . Assume that the random variables N m,j = N (X (m) ∩ S m,j ) satisfy Then X (m) converges in distribution as m → ∞ to a Poisson process with intensity λ.

See Appendix A.2 for the proof.
Definition 2. In the notation of Corollary 4, the sequence of pixel grids G (m) m is an asymptotically exact discretisation of the sequence of point processes Asymptotic exactness is slightly weaker than condition (ii) of Corollary 4. By Hölder's inequality, for any α > 0, Thus (ii) holds if the discretisation of X (m) is asymptotically exact and the sequence E X (m) (W ) 1+α m is bounded for some α > 0.

Poisson process approximation for spatial logistic regression
In spatial logistic regression, the relationship between the presence probabilities p and the explanatory covariates Z is controlled by the form of the logistic link function. In the fine-pixel limit, this also controls the relationship between the intensity of the limiting Poisson process and the covariates. As pixel size tends to zero we have p → 0 and log(p/(1 − p)) ∼ log p, so that the limiting Poisson intensity is a loglinear function of the covariates.
Theorem 5. Assume that the observation window W is compact and the covariate function Z : W → R k is continuous. Let G (m) m∈N be an asymptotically infinitesimal, nested sequence of partitions of W . For each m, let X (m) be a simple point process which, when discretised, satisfies the assumptions of spatial logistic regression; that is, writing N m,j = N (X (m) ∩ S m,j ), the indicators I{N m,j > 0} are independent for j = 1, 2, . . . , J m and the presence probabilities The covariate value z m,j is either a sampled value (4) or the average value (5) of Z over the pixel S m,j . Suppose furthermore that the discretisation of X (m) is asymptotically exact. Then X (m) converges in distribution as m → ∞ to a Poisson process with loglinear intensity λ, where log λ(u) = β t Z(u).

Maximum likelihood for loglinear Poisson point process
In the remainder of this paper we assume a loglinear Poisson model holds, and investigate the optimal way to fit such a model. We begin with a brief study of 'exact' maximum likelihood for the point process.

Assumptions
Assume X is a Poisson point process with log intensity Let G be a pixel grid consisting of pixels S j of area a j , j = 1, . . . , J. The corresponding pixel counts N j and presence/absence indicators Y j were defined in Section 4. Under the Poisson model, the N j are independent Poisson random variables with means and the presence-absence indicators are independent Bernoulli variables with

Point process likelihood
First consider the likelihood of the point process X based on complete information about the realisation The loglikelihood of the Poisson point process with intensity λ β (u), of general form, is as shown in [26, pp. 23, 213]. For the loglinear intensity (15) the loglikelihood takes the form This model is a canonically parametrised exponential family [13, p. 113], [53, pp. 23-24] with concave loglikelihood, with efficient score and Fisher information For a general concave loglikelihood, with arbitrary data, the existence and uniqueness of the maximum likelihood estimate (MLE) are not guaranteed, and depend on the absence of directions of recession [67,81,72,2,34]. For the loglikelihood (20) of the loglinear Poisson process, with data obtained from a realization of the Poisson process, the MLE exists uniquely except in trivial cases.
Lemma 6. Let X be a Poisson point process with loglinear intensity (15).
is positive definite. Then the model is identifiable. Conditional on the event that i Z(x i ) = 0, the MLEβ = argmax ℓ(β) exists and is unique.
For a sketch proof, see Appendix A. 4. The condition on M is analogous to requiring the design matrix of a GLM to have full rank. Note that the Lemma implicitly assumes that the data Z(x i ) are consistent with the model, since they are obtained from a realization of X. The condition that i Z(x i ) = 0 is satisfied if, for example, the model contains an intercept term and the realization x is not empty. Explicit examples are studied in Section 12.

Asymptotic normality
For very general Poisson point process models, Rathbun and Cressie [64, Theorem 11, pp. 135-136] and Kutoyants [48,Thm. 2.4,p. 51] proved that the MLE is consistent, asymptotically normal and asymptotically efficient in a 'large domain' limiting regime. See also [47]. The following simpler conditions (adapted from [64]) are sufficient here. We use || · || p to denote the usual L p norm in R d .
Theorem 7. Let W n , n = 1, 2, . . . be an increasing sequence of compact subsets of R d whose union is R d . Let X be a Poisson point process on R d with loglinear intensity (15) where β ∈ R k is a vector parameter and Z : R d → R k a vector covariate. Assume is positive definite for all n; and 3. |M n | → ∞, where | · | is the matrix norm |H| = sup{||Hx|| 1 : ||x|| 2 = 1}.
Letβ n be the MLE of β based on X ∩ W n . Thenβ n is a consistent, asymptotically normal and asymptotically efficient estimator of β with asymptotic vari- The integral in the loglinear Poisson process likelihood (20) is the Laplace transform of the covariate function Z. Consequently the likelihood is not generally available in closed form, and the maximum likelihood estimate does not generally permit exact analytic solution. Some form of numerical approximation of the integrals appearing in (20)-(22) is required. Approximations relevant to the spatial Poisson point process have been described in [14,8,22].

Maximum likelihood for discretised observations
In this section we again assume the data points are generated by a loglinear Poisson point process model (15) but now suppose that observations are discretised on a pixel grid. The available observations are either the counts N j or the indicators Y j . Various pathologies can now occur.

Counts
The loglikelihood based on the pixel counts N j is where the means µ j depend on β through (16). The score and Fisher information are respectively where The score could also be derived from the missing data principle [73]. Jensen's inequality (or the missing data principle for exponential families [62]) shows that I(β) − I d (β) is nonnegative definite. Note that (23) is an incomplete data loglikelihood [28]. It is generally not an exponential family, and may exhibit pathological behaviour. The Hessian is where In view of (26) and we recognise v j (β) as the variance-covariance matrix of Z(U ) where U is a random point in S j with probability density f (u) = λ β (u)/µ j . Hence (27) is the difference between two nonnegative definite matrices. For some realisations (N j ) and parameter values β the Hessian is not positive definite, and the loglikelihood may have multiple local maxima. Indeed the discretised model may be unidentifiable. The only case in which (23) is guaranteed to have a unique local and global maximum is when v j (β) = 0 in (27)-(28), which implies that the covariate Z(u) is constant within each pixel. This case is considered in Section 9.

Indicators
The presence/absence indicators Y j are Bernoulli with success probabilities We have so the loglikelihood for Bernoulli responses has score and Fisher information Jensen's inequality or the missing information principle can again be used to show that I d (β) − I b (β) is nonnegative definite. Pathological behaviour may occur here too, as the Hessian is not necessarily positive definite. In Example 4 the discretised model is again unidentifiable.

Reduction to generalized linear models
This section considers the special case (6) where the covariate Z is constant within each pixel. As shown in Section 8, this is the only case in which the discretised observations conform exactly to a generalized linear model.

Counts
If the covariate is constant within each pixel (6), the mean count (16) simplifies to and (23) is the loglikelihood of loglinear Poisson regression with an offset term: Discretisation causes no loss of information in this case: the counts N j are sufficient for the original model, and I(β) = I d (β). The discretised model is a canonically parametrised exponential family.

Indicators
If the covariate is constant within each pixel (6), the presence probability (30) reduces to so that (32) is the loglikelihood of complementary log-log regression with an offset: log(− log(1 − p j )) = log a j + β t z j , rather than logistic regression. That is, the exact likelihood is that of binary regression with the complementary log-log link rather than the logistic link. Information is lost by replacing N j by Y j . The Fisher information can be calculated from (34). This model is an exponential family, but is not canonically parametrised. The conditions for uniqueness of the MLE are the standard conditions for this discrete model [58]. Note that complementary log-log regressions on different pixel grids are compatible. Under the assumptions of Section 5.1, complementary log-log regressions on different pixel grids satisfy the multiplicative property (10) exactly.

A rule of thumb
Comparing (34) with (25) gives a rule-of-thumb for the relative efficiency of the MLE of β based on the presence/absence indicators Y j , relative to the MLE based on the pixel counts N j . If there is only one real covariate, the relative efficiency is A rule of thumb (derived by assuming the w j are all equal) is to approximate µ j is the average expected number of points per pixel. Thus the loss of efficiency should be tolerable as long asμ < 0.2, say. For pixels of equal area,μ = 1 N W λ β (u) du can be estimated by N (X)/N , the average number of data points per pixel.

Relations between regression models
The following fundamental relationships give some insight into the preceding results. Let N 1 , . . . , N J be independent Poisson variables satisfying a loglinear regression µ j = EN j = exp(β t z j ). Let Y j = I{N j > 0}. Then Y 1 , . . . , Y J satisfy a complementary log-log regression with the same linear predictor, log(− log(1− p j )) = β t z j where p j = P{Y j = 1}. This is well known [58, p. 212]. Now let then log(q j /(1 − q j )) = log µ j = β t z j . Thus, conditional on N j ≤ 1 for all j, the responses Y j satisfy a logistic regression with the same linear predictor.
One consequence is that spatial logistic regression and complementary log-log regression will be approximately equivalent if P{N j ≥ 2 for some j} is negligible.

Approximation by generalized linear models
In general, the discretised responses Y j do not conform exactly to any generalized linear model. This reflects a familiar difficulty with incomplete data likelihoods, in particular those from an exponential family. In the spatial context, it is an instance of the 'ecological fallacy' or 'modifiable area unit problem', closely related to the geostatistical problem of 'change of support' [36,11,66,61,24]. Both logistic regression and complementary log-log regression are practical approximations. The approximate loglikelihoods can be maximised using standard software, and may have better properties than the true loglikelihoods (23) and (32) of the discretised responses.

Logistic regression
In spatial logistic regression, the true binary loglikelihood (32) is approximated by the loglikelihood of logistic regression where z j are covariate values associated with pixels S j , and is the probability predicted by spatial logistic regression (11) in which the linear predictor includes an offset equal to the log of pixel area. The approximate MLE exists uniquely under usual conditions [58]. Note that the true loglikelihood (32) is not equivalent to (40), and indeed (32) is usually not equivalent to the loglikelihood of any GLM. The mean response EY j = p j depends on β through (30). Except in some extreme cases, this cannot be cast in the form g(p j ) = β t z j required for a binary GLM. Similarly, the loglikelihood (23) of the counts N j is generally not equivalent to the loglikelihood of any Poisson GLM.

Complementary log-log regression
The true binary loglikelihood (32) may alternatively be approximated by the loglikelihood of complementary log-log regression (38), where is the probability predicted by complementary log-log regression, with surrogate covariate values z j determined by sampling (4) or averaging (5) the covariate function Z.

Fine-pixel limit
When pixel size reduces to zero (the "fine-pixel limit") we shall prove that the likelihood of spatial logistic regression converges to the exact likelihood, and that the spatial logistic regression estimator β converges to the maximum likelihood estimator β.
To obtain convergence of the loglikelihoods as well as the estimators, the loglikelihood of spatial logistic regression (40) will be adjusted by subtracting j Y j log a j (where a j is the area of pixel j) to yield the adjusted loglikelihood where Y j is the presence indicator for pixel j.
Theorem 8. Consider the Poisson point process model with intensity of the form λ β (u) = exp(β t Z(u)) in a fixed bounded region W ⊂ R d , where β ∈ Θ = R k and Z : W → R k . Assume the function Z is Lipschitz with constant C, and that W Z(u) Z(u) t du is positive definite. Let β 0 be the true value of the parameter. Let G n , n = 1, 2, . . . be a nested, asymptotically infinitesimal sequence of pixel grids on W satisfying a n+1 ≤ a n /2 where n is the maximum pixel area in G n .
Consider the adjusted loglikelihood (44) of spatial logistic regression, in which the covariate value z j associated with a pixel S j is taken to be the average (5) of Z over the pixel. With probability 1, 1. log L adj (β) converges pointwise to the exact loglikelihood (20), 2. the first and second derivatives of log L adj (β) converge pointwise to the corresponding derivatives of the exact loglikelihood, and 3. the spatial logistic regression estimator β = arg max L adj (β) converges to the maximum likelihood estimator β = arg max L(β).
For the proof, see Appendix A. 5. A similar result holds for the adjusted loglikelihood of complementary log-log regression.

Bias in regression estimates
For practical applications it is important to assess the bias that arises from using logistic regression or complementary log-log regression in a pixel grid of a given resolution.

Approximate bias of logistic regression
Approximations for the bias can be obtained using a standard linearisation argument. Let β denote the estimate of β obtained using logistic regression. This is the zero of the logistic score The derivative of the logistic score is nonrandom (a consequence of the fact that the logit link is the canonical link for binomial likelihoods) so equals −I LR (β) where I LR (β) is the Fisher information of logistic regression. Assume a loglinear Poisson process (15) and denote by β 0 the true value of β. Then the true expectation of the logistic score is where p j = p j (β 0 ) is the true value of the success probability (30) for the loglinear Poisson process. Expanding U LR (β) about β 0 to first order, where the right side is equal to with p j given by (30) with β = β 0 . In test examples (Section 12.2) this is a tolerable approximation to the bias. An asymptotic justification for (48) is discussed in Section 11. 2. In real applications, the approximate bias incurred in spatial logistic regression may be estimated by re-fitting the model using the complementary log-log link, then either (a) evaluating the difference between the parameter estimates obtained using the two link functions, or (b) maximum likelihood estimation, using (49) where p j and β 0 are replaced by the fitted probabilities and estimated parameter vector obtained using the complementary log-log link.

Asymptotics of bias in spatial logistic regression
Asymptotic analysis of the bias approximation (49) is complicated because it involves two asymptotic regimes. Under "large-domain asymptotics" (Section 7.3) the exact maximum likelihood estimator β is close to β 0 . Under "fine-pixel asymptotics" (Section 10.2) the spatial logistic regression estimator β converges to the exact maximum likelihood estimator β. For a suitable sequence of models with increasing domains W n and increasingly fine pixel grids G n , both statements apply and (49) is an asymptotic approximation to the bias. Convergence involves a balance between the effects of reducing pixel size, which tends to reduce discretisation bias, and expanding the domain W n , which tends to increase bias. The asymptotic order of the expression on the right hand side of (49) depends on the rates of convergence in these two regimes and on the behaviour of the function Z.
Lemma 9. Let X be a Poisson point process with loglinear intensity (15) observed on an increasing sequence of domains W n ↑ R d . Assume the conditions of Theorem 7 hold. Let G n , n = 1, 2, . . . be a nested sequence of pixel grids on R d satisfying the assumptions of Theorem 8 and set a n = max j a n,j . Then (a) if Z is Lipschitz with constant C on each pixel of G n , for sufficiently large n, then with ||R n || = O(a n |W n |), and the leading term is (b) if Z is constant within pixels of G n , for all sufficiently large n, with ||R n || ≤ O(a 2 n |W n |), and For a proof see Appendix A. 6. The lemma implies that the value of the largedomain asymptotic bias term I(β 0 ) −1 E 0 U LR (β 0 ) is at most of order O(δ n ) and at best of order O(a n ), depending on the behaviour of the function Z.
Theorem 10 (Large domain, fine-pixel limit). Assume the conditions of Lemma 9. Let β n be the estimator of β obtained by logistic regression using the pixels of G n ∩ W n . Write Assume that I n → I and B n → B where I is a finite, positive definite matrix, and B is a finite vector. If either (a) Z is Lipschitz, δ n = o(|W n | −1 ), and b n = I −1 1 |W n | j Sjn λ(u) du − a jn e β 0 t zj z j ; or (b) Z is constant within pixels of G n , for all sufficiently large n, and a n = o(|W n | −1 ), and The proof of Theorem 10 is standard and is omitted. Note that pixel size must converge to zero at a faster rate in case (a) than allowed in case (b).

Bias in complementary log-log regression
Similar arguments apply to complementary log-log regression. The surrogate score is where p CL (β, j) is given by (43). The derivative of the surrogate score is Under the loglinear Poisson point process, the surrogate score and its derivative have ('true') expected values where p j is the true probability (30). So the bias is approximately and a similar analysis can be performed.

Theoretical examples
It is useful to study several examples which are computable analytically or with simple numerical approximations.

Factor covariate
Recall the setting of Example 2 where T (u) is a categorical covariate with possible values 1, 2, . . . , K effectively dividing W into subdomains W k = {u ∈ W : T (u) = k}. Consider a Poisson point process on W with different intensities λ k on the subdomains W k , where λ 1 , . . . , λ K are parameters. Thus λ(u) = λ T (u) , which can be recast in the form (15) by taking β k = log λ k and The exact maximum likelihood estimate of β k isβ k = logλ k whereλ k is the observed intensity of pointsλ k = N (X ∩ W k )/|W k | on the subregion W k . The Fisher information matrix (22) is diagonal with entries λ k |W k |.
Divide W into a grid of pixels of equal area a, and suppose T (u) is constant within each pixel. For all pixels S j contained in W k , we have µ j = aλ k and ∂µ j /∂β m = I{k = m}µ j . By straightforward calculation of (34) the Fisher information I b (β) is the diagonal matrix with entries λ k |W k | aλ k /(e aλ k − 1). Thus, replacing the pixel counts N j by the indicators Y j has a relative efficiency of aλ k /(e aλ k − 1) for each parameter β k .

Exact MLE of discretised observations
Divide the unit square into an N × N array of square pixels where γ 1 = exp(β 1 /N ) and γ 2 = exp(β 2 /N ). The partial derivatives are (∂/∂β 0 )µ ij = µ ij and Substituting into (25) and (34) yields the Fisher information for the count and indicator data respectively. Table 2 shows the asymptotic standard errors of the MLEβ (obtained as the square roots of the diagonal entries of the inverse of the Fisher information matrix) calculated for the model (55) with (β 0 , β 1 , β 2 ) = (2, 3, 2) discretised on a 32 × 32 pixel grid. We haveμ = 150.2/32 2 = 0.147 so the rule-of-thumb (Section 9.3) estimate of relative efficiency is 0.147/(exp(0.147) − 1) = 0.93. From Table 2 the actual relative efficiency (in replacing pixel counts by presence/absence) for the estimator of β 0 is (0.3315/0.3494) 2 = 0.90.    Table 3 shows the approximate bias (according to equations (49) and (54)) incurred by approximating this model by either a logistic regression or a complementary log-log regression, for various pixel grids. Note that the bias in logistic regression is consistently much larger than in complementary log-log regression.

Simulation experiments
We generated 10,000 realisations of the Poisson point process with intensity (55) in the unit square with parameters β = (2, 3, 2). For each realisation we fitted the model by various methods. Table 4 shows simulation estimates of bias, standard deviation and root-mean-square error for the different estimators of β. "Binary (logistic)" is the maximum likelihood estimator for spatial logistic regression, i.e. the maximiser of (40), where the responses Y j are pixel presence/absence indicators on a 32 × 32 pixel grid, and the linear predictor is of the form η = β 0 + β 1 x + β 2 y. "Binary (cloglog)" is the corresponding MLE for binary regression with the complementary log-log link. "Continuous" is the exact maximum likelihood estimate of β based on exact coordinates of all points in the Poisson process, obtained as the solution of (56). The bias and variance were estimated using the sample moments from the simulation.
In this model the expected number of data points is E[N ] = e β0 s 0 (β 1 )s 0 (β 2 ) = 150. 2. For a 32 × 32 grid, the mean expected number of points per pixel is µ = E[N ]/32 2 = 0.15, the maximum presence probability p j is 0.66 and the maximum value of P{N j ≥ 2} is 0. 29. Comparison with Table 2 shows that the predictions of asymptotic theory are quite accurate.

Sampling zero pixels
The use of fine discretisations of the spatial domain increases the computational load, and may cause numerical instability due to the overwhelming dominance of zero values of the indicators Y j . A practical strategy that is common in the GIS literature (e.g. [6, p. 378], [35, §3. 3.1, p. 403]) is to take a sample of the zero values. Then logistic regression is applied to the data consisting of the '1' pixels (all pixels containing a point of the point pattern) and the sampled '0' pixels. The validity of this procedure depends on the random sampling design.

Simple random sampling
Logistic regression for sampled data is valid if pixels are selected independently.
Lemma 11. For a particular pixel grid G let Y 1 , . . . , Y n be independent Bernoulli variables satisfying a spatial logistic regression (3). Conditionally on Y 1 , . . . , Y n , let S be a random sample of pixels, such that (a) if Y j = 1 then pixel j is sampled, (b) if Y j = 0, then pixel j is sampled with probability s j > 0 independently of other pixels. Then the conditional likelihood of Y 1 , . . . , Y n given S is the likelihood of the logistic regression Similarly, for spatial logistic regression with an offset (11) the conditional likelihood is that of logistic regression with an additional offset − log s j . The proof is straightforward. This is a subsampling property of logistic regression that is not shared by complementary log-log regression.
Since the discretisation is assumed to be very fine, the sampling procedure is virtually equivalent to generating a point process S of sample points. Logistic regression is optimal if the sample process S is a Poisson process.
Lemma 12. Suppose X is a Poisson point process in W with loglinear intensity (15). Let S be a Poisson point process in W with intensity function ρ(u), independent of X. Then the conditional likelihood of X given the superposition U = X ∪ S is the likelihood of the logistic regression where p(u) is the conditional probability that the point u belongs to X given that u ∈ U .
The connection between the Poisson and logistic models is of course very well known [58, p. 212]. The particular result above is well-known in the context of spatial epidemiology, e.g. [30]; however in that context X and S are observed point processes of cases and controls respectively, while in our context S is synthetic.
Subsampling entails a loss of efficiency, and this is easily calculated in the point process case. In Lemma 12 the conditional loglikelihood is and the score is .
Since X and S are independent Poisson processes, the Fisher information is Comparison with (22) shows the loss of efficiency relative to maximum likelihood. Note that subsampling avoids bias associated with spatial irregularity of covariates, assuming the exact values of the covariates are evaluated at each data point and sample point.

Other sampling patterns
If pixels are not sampled independently, or in continuous space if the sample points are not a Poisson process, then logistic regression is not appropriate. The conditional likelihood of X given X ∪ S is generally intractable.
Berman and Turner [14] proposed approximating the integral in the Poisson loglikelihood (19) by a finite sum over arbitrary sample points. Generalising, Rathbun et al [65] considered estimating functions of the form where ρ(u) is the intensity of S. The conditional expectation of U R (β) given In particular U R is an unbiased estimating function. More generally Waagepetersen [76] studied estimating functions of the form where s(u j ) is a suitable function. In particular if s(u j ) is the intensity of X ∪ S then this equation also has the property that E[U W (β) | X] = U (β) and hence is an unbiased estimating equation.

Splitting pixels
In this section we propose an alternative algorithmic strategy for avoiding the computational and numerical problems inherent in using very fine pixel grids, while reducing bias due to spatial approximation of irregular covariates.
The most common type of spatial irregularity in the covariate function arises when Z(u) is the indicator function of a spatial domain D with a very irregular boundary. The greenstone outcrop in the Murchison data (Section 3) is an example of such a domain. The indicator covariate causes a fundamental difficulty for pixel-based spatial logistic regression which is illustrated in Figure 4. A single square pixel S j is shown, containing one data point x i . The irregular spatial domain D covers part of the pixel, including the pixel centre c j , but excluding the data point x i .
Consider the Poisson process with intensity λ(u) = exp(β 0 + β 1 I(u)) where I(u) = I{u ∈ D} is the indicator covariate. In exact maximum likelihood estimation for the Poisson process (Section 7) the contribution to the score (21) from the pixel S j sketched in Figure 4 has components where a j = |S j | and A j = Sj I(u) du = |D ∩ S j |. In the GLM approximation to the likelihood of the pixel counts (Section 10) we effectively 'approximate' A j by a j I(c j ) and 'approximate' I(x i ) by I(c j ). In the case illustrated in Figure 4 this means replacing I(x i ) = 0 by I(c j ) = 1. Summing over all j we will have j a j I(c j ) ≈ j A j when pixel size is small, assuming the boundary of D is rectifiable, but i I(x i ) could be very different from j I(c j ) even for very fine pixel grids. This is a counterpart of the biostatistical principle that it is important to have accurate estimates of the hazard exposure for the cases in a case-control study.
A simple strategy for reducing bias is to subdivide each pixel into sub-pixels where the covariate is smooth. Suppose the study region W can be partitioned into subdomains D 1 , . . . , D M such that the restriction of Z to D m is a Lipschitz function, for each m = 1, . . . , M . This embraces the situation where we have several indicator covariates and several smooth covariates, from which we construct new covariates by algebraic combination.
The strategy is to split each pixel S j into sub-pixels T jm = S j ∩ D m for j = 1, . . . , J and m = 1, . . . , M . Then the partition (T jm ) j,m is treated as the pixel grid. Pixel covariate values z jm may be assigned either by averaging Z over T jm , or by evaluating Z at an arbitrary point c jm ∈ T jm . Then spatial logistic regression or complementary log-log regression is applied, with an offset log a jm where a jm = |T jm |. The split pixel grid satisfies the conditions of Lemma 9 and Theorem 10, which predict that the parameter estimates should converge at rate O(δ n ) or faster.
Split pixels T jm are somewhat analogous to 'mixels' (mixed pixels) as described in the remote sensing literature (cf. [41,45,52]), except that in our context the exact geometry of the subpixels is assumed known and does not need to be estimated.
In practice the split pixel strategy should be most efficient when the covariate discontinuities are covered by a relatively small number of pixels S j , so that the total number of nonempty sub-pixels T jm is small. Computational cost depends on the complexity of computing the areas |T jm | and of evaluating covariates.

Analysis of Murchison data
The various alternatives to spatial logistic regression developed in this paper are now demonstrated on the Murchison data introduced in Section 3. Software for performing the analysis was developed in the R system [63] as part of the spatstat library [9]. Table 5 shows the results of fitting spatial logistic regressions with an offset, Table 5 Estimates of the parameters in the offset spatial logistic regression models (64) and (65) for the Murchison data, obtained using different pixel grids to the Murchison data, where a = |W |/N 2 is the pixel area. This could alternatively have been derived from Table 1 by adding − log a to the estimates of β 0 and γ 0 in Table 1. Table 6 shows the results of fitting the analogous complementary log-log regressions

Pixel regression models
Numerical exceptions were again encountered in fitting model (67) but not (66). The estimates of the slope parameter γ 1 in (67) are approximately independent of pixel size, but the estimates of β 1 , β 2 in (66) change by up to 40% as pixel size is reduced.
These findings were broadly anticipated by our analysis in Section 11. For any spatial location u let d(u) be the distance from u to the nearest fault, and g(u) the indicator which equals 1 if u lies inside the greenstone outcrop region. The function d is Lipschitz with constant 1, by the triangle inequality. Lemma 9(a) applies to model (67) so the bias in the spatial logistic regression estimates of γ 0 , γ 1 should be at most of order O(δ n ). However, g is not Lipschitz and has jump discontinuities along a very irregular boundary. Hence, Lemma 9 does not apply to model (66) and the estimates could depend strongly on the pixel grid geometry. Table 7 summarises the results of applying the subsampling strategy of Section 13, Lemma 12, to the Murchison data. The objective is to fit the loglinear Poisson point process model A Poisson process of sample points with constant intensity ρ was generated; the covariates d(u) and g(u) were evaluated exactly at the data points and sample points; and logistic regression with offset − log ρ was fitted to the indicators (Y = 1 for data points, Y = 0 for sample points). Sampling intensity was ρ = Kλ where K ∈ {10, 100, 500} andλ = n/|W | was the empirical mean intensity of the data points. There are approximately s = nK sample points, so the computational load is roughly equivalent to pixel-based logistic regression with an m×m pixel grid, where m = n(K + 1) ≈ 50, 160 and 360 respectively. Estimates for K = 10 and 100 are based on 100 simulations while estimates for K = 500 are based on 30 simulations. Comparing Tables 7 and 9 we find that, for approximately equal computational effort, the mean square error is approximately equivalent.

Subsampling
Note that Table 7 refers to the strategy described in Lemma 12 of taking Poisson sample points in continuous space, and using the exact values of the covariates at the sample points. Table 8 shows the results of subsampling the pixels in spatial logistic regression as described in Lemma 11. For a 512 × 512 grid of pixels, the covariates were evaluated at the centre of each pixel. Pixels that did not contain a data point were randomly retained or deleted, with retention probability q, independently of other pixels. The retention probability q was taken to be K times the fraction of pixels containing a data point, with K chosen to be 10, 100 or 500. Logistic regression with offset log a − log q, where a is pixel area, was fitted to the data from the retained pixels. Table 8 shows the sample means and standard deviations of the resulting parameter estimates. For comparison, the last line of Table 8 gives the fitted coefficients obtained when all pixel data are used (obtained from Table 5). There are some persistent discrepancies between the outcomes of Tables 7 and 8 which we attribute to discretisation effects in the latter. 15. 3. Split pixels Table 9 shows the results of using the split-pixel strategy proposed in Section 14 to fit the spatial logistic regression model (64) and the complementary log-log regression (66). Any pixel S j that was intersected by the irregular polygonal boundary of the greenstone outcrop was divided into two sub-pixels T j0 and T j1 by this boundary. The areas of the two sub-pixels were computed using the discrete version of Green's formula. The distance covariate d jm for subpixel T jm was simply taken to be the value of d j for S j , computed as the distance from the centre of S j to the nearest geological fault. The greenstone indicator was assigned g jm = m for m = 0, 1.
Complementary log-log regression with pixel splitting produces very stable results. The parameter estimates obtained with a 64 × 64 pixel grid are within 5% of the estimates for the 512 × 512 grid. For a graphical comparison between the different methods, Figure 5 is a plot of estimates of the slope parameter β 2 in the Poisson loglinear model (68) using logistic regression or complementary log-log regression, with or without pixel splitting. These results confirm the high efficiency of the pixel-splitting strategy. Table 9 Parameter estimates using the split pixel strategy for the offset spatial logistic regression model (64)

Discussion of analysis
The model fitted by this analysis is a Poisson point process with intensity (68).
A colour image plot of the fitted intensity function (not shown) is visually very similar to a plot of the fitted pixel occupancy probabilities (Figure 2 of Section 3.2). However the values of the fitted intensity function have a physical meaning independent of the pixel size.
The foregoing data analysis is designed only to demonstrate features of the method. In particular it should be understood that other loglinear Poisson models could be fitted to the Murchison data using the same covariates (i.e. the original covariates can be transformed to make new canonical covariates). For example the intensity could be a log-quadratic function of distance to the nearest fault.
We have not undertaken a full analysis, which would require exploratory consideration of different models, inspection of model diagnostics, and model selection. Model diagnostics will be discussed in a sequel paper. Formal techniques for model selection are available since the Poisson likelihood can be evaluated explicitly; one may apply the likelihood ratio test based on an analysis of deviance, or criteria such as AIC.
We have avoided discussing the validity of the Poisson process model for the Murchison data. Gold deposits might well be spatially clustered (positively associated) due to random effects in the formation process. Landslides and avalanches are likely to be a regular (negatively associated) point process because each event requires the buildup of material [60, sec. 4]. Such models can be dealt with using established techniques of spatial statistics [25,29,59]. This is a general issue for GIS data analysis that is brought into focus by the current work . We implicitly assumed that the point pattern dataset is a complete and accurate record of the realisation of the point process. That is, the map of gold deposits is complete and accurate within the study region. In reality such maps typically show only the known deposits. At the very least, one should allow for spatial variation in the intensity or reliability of the survey. To a first approximation, everything is observable inside an outcrop (rock not covered by soil) while nothing is directly observable elsewhere.

Conclusions and recommendations
In applications where a study region is divided into pixels of arbitrary size, we have shown that spatial logistic regression models do not necessarily have a consistent meaning, independent of the choice of pixel size. A consistent meaning is obtained only when pixels are very small; in this case, the model is a Poisson point process, whose intensity is a loglinear function of the (canonical) covariates.
Estimation of the parameters of a loglinear Poisson process can be performed to a good approximation using presence/absence data on a coarse pixel grid: the preferred algorithm is binary regression with the complementary log-log link (instead of the logistic link) and with an offset term equal to the logarithm of pixel area. Our theoretical and empirical results show that this technique is typically much more accurate than logistic regression for a given grid size.
The assignment of values to the pixel covariates z j , particularly for the pixels that contain data points, can have substantial influence on statistical performance. This is analogous to the importance of obtaining accurate estimates of hazard exposure for the cases in an epidemiological case-control study. An important instance of this problem occurs when the covariate has jump discontinuities along a highly irregular edge. The split pixel strategy is a very effective solution, and is recommended especially in very large datasets.
The MLE is not necessarily appropriate; other possibilities for fitting Poisson point process models include robust estimation [4,5,84]. However, robust estimating equations have a structure similar to the score of the loglinear Poisson model [5], so we expect similar issues to arise in robust estimation.
Assumption (a) implies that we can write z j = tz j ′ +(1−t)z j ′′ with 0 < t < 1. Substituting in (70) yields Applying (70) to each term on the left side and writing v j = β t z j for any j, we obtain tf (v j ′ ) + (1 − t)f (v j ′′ ) = f (tv j ′ + (1 − t)v j ′′ ). But this is impossible, since f is strictly convex and the values v j ′ , v j ′′ are distinct. The contradiction implies that (10), (8), (9) are inconsistent.
Assumption (b) implies that β t z j , and hence f (β t z j ), takes at most k distinct values. Since the vectors z j form a basis, equation (70) has solutions.

A.2. Poisson process approximation for general models
This section contains proofs for Theorem 3 and Corollary 4 from Section 6.3.
Proof of Theorem 3. Choose a fixed point γ j ∈ S j for every j ∈ {1, . . . , J} and define the function γ : W → {γ j ; 1 ≤ j ≤ J} by γ(x) = γ j if x ∈ S j . Denote by X = J j=1 Y j δ γj the pixel point process, which has expectation measure µ = J j=1 p j δ γj . Let Π be a Poisson process on W with expectation measure λ and Π a Poisson process with expectation measure µ. Write r j = sup u∈Sj u−γ i andr = max 1≤j≤J r j . We can split up the initial distance as d 2 L (X), Pois(λ) ≤ d 2 L (X), L ( X) +d 2 L ( X), L ( Π) +d 2 L ( Π), L (Π) (71) and examine each of the summands on the right hand side separately. For the first one, we obtain d 2 L (X), L ( X) ≤ Ed 1 (X, X) = E d 1 (X, X) I{N j = Y j for every j} + E d 1 (X, X) I{N j > 1 for some j} = E 1 X(W ) x∈X x − γ(x) I{N j = Y j for every j} + P{N j > 1 for some j} ≤r + P{N j > 1 for some j}.
We write Π * = J j=1 Π(S j )δ γj and use a somewhat similar argument for the third summand. Employing two well-known facts about the total variation distance and Poisson distributions (see [12], Proposition A. 1.1 (1.15) and Theorem 1.C), we have For the second summand on the right hand side of inequality (71), we use a result obtained by Stein's method for Poisson process approximation. Theorem 4. 3.C in [70], which is derived by a slight modification of the proof of Equation (6.3) of Theorem 6.1 in [83], yields that Plugging the inequalities obtained for the three summands into (71) leads to the upper bound claimed in (13). Inequality (14) is a simple consequence.
The same proof technique can be used to deal with (weakly) dependent pixel values Y j . In this case the inequality (74) is replaced by that obtained in [12, Theorem 10.F] or, more generally, by [83, (5.51)]. This leads to a reasonable but more complicated upper bound which contains extra summands to control the amount and structure of dependence amongst the Y j .
Proof of Corollary 4. Write λ(du) = λ(u)du. By Condition (i) we can apply inequality (13)   which goes to zero as well. Finally, it can be shown from the fact that (G (m) ) m is nested and asymptotically infinitesimal that max j λ(S m,j ) → 0 as m → ∞; compare Lemma A1. 6.II in [26] (noting that (G (m) ) m is a dissecting system for W in the sense defined there). Thus inequality (75) implies that d 2 L (X (m) ), Pois(λ) → 0, and hence X (m) converges in distribution to Pois(λ) by Lemma 2.

A.3. Poisson process approximation for spatial logistic regression
This section contains a proof of Theorem 5 from Section 6. 4.
Write λ(du) = λ(u)du for the expectation measure of the Poisson process. Note that p m,j ≤ p m,j /(1 − p m,j ) ≤ Ka m,j , where K = max u∈W exp(β t Z(u)). For each m the conditions of Theorem 3 are satisfied, so that we can employ inequality (13). The third summand in the upper bound can then be estimated further as where ε m = max 1≤j≤Jm sup u∈Sm,j exp(β t z m,j ) − exp(β t Z(u)) , which goes to zero by the choice of z m,j , the uniform continuity of Z on W , and δ m → 0. Hence we have for the total d 2 -error d 2 L (X (m) ), Pois(λ) ≤ 2δ m + P{∃j : N m,j > 1} + |W |ε m + K K|W | + 6.5 a m . (76) By the prerequisites all the summands on the right hand side of (76) tend to zero, so that, by Lemma 2, X (m) converges in distribution to a Poisson process with expectation measure λ as m → ∞.

A.4. Existence of MLE for loglinear Poisson process
Following is a sketch proof of Lemma 6 in Section 7.
Let D = { i t i Z(x i ) : t i > 0} be the relative interior of the convex cone generated by the covariate values Z(x i ) for the data points. Let C be the relative interior of the convex cone generated by the image of the covariate function Z, defined as follows. First the image measure Z ♯ is a finite measure on R k defined by Z ♯ (B) = W I{Z(u) ∈ B} du for measurable B ⊆ R k . Its support spt(Z ♯ ) is the smallest closed set S whose complement has measure zero under Z ♯ . Finally C is the relative interior of the convex cone generated by spt(Z ♯ ).
By arguments similar to [72] the negative loglikelihood has no directions of recession iff D ∩ C = ∅. For a realization of the Poisson process, the values Z(x i ) are almost surely in spt(Z ♯ ), so D ⊆ C. Hence the MLE exists uniquely iff D = ∅, which is equivalent to requiring Z(x i ) = 0 for at least one i.

A.5. Fine-pixel limit for logistic regression
This section contains a proof of Theorem 8 from Section 10.2. For any fixed β, the function h(u) = β t Z(u) is uniformly bounded above and below, since Z is Lipschitz and W is bounded. Hence the values β t z j are uniformly bounded (independent of pixel size). Using the first order Taylor expansion of log(1 + x) at x = 0, we get j log(1 + a n,j exp(β t z j )) = j a n,j exp(β t z j ) + R 1 where the remainder satisfies |R 1 | ≤ 1 2 j a 2 n,j exp(2β t z j ) ≤ 1 2 a n j a n,j exp(2β t z j ).
Accordingly j log(1 + a n,j exp(β t z j )) → W exp(β t Z(u)) du for each fixed β, since g(u) = exp(β t Z(u)) is clearly Riemann integrable. Let N n,j denote the number of points of X in pixel S n,j from grid G n . Then N n,j is Poisson distributed with mean µ n,j = Sn,j λ β 0 (u) du. Consider the event A n = {∃j : N n,j > 1} for grid G n . We have P(A n ) = 1 − j e −µn,j (1 + µ n,j ).
By Taylor expansion P(A n ) ≤ 1 − exp(− 1 2 a n j a n,j e 2β t zj ) so that P(A n ) → 0. Furthermore, since a n+1 ≤ a n /2, we have n P(A n ) < ∞ and by the Borel-Cantelli Lemma P(A) = 0 where A = {A n occurs infinitely often}. For realisations of X outside the null set A, Lipschitz continuity of Z gives