Testing the first-order separability hypothesis for spatio-temporal point patterns

First-order separability of a spatio-temporal point process plays a fundamental role in the analysis of spatio-temporal point pattern data. While it is often a convenient assumption that simplifies the analysis greatly, existing non-separable structures should be accounted for in the model construction. We propose three different tests to investigate this hypothesis as a step of preliminary data analysis. The first two tests are exact or asymptotically exact for Poisson processes. The first test based on permutations and global envelopes allows us to detect at which spatial and temporal locations or lags the data deviate from the null hypothesis. The second test is a simple and computationally cheap $\chi^2$-test. The third test is based on statistical reconstruction method and can be generally applied for non-Poisson processes. The performance of the first two tests is studied in a simulation study for Poisson and non-Poisson models. The third test is applied to the real data of the UK 2001 epidemic foot and mouth disease.


Introduction
Spatio-temporal point process (STPP) models are increasingly used for modeling natural phenomena like disease incidences, sightings or births of a species, occurrences of fires, earthquakes, tsunamis, or volcanic eruptions 2002;. In practical applications, modeling the joint distribution of spatial locations and occurrence times of a spatio-temporal point pattern are challenging. Therefore, to make some simplification, it is often assumed that the point process has a separable spatio-temporal intensity function. In general, the strongest form of separability may be defined by the requirement that the distribution of a STPP is equal to the product of the distributions of the marginal processes in space and time. This form of separability is equivalent to the independence of the spatial and temporal components of the point process, and under this separability, the spatial and temporal components can be modeled completely separately (Beneš et al.;. Diggle (2013, page 220) uses the term no spatio-temporal interaction for a point process with independent spatial and temporal components. Weaker notions of separability are characterized by the product form of moment measures.
In order to discover what type of models would be appropriate for the data and also to help in interpreting summary characteristics in preliminary data analysis, an important part for STPPs is to consider different separability hypotheses. Depending on whether the pattern is considered to be a realisation of a stationary process or inhomogeneous, the first steps in the analysis of the data and the interest to separability hypotheses typically differ.
If an STPP is considered homogeneous, the interest typically lies in the interaction between the points and the first step is the test of complete spatio-temporal randomness Illian et al.;. If the presence of interaction is confirmed, one of the crucial steps is to test the second-order separability, i.e. separability of the space-time pair correlation function 2009;2012;Ghorbani;2019). If no evidence against the separability is found, the interactions in space and time can be inspected separately of each other. On the other hand, if the pattern is deduced to origin from a non-separable process, then interactions must be considered in space-time.
For an inhomogeneous pattern, the focus is typically first on its intensity function and the first-order separability, i.e. the separability of the space-time intensity function. First-order separability is often regarded as a convenient working assumption 2009;2012;2019), because under the separability the inference about the quite complicated spatio-temporal model can typically be based on the properties of the lower dimensional spatial and temporal marginal processes that are easier to handle. However, obviously the separability does not hold always, and a test of separability can help to suggest suitable explanatory variables for the variations in the intensity. Namely, under separability, spatial and temporal covariates may be used, whereas otherwise, one should look for spatio-temporal covariates. If an inhomogeneous pattern additionally exhibits clustering or regularity, it may be of interest to test for the separability of the second-order property once a good model for the intensity exists. Such a Monte Carlo test can be performed if the first-order separability holds .
The first-and second-order separability are not fully explored in the previous literature. So far, in the context of spatio-temporal marked point processes, for testing the hypothesis of the separability of the marks and the spatial-temporal process, some test statistics based on the conditional intensity function have been proposed by Schoenberg (2004) and Diaz-Avalos et al. (2013). Recently, Fuentes-Santos et al. (2018) provided a test statistic based on the relative risk function and using regression methods. All these test statistics were applied to wildfire data. As another test statistic for the hypothesis of the first-order separability, Gonzalez et al. (2019) used the ratio of the integrated intensity function at two disjoint spatial regions which should not vary by time and employed it in the analysis of tornado occurrences in the USA. However, none of these test statistic allows one to discover reasons of non-separability, i.e., where and when the non-separability occurs.
The aim of this paper is to propose tests for the null hypothesis of spatio-temporal separability of intensity function. We propose three different tests to investigate this hypothesis, namely, a permutation based test, a test based on stochastic reconstruction, and the χ 2 -test. For the first two tests, our test statistics are simply based on the nonparametric estimates of the non-separable and separable intensity functions. As usual in spatial and spatio-temporal statistics, the distribution functions of the proposed test statistics are unknown and Monte Carlo methods are used to compare the observed value of the test statistic with the values obtained from the simulated samples under the null hypothesis. In general, in the analysis of spatio-temporal point patterns, the test statistics are often summary functions, as here as well, and pointwise envelopes have been used for model checking 2009;2012. However, while the functions are inspected on an interval or a region, the pointwise envelopes control the type of error only locally (see e.g. Loosmore and Ford;2006;2017). We utilize instead the global envelope tests 2017;2019) that control the global type I error and show how they can be useful also in testing the separability hypothesis by providing good power and graphical interpretation of the test results. In producing Monte Carlo samples under the first-order separability hypothesis, a simple permutation strategy, namely permutation of times of the point pattern, works well for Poisson processes. However, it breaks down the second-order structures of non-Poisson processes. For non-Poisson cases, we suggest instead a test based on the stochastic reconstruction procedure (Tscheschel and Stoyan;2006;Wiegand et al.;Koňasová and Dvořák;, which can be used to produce Monte Carlo replications with the same interaction structure as the observed data and the same intensity functions of the spatial and temporal component process X space and X time , respectively. The third test is a computationally cheap χ 2 -test, which is based on the cell counts. The permutation based test is exact and the χ 2 -test is asymptotically exact for Poisson processes. The random reconstruction based test is instead appropriate for non-Poisson processes, provided that the reconstruction procedure is performed carefully. The rest of this paper is organized as follows. Section 2 provides some background materials about STPPs including their first-order characteristics that will be used in the subsequent sections. The null hypothesis of first-order separability is defined in Section 3. The permutation based test is introduced in Section 4, including test statistics, permutation strategy and description on the use of the statistics calculated for the data and simulations in the global envelope and deviation tests. Sections 5 and 6 respectively express the χ 2 -test and stochastic reconstruction method for testing the first-order separability. Section 7 is devoted to two simulation studies, one for inhomogeneous Poisson processes in Section 7.2 and another for log-Gaussian Cox processes in Section 7.3. These studies compare the performance of the proposed tests under various alternative hypotheses and also explain the graphical interpretation of the global envelope tests. We applied the tests to the UK 2001 foot and mouth disease data in Cumbria (Keeling et al.;2001;2006Ghorbani;2012) in Section 8. The paper ends with a short discussion. Some visualisation tools including interactive plots, suitable for informal assessment of the first-order separability hypothesis, are presented at the accompanying website: http://msekce.karlin.mff.cuni.cz/~dvorak/software/STseparability.html.

Assumptions and background
A STPP X with no overlapping points is a random countable subset of a space S ⊆ R 2 × R, where each point (u, t) ∈ X indicates the location and time of occurrence of an event of interest. In practice, X is observed within a spatio-temporal window W × T , where W ⊂ R 2 is a bounded region of area |W | > 0, T ⊂ R is a bounded time interval of length |T | > 0, and X ∩ (W × T ) = {(u i , t i ), i = 1 . . . n} is the observed data. We assume that X has intensity function ρ.
Assuming that X has an intensity function ρ(·), the spatial component process X space consisting of the locations with times in T and the temporal component process X time consisting of the times with locations in W , i.e. X space = {u : (u, t) ∈ X, t ∈ T } and X time = {t : (u, t) ∈ X, u ∈ W }, are then well-defined point processes on R 2 and R, respectively, with well-defined intensity functions. Following Møller and Ghorbani (2012), we use the indices space and time for the respective functional summaries of the spatial and temporal components. The intensity functions of these components are given by (1) Using the above marginal intensities, the conditional spatial and temporal intensities for any given time t and for any given spatial location u, are respectively defined by Nonparametric kernel estimates of ρ space and ρ time are respectively given bŷ andρ is a given density function, and C W, (u i ) = W k 2 (u − u i )du and C T,δ (t i ) = T k 1 δ (t − t i )dt are edge correction factors in space and time, respectively (see details e.g. in 2016, page 168)). In general, a nonparametric kernel estimate of the non-separable spatio-temporal intensity function iŝ The kernels k 1 and k 2 of the above formulas may have different forms. However, in this work, we use isotropic Gaussian kernels.

First-order spatio-temporal separability
Recall that the point process X has intensity function ρ(u, t) for (u, t) ∈ R 2 × R. Hence, the intensity measure µ for X can be written as where n(A) denotes the number of points of X in any bounded set A ⊆ R 2 × R. Note that, for a homogeneous STPP X, the intensity function ρ is the mean number of points per unit space-time volume. It is usually assumed that the spatio-temporal intensity function of a STPP is separable, i.e., where ρ 1 and ρ 2 are non-negative measurable functions. Under this assumption, for Borel sets A ⊆ R 2 and B ⊆ R, the intensity measure (6) is a product measure, that is µ(A × B) = A ρ 1 (u) du B ρ 2 (t) dt (see more details in 2012). Some literature use the term first-order spatio-temporal separability instead of separability of the spatio-temporal intensity function. In this paper we use the short term first-order separability.
The null hypothesis of the first-order separability implies that the intensity functions of the spatial and temporal component processes in (1) can be written as and then by combining (7) and (8), the intensity function of the point process X under the first-order separability is Combining (6) and (9), and defining µ space (A) = A ρ space (u)du and µ time (B) = B ρ time (t)dt, the intensity measure of X under the first-order separability denoted by µ sep is given by We further assume that the expected number of observed points is a positive and finite number, i.e., 0 < W ×T ρ(u, t) d(u, t) < ∞. Furthermore, for a first-order separable model, the conditional spatial and temporal intensities in (2) are simplified to , which do not vary by changing time and space, respectively. In the spatio-temporal point processes context, it is usually assumed that the "first-order spatio-temporal separability is a convenient working hypothesis which is hard to check" 2009;2012). Here, we propose formal tests for the first-order separability hypothesis. The test statistics are based on the separable and non-separable intensity estimates.

Test functions
To test the first-order separability hypothesis, we propose the following test function: forρ(u, t),ρ space (u),ρ time (t) > 0. The important part of this test function is the non-separable intensity in the numerator, while the separable intensity in the denominator is employed only for the scaling purpose. Because under the null hypothesis,ρ thereupon, for all (u, t) ∈ W × T , S(u, t) = 1. Therefore, values deviating from one relate to non-separable structures. In particular, S(u, t) > 1 indicates increased estimated non-separable intensity at the specific location u and time t in comparison to the estimated separable intensity. The function S(u, t) is a three-dimensional function which contains all the information about the intensity of the point process under study. In practice the intensities and thus S(u, t) are estimated on a finite number of spatial locations u ∈ W and times t ∈ T . When the temporal region T is small or low resolution in T is adequate, it is convenient to visualize the three-dimensional function by plotting S(u, t), u ∈ W , for each discretized time point t (see Section 7 for details). However, when T is large or fine discretization is desired, instead of studying a large number of images, it can be of interest to consider the following one-and two-dimensional functions that integrate S(u, t) with respect to u and t, namely and which can be easily visualized. On the other hand, some detailed information may be lost in the integration, as will be discussed in Section 7. Under the hypothesis of first-order separability, S time (t) = |W | for all t ∈ T and S space (u) = |T | for all u ∈ W .

Simulations under the first-order separability
Because the distributions of the above test functions are not known, we need to resort to simulation based tests as usual in spatial and spatio-temporal statistics. A test of first-order separability can be implemented using permutations under the assumption of the inhomogeneous Poisson model. The permutation test does not require distributional assumptions, or any type of model specification for the separability structure. The permutation test relies on the fact that if the null hypothesis is true, then shuffled data sets should look like the observed data, otherwise they should look different. In other words, the joint distribution of the observed data must be the same as the joint distribution of the permuted data for any permutation, which is called permutation invariance. Following Diggle (2013) we assume that the process is an inhomogeneous spatio-temporal Poisson process. Under this working assumption, it is easy to show that permutation invariance holds. To justify this claim, consider the density of the inhomogeneous spatio-temporal Poisson process with respect to the unit rate Poisson process as given by f (x) = e |S|− S ρ(u,t)dudt (u,t)∈x ρ(u, t) for finite point configurations x ⊂ S. Under the hypothesis of the first-order separability (7), Clearly, the value of the joint density function f (x) given in (14) for any random permutations of the temporal points t i holding the locations x i fixed will not change. The same holds for random permutations of the x i holding the t i fixed. Hence, simulations under the first-order separability hypothesis can be obtained using either random permutations of x i holding t i fixed or random permutations of t i holding x i fixed, with equal probability for each possible permutation. In all our tests, we permuted t i s holding x i s fixed. If the null hypothesis of first-order separability is true, then these randomly permuted data sets are statistically equivalent to the original data, and the rationale of the Monte Carlo test applies (Barnard;1963;Besag and Diggle;1977;. Note that, in practice we observe only one realization of X, so it is not possible to distinguish a Cox process from its corresponding inhomogeneous Poisson process and thus considering inhomogeneous Poisson process as a working assumption seems reasonable.

Global envelope tests
In the Monte Carlo test, first a test statistic must be chosen and be calculated for the data and for n simulations generated under the null hypothesis. Thereafter, the extremeness of the data statistic among all the statistic must be determined. This is done by ordering the statistics by some measure. We used the global envelope tests 2017;2019) to perform Monte Carlo tests based on S(u, t), S space (u) and S time (t). An advantage of the global envelope tests is that they provide graphical interpretation of the test results. Furthermore, their non-parametric nature ensures that they are not sensitive to the inhomogeneity in the distribution of the test functions over the spatial, temporal or spatio-temporal domain.
In practice, the global envelope tests use the discretized version of the test function, say S. Let us denote the discretized test function calculated from the data by S 1 and the test functions calculated from the n simulations by S 2 , . . . , S n+1 . In this paper, each of the S i can be thought as a d-variate vector, S i = (S i1 , . . . , S id ), where d is given by the number of the grid points and it determines the argument values (u 1 , t 1 ), . . . , (u d , t d ), u 1 , . . . , u d or t 1 , . . . , t d at which the function S(u, t), S space (u) or S time (t) are evaluated, respectively.
We used the global extreme rank length (ERL) envelope test 2019). The test is based on the corresponding ERL measure suggested independently by Myllymäki et al. (2017) and Narisetty and Nair (2016) where n + 1 plays only the role of a scaling factor, can be attached to S i , where a small value indicates extremeness of S i . Here and in what follows 1(·) denotes the indicator function.
The global envelope is constructed as a hull of those vectors S i that are considered non-extreme by the ERL measure M i . Formally, let m α ∈ R be the largest of the measures M i calculated for S 1 , . . . , S n+1 such that the number of those i for which M i < m (α) is less or equal to αs, and let I α = {i ∈ 1, . . . , s : M i ≥ m (α) } be the index set of vectors less or as extreme as m α . Then the 100(1 − α)% global envelope induced by the M i is given by If the data vector S 1 does outside the envelope, then also M i < m α (see e.g. 2019). The Monte Carlo p-value can be calculated as p = n+1 i=1 1(M i ≤ M 1 ) (n + 1). We included in our simulation study also the global envelopes based on the Area measure suggested by Mrkvička et al. (2019). However, in our examples the ERL and Area measures led to rather similar outcomes, and thus we decided to include the results based on the ERL measure only.

Deviation test
In addition to the global envelope tests, we included in our simulation study the deviation test based on the integral deviation measure which summarizes the discrepancy between the estimate of the spatio-temporal intensity function and its counterpart under the separability hypothesis. This measure stems from testing the independence of two continuous random variables (Blum et al.;1961;Diggle and Gabriel;2010, p. 451). In practise, the integral in (15) is replaced by a sum.
A standard Monte Carlo test can be based on the statistic S d,1 calculated from data and its simulated counterparts S d,2 , . . . , S d,n+1 . Given that large values of S d are significant, the p-value of the test is

χ 2 -test for first-order separability
A test of the first-order separability hypothesis can be performed without permutations in a manner similar to independence test in two-way contingency tables. The test is conditional on the observed number of points and asymptotically exact for Poisson processes. In this spirit it is closely related to the well-known χ 2 -test of Complete Spatial Randomness.
We consider a division of the interval T into disjoint sub-intervals T 1 , . . . , T J and similarly a division of the window W into disjoint subsets W 1 , . . . , W I . Both divisions can be based, e.g., on quantiles of the temporal and spatial coordinates, respectively, but are in fact arbitrary. Let n ij denote the number of observed events in the cell W i × T j . Under the first-order separability assumption and conditionally on the observed number of events being n, we observe a binomial point process on W × T with probability density function proportional to ρ 1 (u)ρ 2 (t). In this setting the events are independent and the expected count in the cell W i × T j is for each i = 1, . . . , I, j = 1, . . . , J. Hence the test statistic has asymptotically, with the increasing number of observations in the fixed observation window W × T , the χ 2 distribution with (I − 1)(J − 1) degrees of freedom. Recall that the test requires large sample sizes to be accurate. A simple rule of thumb regarding sample size is that the expected cell counts should be at least five. After defining the cells W i × T j and determining the observed counts n ij it is possible to perform the test using standard statistical software. We here utilized the R package spatstat .

Stochastic reconstruction for testing first-order separability
We have also investigated other approaches for obtaining Monte Carlo replications than the permutation strategy described above. Among others, we explored various versions of the classical random shift approach (Lotwick and Silverman;1982;, but none of these was able to produce replications under the null hypothesis, i.e. with separable first-order structure, while preserving the interaction structure of the observed pattern. However, the stochastic reconstruction procedure can be used to produce independent replications (outputs) with the same interaction structure as the observed data (input) and the same intensity functions of the spatial and temporal component process X space and X time , respectively (Tscheschel and Stoyan;2006;Wiegand et al.;Koňasová and Dvořák;. The user chooses a set of summary characteristics that should be preserved by the reconstruction procedure. We suggest to use the square root of a non-parametric estimator of the inhomogeneous space-time K-function 2009), together with the separable estimator of the intensity function (9). Based on the experience from Koňasová and Dvořák (2020), we also suggest to use as further summaries the valuesD k (r, t) giving the fraction of observed points which have at least k neighbours within distance r (in the spatial domain) and within the lag t (in the temporal domain). These are considered only to be empirical characteristics describing interpoint distances rather than being estimators of some theoretical quantities. However, they are closely related to the raw estimates of the kth nearest neighbour distribution functions in a stationary space-time point process.
A so-called energy functional, quantifying the dissimilarity between the input pattern X and another pattern Y , is then constructed: where w K , w D k , w ∆ are the weights determining the relative importance of the individual terms, {(u 1 , t 1 ), . . . , (u I , t I )} are the center locations of the cells of a regular grid covering W × T , a is the volume of the grid cell and T K , R K , T D , R D and k max are user-selected tuning constants. The procedure starts with a binomial pattern Y 0 generated as a collection of n independent points (the same as the number of observed points in pattern X) following a probability density function proportional tô ρ sep (X; u, t). Then iteration steps are repeated in which a new pattern Y new is proposed by randomly deleting one point from the current pattern, say Y m , and generating a new point in W ×T with density again proportional The algorithm stops when a user specified stopping rule is met, e.g. after performing a maximum allowed number of iterations or after rejecting a certain amount of proposals in a row (Tscheschel and Stoyan;2006;Illian et al.;. By minimization of the energy functional the output pattern Y out is forced to have approximately the same interaction structure as the input pattern X (as described by the K-and D k -functions) while having a separable first-order structure (as described byρ sep ).
After a large number of independent output patterns is generated, these can be used to perform a Monte-Carlo test of the first-order separability hypothesis as in Section 4. The outputs can be considered to be independent replicates of the data obtained under the null hypothesis. The performance of the Monte-Carlo test of course relies on the interaction structure of the observed data being correctly captured by the reconstruction procedure.
Using the stochastic reconstruction procedure requires some tuning of the parameters. It is also strongly suggested to verify on simulated data that the outputs of the reconstruction algorithm have the same properties as simulations from the correct model. This can be done following the suggestions of Koňasová and Dvořák (2020) and will be illustrated in the data example below.
7 Performance of separability tests

Separability of inhomogeneous Poisson processes
We investigated the performance of the global envelope tests based on the test functions (11)-(13), the deviation test S d based on (15) as well as the χ 2 -test in the following simulation study: We let W × T = [0, 1] 2 × [0, 1] be the observation window and considered the inhomogeneous spatio-temporal Poisson process within W × T with the following intensity: where the parameter ν controls the expected number of points in the pattern and the parameter γ ∈ [0, ν/2] controls the degree of separability. For γ = 0, the model is separable as all the points come from the first part of the model, whereas γ = ν/2 corresponds to the most non-separable model where half of the points come from the first and another half from the second part of the model. In our simulation study, we considered different values of γ = 0, 25, 50, 75, . . . , 200 and we determined ν by fixing the expected number of points in the pattern to be around 600, to have a corresponding number of points as in our data example, in all simulations.
The rationale behind the model (17) is to mimic a situation in practical applications, e.g. in environmental epidemiology, where the incidents typically occur randomly in the population, possibly with some general trend in time, but at a certain time there occurs a sudden burst of incidents around a contaminated source or another starting location of an epidemic. One famous example of such situation is the cholera cases in the proximity of the polluted water pump in Soho, London 1854 (Bivand et al.;2013, pages 118-122 and references therein). Another example is the UK 2001 foot and mouth disease data analysed in Section 8. In the first part of the model, ξ(u) can be understood as a population density in space. Given that the points represent incidents of some contagious disease, in the case γ = 0, they occur randomly in the population. On the other hand, ψ(t) is the baseline trend in time. The second part of the model with φ µ,Σ (u, t) creates a "burst" of incidents around a particular location and time. More precisely, for the baseline spatial and temporal densities ξ(u) and ψ(t), we considered four different models: (i) overall constant density with ξ(u) = 1 and ψ(t) = 1, (ii) ξ(u) = 1 and the temporal density ψ(t) was the univariate normal distribution φ µ,σ (t) with mean µ = 0.5 and variance σ = 0.2, (iii) ψ(t) = 1 and the spatial density ξ(u) was the bivariate normal distribution φ µ,Σ (u) with mean µ = (0.5, 0.5) and with diagonal covariance matrix Σ with variances 0.2 in both directions, (iv) ψ(t) = φ µ,σ (t) and ξ(u) = φ µ,Σ (u) as in (ii) and (iii), respectively.
Thus, in the case (i), the baseline points occur uniformly in space and time, generating a point pattern which is homogeneous in both space and time for γ = 0. This can be understood as a case where prevalence of some disease is inspected on a relatively small region with approximately constant population density. In the case (ii), the number of incidents increases up to the time 0.5 after which the density again decreases gradually. The generated point pattern is homogeneous in space when γ = 0, but inhomogeneous in time. In the case (iii), the incidents occur in time at a constant rate, whereas the population density represented by ξ(u) is centred in the middle of W , and thus the inhomogeneity occurs in space. Finally, the case (iv) is a combination of the cases (ii) and (iii) where the base line density is inhomogeneous both in space and time.
We used the independent thinning method to generate 1000 realisations of each of the spatio-temporal inhomogenous Poisson processes described above. Briefly, considering the fact that for each model the intensity function ρ(u, t) in (17) is bounded above by a positive constant ρ max on the given observation window W × T = [0, 1] 2 × [0, 1], we first simulated a homogeneous Poisson process X max within W × T with intensity ρ max , and second made an independent thinning of X max ∩ W × T where the retention probability of a point (u, t) ∈ X max ∩ W × T is given by p(u, t) = ρ(u, t)/ρ max (see e.g. Daley and Vere-Jones;2008, for details).
Considering each simulated pattern as our data on its own turn, we tested the first-order separability of it by the different tests. To generate simulations under the null hypothesis, we used the permutation procedure explained in Section 4.2 and we fixed the number of permutations to 1999 in each case. We used equations (3)-(5) to estimate the space, time and space-time intensities. The choice of the bandwidth in space was made to minimise the mean-square error criterion defined by Diggle (1985) and obtained using the function bw.diggle() of the R package spatstat . To choose the bandwidth in time we used the function bw.SJ() of the R package base (R Core Team; 2019) which is the Sheather and Jones (1991) method of bandwidth selection using pilot estimation of derivatives to minimise the mean integrated square error criterion. For the example pattern of Figure 1 of the model (17) with case (iii), the bandwidths in space and time were respectively 0.025 and 0.037. We made the ERL global envelope tests and the deviation test based on the function S(u, t) at a 25 × 25 × 20 grid covering W × T . Furthermore the global envelope tests were performed for the functions S space (r) and S time (r) in the corresponding space and time grids. In our calculations, we utilized the R library GET 2017;2019). For the χ 2 -test we used the 4 × 4 × 4 grids based on the quantiles of the temporal and spatial coordinates ensuring that the amount of points was in average more than five points in each grid cell. For each test, we calculated the number of rejections of the null hypothesis among the 1000 repetitions at the significance level 0.05. Table 1 shows the results of the simulation study. The case γ = 0 corresponds to empirical significance levels, while for the other values of γ, powers of different tests to detect deviation from the null hypothesis are given. The mean number of points,n, in the 1000 simulated point patterns are shown at the third column of the table. The following observations can be done: (a) As expected due to the theoretical result (14), the empirical significance levels of all tests were close to the nominal level 0.05. (They should be between 0.037 and 0.064 with a probability of 0.95 given by the 2.5% and 97.5% quantiles of the binomial distribution with parameters 1000 and 0.05).
(b) The power of the tests increased with increasing degree of non-separability.
(c) The S-function based tests had extremely high power for all models. However, the tests based on S space had lower performance for models (ii) and (iv), which had an inhomogeneous base line temporal density, while the tests based on S time had lower performance for models (iii) and (iv), which had an inhomogeneous base line spatial density. Thus, the temporal or spatial inhomogeneity had an effect on the performance of the tests based on S space and S time .
(d) The power of the deviation test (15) and the χ 2 -test was also high, but tended to be slightly lower than the power of the global envelope tests for the critical values γ = 25, 50.
The result (c) was expected, since the S-function utilizes all the information of the intensity function, while the latter two summarize S and can therefore loose some of the information as illustrated by the example in Section 7.2. We performed the permutation tests also with the global area envelope tests (Mrkvička et al.; 2019; Myllymäki and Mrkvička; 2019) (results now shown here). Its power tended to be slightly higher for S than the power of the global ERL envelope tests (for γ = 25), while it was the other way around for S space . Table 1: The proportion of rejections among 1000 repetitions of the permutation and χ 2 -tests at the significance level α = 0.05 for the four models specified by (17) (i)-(iv). The global envelope and deviation tests were performed with 1999 permutations of times using a 25 × 25 × 20 grid, and the χ 2 -test was based on a 4 × 4 × 4 grid.

Global ERL envelope
Deviation

Graphical interpretation of the first-order separability test results
Figure 1 depicts a STPP simulated from the non-separable inhomogeneous Poisson process (17) with the spatial and temporal base line intensities (iii) and γ = 100. Figures 2, 3 and 4 show the outputs of the three global ERL envelope tests based on the test functions S, S space and S time , respectively. The test setup was as described above. Figure 2 shows the regions where the S-function estimated from the data goes above and below the global envelope constructed from permutations. The result indicates increased intensity of points particularly around the spatial location (0.3, 0.3) at times 0.12-0.27. On the other hand, there are several times where the data function goes below the envelope. The explanation for this test output is as follows: In the generated pattern, the intensity is highest in the burst region centred at (0.3, 0.3, 0.2). The permuted patterns have lower intensity at this location and time, because the random permutations of times shuffle those points across the whole time window T . The shuffling simultaneously increases the intensity at the other times, whereby the data function goes below the envelope in the burst region at times different from the time of the burst. Because typically the interest lies in the increased numbers of points or incidents, the locations and times where the upper envelope is exceeded tend to be more interesting than exceeding the lower envelope. The few crossings of the upper envelope at later times have occurred just by a random chance; the choice of the rather small bandwidth (0.03) may have supported the detection of such small random clusters.
It should be noted that the separable intensity in (11) is the same for the data and all permutations and, therefore, it serves only as a scaling factor. Consequently, the exceeding of the envelope can be interpreted as increased intensity in comparison to the null hypothesis.
The function S space is able to detect the same spatial location around (0.3, 0.3) (see Figure 3). At first thought, it might be surprising that the data function goes below the envelope, and not above. However, the function S space is obtained by integrating the S-function over all times and, as explained above, the S-function had different behavior at different times. In this case, the function S time detected the deviation from the null hypothesis, as well: the data function went below the envelope at times 0.17, 0.22 and 0.27 (see Figure 4). Both functions S space and S time will probably go below the envelope in many applications as well, when the nonseparability occurs due to increased incidents at particular locations and times. We advice to use these functions only to detect the locations and times of non-separability and then draw further conclusions by inspecting the STPP.
An advantage of the non-parametric rank envelopes is that they adapt to the variability of the test function across its domain. This can be observed from the lower and upper envelopes that vary across the space or time domain (see Figures 2, 3, 4).
We assume here that the LGCP X is second-order intensity-reweighted stationarity (SOIRS), i.e. the pair correlation function g depends only on spatial distance and time lag between two points 2000;2009). This implies that the covariance function C((u, s),    Figure 3: Permutation based test: testing the first-order separability of the point pattern of Figure 1 using the global ERL envelope test with the function S space (u) (p = 5 · 10 −4 ): The empirical S space function (blue color)(left), the lower envelope overlaid by the significant regions where the empirical function goes below the envelope(middle), and the upper envelope overlaid by the significant regions where the empirical functions goes above the envelope(right). The 95% global envelope was constructed from 1999 simulations. Figure 4: Test of first-order separability for the point pattern of Figure 1 using the global ERL envelope test with the function S time (p = 25 · 10 −4 ) the observed function and the 95% global envelope. The central function is the mean of all functions S time .
where m 1 (u) and m 2 (t) are two functions for describing the trend of the field in space and time, respectively. That is, if the first-order separability holds, then spatial and temporal covariates may be used for modelling trends in space and time, respectively, whereas otherwise one should look for spatio-temporal covariates. We investigated the performance of the permutation based test and χ 2 -test for testing the first-order separability of a spatio-temporal LGCP on W × T = [0, 1] 2 × [0, 1]. For the sake of simplicity, we specifically set where γ = 0 corresponds to the hypothesis of first-order separability, while for γ = 0, the intensity function is non separable. We also set for parameters σ 1 , σ 2 > 0, and γ ≥ 0. We further assumed that Z s , Z t , and Z st are independent Gaussian random fields with mean zero and covariance functions C 1 (u 1 , u 2 ) = exp(− u 1 − u 2 2 /φ 1 ), C 2 (t 1 , t 2 ) = exp(−|t 1 − t 2 |)/φ 2 ), and C 3 ((u 1 , t 1 ), (u 2 , t 2 )) = C 1 (u 1 , u 2 )C 2 (t 1 , t 2 ), respectively, with correlation parameters φ 1 , φ 2 > 0. It is worth mentioning that, setting γ = γ = 0 implies that the realisation of the underlying random intensity Λ(u, t) has multiplicative form. This in turn indicates that, conditional on Λ(u, t), for γ = γ = 0, a realisation of a first-order separable inhomogeneous Poisson process is obtained. The special case of the above LGCP model assuming homogeneity (m(u, t) = log ρ, with ρ > 0 as a constant) has been used in Møller et al. (2019) to investigate the performance of the space-sphere K-function. They fitted the LGCP model with γ = 0 and prepared a Monte Carlo test, the global rank envelope test, based on the simulations from the fitted model. We explore here instead the performance of the completely non-parametric permutation based test and χ 2 -test.
First, we explored the empirical significance level of the tests, i.e. the performance of the tests for the case γ = 0. For each value of γ = 0, 0.5, 1, we simulated 1000 realisations of the LGCP on [0, 1] 2 × [0, 1] with β 1 = 0.25, σ 1 = σ 2 = 0.5, φ 1 = 0.06 and φ 2 = 0.05. Briefly, to simulate a realization of a LGCP with the given covariance function C((u 1 , t 1 ), (u 2 , t 2 )) and mean 0, we first generated a realization of a Gaussian random field, z(u, t), in a 20 × 20 × 20 grid with covariance function C((u 1 , t 1 ), (u 2 , t 2 )) and mean 0 using the circulant embedding method (Wood and Chan;1994) in the R package RandomFields (Schlather et al.;. Thereafter, we simulated a realisation of the inhomogeneous Poisson process with intensity exp(m(u, t)+z(u, t)), (u, t) ∈ R 2 × R. For each realisation, we tested the null hypothesis γ = 0 (the first-order separability) using both the permutation based test and χ 2 -test at the significance level 0.05. Note that the parameter values were chosen such that the mean number of the points of each realisation was around 200. The permutation based test was based on the test function (11) with three different choices of the bandwidths for estimating the intensities, Table 2: Empirical significance level (γ = 0) based on 1000 repetitions of the permutation and χ 2 tests for LGCP for different values of γ , and at the significance level α = 0.05. The global envelope test was performed with 999 permutations of time using the S(u, t)-function evaluated at a 20 × 20 × 20 grid. The intensities were estimated by kernel smoothing using three different bandwidths, small, moderate and large presented in this order in the table for each case. The χ 2 -test was based on a 4 × 4 × 4 grid.

Bandwidths
Global ERL envelope γ β 0 space time S(u, t) χ 2 -test  Table 2 presents the empirical significance levels for the permutation test based on S(u, t) with these bandwidth choices and for the χ 2 -test.
The following observations can be made from the empirical significance levels (case γ = 0): • For γ = 0 all empirical significance levels were fine.
• The empirical significance levels increased with increasing γ . They were still close to the nominal level with γ = 0.5, but for γ = 1, all tests were liberal.
• For γ = 1, increasing the bandwidths of the permutation test based on S(u, t) reduced the liberality of the test, but the nominal level was not reached for any bandwidth choice.
The first observation can be understood from the fact that γ = γ = 0 implies the multiplicative form of the realisation of Λ(u, t) as noted above. Thus, the additional clustering of the LGCP model caused by the Z(u, t) with additive structure did not affect the empirical significance levels for γ = 0. However, the non-additive form of Z(u, t) for γ > 0 led to a non-multiplicative form of Λ(u, t) and to liberal tests. In fact, the bigger the γ , the more severe the liberality. The χ 2 -test was similarly liberal for γ > 0, due to unbalanced cell counts. Thus, we conclude that the structure of the clusters has a major impact on the liberality of the tests for LGCPs and the considered first-order separability tests are truly valid only under the additive structure of Z(u, t) (case γ = 0 of the considered model). Even though the nominal significance level was in the above experiment reached approximately also for γ = 0.5, the liberality of the tests is further increased for any γ > 0 by increasing the values of the correlation parameters φ 1 and φ 2 or number of points controlled by the parameter β 0 . Given these conclusions, we next explored the power of the tests for the LGCP model with Table 3: The proportions of rejections among 1000 repetitions of the permutation and χ 2 -tests for LGCP with γ = 0 at the significance level α = 0.05. The tests were performed as in Table 2.

Bandwidths
Global ERL envelope 0.595 0.14 0.14 0.687 γ = 0 for different values of γ . The rejection rates among 1000 simulations of the LGCP model are reported in Table 3. The results can be summarized as follows: • As expected, the power of the tests increased with γ , i.e. with increasing degree of non-separability.
• At a fixed level of γ , the power of the permutation test based on S(u, t) increased by increasing bandwidths, and this impression is more visible for large values of γ .
The second observation is not surprising, because the non-separable structures of the model (20) were linear, leading to smoothly varying first-order intensities, which should be better captured by large bandwidths that involve larger amount of spatio-temporally correlated points in estimation of the intensity.
8 Case study: UK 2001 epidemic foot and mouth disease data As a real data example, we investigated the first-order separability of the spatial and temporal components of the UK 2001 epidemic foot and mouth disease (FMD) dataset in Cumbria. This dataset was previously analyzed in Keeling et al. (2001), Diggle (2006Diggle ( , 2007, Gabriel et al. (2018), Møller and Ghorbani (2012) and Ghorbani (2013). For more information about the data see Diggle (2013). The data analyzed in this section is taken from the R package stpp . The area of Cumbria is 5556.298 km 2 and the data were collected for 200 days starting at February 1, 2001, so we let T = [0,200]. Figure 5 shows the spatial point pattern of 648 infected animals in the irregular region W defined by Cumbria (upper left panel), and the daily number of infected animals (bars in the lower panel). Further, the upper right and lower panels show the estimated spatial intensityρ space estimated by (3) with bandwidth b = 1.83 km and the estimated temporal intensityρ time estimated by (4) with bandwidth of 3.86 days, respectively. The bandwidths were chosen by the same rules as in the simulation study (see Section 7.1). Manifestly, the most incidents occur in the North-Western to South-Eastern (NW-SE) belt of Cumbria and within the first 100 days. Closer inspection of the spatio-temporal data indicates that the epidemic has begun in the far north of the Cumbria and subsequently spread both south-west and south-east. Transmission of infection is thought to occur primarily between neighbouring farms but cases can also occur far from all preexisting cases, possibly because of the unintended transport of infected material . For the FMD dataset, what is of interest is how the farms locations collectively affect the progress of the epidemic in time. Is there any relationship between farms' location and the times t i at which particular farms reported FMD cases? From statistical perspective, we want to see if the spatial and temporal components are independent. In the following, we test the first-order separability of the fmd data, first assuming that the data follow an inhomogeneous Poisson process (Section 8.1) and second treating it as a clustered pattern (Section 8.2).
Both in the permutation and reconstruction method, 2499 simulations were generated under the null hypotheses and the global envelope test was performed for the empirical and simulated S-functions that were evaluated on a 50 × 50 × 10 grid.

Permutation and χ 2 tests
We first tested the first-order separability of the FMD data by the permutation and χ 2 tests, assuming that the data follows an inhomogeneous Poisson process. The simple χ 2 -test was performed with the 3 × 3 × 3 grid where the boundaries of the cells were determined using the 1/3-and 2/3-sample quantiles of the observed coordinates in each dimension, similarly to the experiments in Section 7. The grid had only 27 cells to ensure the observed counts in individual cells were high enough. The resulting p-value was 6.6 · 10 −24 , indicating very strong evidence against the null hypothesis of the first-order separability.
The output of the permutation based test is shown in Figure 6. The test detects that the intensity of points was particularly high in the NW corner at time 30, in the middle of the region at time 50 and in the SE corner at later times 110-190. Simultaneously, the observed intensity goes below the lower envelope obtained from the permutations at quite large areas in the SE corner at early time and in the north and NE at later times (see description of the phenomena in Section 7.2).

Stochastic reconstruction method
For testing the first-order separability hypothesis in the FMD dataset, treating it as a clustered pattern, we applied the stochastic reconstruction approach described in Section 6. To capture the interaction structure of the observed pattern X, we estimated the inhomogeneous space-time K-function where we used the non-separable kernel estimator of the intensity functionρ(u, t). We chose the non-separable estimator for this purpose since we are not sure whether the data were generated by a first-order separable process or not and we want to correctly estimate the interactions in both cases. Furthermore we estimated the functionsD k to capture the arrangement of points in the clusters. The estimator of the intensity function to be used in the energy functional (16) is the separable estimatorρ sep (u, t) so that the separable first-order structure is enforced on the output patterns. For tuning the algorithm, i.e. choosing the weights, upper bounds T K , R K , T D , R D and the number k max in (16), we used simulations from a model similar to the one fitted to the data in Møller and Ghorbani (2012). Namely, we considered a Poisson-Neyman-Scott type of process on the observation window W × T from the FMD dataset with inhomogeneous population of parents points following a Poisson process with the intensity function proportional toρ sep (X; u, t) and the mean number of parent points in W × T being 182. The mean number of offsprings per parent point was 3.5, their displacement around the parent point was governed by a trivariate Gaussian distribution with independent components and standard deviation 3.23 kilometers (in the spatial dimensions) and 7 days (in the temporal dimension), respectively.
As suggested in Koňasová and Dvořák (2020), from this model we generated 100 training patterns (to be used as inputs in the stochastic reconstruction procedure) and 999 testing patterns. For each training pattern we produced one output pattern using a given version of the stochastic reconstruction algorithm. To check the interaction structure of the output, we performed a global envelope test in which this output was treated as the observed data and the 999 testing patterns were treated as the Monte-Carlo replication. The test statistic was the inhomogeneous space-time J-function from Cronie and Van Lieshout (2015). In this way we produced 100 p-values from the 100 training patterns. Uniformity of distribution of these p-values can be considered as an  indication that the outputs of the stochastic reconstruction cannot be distinguished from the simulations from the true model, using this functional characteristic. After some experimenting we chose the following values: T K = 12 days, R K = 6 km, T D = 6 days, R D = 3 km, k max = 3, w K = 1, w D k = 3 · 10 3 , w ∆ = 4 · 10 5 . The iterations were stopped either if 100 proposals were rejected in a row or the maximum number of 10 5 iterations was reached. The weights were chosen so that, after the algorithm stops, the contribution of the individual terms in the energy functional is of the same order of magnitude. With these choices we performed the set of tests described in the previous paragraph, obtaining a very homogeneous population of p-values, see Figure 7 (left). For comparison we also performed the set of tests where the training patterns were treated as the observed data, obtaining again a rather uniform distribution of the 100 resulting p-values, see Figure 7 (right). We conclude that with these choices the stochastic reconstruction algorithm produces outputs which correctly reproduce the interaction structure of the given model, and the separable form of the first-order structure is enforced by the choice of the energy functional. Hence the outputs can be used in a Monte-Carlo test of the first-order separability hypothesis in place of independent simulations from the correct model. ERL global envelope test using the inhomogeneous Jfunction, for the outputs of the stochastic reconstruction procedure (left) and for comparison also for the respective input patterns (right). Uniformity of the distribution of p-values implies that the outputs cannot be distinguished from simulations from the true model using the inhomogeneous J-function.
With the choices above we produced 2499 reconstructions of the FMD dataset and used these to test the first-order separability hypothesis as described in Section 4, using the function (11). We employed the ERL version of the global envelope test at the significance level of 5 %. The test rejected the null hypothesis with the smallest possible p-value 1/2500 = 4 · 10 −4 . The outcome of the test, together with the significant regions where the data function lies above/below the envelope, is given in Figure 8.
The significant regions clearly indicate that the epidemic has shifted over time, starting in the north-western part of the region and gradually moving to the south-eastern part (this can be also seen from the interactive plots in the accompanying website http://msekce.karlin.mff.cuni.cz/~dvorak/software/STseparability.html). This is not consistent with the null hypothesis of the first-order separability and the test acknowledges this issue by indicating that, compared to the reconstructed patterns with the separable first-order structure, the observed pattern has some missing points in the early times in the south-eastern part and in the later times in the north-western part (see the lower envelope in Figure 8). Similarly, some excess points are present in the observed pattern at various locations and times (see the upper envelope in Figure 8). It is interesting to note that this effect is strong enough not to be confused with clustering and that the significant regions are very similar to those reported by the permutation-based tests in Figure 6.

Discussion and conclusion
In the analysis of spatio-temporal point patterns, modelling of the intensity function which characterizes the firstorder structure is one of the first steps. However, modelling the joint distribution of the spatial locations and time of occurrences of a STPP can be challenging due to the curse of dimensionality. Hence, in the literature on spatiotemporal point processes, either for modelling purpose or for checking second-order separability hypothesis, firstorder separability is usually assumed 2009;2012). This assumption allows us to express the spatio-temporal intensity in a multiplicative form which nicely ease both modelling and estimation. However, in practical applications, the separability assumption can be quite restrictive and unrealistic and should be tested in the early stage of the analysis.  The 95% global envelope was constructed from 2499 simulations. The temporal coordinates of the grid are given above the corresponding plots.
From this perspective, the first-order separability can be considered as the first-order spatio-temporal independence and therefore, in principle, one may think about a connection between the separability of the intensity function and the independence of two random variables. The independence of two random variables can be tested based on the distance between empirical cumulative distribution function (Blum et al.;1961) [F n (u, v) − F n (u)F n (v)] 2 dF (u, v).
In fact, a similar test statistic could be constructed for the first-order separability test of a STPP as well. Namely, considering the separable intensity (9), for Borel sets A ⊆ R 2 and B ⊆ R, it is natural to expect that the quantity Bρ time (t) dt 2 is close to zero under the first-order separability hypothesis. This quantity can be used to assess the separability of the intensity function locally in a given sub-region A × B. To obtain a test statistic for global assessment, one should sum over disjoint sub-regions A i × B j , i = 1, . . . , k, j = 1, . . . , l, where k and l denote numbers of quadrats in space and time, respectively. In particular, if A = (0, u] and B = (0, t], then the test of first-order separability can be based on the distance between kernel estimates of intensity measures, μ (0, u] × (0, t] −μ sep (0, u] × (0, t] 2 du dt. This idea has been roughly stated in Diggle and Gabriel (2010, page 451). In fact, in the results not shown, motivated by this characterization, we explored a test function S * (u, t) = (ρ(u, t) −ρ 1 (u)ρ 2 (t)) 2 = (ρ(u, t) −ρ space (u)ρ time (t)/n) 2 , (u, t) ∈ W × T, which however did not perform as well as the chosen statistic (11), leading to lower power in our examples and loosing the comparison of the order of the separable and non-separable intensities. Our test statistic (11) is instead based on the quotient of non-parametric estimates of the non-separable and separable intensity functions. In fact, the non-separable intensity plays the main role in our test statistic; the separable intensity is constant under permutations, and thus serves only as a scaling factor. We note that the χ 2 -test is based on the counts of points in joint sub-regions, but also based on dividing the separable and non-separable quantities. It requires that the observed counts in individual cells be at least five to make the χ 2 approximation applicable, otherwise one could use the Fisher's exact test that we have not considered in this paper. We proposed permutation and χ 2 tests which are appropriate to work under the Poisson assumption. The permutation test requires the permutations and calculations of the test statistic for each permutation. While permutations are computationally cheap, estimation of the spatio-temporal intensities can take some time. As a reward, one obtains however a graphical test that shows the spatial areas and times where the data contradict the null hypothesis. On the other hand, the χ 2 -test is a simple and computationally cheap alternative. Both tests turned out to have a good performance also for processes with weak clustering. Hence, for Poisson and weakly clustered processes, we recommend the χ 2 -test as a fast preliminary test, and the permutation test based on the test function (11) in order to learn where and when the potential non-separable structures occur.
For other processes, stochastic reconstruction method was proposed. It is a computationally expensive method, whose implementation for a specific data requires some experimenting. However, it can also provide detailed information about the possible non-separabilities in the intensity of the process.
One challenging point in the use of our test statistics can be their sensitivity to the choice of bandwidth. We believe that the choice of the bandwidths in intensity estimation can affect the performance of the tests and thus their proper choice can definitely increase the power of the tests. For Poisson processes the procedures for bandwidth selection as explained in Subsection 7.1 can be used. For non-Poisson processes there is unfortunately no automatic way for bandwidth selection. In general, the user should have some prior information on the smoothness of the first-order properties in the data and choose the bandwidth with respect to this understanding. This is important for the stochastic reconstruction approach, and also for the permutation based test if applied to data with small scale clustering.