Cox process representation and inference for stochastic reaction-diffusion processes

Complex behaviour in many systems arises from the stochastic interactions of spatially distributed particles or agents. Stochastic reaction-diffusion processes are widely used to model such behaviour in disciplines ranging from biology to the social sciences, yet they are notoriously difficult to simulate and calibrate to observational data. Here we use ideas from statistical physics and machine learning to provide a solution to the inverse problem of learning a stochastic reaction diffusion process to data. Our solution relies on a novel, non-trivial connection between stochastic reaction-diffusion processes and spatio-temporal Cox processes, a well-studied class of models from computational statistics. We develop an efficient and flexible algorithm which shows excellent accuracy on numeric and real data examples from systems biology and epidemiology. By using ideas from multiple disciplines, our approach provides both new and fundamental insights into spatio-temporal stochastic systems, and a practical solution to a long-standing problem in computational modelling.

equation" (RDME) [15,16]. The computational complexity of the RDME obviously increases as the spatial discretisation becomes finer, and in many cases the limiting process does not lead to the original SRDP [17]. Significant effort has been spent to improve the performance of the two types of simulations [18]- [24], however the computational costs are still high and quickly become prohibitive for larger systems. More importantly, the lack of an analytical alternative to simulations means that evaluating the fit of a model to observations (the likelihood function) is computationally extremely expensive, effectively ruling out statistical analyses. As far as we are aware, the few attempts at statistical inference for SRDPs either used simulation-based likelihood free methods [13], inheriting the intrinsic computational difficulties discussed above, or abandoned the SRDP framework by adopting a coarse space discretisation [25] or neglecting the individual nature of agents using a linear-noise approximation [26].
In this paper, we propose an approximate solution to the problem of computing the likelihood function in SRDPs, thus providing a principled solution to the inverse problem of model calibration.
Using the classical theory of the Poisson representation (PR) for stochastic reaction processes [27], we show that marginal probability distributions of SRDPs can be approximated in a mean-field sense by spatio-temporal Cox point processes, a class of models widely used in spatio-temporal statistics [30]. Cox processes model the statistics of point patterns via an unobserved intensity field, a random function which captures the local mean of the observed point process. This novel link between these two classes of models enables us to formally associate an SRDP with a continuous evolution equation on the local statistics of the process in terms of (stochastic) partial differential equations (SPDEs). Crucially, this novel representation for SRDPs allows us to efficiently approximate multiple time marginals and thus data likelihoods associated with observed point patterns, leveraging the rich statistical literature on spatio-temporal point processes for parameter estimation and/or Bayesian inference [30,31].
We demonstrate the efficiency and accuracy of our approach on a number of numerical examples using non-trivial models from systems biology and epidemiology. Our results provide both a valuable resource for performing statistical inference and assessing model fit in this important class of models, and a novel conceptual perspective on spatio-temporal stochastic systems.

SRDPs and Poisson representation
In the classical Doi interpretation [10,11], which we adopt here, SRDPs describe the evolution of systems of point particles performing Brownian diffusion in a spatial domain D; bimolecular reactions between particles occur with a certain rate whenever two particles are closer than some specified reaction range. SRDPs are frequently analysed via coarse graining by discretising space and assuming dilute and well-mixed conditions in each compartment; in this case the dynamics in each compartment is described by the chemical master equation [32]. Modelling diffusion of particles between neighbouring compartments as unimolecular reactions leads to the reactiondiffusion master equation (RDME) [15], which describes the dynamics of a continuous-time Markov jump process. For systems with only zeroth or first order reactions, the RDME converges to Brownian dynamics in the continuum limit. For non-linear systems, however, this is not the case because the rate of bimolecular reactions converges to zero, independently of the scaling of the corresponding rate constant [17].
Consider now a set of chemical species X i in a finite volume of D dimensions divided into L cubic compartments of edge length h interacting via the following reactions: Here s ij and r ij are the number of reactants and product particles of species X i in reaction i, respectively, k j is the rate constant of reaction j, X m i is the count of species i in compartment m, and d i is the diffusion rate of species i, which is related to the microscopic diffusion coefficient D i via d i = D i /h 2 . We assume homogenous diffusion here, i.e., d i is independent of the compartment position, but it would be straightforward to extend the following analysis to space-dependent diffusion. N (l) denotes all the adjacent compartments of compartment l. Equation (1) describes chemical reactions happening in each box while Equation (2) describes diffusion between adjacent compartments. We confine our analysis to reactions with at most two reactant and at most two product particles, i.e., N i=1 s ir , N i=1 r ir ≤ 2, r = 1, . . . , R, since higher order reactions rarely occur in nature. Under well-mixed and dilute conditions in each compartment, the evolution of marginal probabilities of this system is given by the RDME which is given in general form in Appendix A.2.
Gardiner derived an alternative description of the dynamics in a RDME type of system starting from the ansatz of writing a probability distribution P (n, t), n = (n 1 , . . . n N ) as a Poisson mixture [27,28] where u = (u 1 , . . . , u N ) and P(n i ; u i ) = (e −ui u ni i )/n i ! is a Poisson distribution in n i with mean u i , and the u i are complex-valued in general. Using this ansatz in the RDME one can derive an exact Fokker-Planck equation for p(u, t) or equivalently a Langevin equation for u(t) [27]. Gardiner derived this result for the non-spatial chemical master equation and applied it to the RDME to study the corresponding continuum limit. While the PR provides an elegant analytical tool to study reaction systems, its applicability is severely hindered by the fact that the Poisson variables u are in general complex-valued and hence lack a clear physical interpretation; in particular, all bimolecular reactions and all linear reactions with two non-identical product particles give rise to a complex-valued PR (for a taxonomy of which reaction systems become complex-valued see Appendix B.1).

Cox process representation and mean-field approximations
While in the classical view of the PR the auxiliary variables u are simply introduced as a mathematical device, we can make some progress by considering a joint process over the u and n variables. Formally, this is equivalent to what in statistics is called demarginalisation: a complex process is replaced by a (simpler) process in an augmented state space, such that the marginals of the augmented process return exactly the initial process. To formalise this idea, we first introduce some concepts from spatial statistics.
A (spatial) Poisson process [29] is a measure on the space of zero-dimensional subsets of a domain D; in this work we consider processes which admit an intensity function u(x), which gives the rate of finding a point in an infinitesimally small spatial region. The number of points in a finite spatial region is then a Poisson random variable with mean given by the integral of the intensity function over that region. A Cox process (also called "doubly stochastic Poisson process") is a generalisation of a Poisson process where the intensity field is itself a random process. Conditioned on a realisation of the intensity field, the Cox process reduces to a Poisson process. We will consider families of spatial Poisson (Cox) processes indexed by a time variable; importantly, in this case the intensity field can be thought of as the state variable of the system, with the actual spatial points as noisy realisations of this state (see Fig. 1 for a graphical explanation).
Our first observation follows directly from Gardiner's analysis of the continuum limit of the RDME (see Appendix B.4.1 for a proof).
Remark 1 Consider an SRDP on a spatial domain D and temporal domain T , and let all reactions involve production or decay of at most one particle. Then, ∀t ∈ T the single-time-point Right: time evolution of a Cox process. Here, the intensity field evolves in time, rather than the points in real space. The latter are merely noisy realisations of the intensity field. In particular, the points patterns at two different time points are independent of each other conditioned on the intensity field. spatial probability distribution of the SRDP is exactly the same as of a spatial Poisson process.
This point process representation enables us to develop mathematically consistent approximation schemes for SRDPs in general. Consider a bimolecular reaction of the type A + B k −−−−→ C with the rate function f (n A , n B ) = kn A n B , where n A and n B are the number of A and B particles in the system, respectively. While the PR for such systems is complex-valued, we can formally obtain a real system by a mean-field approximation f (n A , n B ) ≈ k 2 (n A n B + n B n A ). As by definition n A = u A , this replaces the direct interaction of the particles by an effective interaction of A with the mean-field of B and vice versa. Linear reactions with two non-identical product particles can be approximated in a similar fashion. This leads to a real evolution equation for the u variables, see Appendix B.2 for details.
Applying this approximation to a general RDME and subsequently the PR and taking the continuum limit gives the following set of N coupled SPDEs (see Appendix B.3 for a derivation): where the sum over r runs over all linear reactions with two product particle of species X i . Particularly, this means that in the absence of this type of reactions, the diffusion term in Equation (4) vanishes and Equation (4) reduces to a PDE, i.e., the u i (x, t) are deterministic. D i = h 2 d i in Equation (4) is the microscopic diffusion constant of species X i , ∆ is the D-dimensional Laplace operator, u(x, t) = (u 1 (x, t), . . . , u N (x, t)), u i (x, t) is the intensity field of species X i , dW r (x, t) is spatio-temporal Gaussian white noise, and we have defined the propensity functions g r (u(x, t)) in PR space which are obtained by applying the mean-field approximation to the propensity functions f r (n) and subsequently replacing n i → u i and n i → u i . Equation (4) looks similar to the spatial chemical Langevin equation [33], but has a different interpretation here since it describes the intensity in PR space. In particular, just as any other PDE or SPDE description in real space, the spatial chemical Langevin equation does not provide a generative model for the actual location of the events, and thus would not allow us to directly model statistically particle locations. Notice that, as a consequence of the mean-field approximation, the mean value of the u i fields is the same as in a deterministic rate equation description; however, the dynamics of the observed variable, i.e., the points in space, remain stochastic even when the intensity field evolves deterministically.
We can therefore extend Remark 1 to obtain the following result (see Appendix B.4.1 for a proof) Result 1 Consider the same setting as in Remark 1. If there is at least one reaction with two product particles of the same species, the system's single-time-point distribution is exactly given by a Cox process, whose intensity fulfils the stochastic PDE (SPDE) given in Equation (4). If the system involves other types of reactions, including bimolecular reactions, the single-time probability distribution of the SRDP is approximated in a mean-field sense by that of a Poisson (Cox) process whose intensity fulfils Equation (4).
Result 1 provides an efficient means to calculate statistics such as expected number of agents within a certain volume, without the need to perform extensive Monte Carlo simulations, since it only requires to solve a (S)PDE for which a rich literature of numerical methods exists [30,31]. The numerical methods used in this paper are presented in Appendix C. Importantly, we can use Result 1 to approximate the likelihood function of a configuration of points arising from a SRDP, by using the well-known Cox process likelihood: if u(x, t) is the intensity of a spatio-temporal Cox process with distribution p(u(x, t)) and y a given measurement at time t 0 , the corresponding likelihood is given by This function can be easily optimised to yield statistical estimates of kinetic parameters from single-time observations. We consider next the problem of approximating the joint distribution of point patterns arising from an SRDP at different time points. This is important when we have time series observations, i.e., spatial measurements x = (x 0 , . . . , x n ) x i ⊂ D at discrete time points t 0 , . . . , t n , and we want to compute the likelihood p(x|Θ) of the data given a model Θ. Since the system is Markovian the likelihood factorises as We would like to approximate this likelihood using the relation to Cox processes established in Result 1. While this is in principle straightforward, computing the terms p(x i |x i−1 , Θ) involves determining the distribution over the associated u i−1 variable in PR space. This would involve inverting the PR transformation (3), which is computationally inconvenient. Instead, we opt for an approximation strategy: assume that we have determined the PR distribution p(u i−1 ) at time i − 1. By definition of the intensity of a Poisson process, u i−1 represents the expectation of the random configuration of points x i−1 at time i − 1. We then approximate p(x i |x i−1 , Θ) in a mean-field way by replacing the explicit dependence on x i−1 with its expectation measured points x = (x 0 , . . . , x n ): while they are snapshots of the actual state in the true system, they correspond to noisy realisations of the state u(x, t) in the Cox process picture. We thus have:

Result 2
The joint n-time point marginal of an SRDP can be approximated in a mean-field sense by the joint probability of a Poisson (Cox) process with intensity governed by the (S)PDE in Equation (4).
Result 2 is particularly powerful statistically, because it enables us to analytically approximate the exact (intractable) likelihood p(x|Θ) in Equation (6) by the likelihood of a spatio-temporal Cox process with intensity u(x, t). This allows us to develop efficient algorithms for approximate maximum likelihood estimation in general SRDPs; the rest of the paper is dedicated to illustrating the performance of the algorithm on a range of case studies.

Examples
In the following we apply Result 1 and Result 2 to several systems, and perform parameter inference by maximising the data likelihood. We solve the corresponding (S)PDEs numerically by projecting them onto a set of spatial basis functions, see Appendix C for details.

Gene expression
To demonstrate the accuracy of our method, we first consider simulated time-series data in this section. To this end, consider a gene expression system in a one-dimensional container as depicted in Fig. 2. A gene located in the nucleus is transcribed into mRNA molecules. The latter decay and diffuse across the whole cell, and are translated into proteins in the cytosol. The protein molecules also diffuse across the whole cell and decay. For simplicity, we do not model the gene explicitly but assume that mRNA becomes transcribed with a certain fixed rate m 1 homogeneously in the nucleus. The corresponding reactions are where M and P denote the mRNA and protein, respectively. For this system, the SPDE of our method in Equation (4) becomes deterministic and thus corresponds to a Poisson process. In addition to the reaction parameters m 1 , m 2 , p 1 and p 2 , we have to infer the nucleus size r, as well as the diffusion rates d m and d p of the mRNA and protein, respectively, summing up to Table 1: Inference for gene expression system in Fig. 2 and reactions in equations (7) and (8). We assume measurements of the protein while the mRNA is unobserved. The inference is carried out by maximising the likelihood of simulated data for thirty measurement points separated by ∆t = 0.5. This procedure is carried out for hundred simulated data sets, and the mean value and standard deviation (in parenthesis) of the inferred results are displayed.  Table 2: Inference for the same system as in Table 1 but with the additional autocatalytic reaction in Equation (9). Since only the difference p 2 − p 3 is identifiable we fix p 3 = 0.01 and infer the other seven parameters. The table shows the average and standard deviations (in parenthesis) of the inference results for one hundred simulated data sets.
a total number of seven parameters. We assume that the positions of the protein molecules are observed at thirty time-points, while the mRNA is unobserved. The results for one parameter set are shown in Table 1. Considering that we observe the protein at only thirty time points with unobserved mRNA and that we have seven unknown parameters, the inferred average values are remarkably close to the exact values. Moreover, the standard deviations of the inferred results for single data sets are small, demonstrating the accuracy and precision of our method.
Next, we extend the system in Fig. 2 by adding an autocatalytic reaction for the protein, Including this reaction leads to a non-vanishing noise term in Equation (4) and the system corresponds to a Cox process. We note that the system has a steady state only if p 3 < p 2 , with an otherwise exponentially growing mean protein number. On the mean level only the difference p 2 − p 3 is identifiable, and we fix p 3 = 0.01. We thus infer the same parameters as in the previous case, but this time modelled as a Cox process. Table 2 shows the results indicating the accuracy of our method. Table 3: Inference for the SIRS model with reactions given in Equation (10). The inference is carried out by maximising the likelihood of simulated data for forty measurement points. This procedure is carried out for two hundred simulated data sets, and the mean value and standard deviation (in parenthesis) of the inferred results are displayed.

SIRS model
We next consider the SIRS system, a popular model for the dynamics of an infection spreading through a population. Such systems are frequently modelled as SRDPs [34] or discretised versions thereof [35]. We consider a system in the two-dimensional square [0, 1] × [0, 1]. The system comprises a susceptible (S), an infected (I) and a recovered species (R), which perform Brownian diffusion and interact via the reactions where the bimolecular infection is characterised by the microscopic reaction rate k and the reaction range r. We assume that all three species diffuse with the same diffusion rate d. We assume that initially there are no recovered (R) particles, S ini susceptible (S) particles placed uniformly over the whole area, and one infected (I) particle located at [0.05, 0.05]. We consider a fully observed system and perform inference for forty simulated data points using Result 2, thereby replacing k and r by an effective bimolecular reaction parameter k PR . The model thus has four parameters that need to be inferred: the diffusion rate d, the recovery rate r, the susceptible rate s and the bimolecular infection rate k PR . Table 3 shows the corresponding results, showing the accuracy and precision of our method. The computational efficiency of our method in comparison to stochastic simulations is particularly pronounced here. For the first parameter set in Table 3, for example, the Brownian dynamics simulation of a single realisation of the system takes about 250 seconds, while the whole inference procedure for the four parameters using our method takes only about ten seconds for one simulated data set on a 3.1GHz core. Fig. 3 visualises the dynamics of the SIRS system for one parameter set. Individual points from a simulation are shown in different colours (brown for S, yellow for I and blue for R), while the background RGB colours represent the intensity of the respective PR fields with optimised Right: corresponding density along the major axis for one embryo, from experimental data (blue histogram) and point process intensity (red line). In both cases the point process prediction agrees well with the experimental data. The point process prediction is obtained by solving Equation (4) numerically for the inferred parameter values maximising the data likelihood.
parameters (blue for S, red for I and green for R). Notice how the PR approximation is still able to capture the emerging behaviour of a wave of infection sweeping through the domain from bottom left to top right, before the establishment of a dynamic equilibrium between the three population. Such a phenomenon is clearly due to the spatial aspect of the system, and could not have been recovered using an inference method that did not incorporate spatial information. Indeed, the overall number of infected individuals rises rapidly and remains essentially constant between time 20 and time 35 before dropping to steady state, a behaviour which is simply not possible in a non-spatial SIRS model.

Drosophila embryo
Finally, we apply our method to real gene expression data for the Bicoid protein at cleavage stage 13 in the Drosophila embryo. The data for seventeen embryos can be obtained from the FlyEx database [37] (available from http://flyex.uchicago.edu/flyex/). The data consists of fluorescence intensity measurements on a spatial grid and is shown for one embryo in the middle panel of Fig. 4. The protein becomes expressed in some region at the left end of the embryo and then diffuses across the embryo and decays. The system is typically modelled by a linear birth-death process [25,26], and we assume the protein to be expressed within a certain distance r from the left end of the Embryo (see Appendix D.3 for details). At cleavage stage 13 the system is supposed to be in steady state and we can perform inference using Result 1 and Equation (4). For simplicity, we project the data to one dimension (see Appendix D.3 for details).
The system has four parameters: the creation range r, the diffusion rate d, the production rate c 1 and decay rate c 2 of the Bicoid protein. For steady state data not all parameters but only certain ratios are identifiable. We thus infer the creation range r, the diffusion rate d and the ratio with standard deviations of about 20% to 30%. We find that these results do not change significantly under variations of the initial parameter values used in the likelihood optimiser. Fig. 4 illustrates the inference result for one embryo. The left figure shows the Bicoid density along the major axis of the embryo and the middle and right figures show the corresponding densities across the whole embryo for experimental data and the PR prediction, respectively. In both cases we observe good agreement between the measurement data and the point process approximation.

Discussion
We studied two popular classes of models for studying stochasticity in spatio-temporal systems; stochastic reaction-diffusion processes (SRDPs) and spatio-temporal point processes. The two classes of models are both commonly used in many disciplines such as epidemiology [14,38] and social sciences [39], however they are widely perceived as conceptually distinct. SRDPs are microscopic mechanistic descriptions used to predict the dynamics of spatially interacting particles, whereas point processes are typically used empirically to perform inference tasks for systems for which no microscopic description exists. The two approaches therefore seem to be orthogonal to each other.
However, in this paper we have shown that the two methods are intimately related. By using the Poisson representation (PR) we established a Cox process representation of SRDPs, which is exact for certain classes of systems and approximate for others. This novel representation enables us to apply a wide range of statistical inference methods to SRDPs, which has not been possible so far. We applied the developed method to several example systems from systems biology and epidemiology and obtained remarkably accurate results.
Since our method agrees with a deterministic rate equation description on the mean level, bimolecular reactions may lead to deviations from the true mean, which is known to be the case in some non-spatial scenarios [36]. Since in our method distributions are given as real Poisson mixtures, sub-Poissonian fluctuations can not be captured. However, Gardiner showed that fluctuations of SRDPs are dominated by diffusion on small length scales and therefore Poissonian [33], which may explain the accuracy of our method.
Most inference methods in the literature for SRDPs are either based on Brownian dynamics simulations or stochastic simulations of spatially discretised systems using the RDME. Both approaches are computationally extremely expensive and quickly become unfeasible for larger systems and in particular for inference purposes. In contrast, our method relies on the solution of (S)PDEs for which a rich literature of efficient numerical methods exists. For the studied example systems our method turned out to be highly efficient: the computational time for inferring four unknown parameters was found to be of the order of 10 to 100 seconds on a 3.1GHz processor. We therefore expect our method to be applicable to significantly larger systems containing more species and unknown parameters. Remarkably, simulating a single realisation of the process from Brownian dynamics took about the same computational time or even an order of magnitude more than the whole inference procedure using our method, i.e., optimising the likelihood with respect to the parameters, indicating the immense computational costs of inference methods based on such simulations.
We therefore believe that the derived representation of SRDPs in terms of Cox processes has the potential to serve as the basis for a new class of statistical inference methods for SRDPs.

A Background
A.1 The chemical master equation and its Poisson representation where k j is the rate constant of reaction j. Define the stoichiometric matrix S ij = r ij − s ij ; reaction j is of order m iff N i=1 s ij = m. We will only consider m ≤ 2 since higher order reactions rarely occur in nature. Denote as n = (n 1 , . . . , n N ) the state of the system, where n i is the copy number of species X i . Under well-mixed and dilute conditions, the time evolution of the (single time) marginal probability distribution of the system obeys the chemical master equation (CME) [32]: where the propensity function f r (n)dt gives the probability for reaction r to happen in an infinitesimal time interval dt and is given by where Ω is the volume of the system. The Poisson representation makes the ansatz to write P (n, t) as a Poisson mixture [27] P (n, t) = du P(n 1 ; u 1 ) . . . P(n N ; u N )p(u, t), u i ∈ C, where u = (u 1 , . . . , u N ) and P(n i ; u i ) = (e −ui u ni i )/n i ! is a Poisson distribution in n i with mean u i , and the u i are complex in general. The integrals in (15) in general run over the whole complex plane for each u i . Using the ansatz (15) in the generating function equation which can be derived from (13) one can derive a PDE for the distribution p(u, t), Note that this PDE generally involves derivatives of higher order than two, meaning that p(u, t) can generally become negative and thus not have a probabilistic interpretation. However, since we consider only reactions satisfying i s ir ≤ 2, i r ir ≤ 2, r = 1, . . . , R, i.e., a maximum of two reactant and a maximum of two product particles, (16) simplifies to

which is a Fokker-Planck equation (FPE) with drift vector A(u) and diffusion matrix B(u) given by
where δ i,j denotes the Kronecker delta. The corresponding Langevin equation reads where dW is a l-dimensional Wiener process and l is the number of columns of C.
Depending on the reactions in the system, the diffusion matrix may be zero, in which case the Langevin equations in (22) reduces to deterministic ODEs. On the other hand, depending on the reactions, B(u) can be not positive-semidefinite and thus CC T = B cannot be fulfilled for real u, which means that (18) is not a proper FPE in real variables, but has to be extended to complex space. In Section B.2 we give a classification of different reactions with respect to the behaviour of B(u) and show how we handle reactions that become complex in the PR.
One important property of the PR is that the mean value of the particle numbers n i is equal to the mean value of the corresponding PR variables u i , i.e., n i = u i .

A.2 RDME
Consider a system as in (12) but in a M −dimensional volume discretised into L cubic compartments of edge length h and volume h M . Denote as n = (n 1 1 , . . . , n 1 N , . . . , n L 1 , n L N ) the state of the system, where n l i is the copy number of species X i in compartment l. Under well-mixed and dilute conditions in each compartment, the reaction dynamics in each compartment is governed by a corresponding CME as in (13). If we model diffusion of species i between neighbouring compartments by linear reactions with rate constant d i = D i /h 2 , where D i is the microscopic diffusion constant, the time evolution of the (single time) marginal probability distribution of the system obeys the RDME [32]: where f r (n) is the propensity functions of reaction r, δ l i is a vector of length N * L with the entry corresponding to species i in box l equal to 1 and the other entries zero and S l r is a vector of length N * L with the entries corresponding to the lth compartment equal to the rth row of S r and zero otherwise.

A.3 Poisson and Cox processes
where P(n; u A ) is a Poisson distribution in n with mean u A . A (spatial) Cox process is a generalisation of a Poisson process and also called "doubly stochastic process", in the sense that the intensity function u is itself a random process, and conditioned on the intensity u, the Cox-process reduces to a Poisson process. The distribution of the number of points in a subregion A ⊂ D is thus a mixture of Poissons, Since we are interested in dynamical systems, we will assume time-dependent intensities u : D × T → R + , where T is a finite real interval denoting time. We then require that for any fixed time points t ∈ T the process is a spatial Poisson (Cox) process with intensity u(·, t). In the case of a Poisson (Cox) process, the intensity u may for example be defined as the solution of a PDE (SPDE).

B Cox-process representation and mean-field approximations B.1 Classification of reactions w.r.t. to their Poisson representation
As mentioned in Section A.1, the PR becomes complex depending on the reactions in the system. Table 4 shows a classification of different types of elementary reactions in terms of the behaviour of the corresponding diffusion matrix B(u). We note that this strict classification of course only holds if the considered reaction is the only reaction in the system. If there are several reactions happening, the system typically behaves as the entry in Table 4 corresponding to the reaction of highest type. The behaviours of the PR are quite intuitive: for reactions of Type I, it is well-known that fluctuations are Poissonian, which manifests itself in a deterministic PR. Note that if the Poisson representation is stochastic, the probability distribution of the molecule numbers in (15) is a real mixture of Poisson distributions, for which it is well-known that the resulting distributions are super-Poissonian. For reactions of Type II, for which fluctuations are super-Poissonian, therefore have real and stochastic PRs. It is easy to see that reactions of Type III and IV, however, cannot be represented in this way: a zeroth or first order reaction with two non-identical product molecules, i.e., of Type III, imposes a constraint on the particle numbers. For the reaction ∅ → A + B for example, the particle numbers of species A and B differ by a constant integer number (depending on the initial condition). Conditioned on the molecule number of A, B is a delta distribution, which clearly cannot be achieved by a real Poisson mixture, and the PR has to be complex. Bimolecular reactions give rise to similar constraints or may lead to sub-Poissonian fluctuations, both in which cases the PR has to be real. We therefore linearise reactions of Type III and Type IV as described in the following.

B.2 Linearisation of Type III and IV reactions within the Poisson representation
We would like a real PR for general reaction networks. We therefore have to approximate reactions of Type III and IV. Consider first reactions of Type IV, where two molecules react with each other.
III i s ir ≤ 1 r ir = r jr = 1 for two i = j and zero otherwise zero or one reactant; two non-identical product molecules We approximate this type of reactions in a mean-field type of sense: we replace the interaction of the two molecules with each other by two reactions where one species interacts with the mean value of the respective other one. For instance, the reaction where n A and n B denote the mean values of the molecule numbers of species A and B, respectively, becomes replaced by We thus replace the bimolecular reaction in (28) by the two linear reactions in (29) and (30) with modified reaction rates. The reactions (29) and (30) thus correspond to linear reactions with one reactant and zero product molecules. The corresponding PR is therefore real and deterministic.
Since the mean value of the corresponding PR variables, say u A and u B , is equal to the means of n A or n B , the rate constants in PR space simply become rescaled by u A /Ω and u B /Ω, respectively. Specifically, if there are no other reactions happening in the system, the PR Langevin equations read In particular, if there is no other reaction happening in the system, the PR is deterministic and we can write u A = u A and u B = u B . Consider now a bimolecular reaction with two identical reactant molecules, For such type of reactions, we replace the interaction of A with itself by the interaction of A with its mean, In PR space, this leads to the Langevin equation for A, We thus arrive at a real representation in PR space. Consider next reactions of Type III (c.f. Table 4). The reaction can be approximated in a similar fashion as the bimolecular reactions before: we replace the dependence of the creation of B molecules on A molecules by a dependence on the mean of the later, n A , This can also be seen as a mean-field approximation. For the other two Type III reactions, we have to decouple the productions of B and C, i.e., we approximate the reactions by While (38) correlates the molecule numbers of species B and C, we have effectively decorrelated B and C by introducing the reactions (40) and (40). Table 5 summarises the approximations for all reactions of Type III and IV. Note that depending of the reaction, a combination of the used approximations has to be performed, for example for the reaction A + B → A + C.

B.3 Linearised Poisson representation in space
We now aim at applying the PR to the RDME in (23) and (24) after applying the previous discussed linearisations of certain reactions, and subsequently taking the continuum limit. We first consider the diffusion term in (23). Since different species do not interact with each other here, we can consider a single species system, say species X 1 , for which (23) reduces to where n = (n 1 , . . . , n L ), n m is the number of X 1 particles in compartment l, δ m is a vector with a one in the mth entry and zero otherwise, and the sum over m runs over all neighbouring compartments of compartment l. These are reactions of Type I (c.f. Table 4) and we can use the PR without any approximations, and the corresponding Langevin equation (which is deterministic in this case) reads where M is the spatial dimension of the system and D = dh 2 the microscopic diffusion constant.
Since the sum over m runs over all adjacent compartments of compartment l, the fraction in (43) is just the discretised version of the Laplace operator ∆ = ∂ 2 1 + . . . + ∂ 2 M . Introducing a discretised density field u(x l ) = u l /h M , where x l is the centre of compartment l, and taking the continuum limit of (43) we get the PDE which is just the diffusion equation for the field u(x, t). Consider next the reaction term in (24). Since reactions only occur within compartments, we can treat the compartments independently of each other. For a single compartment, (24) then reduces to the CME in (13). Here, however, we first apply the approximations discussed in the previous section for reactions of Type III and IV (c.f. Table 4). The corresponding propensity functions in PR space are given in the last column of Table 5 after replacing the n variables by their corresponding u variables. This then leads to a real valued PR as in (22). Since in the applied approximations, only reactions of Type II, i.e., zeroth or first order reactions with two identical product molecules, lead to stochastic terms in the PR, the PR Langevin equation simplifies to where the sum over r only runs over all reactions of Type II that produce X i particles. The propensities g r (u) are obtained by replacing the n variables with u variables and Ω with h M in the expressions in the last column of Table 5. The factor of two in the square root comes from the fact that two identical molecules become produced in these reactions. Reintroducing the label l denoting the compartment number in (45), and the species label i in (44) we can take the two contributions together to obtain If we again define the discretised density field u(x l ) = u l /h M , where x l is the centre of compartment l, and dW r (x l ) = dW l r / √ h M , we can take the continuum limit of (43) leading to Here, the dW r are spatio-temporal Gaussian white noise terms satisfying dW r (x, t)dW r (y, t) = δ r,r δ(x − y)dt.

B.4 Cox process representation
Note that Gardiner has derived equations similar to (47) in [27] by applying the full PR without any approximations to the RDME and taking the continuum limit. For reactions of Type I and II, the result is equivalent to (47). For reactions of Type III and IV, however, Gardiner's equations are complex. To our knowledge no one has ever stated the observation that the single-time marginal distributions of a system only involving reactions of Type I (and Type II) are exactly the same as the ones of a spatio-temporal Poisson (Cox-) point process. We will derive this result rigorously next.

B.4.1 Single time marginals -derivation of Remark 1 and Result 1
For simplicity, we consider a one-dimensional system with one species X in the interval [0, 1] here. Let us go one step back again and consider the PR of the discretised version in (46). Consider first a system involving only reactions of Type I. In that case we do not have to perform any approximations to arrive at (46) and the second sum including the noise terms vanishes, i.e., (46) reduces to a PDE. For deterministic initial conditions the u i thus remain deterministic, and the probability distribution of n l in compartment l at time t is given by a Poisson distribution with mean value u l (t). The mean number of molecules in an interval where N (I, t) = m2 i=m1 n i . Since the n i are independent Poisson random variables, N (I, t) is also a Poisson random variable with mean N (I, t) = m2 i=m1 u i (t). Defining u i /h → u(x i ), where x l is the center of compartment l, allows us to take the continuum limit h → 0 of (46) which gives the PDE in (47). The mean value of N (I, t) can be written as N (I, t) = m2 i=m1 hu(x i , t), which is a Riemann sum. Taking the limit h → 0 for constant I gives According to the "Countable additivity theorem", the sum of an infinite number of Poisson distributed independent random variables converges with probability 1 if the sum of the mean values converges, and the sum has a Poisson distribution with corresponding mean value. Assuming that the mean particle density is bound everywhere, which means that the values u i /h = u(x i ) are bound for all i and all h. Let B be such an upper bound. Since the sum converges in the limit h → 0. We thus find that N (I, t) is Poisson distributed in the continuum limit and we can write P (N (I, t) = n) → P(n; I dxu(x, t)).
The same can be shown similarly for a countable union of subintervals of [0, 1], and N (U 1 , t) and N (U 2 , t) are obviously independent for disjunct U 1 and U 2 . The probability distribution for any fixed t is thus exactly the same as the one of a spatial Poisson process with intensity u(x, t). Suppose now the system also includes reactions of Type II. In this case the PR becomes stochastic, i.e., there will be non-vanishing noise terms in (46) and its continuum version (47).
The field u(x, t) is thus a random process. Given a realisation of u(x, t), the same considerations as for the deterministic case apply and the single-time pdf behaves like a spatio-temporal Poisson process. Since u(x, t) now a random process, the single-time pdf of the systems corresponds exactly to the one of a spatial Cox-process with intensity u(x, t). The same considerations hold in an approximate sense for Type III and IV reactions. These findings can easily be generalised to multiple-species systems and general spatial dimensions. This concludes the derivation of Remark 1 and Result 1 in the main text. To apply the derived Cox-process representation we need to solve (S)PDEs. We do this here by reducing the system to a finite coupled set of (S)ODEs via basis function projection. For illustration we confine ourselves here to a one-dimension and one-species system, but the equations can be easily extended to multi-dimensional and multi-species systems. Consider a SPDE of the form where A(x, t) and C(x, t) are polynomials in u(x, t) with potentially space-dependent coefficients. We approximate u(x, t) as a linear-combination of a finite set of basis functions where we have introduced the time-dependent coefficients c i (t). Inserting this ansatz into (52), multiplying from the left with φ j and integrating over x, it can be shown that the parameter vector c = (c 1 , . . . c n ) fulfils [26] where dW is a n-dimensional Wiener process and we have defined for a general function f (x, t).

C.1.2 For the linearised Poisson representation
Due to the approximations of certain reaction types introduced in Section B.2 the drift and diffusion terms in the SDE in (54) are always linear in the coefficient vector c, with coefficients of the drift potentially depending on c , i.e., the drift may contain terms of the form c i c j . Using this it is straightforward to show that the moment equations of c of different orders are not coupled to each other, i.e., the first order moment equations depend only on first order moments, etc. This in turn allows to directly numerically integrate these equations. Now depending on the reactions involved, the SDE in (54) may have a multivariate Gaussian solution, which can be obtained by integrating the moment equations of up to order two. If the solution of the SDE is not Gaussian, we simply approximate it by a multivariate Gaussian with mean and variance obtained in the same way. Therefore, with the approximations introduced in Section B.2, all upcoming SPDEs can be directly solved numerically without the need of any additional approximations.

C.1.3 Locally constant non-overlapping basis functions
For simplicity, we use locally constant non-overlapping step functions throughout this work. For a one-dimensional system in the interval [0, 1], for example, we define n basis functions as The overlap matrix and diffusion operator then read (61)

C.2 The likelihood
Consider a Poisson process with intensity u(x, t) given as the solution of a PDE as in (52) (with vanishing noise term), and spatial measurements x = (x 0 , . . . , x n ) at discrete times t 0 , . . . , t n . Suppose the intensity is approximated by a linear combination of basis function as in (53). Solving the PDE for u(x, t) thus amounts to solving the system of ODEs in (54) (with vanishing noise terms) for the coefficient vector c.
Since the intensity of a Poisson process is deterministic, the likelihood is simply computed by solving the ODE in c forward over the whole time interval and subsequently computing the likelihood as where u(x, t i ) is given in terms of c(t i ) in (53). In the case of a Cox-process, the intensity u(x, t) fulfils a SPDE and thus is a random process. After basis projection as in (54) the dynamics can be formulated in terms of the coefficients c i (t), which fulfil the system of SDEs in (54). We consider here the case of linear SDEs as is the case for the systems studied in this work, which have a Gaussian solution if the initial conditions are Gaussian.
The likelihood now has to be computed iteratively by solving the SDEs forward between measurement points and performing measurement updates. Suppose we have the Gaussian posterior p(c(t i−1 )|x i−1 , . . . , x 0 ) at time t i−1 . Solving the SDE for c forward in time we thus obtain p(c(t 1 )|x i−1 , . . . , x 0 ) which is again Gaussian. The posterior at time t i is then obtained by the Bayesian update as with likelihood contribution where p(x i |c(t i )) is given in (63). The full likelihood is then given by The posterior in (64) is generally not Gaussian and intractable. We thus approximate it by a Gaussian using the Laplace approximation [26], which approximates the posterior by a Gaussian centred at the posterior's mode and with covariance being the negative Hessian of the posterior in the mode.
D Details for studied systems D.1 Gene expression

D.1.1 Equations
Consider the gene expression system in Fig. 2 in the main text. We do not model the gene explicitly here, but rather assume a homogeneous production of mRNA in the nucleus. The corresponding reactions are and both the mRNA M and protein P diffuse across the whole cell with diffusion constants d M and d P , respectively. After linearising the reaction in (69) as explained in Section B.2 the PR for this system is real and deterministic, and we obtain using (47) where r is the size of the nucleus and Θ the Heaviside step function. The functions h n (x) and h c (x) arise from M and P only being created in the nucleus and cytosol, respectively. If we additionally include the reaction P p3 − −−−− → P + P, the equation for u P (x, t) becomes a SPDE and reads du P (x, t) = [d P ∆u P (x, t) + p 1 h c (x)u M (x, t) + p 3 u P (x, t) − p 2 u P (x, t)]dt (76) + 2p 3 u P (x, t)dW (x, t).

D.1.2 Inference
Consider first the system without the reaction in (75). In this case the system corresponds to a Poisson process. After basis function projection of the PDEs in (71) and (72) as in Section C.1, we are left with solving a coupled system of ODEs and can compute data likelihoods as in (62). We fix the parameters to We assume that initially there are zero mRNA molecules and zero protein molecules in the cell. We further assume that the mRNA is unobserved and consider measurements of the protein at twenty equally separated time points separated by ∆t = 0.5. We project the PDEs in (71) and (72) onto twenty basis functions as explained in Section C.1. We then optimise the likelihood of the data with respect to the parameters to obtain the inferred parameter values. We vary the initial values for the parameters in the likelihood optimiser randomly between 0.5 times and 2 times the exact value. The inference results are shown in Table 1 in the main text. Next, we consider the same system but with the additional reaction in (75), for which the PDE in (72) gets replaced by the SPDE in (76). Now the system corresponds to a Cox-process, and we are left with solving a coupled system of SDEs after basis function projection. We approximate the solution of the SDEs by a multivariate Gaussian as described in Section C.1. The corresponding likelihoods can then be computed as in (66). We again consider measurements of the protein at equally separated time points separated by ∆t = 0.5 and optimise the corresponding likelihood. The results are shown in Table 2 in the main text.

D.2.1 Equations
The reactions of the SIRS system are We consider a system in the two-dimensional square [0, 1] × [0, 1]. After linearising the reaction in (69) as explained in Section B.2 the PR for this system is real and deterministic, and we obtain using (47) for the intensity fields of S, I and R, du S (x, t) = d∆u(x, t) − ku S (x, t)u I (x, t) + su R (x, t), du I (x, t) = d∆u I (x, t) + ku S (x, t)u I (x, t) − ru I (x, t),

D.2.2 Inference
As an initial condition we distribute S ini particles of species S randomly across the whole area, one I particle at [0.05, 0.05] and assume zero R particles. We simulate data for forty time points equally spaced by ∆t = 1. As a basis we take 100 basis functions equally distributed in both dimensions. The inference results are shown in Table 3 in the main text.

D.3.1 Data and equations
The data of the Bicoid protein in Drosophila embryos we are considering here consists of twodimensional fluorescence data as depicted in the left panel in Fig. 4 in the main text. Since the relation of measured fluorescence intensity to actual protein numbers is unknown we simply translate them one to one here. The Bicoid is typically modelled by a simple birth-death process with the reactions For simplicity, since diffusion is radially symmetric, we only consider the data within a certain distance from the major axis of the embryos, thus effectively obtaining one-dimensional data. We assume further that the protein is produced within a certain range around the left tip of the embryos. Mathematically the system is thus basically equivalent to the mRNA system in Section D.1. The intensity of the protein then fulfils the PDE du(x, t) = (d∆u(x, t) + k 1 f (x) − k 2 u(x, t))dt, where x is the distance from the left top of the embryo, d is the diffusion constant, k 1 the production rate, f (x) = 1, x ∈ [0, r], f (x) = 0, x / ∈ [0, 1], r is the production radius around the origin and k 2 is the decay rate.

D.3.2 Inference
Since we have steady state data, not all parameters are identifiable. One can easily see that multiplication of k 1 , and k 2 with the same factor leads to the same steady state. We thus infer the creation range r, the diffusion rate d, and the ratio c = k 2 /k 1 . For the inference we project the PDE in (84) on twenty basis functions and solve the resulting ODEs for large times to ensure equilibrium. We optimise the likelihood for each of the embryos independently to obtain the inferred parameter values.