Central limit approximations for Markov population processes with countably many types

When modelling metapopulation dynamics, the influence of a single patch on the metapopulation depends on the number of individuals in the patch. Since there is usually no obvious natural upper limit on the number of individuals in a patch, this leads to systems in which there are countably infinitely many possible types of entity. Analogous considerations apply in the transmission of parasitic diseases. In this paper, we prove central limit theorems for quite general systems of this kind, together with bounds on the rate of convergence in an appropriately chosen weighted $\ell_1$ norm.


Introduction
Metapopulations, introduced by Levins (1969), are used to describe the evolution of the population of a species in a fragmented habitat. The metapopulation consists of a number of distinct patches, together with (a summary of) the population present in each patch, and its development over time is governed by specified within- and between-patch dynamics. In the Markovian structured mean-field metapopulation model of Arrigoni (2003), the state of the system consists simply of the numbers of individuals in each patch. Individuals reproduce within patches and migrate between patches, and each patch is subject to random catastrophes, which reduce its population to zero. Let $N$ denote the total number of patches, thought of as being large, and let $X^{N,i}$ denote the number of patches containing exactly $i$ individuals; the normalized state is then $x := N^{-1}X$. The total number $N = \sum_{j \ge 0} X^{N,j}$ of patches remains constant throughout, and the number of individuals in any one patch changes by at most one at each transition. The per capita death and birth rates $(d_i)$, $(b_i)$ within each patch are allowed to vary with the current population size $i$, but in the same way in all patches; they would usually be chosen to correspond to one of the traditional single species demographic models. The per capita migration rate $\gamma$ is also the same for all individuals, as is the probability $\rho$ that a migration is successful, and a successful migrant chooses its new patch uniformly at random. Each patch is independently subject to catastrophes at the same rate $\kappa$.
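For concreteness, the dynamics just described can be simulated directly. The sketch below is a minimal Gillespie-style simulation; as a simplifying assumption it takes the per capita birth and death rates to be constant (in the model, $b_i$ and $d_i$ may vary with patch size $i$), and all parameter values and helper names are illustrative, not taken from the paper.

```python
import random

def simulate(N=50, T=5.0, b=0.8, d=0.5, gamma=0.3, rho=0.9, kappa=0.1, seed=1):
    """Gillespie simulation of the mean-field metapopulation model.

    State: pops[k] = number of individuals in patch k (N patches, fixed).
    Constant per capita birth/death rates b, d (a simplification),
    per capita migration rate gamma with success probability rho,
    and per-patch catastrophe rate kappa.
    """
    rng = random.Random(seed)
    pops = [1] * N          # start with one individual per patch
    t = 0.0
    while t < T:
        m = sum(pops)       # total number of individuals
        rates = [b * m, d * m, gamma * m, kappa * N]
        total = sum(rates)
        if total == 0:
            break
        t += rng.expovariate(total)
        if t >= T:
            break
        u = rng.random() * total
        if u < rates[0]:                            # birth, in a size-biased patch
            pops[_pick_individual(pops, rng)] += 1
        elif u < rates[0] + rates[1]:               # death of a uniform individual
            pops[_pick_individual(pops, rng)] -= 1
        elif u < rates[0] + rates[1] + rates[2]:    # migration attempt
            pops[_pick_individual(pops, rng)] -= 1
            if rng.random() < rho:                  # successful migrant: new patch
                pops[rng.randrange(N)] += 1         # chosen uniformly at random
        else:                                       # catastrophe empties a patch
            pops[rng.randrange(N)] = 0
    return pops

def _pick_individual(pops, rng):
    """Choose a patch with probability proportional to its population."""
    m = sum(pops)
    u = rng.random() * m
    acc = 0
    for k, n in enumerate(pops):
        acc += n
        if u < acc:
            return k
    return len(pops) - 1
```

Note that the number of patches stays fixed at $N$ throughout, while the total number of individuals fluctuates, in keeping with the description above.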
If there were an absolute upper limit for the number of individuals in each patch, the model would be a finite dimensional Markov population process. The behaviour of these finite dimensional models can be approximated using the methods pioneered by Kurtz (1970, 1971), who was able to establish a law of large numbers approximation, in the form of a system of ordinary differential equations, and a corresponding diffusion approximation. However, there are no upper limits on population number in the usual single population models, and it is the stochastic evolution according to the rules of the model that dictates the region in which population numbers typically lie. Thus it seems unnatural to introduce an a priori upper limit in the system above, just because more than one population is being modelled. The same considerations surface in a number of other population models, including the epidemic models of Luchsinger (2001a,b) and Kretzschmar (1993), and the model of cell behaviour as a function of the copy number of a particular gene in Kimmel & Axelrod (2002, Chapter 7). Instead, it makes sense to consider Markov population processes with a countably infinite number of dimensions as models in their own right.
A law of large numbers in a general setting of this kind was first established by Eibeck & Wagner (2003). Under appropriate conditions, Barbour & Luczak (2008, 2011) strengthened the law of large numbers by providing an error bound, in a weighted $\ell_1$ norm, that is close to optimal order in $N$. In this paper, these latter results are complemented by a central limit approximation, together with a corresponding error estimate. Our general setting, as in Barbour & Luczak (2011) [BL], is that of families of Markov population processes $X^N := (X^N_t,\, t \ge 0)$, $N \ge 1$, taking values in the countable space $\mathcal{X}_+ := \{X \in \mathbb{Z}_+^{\mathbb{Z}_+}:\ \sum_{m \ge 0} X_m < \infty\}$. The component $X^{N,j}_t$ of $X^N_t$ represents the number of individuals of type $j$ that are present at time $t$, and there are countably many possible types; however, at any given time, there are only finitely many individuals in the system. The process evolves as a Markov process with state-dependent transitions, each jump being of bounded influence, in the sense that the number of individuals affected at each transition is uniformly bounded. Density dependence is reflected in the fact that the arguments of the functions $\alpha_J$ are counts normalised by the 'typical size' $N$. Writing $\mathcal{R} := \mathbb{R}_+^{\mathbb{Z}_+}$, the functions $\alpha_J : \mathcal{R} \to \mathbb{R}_+$ are assumed to satisfy conditions (1.2) and (1.3), where $\mathcal{R}_0 := \{\xi \in \mathcal{R}:\ \xi_i = 0 \text{ for all but finitely many } i\}$; this assumption implies that the processes $X^N$ are indeed pure jump processes, at least for some non-zero length of time. To prevent the paths leaving $\mathcal{X}_+$, we also assume that $J_l \ge -1$ for each $l$, and that $\alpha_J(\xi) = 0$ whenever $\xi_l = 0$, for any $J \in \mathcal{J}$ such that $J_l = -1$.
In the finite dimensional case, the law of large numbers is expressed in terms of the system of deterministic equations
$$\frac{dx_t}{dt} \;=\; A x_t + F(x_t), \qquad (1.4)$$
where $A$ is a constant $\mathbb{Z}_+ \times \mathbb{Z}_+$ matrix, and (1.4) is then treated as a perturbed linear system (Pazy 1983, Chapter 6). Under suitable assumptions on $A$, there exists a measure $\mu$ on $\mathbb{Z}_+$, defining a weighted $\ell_1$ norm $\|\cdot\|_\mu$ on $\mathcal{R}$, and a strongly $\|\cdot\|_\mu$-continuous semigroup $\{R(t),\, t \ge 0\}$ of transition matrices having pointwise derivative $R'(0) = A$. If $F$ is locally $\|\cdot\|_\mu$-Lipschitz, the solution $x$ of the integral equation
$$x_t \;=\; R(t)x_0 + \int_0^t R(t-s) F(x_s)\, ds \qquad (1.6)$$
approximates $x^N := N^{-1}X^N$, except on an event of probability of order $O(N^{-1}\log N)$, provided that the initial conditions are suitably controlled. The conditions under which this approximation holds can be divided into three categories: growth conditions on the transition rates, so that the a priori bounds, which have the character of moment bounds, can be established; conditions on the matrix $A$, sufficient to limit the growth of the semigroup $R$, and (together with the properties of $F$) to determine the weights defining the metric in which the approximation is to be carried out; and conditions on the initial state of the system. The conditions are described in the next section. They are all needed in the current paper, too, in which we investigate the difference $x^N_t - x_t$ in greater detail.
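On a computer, such deterministic systems can only be explored through a finite truncation. The sketch below integrates a toy two-type instance of $dx_t/dt = Ax_t + F(x_t)$ by forward Euler; the matrix $A$ and the perturbation $F$ are illustrative stand-ins (chosen so that the total mass $\sum_j x_j$ is conserved, as in the metapopulation example), not quantities taken from the paper.

```python
import numpy as np

def euler_path(A, F, x0, T=5.0, dt=0.001):
    """Forward-Euler integration of dx/dt = A x + F(x) on a finite truncation."""
    x = np.array(x0, dtype=float)
    for _ in range(int(T / dt)):
        x = x + dt * (A @ x + F(x))
    return x

# toy linear part and perturbation (illustrative stand-ins only);
# columns of A sum to zero and the components of F cancel, so sum(x) is conserved
A = np.array([[-1.0, 0.5],
              [1.0, -0.5]])
F = lambda x: np.array([0.1 * x[1] ** 2, -0.1 * x[1] ** 2])

x_final = euler_path(A, F, x0=[1.0, 0.0])
```

With these choices the second component settles near the root of $0.1 y^2 + 1.5 y - 1 = 0$, i.e. about $0.639$, while the total mass remains $1$.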
Our main result, Theorem 6.1, shows that, under some extra conditions, it is possible to construct a diffusion process $Y$ on the same probability space as $X^N$ in such a way that
$$\sup_{0 \le t \le T} \bigl\| \sqrt{N}\,(x^N_t - x_t) - Y_t \bigr\|_\mu \;=\; O(N^{-b_1}), \qquad (1.7)$$
except on an event of probability of order $O(N^{-b_2})$, for specific values of $b_1$ and $b_2$. With the best possible control of moments, as for the model of Arrigoni (2003) mentioned above, one can take any $b_1 < 1/4$ and any $b_2 < 1$, provided that the initial conditions are appropriately chosen. The process $Y$ can be interpreted as the infinite dimensional analogue of the diffusion approximation in Kurtz (1971), satisfying the formal stochastic differential equation
$$dY_t \;=\; \{A + DF(x_t)\}\, Y_t\, dt + dW_t. \qquad (1.8)$$
Here, $dW$ is a time-inhomogeneous white noise process with infinitesimal covariance matrix $\sigma^2(t) := \sum_{J \in \mathcal{J}} J J^T \alpha_J(x_t)$, and $Y$ has time-inhomogeneous linear drift with coefficient matrix $A + DF(x_t)$. In particular, if $\bar x$ is an equilibrium of the deterministic equations, satisfying $A\bar x + F(\bar x) = 0$, then $Y$ is an infinite dimensional Ornstein--Uhlenbeck process, with constant drift coefficient matrix $A + DF(\bar x)$ and infinitesimal covariance matrix $\sum_{J \in \mathcal{J}} J J^T \alpha_J(\bar x)$.
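On a finite truncation, the limiting process can be simulated by a standard Euler–Maruyama scheme. In the sketch below, the constant drift matrix $B$ stands in for $A + DF(\bar x)$ at an equilibrium, and the matrix `sigma` for a square root of the infinitesimal covariance; both are toy choices, not derived from any particular model.

```python
import numpy as np

def simulate_ou(B, sigma, y0, T=10.0, dt=0.01, seed=0):
    """Euler--Maruyama scheme for dY = B Y dt + sigma dW: a finite-dimensional
    stand-in for the limiting Ornstein--Uhlenbeck process, with constant drift
    matrix B playing the role of A + DF(xbar)."""
    rng = np.random.default_rng(seed)
    y = np.array(y0, dtype=float)
    path = [y.copy()]
    for _ in range(int(T / dt)):
        dW = rng.normal(scale=np.sqrt(dt), size=len(y))
        y = y + B @ y * dt + sigma @ dW
        path.append(y.copy())
    return np.array(path)

# toy stable drift and noise for a 3-type truncation (illustrative only)
B = np.array([[-1.0, 0.2, 0.0],
              [0.3, -1.5, 0.1],
              [0.0, 0.4, -2.0]])
sigma = 0.2 * np.eye(3)
path = simulate_ou(B, sigma, y0=[1.0, 0.0, -0.5])
```

Since all eigenvalues of $B$ here have negative real part, the simulated fluctuations remain bounded, mirroring the stability of the equilibrium $\bar x$.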

Basic approach
The structure of the argument is as follows. It is shown in [BL, (4.8)] that, under suitable conditions, the process $x^N$ satisfies an equation very similar to (1.6):
$$x^N_t \;=\; R(t)x^N_0 + \int_0^t R(t-s) F(x^N_s)\, ds + \tilde m^N_t, \qquad (1.9)$$
where
$$\tilde m^N_t \;:=\; m^N_t + \int_0^t R(t-s)\, A\, m^N_s\, ds \qquad (1.10)$$
and $m^N$ is a local martingale. Taking the difference between (1.9) and (1.6), and multiplying by $\sqrt N$, gives a representation of $U^N := \sqrt N\,(x^N - x)$ of the form
$$U^N_t \;=\; R(t)U^N_0 + \int_0^t R(t-s)\, DF(x_s)\, U^N_s\, ds + \eta^N_t + \sqrt N\, \tilde m^N_t, \qquad (1.12)$$
in which $\eta^N$ collects the nonlinear remainder terms (1.13). Starting with this representation of $U^N$, the first step is to show that $\eta^N$ is uniformly small with high probability, so that the randomness in $U^N$ is driven principally by the process $N^{1/2}\tilde m^N$. This quantity is in turn determined, through (1.10), by the local martingale $N^{1/2} m^N$. The next step is to show that $N^{1/2} m^N$ is close to a diffusion $W$, formally expressible as
$$W_t \;=\; \sum_{J \in \mathcal{J}} J\, W_J(A_J(t)), \qquad (1.14)$$
where the $\{W_J,\, J \in \mathcal{J}\}$ are independent standard Brownian motions, and $A_J(t) := \int_0^t \alpha_J(x_s)\, ds$: this is the diffusion appearing in (1.8). Analogously to (1.10), we then show that we can define a process
$$\tilde W_t \;:=\; W_t + \int_0^t R(t-s)\, A\, W_s\, ds, \qquad (1.15)$$
and that $\tilde W$ is close to $N^{1/2}\tilde m^N$. Finally, returning to (1.12), we show that $U^N$ is close to the solution $Y$ of the analogous equation
$$Y_t \;=\; R(t)Y_0 + \int_0^t R(t-s)\, DF(x_s)\, Y_s\, ds + \tilde W_t, \qquad (1.16)$$
which in turn can be shown to exist and be unique. The random process solving (1.16) at first sight seems rather mysterious. However, partial integration represents $\tilde W_t$ as $\int_0^t R(t-s)\, dW_s$, and so the expression for $Y$ can indeed be interpreted as the variation of constants representation of the solution to the formal stochastic differential equation (1.8).
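The partial integration step can be sketched as follows; assuming, as for $\tilde m^N$ in (1.10), that $\tilde W_t = W_t + \int_0^t R(t-s)AW_s\,ds$, and using $W_0 = 0$ together with the (formal) relations $\frac{d}{du}R(u) = AR(u) = R(u)A$:

```latex
\int_0^t R(t-s)\,\mathrm{d}W_s
  \;=\; \Bigl[\,R(t-s)\,W_s\,\Bigr]_{s=0}^{s=t}
        - \int_0^t \frac{\partial}{\partial s}\bigl\{R(t-s)\bigr\}\,W_s\,\mathrm{d}s
  \;=\; W_t + \int_0^t R(t-s)\,A\,W_s\,\mathrm{d}s
  \;=\; \widetilde W_t ,
```

since $\frac{\partial}{\partial s} R(t-s) = -A R(t-s)$ and $R(0)W_t = W_t$.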
In the remaining sections, this programme is carried out in detail. Section 2 is concerned with specifying the conditions under which the main theorem is true, and with recalling the results from [BL] that are needed here. In the subsequent sections, the steps sketched above are examined in turn.

Assumptions and preliminaries
We assume henceforth that (1.2) and (1.3) are satisfied. Since the index $j \in \mathbb{Z}_+$ is symbolic in nature, we fix a $\nu \in \mathcal{R}$ such that $\nu(j)$ reflects in some sense the 'size' of $j$. We then assume that most indices are large and that most transitions involve some large indices, in the sense of conditions (2.2) and (2.3), for some $n_1$, $n_2$ and $\beta_1 \le \beta_2$; note that in fact $\beta_2 \le 2\beta_1 J^*$ also. As a consequence of these assumptions, for any $s > \beta_1$, there exists $K_s < \infty$ such that (2.4) holds.

Moment assumptions
In the proofs that follow, it is important to be able to show that x N is largely concentrated on indices j with ν(j) not too large. This is shown to be the case in [BL, Section 2], under the following 'moment' assumptions.
Writing
$$S_r(x) \;:=\; \sum_{j \ge 0} \{\nu(j)\}^r x_j \qquad (2.5)$$
for $x \in \mathcal{R}$, the assumptions that we need are as follows.

Assumption 2.1 For $\nu$ as above, assume that there exist $r^{(1)}_{\max}$ and $r^{(2)}_{\max}$, and also that, for some non-negative constants $k_{rl}$, the inequalities (2.7) and (2.8) hold. As a result of these assumptions, it is shown in [BL, Lemma 2.3 and Theorem 2.4] that, if $r$ is such that $1 \le r \le r^{(2)}_{\max}$ and if $\max\{S_r(x^N_0), S_{p(r)}(x^N_0)\} \le C$ for some $C$, then there are constants $C_1$ and $C_2$, depending on $C$, $r$ and $T$, such that
$$\mathbb{P}\Bigl[\sup_{0 \le t \le T} S_r(x^N_t) > C_1\Bigr] \;\le\; C_2 N^{-1}. \qquad (2.9)$$
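The moment functionals $S_r$ can be evaluated directly; a minimal sketch, assuming the standard form $S_r(x) = \sum_{j \ge 0}\{\nu(j)\}^r x_j$ and taking $\nu(j) = j+1$ as in the metapopulation example below:

```python
def S(r, x, nu=lambda j: j + 1):
    """Weighted moment sum S_r(x) = sum_j nu(j)^r x_j (assumed form),
    with nu(j) = j + 1 as in the metapopulation example."""
    return sum(nu(j) ** r * xj for j, xj in enumerate(x))

x = [0.5, 0.3, 0.2]    # proportions of patches holding 0, 1, 2 individuals
s1 = S(1, x)           # = 1*0.5 + 2*0.3 + 3*0.2 = 1.7
s2 = S(2, x)           # = 1*0.5 + 4*0.3 + 9*0.2 = 3.5
```

A configuration concentrated on small patch sizes keeps $S_r$ small even for moderately large $r$, which is the behaviour the moment assumptions are designed to propagate forward in time.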

Semigroup assumptions
In order to make sense of (1.6), we need some assumptions about $A$. We assume that (2.10) holds, and that, for some $\mu \in \mathbb{R}_+^{\mathbb{Z}_+}$ such that $\mu(m) \ge 1$ for each $m \ge 0$, and for some $w \ge 0$, condition (2.11) is satisfied. We then use $\mu$ to define the $\mu$-norm
$$\|x\|_\mu \;:=\; \sum_{j \ge 0} \mu(j)\, |x_j|, \qquad (2.12)$$
and, under these assumptions, the transition semigroup $R$ is well defined [BL, Section 3] and satisfies the growth bound (2.13), valid for all $j$ and $t$. Note that there may be many possible choices for $\mu$, but that we also require $F$ to be locally Lipschitz in the $\mu$-norm, in order to ensure that (1.6) has a $\mu$-continuous solution: we assume that, for any $z > 0$,
$$\|F(\xi) - F(\xi')\|_\mu \;\le\; K(\mu, F; z)\, \|\xi - \xi'\|_\mu \quad \text{whenever } \|\xi\|_\mu, \|\xi'\|_\mu \le z, \qquad (2.14)$$
and this should be borne in mind when choosing $\mu$. We further assume that condition (2.15) holds, for some $\beta_3$, $\beta_4$.

Transition rate assumptions

We need to ensure that the sum of the transition rates, even when weighted by largish powers of $\nu(j)$, remains bounded. To ensure this, we assume that, for some $r_0$ large enough, there exists $r_1 \le r^{(2)}_{\max}$ such that (2.16) holds; this assumption is a specialized version of [BL, (2.25)]. In view of (2.9), this implies that, if $\max\{S_{r_1}(x^N_0), S_{p(r_1)}(x^N_0)\} \le C$, then there are constants $C_1$ and $C_2$, depending on $C$ and $T$, such that (2.17) holds. We shall therefore assume that the initial condition needed for (2.17) is indeed satisfied: that, for some $C < \infty$,
$$\max\{S_{r_1}(x^N_0),\, S_{p(r_1)}(x^N_0)\} \;\le\; C. \qquad (2.18)$$
It can be seen from the statement of Theorem 6.1 that the larger we can take $r_0$ in (2.16), the sharper the approximation bound that we get in (1.7), in that $\zeta$ can be taken smaller for a given value of the product $\zeta r_0$, resulting in larger values of $b_1(\zeta)$.
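Condition (2.11) is not reproduced in this extract; a standard drift condition of this kind is $\sum_k A_{jk}\,\mu(k) \le w\,\mu(j)$, which propagates to the semigroup as $\sum_k R_{jk}(t)\,\mu(k) \le e^{wt}\mu(j)$. The sketch below checks this numerically on a toy three-type truncation; the matrix, the weights, and the assumed form of (2.11) are all illustrative.

```python
import numpy as np

def expm(M, terms=40):
    """Matrix exponential via a plain Taylor series (adequate for small M)."""
    out = np.eye(len(M))
    term = np.eye(len(M))
    for k in range(1, terms):
        term = term @ M / k
        out = out + term
    return out

# a truncated generator-like matrix: non-negative off-diagonal entries
A = np.array([[-2.0, 1.0, 0.0],
              [0.5, -1.0, 0.5],
              [0.0, 1.0, -1.0]])
mu = np.array([1.0, 2.0, 3.0])          # weights mu(m) >= 1

# the assumed drift condition: sum_k A[j,k] mu[k] <= w mu[j] for all j
w = max((A @ mu)[j] / mu[j] for j in range(3))

# semigroup bound: sum_k R(t)[j,k] mu[k] <= e^{w t} mu[j]
t = 1.5
R = expm(A * t)
lhs = R @ mu
rhs = np.exp(w * t) * mu
ok = np.all(lhs <= rhs + 1e-8)          # the bound holds componentwise
```

The inequality persists because $A + cI$ is entrywise non-negative for $c$ large enough, so powers of $A + cI$ preserve the componentwise ordering against $\mu$.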
It then follows that, for any $r, s \ge 0$, the bound (2.19) holds, except on an event of probability at most $C_2 N^{-1}$.

Smoothness assumptions
We need some smoothness conditions on the rates near the deterministic path $x$. First, for some $\delta > 0$, we assume that $F$ has second order partial derivatives in the tube (2.20) around the path $x$ solving (1.6), and that these derivatives satisfy (2.21) for any $j, k, l$, where the $v_{jkl}$ are such that (2.22) holds. The latter condition is also true under the following condition: that, for each $k$, (2.24) is satisfied for suitable $n_0$, $K_0$ and $K_1$, all finite. The first derivative of $F$ has already been assumed to be $\mu$-Lipschitz in (2.14); with the assumption (2.21), $F$ becomes continuously $\mu$-differentiable in the tube, so that, for some constant $K_{F1}$, the bound (2.25) holds. We also assume that the individual transition rates $\alpha_J$ are uniformly $\mu$-Lipschitz, in the sense of (2.26), for some $K_\alpha, \beta_5 > 0$. This assumption, and those on the second derivatives of $F$, go beyond what is required for the law of large numbers in [BL]; the same is true of the assumptions (2.2) and (2.15).

Preliminary conclusions
We now assume, in addition, that we can take $T$ and $K_T$ such that, for all $N$ large enough, the law of large numbers bound (2.29) holds whenever the initial conditions satisfy (2.28). We shall from now on also assume that (2.18) holds with $x$ in place of $x^N$. Since $x$ can then be represented as a limit of processes $x^M$ satisfying (2.18), because of (2.29), it follows in view of (2.16) and (2.17) that (2.30) also holds, and therefore, as for (2.19), that (2.31) holds for $r + s \le r_0$. When approximating $x^N$ by a deterministic path $x$, it is natural to choose their initial values to be close, as in (2.28). The impact of also assuming (2.18) for the initial values of both paths is to specify how much closer those components need to be whose indices $j$ have $\nu(j)$ large.

Example
In the model of Arrigoni (2003) presented in the introduction, we can take $\nu(j) = j + 1$, in which case $\beta_1 = 1$ and $\beta_2 = 2$, the latter because of the migration transition. Calculation shows that (2.7) is satisfied for all $r$, as is (2.8) also, with $p(r) = 2r$, so that $r^{(1)}_{\max}$ and $r^{(2)}_{\max}$ can be taken as large as we wish. Furthermore, (2.16) is satisfied for any $r_0$, with $r_1 = r_0 + 1$. The quantities $A$ and $F$ can be written down explicitly, with all elements of $A$ other than those specified equal to zero; writing $s(x) := \sum_{j \ge 1} j x_j$, the expression for $F$ simplifies using the fact that $\sum_{j \ge 0} x_j = 1$. Hence Assumption (2.10) is immediate, and Assumption (2.11) holds for $\mu(j) = j + 1$ (so that $\beta_3 = 1$). With the above choice of $\mu$, $F$ can easily be seen to be locally Lipschitz in the $\mu$-norm, with $K(\mu, F; z) \le 4\rho\gamma z$. The partial derivatives of $F$ can be computed explicitly for any $i, k, l \ge 0$ (we take $x_{-1} = 0$). From this, it follows (using the elementary bound $j + 1 \le 2j$ for $j \ge 1$) that we can take $K_{F1} = 6\rho\gamma \sup_{0 \le t \le T} \|x_t\|_\mu$ in (2.25), and that (2.24) is satisfied with $n_0 = 2$, $K_0 = 2$ and $K_1 = \rho\gamma$, so that (2.22) is also satisfied (one can in fact take $K_{F2} = 4\rho\gamma$). Finally, (2.26) is satisfied, with $\beta_5 = 0$ if the $d_i$ are uniformly bounded, and with $\beta_5 = 1$ if they grow linearly, and with $K_\alpha$ of the form $K'(1 + \sup_{0 \le t \le T} \|x_t\|_\mu)$.

Controlling $\eta^N$
From now on, we assume that all the assumptions of Section 2 are in force. We first show that the effect of the perturbation $\eta^N$ is negligible. For $\eta^N_t$, from (1.13), we need to consider the difference $F(x_s + h) - F(x_s) - DF(x_s)h$, with $h = x^N_s - x_s$. We note first that, if $\|h\|_\mu \le \delta$, for $\delta$ as in Condition (2.20), then, from (2.22), this difference is of order $\|h\|_\mu^2$ in the $\mu$-norm. Hence, from (2.29) and from (2.13), it follows that, for all $N$ large enough, $\eta^N_t$ satisfies the bound (3.1) for all $0 < t \le T$, except on a set of probability at most $K_T N^{-1} \log N$.
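The difference being controlled is the usual second-order Taylor remainder; schematically, for $\|h\|_\mu \le \delta$, and writing $K_{F2}$ for a uniform bound associated with the second derivatives of $F$ in the tube (the constant appearing in the example of Section 2),

```latex
\bigl\| F(x_s + h) - F(x_s) - DF(x_s)\,h \bigr\|_\mu
  \;=\; \Bigl\| \int_0^1 (1-u)\, D^2 F(x_s + uh)[h,h]\,\mathrm{d}u \Bigr\|_\mu
  \;\le\; \tfrac12\, K_{F2}\, \|h\|_\mu^2 .
```

Squaring the law of large numbers error $\|h\|_\mu$ is what produces the $N^{-1}\log N$ scale of the remainder.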

Discrete to diffusion
We now show that $N^{1/2} m^N$ is close in the $\mu$-norm to the diffusion $W$ given by (1.14). We first need to show that this $W$ indeed has paths in $\mathcal{R}_\mu$. For this, it is enough to show that $\|W_t\|_\mu$ is a.s. finite for all $t$.
We begin by noting that, using the reflection principle, if $B$ is a standard Brownian motion, then there exists a constant $\gamma < \infty$ such that, for all $a > 0$, the tail bound (4.2) holds. Thus, from (4.2), for any $C > 1$ and $p, T > 0$, we obtain a corresponding bound for all $0 \le t \le T$, except on a set of probability at most $e\{C\nu(J)\}^{-p}$. Hence a bound follows for all $0 \le t \le T$ and for all $J \in \mathcal{J}$, except on a set of suitably small probability, since $C$ is arbitrary. However, by (2.15), and recalling the definition of $J^*$, we have, for any $\varepsilon > 0$, a bound in which both sums in the final expression are finite, by (2.4) and (2.30), provided that $\beta_2 + 2\beta_3 < r_0$ and that $\varepsilon$ is small enough.
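The reflection-principle bound invoked here can be sketched as follows; the exact form of (4.2) is not reproduced in this extract, and the constant $2$ below simply plays the role of $\gamma$:

```latex
\mathbb{P}\Bigl[\sup_{0 \le s \le t} |B_s| \ge a\Bigr]
  \;\le\; 2\,\mathbb{P}\Bigl[\sup_{0 \le s \le t} B_s \ge a\Bigr]
  \;=\; 4\,\mathbb{P}[B_t \ge a]
  \;\le\; 2\, e^{-a^2/(2t)} ,
```

using the reflection principle for the equality, and the Gaussian tail bound $\mathbb{P}[B_t \ge a] \le \tfrac12 e^{-a^2/(2t)}$ for the final step.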
Having established that $W$ indeed has paths in $\mathcal{R}_\mu$, we now need to show that it is close to $N^{1/2} m^N$ in the $\mu$-norm, if the Brownian motions $W_J$ are suitably chosen. The relationship between $N^{1/2} m^N$ and $W$ arises because $N^{1/2} m^N$ can be represented in the analogous form, with centred and normalized Poisson processes $Z^N_J$ in place of the Brownian motions $W_J$. We thus wish to show that the $W_J$ can be chosen in such a way that the supremum over $0 \le t \le T$ of the weighted sum over $j \ge 0$ defined in (4.6) is small. For use in the next section, we prove somewhat more: that, under appropriate conditions, we can replace $\mu(j)$ in (4.6) by the larger quantity $\nu^*(j) := \{\nu(j)\}^{\beta_3 + \beta_4}$, and still obtain something that is small. To do so, we begin by splitting the sum to be bounded into two parts, $T_1(t)$ and $T_2(t)$. We begin with $T_2(t)$, which we deal with by showing that, for suitable $r$ and $r'$, it is appropriately small, by (2.4) and (2.31), so long as $r > \beta_3 + \beta_4 + \varepsilon + \beta_2/2$ and $2(r + r') \le r_0$. Hence, if $r_0 > \beta_2 + 2(\beta_3 + \beta_4)$, then the bound (4.10) holds, with an implied constant uniform for all $p > \beta_2$, except on an event with probability of order $O(M^{-p+\beta_2})$. So far, the bounds have been achieved without any specific choice of the Brownian motions $W_J$; but, for $T_1(t)$, we need to be more precise. We treat each $J$ separately, since the underlying Poisson processes $P_J$ are independent, and match the centred and normalized Poisson process $Z^N_J$ to a Brownian motion $W_J$ using the KMT construction. We need only do this over a limited time interval, since, from (2.17), the relevant rates are suitably bounded except on an event $E_0$ of probability of order $O(N^{-1})$, so that, off $E_0$, (4.14) holds. We use Komlós, Major & Tusnády (1975, Theorem 1 (ii)), applied to the centred unit rate Poisson process, together with (4.2) to interpolate between integer time points. This implies that, for any $p > 0$, we can choose the $W_J$ in such a way that (4.15) holds. Note also that, by (2.26), the increments of the $A_J$ are suitably controlled, so that $\sup_{0 \le t \le T} T_1(t)$ is appropriately small, except on an event of probability of order $O(N^{-1}\log N + M^{\beta_2 + 2\beta_5}\{M^{2\beta_5}/N\}^{r'})$.
Off the exceptional event, we have a bound valid for any $\varepsilon > 0$, and the exceptional event can be made to have probability of order $O(N^{-1}\log N)$ by choosing $r'$ large enough, provided that $M$ is bounded by a small enough power of $N$. Combining (4.10), (4.13), (4.15) and (4.20), and choosing $M = N^\zeta$, we see that we have no useful bound unless $\zeta r_0 > 1$ (because of the exceptional event in (4.10)) and $\zeta < 1/\{4(\beta_2 + \beta_3 + \beta_4) + 2\beta_5\}$ (in view of (4.20)), so that $r_0 > 4(\beta_2 + \beta_3 + \beta_4) + 2\beta_5$ is a minimal requirement. Note that this assumption on $r_0$ is more restrictive than that made in Section 2. The error bound in (4.15) is always smaller than that in (4.20), and, with $\zeta r_0 > 1$, the error bound in (4.10) is smaller than that in (4.13). This translates into the following conclusion: if $r_0 > 4(\beta_2 + \beta_3 + \beta_4) + 2\beta_5$, then (4.21) holds for any admissible choice of the exponents, except on an event of probability of order $O(N^{-b_2})$.
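The KMT construction matches each centred, normalized Poisson process to a Brownian motion with an error of only logarithmic order before scaling. The naive simulation below does not reproduce that coupling; it only illustrates the diffusive scaling that underlies it, checking that $Z^N(t) := N^{-1/2}\{P(Nt) - Nt\}$ has mean close to $0$ and variance close to $t$ (all names are illustrative).

```python
import numpy as np

def centred_poisson(N, t, rng):
    """Z^N(t) = N^{-1/2} (P(Nt) - Nt) for a unit-rate Poisson process P."""
    return (rng.poisson(N * t) - N * t) / np.sqrt(N)

rng = np.random.default_rng(42)
N, t, reps = 10_000, 1.0, 4_000
samples = np.array([centred_poisson(N, t, rng) for _ in range(reps)])
# Var Z^N(t) = t exactly, so the sample variance should be close to 1.0 here
mean, var = samples.mean(), samples.var()
```

Matching the whole path of $Z^N$ to a Brownian path at rate $O(N^{-1/2}\log N)$, as the KMT construction does, is a far stronger statement than this one-dimensional moment check.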

The existence of $\tilde W$, and its approximation
The next step in the argument is to show that the process $\tilde W$ in (1.15), related to $W$ exactly as $\tilde m^N$ is related to $m^N$ through (1.10), is well defined, and that it is indeed the limiting analogue of the process $N^{1/2}\tilde m^N_t$. For its existence, recalling (1.15), it is enough to show that $R(t-s)AW_s$ exists for each $s, t$, and belongs to $\mathcal{R}_\mu$. For this, it is enough to show that the corresponding weighted sum is a.s. bounded. Now, in view of (2.11), (2.13) and (2.15), and recalling that the off-diagonal entries of $A$ are non-negative, this will be the case if a certain expectation, as in (4.12), is finite for some $\varepsilon > 0$; and this is achieved as in (4.12), if $r_0 > \beta_2 + 2(\beta_3 + \beta_4)$. To show that $\tilde W$ is a good approximation to $N^{1/2}\tilde m^N$, we begin with the result proved in (4.21) above, which holds except on an event of probability of order $O(N^{-b_2})$. Arguing much as in the previous paragraph, we need to bound exactly the quantity that was controlled in (4.21). Hence, for $\zeta$ such that $1/r_0 < \zeta < 1/\{4(\beta_2 + \beta_3 + \beta_4) + 2\beta_5\}$, and for any admissible $b_1$ and $b_2$, the approximation (5.2) follows.

The final approximation
The final step in the argument is to compare the solution $U^N$ to (1.12) with the solution $Y$ to (1.16). Both satisfy the general equation (6.1), but with different initial conditions $Z_0$ and forcing functions $z$; and their difference $Y - U^N$ also satisfies (6.1), with initial conditions and forcing functions subtracted. Now, for $U^N$, the forcing function $\eta^N + N^{1/2}\tilde m^N$ is close to the forcing function $\tilde W$ for $Y$, because of (3.1) and (5.2), and we shall assume that $U^N_0$ and $Y_0$ are also close to one another, so that both differences are small. We now show that this implies that the difference between $Y$ and $U^N$ is also small.
So, for example, in the model of Arrigoni (2003), we can take $r_0$ as big as we wish, and then choose $\zeta$ so that $\zeta r_0 = 2$, allowing $b_1 = 1/4 - \varepsilon$ and $b_2 = 1 - \varepsilon$ for any $\varepsilon > 0$. However, these rates can only be attained for correspondingly well controlled initial conditions: in addition to (2.28), it is necessary to ensure that (2.18) is satisfied, so that $S_{2(r_0+1)}(x^N_0) \le C$ for some $C < \infty$ and for all $N$, and that $S_{2(r_0+1)}(x_0) \le C$ also. For stochastic logistic dynamics within the patches, with $b_i = b$ and $d_i = d + ci$, we need to take $r_0$ to exceed $4(\beta_2 + \beta_3 + \beta_4) + 2\beta_5 = 22$ to yield an error bound that converges to zero with $N$, and thus require the initial conditions to have uniformly bounded $(46 + \delta)$-th moments for some $\delta > 0$.
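The arithmetic here can be checked mechanically. In the sketch below, $\beta_2 = 2$, $\beta_3 = 1$ and $\beta_5 = 1$ are the values computed in the example of Section 2 for linearly growing death rates, while $\beta_4 = 2$ is inferred so as to match the threshold $22$ quoted above; the value of $\beta_4$ for the example is not stated in this extract.

```python
# beta2, beta3, beta5 as computed for the metapopulation example with
# linearly growing death rates d_i = d + c*i; beta4 = 2 is inferred so that
# the threshold below matches the value 22 quoted in the text.
beta2, beta3, beta4, beta5 = 2, 1, 2, 1
threshold = 4 * (beta2 + beta3 + beta4) + 2 * beta5   # r_0 must exceed this
# with p(r) = 2r and r_1 = r_0 + 1, the initial moments needed are of order
# 2(r_0 + 1); taking r_0 just above the threshold gives order just above 46
moment_order = 2 * (threshold + 1)
print(threshold, moment_order)   # 22 46
```

So any $r_0 > 22$ forces uniformly bounded moments of order exceeding $46$, matching the $(46+\delta)$-th moment requirement stated above.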