Diffusion limit for the partner model at the critical value

The partner model is an SIS epidemic in a population with random formation and dissolution of partnerships, and with disease transmission only occuring within partnerships. Foxall, Edwards, and van den Driessche found the critical value and studied the subcritical and supercritical regimes. Recently Foxall has shown that (if there are enough initial infecteds $I_0$) the extinction time in the critical model is of order $\sqrt{N}$. Here we improve that result by proving the convergence of $i_N(t)=I(\sqrt{N}t)/\sqrt{N}$ to a limiting diffusion. We do this by showing that within a short time, this four dimensional process collapses to two dimensions: the number of $SI$ and $II$ partnerships are constant multiples of the the number of infected singles. The other variable, the total number of singles, fluctuates around its equilibrium like an Ornstein-Uhlenbeck process of magnitude $\sqrt{N}$ on the original time scale and averages out of the limit theorem for $i_N(t)$. As a by-product of our proof we show that if $\tau_N$ is the extinction time of $i_N(t)$ (on the $\sqrt{N}$ time scale) then $\tau_N$ has a limit.


Introduction
In the partner model each of N individuals can be susceptible or infected and in a partnership or not. So the system is described by giving S t , I t , SS t , SI t , and II t , i.e., the number of single individuals and partnerships with the indicated states at time t. Infected individuals become healthy (and susceptible to re-infection) at rate 1. A susceptible individual with an infected partner becomes infected at rate λ. Partnerships dissolve at rate r − . Each pair of single individuals forms a partnership at rate r + /N . Foxall, Edwards, and van den Driessche [7] introduced this model and showed that despite the complexity of the model it is possible to find the critical value explicitly. To do this they used the continuous time Markov chain X t with rates described in Figure 1. Thinking of a single infected individual in an otherwise susceptible population, we start in state A, and let τ be the first time X t enters {D, E, F, G}. The basic reproduction number for the model is which is the expected number of infected singles at time τ . The critical value of λ is with λ c < ∞ if and only if r + > 1 + 1/r − . There is an explicit formula for λ c but it is not very pretty since the formulas for the hitting probabilities are somewhat complicated. In [7] it was shown that Theorem 1. If R 0 < 1 there are constants T , C so that, from any initial configuration, with high probability the process dies out by time T + C log N . If R 0 > 1, then for any ǫ > 0 there are To describe the equilibrium values we need some notation. Let Y t = S t + I t be the number of single individuals and y t = Y t /N . y t approaches and remains close to a stationary value y * which is the unique equilibrium in (0, 1) for the ODE y ′ = −r − (1 − y) + r + y 2 We will explain this result in more detail later, see (6). The equilibrium frequency of singles, y * , is the solution of r − (1 − y * ) = r + y 2 * . To find the number of single infecteds in equilibrium we let i t = I t /N and note the three events that affect the number of infected singles are • I → S at rate I t = i t N , • I + I → II at rate (r + /N ) It 2 ≈ r + (i 2 t /2)N , and • S + I → SI at rate (r + /N )I t ≈ r + i t s t N .
Fixing i t = i ∈ (0, y * ) we define probabilities where z = 1 + r + (y * − i/2) is the sum of the numerators. Let be the expected change in the number of single infecteds (at partnership breakup) due to the three events. Finally let ∆(i) = p S ∆ S + p II ∆ II + p SI ∆ SI be the expected change in the number of infecteds per event. In equilibrium ∆(i * ) = 0. Having found i * and s * = y * − i * , it is routine to find ii * , si * , and ss * . See Section 5 of [7]. The analysis of the critical case was done in a second paper by Foxall [4]. The main result is Theorem 2. Let V t be the number of infected vertices at time t. If R 0 = 1 then • there are C, γ > 0 so that from any initial configuration, with probability at least 1 − e −γm , V mC √ N = 0, and Theorem 2 shows that the extinction time τ N is O(N 1/2 ) (recall that for two positive sequences of reals {a N } and {b N } the notation a N = O(b N ) implies that lim sup N →∞ a N /b N < ∞). By analogy with critical branching processes one might expect the time to be O(N ). To explain why N 1/2 is the right order of magnitude and to indicate what more precise result we would like to prove, we sketch the proof in the following simpler setting.
Contact process on a complete graph with N vertices. Individuals die at rate 1, and give birth at rate β to an offspring that is sent to a randomly chosen vertex, so the number of occupied vertices X t is a Markov chain on {0, 1, . . . N } with transition rates q(k, k − 1) = k and q(k, k + 1) = βk(1 − k/N ).
The critical value for prolonged survival is β c = 1.
Letting Let x N (t) = X(N 1/2 t)/N 1/2 this means that x N (ǫ) ≤ ǫ −2 with probability at least 1 − ǫ. The drift of x N t is while the diffusivity is (readers unfamiliar with the notions of drift and diffusivity, the latter defined here as the derivative of predictable quadratic variation, may consult the appendix). The first result then follows from Lemma 18 in the next section. Then, as in the proof of Theorem 6 in Section 3.7, to show τ N ⇒ τ , it is enough to show that for each ǫ > 0 we can find δ > 0, so that if x N (t) ≤ δ then x N (t + ǫ) = 0 with probability at least 1 − ǫ. In this case this follows from a standard calculation after noting that X t is dominated by the critical branching processX t in which each particle splits in two, or dies, each at rate one. We know that P (X t = 0 |X 0 = k) = (1 − (1 + t) −1 ) k ≥ 1 − k/t, which is at least 1 − ǫ if we let t = k/ǫ. Letting k = δN 1/2 , the result follows with δ = ǫ 2 .
In the above example, there are three main steps: i) Show X t comes down to C ǫ √ N within ǫN 1/2 time, ii) Show that x N (t) = X N 1/2 t /N 1/2 converges to a diffusion, iii) Show that once x N (t) is small, it hits zero in a short time.
The corresponding result for the partner model is more complicated because the process is four dimensional and there are two different time scales. To state our results and to avoid confusion between SI and S · I etc we introduce alternative notations that we will use throughout the paper: J = II, K = SI and L = SS. When we need to distinguish the time scales we will use (·) N for fast variables (O(1) time scale) and (·) N for slow variables (O( √ N ) time scale).
We first describe the limit processes, followed by statements of the main results, and then we provide the workflow. Throughout the paper, if we say a statement holds with high probability (whp), then it has probability tending to 1 as N → ∞.
Deterministic limits. We will show that as N → ∞, and k t . The fraction of singles y N t converges to y * , the solution of r + y 2 converge to a point on the ray (α, β, 1)R + of fixed points for the deterministic dynamics (see Theorem 7 and Theorem 8 for precise statements). This is reminiscent of (multiplicative) state space collapse in queueing networks where a vector of queue lengths are all proportional to one of them. There are many results of this type. For examples, see [8,1,21,20,19].
Diffusion limits. We will show that the fluctuations z N t = √ N (y N t − y * ) are approximately an Ornstein-Uhlenbeck process dz = −µzdt + σdB that evolves on a time scale that is O(1). On the other hand, the fluctuations of (i N , j N , k N ) (once they are close to the ray) occur on a time scale O(N 1/2 ). Since it stays close to a ray in phase space, (i N , j N , k N ) is effectively one-dimensional, and we will show that i N (t) = I(N 1/2 t)/N 1/2 converges to the limit in (2) but with different constants for the mean and variance. As in the previous result, the hitting time of 0 converges. More precisely, we prove the following results, which are the main goal of this article. Before stating the results let us recall that for two sequences of positive reals {a N } and {b N }, the notation where α and β are defined in (14). Then there are constants µ, σ 2 > 0 such that for any T > 0, i N converges in distribution in C[0, T ] to the diffusion It is likely the second assumption can be relaxed to Y 0 /N → y * . To do so one would require a more careful account of transient behavior. More generally, if (I 0 , J 0 , K 0 )/ √ N and Y 0 /N converge to a point which is not the invariant ray and y * , respectively, the diffusion limit should be initialized at the point of convergence for the solution of the deterministic system. We have not pursued these details here.
Our next result builds on Theorem 4 and allows the process to start from ∞.
It is possible that in Theorem 5 the assumption Y 0 /N → y * can be dropped. We do not pursue this direction in this paper. Lastly we show convergence of the hitting time.
The separation of time scales between y N and the infection variables i N , j N , k N may remind the reader of the work of Kang and Kurtz [12] and Kang, Kurtz, and Popovic [13] on chemical reaction networks. Their machinery reproduces the deterministic limits but do not imply the theorems above.
Workflow. There are seven main steps, described in greater detail in Section 3. First recall that the notation g(N ) = ω(f (N )) means f (N ) = o(g(N )). In order to make certain estimates it is helpful to define the following two quantities: is a left eigenvector for the matrix that determines the deterministic limit of (i N t , j n t , k N t ), as described in Section 3.2.
ii) The quantity Q = (U − u * ) 2 + (V − v * ) 2 , where U = I/H, V = γJ/H, W = ηK/H and (u * , v * , w * ) is the attracting fixed point for the limiting deterministic dynamics of (U, V, W ), as described in Section 3.3. It is clear from its definition that when Q is small, (i N t , j N t , k N t ) is close to the invariant ray.
Then, the main steps are as follows. We state them in the context of Theorem 5 when the initial values are ω( √ N ). For Theorem 4 we will just need to know that if |z 0 |, h 0 and Q 0 are small then they can be kept small for ω(N 1/2 ) amount of time.
Show that h N t comes down to C ǫ within ǫ √ N time. 6. Lemma 7, Theorems 4 and 5. Show convergence of i N (t) to the diffusion limit. 7. Lemma 8. Show that τ N 0 /N 1/2 → 0 in probability as h N 0 → 0, uniformly for large N .
The organization of the rest of the paper is as follows: In Section 2 we describe the deterministic limits. In Section 3 we state precise lemmas relating to each of the workflow steps, and prove diffusion limits. The latter sections are devoted to proofs of the lemmas. An appendix gathers some probability estimates and limit theorems that are used throughout the paper.

Deterministic limits
To begin this section we use the original notation. From the transition rates of the partner model we get the following equations for the drift: If we let Y = S + I and note that II Writing 2S · I = I(S − 1) + I + S(I − 1) + S we further deduce Hence, letting y t = Y t /N then the above becomes One can also check that the diffusivity σ 2 (y) ≤ CN/N 2 = o(1), for some absolute constant C. So writing y as y N to emphasize dependence on N and using Lemma 18 we obtain the following result.
Theorem 7. If y N 0 → y 0 as N → ∞ then y N t ⇒ y t , the solution to the initial value problem with y 0 and dy dt = r − (1 − y) − r + y 2 From the limiting differential equation, we see that in equilibrium To avoid confusing SI and S · I, as mentioned earlier, we now let J = II, K = SI, and L = SS. Since S + I + 2(J + K + L) = N and S is accounted for by Y , the three remaining equations are: µ(K) = r + N · S · I + 2J − (λ + 1 + r − )K Results in [4] suggest, and our results will show that for any ǫ > 0, after ǫN 1/2 time I, J, and K will be of order √ N . So we let N and s t = S t /N to obtain the following formulas for the drift: where the terms with (r + /N ) · I(I − 1) are o(1) since they were O(1) before they were divided by Denoting (i N , j N , k N ) to emphasize dependence on N and using Lemma 18 we obtain the following result.
, the solution to the initial value problem with (i 0 , j 0 , k 0 ) and Theorem 8 is not used elsewhere in the paper; it only provides motivation for the following calculations. To analyze the limit behavior of (9), which is a linear system, we introduce the matrix We can write the condition for an equilibrium as A(i, j, k) T = 0. To have a nontrivial solution we need det(A) = 0. Expanding around the first row For det(A) = 0 we need In Section 10 we show that (12) is equivalent to R 0 = 1. This indeed shows that there is non-trivial solution of the equilibrium. We now proceed to find the solution. To have A(i, j, k) T = 0 we must have −(r + y * + 1)i + 2r − j + r − k = 0 −(r − + 2)j + λk = 0 (13) The second equation implies j = λk/(r − + 2). Using this in the first equation we want −(r + y * + 1)i + 2r − λk r − + 2 + r − k = 0.
Solving we see that if α = r − r + y * + 1 2λ r − + 2 + 1 and β = λk r − + 2 (14) then the ray (αz, βz, z), z ≥ 0 is invariant for the dynamical system (9). To prove the dynamical system converges to this ray we note that The eigenvalues of A are the roots of 0 and b 2 is the sum of the 2 × 2 principal minors of −A, which is given by Note that the above is positive since each of the two negative terms are cancelled by a part of the positive term that precedes it. Since b 3 = det(−A) = 0 and b 1 b 2 − b 3 > 0 one can use the Routh Hurwicz conditions to conclude that the other two eigenvalues have negative real part. Alternatively one can observe that the non-zero roots of the equation det(θI − A) are Therefore the dynamical system (9) indeed converges to the invariant ray (αz, βz, z), z ≥ 0. A quantitative statement that applies to the infection process is given in Lemma 3.

Worklow and diffusion limits
In this section we fill in the details required to obtain the diffusion limits, and along the way we state the lemmas corresponding to the workflow steps outlined near the end of the Introduction. The Lemmas are listed in the order they are proved, which is the same numerical order as outlined in the Introduction. Some estimates assume t ≤ T 1/5 = inf{t : H t ≤ N 1/5 }. This does not hinder us to obtain the diffusion limit, as in a N 1/2 scale H t can essentially be treated to be 0. We also note that if H t ≤ N 1/5 then it can be sent to 0 within N 1/4 = o(N 1/2 ) additional time (see Lemma 7).

3.1
Step 1: the number of singles Y t Recall that (see (3)-(4)) on the original time scale and So, if z = o(N 1/4 ), then denoting by z N t and using Lemma 18, z N t ⇒ z t which satisfies i.e., the limit z t is an Ornstein-Uhlenbeck process. To control the behavior of z N t we will prove the following two facts.
Lemma 1 (Step 1). There is a constant C 1 so that with high probability, To understand the √ log N behavior recall that the stationary distribution of the Ornstein-Uhlenbeck process is a normal, which has density proportional to exp(−x 2 /2σ 2 ).

3.2
Step 2: a special linear combination of (I, J, K) In Section 2 we showed that (i N , j N , k N ) converges quickly to the invariant ray (14) for the ODE (9). Thus the knowledge of one component determines the other two, provided we have good control on the distance of the triple from the invariant ray. Recall that in the example from the Introduction (contact process on a complete graph at criticality), the negative drift of X t brings it down to the natural spatial scale for the diffusion, C ǫ N 1/2 , within ǫN 1/2 time. This suggests that in our model, we should look for a linear combination of (I, J, K) that has negative drift when it takes values larger than O(N 1/2 ).
Motivated by these observations we introduce the variable H = I +γJ +ηK where (1, γ, η)A = 0 and A as in (10). The desired constants satisfy The first equation implies η = (r + y * + 1)/r + y * . Using this in the second equation we have To find the evolution of H = I +γJ +ηK, we use the notation Z = Y −N y * , where recall Y = S +I. Using this notation we rewrite (7) as Due to the equations in (18) that γ and η satisfy, the drift Since (20) is small. Also, the second term dominates the third when I = ω(1). So, to obtain a closed-form differential inequality for E[H t ] it's enough to show I/H is bounded away from zero after a short time, which is done in Section 5.
In Lemma 2 the statement |z N | ≤ C 1 √ log N means that we assume |z N t | ≤ C 1 √ log N , with high probability, for all values of t relevant to the statement of Lemma 2. Since several estimates have other bounds as prerequisites, we will frequently use this shorthand in the results to follow. The choice of N 1/5 as a floor on H is so that once H ≤ N 1/5 a branching process approximation can be used to take H to 0. See Section 9.
Our reliance on Lemma 1 is to bound the first part of the drift that prevents us from showing H t comes down to O(N 1/2 ). To obtain the stronger bound we will have to show that the first term in the drift in (20) averages out to 0.

Step 3: (I, J, K) stays close to the invariant ray
To reduce (I, J, K) to a one-dimensional system we let U = I/H, V = γJ/H, and W = ηK/H. The coefficients in V and W are there to make U + V + W = 1. Recalling the definitions of α, β, γ, and η (see (14) and (18)) we let u * = α/d, v * = βγ/d, and w * = η/d where d = α + βγ + η which, as the reader will see, is the fixed point for the dynamical system corresponding to (U, V, W ). Let √ log N , h N ≤ log N and the sequence of constants b N ≥ N −1/6 then with high probability, Note that Steps 1-3 can be chained together to obtain the following result, which will be useful to prove Theorem 5.
Lemma 4. With constants as in Lemmas 1 and 3, with high probability the following estimates hold We say in the sequel that |z|, h or Q are small, when they satisfy the bounds given in Lemma 4. So, Lemma 4 says that after ǫN 1/2 time for any constant ǫ > 0, we can assume |z|, h and Q are small, and that they remain small up to ω(N 1/2 ) ∧ T 1/5 time.

3.4
Step 4: averaging the drift to 0

Step 5: Show H t comes down to O(N 1/2 )
In order to prove Theorem 5 we need to show that whp, h N (t) = O(1) for fixed t > 0.
To prove Theorem 5, we need the lower bound δ as we want z, Q to have equilibrated before h N reaches below any fixed value A.

Step 6: convergence to diffusion
Here we prove Theorem 4 and Theorem 5. Since the estimates of Steps 2 and 3 break down after T 1/5 , we'll need the following result.
Since the acceleration of time and the scaling of the variables cancel, using (20) we have Proof of Theorem 4. We use Lemma 18 to prove this theorem. Since the jump size is O(N −1/2 ) = o(1) it suffices to find a, b and show convergence of the compensator and predictable quadratic variation.
. Thus if Theorem 4 can be proved for h N for some values of the constants µ, σ 2 then it holds for i N with different constants.
Next let us consider the easier case . For this range of values of t, from (21), and using Lemma 1 it easily follows that µ(h N ) = o(1) and h N = o(1). So we also have sup This proves the assertion about the compensator of h N , as required by Lemma 18. Now to calculate the diffusivity of h we let m index the possible transitions and write where q m is the rate of transition m (it is a function of the state) and ∆ m h N is the change in h N at that transition. Note that there are constants c m so that (∆ m h N ) 2 = c m /N . Recall that on the original time scale the rates at which I, J, K change are Most rates are linear in I, J, or K. Those which are not are the I + I → J transition and the I + S → K transition. As we have already seen for t ≤ T 1/5 , we have I ≤ N 1/2 log N . Therefore the I + I → J transition has rate O((log N ) 2 ). By a similar reasoning the I + S → K transition has rate r + (y * + O(N −1/2 log N ))I = r + y * I + O((log N ) 2 ). Speeding up time by N 1/2 and writing in lower case, the rates are equal to N i N , 2N j N , etc and the error terms have rate O( If t ≥ T 1/5 then I, J, K = O(N 0.24 ) and an easy computation gives σ 2 (h N ) = o(1). This implies the desired convergence of predictable quadratic variation with a(x) = σ 2 x. An application of Lemma 18 now shows h N (and hence i N ) converges to a diffusion of the desired form.
We now prove Theorem 5.
Proof of Theorem 5. Since the limiting diffusion is continuous, it crosses any level C > 0, if it starts from ∞. For any C > 0 denote τ C = inf{t : X t = C}. We then have τ C ↓ 0 in probability as C ↑ ∞. Since the convergence in distribution of Theorem 5 allows for small time change, given the proof of Theorem 4, it is enough to show there are sequences Letting ǫ m = 1/m, Lemma 6 gives the bound on h N . Since Lemma 6 also ensures t m ≥ δ m for some δ m > 0, we then apply Lemma 4 that gives the desired bounds on z N and Q. This completes the proof.

Step 7: extinction time
To prove a result for the time for the infection to die out, let τ N x = inf{t : h N (t) ≤ x} and note that τ N 0 is the first time (on the N 1/2 time scale) there are no infected individuals. Applying the continuous mapping theorem allows us to solve half of the problem. Let Q be the law of the limiting diffusion. Let τ x denote the time to hit x and let ω denote a sample path. If x > 0 then ω → τ x (ω) is continuous Q-almost surely. However, τ 0 is not continuous at any ω in the support of Q as there are arbitrarily close paths with τ 0 (ω n ) ≥ τ 0 (ω) + 1. Since τ x ↑ τ 0 , we see that τ 0 is lower semicontinuous. To do the other half we need to show that τ N 0 converges in probability to 0 as h N (0) → 0, uniformly for large N . This is accomplished by combining Lemma 7 with the following result.
Proofs of Lemma 8 and Lemma 7 are postponed to Section 9. Assuming these for now, we now prove Theorem 6.
Proof of Theorem 6. Recall that Q is the law of the limiting diffusion for h and τ x is the hitting time of x > 0. By the strong Markov property and Blumenthal's 0-1 law, after hitting x, the process will with probability one immediately hit (0, x) and (x, ∞). From this it follows easily that τ x : C → R is continuous Q-almost surely. Let P N denote its law and define τ N (x) = inf{t : h N (t) ≤ x}, for any x ≥ 0. Using the continuous mapping theorem, Combining this with the last result and noting that x → τ N (x) is increasing, Letting ǫ → 0 we have half of the desired convergence in distribution.
To get the other half we again fix any arbitrary ǫ > 0. Note that for δ > 0 So it suffices to show that for each ǫ > 0 there is a δ > 0 so that the second term is at most O(ǫ), uniformly for large N . Since by convergence in distribution, on the N 1/2 time scale it takes at least s > 0 amount of time, whp as s → 0 + and N → ∞, for h to reach δ if h N (0) → x > δ, we may assume when taking the above sup that not only h ≤ δ but also that |z N | and Q are small. The desired bound then follows directly from Lemma 7 and Lemma 8. This completes the proof. 4 Step 1: upper bound on |z N | In this short section we prove Lemma 1. Proof of the lemma is split in two parts. In the first part we show within a short time z N become small.
Approach. Recall from (4) that We introduce the new notation F (·) to emphasize that F : R → R is just a function. Let Y * ∈ (0, N ) be the unique value with F (Y * ) = 0. Note that Y * = N y * since we used Y (Y − 1) and not Y 2 to compute it. However note that F (N y * ) = r + y * and by concavity of F we see that |F ′ | is bounded below by |F ′ (0)| = r − − r + /N . So it follows that Note that r depends on N but converges to a positive quantity as N → ∞. If Z 0 ≥ 1 then letting τ + = inf{t : Z t < 2} and using the product rule (Lemma 15), for t < τ + If Z 0 ≤ −1 then letting τ − = inf{t : Z t > −2}, we obtain the same estimate for P (τ − > t), so for τ 2 = inf{t : |z t | < 2}, taking a union bound and t = C log N with C = 2/ lim inf N →∞ r we find P (τ 2 > (2/r) log N ) ≤ 1/N = o(1).
Next we show that z N remains small over a long period of time.
Control. We use the estimate of Lemma 17 to control the size of z = N −1/2 Z over a long period of time. To this end, we need to appropriately choose the parameters in Lemma 17 and show that the required assumptions hold.

Step 2: upper bound on H
Recall H = I + γJ + ηK and T 1/5 is the first time H ≤ N 1/5 . Recalling (20) we see that the negative term in µ(H) involves I. Therefore, the first step is to get a lower bound on I/H. Lemma 9. There is a constant C 9 > 0 so that with high probability, Proof. There are two steps: get I/H ≥ ǫ for some ǫ > 0 within constant time, then keep I/H ≥ ǫ/2 for an additional N units of time. In both cases we may of course assume that H > N 1/5 .
Using Lemma 9 we now proceed to the proof of Lemma 2.
Proof of Lemma 2. Recalling that Z = √ N z, from (22) we obtain Since η > 1 > γ/2, using the assumption on |z N |, Lemma 9, and the fact I ≤ H we find that for Taking expected value and using Jensen's inequality while noting again η > γ/2, on that time interval we have From this we see there are C, δ > 0 such that if N is large and EH ≥ CN 1/2 log 1/2 N then The solution to the differential equation . Since H ≤ cN for some positive constant c it follows from the differential inequality that by time T 0 ∧ T 1/5 we must have EH ≤ CN 1/2 log 1/2 N , where T 0 = C 9 + (C 2 /2)N 1/2 / log N for some constant C 2 . Further noting that T 0 ≤ C 2 N 1/2 / log N , for all large N we deduce that by time T 0 ∧ T 1/5 we have H ≤ (3/4)N 1/2 log N with high probability.
To show that H does not cross N 1/2 log N for a long stretch of time we use again Lemma 17. Take X = H − N 1/2 log N/2 and x = N 1/2 log N/2. Since This completes the proof. 6 Step 3: (I, J, K) stay close to the invariant ray In this section our goal is to prove Lemma 3. Computing similarly as earlier we note that the crossterms σ(J, 1/H), σ(K, 1

/H) = O(1/H). Hence using the product rule and Taylor's approximation from (19)-(20) it follows
Thus applying Lemma 18 we deduce that U, V, W approach solutions to the ODEs corresponding to (28), as N → ∞.
To find the equilibrium of (28) we set the right-hand sides of (28) to 0, divide the second equation by γ and the third by η to obtain the following system of equations: Comparing with (13) we see that the solution is Using W = 1 − U − V , for the DEs corresponding to (28) are
Thus within t = (1/5a 2 ) log N time, EQ t = O(N −1/5 ) so Markov's inequality gives P (Q t ≥ N −1/6 ) = o(1). Now to prove the assertion about Q we simply note that Q ≤ min{θ 1 , θ 2 }Q. Now it remains to show that Q remains below the threshold 2b N for b N ≥ N −1/6 for a long stretch of time. To this end, we again use Lemma 17. We set This completes the proof. 7 Step 4: averaging z · i to 0 In this section we prove Lemma 5. Denoting σ 2 = 4r − (1− y * ), using (15) and recalling r − (1− y * ) = r + y 2 * , we see From above we note that z N is not symmetric. We define a processz N that looks as similar as possible to z N while also being symmetric about zero;z N can be thought of a spatially discrete Ornstein Uhlenbeck process. Denoting µ = r − + 2r + y * we letz N have transitions This ensures thatz N can hit zero and is symmetric about the reflection x → −x. Also, by taking the approach of Lemma 1 we obtain the same bounds forz N as we did for z N . Couple z N withz N in the obvious way, i.e. couple jumps of +2/N 1/2 at the minimum of the two rates and similarly for jumps of −2/N −1/2 . This gives for D N = z N −z N the transitions Using the definition of µ and the bound z N = O(log N ), Computing the diffusivity, We use Lemma 17 to control D N . To this end, note that if With these choices note that cµ ⋆ /σ 2 ⋆ = o(1) which allows us to set C c = 1 and C µ⋆ = 4CL+CL. Thus µ ⋆ x/C c σ 2 ⋆ = Ω((log N ) 2 ) and x/16C µ⋆ = Ω(1) so Γ = ω(N ). Combining with an analogous argument for −D N , for some (possibly larger) C > 0 we find Note that (33) implies that it is enough to prove Lemma 5 with z N replaced byz N . That is, we will only prove the following result: Under the same assumptions as in Lemma 5 except withz N in place of z N , with high probability Proof. We start with few definitions. Denote R 0 to be the first time thatz N = 0. Further for j ≥ 0 let Note that the variables R 2j+2 − R 2j and Conditioning on the information at time R 2j we see that sum is a discrete time martingale. Using Doob's L 2 maximal inequality for martingales and orthogonality of martingale increments, we obtain Using the bound onz N we then note that Therefore E   sup Our next step is to consider We observe that Using Cauchy-Schwarz inequality Note that the bound on the second term of the RHS of (38) already follows from (35). To bound the first term in (38) let us note that from (20), accounting for the change of scale, it follows for some positive reals Since . Therefore applying Doob's L 2 -maximal inequality we deduce Letting t = R 2j+2 and taking an expectation we conclude where we used B N ≤ log N to get the last bound. Since Q = o(1), the above inequality holds when h N is replaced by i N , j N or k N and using the fact that L(·) is Lipschitz it extends to L as well.
Indeed, Q = o(1) implies that i N = (c i + o(1))h N for some positive constant c i . Similarly j N = (c j + o(1))h N and k N = (c k + o(1))h N for some other positive constants c j and c k . We take for granted, omitting it from the notation, that the expectation is restricted to the event where Q = o(1). Now using the fact L is Lipschitz with constant c L we obtain that for any r < s, which easily implies (41) with L in place of h and with c 2 L included in the bound. Using this together with (35), (38), and (37), Combining with (36), using B N ≤ log N , and applying Markov inequality with log 5/2 (N ) times the expectation as the lower bound, we see that with high probability with high probability. Since the law of large numbers gives R ∆tn 1/2 /tN 1/2 → 1 with an error rate of N 1/4 the result follows. This completes the proof. 8 Step 5: bounding the time for H to reach O(N 1/2 ) In this section our goal is to prove Lemma 6. Before proving Lemma 6, we derive one additional bound related to z N , this time on its integral over a short time.
Lemma 11. Let τ 1 = inf{t : |z N t | ≤ 1}. Then with high probability Proof. The proof is again a use of Lemma 17. Suppose z N 0 > 0. The proof of the lemma is similar when z N 0 < 0. So we will leave the details for the case z N 0 < 0.
Next we show that on the event {X t ≤ x} we have the desired bound. To this end, note that on the event {z 0 > 0} we see that τ 1 = inf{t : z t ≤ 1}. We also recall that µ t (z) ≤ −rz t , where r as in Lemma 1. Thus on the event {X t ≤ x} we must have Solving by repeated substitution we find z t ≤ z 0 e −rt + x((1 − e −rt )/r + e −rt ) = z 0 e −rt + O(x). So t 0 z s ds = O(z 0 ) + O(xt) and the rest follows from noting that τ = O(log N ) with high probability (Lemma 1). Now we return to the proof of the Lemma 6. Recall that z N (t) = Z( We note from (20) it follows that: for some positive reals c i , i = 1, 2, 3. We also find that σ 2 (h N ) = O(h N ). Since the time scale is clear from the context we hereafter drop the subscript N . Given ǫ > 0 we want to find A such that h t = A + O(N −1/2 ) for some t ∈ [δ, ǫ], with probability 1 − o(ǫ) − o(δ), uniformly for large N , as ǫ, δ ↓ 0. Since h jumps by O(N −1/2 ) it's enough to show h δ ≥ A and h ǫ ≤ A. Also, by taking A ≥ 2 we can assume h ≥ 1 for the entire time frame in which we are interested.
To obtain the upper bound we need to find a better approximation than (43). In fact, expanding to the third order term in the Taylor series, and noting that jump size are Note that we may assume an arbitrarily small fixed amount of time has passed, so that Lemma 9 can be used. Then using Lemma 9 we find c 2 i 2 /h 2 ≥ c 3 for some c 3 > 0. Hence, from (46)-(48) it follows that x p t ≥ (c 3 + o(1))t. Now taking t = ǫ, ν = c 3 ǫ/3, a = ν and λ = 1/ν 2 ǫ while noting sup t≤τ x 3 t ≤ ν 3 to find that if τ > t then on the event However, the above implies that τ ≤ t = ǫ, which is a contradiction. Therefore by Lemma 14, P (τ > ǫ) ≤ e −λa . Since λa = 1/νǫ = 3/(c 3 ǫ 2 ), e −λa → 0 as ǫ ↓ 0, proving the required upper bound on τ .

9
Step 7: bounding the time for H to go from δN 1/2 to 0 In this section our goal is to prove Lemma 8 and Lemma 7. First we prove Lemma 8. Hence, we assume h N (0) = δ, for some δ to be determined during the course of the proof. Denote a k = 2 k δ for k integer. To prove Lemma 8 we will repeat the following step until H ≤ N 1/5 , or equivalently, h ≤ N −0.3 : Start from a k + O(N −1/2 ) and run the process until the first time τ k when it exits (a k−1 , a k+1 ). Since jump sizes are O(N −1/2 ) we note that h(τ k ) = a j + O(N −1/2 ) for some j ∈ {k − 1, k + 1}. As this procedure is continued until h drops below N −0.3 we have a k = Ω(N −0.3 ) for entire duration of this iterative procedure.
By our assumption |z| is small. Hence, we can use Lemma 6, which implies h ≤ log N with high probability. Therefore we also have that i ≤ log N with high probability. Using (21), we now have Since Q is also small, we have that |z|, Q and h are all small up to time ω(N 1/2 ) ∧ T 1/5 , with high probability. So as shown in the proof of Theorem 4 we have σ 2 (h) = (σ 2 + o(1))h for some σ 2 > 0. With these estimates on the drift and diffusivity of h we now state a result about the expectation of the exit time τ k .
Lemma 12. Fix m > 0 and assume h ≥ N −0.3 . There is a constant C 12 such that for any δ > 0 such that 4 m δ is sufficiently small and any k ≤ m, if h N (0) ∈ (a k−1 , a k+1 ) then we have Proof. We claim that to complete the proof it is enough to find C 12 , ǫ * with Indeed, one can deduce from (50) that for any positive integer n. Now the proof finishes by noting that Thus it remains to prove (50). To show the claim we use h 2 . Note Using Lemma 5 with B N = a k and L = ih, which is Lipschitz with constant c L = O(a k ) (to see where the last step follows since h ≤ log N implies a k = O(log N ), and so a Hence, for τ k > t and δ > 0 such that 4 m δ > 0 sufficiently small, from (51)-(52) we obtain for all large N .
Next, we estimate the exit probabilities.
Proof. Since τ k is the exit time from (a k−1 , a k+1 ) = (a k /2, 2a k ) and h jumps by O(N −1/2 ), for any T > 0 we obtain Note that we can restrict the expectation to an event that has high probability, and the above is still valid up to o(1). Rearranging and noting N −1/2 = o(a k ), On the other hand, from (49) it follows Using Lemma 12 we see that we can take T > 0 large enough so that O(P (τ k > T )) ≤ ǫ/2, uniformly for k ≤ m. This completes the proof.
Equipped with Lemma 12 and Lemma 13 we now prove Lemma 8.
Proof of Lemma 8. We first recall the following two facts about a simple random walk on Z that with p < 1/2, where p is the probability of jumping to the right, and (1 − p) is the same for the left.
1. Starting from 0 the probability to ever reach k > 0 is (p/(1 − p)) k , and 2. starting from k, the expected number of jumps out of k is equal to 1/(1 − 2p).
By comparison, and using Lemma 12 and Lemma 13 it follows that, starting from a 0 , uniformly for k ≤ m, where m is some fixed positive integer, we have 1. For any k > 0, the probability to ever reach a k is at most (1/2 + o(1)) k . 2. Assuming that a k is reached, the expected number of times we perform the step of exiting a k is 3 + o(1).
By the first observation it follows that with high probability in m, h(t) ≤ 2 m δ for all t ≤ τ 0.3 . From the second observation, on the event that h remains below 2 m δ, for some constant D 0 . Now Markov inequality implies that τ 0.3 ≤ D 0 a m M with high probability in M . Thus, given ǫ > 0, by taking m and then M > 0 large enough, we can ensure that P (τ 0.3 > 2 m D 0 M δ) ≤ ǫ. Finally shrinking δ again so that 2 m D 0 M δ < ǫ, we complete the proof.
Next we prove Lemma 7.
Proof of Lemma 7. Recall that τ 0.24 = inf{t : H t = 0 or H t ≥ N 0.24 }. We start by proving H τ 0.24 = 0, which we then use to prove τ 0.24 ≤ N 1/4 . Using the bound on |z N | and the fact that γ/2 < η, from (20), we deduce that for some constant a 1 > 0 and t ≤ N 1/2 log N , where the last inequality follows from the fact that I ≤ H. For b > 0 to be determined, letting f (x) = e −bx and using Taylor's theorem we get where z is between x and y. Since the jump sizes are O(1) we have that e −bz = e −b(x+O(1)) and (y − x) 3 = O(1). Noting that the transition rates are bounded by O(H) and multiplying by those transition rates upon summing over y we obtain Since σ 2 (H) = Ω(H) applying (54) we obtain that With these transitions, (I, J, K) is a continuous time three-type critical branching process. In particular, each initial particle of type I, J, K spawns an independent branching process. We would like to focus only on what happens when particles are single I type. To do so we note that 1. each initial particle of type J, K decays into 0, 1 or 2 type I particles before producing additional particles of other types, and 2. each type I particle follows the evolution described in Figure 1, yielding 0, 1 or 2 single I particles upon reaching the set {D, E, F, G}.
To address the first point we note that each particle of type J, K decays at a constant rate of at least r − , yielding 0, 1 or 2 single I type particles, before producing additional offspring. Since there are initially O(N 0.2 ) particles, with high probability, within constant times log N time every initial particle of type J, K has either decayed. Since log N = o(N 1/4 ), in order to prove our result we can assume that all initial particles have type I.
To address the second point, using the Markov chain described in Figure 1 we can determine the timing and the number of type I offspring of each type I particle. The offspring distribution has mean R 0 = 1 and is supported on the set {0, 1, 2}. Referring to Figure 1, the waiting time to produce offspring is the absorption time at {D, E, F, G} starting from A, which is at most exponential(1 + r + y * ) + exponential(r − ).
To finish the proof we note the set of descendants of any particle forms a critical Galton-Watson tree and recall a couple of facts that hold for such trees, assuming the offspring distribution has finite variance, which it does here since it has finite support. The height of the tree (maximum distance to the root) is greater than or equal to n with probability O(1/n), and the total number of vertices is greater than or equal to n with probability O(n −1/2 ). Thus with O(N 0.2 ) initial particles, with high probability the tallest tree has height O(N 0.22 ) and the sum of tree sizes is O(N 0.44 ). In particular there are in total O(N 0.44 ) offspring production events. Since the waiting time for offspring is at most the sum of two exponential random variables with fixed constant rates, the longest waiting time for offspring is whp bounded by constant times log N .  In this section we show that (12) is equivalent to the condition R 0 = 1. To do this we recall (1), namely the definition of R 0 : where τ is the hitting time of {D, E, F, G} for the Markov chain with rates drawn in Figure 1. To and note that h(D) = 0, h(E) = 0, h(F ) = 1, and h(G) = 2. By considering what happens on the first jump from each state we see that The equations (56) and (57) can be rewritten as Adding these last two equations we have Adding the fractions in the parentheses we have (2 + r − )(λ + 1 + r − ) − 2λ λ(2 + r − ) · h(B) = r − r − + 2 + 2λ λ(2 + r − ) .

Appendix: Sample path estimation
In this section we derive some sample path estimates, a diffusion limit theorem, and results on drift and diffusivity of functions of continuous time Markov chains that are later used in this paper. To derive these results we appeal to the semimartingale theory. For an overview of the semimartingale theory that is used here, we refer the reader to [9].
First we recall some standard definitions from semimartingale theory, noting along the way how the present context fits into this framework.
Let (Ω, F, F, P) be a filtered probability space that satisfy the usual conditions. This means that the filtration F = (F t ) t∈R + is right continuous in the sense that F t = s>t F s for each t, and each F t in the filtration F = (F t ) t∈R + contains the P null sets of F. In [9] it is also assumed that F is P complete; if this is not the case then it is easy to check that completing F and then F with respect to null sets does not violate right continuity. In our case, the filtered probability space is that of the finite state continuous-time Markov chain corresponding to the state variables (S, I, J, K, L), with the completion of the natural filtration. Since such a process is Feller, as shown in [18, I.5], the corresponding filtered space satisfies the usual conditions.
We use X to denote a general stochastic process, indexed by R + . X is called optional if it is measurable with respect to the σ-field (on Ω × R + ) generated by all càdlàg adapted processes. All the processes considered here are optional. We assume the reader is familiar with the notions of stopping time, predictable time and process, localization and martingale.
Given a stochastic process X we denote by X − the left-continuous process derived from X. We further let ∆X = X − X − to be the process of jumps. A stochastic process X has bounded jumps if |∆X| ≤ c a.s. for some constant c > 0, and it is said to be quasi-left continuous (qlc) if ∆X T = 0 a.s. on {T < ∞} for any predictable time T .
A semimartingale (s-m) X is a process that can be written as X = X 0 + M + A, where X 0 is an F 0 -measurable random variable, M is a local martingale and A has locally finite variation. We call a semimartingale special if it can be written as where X p is the compensator of X and X m is a uniquely defined local martingale satisfying X m 0 = 0. By I.4.24, if X has jumps bounded by c then it is special and |∆X m | ≤ 2c, and if it also qlc then using I.2.35 in the proof of I.4.24, the stronger and more convenient estimate |∆X m | ≤ c holds.
Recall that if a martingale M is locally square-integrable then M 2 has a compensator, denoted M and called the predictable quadratic variation (pqv). Any local martingale M with M 0 = 0 and bounded jumps is locally square integrable (see [9, I.4.1]). The following basic estimate is used repeatedly throughout the paper.
Lemma 14 ([6, Lemma 3]). Let X be a quasi-left continuous semimartingale such that |∆X| ≤ c, and let a, λ > 0. Then, if 0 < λc ≤ 1/2 then P sup The final result of this section is about identifying the diffusion limit (or an ODE limit) for a stochastic process. Before stating the result let us introduce some more notations. A stochastic process X t = (X t,1 , . . . , X t,d ) is an R d -valued semimartingale if each component is a s-m. The compensator (when it exists) of X t is defined component-wise and the predictable quadratic variation (when exists) is a d × d matrix-valued process with X t,ij = X m i , X m j t . Drift and diffusivity are similarly defined. Here we use Theorem 4.1 in [3] to obtain easily checkable conditions for convergence to an ODE or diffusion limit.
Lemma 18 (Diffusion limit). Let X N t be a sequence of qlc semimartingales, a be a Lipschitz d × d matrix-valued function on R d and b : R d → R d be Lipschitz. Suppose that a.s. |∆X N t | ≤ c N with c N → 0 as N → ∞. Also assume that for each T, R > 0, as N → ∞, where the convergence holds in probability, and τ N R = inf{t : |X N t | ≥ R}. Suppose X N 0 → x ∈ R d . Then X N t converges in distribution to the diffusion process x with x 0 = x and x 0 = x and dx = b(x)dt + a(x)dB.
In particular, if a = 0 then X N converges to the solution of the ODE system with x 0 = x and x ′ = b(x).
Proof. We show the conditions of Theorem 4.1 of [3, Chapter 7] are satisfied. The fact that a and b are Lipschitz ensures the martingale problem for the limit process is well-posed. (X N ) p and X N play the role of B N and A N respectively. Since v ⊤ X N t v = v ⊤ X N t for vector v and the latter is R-valued, the process X N is non-negative definite. We also note that X i − X p i and X i X j − X ij are local martingales as required by (4.1) and (4.2).
Since the jump size of X N tends to 0 (4.3) is satisfied. Applying [9, Lemma I. 4.24] we see that jump size of X p and X also converge to 0, which gives (4.4) and (4.5). The requirements of (4.6) and (4.7) are immediate from 66-(67) above. The rest is straightforward. We omit the details.