Differential equation approximations of stochastic network processes: an operator semigroup approach

The rigorous linking of exact stochastic models to mean-field approximations is studied. Starting from the differential equation point of view, the stochastic model is identified by its Kolmogorov equations, which form a system of linear ODEs that depends on the state space size ($N$) and can be written as $\dot u_N=A_N u_N$. Our results rely on the convergence of the transition matrices $A_N$ to an operator $A$. This convergence also implies that the solutions $u_N$ converge to the solution $u$ of $\dot u=Au$. The limiting ODE can easily be used to derive simpler mean-field-type models such that the moments of the stochastic process converge uniformly to the solution of appropriately chosen mean-field equations. A by-product of this method is a proof that the rate of convergence is $\mathcal{O}(1/N)$. In addition, it turns out that the proof holds for cases that are slightly more general than the usual density dependent one. Moreover, for Markov chains where the transition rates satisfy certain sign conditions, a new approach for proving convergence to the mean-field limit is proposed. The starting point in this case is the derivation of a countable system of ordinary differential equations for all the moments. This is followed by the proof of a perturbation theorem for this infinite system, which in turn leads to an estimate for the difference between the moments and the corresponding quantities derived from the solution of the mean-field ODE.


Introduction
General birth-and-death models are at the basis of many real-world applications ranging from queuing theory to disease transmission models, see Grimmett and Stirzaker [9]. In particular, the analysis of such models involves the Kolmogorov equations, which simply describe the evolution of the probability of the process being in a given state at a given time. One of the major drawbacks of this approach is the large number of equations. This is very limiting from an analysis viewpoint and, in addition, it can also preclude the numerical solution of the full or complete set of equations. Using techniques such as lumping, see Simon et al. [18], this can be circumvented and an equivalent exact system with a tractable number of equations can be derived. However, this technique often works only in the presence of significant system symmetries, such as in the case of a simple disease transmission model on a fully connected graph, where all nodes are topologically identical. This requirement rarely holds, which highlights the importance of approaches that deal with the complexity of the large number of equations. Progress in this direction has been made, as illustrated by the important contributions of Kurtz [8], Bobrowski [4] and Darling and Norris [6]. Here we take a dynamical systems type approach, where the Kolmogorov equations are simply considered as a system of linear ODEs with a transition rate matrix having specific properties, such as a special tri-diagonal structure and/or a well-defined functional form for the transition rates. For example, consider a Markov chain with finite state space $\{0, 1, \dots, N\}$ and denote by $p_k(t)$ the probability that the system is in state $k$ at time $t$ (with a given initial state that is not specified at the moment).
Assuming that, starting from state $k$, the system can move only to state $k-1$ or to state $k+1$, the Kolmogorov equations of the Markov chain take the form

$$\dot p_k(t) = \beta_{k-1} p_{k-1}(t) - \alpha_k p_k(t) + \delta_{k+1} p_{k+1}(t), \qquad k = 0, \dots, N, \qquad \text{(KE)}$$

where $\alpha_k = \beta_k + \delta_k$ and, by convention, $\beta_{-1} = \delta_{N+1} = 0$ (since the chain remains in $\{0, \dots, N\}$, also $\beta_N = 0$ and $\delta_0 = 0$). Introducing the tri-diagonal matrix $A_N$ with sub-diagonal entries $\beta_{k-1}$, diagonal entries $-\alpha_k$ and super-diagonal entries $\delta_{k+1}$, the system can be written compactly as $\dot p = A_N p$. The assumption of tri-diagonality of the matrix can obviously be weakened; however, most practical problems that motivate our work fall into this class. Therefore, to be in line with applications and to make our results more transparent, the tri-diagonal case will be considered.
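The structure of (KE) is easy to experiment with numerically. The following sketch is our own illustration (not part of the original analysis); the concrete SIS-type rates and all parameter values are arbitrary choices. It assembles the tri-diagonal matrix described above and integrates the Kolmogorov equations with a classical RK4 stepper:

```python
import numpy as np

def kolmogorov_matrix(beta, delta):
    # Tri-diagonal matrix of (KE): sub-diagonal beta_{k-1}, diagonal
    # -(beta_k + delta_k), super-diagonal delta_{k+1}.
    n = len(beta)                      # states 0..N, so n = N + 1
    A = np.zeros((n, n))
    for k in range(n):
        A[k, k] = -(beta[k] + delta[k])
        if k > 0:
            A[k, k - 1] = beta[k - 1]
        if k < n - 1:
            A[k, k + 1] = delta[k + 1]
    return A

def solve_master(A, p0, t_end, steps=2000):
    # Classical RK4 integration of p' = A p.
    p, h = p0.copy(), t_end / steps
    for _ in range(steps):
        k1 = A @ p
        k2 = A @ (p + 0.5 * h * k1)
        k3 = A @ (p + 0.5 * h * k2)
        k4 = A @ (p + h * k3)
        p = p + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return p

N = 50
k = np.arange(N + 1)
beta_k = 2.0 * k * (N - k) / N         # illustrative SIS-type rates
delta_k = 1.0 * k                      # (note beta_N = 0 and delta_0 = 0)
A = kolmogorov_matrix(beta_k, delta_k)

p0 = np.zeros(N + 1)
p0[N // 2] = 1.0                       # start from the pure state k = N/2
p = solve_master(A, p0, t_end=1.0)
print(round(p.sum(), 8))               # columns of A sum to zero, so total
                                       # probability is conserved: 1.0
```

The zero column sums of the tri-diagonal matrix are exactly what makes $\sum_k p_k(t) \equiv 1$, a property the moment computations below rely on.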
Here we assume that the coefficients $\beta_k$ and $\delta_k$ are asymptotically density dependent in the sense that

$$\beta_k = B_N(k), \qquad \delta_k = D_N(k), \qquad 0 \le \beta_k \le cN, \qquad 0 \le \delta_k \le cN, \qquad \text{(1.1)}$$

with suitable functions $B_N, D_N$ and a constant $c$ independent of $N$ and $k$, and the following limits exist for all $x \in [0,1]$, with $\beta$ and $\delta$ (at least) continuous functions:

$$\lim_{N \to \infty} \frac{B_N(Nx)}{N} = \beta(x), \qquad \text{(1.2)}$$

$$\lim_{N \to \infty} \frac{D_N(Nx)}{N} = \delta(x). \qquad \text{(1.3)}$$

We note that the usually used density dependence means that $\beta(x) = \frac{B_N(Nx)}{N}$ for all $N$ (and similarly for $D_N$). In order to obtain a differential equation approximation of the Markov chain as $N \to \infty$, the random variables $X_N(t)$, forming a continuous time Markov process with values in $\{0, \frac{1}{N}, \frac{2}{N}, \dots, 1\}$, are considered. Then $p_k(t) = P\left( X_N(t) = \frac{k}{N} \right)$ can be expressed in terms of the transition probabilities as $p_k(t) = \sum_{m=0}^{N} p_{k,m}(t)\, p_m(0)$, where the transition probabilities are $p_{k,m}(t) = P\left( X_N(t) = \frac{k}{N} \mid X_N(0) = \frac{m}{N} \right)$. Combining this with the Kolmogorov equation (KE), it is straightforward to show that the matrix of transition probabilities satisfies the same linear system, governed by the tri-diagonal matrix $A_N$. As a simple calculation shows in Subsection 3.b, the approximating mean-field differential equation to the Markov chain can be written as

$$\dot x(t) = \beta(x(t)) - \delta(x(t)). \qquad \text{(1.4)}$$

The problem of rigorously linking exact stochastic models to mean-field approximations goes back to the early work of Kurtz [13,14]. Kurtz studied pure-jump density dependent Markov processes, where apart from providing a method for the derivation of the mean-field model he also used solid mathematical arguments to prove the stochastic convergence of the exact model to the mean-field one. His earlier results [13,14] relied on Trotter-type approximation theorems for operator semigroups. Later on, the results were embedded in the more general context of martingale theory [8]. A detailed survey of the subject from the probabilistic point of view is given by Darling and Norris [6]. Building on this and similar work, McVinish and Pollett [16] have recently extended the differential equation approximation to the case of heterogeneous density dependent Markov chains, where the coefficients of the transition rates may vary with the nodes.
The results by Kurtz and others in this area have been cited and extensively used by modelers in areas such as ecology and epidemiology to justify the validity of heuristically formulated mean-field models. The existence of several approximation models, often based on different modelling intuitions and approaches, has recently highlighted the need to unify these and test their performance against the exact stochastic models, see House and Keeling [11]. Some steps in this direction have been made by Ball and Neal [2] and Lindquist et al. [15], where the authors clearly state the link between exact and mean-field models. The probabilistic approach given by Ethier and Kurtz [8] and Darling and Norris [6] yields weak or stochastic convergence of the Markov chain to the solution of the differential equation. Here we have a more moderate goal, namely to prove that the expected value of the Markov process converges to the solution of the differential equation as $N \to \infty$, and to prove that the discrepancy between the two is of order $1/N$. The benefit of framing the question in this simpler or different way lies in a less technical and more accessible proof. This relies mainly on well-known results from semigroup theory, rather than on a combination of highly specialised tools and results from probability theory. For the applied semigroup methods see also Bátkai, Csomós and Nickel [3] and references therein. This simpler approach, based on the expected value of the Markov chain, is also motivated by practical considerations, namely that the goodness of the approximation is usually tested by comparing the average of many individual simulations to the output of the simplified approximate model. This can be a satisfactory and sufficient comparison since, to our knowledge, weak or stochastic convergence is rarely tested. The main result of the paper can be formulated in the following Theorem, where we assume uniformity for the convergence in (1.2).
Theorem 1. Let $y_1(t) = E(X_N(t)) = \sum_{k=0}^{N} \frac{k}{N}\, p_k(t)$ be the expected value and let $x$ be the solution of (1.4) with initial condition $x(0) = y_1(0)$. Let us assume that the limits in (1.2) are uniform in the sense that there exists a number $L$, such that for all $x \in [0,1]$ and all $N$

$$\left| \frac{B_N(Nx)}{N} - \beta(x) \right| \le \frac{L}{N}, \qquad \left| \frac{D_N(Nx)}{N} - \delta(x) \right| \le \frac{L}{N}. \qquad \text{(1.5)}$$

Then for any $t_0 > 0$ there exists a constant $C$, such that

$$|y_1(t) - x(t)| \le \frac{C}{N} \qquad \text{for all } t \in [0, t_0].$$

Since the essence of the Theorem is well known, we highlight the novelties of our approach in the paper.
• The proof is self-contained in the sense that no general abstract theorem or combination of theorems is used. The result is based on the simple fact that if the operators $A_N$ converge to the operator $A$ in a certain sense as $N \to \infty$, then the operator semigroups $T_N$ generated by $A_N$ converge to the semigroup $T$ generated by $A$. This result is formulated in Lemma 5.
• The proof automatically implies the rate of convergence; namely, it can be shown that the difference between the expected value and the solution of the mean-field ODE is of order $1/N$.
• Our tools make it possible to extend the above convergence results from density dependent Markov chains to the more general case of asymptotically density dependent Markov chains.
• For Markov chains where the transition rates satisfy some specific sign conditions, a completely new approach is presented. This is based on deriving a countable system of ordinary differential equations for the moments of a distribution of interest and proving a perturbation theorem for this infinite system.
The paper is structured as follows. In Section 2 we motivate our work via two examples and applications: an adaptive network model with random link activation and deletion, and an SIS-type epidemic model on a static graph. The derivation of ODEs for the moments is presented in Section 3, together with the heuristic construction of the mean-field equation for the first moment. In Section 4, we present our new approach and use it to prove Theorem 1. In Section 5, we present the derivation of an infinite system of ODEs for the moments (only in the density dependent case) and we show how this leads to a new approach that can be used to derive estimates for all moments, directly from the ODEs. This is in contrast with the usual approach, where only the expected value of the Markov chain is estimated.

Motivation
In this Section we present two important examples that motivate our investigations.
Recently, it has become more and more important to understand the relation between the dynamics on a network and the dynamics of the network, see Gross and Blasius [10]. In the case of epidemic propagation on networks it is straightforward to assume that the propagation of the epidemic has an effect on the structure of the network. For example, susceptible individuals try to cut their links in order to minimize their exposure to infection. This leads to a change in network structure which in turn impacts on how the epidemic spreads. The first step in modeling this phenomenon is an appropriate dynamic network model such as the recently proposed globally-constrained Random Link Activation-Deletion (RLAD) model.
This can be described in terms of Kolmogorov equations as follows, where $p_k(t)$ denotes the probability that at time $t$ there are $k$ activated links in the network, and $N$ is the total number of potential edges. It is assumed that non-active links are activated independently at random at rate $\alpha$ and that existing links are broken independently at random at rate $\omega$. Furthermore, link creation is globally constrained by introducing a carrying capacity $K_1^{\max}$; that is, the network can only support a certain number of edges, as given by $K_1^{\max}$.
Using the above notation, the transition rates are

$$\beta_k = \alpha (N - k) \left( 1 - \frac{k}{K_1^{\max}} \right), \qquad \delta_k = \omega k,$$

with the carrying capacity $K_1^{\max}$ being of order $N$. These coefficients clearly satisfy (1.2) and (1.3).
The second motivation comes from epidemiology, where a paradigm disease transmission model is the simple susceptible-infected-susceptible (SIS) model on a completely connected graph with $N$ nodes, i.e. all individuals are connected to each other. From the disease dynamics viewpoint, each individual is either susceptible (S) or infected (I): the susceptible ones can be infected at a certain rate ($\beta$) if linked to at least one infected individual, and the infected ones can recover at a given rate ($\gamma$) and become susceptible again. It is known that in this case the $2^N$-dimensional system of Kolmogorov equations can be lumped to an $(N+1)$-dimensional system, see Simon, Taylor and Kiss [18]. The lumped Kolmogorov equations take again the form (KE) with

$$\beta_k = \beta \frac{k(N-k)}{N}, \qquad \delta_k = \gamma k, \qquad \alpha_k = \beta_k + \delta_k, \qquad k = 0, \dots, N.$$

These coefficients also satisfy (1.2) and (1.3). We note that in the case of a homogeneous random graph we obtain a similar system with a slightly different meaning of the coefficients.
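For this lumped SIS chain the mean-field equation of Subsection 3.b reads $\dot x = \beta x (1 - x) - \gamma x$, and the quality of the approximation can be checked directly. The following self-contained sketch is our own illustration (the parameter values $\beta = 2$, $\gamma = 1$, $N = 200$ are arbitrary choices):

```python
import numpy as np

# Compare the expected value y_1(t) = sum_k (k/N) p_k(t) of the lumped SIS
# chain (beta_k = beta*k*(N-k)/N, delta_k = gamma*k) with the solution of the
# mean-field equation x' = beta*x*(1-x) - gamma*x.  Illustrative sketch;
# all parameter values are arbitrary.
beta, gamma, N, t_end, steps = 2.0, 1.0, 200, 1.0, 4000
k = np.arange(N + 1)
beta_k = beta * k * (N - k) / N
delta_k = gamma * k

# tri-diagonal matrix of (KE)
A = np.diag(-(beta_k + delta_k)) + np.diag(beta_k[:-1], -1) + np.diag(delta_k[1:], 1)

p = np.zeros(N + 1)
p[N // 2] = 1.0                  # the chain starts in the pure state k = N/2
x = 0.5                          # mean-field initial condition x(0) = y_1(0)
f = lambda z: beta * z * (1 - z) - gamma * z

h = t_end / steps
for _ in range(steps):
    # RK4 step for the Kolmogorov equations p' = A p
    k1 = A @ p; k2 = A @ (p + 0.5 * h * k1)
    k3 = A @ (p + 0.5 * h * k2); k4 = A @ (p + h * k3)
    p = p + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    # RK4 step for the mean-field ODE
    q1 = f(x); q2 = f(x + 0.5 * h * q1)
    q3 = f(x + 0.5 * h * q2); q4 = f(x + h * q3)
    x = x + (h / 6.0) * (q1 + 2 * q2 + 2 * q3 + q4)

y1 = float(k @ p) / N            # E(X_N(t_end))
print(abs(y1 - x))               # small, consistent with an O(1/N) error
```

The same script with the rates of the link activation-deletion model in place of the SIS rates gives an analogous comparison for the first motivating example.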

Moment approach
The basic idea of obtaining an approximating differential equation is to calculate the time derivative of the expected value by using the Kolmogorov equations. Since the equation obtained is not self-contained, it needs to be closed by using some closure approximation. In this Section we first derive equations for the derivatives of every moment, then we briefly show how to get the simplest mean-field approximation for the first moment. This is also discussed in the case of asymptotically density dependent Markov chains.

3.a. Differential equations for the moments. Introducing the moments

$$y_n(t) = \sum_{k=0}^{N} \left( \frac{k}{N} \right)^n p_k(t), \qquad n = 1, 2, \dots \qquad \text{(3.1)}$$

($y_1(t)$ is the expected value we are mainly interested in), we follow Simon and Kiss [17] to derive the differential equations for $y_n(t)$ starting from the Kolmogorov equations (KE). To get the time derivative of $y_n$ the following Lemma will be used.

Lemma 2. Let $r_k$ ($k = 0, 1, 2, \dots$) be a sequence and let $p_k$ be the solution of (KE). Then

$$\frac{d}{dt} \sum_{k=0}^{N} r_k p_k(t) = \sum_{k=0}^{N} \left( (r_{k+1} - r_k) \beta_k + (r_{k-1} - r_k) \delta_k \right) p_k(t),$$

where the conventions $\beta_N = 0$ and $\delta_0 = 0$ make the boundary terms vanish.

Let us introduce

$$R_{k,n} = (k+1)^n - k^n - n k^{n-1}, \qquad Q_{k,n} = (k-1)^n - k^n + n k^{n-1},$$

and

$$d_n(t) = \frac{1}{N^{n-1}} \sum_{k=0}^{N} \left( R_{k,n} \beta_k + Q_{k,n} \delta_k \right) p_k(t). \qquad \text{(3.2)}$$

Combining these with Lemma 2 leads to

$$\dot y_n(t) = n \sum_{k=0}^{N} \left( \frac{k}{N} \right)^{n-1} \frac{\beta_k - \delta_k}{N}\, p_k(t) + \frac{1}{N}\, d_n(t). \qquad \text{(3.3)}$$

Using the binomial theorem, $R_{k,n}$ and $Q_{k,n}$ can be expressed in terms of the powers of $k$; hence $d_n$ can be expressed as $d_n(t) = \sum_{m=1}^{n} d_{nm} y_m(t)$ with some coefficients $d_{nm}$. The coefficients $d_{nm}$ contain $N$; hence, to use the $1/N \to 0$ limit, it has to be shown that $d_n$ remains bounded as $N$ goes to infinity. This is proved in the next lemma.

Lemma 3. There exists a constant $M_n$, independent of $N$, such that $0 \le d_n(t) \le M_n$ for all $t \ge 0$.

Proof. Taylor's theorem, with second degree remainder in Lagrange form, states that

$$f(x) = f(x_0) + f'(x_0)(x - x_0) + \frac{f''(\xi)}{2} (x - x_0)^2,$$

where $\xi$ is between $x_0$ and $x$. This simple result can be used to find estimates for both $R_{k,n}$ and $Q_{k,n}$. In particular, applying the above result with $f(x) = x^n$, $x = k+1$ and $x_0 = k$ gives

$$R_{k,n} = \frac{n(n-1)}{2} \xi^{n-2} \qquad \text{with some } \xi \in [k, k+1].$$

Similarly, with $x = k-1$ and $x_0 = k$, we obtain

$$Q_{k,n} = \frac{n(n-1)}{2} \eta^{n-2} \qquad \text{with some } \eta \in [k-1, k].$$

Hence, $R_{k,n}$ and $Q_{k,n}$ are non-negative, yielding that $d_n(t) \ge 0$. On the other hand, since $\beta_N = 0$ and $\delta_0 = 0$, only indices with $\xi/N \le 1$ and $\eta/N \le 1$ contribute, and together with the bound in (1.1) this leads to the inequality

$$d_n(t) \le \frac{n(n-1)}{2} \sum_{k=0}^{N} \frac{\beta_k + \delta_k}{N}\, p_k(t) \le n(n-1)\, c.$$

Hence, the statement follows immediately from (3.2), using that $\sum_{k=0}^{N} p_k(t) = 1$.
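The two Taylor-remainder identities can be checked in exact integer arithmetic. In the sketch below (our own check; $R_{k,n}$ and $Q_{k,n}$ are taken to be the second-order remainders suggested by the Taylor argument above) we verify non-negativity and the Lagrange bounds for a range of $k$ and $n$:

```python
# R_{k,n} = (k+1)^n - k^n - n*k^(n-1) = n(n-1)/2 * xi^(n-2),  xi in [k, k+1]
# Q_{k,n} = (k-1)^n - k^n + n*k^(n-1) = n(n-1)/2 * eta^(n-2), eta in [k-1, k]
# Exact integer arithmetic confirms non-negativity and the Lagrange bounds.

def R(k, n):
    return (k + 1) ** n - k ** n - n * k ** (n - 1)

def Q(k, n):
    return (k - 1) ** n - k ** n + n * k ** (n - 1)

ok = True
for n in range(2, 8):
    for k in range(1, 60):
        bound_R = n * (n - 1) / 2 * (k + 1) ** (n - 2)   # xi <= k + 1
        bound_Q = n * (n - 1) / 2 * k ** (n - 2)         # eta <= k
        ok = ok and 0 <= R(k, n) <= bound_R and 0 <= Q(k, n) <= bound_Q
print(ok)   # True
```

For $n = 1$ both remainders vanish identically, which is the fact $d_1 = 0$ used in the next subsection.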
We show two possible ways to turn (3.3) into an ODE. First, we use the approximation $E(F(X)) \approx F(E(X))$ (where $E$ stands for the expected value and $F$ is a given measurable function) to derive the mean-field approximation. Then, assuming that the Markov chain is density dependent and the functions $\beta$ and $\delta$ are polynomials, we derive an infinite system of ODEs for the moments.
3.b. The mean-field approximation. Since $d_1 = 0$, applying (3.3) for $n = 1$ we obtain

$$\dot y_1(t) = \sum_{k=0}^{N} \frac{\beta_k - \delta_k}{N}\, p_k(t). \qquad \text{(3.4)}$$

Using the asymptotic density dependence of the Markov chain (1.2), the right-hand side can be approximated by

$$\sum_{k=0}^{N} \left( \beta\!\left( \frac{k}{N} \right) - \delta\!\left( \frac{k}{N} \right) \right) p_k(t) = E\left( \beta(X_N(t)) - \delta(X_N(t)) \right).$$

In order to make the equation "closed", the approximation $E\left( \beta(X_N) - \delta(X_N) \right) \approx \beta\left( E(X_N) \right) - \delta\left( E(X_N) \right)$ will be used. Substituting this approximation into equation (3.4), we obtain the following differential equation:

$$\dot x(t) = \beta(x(t)) - \delta(x(t)). \qquad \text{(ODE)}$$

This equation is known as the mean-field approximation of the original Kolmogorov equation (KE).
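The closure step $E(F(X)) \approx F(E(X))$ is justified when the distribution of $X_N$ concentrates around its mean. A quick illustration (a binomial toy distribution of our own choosing, not taken from the text): for $X = \mathrm{Bin}(N, p)/N$ and a smooth $F$, the gap $E F(X) - F(E X) \approx \frac{1}{2} F''(p)\, \mathrm{Var}(X) = O(1/N)$.

```python
from math import comb

# Closure error |E F(X) - F(E X)| for X = Bin(N, p)/N and a smooth test
# function F; the gap should shrink like 1/N as the distribution concentrates.
def closure_gap(N, p=0.3, F=lambda x: x * (1 - x)):
    # exact pmf of Bin(N, p), using exact integer binomial coefficients
    pmf = [comb(N, j) * p ** j * (1 - p) ** (N - j) for j in range(N + 1)]
    EF = sum(w * F(j / N) for j, w in enumerate(pmf))
    return abs(EF - F(p))            # |E F(X) - F(E X)|

g1, g2 = closure_gap(100), closure_gap(200)
print(round(g1 / g2, 3))             # 2.0: the gap halves when N doubles
```

For this particular $F$ the gap is exactly $p(1-p)/N$, so the ratio of errors at $N$ and $2N$ is exactly 2.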
3.c. Infinite system of ODEs in the polynomial case. In this Subsection it is assumed that the functions $\beta$ and $\delta$ are polynomials and the Markov chain is density dependent, that is,

$$\frac{\beta_k}{N} = \beta\!\left( \frac{k}{N} \right), \qquad \frac{\delta_k}{N} = \delta\!\left( \frac{k}{N} \right), \qquad \beta(x) - \delta(x) = \sum_{j=0}^{l} q_j x^j.$$

Using this and denoting the polynomial coefficients by $q_j$, equation (3.3) takes the form

$$\dot y_n(t) = n \sum_{j=0}^{l} q_j\, y_{n+j-1}(t) + \frac{1}{N}\, d_n(t), \qquad n = 1, 2, \dots \qquad \text{(3.9)}$$

Letting $N \to \infty$ on the right-hand side, we arrive at the following system:

$$\dot f_n(t) = n \sum_{j=0}^{l} q_j\, f_{n+j-1}(t), \qquad n = 1, 2, \dots, \qquad \text{(IE)}$$

that can be regarded as a system of "approximating" differential equations for (3.9). In Section 5 we are going to investigate how $y_1(t)$ can be approximated on finite time intervals using the solution of this infinite system. Remark 4. All the results for the infinite system obtained from here on remain valid in the asymptotically density dependent case, when $\beta_k/N$ and $\delta_k/N$ differ from $\beta(k/N)$ and $\delta(k/N)$ by terms of order $1/N$. The only difference is that in (3.9) an additional perturbation term of order $1/N$ appears on the right-hand side.
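The structure of (IE) can be illustrated numerically. In the sketch below (our own experiment; the coefficients $q_0 = 0.5$, $q_1 = -1$, $q_2 = 0.3$, the truncation level $M$ and the closure $f_{M+1} \approx f_1^{M+1}$ are arbitrary or pragmatic choices) we integrate a truncated version of (IE) from the deterministic initial moments $f_n(0) = x_0^n$; the moment structure $f_n(t) = f_1(t)^n$ of the deterministic limit should persist.

```python
import numpy as np

# Truncated version of (IE) for a quadratic beta - delta:
#     f_n' = n * (q0*f_{n-1} + q1*f_n + q2*f_{n+1}),   f_0 = 1,
# closed at level M by the ad-hoc device f_{M+1} ~ f_1^{M+1}.
q0, q1, q2 = 0.5, -1.0, 0.3
M, x0, t_end, steps = 20, 0.2, 1.0, 2000

def rhs(f):
    # f[0] = f_0 = 1 is constant; f[n] is the n-th moment, n = 1..M
    g = np.zeros_like(f)
    top = f[1] ** (M + 1)                    # closure for the missing f_{M+1}
    for n in range(1, M + 1):
        up = f[n + 1] if n < M else top
        g[n] = n * (q0 * f[n - 1] + q1 * f[n] + q2 * up)
    return g

f = np.array([x0 ** n for n in range(M + 1)])   # f_n(0) = x0^n, f_0 = 1
h = t_end / steps
for _ in range(steps):
    k1 = rhs(f); k2 = rhs(f + 0.5 * h * k1)
    k3 = rhs(f + 0.5 * h * k2); k4 = rhs(f + h * k3)
    f = f + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

dev = max(abs(f[n] - f[1] ** n) for n in range(1, M + 1))
print(dev)      # the power structure f_n = f_1^n is preserved up to tiny error
```

This is consistent with (IE): if $f_n = x^n$ with $\dot x = \sum_j q_j x^j$, then $\dot f_n = n x^{n-1} \dot x = n \sum_j q_j f_{n+j-1}$, so the deterministic moments solve the infinite system exactly.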

Proof of Theorem 1
In this Section we prove that the solution of (ODE) is an $O(1/N)$ approximation of the expected value of the Markov chain, that is, we prove Theorem 1. Let us introduce the matrix $A_N$ as in Section 1. Then the operator families $(T_N(t))_{t \ge 0}$ defined as

$$T_N(t) = e^{t A_N^T}, \qquad t \ge 0, \qquad \text{(4.1)}$$

form uniformly continuous semigroups on $\mathbb{C}^{N+1}$ for each $N \in \mathbb{N}$. This yields that $u_N(t) = T_N(t) u_N(0)$ solves the transposed Kolmogorov equation $\dot u_N = A_N^T u_N$. Assume for the sake of simplicity that on the right-hand side of (ODE) the functions $\beta, \delta \in C^2[0,1]$. If we denote the solution of (ODE) with initial condition $x_0$ by $\varphi(t, x_0)$, then the operator family defined as

$$(T(t) f)(x) = f(\varphi(t, x)) \qquad \text{(4.2)}$$

defines a strongly continuous operator semigroup on $C([0,1])$ (see Engel and Nagel [7, Section 3.28]) with generator $(A, D(A))$. We also know that for $f \in C^1([0,1])$

$$(A f)(x) = \left( \beta(x) - \delta(x) \right) f'(x).$$

The main idea is to approximate the semigroup $(T_N(t))_{t \ge 0}$ (that is, the solution of the transposed (KE)) by the semigroup $(T(t))_{t \ge 0}$ (that is, the solution of the mean-field equation (ODE)). Observe that the semigroups act on different spaces: the first one acts on $X_N := \mathbb{C}^{N+1}$, the second one on $X := C^1([0,1])$. In order to prove an approximation theorem in a fixed space, we assume that there are linear operators

$$J_N : X_N \to X, \qquad \text{(4.3)}$$

$$P_N : X \to X_N, \qquad (P_N f)_k = f\!\left( \frac{k}{N} \right), \quad k = 0, \dots, N, \qquad \text{(4.4)}$$

where $J_N$ is an interpolation-type operator with $\|J_N\| \le 1$ and $J_N P_N f \to f$ for every $f \in X$.

Lemma 5. Assume that the conditions of Theorem 1 are satisfied. For $(T_N(t))_{t \ge 0}$ and $(T(t))_{t \ge 0}$ the following holds: for all $f \in C^2([0,1])$ and $t_0 > 0$ there exists $C = C(f, t_0) > 0$ such that

$$\left\| P_N T(t) f - T_N(t) P_N f \right\| \le \frac{C}{N} \qquad \text{for all } t \in [0, t_0],$$

where $P_N$ denotes the projection defined in (4.4).
Proof. For the generators $A_N$, $A$, and for $f \in C^2([0,1])$ the following identities hold:

$$\left( A_N^T P_N f \right)_k = \beta_k \left( f\!\left( \tfrac{k+1}{N} \right) - f\!\left( \tfrac{k}{N} \right) \right) + \delta_k \left( f\!\left( \tfrac{k-1}{N} \right) - f\!\left( \tfrac{k}{N} \right) \right),$$

$$\left( P_N A f \right)_k = \left( \beta\!\left( \tfrac{k}{N} \right) - \delta\!\left( \tfrac{k}{N} \right) \right) f'\!\left( \tfrac{k}{N} \right).$$

The idea of proving the estimate for the semigroups is to estimate the difference of the generators. Their difference can be divided into two parts as follows.
The first part can be estimated by using the asymptotic density dependence (1.5), that is,

$$\left| \frac{\beta_k}{N} - \beta\!\left( \tfrac{k}{N} \right) \right| \le \frac{L}{N}, \qquad \text{(4.6)}$$

$$\left| \frac{\delta_k}{N} - \delta\!\left( \tfrac{k}{N} \right) \right| \le \frac{L}{N}, \qquad \text{(4.7)}$$

and using that $f'$ is bounded. The second part can be estimated by using Taylor's formula for $f \in C^2([0,1])$. We obtain that for each $k = 0, \dots, N$ there exists $\xi_k \in \left( \tfrac{k}{N}, \tfrac{k+1}{N} \right)$ such that

$$N \left( f\!\left( \tfrac{k+1}{N} \right) - f\!\left( \tfrac{k}{N} \right) \right) - f'\!\left( \tfrac{k}{N} \right) = \frac{f''(\xi_k)}{2N},$$

and similarly for the backward difference. Since $\beta$ and $\delta$ are bounded on $[0,1]$ (they are in $C^2[0,1]$), we obtain from (4.6) and (4.7) that $\frac{\beta_k}{N}$, $\frac{\delta_k}{N}$ are uniformly bounded for all $k = 0, \dots, N$ and $N \in \mathbb{N}$. Hence, for all $f \in C^2([0,1])$ there exists $K > 0$ such that

$$\left\| A_N^T P_N f - P_N A f \right\| \le \frac{K}{N}. \qquad \text{(4.8)}$$

It is easy to see that for all $N$, the operators $\tilde T_N(t) := J_N T_N(t) P_N$, $t \ge 0$, form a strongly continuous semigroup on $X$ (where $J_N$ is defined in (4.3)) with generator $\tilde A_N := J_N A_N^T P_N$. Using the variation of parameters formula (see, e.g., Engel and Nagel [7, Corollary III.1.7]), for the difference of the (projected) semigroups, and for $f \in C^2([0,1])$, we obtain that

$$\tilde T_N(t) f - T(t) f = \int_0^t \tilde T_N(t - s) \left( \tilde A_N - A \right) T(s) f \, ds. \qquad \text{(4.9)}$$

By the estimate (4.8) applied to the integrand of (4.9), and using that $T(t)$ maps the subspace $C^2([0,1])$ of $X$ into itself (see Chicone [5, Theorem 1.3]), we obtain that for $f \in C^2([0,1])$ there exist $K^* > 0$ and $L^* > 0$, such that

$$\left\| \left( \tilde A_N - A \right) T(s) f \right\| \le \frac{K^*}{N} \left\| (T(s) f)' \right\|_\infty + \frac{L^*}{N} \left\| (T(s) f)'' \right\|_\infty.$$

Since $\beta, \delta \in C^2([0,1])$, the solution function $\varphi$ of (ODE) is also a $C^2$-function of the initial data. Hence, for $s \in [0, t]$ there exists $K_t > 0$ such that

$$\left\| (T(s) f)' \right\|_\infty + \left\| (T(s) f)'' \right\|_\infty \le K_t.$$

This yields that for $f \in C^2([0,1])$ and $t \in [0, t_0]$

$$\left\| \tilde T_N(t) f - T(t) f \right\| \le \frac{t_0 \left( K^* + L^* \right) K_{t_0}}{N},$$

which implies the statement of the Lemma. Using this semigroup approximation Lemma we can now prove Theorem 1.
Proof of Theorem 1. We apply Lemma 5 for $f = \mathrm{id}_{[0,1]}$. We obtain that for any $t_0 > 0$ there exists $C = C(\mathrm{id}, t_0)$ such that for all $t \in [0, t_0]$

$$\left\| P_N T(t)\, \mathrm{id} - T_N(t) P_N\, \mathrm{id} \right\| \le \frac{C}{N}. \qquad \text{(4.10)}$$

Observe that by (3.1) and (4.1), the following relation holds, denoting by $p_{k,m}(t)$ the transition probabilities:

$$\left( T_N(t) P_N\, \mathrm{id} \right)_m = \sum_{k=0}^{N} \frac{k}{N}\, p_{k,m}(t),$$

that is, the $m$-th coordinate is the expected value $y_1(t)$ corresponding to the initial state $m$. Furthermore, using (4.2) it is easy to show that

$$\left( P_N T(t)\, \mathrm{id} \right)_m = \varphi\!\left( t, \frac{m}{N} \right).$$

It can be seen that it is enough to prove the statement when the initial condition is $p_m(0) = 1$, $p_j(0) = 0$ for $j \ne m$. Then $y_1(0) = \frac{m}{N}$, yielding $x(0) = \frac{m}{N}$ and therefore $x(t) = \varphi\left( t, \frac{m}{N} \right)$. Hence, combining the above facts, we obtain

$$|y_1(t) - x(t)| = \left| \left( T_N(t) P_N\, \mathrm{id} \right)_m - \left( P_N T(t)\, \mathrm{id} \right)_m \right| \le \frac{C}{N},$$

where we used (4.10).
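The $O(1/N)$ rate established above is easy to observe numerically. The sketch below is our own experiment (SIS-type rates are used as a concrete density dependent example; all parameter values are arbitrary): the error $|y_1(t_0) - x(t_0)|$ is computed for $N$ and $2N$, and doubling $N$ should roughly halve the error.

```python
import numpy as np

def error_at(N, beta=2.0, gamma=1.0, x0=0.3, t_end=1.0, steps=2000):
    # |y_1(t_end) - x(t_end)| for the lumped SIS chain of size N
    k = np.arange(N + 1)
    bk = beta * k * (N - k) / N
    dk = gamma * k
    A = np.diag(-(bk + dk)) + np.diag(bk[:-1], -1) + np.diag(dk[1:], 1)
    p = np.zeros(N + 1)
    m = int(round(x0 * N))
    p[m] = 1.0                   # pure initial state, so y_1(0) = m/N
    x = m / N                    # matching mean-field initial condition
    f = lambda z: beta * z * (1 - z) - gamma * z
    h = t_end / steps
    for _ in range(steps):
        # RK4 for the Kolmogorov equations and for the mean-field ODE
        k1 = A @ p; k2 = A @ (p + 0.5 * h * k1)
        k3 = A @ (p + 0.5 * h * k2); k4 = A @ (p + h * k3)
        p = p + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        q1 = f(x); q2 = f(x + 0.5 * h * q1)
        q3 = f(x + 0.5 * h * q2); q4 = f(x + h * q3)
        x = x + (h / 6.0) * (q1 + 2 * q2 + 2 * q3 + q4)
    return abs(float(k @ p) / N - x)

e1, e2 = error_at(100), error_at(200)
print(e1 / e2)                   # ratio close to 2, as the O(1/N) bound suggests
```

The observed first-order decay also indicates that the $1/N$ rate in Theorem 1 is sharp for this example.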

Infinite system of ODEs for the moments
In this Section we consider the infinite system of ODEs (3.9) for the moments and its formal limit (IE) as $N \to \infty$. Our aim here is to prove that the solutions of the first system converge, uniformly on bounded time intervals, to those of the second as $N \to \infty$; that is, the moments of the Markov chain converge to the solutions of the infinite system (IE). As a by-product of this result we obtain a completely new proof of Theorem 1 under extra sign conditions on the transition rates. In order to prove this general convergence result we are going to use the approach of Kato [12] and Banasiak et al. [1]. Denote by $(L_{\max}, D_{\max})$ the maximal operator in $\ell^1$ associated with the formal expression given by the right-hand side of (IE).

Lemma 6. Assume that $q_0, q_2, \dots, q_l \ge 0$, $q_1 \le 0$ and $\sum_{j=0}^{l} q_j \le 0$. Then there exists an operator $(L, D)$ such that $L \subset L_{\max}$ and $(L, D)$ is the generator of a positive strongly continuous semigroup $(T(t))_{t \ge 0}$ of contractions on $\ell^1$.

(5.3)
Hence, using the assumptions on the signs of the $q_j$'s, we can apply Theorem 1 of Kato in [12]. Though the assumptions of Kato's theorem refer to equality instead of the inequality in (5.3), the proof works without any changes in our case. Thus, we do not repeat the proof here.
Let us turn to our original question on the expected value $y_1(t)$, that is, the first coordinate of the solution of (3.9).
Theorem 7. Assume that the conditions of Lemma 6 are satisfied. Then there exists an appropriate Banach space of sequences $\ell^1_w$ with norm $\|\cdot\|_w$ such that for any initial condition $y_0 \in \ell^1_w$ the solution $y(t) := (y_n(t))_{n \in \mathbb{N}}$ of (3.9) and the solution $f(t) := (f_n(t))_{n \in \mathbb{N}}$ of (IE) satisfy

$$\sup_{t \in [0, t_0]} \left\| y(t) - f(t) \right\|_w \to 0 \quad \text{as } N \to \infty, \quad \text{for every } t_0 > 0, \qquad \text{(5.4)}$$

that is, the solution of (3.9) tends to the solution of (IE) on finite time intervals as $N \to \infty$.
Applying Lemma 6 on the space $\ell^1_w$, we obtain an operator $(L_w, D_w)$, acting formally as $L$, which is the generator of a contraction semigroup $(T_w(t))_{t \ge 0}$ on $\ell^1_w$. Writing (3.9) in the abstract form (5.5), with the solution of (IE) given by $f(t) = T_w(t) y_0$, the variation of parameters formula for (5.5) (see, e.g., Engel and Nagel [7, Corollary III.1.7]) yields

$$y(t) = T_w(t) y_0 + \int_0^t T_w(t - s)\, g_N(s) \, ds$$

for any $y_0 \in \ell^1_w$, where $g_N$ denotes the perturbation term of order $1/N$ in (5.5). Thus, since the semigroup consists of contractions,

$$\left\| y(t) - f(t) \right\|_w \le \int_0^t \left\| g_N(s) \right\|_w \, ds \le \frac{t\, c_w}{N}, \qquad \text{(5.8)}$$

and this implies (5.4).
Remark 8. We note that the sign conditions of Lemma 6 hold in the case of the globally constrained random link activation-deletion process, but they do not hold in the case of an SIS epidemic on a complete graph.