Methods for approximating stochastic evolutionary dynamics on graphs

Population structure can have a significant effect on evolution. For some systems with sufficient symmetry, analytic results can be derived within the mathematical framework of evolutionary graph theory which relate to the outcome of the evolutionary process. However, for more complicated heterogeneous structures, computationally intensive methods are required such as individual-based stochastic simulations. By adapting methods from statistical physics, including moment closure techniques, we first show how to derive existing homogenised pair approximation models and the exact neutral drift model. We then develop node-level approximations to stochastic evolutionary processes on arbitrarily complex structured populations represented by finite graphs, which can capture the different dynamics for individual nodes in the population. Using these approximations, we evaluate the fixation probability of invading mutants for given initial conditions, where the dynamics follow standard evolutionary processes such as the invasion process. Comparisons with the output of stochastic simulations reveal the effectiveness of our approximations in describing the stochastic processes and in predicting the probability of fixation of mutants on a wide range of graphs. Construction of these models facilitates a systematic analysis and is valuable for a greater understanding of the influence of population structure on evolutionary processes.


Introduction
Models of evolutionary dynamics were originally deterministic and assumed well-mixed populations in 1 which every individual of a given type is identical. Stochastic models of these finite well-mixed popula-where an A individual obtains a payoff a when interacting with another A individual and payoff b when 66 interacting with a B individual. Similarly, a B individual obtains payoffs c and d when interacting with an 67 A individual and a B individual respectively. 68 To define fitness based on the payoff, following similar definitions in the literature [9,20,28,40,38], 69 the fitness of each individual is assumed to be f = f back + wP , where f back is the background fitness of all 70 individuals, P is the average payoff received from interactions with neighbours, and w ∈ [0, ∞) is a parameter 71 which controls the contribution of the game payoff to fitness. 72 The fitness of an A individual which occupies node j, f j A , is therefore given by and similarly the fitness of a B individual occupying node j is given by (2) In the special case of constant fitness, where the fitness of individuals remains constant independent of the 75 interactions with other individuals, we take the payoff matrix as and death-birth with selection on death (voter model) [28]. The methods developed in this 84 paper will be presented in the general case, and can be applied to any of the above update rules, but we 85 shall focus on the invasion process when generating specific examples. In the invasion process, we select an 86 Since we consider a strongly connected adjacency matrix G, provided we have at least one type A and one 114 type B it is possible to get to either of the absorbing states and therefore from any mixed initial condition the 115 system will always end up distributed between these two states. We define the fixation probability P A f ix (S(i)) 116 of type A from an initial state S(i) to be the probability of being in the all A absorbing state, that is in node i at time t given that the system is in state S at time t; we refer to this as the replacement rate. 134 Definition 2.2. X t C denotes the event that the set of nodes C is in state X at time t; for example A t {i} is 135 the event that node i is in the type A state at time t. 136 Throughout this paper we shall use the shorthand B t {i} A t {j} X t C to represent the intersection of events 138 Theorem 2.1. Under any Markovian update mechanism, for a structured population represented by the 139 adjacency matrix G, the rate of change of the probability that the individual in node i is an A individual is where the sum over X V \{i,j} is over all possible states of the nodes V \{i, j}.

141
Proof. See Appendix A.

142
This theorem can be applied to any update mechanism by choosing an appropriate definition for the 143 replacement rate, χ(Ω t j→i ), which we shall define for the invasion process as an example.

144
Example 2.1 (Invasion process). The invasion process is an adaptation of the Moran process [25] to struc-145 tured populations. Each event is determined by selecting an individual to reproduce with probability propor-146 tional to its fitness. It produces an identical offspring which replaces one of the connected individuals which 147 is chosen uniformly at random. Therefore the rate at which the individual in node j replaces the individual 148 in node i at time t under the invasion process rules is given by where f t j is the fitness of the individual occupying node j at time t, F t = N m=1 f t m is the total fitness of the 150 population, and k j denotes the degree of node j. Here, the factor f t j /F t is the rate at which node j is selected 151 to reproduce, and 1/k j is the probability of replacing the neighbouring individual i which is selected uniformly 152 at random.

153
When calculating χ(Ω t j→i ) in Equation (4), we will use the following expression for the fitness of the 154 individual at a given node j at time t, which is a sum of equations (1) and (2) weighted by the node probabilities. We use this definition because 156 when we evaluate Equation (6) given that the system is in a particular state S, as required by Equation (4) pairs are written in terms of triples and so on, until the full system state size is reached and the hierarchy 164 is closed. This is useful when we can find appropriate closure approximations to close this hierarchy at a 165 low order. However, we see that such an approach cannot be used here because we condition against the 166 full system state in Equation (4) which means that the full system size appears even at the first order. We 167 therefore attempt to find other methods to simplify this system of equations.

168
In this section we will describe three different techniques to derive approximations for this system. we need to condition against the state of all neighbours of j. As an example, we shall assume here that 188 selection occurs on birth so that we require conditioning on the neighbourhood of node j, however we can 189 also make similar arguments when selecting on death. Usingχ to represent this modification of χ in (4) and

190
Q to represent the new probability distribution with the modified time series we obtain where N j is the neighbourhood of node j, i.e. all nodes that are connected to j. To solve this system probabilities as a function of these.

197
For example, we can make a pair approximation by applying Bayes' Theorem and assuming statistical 198 independence at the level of pairs to rewrite the neighbourhood probability in terms of pair probabilities.

199
Applying Bayes' Theorem to the probabilities on the right hand side of Equation (7) we get If we assume statistical independence of all nodes in the neighbourhood of j, given the state of j, we can 201 rewrite the neighbourhood probability where X t {l} is event where node l is in the same state as it is in the event X t Nj \{i} . Substituting this into , in order to evaluate these equations we require additional We can now apply the same assumption regarding total fitness that we used for the single node probabilities 209 so that Applying Bayes' Theorem to the neighbourhood probability We can now assume statistical independence of the remaining nodes given the state of j and k so that Since we know that node i is connected to node j we can assume that given the state of node j, the state of node i is independent of node k, and similarly the state of any node in the neighbourhood of k is independent 214 of node j, which gives us Substituting this into Equation (10) gives While this system is closed, its computational complexity increases exponentially with the maximum node 217 degree of the graph, so it is not numerically feasible for graphs with highly connected nodes. While this 218 could potentially be addressed by introducing approximations for nodes with high degree and this may lead 219 to accurate models, here we continue towards a simplified model. To do this, we follow the same process as 220 in epidemic models and make a homogeneity assumption by assuming that any pair is equally likely to be 221 in any given state [18, 33]; i.e. Q(X t {i} |Y t {j} ) = Q(X t |Y t ) for all pairs (i, j). This leads to where k j is the degree of node j and n X is the number of type A individuals in state X Nj \{i} . Since the whereχ(Ω t A→B |n) is the rate at which we select one of the type A individuals to reproduce and replace a 228 type B, given that there are n type A individuals and k j − n type B individuals in the neighbourhood of 229 the selected node.

230
Since we have assumed that any pair is equally likely, this assumption only holds when every node in the graph forms k connections, which are chosen at random. Therefore we require that node i is equally likely 232 to be connected to any other node and all nodes are topologically equivalent, so that the probability that a 233 given node of type B is connected to x type A neighbours is given by a binomial distribution with n = k 234 and p = Q(A t |B t ). Therefore the probability of an individual being type A changes with rate We can also apply these assumptions to the pair-level equations to obtain a closed system of equations 236 which are efficient to solve numerically. The resulting model is equivalent to the model in [26], which was 237 justified by using the assumption that the population occupies a regular graph, such that all individuals 238 have degree k, and that all nodes are topologically equivalent, such that every pair of individuals is equally 239 likely to be connected. We have shown that by applying these assumptions to the exact node-level equations 240 (Equation (4)) we can derive these models.

241
Similarly we can obtain a pair approximation model for the dynamics where we select on death by In the next sections we will attempt to develop approximation 247 methods which can capture this node-specific information.

248
As we alluded to earlier, a natural method would be to use Equation (7)   dP Since χ(Ω t j→i ) is now the same for all system states, Adding and subtracting which is a closed set of N equations with at most N summands on the right hand side. Therefore by defining 261P as an approximation to the probability distribution P we obtain the closed system which is easy to solve numerically for an arbitrary graph. process under neutral drift we obtain χ(Ω t j→i ) = 1 N kj , and therefore Equation (11) can be written as which is equivalent to the exact node equation (4) Since the fixation probability is known, we now need to solve Equation (11) on the complete graph to 282 derive the ratio between the two. In the constant fitness case this can be done analytically, with the scaling 283 factor for m initial mutants given by . The derivation of this can be found in Appendix B. 285 We can now define two methods for predicting the fixation probability under any Markovian update 286 mechanism.

287
• Method 1 (Unconditioned fitness model) Solve Equation (11)  provide an approximation to the fixation probability from a given initial condition.

293
In Section 4 we investigate the numerical performance of these two methods. Note that for the purpose of this 294 paper we have found the scaling factor for Method 2 under the invasion process (Equation (12)). However, 295 the method can be applied to other update mechanisms, such as death-birth with selection on birth, by 296 finding an appropriate scaling factor, which can be done by solving Equation (11)  In Section 3.2 we restricted the conditioning so that we only require the marginal probabilities of the 301 individual nodes. However, this removes a significant amount of information from the dynamics. In the 302 evolutionary process, when considering a replacement event the two nodes of most interest are the node 303 selected for birth and the node selected for death. Therefore, here we follow a similar method but retain 304 conditioning on the states of these two key nodes. Since we restrict the conditioning to only the states of 305 the relevant contact, when looking at the term (4) we condition only 306 on the states of the nodes i and j and obtain Under the above condition, Equation (4) becomes To see the effect of this assumption on the rates, consider Here we condition only against 309 node i being in state B and node j being in state A rather than against the entire system state. Consequently 310 in the fitness equation (6) In Equation (13), the chance of selecting node j is now independent of the state X t V \{i,j} of the remaining 312 nodes which enables the equation to be reduced to This gives an approximate equation for individuals in terms of pairs. We then need to build equations to describe pair-level probabilities. Similar methodologies have been followed to describe epidemics propagated 315 on networks [33,34].

316
Applying the same conditioning to the exact pair level equation (9) we obtain .
, which, when we assume statistical independence of nodes i and k given j, simplifies to .
Typically this closure is applied to nodes on a graph where nodes i and j are connected and nodes j and k are connected but where there is no connection between nodes i and k, which we call an 'open triple'.

348
However, it could be applied to any triplet of nodes. This closure method is thought to be most accurate on 349 trees [18, 31, 34], and has been shown to be exact for such graphs under the SIR epidemic model [19,34,35]. 350 We can adopt either closure to remove triples and close the system. For example, if we are using the 351 Kirkwood closure to approximate all triples in Equation (15) we obtain the system of equations whereP represents the approximation to the probability distribution P . However, note that using this 353 closure for all triples will eventually require equations for every pair of nodes in the system, whether they 354 are connected or not.

355
It is also useful to use a combination of the two methods whereby the Kirkwood closure (16) is used for 356 closed triples, and (17) is used for open triples [13,33]. In this work we shall use this combined approach 357 to obtain a closed system. However, we find that unlike in epidemiology, this standard approach does not 358 produce good results. We therefore also try using just the Kirkwood closure because this permits explicit 359 correlations between nodes which are not linked, although as indicated above, this substantially increases 360 computational complexity because the system of equations will scale with N 2 rather than the number of 361 connected individuals in the graph.

362
With the contact conditioning model we define two different methods to approximate the evolutionary 363 dynamics. where there is no link between nodes i and k. We call this an open triple, and can approximate it as .
If there exists a link between nodes i and k we call this a closed triple, and approximate this using the 368 Kirkwood closure, .
Using this method it is only necessary to use pairs which have a link between them in the graph, and .   Table 1, where it can be seen that increasing the size from 20 to 35 to 50 moves the solution closer to 386 zero on random graphs). To account for this, we use Method 2 (scaled unconditioned fitness model). and scale-free random graphs, we start the process in three different initial conditions; a high-degree initial 394 node, a low-degree initial node and an average degree initial node. This is because under the dynamics of 395 the invasion process, a low degree node is known to act as an amplifier of selection and a high degree node 396 is known to act as a suppressor [2, 32], and so we potentially expect different performance of the methods 397 when initiated from nodes of different degree. In the k-regular random graph, since all nodes have equal 398 degree, we only consider results for one initial node. In addition to the random graphs (   in a significantly amplified approximation when initiated on the low degree leaf nodes, and for the 35 and 424 50 node star graphs the approximations initiated on the leaf node are close to 1. This is potentially due 425 to the time to convergence on large stars being very long, which allows these inaccuracies to compound so 426 that the system converges to this uninformative solution. This failure does not occur on these stars if we 427 reduce the fitness advantage, suggesting that as the size of the star becomes very large the method will only 428 work under weak selection. On random graphs, which do not significantly amplify fixation, this issue is also 429 observed, but only when the fitness advantage of one type is sufficiently high. This issue starts when the  (Table 2). Each line represents the marginal probability of a certain node in the graph being occupied by an A individual, the corresponding colours between the solid lines and dashed lines represent the same node on the graphs. The discrete-time stochastic process was simulated 10,000 times from the same initial condition, from which we obtained the probability for each node being a mutant at a given time as the proportion of simulations for which that node is a mutant. Method 4 was numerically integrated to approximate the probability of each node being a mutant at a given time. We use a dashed line with interpolation between integer time points for the discrete-time system to enable easier comparison of the dynamics. The game considered is the constant fitness case where the A individuals have fitness 1.2 and the B individuals have fitness 1.
we call the Hawk and Dove strategies, respectively. We let the resource yield a payoff V which both players received by a Dove from this interaction is V /2. Therefore, in this game the payoff matrix is given by with a population consisting of half Hawks and half Doves to minimise the chance of early extinction events. 465 We observe that when the cost is low the approximation is reasonable, with all 3 random graphs providing a 466 good approximation, and some accuracy lost on the square lattice. However, as we increase the cost, C, we 467 observe that the approximation does not perform well. This is because the contact conditioning assumption 468 seems to amplify the strength of the Hawk strategy, with the rate at which an individual becomes a Hawk 469 under this assumption being greater than it will be in the exact case.  We start with a representation of the stochastic evolutionary process using a master equation [11], were originally developed in physics [4,16] and later used in epidemiology and ecology [10,13,29,34,35].

486
The key idea behind these techniques is to write deterministic differential equations to describe how the 487 probabilities of the states of individuals and pairs change over time. 488 We find that a major difference between evolutionary graph theory and other areas in which these 489 methods have been applied is that here, event probabilities depend on the states of all individuals in the 490 population. As a result, we do not obtain a precise BBGKY-like hierarchy, which relies on neighbouring 491 particle-particle interactions. Another difference is that in evolutionary dynamics, we have two absorbing 492 states, which potentially leads to system-wide correlations that cannot be captured on a local level. It is 493 worth noting that some alternative nearest-neighbour interaction evolutionary models, which may yield such 494 a hierarchy directly, have also been considered [39]; however, in this paper we have restricted our attention 495 to the classic evolutionary graph theory dynamics.

496
In spite of these differences, some progress could be made towards approximating evolutionary dynamics.

511
To obtain an approximation which is numerically feasible in general, we first ignored any conditioning, 512 similar to a model in [37] which uses this assumption to construct a population level approximation. The 513 resulting model (Equation (11)) was found to work well for small graphs and contains the exact neutral 514 drift model [32] as a special case. However, as population size increases, the predictions for the fixation 515 probability of a single mutant individual were observed to tend to zero. By solving this system for the 516 fixation probability on a complete graph, we obtained a scaling factor which enabled this model to give a 517 reasonable prediction of fixation probability from a given initial condition with a single mutant individual 518 on any graph. Due to the construction of this method, it will perform best on graphs which yield average 519 fixation probability close to the Moran probability.

520
To generate a more accurate model and one which does not require an artificial scaling factor, we in- this method provides a reasonable approximation to the fixation probability. When the degree of the initial 537 mutant node is not low the approximation can be very accurate. However, if we initiate on a low degree 538 node, the method performs less well, potentially due to such nodes amplifying the fixation probability in the 539 invasion process, again leading to inaccuracies in the solution being amplified. Despite potential inaccuracies 540 in the fixation probability approximation, we observe that this method is particularly accurate for the early-541 time behaviour of these systems for any graph, and therefore can give interesting insights into this behaviour.

542
The method is computationally feasible for reasonably large N , however, the computational complexity scales 543 with N 2 rather than with N which is more typical for epidemic models. Nevertheless, this still represents a 544 significant reduction over the master equation which scales with 2 N .

545
The novelty of this work is the adaption of well-established techniques from other fields to the study of 546 evolutionary dynamics at the level of individual nodes. The contribution is two-fold. Firstly we have obtained 547 insight into existing models by deriving them from the master equation. Secondly, the advantage of looking 548 at node-level quantities rather than a homogenised model is that we gain the ability to compare dynamics 549 from different initial conditions on the same graph, which is not present in many other approximation where X V \{i} is the state of the nodes in the system not including i.

650
Consider a set state X V \{i} of the remaining nodes. The rate of change in the full system state probability 651 P (A t {i} X t V \{i} ) is given by where 1 (B t {j} ∈X t V \{i} ) is an indicator function on the event B t {j} being part the event X t V \{i} . That is, the 658 state of node j in the state X is type B. The χ(Ω t j→i |A t {i} X t V \{i} ) term is the rate at which the individual 659 in node j replaces the individual in node i, given that the system is in state A t {i} X t V \{i} , as defined in 660 Definition 2.1. Rearranging these and substituting into Equation (A.2) gives By substituting this into Equation (A.1) we obtain Clearly the last two sums cancel, so we can simplify this to as required. Considering the last term on the right hand side we have