A new and accurate continuum description of moving fronts

Processes that involve moving fronts of populations are prevalent in ecology and cell biology. A common approach to describe these processes is a lattice-based random walk model, which can include mechanisms such as crowding, birth, death, movement and agent–agent adhesion. However, these models are generally analytically intractable and it is computationally expensive to perform sufficiently many realisations of the model to obtain an estimate of average behaviour that is not dominated by random fluctuations. To avoid these issues, both mean-field (MF) and corrected mean-field (CMF) continuum descriptions of random walk models have been proposed. However, both continuum descriptions are inaccurate outside of limited parameter regimes, and CMF descriptions cannot be employed to describe moving fronts. Here we present an alternative description in terms of the dynamics of groups of contiguous occupied lattice sites and contiguous vacant lattice sites. Our description provides an accurate prediction of the average random walk behaviour in all parameter regimes. Critically, our description accurately predicts the persistence or extinction of the population in situations where previous continuum descriptions predict the opposite outcome. Furthermore, unlike traditional MF models, our approach provides information about the spatial clustering within the population and, subsequently, the moving front.

Lattice-based random walk models that include crowding, birth, death, movement and agent-agent adhesion are commonly used to describe processes that involve moving fronts [6,[16][17][18][19][20][21]. For example, these random walk models have been used to interpret in vitro cell biology experiments by considering the position of the leading edge of the cell front or the cell density profile [6,19,21]. Illien et al [18] consider random walks in the context of single-file diffusion models of active transport, that is, transport that requires energy due to an opposing force. However, the stochastic nature of random walks makes it problematic to efficiently examine the collective behaviour of a population, as a large number of realisations of the random walk must be performed to reduce the influence of stochastic fluctuations. Furthermore, it is difficult to determine meaningful population behaviour through analysis of the discrete process. There is, therefore, considerable interest in approaches that are both analytically tractable and avoid the computational expense of repeated simulations.
A common technique to analyse random walk processes is to consider a deterministic, continuum approximation of the discrete process [19,20,[22][23][24][25][26][27][28][29][30]. The standard approximation, known as a mean-field (MF) approximation, results in a partial differential equation (PDE) [19,20,25]. The resulting PDE is amenable to analysis but only provides an accurate approximation of the discrete process in an extremely limited set of parameter regimes where spatial correlations are weak [22,24,31]. Hence, using these approximations to model moving fronts may provide an inaccurate estimate of the velocity of the moving front if the spatial correlations are important. This inaccuracy could have significant consequences if, for example, the front velocity is used to make decisions about implementing ecological control measures. To address the influence of spatial correlations, alternative approximation techniques have been proposed [22,24,28,32,33]. The corrected meanfield (CMF) approximation, which explicitly describes pairwise correlations, results in a system of ordinary differential equations (ODEs) that accurately approximate the discrete process for a wider range of parameter regimes, compared to the MF approximation [22,24]. However, the CMF description is still invalid in parameter regimes where spatial correlations are sufficiently strong [28]. Furthermore, the CMF cannot be used to study moving fronts, as the governing equations cannot be evaluated anywhere that has zero agent density [22], such as areas where the population has yet to invade. The chain-and-gap approach (C&G), proposed by Johnston et al [28], considers the dynamics of groups of contiguous occupied and vacant sites, and results in a system of ODEs that provide an accurate approximation of the discrete process in all parameter regimes. These groups are termed chains and gaps for the contiguous occupied and vacant sites, respectively. Additionally, the C&G description provides information about the spatial clustering and patchiness present in the system. There is considerable interest in determining the influence of local spatial structure on the persistence of a species [34,35]. However, the C&G description has previously only been applied to discrete processes that are, on average, spatially uniform [28]. As such, the C&G description is not currently suitable for describing processes that contain moving fronts.
Here the C&G description presented by Johnston et al [28] is extended to incorporate spatial variation so that the description can be applied to moving fronts. We interpret the discrete process in terms of chains and gaps, noting that the left-most site in each chain or gap can occur at any lattice site. The corresponding system of ODEs is derived and presented, and we demonstrate that the numerical solution of the ODE system provides an accurate approximation of the average discrete behaviour in all cases, even in parameter regimes where both the MF and CMF descriptions are inaccurate. This allows for the robust prediction of whether a population persists or becomes extinct, as well as reliable estimates of the velocity of the moving front. In addition, for the first time, the C&G description has been extended to include rates of birth, death and movement that are dependent on the length of the chain an agent belongs to. Furthermore, the C&G description includes explicit information about the spatial clustering present within a moving front.

Random walk model
We consider a one-dimensional lattice-based random walk model where each lattice site may be occupied by, at most, one agent [36]. The lattice is interpreted as a combination of groups of contiguous occupied and vacant sites [28]. Agents on the lattice undergo birth, death and movement events. These events occur at rates P p n , P d n and P m n per unit time, respectively, where is the length of the chain an agent belongs to and N is the total number of sites. During a potential birth event, an agent randomly selects a nearestneighbour site and attempts to place a daughter agent at that site. The birth event is successful provided that the target site is vacant [24]. During a death event, an agent is removed from the lattice [24]. During a potential movement event, an agent selects a nearest-neighbour site and attempts to move to that site, and is successful provided that the target site is vacant [24]. The target site selection is unbiased if both nearestneighbour sites are vacant. If one nearest-neighbour site is occupied, the vacant nearest-neighbour site is selected with probability ( ) [26]. The constant parameter α represents the strength of agent-agent adhesion/repulsion. Setting a = 0 means that there is no agent-agent adhesion/ repulsion, whereas setting a ¹ 0 simulates agent-agent adhesion (a > 0) or repulsion (a < 0) [26]. Note that α does not depend on the chain length.
Due to crowding, the success of birth and movement events depends on whether an agent has zero, one or two nearest-neighbour agents. These agents are referred to as single, edge or middle agents, respectively [28]. An example lattice configuration highlighting the different types of agents is presented in figure 1 The spatially dependent restriction on the length is due to the choice of no-flux boundary conditions. Note that Note that middle agents cannot exist at sites i=1 and i=N due to the choice of boundary conditions. Similar to the approach of Johnston et al [28], we consider how birth, death and movement events change the location and lengths of the chains and gaps, rather than the occupancy of an individual lattice site. This approach avoids making an assumption about the probability that a particular site is occupied or vacant, as the sites either side of a chain or gap are necessarily vacant or occupied, respectively. There are twenty different types of events that change either the location of the left-most agent in a chain, the length of a chain, or both. An example of each event is presented in figure 1 for a particular configuration of agents. Events can have more than one potential outcome. For example, the potential outcomes of a single agent at site i undergoing a birth event can be classified into four groups. The daughter agent can be placed at sitei 1 or + i 1, and the gap that the daughter agent is placed in can be length one or greater. Here we detail each possible event and the subsequent change in configuration with respect to the number of chains and gaps. Events are referred to by a nomenclature describing the type of agent undergoing the event and the mechanism of the event itself, followed by a number highlighting the potential for multiple outcomes to arise from a specific event. For example, birth events for single agents are referred as SB events. Since there are four different types of SB events, we refer to these as SB1, SB2, SB3 and SB4 events. The details of each event are as follows: To obtain expressions for the time rate of change of C i n and G i m , we consider the rate at which each event occurs at site i, and all possible results of each event. Birth events are always successful for single agents and, as such, occur at rate P C Practical examples that could be described using chain-length dependent rates arise in a variety of situations [28,37]. For example, Hedayati et al [37] demonstrate that nanoparticle-mediated heating causes cytotoxicity in prostate cancer cells, provided the volume of cells is above a threshold value. Furthermore, the cytotoxicity increases with the cell number. Our framework would be suitable for modelling this process, as we are able to impose death rates that are zero below a threshold length, and an increasing function with respect to length otherwise.
While the rates at which events occur for each mechanism and agent type combination are known, multiple events can occur for a mechanism and agent type combination. For example, there are four types of birth events for single agents and, as such, we require the proportion of single birth events that are SB1, SB2, SB3 and SB4 events. For a single birth event to be an SB1 event, the agent at site i must place a daughter agent at sitei 1 and the gap that contains sitei 1 must be of length two or greater. Note that the gap cannot include site i, as the selected agent occupies that site. The proportion of single birth events where the agent at site i selects a target site ati 1 is 1/2. The proportion of single birth events where the agent selects a target site that is part of a gap of length two or greater is å å , that is, the number of gaps including sitei 1 that are of length two or greater divided by the total number of gaps including sitei 1. Hence the rate at which SB1 events occur at site i, and subsequently decrease C i 1 and increase - 2 . The proportion of SB1 events that change -G j i j and -- . Therefore, we can determine the expected rate of change for all chains and gaps affected by an SB1 event at site i. Following a similar process for all events we obtain transition rates for C i n and G i m , where 1 . The resulting system of ODEs is presented in appendix A.1 for C i n and appendix A.2 for G i m .

Traditional MF descriptions
Traditional MF descriptions of lattice-based random walk models containing crowding, birth, death, movement and agent-agent adhesion do not have the flexibility to describe processes where the rate of birth, death and/or movement is arbitrarily chain-length dependent. However, with certain simplifying assumptions, MF descriptions of the discrete process can be derived [20,38]. Here we examine two special cases of the discrete process, where continuum MF descriptions have been presented previously. Special case 1: One simplifying assumption is that the birth, death and movement rates are independent of chain length. Hence we define = = =¼= P P P P . There is, therefore, no positive or negative benefit associated with an agent being near other agents. Furthermore, there is no agent-agent adhesion/repulsion, and hence a = 0. The MF description for this case takes the form of a reaction-diffusion equation, known as the Fisher-Kolmogorov model [20,[38][39][40], and ( ) r x t , is the agent density [20,38]. Special case 2: An alternative simplifying assumption is that birth, death and movement rates depend on whether an agent is isolated or not. An isolated agent has zero nearest-neighbours [38], corresponding to a chain of length one. All grouped agents, that is, agents with at least one nearest-neighbour, undergo birth, death and movement events at the same rate. Grouped agents correspond to agents that are part of a chain of length two or greater. Hence we define = =¼= P P P then isolated agents are more likely to die, compared to other agents, and hence there is a positive benefit associated with being adjacent to other agents. This allows for significant flexibility in modelling a variety of competitive or co-operative processes [38]. Again, there is no agent-agent adhesion/ repulsion, and a = 0. The MF description for this case is a reaction-diffusion equation with a nonlinear diffusivity function and Allee kinetics [38], While CMF descriptions have been presented for certain lattice-based random walk models, these descriptions are unsuitable for studying problems containing moving fronts as the system of governing ODEs is singular at zero agent density [22,24,26].

Results
The solution of the MF model leads to a prediction of the average agent density profile as a function of position and time, whereas the C&G description provides the number of chains and gaps of all possible lengths as a function of location and time. Hence to compare the MF descriptions with the C&G description, it is necessary to reconstruct the agent density at each location, r i , from C i n and G i m . The agent density at site i is the sum of all possible chains that include site i, namely, An example of the output from the one-dimensional discrete model, illustrating twenty identically prepared realisations, is presented in figures 2(a)-(c) at t=0, t=100 and t=200, respectively. Identically prepared realisations refer to simulations of the discrete model performed with the same initial condition, parameter regime and boundary conditions. Initially, the domain is fully occupied for   x 61 140, and vacant otherwise. As time increases, the population spreads into the initially vacant region. Note that figures 2(a)-(c) each show twenty one-dimensional simulations, rather than one two-dimensional simulation. To obtain the average behaviour of the agent population, M identically prepared realisations of the discrete model are performed and the binary lattice occupancy at each site i,r i , is calculated for each realisation. Note that the discrete model is simulated with the Gillespie algorithm [41]. The binary lattice occupancy is then averaged, giving The averaged density profile from the discrete model is presented in figure 2(d), with the numerical solutions to both equation (1) and the C&G governing equations superimposed, at t=100 and t=200. Both continuum descriptions match the averaged discrete model predictions extremely well. Details of the numerical techniques used to solve the PDEs and systems of ODEs are presented in appendix B.
The parameter regime considered in figure 2 has  P P 1 m p and P d =0, and, as such, we expect the MF description to approximate the average discrete behaviour well since this combination of parameters avoids the formation of significant agent clustering [20,22]. We now consider a parameter regime where P P m p is ( )  1 . In such parameter regimes, the spatial correlations are significant and, subsequently, the MF description does not provide a valid approximation of the average discrete behaviour [22,28]. To highlight this, a comparison between the average discrete behaviour and the numerical solution of equation (1) is presented in figure 3(a). For these results, the domain is initially fully occupied for  x 20, and is vacant otherwise. The MF description predicts a front that is significantly sharper, and has a higher front speed, compared to the discrete model. In contrast, the numerical solution of the C&G description predicts the average discrete behaviour well, matching both the shape and position of the averaged discrete data. Furthermore, the distribution of chain and gap lengths, C n and G m , matches the observed average distribution of chains and gaps in the discrete model, as shown in figures 3(b) and (c). As such, the C&G description provides an accurate estimate of the front shape and speed, as well as a valid prediction of the clustering of occupied and vacant sites in the system. If the birth, death and movement rates depend on whether an agent has zero or at least one nearestneighbour agent then the MF description of the discrete model is equation (2) [38]. A comparison of the average discrete behaviour, the numerical solution of equation (2) and the numerical solution of the C&G description in an appropriate parameter regime is presented in figure 4(a). Interestingly, the MF description predicts that the agent population moves in the negative x direction, and would subsequently become extinct. In contrast, both the discrete model and the C&G description suggest that the agent population persists, and spreads in the positive x direction. Again, the C&G description matches the average discrete behaviour extremely well. The results in figure 4(a) highlight the need for an accurate approximation. If the naive approach of implementing a MF approximation to describe the spread of an invasive species is taken, it might be recommended that no culling measures are required to curtail the spread of the species. Obviously, such a recommendation is incorrect if the aim is to halt the invasion of the alien species. The clustering present in the system is highlighted in figures 4(b) and (c), for chains and gaps, respectively. Intuitively, as time increases and the population spreads, the average chain length increases and the average gap length decreases. The C&G description predicts both the average chain and gap distributions in the discrete model well.
Introducing a non-zero death rate for chains of length two or greater reduces the carrying capacity in the MF description of the discrete model [38]. To determine whether this reduction is an accurate reflection of the average discrete behaviour, we present a comparison of the average discrete behaviour, and the numerical solutions of both equation (2) and the C&G governing equations in figure 5(a). Both the discrete model and the C&G description predict that the peak agent density near x=50 decreases between t=25 and t=50, whereas the MF description predicts that the peak agent density at this location is approximately constant, at r = 0.646 [38], after t=25. Interestingly, both the discrete model and the C&G description predict that the population eventually goes extinct. In contrast, the MF description predicts that the population persists and spreads  throughout the domain. Again, these results highlight the importance of implementing an accurate approximation to obtain meaningful conclusions, as well as the robust nature of the C&G description.
Note that all results presented here have been performed with a = 0, and hence no agent-agent adhesion/ repulsion. Simulations performed with a ¹ 0 (not presented) confirm that the C&G description accurately predicts the average discrete behaviour in all cases, even with strong agent-agent adhesion/repulsion.

Experimental case study
To highlight the insight provided about spatial clustering by the C&G description, we consider a case study motivated by a scratch assay. Scratch assays are widely used to observe the collective behaviour of a cell population in response to a model wound [42]. We present a schematic representation and experimental images of a scratch assay in figure 6. In a scratch assay, a cell population is placed on a dish and allowed to grow to confluence. A portion of the cell population is then removed, and the remaining cells spread, through a combination of migration and proliferation, into the newly vacant space [42]. A schematic representation of the confluent population before and after the scratch is performed is presented in figures 6(a) and (b), with a typical experimental field of view highlighted in green. To mimic the geometry of this experiment, we consider   x 1 100 and initially set ( ) r = x 1 for  x 30 and ( ) r = x 0, otherwise. Note that scratch assays are twodimensional processes, as observed in the experimental images of a scratch assay for a 3T3 fibroblast population in figures 6(c) and (d), and the corresponding schematics for this experiment in figures 6(e) and (f). Full experimental details are given in [6].  As the experiment is approximately spatially uniform in one direction, we can approximate the scratch assay with a one-dimensional model [43]. We consider two scratch assays performed with two different cell lines where parameter estimates for the cell motility and cell proliferation rates have been presented previously: 3T3 fibroblast cells (3T3 cells) and MDA MB 231 breast cancer cells (231 cells) [44]. The investigation performed by Simpson et al [44] resulted in parameter estimates of = P 0.056 Note that the ratio P P p i m i is approximately one order of magnitude higher for the breast cancer cells compared to the fibroblasts, which implies that the spatial correlations between breast cancer cells will be more significant [22].
For the numerical solution corresponding to the 3T3 cell population, presented in figure 7, both the MF description and the C&G description approximate the average discrete behaviour reasonably well. However, the C&G description provides additional information regarding the clustering present within the migrating cell population. The chain distribution, presented in figure 7(b), suggests that the 3T3 population does not form significant clusters, as the majority of the chains are short length.
In contrast, for the numerical solution corresponding to the 231 cell population, presented in figure 8, the C&G description accurately approximates the average discrete behaviour, while the MF description does not. This is result is intuitive as we observe that there is significantly more clustering present in the system, compared to the numerical solution corresponding to the 3T3 cell population. That is, the chain distribution in figure 8(b) contains significantly fewer chains of short length, compared to the chain distribution in figure 7(b). For example, at the times shown, C 1 is approximately 25 times higher in the 3T3 cell population, compared to the 231 cell population. Critically, the C&G description accurately predicts the experimental observation that 231 cells form clusters while 3T3 cells do not [44].  Specifically, during monolayer formation, 3T3 cells form an approximately spatially uniform monolayer while 231 cells develop into clusters [44].

Discussion and conclusions
Processes that involve moving fronts are common in cell biology and ecology [1,2,[4][5][6][7][9][10][11][12], and latticebased random walks are widely employed to describe these processes [6,16,[18][19][20][21]45]. Due to the stochastic nature of random walks, it can be computationally intractable to perform sufficiently many realisations of a random walk model to obtain average behaviour that is not dominated by fluctuations. Furthermore, it is difficult to extract meaningful information about population behaviour through analysis of the random walk. The standard approach to overcome these issues is to derive a MF description of the random walk [19,20,25]. However, this approach relies on the assumption that any spatial correlations within the random walk are weak [22,24]. CMF descriptions that account for the spatial correlations have been proposed [22,24,46]. Unfortunately, these descriptions are not applicable to problems involving moving fronts as the ODEs governing the CMF description are singular in regions where the density of agents is zero [22,24].
Here we develop and present an accurate continuum description for moving fronts associated with latticebased random walks that contain crowding, birth, death, movement and agent-agent adhesion. We consider processes that are spatially variable, and include birth, death and movement rates that are chain-length dependent. Our C&G description provides predictions that match the average behaviour of the discrete model well in all parameter regimes. In contrast, the MF description is less flexible in terms of the birth, death and movement rates and only provides a valid approximation of the average discrete behaviour in extremely limited parameter regimes. Furthermore, for all cases considered in this work, the C&G description requires less computation time than performing 1000 realisations of the discrete model. A comparison between the time taken to perform a single realisation of the discrete model, 1000 realisations of the discrete model, and to obtain the numerical solution to the C&G system of equations is presented in table 1.
For the special case where the rates of birth, death and movement are independent of the chain length, the MF description correctly predicts the persistence of the population but inaccurately predicts the front velocity. For the special case where the rates of birth, death and movement are different depending on whether the agents are part of a chain of length one, or are part of a chain of length two or greater, the MF description predicts persistence when the population becomes extinct, and predicts extinction when the population persists. In both these cases the C&G description accurately predicts the front velocity, and the persistence or extinction of the population. Furthermore, the C&G description provides information about the spatial clustering of both occupied and unoccupied sites, and the clustering predictions approximate the clustering observed in the discrete model accurately.
The work presented here could be extended in several ways. The influence of local spatial structure on the persistence of species is a key question in ecology [34,35,45,47]. The C&G description provides an explicit estimate of the spatial clustering of both agents and unoccupied space. Therefore, it would be instructive to apply the C&G description to ecological processes to obtain insight into the clustering present in the system for parameter regimes where the agent population becomes extinct. Another approach would be to investigate a truncated system of governing equations, where there is a maximum chain or gap length. If there is prior knowledge about the long-time density of the system, an assumption could be made that chains or gaps above a threshold length could be neglected, and hence the system of equations could be truncated. This truncation would reduce the computational cost associated with obtaining a numerical solution to the governing equations. It would be insightful to examine the trade-off between the reduction in computational cost and the decreased accuracy caused by the truncation. Alternatively, the C&G description presented here could be calibrated to experimental data from the cell biology literature. For example, lattice-based random walks have been calibrated to in vitro cell biology Table 1. Time in seconds taken to perform: (i) a single realisation of the discrete model; (ii) 1000 realisations of the discrete model, and (iii) a numerical solution of the C&G system of equations for the parameter values in figures 2-7. All solutions are obtained using a single 3.0 GHz Intel i7-3540 M desktop processor. Note that the computation time for 1000 realisations is lower than performing 1000 repeats of a single realisation due to the time associated with initial set-up.  experiments to obtain estimates of cell diffusivity and cell proliferation rates [6,21]. However, the calibration of stochastic models to experimental data is computationally expensive [6,48]. As the C&G description accurately approximates the average random walk behaviour in all parameter regimes, it would be instructive to determine whether similar cell diffusivity and cell proliferation rates could be obtained from calibration of the deterministic C&G description to experimental data, and to quantify the reduction in computation time to obtain the parameter estimates. However, we leave these extensions for future consideration.

Acknowledgments
We appreciate the assistance provided by Emeritus Professor Sean McElwain. This work is supported by the Australian Research Council (FT130100148, DP140100249). We thank two referees and the editor for their helpful comments.

A.1. Chains
Here we present the time rate of change for each chain length and location, obtained by considering the potential outcomes of each type of birth, death and movement event.

A.2. Gaps
Here we present the time rate of change for each gap length and location, obtained by considering the potential outcomes of each type of birth, death and movement event.
, ,            The Y function represents the number of configurations that contain the specified chains and gaps within the parentheses, and is defined as