Mechanistic and experimental models of cell migration reveal the importance of cell-to-cell pushing in cell invasion

Moving fronts of cells are essential for development, repair and disease progression. Therefore, understanding and quantifying the details of the mechanisms that drive the movement of cell fronts is of wide interest. Quantitatively identifying the role of intercellular interactions, and in particular the role of cell pushing, remains an open question. In this work, we report a combined experimental-modelling approach showing that intercellular interactions contribute significantly to the spatial spreading of a population of cells. We use a novel experimental data set with PC-3 prostate cancer cells that have been pretreated with Mitomycin-C to suppress proliferation. This allows us to experimentally separate the effects of cell migration from cell proliferation, thereby enabling us to focus on the migration process in detail as the population of cells recolonizes an initially-vacant region in a series of two-dimensional experiments. We quantitatively model the experiments using a stochastic modelling framework, based on Langevin dynamics, which explicitly incorporates random motility and various intercellular forces including: (i) long range attraction (adhesion); and (ii) finite size effects that drive short range repulsion (pushing). Quantitatively comparing the ability of this model to describe the experimentally observed population-level behaviour provides us with quantitative insight into the roles of random motility and intercellular interactions. To quantify the mechanisms at play, we calibrate the stochastic model to match experimental cell density profiles to obtain estimates of cell diffusivity, D, and the amplitude of intercellular forces, f0. Our analysis shows that taking a standard modelling approach which ignores intercellular forces provides a poor match to the experimental data whereas incorporating intercellular forces, including short-range pushing and longer range attraction, leads to a faithful representation of the experimental observations. These results demonstrate a significant role of cell pushing during cell front movement and invasion.


Introduction
Moving cell fronts occur during many physiological processes, such as wound healing, morphogenesis, and malignant invasion [1][2][3][4]. Typically, cell fronts are observed as advancing, sharp boundaries between densely occupied and vacant regions, or as a moving interface between two distinct populations of cells [5]. An example of the first scenario is wound healing, where populations of cells close and recolonize an initially vacant space [6,7], as shown in figure 1. An advancing interface between two populations of cells is often associated with malignant invasion into surrounding tissues [8,9]. Improving our understanding of how cell populations spread can provide important, clinically-relevant information about the nature of moving cell fronts. Historically, moving cell fronts have been studied, both in vitro and in vivo, to provide both qualitative and quantitative information about the mechanisms that drive front movement. We note that quantifying the precise contributions of various cellular-level mechanisms that lead to populationlevel front behavior is a nontrivial task that requires the integration of many different types of experimental data [10]. Often it is assumed that the movement of advancing cell fronts is driven by the combined effects of undirected cell migration and carrying capacitylimited cell proliferation [10]. At present, a fundamental question, which remains largely overlooked in the mathematical biology literature, is what is the role of cell-to-cell pushing and how does it influence population-level front movement?
Cell motility is a complicated process involving both the interplay and competition between various individual-level mechanisms [5,10,11]. One of the most well-studied individual-level cell motility mechanisms is lamellipodial cell migration where cells undergo undirected movement due to myosin-powered contractions of the actin network under the cytoplasmic membrane [12]. Since this process is observed in many cell types, it remains prevalent in many mathematical modelling frameworks. As such, the assumption that cells undergo Brownian motion is often invoked and cells are represented, either implicitly or explicitly, as non-interacting point particles that move according to a white Gaussian process [13]. While this approach is appealing due to its simplicity, it neglects the effects of intercellular interactions, such as adhesion and finite size (crowding) effects. The neglect of cell-to-cell adhesion can be problematic because it is known that mesenchymal cell types, such as keratinocytes, can be strongly affected by cell-to-cell adhesion during wound healing [14]. Furthermore, adherent cells can form clusters that exhibit qualitatively different behaviour from isolated cells [15][16][17][18]. For example, strongly adherent groups of cells can undergo movement mediated by adherens junctions and move as a whole in coordinated process known as locomotion [19,20]. During this motion cells can change shape because of the imbalance between traction and friction forces induced by leader and follower cells [21]. A few individual cells can heavily influence front expansion in a process known as trailblazing and form finger-like protrusions [22,23]. Tightly packed interacting monolayers of cells can exhibit properties similar to the solid state matter with a transition to the fluid-like behaviour [24] Arguably, some of the most striking examples of the front-like spreading of a cell population occur during embryonic development, such as neural crest cell invasion in the developing gut tissues, which is thought to arise as a consequence of combined undirected Brownian cell motility and carrying capacity limited cell proliferation [25][26][27]. However, previous investigations have made the point that short range cell pushing can also play a role in driving the movement of cell fronts in confined environments, such as living tissues [28]. Henceforth, we hypothesize that cells in a confined space may generate population pressure, driven by finite size effects and local repulsion, which can stimulate spatial expansion of the population.
Perhaps the most popular mathematical framework for modelling the movement of cell fronts involves using reaction-diffusion equations [29][30][31], including the Fisher-Kolmogorov equation, and generalisations thereof. Previously, the lack of individuallevel experimental data meant that classical reactiondiffusion models were a useful way to conceptualize and simulate collective cell behaviour. However, with the increasing availability of individual-level information it is becoming important to develop mathematical models that provide both population-level information and individual-level information. There is a vast number of discrete models that focus on many individual-level features of cell behaviour [32][33][34][35]. The downside of many modelling frameworks presented in the literature is a large number of free parameters which makes parameter estimation problems computationally intractable and simulations hard to interpret. For example, in [33] the authors develop a comprehensive mathematical model of stem cells dynamics in the intestinal crypt. This model requires the selection of 25 free parameters. Some of these parameters are not directly measurable, and other parameters in the model had not been estimated before this study. Consequently, quantifying the precise contribution of different factors that control cell colony behaviour and dynamics is very challenging. To deal with this question of over parameterization, we deal only with a fundamental model that is characterized by a minimal number of degrees of freedom that we can use experimental data to identify the parameters in the model and hence to quantitatively explore the roles of undirected migration and cell-tocell pushing in our experiments.
Previously, cell pushing has been incorporated into lattice-based models of cell motility where agents move on a spatial domain that is represented as a regular lattice [36,37]. However, these previous studies about the role of cell pushing are primarily theoretical studies that do not consider calibrating models to quantitatively match any experimental data. Our current work is the first attempt to incorporate short range cell pushing into a more realistic spatially continuous, off-lattice discrete model of cell migration. Importantly, we directly apply the model to quantitatively mimic a novel experimental data set. Figure 2 illustrates the IncuCyte ZOOM TM scratch assay experimental protocol that we use in this work, at t=0 h and t=48 h. In our experiments we isolate the role of cell migration from the effects of cell proliferation by working with a population of cells that has been pretreated with a chemotherapy drug, Mitomycin-C, to block DNA replication and, consequently, suppress proliferation [38]. This means that the number of cells present in the experimental field of view over the duration of the experiment remains approximately constant. However, a side effect of Mitomycin-C pretreatment is that individual cells increase in size during the experiment as the cells prepare to proceed through the cell cycle but are unable to divide [39]. This dynamic increase in cell size, which is typically neglected in previous modelling studies [39], can significantly influence intercellular interactions during the experiment and so we take great care to incorporate these effects into our mathematical model. Our approach for incorporating dynamic cell size effects in the model is justified by also working with a simpler model that neglects dynamic changes in cell size; we find that the simpler, standard model leads to a much poorer match with the experimental data.
This work is structured as follows. We begin by describing the IncuCyte ZOOM TM experimental protocol, the experimental data, and the procedure we use to process the experimental images. We then introduce the discrete mathematical model which accounts for random motility and intercellular interactions, including short range pushing and longer range attraction, as well as incorporating a mechanism for describing dynamic cell size changes. We refer to this model as Model I since it incorporates all four mechanisms that are thought to be relevant to the experimental system. To quantitatively explore the significance of these various cell-level mechanisms we systematically repeat the model calibration process for a range of simpler, more commonly used models. These simplified models account for: (i) random motility and intercellular forces (Model II); (ii) intercellular forces only (Model III); and (iii) random motility only (Model IV ). We discuss the performance of each model when applied to the IncuCyte ZOOM TM data in the Results and Discussion. Finally, in the Conclusions we summarize our findings and discuss alternative applications and extensions of our modelling framework.

Materials and methods
2.1. IncuCyte ZOOM TM experimental data Monolayer scratch assays are performed using the IncuCyte ZOOM TM system (Essen BioScience, MI USA [40]) as shown in figures 2(A)-(B). All experiments are performed using the PC-3 prostate cancer cell line [41] from the American Type Culture Collection (ATCC, Manassas, USA). The procedure of growing the cell culture in a flask is outlined in [42]. After growing, cells are removed from the flask using TrypLE TM (Life Technologies) in phosphate buffered saline, resuspended in medium and seeded at a density of 20,000 cells per well in 96-well ImageLock plates (Essen BioScience) as shown in figure 3. The diameter of each individual well is 9000 μm.
Mitomycin-C is added at a concentration of 10 g mL −1 for two hours before a scratch is made in the monolayer of cells [38,43,44]. A WoundMaker TM (Essen BioScience) is used to create identical scratches in the uniformly distributed populations. Medium is aspirated after scratching; each well is washed twice and refilled with fresh medium (100 μl). Plates are incubated in IncuCyte ZOOM TM and photographed every 2 hours for 48 hours. In total, these experiments are performed in eight of the 96 wells on the 96-well plate. After a preliminary visual inspection of the resulting eight experimental images, we selected four typical wells for analysis. Throughout this work we will refer to these four identically prepared experiments as Experiment A, B, C and D, respectively. All experiments are initiated with a monolayer of cells close to confluence. Even though the cell culture was treated with Mitomycin-C, which ensures the absence of proliferation and approximately constant cell count during an experiment, the monolayer of cells remains close to confluence throughout the duration of the experiment due to increase in the size of individual cells (figure A.1, Supplementary materials available online at stacks.iop.org/BPEX/5/045009/mmedia).
By the end of the experiment, summarised in figures 2(A)-(B), we see that the Mitomycin-C pretreated cells have approximately doubled in size [39].
There is little information about the change in volume of cells since all data is in the form of 2D images. However, we note that, due to effects of Mitomycin-C pretreatment that allows cells to go through cell cycle to the point when they are ready to divide, the observed growth is rather volumetric and reflects actual change in cell size. To quantify this increase in cell size we randomly choose 30 cells from the experimental images at t=0, 12, 24, 36 and 48 h and use these cells to estimate the average diameter as a function of time, δ(t).
To do this we estimate the area on the image occupied by each particular cell, and then convert this estimate of area into an equivalent diameter, d p = A 4 , where A is the area estimate. With 30 estimates of the diameter at t=0, 12, 24, 36 and 48 h, we compute the sample mean and sample standard deviation at each time point and plot the data in figure 2(C). Visually we see that the average diameter appears to increase approximately linearly with time, and so we fit a linear model to the data. The cell diameter data and the linear model are shown in figure 2(C). Note that had our experiments been performed over a longer period of time it would be more appropriate to use a different model, such as the logistic growth function, to model the temporal cell size dynamics.

Image analysis
An example of a raw experimental image and a detailed description of the procedure we use to extract density information from that image are summarized in figure 4. Here, the size of the field of view is (L x ×L y )=(1979 μm×1439 μm), as shown in figure 4(A). Throughout this work we use data from four identically-prepared experimental replicates of the scratch assay. Experimental images at the beginning of the experiment for each of the four replicates are shown in figure 5. For completeness, the time series of experimental images for all four identically prepared experiments is given in Supplementary materials (figure A.1), giving a total of 20 experimental images.
To process each experimental image we first separate the background of the image from the cells using Ilastik [45]. Ilastik is a machine learning tool that enables automatic object identification, and allows us to separate the cells in each image from the background. An example of a grey scale segmented image is shown in figure 4(B). A visual comparison of the raw image in figure 4(A) with the segmented image in figure 4(B) confirms that the identification of cells from the background in the image is accurate. Since the density of cells in the original images is independent of the vertical coordinate, as shown in figure 5, we divide each image into 40 equally-spaced columns, 49.5 μm wide each. We then use CellProfiler to automatically estimate the number of cells per column [46]. With this data we divide the number of cells per column by the area of each column to give an estimate of the cell density across the horizontal coordinate of the experimental images.
A typical column-averaged cell density profile, shown in figure 4(D), summarizes the spatial variations in density as a function of the horizontal coordinate. We employ this technique to extract density profiles at each time point, t=0, 12, 24, 36, and 48 h, for each experimental replicate. The data presented in figure 4(D) shows the average cell density that we associate with the centre of each column. This particular density profile is relatively noisy because it is associated with the single image in figure 4(A). Before we proceed, we discard density data from the two right-most columns of each image because the time label and scale bar are automatically superimposed on the IncuCyte ZOOM TM images and these objects partially obscure the numbers of cells in these subregions of each images. We also discard the left-most column from each image, which leaves us with a slightly smaller image that is discretized into 37 equally-spaced columns of width 49.5 μm, and a reduced domain width of 1831 μm. Next, to reduce the magnitude of the fluctuations in the cell density profile, we average the density profiles associated with each of the four experimental replicates to obtain a single averaged cell density profile as a function of time, as summarized in figure 6. For completeness, the relatively noisy column-averaged cell density profiles associated with each of the four individual experimental replicates are given in Supplementary materials (figure C.1). Now that we have quantified our experimental observations in terms of the temporal variation in the column-averaged cell density profiles, further averaged across four identically prepared experimental replicates, we will now attempt to use a suite of discrete mathematical models to mimic the experimental data set. To provide the most realistic discrete simulations, we always take care to initialize each discrete simulation using the exact same number and locations of cells that are present in the experimental images at t=0 h (figure 5).

Discrete stochastic model
To simulate our experimental data set we use a twodimensional discrete model of cell motility that incorporates random motility, intercellular interactions including both long range cell-to-cell adhesion and short range cell pushing, as well as capturing dynamic changes in cell size. We refer to this model as Model I since it describes the situation where all four processes are acting simultaneously. We choose to work with a discrete modelling framework because discrete individual-based approaches are more natural to compare with experimental images than continuum models [47,48]. Such discrete individual-based models are used to study a range of cell biology phenomena, including malignant invasion [49], wound healing [50], self-organization [35], angiogenesis [51] and embryonic development [25]. Since we work with an off-lattice discrete framework, each agent is allowed to move in any direction on a continuous domain. This off-lattice approach is more realistic than a simpler lattice-based model where the locations of agents are restricted to an artificial lattice structure [13,35,52,53].
We begin by introducing Model I and then describe three simplifications of this model in which we systematically neglect certain features. To explore the relevance of the mechanisms inherent in these four models we carefully calibrate each model to match the density data summarized in figure 6 and quantitatively compare the results of the calibration procedure.
2.3.1. Model I: random motility, long range cell-to-cell adhesion, short range cell pushing and dynamic changes in cell size A key novelty of our approach is that we simulate the dynamic change in agent diameter, δ(t). This approach is very different to standard approaches where agents in discrete models are thought of as having either no size or a constant size. Here, to match our experimental measurements presented in figure 2(C), we assume that δ(t) increases linearly, where t is time, measured in hours. We employ a Langevin stochastic framework [54], where a population of N cells is modelled by a system of N first order stochastic differential equations where u (i) is the position vector of the ith agent on a twodimensional domain, F ij is the deterministic interaction force between agents i and j that are separated by distance r [55], and x i is a random stochastic force exerted on the ith agent. The stochastic force x i is sampled from the Gaussian distribution [54] with zero mean and variance 2D/Δt, where D is the diffusivity, and Δ t is the time step used to numerically solve equation (2). Since the Langevin equation formalism does not include any inertial forces, this framework implicitly neglects agent acceleration. This assumption is reasonable at low Reynolds numbers, and is routinely invoked at cellular length scales [56]. The model given by equation (2) does not address possible directional bias near the edge of a wound resulting from the chemical signalling responses to wounding [57]. Including directional bias and chemotaxis will result in additional free parameters in the system [31], that we have little prior knowledge about, which, in turn, will make the parameter estimation problem significantly harder. Increasing number of parameters in the model requires increased complexity of the experimental data to avoid ambiguities in the parameter estimation [58]. As such, we focus on describing a cell population using the fundamental model where the cell migration is mediated by a directional movement due to cell-to-cell adhesive and repulsive forces, F ij , and the undirected stochastic force, x i The details of the deterministic interaction force, F ij , can be chosen to incorporate a range of relevant phenomena such as long range attraction and short range repulsion [59], as illustrated in figure 7. In this work we specify that the interaction force, F ij , depends upon the distance between agents, r, and time, t, and is given by  Previously, the interaction forces between cells in the epithelial monolayer have been inferred using a discrete framework in [48], where the authors explicitly demonstrate the significance of population pressure and mechanically driven population spreading [60]. The key novelty in the expression for the force function, equation (3), is the time dependent cell-to-cell interaction potential, Z(r, t), figure 7. This approach allows us to explicitly parameterise the dynamic change in cell size and simulate mechanically driven cell pushing, which is in contrast to many models of cell motility and adhesion present in the literature where cells have constant size [61]. The cell-to-cell pushing is expressed by a combination of cell-to-cell repulsive forces incorporated in the force function, equation (3), dynamical change in cell size, and cell random motility. Here, we address basic phenomenological principles of cell-to-cell adhesion without considering biochemical pathways that facilitate adhesion, such as E-cadherin protein junctions [62,63].
In this work we use the dimensionless function Z(r, t) to incorporate three main features of cell-to-cell interactions: (i) short range repulsion forces which mimic cell pushing; (ii) longer range attraction forces which mimic cell-to-cell adhesion; and (iii) dynamic changes in agent size. The short range repulsion forces can be interpreted as cell resistance to deformation, which leads to crowding and volume exclusion effects [64]. In contrast, the longer range attraction forces mimic intercellular attraction. These cell-to-cell attraction forces are thought to be a Figure 7. Schematic showing how short and long range forces are introduced in the model through the dimensionless force function, Z(r, t). A-B: green circles denote isolated agents that are unaffected by cell-to-cell interactions and, as a consequence, undergo random migration. Red circles indicate agents that are sufficiently close to other agents that they interact with them. Comparing the schematics in A and B shows that the increase in agent size leads to additional agent-to-agent interactions because agent growth reduces the distance between agents. The arrows in A and B indicate the direction of the deterministic forces F ij . C-D: dimensionless force function Z(r, t) for t=0 h and t=48 h. δ 0 is the agent size at the start of the experiment, t=0 h. The red shaded regions indicate a sufficiently small distance between agents, r3δ(t), where agent-to-agent interactions are present. The green shaded regions indicate a sufficiently large distance between agents, r>3δ(t), so that there is no interaction.
predominant factor in the cell-to-cell adhesion [65]. To incorporate these effects we use a modified Morse potential [66], where δ(t) is the time-dependent agent diameter, a>0 is a parameter that controls the shape of the force function, and ( ) 2is the Tersoff cutoff function [67] which ensures a finite range of interactions. The cell-to-cell interaction range is finite, and set to three agent diameters [54]. As such, the interaction force is zero for separation distances of greater than three agent diameters. For all results presented here we set a=0.08 which implies that the agents are relatively rigid and highly unlikely to overlap [66].
Schematics showing the key features of Z(r, t) at t=0 h and t=48 h are shown in figures 7(C)-(D).
Here we see that at short separation distances we have strong positive Z(r, t), which captures short range repulsion and pushing, owing to finite size effects. Over longer separation distances we have smaller negative Z(r, t) which models attraction, such as adhesion. Finally, over sufficiently large distances we have no interactions as Z(r, t)=0. To capture the effects of the increase in cell size, all length scales in figures 7(C)-(D) are given in terms of the average cell diameter, δ(t), which can vary with time. In this work we use a linear function for δ(t) because this matches our experimental observations, however other functional forms for δ(t) are possible. In all simulations we apply the Langevin model on a domain of size 1 831.5 μm×1439 μm, which is the size of the experimental field of view after the boundary columns have been neglected. To simulate Models I-IV we must apply appropriate boundary conditions to reflect the conditions relevant to the experiment. To determine these boundary conditions, we note that the experimental images, shown in figures 2(A)-(B), correspond to a field of view that is much smaller than the spatial extent of the experiment. For example, the width of the field of view in figures 2(A)-(B) is 1979 μm, which is much smaller than the diameter of the well in the 96well plate (9000 μm), as shown in figure 3. The schematic in figure 3 is important because it emphasizes that the images from this kind of assay only show a small proportion of the population of cells present in the well. In particular, it is important to remember that the boundaries around the field of view are not physical boundaries since the spatially uniform population of cells extends far beyond the boundaries around the field of view. This means that whenever the density is below confluence, cells will migrate, in each direction, across the boundaries of the field of view. However, since the population of cells is placed uniformly into each well of the 96-well plate, the net flux of cells across the boundaries of the field of view will be approximately zero owing to symmetry. To justify zero net flux boundary conditions we also count a number of cells in each experimental replica at t=0, 12,24,36, and 48 hours (figure B.1, Supplementary materials). These additional results demonstrate that cell counts remain approximately constant with average fluctuations of 8.1% of cell count at t=0 h across four experiments. Similar experimental set up without Mitomycin-C pretreatment would have resulted in about 100% increase in cell count due to proliferation. Therefore, we impose zero net flux boundary conditions around all boundaries of the field of view [68]. We implement these boundary conditions by simply aborting any potential movement event that would take a particular agent across one of the boundaries.

Results and discussion
To quantitatively compare the suitability of Models I-IV to describe our experimental data set, we calibrate each model to provide the best match to the experimental density data. For Model I our aim is to estimate the two model parameters, (D, f 0 ), that lead to the best match with the experimental measurements. To facilitate this we introduce a measure of the discrepancy between the experimental data and the model solution,  , correspond to the averaged experimental density, where the average is taken across all four identically-prepared experimental replicates. We find that it is necessary to estimate E(D, f 0 ) using averaged experimental data, rather than working with the four experimental data sets separately since the fluctuations in the data from the individual replicates lead to sufficiently large fluctuations in our estimates of E(D, f 0 ). We will refer to the function E(D, f 0 ) as an error surface. We will visualize the surface and seek to find values of D and f 0 that minimize E(D, f 0 ), and we denote these estimates asD andf 0 , respectively. All results, for Models I-IV, are obtained by numerically integrating the governing equations in two-dimensional space using a forward Euler method with a constant time step of duration δ t=0.02 h. This choice of time step is sufficiently small that our results are grid-independent. We then construct one-dimensional density distributions from the model output, i j model , using the same procedure that is used to convert the distribution of cells in the two-dimensional experimental images into onedimensional density profiles.
Since we have access to four identically prepared initial conditions for our experimental data set, each time we attempt to match the models with the averaged experimental data we repeat the process four times using the four different choices of initial conditions associated with experimental data sets A, B, C and D. This approach means that we can estimate and visualize the error surface four times for each particular model. A mathematical model that is compatible with the data ought to lead to estimates of D and f 0 that are consistent across the four initial conditions, and so we will examine Models I-IV to see whether they are capable of providing parameter estimates that are consistent across the four different initial conditions.
Previous modelling studies based on using reaction-diffusion equations indicate that estimates of dif- . Therefore, our choice of using  = 100 realizations leads to averaged discrete density profiles with fluctuations that are approximately one order of magnitude smaller than fluctuations in the experimental data.
Preliminary estimates ofD andf 0 are obtained by evaluating the error surface across a relatively coarse discretisation of the parameter space. These estimates are shown in figures 8(A)-(D) as red circles. We then refine our estimates ofD andf 0 by considering a To visualize the quality of match between the experimental data and discrete profiles predicted by the stochastic model we use the best-fit parameter estimates (D,f 0 )=(250, 0.06) for experiment D. First, we solve Model I with (D,f 0 )=(250, 0.06) and calculate the density profiles from the simulations as before. Second, we superimpose density profiles from the discrete simulations with the experimental density distributions, as shown in figure 9. The choice of initial conditions in the stochastic model guarantees that we have an exact match between the experimental density profile and the simulation density profiles at t=0 h. However, since we are dealing with stochastic experimental data and a stochastic mathematical model we do not expect there to be an exact match at later times. Results in figures 9(A)-(B) show that we have a reasonable match between the calibrated simulation results and the experimental density data for a single realization of the stochastic model. Similarly, results in figures 9(C)-(D) show that we also have a good match between the simulation results and the experimental density data over 100 identically prepared realizations of the stochastic model where we see that the magnitude of the fluctuations in the averaged stochastic data are reduced.
We now turn our attention to calibrating Model II to match the experimental data. To achieve this we repeat the exact same calibration process except that we implement Model II with constant agent size. Comparing results in figures 9(A)-(D) and 10(A)-(D) shows that the best fit parameters in Model II lead to a larger value of E(D, f 0 ). Furthermore, comparing estimates ofD andf 0 for Model II between the four identically prepared experimental data sets shows that we have a much higher degree of variability between our parameter estimates for Model II than we did for Model I. Overall, these results suggest that Model I is more consistent with our experimental data than Model II, and so we do not proceed with any further refinement of our parameter estimates for Model II. This result shows that the traditional approach of neglecting the dynamical changes in cell size has clear impact on the ability of the model to describe the behaviour of the entire cell population.
To visualize the quality of match between the experimental data and best-fit density profiles predicted by Model II we use the best-fit parameter estimates (D,f 0 )=(500, 0.3) for experimental replicate D. Again, we solve Model II with these parameter Finally, we calibrate Models III and IV to match the experimental data. Note that Model III neglects the role of random motility so our parameter estimation involves just one parameter, f 0 . Similarly, Model IV neglects the role of intercellular forces so our  parameter estimation involves just one parameter, D. Following a now familiar procedure, we compute measures of discrepancy, E( f 0 ) and E(D), for Models III and IV, respectively. Results in figure 12(A) show a well-defined minimum for each of the four experimental replicates for Model III, however the best-fit  estimates are in the range 0.07f 0 0.1 μm/h, which is approximately double the estimate we identified previously for Model I. In contrast, results in figure 12(B) show that we have relatively poorlydefined minimum for all four experimental replicates for Model IV. In this case the best-fit estimates are in the range 700D1000 μm 2 /h which is approximately four times greater than the estimates we obtained for Model I.
Results in figure 13 compare the discrete density profiles obtained using Model III parameterized with the best-fit estimate f 0 =0.09 μm h −1 and the experimental density profiles from experiment D. We note that Model III is deterministic and produces the same density distribution regardless of the number of model realizations. The quality of match between calibrated Model III and the experimental data for experimental replicate D is reasonable, however the value of (¯) E f 0 is greater than the value of (¯¯) E D f , 0 for Model I, thereby indicating that Model I produces an improved match to the experimental data. Results in figure 14 compares discrete density profiles obtained using Model IV parameterized with the best-fit estimate D= 700 μm 2 h −1 where we see no improvement in the quality of match between the calibrated mathematical model and the experimental data relative to Model 1.
All results presented in this section of the main document focus on comparing the quality of match between Models I-IV and the experimental data using experimental replicate D. Similar comparisons between the best-fit solution of Models I-IV and experimental data from experimental replicates A, B and C are given in Supplementary materials (figures D.1-D.12). These additional comparisons are consistent with the comparisons made here in the main document.

Conclusions
In this work we use a combined experimentalmathematical modelling approach to quantitatively explore the contribution of cell-to-cell pushing and random motility in driving the movement of cell fronts. We perform a series of IncuCyte ZOOM TM scratch assay experiments in which cells are pretreated with the chemotherapy drug, Mitomycin-C. This approach is useful because Mitomycin-C suppresses proliferation, thereby allowing us to focus on the role of cell migration in the experiments.
We quantitatively assess the role of cell-to-cell interactions, including short range pushing and longer range adhesion, by calibrating an off-lattice discrete stochastic model to match our experimental data set. The mathematical model that we use accounts for random cell motility, long range cell-to-cell attraction (adhesion), short range cell-to-cell repulsion (pushing) and dynamic cell size changes. We refer to this model as Model I. To explore the significance of these various cell-level mechanisms we systematically repeat the model calibration process for a range of simpler, more commonly used models. These simplified models account for: (i) random cell motility, long range cell-to-cell attraction (adhesion) and short range cellto-cell repulsion (pushing) without any dynamical changes in cell size (Model II); (ii) long range cell-tocell attraction (adhesion) and short range cell-to-cell repulsion (pushing) (Model III); and (iii) random cell motility only (Model IV ).
The novelty in our work resides in the use of the time-dependent cell-to-cell interaction function, equation (3), that allows us to model the dynamical increase in the cell size during the experiments. In contrast, in most of the existing literature it is commonly assumed that cell size remains constant over the time scale of the experiments. We focus on examining the role of the cell-to-cell pushing as a result of combined effects of cell-to-cell repulsion, motility, and dynamic increases in cell size. The role of cell-to-cell pushing is not normally discussed when individualbased models are applied to the experimental data. Here, we use experimental data with cells that grow in size as a result of Mitomycin-C treatment to suppress proliferation. As such, the effects of pushing are much more pronounced in our experiments, as opposed to other experiments where the dynamic change in cell size is not as important.
The phenomenological principles of cell-to-cell interactions modelled in our work are applicable to a range of cell types. However, different cell types can have different balances of mechanisms. For example, epithelial skin cells are normally tightly packed in thin sheets with strong cell-to-cell bonds, while mesenchymal stem cells are highly motile. Consequently, applying our model to a specific cell type would require specific adjustments. Also, we do not consider biochemical pathways of cell-to-cell adhesion and motility, such as E-Cadherin junctions, trailblazing and leader-follower interactions. The leader-follower interactions have a potential to severely influence estimates of the leading edge position, which is normally used to evaluate the rate of spatial spreading of the cell colony. Incorporating these effects is theoretically possible, however, they are not always pronounced in the experimental results and are case specific.
The crucial advantage of our framework is that even in the most advanced Model I cell motility is parametrised by only two free parameters. This is important because we are able to explore parameter space reasonably efficiently. In contrast, many discrete models presented in the literature are not as much plausible for calibrating experimental data while addressing same aspects of individual-level behaviour. We note that there is still a certain degree of freedom associated with the choice of cell-to-cell interaction function. However, our choice is fairly typical and is suitable for our purposes.
Comparing the calibration of these four models to our experimental data provides insight into which model provides the most faithful representation of the experimental observation. Comparing estimates of (¯¯) E D f , 0 between the four models shows that Model I provides the best match to the experimental data. This result suggests that properly accounting for random motility, intercellular forces, including short range pushing and longer range attraction, as well as dynamic changes in cell size, are important for this fairly typical experimental protocol. In contrast, calibrating Models II-IV to the data always provides estimates of model parameters that give the best match to the experimental data, but this does not mean that these simpler frameworks are the best model of the underlying biological processes. This is an important result because often in the mathematical biology literature a single type of model will be used to mimic an experiment, without investigation of the more important question of whether that model provides a reasonable description of the underlying biological processes. Here, by systematically comparing the performance of four different, but related, mathematical models to our novel experimental data set, we provide insight into the underlying biological mechanisms in a way that is not possible when working with a single mathematical model in isolation. This approach of calibrating multiple competing mathematical models to match a single data set is a useful way to provide insight into the underlying biological processes, as well as providing insight into the important question of model selection [42,69].
There are many ways in which our study could be extended to provide additional insight. A key simplifying assumption that we make in our modelling of the experiments is that we give all agents in the simulations the same size at the beginning of the simulation, δ(0)=δ 0 . While our estimates of this initial size are based on experimental estimates given in figure 2(C), our approach is to represent the distribution of observed cell sizes using a sample mean. A close examination of the experimental images in figure 5 shows that there is considerable variability in the distribution of cell sizes at the beginning of the experiment. This variability is captured in the error bars in figure 2(C) but neglected in our analysis. Therefore, a reasonable extension of the current analysis would be to incorporate this initial variability into the stochastic models with a view to understanding how this initial variability influences the population-level motion of the cell fronts. In our work we do not analyse cell-to-cell correlations or spatial structure [70]. Previously, velocity-velocity correlations have been used to study cell swarming and mechanical waves during tissue expansion and similar analysis could be applied to the experimental data used in our work [44,71,72]. We leave these extensions for future consideration.