Pushing coarse-grained models beyond the continuum limit using equation learning

Mathematical modelling of biological population dynamics often involves proposing high-fidelity discrete agent-based models that capture stochasticity and individual-level processes. These models are often considered in conjunction with an approximate coarse-grained differential equation that captures population-level features only. These coarse-grained models are only accurate in certain asymptotic parameter regimes, such as enforcing that the time scale of individual motility far exceeds the time scale of birth/death processes. When these coarse-grained models are accurate, the discrete model still abides by conservation laws at the microscopic level, which implies that there is some macroscopic conservation law that can describe the macroscopic dynamics. In this work, we introduce an equation learning framework to find accurate coarse-grained models when standard continuum limit approaches are inaccurate. We demonstrate our approach using a discrete mechanical model of epithelial tissues, considering a series of four case studies that consider problems with and without free boundaries, and with and without proliferation, illustrating how we can learn macroscopic equations describing mechanical relaxation, cell proliferation, and the equation governing the dynamics of the free boundary of the tissue. While our presentation focuses on this biological application, our approach is more broadly applicable across a range of scenarios where discrete models are approximated by approximate continuum-limit descriptions.


Introduction
Mathematical models of population dynamics are often constructed by considering both discrete and continuous descriptions, allowing for both microscopic and macroscopic details to be considered [1].This approach has been applied to several kinds of discrete models, including cellular Potts models [2][3][4][5], exclusion processes [6][7][8][9], mechanical models of epithelial tissues [10][11][12][13][14][15][16][17], hydrodynamics [18,19], and a variety of other types of individual-based models [1,[20][21][22][23][24][25][26][27].Continuum models are useful for describing collective behaviour, especially because the computational requirement of discrete models increases with the size of the population, and this can become computationally prohibitive for large populations, which is particularly problematic for parameter inference [28].In contrast, the computational requirement to solve a continuous model is independent of the population size, and generally requires less computational overhead than working with a discrete approach only [15].Continuum models are typically obtained by coarse-graining the discrete model, using Taylor series expansions to obtain continuous partial differential equation (PDE) models that govern the population densities on a continuum or macroscopic scale [10,11,29,30].
One challenge with using coarse-grained continuum limit models is that while the solution of these models can match averaged data from the corresponding discrete model for certain choices of parameters [10,17,31], the solution of the continuous model can be a very poor approximation for other parameter choices [13,15,32,33].More generally, coarse-grained models are typically only valid in certain asymptotic parameter regimes [31,33,34].For example, suppose we have a discrete space, discrete time, agent-based model that incorporates random motion and random proliferation.Random motion involves stepping a distance ∆ with probability P m ∈ [0, 1] per time step of duration τ .The stochastic proliferation process involves undergoing proliferation with probability P p ∈ [0, 1] per unit time step.The continuum limit description of this kind of discrete process can be written as [33] where q is the macroscopic density of individuals, D(q) is the nonlinear diffusivity that describes the effects of individual migration, and R(q) is a source term that describes the effects of the birth process in the discrete model [33].Standard approaches to derive (1) require D(q) = O(P m ∆ 2 /τ ) and R(q) = O(P p /τ ) in the limit that ∆ → 0 and τ → 0. To obtain a well-defined continuum limit such that the diffusion and source terms are both present in the macroscopic model, some restrictions on the parameters in the discrete model are required [33,34].Typically, this is achieved by taking the limit as ∆ → 0 and τ → 0 jointly such that the ratio ∆ 2 /τ remains finite, implying that P p = O(τ ) so that both the diffusion and source terms in (1) are O (1).In practice, this means that the time scale of individual migration events has to be much faster than the time scale of individual proliferation events, otherwise the continuum limit description is not well defined [33,34].If this restriction is not enforced, then the solution of the continuum limit model does not always predict the averaged behaviour of the discrete model [33], as the terms on the right-hand side of (1) are no longer O(1) so that the continuum limit is not well defined [34].Regardless of whether choices of parameters in a discrete model obey the asymptotic restrictions imposed by coarse-graining, the discrete model still obeys a conservation principle, which implies that there is some alternative macroscopic conservation description that will describe population-level features of interest [35,36].Equation learning is a means of determining appropriate continuum models outside of the usual continuum limit asymptotic regimes.Equation learning has been used in several applications for model discovery.In the context of PDEs, a typical approach is to write ∂q/∂t = N (q, D, θ), where q is the population density, N is some nonlinear function parametrised by θ, D is a collection of differential operators, and θ are parameters to be estimated [37].This formulation was first introduced by Rudy et al. [37], who extended previous work in learning ordinary differential equations (ODEs) proposed by Brunton et al. [38].Equation learning methods developed for the purpose of learning biological models has also been a key interest [39,40].Lagergren et al. [41] introduce a biologically-informed neural network framework that uses equation learning that is guided by biological constraints, imposing a specific conservation PDE rather than a general nonlinear function N .Lagergren et al. [41] use this framework to discover a model describing data from simple in vitro experiments that describe the invasion of populations of motile and proliferative prostate cancer cells.VandenHeuvel et al. [42] extend the work of Lagergren et al. [41], incorporating uncertainty quantification into the equation learning procedure through a bootstrapping approach.Nardini et al. [32] use discrete data from agent-based models to learn associated continuum ODE models, combining a user-provided library of functions together with sparse regression methods to give simple ODE models describing population densities.Regression methods have also been used as an alternative to equation learning for this purpose [43].
These previous approaches to equation learning consider various methods to estimate the parameters θ, such as sparse regression or nonlinear optimisation [32, 37-39, 41, 42], representing N as a library of functions [37][38][39], neural networks [41], or in the form of a conservation law with individual components to be learned [41,42].In this work, we introduce a stepwise equation learning framework, inspired from stepwise regression [44], for estimating θ from averaged discrete data with a given N representing a proposed form for the continuum model description.We incorporate or remove terms one at a time until a parsimonious continuum model is obtained whose solution matches the data well and no further improvements can be made to this match.Our approach is advantageous for several reasons.Firstly, it is computationally efficient and parallelisable, allowing for rapid exploration of results with different discrete parameters and different forms of N for a given data set.Secondly, the approach is modular, with different mechanistic features easily incorporated.This approach enables extensive computational experimentation by exploring the impact of including or excluding putative terms in the continuum model without any great increase in computational overhead.Lastly, it is easy to examine the results from our procedure, allowing for ease of diagnosing and correcting reasons for obtaining poor fitting models, and explaining what components of the continuum model are the most influential.We emphasise that a key difference between our approach and other work, such as the methods developed by Brunton et al. [38] and Rudy et al. [37], is that we constrain our problem so that we can only learn conservation laws rather than allow a general form through a library of functions, and that we iteratively eliminate variables from θ rather than use sparse regression.These important features are what support the modularity and interpretability of our approach.
To illustrate our procedure, we consider a discrete, individual-based one-dimensional toy model inspired from epithelial tissues [10,17].Epithelial tissues are biological tissue composed of cells, organised in a monolayer, and are present in many parts of the body and interact with other cells [45], lining surfaces such as the skin and the intestine [46].They are important in a variety of contexts, such as wound healing [47,48] and cancer [49,50].Many models have been developed for studying their dynamics, considering both discrete and continuum modelling [10][11][12][13][14][15][16], with most models given in the form of a nonlinear reaction-diffusion equation with a moving boundary, using a nonlinear diffusivity term to incorporate mechanical relaxation and a source term to model cell proliferation [12,13,16].These continuum limit models too are only accurate in certain parameter regimes, becoming inaccurate if the rate of mechanical relaxation is slow relative to the rate of proliferation [13,15,33].To apply our stepwise equation learning procedure, we let the nonlinear function N be given in the form of a conservation law together with equations describing the free boundary.We demonstrate this approach using a series of four biologically-motivated case studies, considering problems with and without a free boundary, and with and without proliferation,with each case study building on those before it.The first two case studies are used to demonstrate how our approach can learn known continuum limits, while the latter two case studies show how we can learn improved continuum limit models in parameter regimes where these known continuum limits are no longer accurate.We implement our approach in the Julia language [51], and all code is available on GitHub at https://github.com/DanielVandH/StepwiseEQL.jl.

Mathematical model
Following Murray et al. and Baker et al. [10,16], we suppose that we have a set of nodes x 1 , . . ., x n (t) describing n cell boundaries at a time t.The interval (x i (t), x i+1 (t)) represents the ith cell for i = 1, . . ., n−1, where we fix x 1 = 0 and x 1 < x 2 (t) < • • • < x n (t).The number of nodes, n, may increase over time due to cell proliferation.We model the mechanical interaction between cells by treating them as springs, as indicated in Figure 1, so that each node i experiences forces F i,i±1 from nodes i ± 1, respectively, except at the boundaries where there is only one neighbouring force.We further assume that each of these springs has the same mechanical properties, and that the viscous force from the surrounding medium is given by ηdx i (t)/dt with drag coefficient η.Lastly, assuming we are in a viscous medium so that the motion is overdamped, we can model the dynamics of each individual node x i (t), fixing x 1 = 0, by [16] where is the interaction force that the ith node experiences from nodes i ± 1 (Figure 1).In Case Studies 1 and 3 (see Section 3, below), we hold x n (t) = L constant and discard (3).Throughout this work, we use linear Hookean springs so that is the length of the ith cell, k > 0 is the spring constant, and s ≥ 0 is the resting spring length [10]; we discuss other force laws in Appendix E. (c) Proliferation schematic, showing a cell (x i (t), x i+1 (t)) dividing into (x i (t + ∆t), x i+1 (t + ∆t)) and (x i+1 (t + ∆t), x i+2 (t + ∆t)) following a proliferation event, where show schematics for the four case studies considered in the paper, where the first row in each panel is a representation of the initial configuration of cells at t = 0 and the second row a representation at a later time t > 0.
The dynamics governed by ( 2)-(3) describe a system in which cells mechanically relax.Following previous work [10,14,16], we introduce a stochastic mechanism that allows the cells to undergo proliferation, assuming only one cell can divide at a time over a given interval [t, t + ∆t) for some small duration ∆t.We let the probability that the ith cell proliferates be given by G i ∆t, where G i = G(ℓ i ) for some length-dependent proliferation law G(ℓ i ) > 0. As represented in Figure 1(c), when the ith cell proliferates, the cell divides into two equally-sized daughter cells, and the boundary between the new daughter cells is placed at the midpoint of the original cell.Throughout this work, we use a logistic proliferation law G(ℓ i ) = β[1 − 1/(Kℓ i )] with ℓ i > 1/K, where β is the intrinsic proliferation rate and K is the carrying capacity density; we consider other proliferation laws in Appendix E. The implementation of the solution to these equations ( 2)-( 3) and the proliferation mechanism is given in the Julia package EpithelialDynamics1D.jl; in this implementation, if G(ℓ i ) < 0 we set G(ℓ i ) = 0 to be consistent with the fact that we interpret G(ℓ i ) as a probability.We emphasise that, without proliferation, we need only solve (2)-( 3) once for a given initial condition in order to obtain the expected behaviour of the discrete model, because the discrete model is deterministic in the absence of proliferation.In contrast, incorporating proliferation means that we need to consider several identically-prepared realisations of the same stochastic discrete model to estimate the expected behaviour of the discrete model for a given initial condition.
In practice, macroscopic models of populations of cells are described in terms of cell densities rather than keeping track of the position of individual cell boundaries.The density of the ith cell (x i (t), x i+1 (t)) is 1/ℓ i (t).For an interior node x i (t), we obtain a density q i (t) by taking the inverse of the average of the cells left and right of x i (t), giving as in Baker et al. [16].At boundary nodes, we use derived by linear extrapolation of ( 5) to the boundary.The densities in (6) ensure that the slope of the density curves at the boundaries, ∂q/∂x, match those in the continuum limit.We discuss the derivation of (6) in Appendix B. In the continuum limit where the number of cells is large and mechanical relaxation is fast, the densities evolve according to the moving boundary problem [10,16] where q(x, t) is the density at position x and time t, D(q) = −1/(ηq 2 )F ′ (1/q), R(q) = qG(1/q), H(q) = −2qF (1/q)/[ηD(q)], and L(t) = x n (t) is the leading edge position with L(0) = x n (0).The quantity 1/q in these equations can be interpreted as a continuous function related to the length of the individual cells.The initial condition q(x, 0) = q 0 (x) is a linear interpolant of the discrete densities q i (t) of the cells at t = 0. Similar to the discussion of (1), for this continuum limit to be valid so that both D(q) and R(q) play a role in the continuum model, constraints must be imposed on the discrete parameters.As discussed by Murphy et al. [15], we require that the time scale of mechanical relaxation is sufficiently fast relative to the time scale of proliferation.In practice this means that for a given choice of β we must have k/η sufficiently large for the solution of the continuum model to match averaged data from the discrete model.We note that, with our choices of F and G, the functions in (7) are given by For fixed boundary problems we take H(q) = 0 and dL/dt = 0.In Appendix C, we describe how to solve (7) numerically, as well as how to solve the corresponding problem with fixed boundaries numerically.

Continuum-discrete comparison
We now consider four biologically-motivated case studies to illustrate the performance of the continuum limit description (7).These case studies are represented schematically in Figure 1(d)-(g).Case Studies 1 and 3, shown in Figure 1(d) and Figure 1(f), are fixed boundary problems, where we see cells relax mechanically towards a steady state where each cell has equal length.Case Studies 2 and 4 are free boundary problems, where the right-most cell boundary moves in the positive x-direction while all cells relax towards a steady state where the length of each cell is given by resting spring length s.Case Studies 1 and 2 have β = 0 so that there is no cell proliferation and the number of cells remains fixed during the simulations, whereas Case Studies 3 and 4 have β > 0 so that the number of cells increases during the discrete simulations.To explore these problems, we first consider cases where the continuum limit model is accurate, using the data shown in Figure 2, where we show space-time diagrams and a set of averaged density profiles for each problem in the left and right columns of Figure 2, respectively.Case Studies 1 and 3 initially place 30 nodes in 0 ≤ x ≤ 5 and 30 nodes in 25 ≤ x ≤ 30, or equivalently n = 60 with 28 cells in 0 ≤ x ≤ 5 and 28 cells in 25 ≤ x ≤ 30, spacing the nodes uniformly within each subinterval.Case Studies 2 and 4 initially place 60 equally spaced nodes in 0 ≤ x ≤ 5.The problems shown in Figure 2 use parameter values such that the solution of the continuum limit ( 7) is a good match to the averaged discrete density profiles.In particular, all problems use k = 50, η = 1, s = 1/5 and, for Case Studies 3 and 4, ∆t = 10 −2 , K = 15, and β = 0.15.The accuracy of the continuum limit is clearly evident in the right column of Figure 2 where, in each case, the solution of the continuum limit model is visually indistinguishable from averaged data from the discrete model.With proliferation, however, the continuum limit can be accurate when k/η is not too much larger than β, and we use Case Studies 3 and 4 to explore this.
Figure 3 shows further continuum-discrete comparisons for Case Studies 3 and 4 where we have slowed the mechanical relaxation by taking k = 1/5.This choice of k means that D(q) and R(q) are no longer on the same scale and thus the continuum limit is no longer well defined, as explained in the discussion of (1), meaning the continuum limit solutions are no longer accurate.In both cases, the solution of the continuum limit model lags behind the averaged data from the discrete model.In Appendix A, we show the 95% confidence regions for each curve in Figure 3, where we find that the solutions have much greater variance compared to the corresponding curves in Figure 2 where k = 50.
We are interested in developing an equation learning method for learning an improved continuum model for problems like those in Figure 3, allowing us to extend beyond the parameter regime where the continuum limit (7) is accurate.We demonstrate this in Case Studies 1-4 in Section 4 where we develop such a method.

Learning accurate continuum limit models
In this section we introduce our method for equation learning and demonstrate the method using the four case studies from Figures 1-3.Since the equation learning procedure is modular, adding these components into an existing problem is straightforward.All Julia code to reproduce these results is available at https: //github.com/DanielVandH/StepwiseEQL.jl.A summary of all the parameters used for each case study is given in Table 1.

Case Study 1: Fixed boundaries
Case Study 1 involves mechanical relaxation only so that there is no cell proliferation and the boundaries are fixed, implying R(q) = 0 and H(q) = 0 in (7), respectively, and the only function to learn is D(q).Our equation learning approach starts by assuming that D(q) is a linear combination of d basis coefficients {θ 1 , . . ., θ d } and d basis functions {φ 1 , . . ., φ d }, meaning D(q) can be represented as These basis functions could be any univariate function of q, for example the basis could be {φ 1 , φ 2 , φ 3 } = {1/q, 1/q 2 , 1/q 3 } with d = 3.In this work, we impose the constraint that D(q) ≥ 0 for q min ≤ q ≤ q max , where q min and q max are the minimum and maximum densities observed in the discrete simulations, respectively.This constraint enforces the condition that the nonlinear diffusivity function is positive over the density interval of interest.While it is possible to work with some choices of nonlinear diffusivity functions for which D(q) < 0 for some interval of density [52][53][54], we wish to avoid the possibility of having negative nonlinear diffusivity functions and our results support this approach.
Values indicated by a line are not relevant for the corresponding case study.For Case Study 3 and 4, the label "a" refers to the accurate continuum limit case, and "b" refers to the inaccurate continuum limit case.
For Case Study 4, some parameters are given by a set of four parameters, with the ith value of this set referring to the value used when learning the ith mechanism; see Section 44.4 for details.
(7), with R(q) = 0 and H(q) = 0, and expand the spatial derivative term so that we can isolate the θ k terms, where we let q ij denote the discrete density at position x ij = x i (t j ) and time t j .We note that while q ij is discrete, we assume it can be approximated by a smooth function, allowing us to define these derivatives ∂q ij /∂t, ∂q ij /∂x, and ∂ 2 q ij /∂x 2 in (10); this assumption is appropriate since, as shown in Figure 2, these discrete densities can be well approximated by smooth functions.These derivatives are estimated using finite differences, as described in Appendix D. We also emphasise that, while (10) appears similar to results in [37,38], the crucial difference is that we are specifying forms for the mechanisms of the PDE rather than the complete PDE itself; one other important difference is in how we estimate θ, defined below and in (15).We save the solution to the discrete problems ( 2) . ., n} and j ∈ {2, . . ., M }, where n = 60 is the number of nodes and we do not deal with data at j = 1 since the PDE does not apply at t = 0. We can therefore convert (10) into a rectangular matrix problem Aθ = b, where the rth row in A, r = 1, . . ., n(M − 1), corresponding to the point (x ij , t j ) is given by a ij ∈ R 1×d , where with each element of a ij corresponding to the contribution of the associated basis function in (10).Thus, we obtain the system The solution of Aθ = b, given by θ = (A T A) −1 A T b, is obtained by minimising the residual ∥Aθ − b∥ 2 2 , which keeps all terms present in the learned model.We expect, however, just as in (8), that θ is sparse so that D(q) has very few terms, which makes the interpretation of these terms feasible [37,38].There are several ways that we could solve Aθ = b to obtain a sparse vector, such as with sparse regression [37], but in this work we take a stepwise equation learning approach inspired by stepwise regression [44] as this helps with both the exposition and modularity of our approach.For this approach, we first let I = {1, . . ., d} be the set of basis function indices.We let A k denote the set of active coefficients at the kth iteration, meaning the indices of non-zero values in θ, starting with A 1 = I.The set of indices of zero values in θ, I k = I \ A k , is called the set of inactive coefficients.To obtain the next set, A k+1 , from a current set A k , we apply the following steps: 1. Let the vector θ A denote the solution to Aθ = b subject to the constraint that each inactive coefficient θ i is zero, meaning θ i = 0 for i ∈ I \ A for a given set of active coefficients A. We compute θ A by solving the reduced problem in which the inactive columns of A are not included.The vector with With this definition, we compute the sets M + k is the set of all coefficient vectors θ obtained by making each active coefficient at step k inactive one at a time.M − k , is similar to M − k+1 except we make each inactive coefficient at step k active one at a time.We then define is the set of all coefficient vectors obtained from activating coefficients one at a time, deactivating coefficients one at a time, or retaining the current vector θ k .
2. Choose one of the vectors in M k by defining a loss function L(θ): where q(x, t; θ) is the solution of the PDE (7) with R(q) = H(q) = 0 and D(q) uses the coefficients θ in (9), q(x ij , t j ; θ) is the linear interpolant of the PDE data at t = t j evaluated at x = x ij , and ∥θ∥ 0 is the number of non-zero terms in θ.This loss function balances the goodness of fit with model complexity.If, for some θ, D(q) < 0 within q min ≤ q ≤ q max , which we check by evaluating D(q) at n c = 100 equally spaced points in q min ≤ q ≤ q max , we set L(θ) = ∞.With this loss function, we compute the next coefficient vector If θ k+1 = 0, so that all the coefficients are inactive, we instead take the vector that attains the second-smallest loss so that a model with no terms cannot be selected.
3. If θ k+1 = θ k , then there are no more local improvements to be made and so the procedure stops.Otherwise, we recompute A k+1 and I k+1 from θ k+1 and continue iterating.
The second step prevents empty models from being returned, allowing the algorithm to more easily find an optimal model when starting with no active coefficients.We note that Nardini et al. [32] consider a loss based on the regression error, ∥Aθ − b∥ 2 2 , that has been useful for a range of previously-considered problems [32,37,38].We do not consider the regression error in this work as we find that it typically leads to poorer estimates for θ compared to controlling the density errors as we do in (15).
Let us now apply the procedure to our data from Figure 2, where we know that the continuum limit with D(q) = 50/q 2 is accurate.We use the basis functions φ i = 1/q i for i = 1, 2, 3 so that and we expect to learn θ = (0, 50, 0) T .We save the solution to the discrete model at M = 50 equally spaced time points between t 1 = 0 and t M = 5.With this setup, and starting with all coefficients initially active so that A 1 = {1, 2, 3}, we obtain the results in Table 2.The first iterate gives us θ 1 such that D(q) < 0 for some range of q as we show in Figure 4(a), and so we assign L(θ 1 ) = ∞.To get to the next step, we remove θ 1 , θ 2 , and θ 3 one a time and compute the loss for each resulting vector, and we find that removing θ 3 leads to a vector that gives the least loss out of those considered.We thus find A 2 = {1, 2} and θ 2 = (−1.46,47.11, 0) T .Continuing, we find that out of the choice of removing θ 1 or θ 2 , or putting θ 3 back into the model, removing θ 1 decreases the loss by the greatest amount, giving A 3 = {2}.Finally, we find that there are no more improvements to be made, and so the algorithm stops at θ 3 = (0, 43.52, 0) T , which is close to the continuum limit.We emphasise that this final θ 3 is a least squares solution with the constraint θ 1 = θ 3 = 0, thus there is no need to refine θ 3 further by eliminating θ 1 and θ 3 directly in ( 16), as the result would be the same.
Comparing the densities from the solution of the learned PDE with θ = θ 3 with the discrete densities in Figure 5(a), we see that the curves are nearly visually indistinguishable near the center, but there are some visually discernible discrepancies near the boundaries.We show the form of D(q) at each iteration in Figure 4(a), where we observe that the first iterate captures only the higher densities, the second iterate captures the complete range of densities, and the third iterate removes a single term which gives no noticeable difference.
Table 2: Stepwise equation learning results for the density data for Case Study 1: Fixed boundaries using the basis expansion (16), saving the results at M = 50 equally spaced times between t 1 = 0 and t M = 5 and starting with all coefficients active, A 1 = {1, 2, 3}.Coefficients highlighted in blue show the coefficient chosen to be removed or added at the corresponding step.
Step To improve our learned model we introduce matrix pruning, inspired from the data thresholding approach in VandenHeuvel et al. [42], to improve the estimates for θ.Visual inspection of the space-time diagram in Figure 2(a) shows that the most significant density changes occur at early time and near to locations where q changes in the initial condition, and a significant portion of the space-time diagram involves regions where q is almost constant.These regions where q has minimal change are problematic as points which lead to a higher residual are overshadowed, affecting the least squares problem and consequently degrading the estimates for θ significantly, and so it is useful to only include important points in the construction of A. To resolve this issue, we choose to only include points if the associated densities falls between the 10% and 90% quantiles for the complete set of densities, which we refer to by density quantiles; more details on this pruning procedure are given in Appendix D. This choice of density quantiles is made using trial and error, starting at 0% and 100%, respectively, and shrinking the quantile range until suitable results are obtained.When we apply this pruning and reconstruct A, we obtain the improved results in Table 3 and associated densities in Figure 5(b).Compared with Table 2, we see that the coefficient estimates for θ all lead to improved losses, and our final model now has θ = (0, 49.83, 0) T , which is much closer to the the continuum limit, as we see in Figure 5 Step 1 Step 2 Step 3  3 using matrix pruning.
everywhere.Moreover, we show in Figure 4(b) how D(q) is updated at each iteration, where we see that the learned nonlinear diffusivity functions are barely different from the expected continuum limit result.These results demonstrate the importance of only including the most important points in A.

Case Study 2: Free boundaries
Case Study 2 extends Case Study 1 by allowing the right-most cell boundary to move so that H(q) ̸ = 0. We do not consider proliferation, giving R(q) = 0 in (7).
The equation learning procedure for this case study is similar to Case Study 1, namely we expand D(q) as in (9) and constrain D(q) ≥ 0. In addition to learning D(q), we need to learn H(q) and the evolution equation describing the free boundary.In (7), this evolution equation is given by a conservation statement, qdL/dt = −D(q)∂q/∂x with q = q(L(t), t).Here we treat this moving boundary condition more generally by introducing a function E(t) so that at x = L(t) for t > 0. While (17) could lead to local loss of conservation at the moving boundary, our approach is to for the possibility that coefficients in D(q) and E(q) differ and to explore the extent to which this is true, or otherwise, according to our equation learning procedure.We constrain E(q) ≥ 0 so that (17) makes sense for our problem and we expand D(q), H(q), and E(q) as follows The matrix system for θ d = (θ d 1 , . . ., θ d d ) T is the same as it was in Case Study 1 in (12), which we now write as given by A and b in (12), and we can construct two other independent matrix systems for θ h = (θ h 1 , . . ., θ h h ) T and θ e = (θ e 1 , . . ., θ e e ) T .To construct these x 0 10 20 30 q(x,t)   3 using matrix pruning so that densities outside of the 10% and 90% density quantiles are not included.(c) Comparisons of the learned D(q) from Table 2 without pruning, Table 3 with pruning, and the continuum limit from (8).In (a)-(b), the arrows show the direction of increasing time, and the density profiles shown are at times t = 0, 1, 2, 3, 4, 5 in black, red, blue, green, orange, and purple, respectively.matrix systems, for a given boundary point (x nj , t j ) we write where L j = x nj is the position of the leading edge at t = t j .In (19) we assume that L j can be approximated by a smooth function so that dL j /dt can be defined.With (19) we have A h θ h = b h and A e θ e = b e , where with A h ∈ R (M −1)×h and b h ∈ R (M −1)×1 , and with A e ∈ R (M −1)×e and b e ∈ R (M −1)×1 .Then, writing we obtain The solution of Aθ = b is the combined solution of the individual linear systems as A is block diagonal.Estimates for θ d , θ h , and θ e are independent, which demonstrates the modularity of our approach, where these additional features, in particular the leading edge, are just an extra independent component of our procedure in addition to the procedure for estimating D(q).In addition to the new matrix system Aθ = b in ( 23), we augment the loss function (14) to incorporate information about the location of the moving boundary.Letting L(t; θ) denote the leading edge from the solution of the PDE (7) with parameters θ, the loss function is Let us now apply our stepwise equation learning procedure with ( 23) and (24).We consider the data from Figure 2, where we know in advance that the continuum limit with D(q) = 50/q 2 , H(q) = 2q 2 − 0.4q 3 , and E(q) = 50/q 2 is accurate.The expansions we use for D(q), H(q), and E(q) are given by With these expansions, we expect to learn θ d = (0, 50, 0) T , θ h = (0, 2, −0.4,0, 0) T , and θ e = (0, 50, 0) T .We initially consider saving the solution at M = 1000 equally spaced times between t 1 = 0 and t M = 100, and using matrix pruning so that only points whose densities fall within the 35% and 65% density quantiles are included.The results with this configuration are shown in Table 4, where we see that we are only able to learn H(q) = E(q) = 0 and D(q) = 25.06/q 3 .This outcome highlights the importance of choosing an appropriate time interval, since Figure 2(b) indicates that mechanical relaxation takes place over a relative short interval which means that working with data in 0 < t ≤ 100 can lead to a poor outcome.(25), saving the results at M = 1000 equally spaced times between t 1 = 0 and t M = 100, pruning so that densities outside of the 35% and 65% density quantiles are not included, and starting with all terms inactive.Coefficients highlighted in blue show the coefficient chosen to be removed or added at the corresponding step. Step 3 Loss 1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 -1.40 2 0.00 0.00 25.06 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 -0.40 We proceed by restricting our data collection to 0 ≤ t ≤ 15, now saving the solution at M = 200 equally spaced times between t 1 = 0 and t M = 15.Keeping the same quantiles for the matrix pruning, the new results are shown in Table 5 and Figure 6.We see that the densities and leading edges are accurate for small time, but the learned mechanisms do not extrapolate as well for t ≥ 15, for example L(t) in Figure 6(b) does not match the discrete data.To address this issue, we can further limit the information that we include in our matrices, looking to only include boundary points where dL/dt is neither too large not too small.We implement this by excluding all points (x nj , t j ) from the construction of (A e , b e ) in (21) such that dL j /dt is outside of the 10% or 90% quantiles of the vector (dL 2 /dt, . . ., dL M /dt), called the velocity quantiles.
Table 5: Stepwise equation learning results for Case Study 2: Free boundaries, using the basis expansions (25), saving the results at M = 200 equally spaced times between t 1 = 0 and t M = 15, pruning so that densities outside of the 35% and 65% density quantiles are not included, and starting with all terms inactive.Coefficients highlighted in blue show the coefficient chosen to be removed or added at the corresponding step. Step 3 Loss 1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 -3.37 2 0.00 0.00 0.00 0.00 -0.03 0.00 0.00 0.00 0.00 0.00 0.00 -2.37 3 0.00 0.00 0.00 0.00 -0.03 0.00 0.00 0.00 8.74 0.00 0.00 -3.68 4 0.00 47.38 0.00 0.00 -0.03 0.00 0.00 0.00 8.74 0.00 0.00 -4.02 5 0.00 47.38 0.00 8.41 -1.69 0.00 0.00 0.00 8.74 0.00 0.00 -8.14 Implementing thresholding on dL/dt leads to the results presented in Figure 7.We see that the learned densities and leading edges are both visually indistinguishable from the discrete data.Since H(q) and E(q) are only ever evaluated at x = L(t), and q(L(t), t) ≈ 5 for t > 0, we see that H(q) and E(q) only match the continuum limit at q ≈ 5, which means that our learned continuum limit model conserves mass and is consistent with the traditional coarse-grained continuum limit, as expected.We discuss in Appendix E how we can enforce D(q) = E(q) to guarantee conservation mass from the outset, however our approach in Figure 7 is more general in the sense that our learned continuum limit is obtained without making any a priori assumptions about the form of E(q)., and E(q) with the forms from the continuum limit (8).  5 for Case Study 2: Free boundaries, except also using matrix pruning on (A 3 , b 3 ) so points where dL j /dt falls outside of the 10% and 90% velocity quantiles are excluded, giving θ e 1 = 9.42 rather than 8.74.(a) Comparisons of the discrete density profiles (solid curves) with those from the learned PDE (dashed curves), plotted at the times t = 0, 5, 10, 25, 50, 100 in black, red, blue, green, orange, and purple, respectively.The arrow shows the direction of increasing time.(b) As in (a), except comparing the leading edges.(c)-(e) are comparisons of the learned forms of D(q), H(q), and E(q) with the forms from the continuum limit (8).

Case Study 3: Fixed boundaries with proliferation
Case Study 3 is identical to Case Study 1 except that we incorporate cell proliferation, implying R(q) ̸ = 0 in (7).This case is more complicated than with mechanical relaxation only, as we have to consider how we combine the repeated realisations to capture the average density data as well.For this work, we average over each realisation at each time using linear interpolants as described in Appendix D. This averaging procedure gives n k points xij between x = 0 and x = 30 at each time t j , j = 1, . . ., M , with corresponding density value qij .The quantities xij and qij play the same role as x ij and q ij in the previous case studies.
To apply equation learning we note there is no moving boundary, giving H(q) = 0 in (8).We proceed by expanding D(q) and R(q) as follows with the aim of estimating θ d = (θ d 1 , . . ., θ d d ) T and θ r = (θ r 1 , . . ., θ r r ) T , again constraining D(q) ≥ 0. We expand the PDE from (10), as in Section 4(a), and the only difference is the additional term r m=1 φ r m (q ij )θ r m for each point (x ij , t j ).Thus, we have the same matrix as in Section 4(a), denoted A d ∈ R n k (M −1)×d , and a new matrix A r ∈ R n k (M −1)×r whose row corresponding to the point (x ij , t j ) is given by so that the coefficient matrix A is now The corresponding entry for the point ( Notice that this additional term in the PDE adds an extra block to the matrix without requiring a significant coupling with the existing equations from the simpler problem without proliferation.Thus, we estimate our coefficient vectors using the system We can take exactly the same stepwise procedure as in Section 4(a), except now the loss function (14) uses n k , qij , and xij rather than n, q ij , and x ij , respectively.

Accurate continuum limit
Let us now apply these ideas to our data from Figure 2, where we know that the continuum limit with D(q) = 50/q 2 and R(q) = 0.15q − 0.01q 2 is accurate.The expansions we use for D(q) and R(q) are given by and we expect to learn θ d = (0, 50, 0) T and θ r = (0.15, −0.01, 0, 0, 0) T .We average over 1000 identicallyprepared realisations, saving the solutions at M = 501 equally spaced times between t 1 = 0 and t M = 50 with n k = 50 knots for averaging.For this problem, and for Case Study 4 discussed later, we find that working with 1000 identically-prepared realisations of the stochastic models leads to sufficiently smooth density profiles.As discussed in Appendix F, the precise number of identically-prepared realisations is not important provided that the number is sufficiently large; when not enough realisations are taken, the results are inconsistent across different sets of realisations and will fail to identify the average behaviour from the learned model.We also use matrix pruning so that we only include points whose densities fall within the 10% and 90% density quantiles, as done in Section 4(a).The results we obtain are shown in Table 6, starting with all coefficients active.Table 6 shows that we find θ d = (0, 52.97, 0) T and θ r = (0.15, −0.010, 0, 0, 0) T , which are both very close to the continuum limit.Figure 8 visualises these results, showing that the PDE solutions with the learned D(q) and R(q) match the discrete densities, and the mechanisms that we do learn are visually indistinguishable with the continuum limit functions (8) as shown in Figure 8(b)-(c).

Table 6:
Stepwise equation learning results for Case Study 3: Fixed boundaries with proliferation, where the continuum limit is accurate, using the basis expansions (30), saving the results at M = 501 equally spaced times between t 1 = 0 and t M = 50, averaging across 1000 realisations with n k = 50 knots, pruning so that densities outside of the 10% and 90% density quantiles are not included, and starting with all diffusion and reaction coefficients active.Coefficients highlighted in blue show the coefficient chosen to be removed or added at the corresponding step.
(b)-(c) are comparisons of D(q) and R(q) with the forms from the continuum limit (8).

Inaccurate continuum limit
We now extend the problem so that the continuum limit is no longer accurate, taking k = 1/5 to be consistent with Figure 3(a).Using the same basis expansions in (30), we save the solution at M = 751 equally spaced times between t 1 = 0 and t M = 75, averaging over 1000 realisations with n k = 200.We find that we need to use the 25% and 75% density quantiles rather than the 10% and 90% density quantiles, as in the previous example, to obtain results in this case.With this configuration, the results we find are shown in Table 7 and Figure 9.
Results in Table 7 show θ d = (0, 0.12, 0) T , which is reasonably close to the continuum limit with (0, 0.2, 0) T .The reaction vector, for which the continuum limit is (0.15, −0.01, 0, 0, 0) T so that R(q) is a quadratic, is now given by θ r = (0.16, −0.02, 7.49 × 10 −4 , −1.69 × 10 −5 , 0) T , meaning the learned R(q) is a quartic.Figure 9 compares the averaged discrete densities with the solution of the learned continuum limit model.Figure 9(c) compares the learned source term with the continuum limit.While both terms are visually indistinguishable at small densities, we see that the two source terms differ at high densities, with the learned carrying capacity density, where R(q) = 0, reduced relative to the continuum limit.This is consistent with previous results [15].

Case Study 4: Free boundaries with proliferation
Case Study 4 is identical to Case Study 2 except that we now introduce proliferation into the discrete model so that R(q) ̸ = 0 in (7).First, as in Case Study 3 and as described in Appendix D, we average our data across each realisation from our discrete model.This averaging provides us with points xij between x = 0 and x = Lj at each time t j , j = 1, . . ., M , where Lj is the average leading edge at t = t j , with corresponding density values qij , where i = 1, . . ., n k and n k is the number of knots to use for averaging.We expand the functions D(q), R(q), H(q), and E(q) as again restricting D(q), E(q) ≥ 0. The function E(q) is used in the moving boundary condition in (7), as in (17).The matrix A and vector b are given by where A dr = [A d A r ] as defined in (28), A h and A e are the matrices from ( 20) and ( 21), respectively, and similarly for b dr = ∂q/∂t, b h , and b e from ( 12), (20), and (21), respectively.Thus, Similar to Case Study 2, the coefficients for each mechanism are independent, except for θ d and θ r .The loss function we use is the loss function from (24).
With this problem, it is difficult to learn all mechanisms simultaneously, especially as mechanical relaxation and proliferation occur on different time scales since mechanical relaxation dominates in the early part of the simulation, whereas both proliferation and mechanical relaxation play a role at later times.This means D(q) and R(q) cannot be estimated over the entire time range as was done in Case Study 3. To address this we take a sequential learning procedure to learn these four mechanisms using four distinct time intervals I d , I e , I h , and I r : In these steps, solving the system Aθ = b means to apply our stepwise procedure to this system; for these problems, we start each procedure with no active coefficients.The modularity of our approach makes this sequential learning approach straightforward to implement.For these steps, the interval I d must be over sufficiently small times so that proliferation does not dominate, noting that fixing R(q) = 0 will not allow us to identify any proliferation effects when estimating the parameters.This is less relevant for I h and I e as the estimates of θ h and θ e impact the moving boundary only.

Inaccurate continuum limit
We now consider data from Figure 3(b) where the continuum limit is inaccurate.Here, k = 1/5 and the continuum limit vectors are θ d = (0, 0.2, 0) T , θ r = (0.15, −0.01, 0, 0, 0) T , θ h = (0, 2, −0.4,0, 0, 0) T , and θ e = (0, 0.2, 0) T .Using the same procedures and expansions as Figure 10 To apply the equation learning procedure we prune all matrices so that points outside of the 40% and 60% temporal quantiles are eliminated, where the temporal quantiles are the quantiles of ∂q/∂t from the averaged discrete data, and similarly for points outside of the 40% and 60% velocity quantiles.We find θ d = (0, 0.21, 0) T , θ e = (0, 0.23, 0) T , θ h = (−0.15,0, 0, −0.0079, 0) T , and θ r = (0.11, −0.0067, 0, 0, 0) T .Interestingly, here we learn R(q) is quadratic with coefficients that differ from the continuum limit.The results with all these learned mechanisms are shown in Figure 11.We see from the comparisons in Figure 11 that the PDE results from the learned mechanisms are visually indistinguishable from the discrete densities.Moreover, as in Figure 10, the learned H(q) and E(q) match the continuum results at q(L(t), t) which confirms that the learned continuum limit conserves mass, as expected.Note also that the solutions in Figure 11(a) go up to t = 250, despite the stepwise procedure considering only times up to t = 50, demonstrating the extrapolation power of our method., R(q), H(q), and E(q) with the forms from the continuum limit (8).

Conclusion and discussion
In this work, we presented a stepwise equation learning framework for learning continuum descriptions of discrete models describing population biology phenomena.Our approach provides accurate continuum approximations when standard coarse-grained approximations are inaccurate.The framework is simple to implement, efficient, easily parallelisable, and modular, allowing for additional components to be added into a model with minimal changes required to accommodate them into an existing procedure.In contrast to other approaches, like neural networks [41] or linear regression approaches [43], results from our procedure are interpretable in terms of the underlying discrete process.The coefficients incorporated or removed at each stage of our procedure give a sense of the influence each model term contributes to the model, giving a greater interpretation of the results, highlighting an advantage of the stepwise approach over traditional sparse regression methods [32,37,38].The learned continuum descriptions from our procedure enable the discovery of new mechanisms and equations describing the data from the discrete model.For example, the discovered form of D(q) can be interpreted relative to the discrete model, describing the interaction forces between neighbouring cells.In addition, we found in Case Study 4 that, when k = 1/5 so that the continuum limit is inaccurate, the positive root of the quadratic form of the source term R(q) is greater than , R(q), H(q), and E(q) with the forms from the continuum limit (8).
the mean field carrying capacity density K, as seen in Figure 11.This increase suggests that, when the rate of mechanical relaxation is small relative to the proliferation rates, the mean field carrying capacity density in the continuum description can be different from that in the discrete model.We demonstrated our approach using a series of four biologically-motivated case studies that incrementally build on each other, studying a discrete individual-based mechanical free boundary model of epithelial cells [10,11,15,16].In the first two case studies, we demonstrated that we can easily rediscover the continuum limit models derived by Baker et al. [16], including the equations describing the evolution of the free boundary.The last two case studies demonstrate that, when the coarse-grained models are inaccurate, our approach can learn an accurate continuum approximation.The last case study was the most complicated, with four mechanisms needing to be learned, but the modularity of our approach made it simple to apply a sequential procedure to learning the mechanisms, applying the procedure to each mechanism in sequence.Our procedure was able to recover terms that conserved mass, despite not enforcing conservation of mass explicitly.The procedure as we have described does have some limitations, such as assuming that the mechanisms are linear combinations of basis functions, which could be handled more generally by instead using nonlinear least squares [42].The procedure may also be sensitive to the quality of the data points included in the matrices, and thus to the parameters used for the procedure.In Appendix F, we discuss a parameter sensitivity study that investigates this in greater detail.In this parameter sensitivity study, we find that the most important parameters to choose are the pruning parameters.These parameters can be easily tuned thanks to the efficiency of our method, modifying each parameter in sequence and using trial and error to determine suitable parameter values.
There are many avenues for future work based on our approach.Firstly, two-dimensional extensions of our discrete model could be considered [55,56], which would follow the same approach except the continuum problems would have to be solved using a more detailed numerical approximation [57][58][59].Another avenue for exploration would be to consider applying the discrete model on a curved interface which is more realistic than considering an epithelial sheet on a flat substrate [60,61].Working with heterogeneous populations of cells, where parameters in the discrete model can vary between individuals in the population, is also another interesting option for future exploration [14].Uncertainty quantification could also be considered using bootstrapping [42] or Bayesian inference [62].Allowing for uncertainty quantification would also allow for noisy data sets to be modelled, unlike the idealised, noise-free data used in this work.We emphasise that, regardless of the approach taken for future work, we believe that our flexible stepwise learning framework can form the basis of these potential future studies.

Appendix A Confidence bands for inaccurate continuum limits
In the paper, we show in Figure 3 a series of curves for Case Study 3 and Case Study 4 with k = 1/5, finding that the solution to the continuum limit is no longer a good match to the data from the discrete model.Figure 12 shows the confidence bands around each of these curves, showing how the uncertainty evolves over time.

Appendix B Discrete densities at the boundaries
In the paper, we give the following formulae for computing the cell densities from our discrete model at the boundary: noting also that x 1 (t) = 0 in this work.In this section, we derive the expressions for q 1 (t) and q n (t) and show the need for these complicated expressions over those from Baker et al. [16], namely q 1 (t) = 1/(x 2 (t) − x 1 (t)) and q n (t) = 1/(x n (t) − x n−1 (t)), through an example.

B.1 Derivation
We give the derivation for q n (t) only, as q 1 (t) is derived in the same way.We follow the idea from Baker et al. [16], relating the cell index i to the density q according to i(x, t) = 1 + x 0 q(y, t) dy.
Baker et al. [16] use 1 = n − (n − 1) together with this relationship to write q(y, t) dy, and Baker et al. [16] then use a right endpoint rule to approximate q n (t).If we instead use a trapezoidal rule, then xn−1(t) We use this expression to solve for q n (t): which is exactly the formula in (35).We note that an alternative derivation of this formula is to use linear extrapolation, treating the density 1/(x n (t) − x n−1 (t)) as if it were placed at the cell midpoint (x n−1 (t) + x n (t))/2 rather than x n (t).

B.2 Motivation
Let us now give the motivation for why we need the modifications to the boundary densities in ( 35) compared to those given in Baker et al. [16].Consider a mechanical relaxation problem, starting with 30 equally spaced nodes in 0 ≤ x ≤ 5, taking the parameters k = 50, s = 1/5, η = 1 and leaving the right boundary free.Let us compare the discrete densities at t = 2 to those from the continuum limit, as well as estimates of the gradient ∂q/∂x at the right boundary.Figure 13 shows our comparisons.Focusing on the densities at the right boundary of Figure 13(a) gives Figure 13(b), where we can see a clear difference in the slopes of each curve.The curve obtained using the approach of Baker et al. [16], using q n (t) = 1/(x n (t) − x n−1 (t)), has a different slope from the continuum limit, whereas the red curve, using q n (t) = 2/(x n (t)−x n−1 (t))−2/(x n (t)−x n−2 (t)), has a slope that is much closer to the slope of the continuum limit model at this point.These issues become more apparent when we try to estimate ∂q/∂x at the boundary for each time, as we would have to do in our equation learning procedure.Shown in Figure 13(c), we see that the estimates of ∂q/∂x that use q n (t) = 1/(x n (t) − x n−1 (t)) do not resemble what we expect in the continuum limit, namely ∂q/∂x = H(q) = 2q 2 (1 − qs) (using q = q(x n , t), where q(x, t) is the solution from the continuum limit partial differential equation (PDE)).Our new expression for q n (t) gives estimates for ∂q/∂x that are much closer to H(q), with H(q) passing directly through the center of these estimates across the entire time domain.Thus, our revised formulae (35) are necessary if we want to obtain accurate estimates for the boundary gradients.

Appendix C Numerical methods
In this section we give the details involved in solving the PDEs on the fixed and moving domains numerically using the finite volume method [63].We have provided Julia packages FiniteVolumeMethod1D.jl and MovingBoundaryProblems1D.jl to implement these methods for the fixed and moving domains, respectively.

C.1 Fixed domain
We start by considering the fixed domain problem.The PDE we consider is where L is the length of the domain, D(q) is the nonlinear diffusivity function, and R(q) is the source term.
To discretise (37), define a grid 0 = ∆x and ∆x = L/(n − 1).This grid enables us to define control volumes Ω i = [w i , e i ] for each i, where The volumes of these control volumes are denoted V i = e i − w i , i = 1, . . ., n.We then integrate (37) over a single Ω i to give To proceed, let q i = q(x i , t), D i = D(q i ), R i = R(q i ) and define the following approximations: Using the approximations in ( 40), (39) becomes for i = 2, . . ., n − 1.The boundary conditions are x = 0 and x = L are incorporated by simply setting the associated derivative term in (39) to zero, giving The system of ordinary differential equations (ODEs) is thus given by ( 41)-( 43) and defines the numerical solution to (37).In particular, letting q n = (q 1 (t n ), . . ., q n (t n )) T for some time t n , we start with q 0 = (q 0 (x 1 ), . . ., q 0 (x n )) T using the initial condition q(x, 0) = q 0 (x), and then integrate forward in time via ( 41)-( 43).This procedure is implemented in the Julia package FiniteVolumeMethod1D.jl which makes use of DifferentialEquations.jl to solve the system of ODEs with the TRBDF2(linsolve = KLUFactorization()) algorithm [64][65][66].

C.2 Moving boundary problem
We now describe how we solve the PDEs for a moving boundary problem.The PDE we consider is We assume that L(t) > 0 for t ≥ 0. The discretisation starts by transforming onto a fixed domain using the Landau transform ξ = x/L(t) [16,67,68].With this change of variable, (44) becomes To now discretise (45), define ξ i = (i − 1)∆ξ for i = 1, . . ., n, where ∆ξ = 1/(n − 1), and then let We then define a control volume to be the interval Ω i = [w i , e i ] with volume V i = e i − w i , i = 1, . . ., n. Next, the PDE in ( 45) is integrated over this control volume to give ei wi Using integration by parts, the first integral on the right-hand side of ( 47) is simply ei wi ξ ∂q ∂ξ dξ = e i q(e i , t) − w i q(w i , t) − ei wi q dξ.
Next, define the control volume averages and set q i = q(ξ i , t), D i = D(q i ), and R i = R(q i ).With this notation, we define the following set of approximations: qi = q i i = 1, . . ., n, Using these approximations, (47) becomes The last component to handle are the boundary conditions.Since ∂q/∂ξ = 0 at ξ = 0, and since w 1 = ξ 1 = 0, our discretisation at ξ = 0 becomes The boundary condition at ξ = 1 is ∂q/∂ξ = LH(q), thus The remaining boundary condition is the moving boundary condition, qdL/dt = −[E(q)/L]∂q/∂ξ.Since ∂q/∂ξ = LH(q), we can write q n dL/dt = −[E(q n )/L]LH(q n ) = −E(q n )H(q n ), giving The system of ODEs ( 49)-( 52), together with the initial conditions q i (0) = q 0 (ξ i L(0)) for i = 1, . . ., n and L(0) = L 0 , where q 0 (x) and L 0 are the initial conditions, define our complete discretisation.Solving these ODEs over time give values for q(ξ i , t j ), for some t j , which gets translated back in terms of x via x i = ξ i L(t j ).As in the fixed domain case, we solve these ODEs using DifferentialEquations.jltogether with the TRBDF2(linsolve = KLUFactorization()) algorithm [64][65][66].We provide our implementation of this procedure in a separate Julia package, MovingBoundaryProblems1D.jl.

D.2 Derivative estimation
The equation learning system (A, b) requires estimates for the derivatives ∂q ij /∂t, ∂q ij /∂x, ∂ 2 q ij /∂x 2 , and dL j /dt.To give a formula for an estimate of these derivatives, suppose we have three function values {f 1 , f 2 , f 3 } for some function f (x) at the points {x 1 , x 2 , x 3 }, where f i = f (x i ) for i = 1, 2, 3.These points do not need to be equally spaced.The Lagrange interpolating polynomial through this data is given by which can be used to estimate the derivatives via f ′ (x i ) ≈ g ′ (x i ), i = 1, 2, 3, and similarly for f ′′ (x).Using this approximation, we write where ( 57) is valid for i = 1, 2, 3. We can use the formulae ( 54)-( 57) to approximate our required derivatives.For example, taking assuming the times are equally spaced with spacing h.The estimate for dL M /dt is obtained by taking Similarly, taking {x 1 , x 2 , x 3 } = {x i−1,j , x ij , x i+1,j } and {f 1 , f 2 , f 3 } = {q i−1,j , q ij , q i+1,j } gives for i = 2, . . ., n j − 1 and j = 1, . . ., M , where n j is the number of nodes at t = t j .The remaining derivatives can be obtained similarly, ensuring that the appropriate finite difference (backward, central, or forward) is taken for the given point.
The only exception to these rules are for ∂q/∂x at the boundaries.We find that using simple forward and backward differences there gives better results than with ( 54) and ( 55), so we use

D.3 Matrix pruning
We now discuss our approach to matrix pruning, wherein we discard points from our equation learning matrix A that do not help to improve our estimates for θ.The approach we take is inspired from the data thresholding idea from VandenHeuvel et al. [42].
To start with our approach, let q = (q 12 , . . ., q n M M ) T be the vector of all discrete densities, letting n j be the number of nodes at the time t = t j , excluding the densities from the initial condition.Then, take the threshold tolerance 0 ≤ τ q < 1/2 and compute the interval (Q q τq , Q q 1−τq ), where Q y τ denotes the 100τ % quantile of the vector y.With these intervals, we only include a row in the matrix A from a given point (x ij , t j ) if Q q τq ≤ q ij ≤ Q q 1−τq .By choosing the threshold τ q appropriately, we can significantly improve the estimates for θ as we only include the most relevant data for estimation, excluding all points with relatively low or high density.Similar thresholds can be defined for the other quantities |∂q/∂x|, |∂ 2 q/∂x 2 |, |∂q/∂t|, and |dL/dt|, defining these vectors similarly to q, for example |∂ t q| = (|∂ t q 12 |, . . ., |∂ t q n M M |) T , with respective threshold tolerances 0 ≤ τ ∂q/∂x , τ ∂ 2 q/∂x 2 , τ ∂q/∂t , τ dL/dt < 1/2.

Appendix E Additional examples
In this section, we give some additional case studies to further demonstrate our method, exploring different force law and proliferation laws, and enforcing conservation of mass together with a discussion about enforcing equality constraints in general.

E.1 Enforcing conservation of mass
In the paper, we discussed at the end of Case Study 2 that it could be possible to enforce mass conservation to fix the issue with D(q) ̸ = E(q), noting that mass conservation requires D(q(L(t), t)) = E(q(L(t), t)).In this section, we consider the results when we fix D(q) = E(q) so that mass is conserved from the outset.
This change D(q) = E(q) is reasonably straightforward to implement in the algorithm, simply replacing the boundary condition (17) so that q(L(t), t) This constraint D(q) = E(q) also needs to be reflected in the matrix A. This is simple to do in this case.Previously, our matrix system took the block diagonal form With the constraint D(q) = E(q), ( We note that, if we wanted to enforce this constraint in Case Study 4, where A 1 = [A d A r ], with A d and A r defined from (28), then we instead have Let us now consider the results with mass conservation.We use the same parameters that were used to produce the results in Figure 7.In particular, we save the solution at M = 200 equally spaced times between t 1 = 0 and t M = 15, τ q = 0.35, τ dL/dt = 0.1, and we start with all coefficients initially inactive.The results we obtain are shown in Table 8 and Figure 14.We see that the form we learn for D(q), and hence for E(q) also, is close to the continuum limit 50/q 2 , and similarly H(q) is a good match; note that H(q) is only evaluated at the boundary densities, which is approximately 5 for t > 0, so indeed H(q) matches the continuum limit.Looking to Figure 14(a)-(b), the results are indistinguishable from the continuum limit, which is also what we found in Figure 7 before we enforced conservation of mass.

E.1.1 Imposing linear equality constraints generically
We note that this approach to implementing the constraint D(q) = E(q) requiring such a significant change to the matrix system, giving (64), and to the boundary condition (62), might suggest that the modularity of our approach weakens here.This does not need to be the case, and so let us briefly remark about how constraints such as D(q) = E(q), or any other linear constraints, could be alternatively implemented in our approach seamlessly, further demonstrating the modularity.(25), saving the results at M = 200 equally spaced times between t 1 = 0 and t M = 15, pruning with τ q = 0.35 and τ dL/dt = 0.1, starting with all terms inactive, and enforcing conservation of mass with D(q) = E(q).Coefficients highlighted in blue show the coefficient chosen to be removed or added at the corresponding step. Step Loss 1 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 -3.371 2 0.000 0.000 0.000 0.000 -0.025 0.000 0.000 0.000 -2.371 3 0.000 47.413 0.000 0.000 -0.025 0.000 0.000 0.000 -1.706 4 0.000 47.413 0.000 0.000 0.443 0.000 0.000 -0.004 -0.688 q(x,t) , and E(q) compared to the forms from the continuum limit.
Suppose we take our system Aθ = b, with A ∈ R m×p , θ ∈ R p , and b ∈ R m , and suppose we have constraints of the form Q T θ = c where Q ∈ R p×c and c ∈ R c , where c < p and Q has full rank.The constrained least squares estimator for θ subject to these constraints, denoted θc , is then given by θc where θ = (A T A) −1 A T b is the unconstrained least squares estimator for Aθ = b [69].Using this formulation, imposing D(q) = E(q) is simple to enforce without changing the boundary condition or the matrix A, simply using c = 0 3×1 and where I n and 0 m×n denote the n-square identity matrix and m × n zero matrix, respectively.This does not solve the problem entirely, though, since we also have coefficients that we force to zero throughout the stepwise procedure.These zeros constraints can also be imposed by including additional columns of Q.For example, if θ h 1 and θ d 2 are inactive, then Q becomes where e d 2 = (0, 1, 0) T and e h 1 = (1, 0, 0, 0, 0) T .In particular, each inactive coefficient θ i corresponds to a new column with a one in the row corresponding to that coefficient.Note that Q in (67) can be further written as , where Q 1 are the user-provided constraints D(q) = E(q) and Q 2 are the constraints imposed by the inactive coefficients, making it easy to incorporate constraints in this manner.Additional care is required to ensure that there are no redundant constraints represented by Q 1 and Q 2 as Q must be full rank.For example, imposing θ d 1 = 0 and θ e 1 = 0 together with the constraint θ d 1 = θ e 1 from D(q) = E(q) can be represented using only two constraints rather than three, and the associated matrix where e h 1 = (1, 0, 0) T , only has rank 4 rather than the full rank 5.This could be dealt with by finding a basis for the column space of Q, replacing Q with the corresponding matrix of basis vectors.
To summarise this discussion, it is straightforward to implement our procedure with the ability to enforce linear equality constraints, allowing for additional constraints, such as conservation of mass, to be enforced.This is easy to code without breaking the modularity of the approach and requiring a significant change to the procedure that would be cumbersome to implement by increasing the complexity of the corresponding code.

E.2 A piecewise proliferation law
In this section, we consider the problem described in Section 3.3 of Murphy et al. [15].This problem given by Murphy et al. [15] is used to demonstrate a case where the solution of the continuum limit no longer gives a good match with averaged data from the discrete model, as the value of k used is too low relative to the proliferation rate.Here, we show how our method can learn an accurate continuum model in this case.
The example is as follows.We consider F (ℓ i ) = k(s − ℓ i ) as usual, taking k = 10 −4 and s = 0, but our proliferation law is now given by where ℓ p = 0.2 is the proliferation threshold and β = 10 −2 .We use ∆t = 10 −2 for the proliferation events.The initial condition places n = 41 equally spaced nodes in [0, 10] so that ℓ i = 0.25 at t = 0 for each of the 40 cells.In Figure 15, we show a comparison of the discrete data from this problem with the solution of the continuum limit.We also compare the cell numbers N (t), where the cell numbers from the PDE q(x, t) are obtained via N (t) = 10 0 q(x, t) dx.We see that the densities from the solution of the continuum limit reach a capacity at 50 cells, while the discrete model instead reaches 80 cells.Note that the densities appear jagged in Figure 13 due to the combination of the averaging procedure from Section D.1 with the variance of the densities for moderate t; a better averaging method could be to build the knots at each time t based on the node positions themselves, but we do not consider that here as it does not impact the results.Comparison of the number of cells from the discrete model with that computed from the solution of the continuum limit, using N (t) = 10 0 q(x, t) dx for the continuum limit case.In (a)-(b), the discrete results are averaged over 1000 identically prepared realisations, using n k = 100 knots for the averaging procedure described in Section D.1.
The continuum limit for this problem is D(q) = 10 −4 q 2 and R(q) = 0 q > 1/ℓ p , 10 −2 q q ≤ 1/ℓ p .This suggests one possible basis expansion to use for R(q) in our equation learning procedure, with the aim to learn an appropriate continuum approximation to the results in Figure 15, could be R(q) = θ r 0 + θ r 1 q + θ r 2 q 2 + θ r 3 q 3 I q ≤ 1 ℓ p , where I(A) is the indicator function for the set A. We find that this does not lead to any improved model for this problem, and so we instead consider a polynomial model: R(q) = θ r 0 + θ r 1 q + θ r 2 q 2 + θ r 3 q 3 + θ r 4 q 4 + θ r 5 q 5 .( For D(q), this mechanism does not appear to be relevant in this example, with the results that follow all giving visually indistinguishable regardless of whether D(q) = 0 or D(q) = 10 −4 /q 2 .Thus, we do not bother learning it in this case, simply fixing D(q) = 10 −4 /q 2 ; if we do not fix D(q), we just end up learning D(q) = 0 in the results that follow.With (70) and D(q) = 10 −4 /q 2 , the results we obtain are shown in Table 9 and Figure 16.0.00 0.00 0.00 0.00 0.00 -1.63 2 0.00 0.00 0.00 0.00 0.00 0.00 -1.53 3 0.077 -0.0096 0.00 0.00 0.00 0.00 -6.22 Table 9: Equation learning results for the piecewise proliferation law problem in Figure 15, fixing D(q) = 10 −4 /q 2 and using the expansion of R(q) in (70).The discrete data is averaged over 1000 identically prepared realisations with n k = 100 knots for interpolating, and the solution is saved every 0.1 units of time between t = 0 and t = 500.
x 0 5 10 q(x,t)  The results in Table 9 and Figure 16 show that we have learned R(q) = 0.077 − 0.0096q.
The results in Figure 16(a)-(b) show a good match between the discrete data and the learned PDE solution.
Most interestingly, 16(c), we see that this learned R(q) connects the endpoints of the continuum limit form continuously.In particular, R(q) ≈ β(K − q) = βK(1 − q/K), where K = 8 is the maximum density from the averaged discrete data.We have thus learned an accurate continuum model to describe this problem, originally from Murphy et al. [15], showing that the piecewise continuum limit form of R(q) is more appropriately described by a simple linear model that connects the jumps in R(q).

E.3 Linear diffusion
In this section, we consider an example where we consider a force law that leads to linear diffusion, namely We use k = 20 and s = 1.For the initial condition, we consider a Gaussian initial density q 0 (x) with variance three centered at x = L 0 /2 over 0 ≤ x ≤ L 0 with L 0 = 10, and scaled so that the initial number of cells is 40, meaning 40 = 10 0 q 0 (x) dx.This leads to where N (0) = 40, σ 2 = 3, and erf is the error function.To convert this density into a set of initial cell positions, we consider a set of nodes x 1 , . . ., x 41 with x 1 = 0 and x 41 = L 0 .The interior nodes x(0) = (x 2 (0), . . ., x 40 (0)) T are obtained by solving the optimisation problem x(0) = argmin x∈R 39   41 i=1 (q 0 (x i (0)) − q i ) 2 subject to the constraint 0 < x 2 (0) < • • • < x 40 (0) < L 0 , where q i is the density at x i using our piecewise formulae.This problem is solved using NLopt.jl[70,71].The discrete densities we obtain over 0 ≤ t ≤ 100 are shown in Figure 17, where we also compare the data to the solution of the continuum limit.To apply the equation learning procedure to this problem, we note that we expect D(q) = E(q) = 20, and H(q) = 2q − 2q 2 .We thus consider H(q) = θ h 1 q + θ h 2 q 2 + θ h 3 q 3 + θ h 4 q 4 + θ h 5 q 5 , E(q) = θ e −2 q 2 + θ e −1 q + θ e 0 + θ e 1 q + θ e 2 q 2 .

3 :FreeFigure 1 :
Figure 1: Discrete model and schematics for each case study (CS).(a) A fixed boundary problem with x 1 = 0 and x n = L fixed.(b)A free boundary problem with x 1 = 0 and x n (t) = L(t), show in red, free.(c) Proliferation schematic, showing a cell (x i (t), x i+1 (t)) dividing into (x i (t + ∆t), x i+1 (t + ∆t)) and (x i+1 (t + ∆t), x i+2 (t + ∆t)) following a proliferation event, where x i+1 (t + ∆t) = (x i (t) + x i+1 (t))/2.(d)-(g) show schematics for the four case studies considered in the paper, where the first row in each panel is a representation of the initial configuration of cells at t = 0 and the second row a representation at a later time t > 0.
: Case Study 4: Free boundaries with proliferation

Figure 2 : 4 :Figure 3 :
Figure 2: Space-time diagrams (left column) and densities (right column) for the four case studies from Figure 1 considered throughout the paper.The left column shows the evolution of the discrete densities in space and time, with (c) and (d) showing averaged results over 2500 identically-prepared realisations of the discrete model.In (b) and (d), the red line shows the position of the free boundary.In the figures in the right column, the solid curves are the discrete densities (5) and the dashed curves are solutions to the continuum limit problem (7), and the curves are given by black, red, blue, green, orange, and purple in the order of increasing time as indicated by the black arrows.The times shown are (a) t = 0, 1, 2, 3, 4, 5; (b) t = 0, 5, 10, 25, 50, 100; (c) t = 0, 1, 5, 10, 20, 50; and (d) t = 0, 5, 10, 20, 50, 100.In (c) and (d), the shaded regions show 95% confidence bands from the mean discrete curves at each time; the curves in (a) and (b) show no shaded regions as these models have no stochasticity.
(b)  where the solution curves are now visually indistinguishable : D(q) with pruning

Figure 4 :
Figure 4: Progression of D(q) over each iterate for Case Study 1: Fixed boundaries.(a) Progression from the results in Table 2 (dashed curves).(b) As in (a), except with the results from Table3using matrix pruning.

Figure 5 :
Figure 5: Stepwise equation learning results for Case Study 1: Fixed boundaries.(a) Comparisons of the discrete density profiles (solid curves) with those learned from PDEs obtained from the results in Table 2 (dashed curves).(b) As in (a), except with the results from Table3using matrix pruning so that densities outside of the 10% and 90% density quantiles are not included.(c) Comparisons of the learned D(q) from Table2without pruning, Table3with pruning, and the continuum limit from(8).In (a)-(b), the arrows show the direction of increasing time, and the density profiles shown are at times t = 0, 1, 2, 3, 4, 5 in black, red, blue, green, orange, and purple, respectively.

Figure 6 :
Figure 6: Stepwise equation learning results from Table5for Case Study 2: Free boundaries.(a) Comparisons of the discrete density profiles (solid curves) with those learned from PDEs obtained from the results in Table5(dashed curves), plotted at the times t = 0, 5, 10, 25, 50, 100 in black, red, blue, green, orange, and purple, respectively.The arrow shows the direction of increasing time.(b) As in (a), except comparing the leading edges.(c)-(e) are comparisons of the learned forms of D(q), H(q), and E(q) with the forms from the continuum limit(8).

Figure 7 :
Figure 7: Stepwise equation learning results from Table5for Case Study 2: Free boundaries, except also using matrix pruning on (A 3 , b 3 ) so points where dL j /dt falls outside of the 10% and 90% velocity quantiles are excluded, giving θ e 1 = 9.42 rather than 8.74.(a) Comparisons of the discrete density profiles (solid curves) with those from the learned PDE (dashed curves), plotted at the times t = 0, 5, 10, 25, 50, 100 in black, red, blue, green, orange, and purple, respectively.The arrow shows the direction of increasing time.(b) As in (a), except comparing the leading edges.(c)-(e) are comparisons of the learned forms of D(q), H(q), and E(q) with the forms from the continuum limit(8).

Figure 8 :
Figure 8: Stepwise equation learning results for Case Study 3: Fixed boundaries with proliferation, where the continuum limit is accurate.(a) Comparisons of the discrete density profiles (solid curves) with those learned from PDEs obtained from the results in Table6(dashed curves), plotted at the times t = 0, 1, 5, 10, 20, 50 in black, red, blue, green, orange, and purple, respectively.The arrow shows the direction of increasing time.(b)-(c) are comparisons of D(q) and R(q) with the forms from the continuum limit(8).

Figure 9 :
Figure 9: Stepwise equation learning results for Case Study 3: Fixed boundaries with proliferation, where the continuum limit is inaccurate.(a) Comparisons of the discrete density profiles (solid curves) with those learned from PDEs obtained from the results in Table6(dashed curves), plotted at the times t = 0, 1, 10, 25, 40, 75 in black, red, blue, green, orange, and purple, respectively.The arrow shows the direction of increasing time.(b)-(c) are comparisons of D(q) and R(q) with the forms from the continuum limit(8).
For saving the solution, the time intervals we use are I d = [0, 0.1], I e = [0, 5], I h = [5, 10], and I r = [10, 50], with 25, 50, 100, and 250 time points inside each time interval for saving.For interpolating the solution to obtain the averages, we use n k = 25, n k = 50, n k = 100, and n k = 50 over I d , I e , I h , and I r , respectively.
, we average the data over 1000 realisations.The time intervals we use are I d = [0, 2], I e = [2, 10], I h = [10, 20], and I r = [20, 50], using 20 time points for I d and 200 time points for I e , I h , and I r .We use n k = 50 knots for averaging the solution over I d , and n k = 100 knots for averaging the solution over I e , I h , and I r .

Figure 12 :
Figure 12: Complementary figure to Figure 3 in the main document, showing inaccurate continuum limits for Case Study 3 (left column) and Case Study 4 (right column).The solid curves are the discrete densities from Equation2.4 and the dashes curves are solutions to the continuum limit problem in Equation 2.6.The shaded regions show 95% confidence bands from the mean discrete curves at each time shown.

Figure 13 :
Figure13: Comparison of the density definitions from Baker et al.[16] to those in(35), using data from a mechanical relaxation problem as an example.(a) Comparing the definitions at t = 2 together with densities from the continuum limit PDE.The magenta rectangle shows the region that is zoomed in on in (b).(b) Zooming in on the magenta rectangle from (a) at the right boundary.(c) Comparing estimates of ∂q/∂x at the right boundary using each definition along with the continuum limit boundary condition ∂q/∂x = 2q 2 (1 − qs).

Figure 14 :
Figure 14: Stepwise equation learning results from Table8.(a) Comparisons of the discrete density profiles (solid curves) with those from the learned PDE (dashed curves), plotted at the times t = 0, 5, 10, 25, 50, 100 in black, red, blue, green, orange, and purple, respectively.(b) As in (a), except comparing the leading edges.(c)-(e) are comparisons of the learned forms of D(q), H(q), and E(q) compared to the forms from the continuum limit.

Figure 15 :
Figure15: Comparison of the solution of the piecewise proliferation law problem with the solution of continuum limit, where F (ℓ i ) = k(s − ℓ i ) and G(ℓ i ) = β for ℓ i ≥ ℓ p and G(ℓ i ) = 0 otherwise, using k = 10 −4 , s = 0, ℓ p = 0.2, η = 1, β = 10 −2 , and ∆t = 10 −2 .(a) The solid curves are the discrete densities, and the dashed curves are the densities from the solution of the continuum limit.The arrow shows the direction of increasing time.The density profiles are shown at the times t = 0, 10, 50, 100, 250, 500 in black, red, blue, green, orange, and purple, respectively.(b) Comparison of the number of cells from the discrete model with that computed from the solution of the continuum limit, using N (t) =

Figure 16 :
Figure16: Equation learning results for the piecewise proliferation law problem in Figure15, using the results from Table9.(a) Comparison of the averaged discrete densities (solid curves) with the solution of the learned PDE (dashed).The arrow shows the direction of increasing time.The arrow shows the direction of increasing time.The density profiles are shown at the times t = 0, 10, 50, 100, 250, 500 in black, red, blue, green, orange, and purple, respectively.(b)Comparison of the cell numbers.(c) Comparison of the learned form of R(q) with the continuum limit form of R(q).

Figure 17 :
Figure 17: Comparison of the linear diffusion problem with its continuum limit, where F (ℓ i ) = k(a/ℓ i − s) with k = 20, s = 1, η = 1, and a Gaussian initial density.(a) The solid curves are the discrete densities, and the dashed curves are the densities from the solution of the continuum limit.The arrow shows the direction of increasing time.The density profiles are shown at the times t = 0, 0.1, 2, 10, 50, 75, 100 in black, red, blue, green, orange, purple, and brown, respectively.(b) Like in (a), except comparing the leading edges.

Figure 18 :
Figure 18: Equation learning results for the linear diffusion problem in Figure 15, using the results from Table 9.(a) Comparison of the discrete densities (solid curves) with the solution of the learned PDE (dashed).The arrow shows the direction of increasing time.The density profiles are shown at the times t = 0, 0.1, 2, 10, 50, 75, 100 in black, red, blue, green, orange, purple, and brown, respectively.(b) Line in (a), except comparing the leading edges.(c)-(e) shows comparisons of the learned mechanisms with the forms from the continuum limit.

Table 1 :
Parameters used for each case study.The parameters are k, the spring constant; η, the drag coefficient; s, the resting spring length; ∆t, the proliferation duration; β, the intrinsic proliferation rate; K, the carrying capacity density; M , the number of time points; t 1 , the initial time; t M , the final time; n s , the number of identically-prepared realisations; n k , the number of knots used for averaging over realisations;

Table 3 :
Improved results for Case Study 1: Fixed boundaries from Table2, now using matrix pruning so that densities outside of the 10% and 90% density quantiles are not included.Coefficients highlighted in blue show the coefficient chosen to be removed or added at the corresponding step.

Table 4 :
Stepwise equation learning results for Case Study 2: Free boundaries, using the basis expansions

Table 8 :
Stepwise equation learning results for Case Study 2, using the basis expansions