Numerical Study of a Particle Method for Gradient Flows

We study the numerical behaviour of a particle method for gradient flows involving linear and nonlinear diffusion. This method relies on the discretisation of the energy via non-overlapping balls centred at the particles. The resulting scheme preserves the gradient flow structure at the particle level, and enables us to obtain a gradient descent formulation after time discretisation. We give several simulations to illustrate the validity of this method, as well as a detailed study of one-dimensional aggregation-diffusion equations.


Introduction
In this work we introduce a new particle method for approximating the solutions to evolution equations of the form

∂_t ρ = ∇ · ( ρ ∇( H′(ρ(x)) + V(x) + W * ρ(x) ) ), t > 0, x ∈ R^d, ρ(0, ·) = ρ_0(·), (1.1)

where ρ(t, ·) ≥ 0 is the unknown probability measure and ρ_0 is a fixed element of P_2(R^d), the set of Borel probability measures on R^d with bounded second moment. Note that we denote by the same symbol a probability measure and its density, whenever the latter exists. The operator * denotes the convolution, H : [0, ∞) → R is the density of internal energy, V : R^d → R is the confinement potential, and W : R^d → R is the interaction potential. These equations are ubiquitous in many applications, ranging from granular media and porous medium flows to collective behaviour models in mathematical biology and self-assembly, see [6,8,12,36,37] and the references therein. Recent advances in the analysis of equation (1.1) are mainly based on variational schemes using the natural gradient flow structure in the space of probability measures, see e.g. [26,21,31,38,1,15]. We define the continuum energy functional E : P_2(R^d) → R ∪ {−∞, +∞} by

E(ρ) := ∫_{R^d} H(ρ(x)) dx + ∫_{R^d} V(x) dρ(x) + (1/2) ∫_{R^d × R^d} W(x − y) dρ(x) dρ(y)

if ρ ∈ P_{ac,2}(R^d), the internal part being dropped (and E then defined on the whole of P_2(R^d)) if H = 0, where P_{ac,2}(R^d) is the subset of P_2(R^d) of probability measures which are absolutely continuous with respect to the Lebesgue measure. The functions H, V and W satisfy the following hypotheses.
Hypothesis 1. V is a function in C 1 (R d ) and W is a symmetric, locally integrable function in C 1 (R d \ {0}) with W (0) = 0.
The assumption W(0) = 0 is made without loss of generality. Indeed, if W(0) is finite, then W can be shifted "up" or "down" to get W(0) = 0; if W has a singularity at 0, then setting W(0) := 0 does not affect the physical behaviour of a system governed by the potential W. The assumptions that H(0) = 0 and that h is convex and non-increasing imply that the energy E is displacement convex if, for example, V = W = 0, see [26], [25, Section 4] and [38, Theorem 5.15]. Also note that the classical cases H(ρ) = ρ log ρ and H(ρ) = ρ^m/(m − 1) (m > 1) satisfy all the required assumptions. The underlying topology on the probability measures in this paper is given by the quadratic Wasserstein distance d_2, defined between two measures ρ and µ in P_2(R^d) by

d_2(ρ, µ)² := min_{γ ∈ Π(ρ,µ)} ∫_{R^d × R^d} |x − y|² dγ(x, y),

where Π(ρ, µ) is the space of probability measures (also called transport plans) on R^d × R^d with first marginal ρ and second marginal µ. Let us fix a final time T > 0. We say that ρ : [0, T] → P_2(R^d) is a continuum gradient flow solution with initial condition ρ_0 if ρ(0, ·) = ρ_0 and

∂_t ρ(t) = −∇_{P_2(R^d)} E(ρ(t)) (1.2)

holds in the sense of distributions on [0, T] × R^d, see [1, Equation (8.3.8)]. The operator ∇_{P_2(R^d)} is the classical quadratic Wasserstein gradient on P_2(R^d), which takes the explicit form

∇_{P_2(R^d)} E(ρ) = −∇ · ( ρ ∇ (δE/δρ) ),

where δE/δρ = H′(ρ) + V + W * ρ is the first variation density of E at the point ρ. As a by-product of the general theory developed for instance in [1], gradient flow solutions to (1.2) are weak solutions to (1.1). Note that if theoretical issues such as the existence and uniqueness of solutions to the continuum gradient flow (1.2) are of interest, appropriate additional assumptions must be imposed on H, V, W and ρ_0, see [21,38,1].
We propose below to approximate solutions to the continuum gradient flow (1.2) by finite atomic probability measures represented by finite numbers of particles. The basic idea is to restrict the continuum gradient flow to the discrete setting of atomic measures by performing the steepest descent of a suitable approximation of the energy E defined on finite numbers of Dirac masses, and apply a discrete analogue of the JKO scheme for the Fokker-Planck equation proposed by Jordan, Kinderlehrer and Otto [21]. The theoretical underpinning of this method is studied in the companion paper [17], where the convergence of the discrete gradient flow to the continuum one is proved in the framework proposed by Serfaty in [35,34], in the special case of V = W = 0 in one dimension for equally-weighted particles and under additional appropriate hypotheses on H. The goal of the present paper is to give numerical evidence of such convergence and to motivate possible extensions of the theoretical result in [17] to nonzero confinement and interaction potentials (even with possible singularities), as well as to higher dimensions and unequally-weighted particles.
Let us mention that other numerical methods have been developed to conserve particular properties of solutions of the continuity equation (1.1). In [7,12] the authors developed finite-volume methods preserving the decay of energy at the semi-discrete level, along with other important properties like non-negativity and mass conservation. Particle methods for these equations without the diffusion term are known to be convergent under suitable assumptions on the potentials V and W since these results are connected to the question of the mean-field limit and rigorous derivation of the equations from particle trajectories, see [13] for instance. In the case of diffusion equations, particle methods based on suitable regularisations of the flux of the continuity equation (1.1) have been proposed in [32,18,23,24]; note that an early particle method was derived for collisional kinetic equations in [33]. Our steepest descent method is purely variational and is based on regularising the internal part of the energy E by substituting particles by non-overlapping blobs, as we discuss next. Let us mention that the numerical approximation of the JKO variational scheme has already been tackled by different methods using pseudo-inverse distributions in one dimension [19,8,10], diffeomorphisms [16], or solving for the optimal map in a JKO step [5]. Our method avoids these computationally intensive procedures by the approximation of the energy in the discrete setting. Finally, note that gradient-flow-based Lagrangian methods for higher-order, drift diffusion and Fokker-Planck equations have recently been proposed in [29,30,28,22].
The paper is structured as follows. In Section 2 we give a summary of the method and the derivation of the discrete gradient flow in a more general setting than that considered in [17]. Section 3 is dedicated to the derivation of the numerical scheme used to approximate the continuum gradient flow (1.2), that is, an explicit version of the JKO scheme, as well as to a numerical validation study of the scheme via diffusion equations. We stress that, from the scientific computing point of view, this is a preliminary study whose aim is to motivate the presented particle method. Finally, in Section 4 we give the numerical results for various one-dimensional aggregation-diffusion equations; we emphasise that this particle method, despite its simplicity, is able to capture the critical mass for the modified one-dimensional Keller-Segel model.

Particle method and discrete gradient flows
In this method, the underlying probability measure is characterised by the particles' positions (x_1, …, x_N) ∈ (R^d)^N = R^{Nd} and the associated weights (or masses) w := (w_1, …, w_N) ∈ R^N, where N is the total number of particles considered. Throughout this paper, the positions (x_1, …, x_N) evolve in time but the weights w are fixed and such that w_i > 0 and Σ_{i=1}^N w_i = 1. Also, we denote by R^{Nd}_w the space of particles with weights w; that is, x := (x_1, …, x_N) ∈ R^{Nd}_w means that each particle x_i is in R^d and is associated with the weight w_i. Notice the boldface font when referring to elements of R^{Nd}_w.

Remark 2.1. As an important convention in the rest of the paper, whenever particles x ∈ R^{Nd}_w are considered, they are assumed to be distinct, i.e., x_i ≠ x_j if i ≠ j. Moreover, in one dimension, the particles are assumed to be sorted increasingly, i.e., x_{i+1} > x_i for all i ∈ {1, …, N − 1}.
2.1. Discrete gradient flow. Consider N particles x := (x_1, …, x_N) ∈ R^{Nd}_w with fixed weights w. The most natural representation of the underlying probability measure is the empirical measure

µ_x := Σ_{i=1}^N w_i δ_{x_i}, (2.1)

which belongs to the space of atomic measures A_{N,w}(R^d) := { µ_x | x ∈ R^{Nd}_w }.

Definition 2.2 (Discrete energy). We define the discrete energy E_N : A_{N,w}(R^d) → R by

E_N(µ_x) := Σ_{i=1}^N |B_i| H( w_i / |B_i| ) + Σ_{i=1}^N w_i V(x_i) + (1/2) Σ_{i=1}^N Σ_{j≠i} w_i w_j W(x_i − x_j), (2.2)

where B_i is the open ball of centre x_i and radius (1/2) min_{j≠i} |x_i − x_j|, and |B_i| is the volume of B_i.

Note that E_N is finite over the whole of A_{N,w}(R^d) since, by Hypotheses 1 and 2, H, V and W are pointwise finite. The essence of this discrete approximation E_N on A_{N,w}(R^d) of the continuum energy E on P_2(R^d) lies in the treatment of the internal part ∫_{R^d} H(ρ(x)) dx of the energy E, which becomes infinite on point masses; the point mass of each particle is uniformly spread to circumvent this problem. To this end, consider

ρ_N := Σ_{i=1}^N (w_i / |B_i|) χ_{B_i}, (2.3)

where χ_{B_i} is the characteristic function of the ball B_i. Clearly ρ_N is in P_{ac,2}(R^d), and thus the internal part of the energy is well-defined for ρ_N. Note that the representation ρ_N does not involve overlapping of balls, but contains "gaps" between balls whose sizes are expected to decrease as the number of particles increases. Then, using (2.3) for the internal part of the energy and (2.1) for the confinement and interaction parts, we obtain exactly the expression (2.2). Here the diagonal terms in the interaction potential vanish since W(0) = 0 by Hypothesis 1. This choice of non-overlapping particles has the main advantage of reducing the computational cost of the internal part of the discrete energy functional. One can already notice that this approach allows us to treat diffusive effects simultaneously with confinement and interaction ones in a very natural way; this is another advantage of our method, as becomes clearer throughout this paper. Since the expression above depends essentially on x ∈ R^{Nd}_w, we can define the discrete energy equivalently as a function of the particle positions:

Ẽ_N(x) := E_N(µ_x) for all x ∈ R^{Nd}_w. (2.4)

Definition 2.3 (Subdifferential in Hilbert spaces).
Let X be a Hilbert space with inner product ⟨·, ·⟩_X and φ : X → R. We define the subdifferential ∂φ : X → 2^X of φ, for all x ∈ X, by the subset

∂φ(x) := { v ∈ X | φ(y) ≥ φ(x) + ⟨v, y − x⟩_X + o(‖y − x‖_X) as y → x }.

If ∂φ(x) is not empty, then ∂⁰φ(x) is defined to be the unique element in ∂φ(x) with minimal norm. Now, we say that x : [0, T] → R^{Nd}_w is a discrete gradient flow solution with initial condition x⁰ ∈ R^{Nd}_w if

−w · x′(t) ∈ ∂Ẽ_N(x(t)) for almost all t ∈ (0, T] (2.5)

is satisfied, and x(0) = x⁰. Here x′(t) is the velocity of the curve x(t) and w · x′ is the element-wise product (w_1 x′_1, …, w_N x′_N). This formulation is not a standard differential inclusion because of the presence of the weights. To cope with this, we introduce the following inner product on R^{Nd}_w.

Definition 2.4 (Weighted inner product on R^{Nd}_w). For all x, y ∈ R^{Nd}_w we define the weighted inner product between x and y as

⟨x, y⟩_w := Σ_{i=1}^N w_i x_i · y_i. (2.6)

From now on, the Euclidean space R^{Nd}_w is endowed with this inner product. This definition clearly induces the weighted norm ‖x‖_w := ⟨x, x⟩_w^{1/2}. It also induces a slightly more general subdifferential structure: for any functional φ : R^{Nd}_w → R we set

∂_w φ(x) := { z ∈ R^{Nd}_w | w · z ∈ ∂φ(x) },

where again w · z is the element-wise product between w and z; and we can then define the element ∂⁰_w φ(x) with minimal norm accordingly. The discrete gradient flow inclusion (2.5) can now be rewritten more naturally as

−x′(t) ∈ ∂_w Ẽ_N(x(t)) for almost all t ∈ (0, T]. (2.7)

This is the discrete gradient flow structure that is used from now on.
Remark 2.5. In [17] the authors showed that in one dimension with V = W = 0 the discrete energy E N Γ-converges (in the d 2 -topology, see Definition A.3) to the continuum one E if adequate additional assumptions are imposed on H.
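For concreteness, the one-dimensional discrete energy of Definition 2.2 can be sketched in code as follows. This is an illustrative sketch only: the function names and the sample choices of H and V below are ours, not the paper's implementation.

```python
import numpy as np

def discrete_energy(x, w, H, V=None, W=None):
    """Sketch of the discrete energy E_N in one dimension.

    x : strictly increasing particle positions; w : positive weights summing
    to 1. The ball B_i has radius (1/2) min_{j != i} |x_i - x_j|, so in 1D
    its volume |B_i| is the minimum of the two adjacent gaps (with the
    conventions Delta x_1 = Delta x_{N+1} = +infinity at the two ends).
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    gaps = np.diff(x)
    volB = np.minimum(np.concatenate(([np.inf], gaps)),
                      np.concatenate((gaps, [np.inf])))
    energy = np.sum(volB * H(w / volB))          # internal part, int H(rho_N)
    if V is not None:
        energy += np.sum(w * V(x))               # confinement part
    if W is not None:                            # W must be smooth at 0 here
        D = x[:, None] - x[None, :]
        M = np.outer(w, w) * W(D)
        np.fill_diagonal(M, 0.0)                 # W(0) = 0 by Hypothesis 1
        energy += 0.5 * M.sum()
    return float(energy)
```

For instance, the choice H(r) = r log r makes the internal part equal to the discrete entropy Σ_i w_i log(w_i / |B_i|).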
2.2. p-approximated discrete gradient flow. The gradient flow (2.7) is written as a differential inclusion instead of an ordinary differential equation, primarily because the radius (1/2) min_{j≠i} |x_i − x_j| of the balls B_i is not a smooth function of the positions in the definition of the energy E_N. As a result, the energy Ẽ_N is not a smooth function of the positions x when the radius of the ball B_i is determined by more than one particle.
If Ẽ_N is lower semi-continuous and convex, and x⁰ ∈ {y ∈ R^{Nd}_w | ∂_w Ẽ_N(y) ≠ ∅}, it is known by [4, Theorem 1 of Section 3.2] that the discrete gradient flow (2.7) is well-posed and the equation

x′(t) = −∂⁰_w Ẽ_N(x(t))

is satisfied for almost every t ∈ [0, T]. Note that, in our case, the lower semi-continuity of Ẽ_N is trivial to check by the assumptions on H, whereas the convexity is proved in Proposition A.6 for convex potentials V and W in one dimension. Although the element of minimal norm ∂⁰_w Ẽ_N(x(t)) can be computed in some situations, as in [17] (for the special case of equally-weighted particles with internal energy only and in one dimension), the detailed procedure is usually involved and is not convenient for numerical implementation. In theory, one can approximate Ẽ_N by a differentiable function, usually characterised by a parameter, where the minimal norm element of the subdifferential is recovered as the limit of the gradient of the approximation as the parameter goes to infinity. A common choice of such an approximation is

Y^p_N(x) := min_{y ∈ R^{Nd}_w} ( Ẽ_N(y) + (p/2) ‖x − y‖²_w ),

depending on the parameter p > 0. Then Y^p_N is differentiable almost everywhere in R^{Nd}_w, ∇Y^p_N is called the Yosida approximation of ∂_w Ẽ_N, and

∇Y^p_N(x) → ∂⁰_w Ẽ_N(x) as p → ∞,

see [4, Theorem 2 of Section 3.1 and Theorem 4 of Section 3.4]. However, the Yosida approximation requires another optimisation, and is therefore not very well adapted to the computation of ∂⁰_w Ẽ_N(x). An alternative is to approximate the radius of the balls B_i in the definition of E_N by a smooth function.
The following proposition, whose proof is straightforward and left to the reader, justifies the use of the p-approximation of the minimum function.

(1) Let p > 0. Then min_p ∈ C^∞((0, ∞)^s).
(4) Let p > 0. Then min_p(x) ≥ min(x) for all x ∈ (0, ∞)^s, with equality if x_i = x_j for all i, j.
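Since Definition 2.6 itself does not survive in this excerpt, we illustrate properties (1) and (4) with one standard smooth approximation of the minimum, the power mean of order −p. This particular formula is our assumption for illustration, not necessarily the paper's choice.

```python
import numpy as np

def min_p(x, p):
    """Power mean of order -p over positive entries: a smooth approximation
    of the minimum satisfying min_p(x) >= min(x), with equality when all
    entries coincide, and min_p(x) -> min(x) as p -> infinity."""
    x = np.asarray(x, dtype=float)
    return float(np.mean(x ** (-p)) ** (-1.0 / p))
```

For example, min_p((1, 2)) decreases towards 1 as p grows, while remaining above the true minimum for every finite p.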
Now the discrete energy is further approximated using a smoothed radius as below.
Definition 2.8 (p-approximated discrete energy). Let p > 0. The p-approximated discrete energy E^p_N : A_{N,w}(R^d) → R is defined by

E^p_N(µ_x) := Σ_{i=1}^N |B^p_i| H( w_i / |B^p_i| ) + Σ_{i=1}^N w_i V(x_i) + (1/2) Σ_{i=1}^N Σ_{j≠i} w_i w_j W(x_i − x_j). (2.8)

Here B^p_i is the ball of centre x_i with the new radius (1/2) min_p( (|x_i − x_j|)_{j≠i} ). As in (2.4), we can define the p-approximated discrete energy on R^{Nd}_w rather than on A_{N,w}(R^d):

Ẽ^p_N(x) := E^p_N(µ_x) for all x ∈ R^{Nd}_w. (2.9)

We say that x : [0, T] → R^{Nd}_w is a p-approximated discrete gradient flow solution with initial condition x⁰ ∈ R^{Nd}_w if x(0) = x⁰ and

x′(t) = −∇_w Ẽ^p_N(x(t)) for almost all t ∈ (0, T], (2.10)

where the weighted gradient ∇_w is naturally defined on R^{Nd}_w by

∇_w φ(x) := ( (1/w_1) ∇_{x_1} φ(x), …, (1/w_N) ∇_{x_N} φ(x) ).

Remark 2.9. Whenever d = 1 and V and W are convex, we know by Proposition A.6 that Ẽ^p_N is convex, and therefore the p-approximated discrete gradient flow (2.10) is well-posed.
Remark 2.10. The interest of the p-approximation of the discrete gradient flow lies only in the numerical simplicity of coding a gradient descent on an ODE system such as (2.10) rather than on a differential inclusion such as (2.7). The p-approximation is indeed not needed for the theoretical proof given in [17] of the convergence of the discrete gradient flow to the continuum one since, as already mentioned, in that specific case the minimal norm element of ∂_w Ẽ_N is explicitly computable.
Remark 2.11. Approximations of the minimum function other than the one given in Definition 2.6 are possible. The properties of min_p given in Proposition 2.7 are not enough to conclude that the p-approximated discrete gradient flow (2.10) converges to the discrete one (2.7) as p → ∞ in some sense. This still needs to be checked if we want to justify the numerical use of the p-approximated gradient flow. In the next section and in Appendix A we prove that this is the case in dimension one, since there we can exploit the convexity of the discrete energies.
2.3. One-dimensional case.

Definition 2.12 (Inter-particle distance). For any particles x ∈ R^N_w we denote the inter-particle distances by the positive quantities (possibly +∞ by convention)

∆x_i := x_i − x_{i−1} for all i ∈ {1, …, N + 1},

with the convention x_0 = −∞ and x_{N+1} = +∞. We also write, for p > 0,

∆^p_i x := min_p(∆x_i, ∆x_{i+1}) for all i ∈ {1, …, N}.

Since particles are sorted increasingly by convention, the p-approximated discrete energy (2.8) in dimension one can be defined in the simpler form

E^p_N(µ_x) := Σ_{i=1}^N ∆^p_i x H( w_i / ∆^p_i x ) + Σ_{i=1}^N w_i V(x_i) + (1/2) Σ_{i=1}^N Σ_{j≠i} w_i w_j W(x_i − x_j).

The equivalent formulation of the energy Ẽ^p_N on the particles is defined in dimension one accordingly, see (2.9).
In order to check that the p-approximated discrete gradient flow defined above indeed approximates the discrete one, we need a few elements of maximal monotone operator theory (see [3,4,9] for a detailed overview of the theory). These notions and results being quite abstract, they are only given in Appendix A. There we show how to use this theory to prove the convergence of the p-approximated discrete gradient flow to the discrete one in a precise sense, in the case when V and W are convex; we therefore justify the numerical use of the p-approximated gradient flow (2.10) (at least in the case when V and W are convex and d = 1).
Remark 2.13. In this method, one reason why we decide to discretise according to non-overlapping balls, rather than, for example, Voronoi cells (see for instance [20] for a detailed account on Voronoi cells), is to allow for simpler computations of the discrete energies, which in turn gives rise to less costly simulations. In fact, this is not very relevant in one dimension since then the implementation of Voronoi cells is actually not more costly than that of non-overlapping balls; the one-dimensional simulations given below, which are run according to the non-overlapping balls described above, should therefore be seen as an initial validation of the method and as test cases which need to be extended to higher dimensions in a future work.

Numerical scheme and validation
Here a general variational formulation is applied to the discrete setting described above, leading to our numerical scheme giving the particles' positions at discrete time steps (see [1,2,21] for a Fokker-Planck motivation and its generalisation to curves in probability spaces). For the sake of generality the derivation of the scheme is partly done for the discrete gradient flow, rather than the p-approximated one. The simulations given later were, however, performed in the p-approximated setting only. Note also that, in this section, some indices N and p are dropped for clarity.
3.1. Derivation of the scheme. Consider [t_0 = 0, …, t_M = T], a subdivision of the time interval [0, T], and a time-step size ∆t = ∆_n t (possibly adaptive) such that t_n = t_0 + Σ_{i=0}^{n−1} ∆_i t for all n ∈ {1, …, M}. Suppose that, for some n ∈ {0, …, M − 1}, we know µ^n ∈ A_{N,w}(R^d), where µ^n is the empirical measure of an approximated discrete gradient flow solution at time t_n. Then we want to get an approximated discrete gradient flow solution at the time t_{n+1}, that is, µ^{n+1}. To this end we use the JKO scheme, i.e., we choose µ^{n+1} as

µ^{n+1} ∈ argmin_{µ ∈ A_{N,w}(R^d)} ( d_2(µ^n, µ)² / (2∆t) + E_N(µ) ), (3.1)

where we recall that d_2 is the quadratic Wasserstein distance. In the following, any object with sub- or superscript n or n + 1 is associated with the time step n or n + 1, respectively. From now on, apart from a two-dimensional test in Section 3.4.5, the setting is one-dimensional.
3.1.1. Computation of the Wasserstein distance. Let us compute the Wasserstein distance in (3.1). For each n ∈ {0, …, M − 1} and measures µ^n and µ ∈ A_{N,w}(R), we directly get

d_2(µ^n, µ)² = Σ_{i=1}^N w_i |x^n_i − x_i|², (3.2)

where x^n = (x^n_1, …, x^n_N) and x = (x_1, …, x_N) are the particles of µ^n and µ, respectively, which leads to the scheme

µ^{n+1} ∈ argmin_{µ ∈ A_{N,w}(R)} ( (1/(2∆t)) Σ_{i=1}^N w_i |x^n_i − x_i|² + E_N(µ) ). (3.3)

Clearly the scheme (3.3) on the empirical measures can be equivalently rewritten on the particles:

x^{n+1} ∈ argmin_{x ∈ R^N_w} ( (1/(2∆t)) Σ_{i=1}^N w_i |x_i − x^n_i|² + Ẽ_N(x) ). (3.4)

The minimisation problem (3.4) is characterised by

−w · (x^{n+1} − x^n)/∆t ∈ ∂Ẽ_N(x^{n+1}), i.e., −(x^{n+1} − x^n)/∆t ∈ ∂_w Ẽ_N(x^{n+1}), (3.5)

where w · x^{n+1} is the element-wise multiplication between the vectors w and x^{n+1}. When Ẽ_N is convex, (3.5) is precisely the backward Euler scheme of the differential inclusion (2.7).

3.1.2.
Minimisation and final form of the scheme. Let p > 0. We now return to the p-approximated setting, i.e., we consider (3.4) (and equivalently (3.3)) with Ẽ^p_N instead of Ẽ_N (and equivalently E^p_N instead of E_N). We want to minimise, over the whole set R^N_w, the functional in the argmin operator in (3.4) (and (3.3)) to find our approximation x^{n+1} (and µ^{n+1}) at time step n + 1. First note that, for each n ∈ {0, …, M − 1}, the discrete energy Ẽ^p_N can be written in terms of the smoothed inter-particle distances min_p(∆x_i, ∆x_{i+1}), see Definition 2.12. For any n ∈ {0, …, M − 1}, (3.5) becomes

(x^{n+1} − x^n)/∆t = −∇_w Ẽ^p_N(x^{n+1}), (3.6)

where the uninspiring computation of the derivatives in the sum terms is left to the reader. The scheme (3.6) is the implicit Euler scheme of the ODE (2.10), or equivalently, of the ODE

w · x′(t) = −∇ Ẽ^p_N(x(t)),

and it coincides with the JKO scheme for E^p_N, as in (3.1). Since the implicit scheme (3.6) is difficult and costly to solve, its explicit Euler version was used in the numerical examples of this paper:

x^{n+1} = x^n − ∆t ∇_w Ẽ^p_N(x^n). (3.7)

We remind the reader that the convergence of the implicit Euler version of the scheme is presented in [21] in one dimension. The stability analysis of the explicit scheme (3.7) is not dealt with here, as it is not the purpose of the present paper: we are not worried about time-stepping stability issues for large numbers of particles at this preliminary stage of validation of our new approach. We can nevertheless indicate that, intuitively, a time step satisfying ∆t ≤ C/N² for some constant C > 0 should suffice to ensure stability; indeed, this would be in line with the classical CFL condition for diffusion equations, where the mesh size is of order 1/N. Note that this condition was respected in the simulations presented below. Let us also mention that higher-order time discretisations could be used in place of the explicit Euler scheme, and would indeed lead to better time and space accuracy of the method.
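A minimal sketch of one explicit step (3.7) for the pure-diffusion case H(ρ) = ρ log ρ follows. The particular smoothed minimum, the finite-difference gradient, and all names are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def energy_p(x, w, p):
    """p-approximated 1D internal energy for H(r) = r log r: the ball volume
    is a smooth minimum of the two adjacent gaps (here (a^-p + b^-p)^(-1/p),
    one possible choice), with +infinity padding at the two ends."""
    gaps = np.diff(x)
    left = np.concatenate(([np.inf], gaps))
    right = np.concatenate((gaps, [np.inf]))
    vol = (left ** (-p) + right ** (-p)) ** (-1.0 / p)
    return float(np.sum(w * np.log(w / vol)))    # sum_i |B_i| H(w_i / |B_i|)

def explicit_step(x, w, dt, p=10.0, eps=1e-7):
    """One explicit Euler step x^{n+1} = x^n - dt * grad_w E_p(x^n); the
    gradient is obtained by central finite differences (an analytic gradient
    would be used in a real code), and grad_w divides by the weights."""
    g = np.empty_like(x)
    for i in range(len(x)):
        xp, xm = x.copy(), x.copy()
        xp[i] += eps
        xm[i] -= eps
        g[i] = (energy_p(xp, w, p) - energy_p(xm, w, p)) / (2.0 * eps)
    return x - dt * g / w
```

Running such steps on sorted particles spreads them out, as diffusion should; the time step must respect the ∆t ≤ C/N² heuristic discussed above.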
Remark 3.1. It is worth pointing out that our particle method does not aim at being competitive with classical finite-volume and finite-difference schemes for purely diffusive equations. In fact, it aims primarily at being simple to implement, even in higher dimensions, and flexible when considering terms additional to diffusion such as confinement and interaction. In terms of complexity at each time step, the method is of order N², regardless of the space dimension, as is the case for other finite-volume methods [12]. Note, however, that if we time-discretise (2.7) rather than its regularised form (2.10), the complexity becomes significantly higher in more than one dimension (at most of order N³), since we are then required to find the closest neighbour of each particle; in dimension one the order of complexity is still N², thanks to the increasing ordering of the particles.

3.2.
Initialisation of the scheme. Below we give two different ways of approximating the initial profile. Let us first introduce the notion of pseudo-inverse.
3.2.1. Initially equally-weighted particles. If we want to approximate the initial profile ρ_0 ∈ P_2(R) with equally-weighted particles, we need to start with unequally-spaced particles. Consider the pseudo-inverse Φ_0 of the cumulative distribution x ↦ ρ_0((−∞, x]) of the initial profile.
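A sketch of this initialisation, with the pseudo-inverse evaluated by bisection and the particles placed at the midpoint quantile levels (i − 1/2)/N, follows; this placement rule is our illustrative choice, since the excerpt truncates here.

```python
import math

def pseudo_inverse(F, z, lo=-50.0, hi=50.0, tol=1e-12):
    """Phi_0(z) = inf { x : F(x) >= z }, computed by bisection for a
    nondecreasing cumulative distribution F whose quantile lies in [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(mid) >= z:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def init_equal_weights(F, N):
    """N equally-weighted particles (w_i = 1/N) at the midpoint quantiles."""
    return [pseudo_inverse(F, (i - 0.5) / N) for i in range(1, N + 1)]
```

For a standard Gaussian initial profile, F(x) = (1/2)(1 + erf(x/√2)), the resulting particles are symmetric about the origin and denser near it.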

3.2.2.
Initially equally-spaced particles. If now we want to approximate the initial profile ρ_0 ∈ P_2(R) with equally-spaced particles, we need to assign a different weight to each particle. Consider F_0, the cumulative distribution of the initial profile, and x⁰ ∈ R^N_w, some chosen equally-spaced initial particles, i.e., x⁰_i := x⁰_1 + (i − 1)(x⁰_N − x⁰_1)/(N − 1) for all i ∈ {1, …, N}. Then we choose the weights w as

w_1 := F_0( (x⁰_1 + x⁰_2)/2 ), w_i := F_0( (x⁰_i + x⁰_{i+1})/2 ) − F_0( (x⁰_{i−1} + x⁰_i)/2 ) for 1 < i < N, w_N := 1 − F_0( (x⁰_{N−1} + x⁰_N)/2 ). (3.8)

The weight w_i as given in (3.8) for each 1 < i < N is the mass that ρ_0 "assigns" to the interval ( (x⁰_{i−1} + x⁰_i)/2, (x⁰_i + x⁰_{i+1})/2 ], that is, to the Voronoi cell of x⁰_i. Note that the choice we have of the initial positions of the particles x⁰ is not completely free of constraints: they must be chosen inside the support of ρ_0, or the resulting weights are zero. Finally, since the initial particles are chosen to be equally spaced, only the first and last particles' locations are needed to determine the locations of all the others; for the following numerical simulations we give this information in the form of the interval I_init := [x⁰_1, x⁰_N].

Remark 3.3. There are two main advantages to the second initialisation approach. The first is that it allows a better approximation of the initial profile when there is a strong variation in the density; indeed, one has the freedom to place initial particles in less populated regions.
The second one is that there are no gaps between the initial balls of centres x⁰_i and diameters min_p(∆x⁰_i, ∆x⁰_{i+1}), which all equal the common spacing x⁰_2 − x⁰_1, see Proposition 2.7(4), allowing again a better approximation of the initial profile.
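The weight assignment (3.8) can be sketched as follows. Collecting all remaining left and right mass into the first and last weights is one natural convention; the excerpt does not show the paper's exact end-cell convention.

```python
def init_weights_equispaced(F, x):
    """Weights for fixed (equally-spaced) particles x_1 < ... < x_N: each
    w_i is the mass that rho_0 assigns to the Voronoi cell of x_i, with the
    two end cells extending to -infinity and +infinity respectively."""
    mids = [0.5 * (a + b) for a, b in zip(x, x[1:])]   # Voronoi boundaries
    cdf = [F(m) for m in mids]
    return [cdf[0]] + [b - a for a, b in zip(cdf, cdf[1:])] + [1.0 - cdf[-1]]
```

For the uniform density on [0, 1] and particles at 0.125, 0.375, 0.625, 0.875, every Voronoi cell carries mass 1/4, consistent with the remark that particles placed outside the support of ρ_0 would receive zero weight.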

3.3.
Computation of the error. The natural error to compute for our scheme (3.7) is the quadratic Wasserstein error.
3.3.1. Error with respect to an exact solution. If we want to get the error between the numerical and exact solutions, we proceed as follows. Let ρ ∈ P_2(R) be a continuum gradient flow solution at the final time T. Also write x ∈ R^N_w for the approximation of the p-approximated discrete gradient flow solution obtained with the JKO scheme (3.7) at the final time step M, with associated empirical measure µ := µ^M. Then we define the quadratic Wasserstein error by

e_{d_2} := d_2(ρ, µ).

To compute this, one can use the one-dimensional pseudo-inverse characterisation of the quadratic Wasserstein distance. Let us write F and G for the cumulative distributions of ρ and µ, respectively, and Φ and Ψ for their respective pseudo-inverses. Then

d_2(ρ, µ)² = ∫_0^1 |Φ(z) − Ψ(z)|² dz.

We have, for all i ∈ {1, …, N}, Ψ(z) = x_i for z ∈ (Ω_{i−1}, Ω_i], with Ω_i := Σ_{j=1}^i w_j and the convention Ω_0 = 0. Therefore

e_{d_2}² = Σ_{i=1}^N ∫_{Ω_{i−1}}^{Ω_i} |Φ(z) − x_i|² dz. (3.9)

The error can thus be easily determined if the inverse of the cumulative distribution Φ of the exact solution at the final time T is known.
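The error formula (3.9) can be evaluated with any quadrature rule; here is a midpoint-rule sketch (the function name and number of nodes are ours).

```python
import numpy as np

def wasserstein2_error(Phi, x, w, nq=200):
    """d_2(rho, mu) for atomic mu = sum_i w_i delta_{x_i}, against an exact
    solution with quantile function Phi (which must accept NumPy arrays):
    d_2^2 = sum_i int_{Omega_{i-1}}^{Omega_i} |Phi(z) - x_i|^2 dz,
    each cell integral approximated with the midpoint rule on nq nodes."""
    Omega = np.concatenate(([0.0], np.cumsum(w)))
    total = 0.0
    for i, xi in enumerate(x):
        a, b = Omega[i], Omega[i + 1]
        z = a + (b - a) * (np.arange(nq) + 0.5) / nq   # midpoint nodes
        total += (b - a) / nq * np.sum((Phi(z) - xi) ** 2)
    return float(np.sqrt(total))
```

As a sanity check, a single particle of mass one at the origin against the uniform density on [−1/2, 1/2] (so Φ(z) = z − 1/2) gives d_2 = 1/√12.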

3.3.2.
Error with respect to a discrete steady state. If we know that the considered gradient flow has a steady state and we are interested in the stabilisation behaviour of the scheme (3.7), we proceed as follows. Let x* := (x*_1, …, x*_N) ∈ R^N_w be a discrete steady state (obtained by running a simulation for a "large" time) with associated empirical measure µ*, and let x ∈ R^N_w be the approximation of the p-approximated discrete gradient flow solution obtained with the JKO scheme (3.7) at some final time step M, with empirical measure µ := µ^M. Then we define the error by

e*_{d_2} := ‖x − x*‖_w = ( Σ_{i=1}^N w_i |x_i − x*_i|² )^{1/2}, (3.10)

which is nothing but the Euclidean error on the particle space R^N_w, see (2.6).

3.4. Numerical validation: diffusions. Before giving any simulation results, let us summarise practical implementation aspects of the scheme.
• As already justified in Section 3.1, all the following simulations were obtained by implementing the explicit scheme (3.7).
• The parameter p needed for the iteration of this scheme was chosen to be 10 for every simulation. This choice is only justified by the fact that it gave sensible results. Note that taking p "large" is a bad idea. Indeed, if p increases, then the p-approximated discrete gradient flow "approaches" the discrete one, for which the energy Ẽ_N is not differentiable whenever two neighbouring inter-particle distances are equal. This may cause numerical instabilities, as we would expect the inter-particle distances at the centre of a symmetric profile (like a Gaussian) to be exactly equal; this is what we observed for large p in the form of particle oscillations.
• All one-dimensional simulations were initialised in the way described in Section 3.2.2, up to a slight modification for end particles in Section 4.3. In each case, the initial continuum profile and the interval I_init are explicitly given.
3.4.1. The heat equation. The initial profile is shown in Figure 1a. The exact solution is given, for all t > 0, by

ρ(t, x) = (4π(t + t_0))^{−1/2} e^{−x²/(4(t+t_0))} for all x ∈ R,

whose cumulative distribution F and its pseudo-inverse Φ are

F(t, x) = (1/2) ( 1 + erf( x / (2√(t + t_0)) ) ), Φ(t, z) = 2√(t + t_0) erf^{−1}(2z − 1),

where erf is the error function. The error at the final time T can then be found using (3.9) and some quadrature formula to approximate the integrals therein, see Figure 1b.

3.4.2. The porous medium equation. Here H(ρ) = ρ^m/(m − 1) with m > 1. The exact self-similar solution is the Barenblatt profile

ρ(t, x) = (t + t_0)^{−α} ( K − κ x² (t + t_0)^{−2α} )_+^{1/(m−1)}, α = 1/(m + 1),

where the subscript + stands for the positive part, κ = (m − 1)/(2m(m + 1)) and K is determined by the conservation of mass. Indeed, since the total conserved mass is one, the constant K can be expressed explicitly in terms of the Gamma function Γ. Then one can verify that the profile above is a solution for all t > 0, see [37, Section 4.4]. On the support of the solution, i.e., for |x| ≤ (t + t_0)^α √(K/κ), the cumulative distribution F and its pseudo-inverse can be computed in closed form. The error can then be found using (3.9) and approximating the integrals therein, see Figure 2b.

3.4.3. The heat equation with confinement V(x) = x²/2. In this case, regardless of the initial condition, there is a steady state for all x ∈ R. In Figure 3b we can see that the stabilisation of the scheme towards the discrete steady state (which we arbitrarily define as being the discrete solution obtained at T = 6) is linear on a semi-log plot, with a slope very close to −2 which seems not to depend on the number of particles N. We can therefore write the error as e*_{d_2} = O(e^{−2T}), see (3.10).

3.4.4. The porous medium equation with confinement V(x) = x²/2. In this case, regardless of the initial profile and since the total mass is one, the steady state is a compactly supported Barenblatt-type profile, see [14] and references therein; this is nicely confirmed numerically by Figure 4b with m = 2.

3.4.5.
A two-dimensional test: the heat equation. It is straightforward to generalise the scheme (3.7), derived in Section 3.1, to higher dimensions whenever the expression (3.2) for the Wasserstein distance can be used, which is the case when the approximation µ n+1 at time step n + 1 is sufficiently close to µ n , i.e., when the time step ∆t is small enough. Indeed, let us emphasise that the Wasserstein distance is computed between empirical measures on points, and not between their approximations on non-overlapping balls (which are only used to define the diffusion part of the regularised discrete energy functional, see (2.2)). When the time step is small enough, the Wasserstein distance approximation (3.2) is exact, since then the splitting of mass between empirical measures possibly happening in higher-dimensional optimal transport actually does not occur.
We test our scheme for the heat equation in two dimensions. The initial continuum density is ρ_0(x) = (4πt_0)^{−1} e^{−|x|²/(4t_0)} with t_0 = 0.125, and the particle positions at T = 1 are shown in Figure 5b. The data were initialised by fixing the particles on a regular grid as in Figure 5a, with weights being the integrals of the continuum density ρ_0 on the Voronoi cells generated from the particles. The averaged quantities along the time evolution, like the second moment Σ_{i=1}^N w_i |x_i|² and the entropy Σ_{i=1}^N w_i log( w_i / |B_i| ), seem to be very accurate, as shown in Figure 6, and this accuracy does not seem to degenerate with time. However, representing the numerical solution in this two-dimensional test is a delicate issue, since the gaps between discretisation balls are significant; this is an issue in itself which we do not deal with here.
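The discrete diagnostics just mentioned can be computed directly from the particles; here is a two-dimensional sketch, with |B_i| = π r_i² and r_i = (1/2) min_{j≠i} |x_i − x_j| (names are ours).

```python
import numpy as np

def diagnostics_2d(x, w):
    """Second moment sum_i w_i |x_i|^2 and discrete entropy
    sum_i w_i log(w_i / |B_i|) for 2D particles on non-overlapping disks,
    where |B_i| = pi r_i^2 with r_i = (1/2) min_{j != i} |x_i - x_j|."""
    x = np.asarray(x, dtype=float)   # shape (N, 2)
    w = np.asarray(w, dtype=float)
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))
    np.fill_diagonal(dist, np.inf)   # exclude i = j from the minimum
    r = 0.5 * dist.min(axis=1)
    volB = np.pi * r ** 2
    m2 = float(np.sum(w * (x ** 2).sum(-1)))
    entropy = float(np.sum(w * np.log(w / volB)))
    return m2, entropy
```

For four equally-weighted particles at the corners of the unit square, each disk has radius 1/2, so the entropy is log(1/π) and the second moment is 1.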

4.1. The modified Keller-Segel equation.
We start by considering a modified one-dimensional Keller-Segel equation, that is the continuum gradient flow (1.2) with H(ρ) = ρ log ρ and W (x) = 2χ log |x| (and W (0) := 0), where χ > 0 is a parameter quantifying the attraction (see [11,8] for well-posedness and qualitative properties, and [10] for the approximation of this equation by a different particle method).
This model exhibits a critical behaviour depending only on the chemosensitivity strength χ, like the classical Keller-Segel model in two dimensions: there is a dichotomy between blow-up in finite time and global existence, determined solely by χ being larger or smaller than 1, see [11,8]. In the case χ < 1, solutions spread in time, behaving like self-similar solutions. To capture the leading-order profile given by the self-similar solution, a time-space scaling is performed for χ < 1, which is equivalent to imposing a quadratic confinement potential on the particles, see [8]. The long-time behaviour in this subcritical case is given by the profile of the self-similar solution.
4.1.1. Theoretical properties. We show that our particle approximation approximately preserves the criticality of the original Keller-Segel model at the discrete level. Namely, we show that there exist two positive constants χ_1 and χ_2(N) such that the following holds: if an appropriate confinement potential V is considered, the discrete and the p-approximated discrete gradient flows (2.7) and (2.10) for the modified Keller-Segel equation have steady states for χ < χ_1; while if V = 0, the p-approximated discrete gradient flow exhibits finite-time blow-up for χ > χ_2(N). Quite surprisingly, χ_1 happens to be exactly the critical parameter at the continuum level, i.e., χ_1 = 1, and χ_2(N) tends to 1 as N → ∞ and does not depend on p (but only on N and the chosen weights). By the term "blow-up" at the discrete level we mean the event of two, or more, particles colliding. Also note that in the following the term "maximal time of existence" indicates either the first time when two or more particles of a solution collide, i.e., the first blow-up time, or the first time when the norm of a solution equals +∞.
First, let us prove that the discrete and p-approximated discrete confined Keller-Segel equations exhibit no particle collisions when χ < χ 1 .
Proposition 4.1. Consider the discrete gradient flow corresponding to the confined Keller-Segel equation with V coercive and such that the function x → inf y∈R (w 1 V (y) + w N V (x + y)) − log |x| is coercive. Suppose there exists a solution x to such gradient flow, emanating from an initial condition x 0 ∈ R N w , up to some maximal time of existence, say T * > 0. If χ < χ 1 := 1, then no particles of x can collide in [0, T * ); furthermore, the minimal inter-particle distance is uniformly bounded from below in time by a positive constant.
Proof. The energy E N is a Lyapunov functional, i.e., where r i (t) = min(∆x i (t), ∆x i+1 (t)). Writing log − x := log x if 0 < x < 1 and log − x := 0 if x ≥ 1, and using that log is increasing, Hence, writing log + x := log x if x ≥ 1 and log + x := 0 if 0 ≤ x < 1, and by (4.2), using that − log + r i (t) ≥ − log + (x N (t) − x 1 (t)) for all i ∈ {1, . . . , N }. From the assumptions on V we know that V is bounded from below and also that w 1 V (x 1 (t)) + w N V (x N (t)) − log + (x N (t) − x 1 (t)) is bounded from below uniformly in time. Therefore there exists a constant C ∈ R, independent of t, such that using (4.1). We see that if χ < χ 1 = 1, the minimal inter-particle distances r i (t) cannot become arbitrarily small, for otherwise the energy E N (x(t)) would exceed its initial value E 0 , violating the fact that the system is a gradient flow. Hence the result.
Remark 4.2. The proof above is given only for the discrete case; however, note that it can be easily adapted to the p-approximated one, if p ≥ 1, by Proposition 2.7(5) with s = 2, and without the need to change the constant χ 1 .
We can now show global existence in time and the existence of steady states for the discrete and p-approximated discrete confined Keller-Segel equations. Proof. Let χ < χ 1 = 1. Suppose there exists such a solution x, emanating from an initial condition x 0 ∈ R N w , defined on [0, T * ), see Proposition 4.1, and assume that x N (t) − x 1 (t) → +∞ as t → T * . For all t ∈ [0, T * ), the proof of Proposition 4.1 implies that there exists a t-independent constant C 1 ∈ R such that f (x N (t) − x 1 (t)) ≤ C 1 . By the growth assumption at infinity on V , we have f (x N (t) − x 1 (t)) → +∞ as t → T * , which implies that the inequality above is violated for a time t close enough to T * . Therefore x N (t) − x 1 (t) cannot diverge as t → T * , and thus there exists C 2 ∈ R, uniform in t, such that log + (x N (t) − x 1 (t)) ≤ C 2 for all t ∈ [0, T * ). Therefore, which, by coercivity of V , shows that x 1 (t) and x N (t) cannot diverge. Thus, there exists some constant ℓ > 0, independent of time, such that {x 1 (t), . . . , x N (t)} ⊂ B(0, ℓ) for all t ∈ [0, T * ). This, together with the no-collision result in Proposition 4.1, shows that the maximal time of existence of the discrete gradient flow solution is T * = ∞.
Finally, the functional E N is lower semi-continuous and bounded from below on its sublevel sets, which are compact due to (4.3). Indeed, the previous argument shows that . Moreover, the same argument leading to (4.3) implies that for all x ∈ R N w such that E 0 ≥ E N (x), since χ < 1. Therefore, by the direct method of the calculus of variations, we get that E N has a global minimiser, which ends the proof.
Therefore the evolution of the second moment is linear in time with slope 2(1 − χ). This slope is negative if χ > 1, which implies that M 2 (t) becomes zero in finite time, leading to concentration of mass in finite time and contradicting the assumption of global existence. We now show that our p-approximated discrete gradient flow (2.10) preserves this finite-time blow-up property for some numerical critical parameter χ 2 (N ), at least when all particles have the same weight.
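This linearity makes the blow-up time explicit. As a short reconstruction consistent with the slope quoted above (writing M 2 (t) for the second moment of ρ(t, ·)):

```latex
\frac{\mathrm{d}}{\mathrm{d}t} M_2(t) = 2(1-\chi)
\qquad\Longrightarrow\qquad
M_2(t) = M_2(0) + 2(1-\chi)\,t .
```

Since M 2 (t) ≥ 0, for χ > 1 the solution cannot remain globally defined beyond t * = M 2 (0)/(2(χ − 1)).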
Recall that, at the discrete level, we define blow-up as the event of two, or more, particles colliding. Proof. Suppose that there exists x, a p-approximated discrete Keller-Segel gradient flow solution emanating from an initial condition x 0 ∈ R N w and defined on some maximal interval of existence [0, T * ). Let us compute the evolution of the second moment of µ N , the empirical measure associated to x, at any t ∈ [0, T * ).
In the following we drop the dependencies on time for the sake of simplicity. Suppose T * = ∞. We want to find a contradiction for χ > χ 2 (N ) by computing explicitly the evolution of the second moment in (4.5). Write ∆ j i := 1 + (∆x j /∆x i ) p for all i, j ∈ {1, . . . , N } with i ≠ j, recalling the convention ∆x 1 = ∆x N +1 = +∞ and setting ∆ 1 0 = ∆ N +1 N +2 = +∞. By (2.10), where W̃ := log | · |. First, compute A 1 by appropriately rearranging the sum terms, Also, Hence, by combining the last two computations, we get Then, compute A 2 by using the anti-symmetry of W̃ ′ (x) = 1/x, Therefore, for all t ∈ [0, T * ) = [0, ∞), Hence the evolution of the second moment is linear with a negative slope, since by assumption χ > χ 2 (N ). Since the second moment must remain non-negative, this contradicts the fact that the maximal time of existence is T * = ∞, and therefore the solution x exists only up to a finite time: T * < ∞. At exactly that time, only two things may happen: either the norm of the solution equals +∞, i.e., |x| w = +∞, or two or more particles collide. The first option is ruled out since the second moment of an empirical measure is trivially finite at all times. We are thus left with the collision of particles, that is, x has to blow up in finite time.
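The cancellation used in the computation of A 2 can be checked numerically. The following sketch (with arbitrary illustrative positions and weights; `Wtilde` stands for the W̃ above) verifies the identity obtained by symmetrising the double sum with W̃ ′ (x) = 1/x:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 6
x = np.sort(rng.normal(size=N))   # distinct illustrative particle positions
w = rng.random(N)
w /= w.sum()                      # normalised weights

# Left-hand side: sum_{i != j} w_i w_j x_i * Wtilde'(x_i - x_j), Wtilde'(z) = 1/z.
lhs = sum(w[i] * w[j] * x[i] / (x[i] - x[j])
          for i in range(N) for j in range(N) if i != j)

# Symmetrising in (i, j): (1/2) sum_{i != j} w_i w_j (x_i - x_j)/(x_i - x_j),
# so the interaction contribution collapses to a constant.
rhs = 0.5 * sum(w[i] * w[j] for i in range(N) for j in range(N) if i != j)

assert abs(lhs - rhs) < 1e-9
```

With equal weights w i = 1/N this constant equals (N − 1)/(2N ), which is the source of the N -dependence of χ 2 (N ).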
Finally, renew ∆ n t := min i∈{1,...,N } δ i , compute the positions x n+1 i with the new ∆ n t, and start over, repeating as long as ∆ n t > 10 −7 ; when ∆ n t ≤ 10 −7 we stop the simulation. We took ∆ 0 t = 10 −5 as the initial time-step size. In Figures 7, 8 and 9 the simulations shown stopped due to this adaptive time-step procedure. In Figure 7 one can see that the scheme we used captures nicely the formation of the blow-up for a supercritical parameter χ. Figure 8a shows that the evolution of the second moment is linear for some range of time with a slope that deviates slightly from the theoretical one, as expected from Proposition 4.6 and its proof. Actually, by comparing (4.4) and (4.6), it can be seen that this slope deviation decreases as N increases. For larger times, as the blow-up is approached and the time step refined, the numerical slope deviates even more from the theoretical one. Figure 8b shows the trajectories of the particles up to the first blow-up time, defined numerically as the first time when the distance between two particles gets smaller than some chosen number d min , or equivalently when the adaptive time-step size ∆ n t gets smaller than 10 −7 , see above. A possible procedure to continue the evolution after the first blow-up time is to merge particles into a new, heavier one whenever the distance between them gets smaller than a certain threshold, say proportional to d min ; the weight of the new particle is then chosen to be the sum of the weights of the merged particles, and its position the barycentre of their positions. This procedure might give an idea of how the particles behave after the first blow-up, but we found it not very accurate, since the post-collision trajectories strongly depend on the choice of the threshold, which is arbitrary. We found that the analysis of the post-collision behaviour is very delicate without clear criteria for deciding when to merge particles and how many simultaneously.
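The merging heuristic just described can be sketched as follows. This is a schematic one-dimensional implementation, not the code used for the figures; the threshold `d_min` and the left-to-right merge order are precisely the arbitrary choices discussed above:

```python
def merge_collided(x, w, d_min):
    """One pass of the merging heuristic: whenever two neighbouring particles
    are closer than d_min, replace them by a single particle carrying the sum
    of their weights, located at their weighted barycentre.  The positions x
    are assumed sorted; passes are repeated until no pair is closer than d_min."""
    x, w = list(x), list(w)
    merged = True
    while merged:
        merged = False
        for i in range(len(x) - 1):
            if x[i + 1] - x[i] < d_min:
                total = w[i] + w[i + 1]
                # Barycentre of the two merged particles.
                x[i] = (w[i] * x[i] + w[i + 1] * x[i + 1]) / total
                w[i] = total
                del x[i + 1], w[i + 1]
                merged = True
                break
    return x, w
```

For instance, `merge_collided([0.0, 0.005, 1.0], [0.3, 0.3, 0.4], d_min=0.01)` returns two particles, the left one of weight 0.6 located at the barycentre 0.0025; note that total mass is conserved by construction.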
We thus leave this issue for further analysis and future work. Figure 9 shows the result for a continuum two-bump initial profile, with I init = [−4.5, 4.5] and t 0 = 0.25. Figure 9 shows the possible formation of several Dirac masses, according to how much attraction is involved in the system and how many bumps are present initially. The more attraction there is, the more Dirac masses seem to form. Note that the two peaks in Figure 9b are actually of the same height despite a display artifact; also, even if not clear from Figure 9b, the two peaks get slightly closer to each other in time, but blow up before merging. Each curve in Figure 10 is a good approximation of a steady state for the modified Keller-Segel equation with nonlinear diffusion. For each m, the steady state is different; as m tends to 1 the steady state "squeezes" and looks as if it is approaching a Dirac mass, which is the "steady state" of the modified Keller-Segel equation with linear diffusion studied in Section 4.1.

4.3. A compactly supported potential with nonlinear diffusion. We consider here the continuum gradient flow when H(ρ) = ρ m /(m − 1), m > 1, V = 0 and W (x) = −c max(1 − |x|, 0) + c, c > 0, is a compactly supported interaction potential. In Figure 11 the considered continuum initial profile is a uniform distribution on the interval [−2, 2]. Here I init = [−2, 2] and the end particles were set to have weights equal to 0.001. Figure 11a shows the formation of a metastable state made of two bumps, while Figure 11b shows how this metastable state breaks into a single-bump steady state. This behaviour, for exactly this interaction potential (up to the multiplicative constant c), was already noted in [12, Example 3] using a finite-volume scheme for gradient flows. The bumps are actually supposed to be disconnected, since no particles were found numerically in between, although this does not seem to be the case in the plots. This is because the particle of each bump which is closest to the origin is not located at the actual boundary of the exact solution's corresponding bump, so the line connecting the two bumps shows a positive rather than zero density. Visually, a better separation of the bumps can be obtained by increasing the number of particles in the simulation.
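To make the mechanism behind the metastable two-bump state concrete, here is a minimal sketch of this interaction potential (the value c = 1 is an illustrative choice): W vanishes at the origin, rises linearly up to |x| = 1, and is flat beyond, so two particles further than 1 apart exert no force on each other, which is what allows well-separated bumps to persist for long times.

```python
def W(x, c=1.0):
    """Compactly supported attractive interaction potential from the text:
    W(x) = -c * max(1 - |x|, 0) + c.  Attractive for |x| < 1 (linear slope
    of magnitude c), constant equal to c for |x| >= 1 (zero force)."""
    return -c * max(1.0 - abs(x), 0.0) + c
```

In particular W ′ vanishes identically outside [−1, 1], so the two bumps interact only through their closest particles until they drift within unit distance of each other.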
Appendix A. Convergence of the p-approximated gradient flow in one dimension In this appendix we give the proof of the convergence of the p-approximated gradient flow (2.10) to the discrete one (2.7) in one dimension, see Section 2.3. Before doing so we need to recall some notions and results from monotone operator theory.
If A : {y ∈ X | Ay ≠ ∅} ⊂ X → X * , where X is a Banach space and X * its dual, is a maximal monotone operator, then the graph of A is given by graph(A) = {(x, y) | x ∈ {z ∈ X | Az ≠ ∅}, y ∈ Ax}.
Definition A.1 (Graph convergence). Let X be a reflexive Banach space, (A k ) k∈N a sequence of maximal monotone operators from X to X * , and A a maximal monotone operator from X to X * . We say that (A k ) k∈N graph-converges to A if, for every (x, y) ∈ graph(A), there exists a sequence ((x k , y k )) k∈N with (x k , y k ) ∈ graph(A k ) for all k ∈ N such that x k → x strongly in X and y k → y strongly in X * as k → ∞.
Remark A.2. One can check that if φ is a lower semi-continuous, convex function from a reflexive Banach space to R, then ∂φ is a maximal monotone operator, see [3, Section 3.7.1].
For the sake of completeness we give the definition of Γ-convergence. Definition A.3 (Γ-convergence). Let (X k ) k∈N be a sequence of metric spaces, each endowed with a distance d, and (φ k ) k∈N be a sequence of functionals φ k : X k → R for all k ∈ N. We say that (φ k ) k∈N Γ-converges to φ : X → R if the following two conditions are met for all u ∈ X: (i) ("liminf" condition) φ(u) ≤ lim inf k→∞ φ k (u k ) for all sequences (u k ) k∈N with u k ∈ X k for all k ∈ N and d(u k , u) → 0 as k → ∞; (ii) ("limsup" condition) lim sup k→∞ φ k (u k ) ≤ φ(u) for some sequence (u k ) k∈N with u k ∈ X k for all k ∈ N and d(u k , u) → 0 as k → ∞.
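A standard textbook example (not taken from the text) may help fix ideas: on X = X k = R with the usual distance,

```latex
\varphi_k(u) = \sin(k u) \ \xrightarrow{\ \Gamma\ }\ \varphi(u) \equiv -1 ,
```

since the "liminf" condition is immediate from sin ≥ −1, while for the "limsup" condition one picks recovery points u k of the form (2πm k − π/2)/k, with m k chosen so that u k → u. In particular the Γ-limit differs from the pointwise limit, which here does not even exist.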
Theorem A.4 connects the notion of Γ-convergence to that of graph-convergence in finite dimension. It is a consequence of [3,Theorem 3.66] and the fact that Γ-convergence is equivalent to Mosco convergence in finite dimension. Indeed, in a general dimensional setting, Mosco convergence means that the "liminf" condition of Γ-convergence holds for the weak topology and the "limsup" condition holds for the strong topology, see [27,Definition 2.2] and [3,Definition 3.17].
Theorem A.4. Let X be a finite-dimensional Banach space. Let (φ k ) k∈N be a sequence of lower semi-continuous, convex functions with φ k : X → R for all k ∈ N, and φ : X → R a lower semicontinuous, convex function. If (φ k ) k∈N Γ-converges to φ, then (∂φ k ) k∈N graph-converges to ∂φ.
Theorem A.5. Let X be a Hilbert space. Let (φ k ) k∈N be a sequence of lower semi-continuous, convex functions with φ k : X → R for all k ∈ N and φ : X → R be a lower semi-continuous, convex function. Suppose that (∂φ k ) k∈N graph-converges to ∂φ. Consider the following differential inclusions, for all k ∈ N.
It is not hard to see that E p N , see (2.11), Γ-converges to E N , see (2.4), as p → ∞. Furthermore, it is easily verified that E p N (x 0 ) → E N (x 0 ) as p → ∞, where x 0 ∈ R N w is taken here to be the initial condition for both the discrete and p-approximated discrete gradient flows (2.7) and (2.10). Therefore, in order to apply Theorems A.4 and A.5 combined, we are only left with checking that E p N and E N are lower semi-continuous and convex. Lower semi-continuity is straightforward to verify from the assumptions on H, V and W , whereas convexity is shown below under the assumption that V and W are convex. Actually, the following proposition shows the convexity of E N only, but the result also holds for E p N , since the proof is easily adapted from the classical minimum function to its p-approximation.
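The Γ-convergence of E p N to E N ultimately rests on the convergence, as p → ∞, of the smooth p-approximation of the minimum to the exact minimum. As an illustration only, assuming here a power-mean form (a −p + b −p ) −1/p for the smooth minimum (the exact p-approximation used in (2.11) is defined in Section 2.3 and may differ), this convergence is easy to observe numerically:

```python
def min_p(a, b, p):
    """Power-mean approximation of min(a, b) for a, b > 0, written in the
    numerically stable form m * (1 + (m/M)**p)**(-1/p), with m = min(a, b)
    and M = max(a, b), to avoid overflow of a**(-p) for large p.
    (Illustrative form only; not necessarily the one used in (2.11).)"""
    m, M = min(a, b), max(a, b)
    return m * (1.0 + (m / M) ** p) ** (-1.0 / p)
```

Since (a −p + b −p ) 1/p is the ℓ p -norm of (1/a, 1/b), which is non-increasing in p, min_p(a, b, p) increases monotonically to min(a, b) as p → ∞, the kind of monotone pointwise behaviour that makes the Γ-convergence statement plausible.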
Proposition A.6. Let d = 1 and the confinement and interaction potentials V and W be convex. Then the discrete energy E N defined in (2.4) is convex.
Proof. Let λ ∈ [0, 1] and x, y ∈ R N w . Since min is concave on R 2 and h is non-increasing and convex on (0, ∞), the composition h • min is convex on (0, ∞) 2 , where • is the composition operator. Define, for all a, b ∈ R, [a, b] λ := λa + (1 − λ)b, and set r i (x) = min(∆x i , ∆x i+1 ) and r i (y) = min(∆y i , ∆y i+1 ). Therefore, since V and W are convex, Hence the convexity of E N .
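The central inequality of the proof can be displayed explicitly: since each ∆x i is linear in x, concavity of min gives r i ([x, y] λ ) ≥ [r i (x), r i (y)] λ , and therefore

```latex
h\bigl(r_i([x,y]_\lambda)\bigr)
\;\le\; h\bigl([\,r_i(x),\,r_i(y)\,]_\lambda\bigr)
\;\le\; \lambda\, h\bigl(r_i(x)\bigr) + (1-\lambda)\, h\bigl(r_i(y)\bigr),
```

using first that h is non-increasing and then that h is convex; summing over i and adding the (convex) V and W contributions yields the convexity of E N .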
Theorem A.4 now tells us that ∂ w E p N graph-converges to ∂ w E N as p → ∞, and Theorem A.5 tells us in which sense the p-approximated discrete gradient flow converges to the discrete one; it also gives us some regularity of the discrete and p-approximated discrete gradient flow solutions. The use of the p-approximated gradient flow (2.10) to approximate (2.7) is therefore justified (at least when V and W are convex and d = 1).