Existence and approximation of probability measure solutions to models of collective behaviors

In this paper we consider first order differential models of collective behaviors of groups of agents based on the mass conservation equation. Models are formulated taking the spatial distribution of the agents as the main unknown, expressed in terms of a probability measure evolving in time. We develop an existence and approximation theory of the solutions to such models and we show that some recently proposed models of crowd and swarm dynamics fit our theoretical paradigm.


Introduction
This paper deals with mathematical models of collective behaviors of groups of interacting agents, such as human crowds and swarms. The reference framework is that of first order differential models ruled by the principle of conservation of mass and supplemented by a kinematic description of the behavioral strategy developed by the agents. The number of individuals is assumed to be finite, though arbitrarily large, their state being represented by their position evolving in time in the Euclidean space R^d, d ≥ 1. Rather than looking at the agents individually, we abstract their spatiotemporal evolution into that of a suitable probability measure, representing the law of their positions understood as random variables. This allows us to provide a unified theory for both discrete and continuous models.
There are three main contributions of this paper. (i) We outline a basic set of modeling assumptions, which allow us to prove the existence of probability measure solutions to a broad class of models of collective behaviors of the kind described above. (ii) Under the very same assumptions, complemented with a suitable condition on the time and space discretization, we provide a convergence result of an ad hoc numerical scheme, originally proposed in [9], for approximating the solutions to such models. (iii) We reinterpret the rendez-vous, swarm, and crowd dynamics models developed in [4,5,8,9] in the light of our probabilistic description, showing that they comply with the above modeling assumptions and are therefore in the scope of our existence and approximation theory.
In more detail, the paper is organized as follows. After this Introduction, Section 1.1 briefly introduces and explains the main notations and notions used throughout the other sections. Section 2 proposes a probabilistic interpretation of the dynamics of systems of interacting agents, discussing both the indefinite mass conservation equation and the related Cauchy problems. Then it presents the modeling assumptions and offers an overview of the results proved in the subsequent sections. Section 3 deals with the existence of solutions and Section 4 with the convergence of the approximation scheme. These two sections may be skipped by readers not interested in the technical details of the proofs. Section 5 addresses the above-cited crowd and swarm dynamics models, showing that they fit the theory in all cases interesting for the applications. Finally, Section 6 gives an example of how ODE-based discrete models can be explicitly recovered from our measure-theoretic framework. In addition, it shows by means of numerical tests the convergence of the computational scheme discussed in Section 4 to the ODE solutions of such models.

Notations and background
In this section we quickly review the main notations and notions that we will extensively use in the paper.
Functions and function spaces. We denote by C(A; B) the space of continuous functions f : A → B. The set B is usually omitted if it is R. Coherently, we denote by C_c^∞(R^d) the space of real-valued infinitely differentiable functions with compact support in R^d.

The indicator function of a set A ⊆ R^d is denoted by ½_A, i.e., ½_A(x) = 1 if x ∈ A and ½_A(x) = 0 otherwise.
Norms. We use |·| for the Euclidean norm in R^d and · for the corresponding inner product. The open Euclidean ball with center x ∈ R^d and radius R ≥ 0 is denoted by B_R(x). If f : R^d → R^m is bounded, its sup norm is ‖f‖_∞ := sup_{x ∈ R^d} |f(x)|.

Measures. Let B(R^d) be the Borel σ-algebra on R^d. If µ is a measure on the measurable space (R^d, B(R^d)) and f : R^d → R^d is Borel, the integral of f w.r.t. µ over a measurable set A is denoted by ∫_A f dµ, or by ∫_A f(x) dµ(x) when it is necessary to emphasize the variable of integration. The Lebesgue measure in R^d is denoted L^d. However, for integrals with respect to L^d we will prefer the usual symbol dx to dL^d(x). If f is Borel, we denote by f#µ the push forward of µ through f. Specifically, f#µ is the measure defined by the relation

  (f#µ)(E) := µ(f^{-1}(E)) for all E ∈ B(R^d). (1)

Probability spaces. We denote by P_1(R^d) the space of probability measures µ on (R^d, B(R^d)) whose first moment is finite, i.e., ∫_{R^d} |x| dµ(x) < +∞. The space P_2(R^d), that we will also occasionally mention, is defined analogously using the second moment. Given two probability measures µ, ν ∈ P_1(R^d), their Wasserstein distance is defined to be

  W_1(µ, ν) := sup { ∫_{R^d} ϕ d(µ − ν) : ϕ ∈ Lip_1(R^d) },

where Lip_1(R^d) is the space of Lipschitz continuous functions ϕ : R^d → R with Lipschitz constant Lip(ϕ) ≤ 1. It can be shown that W_1 is a metric on P_1(R^d) and that (P_1(R^d), W_1) is complete (see e.g., [1, Proposition 7.1.5]). Finally, to deal with curves in P_1(R^d) parameterized by time, [0, T] ∋ t ↦ µ_t ∈ P_1(R^d), we introduce the space C([0, T]; P_1(R^d)), which is complete with the metric dist(µ_•, ν_•) := sup_{t ∈ [0, T]} W_1(µ_t, ν_t).
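On the real line the Wasserstein distance between uniform empirical measures admits a classical closed form: it is the mean absolute difference of the sorted samples. The following minimal sketch (illustration only, not part of the theory above) may help fix ideas about the metric on P_1:

```python
import numpy as np

def wasserstein_1d(xs, ys):
    """W1 distance between two uniform empirical measures on R with the
    same number of atoms. For sorted samples it reduces to the mean
    absolute difference (a standard one-dimensional identity)."""
    xs = np.sort(np.asarray(xs, dtype=float))
    ys = np.sort(np.asarray(ys, dtype=float))
    assert xs.shape == ys.shape, "equal numbers of atoms assumed"
    return float(np.mean(np.abs(xs - ys)))

# Translating a measure by c moves it by exactly c in W1:
# wasserstein_1d([0.0, 1.0], [0.5, 1.5]) -> 0.5
```

Note that sorting makes the result independent of the labeling of the atoms, consistently with the indistinguishability of the agents discussed below.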

Problem statement and main results
In this section we present our approach to the modeling of systems of interacting agents by means of probability measures, and we give an overview of our results.

Probabilistic description of systems of interacting agents
Crowds and swarms can be thought of, in abstract, as systems of N interacting agents in the physical space R^d. The evolution of such systems in time is described by tracing the agent positions at successive instants. Assume that the position of the i-th agent at time t is a random variable X^i_t from a fixed (i.e., time-independent) abstract probability space (Ω, F, P) to the measurable space (R^d, B(R^d)). The probability P is naturally transported by X^i_t onto the new probability µ_t := X^i_t#P on (R^d, B(R^d)), called the law of X^i_t. The fact that µ_t does not depend on the agent label i means that agents are indistinguishable: the probability of finding a certain agent somewhere in R^d at time t is the same for all agents, namely, given A ∈ B(R^d), P(X^i_t ∈ A) = µ_t(A) for every i = 1, …, N.

Let us now fix a measurable set A ⊆ R^d and count the average number of agents contained in A at time t. This amounts to introducing the new random variable Y_{t,A} : Ω → N defined as

  Y_{t,A}(ω) := Σ_{i=1}^N ½_A(X^i_t(ω))

and taking its expectation, that we can compute as follows:

  E[Y_{t,A}] = Σ_{i=1}^N E[½_A(X^i_t)] = Σ_{i=1}^N P(X^i_t ∈ A) = N µ_t(A).

Notice that E[Y_{t,•}], thought of as a map on B(R^d), is a finite positive measure, say m_t, over (R^d, B(R^d)). It is straightforward to identify m_t with the mass of the system at time t. From the above calculation we see that m_t is proportional to the probability measure of the distribution of the agents: m_t = N µ_t. In particular, m_t(R^d) = N for all t, i.e., the total mass of the system is constant in time.
The latter observation suggests that we can assume the principle of conservation of mass, stating that the mass of any measurable set A may change in time only because of inflow or outflow of mass across the boundary ∂A. In other words, mass is neither created nor destroyed but only moved across the domain. This is expressed by postulating the continuity equation (or mass conservation equation) for the evolution of the measure m_t:

  ∂m_t/∂t + ∇ · (m_t v_t) = 0, (2)

where v_t(x) is the velocity at time t at the point x ∈ R^d. In systems of interacting agents the velocity is likely to be affected by the distribution of the agents themselves. Due to the proportionality between m_t and µ_t, this implies that v_t may ultimately depend on the probability µ_t. We assume in particular that v_t depends on t only through µ_t itself, i.e., that the system is autonomous. Finally, we write v_t = v[µ_t] to emphasize such a structure of the velocity and notice that Eq. (2) can be converted into an evolution equation for the probability µ_t:

  ∂µ_t/∂t + ∇ · (µ_t v[µ_t]) = 0. (3)

Cauchy problems
By supplementing Eq. (3) with an initial condition µ̄, the following Cauchy problem is obtained:

  ∂µ_t/∂t + ∇ · (µ_t v[µ_t]) = 0 in R^d × (0, T],  µ_0 = µ̄, (4)

which models the spatiotemporal evolution of the agent distribution starting from the initial configuration described by µ̄. Derivatives in problem (4) are meant in the sense of distributions, which leads us to consider the following notion of weak solution:

Definition 1 (Weak solution). A map µ_• ∈ C([0, T]; P_1(R^d)) is a weak solution to problem (4) if µ_0 = µ̄ and, for every test function φ ∈ C_c^∞(R^d), the mapping t ↦ ∫_{R^d} φ dµ_t is absolutely continuous in [0, T] with

  d/dt ∫_{R^d} φ dµ_t = ∫_{R^d} ∇φ · v[µ_t] dµ_t for a.e. t ∈ (0, T). (5)

As far as the probabilistic interpretation is concerned, we take the initial condition into account by understanding µ_t in (4) as the law of the random variable X̂^i_t := E[X^i_t | X^i_0], the expectation of X^i_t conditioned on its initial value X^i_0. In practice, µ_t is reinterpreted as the distribution of the position of the i-th agent subject to the distribution of the corresponding initial position. From the theory of conditional expectation, we know that X̂^i_t is a function of X^i_0, i.e., there exists a Borel mapping γ_t : R^d → R^d such that X̂^i_t = γ_t(X^i_0), whence

  µ_t = γ_t#µ̄. (6)

In order for Eq. (6) to result in a representation formula for the solutions of problem (4), a more precise characterization of the function γ_t is needed. Formally, we plug Eq. (6) into Eq. (5) and compute

  ∫_{R^d} φ dµ_t − ∫_{R^d} φ dµ̄ = ∫_0^t ∫_{R^d} ∇φ(γ_τ(x)) · v[γ_τ#µ̄](γ_τ(x)) dµ̄(x) dτ

for an arbitrarily fixed test function φ. Next we notice that the integrand at the right-hand side can be read as the derivative w.r.t. τ of the function φ(γ_τ(x)), provided we identify ∂γ_τ(x)/∂τ with v[γ_τ#µ̄](γ_τ(x)). Under this assumption we get

  ∫_{R^d} φ dµ_t − ∫_{R^d} φ dµ̄ = ∫_0^t ∫_{R^d} d/dτ φ(γ_τ(x)) dµ̄(x) dτ

and further, interchanging the order of integration at the right-hand side,

  ∫_{R^d} φ dµ_t − ∫_{R^d} φ dµ̄ = ∫_{R^d} ( φ(γ_t(x)) − φ(γ_0(x)) ) dµ̄(x).

With the additional condition γ_0(x) = x (i.e., γ_0 is the identity function in R^d), this shows that µ_t represented by Eq. (6) is formally a weak solution to the Cauchy problem (4). To sum up, γ_t has been characterized as a function such that

  ∂γ_t(x)/∂t = v[γ_t#µ̄](γ_t(x)),  γ_0(x) = x, (7)

for every x ∈ R^d. In transport theory, such a function is called a flow map. The physical interpretation is that γ_t(A) is the configuration assumed by the set A at time t > 0 when transported by the velocity field v. Alternatively, for x ∈ R^d the mapping t ↦ γ_t(x) is the trajectory of system (7) issuing from x.
The method of representing solutions to the Cauchy problem (4) via flow maps is called the method of characteristics. To develop our theory we will mostly prefer a different approach, better suited to treating, by common ideas, both existence and approximation of solutions to the models we are interested in. The reader interested in the method of characteristics is referred to [3] and references therein for further details.

Basic assumptions and results overview
Models based on Eq. (3) require specifying the velocity, namely its dependence on the probability µ_t and on the space variable x. Rather than considering a specific model, we outline here a small set of assumptions, which may be regarded as modeling guidelines, whence the whole theory will follow.
Assumption 1 (Properties of v). We assume that the velocity field v = v[µ](x) satisfies the following properties.
(i) Uniform boundedness: there exists V > 0 such that

  |v[µ](x)| ≤ V for all x ∈ R^d and all µ ∈ P_1(R^d).

(ii) Lipschitz continuity: there exists a constant Lip(v) > 0, independent of both the space variable and the probability, such that

  |v[µ](x) − v[ν](y)| ≤ Lip(v) ( W_1(µ, ν) + |x − y| ) for all x, y ∈ R^d and all µ, ν ∈ P_1(R^d).

(iii) Linearity w.r.t. the measure for convex combinations:

  v[αµ + (1 − α)ν] = α v[µ] + (1 − α) v[ν] for all α ∈ [0, 1] and all µ, ν ∈ P_1(R^d).

From this basic set of hypotheses we will be able to prove, in Section 3, existence of solutions to the Cauchy problem (4) in the appropriate weak sense of Definition 1. More precisely, to this end we need the further technical assumption that the initial condition have finite first and second order moments: Assumption 2 (Initial condition). We assume that µ̄ ∈ P_1(R^d) ∩ P_2(R^d).
Then, our main result in Section 3 reads: Theorem (cf. Theorem 5). Under Assumptions 1, 2 there exists a weak solution to problem (4).
Notice that Assumption 2 is readily satisfied if, for instance, µ̄ has compact support. Indeed, in such a case supp µ̄ is bounded, i.e., there exists a ball B_R(0) of sufficiently large radius R > 0 such that supp µ̄ ⊆ B_R(0), whence, for every p ≥ 0,

  ∫_{R^d} |x|^p dµ̄(x) = ∫_{B_R(0)} |x|^p dµ̄(x) ≤ R^p < +∞.

The compactness of the support of µ̄ makes perfect sense from the modeling point of view; indeed, a crowd or a swarm spread over the whole space would be quite unrealistic. Assumption 2 is therefore not restrictive for our purposes.
In Section 4 we turn our attention to the approximation of solutions to problem (4). We introduce a sequence of grids in R^d × [0, T] with mesh parameters h_k in space and Δt_k in time. The index k relates to the grid refinement, in such a way that h_k, Δt_k → 0 when k → ∞. Specifically, we consider the numerical scheme proposed in [9], which at each time step seeks an approximation of µ_t via a probability measure absolutely continuous w.r.t. the Lebesgue measure and piecewise constant in space. By introducing a linear-in-time interpolation of such approximate solutions and passing to the limit k → ∞, we obtain the following convergence result: Theorem (cf. Theorem 12). Under Assumptions 1, 2, suppose that h_k = o(Δt_k) for k → ∞. If the sequence of approximate solutions converges to some µ_• ∈ C([0, T]; P_1(R^d)) when the grid is refined, then µ_• is a weak solution to problem (4).
Note that convergence under grid refinement is an assumption of this theorem. In this respect, the result resembles the Lax–Wendroff theorem for the numerical approximation of hyperbolic conservation laws (see e.g., [7]). However, we anticipate that if there exists a bounded subset of R^d that, at each time step and for every level of refinement of the grid, contains the supports of all the approximate solutions, then the sequence does converge in C([0, T]; P_1(R^d)) to some limit, which is then a weak solution to problem (4) (cf. Corollary 13).
In Section 5 we apply the above theory to the models of swarm and crowd dynamics presented in [5,8]. As shown in [6], these models can be derived from the common framework provided by Eq. (3), with a velocity field of the form

  v[µ](x) = v_d(x) + ∫_{R^d} f(|y − x|) r(y − x) χ_{U_x}(y) dµ(y),

where the integral expresses the interactions among the agents. Assuming some minimal regularity of the functions v_d, f, r, χ_{U_x}, we prove that this velocity complies with Assumption 1 in case of both isotropic and anisotropic interactions, cf. Sections 5.1 and 5.2, respectively. As a consequence, for the above-mentioned models we deduce existence of probability measure solutions, which can be duly approximated via the numerical scheme discussed in Section 4.
Our results are exemplified in Section 6, where we show that, given a purely atomic initial measure µ̄, i.e., µ̄ = (1/N) Σ_{i=1}^N δ_{x̄_i} for points x̄_1, …, x̄_N ∈ R^d, a solution to problem (4) can be found by solving a system of ODEs whose unknowns are the trajectories of the agents. In addition, using a numerical solution of these ODEs as a benchmark, we are able to visualize the convergence of the numerical scheme presented in Section 4.

Existence of solutions
This section is devoted to a constructive proof of the existence of solutions to the Cauchy problem (4). Under Assumptions 1, 2, the solution is constructed as the limit of a suitable sequence of curves in P_1(R^d) parameterized by time t. Let (Δt_k)_{k≥0} be a sequence of time steps such that Δt_k → 0 when k → ∞. We consider the measures (µ^k_n)_{n≥0} generated recursively as

  µ^k_{n+1} = γ^k_n#µ^k_n, n = 0, …, N_k − 1,  µ^k_0 = µ̄, (8)

where N_k ∈ N is such that N_k Δt_k = T and γ^k_n is the (one-step) flow map

  γ^k_n(x) = x + v[µ^k_n](x) Δt_k. (9)

It can be shown (see e.g., [9]) that (8) is the explicit time discretization of (4) at the time instants t^k_n = nΔt_k. In particular, the mapping γ^k_n results from the explicit time discretization of problem (7). Since µ̄ is a probability measure, by induction it is immediate to check that so are all of the µ^k_n's. By linear interpolation in time, we define the following curves:

  M^k_t := ((t^k_{n+1} − t)/Δt_k) µ^k_n + ((t − t^k_n)/Δt_k) µ^k_{n+1} for t ∈ [t^k_n, t^k_{n+1}], n = 0, …, N_k − 1.

Obviously, M^k_t is a probability measure for each t ∈ [0, T] and each k ≥ 0.
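For a purely atomic measure, the push forward in scheme (8) reduces to moving the atoms by the one-step flow map (9). The following minimal sketch illustrates this; the attractive interaction velocity used below is a hypothetical placeholder, not one of the paper's models:

```python
import numpy as np

def euler_step(X, v, dt):
    """One step of scheme (8)-(9): mu_{n+1} = gamma_n # mu_n with
    gamma_n(x) = x + v[mu_n](x) * dt. For mu_n = (1/N) sum_i delta_{X_i},
    the push forward simply transports every atom along the frozen field."""
    return X + dt * v(X)

# Hypothetical interaction velocity: attraction toward the empirical mean,
# v[mu](x) = integral of (y - x) dmu(y) = mean(mu) - x.
def v_attract(X):
    return X.mean(axis=0) - X

X = np.array([[0.0], [2.0]])           # two agents on the line
X1 = euler_step(X, v_attract, dt=0.5)  # atoms contract toward their mean
```

Iterating `euler_step` up to n = N_k produces exactly the sequence (µ^k_n) in the atomic case, which is the situation revisited in Section 6.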
We will use the curves M^k_• to construct, in the limit k → ∞, a weak solution to problem (4). The proof is divided into two parts, each of which proceeds through a series of technical intermediate steps developed in the next two sections. First, we show that the measures M^k_• converge to a limit; then we show that this limit satisfies Eq. (5).

Convergence of the measures M^k_t
In this section we prove that, when k → ∞ and up to subsequences, M^k_• converges to a limit µ_• in C([0, T]; P_1(R^d)).
We begin by establishing the necessary regularity properties of the iterates µ^k_n and the curves M^k_•.

Lemma 1. We have µ^k_n ∈ P_1(R^d) ∩ P_2(R^d) for all n = 1, …, N_k and all k ≥ 0. In addition, first and second moments of the µ^k_n's are uniformly bounded in n and k.

Proof. (i) Let us begin by considering the case p = 1. Note that, by (8), (9) and Assumption 1-(i),

  ∫_{R^d} |x| dµ^k_{n+1}(x) = ∫_{R^d} |γ^k_n(x)| dµ^k_n(x) ≤ ∫_{R^d} |x| dµ^k_n(x) + V Δt_k,

and summing telescopically over n we get

  ∫_{R^d} |x| dµ^k_n(x) ≤ ∫_{R^d} |x| dµ̄(x) + V T. (10)

Since µ̄ ∈ P_1(R^d), this implies µ^k_n ∈ P_1(R^d) for all n = 1, …, N_k and all k ≥ 0. But the right-hand side of (10) is independent of both n and k, hence this also provides the uniform bound for p = 1.
(ii) We argue analogously for the case p = 2. We note that

  |γ^k_n(x)|² ≤ |x|² + 2V Δt_k |x| + V² Δt_k²

and that

  ∫_{R^d} |x|² dµ^k_{n+1}(x) = ∫_{R^d} |γ^k_n(x)|² dµ^k_n(x).

Moreover, using the bound (10) we deduce

  ∫_{R^d} |x|² dµ^k_{n+1}(x) ≤ ∫_{R^d} |x|² dµ^k_n(x) + 2V Δt_k ( ∫_{R^d} |x| dµ̄(x) + V T ) + V² Δt_k²

and, summing telescopically over n,

  ∫_{R^d} |x|² dµ^k_n(x) ≤ ∫_{R^d} |x|² dµ̄(x) + 2V T ∫_{R^d} |x| dµ̄(x) + 2V² T² + V² T Δt_k,

whence the claims of the lemma follow also for p = 2.
Lemma 2. For all k ≥ 0,

  W_1(M^k_s, M^k_t) ≤ V |t − s| for all s, t ∈ [0, T], (11)

i.e., the curves M^k_• ∈ C([0, T]; P_1(R^d)) are Lipschitz continuous uniformly in k, with first and second moments of M^k_t uniformly bounded in t and k.

Proof. (i) For t ∈ [t^k_n, t^k_{n+1}], M^k_t is a convex combination of µ^k_n and µ^k_{n+1}. Hence, using Eq. (10), we obtain that the first moment of M^k_t is bounded by the right-hand side of (10). From the arbitrariness of t ∈ [0, T], k ≥ 0 our claim follows, along with the uniform boundedness of the first moment of M^k_t in both t and k.

(ii) We prove now the estimate (11). Let s, t ∈ [0, T] and assume, without loss of generality, that s ≤ t. Then there exist two integers m and n, with 0 ≤ m ≤ n ≤ N_k, such that t^k_m ≤ s ≤ t^k_{m+1} and t^k_n ≤ t ≤ t^k_{n+1}. Therefore we can write, owing to the triangle inequality,

  W_1(M^k_s, M^k_t) ≤ W_1(M^k_s, µ^k_{m+1}) + Σ_{i=m+1}^{n−1} W_1(µ^k_i, µ^k_{i+1}) + W_1(µ^k_n, M^k_t).

Notice that, since γ^k_n displaces every point by at most V Δt_k,

  W_1(µ^k_n, µ^k_{n+1}) ≤ V Δt_k, (12)

and moreover W_1(M^k_s, µ^k_{m+1}) = (t^k_{m+1} − s) W_1(µ^k_m, µ^k_{m+1})/Δt_k and analogously W_1(µ^k_n, M^k_t) = (t − t^k_n) W_1(µ^k_n, µ^k_{n+1})/Δt_k. Thus, according to Eq. (12),

  W_1(M^k_s, M^k_t) ≤ V (t^k_{m+1} − s) + V (t^k_n − t^k_{m+1}) + V (t − t^k_n) = V (t − s).

This proves the claim; the uniform boundedness of the second moment of M^k_t in both t and k follows from Lemma 1 by convexity. We are now ready to prove the main convergence result of this section.
Proposition 3. The sequence (M^k_•)_{k≥0} is relatively compact in C([0, T]; P_1(R^d)); in particular, up to subsequences, it converges to some µ_• ∈ C([0, T]; P_1(R^d)).

Proof. (i) Equicontinuity follows from the estimate (11). Indeed, let ε > 0; then for δ = ε/(2V), which does not depend on k, we have |t − s| < δ ⇒ W_1(M^k_s, M^k_t) ≤ ε/2 < ε.

(ii) According to [1, Proposition 7.1.5], the relative compactness of {M^k_t}_{k≥0} in P_1(R^d) is equivalent to the fact that {M^k_t}_{k≥0} be tight and have uniformly integrable first moments. (ii-a) Using [1, Remark 5.1.5], a sufficient condition for tightness is the existence of a function ϕ : R^d → [0, +∞], with compact sublevels, such that sup_{k≥0} ∫_{R^d} ϕ dM^k_t < +∞. Taking ϕ(x) = |x| and invoking Lemma 2 we see that this condition is fulfilled, hence {M^k_t}_{k≥0} is tight. (ii-b) A sufficient condition for the uniform integrability of the first moments of {M^k_t}_{k≥0} is the existence of p > 1 such that sup_{k≥0} ∫_{R^d} |x|^p dM^k_t(x) < +∞. From Lemma 2 we know that this actually holds for p = 2.

(iii) By the Ascoli–Arzelà Theorem, up to subsequences we obtain that the sequence (M^k_•)_{k≥0} converges in C([0, T]; P_1(R^d)), and we are done.

The limit µ • solves problem (4)
In the previous section we have constructed a map µ_• as the limit of the sequence (M^k_•)_{k≥0}. In this section we prove that such a µ_• is a weak solution to problem (4). To this end, we first derive an equation solved by the M^k_t's, then we pass to the limit k → ∞ thanks to Proposition 3.
Fix a test function φ ∈ C_c^∞(R^d). A Taylor expansion of φ∘γ^k_n with Lagrange's remainder gives

  φ(γ^k_n(x)) = φ(x) + Δt_k ∇φ(x) · v[µ^k_n](x) + (Δt_k²/2) D²φ(x̂) v[µ^k_n](x) · v[µ^k_n](x),

where D²φ is the Hessian of φ and x̂ is a point of the segment connecting x and x + v[µ^k_n](x)Δt_k. Hence the previous computation specializes as

  ∫_{R^d} φ dµ^k_{n+1} = ∫_{R^d} φ dµ^k_n + Δt_k ∫_{R^d} ∇φ · v[µ^k_n] dµ^k_n + (Δt_k²/2) ∫_{R^d} D²φ(x̂) v[µ^k_n] · v[µ^k_n] dµ^k_n. (13)

We claim now that the mapping t ↦ ∫_{R^d} φ dM^k_t is Lipschitz continuous, hence a.e. differentiable by Rademacher's Theorem. To see this, observe that x ↦ φ(x)/Lip(φ) is Lipschitz continuous with Lipschitz constant at most 1, so that

  | ∫_{R^d} φ dM^k_t − ∫_{R^d} φ dM^k_s | ≤ Lip(φ) W_1(M^k_s, M^k_t) ≤ V Lip(φ) |t − s|.

Thus, using Eq. (13), we compute the derivative for t ∈ (t^k_n, t^k_{n+1}):

  d/dt ∫_{R^d} φ dM^k_t = (1/Δt_k) ( ∫_{R^d} φ dµ^k_{n+1} − ∫_{R^d} φ dµ^k_n ) = ∫_{R^d} ∇φ · v[µ^k_n] dµ^k_n + (Δt_k/2) ∫_{R^d} D²φ(x̂) v[µ^k_n] · v[µ^k_n] dµ^k_n. (14)

Let us consider now that M^k_t is a convex combination of µ^k_n and µ^k_{n+1}, and invoke Assumption 1-(iii) to get

  v[M^k_t] = ((t^k_{n+1} − t)/Δt_k) v[µ^k_n] + ((t − t^k_n)/Δt_k) v[µ^k_{n+1}].

Using this in (14) gives an identity of the form

  d/dt ∫_{R^d} φ dM^k_t = ∫_{R^d} ∇φ · v[M^k_t] dM^k_t + R^k_t(φ), (15)

where R^k_t(φ) collects the remainder terms. Formally we can regard this expression as an equation satisfied by M^k_t. The remainder of this section relates this equation to Eq. (5).
Set H := ‖D²φ‖_∞ = Lip(∇φ) and notice that, owing to Assumption 1-(i), (ii), the function x ↦ ∇φ(x) · v[µ](x) is bounded and Lipschitz continuous, uniformly with respect to µ. Consequently, the remainder in (15) can be bounded in terms of H, V, Lip(v) and Δt_k, and it vanishes as k → ∞. Integrating Eq. (15) between 0 and t ≤ T, using this bound, then manipulating the left-hand side and taking the limit for k → ∞, we obtain the limit relation (17). To infer from (17) that µ_t solves (5), we need the following convergence result.
Lemma 4. Along the subsequence given by Proposition 3 we have, for every φ ∈ C_c^∞(R^d) and every t ∈ [0, T],

  ∫_{R^d} φ dM^k_t → ∫_{R^d} φ dµ_t and ∫_{R^d} ∇φ · v[M^k_t] dM^k_t → ∫_{R^d} ∇φ · v[µ_t] dµ_t as k → ∞.

Proof. (i) For the first limit we write

  | ∫_{R^d} φ d(M^k_t − µ_t) | ≤ Lip(φ) W_1(M^k_t, µ_t)

and, up to passing to a suitable subsequence of (M^k_•)_{k≥0}, we conclude by applying Proposition 3.

(ii) For the second limit we preliminarily observe that the integrand ∇φ · v[M^k_t] is bounded and Lipschitz continuous uniformly in k, and the thesis follows from Eq. (18).

Combining Eq. (17) and Lemma 4, and thanks to the arbitrariness of φ, we obtain that µ_• solves problem (4) in the sense of Definition 1. In conclusion, we have proved:

Theorem 5. Under Assumptions 1, 2 there exists a weak solution µ_• ∈ C([0, T]; P_1(R^d)) to the Cauchy problem (4).

Approximation of the solutions
This section is devoted to a convergence analysis of the numerical scheme proposed in [9] for the approximation of the solutions to problem (4). We begin by sketching the main ideas underlying the construction of the scheme, referring the reader to the above-cited paper for a detailed derivation. The scheme is obtained from a twofold approximation of problem (4), in time and in space. To this goal, we introduce a discretization of the time interval [0, T] by means of discrete instants t^k_n = nΔt_k, where the index n ranges from 0 to a value N_k such that N_k Δt_k = T and the time step Δt_k > 0 tends to 0 when k → ∞. By this discretization, we obtain the discrete-time dynamical system (8). Then, we introduce a space discretization in the following way. We define a pairwise disjoint partition {E^k_i}_{i ∈ Z^d} of R^d made of measurable elements for all k ≥ 0. For the sake of simplicity, we assume that the E^k_i's are hypercubes of edge length h_k > 0, with h_k → 0 when k → ∞. Specifically,

  E^k_i := Π_{j=1}^d [ (i_j − 1/2) h_k, (i_j + 1/2) h_k ), i = (i_1, …, i_d) ∈ Z^d.

Note that the index k identifies the level of refinement of the numerical grid. Using this space discretization, we approximate both the initial condition µ̄ and the measures µ^k_n by means of piecewise constant measures λ^k_n ≪ L^d, n = 0, …, N_k, k ≥ 0. More precisely, dλ^k_n = ρ^k_n dx with

  ρ^k_n = Σ_{i ∈ Z^d} ρ^n_i ½_{E^k_i}, ρ^n_i ≥ 0.

The space discretization makes it necessary to approximate also the flow map γ^k_n defined in Eq. (9). This is accomplished by the mapping

  γ̃^k_n(x) := x + v[λ^k_n](x^k_i) Δt_k for x ∈ E^k_i,

x^k_i being a point of the grid cell E^k_i (e.g., its center x^k_i = ih_k). In practice, the velocity v[µ^k_n](x) is approximated by the piecewise constant field ṽ^k_n(x) taking in E^k_i the value that v, computed w.r.t. the measure λ^k_n, takes in x^k_i. Notice that |ṽ^k_n(x)| ≤ V for all x ∈ R^d, all n = 0, …, N_k, and all k ≥ 0 because of Assumption 1-(i).
By imposing λ^k_{n+1}(E^k_i) = (γ̃^k_n#λ^k_n)(E^k_i) for each i ∈ Z^d, we deduce the following explicit-in-time scheme relating recursively the coefficients {ρ^n_i} at two successive time steps:

  ρ^{n+1}_i = (1/h_k^d) Σ_{j ∈ Z^d} ρ^n_j L^d( E^k_j ∩ (γ̃^k_n)^{-1}(E^k_i) ), (19)

where h_k^d is L^d(E^k_i). To start up the scheme one has to provide the coefficients ρ^0_i, that we obtain as

  ρ^0_i = µ̄(E^k_i)/h_k^d. (20)

We are now ready to start the analysis of the proposed scheme. To this end, we begin with a simple property of the approximation measures.
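In dimension one the cell-wise translation γ̃^k_n moves each interval rigidly, so the overlap volumes in scheme (19) can be computed exactly. The following simplified sketch (uniform grid indexed from 0; mass transported outside the computational window is dropped) gives a concrete reading of one step of the scheme:

```python
import numpy as np

def scheme_step_1d(rho, h, dt, vel):
    """One step of the grid scheme (19) in dimension one.

    Cell E_j = [j*h, (j+1)*h) carries density rho[j] and is translated
    rigidly by vel[j]*dt (velocity frozen at the cell point x_j); its mass
    rho[j]*h is redistributed to the cells it overlaps, proportionally to
    the overlap length, as prescribed by Eq. (19)."""
    new = np.zeros_like(rho)
    n = len(rho)
    for j in range(n):
        a = j * h + vel[j] * dt          # left endpoint of the image of E_j
        b = a + h                        # right endpoint
        i0 = int(np.floor(a / h))
        i1 = int(np.floor((b - 1e-12) / h))
        for i in range(max(i0, 0), min(i1, n - 1) + 1):
            overlap = min(b, (i + 1) * h) - max(a, i * h)
            new[i] += rho[j] * overlap / h
    return new

rho = np.array([1.0, 0.0, 0.0, 0.0])   # one loaded cell, h = 1
out = scheme_step_1d(rho, h=1.0, dt=1.0, vel=np.full(4, 0.5))
# a half-cell shift splits the mass evenly: out = [0.5, 0.5, 0.0, 0.0]
```

Away from the boundary of the window the total mass Σ_i ρ_i h is conserved, in agreement with Lemma 6 below.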
Lemma 6. λ^k_n is a probability measure for all n = 0, …, N_k and all k ≥ 0.

Proof. (i) For n = 0 we have ρ^0_i ≥ 0 and, by Eq. (20),

  λ^k_0(R^d) = Σ_{i ∈ Z^d} ρ^0_i h_k^d = Σ_{i ∈ Z^d} µ̄(E^k_i) = µ̄(R^d) = 1,

where we have used the σ-additivity of µ̄.

(ii) If we assume now that λ^k_n is a probability measure for a certain n, using Eq. (19) we get ρ^{n+1}_i ≥ 0 for all i and moreover

  λ^k_{n+1}(R^d) = Σ_{i ∈ Z^d} ρ^{n+1}_i h_k^d = Σ_{j ∈ Z^d} ρ^n_j Σ_{i ∈ Z^d} L^d( E^k_j ∩ (γ̃^k_n)^{-1}(E^k_i) ) = Σ_{j ∈ Z^d} ρ^n_j L^d(E^k_j) = 1,

where we have used the fact that the sets (γ̃^k_n)^{-1}(E^k_i) are pairwise disjoint and that set operations, e.g., union, commute with the inverse image of a function. By induction on n, and by the arbitrariness of k ≥ 0, the claim follows.
The next result shows that the link between two successive measures λ^k_n, λ^k_{n+1} is a push forward as defined by Eq. (1), provided we restrict test functions to simple functions adapted to the spatial grid {E^k_i}_{i ∈ Z^d}, i.e., piecewise constant functions s : R^d → R of the form

  s = Σ_{i ∈ Z^d} s_i ½_{E^k_i}, s_i ∈ R.

Lemma 7. For every simple function s adapted to the grid,

  ∫_{R^d} s dλ^k_{n+1} = ∫_{R^d} s d(γ̃^k_n#λ^k_n).

Proof. We have

  ∫_{R^d} s dλ^k_{n+1} = Σ_{i ∈ Z^d} s_i λ^k_{n+1}(E^k_i) = Σ_{i ∈ Z^d} s_i (γ̃^k_n#λ^k_n)(E^k_i).

Notice that

  s(γ̃^k_n(x)) = Σ_{i ∈ Z^d} s_i ½_{(γ̃^k_n)^{-1}(E^k_i)}(x),

i.e., s∘γ̃^k_n is piecewise constant over the sets (γ̃^k_n)^{-1}(E^k_i), which form a pairwise disjoint partition of R^d. Therefore, for any given A ∈ B(R^d),

  ∫_A s∘γ̃^k_n dλ^k_n = Σ_{i ∈ Z^d} s_i λ^k_n( A ∩ (γ̃^k_n)^{-1}(E^k_i) ).

For A = R^d this enables us to continue the previous computation as

  Σ_{i ∈ Z^d} s_i λ^k_n( (γ̃^k_n)^{-1}(E^k_i) ) = ∫_{R^d} s∘γ̃^k_n dλ^k_n = ∫_{R^d} s d(γ̃^k_n#λ^k_n)

and to obtain the thesis.
An immediate consequence of this lemma is the following result, that will be fundamental for the sequel.

Lemma 8. For all Lipschitz continuous ϕ : R^d → R, all n = 0, …, N_k − 1, and all k ≥ 0,

  | ∫_{R^d} ϕ dλ^k_{n+1} − ∫_{R^d} ϕ d(γ̃^k_n#λ^k_n) | ≤ 2√d Lip(ϕ) h_k.
Proof. We consider the simple function s := Σ_{i ∈ Z^d} ϕ(x^k_i) ½_{E^k_i}, which satisfies |ϕ(x) − s(x)| ≤ √d Lip(ϕ) h_k for all x ∈ R^d, and, using Lemma 7, we compute:

  | ∫_{R^d} ϕ dλ^k_{n+1} − ∫_{R^d} ϕ d(γ̃^k_n#λ^k_n) | ≤ | ∫_{R^d} (ϕ − s) dλ^k_{n+1} | + | ∫_{R^d} (ϕ − s) d(γ̃^k_n#λ^k_n) |.

Notice that γ̃^k_n#λ^k_n is a probability measure. The Lipschitz continuity of ϕ entails that each term at the right-hand side is bounded by √d Lip(ϕ) h_k and, due to the arbitrariness of ϕ and n, the thesis follows.
To address the convergence of the measures λ^k_n, it is convenient to introduce the following linear-in-time interpolation:

  Λ^k_t := ((t^k_{n+1} − t)/Δt_k) λ^k_n + ((t − t^k_n)/Δt_k) λ^k_{n+1} for t ∈ [t^k_n, t^k_{n+1}], n = 0, …, N_k − 1, (21)

with Λ^k_t a probability measure for all t ∈ [0, T] and all k ≥ 0. This definition is reminiscent of the definition of M^k_t (cf. Section 3). In order to prove a convergence result about Λ^k_t, from now on we make the following assumption:

Assumption 3 (Space-time grid). h_k = o(Δt_k) as k → ∞.

As a consequence, there exists a sequence (β_k)_{k≥0} ⊂ R, with β_k > 0 for all k and lim_{k→∞} β_k = 0, such that h_k = β_k Δt_k. Since all these sequences converge, it further results sup_{k≥0} h_k, sup_{k≥0} Δt_k, sup_{k≥0} β_k < +∞.
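For densities stored on a common grid, the linear-in-time interpolation (21) is a plain convex combination of consecutive iterates, which is why nonnegativity and unit mass are preserved. A small sketch of this construction:

```python
import numpy as np

def interpolate(rho_n, rho_np1, t, t_n, dt):
    """Linear-in-time interpolation between consecutive iterates, as in the
    definition (21) of Lambda^k_t (and of M^k_t in Section 3):
    Lambda_t = ((t_{n+1} - t)/dt) * lambda_n + ((t - t_n)/dt) * lambda_{n+1}
    for t in [t_n, t_{n+1}]. Convex weights preserve positivity and mass."""
    theta = (t - t_n) / dt
    assert 0.0 <= theta <= 1.0, "t must lie in the current time interval"
    return (1.0 - theta) * rho_n + theta * rho_np1

mid = interpolate(np.array([1.0, 0.0]), np.array([0.0, 1.0]),
                  t=0.5, t_n=0.0, dt=1.0)
# at the midpoint the two densities are averaged: mid = [0.5, 0.5]
```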

Regularity of Λ^k_•
This section is devoted to establishing some regularity properties of the curves Λ^k_•.
Lemma 9. We have λ^k_n ∈ P_1(R^d) for all n = 0, …, N_k and all k ≥ 0. In particular, first order moments of the λ^k_n's are uniformly bounded in n and k.

Proof. (i) We begin by proving an inequality for n = 0. Recalling Eq. (20), we have

  ∫_{R^d} |x| dλ^k_0(x) = Σ_{i ∈ Z^d} (1/h_k^d) ∫_{E^k_i} ∫_{E^k_i} |x| dx dµ̄(y) ≤ ∫_{R^d} |y| dµ̄(y) + √d h_k,

where we have used that, in the double integral, both x and y are points of the same grid cell E^k_i, so that |x| ≤ |y| + |x − y| ≤ |y| + √d h_k, whence the thesis for n = 0.
(ii) We obtain the general case 0 < n ≤ N_k by induction from this and Lemma 8. Choosing ϕ(x) = |x| in the latter yields

  ∫_{R^d} |x| dλ^k_{n+1}(x) ≤ ∫_{R^d} |x| dλ^k_n(x) + V Δt_k + 2√d h_k,

whence, summing telescopically and using h_k = β_k Δt_k,

  ∫_{R^d} |x| dλ^k_n(x) ≤ ∫_{R^d} |x| dλ^k_0(x) + (V + 2√d β_k) t^k_n.

The thesis now follows taking the supremum of both sides in n and k while considering that t^k_n ≤ T.
Lemma 10. For all k ≥ 0,

  W_1(Λ^k_s, Λ^k_t) ≤ (V + 2√d β̄) |t − s| for all s, t ∈ [0, T], (22)

where β̄ := sup_{k≥0} β_k, i.e., the curves Λ^k_• ∈ C([0, T]; P_1(R^d)) are Lipschitz continuous uniformly in k. Moreover, the first moments of Λ^k_t are uniformly bounded in t and k.

Proof. (i) Owing to Lemma 9, we can find a uniform upper bound on the first moments of the λ^k_n's: there exists a constant C > 0, independent of n and k, such that

  ∫_{R^d} |x| dλ^k_n(x) ≤ C, (23)

which, by convexity of the interpolation (21), says that Λ^k_t ∈ P_1(R^d) for all k ≥ 0 and all t ∈ [0, T] with uniformly bounded first moment.
(ii) We argue the Lipschitz continuity of the mapping t ↦ Λ^k_t as in Lemma 2. In particular, for s, t ∈ [0, T] with s ≤ t we reduce to estimating W_1(λ^k_i, λ^k_{i+1}) for a generic 0 ≤ i ≤ N_k, where 0 ≤ m ≤ n ≤ N_k are such that s ∈ [t^k_m, t^k_{m+1}] and t ∈ [t^k_n, t^k_{n+1}]. To this end we fix ϕ ∈ Lip_1(R^d) and use Lemma 8:

  ∫_{R^d} ϕ d(λ^k_{i+1} − λ^k_i) ≤ 2√d h_k + V Δt_k = (V + 2√d β_k) Δt_k.

Computing as in Lemma 2, in view of Eq. (23) this yields W_1(Λ^k_s, Λ^k_t) ≤ (V + 2√d β̄)(t − s) and we have the thesis.

Limit equation for Λ^k_•
The next step is to find, similarly to what we did in Section 3.2, an equation satisfied by Λ^k_• in which to pass to the limit k → ∞. Fix a test function φ ∈ C_c^∞(R^d) and notice that the mapping t ↦ ∫_{R^d} φ dΛ^k_t is Lipschitz continuous because so is the mapping t ↦ Λ^k_t in view of Lemma 10. Then, owing to Rademacher's Theorem, it is a.e. differentiable. Using expression (21), we find that its derivative is

  d/dt ∫_{R^d} φ dΛ^k_t = (1/Δt_k) ( ∫_{R^d} φ dλ^k_{n+1} − ∫_{R^d} φ dλ^k_n ) for t ∈ (t^k_n, t^k_{n+1}).

Let us introduce now the function

  g^k_n(x) := x + v[λ^k_n](x) Δt_k,

i.e., the flow map γ^k_n (cf. Eq. (9)) computed w.r.t. the measure λ^k_n instead of µ^k_n. Expanding φ∘g^k_n with Lagrange's remainder (cf. the analogous calculation performed in Section 3.2 for the function φ∘γ^k_n), where x̂ is a point of the segment connecting x and x + v[λ^k_n](x)Δt_k, and computing as in Section 3.2, we arrive at an expression which is formally an equation satisfied by Λ^k_•. A derivation analogous to that performed in Section 3.2 provides an estimate of the difference

  | d/dt ∫_{R^d} φ dΛ^k_t − ∫_{R^d} ∇φ · v[Λ^k_t] dΛ^k_t |

in terms of a constant C > 0 depending only on V, Lip(v), Lip(∇φ) = ‖D²φ‖_∞. In addition, from Eq. (22) we know W_1(λ^k_n, λ^k_{n+1}) ≤ (V + 2√d β̄)Δt_k, C being updated to a new constant including the previous one and V + 2√d β̄. To estimate the remaining term at the right-hand side, we need to adapt Lemma 8 to g^k_n.
Lemma 11. For all Lipschitz continuous ϕ : R^d → R, all n = 0, …, N_k − 1, and all k ≥ 0,

  | ∫_{R^d} ϕ dλ^k_{n+1} − ∫_{R^d} ϕ d(g^k_n#λ^k_n) | ≤ ( 2 + Lip(v) Δt_k ) √d Lip(ϕ) h_k.

Proof. It suffices to observe that

  |g^k_n(x) − γ̃^k_n(x)| = |v[λ^k_n](x) − v[λ^k_n](x^k_i)| Δt_k ≤ Lip(v) √d h_k Δt_k for x ∈ E^k_i.

The thesis follows from this inequality, combining the triangle inequality and Lemma 8.
With this result, we can further manipulate the previous inequality and obtain a bound on d/dt ∫_{R^d} φ dΛ^k_t which, integrating both sides in time from 0 to t ≤ T and taking the limit k → ∞, implies

  lim_{k→∞} ( ∫_{R^d} φ dΛ^k_t − ∫_{R^d} φ dλ^k_0 − ∫_0^t ∫_{R^d} ∇φ · v[Λ^k_τ] dΛ^k_τ dτ ) = 0. (25)

Thanks to Eq. (25), we are in a position to prove our convergence result of the numerical scheme.
Theorem 12. Let Assumptions 1, 2, 3 hold. If the sequence (Λ^k_•)_{k≥0} converges to some µ_• ∈ C([0, T]; P_1(R^d)) when the grid is refined, then µ_• is a weak solution to problem (4).

Proof. Arguing as in Lemma 4, if lim_{k→∞} sup_{t∈[0,T]} W_1(Λ^k_t, µ_t) = 0 we know that, up to subsequences, the integrals in Eq. (25) converge to the corresponding integrals for µ_•. The claim of the theorem then follows from Eq. (25) and the arbitrariness of φ, provided we have also ∫_{R^d} φ dλ^k_0 → ∫_{R^d} φ dµ̄ when k → ∞. To show the latter fact, it is enough to prove that W_1(λ^k_0, µ̄) → 0 when k → ∞.
Recall that dλ^k_0 = ρ^k_0 dx, with ρ^k_0 given by Eq. (20). Fixing ϕ ∈ Lip_1(R^d) and reasoning as in the proof of Lemma 9-(i), we find

  ∫_{R^d} ϕ d(λ^k_0 − µ̄) ≤ Σ_{i ∈ Z^d} (1/h_k^d) ∫_{E^k_i} ∫_{E^k_i} |x − y| dx dµ̄(y) ≤ √d h_k,

so that from the previous calculations we deduce W_1(λ^k_0, µ̄) ≤ √d h_k → 0 when k → ∞.

We conclude this section with a simple criterion which implies the convergence property assumed in Theorem 12.

Corollary 13. Assume there exists a bounded set K ⊂ R^d such that supp λ^k_n ⊆ K for all n = 0, …, N_k and all k ≥ 0. Then, up to subsequences, (Λ^k_•)_{k≥0} converges in C([0, T]; P_1(R^d)) to a weak solution µ_• of problem (4).
Proof. From the arbitrariness of k, t it follows supp Λ^k_t ⊆ K for all k ≥ 0 and all t ∈ [0, T]. Since K is bounded, we can find a ball B_R(0) with radius R > 0 so large that K ⊆ B_R(0). Consequently

  ∫_{R^d} |x|^p dΛ^k_t(x) ≤ R^p for every p ≥ 0,

i.e., the Λ^k_t's have uniformly bounded moments of any order p ≥ 0. This implies that {Λ^k_t}_{k≥0} is relatively compact in P_1(R^d) for all t ∈ [0, T], which, together with the equicontinuity of the family {Λ^k_•}_{k≥0} entailed by the estimate (22), allows us to apply the Ascoli–Arzelà Theorem and conclude that the sequence (Λ^k_•)_{k≥0} is relatively compact in C([0, T]; P_1(R^d)). Thus, up to subsequences, it converges to some µ_• ∈ C([0, T]; P_1(R^d)), which is a weak solution to problem (4) because of Theorem 12.
Remark (CFL condition). Let us define α_k := V/β_k for all k ≥ 0. Then the condition expressed by Assumption 3 is equivalent to

  V Δt_k = α_k h_k, with α_k → +∞ as k → ∞, (26)

which implies |γ̃^k_n(x) − x| ≤ α_k h_k for all x ∈ R^d, i.e., the displacements produced by the mapping γ̃^k_n are bounded by the quantity α_k h_k. As a consequence, the number of nonempty intersections γ̃^k_n(E^k_j) ∩ E^k_i, j ∈ Z^d, is finite and bounded from above uniformly in i, which in particular makes the series in Eq. (19) actually a finite sum for all i.
We observe that Eq. (26) is a generalization of the Courant–Friedrichs–Lewy (CFL) condition, allowing α_k → +∞ when k → ∞ so as to ensure convergence to continuous-in-time-and-space solutions as stated by Theorem 12. With the time step Δt frozen, Eq. (26) is used in [9] to prove the stability of the numerical scheme (19) in approximating the solutions to the discrete-in-time model (8).
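Numerically, Eq. (26) just says that the CFL number α_k = V Δt_k/h_k need not stay below 1: under Assumption 3 it diverges, while the per-step displacement α_k h_k = V Δt_k still vanishes. A tiny sketch of this scaling (the refinement rule below is an illustrative choice):

```python
def cfl_number(V, h, dt):
    """Generalized CFL number alpha = V*dt/h from Eq. (26). Classical
    schemes require alpha <= 1; here alpha_k may diverge as k -> infinity,
    provided h_k = o(dt_k) (Assumption 3)."""
    return V * dt / h

# Illustrative refinement: dt_k = 2**-k and h_k = dt_k**2, so h_k = o(dt_k).
# Then alpha_k = V * 2**k diverges, while the displacement bound
# alpha_k * h_k = V * dt_k still tends to zero.
alphas = [cfl_number(1.0, (2.0 ** -k) ** 2, 2.0 ** -k) for k in range(4)]
```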

Models of crowd and swarm dynamics
In [5,8] a class of models describing the collective dynamics of swarms and crowds has been introduced. These models are based on the idea that each individual of the group is an intelligent (or active) agent, able to develop a behavioral strategy for pursuing specific goals. Since agents are not passively dragged by external forces, the Newtonian approach is abandoned in favor of a kinematic one, in which the velocity of the agents stems from a few basic behavioral rules. These concepts are formalized in [6], where it is shown that the above-cited models can be given a common formulation within the framework of Eq. (3). In particular, the corresponding velocity of the agents is

  v[µ_t](x) = v_d(x) + ∫_{R^d} f(|y − x|) r(y − x) χ_{U_x}(y) dµ_t(y). (27)

Adopting the terminology of the kinetic theory for active particles, see [2], we call test agent an agent potentially concerned with interactions, whose position is described in the above formula by the variable x, and field agents the agents distributed in space (variable y), which the test agent may interact with. In more detail: (i) v_d : R^d → R^d is the test agent's desired velocity, i.e., the velocity of an isolated test agent in the absence of interactions; (ii) U_x is the interaction neighborhood of the test agent. It conveys the idea that the test agent experiences nonlocal interactions with some selected field agents, namely those inside U_x; (iii) χ_{U_x} : R^d → [0, 1] plays the role of the indicator function of U_x.
In particular, the effect of the term χ_{U_x} in Eq. (27) is to rule out interactions of the test agent with field agents outside U_x; (iv) r : R^d → R^d is the direction of the interaction, depending on the relative position of the interacting agents and oriented in such a way that r(y − x) · (y − x) ≥ 0, with in addition |r| ≤ 1; (v) f : [0, +∞) → R is the interaction strength, which depends on the distance between the interacting agents. If f < 0 the interaction is repulsive, i.e., the test agent tries to avoid local aggregation with the field agents (as is common in crowds under normal, i.e., non-panic, conditions). If f > 0 the interaction is attractive, i.e., the test agent aims at staying close to the surrounding field agents (as in swarms, where group cohesion is advantageous, e.g., for food search or predator avoidance).
The shape of U_x depends partly on the criteria used for selecting the field agents to interact with. In general, when U_x is symmetric w.r.t. x, interactions are said to be isotropic, as opposed to anisotropic when U_x is not symmetric. Notice also that the interaction integral in Eq. (27) is actually computed w.r.t. the mass measure Nµ_t, because interactions depend on the number of field agents in U_x.
Prototypes of the above objects are, among others: Although meaningful from the modeling point of view, some of these choices need to be adapted in order for the velocity field (27) to comply with Assumption 1. Concerning this, let us introduce the vector-valued function F : R^d → R^d and consider the velocity

v[µ_t](x) = v_d(x) + ∫_{R^d} F(y − x) χ_{U_x}(y) d(Nµ_t)(y),   (28)

which reduces to (27) for F(z) = f(|z|) r(z). Such a form encompasses a broader class of models of coordinated behavior based on Eq. (3), for instance those in [4]. From now on, we will take (28) as our velocity model.
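As a minimal numerical sketch of the velocity model (28), consider an empirical (Dirac-sum) measure, for which the interaction integral w.r.t. Nµ reduces to a finite sum over the field agents in U_x = B_R(x). The specific F, v_d, and the threshold eps below are hypothetical illustrative choices, not prescribed by the models above, and a sharp indicator stands in for the mollified χ:

```python
import numpy as np

def velocity(x, agents, v_d, F, R):
    """Velocity field of type (28) evaluated at x for the empirical
    measure mu = (1/N) * sum_m delta_{agents[m]}: the integral w.r.t.
    N*mu becomes a plain sum of F(y - x) over field agents y in B_R(x)
    (sharp indicator used here in place of the mollified chi)."""
    v = np.array(v_d(x), dtype=float)
    for y in agents:
        z = y - x
        if np.linalg.norm(z) <= R:  # crude stand-in for chi_{U_x}
            v += F(z)
    return v

# Hypothetical repulsive prototype F(z) = f(|z|) r(z) with
# f(s) = -1/max(s, eps) and r(z) = z/max(|z|, eps)
eps = 1e-2
F = lambda z: -z / max(np.linalg.norm(z), eps) ** 2
v_d = lambda x: np.zeros(2)  # zero desired velocity, for simplicity

agents = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5]])
v = velocity(agents[0], agents, v_d, F, R=1.0)  # two repulsive contributions
```

Note that the self-interaction term contributes F(0) = 0 here, consistently with a repulsion direction vanishing at the origin.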
Assumption 4 (Properties of v as in Eq. (28)). We assume that the velocity field in Eq. (28) satisfies the following properties.
(i) v d is Lipschitz continuous and bounded in R d .
(ii) There exists R > 0 such that, for each x ∈ R^d, the set U_x ⊆ B_R(x) is measurable and isometric to a reference set U_0 ⊆ B_R(0).
(iii) F : R^d → R^d is Lipschitz continuous.
(iv) χ_{U_0} : R^d → [0, 1] is Lipschitz continuous.
Remark. Some comments on Assumption 4 are in order.
(i) In the sequel we denote by ξ_x : R^d → R^d the isometry mapping U_0 onto U_x,

ξ_x(z) = R_x z + x,   (29)

where R_x ∈ R^{d×d} is a rotation matrix possibly depending on the point x.
(iv) The function χ_A can be thought of as a mollification of 𝟙_A. In particular, by continuity χ_A = 0 on ∂A. Because of the isometry between U_0 and U_x, one can check that the following relation holds:

χ_{U_x}(ξ_x(z)) = χ_{U_0}(z) for all z ∈ R^d.   (30)

In the next two sections we show that, with both isotropic and anisotropic interactions, Assumption 4 is sufficient to apply our existence and approximation results to the above-mentioned crowd and swarm models. First, we consider the case of a spherical neighborhood, which is the most important prototype for isotropic interactions. Later, we extend the analysis to any bounded neighborhood, including the significant cases of anisotropic interactions.
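For concreteness, one admissible mollified indicator χ_A (here for A = B_R(0), with a hypothetical ramp width σ) is the Lipschitz truncation below; it equals 1 well inside the ball, 0 outside, and, by continuity, 0 on ∂A:

```python
import numpy as np

R, sigma = 1.0, 0.1  # sigma: hypothetical mollification width, 0 < sigma < R

def chi(z):
    """Lipschitz mollification of the indicator of B_R(0): equals 1 for
    |z| <= R - sigma, decays linearly to 0 across a ramp of width sigma,
    and vanishes outside the ball (in particular on the boundary |z| = R)."""
    return float(np.clip((R - np.linalg.norm(z)) / sigma, 0.0, 1.0))
```

The Lipschitz constant of this χ is 1/σ, which quantifies the trade-off between smoothness and how closely χ approximates the sharp indicator.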

Spherical interaction neighborhood
In the case of isotropic interactions we set U_x = B_R(x), and consequently U_0 = B_R(0). Since B_R(x) is invariant under rotations, the mapping ξ_x is simply ξ_x(z) = z + x.
By the change of variables z = ξ_x^{-1}(y) in Eq. (28), and recalling furthermore Eq. (30), we can rewrite the velocity as

v[µ_t](x) = v_d(x) + ∫_{R^d} F(z) χ_{U_0}(z) d(N(ξ_x^{-1}#µ_t))(z),

where ξ_x^{-1}#µ_t is in P_1(R^d) for all µ_t ∈ P_1(R^d) and all fixed x. We state a preliminary result, which is useful to verify that such a velocity field complies with Assumption 1.
Lemma 14. The function Fχ_{U_0} is Lipschitz continuous and bounded in R^d.
Proof. To study the expression |F(z_1)χ_{U_0}(z_1) − F(z_2)χ_{U_0}(z_2)| it is convenient to distinguish three cases, according to whether both points, only one of them, or neither belongs to U_0.
Proposition 15. Let Assumption 4 hold with U_x = B_R(x). Then the velocity field (28) satisfies Assumption 1.
Proof. In the sequel, x ∈ R^d and µ ∈ P_1(R^d) are fixed but arbitrary. We verify the items of Assumption 1 in order.
(i) The field v is uniformly bounded, thanks to the boundedness of v_d and of F on B_R(0). (ii) As for Lipschitz continuity, we check it separately w.r.t. x and w.r.t. µ.
Owing to Proposition 15, we can state that models based on Eq. (3), with a velocity field featuring isotropic interactions as in Eq. (28), see e.g., [8], admit probability measure solutions for any initial distribution of the agents with finite first and second order moments (e.g., an initial distribution with compact support). Moreover, such solutions can be duly approximated using the numerical scheme (19).
Remark (Unbounded interaction neighborhood). As a modification to Assumption 4, we can allow U_0 = U_x = R^d along with the boundedness assumption |F(x)| ≤ C_F for all x ∈ R^d. The above arguments can be promptly adapted to show that also in this case the velocity field (28) complies with Assumption 1. However, it should be noted that, for the applications we have in mind, the physically relevant cases are those in which the interaction neighborhood is bounded.

Bounded interaction neighborhood
In this section we drop the specific hypothesis that U_x be a ball. We allow it to have a generic shape, with the only requirement of being bounded, i.e., contained in a ball. This encompasses the important case of anisotropic interactions. In such cases, the neighborhood U_x may not be invariant under rotations, therefore we have to consider the full form (29) of the transformation ξ_x. Performing again the change of variables z = ξ_x^{-1}(y) in the integral (28), the velocity takes the form

v[µ_t](x) = v_d(x) + ∫_{R^d} F(R_x z) χ_{U_0}(z) d(N(ξ_x^{-1}#µ_t))(z),

with the rotation matrix explicitly appearing in the argument of the function F. To deal with it, for the sake of simplicity we confine ourselves to the two-dimensional case (d = 2), for then a simple representation of R_x is available:

R_x = [[cos ϑ_x, −sin ϑ_x], [sin ϑ_x, cos ϑ_x]],

ϑ_x being the angle of rotation which determines the local orientation in the point x of the neighborhood of interaction. The choice of ϑ_x has to do with the way in which the anisotropy of the interactions is modeled. In the models of crowd and swarm dynamics we are considering, ϑ_x is the angle formed by the desired velocity v_d(x) w.r.t. a fixed reference direction, for instance the horizontal one identified by the unit vector i. Assuming for simplicity that v_d has constant unit modulus, this implies

cos ϑ_x = v_d(x) · i,  sin ϑ_x = (v_d(x) × i) · k.   (31)

In the second formula, v_d(x) and i are thought of as embedded into R^3, with × denoting vector product and k the unit vector orthogonal to the plane of v_d(x) and i.
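The construction of R_x in d = 2 can be sketched as follows, assuming |v_d(x)| = 1: the cosine of ϑ_x is the component of v_d(x) along i, and the sine is its orthogonal component (the signed cross-product term). The helper name and the test direction below are illustrative:

```python
import numpy as np

def rotation(v_d_x):
    """2D rotation matrix R_x orienting the reference neighborhood U_0
    along the desired velocity, assuming |v_d(x)| = 1: cos(theta) is the
    component of v_d(x) along i = (1, 0), sin(theta) the component along
    j = (0, 1), i.e. the signed cross-product term."""
    c, s = v_d_x[0], v_d_x[1]
    return np.array([[c, -s], [s, c]])

# R_x maps the reference direction i onto v_d(x):
R_up = rotation(np.array([0.0, 1.0]))  # desired velocity pointing "north"
```

Applied to a reference neighborhood U_0 (e.g., a half-disk ahead of the agent), this matrix rotates U_0 so that it faces the direction of motion.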
Remark. More generally, Eq. (31) holds with v_d(x) replaced by v_d(x)/|v_d(x)|, which is a Lipschitz continuous field provided |v_d| is uniformly bounded away from zero. The formalism introduced above allows us to prove the following technical fact.
Lemma 16. The mapping x ↦ R_x is Lipschitz continuous in R^2.
Proof. For all z ∈ R^2 we have |(R_{x_1} − R_{x_2})z| ≤ (|cos ϑ_{x_1} − cos ϑ_{x_2}| + |sin ϑ_{x_1} − sin ϑ_{x_2}|)|z|, and the same is true also using inverse matrices. In addition, by Eq. (31), |cos ϑ_{x_1} − cos ϑ_{x_2}| ≤ |v_d(x_1) − v_d(x_2)| and |sin ϑ_{x_1} − sin ϑ_{x_2}| ≤ |v_d(x_1) − v_d(x_2)|, hence the thesis follows from the Lipschitz continuity of v_d.
With Lemma 16 we are in a position to prove that the velocity (28) complies with Assumption 1 also in case of anisotropic interactions.
Proposition 17. Let d = 2 and let Assumption 4 hold, with ϑ_x defined by Eq. (31). Then the velocity field (28) satisfies Assumption 1.
Proof. (i) Uniform boundedness, Lipschitz continuity w.r.t. the probability measure, and the property for convex combinations follow straightforwardly from calculations entirely analogous to those performed in the proof of Proposition 15. In fact, it is sufficient to observe that, R_x being an isometry, the function F(R_x ·) is Lipschitz continuous and bounded in B_R(0) with Lip(F(R_x ·)) = Lip(F) for all x ∈ R^d. Moreover, by the same argument as in the proof of Lemma 14, the function F(R_x ·)χ_{U_0}(·) is Lipschitz continuous in R^d with the same Lipschitz constant as Fχ_{U_0}, thus in particular independent of x.
(ii) Lipschitz continuity w.r.t. x is instead more delicate, because it involves directly the rotation matrix R_x. Let x_1, x_2 ∈ R^2 and fix µ ∈ P_1(R^2). In the first integral at the right-hand side of (32) we can assume z ∈ U_0, for otherwise χ_{U_0}(z) = 0. Hence |z| ≤ R and moreover R_x z ∈ B_R(0) for all x ∈ R^2. Lipschitz continuity of F in that ball, along with Lemma 16, implies the estimate (33), having observed that (ξ_{x_2}^{-1}#µ)(U_0) = µ(U_{x_2}) ≤ 1. As far as the second integral at the right-hand side of (32) is concerned, notice that we can confine ourselves to y ∈ U_{x_1} ∪ U_{x_2}, for otherwise χ_{U_0}(ξ_{x_j}^{-1}(y)) = χ_{U_{x_j}}(y) = 0 for both j = 1, 2. We distinguish two cases.
Case 1: |x_1 − x_2| > 2R. In this case U_{x_1} ∩ U_{x_2} = ∅, because the balls B_R(x_1), B_R(x_2) are disjoint. For all y ∈ U_{x_j}, j = 1, 2, it results ξ_{x_j}^{-1}(y) ∈ U_0 ⊂ B_R(0), hence R_{x_1} ξ_{x_j}^{-1}(y) ∈ B_R(0), and we can use the boundedness of F in that ball together with 1 < |x_1 − x_2|/(2R) to bound the second integral by a constant times |x_1 − x_2|.

Case 2: |x_1 − x_2| ≤ 2R. In this case the neighborhoods U_{x_1}, U_{x_2} need not be disjoint; however, we can resort to the Lipschitz continuity of the function F(R_x ·)χ_{U_0}(·), and further, thanks to Lemma 16 and to the fact that R_{x_2}^{-1} is an isometry, estimate the remaining term. Let us examine the term with the integral. If y ∈ U_{x_1} then |y − x_1| ≤ R.

In conclusion, from (35) and (36) we deduce that there exists a constant C > 0 such that the second integral at the right-hand side of (32) is bounded by C|x_1 − x_2|. This, together with the estimate (33), completes the proof of Lipschitz continuity of the mapping x ↦ v[µ](x).
In view of Proposition 17 we conclude that two-dimensional models based on the velocity (28) with anisotropic interactions have probability measure solutions, which can be approximated arbitrarily well using the scheme (19) on finer and finer numerical grids. Notice that, as far as e.g., crowd dynamics is concerned, two-dimensional problems are enough for applications.
Remark (Higher dimension). For d > 2 additional technicalities arise, due to a more complex structure of the rotation matrix. Nevertheless, in the special case that the desired velocity is constant in x, it is straightforward to extend the results to any spatial dimension. In fact, the rotation matrix being independent of x, Proposition 17 can be proved without using Lemma 16, which is the only point where we use the explicit representation of the matrix. Models with constant desired velocity have been recently proposed for swarm dynamics problems [5] and for rendez-vous algorithms [4].
Remark (Zero desired velocity). When interactions are anisotropic and the desired velocity is zero [5], the orientation of the neighborhood of interaction cannot be defined by Eq. (31). However, this issue can be solved by replacing v d in (31) with any other Lipschitz continuous unit vector field, e.g., a nonzero constant one, with the only purpose of defining a rotation angle. Clearly, this problem does not arise if the desired velocity is zero but interactions are isotropic, as in [4].

Case study: discrete models
In this last section we put the theory into operation by giving an example of an explicit solution to the Cauchy problem (4). Furthermore, we visualize the convergence of the approximations produced by the numerical scheme discussed in Section 4 to this solution.
For illustrative purposes, we consider the simple case of a system of agents featuring isotropic interactions with zero desired velocity:

v[µ_t](x) = ∫_{R^d} F(y − x) χ_{B_R(x)}(y) d(Nµ_t)(y).   (37)

Moreover, we prescribe as initial condition the discrete probability measure

µ̄ = (1/N) Σ_{l=1}^N δ_{x_0^l},   (38)

where δ_{x_0^l} is the Dirac mass concentrated in x_0^l (i.e., for any A ∈ B(R^d) it results δ_{x_0^l}(A) = 1 if x_0^l ∈ A, δ_{x_0^l}(A) = 0 otherwise) and x_0^1, …, x_0^N are N selected points in R^d. We recall that µ̄ is the common law of the random variables X_0^i, i = 1, …, N, expressing the initial positions of the agents. The structure (38) of µ̄ implies that each X_0^i is a random variable taking almost surely the values x_0^1, …, x_0^N, each with probability 1/N. Therefore we are considering a discrete model of a group of indistinguishable agents initially concentrated in x_0^1, …, x_0^N. The indistinguishability is reflected by the fact that any of the points x_0^l can be, with equal probability, the initial position of the generic i-th agent.
We find a solution to the Cauchy problem (4) with initial condition (38) by the method of characteristics (cf. Section 2.2). In particular, we know from Eq. (6) that µ_t = γ_t#µ̄, where γ_t is the flow map. From the linearity of the push forward we first deduce µ_t = (1/N) Σ_{l=1}^N γ_t#δ_{x_0^l}, then we observe that for any measurable set A it results (γ_t#δ_{x_0^l})(A) = δ_{x_0^l}(γ_t^{-1}(A)) = δ_{γ_t(x_0^l)}(A). Hence γ_t#δ_{x_0^l} = δ_{γ_t(x_0^l)} and we can write the solution as

µ_t = (1/N) Σ_{l=1}^N δ_{γ_t(x_0^l)}.   (39)

Recalling that µ_t is the law of the random variables X̄_t^i = E[X_t^i | X_0^i], i = 1, …, N, from Eq. (39) we infer that each X̄_t^i takes almost surely the values γ_t(x_0^1), …, γ_t(x_0^N), each with probability 1/N. Therefore, at every time t > 0 the distribution of the group of agents is concentrated on the discrete set of points γ_t(x_0^1), …, γ_t(x_0^N). Notice that again we cannot associate deterministically a given agent with its position because of the indistinguishability of the agents. However, we can describe the trajectories of the agents by means of the mappings t ↦ γ_t(x_0^l), l = 1, …, N.

The flow map is defined by Eq. (7), which with the velocity (37) and the initial condition (38) yields

(d/dt)γ_t(x) = Σ_{l=1}^N F(γ_t(x_0^l) − γ_t(x)) χ_{B_R(γ_t(x))}(γ_t(x_0^l)),  γ_0(x) = x,   (40)

for all x ∈ R^d. The discrete structure (39) of µ_t makes it actually sufficient to solve problem (40) for x = x_0^l. Setting x_l(t) := γ_t(x_0^l) and computing Eq. (40) for x = x_0^l we find the initial-value problem

ẋ_l(t) = Σ_{m=1}^N F(x_m(t) − x_l(t)) χ_{B_R(x_l(t))}(x_m(t)),  x_l(0) = x_0^l  (l = 1, …, N),   (41)

thus we conclude that constructing the measure (39) amounts to solving the system of ODEs (41), whose solutions are the positions of the agents at t > 0. From the numerical point of view, we can either approximate the measure (39) by using the scheme (19) or integrate problem (41) via a standard numerical method for ODEs. The remaining part of this section is devoted to showing the convergence of the approximations obtained from (19) to the numerical solutions of (41) in a toy model.
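The particle reduction just described can be sketched numerically. The explicit Euler loop below integrates a system of pairwise interactions of the kind ẋ_l = Σ_m F(x_m − x_l) restricted to a ball of radius R; the repulsion parameters a and ε, the sharp indicator replacing χ, and the two-agent initial datum are hypothetical illustrative choices:

```python
import numpy as np

def euler_particles(x0, F, R, dt, steps):
    """Explicit Euler integration of a particle system of type (41):
    each agent moves with velocity sum_m F(x_m - x_l), the sum running
    over the field agents within distance R (sharp indicator, and the
    self term m = l skipped since F(0) = 0 for the repulsion below)."""
    x = np.array(x0, dtype=float)
    N = len(x)
    for _ in range(steps):
        v = np.zeros_like(x)
        for l in range(N):
            for m in range(N):
                z = x[m] - x[l]
                if m != l and np.linalg.norm(z) <= R:
                    v[l] += F(z)
        x = x + dt * v
    return x

# Hypothetical repulsion F(z) = -a z / max(|z|, eps)^2
a, eps = 0.1, 1e-2
F = lambda z: -a * z / max(np.linalg.norm(z), eps) ** 2

x0 = [[0.0], [0.4]]  # two agents on the line (d = 1)
x = euler_particles(x0, F, R=1.0, dt=0.01, steps=100)
```

Since F is odd, the two agents drift apart symmetrically and their center of mass is conserved by the scheme.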
Let us consider a one-dimensional problem (d = 1) with N = 10 agents, whose initial positions are sampled from a uniform distribution on the interval [0, 1]. Agents repel each other according to the following repulsion function:

F(z) = −a z / max{|z|, ε}²,

which results from the product of f(s) = −a/max{s, ε} and r(z) = z/max{|z|, ε}.
The repulsion strength f is inversely proportional to the distance between the interacting agents (up to a minimal threshold ε) and the direction of the interaction r is a Lipschitz mollification of the unit vector z/|z|. Simulations of the ODE system (41) and of problem (4) with initial condition (38) were run independently and their results visualized on the same graphs of Fig. 1 for due comparison. In particular, the ODE system was numerically integrated using an explicit-in-time Euler scheme, whereas the conservation law for the probability µ_t was solved through the scheme (19) on different meshes. Note that the chosen discretization entails β_k ∼ h_k^{1−δ} in Assumption 3 and α_k ∼ h_k^{δ−1} in Eq. (26). Figure 1 displays the numerical solution computed with δ = 0.9 and h_k = 1/k in the three cases k = 10², 10³, 10⁴, at two different time instants. Convergence toward the exact solution (39) as the mesh is refined can be visually appreciated, although approximating singular measures with densities is a difficult task, which requires very fine and computationally expensive meshes to get accurate results. Therefore, when such a structure of the solution is numerically sought, it is more efficient to exploit the stated equivalence of the original problem with the discrete system of ODEs.
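As a sanity check, the toy repulsion can be coded directly. The snippet below (with illustrative values a = 1, ε = 0.1, not those of the simulation) verifies the structural properties required of the interaction, namely |r| ≤ 1, r(y − x) · (y − x) ≥ 0, and the repulsive sign f < 0:

```python
import numpy as np

a, eps = 1.0, 0.1  # illustrative repulsion strength and threshold

f = lambda s: -a / max(s, eps)      # interaction strength (repulsive: f < 0)
r = lambda z: z / max(abs(z), eps)  # Lipschitz mollification of z/|z| (d = 1)
F = lambda z: f(abs(z)) * r(z)      # product form: F(z) = -a z / max(|z|, eps)^2

# structural properties of the direction r and the strength f
for z in np.linspace(-2.0, 2.0, 41):
    assert abs(r(z)) <= 1.0 + 1e-12  # |r| <= 1
    assert r(z) * z >= 0.0           # r(y - x) . (y - x) >= 0
    assert f(abs(z)) < 0.0           # repulsion everywhere
```

The threshold ε caps the strength at −a/ε for nearly overlapping agents, which keeps F Lipschitz and the ODE system well posed.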