Mean Field Games of Controls with Dirichlet boundary conditions

In this paper we study a mean-field games system with Dirichlet boundary conditions in a closed domain and in a mean-field game of controls setting, that is, one in which the dynamics of each agent is affected not only by the average position of the rest of the agents but also by their average optimal choice. This setting allows the modeling of more realistic scenarios in which agents not only leave the domain at a certain point in time (as during the evacuation of pedestrians or in debt refinancing dynamics) but also act competitively to anticipate the strategies of the other agents. We establish the existence of Nash equilibria for this class of mean-field game of controls systems under certain regularity assumptions on the dynamics and the Lagrangian cost. Much of the paper is devoted to establishing several a priori estimates, which are needed to circumvent the fact that the mass is not conserved (as we are in a Dirichlet boundary condition setting). In the concluding sections, we provide examples of systems falling into our framework as well as numerical implementations.


Introduction
Context and literature review. The study of dynamical systems of interacting agents has become extremely popular in recent years, since their flexibility and tractability have allowed them to faithfully model the main emerging patterns of several biological, social and economic phenomena: from the collective motion of animals [12] and pedestrians [13,38], to the behavior of market participants [28] and the emergence of price volatility [8], passing through the study of optimization heuristics [30,40]. For recent developments on their application in the modeling of complex systems, we point to the surveys [11,19,27,50] and references therein.
Besides the literature on this topic, a complementary literature has grown in parallel at the same pace: the one on the control of such systems. This literature can be roughly divided into two main streams. The one on centralized controls focuses on the modeling approach of an external control agent that exerts its force to steer the system towards a desired optimal configuration. Many papers have studied this framework in the discrete [23,18] as well as in the continuous setting [20,33,46,32,22], where the framework has come to be known as mean-field optimal control.
Alongside this centralized approach to control, there is the stream dedicated to decentralized control strategies: these consist in assuming that each agent, besides being subjected to forces induced in a feedback manner by the rest of the population, follows an individual strategy to coordinate with the other agents based on their positions. Mean-field games (MFG, for short) stem from this control setting when the number of agents is very large; they were first introduced in [43] and independently in the engineering community by [39]. For a comparison of the MFG setting with the mean-field optimal control one, we refer the interested reader to the book [14].
MFG have provided a powerful framework for the description of large systems of mutually competing agents, such as pedestrian dynamics (for which we point the reader to the interesting papers [21,48]) and market dynamics (a topic which has seen a wealth of literature flourish since the seminal paper [37]). For these modeling scenarios, the study of MFG with Dirichlet boundary conditions is of particular relevance, since such conditions arise naturally in situations where agents can leave the domain, such as in evacuation problems, in the case of financial default, or in an exhaustible resources framework [36]. We mention that, in the paper [31], the authors have proven the existence and uniqueness of solutions to a stationary MFG with Dirichlet boundary conditions, while in [15] the author has studied well-posedness of MFG in the context of optimal stopping (a framework resembling Dirichlet boundary conditions).
However, the MFG framework may not be enough to fully capture the behavior of competing agents, since in real-life situations, when choosing a strategy, they usually take into account not only the positions of their competitors but also their strategies. For this reason, a more general class of MFG systems was introduced in [35] under the terminology of extended MFG models or mean-field games of controls, to allow for stochastic dynamics depending upon the joint distribution of the controlled state and the control process. The mean-field game of controls setting allowed the authors of [25] to analyze optimal liquidation strategies in a context where traders' choices affect the public price of a good, and has since been employed by several authors [1,9,34]. Existence and uniqueness results for mean-field games of controls on the torus T^d were discussed in the recent work [41].
Besides the recent advances on the theoretical side, a rich literature on numerical methods for MFG has flourished. The first results go back to the pioneering times of MFG theory, such as [4]; many other developments have since been published (such as the articles [3,2,6]). We refer to [7] for an up-to-date and more complete picture of the state of the art on the numerics of MFG equations. However, if we consider the study of mean-field games systems in a mean-field game of controls setting, the literature is much less developed. In particular, [5] has proposed a finite difference scheme coupled with an iterative method for solving the systems of nonlinear equations that arise in the discrete setting. Further details on this topic can be found in [44].
Aim of the paper and main contributions. In this paper we study the following family of mean-field game of controls models with Dirichlet boundary conditions, which naturally arise in the context of pedestrian and investor dynamics:
\[
\begin{cases}
\partial_t u(t,x) + \sigma\Delta u(t,x) - H(t,x,Du(t,x);\mu_t) = 0, & \text{for } x \in \Omega \text{ and } t \in [0,T],\\
\partial_t m_t(x) - \sigma\Delta m_t(x) - \mathrm{div}\big(D_p H(t,x,Du(t,x);\mu_t)\, m_t(x)\big) = 0, & \text{for } x \in \Omega \text{ and } t \in [0,T],\\
\mu_t = \big(\mathrm{Id}, \alpha^*(t,\cdot,Du(t,\cdot);\mu_t)\big)_\sharp\, m_t, & \text{for } t \in [0,T],\\
m_0(x) = m(x), \quad u(T,x) = \psi(x, m_T), & \text{for } x \in \Omega,\\
m_t(x) = 0, \quad u(t,x) = \psi(x, m_t), & \text{for } x \in \partial\Omega \text{ and } t \in [0,T].
\end{cases}
\tag{1}
\]
Here, m_t denotes the distribution of the agents' states, u is the value function of the optimal control problem (with optimal control α*), and μ_t stands for the joint distribution of the state and the control. The boundary condition for m, that is m = 0 on ∂Ω, indicates that the agents stop taking part in the game as soon as they reach the boundary. As we shall see in a moment, the above system is obtained from a standard MFG setting as soon as we require that agents optimize their strategy by taking into account the strategies of the others, which leads to the fixed-point relationship μ_t = (Id, α*(t,·,Du(t,·);μ_t))_♯ m_t between the joint distribution μ_t and its marginals m_t and α*(t,·,Du(t,·);μ_t)_♯ m_t. A primary contribution of this paper is that we give sufficient conditions for the existence of Nash equilibria for such systems (that is, their well-posedness). To prove the result we employ a double fixed-point method, in accordance with the formulation of the system. To this end we need to establish several a priori regularity estimates on the dynamics of our system. The Dirichlet boundary condition prevents us from leveraging the conservation of the mass of m_t for such estimates; hence we need to rely on weaker results to work around the issue, and to resort to techniques that, to the best of our knowledge, are novel in the mean-field game of controls framework. The importance of Dirichlet boundary conditions in applications can be appreciated in [5], which provides novel numerical schemes to simulate solutions of such problems, although without providing existence results. Moreover, apart from the technical leap of dealing with Dirichlet boundary conditions, we remark that our setting is more general than the one of [41], since we also prove our existence result on R^d (rather than on T^d). Furthermore, our framework allows for richer agents' dynamics like, for instance, the Cucker-Smale and the Hegselmann-Krause models in crowd motion. All these differences introduce several specific technical complications that set the two works apart, as far as technical details and modeling applications are concerned. The uniqueness problem for our modeling framework shall be the subject of a dedicated forthcoming work.
A second contribution of the paper is that we provide two explicit models fitting into our framework and belonging to very different paradigms: one in the context of evacuation problems, the other formulated in a debt refinancing setting. This shows the flexibility of our framework and its potential to be exported to different modeling scenarios.
A third and final contribution is the implementation, through an iterative procedure based on the particle method, of mean-field games systems with Dirichlet boundary conditions in a mean-field game of controls setting. This method draws an analogy with the results on the convergence of large population games to mean-field games (see, for example, [45]). To the best of our knowledge, our approach is one of the first uses of particle methods to numerically solve mean-field games systems with Dirichlet boundary conditions in a mean-field game of controls setting, in which an iterative procedure is implemented for reducing the control space.
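To give a concrete feel for this kind of procedure, the following is a minimal, self-contained sketch of an iterative particle scheme in one dimension: particles hitting the boundary of Ω = (0,1) are absorbed (the Dirichlet condition), and a fixed-point iteration is run on the average control path (the mean-field of controls coupling). All parameters, the myopic quadratic feedback rule, and the domain are illustrative assumptions made for exposition, not the scheme actually used in Section 5.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (not taken from the paper)
N, T, dt, sigma = 500, 1.0, 0.01, 0.05
steps = int(T / dt)
eps = 0.5             # weight of the mean-field-of-controls coupling

def simulate(mean_ctrl):
    """One forward pass: myopic feedback control toward the exit at x = 0,
    nudged by the previously computed average control path; particles
    leaving Omega = (0, 1) are absorbed (Dirichlet boundary condition)."""
    x = rng.uniform(0.4, 0.6, N)
    alive = np.ones(N, bool)
    new_mean = np.zeros(steps)
    for k in range(steps):
        if alive.any():
            # feedback control plus coupling to the population's mean control
            a = -x[alive] + eps * mean_ctrl[k]
            new_mean[k] = a.mean()
            x[alive] += a * dt + np.sqrt(2 * sigma * dt) * rng.standard_normal(alive.sum())
            alive[alive] = (x[alive] > 0) & (x[alive] < 1)  # absorb at the boundary
    return new_mean, alive.mean()

mean_ctrl = np.zeros(steps)
for it in range(20):              # fixed-point iteration on the control path
    new_mean, mass = simulate(mean_ctrl)
    if np.max(np.abs(new_mean - mean_ctrl)) < 1e-3:
        break
    mean_ctrl = 0.5 * mean_ctrl + 0.5 * new_mean   # damped update

print(f"remaining mass at time T: {mass:.2f}")
```

Note that the remaining mass at the final time is below the initial one, reflecting precisely the loss of mass conservation discussed above.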
Derivation of the model. To help the reader interpret the mean-field game of controls system (1), we now give a formal derivation from the particle dynamics of an individual agent optimizing its strategy.
We denote by X_t ∈ Ω the state of an infinitesimal agent of the system in the state space Ω, and by α_t its control, which takes values in the control space A ⊂ R^d. The distribution density of the state-control pair (X_t, α_t) shall be denoted by μ_t, which is a probability measure on the product space Ω × A whose first marginal π_{1♯}μ_t is the distribution m_t of the agents' states.
We assume the dynamics of an agent to be
\[
dX_t = b(t, X_t, \alpha_t)\,dt + \sqrt{2\sigma}\,dW_t,
\]
where W_t is a d-dimensional Brownian motion. Due to the presence of a boundary, the integral form of the dynamics is given by
\[
X_t = X_0 + \int_0^t b(s, X_s, \alpha_s)\,ds + \sqrt{2\sigma}\,W_t
\]
for any t ∈ (0, τ), where τ denotes the exit time of the agent from Ω. The optimal control α*_t is chosen to solve
\[
\inf_{\alpha \in \mathcal{U}} J(0, X_0, \alpha; \mu),
\]
where J is the cost functional of the underlying optimal control problem. The control set U is the set of progressively measurable functions α : [0,T] → A with respect to the filtration generated by the process {W_t}_{t∈[0,T]}. Notice that, since we are in a mean-field game of controls setting, the dynamics of each agent is affected not only by the average position of the rest of the agents, given by the distribution m_t, but also by their average optimal choice α*_t, whence the dependence of L on μ_t, instead of just its first marginal as in the usual MFG framework.
For a given μ : [0,T] → M^1(Ω × A), the value function u(0, x_0) = inf_{α∈U} J(0, x_0, α; μ) satisfies the following Hamilton-Jacobi-Bellman equation with Dirichlet boundary conditions:
\[
\begin{cases}
\partial_t u(t,x) + \sigma\Delta u(t,x) - H(t,x,Du(t,x);\mu_t) = 0, & \text{for } x \in \Omega \text{ and } t \in [0,T],\\
u(T,x) = \psi(x, m_T), & \text{for } x \in \Omega,\\
u(t,x) = \psi(x, m_t), & \text{for } x \in \partial\Omega \text{ and } t \in [0,T],
\end{cases}
\tag{4}
\]
where H stands for the Hamiltonian of the system,
\[
H(t,x,p;\nu) := \sup_{\alpha \in A} \big\{ -b(t,x,\alpha)\cdot p - L(t,x,\alpha;\nu) \big\}.
\]
If we assume that for every (t, x, p, ν) there exists a unique α*(t, x, p, ν) for which the above supremum is attained, then the optimal control of an infinitesimal agent at time t is given by α*(t, X_t, Du(t, X_t); μ_t). This means that, in an equilibrium configuration, the measure μ_t is the pushforward of m_t by the map x ↦ (x, α*(t, x, Du(t, x); μ_t)), i.e., the fixed-point relationship
\[
\mu_t = \big(\mathrm{Id}, \alpha^*(t,\cdot,Du(t,\cdot);\mu_t)\big)_\sharp\, m_t
\tag{6}
\]
holds. This, in particular, means that μ_t is uniquely determined by Du_t and m_t; hence for every t ∈ [0,T] it holds
\[
\mu_t = \Gamma(Du(t,\cdot), m_t)
\tag{7}
\]
for some function Γ (which will be investigated thoroughly in Section 3.1).
At the same time, the evolution of the mass m_t satisfies a Fokker-Planck type equation with drift given by b(t, x, α*(t, x, Du(t, x); μ_t)) = −D_pH(t, x, Du(t, x); μ_t), which yields the system
\[
\begin{cases}
\partial_t m_t(x) - \sigma\Delta m_t(x) - \mathrm{div}\big(D_pH(t,x,Du(t,x);\mu_t)\, m_t(x)\big) = 0, & \text{for } x \in \Omega \text{ and } t \in [0,T],\\
m_0(x) = m(x), & \text{for } x \in \Omega,\\
m_t(x) = 0, & \text{for } x \in \partial\Omega \text{ and } t \in [0,T].
\end{cases}
\tag{8}
\]
Putting together the dynamics (4) of u and (8) of m, as well as the fixed-point relationship (6) for μ_t, we get the mean-field games system (1).
Structure of the paper. The paper is organized as follows: in Section 2 we state the main assumptions under which we establish the existence of Nash equilibria of system (1). Section 3 contains the main result of the paper, that is, the well-posedness of system (1) under the set of assumptions stated in Section 2, obtained through a priori estimates and regularity techniques. In Section 4 we provide examples of systems satisfying our set of hypotheses. We conclude the paper with Section 5, which reports a numerical implementation of our modeling scenario.

Preliminaries and main assumptions
In this section we first introduce some preliminary notation and results that will be instrumental in the following sections. We then list the assumptions under which we shall prove the well-posedness of the mean-field game of controls system (1).
Let Ω ⊆ R^d be an open, precompact set with smooth boundary ∂Ω, fix T > 0 and define Q_T := [0,T] × Ω. We denote by |·| any finite R^d-norm. For any domains X, Y, we denote by
• C^n(X; Y) the space of all functions from X to Y which are continuous together with all their derivatives up to order n, equipped with the norm ∥·∥_{C^n(X;Y)} (or simply ∥·∥_{C^n} whenever clear from the context);
• C^α(X; Y), for any α ∈ (0,1), the space of all Hölder continuous functions from X to Y with exponent α;
• C^{n+α}(X; Y) the set of all functions whose derivatives of order n are all in C^α(X; Y);
• for any subset X ⊆ Q_T, C^{n+α,(n+α)/2}(X; Y) the set of functions from X to Y which are continuous together with all the derivatives D_t^r D_x^s for 2r + s ≤ n, and whose derivatives of order n are Hölder continuous with exponent α in space and exponent α/2 in time.
As a shorthand notation, we shall write C^{γ,γ/2}(X; Y) in place of C^{n+α,(n+α)/2}(X; Y) with the convention that n = ⌊γ⌋ and α = γ − n. We denote the γ-Hölder norm of C^{γ,γ/2}(X; Y) by ∥·∥_{C^{γ,γ/2}(X;Y)}. We recall that the γ-Hölder norm is the sum of the supremum norms of the derivatives up to order ⌊γ⌋, plus the sum of the (γ − n)-Hölder coefficients of the derivatives of order ⌊γ⌋; for a precise definition we refer the reader to [42, pages 7-8].
Given a compact set B ⊂ R^d and Σ a σ-algebra over B, we denote by M^1(B) the set of all positive measures m on (B, Σ) such that m(B) ≤ 1. It is well-known that the weak * convergence in M^1(B) can be metrized by any of the following metrics (parametrized by a > 1):
\[
d_a(\nu_1, \nu_2) := \sum_{k \in \mathbb{N}^+} a^{-k} \left| \int_B f_k \, d(\nu_1 - \nu_2) \right|,
\]
where, by the Stone-Weierstrass theorem, the set (f_k)_{k∈N^+} can be obtained by putting together a basis of the vector space of polynomials of degree at most n in R[x_1, …, x_d], for every n ∈ N.
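The following is a minimal numerical sketch of a metric of this family for empirical measures on [0,1]. It assumes the weighted-moment form d_a(ν₁,ν₂) = Σ_k a^{-k} |∫ f_k d(ν₁−ν₂)| with unnormalized monomial test functions f_k(x) = x^{k−1} (so f₁ ≡ 1); the paper only pins the basis (f_k) down up to spanning the polynomials, so this concrete choice is illustrative.

```python
import numpy as np

def d_a(samples1, samples2, a=2.0, K=20):
    """Truncated metric between two empirical measures on [0, 1], assuming
    d_a(nu1, nu2) = sum_k a^{-k} |int f_k d(nu1 - nu2)| with monomial
    test functions f_k(x) = x^(k-1), truncated at K terms."""
    total = 0.0
    for k in range(1, K + 1):
        m1 = np.mean(samples1 ** (k - 1))   # int f_k d(nu1)
        m2 = np.mean(samples2 ** (k - 1))   # int f_k d(nu2)
        total += a ** (-k) * abs(m1 - m2)
    return total

rng = np.random.default_rng(1)
nu1 = rng.uniform(0, 1, 2000)   # two samples of the same law ...
nu2 = rng.uniform(0, 1, 2000)
nu3 = rng.beta(2, 5, 2000)      # ... and a genuinely different law

close, far = d_a(nu1, nu2), d_a(nu1, nu3)
```

Since both empirical measures are probability measures, the f₁ ≡ 1 term vanishes, and the geometric weights a^{-k} make the metric dominated by low-order moments: two samples of the same law end up much closer than samples of different laws.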
The following technical result shall be helpful in the following.
Lemma 2.1. Let B ⊂ R^d be a compact set and let h : B → R^d be a continuous bounded function. Then for every a > 1 and every ν_1, ν_2 ∈ M^1(B), the quantity |∫_B h d(ν_1 − ν_2)| is bounded by a computable multiple of d_a(ν_1, ν_2). The bound follows from expanding h against the family (f_k)_{k∈N^+}, using the fact that ν_1, ν_2 ∈ M^1(B), where we denote by deg(f_k) the degree of f_k; then, using that f_1 ≡ 1, we obtain the desired estimate. This concludes the proof.
Remark 2.2. The fact that, for a space of measures on a compact set, Wasserstein and weak * convergence are equivalent is well known (see, e.g., [49, Thm 5.10]). Lemma 2.1 provides a computable bound witnessing such equivalence for a given continuous bounded function h, which shall be useful later on.
We now state the hypotheses that the various functions of our extended MFG system need to fulfil.

Assumptions (A)
In the following, fix β > 0 and a continuous bounded function ξ : Ω × A → R^d. We shall assume that:
(A1) The sets Ω and A are compact subsets of R^d, and we set δ := δ(Ω × A), where δ is defined as in (9).
(A2) m ∈ C^{2+β}(Ω; R) is such that m(x) ≥ 0 for every x ∈ Ω and satisfies the following conditions: (i) for every ν_1, ν_2 ∈ M^1(Ω × A) and for every t ∈ [0,T] the monotonicity condition holds, and the corresponding function, where the constant L satisfies (5), satisfies the following statement: there exist two continuous functions, as well as, where Γ is the function defined in (7).
Remark 2.3. We allow β to be greater than 1 since, following the notation adopted at the beginning of this section, in that case we refer to the Hölder regularity of the derivatives of order ⌊β⌋.
Solutions of system (1) will be interpreted in the classical sense.
What follows is the main theoretical result of the paper, the well-posedness of system (1).

Existence of Nash equilibria
In this section we shall prove that, under the Assumptions (A) stated in the previous section, there exists a β-classical solution of system (1). Throughout this section, we fix β > 0 and a bounded function ξ : Ω × A → R^d, and assume that Assumptions (A) hold. We start with some preliminary results on the Fokker-Planck equation and the fixed-point relationship (6), and then derive sharper a priori regularity estimates for u, m and μ, which will be instrumental in obtaining the well-posedness of system (1).

Existence and stability of the map Γ
To establish the existence of the fixed point (6) we will need that m_t ∈ M^1(Ω), so that we may rely on the weak * compactness of this space. However, since we consider smooth solutions of the Fokker-Planck equation (8), there is no a priori guarantee that this holds true. We shall prove this fact by using a technique similar to the one employed in [36].
Proposition 3.1. Let u and μ be as above and let m be a smooth solution of (8) associated with u and μ. Then for every t ∈ [0,T] and x ∈ Ω it holds m_t(x) ≥ 0 and ∥m_t∥_{L^1(Ω)} ≤ ∥m∥_{L^1(Ω)}. Therefore, under assumption (A2), we have m_t ∈ M^1(Ω) for every t ∈ [0,T].
Proof. Let δ > 0 and let ϕ_δ : R → R be defined pointwise as a smooth approximation of the negative part; notice that ϕ′_δ(0) = 0. We now multiply (8) by ϕ′_δ(m_t(x)) and integrate in time and space. We use integration by parts on both sides of the resulting equation; on the left, this gives the corresponding identity. We can then invoke the dominated convergence theorem to pass the limit δ → 0^+ inside the integrals. By invoking assumption (A2) (yielding m^− = 0) and Young's inequality we obtain the desired estimate. As a shorthand notation, set K(t,x) := D_pH(t, x, Du(t,x); μ_t). By the assumptions on u and μ we have ∥K∥_{L^∞(Q_T)} < +∞; therefore the Hölder and Poincaré inequalities yield an estimate from above, where C_{Ω,2} denotes the Poincaré constant. Using Gronwall's lemma we finally get m^−_t(x) = 0 for every t ∈ [0,T] and x ∈ Ω, as desired.
To get the inequality ∥m_t∥_{L^1(Ω)} ≤ ∥m∥_{L^1(Ω)} it suffices to use the function ϕ_δ(s) = s^{1+δ}. Indeed, using this new version of ϕ′_δ(m(t,x)) in (12) and passing to the limit δ → 0^+, we get the desired inequality, having invoked Fatou's lemma for the first term and the dominated convergence theorem for the second.
As anticipated, the well-posedness of the fixed-point map (6) shall follow from the compactness properties of the measure space M^1. The strategy of this proof follows [25].
Lemma 3.2. Let (p, m) be given as above and let α* be the unique maximizer of (10). Then the map Φ[p, m] admits a unique fixed point.
Since M^1(Ω × A) is trivially convex and compact with respect to the weak * convergence of measures (by the Banach-Alaoglu theorem), to show that Φ[p, m] has a fixed point via the Schauder fixed point theorem we just need to prove that Φ[p, m] is continuous. To this end, let (μ_n)_{n∈N} be a sequence in M^1(Ω × A) weak * converging to μ ∈ M^1(Ω × A); continuity then follows by using Assumption (A6)-(ii). This shows existence. To prove uniqueness, assume that μ_1 and μ_2 are two fixed points of Φ[p, m], and apply assumption (A5)-(i). If we add and subtract the terms p(x) · b(t, x, α*(t, x, p(x); μ_i)) for i = 1, 2 and rearrange the expression, uniqueness follows. Next we show the stability of the fixed-point map Γ.
Therefore, invoking the dominated convergence theorem and the uniqueness of the weak * limit, we obtain μ = Φ[p, m](μ), as desired.

A priori regularity estimates
We shall now extend Lemma 3.2 to Hölder continuous measure-valued curves.
Lemma 3.4. Let p and m be bounded in their respective β-Hölder norms by constants P_0 and M_0. Then there exist a constant C_0 := C_0(M_0, P_0) > 0 and a metric on M^1(Ω × A) that metrizes the weak * convergence such that the application Ψ on the set (13) has the measure-valued curve (Γ(p_t, m_t))_{t∈[0,T]} as its unique fixed point. In particular, (Γ(p_t, m_t))_{t∈[0,T]} is Hölder continuous in time with exponent β/2 with respect to such a metric.
Proof. As already argued in Section 2, the metric λ_ω metrizes the weak * convergence; notice also the trivial estimate deg(f_k) ≤ k for any k ∈ N^+. We shall prove that, for a proper choice of C_0 and ω, the map Ψ defines a contraction over the set (13) (where each μ is Hölder continuous with respect to the metric λ_ω). The right-hand side of the inequality implies a bound which, together with (15), yields a further estimate; putting together (14) and (16) we arrive at the key inequality. Now denote by L_1 the Hölder constant in (A6)-(i). Since the relevant supremum is finite for every M_0, P_0 > 0, we can fix ω := ω(M_0, P_0) > 0 satisfying the two-sided condition (17), whose upper bound reads log(a)/log(δ + 1) > ω. We now pass to bound from above the distance λ_ω. Denote by μ_t := Φ[p_t, m_t](μ_t) the fixed point, whose existence and uniqueness for every t is guaranteed by the hypotheses on p and m and by Lemma 3.2. We now focus on the first term of the right-hand side of (18). Notice that, by Assumption (A6)-(ii) and Lemma 2.1 with h = ξ, B = Ω × A and our choice of a, we have the corresponding estimate, where d_a ≤ λ_ω comes from a > (δ + 1)^ω (as implied by (17)).
Moreover, the family (f_k)_{k∈N} is a basis of Lipschitz continuous functions. Taking this information into account we may write the corresponding estimate, where, as a shorthand notation, we denote ∥μ∥_β := ∥μ∥_{C^{β/2}([0,T];M^1(Ω×A))}. Notice that we used the Hölder continuity in time of the functions p and μ.
Concerning the second term on the right-hand side of (18), a uniform bound on the corresponding supremum yields an analogous estimate. In conclusion, the choice of ω in (17) yields that Ψ[p, m] maps the set (13) into itself and is a contraction there, whence the claim follows.
Lemma 3.5. Assume that the sequence (p_n, m_n)_{n∈N} converges to (p, m) in the product topology, and consider the family of functions ϑ_n(t) = Γ(p_{n,t}, m_{n,t}), as well as ϑ(t) := Γ(p_t, m_t). Then (ϑ_n)_{n∈N} converges uniformly to ϑ.
Proof. Since the sequences (p_n)_{n∈N} and (m_n)_{n∈N} are convergent, they are bounded in their respective β-Hölder norms by two positive constants P_0 and M_0. We can then take a uniform C_0 and ω for the entire sequence by using such P_0 and M_0 in Lemma 3.4, which provides a metric λ_ω on M^1(Ω × A) with the stated properties. Since the set Z is compact by the Ascoli-Arzelà theorem, we can extract from (μ_n)_{n∈N} a subsequence converging to some μ ∈ Z uniformly in time. On the other hand, we know from Lemma 3.3 that the sequence (Γ(p_{n,t}, m_{n,t}))_{t∈[0,T]} converges pointwise to (Γ(p_t, m_t))_{t∈[0,T]} ∈ Z; therefore, by the uniqueness of the pointwise limit, (Γ(p_{n,t}, m_{n,t}))_{t∈[0,T]} converges to (Γ(p_t, m_t))_{t∈[0,T]} uniformly.
Notice that our smoothness assumptions on b and L transfer easily to the Hamiltonian of the system, as the following technical result shows.
We can now show that the stability of the map Γ easily translates into stability for the functions H and D_pH.
Lemma 3.7. Assume that the sequence (p_n, m_n)_{n∈N} converges to (p, m) as above. Then H and D_pH, evaluated along the sequence, converge uniformly.
Proof. We shall first show that α*(t, x, p_{n,t}(x); Γ(p_{n,t}, m_{n,t})) → α*(t, x, p_t(x); Γ(p_t, m_t)) uniformly.
The thesis will then follow from the regularity assumptions on b and L, namely (A4)-(i) and (A5)-(ii), and the fact that H satisfies (19) and (20) by Proposition 3.6.
Let ω be determined by (17) (for P_0 and M_0 chosen according to the bounds of the sequences (p_n)_{n∈N} and (m_n)_{n∈N}). Then the desired estimate follows easily from Assumption (A6)-(ii) together with Lemma 2.1. By taking the supremum over (t, x) ∈ Q_T, the statement is proven by invoking Lemma 3.5.
The following a priori Hölder bound for u and m follows easily from the regularity assumptions on the data of the problem (i.e., the sets, constants and functions listed in the assumptions) and the previous lemmas.
Lemma 3.8. There exists a constant C depending only on the data of the problem such that, if u and m are as above, then the measure-valued curve μ_t = Γ(Du(t,·), m_t), t ∈ [0,T], is Hölder continuous in time with exponent β/2. Moreover, by Proposition 3.6 it follows that the coefficients of the PDEs (4) and (8) are bounded in the parabolic Hölder spaces C^{β,β/2}(Q_T; R) and C^{β,β/2}(Q_T; R^d), respectively.
Therefore, since u ∈ C^{2+β,1+β/2}(Q_T; R), m is a solution to a nondegenerate parabolic equation with coefficients bounded in C^{β,β/2}(Q_T; R). We can thus invoke [42, Theorem V.6.1, Equations (6.12)-(6.12')] to get a uniform bound with some constant C_1 depending only on the data. For the same reasons, since m ∈ C^{2+β,1+β/2}(Q_T; R), u also solves a nondegenerate parabolic equation with coefficients in C^{β,β/2}(Q_T; R) and satisfies an analogous bound, where the constant C_2 depends only on the data. Putting together the two estimates we get the thesis with C := C_1 + C_2.

Well-posedness of system (1)
We are now ready to prove the main result of the paper.
Proof of Theorem 2.5. Denote by X the subset of the relevant parabolic Hölder space introduced above, and equip it with the product norm (which we denote by ∥·∥_X).
Therefore, by the fact that m_n(Ω) ≤ 1 for every n ∈ N, we have the desired statement. ■
We now introduce the map T : X × [0,1] → X that associates to every (u, m) ∈ X and every ε ∈ [0,1] a solution (v, ρ) of the corresponding PDE system.
Claim: T(·, ε) is a well-defined map from X to X.
Proof of Claim: Set (v, ρ) := T(u, m, ε). Since the couple (Du, m) satisfies the assumptions of Lemma 3.4, the measure-valued curve μ_t = Γ(Du_t, m_t) satisfies the fixed-point relationship (6). Now, by Proposition 3.6, if (u, m) ∈ X then the term H(t, x, Du(t,x); μ_t) satisfies the hypotheses of [42, Theorem IV.5.2]; hence there exists a unique solution v satisfying the corresponding Hölder bound for some continuous function K_1 : R → R and constants C_1, C_2 > 0 depending only on the data.
Here we used Assumption (A3)-(i), as well as Assumption (A7). Furthermore, since D_pH(t, x, Du(t,x); μ_t) is Hölder continuous as well, for the same reasons there also exists a solution ρ which satisfies an analogous bound, again for some continuous function K_2 : R → R and constants C_3, C_4 > 0 depending only on the data, having used Assumption (A7). Moreover, by Proposition 3.1, it holds that ρ_t ∈ M^1(Ω) for every t ∈ [0,T]. Summing (25) and (27) we arrive at the estimate (28) for some continuous function K : R → R depending only on the data. It then follows that (v, ρ) = T(u, m, ε) ∈ X, and hence T(·, ε) is a well-defined map. ■
Claim: T(·, ε) is continuous.
Proof of Claim: Let (u_n, m_n)_{n∈N} ∈ X be a sequence converging to (u, m) in the norm of X, and denote (v_n, ρ_n) := T(u_n, m_n, ε) for every n ∈ N. Then (u_n, m_n)_{n∈N} is also bounded, and so is (K(∥(u_n, m_n)∥_X))_{n∈N} by the continuity of the function K in (28). Therefore, by the estimate (28) we infer that the sequence (v_n, ρ_n)_{n∈N} is bounded in Hölder norm. Hence, by the Ascoli-Arzelà theorem and the fact that X is closed, we may infer the existence of (v, ρ) ∈ X such that (v_n, ρ_n)_{n∈N} converges to (v, ρ) in Hölder norm (up to subsequences). We now have to show that T(u, m, ε) = (v, ρ).
Therefore, using well-known stability results for viscosity solutions of second-order equations, it holds that v is a classical solution of (24). Similarly, since by Lemma 3.7 we also get D_pH(t, x, Du_n(t,x); μ_{n,t}) → D_pH(t, x, Du(t,x); μ_t) uniformly, we have that ρ solves (26), which in turn implies T(u, m, ε) = (v, ρ). ■
Claim: T(·, ε) is a compact mapping.
Proof of Claim: To show compactness we need to prove that T(·, ε) maps bounded sets into precompact sets. Fix M > 0 and consider a bounded set X_M; by (28) its image has Hölder norm bounded by the constant sup_{[0,M]} K. Since X is closed by a previous claim, the image of X_M under T(·, ε) is precompact by the Ascoli-Arzelà theorem, as desired. ■
Claim: T(·, ε) has a fixed point for every ε ∈ [0,1].

Proof of Claim:
We shall apply the Leray-Schauder fixed point theorem. We already showed that for every ε ∈ [0,1] the map T(·, ε) is a continuous compact mapping, and it trivially holds that T(u, m, 0) = (0, 0) by the maximum principle for the heat equation. Therefore, assume that (u, m) is a fixed point of T(·, ε), i.e., it is a classical solution of (1) with εH, εD_pH, εψ and εm in place of H, D_pH, ψ and m. Then from (21) and (22) in the proof of Lemma 3.8 we see that we can select a constant C > 0, independent of ε ∈ [0,1], bounding all such fixed points. Therefore, by the Leray-Schauder fixed point theorem, the application T(·, ε) has a fixed point for every ε ∈ [0,1]. ■ To conclude the proof, we simply notice that a fixed point (u, m) of T(·, 1) is also a classical solution of (1).

Modeling applications
We now present a set of sufficient conditions for some of the assumptions reported in Section 2, and then we provide two models that satisfy them, so that we may infer the existence of Nash equilibria for them by Theorem 2.5.

Sufficient conditions for Assumption (A5)
We shall first focus on the Lagrangian and exhibit a functional form satisfying Assumption (A5) which is frequently found in the context of multi-agent models. In such contexts, one of the most employed modeling tools is the convolution of the mass of agents with an interaction kernel, which models the interaction of the mass with itself. We recall that the convolution between a function f : R^d → R and a measure m is defined as
\[
(f * m)(x) := \int_{\mathbb{R}^d} f(x - y)\, dm(y),
\]
whenever this quantity is well-defined (for instance, when m is absolutely continuous with respect to the Lebesgue measure on R^d and f ∈ L^1(R^d)). Convolutions with sufficiently regular interaction kernels play a crucial role, both in the dynamics as well as inside the cost functional, in several pedestrian and financial market models (see for instance [26] for the former and [8] for the latter).
Definition 4.1 (Multi-agent interaction Lagrangian). Fix ℓ : [0,T] × Ω → R, h : R → R, Q : R^d → R, g : A × M^1(Ω × A) → R and ε ≥ 0. We define the multi-agent interaction Lagrangian to be the cost (29), built from the running cost ℓ, the interaction term involving h and the convolution with Q, and the control cost g weighted by ε.
For this kind of cost function, a rather simple sufficient condition can be formulated for Assumption (A5)-(i).
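Numerically, the interaction term is evaluated against an empirical measure, for which the convolution reduces to an average of kernel evaluations. The following is a minimal sketch of such an evaluation; the Gaussian kernel and the particle positions are illustrative assumptions, not the paper's concrete model.

```python
import numpy as np

def kernel_convolution(grid, particles, Q):
    """Evaluate (Q * m)(x) = (1/N) * sum_j Q(x - y_j) on a grid, where m is
    the empirical measure of the particle positions y_j."""
    diffs = grid[:, None] - particles[None, :]   # pairwise x - y_j
    return Q(diffs).mean(axis=1)

# Hypothetical Gaussian interaction kernel (illustrative choice)
Q = lambda z: np.exp(-z**2 / 0.02)

grid = np.linspace(0.0, 1.0, 101)
particles = np.array([0.3, 0.31, 0.29, 0.7])     # three agents clustered, one apart
Qm = kernel_convolution(grid, particles, Q)
```

The convolved density peaks where the agents cluster, which is exactly the self-interaction effect the cost (29) feeds back into each agent's optimization.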
Proof. The statement follows easily from the corresponding elementary estimate on the convolution term. This concludes the proof.
The regularity condition of Assumption (A5)-(ii) requires us to be more precise in the choice of the function g. As we want to model situations where agents optimize their choices taking into account the average strategy of the other agents, we first define the operator
\[
\Theta(\nu) := \int_{\Omega \times A} \alpha \, d\nu(x, \alpha). \tag{31}
\]
The operator Θ measures the average control strategy of the mass of players. This quantity will enter the Lagrangian in the form of the following cost:
\[
g(\alpha, \nu) := \varphi(\alpha \cdot \Theta(\nu)), \tag{32}
\]
where φ is a sufficiently smooth, convex and monotone function. Minimizing such a cost leads the infinitesimal agent to align its strategy α* with the direction of the average strategy Θ(ν), provided that φ is decreasing. On the other hand, if φ is increasing, the optimal strategy α* should go in the opposite direction to Θ(ν).
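For an empirical state-control measure, the operator Θ in (31) is simply the (weighted) average of the sampled controls; the following sketch computes it together with the alignment cost (32), for an illustrative decreasing convex φ and made-up sample values.

```python
import numpy as np

def Theta(states, controls, weights=None):
    """Average control strategy Theta(nu) = int_{Omega x A} alpha d(nu)(x, alpha)
    for an empirical state-control measure (weights default to uniform)."""
    w = np.full(len(controls), 1.0 / len(controls)) if weights is None else weights
    return (w[:, None] * controls).sum(axis=0)

phi = lambda z: np.exp(-z)     # decreasing and convex: rewards alignment

states   = np.array([[0.2], [0.5], [0.8]])     # illustrative sample
controls = np.array([[1.0], [0.5], [0.7]])
theta = Theta(states, controls)

# g(alpha, nu) = phi(alpha . Theta(nu)): aligned controls are cheaper
g_aligned = phi(np.dot(np.array([1.0]),  theta))
g_opposed = phi(np.dot(np.array([-1.0]), theta))
```

With a decreasing φ, a control pointing along Θ(ν) incurs a strictly smaller cost than one pointing against it, which is the alignment mechanism described above.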
Remark 4.3. In several applications, the function g may be chosen as
\[
g(\alpha, \nu) := \varphi(\alpha - \Theta(\nu)), \tag{33}
\]
for some smooth, convex and monotone function φ (with φ(z) = z² being the favourite choice, see e.g. [5,41]). Our unconventional choice of g is made on the basis that all the results that will be proven for (32) can be trivially extended to (33) as well, and that our cost leads to nice closed-loop solutions (see, for instance, Remark 4.8) and to interesting agent dynamics, as shown in Section 5.
We first show that the Hölder regularity in time of a measure ν passes naturally to Θ(ν).
Proof. By Lemma 2.1 with the choice h = α, a = (δ + 1)^ω and B = Ω × A, for every t, s ∈ [0,T] we obtain the desired estimate. This concludes the proof.
This result allows us to extend the regularity of α and ν to the multi-agent interaction Lagrangian, as the following result shows.
Proposition 4.5. Assume that ℓ, h, Q and φ are smooth functions and that g satisfies (32). Then the multi-agent interaction Lagrangian (29) satisfies Assumption (A5)-(ii).
Concerning the convolution term, notice that for every (x_1, t_1), (x_2, t_2) ∈ Q_T the desired regularity comes from the smoothness of h and Q and the C^{β/2} regularity in time of ν_t (which passes to π_{1♯}ν_t). Since the sum of Hölder continuous functions is still Hölder continuous, we get L(t, x, α(t,x); ν_t) ∈ C^{β,β/2}(Q_T; R), which is the desired statement.

Sufficient conditions for Assumption (A6)
We now give sufficient conditions for Assumption (A6). The framework we consider here is coherent with the one of the previous subsection, so that the sufficient conditions for Assumption (A5) are satisfied at basically no extra cost once the following sufficient conditions hold.
Proof. If φ is convex, then with the above choice of b and L the Lagrangian function J is strictly convex in α, and thus admits a unique minimizer α* for every (t, x, p, ν). Since, under the above hypotheses, the Hamiltonian H takes the form given above, the optimal control α*(t, x, p, ν) satisfies the identity (34). In order to prove that Assumption (A6)-(ii) holds, let p_1, p_2 ∈ R^d and ν_1, ν_2 ∈ M_1(Ω × A), and denote α_i := α*(t, x, p_i, ν_i) for i = 1, 2. The identity (34) then yields a chain of inequalities which, since Lip(φ′) < 2εM^{−2} by assumption (3c) and the choice of M and ε, gives the desired Lipschitz estimate; here we also used the bound provided by assumption (3b). It remains to show that the Lipschitz coefficient satisfies condition (11). Since δ ≤ M by assumption and, due to the choice ξ(x, α) = α, we have ∥ξ∥_∞ ≤ M, it is enough to verify an inequality which is equivalent to assumption (3c) and is well defined by the choice of M and ε. This concludes the proof.
We now provide two explicit optimal control strategies satisfying both the hypotheses of Proposition 4.6 and Assumption (A6)-(i), so that these controls satisfy Assumption (A6) in full.
In this particular case we can actually resolve the self-reference hidden in (35) by means of a simple computation. Indeed, for given p and m we obtain the closed-loop form (36). This representation clearly shows that, alongside the typical adjoint term −p_t, in this modeling scenario the average adjoint with respect to the mass of players contributes to determining the optimal choice. Notice that this extra term vanishes as soon as m_t does. Remark 4.8 (Exponential-cost optimal control). If we assume that φ(z) := M_1 exp(M_2 z), then the first-order equation (34) reads as (44), whose solution, by Proposition 6.2, is given by (45). Notice how this functional form resembles (35), except for the multiplier of Θ(ν). This solution satisfies Assumption (A6)-(ii) for appropriate choices of M_1 and M_2, thanks to Proposition 4.6. In addition, by invoking Lemma 6.1 and Proposition 4.4, if p ∈ C^{β,β/2}(Q_T; R^d) and ν_t ∈ C^{β/2}([0, T]; M_1(Ω × A)), then α* satisfies the Hölder regularity conditions of Assumption (A6)-(i).
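Numerically, the averaging step that resolves the self-reference in (35) can be reproduced on a grid. The following sketch (d = 1, φ(z) = M_1 z; the discretization and the function name are our own assumptions) returns both the closed-loop control and the consistent value of Θ(ν):

```python
import numpy as np

def linear_cost_control(p, m, dx, M1):
    """Closed-loop alpha*(x) = -p(x) - M1*Theta for the linear cost phi(z) = M1*z.
    Theta is resolved by averaging the pointwise first-order condition against m."""
    mass = np.sum(m) * dx                  # total (possibly < 1) mass of m_t
    avg_p = np.sum(p * m) * dx             # average adjoint w.r.t. the players' mass
    theta = -avg_p / (1.0 + M1 * mass)     # Theta(nu) from the averaged condition
    alpha = -p - M1 * theta                # pointwise optimality condition
    return alpha, theta
```

Note that the correction term −M_1 Θ vanishes together with m_t, matching the remark above that the extra term disappears as soon as the mass does.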

Sufficient conditions for Assumption (A7)
Before moving to actual applications of our modeling framework, we show how Assumption (A7) can be satisfied. We first introduce the following definition.
In order to give concreteness to the above definition, we immediately give an example of an optimal control strategy which is bounded along fixed points.
Lemma 4.10. The linear-cost optimal control of Remark 4.7 is bounded along fixed points.
Proof. We start from the equivalent formulation of the linear-cost optimal control given by (36). Since all the quantities inside the integrals are bounded, for every r, s such that 2r + s ≤ β, the Leibniz integral rule and Faà di Bruno's formula yield an identity in which B_{k,j} denotes the j-th summand of the k-th complete exponential Bell polynomial. Since all the quantities on the right-hand side involve integrals of p_t and m_t, it can easily be shown that the resulting norm is controlled by some continuous function H_{r,s} : R → R. In a similar fashion, the above identity allows us to bound the Hölder norm of D_t^r D_x^s α* by a continuous function of the Hölder norms of p and m. Summing the contributions over all r, s such that 2r + s ≤ β, we get the result. Assumption (A7) can be simply reformulated by saying that the Hamiltonian H, as well as its derivative D_p H, is bounded along fixed points. The following result shows that if the optimal control is bounded along fixed points and the functionals of the system preserve boundedness along fixed points, then the Hamiltonian satisfies Assumption (A7).
Then the Hamiltonian H satisfies Assumption (A7).
Proof. Let H_b, H_L : R → R be the continuous functions associated to b and L, respectively. From the definition (5) we deduce that H is bounded along fixed points by means of the function H_1(x) := xH_b(x) + H_L(x). Since D_p H is trivially bounded along fixed points by the function H_2(x) := H_b(x), the statement is proven.
We now show examples of functions that preserve boundedness along fixed points (and thus can be employed in the definition of b and L to give rise to a Hamiltonian for which Proposition 4.11 holds). Lemma 4.12. If α*(t, x, p; ν) is bounded along fixed points, then so is Θ(ν).
Proof. We start by noticing that, since α*(t, x, p(t, x); ν) is bounded along fixed points, we can argue as in the proof of Lemma 4.10 to conclude that Θ(ν) is bounded along fixed points as well.
The following statements are easy corollaries of the above result.
Corollary 4.13. Let ϕ : A → R^d be a smooth function. Then if α*(t, x, p; ν) is bounded along fixed points, so is f(t, x, p; ν). Another corollary is the following result, which shows that the convolution dynamics preserve boundedness along fixed points: if α*(t, x, p; ν) is bounded along fixed points, then so is f(t, x; ν).
Proof. By π_{1♯}Γ(p_t, m_t) = m_t and the properties of the convolution operator, it follows that the corresponding bound holds, where we passed the derivative inside the integral by the Leibniz integral rule. This easily yields the statement.

Refinancing dynamics
We now pass to the first application of our modeling framework. Assume that d = 1 and Ω = (−1, 1), and let us consider a continuum of indistinguishable firms, with X_t ∈ [−1, 1] denoting the face value of the single-period debt of each firm. Notice that indistinguishability implies that the impact of the individual choice on the debt value is infinitesimal. To simplify the setting, we assume that the level X_t = 1 corresponds to the default of the firm (e.g., for lack of collateral to pledge in return), while X_t = −1 means that the firm quits the debt market since it has enough savings for the rest of its lifetime.
At each t, the firm decides the adjustment α* of its debt level according to the following minimization problem, where the term |α_t|² should be considered as a deadweight adjustment cost. The stopping time τ can be interpreted as the exit time from the debt market (if an agent could exit only when X_t = 1, it could be interpreted as a default time, but in the present case we also allow exit when X_t = −1). Furthermore, the infinitesimal firm rolls over its entire stock of debt by paying a rate of return R(ν_t) which depends on the current demand for debt in the market, i.e., R(ν) = ρ(Θ(ν)) for some sufficiently smooth ρ : R → R. Similarly, we prescribe that the function g : A × M_1(Ω × A) → R is given by g(α, ν) := φ(α ∫_{Ω×A} α dν(x, α)) = φ(αΘ(ν)), with φ : R → R smooth, increasing and convex with φ′ ≥ B for some B ≥ 0. With this cost, it is advantageous for the individual firm to "go against" the average demand for debt. The motivation is that it is better for a firm to increase its stock of debt (i.e., α > 0) while there is an excess supply of funds, corresponding to the condition Θ(ν) = ∫_{Ω×A} α dν(x, α) < 0, than while there is excess demand.
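The "go against the average demand" mechanism can be checked directly: with an increasing convex φ, minimizing φ(αΘ(ν)) over a bounded control set selects a control whose sign is opposite to that of Θ(ν). A small illustration (the choice φ(z) = e^z is just a stand-in satisfying the stated assumptions):

```python
import numpy as np

def best_response(theta_nu, controls, phi):
    """Admissible control minimizing the interaction cost phi(alpha * Theta(nu))."""
    costs = [phi(a * theta_nu) for a in controls]
    return controls[int(np.argmin(costs))]

controls = np.linspace(-1.0, 1.0, 201)  # discretized admissible debt adjustments
phi = lambda z: np.exp(z)               # smooth, increasing, convex stand-in
```

Excess supply of funds (Θ(ν) < 0) makes borrowing (α > 0) optimal, and vice versa.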
This setting yields the minimization problem subject to the stopped SDE and the fixed-point relationship for μ_t given by (6).
It is quite easy to show that we fall under the general framework outlined in Section 2 as soon as we choose φ(z) = M_1 z for an appropriate value of M_1 > 0. As Assumptions (A1)-(A4) are trivially satisfied, let us see how the sufficient conditions of the previous sections can be employed to show that Assumptions (A5)-(A7) also hold.
Firstly, Assumption (A5) holds as soon as we show that Proposition 4.2 applies to our choice of g: since the required bound holds, we obtain (30) for F(x, α) = α√M_1. Secondly, Proposition 4.6 holds true as soon as we choose M_1 ≤ 2ε, so that Assumption (A6) also holds.
Finally, as we are in the linear-cost optimal control case, we can employ all the results of Section 4.3, in particular Corollaries 4.13 and 4.14, to establish the validity of Assumption (A7).

Evacuation of pedestrians
Let d = 2, let Ω ⊂ R² be an open subset, and let A = B_M(0) for some M > 0. We consider a continuum of infinitesimal pedestrians whose dynamics is driven by a smooth convolution kernel K : R^d → R^d (like the smoothed Hegselmann-Krause one). The optimal velocity α* should be chosen according to two main principles. Firstly, pedestrians should try to avoid congestion, which can be done by minimizing a functional involving a decreasing radial function Q : R^d → R (see [17] for an analysis of this cost term in the context of pedestrian dynamics) satisfying inf_Ω Q ≥ B for some B > 0. Secondly, as argued in [10], pedestrians have a natural attitude to follow their mates, a tendency that can be reproduced by minimizing the quantity φ(α · Θ(ν)), where φ : R → R is a strictly convex, decreasing function with φ′ ≥ −B/M². Indeed, if this quantity is minimized, then the angle between the optimal velocity α* and the average velocity chosen by the rest of the agents will be "small". As a result, this model has the advantage of being able to treat alignment of agents even though it is a first-order model.
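The alignment mechanism can be made concrete: with a strictly convex, decreasing φ, minimizing φ(α · Θ(ν)) over the disc A = B_M(0) amounts to maximizing α · Θ(ν), i.e. picking the admissible velocity whose angle with the crowd's average velocity is smallest. A sketch over a sampled disc boundary (names and the stand-in φ(z) = −z are our own):

```python
import numpy as np

def align_best_response(theta_nu, M, phi, n_dirs=360):
    """Velocity of norm M minimizing phi(alpha . Theta(nu)) over sampled directions."""
    angles = np.linspace(0.0, 2.0 * np.pi, n_dirs, endpoint=False)
    candidates = M * np.stack([np.cos(angles), np.sin(angles)], axis=1)
    costs = phi(candidates @ theta_nu)
    return candidates[int(np.argmin(costs))]
```

With decreasing φ the returned velocity points along Θ(ν), exactly the "follow the mates" behaviour of the model.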
Putting things together, we obtain the minimization problem subject to the stopped SDE and the fixed point relationship for µ t given by (6).
Arguing similarly to Section 4.4, we can easily show that the general framework presented above applies to this model as well, as soon as we choose φ(z) = −M_1 z with M_1 ≤ B/M². We only show that Proposition 4.2 holds true: the required bound follows trivially, using the fact that M ≥ |α|, and the resulting chain of inequalities reduces to (30) for the trivial choice F ≡ 0.

The numerical method
In this article, we have chosen a particle method for simulating system (1). The strategy consists in describing a population, composed of a great number of individuals, by means of a reduced set of numerical particles which obey the action criterion of the problem. Let φ_ε be an ε-dependent smooth function such that φ_ε(x) := φ(ε^{−1}x)/ε^d. It is often called the shape function because it is used to reconstruct the particle density starting from the numerical particles. A careful choice of the shape function is crucial for eliminating (or at least greatly reducing) the noise of the density profile and for respecting the boundary conditions of the continuous problem.
The first step of the particle method we are going to employ consists in discretizing the initial density f_in by means of a set of smooth shape functions centred on the individual positions and controls, in such a way that (38) holds, where N_m represents the number of numerical particles, (x^0_k)_{1≤k≤N_m} their initial positions and (α^0_k)_{1≤k≤N_m} their initial velocities (i.e., their controls). We underline that each numerical particle is weighted by means of a parameter ω_k ∈ R_+ (which in our application shall be uniform in k). Once the number N_m of numerical particles has been chosen, the initial positions (x^0_k)_{1≤k≤N_m} are sampled according to the initial density f_in, either in a deterministic way or through a Monte-Carlo procedure. In the first iteration (since the exit cost of our cost functional is set to zero), we choose the initial velocities (α^0_k)_{1≤k≤N_m} so that the particles move towards the boundary point closest to their initial position with velocities of maximum norm.
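A minimal version of the discretization (38) in d = 1, with a Gaussian shape function and Monte-Carlo sampling of the step-function initial density used later in the experiments (f_in = 2 on (0.5, 1)); the kernel and parameter choices here are illustrative:

```python
import numpy as np

def shape(x, eps):
    """Gaussian shape function phi_eps(x) = phi(x/eps)/eps in d = 1."""
    return np.exp(-0.5 * (x / eps) ** 2) / (eps * np.sqrt(2.0 * np.pi))

def reconstruct_density(grid, positions, weights, eps):
    """Density sum_k w_k * phi_eps(x - x_k), evaluated at the grid points."""
    return np.array([np.sum(weights * shape(x - positions, eps)) for x in grid])

rng = np.random.default_rng(0)
N_m = 4000
x0 = 0.5 + 0.5 * rng.random(N_m)   # Monte-Carlo sample of f_in = 2 on (0.5, 1)
w = np.full(N_m, 1.0 / N_m)        # uniform weights, total mass one
grid = np.linspace(0.0, 1.0, 101)
f0 = reconstruct_density(grid, x0, w, eps=0.02)
```

The reconstructed density has unit mass (up to kernel smoothing at the boundary) and center of mass near x = 0.75, matching the initial data of the simulations.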
We introduce a time discretization of step ∆t and set t_n := n∆t. The density of the continuous problem at time t_n is hence reconstructed from (x^n_k)_{1≤k≤N_m} and (α^n_k)_{1≤k≤N_m}, the positions and velocities of the numerical particles at time t_n. The positions of the numerical particles evolve in time by minimizing, over each time interval [n∆t, (n + 1)∆t], the individual cost functional specific to the problem. In the next section, we will provide an example of such a cost. Once we identify the velocity (that is, the control strategy) which minimizes the individual cost, the numerical particles move according to it.
In order to find the Nash equilibrium solution of the problem, an iterative numerical scheme has been implemented. We begin the iteration procedure by considering a simplified problem, and use its solution as the starting point of the iteration scheme. First, the admissible velocities are discretized in an appropriate way in order to reduce them to a finite set. Then, the next steps are implemented as follows.
In what follows, we denote by j the index labelling the iteration step, and by (x^n_{k,j})_{1≤k≤N_m} and (α^n_{k,j})_{1≤k≤N_m} the positions and velocities of the particle labelled k at time t_n at the j-th iteration. At the beginning of the j-th iteration, we randomly order the numerical particles by sampling from the uniform distribution. We then consider the first particle x^1_{1,j} (with respect to the random order) at the numerical time t_1 and choose α^1_{1,j} by testing the possible costs between t_1 and t_2 of this particle against the set of discretized admissible velocities, assuming that the other particles have the positions and velocities computed at the (j − 1)-th iteration (i.e., using (α^0_{k,j−1})_{2≤k≤N_m}). Once we find the minimal cost for the first particle, we update its position and velocity, which will be used in the computations of the next step. At the numerical time t_2, we consider the second particle x^2_{2,j} (with respect to the random order). We choose α^2_{2,j} by testing the possible costs between t_2 and t_3 of this particle against the set of discretized admissible velocities, assuming that the other particles have the positions and velocities computed at the (j − 1)-th iteration, except for the first particle, whose position and velocity were updated in the previous step (i.e., using α^1_{1,j} and (α^0_{k,j−1})_{3≤k≤N_m}). We end the procedure either once all the particles have been tested or when the time horizon of the problem has been reached.
The updated positions and velocities are then the starting point for the (j + 1)-th iteration.
The procedure ends when the difference between the distribution obtained at the (j + 1)-th iteration and the distribution obtained at the j-th iteration is below a given threshold.
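The sweep described above can be sketched as follows; the one-step congestion cost below is a toy placeholder for the actual functional (40), and all names are our own assumptions:

```python
import numpy as np

def congestion_cost(k, a, x, dt):
    """Toy one-step cost: penalize ending the step close to the other particles."""
    new = x[k] + a * dt
    others = np.delete(x, k)
    return np.sum(np.exp(-((new - others) ** 2) / 0.01))

def best_reply_sweep(x, alpha_prev, controls, dt, rng):
    """One iteration j: visit particles in random order; each picks the control of
    minimal one-step cost (in the full method the cost also depends on the others'
    previous-iteration strategies through the measure nu)."""
    alpha_new = alpha_prev.copy()
    for k in rng.permutation(len(x)):
        costs = [congestion_cost(k, a, x, dt) for a in controls]
        alpha_new[k] = controls[int(np.argmin(costs))]
    return alpha_new
```

In the full scheme the sweep is repeated, positions are advanced with the selected controls, and the iteration stops when two successive empirical distributions are closer than a fixed threshold.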
A peculiar feature of the method is the absence of a stability condition. This can be an advantage for simulating the long-time behaviour of an MFG system, especially in space dimension higher than one. Moreover, this technique is very well-suited to treating not only systems of PDEs that are second-order in space, but also first-order systems. In this case, it allows us to handle non-conventional boundary conditions, such as absorbing or specular-reflection boundary conditions. We finally underline that particle methods suffer much less from artificial numerical diffusion than finite-difference methods.
On the other hand, in order to obtain accurate simulations, the number of numerical particles required may be very high, and the simulations can be time-consuming.
In some sense, our procedure, based on successive approximations, mimics a real-life repeated game and is linked to the concepts of "best reply" [16] and "fictitious play" [24]. We underline that iterative procedures for the numerical implementation of MFGs, not based on finite-difference methods, have been proposed in the context of non-compulsory vaccination [47].

Numerical simulations
In this section, we discuss some numerical results obtained by applying the method described in the previous subsection to the evacuation model of Section 4.5.
We have considered the space domain Ω := (0, 1), the control space A := [−0.2, 0.2], the convolution kernel K = 0, the initial measure m as a smoothed version of the step function f_in(x) = 2 for x ∈ (0.5, 1) (and 0 otherwise), and the cost functional given by (40), where τ is the exit time from the domain. Notice that we assume that there is no exit cost, i.e., ϕ = 0. It is easy to show that this functional satisfies the hypotheses of Propositions 4.5, 4.6 and 4.11. Hence, Assumptions (A) listed in Section 2 are satisfied and the pedestrian evacuation model admits a Nash equilibrium.
Notice that the first integral in (40) takes into account the tendency of pedestrians to avoid congestion, and the second one their tendency to follow their mates (see Subsection 4.5 for more details on the meaning of this functional).
In the numerical experiments, we discretized the above quantities in the following way. The initial density is f_in; we underline that f_in has total mass equal to one and its center of mass located at x = 0.75.
Then, the simulation has been obtained by using N = 6 × 10 3 numerical particles.
To discretize the control space A, we choose N_α ∈ N with N_α > 2 and set ∆α := 0.4/(N_α − 1). In our simulations we have considered the control set A_α obtained by discretizing A with this step. At each instant, the numerical particles choose their optimal velocity α* ∈ A_α in order to minimize the functional (40).
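For the values used in the experiments, the discretized control set reads explicitly as:

```python
import numpy as np

# Discretization of A = [-0.2, 0.2] with N_alpha points and step 0.4/(N_alpha - 1).
N_alpha = 11
dalpha = 0.4 / (N_alpha - 1)
A_alpha = -0.2 + dalpha * np.arange(N_alpha)  # {-0.2, -0.16, ..., 0.16, 0.2}
```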
We have moreover set N_α = 11 and σ = 2.5 × 10^{−9}. The time history of the density is plotted in Figure 1. We see that the support of the density splits into two disjoint subsets. Then, when the position of the center of mass is modified by the exits from the domain, the new position of the center of mass induces some numerical particles to modify their velocity in order to reduce their global cost.
Moreover, we observe that the numerical boundary conditions are coherent with the problem (i.e., the density vanishes at x = 0 and at x = 1).
We have also plotted, in Figure 2 (left), the time evolution of the total mass and, in Figure 2 (right), the time evolution of the center of mass of the population.
The profile of the center of mass is non-monotone and all the oscillations are coherent with the loss of mass during the time evolution of the population.
It is clear that, in principle, an iterative procedure may be time-consuming. In particular, the computational time depends heavily both on the number of points of the discretized control set and on the number of numerical particles. Our strategy has been to work in two steps. In the first step, we have computed the numerical solution by using the full discretized control set and a reduced number of numerical particles. This low-resolution solution has then been the basis for reducing the control set, by eliminating the elements of the discretized control set which are not relevant for the minimization of the cost function.
We have observed in our numerical experiments that the fixed point is reached quite rapidly once the set of admissible control velocities is reduced: as shown in Figure 3 (left), 3 iterations were enough to reach the Nash equilibrium in the studied case. In the first iteration, summing the modifications over all time steps, 1,064,896 individual strategy modifications were observed; in the second iteration, 8,382 individual modifications were carried out; and in the third one, 1,005 individual strategy modifications were reported. Finally, in the fourth iteration no individual modifications were observed, and hence the Nash equilibrium was reached. We underline that, over the time span of the simulation, multiplying the number of time steps by the number of numerical particles gives a total of 6 × 10^6 possible strategy modifications. We conclude our analysis of the numerical experiment by underlining that our method allows us to reconstruct the value function not only as a function of space and time, but also as a function of time for a particle starting at a given point x ∈ Ω at t = 0. In Figure 3 (right), we have plotted the time evolution of the value function for particles starting at x = 0.6, x = 0.7 and x = 0.8. All of the curves are strictly decreasing functions of time and are identically equal to zero for times greater than the exit time of the corresponding particle (for example, t ≥ 3.684 for the particle located at x = 0.7 at t = 0).

Appendix
We here recall some basic facts concerning Hölder continuous functions and provide a proof of the explicit solution to the product-log equation (37). This concludes the proof. Proposition 6.2. Let p, q ∈ R^d and a, b ∈ R. Then the solution of p + α + a exp(bα · q) q = 0 (44) is given by α = −p − a exp(−W(ab|q|² exp(−bp · q)) − bp · q) q, (45) where W denotes the principal branch of the Lambert W function [29].
Proof. First notice that if q = 0 then the statement is straightforward. Assume therefore, w.l.o.g., that q_1 ≠ 0.
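Formula (45) can be checked numerically, using the equivalent form α = −p − (W(ab|q|² exp(−bp · q))/(b|q|²)) q. The sketch below (scalar case d = 1; a hand-rolled Newton iteration stands in for the Lambert W function to stay dependency-free) solves (44) and verifies the residual:

```python
import math

def lambert_w(x, tol=1e-14):
    """Principal branch of the Lambert W function for x >= 0, via Newton iteration
    on w*exp(w) = x."""
    w = math.log1p(x)  # initial guess, adequate for x >= 0
    for _ in range(100):
        e = math.exp(w)
        step = (w * e - x) / (e * (w + 1.0))
        w -= step
        if abs(step) < tol:
            break
    return w

def alpha_star(p, q, a, b):
    """Solution of p + alpha + a*exp(b*alpha*q)*q = 0, the scalar case of (44)."""
    if q == 0.0:
        return -p
    z = a * b * q * q * math.exp(-b * p * q)
    return -p - (lambert_w(z) / (b * q * q)) * q
```

Plugging the returned value back into the left-hand side of (44) gives a residual at machine precision.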

Figure 2: Time evolution of the total mass (left) and time evolution of the center of mass (right).

Figure 3: Individual strategy modifications before reaching the Nash equilibrium (left) and time evolution of the value function at x = 0.6, x = 0.7 and x = 0.8 (right).
the product topology. Then for n → ∞ it holds that H(t, x, p_{n,t}(x); Γ(p_{n,t}, m_{n,t})) → H(t, x, p_t(x); Γ(p_t, m_t)) uniformly, as well as D_p H(t, x, p_{n,t}(x); Γ(p_{n,t}, m_{n,t})) → D_p H(t, x, p_t(x); Γ(p_t, m_t)) uniformly.