A rank-based mean field game in the strong formulation

We discuss a natural game of competition and solve the corresponding mean field game with \emph{common noise} when agents' rewards are \emph{rank dependent}. We use this solution to provide an approximate Nash equilibrium for the finite player game and obtain the rate of convergence.


Introduction
Mean field games (MFGs), introduced independently by [8] and [6], provide a useful approximation for finite-player Nash equilibrium problems in which the players are coupled through their empirical distribution. In particular, the mean field game limit yields an approximate Nash equilibrium in which the agents' decision making is decoupled. In this paper we consider a particular game in which the players interact through their ranks. Our main goal is to construct an approximate Nash equilibrium for the finite-player game when the agents' dynamics are subject to a common noise.
Rank-based mean field games, which have non-local mean field interactions, were suggested in [4] and analyzed in greater generality in the recent paper of Carmona and Lacker [3] using the weak formulation, in the absence of common noise. There are currently no results on rank-dependent mean field games with common noise. To solve the problem with common noise, we make use of the mechanism in [7]: we first solve the rank-dependent mean field game without common noise in the strong formulation, and then observe that purely rank-dependent reward functions are translation invariant.
The rest of the paper is organized as follows. In Section 2 we introduce the N-player game, in which the players are coupled through a rank-based reward function. In Section 3 we consider the case without common noise: we find the mean field limit, discuss the uniqueness of the Nash equilibrium, and use the mean field limit to construct an approximate Nash equilibrium. In Section 4 we build on these results and the mechanism in [7] to obtain the corresponding results in the presence of common noise.

The N-player game
We consider N players, each of whom controls her own state variable and is rewarded based on her rank. We denote by $X_i$ the i-th player's state variable and assume that it satisfies the stochastic differential equation (SDE)
$$dX_{i,t} = a_{i,t}\,dt + \sigma\,dB_{i,t} + \sigma_0\,dW_t, \qquad X_{i,0} = 0,$$
where $a_i$ is the control of agent i, and $(B_i)_{i=1,\dots,N}$ and $W$ are independent standard Brownian motions defined on some filtered probability space $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t\in[0,T]}, \mathbb{P})$, representing the idiosyncratic noises and the common noise, respectively. The game ends at time $T > 0$, when each player receives a rank-based reward minus the running cost of effort, which we assume to be quadratic, $c a^2$ for some constant $c > 0$.
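To make the dynamics concrete, here is a minimal Euler-Maruyama sketch of the N-player state equation with common noise. It assumes the players use Markovian feedback controls supplied as a Python function `controls(t, X)`; this interface, the function name, the step count and the default parameter values are our own illustrative choices, not part of the model.

```python
import numpy as np


def simulate_n_players(controls, N, T=1.0, sigma=1.0, sigma0=0.5, n_steps=200, seed=0):
    """Euler-Maruyama scheme for dX_i = a_i dt + sigma dB_i + sigma0 dW, X_i(0) = 0.

    Illustrative sketch: `controls(t, X)` is a hypothetical feedback rule returning
    the efforts of all N players (shape (N,)) given the time t and the states X.
    Returns the terminal states X_T and the integrals int_0^T a_i^2 dt.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    X = np.zeros(N)
    effort_sq = np.zeros(N)
    for k in range(n_steps):
        a = np.asarray(controls(k * dt, X), dtype=float)
        dB = rng.normal(0.0, np.sqrt(dt), size=N)  # idiosyncratic increments, one per player
        dW = rng.normal(0.0, np.sqrt(dt))          # a single common-noise increment shared by all
        X = X + a * dt + sigma * dB + sigma0 * dW
        effort_sq += a**2 * dt                     # running cost is c times this integral
    return X, effort_sq
```

Because the same increment `dW` is added to every player, the common noise moves all states together; this is the structure exploited later when the reward is purely rank-based.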
To define the rank-based reward precisely, let
$$\mu^N := \frac{1}{N}\sum_{j=1}^N \delta_{X_{j,T}}$$
denote the empirical measure of the terminal states of the N-player system. Then $\mu^N(-\infty, X_{i,T}]$ is the fraction of players who finish the same as or worse than player i. Let $\mathbb{R}\times[0,1] \ni (x, r) \mapsto R(x, r) \in \mathbb{R}$ be a bounded continuous function that is nondecreasing in both arguments. For any probability measure µ on ℝ, let $F_\mu(x) := \mu(-\infty, x]$ be its cumulative distribution function and set $R_\mu(x) := R(x, F_\mu(x))$. The reward player i receives is then
$$R_{\mu^N}(X_{i,T}) = R\big(X_{i,T}, \mu^N(-\infty, X_{i,T}]\big).$$
When R(x, r) does not depend on x, the compensation scheme is purely rank-based; in general, it can mix absolute and relative performance compensation. Each player observes the progress of all players and chooses her effort level to maximize her expected payoff
$$\mathbb{E}\Big[R_{\mu^N}(X_{i,T}) - \int_0^T c\,a_{i,t}^2\,dt\Big],$$
while anticipating the other players' strategies. The players' equilibrium expected payoffs, as functions of time and the state variables, satisfy a system of N coupled nonlinear partial differential equations with discontinuous terminal conditions, which appears to be analytically intractable.

Fortunately, in a large-population game the impact of any individual on the whole population is very small. It is therefore often good enough for each player to ignore the private state of every other individual and simply optimize against the aggregate distribution of the population. As a consequence, the equilibrium strategies decentralize in the limiting game as N → ∞.
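The rank-based reward can be computed directly from the terminal states. The sketch below (function names are hypothetical; R is any vectorized callable of (x, r)) evaluates $\mu^N(-\infty, X_{i,T}]$ with a sorted search that counts ties as "the same or worse", and then applies R.

```python
import numpy as np


def empirical_rank(x_T):
    """mu^N(-inf, X_{i,T}] for each i: fraction of players finishing the same as or worse than i."""
    x_T = np.asarray(x_T, dtype=float)
    # side="right" counts ties as "the same or worse", matching the closed interval (-inf, x].
    return np.searchsorted(np.sort(x_T), x_T, side="right") / x_T.size


def rank_based_rewards(x_T, R):
    """R_{mu^N}(X_{i,T}) = R(X_{i,T}, mu^N(-inf, X_{i,T}]) for every player i (hypothetical helper)."""
    x_T = np.asarray(x_T, dtype=float)
    return R(x_T, empirical_rank(x_T))
```

For example, with terminal states (0.3, -1.0, 2.0, 0.3) the ranks are (0.75, 0.25, 1.0, 0.75), and the purely rank-based reward R(x, r) = r² gives (0.5625, 0.0625, 1.0, 0.5625).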
We shall use the mean field limit to construct an approximate Nash equilibrium for the N-player game, both with and without common noise.

Mean field approximation when there is no common noise
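In this section we assume $\sigma_0 = 0$. Solving the mean field game consists of two sub-problems: a stochastic control problem and a fixed-point problem (also called the consistency condition). For any Polish space $\mathcal{X}$, denote by $\mathcal{P}(\mathcal{X})$ the space of probability measures on $\mathcal{X}$, and let $\mathcal{P}_1(\mathcal{X}) := \{\mu \in \mathcal{P}(\mathcal{X}) : \int_{\mathcal{X}} |x|\,d\mu(x) < \infty\}$. We first fix a distribution $\mu \in \mathcal{P}(\mathbb{R})$ of the terminal state of the population and consider a single player's optimization problem
$$v(t, x; \mu) := \sup_{a} \mathbb{E}\Big[R_\mu(X_T) - \int_t^T c\,a_s^2\,ds \,\Big|\, X_t = x\Big], \qquad (3.1)$$
where
$$dX_s = a_s\,ds + \sigma\,dB_s, \qquad (3.2)$$
B is a Brownian motion, and a ranges over the set of progressively measurable processes satisfying $\mathbb{E}\int_0^T |a_s|\,ds < \infty$. The associated dynamic programming equation is
$$v_t + \sup_{a\in\mathbb{R}}\{a\,v_x - c a^2\} + \frac{\sigma^2}{2} v_{xx} = 0, \qquad v(T, x) = R_\mu(x).$$
Using the first-order condition, we obtain the candidate optimizer $a^* = v_x/(2c)$, and the Hamilton-Jacobi-Bellman (HJB) equation can be written as
$$v_t + \frac{v_x^2}{4c} + \frac{\sigma^2}{2} v_{xx} = 0.$$
This equation can be linearized using the Cole-Hopf transformation $u(t, x) := e^{v(t,x)/(2c\sigma^2)}$, giving
$$u_t + \frac{\sigma^2}{2} u_{xx} = 0.$$
Together with the terminal condition $u(T, x) = e^{R_\mu(x)/(2c\sigma^2)}$, we can easily write down the solution:
$$u(t, x) = \mathbb{E}\Big[\exp\Big(\frac{R_\mu(x + \sigma\sqrt{T-t}\,Z)}{2c\sigma^2}\Big)\Big],$$
where Z is a standard normal random variable. Let us further write u as an integral:
$$u(t, x) = \int_{\mathbb{R}} \frac{1}{\sqrt{2\pi\sigma^2(T-t)}} \exp\Big(\frac{R_\mu(y)}{2c\sigma^2} - \frac{(y-x)^2}{2\sigma^2(T-t)}\Big)\,dy.$$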
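For the reader who wants to check the linearization, the chain-rule computation behind the Cole-Hopf step is sketched below. It only uses $v = 2c\sigma^2 \log u$ and the HJB equation $v_t + v_x^2/(4c) + \frac{\sigma^2}{2} v_{xx} = 0$.

```latex
\begin{aligned}
v = 2c\sigma^2 \log u \quad &\Longrightarrow\quad
v_t = 2c\sigma^2\,\frac{u_t}{u},\qquad
v_x = 2c\sigma^2\,\frac{u_x}{u},\qquad
v_{xx} = 2c\sigma^2\Big(\frac{u_{xx}}{u} - \frac{u_x^2}{u^2}\Big),\\[4pt]
v_t + \frac{v_x^2}{4c} + \frac{\sigma^2}{2}v_{xx}
&= 2c\sigma^2\,\frac{u_t}{u} + c\sigma^4\,\frac{u_x^2}{u^2}
   + c\sigma^4\Big(\frac{u_{xx}}{u} - \frac{u_x^2}{u^2}\Big)
 = \frac{2c\sigma^2}{u}\Big(u_t + \frac{\sigma^2}{2}u_{xx}\Big).
\end{aligned}
```

Since u > 0, the HJB equation for v holds if and only if u solves the backward heat equation $u_t + \frac{\sigma^2}{2}u_{xx} = 0$; the quadratic terms in $u_x$ cancel exactly.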
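Using the dominated convergence theorem, we can differentiate under the integral sign and get
$$u_x(t, x) = \int_{\mathbb{R}} \frac{y - x}{\sigma^2(T-t)}\,\frac{1}{\sqrt{2\pi\sigma^2(T-t)}} \exp\Big(\frac{R_\mu(y)}{2c\sigma^2} - \frac{(y-x)^2}{2\sigma^2(T-t)}\Big)\,dy = \frac{1}{\sigma\sqrt{T-t}}\,\mathbb{E}\Big[Z\exp\Big(\frac{R_\mu(x + \sigma\sqrt{T-t}\,Z)}{2c\sigma^2}\Big)\Big].$$
Similarly, we obtain
$$u_{xx}(t, x) = \frac{1}{\sigma^2(T-t)}\,\mathbb{E}\Big[(Z^2 - 1)\exp\Big(\frac{R_\mu(x + \sigma\sqrt{T-t}\,Z)}{2c\sigma^2}\Big)\Big].$$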
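Numerically, the Gaussian representation of u turns the best response into two one-dimensional expectations. The sketch below evaluates $a^* = v_x/(2c) = \sigma^2 u_x/u$ with Gauss-Hermite quadrature from NumPy's `hermite_e` module. It assumes a smooth $R_\mu$ (quadrature is much less accurate when $F_\mu$ has jumps); the function names and the example $R_\mu$ (R(x, r) = r with µ the standard logistic law) are hypothetical choices for illustration.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def optimal_effort(t, x, R_mu, T=1.0, sigma=1.0, c=1.0, n_nodes=60):
    """Candidate optimal effort a*(t, x) = v_x / (2c) = sigma^2 * u_x / u, for t < T.

    u(t, x)   = E[exp(R_mu(x + s Z) / (2 c sigma^2))],   s = sigma * sqrt(T - t)
    u_x(t, x) = E[Z exp(R_mu(x + s Z) / (2 c sigma^2))] / s
    with Z ~ N(0, 1); both expectations use Gauss-Hermite quadrature.
    """
    z, w = hermegauss(n_nodes)       # nodes/weights for the weight function exp(-z^2 / 2)
    w = w / np.sqrt(2.0 * np.pi)     # normalise: sum(w) == 1, so sum(w * f(z)) ~ E[f(Z)]
    s = sigma * np.sqrt(T - t)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    g = np.exp(R_mu(x[:, None] + s * z[None, :]) / (2.0 * c * sigma**2))
    u = g @ w
    u_x = g @ (w * z) / s
    return sigma**2 * u_x / u


def logistic_cdf(y):
    """Hypothetical example: R(x, r) = r and mu the standard logistic law, so R_mu = F_mu."""
    return 0.5 * (1.0 + np.tanh(0.5 * y))
```

Evaluating `optimal_effort(0.5, [-40.0, 0.0, 40.0], logistic_cdf)` gives a clearly positive effort at x = 0 and efforts that are numerically zero far from the bulk of µ. All values are non-negative because $R_\mu$ is nondecreasing, so the symmetric quadrature pairs $\pm z$ contribute non-negative terms.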
Since $v_{xx}$ is bounded on $[0, t_0] \times \mathbb{R}$ for every $t_0 < T$, the drift coefficient $a^* = v_x/(2c)$ is Lipschitz continuous in x there. It follows that the optimally controlled state process, denoted by $X^*$, has a strong solution … So the optimal cumulative effort is bounded by some constant independent of µ. It also implies that $X^*$ … A standard verification theorem yields that the solution to the HJB equation is the value function of the problem (3.1)-(3.2), and that $a^*$ is the optimal Markovian feedback control. Finally, using the dominated convergence theorem again, we can show that for $t < T$,
$$\lim_{x \to \pm\infty} u_x(t, x) = 0.$$
The same limits also hold for $a^*$, since u is bounded away from zero. In other words, the optimal effort level is small when the progress is very large in absolute value. This agrees with the common real-life observation that a player with a very big lead tends to slack off, while a player who is too far behind often gives up instead of trying to catch up.
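The claim that the optimal cumulative effort is bounded independently of µ can be made explicit from the Gaussian representation of u alone. The estimate below is our own sketch, with $\|R\|_\infty := \sup|R|$ and $\mathbb{E}|Z| = \sqrt{2/\pi}$; the constants in Lemma 3.1 may be stated differently.

```latex
\begin{aligned}
&u_x(t,x) = \frac{1}{\sigma\sqrt{T-t}}\,
  \mathbb{E}\Big[Z\,e^{R_\mu(x+\sigma\sqrt{T-t}\,Z)/(2c\sigma^2)}\Big],\qquad
  e^{-\|R\|_\infty/(2c\sigma^2)} \le u(t,x) \le e^{\|R\|_\infty/(2c\sigma^2)},\\[4pt]
&|a^*(t,x)| = \sigma^2\,\frac{|u_x(t,x)|}{u(t,x)}
  \le \sigma\sqrt{\frac{2}{\pi}}\;\frac{e^{\|R\|_\infty/(c\sigma^2)}}{\sqrt{T-t}},
\qquad
\int_0^T |a^*(t,X^*_t)|\,dt \le 2\sigma\sqrt{\frac{2T}{\pi}}\;e^{\|R\|_\infty/(c\sigma^2)}.
\end{aligned}
```

The right-hand sides depend on µ only through $\|R\|_\infty$, that is, not at all; the singularity at t = T is integrable.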

Existence of a Nash equilibrium
For each fixed $\mu \in \mathcal{P}(\mathbb{R})$, solving the stochastic control problem (3.1)-(3.2) yields a value function $v(t, x; \mu)$ and a best response $a^*(t, x) = (2c)^{-1} v_x(t, x; \mu)$. If the game starts at time zero with zero initial progress, the optimally controlled state process $X^\mu$ of the generic player satisfies the SDE
$$dX^\mu_t = \frac{1}{2c}\,v_x(t, X^\mu_t; \mu)\,dt + \sigma\,dB_t, \qquad X^\mu_0 = 0. \qquad (3.6)$$
Finding a Nash equilibrium of the limiting game is equivalent to finding a fixed point of the mapping $\Phi : \mu \mapsto \mathcal{L}(X^\mu_T)$, where $\mathcal{L}(\cdot)$ denotes the law of its argument. We shall sometimes refer to such a fixed point as an equilibrium measure.
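The fixed-point problem for Φ can also be explored numerically. The sketch below is a heuristic of our own, not a method from the paper: it stores µ as sorted samples, evaluates the best-response drift of $dX^\mu_t = (2c)^{-1}v_x(t, X^\mu_t; \mu)\,dt + \sigma\,dB_t$ by Gauss-Hermite quadrature with the empirical CDF inside $R_\mu$, simulates $X^\mu_T$ by an Euler scheme, and applies a damped update. Schauder's theorem, used in the proof of Theorem 3.2, guarantees existence but not convergence of such an iteration, so the returned $W_1$ residuals should be inspected rather than trusted. All names, the damping rule and the discretization parameters are assumptions.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def best_response_drift(t, x, samples, R, T, sigma, c, z, w):
    """a*(t, x; mu) = sigma^2 u_x / u, with mu the empirical law of the sorted `samples`.

    R_mu(y) = R(y, F_mu(y)) uses the empirical CDF; E[.] over Z ~ N(0, 1)
    is replaced by quadrature with nodes z and normalised weights w.
    """
    s = sigma * np.sqrt(T - t)
    y = x[:, None] + s * z[None, :]
    F = np.searchsorted(samples, y, side="right") / samples.size
    g = np.exp(R(y, F) / (2.0 * c * sigma**2))
    return sigma**2 * (g @ (w * z)) / (s * (g @ w))


def iterate_phi(R, T=1.0, sigma=1.0, c=1.0, n_paths=2000, n_steps=50,
                n_iter=10, damping=0.5, seed=0):
    """Hypothetical damped iteration mu <- Phi(mu); returns final samples and W_1 residuals."""
    rng = np.random.default_rng(seed)
    z, w = hermegauss(40)
    w = w / np.sqrt(2.0 * np.pi)                 # weights now sum to 1
    dt = T / n_steps
    dB = rng.normal(0.0, np.sqrt(dt), size=(n_steps, n_paths))  # reused in every iteration
    samples = np.sort(sigma * dB.sum(axis=0))    # initial guess: law of sigma * B_T (zero effort)
    residuals = []
    for _ in range(n_iter):
        X = np.zeros(n_paths)
        for k in range(n_steps):                 # Euler scheme for dX = a*(t, X) dt + sigma dB
            a = best_response_drift(k * dt, X, samples, R, T, sigma, c, z, w)
            X = X + a * dt + sigma * dB[k]
        new = np.sort(X)                         # sorted samples of Phi(mu)
        residuals.append(float(np.mean(np.abs(new - samples))))  # W_1 between the two samples
        # averaging sorted samples averages quantile functions, so the result stays sorted
        samples = damping * new + (1.0 - damping) * samples
    return samples, residuals
```

Mixing quantile functions rather than CDFs keeps each iterate a valid sorted sample; whether the residuals shrink depends on R and the parameters.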

Theorem 3.2.
The mapping Φ has a fixed point.
Proof. As in [1], we use Schauder's fixed point theorem. Observe that for any $\mu \in \mathcal{P}(\mathbb{R})$, we have … This implies that the family $\{\Phi(\mu) = \mathcal{L}(X^\mu_T) : \mu \in \mathcal{P}(\mathbb{R})\}$ is tight, hence relatively compact for the topology of weak convergence by Prokhorov's theorem. Recall that $\mathcal{P}_1(\mathbb{R}) = \{\mu \in \mathcal{P}(\mathbb{R}) : \int_{\mathbb{R}} |x|\,d\mu(x) < \infty\}$. Equip $\mathcal{P}_1(\mathbb{R})$ with the topology induced by the 1-Wasserstein metric
$$W_1(\mu, \nu) := \sup_{f \in \mathrm{Lip}_1(\mathbb{R})} \Big(\int_{\mathbb{R}} f\,d\mu - \int_{\mathbb{R}} f\,d\nu\Big).$$
Here $\mathrm{Lip}_1(\mathbb{R})$ denotes the space of Lipschitz continuous functions on ℝ whose Lipschitz constant is at most one. It is known that $(\mathcal{P}_1(\mathbb{R}), W_1)$ is a complete separable metric space (see e.g. [9, Theorem 6.18]). We shall work with the subset of $\mathcal{P}_1(\mathbb{R})$ defined by $E := \{\mu \in \mathcal{P}_1(\mathbb{R}) : \dots\}$. It is easy to check that E is non-empty, convex and closed (for the topology induced by the $W_1$ metric). Moreover, one can show using [9, Definition 6.8(iii)] that any weakly convergent sequence $\{\mu_n\} \subseteq E$ is also $W_1$-convergent. Therefore E is also relatively compact, hence compact, for the topology induced by the $W_1$ metric. So we have found a non-empty, convex and compact set E such that Φ maps E into itself. It remains to show that Φ is continuous on E. In the rest of the proof, the constant C may change from line to line.
By Lemma 3.1 and the mean value theorem, we have that … We first show (3.7). Using the estimates in Lemma 3.1, we get … Since all integrands are bounded, to show that the expectations converge to zero it suffices to check that the integrands converge to zero a.s. Fix $\omega \in \Omega$; we know from (3.4) that …
Since $W_1(\mu_k, \mu) \to 0$, $\mu_k$ also converges to µ weakly, and the cumulative distribution function $F_{\mu_k}(x)$ converges to $F_\mu(x)$ at every point x at which $F_\mu$ is continuous. It follows from the continuity of R that $R_{\mu_k}(x)$ converges to $R_\mu(x)$ at every such point. Since $F_\mu$ has at most countably many points of discontinuity, and the Gaussian points at which $R_{\mu_k}$ and $R_\mu$ are evaluated inside the expectation fall in this countable set with probability zero, the random variable inside the expectation converges to zero a.s. The dominated convergence theorem then allows us to interchange the limit and the expectation, giving that …
Again, using that $F_\mu$ has at most countably many points of discontinuity, one can show that … Putting everything together, we have proved (3.7).
Next, we show (3.8) by Gronwall's inequality. Let $\varepsilon > 0$ be given. For any $r \in [0, t]$, … By (3.7) and the bounded convergence theorem, we obtain … By Gronwall's inequality, … This completes the proof of (3.8), and thus of the continuity of Φ. By Schauder's fixed point theorem, Φ has a fixed point in the set E.
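The proof above and the approximation results below measure distances between laws on ℝ with $W_1$. As a small illustration of our own, the sketch below computes $W_1$ between two empirical measures with the same number of atoms, using the one-dimensional fact that an optimal coupling pairs order statistics; by Kantorovich-Rubinstein duality this agrees with the supremum over $\mathrm{Lip}_1(\mathbb{R})$ used in the proof. The function name is hypothetical; `scipy.stats.wasserstein_distance` is a general-purpose alternative.

```python
import numpy as np


def wasserstein1_empirical(x, y):
    """W_1 between the empirical measures of two samples of equal size on the real line.

    In one dimension an optimal coupling matches order statistics, so
    W_1 = (1/n) * sum_i |x_(i) - y_(i)|; by Kantorovich-Rubinstein duality this equals
    sup over 1-Lipschitz f of (int f d(mu_x) - int f d(mu_y)).
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch only handles samples of equal size")
    return float(np.mean(np.abs(x - y)))
```

For instance, shifting a sample by 0.3 moves it exactly 0.3 away in $W_1$, while reordering the sample does not move it at all.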

Uniqueness of the Nash equilibrium
Let C ⊆ P(R) be a class of measures in which uniqueness will be established. We first state a monotonicity assumption which is in the spirit of [8].

Assumption 3.3.
For any $\mu, \mu' \in \mathcal{C}$, we have
$$\int_{\mathbb{R}} \big(R_\mu(x) - R_{\mu'}(x)\big)\,d(\mu - \mu')(x) \le 0.$$

… is differentiable and has non-negative partial derivatives $h_x$, $h_{r_1}$, $h_{r_2}$. This includes any continuously differentiable function R which satisfies (i) $r \mapsto R(x, r)$ is convex, and (ii) $r \mapsto R_x(x, r)$ is non-decreasing. To see why $h_x, h_{r_1}, h_{r_2} \ge 0$ suffices to verify Assumption 3.3, first note that for any $\mu, \mu' \in \mathcal{C}$, $R_\mu$ and $R_{\mu'}$ are absolutely continuous.

ECP 0 (2012), paper 0. ecp.ejpecp.org

Using integration by parts for absolutely continuous functions, we have …
Re-arranging terms and using that $h_x, h_{r_1}, h_{r_2} \ge 0$, we get …

If one measures the rank of x with respect to a given distribution µ using the "regular" cumulative distribution function $\bar F_\mu(x) := \frac{1}{2}\big(F_\mu(x+) + F_\mu(x-)\big)$, then for the case $R(x, r) = r$, Assumption 3.3 is satisfied with $\mathcal{C} = \mathcal{P}(\mathbb{R})$ (see [5, Theorem B]).
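The statement that $R(x, r) = r$ together with the regular CDF $\bar F_\mu$ satisfies Assumption 3.3 can be checked numerically on discrete measures. The sketch below computes the monotonicity quantity $\int (R_\mu - R_{\mu'})\,d(\mu - \mu')$ (the Lasry-Lions form in which we read Assumption 3.3, which requires it to be non-positive) for two finitely supported measures; the function names are hypothetical. For this choice of R the quantity is in fact exactly zero, since $\int \bar F_\mu\,d\mu = \frac12$ and $\int \bar F_\mu\,d\mu' + \int \bar F_{\mu'}\,d\mu = 1$.

```python
import numpy as np


def regular_cdf(atoms, probs, x):
    """bar F_mu(x) = (F_mu(x+) + F_mu(x-)) / 2 = mu(-inf, x) + mu({x}) / 2 for a discrete mu."""
    atoms = np.asarray(atoms, dtype=float)
    probs = np.asarray(probs, dtype=float)
    x = np.asarray(x, dtype=float)[:, None]
    strictly_below = ((atoms[None, :] < x) * probs).sum(axis=1)
    at_x = ((atoms[None, :] == x) * probs).sum(axis=1)
    return strictly_below + 0.5 * at_x


def monotonicity_gap(atoms1, p1, atoms2, p2):
    """int (R_mu - R_mu') d(mu - mu') for R(x, r) = r, with ranks measured by bar F."""
    def integrate(f_atoms, f_probs, atoms, probs):
        # int bar F_f d(nu), where nu puts mass `probs` on `atoms`
        return float(np.dot(regular_cdf(f_atoms, f_probs, atoms), probs))

    return (integrate(atoms1, p1, atoms1, p1) - integrate(atoms1, p1, atoms2, p2)
            - integrate(atoms2, p2, atoms1, p1) + integrate(atoms2, p2, atoms2, p2))
```

By contrast, with the right-continuous $F_\mu$ in place of $\bar F_\mu$, the choice $\mu = \delta_0$, $\mu' = \delta_1$ gives $1 - 1 - 0 + 1 = 1 > 0$, which suggests why the regular version is used here.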

Proposition 3.5. Under Assumption 3.3, Φ has at most one fixed point in C.
Proof. Suppose $\mu$ and $\mu'$ are two fixed points of Φ in $\mathcal{C}$. To simplify notation, write $v(t, x) := v(t, x; \mu)$ and $v'(t, x) := v(t, x; \mu')$. Let $X^\mu$ and $X^{\mu'}$ be the optimally controlled state processes (starting at zero) in response to $\mu$ and $\mu'$, respectively. Let $t \in (0, T)$. Using Itô's lemma and the PDEs satisfied by v and v′, it is easy to show that … and … Letting t → T and using the continuity of v and v′ at the terminal time, we get … Now, exchanging the roles of $\mu$ and $\mu'$, we also have … Adding (3.11) and (3.12), and using that $\mu = \mathcal{L}(X^\mu_T)$ and $\mu' = \mathcal{L}(X^{\mu'}_T)$, we get … where the last inequality follows from Assumption 3.3. This implies $v_x(s, X^{\mu'}_s) = v'_x(s, X^{\mu'}_s)$, $d\mathbb{P} \times dt$-a.e.
Hence $X^{\mu'}$ also solves the SDE (3.6) associated with $\mu$, and by the uniqueness of the solution of (3.6) we must have $X^\mu_T = X^{\mu'}_T$ a.s., so that $\mu = \mu'$.
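The mechanism behind the proof can be seen from a single identity. The following is our own formal sketch (integrability and the passage t → T are handled as in the proof above), using only Itô's formula and the HJB equation $v_t + v_x^2/(4c) + \frac{\sigma^2}{2}v_{xx} = 0$. For any admissible a driving $dX_s = a_s\,ds + \sigma\,dB_s$, $X_0 = 0$, the first line holds, and taking expectations yields the second. Applying it to $(v, \mu)$ along $X^{\mu'}$ and to $(v', \mu')$ along $X^\mu$, and to each value function along its own optimal path (where the last term vanishes), then adding, gives the third line; it may be arranged differently from (3.11)-(3.12).

```latex
\begin{aligned}
d\,v(s,X_s) &= \Big(c\,a_s^2 - \frac{1}{4c}\big(2c\,a_s - v_x(s,X_s)\big)^2\Big)\,ds + \sigma\,v_x(s,X_s)\,dB_s,\\[4pt]
v(0,0) &= \mathbb{E}\Big[R_\mu(X_T) - \int_0^T c\,a_s^2\,ds\Big]
  + \frac{1}{4c}\,\mathbb{E}\int_0^T \big(2c\,a_s - v_x(s,X_s)\big)^2\,ds,\\[4pt]
\frac{1}{4c}\,\mathbb{E}\int_0^T \Big[(v_x - v'_x)^2(s,X^{\mu'}_s) + (v_x - v'_x)^2(s,X^{\mu}_s)\Big]\,ds
  &= \int_{\mathbb{R}} \big(R_\mu - R_{\mu'}\big)\,d(\mu - \mu') \;\le\; 0.
\end{aligned}
```

Since the left-hand side of the last line is non-negative, both integrals vanish, i.e. $v_x = v'_x$ along $X^{\mu'}$ and along $X^\mu$.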

Approximate Nash equilibrium of the N-player game
The MFG solution allows us to construct, using decentralized strategies, an approximate Nash equilibrium of the N-player game when N is large. In the MFG literature, this is typically done using propagation-of-chaos results. Our problem is simpler, since the mean field interaction does not enter the dynamics of the state process; it is this special structure that allows us to handle a rank-based terminal payoff, which in general fails to be Lipschitz continuous.
We now state an additional Hölder condition on R which yields a convergence rate. It holds, for example, when $R(x, r) = A(x) r^p + B(x)$ with $p \in (0, \infty)$ and $A \in L^\infty(\mathbb{R})$.

Assumption 3.7. There exist constants $L > 0$ and $\alpha \in (0, 1]$ such that $|R(x, r_1) - R(x, r_2)| \le L|r_1 - r_2|^\alpha$ for any $r_1, r_2 \in [0, 1]$ and $x \in \mathbb{R}$.

…

Proof. Let µ be a fixed point of Φ, and let $\bar a_{i,t}$ be defined as in the theorem statement. To keep the notation simple, we omit the superscript of any state process controlled by the optimal Markovian feedback strategy $(2c)^{-1} v_x(t, x; \mu)$. Let
$$V := \mathbb{E}\Big[R_\mu(X_T) - \int_0^T \frac{v_x(t, X_t; \mu)^2}{4c}\,dt\Big]$$
be the value of the limiting game, where X satisfies (3.6), and let
$$J^N_i := \mathbb{E}\Big[R_{\mu^N}(X_{i,T}) - \int_0^T c\,\bar a_{i,t}^2\,dt\Big]$$
be the net gain of player i in the N-player game if everybody uses the candidate approximate Nash equilibrium $(\bar a_1, \dots, \bar a_N)$.
Since the state processes do not depend on the empirical measure (the interaction is only through the terminal payoff), each $X_i$ is simply an independent, identical copy of X. Hence … Let us first show that $J^N_i$ and V are close. We have … It follows from the α-Hölder continuity of R that … where, for $n \in \mathbb{N}$, $F^n_\mu$ denotes the empirical cumulative distribution function of n i.i.d. random variables with cumulative distribution function $F_\mu$. By the Dvoretzky-Kiefer-Wolfowitz inequality, we have …

Next, consider the system in which player i makes a unilateral deviation from the candidate approximate Nash equilibrium $(\bar a_1, \dots, \bar a_N)$; say, she chooses an admissible control β. Denote her controlled state process by $X^\beta_i$, and the state processes of the other players by $X_j$, $j \neq i$, as before.
… where the inequality follows from the optimality of $\bar a_i$ for the i-th player's problem.
… where we used Jensen's inequality in the fourth step. Combining the two estimates, we obtain …

Remark 3.9. Without Assumption 3.7, one can still use the continuity and boundedness of R to obtain convergence; that is, the MFG solution still provides an approximate Nash equilibrium of the N-player game. However, the convergence rate is then no longer guaranteed.
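Two ingredients of the rate can be spelled out with standard estimates; the following is our own sketch, with $D_n := \sup_x |F^n_\mu(x) - F_\mu(x)|$ a label we introduce. Line (i) verifies the example given before Assumption 3.7: for $p \ge 1$ it follows from the mean value theorem, and for $p \in (0, 1]$ from the subadditivity bound $|r_1^p - r_2^p| \le |r_1 - r_2|^p$, so one may take $L = \|A\|_{L^\infty}\max(p, 1)$ and $\alpha = \min(p, 1)$. Line (ii) shows how the Dvoretzky-Kiefer-Wolfowitz inequality, in its form with constant 2 (due to Massart), turns α-Hölder continuity into an expected error of order $n^{-\alpha/2}$.

```latex
\begin{aligned}
&\text{(i)}\quad |R(x,r_1) - R(x,r_2)| = |A(x)|\,|r_1^p - r_2^p|
  \le \|A\|_{L^\infty}\,\max(p,1)\,|r_1 - r_2|^{\min(p,1)},
  \qquad r_1, r_2 \in [0,1],\\[4pt]
&\text{(ii)}\quad \mathbb{P}\big(\sqrt{n}\,D_n > y\big) \le 2e^{-2y^2}
  \;\Longrightarrow\;
  \mathbb{E}\big[D_n^{\alpha}\big]
  = n^{-\alpha/2}\int_0^\infty \alpha\,y^{\alpha-1}\,\mathbb{P}\big(\sqrt{n}\,D_n > y\big)\,dy
  \le 2^{1-\alpha/2}\,\Gamma\big(1+\tfrac{\alpha}{2}\big)\,n^{-\alpha/2}.
\end{aligned}
```

Under Assumption 3.7, $|R(x, F^n_\mu(x)) - R(x, F_\mu(x))| \le L\,D_n^\alpha$, which is where an error of order $N^{-\alpha/2}$ can enter.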

Mean field approximation when there is common noise
In this section we assume $\sigma_0 > 0$ and that R(x, r) does not depend on x, i.e., the reward is purely rank-dependent. Unlike the idiosyncratic noises, the common noise does not average out as N → ∞, so the limiting environment becomes a random measure instead of a deterministic one. The MFG problem now reads:
(i) Fix a random measure µ for the terminal distribution of the population, where the randomness comes from the common noise W, and solve the stochastic control problem faced by a representative player:
$$\sup_a \mathbb{E}\Big[R_\mu(X_T) - \int_0^T c\,a_s^2\,ds\Big], \qquad (4.1)$$
where
$$dX_s = a_s\,ds + \sigma\,dB_s + \sigma_0\,dW_s, \qquad X_0 = 0. \qquad (4.2)$$
Denote the optimally controlled state process by $X^\mu$.
(ii) Find a fixed point of the mapping $\Psi : \mu \mapsto \mathcal{L}(X^\mu_T \mid W)$.
For $\mu \in \mathcal{P}(\mathbb{R})$, denote by $\mu(\cdot + q)$ the probability measure obtained by shifting µ to the left by $q \in \mathbb{R}$. Observe that when R does not depend on x, we have $R_\mu(x + q) = R(F_\mu(x + q)) = R(F_{\mu(\cdot+q)}(x)) = R_{\mu(\cdot+q)}(x)$. So we are precisely in the framework of translation invariant MFGs. In fact, purely rank-based functions provide another important example of translation invariant functions, besides the convolution and local interactions given in [7]. In the general case without translation invariance, results have only been obtained in the weak formulation; see [3].
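The identity $R_\mu(x + q) = R_{\mu(\cdot+q)}(x)$ is easy to check numerically for empirical measures. In the sketch below, R is any nondecreasing function on [0, 1] (we use `np.sqrt` as a hypothetical example), $\mu(\cdot + q)$ is represented by the samples shifted by $-q$, and the function names are ours. Applied to the terminal states themselves, the same computation shows that adding a common shift (playing the role of $\sigma_0 W_T$) to every player leaves all ranks, and hence all purely rank-based rewards, unchanged.

```python
import numpy as np


def R_mu(x, samples, R):
    """Purely rank-based R_mu(x) = R(F_mu(x)), with F_mu the empirical CDF of `samples`."""
    samples = np.sort(np.asarray(samples, dtype=float))
    return R(np.searchsorted(samples, x, side="right") / samples.size)


def shift_left(samples, q):
    """Samples of mu(. + q): if Y ~ mu, then Y - q ~ mu(. + q) (hypothetical helper)."""
    return np.asarray(samples, dtype=float) - q
```

For instance, `R_mu(x + 0.7, mu, np.sqrt)` and `R_mu(x, shift_left(mu, 0.7), np.sqrt)` agree for a sample `mu` and grid `x`, up to ties at floating-point resolution.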
In the remaining discussion, we refer to the problems (3.1)-(3.2) and (4.1)-(4.2), together with their respective fixed point problems, as $\mathrm{MFG}_0$ and $\mathrm{MFG}_{cn}$, respectively. A direct application of [7, Theorem 2.5] yields the following existence result.
Proposition 4.1. … is a (random) equilibrium measure of $\mathrm{MFG}_{cn}$. Moreover, the optimal control associated with the $\mathrm{MFG}_0$ equilibrium measure above is also an optimal open-loop control associated with µ for $\mathrm{MFG}_{cn}$.
The intuition is that the common noise affects the whole population in parallel, so, by translation invariance, its effect essentially cancels out in the optimization problem. Such a random equilibrium measure is clearly σ(W)-measurable, and hence is a strong MFG solution in the language of [2].

… In the N-player game, each individual can observe all state processes. When N is large, the individual noises average out, so observing the entire system gives each player some information about the common noise. Passing to the MFG limit, the individual should therefore be allowed more information than that generated by her own state process.
… with $a$ replaced by $\bar a_i$ and $B$ replaced by $B_i$. Then $(\bar a_1, \dots, \bar a_N)$ form an $O(N^{-\alpha/2})$-Nash equilibrium of the N-player game with common noise.
Proof. Let µ and $X^\bullet$ be defined as in Proposition 4.1 and Remark 4.2. Also let $X := X^\bullet + \sigma_0 W$ and $X_i := X^\bullet_i + \sigma_0 W$. By Proposition 4.1, we have … Let
$$J^N_i := \mathbb{E}\Big[R_{\mu^N}(X_{i,T}) - \int_0^T c\,\bar a_{i,t}^2\,dt\Big]$$
be the net gain of player i in the N-player game if everybody uses the candidate approximate Nash equilibrium $(\bar a_1, \dots, \bar a_N)$.
Similarly, since …

The arbitrary control β in the above proof may depend on the common noise. However, the additional information about the common noise gives each player very little advantage when everyone else uses their respective $\bar a_i$'s, which are independent of the common noise.