Pure and Random strategies in differential game with incomplete informations

We investigate a two-player zero-sum differential game with incomplete information on the initial state: the first player has private information on the initial state, while the second player knows only a probability distribution on the initial state. This can be viewed as a generalization to differential games of the famous Aumann-Maschler framework for repeated games. In an article of the first author, the existence of the value in random strategies was obtained for a finite number of initial conditions (the probability distribution is a finite combination of Dirac measures). The present work has two main novelties: first, it extends the existence of a value in random strategies to an infinite number of initial conditions; second, and mainly, it proves the existence of a value in pure strategies when the initial probability distribution is regular enough (without atoms).

1. Introduction. We consider a two-player, zero-sum differential game with dynamics

$$x'(t) = f(x(t), u(t), v(t)), \quad u(t) \in U, \; v(t) \in V, \qquad x(t_0) = x_0,$$

and terminal cost $g : \mathbb{R}^N \to \mathbb{R}$, which is evaluated at a terminal time $T > 0$. The first player acts on the system through his control $u(\cdot)$ in order to minimize the final cost $g(X(T))$, while the second player wants to maximize $g(X(T))$ by choosing his control $v(\cdot)$.
Let us now describe how the game is played. Fix an initial time $t_0 \in [0, T]$.
- Before the game starts, the initial position $x_0$ is chosen randomly according to a probability measure $\mu_0$;
- the initial state $x_0$ is communicated to Player I but not to Player II;
- the game is played on the time interval $[t_0, T]$;
- both players know the probability $\mu_0$ and observe their opponent's controls.

PIERRE CARDALIAGUET, CHLOÉ JIMENEZ AND MARC QUINCAMPOIX
Such games with incomplete information (the first player has private information not available to the second player) were introduced in the 1960s in the framework of repeated games by Aumann and Maschler and have been extensively studied since then (see for instance [3]).
For differential games, a similar problem was introduced by the first author in [6]: in that paper the existence of a value in mixed strategies was obtained when the information unknown to the players belongs to a finite set. Further investigations and generalizations on this topic can be found in [4,8] and the references therein. In all these works the private information is given as a finite number of types. The case of a continuum of types for games in continuous time has been addressed only recently in [9], in a very particular situation where there are no dynamics and where the information issue lies in the payoff.
In the game we investigate here, the role of information is crucial. Indeed, the second player does not know the current state of the game. However, he can try to guess it, at least partially, by observing the actions of the first player. For this reason, it is in the first player's interest to hide his actions as much as possible by playing randomly (choosing a random strategy), while of course still trying to achieve his own goals. It is also in the second player's interest to reveal as little as possible about his actions by playing random strategies.
The main phenomenon that appears here is that, when the initial measure $\mu_0$ has no atoms, one can build on it a "kind of randomness" which avoids the use of random strategies. This is precisely the phenomenon explained by the main result of the paper (Theorem 4.1). Such a statement is reminiscent of the existence of pure strategies in noncooperative, nonatomic games: see in particular Schmeidler [13]. Note however that the frameworks are very different. This is also related to the notion of purification of strategies (see for instance [11]).
The paper is organized as follows: the first section contains basic facts on spaces of probability measures, the complete description of the model, and a brief summary of the results and methods of proof. Section 2 is devoted to the regularity of the value functions in random strategies. As a byproduct of this regularity, the existence of the value in random strategies is obtained for an arbitrary probability measure $\mu_0$. The last section contains our main result, showing the existence of a value in pure strategies when $\mu_0$ has no atoms.

Probability distribution on the initial condition.
Notation. Throughout the paper $|\cdot|$ denotes the Euclidean norm in the ambient space (in general $\mathbb{R}^N$). Given a Lipschitz continuous function $\phi$, we let $\mathrm{lip}(\phi)$ denote its Lipschitz constant. Finally, for $m \in \mathbb{N}^*$, $\mathcal{L}^m$ is the Lebesgue measure on $\mathbb{R}^m$.
Throughout the paper, we restrict ourselves to Borel probability measures $\mu_0$ on $\mathbb{R}^N$ with compact support (denoted by $\mathrm{supp}(\mu_0)$). We denote by $\mathcal{P}(\mathbb{R}^N)$ the set of such probability measures. It is well known that $\mathcal{P}(\mathbb{R}^N)$ can be endowed with the Wasserstein distance

$$W_2(\mu, \nu) = \inf_{\gamma \in \Pi(\mu, \nu)} \left( \int_{\mathbb{R}^{2N}} |x - y|^2 \, d\gamma(x, y) \right)^{1/2},$$

where $\Pi(\mu, \nu)$ is the set of probability measures $\gamma$ on $\mathbb{R}^{2N}$ which have $\mu$ as first marginal and $\nu$ as second one. It is known that the infimum is actually a minimum. Such optimal measures $\gamma$ are then called optimal plans from $\mu$ to $\nu$ (see [14]). It is well known that the distance $W_2$ is compatible with the weak convergence of measures supported on a fixed compact set (cf. for instance [14]). For $\mu \in \mathcal{P}(\mathbb{R}^N)$ and $\phi : \mathbb{R}^N \to \mathbb{R}^N$ a Borel measurable map with at most linear growth, we denote by $\phi \sharp \mu$ the push-forward of $\mu$ by $\phi$, i.e., the measure in $\mathcal{P}(\mathbb{R}^N)$ such that

$$\phi \sharp \mu (A) = \mu(\phi^{-1}(A)) \quad \text{for any Borel set } A \subset \mathbb{R}^N.$$

Let us recall the following result (cf. [1] and [12]) that we will use several times in the paper:

Proposition 1. Let $m \in \mathbb{N}$ and let $P$ and $Q$ be two Borel probability measures on $\mathbb{R}^m$ with compact support. If $P$ has no atom, there exists a sequence $(h_n)_n$ of Borel measurable maps from $\mathbb{R}^m$ to $\mathbb{R}^m$ such that

$$h_n \sharp P = Q \ \text{ for all } n, \qquad \lim_{n \to +\infty} \int_{\mathbb{R}^m} |h_n(x) - x|^2 \, dP(x) = W_2^2(P, Q).$$

If, moreover, $P$ is absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}^m$, there exists a unique Borel measurable map $h : \mathbb{R}^m \to \mathbb{R}^m$ such that

$$h \sharp P = Q, \qquad \int_{\mathbb{R}^m} |h(x) - x|^2 \, dP(x) = W_2^2(P, Q).$$
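To make the Wasserstein distance concrete, here is a minimal Python sketch for a very special case that is not treated as such in the paper: two measures on the real line, each made of the same number of atoms with equal masses. In dimension one the monotone (sorted) pairing of atoms is an optimal plan for the quadratic cost, so the infimum reduces to a finite sum. The function name `w2_uniform_atoms_1d` and the sample data are illustrative assumptions.

```python
import math


def w2_uniform_atoms_1d(xs, ys):
    """W_2 distance between two measures on the real line, each made of
    n = len(xs) = len(ys) atoms carrying the same mass 1/n.

    Illustrative helper (not from the paper): it only covers the
    one-dimensional, equal-weight case.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("both measures need the same positive number of atoms")
    n = len(xs)
    # In dimension one, pairing the sorted atoms is an optimal plan
    # for the quadratic cost |x - y|^2.
    pairs = zip(sorted(xs), sorted(ys))
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs) / n)


# Assumed sample data: nu is mu translated by 1, so the distance is 1.
mu = [0.0, 1.0]
nu = [1.0, 2.0]
print(w2_uniform_atoms_1d(mu, nu))
```

Because the atoms are sorted before pairing, the result does not depend on the order in which the atoms are listed, which reflects the fact that $W_2$ depends only on the measures.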
2.2.1. Dynamics. We consider a two-player zero-sum differential game with dynamics given by the controlled differential equation

$$x'(t) = f(x(t), u(t), v(t)), \quad t \in [t_0, T], \qquad x(t_0) = x_0. \tag{1}$$

In the above equation, $t_0 \in [0, T]$ is the initial time ($T$ being the finite horizon of the game) and $x_0 \in \mathbb{R}^N$ is the initial position. We denote by $U$ and $V$ the sets of actions for each player, $U$ for the first one and $V$ for the second one; we assume that $U$ and $V$ are compact subsets of some finite dimensional spaces. The dynamics $f : \mathbb{R}^N \times U \times V \to \mathbb{R}^N$ is continuous in all variables, Lipschitz continuous in the state variable, and bounded. We will denote by $\mathcal{U}(t_0)$ (respectively $\mathcal{V}(t_0)$) the set of measurable controls $u(\cdot) : [t_0, T] \to U$ (respectively $v(\cdot) : [t_0, T] \to V$). The sets of controls $\mathcal{U}(t_0)$ and $\mathcal{V}(t_0)$ are endowed with the $L^1_U[t_0, T]$ and $L^1_V[t_0, T]$ topology associated with the following distance: for $u_1$ and $u_2$ in $L^1_U[t_0, T]$,

$$d_{L^1_U}(u_1, u_2) = \int_{t_0}^{T} d_U(u_1(s), u_2(s)) \, ds,$$

where $d_U$ denotes the distance on the compact metric space $U$ (the definition of $d_{L^1_V}$ is similar). Under our assumptions, to any pair of controls $(u(\cdot), v(\cdot)) \in \mathcal{U}(t_0) \times \mathcal{V}(t_0)$ one can associate in a unique way a solution to (1) that will be denoted by $t \mapsto X_t^{t_0, x_0, u, v}$. Throughout the paper we will assume that Isaacs' condition holds:

$$\inf_{u \in U} \sup_{v \in V} \langle f(x, u, v), p \rangle = \sup_{v \in V} \inf_{u \in U} \langle f(x, u, v), p \rangle \quad \text{for all } (x, p) \in \mathbb{R}^N \times \mathbb{R}^N. \tag{2}$$

Let us recall that this condition is generally used for differential games with perfect information in order to prove the existence of a value.

The main novelty of the differential game studied in this paper lies in its information structure: given a measure $\mu_0 \in \mathcal{P}(\mathbb{R}^N)$, we suppose that
- before the game starts, the initial position $x_0$ is chosen randomly according to the probability measure $\mu_0$;
- the initial state $x_0$ is communicated to Player I but not to Player II;
- the game is played on the time interval $[t_0, T]$;
- both players know the probability $\mu_0$ and observe their opponent's controls.
Because of this information structure, the strategies of the players should be defined according only to their available information. This leads to the following notions of strategies (compare with [4,6]).
Definition 2.1. A pure strategy for Player II is a Borel measurable map 1 β : A pure strategy for Player I is a Borel measurable map: for which there is a delay τ α > 0 such that, for any The set of pure strategies for Player I (resp. Player II) is denoted by A(t 0 ) (resp. B(t 0 )).
Observe that, in order to formalize the information of the players, the definition of their strategies is not symmetric: Player I knows the initial state x 0 , but this initial state is not known by Player II.
In order to write the game in a normal form we need the following: For any pair of pure strategies (α, β) ∈ A(t 0 ) × B(t 0 ), and any initial condition The proof of this Lemma is a particular case of the proof of Lemma 2.4 (stated later on for the random strategies). We denote by t → X t0,x0,α(x0,·),β(·) t the solution x(·) to (1) with the controls (u x0 , v x0 ).
It is now time to introduce the payoff: let g : R N → R be a Lipschitz continuous and bounded terminal map. The first player acts on the system by choosing the control u(·), his goal being to minimize a final cost g(X(T )) while the second player wants to maximize g(X(T )) by choosing the control v(·).
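As an illustration of how a pair of open-loop controls produces a terminal payoff $g(X(T))$, the following Python sketch integrates a scalar instance of the dynamics with an explicit Euler scheme. Everything concrete here is an assumption made for the example and does not come from the paper: the scheme itself, the function name `terminal_payoff`, the dynamics $f(x,u,v) = u + v$ with controls in $[-1, 1]$, and the bounded Lipschitz cost $g(x) = \min(|x|, 1)$.

```python
def terminal_payoff(f, g, x0, u, v, t0, T, steps=1000):
    """Explicit Euler approximation of g(X(T)) for the scalar equation
    x'(t) = f(x(t), u(t), v(t)), x(t0) = x0.

    u and v are open-loop controls given as functions of time.
    Illustrative helper only; the paper works with exact solutions.
    """
    dt = (T - t0) / steps
    x, t = x0, t0
    for _ in range(steps):
        x += dt * f(x, u(t), v(t))  # one Euler step
        t += dt
    return g(x)


# Hypothetical data: bounded dynamics, Lipschitz and bounded terminal cost.
def f(x, u, v):
    return u + v


def g(x):
    return min(abs(x), 1.0)


# Player I (minimizer) pushes left at full speed, Player II pushes right at half speed.
payoff = terminal_payoff(f, g, x0=0.5, u=lambda t: -1.0, v=lambda t: 0.5, t0=0.0, T=1.0)
print(payoff)
```

With these assumed controls the state moves from 0.5 towards 0 at speed 0.5, so the computed payoff is close to 0 up to floating-point error.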
The upper value function in pure strategies is: while the lower value in pure strategies can be defined in the same way by: We will use the following notation: Similarly a random strategy for Player II is a pair ( The set of random strategies for Player I (resp. Player II) is denoted by A r (t 0 ) (resp. B r (t 0 )).

Remark 1.
Our random strategy concept, borrowed from [6], is closely related to Aumann's notion of strategies [2]. It is worth pointing out that it slightly differs from the mixed strategies for differential games introduced in [5]. In contrast to repeated games, there is so far no generally accepted definition of mixed strategies or behavior strategies for differential games. Finding a concept which would allow one to dispense with Isaacs' condition (2) is a difficult task and is beyond the scope of the paper.
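For orientation, here is a standard situation in which Isaacs' condition is automatically satisfied. The computation assumes that the condition takes its usual form for a terminal payoff, namely equality of the inf-sup and the sup-inf of $\langle f(x,u,v), p \rangle$, and it uses a hypothetical separated dynamics $f(x,u,v) = f_1(x,u) + f_2(x,v)$ which is not an assumption of the paper.

```latex
% Hypothetical separated dynamics: f(x,u,v) = f_1(x,u) + f_2(x,v).
% The two optimizations decouple, so their order does not matter.
\begin{aligned}
\inf_{u \in U} \sup_{v \in V} \langle f_1(x,u) + f_2(x,v), p \rangle
  &= \inf_{u \in U} \langle f_1(x,u), p \rangle
   + \sup_{v \in V} \langle f_2(x,v), p \rangle \\
  &= \sup_{v \in V} \inf_{u \in U} \langle f_1(x,u) + f_2(x,v), p \rangle .
\end{aligned}
```

This covers in particular pursuit-type games in which each player acts only on his own component of the state.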
Consider now n > 1 and suppose that the map (ω, Let us prove that this still holds true when n is replaced by n+1. Fix (ū,v) ∈ U(t 0 )×V(t 0 ). For any u ∈ U(t 0 ) we denote by u |[t0,t0+nτ ] ū |[t0+nτ,T ] the measurable control which restriction on [t 0 , t 0 + nτ ] is u and which restriction on [t 0 + nτ, T ] isū. Clearly the map is continuous for the L 1 U norm. We clearly have the same property for a map Υ V similarly defined in V(t 0 ). Because of the nonanticipativity property, the restrictions of do not depend onū andv, the map The result follows by induction.

Remark 2.
As usual for random strategies, one can show that the upper value does not change if the first player plays a random strategy against a pure strategy of the second player: Similarly for the lower value we have From this fact, because a pure strategy can be viewed as a particular case of a random strategy, one can derive the following inequalities for any µ 0 ∈ P(R N ) One can also show that the space of random strategies could be restricted to Ω = [0, 1]:
It is a non-atomic probability measure. So, by Proposition 1, there exists h : [0, 1] m → [0, 1] m such that h P = P . For any x ∈ R N and z ∈ [0, 1] let ᾱ(x, z, ·) := α(x, h(z, 0, ..., 0), ·). Then, for any x ∈ R N and β ∈ B(t 0 ), we have:

Now we recall the result of the first author showing the existence of a value for a finite number of initial conditions. In our framework, it can be reformulated as follows:

Proposition 2 ([6], Section 6). Assume that Isaacs' condition (2) holds and that the initial probability distribution is a finite combination of Dirac masses: $\mu_0 = \sum_{i=1}^{J} a_i \delta_{x_0^i}$. Then the differential game with incomplete information has a value in random strategies:

$$V_r^+(t_0, \mu_0) = V_r^-(t_0, \mu_0).$$

Outline of the main results. Our paper contains two main results: the first one (Theorem 3.1) states that, under Isaacs' condition, the game has a value in mixed strategies: the equality $V_r^+(t_0, \mu_0) = V_r^-(t_0, \mu_0)$ holds for any $(t_0, \mu_0)$. In other words, we can remove the "finite support condition" required in Proposition 2.
Our second main result (Theorem 4.1) is the existence of a value in pure strategies for measures without atoms: $V^+(t_0, \mu_0) = V^-(t_0, \mu_0)$ if $\mu_0$ is non-atomic. We actually show a stronger statement: for any non-atomic measure $\mu_0$, one always has $V^\pm(t_0, \mu_0) = V_r^\pm(t_0, \mu_0)$ (even without Isaacs' condition). The first theorem then gives the result under Isaacs' condition.
Both results are proved under the assumption that the measure µ 0 has compact support and for games with incomplete information on one side. Extensions to more general measures and to games with incomplete information on both sides are discussed in Remarks 4 and 5 below.
Comments on the proofs are now in order. To show our first result (existence of a value in mixed strategies), we have to overcome the issue that the method used in [6] is no longer available: indeed, [6] heavily relies on techniques of partial differential equations which have, up to now, no counterpart for general measures. Our idea is to extend the existence of a value from measures with finite support to general measures. The main step is a Lipschitz continuity property of the value functions $V_r^+$ and $V_r^-$ (Proposition 3), which is proved by optimal transport techniques: these techniques are extremely useful here because they allow one to transfer properties of one measure to another. With the regularity of $V_r^\pm$ we can conclude by using Proposition 2 and the density in $\mathcal{P}(\mathbb{R}^N)$ of measures with finite support. For the second statement (existence of a value in pure strategies), we start by approximating the measure $\mu_0$ by discrete ones; then we transform $\varepsilon$-optimal random strategies for these discrete measures into pure strategies for non-atomic ones by optimal transport techniques. The resulting pure strategies turn out to be $\varepsilon$-optimal for the initial measure $\mu_0$ as well.
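The density of finitely supported measures can be checked by hand in a simple case. The worked computation below is an illustrative example, not taken from the paper: it takes $\mu$ to be the uniform measure on $[0,1]$ and $\mu_n$ to be the measure with $n$ atoms of mass $1/n$ at the midpoints of the cells $[(k-1)/n, k/n]$. The map sending each point to the midpoint of its cell is nondecreasing and pushes $\mu$ onto $\mu_n$, hence it is optimal in dimension one.

```latex
% Assumed example: \mu uniform on [0,1],
% \mu_n = \frac{1}{n} \sum_{k=1}^{n} \delta_{(2k-1)/(2n)}.
W_2^2(\mu, \mu_n)
  = \sum_{k=1}^{n} \int_{(k-1)/n}^{k/n} \Big| x - \tfrac{2k-1}{2n} \Big|^2 \, dx
  = n \cdot \frac{1}{12\, n^3}
  = \frac{1}{12\, n^2},
\qquad
W_2(\mu, \mu_n) = \frac{1}{2\sqrt{3}\, n} \xrightarrow[n \to +\infty]{} 0 .
```

Combined with a Lipschitz estimate on the value functions with respect to $W_2$, a convergence of this kind is what allows a statement known for finitely supported measures to be carried over to the limit measure.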
By Proposition 1, there exists a map $\xi : (y, \omega) \in \mathrm{supp}(\mu_1) \times [0, 1]^N \mapsto \xi(y, \omega) \in \mathbb{R}^N$ such that $\xi(y, \cdot) \sharp \mathcal{L}^N = \gamma_y$ for $\mu_1$-almost all $y$, and

We now prove the Borel measurability of $\xi$, which is a technical but important claim for further considerations. First, by the classical disintegration theorem (cf. for instance [10]) we know that the map $y \in \mathrm{supp}(\mu_1) \mapsto \gamma_y \in \mathcal{P}(\mathbb{R}^N)$ is Borel measurable (when $\mathcal{P}(\mathbb{R}^N)$ is equipped with the Borel $\sigma$-field associated with the distance $W_2$).
Last, it is well known that the map ( The map ξ being the composition of the three above maps, it is Borel measurable. Our claim is proved.
We recall that any measure in $\mathcal{P}(\mathbb{R}^N)$ can be written as a limit, for the Wasserstein distance, of a sequence of probability measures which are finite combinations of Dirac masses. Then, in view of Proposition 2, we can deduce from Proposition 3 the existence of a value for general probability measures:

Theorem 3.1. Under Isaacs' condition (2), the differential game with incomplete information has a value in random strategies:

$$V_r^+(t_0, \mu_0) = V_r^-(t_0, \mu_0) \quad \text{for all } (t_0, \mu_0) \in [0, T] \times \mathcal{P}(\mathbb{R}^N).$$

4. Values in pure strategies. In this section, we prove that the game has a value in pure strategies when the initial probability measure $\mu_0$ has no atoms.
Theorem 4.1. Let $\mu_0$ be a compactly supported probability measure on $\mathbb{R}^N$ without atoms. Then

$$V^+(t_0, \mu_0) = V_r^+(t_0, \mu_0) \quad \text{and} \quad V^-(t_0, \mu_0) = V_r^-(t_0, \mu_0). \tag{13}$$

If moreover we suppose that Isaacs' condition (2) holds true, then the value of the game exists in pure strategies for $\mu_0$:

$$V^+(t_0, \mu_0) = V^-(t_0, \mu_0).$$

Proof. We only prove (13) for upper values, the proof being similar for lower values. Let us approximate $\mu_0$ by a sequence of discrete probability measures $\mu_n$:

Let $\varsigma \in \, ]0, \frac{1}{n^2}[$ be small enough such that for all $k \neq k'$ the intersection of the ball $B(x_k, \varsigma)$ with the ball $B(x_{k'}, \varsigma)$ is empty. We introduce the following sequence of probability measures:

(where $\zeta_N = \mathcal{L}^N(B(0, 1))$). We have that

Let $\varepsilon = \frac{1}{n}$ and let $(([0, 1]^m, \mathcal{B}([0, 1]^m), \mathcal{L}^m), \alpha)$ be an $\varepsilon$-optimal mixed strategy for $V_r^+(\mu_n)$. Namely, in view of Remark 2:

With the random strategy $\alpha$ we associate a pure strategy $\tilde\alpha$ in the following way. There exists T k a Borel measurable map such that T k [14] for instance). Then we set . Then for all $\beta \in \mathcal{B}(t_0)$, using the definition of $\tilde\alpha$, we have:

(thanks to standard estimates on trajectories of (1)). Then, taking the supremum over $\beta \in \mathcal{B}(t_0)$, using the Lipschitz property of $V_r^+$ (Proposition 3) and the $\frac{1}{n}$-optimality of $\alpha$ (equation (15)), we get

In view of (14), this implies that there exists some constant $C > 0$ (independent of $n$) such that for $n$ large enough

This means that the pure strategy $\tilde\alpha$ is $\frac{C}{n}$-optimal for the value in random strategies $V_r^+(\nu_n)$. From Proposition 1, the probability measure $\mu_0$ being non-atomic, there exists a minimizing sequence of maps $(S_n)_n$ such that $S_n \sharp \mu_0 = \nu_n$ and :

Let us define $\bar\alpha_n(x, \cdot) := \tilde\alpha(S_n(x), \cdot)$. Then for any $\beta \in \mathcal{B}(t_0)$, we obtain

Passing to the supremum over $\beta \in \mathcal{B}(t_0)$ in the above inequality yields

so that, in view of (16), $\le V_r^+(\mu_0) + 2C \, \mathrm{Lip}(g) \, W_2(\mu_0, \nu_n) + \frac{1}{n}(C + C \, \mathrm{Lip}(g))$.
Since $\nu_n$ converges to $\mu_0$ as $n \to +\infty$, passing to the limit in the above inequality gives $V^+(\mu_0) \le V_r^+(\mu_0)$.
According to (11), the reverse inequality also holds true. So we obtain the equality $V^+(\mu_0) = V_r^+(\mu_0)$. This proves the first part (13) of the theorem. If now Isaacs' condition holds, then the equality $V^+(\mu_0) = V^-(\mu_0)$ is a direct consequence of (13) and of Theorem 3.1. The proof is complete.
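The mechanism behind this result, extracting a randomization device from a non-atomic private signal, can be illustrated in a toy setting. The Python sketch below is a deliberately simplified stand-in and is not the construction of the proof, which relies on optimal transport maps. It assumes that the private information $x$ is uniform on $[0, 1)$, splits it into a discrete type $k$ and a residual $\omega$ that is uniform on $[0, 1)$ and independent of $k$, and feeds $\omega$ to a random strategy. All names (`split_private_information`, `purify`, `alpha`) and the sample strategy are hypothetical.

```python
import math


def split_private_information(x, n):
    """Split x in [0, 1) into a type k in {0, ..., n-1} and a residual omega in [0, 1).

    If x is uniform on [0, 1), then k is uniform on the n types and
    omega is uniform on [0, 1) and independent of k.
    """
    k = min(int(math.floor(n * x)), n - 1)
    omega = n * x - k
    return k, omega


def purify(random_strategy, n):
    """Turn a random strategy (k, omega) -> action into a pure strategy x -> action."""
    def pure_strategy(x):
        k, omega = split_private_information(x, n)
        return random_strategy(k, omega)
    return pure_strategy


# Hypothetical random strategy: type k plays "left" with probability (k + 1) / 4.
def alpha(k, omega):
    return "left" if omega < (k + 1) / 4 else "right"


alpha_bar = purify(alpha, n=2)

# Fraction of private signals of type 0 that lead to "left", on a fine grid.
grid = [(i + 0.5) / 1000 for i in range(500)]
share_left = sum(alpha_bar(x) == "left" for x in grid) / len(grid)
print(share_left)
```

In this toy example the pure strategy `alpha_bar` reproduces, type by type, the action frequencies of the random strategy `alpha`, although Player I no longer uses any external random device: the randomness comes entirely from the part of his private information that is irrelevant to his type.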

Remark 4.
Our results are also valid if we replace $\mathcal{P}(\mathbb{R}^N)$ (the set of Borel probability measures with compact support) by $\mathcal{P}_2(\mathbb{R}^N)$ (the set of Borel probability measures $\mu$ with finite second moment $\int_{\mathbb{R}^N} |x|^2 \, d\mu(x) < +\infty$) and if the assumption of Lipschitz continuity and boundedness of $g$ is replaced by Lipschitz continuity and $\exists a > 0, \ \forall x \in \mathbb{R}^N, \ g(x) \le a(1 + |x|^2)$.
We have chosen not to consider this more general case because it would require new proofs of the measurability arguments used to obtain the measurability of the function $\xi$ in the proof of Proposition 3 (in the compact support case, we can refer to [14]). This would considerably lengthen our paper, whose main aim is neither optimal transport theory nor measurability.
Remark 5. Our approach could be extended to differential games with incomplete information for both players as follows. Suppose that the state space $\mathbb{R}^N$ is a product space $\mathbb{R}^N = Y \times Z$ (this is the case in particular for pursuit differential games where each player acts only on a component of the dynamics). The game is then played as follows:
- the game starts at time $t_0$, where the initial position $x_0 = (y_0, z_0)$ is chosen randomly according to a probability measure $\mu_0 \otimes \nu_0$;
- the component $y_0$ of the initial state is communicated to Player I but not to Player II, while the component $z_0$ of the initial state is communicated to Player II but not to Player I;
- both players know the probability $\mu_0 \otimes \nu_0$ and observe their opponent's controls;
- the game is played until the terminal time $T$.
The payoff is again of the form $g(x(T))$. Both main results of the paper (i.e., the existence of a value in random strategies for $\mu_0 \otimes \nu_0 \in \mathcal{P}(\mathbb{R}^N)$ and the existence of a value in pure strategies when $\mu_0 \otimes \nu_0$ has no atoms) are still valid in this context. Indeed, our approach is based on Proposition 2, which is also valid for $\mu_0 \otimes \nu_0 = \sum_{i=1}^{J} a_i (\delta_{y_0^i} \otimes \delta_{z_0^i})$ (cf. [6]). The analogues of Proposition 3 and Theorem 4.1 can be proved using the same arguments.