Approachability, Regret and Calibration: Implications and Equivalences

Blackwell approachability, regret minimization and calibration are three criteria used to evaluate a strategy (or an algorithm) in different sequential decision problems, or repeated games between a player and Nature. Although at first sight they have nothing in common, links between them have been discovered: both consistent and calibrated strategies can be constructed by following, in some auxiliary game, an approachability strategy. We gather classical and recent results and provide new ones in order to develop and generalize Blackwell's elegant theory. The final goal is to show how it can be used as a basic yet powerful tool to exhibit a new class of intuitive algorithms based on simple geometric properties. To complete the picture, we also prove that approachability can be seen as a byproduct of the very existence of consistent or calibrated strategies.


Introduction
Sequential decision problems can be represented as repeated games between a player and Nature. At each stage the player (also called agent, decision maker or predictor depending on the context) chooses an element of his decision set. At the same time, Nature chooses on her side a state of the world. Those sequences of choices generate a sequence of outcomes that induces an overall payoff to the player.
The opponent is called Nature because we do not specify her payoff, her objectives or her rationality; absolutely no assumption is made on her behavior, and future states of the world cannot be inferred from the past. Typically the environment is not stochastic or Bayesian but adversarial; for instance, Nature can represent a single malevolent opponent, or a set of independent (or correlated) players. A crucial requirement of these models is that a strategy of the player must be good (i.e., it must fulfill some exogenous criterion) against every possible sequence of states of the world (or simply against any strategy of Nature).
Depending on the structure of the outcome mappings, the overall objective of the player may vary. Hannan [30] studied the case where an outcome is an actual real-valued payoff. The player's goal is to maximize his average (or cumulative) payoff. Since no assumption is made on Nature's behavior, the player cannot guarantee himself a given exogenous amount, unlike in traditional zero-sum games where a value can be guaranteed: assume for instance that Nature decides to give the player a payoff of zero (or one, minus one, etc.) at each stage, no matter what he does.
The criterion Hannan introduced is called regret; it measures the difference between the average payoff the player actually obtained and what he would have obtained had he chosen the same action repeatedly. It is related to convex optimization (if Nature repeatedly chooses the same loss function), or more precisely to online convex optimization.
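As a concrete illustration of this criterion (not from the paper; the payoff data below are hypothetical), the following minimal sketch computes the external regret of a sequence of plays against a sequence of payoff vectors chosen by Nature:

```python
# Minimal illustration (hypothetical data): external regret of a sequence of
# plays.  payoffs[n][a] is the payoff the player would have received at stage n
# had he played action a; plays[n] is the action actually chosen.

def external_regret(payoffs, plays):
    """Average regret: best fixed action's average payoff minus the player's
    realized average payoff."""
    n = len(plays)
    realized = sum(payoffs[m][plays[m]] for m in range(n)) / n
    num_actions = len(payoffs[0])
    best_fixed = max(
        sum(payoffs[m][a] for m in range(n)) / n for a in range(num_actions)
    )
    return best_fixed - realized

# Hypothetical run: 3 stages, 2 actions; the player always chose action 1.
payoffs = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
plays = [1, 1, 1]
r = external_regret(payoffs, plays)
print(r)  # 1/3: the fixed action 0 earns 2/3 on average vs a realized 1/3
```

A consistent strategy is precisely one that drives this quantity to zero (or below) against every sequence of Nature's choices.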
The main result of Hannan [30] is that such a consistent strategy, i.e., a strategy without regret, exists; he also constructed one. This has been widely refined and improved, using different techniques and ideas, notably by (an exhaustive list seems almost impossible, as the subject has been developed by many different communities) Foster & Vohra [23], Hart & Mas-Colell [31], Fudenberg & Levine [28], Lehrer [43], Auer, Cesa-Bianchi & Gentile [3], Cesa-Bianchi & Lugosi [14] (see also references therein), Sorin [71]... When outcomes are vectorial (and not scalar) payoffs, the problem is closely related to multicriteria optimization, each coordinate representing a different sub-objective. Instead of considering some exogenous convex combination of these objectives, or optimizing them in a given order (so as to embed this framework into the previous one), Blackwell [9] introduced another concept. He considered that some target set is given and that the player's goal is to make the average outcome converge to it; on the contrary, Nature tries to push it away. Formally, a given closed set is approachable if the player has a strategy such that the average payoff remains, after some possibly large stage, arbitrarily close to this target set, no matter the sequence of moves of Nature.
Blackwell's approachability theory is quite elegant as it relies on simple geometric properties. These properties allowed him to characterize explicitly approachable convex sets and to provide a simple sufficient approachability condition for non-convex sets (such sets are called, in reference to Blackwell, B-sets). Spinat [73] proved later that this condition is in fact almost necessary.
The third criterion, calibration, concerns a different setting: here, a stage outcome is not some payoff (either scalar or vectorial) but the actual state of the world chosen by Nature. The overall objective of the player is to predict, sequentially, the whole sequence of states so that the average prediction and the empirical distribution of states are asymptotically arbitrarily close. Without any further restriction, this is in fact fairly easy: one just has to predict at each stage the outcome of the preceding one.
Additional requirements can be, for instance, that predictions may only belong to some finite (yet possibly large) set and that the empirical distribution of states, on the set of stages where a specific prediction is made, is closer to this prediction than to any other possible one. A usual and celebrated example is that of a meteorologist who predicts, each day, the probability of rain for the following day. Predictions belong to {0%, 10%, 20%, . . . , 100%} and it is required that, when the meteorologist announces a probability of rain of, say, 30%, it actually rains on average between 25% and 35% of those days.
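The meteorologist's requirement can be sketched as follows (an illustration with hypothetical data, not from the paper): for each announced forecast, compute the empirical frequency of rain on the stages where it was announced, and compare it to the forecast itself.

```python
# Sketch (hypothetical data): empirical frequency of rain on the stages where
# each forecast was announced.  On the grid {0, 0.1, ..., 1}, a forecast p is
# calibrated when that frequency is closer to p than to any other grid point,
# i.e. within 0.05 of p.

def calibration_report(forecasts, rain):
    """Map each used forecast to the empirical rain frequency on its stages."""
    report = {}
    for p in set(forecasts):
        stages = [m for m, q in enumerate(forecasts) if q == p]
        report[p] = sum(rain[m] for m in stages) / len(stages)
    return report

forecasts = [0.3, 0.3, 0.3, 0.7, 0.7]   # announced probabilities of rain
rain      = [1,   0,   0,   1,   1  ]   # 1 if it rained that day
report = calibration_report(forecasts, rain)
print(report)  # e.g. {0.3: 0.333..., 0.7: 1.0}: 0.3 is calibrated, 0.7 is not
```

A calibrated strategy must make this discrepancy vanish, asymptotically and simultaneously for every forecast, against any sequence of states.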
Oakes [57] and Dawid [17] proved that no deterministic algorithm can be calibrated (although this strong assessment can be discussed), while randomized algorithms can, as proved by Foster & Vohra [24]. The existence of such algorithms can be seen as a negative result, as it implies that a strategic, non-informed meteorologist can mimic an expert one (who knows the true underlying process, if it exists); a whole literature has studied this aspect and recent results are gathered in the survey of Olszewski [58]. On the other hand, it can also be seen as a positive result, as it states that the long-term behavior of Nature can asymptotically be predicted, and this can lead to another class of algorithms and results, as in Foster & Vohra [23] or Perchet [59,62].
A common feature of regret minimization and calibration is that they can be written as specific cases of approachability of a well-chosen target set in some auxiliary vectorial payoff game. The first to notice this property was Blackwell [10] (the idea is already mentioned at the end of the seminal paper of Hannan [30] and in Luce & Raiffa [48]), followed by Foster [21], Hart & Mas-Colell [32], Lehrer & Solan [45], Sorin [71], Perchet [60], Mannor & Stoltz [50], Abernethy, Bartlett & Hazan [1]... We assumed implicitly that the player observes the sequence of states of the world; this is in fact a crucial hypothesis here, sometimes referred to as full monitoring. In particular, we will not consider the case of partial monitoring (or bandit problems), or stochastic games (where, for instance, the whole sequence of outcomes could depend on a unique choice at some stage). Those are also interesting subjects, yet far from the current scope.
Objectives and Structure of the paper. Describing explicit interactions and equivalences between the notions of approachability, calibration and regret is the central point of this paper, the final argument being that explicit constructions of consistent and calibrated strategies (even for notions more precise or refined than the ones introduced here) are possible and provided thanks to approachability theory. The remainder is organized as follows. In Section 1, we introduce the concept of approachability, the centerpiece of this work.
We first recall (in Subsection 1.1) a sufficient and necessary condition under which an arbitrary set is approachable. The specific case of convex sets, for which a complete characterization is available, is studied in Subsection 1.2. First extensions and generalizations of the framework (e.g., in infinite dimension, with variable stage durations, unbounded payoffs, etc.) are given in Subsection 1.3. The last Subsection 1.4 is concerned with other possible proofs and techniques of approachability. In particular, we show that approachability with respect to the supremum norm can be achieved using some potential minimization, generalizing the exponential weight algorithm; we also prove that the usual Euclidean (or Hilbertian) framework is not necessary for approachability.
Proofs are almost always provided, as long as they bring something new to the literature (yet some technical lemmas are delayed to the Appendix).
Regret minimization is introduced in Section 2. Several refinements are presented and links with game theory (as well as famous algorithms called exponential weights and follow the perturbed leader) are given in Subsection 2.3. Since our purpose is to provide reductions to some auxiliary approachability problems, proofs are only sketched in this section and delayed to the last one. An example of regret minimization with expert advice is given for illustration at the end; this subject is, however, very well covered in the book of Cesa-Bianchi & Lugosi [14].
Calibration and its generalizations are formalized in Section 3; for the same reasons, proofs are essentially delayed to the last section. We provide there a discussion on whether calibration (or rather a weaker but maybe more intuitive notion) can or cannot be obtained using deterministic algorithms.
Final Section 4 contains all the reductions to approachability. We prove (or recall) how regret minimizations (either with finite or infinite action spaces) and calibration (either finite or with checking rules) can be obtained using approachability results from the first section.
Maybe the most general results on regret minimization are Theorems 4.1 and 4.2, which provide strategies (explicit for the first one) minimizing swap regret when action spaces are, respectively, finite or infinite. Proposition 4.2, due to Blackwell [10] himself, shows how minimization of the supremum norm of regret is exactly approachability.
Concerning calibration, most striking results might be Proposition 4.5, its consequence Theorem 4.4 and Theorem 4.5. They refine and generalize recent results of Mannor & Stoltz [50] as well as Rakhlin, Sridharan and Tewari [66].
We conclude this Section by explaining how the circle is complete: if regret minimization and calibration can be seen as specific instances of approachability, the converse is also true. Indeed, using some generalized notions of regret and/or calibration, one can construct approachability strategies (in the case of convex sets).

1 Blackwell's approachability

1.1 Approachability of arbitrary sets
Consider a two-person repeated game between a player and Nature. Their action sets are respectively denoted by A and B (of respective cardinality A and B) and payoffs are defined through some vectorial mapping g : A × B → R^d. The game is repeated in discrete time, and we denote the actions chosen at stage n ∈ N by a_n ∈ A and b_n ∈ B; they induce a payoff g_n := g(a_n, b_n) ∈ R^d. Formally, a_n and b_n are functions of the history, i.e., the past observations h_{n−1} = (a_1, b_1, . . . , a_{n−1}, b_{n−1}) ∈ (A × B)^{n−1} =: H_{n−1}.
Explicitly, a strategy σ of the player is a mapping from H := ∪_{n∈N} H_n, the set of finite histories, into ∆(A), the set of probability distributions over A. Similarly, a strategy τ of Nature is a mapping from H into ∆(B). Kolmogorov's extension theorem implies that a pair (σ, τ) induces a probability distribution P_{σ,τ} over H_∞ = (A × B)^N, the set of infinite histories of the game, endowed with the product topology.
Before defining the concept of approachability, we introduce some notation. Given a closed set E ⊂ R^d, we denote by d_E(x) = inf_{z∈E} ‖x − z‖ the distance from x to E, by E^δ = {z ∈ R^d s.t. d_E(z) < δ} the δ-open neighborhood of E, and by Π_E(x) = {z ∈ E s.t. ‖x − z‖ = d_E(x)} the projection of x onto E, which is in general not single-valued. We also denote by co E the convex hull of a set E. The mapping g defined on A × B (and more generally any such mapping) is extended to ∆(A) × ∆(B) by g(x, y) = E_{x⊗y}[g(a, b)]. The average of a sequence s = {s_m}_{m∈N} up to stage n ∈ N is denoted by s̄_n := (1/n) Σ_{m=1}^n s_m.

Definition 1.1 A closed set E ⊂ R^d is approachable by the player if he has a strategy σ ensuring, for every ε > 0, the existence of some integer N_ε ∈ N such that, no matter the strategy τ of Nature,

sup_{n≥N_ε} E_{σ,τ}[ d_E(ḡ_n) ] ≤ ε   and   P_{σ,τ}( sup_{n≥N_ε} d_E(ḡ_n) ≥ ε ) ≤ ε.   (1)

A set E is excludable by Nature if she can approach the complement of E^δ for some δ > 0.
Informally, a given set E ⊂ R^d is approachable by the player if he has a strategy such that the average payoff converges almost surely to E, uniformly with respect to the strategies of Nature. The second condition in Equation (1) clearly implies the first one, which is actually the most commonly used (and the rates of convergence, i.e., the smallest mappings ε → N_ε satisfying each condition, might differ).
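The notation d_E and Π_E can be made concrete; in the following minimal sketch (not from the paper), E is taken to be an axis-aligned box, an assumption made purely so that the projection has a closed form, namely a coordinate-wise clip:

```python
# Sketch: distance and projection for the box E = [lo_1,hi_1] x ... x [lo_d,hi_d].
# For such a convex set the projection is unique and computed coordinate-wise.
import math

def project_box(x, lo, hi):
    """Pi_E(x): clip each coordinate of x to [lo_k, hi_k]."""
    return [min(max(xk, l), h) for xk, l, h in zip(x, lo, hi)]

def dist_box(x, lo, hi):
    """d_E(x) = ||x - Pi_E(x)|| for the Euclidean norm."""
    p = project_box(x, lo, hi)
    return math.sqrt(sum((xk - pk) ** 2 for xk, pk in zip(x, p)))

print(project_box([2.0, -1.0], [0.0, 0.0], [1.0, 1.0]))  # [1.0, 0.0]
print(dist_box([2.0, -1.0], [0.0, 0.0], [1.0, 1.0]))     # sqrt(2)
```

For a general closed set E the projection Π_E(x) may contain several points, which is why the definitions above allow it to be set-valued.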

1.1.1 Approachable arbitrary sets: Blackwell's sufficient condition
Blackwell [9] provided a simple geometrical condition under which a set E is approachable. This sufficient condition is in fact almost necessary (as proved in Section 1.1.2, following Spinat [73]).
Definition 1.2 A closed set E ⊂ R^d is a B-set if, for every z ∈ R^d, there exist a projection π ∈ Π_E(z) and a mixed action x := x(z) ∈ ∆(A) such that the hyperplane perpendicular to z − π at π separates z from the set {g(x, y), y ∈ ∆(B)}, or formally:

⟨z − π, g(x, y) − π⟩ ≤ 0, for every y ∈ ∆(B).   (2)

Blackwell [9] proved that being a B-set is sufficient for approachability; he also exhibited a specific strategy, from now on referred to as Blackwell's (approachability) strategy.
Theorem 1.1 If E ⊂ R^d is a B-set, then E is approachable by the player. Moreover, the strategy σ defined by σ(h_n) = x(ḡ_n) ensures that, for every η > 0 and against any strategy τ of Nature:

E_{σ,τ}[ d_E(ḡ_n) ] ≤ 2‖g‖_∞/√n   and   P_{σ,τ}( sup_{m≥n} d_E(ḡ_m) ≥ η ) ≤ 8‖g‖²_∞/(η² n).

Blackwell [9] and Mertens, Sorin & Zamir [55] obtained respectively the bounds in expectation and in probability. The very definition of ‖g‖_∞ allows each g(a, b) to be a random variable with bounded second moment. We propose in the following Corollary 1.1 a slight variant that improves the constants (in the deterministic case or when E is compact); for instance, they are divided by two if E = {0}, as in Section 1.3.6.

Corollary 1.1 A closed set E is approachable if and only if E_g := E ∩ co{g(a, b) ; a ∈ A, b ∈ B} is also approachable. Blackwell's strategy applied to E_g ensures that

E_{σ,τ}[ d_{E_g}(ḡ_n)² ] ≤ κ/n,

where κ = (‖g‖_∞ + ‖E_g‖)² and ‖E_g‖ := sup{‖z‖ ; z ∈ E_g} is smaller than ‖g‖_∞.
Proof: An approachability strategy of E ensures that any accumulation point of ḡ_n must belong both to the closed set E and to the compact set co{g(a, b) ; a ∈ A, b ∈ B}, hence to E_g. Conversely, any approachability strategy of E_g approaches its super-set E.
Let σ be Blackwell's strategy applied to E_g, define δ_n := d_{E_g}(ḡ_n) and denote by π_n any element of Π_{E_g}(ḡ_n) given by Equation (2). The definition of d_{E_g} implies that

δ²_{n+1} ≤ ‖ḡ_{n+1} − π_n‖² = ‖ (n/(n+1)) (ḡ_n − π_n) + (1/(n+1)) (g_{n+1} − π_n) ‖²
= (n/(n+1))² δ²_n + (2n/(n+1)²) ⟨ḡ_n − π_n, g_{n+1} − π_n⟩ + (1/(n+1)²) ‖g_{n+1} − π_n‖².

Conditioning on the finite history h_n and using Equation (2) as well as the definitions of ‖g‖_∞ and ‖E_g‖, the last inequality becomes

E_{σ,τ}[ δ²_{n+1} | h_n ] ≤ (n/(n+1))² δ²_n + κ/(n+1)²

and, with a simple induction, E_{σ,τ}[δ²_n] ≤ κ/n. Thus ḡ_n converges in probability towards E. The almost sure convergence is a consequence of the facts that E_{σ,τ}[δ²_n] converges to zero and that n² δ²_n − κ n is a supermartingale. Indeed, Doob's inequality (see Neveu [56], Prop. IV.5.2) then implies that

P_{σ,τ}( sup_{m≥n} δ_m ≥ η ) ≤ 8‖g‖²_∞/(η² n),

which gives the result.
Blackwell's strategy depends only on the sequence {g n } n∈N so these results do not require the finiteness of B or A, nor that Nature's actions are observed. In fact, we could as well assume the following model that we call the compact case (in opposition to the finite case).
Action sets are compact and convex sets, denoted by X ⊂ R^A and U ⊂ (R^d)^A. At stage n ∈ N, Nature chooses an outcome U_n = (U_n^a)_{a∈A} ∈ U and the player chooses x_n ∈ X. Those choices incur the vector payoff g_n = x_n · U_n ∈ R^d, the standard inner product between x_n and U_n. Condition (2) that defines a B-set then becomes: there exist a projection π ∈ Π_E(z) and x := x(z) ∈ X such that

⟨z − π, x · U − π⟩ ≤ 0, for every U ∈ U.

It is also possible to incorporate randomness in this model. The compact and convex sets X and U can be sets of probability distributions (this was the case when X = ∆(A)) and in that case x · U is the expectation of a random payoff associated with x and U (that must have a second moment).

1.1.2 Equivalent formulations and necessary condition
Blackwell defined a B-set geometrically, from the outside. As Soulaimani, Quincampoix & Sorin [2] noticed that it can also be defined similarly from the inside. Informally, one can interpret these definitions slightly differently: instead of viewing approachability as the convergence of average payoffs to E, it can be understood as preventing average payoffs from escaping E.
First, we need to recall the notion of proximal normals to E.

Definition 1.3 The set of proximal normals to a closed set E ⊂ R^d at e ∈ E is denoted by NP_E(e) ⊂ R^d and is defined by:

NP_E(e) := { p ∈ R^d s.t. B(e + p, ‖p‖) ∩ E = ∅ },

where B(e + p, ‖p‖) is the open ball of center e + p and radius ‖p‖.
The equivalent definition of a B-set, which is closely related to the notion of discriminating set in differential games, is given by the following lemma, whose proof is immediate and omitted.

Lemma 1.2 A closed set E ⊂ R^d is a B-set if and only if, for every e ∈ E and every proximal normal p ∈ NP_E(e), there exists x ∈ ∆(A) such that

⟨p, g(x, y) − e⟩ ≤ 0, for every y ∈ ∆(B).   (4)
Interesting results on a slightly different (but equivalent, as we shall see) notion of approachability that can be found in the literature can easily be derived from this alternative definition of a B-set.

Definition 1.4 Given ε > 0, a closed set E ⊂ R^d is ε-approachable by the player if he has a strategy σ_ε ensuring that, after some stage N_ε ∈ N and no matter the strategy τ of Nature,

P_{σ_ε,τ}( sup_{n≥N_ε} d_E(ḡ_n) ≥ ε ) ≤ ε.

And a set E is 0-approachable if it is ε-approachable for every ε > 0.
The difference between approachability and ε-approachability is whether the strategy can depend on ε or not. It is clear that an approachable set is 0-approachable, but the converse is not immediate. It is easier to show, following Spinat [73] and thanks to Lemma 1.3, that a 0-approachable set must contain a B-set, and so both notions coincide.

Lemma 1.3 Let {E_n}_{n∈N} be a decreasing sequence of compact non-empty 0-approachable sets; then E_∞ := ∩_{n∈N} E_n is also a compact non-empty 0-approachable set.
Proof: One just has to notice that, for every ε > 0, the ε/2-neighborhood of E_∞ is included in some E_n, which is ε/2-approachable; and an ε/2-approachability strategy of E_n then ε-approaches E_∞. Lemma 1.3 is not trivially true for approachability 1 . Indeed, one must find an approachability strategy that is independent of ε, and a simple concatenation of the strategies σ_ε might not work (except in the specific case of convex sets).

Proposition 1.4 If a closed set E is 0-approachable, it contains a B-set.
We only provide a sketch of the proof, complete details can be found in Spinat [73].
Proof: Consider the family of all compact subsets of E that are 0-approachable. It is a non-empty family, ordered by inclusion, and, because of Lemma 1.3, every fully ordered subfamily has a lower bound (the intersection of all its elements) which belongs to the family. Thus Zorn's lemma yields that a minimal element E_∞ exists, and we claim that E_∞ is a B-set.
Indeed, assume the converse: condition (4) does not hold for some e ∈ E_∞ and some proximal normal p ∈ NP_{E_∞}(e). So there exist y_0 ∈ ∆(B) and δ > 0 such that

⟨p, g(x, y_0) − e⟩ ≥ δ, for every x ∈ ∆(A).   (5)

In particular, Definition 1.3 of proximal normals implies that, at least for some small λ ∈ (0, 1), (1 − λ)e + λ g(x, y_0) belongs, for every x ∈ ∆(A), to B(e + p, ‖p‖). Therefore, for some (possibly smaller) δ > 0,

d_{E_∞}( (1 − λ)e + λ g(x, y_0) ) ≥ δ, for every x ∈ ∆(A).   (6)

By continuity, Equation (6) holds (up to δ/2 instead of δ) on a small open neighborhood V of e. We shall prove that this implies that E_∞\V is still 0-approachable; this is a contradiction with the minimality of E_∞, which must therefore be a B-set. Assume that at some stage n ∈ N, ḡ_n belongs to V and that Nature plays repeatedly according to y_0 afterwards. Then, if n is large enough, there exists some large m ∈ N such that m/(n+m) and ḡ_{n+m} are, respectively and with arbitrarily high probability, arbitrarily close to λ and to some (1 − λ)ḡ_n + λ g(x, y_0), which is at least δ/2 away from E_∞.
Consider a δ/4-approachability strategy of E ∞ denoted by σ δ/4 . For some large N ∈ N independent of τ , the P σ δ/4 ,τ -probability that g n belongs to V for some n ≥ N must therefore be smaller than δ/4. In particular, this implies that g n stays within δ of E ∞ \V with probability greater than 1 − δ. Thus, for every δ > 0, there exists a δ-approachability strategy of E∞\V .
A direct consequence of Theorem 1.1 and Proposition 1.4 is the following characterization: a closed set is approachable if and only if it contains a B-set.

1.2 Specific case of convex sets
In the specific case of convex sets, there exists a dual and complete characterization of approachability and excludability, due to Blackwell [9]. It is somehow a consequence of the fact that, for any z in some closed and convex set C ⊂ R^d, one has:

NP_C(z) = { p ∈ R^d s.t. ⟨p, c − z⟩ ≤ 0, for every c ∈ C } ;   (7)

in particular this implies that NP_C(z) is a cone, referred to as the normal cone.
Theorem 1.3 A closed convex set C ⊂ R^d is approachable by the player if and only if

∀ y ∈ ∆(B), ∃ x ∈ ∆(A), g(x, y) ∈ C.   (8)

And a convex set is either approachable by the player or excludable by Nature.
Proof: Let C ⊂ R^d be a convex set and p ∈ R^d be a proximal normal of C at some z ∈ C. Because of Property (7), Condition (8) can be immediately rewritten into

max_{y∈∆(B)} min_{x∈∆(A)} ⟨p, g(x, y) − z⟩ ≤ 0.

The mapping (x, y) ↦ ⟨p, g(x, y) − z⟩ is linear in each of its arguments, so von Neumann's minmax theorem implies that the operators min and max can be switched, i.e.,

min_{x∈∆(A)} max_{y∈∆(B)} ⟨p, g(x, y) − z⟩ = max_{y∈∆(B)} min_{x∈∆(A)} ⟨p, g(x, y) − z⟩ ≤ 0 ;   (9)

thus C is a B-set and is approachable by the player.
On the contrary, if Condition (8) is not satisfied, there exists some y_0 ∈ ∆(B) such that g(x, y_0) ∉ C for every x ∈ ∆(A). By continuity, there exists δ > 0 such that d_C(g(x, y_0)) ≥ δ for every x ∈ ∆(A). If Nature plays repeatedly according to y_0, then the law of large numbers implies that ḡ_n converges uniformly to the set {g(x, y_0), x ∈ ∆(A)}, which is included in the complement of C^δ. So C is excludable by Nature and, of course, is not approachable by the player.
The proof of Theorem 1.3 relies on the Hilbertian structure of R^d. However, using different arguments, it can be generalized to any normed space, see Theorem 1.7.

Remark 1.1 In the specific case of a convex set, Blackwell's strategy at stage n + 1 ∈ N can be decomposed as follows: i) given ḡ_n ∈ R^d, compute its projection Π_C(ḡ_n) onto the closed and convex set C; ii) solve the projected zero-sum game defined by Equation (9), i.e., find x_{n+1} ∈ ∆(A) that achieves the minimum there, and choose a_{n+1} according to it.
These steps ensure that x_{n+1} = x(ḡ_n), as introduced in Definition 1.2. So Blackwell's strategy reduces to a projection onto a convex set and the resolution of some linear program (solving a zero-sum game can be reduced to the latter, see Sorin [70], Appendix A). On the other hand, checking whether a convex set is approachable or not, i.e., whether it satisfies Condition (8) (or equivalently the more complicated Condition (2)), is NP-hard, even with C = {0}. Mannor & Tsitsiklis [53] indeed provided a reduction from 3-SAT.
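To make the two steps of Remark 1.1 concrete, here is a minimal sketch (an illustration, not the paper's general construction) for the classical target set C = R^A_−, the non-positive orthant of the auxiliary regret game r(a')_stage = u(a', b) − u(a, b). For this C, step i), the projection, is the coordinate-wise negative part, and step ii), the projected zero-sum game, is solved in closed form by playing x proportional to the positive part of the average regret vector; this is the regret matching procedure of Hart & Mas-Colell [31].

```python
# Sketch: Blackwell's strategy for C = R^A_- in the auxiliary regret game,
# solved in closed form (regret matching).  u[a][b] is the player's payoff.
import random

def regret_matching(u, nature_moves, rng):
    """Play regret matching against a fixed sequence of Nature's moves;
    return the final average regret vector, which should approach R^A_-."""
    A = len(u)
    regret_sum = [0.0] * A            # n times the average regret vector
    for b in nature_moves:
        pos = [max(r, 0.0) for r in regret_sum]   # step i): positive part
        s = sum(pos)
        # step ii): optimal x of the projected game, in closed form
        x = [p / s for p in pos] if s > 0 else [1.0 / A] * A
        a = rng.choices(range(A), weights=x)[0]
        for ap in range(A):           # update every regret coordinate
            regret_sum[ap] += u[ap][b] - u[a][b]
    n = len(nature_moves)
    return [r / n for r in regret_sum]

rng = random.Random(0)
u = [[1.0, 0.0], [0.0, 1.0]]          # matching-pennies-like payoffs
nature = [rng.randrange(2) for _ in range(5000)]
avg_regret = regret_matching(u, nature, rng)
print(max(avg_regret))  # small: the average regret vector approaches R^2_-
```

For a general convex C, step ii) has no closed form and one solves the linear program associated with the projected zero-sum game instead.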
In the compact case, where action sets are X ⊂ R^A and U ⊂ ([0, 1]^d)^A, a closed convex set C ⊂ R^d is approachable if and only if

∀ U ∈ U, ∃ x ∈ X, x · U ∈ C.

1.2.1 Sharper high probability bounds
In this section, we use the convexity of C to exhibit high probability bounds improving Corollary 1.1.

Corollary 1.5 If C ⊂ R^d is a closed and convex approachable set, Blackwell's strategy ensures that, for every η > 0 and against any strategy τ of Nature:

P_{σ,τ}( sup_{m≥n} d_C(ḡ_m) ≥ η + 2‖g‖_∞/√n ) ≤ 2 exp( −η² n / (8‖g‖²_∞) ).   (10)

Proof: The distance to a convex set is Lipschitz and convex, so d_C(ḡ_m) concentrates around its expectation, which is controlled by the rate of convergence of Blackwell's strategy. We conclude using Lemma 5.3.
This result must be put in perspective with Corollary 1.1, which states that, for any arbitrary approachable set E and every η > 0, P_{σ,τ}( sup_{m≥n} d_E(ḡ_m) ≥ η ) ≤ 8‖g‖²_∞/(η² n).

1.2.2 Biased approachability
We assume in this section that the closed and convex set C ⊂ R^d is not approachable by the player. In that case, the natural extension of Blackwell's strategy is defined by σ(h_n) = x_{n+1} ∈ ∆(A), where x_{n+1} is optimal in the projected zero-sum game with payoffs ⟨g(x, y) − Π_C(ḡ_n), ḡ_n − Π_C(ḡ_n)⟩.
Corollary 1.6 Even if a closed and convex set C ⊂ R^d is not approachable by the player, Blackwell's strategy σ approaches, with the rates of Corollary 1.1, the closure of C^δ, where δ ≥ 0 is the smallest real number such that C^δ is approachable.

Proof: We only need to prove that σ is in fact exactly Blackwell's approachability strategy for the closure of C^δ (the δ-neighborhood of C), which is approachable by definition and Condition (8). This is simply due to the fact that:

Π_{C^δ}(z) = Π_C(z) + δ (z − Π_C(z))/‖z − Π_C(z)‖, for every z ∉ C^δ.

Indeed, C^δ = C + δB(0, 1), so Π_{C^δ}(z) minimizes ‖z − (c + δe)‖² = ‖z − c‖² − 2δ⟨z − c, e⟩ + δ²‖e‖² over (c, e) ∈ C × B(0, 1); and necessarily one must have e = (z − c)/‖z − c‖ and c = Π_C(z). The result follows from the facts that d_{C^δ}(z) ≤ d_C(z) + δ and ‖C^δ‖ ≤ ‖C‖ + δ.
The key ingredient of Corollary 1.6 is not the rates of convergence (which are a direct consequence of the fact that C^δ is approachable), but the fact that it does not require the computation of δ and C^δ (recall that determining whether a convex set is approachable is NP-hard, so determining the smallest approachable extension is even more complex). Notice that if C is approachable, i.e., δ = 0, the rates of Corollary 1.6 and of Theorem 1.1 match.
This result has to be put in perspective with the following proposition, which also deals with biased approachability, yet on a different level.

Proposition 1.7 Assume that the player's and Nature's strategies generate a sequence of payoffs such that, at every stage n,

⟨ḡ_n − π_n, E[g_{n+1} | h_n] − π_n⟩ ≤ ε_n, with π_n ∈ Π_E(ḡ_n),

for some sequence (ε_n). Then

E_{σ,τ}[ d_E(ḡ_n)² ] ≤ κ/n + (2/n²) Σ_{m=1}^{n−1} m ε_m.

In particular, if ε_n converges to 0, then ḡ_n converges in expectation to E; the convergence is almost sure as soon as Σ_{n∈N} ε_n/n < ∞.

Proof: The proof is identical to the one of Corollary 1.1.
Actually, the result is stated for arbitrary sets and holds for non-deterministic sequences ε_n. On the other hand, for convex sets, the concentration inequalities introduced in the previous section show that d_C(ḡ_n) concentrates around its expectation; thus ḡ_n converges almost surely to C as soon as ε_n goes (in expectation) to 0.

1.3.1 Deterministic approachability and procedures in law
As mentioned in Section 1.1.1, Blackwell's approachability strategy does not use the fact that the actions chosen by Nature are observed, as only the sequence of payoffs needs to be observed. In fact, it is not even required that the random variable g_n = g(a_n, b_n) be perfectly observed. Indeed, denote by γ_n the observation made after stage n, and assume it is equal to either g(x_n, b_n) or g(x_n, y_n), where x_n and y_n are the mixed actions of stage n (i.e., the laws of a_n and b_n). Blackwell's strategy applied to the sequence of γ_n ensures that the sequence of deterministic averages γ̄_n converges to E, uniformly with respect to Nature's strategy.
To conclude that this describes an approachability strategy, it remains to notice that d_E(ḡ_n) ≤ d_E(γ̄_n) + ‖ḡ_n − γ̄_n‖ and that the norm of ḡ_n − γ̄_n converges almost surely to zero, because it is an average of bounded martingale differences (classical concentration arguments yield rates of convergence independent of the strategies).
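The vanishing gap between realized and expected averages can be illustrated by a small simulation (not from the paper; the fixed mixed action and the scalar payoff matrix below are assumptions made for simplicity):

```python
# Illustrative simulation: the gap between the average realized payoff
# g(a_n, b_n) and the average expected payoff gamma_n = g(x_n, b_n) is an
# average of bounded martingale differences, hence vanishes at rate ~1/sqrt(n).
import random

rng = random.Random(1)
x = [0.25, 0.75]                 # mixed action of the player, held fixed here
g = [[0.0, 1.0], [1.0, 0.0]]     # scalar payoff g(a, b), for simplicity
n = 20000
realized, expected = 0.0, 0.0
for _ in range(n):
    b = rng.randrange(2)                         # Nature's move
    a = rng.choices([0, 1], weights=x)[0]        # realized action, of law x
    realized += g[a][b]
    expected += x[0] * g[0][b] + x[1] * g[1][b]  # gamma = g(x, b)
gap = abs(realized - expected) / n
print(gap)  # of order 1/sqrt(n): the two averages merge asymptotically
```

The same phenomenon is what allows Blackwell's strategy to be run on the observations γ_n only.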

1.3.2 Approachability in infinite dimensional spaces
We assume in this section that g no longer takes values in some Euclidean space. Formally, there exists a probability space (Ω, µ, F) such that, for every a ∈ A and b ∈ B, g(a, b) ∈ L²(Ω, µ, F); the mapping g is extended to ∆(A) × ∆(B) as before. The finite case can easily be embedded into this framework by defining Ω = {1, . . . , d} and µ = (1/d) Σ_{k=1}^d δ_k. In this context, the notion of approachability slightly differs, as uniform convergence with respect to Nature's strategy is not required:

Definition 1.5 A closed set E ⊂ L²(Ω, µ, F) is approachable by the player if he has a strategy σ ensuring that, no matter the strategy τ of Nature, ḡ_n converges µ-almost surely to E, for P_{σ,τ}-almost every history.
A set E is excludable by Nature if she can approach the complement of E δ for some δ > 0.
Lehrer [42] proved that the natural inner product of L²(Ω, µ, F) allows one to extend the definition of B-sets, and that Blackwell's characterization of approachable convex sets still holds (Equation (8) in the previous section).

Theorem 1.4 A closed convex set C ⊂ L²(Ω, µ, F) is approachable if and only if

∀ y ∈ ∆(B), ∃ x ∈ ∆(A), g(x, y) ∈ C.
The proof relies on the following geometric principle, adapted from Lehrer [42].
Lemma 1.8 Let C be a closed convex subset of L²(Ω, µ, F). If, for every n ∈ N, ḡ_n is bounded µ-as by some M ∈ L²(Ω, µ, F) and

⟨ḡ_n − Π_C(ḡ_n), g_{n+1} − Π_C(ḡ_n)⟩ ≤ 0,

then ḡ_n converges µ-as to C.
Proof: Let us denote f_n = ḡ_n − Π_C(ḡ_n). The finite-dimensional arguments of the proof of Corollary 1.1 imply that ‖f_n‖ ≤ 2‖M‖/√n, thus ḡ_n converges in probability to C.
The almost sure convergence is a consequence of the fact that ‖f_{n+1} − f_n‖ ≤ 4‖M‖/(n + 1), so f_n has small increments, and we conclude using the technical Lemma 5.4.
Convexity of C is only used to get a Lipschitzian projection.
Proof of Theorem 1.4: All the arguments behind the proof of Theorem 1.3 hold in L²(Ω, µ, F). Therefore, a closed convex set satisfying Blackwell's condition remains a B-set with respect to the natural inner product of L²(Ω, µ, F).
Assume that C is a B-set and consider Blackwell's strategy, denoted as usual by σ (and let τ be Nature's strategy). Let µ ⊗ P_{σ,τ} be the product measure on Ω × H_∞, on which we define the random variable g_n by g_n[ω, h] = g(a_n, b_n)[ω], where (a_n, b_n) is the pair of actions played at stage n according to h. Since A and B are finite, g_n and ḡ_n are uniformly bounded and the sequence ḡ_n satisfies the geometric principle.
As a consequence, g n converges µ ⊗ P σ,τ -as to C which is therefore approachable.

1.3.3 Approachability with infinite action spaces - non-linear approachability
It is also possible to generalize the previous results when the action spaces are not necessarily finite, but are two subsets X and Y of a given topological space. The payoff mapping g is now a function from X × Y into L²(Ω, µ, F). In particular, it is not required in this section that g be linear in each of its variables.
Theorem 1.5 Assume the following regularity assumptions on g: a) there exists M ∈ L²(Ω, µ, F) such that |g(x, y)| ≤ M, µ-as, for every (x, y) ∈ X × Y; b) for every y ∈ Y, G(y), the closure of {g(x, y), x ∈ X}, is a compact and convex set;
c) for every u ∈ L²(Ω, µ, F) such that sup_{c∈C} ⟨c, u⟩ < +∞, the zero-sum game with payoffs defined by ⟨u, g(x, y)⟩ has a value.
Then the following holds: i) Blackwell's characterization of approachable convex sets holds: C is approachable (in pure strategies) if and only if ∀ y ∈ Y, G(y) ∩ C ≠ ∅; ii) C is approachable if and only if, for every z ∈ L²(Ω, µ, F),

inf_{x∈X} sup_{y∈Y} ⟨g(x, y) − Π_C(z), z − Π_C(z)⟩ ≤ 0 ;

iii) if there exists y_0 such that G(y_0) ∩ C = ∅, then C is excludable by Nature.

Proof: The deterministic approachability strategy associated with Blackwell's characterization is defined as follows. Denote as before by ḡ_n ∈ L²(Ω, µ, F) the average payoff up to stage n. Since Π_C is the projection onto a convex set, one has

⟨c − Π_C(ḡ_n), ḡ_n − Π_C(ḡ_n)⟩ ≤ 0, for every c ∈ C.

Assumption c) ensures that the game with payoffs ⟨g(x, y) − Π_C(ḡ_n), ḡ_n − Π_C(ḡ_n)⟩ has a value which is, by Blackwell's characterization, less than or equal to 0. The approachability strategy consists in playing any x_{n+1} ∈ X that is a 2^{−n}-optimal strategy in the latter game, i.e., such that

sup_{y∈Y} ⟨g(x_{n+1}, y) − Π_C(ḡ_n), ḡ_n − Π_C(ḡ_n)⟩ ≤ 2^{−n}.

The fact that this describes an approachability strategy follows from the arguments used in the proof of Corollary 1.1 and the technical Lemma 5.4. Assume now that Blackwell's condition does not hold, i.e., there exists y_0 such that G(y_0) ∩ C = ∅; Nature, by playing repeatedly y_0, can ensure that ḡ_n belongs to G(y_0). The intersection between the closed convex set C and the compact convex set G(y_0) is empty, so they can be strictly separated. Since Nature can approach G(y_0), C is excludable, thus not approachable.
Assumption b) is required to get point iii). Second conditions of i) and ii) are sufficient for approachability (but not necessary).
When the action sets A and B are finite, the projected game with payoffs ⟨g(a, b), u⟩ typically does not have a value for some u ∈ L²(Ω, µ, F); this is why we considered instead mixed actions and strategies. This can be generalized when the action spaces are two measurable sets (A, A) and (B, B), using the same tools as for procedures in law, see Section 1.3.1.
Denote by X and Y the sets of probability distributions on (A, A) and (B, B), endowed with the weak-⋆ topology; the mapping g is extended to X × Y multi-linearly as usual. Then, under mild assumptions (for example if A and B are compact and g is continuous, see e.g. Sorin [70]), the projected game with payoffs ⟨g(x, y), u⟩ has a value (at least for every u such that sup_{c∈C} ⟨c, u⟩ < +∞). So C is approachable with respect to the action sets X and Y. In particular, there exists an approachability strategy such that the averages of observed payoffs γ_n = g(x_n, b_n), where x_n ∈ X is the action dictated to be played at stage n, converge to C, and the rate of convergence is O(1/√n).
Similarly to Section 1.3.1, this is an approachability strategy for C since g n − γ n is again an average of bounded martingale differences, and concentration inequalities for sums of bounded martingale differences in any Hilbert space, see e.g. Chen & White [15], imply that, in expectation and with high probability, ‖g n − γ n ‖ ≤ O(1/√n). Almost sure convergence is again a consequence of Lemma 5.4.
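The O(1/√n) decay of an average of bounded martingale differences can be checked numerically. The sketch below is a toy illustration (i.i.d. ±1 increments, the simplest centered bounded martingale differences), not the Hilbert-space inequality of Chen & White:

```python
import random

def avg_martingale_gap(n, seed=0):
    # Simplest instance of the deviation between the average payoff and
    # its conditional expectation: an average of n bounded, centered
    # (here i.i.d. +/-1) martingale differences.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        total += rng.choice((-1.0, 1.0))
    return abs(total) / n
```

By Azuma-Hoeffding, this quantity is of order 1/√n; multiplying n by 100 should roughly divide the deviation by 10.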

Approachability with activation
This section is concerned with the case where only a subset of the coordinates of the payoff vector (belonging to L 2 (Ω, µ, F)) are active at each stage. Formally, there exists a mapping X : H → L 2 (Ω, µ, F) such that, after any finite history h n = (a 1 , b 1 , . . . , a n , b n ), X [h n ] ∈ L 2 (Ω, µ, F) takes values in {0, 1} and only the coordinates ω ∈ Ω with X [h n ](ω) = 1 are active. In particular, whether a coordinate is active at a given stage may depend on the actions chosen at that very stage. We also assume that the cumulated activations increase µ-almost surely to infinity, no matter the pair of strategies.
In this framework, we denote the tilted averages of payoffs by (with the convention that 0/0 = 0).
A set E ⊂ L 2 (Ω, µ, F) is approachable if the player has a strategy σ such that, for any strategy τ of Nature, the sequence ‖g X ,n − Π E (g X ,n )‖ converges to zero µ-almost surely, for P σ,τ -almost all infinite histories.
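The displayed formula for the tilted average is elided above; one standard way to write it, consistent with the convention 0/0 = 0, is the coordinate-wise ratio of activated payoffs to activation counts. The sketch below is an illustrative reconstruction (the function and argument names are ours, not the text's):

```python
def tilted_average(payoffs, active):
    """Coordinate-wise activated average with the convention 0/0 = 0.

    payoffs: list of payoff vectors g_m, one per stage;
    active:  list of 0/1 activation vectors X[h_m] of the same shape.
    A coordinate that was never active gets the value 0.
    """
    d = len(payoffs[0])
    num = [0.0] * d   # sum of activated payoffs per coordinate
    den = [0.0] * d   # number of activations per coordinate
    for g, x in zip(payoffs, active):
        for k in range(d):
            num[k] += x[k] * g[k]
            den[k] += x[k]
    return [num[k] / den[k] if den[k] > 0 else 0.0 for k in range(d)]
```

For instance, with two stages where only the first coordinate is ever active, the second coordinate of the tilted average is 0 by convention.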
We will only focus on product sets, that can be described by where Ω 0 and Ω 1 are two measurable subsets of Ω and f 0 , f 1 ∈ L 2 (Ω, µ, F). The following theorem shows that, in this specific framework, a notion of tilted B-set is sufficient for approachability. Theorem 1.6 Let C ⊂ L 2 (Ω, µ, F) be a product set. Then any strategy σ such that, for any strategy τ of Nature and for P σ,τ -almost every infinite history, where x n+1 = σ(h n ) and y n+1 = τ (h n ), is an approachability strategy of C.
The proof is similar to the one of Theorem 1.4, except that Lemma 5.5 is used instead of Lemma 1.8, so it is omitted.
The next proposition shows that approachability with activation of a product set C = ∏ d k=1 C k ⊂ R d in Euclidean spaces can actually be reduced to usual approachability. The only condition is that the activation at stage n depends only on the current actions (i.e., X [h n ] = X (a n , b n ), where X (a, b) might be a random variable); we also assume, without loss of generality, that the origin belongs to C and even that Proposition 1.9 A product set C ⊂ R d is approachable with activation depending only on the current actions if and only if the following convex set (z k /ω k ) k∈{1,...,d} ∈ C with the convention that 0/0 = 0 is approachable in the game with payoffs defined by Moreover, there exists a strategy such that, in expectation, Proof: Consider any fixed (z, ω) ∈ R d × R d ; we can always assume that every coordinate of ω is nonzero. Indeed, since C is a product set, Define (z e , ω e ) ∈ Π C z, ω and ω the smallest coordinate of ω.
As a consequence, Conversely, Finally, if C is a product set containing 0, then C is a convex cone. The result is a consequence of Blackwell's characterization of approachable sets.
Assuming that the origin belongs to the product set C is of course not restrictive: one can always translate any point to the origin. Moreover, in some cases, the product-set property can be relaxed. For instance, if there exist two coordinates ℓ and ℓ ′ that are always active together, i.e., if X (a, b) ℓ = X (a, b) ℓ ′ for every pair (a, b), then the result holds if C := ∏ k∉{ℓ,ℓ ′ } C k × C ℓ,ℓ ′ , where the convex set C ℓ,ℓ ′ ⊂ R 2 does not need to be a product set.

Variable stage duration
Cesàro averages of payoffs are considered in the usual definition of approachability. In this section, we drop the implicit assumption that all stages have the same weight (when computing averages) or, equivalently, the same duration: payoffs obtained on long stages must carry more weight than those obtained on short ones. We distinguish two classes of variable and random stage durations: whether or not they depend on the chosen actions.
Assume for the moment that ω n , the possibly random length (or weight) of the n-th stage, is independent of the actions chosen by the player and Nature. In this context, σ is an approachability strategy of a closed set E if g ω,n := ∑ n m=1 ω m g m / ∑ n m=1 ω m converges to E, P σ,τ -almost surely, uniformly with respect to the strategy τ of Nature. It will be convenient to define Ω n = ∑ n m=1 ω m . Proposition 1.10 Let E ⊂ R d be a closed B-set. Then Blackwell's strategy applied to the sequence of weighted averages g ω,n ensures that for every n ∈ N and η > 0 The proof is identical to the one for Cesàro averages (when ω n = 1 for every n ∈ N) and is thus omitted. In particular, for polynomial weights, i.e. if ω n = n α with α > −1, a B-set is approachable at the rate of convergence O(1/√n), which is independent of α; only the constant depends on α, see e.g. Mannor, Perchet & Stoltz [52].
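The claim that the rate does not depend on α can be checked numerically in the simplest setting. The sketch below is a toy illustration with i.i.d. ±1 payoffs and target set E = {0}, not the game-theoretic statement itself:

```python
import random

def weighted_average_gap(n, alpha, seed=1):
    # Weighted average  sum omega_m g_m / sum omega_m  with polynomial
    # weights omega_m = m**alpha and i.i.d. +/-1 payoffs g_m, whose
    # distance to the target set E = {0} is just its absolute value.
    rng = random.Random(seed)
    num = den = 0.0
    for m in range(1, n + 1):
        w = m ** alpha
        num += w * rng.choice((-1.0, 1.0))
        den += w
    return abs(num) / den
```

For any α > −1 the deviation is of order 1/√n; varying α mostly changes the constant, as the text asserts.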
In fact, as we shall see in Section 1.4.1, a B-set is approachable as soon as the usual Robbins-Monro assumptions are satisfied almost surely: We now turn to the case where a stage length might depend on the actions of the player and Nature. For simplicity, we assume that there exists a mapping ω : A × B → [ω, ω] ⊂ (0, 1] such that ω n := ω(a n , b n ). Approachability in this framework can be reduced to regular approachability, similarly to what has been done with activation. Proposition 1.11 A closed set E ⊂ R d is approachable with respect to weighted averages if and only if the following cone E is approachable with Cesàro averages Moreover, if E is convex then E is also convex, thus E is approachable with respect to weighted averages if and only if As before, one has conversely, or even with an exponential decay (since {0} is convex, see Section 1.2.2). Finally, the approachability bound (in expectation) matches the optimal bound in the law of large numbers and is thus, in some sense, optimal. Indeed, if X n is an i.i.d. sequence such that X n = ±1 with probability 1/2, then by denoting E = {0}, one has

Bounded memory
Blackwell's approachability strategy does not require knowing at each stage the whole sequence of past payoffs, but only the current average. Nonetheless, to update this average, either the stage number or the complete history must be kept in memory, which of course requires an ever-increasing amount of memory. This is why the question arises of whether it is possible to approach a closed set E using simpler strategies, for example strategies implementable by a finite automaton or with a finite memory.
A strategy σ has a bounded memory of size M ∈ N if, for every finite history h n ∈ H n , σ(h n ) depends only on a n−M +1 , b n−M +1 , . . . , a n , b n , i.e. the last M profiles of actions played. Lehrer & Solan [44,46] proved that an approachable convex set C remains approachable by a player restricted to strategies with a bounded memory of size M ∈ N; more precisely, the average payoff converges to some O(1/√M )-neighborhood of C. The basic idea is relatively natural: play Blackwell's strategy on a block of size M , then erase the memory and start over. It is only necessary to encode the beginning (and the end) of a block, but this can be done using √M stages, for example by always playing the same action and by ensuring that no such sequence appears within the block. The average payoff on each block will be 1/√M -close to C, which is convex, hence the overall average payoff is also within 1/√M of C.
On the other hand, Zapechelnyuk [82] considered the bounded-memory strategy directly adapted from Blackwell's, defined by σ(h n ) = x(g M n ), where x(·) is given by the definition of a B-set and g M n is the average payoff over the last M stages. Consider this strategy in the game where the payoffs of the player (who chooses a row) are given by the following matrix: For M big enough, there exists a strategy of Nature such that the sequence (g M n ) n∈N enters a cycle (of length either 2M or 2M + 2). Roughly speaking, the latter consists of four successive blocks of lengths M/2 (or M/2 + 1) such that, within a block, the same pair of actions is played (except on at most one stage); one can determine the order in which these pairs are played, starting with (T, L). At the end of the blocks (B, R) and (T, L), g M n is close to (−1/2, 1/2) and (1/2, −1/2) respectively. So it is at a distance of roughly 1/2 from C, and the sequence (g M n ) n∈N of average payoffs over the M last stages does not converge to C.
However, nothing indicates whether the overall sequence g n converges to C or not (it does in this example).

Approachability in continuous time
Benaïm, Hofbauer & Sorin [7] noticed that Blackwell's approachability strategy of a B-set E satisfies the following recurrence relation: conditionally on h n , Therefore, the sequence of average payoffs {g n } n∈N is a Discrete Stochastic Approximation (a DSA for short) of g, the solution of the associated ordinary differential inclusion The derivative of the mapping δ(t) = d 2 E (g(t)) satisfies δ ′ (t) ≤ −2δ(t)/t, thus δ is a Lyapunov function and δ(t) ≤ δ(0)t −2 . As a consequence, g converges to E and, as a DSA, the sequence {g n } n∈N converges a.s. to E. However, rates of convergence of DSAs are usually not explicit and might not be uniform.
To circumvent this issue, one might consider procedures in law, as defined in Section 1.3.1, which are deterministic and thus can be represented as an Euler scheme of the associated ordinary differential inclusion. They might provide explicit rates, as the difference between the average payoff and its expectation converges to zero and is controlled by concentration inequalities (see Sorin [72] or Kwon [40]).
As Soulaimani, Quincampoix & Sorin [2] have considered an auxiliary differential game D where the control spaces of the player and Nature are respectively X = ∆(A) and Y = ∆(B), and the game dynamics is given by: The intuition is that g(t) = (1/t) ∫ t 0 g(x(s), y(s)) ds is the average payoff at time t. The change of variables t = e s and g(s) = g(e s ) transforms the dynamics into This transformation proves the characterization of a B-set given in Equation (4). Indeed, a set E is approachable if the player can force the dynamics to stay within E. Therefore, a closed set E is a B-set if and only if it is a discriminating domain for the player with respect to the dynamics f , i.e. if

Information-based strategies
Blackwell's strategy is a payoff-based strategy, as the relevant running state variable is the sequence of average payoffs. We develop in this section a conceptually completely different kind of strategy, based on the sequence of observed action profiles, as in Perchet & Quincampoix [63] or Mannor, Perchet & Stoltz [51]. The basic idea follows from the following simple fact. Define θ n = δ an,bn ∈ ∆(A × B) as the Dirac mass on (a n , b n ) ∈ A × B and let θ n = ∑ n m=1 θ m /n be their average. By definition, g n = E θ n [g(a, b)] belongs to E if and only if θ n belongs to the following set If E is closed and convex, then Ẽ (seen as a subset of R A×B ) is also closed and convex; it remains to compare the distances to E and to Ẽ. Lemma 1.12 There exists γ > 0 such that, for any probability measure θ ∈ ∆(A × B) and any set E This gives the second inequality. For the first inequality, notice that g : ∆(A × B) ⊂ R A×B → co{g(a, b)} is a linear mapping, so its inverse g −1 is piecewise linear, thus Lipschitz, see e.g. Billera & Sturmfels [8], bottom of page 530, or Walkup & Wets [81]. As a consequence, there exists λ > 0 such that for every z, z ′ ∈ co{g(a, b)} and any point θ such that g(θ) = z, there exists and one just has to take γ = 1/λ.
The consequence of this lemma is that an approachability strategy for Ẽ is an approachability strategy for E (and conversely); apart from the requirement to compute Ẽ, only the constants in the rates of convergence deteriorate.
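The identity underlying this reduction, namely that the average payoff is the expectation of g under the empirical distribution of action profiles, is easy to check numerically. A minimal sketch (the function names are ours, chosen for illustration):

```python
from collections import Counter

def payoff_from_profiles(history, g):
    """Average payoff recovered from the empirical distribution of
    action profiles: the expectation of g under that distribution.

    history: list of profiles (a_m, b_m) played up to stage n;
    g:       maps a profile to a payoff vector.
    """
    n = len(history)
    theta = Counter(history)              # empirical counts of each profile
    d = len(g(history[0]))
    out = [0.0] * d
    for profile, count in theta.items():
        gv = g(profile)
        for k in range(d):
            out[k] += (count / n) * gv[k] # weight by empirical frequency
    return out
```

The result coincides with the plain stage-by-stage average of the payoffs, which is the point of the payoff-to-profile transformation.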
The main advantage of this new kind of algorithm is that it does not rely on the observed sequences of payoffs. Consider for example the case where payoffs are not vectors in some Euclidean space but in some arbitrary normed space (payoffs can even be subsets of this space). If the image space is not Hilbertian, then Blackwell's proofs no longer hold; on the other hand, the transformation of sequences of payoffs into sequences of action profiles remains valid. Therefore, we get the following very general version of the characterization of approachable convex sets. Theorem 1.7 Let (H, N (·)) be any normed space (not necessarily Hilbertian) and g : ∆(A) × ∆(B) → H (or g : A × B → H if A and B are some compact convex sets) any continuous bilinear mapping. Then Blackwell's characterization of approachable convex sets holds: The result is already proved if A and B are finite. If they are compact convex sets and g is continuous, then one can discretize them to get ε-approachability strategies. Since C is convex, these can be concatenated into an approachability strategy (using the doubling trick).
From the point of view of computational geometry, this result is rather intuitive. Indeed, no matter the image space, co{g(a, b); a ∈ A, b ∈ B} is a polytope with at most AB vertices which lies in an affine subspace of finite dimension at most AB − 1. Up to a renormalization, this gives Theorem 1.7. However, in the case where H = L 2 (Ω, µ, F), this does not directly imply the previous results, as the approachability is only in probability and not µ-almost sure.

Potential-based and uniform-norm approachability
Approachability was first defined with respect to the ℓ 2 distance. Roughly speaking, this induces a repeated game (see also the next subsection) between the player and Nature where the former minimizes the distance to the set E and Nature maximizes it. This can be generalized to a wider class of mappings Φ : R d → R, called potentials, that are twice continuously differentiable (although this condition can be substantially weakened).
An illustration of the interest of potential-based approachability is given in Corollary 1.16 below. It yields faster rates of convergence when distances to sets are measured with respect to the uniform norm ‖·‖ ∞ instead of the Euclidean norm ‖·‖ 2 .
Let us denote by δ the minimum level of Φ that the player can guarantee in expectation if he plays second, i.e.
Theorem 1.8 Assume that, for every z outside E δ , the gradient ∇Φ(z) points sufficiently towards z, i.e., there exists β > 0 such that Then, no matter the strategy of Nature, choosing x n+1 = x(g n ) yields, in expectation, where κ Φ is a constant depending only on Φ.
If β = 0 but the inequality is strict in (12), then uniform convergence still holds yet at a non-explicit rate.
Proof: First, notice that we can focus on the case where δ = 0. The proof follows Hart & Mas-Colell [32] and Sorin [71] (see also Cesa-Bianchi & Lugosi [13,14]) and is based on a Taylor expansion of Φ. Indeed, since g n+1 = g n + (g n+1 − g n )/(n + 1) and Φ is C 2 , there exists some ξ n ∈ [g n+1 , g n ] such that where ∇Φ and D 2 Φ are respectively the gradient and the Hessian of Φ; since the latter is continuous and every g n belongs to the same compact set, there exists and the result follows by simple induction when β ≥ 1. When 0 < β < 1, the bound is a consequence of the fact that The proof is a bit more intricate for β = 0 (along with a strict inequality in (12)), but we can use the fact that g n is a DSA of the following differential inclusion therefore g converges to C δ and so does g n .
If C and Φ are convex, then Equation (12) holds for some convex, twice continuously differentiable mapping Φ whose Hessian is bounded in norm by κ Φ on co{g(a, b)}, and there exists a strategy such that, in expectation and no matter the strategy of Nature, The assumption that Φ is twice continuously differentiable can easily be weakened, in particular as soon as the constant κ Φ exists. The next proposition is concerned with the sequence of sums of payoffs G n = ∑ n m=1 g m instead of averages. It will be used, in some cases, to improve rates of convergence.
Proof: This is a consequence of the fact that, for some ξ n ∈ [G n , G n+1 ], followed by an immediate induction.
This result can immediately be extended if Φ is not C 2 but such that As mentioned before, the following corollary shows a faster convergence when C is an approachable cone. Proposition 1.14 is also used below, in a deeper way, to obtain approachability with respect to the uniform norm with optimal rates of convergence (both in the number of stages and in the dimension).

Corollary 1.15
If C is an approachable closed convex cone, then Blackwell's strategy ensures that, no matter Nature's strategy and for every N ∈ N Proof: First, if C is a cone, then necessarily ⟨z − Π C (z), Π C (z)⟩ ≤ 0, therefore the first condition of Proposition 1.14 is exactly the characterization of the fact that C is a B-set. Second, if Φ(·) = d 2 C (·), then Φ satisfies the second condition of Proposition 1.14 (or at least its straightforward extension) with κ Φ = ‖g‖ 2 ∞ . Since C is a cone, d 2 C (g n ) = Φ(G n+1 )/n 2 and the result follows.
For simplicity, we will assume that ‖g(a, b)‖ ∞ ≤ 1 and we only consider target sets such that, for some b k , c k ≤ 1, Corollary 1.16 There exists a strategy σ of the player such that, against any strategy τ of Nature, for every n ∈ N and δ > 0, with probability at least 1 − δ, Proof: We first prove a similar result in the specific case where C = R d − is the negative orthant and when a horizon N is known in advance. Then we use a doubling trick to conclude for the orthant; finally we show how to reduce the approachability of any product set C to this case.
Let Φ be the following potential, depending on a parameter η > 0 to be fixed later: where diag(λ i ) is the matrix whose diagonal is λ 1 , . . . , λ d and which is zero everywhere else. As a consequence, since C is approachable, the first condition of Proposition 1.14 is satisfied, and Proposition 1.14, together with the choice η = √(log(d)/N ), implies that We now appeal to the doubling trick, that is, we consider the strategy consisting in playing by blocks of length 2 k , following the potential associated with η k := √(log(d)/2 k ) on the k-th block and resetting everything at the beginning of a new block. A simple induction, based on the convexity of d ∞ C (·), shows that, at the end of any block, Hence it remains to control distances within blocks. Yet, using the previous bound obtained for the ends of blocks, one has for n = 2 Concentration arguments give the bound with high probability. Indeed, the union bound implies that the probability that ‖g n − E[g n ]‖ ∞ is smaller than √((2/n) log(2d/δ)) is at least 1 − δ. The result for the orthant is then a direct consequence of the triangle inequality.
We no longer assume that C is an orthant; it is now defined by and g(x, y) ∈ C if and only if h(x, y) ∈ R 2d − . The result then follows from the bound exhibited for the orthant.
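The potential used in the proof above is, up to tuning, the log-sum-exp surrogate for the maximum of the coordinates. A minimal sketch, assuming Φ_η(z) = (1/η) log Σ_k exp(η z_k), which sandwiches max_k z_k within an additive log(d)/η; its gradient is precisely the exponential weights on the coordinates:

```python
import math

def exp_potential(z, eta):
    # Phi_eta(z) = (1/eta) * log( sum_k exp(eta * z_k) ), a smooth
    # surrogate for max_k z_k, used for uniform-norm approachability.
    m = max(z)                       # stabilized log-sum-exp
    return m + math.log(sum(math.exp(eta * (zk - m)) for zk in z)) / eta

def softmax_gradient(z, eta):
    # Gradient of Phi_eta at z: exponential weights on the coordinates.
    m = max(z)
    w = [math.exp(eta * (zk - m)) for zk in z]
    s = sum(w)
    return [wk / s for wk in w]
```

The inequality max_k z_k ≤ Φ_η(z) ≤ max_k z_k + log(d)/η explains the choice η ≈ √(log(d)/N): it balances the log(d)/η approximation error against the η-dependent second-order term.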

From weak approachability to approachability
Recall that a closed set E is approachable if the player has a strategy such that, after some (possibly large) stage N , the average payoff remains in a small neighborhood of E. Similarly, it is excludable if Nature can enforce the dual property: after some stage N , the average payoff remains outside some neighborhood of E. Blackwell proved a dichotomy for convex sets: they are either approachable or excludable. This is not true for arbitrary sets, as illustrated by the following example, due to Blackwell.
Consider the set and payoff matrix defined by, with A = {T, B} and B = {L, R}, Assume that the strategy of the player dictates to play T during N stages (with N a large even number) and then to play either always T or always B during the following N stages, depending on whether Nature has played R more than half of the time during the first N stages.
In the former case, the player gets, after N stages, an average payoff of (1, y) with y ≥ 1/2; thus, by continuing to play T for N more stages, he ensures that his average payoff after 2N stages is (1, y ′ ) with y ′ ≥ y/2 ≥ 1/4. In the latter case, the payoff after N stages is (1, y) with y ≤ 1/2, thus the payoff after 2N stages is (1/2, y ′ ) with y ′ = y/2 ≤ 1/4.
As a consequence, this strategy guarantees that, after 2N stages, the payoff is exactly in E. So if this procedure is applied during 2N 1 stages, then started over for 2N 2 = 2e N 1 stages, then again for 2N 3 = 2e N 2 stages and so on, the payoff is infinitely often arbitrarily close to E, which is therefore not excludable.
Unfortunately, E is not approachable; indeed, this would imply that at least one of the two connected (and convex) components of E is approachable, but neither of them satisfies Blackwell's characterization.
In this example, the player cannot force the payoff to remain close to E, but if he knows in advance that the game lasts only N stages, then he can ensure that, at the terminal stage, the payoff is in E (or at least, for odd integers, 1/N -close to E). A natural weaker concept of approachability emerges: a set E ⊂ R d is weakly approachable if, given some fixed large length of the game, the player has a strategy such that the terminal average payoff is close to E. Definition 1.6 A closed set E ⊂ R d is weakly approachable if for every ε > 0, there exists N ε ∈ N such that, in any game of length n ≥ N ε , the player has a strategy σ n such that, no matter the strategy τ of Nature, Similarly, E is weakly excludable if Nature can weakly approach the complement of E δ for some δ > 0.
We emphasize that in weak approachability, strategies may depend on the length n of the game, which is not allowed for regular approachability. The question raised by Blackwell [9] and solved by Vieille [77] is whether there is a dichotomy between weakly-approachable and weakly-excludable sets. Theorem 1.9 Any closed set is either weakly-approachable or weakly-excludable.
Proof: We only sketch here the proof of Vieille [77].
Consider the differential zero-sum game where the player and Nature choose actions x(t) ∈ ∆(A) and y(t) ∈ ∆(B) in continuous time (actually, even the formal definition of strategies requires precise notations and concepts). In this game, a state variable representing the accumulated payoff evolves according to the dynamics Ġ(t) = g(x(t), y(t)) with G(0) = 0, between times t = 0 and t = 1.
In this game, the overall objective of the player is to minimize the terminal payoff d E (G(1)), while Nature maximizes it. The important fact is that one can prove, using techniques and results from differential games, that this game has a value v. If v = 0, then the player has a strategy such that the cumulated payoff at time t = 1 is exactly in E whereas if v > 0, Nature has a strategy such that this cumulated payoff is bounded away from E.
It remains to understand that a game in discrete time with N stages is a discretization (or an approximation) of this differential game and as N goes to infinity, this approximation is more and more precise. Therefore, if the player can enforce that G(1) belongs to E, then he can ensure that g N is arbitrarily close to E when N is large enough. The converse holds for Nature, hence the result.
Actually, the focus of this section is not only this important (and elegant) result but also the following properties, inspired by Cesa-Bianchi & Lugosi [14] and Rakhlin, Sridharan & Tewari [66]. Given an approachable convex set C, let σ N be an optimal strategy in the N -stage zero-sum game with terminal payoff E σ,τ [d C (g N )] and denote by v N the value of this game (its existence is not difficult to establish).
We know that E σ,τ [d C (g n )] can be upper bounded, using some adequate approachability strategy, by O(‖g‖ ∞ /√n); but it is also obviously lower bounded by v n . So the computation of v n could indicate whether this rate is tight or not. On the other hand, the exact computation of v n might be challenging, yet it satisfies where σ(τ ) is the strategy that chooses, given τ and after the finite history h n , where the supremum is taken over all sequences The last inequality is a consequence of the Hoeffding-Azuma inequality in Euclidean spaces.
A question that naturally arises is whether we can concatenate, using the doubling trick, optimal strategies in games of length 2 k to construct an approachability strategy of C (i.e., one independent of any horizon n). The answer is both no and yes: no with the current definition of v n . Indeed, the only guarantee is that the terminal payoff is v 2 k -close to C but, for instance, the payoff at intermediate stages could be arbitrarily far away.
On the other hand, since C is a convex set, we can modify the definition of v n as follows so that the answer is yes. Define so that, using the same arguments and Doob's (or Hoeffding's) maximal inequality Finally, the doubling trick works with this definition of v ′ n , see the proof of Corollary 1.16.
This technique seems void at first sight, but it can be useful in some specific examples, as in Proposition 4.2 in Section 4 (see also Remark 4.2). In that case, because of the geometry of C, one has d C (z) ≤ 2‖z − c‖ ∞ for every c ∈ C. Then the same tools yield that v n is smaller than O(√(log(d)/n)), which is negligible compared to ‖g‖ 2 /√n ≃ √(d/n) as the dimension d increases.

Regret minimization
Hannan [30] introduced the concept of external regret in repeated two-player games (between a player and Nature, with scalar payoffs) in order to define an exogenous criterion for evaluating a strategy in a non-Bayesian framework. In words, the player has no external regret (or his strategy is externally consistent) if, asymptotically, he could not have gained strictly more had he known, before the beginning of the game, the empirical distribution of moves of Nature. This notion has notably been refined by Foster & Vohra [23] (see also Fudenberg & Levine [28]) into internal regret: a player has no internal regret (or his strategy is internally consistent) if he has no external regret on the set of stages where he played a specific action, as soon as this set is big enough.

External regret
Choices of actions a n ∈ A and b n ∈ B generate a regret r n ∈ R A defined by Intuitively, the regret r n represents the differences between what the player could have got and what he actually got. A player has no external regret if, asymptotically, every component of the average regret is nonpositive. In words, this means that the player cannot think "had I known [the empirical distribution of Nature's actions], I would have always played action a * ", hence the terminology of regret. Indeed, by linearity of ρ, Given a vector U ∈ R d , the notation U + will stand for the positive part of U , i.e., U + = (max{U i , 0}) 1≤i≤d . Similarly, U − is the negative part of U .
Definition 2.1 A strategy σ of the player has no external regret if, for every strategy τ of Nature, P σ,τ -almost surely, The existence of externally consistent strategies goes back to Hannan [30]; however, the following theorem, with rates of convergence independent of Nature's strategy, is due to Cesa-Bianchi & Lugosi [14].
Theorem 2.1 There exists an externally consistent strategy σ such that, no matter the strategy τ of Nature and for every n ∈ N, We will not yet provide a proof of this result; instead, we show a weaker one, following Zinkevich [83]. The basic idea is to notice that the overall objective is to maximize the linear (hence concave) function ρ(·, b n ) and therefore to apply standard convex-optimization techniques, for example a gradient descent.
Proof: First, we claim that for every n ∈ N, there exists a strategy σ n (depending on n) such that Let η be a parameter to be fixed later and define, for every m ≤ n, the strategy σ following a usual gradient descent: the projection step ensures that x m+1 stays in ∆(A). Simple computations show that, for every a ∈ A, Balancing the two terms by choosing η = 1/√(nA) proves the claim. We stress that this strategy ensures that, at any stage t ≤ n, the regret is bounded as which might be considerably bigger than √(tA) for small t, but this uniform guarantee allows the use of a doubling trick, as in Corollary 1.16, to conclude.
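The gradient step with projection onto ∆(A) can be sketched as follows. This is a toy implementation under the assumption of linear payoffs x ↦ ⟨x, U_m⟩ with U_m ∈ [0, 1]^A; the sort-based Euclidean projection onto the simplex is a standard routine, not taken from the text:

```python
import random

def project_simplex(v):
    # Euclidean projection of v onto the probability simplex.
    u = sorted(v, reverse=True)
    css, lam = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        css += uj
        t = (1.0 - css) / j
        if uj + t > 0:
            lam = t                       # last feasible threshold
    return [max(vi + lam, 0.0) for vi in v]

def ogd_regret(payoff_vectors, eta):
    # Projected gradient ascent on the linear payoffs x -> <x, U_m>,
    # keeping the mixed action x_m in Delta(A) (Zinkevich-style).
    A = len(payoff_vectors[0])
    x = [1.0 / A] * A                     # uniform initial mixed action
    gained = 0.0
    best = [0.0] * A                      # cumulated payoff of each pure action
    for U in payoff_vectors:
        gained += sum(xi * ui for xi, ui in zip(x, U))
        best = [b + u for b, u in zip(best, U)]
        x = project_simplex([xi + eta * ui for xi, ui in zip(x, U)])
    return max(best) - gained             # external regret vs. best action
```

With the tuning η = 1/√(nA) from the proof, the classical online-gradient-descent analysis bounds the regret by a constant times √(nA).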
To get the log(A) term instead of A in the upper bound, one just has to follow the algorithm known as the exponential weights algorithm, defined by: where η n = √(8 log(A)/n), see, e.g., Littlestone & Warmuth [47], Vovk [79] or Auer, Cesa-Bianchi & Gentile [3].
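A minimal implementation of the exponential weights algorithm, under the assumption of payoffs in [0, 1] and the horizon-tuned rate η = √(8 log(A)/n); the classical analysis then guarantees an external regret of at most √((n/2) log A):

```python
import math
import random

def exponential_weights(payoff_vectors):
    # Exponential weights with horizon-tuned rate eta = sqrt(8 ln(A)/n);
    # payoffs are assumed to lie in [0, 1].
    n, A = len(payoff_vectors), len(payoff_vectors[0])
    eta = math.sqrt(8.0 * math.log(A) / n)
    cum = [0.0] * A                       # cumulated payoff of each action
    gained = 0.0
    for U in payoff_vectors:
        m = max(cum)                      # stabilize the exponentials
        w = [math.exp(eta * (c - m)) for c in cum]
        s = sum(w)
        x = [wk / s for wk in w]          # mixed action at this stage
        gained += sum(xi * ui for xi, ui in zip(x, U))
        cum = [c + u for c, u in zip(cum, U)]
    return max(cum) - gained              # external regret
```

Note the √(log A) dependence on the number of actions, to be compared with the √A factor of the gradient-descent bound above.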
The following corollary shows that the previous result extends to the compact case. The proof is exactly the same, since it did not use the fact that B is finite, and is therefore omitted.
Corollary 2.1 Assume that Nature chooses at every stage an outcome vector U n in a compact set U ⊂ [0, 1] A and that the player's payoff at this stage is U an n . Then there exists a strategy σ such that, no matter the strategy τ of Nature and for every n ∈ N, Theorem 2.1 and Corollary 2.1 can actually be proved using more sophisticated optimization procedures, such as mirror descent instead of gradient descent (see e.g. Rakhlin [64] or Bubeck [12] for a survey on the use of these techniques in machine learning), and without resorting to the doubling trick. We will, on the contrary, prove them using approachability theory.
Remark 2.1 In Section 1.4.4, we claimed that we could not use a doubling trick. It was possible here because any strategy σ n , although only optimal at the final stage n, ensures relatively good performance at all stages. For instance, at the specific stage t = n/2, the regret is bounded by (3/(2√2)) √(A/t) ≃ 1.06 √(A/t). This was not the case in the previous section, where the distance to the set could be of the order of a constant.
More specifically, the strategies σ n are somehow akin to weak approachability (only the final stage matters). If we could always concatenate strategies using a doubling trick to obtain a strategy that behaves well at all stages, then we could construct approachability strategies from weak-approachability strategies. This would mean that any set is either approachable or excludable, which is not true in general (in fact, as proved in Section 4.1, regret corresponds rather to the approachability of convex sets, on which weak and regular approachability coincide).
A usual criticism of the notion of regret in games (and this could lead to long and probably unfruitful debates) is that the player compares his payoff with the payoff he would have got had he always played the pure action a * . However, if he had played something else, then Nature would (or at least could) have chosen a totally different sequence b n , so the comparison is meaningless. An easy and unsatisfactory answer is to say that the player's actions do not change the behavior of Nature (as in the learning-with-expert-advice literature, see Cesa-Bianchi & Lugosi [14]). A less unsatisfactory answer consists in observing that, since there is absolutely no prior on Nature, it is impossible to infer anything about her strategy had the world been different. So we should compare the payoff with respect to the best information available, which is the observed sequence.
Let us develop a third point of view, based on game-theoretic considerations. The basic idea is that regret is not a criterion for comparing different strategies: it does not say that a strategy without regret is better than always playing a * . In our repeated game, the player maximizes his cumulated payoff without any structural assumption on Nature. Therefore, he can only sequentially formulate predictions about her behavior (we purposely remain vague on this subject) and play a best response to them. Regret is simply a measure of how correct a sequence of predictions is; a large regret means that the player was wrong most of the time.

Internal and Φ-regret
The notion of external regret has been refined by Foster & Vohra [23] into the so-called internal regret. In words, a player has no internal regret (or his strategy is internally consistent) if he has no external regret on the set of stages on which he chose a specific given action.
Formally, choices of actions $a_n \in A$ and $b_n \in B$ generate, besides the external regret $r_n$, an internal regret $R_n$, which is an $A \times A$ matrix whose rows are all null except the $a_n$-th one, equal to $r_n$; stated otherwise,
$$R_n^{a,a'} = R(a_n, b_n)^{a,a'} := \begin{cases} \rho(a', b_n) - \rho(a_n, b_n) & \text{if } a = a_n,\\ 0 & \text{otherwise.}\end{cases}$$
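As a small illustration, the matrix $R(a_n, b_n)$ can be computed in a few lines (a sketch in Python; the payoff values and action indices below are arbitrary assumptions, not taken from the text):

```python
import numpy as np

def internal_regret_matrix(rho_row, a_n):
    """Instantaneous internal regret R(a_n, b_n): an A x A matrix whose rows
    are all zero except the a_n-th, whose entries are rho(a', b_n) - rho(a_n, b_n)."""
    A = len(rho_row)
    R = np.zeros((A, A))
    R[a_n, :] = rho_row - rho_row[a_n]   # rho_row[a'] stands for rho(a', b_n)
    return R

# Hypothetical payoffs rho(., b_n) for three actions, when Nature plays b_n
rho_row = np.array([0.2, 0.7, 0.5])
R = internal_regret_matrix(rho_row, a_n=0)
# Summing over the row index recovers the external regret vector,
# with entries rho(a', b_n) - rho(a_n, b_n)
r = R.sum(axis=0)
```

Averaging such matrices over stages gives the average internal regret discussed below.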
Let us introduce here some notation. Given two sequences $g_n \in \mathbb{R}^d$ and $a_n \in A$, recall that $\bar g_n$ denotes the average up to stage $n$. We define, for every $a \in A$, the subset of stages where $a$ was played and the corresponding conditional averages
$$N_n[a] := \{ m \le n \,;\, a_m = a\} \quad \text{and} \quad \bar g_n[a] := \frac{1}{|N_n[a]|} \sum_{m \in N_n[a]} g_m.$$
Regret has been refined further by Blum & Mansour [11] into swap regret (or Φ-regret). Define, for every mapping $\varphi : A \to A$, family $\Phi \subset \{\varphi : A \to A\}$ and $n \in \mathbb{N}$,
$$\bar R_n^\Phi(\varphi) := \frac{1}{n} \sum_{m=1}^n \rho(\varphi(a_m), b_m) - \rho(a_m, b_m).$$

Definition 2.3 A strategy σ has no Φ-regret if, no matter the strategy τ of Nature, $\mathbb{P}_{\sigma,\tau}$-almost surely,
$$\limsup_{n \to \infty} \sup_{\varphi \in \Phi} \bar R_n^\Phi(\varphi) \le 0.$$
The existence of such strategies is due to Blum & Mansour [11]; proofs are again delayed.

Theorem 2.3
There exist strategies without Φ-regret (with explicit bounds, valid for every $n \in \mathbb N$, on the expected regret).

The notion of Φ-regret is a refinement of both external and internal regret, because of the specific choices of families
$$\Phi_e := \{\varphi_{a^*}\,;\, a^* \in A\} \ \text{with}\ \varphi_{a^*}(a) = a^* \ \forall a \in A, \qquad \Phi_i := \{\varphi_{a',a^*}\,;\, a', a^* \in A\} \ \text{with}\ \varphi_{a',a^*}(a') = a^* \ \text{and}\ \varphi_{a',a^*}(a) = a \ \text{if}\ a \neq a'.$$
Proposition 2.2 links the different aforementioned quantities and shows that minimizing internal regret is, in some sense, enough to minimize each of them (up to the cost of a factor A). We will need the following notation.
Proposition 2.2 Given any family Φ ⊂ {φ : A → A}, one has $\bar R_n^\Phi = H^\Phi \bar R_n$, where $\bar R_n$ is seen as a vector of size $A^2$. As a consequence, a bound on the internal regret implies a bound on the Φ-regret. For the specific case of external regret, one also has $\bar r_n = \bar R_n \mathbf{1}$ (where $\bar R_n$ is now seen as a matrix and $\mathbf 1$ is the vector with all coordinates equal to one). The converse is not true, as there exist externally consistent strategies with linear internal regret.
Proof: The proof of the first part follows directly from the definitions of internal and Φ-regret. For the existence of externally consistent strategies with linear internal regret, we refer to Stoltz & Lugosi [75].
Other refinements of these concepts can be made, following this time Fudenberg & Levine [28] and Lehrer [43], in two different directions. The first one is to assume that regret is computed not at every stage, but only on a restricted subset of stages (that might depend on the history); the second direction is to consider time-varying switch mappings φ. Formally, let $\mathcal X$ be an activation function, i.e., $\mathcal X : H \times A \to \{0,1\}$, where $\mathcal X(h_n, a_{n+1}) = 1$ indicates that stage $n+1$ is active. We recall that H stands for the set of all finite histories. A switch function $\varphi : H \times A \to A$ indicates that, after the finite history $h_n$, $\rho(a_{n+1}, b_{n+1})$ will be compared to $\rho(\varphi(h_n, a_{n+1}), b_{n+1})$.

Definition 2.4
Given an activation mapping $\mathcal X$ and a switch mapping $\varphi$, a strategy σ has no $(\mathcal X, \varphi)$-regret if, no matter the strategy τ of Nature, $\mathbb P_{\sigma,\tau}$-almost surely, as soon as $\sum_{m=1}^n \mathcal X(h_{m-1}, a_m)$ converges to $+\infty$.
Lehrer [43] proved that, given a probability λ on the whole set of pairs of activation-switch mappings (endowed with the product topology), there exists a strategy without $(\mathcal X, \varphi)$-regret for λ-almost all pairs. However, rates of convergence are not explicit, in part because the score is divided by the number of active stages $\sum_{m=1}^n \mathcal X(h_{m-1}, a_m)$ and not by $n$.

Reductions: from external to Φ-regret
In this section, we show how to construct a strategy with no Φ-regret based on an algorithm that only outputs externally consistent strategies, developing an idea of Stoltz & Lugosi [75] and recovering the more general result of Blum & Mansour [11]. Indeed, consider the following auxiliary game, where the action sets of the player and Nature are respectively Φ and a compact subset $\mathcal U \subset [0,1]^A$. Given an exogenous sequence $p_n \in \Delta(A)$, we define the payoff at stage $n$ of the player, generated by the choices of $\varphi \in \Phi$ and $U_n \in \mathcal U$, by
$$\sum_{a \in A} p_n[a]\, U_n[\varphi(a)] = \big\langle p_n \circ \varphi^{-1},\, U_n \big\rangle.$$
Let θ be an externally consistent strategy and $\theta_n^\varphi$ denote the weight put by θ on φ at stage n. Then the expected external regret at this stage is written as
$$\max_{\varphi \in \Phi} \big\langle p_n \circ \varphi^{-1}, U_n \big\rangle - \sum_{\varphi \in \Phi} \theta_n^\varphi \big\langle p_n \circ \varphi^{-1}, U_n \big\rangle.$$
On the other hand, the strategy that dictates to play $p_n$ at stage n in the original game suffers an expected Φ-regret of
$$\max_{\varphi \in \Phi} \big\langle p_n \circ \varphi^{-1}, U_n \big\rangle - \langle p_n, U_n \rangle.$$
So, as soon as $p_n[a] = \sum_{\varphi \in \Phi} \theta_n^\varphi \, p_n \circ \varphi^{-1}[a]$ for every a ∈ A, the Φ-regret in the original game and the external regret in the auxiliary game coincide exactly (in expectation). And the latter converges to zero, at rates indicated by Theorem 2.1 and Corollary 2.1, hence so does the former.
The existence of such a $p_n$ is a simple consequence of Brouwer's fixed point theorem. Indeed, notice first that $\theta_n$ depends only on past observations, thus is independent of $p_n$. As a consequence, $p_n$ can be taken as any fixed point of the continuous mapping $p \mapsto \sum_{\varphi \in \Phi} \theta_n^\varphi \, p \circ \varphi^{-1}$ from the simplex Δ(A) to itself. We have only proved the convergence of Φ-regret in expectation; as usual, almost sure convergence is a consequence of concentration inequalities (or see Theorem 2.7, page 47 and Example 1, page 19, in Hall & Heyde [29]).
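A minimal sketch in Python of this fixed-point computation (the family Φ, the weights and the damped power iteration below are illustrative assumptions): the mapping $p \mapsto \sum_\varphi \theta_n^\varphi\, p \circ \varphi^{-1}$ is linear, given by a column-stochastic matrix M, so a fixed point is an invariant distribution of M.

```python
import numpy as np

def phi_fixed_point(theta, phis, A, iters=2000):
    """Fixed point of p -> sum_phi theta[phi] * (p o phi^{-1}) on the simplex.
    Each phi is given as a list: phi[a] is the image of action a."""
    # M[a', a] = sum of theta[phi] over the mappings phi with phi(a) = a'
    M = np.zeros((A, A))
    for w, phi in zip(theta, phis):
        for a in range(A):
            M[phi[a], a] += w
    p = np.full(A, 1.0 / A)
    for _ in range(iters):
        p = 0.5 * p + 0.5 * (M @ p)   # damped iteration: same fixed points, always converges
    return p

# Two mappings on A = {0, 1, 2}: the identity and the swap of actions 0 and 1
phis = [[0, 1, 2], [1, 0, 2]]
theta = [0.5, 0.5]
p = phi_fixed_point(theta, phis, A=3)
```

The damping $(p + Mp)/2$ does not change the set of fixed points but makes the stochastic matrix aperiodic, so the iteration converges without further assumptions.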

Compact action spaces
Although Φ-regret can be seen as a consequence of external or internal regret in the finite case (when A is finite), its introduction is more useful in the following compact case.
Assume that A, the action space of the player, is no longer finite but a compact subset of some Euclidean space. On her side, $\mathcal U$, the action space of Nature, is a set of mappings from A to $\mathbb R$. Choices of $a_n \in A$ and $U_n \in \mathcal U$ generate, at stage n, a payoff of $\rho_n := U_n(a_n)$.
External regret is defined almost exactly as before, i.e., $r_n : A \to \mathbb R$ is the continuous mapping defined by $r_n(a) = U_n(a) - \rho_n$. In the compact case, we must however be careful with the order of quantifiers when passing to limits: a strategy σ is externally consistent if, for every strategy τ of Nature, $\mathbb P_{\sigma,\tau}$-almost surely,
$$\sup_{a^* \in A}\, \limsup_{n \to \infty}\, \frac{1}{n} \sum_{m=1}^n r_m(a^*) \le 0.$$

Remark 2.2
We claimed that the order of quantifiers has some importance. Assume that A = [0, 1] and that, for every $n \in \mathbb N$ and $a \in [0,1]$, $U_n(a) = \mathbf 1\{a \in (0, 1/n)\}$. Always choosing the same fixed action $a^*$ gives zero as an asymptotic average payoff, therefore the strategy that plays $a_n = 0$ should not have any regret (neither external, internal, nor Φ for that matter). On the other hand, for every $N \in \mathbb N$, the choice of $a^* = 1/2N$ gives $U_m(a^*) = 1$ for every $m \le N$, thus $\limsup_{n\to\infty} \sup_{a^* \in A} \frac1n\sum_{m=1}^n U_m(a^*) - \bar\rho_n = 1$. This explains the choice of the order of quantifiers in the definition.
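The example of Remark 2.2 is easy to check numerically (a sketch in Python; the horizon and the fixed action are arbitrary assumptions):

```python
def U(n, a):
    """Payoff at stage n: the indicator of a in (0, 1/n)."""
    return 1.0 if 0.0 < a < 1.0 / n else 0.0

N = 1000
# A fixed action a* = 0.01 only pays off at the finitely many stages n < 100,
# so its average payoff vanishes...
avg_fixed = sum(U(n, 0.01) for n in range(1, N + 1)) / N
# ...but at every stage N there is an action, e.g. a* = 1/(2N), with payoff 1
best_now = U(N, 1.0 / (2 * N))
```

So the average payoff of every fixed action vanishes while, at each stage, some action retrospectively looks perfect: only the first order of quantifiers gives a sensible notion of regret.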
Difficulties arise in defining internal regret, because in the finite case scores are multiplied by frequencies of actions. We shall instead only focus on Φ-regret, whose definition is otherwise identical: $R_n^\Phi : \Phi \to \mathbb R$ is the mapping defined by $R_n^\Phi(\varphi) = U_n(\varphi(a_n)) - U_n(a_n)$. And a strategy σ has no Φ-regret if, for every strategy τ of Nature, $\mathbb P_{\sigma,\tau}$-almost surely,
$$\sup_{\varphi \in \Phi}\, \limsup_{n\to\infty}\, \frac1n \sum_{m=1}^n U_m(\varphi(a_m)) - U_m(a_m) \le 0,$$
or equivalently, $\limsup_{n\to\infty} \frac1n \sum_{m=1}^n U_m(\varphi(a_m)) - U_m(a_m) \le 0$ for every $\varphi \in \Phi$. If A is not compact but (A, F, µ) is a probability space, then external and Φ-regret can also be defined µ-almost surely: the supremum over Φ is simply replaced by a statement holding for µ-almost every mapping $\varphi \in \Phi$.
In this generalized framework, a strategy has no regret if, asymptotically, the criterion $\mathcal B_n(z_1, \ldots, z_n)$ dominates the benchmarks induced by the departure mappings $\xi \in \Xi$, almost surely, no matter the strategy of Nature.
Of course, without additional assumptions on the sequence $\mathcal B_n$ and on Φ, generalized regret cannot be minimized. Rakhlin, Sridharan & Tewari [66] have used min-max techniques to infer the existence of such strategies (along with rates of convergence) in specific cases: i) External, internal and Φ-regret are obtained if $g = \rho$, $\mathcal B_n(z_1, \ldots, z_n) = \frac1n\sum_{m=1}^n z_m$ and, for every $\varphi \in \Phi$, there exists $\xi \in \Xi$ such that $\xi[g](a, b) = \rho(\varphi(a), b)$.
ii) Approachability of a convex set $\mathcal C$ if $\mathcal B_n(z_1, \ldots, z_n) = -\,d_{\mathcal C}\!\left(\frac1n\sum_{m=1}^n z_m\right)$ and the departure mappings satisfy $\xi[g](a,b) \in \mathcal C$ for every $a \in A$ and $b \in B$.
iii) When $\mathcal B$ is a function of the average, i.e., $\mathcal B_n(z_1,\ldots,z_n) = G\!\left(\frac1n\sum_{m=1}^n z_m\right)$, an interesting (yet maybe counterintuitive) property arises even in the finite case: there might exist strategies that are not externally consistent yet internally consistent, in the sense that the corresponding internal criterion is asymptotically non-positive while the external one is not.

Experts
An interpretation — which is actually also a generalization — of these results concerns games of prediction with expert advice, studied (almost exhaustively) by Cesa-Bianchi & Lugosi [14]. At each stage n ∈ N, an agent must take a decision $d_n$ in some topological convex and compact set D. He is advised by a pool E of experts, i.e., expert e suggests to choose the decision $d^e_n$ at this stage. Once his choice is made, Nature reveals the state of the world $s_n \in S$ (where S is some arbitrary space), which generates a loss $L_n := L(d_n, s_n)$.
After n stages, the agent has suffered an average loss of $\bar L_n = \frac1n\sum_{m=1}^n L(d_m, s_m)$, to be compared with $\bar L^\star_n$, the smallest average loss of an expert of the pool. If the agent chooses, at each stage, an expert according to an externally consistent strategy σ over E (and follows his advice), then $\bar L_n - \bar L^\star_n$ is smaller than the expected regret of σ, hence the result.

Regret and sets of equilibria
The existence of consistent strategies can be used to prove classical game-theoretic results — nonemptiness of Hannan (or correlated) sets and min-max theorems — as noticed by Blum & Mansour [11] and Cesa-Bianchi & Lugosi [14]. Consider a game between a set of players I of size I, where $A^i$ denotes the finite action space of player i and $\rho^i : \prod_{i \in I} A^i \to \mathbb R$ his payoff function (extended multi-linearly as usual). The Hannan set of player i is the subset of joint distributions of actions defined by
$$\mathcal H^i := \Big\{ q \in \Delta\Big(\prod_{i\in I} A^i\Big)\,;\ \rho^i(q) \ge \rho^i(a^*, q^{-i}) \quad \forall a^* \in A^i \Big\},$$
where $\rho^i(q) = \mathbb E_q[\rho^i]$ and $q^{-i}$ is the marginal of q on $\prod_{j \neq i} A^j$, i.e., the empirical joint distribution of actions played by the opponents of player i. Informally, a joint distribution q belongs to $\mathcal H^i$ if player i has no interest in always playing some fixed action $a^* \in A^i$ when his opponents coordinate to play according to $q^{-i}$.
By linearity of $\rho^i$, if a strategy of player i is externally consistent, then (independently of the behavior of his opponents) the empirical joint distribution of actions necessarily converges to $\mathcal H^i$. We qualify this property as unilateral, as it does not make any assumption on the opponents' strategies.
If every player unilaterally follows an externally consistent strategy (not necessarily output by the same algorithm), then the empirical distribution of actions converges to the Hannan set of the game, $\mathcal H = \cap_{i\in I} \mathcal H^i$, which is therefore guaranteed to be nonempty.
The main difference between elements of the Hannan set and Nash equilibria is that the latter must be product distributions. So the set of Nash equilibria is always contained in $\mathcal H$, but might be, in some games, much smaller.
On the other hand, in a zero-sum game, elements of the Hannan set satisfy the following property. If $q \in \Delta(A \times B)$ belongs to $\mathcal H$ and we denote by $q^1 \in \Delta(A)$ and $q^2 \in \Delta(B)$ its marginals, then necessarily
$$\min_{y\in\Delta(B)}\max_{x\in\Delta(A)} \rho(x,y) \;\le\; \max_{x\in\Delta(A)} \rho(x, q^2) \;\le\; \rho(q) \;\le\; \min_{y\in\Delta(B)} \rho(q^1, y) \;\le\; \max_{x\in\Delta(A)}\min_{y\in\Delta(B)} \rho(x, y).$$
Since $\max_{x\in\Delta(A)} \min_{y\in\Delta(B)} \rho(x, y) \le \min_{y\in\Delta(B)} \max_{x\in\Delta(A)} \rho(x, y)$ always holds, all these quantities must coincide and, by definition, are equal to the value v of the game. More importantly, the inequalities above imply that $\min_y \rho(q^1, y) = v$ and $\max_x \rho(x, q^2) = v$; thus $(q^1, q^2)$ is a pair of optimal mixed actions.
As a consequence, in a zero-sum game, a player who unilaterally follows a consistent strategy asymptotically obtains at least the value. And if both players follow consistent strategies, their empirical mixed actions converge to their sets of optimal mixed actions.
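This convergence is easy to observe numerically. Below is a sketch in Python (the 2×2 game, the horizon and the learning rate are illustrative assumptions): both players run an exponential-weight (externally consistent) strategy on a zero-sum game whose value is 3/2, with optimal mixed actions x* = (1/2, 1/2) and y* = (1/4, 3/4).

```python
import numpy as np

P = np.array([[3.0, 1.0],   # payoffs of the (maximizing) row player
              [0.0, 2.0]])  # value 3/2, x* = (1/2, 1/2), y* = (1/4, 3/4)
T = 20_000
eta = np.sqrt(8 * np.log(2) / (9 * T))  # rate tuned for payoffs in [0, 3]

Ux, Uy = np.zeros(2), np.zeros(2)       # cumulative payoffs of pure actions
xbar, ybar, avg_payoff = np.zeros(2), np.zeros(2), 0.0
for _ in range(T):
    x = np.exp(eta * (Ux - Ux.max())); x /= x.sum()   # row player maximizes
    y = np.exp(-eta * (Uy - Uy.min())); y /= y.sum()  # column player minimizes
    xbar += x / T; ybar += y / T
    avg_payoff += (x @ P @ y) / T
    Ux += P @ y        # expected payoff of each row against y
    Uy += P.T @ x      # expected payoff conceded by each column against x
```

The average payoff ends up near the value 3/2, and the empirical mixed actions `xbar` and `ybar` near the optimal strategies, as the regret bounds predict.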
This property has been somewhat generalized by Hart and Mas-Colell [33] in potential games, see also Viossat & Zapechelnyuk [78]. They constructed a specific externally consistent strategy such that, if both players follow it, the product of empirical distributions of actions converges to the set of Nash equilibria (and more precisely to a subset of it whose payoffs are identical). However, this is only a global property (as opposed to a unilateral one) as both players must follow this specific strategy. Moreover, the result does not extend to arbitrary games, even those with a unique Nash equilibrium.
We proved, following Cesa-Bianchi & Lugosi [14] and Sorin [70], a min-max theorem due to von Neumann using externally consistent strategies. It is actually possible to obtain the following generalized version, due to Fan [18]. We first recall that a mapping ρ on A × B is concave-like if for every $a, a' \in A$ and $\alpha \in [0,1]$, there exists $a^* \in A$ such that $\rho(a^*, \cdot) \ge \alpha\rho(a,\cdot) + (1-\alpha)\rho(a',\cdot)$. Convexity-like is defined similarly.

Theorem 2.4
Let A be a compact set, B any set and ρ a concave-like convex-like mapping on A × B, bounded from below and such that ρ(·, b) is upper-semicontinuous for every b ∈ B. Then the zero-sum game on A and B has a value.
Proof: Let B′ be any finite subset of B and consider an externally consistent strategy of the first player; its existence is ensured by Corollary 4.4 below. External consistency implies that, for every ε > 0, there exists a sequence $\delta_n \ge 0$ going to zero that controls, at stage n, the average payoff with respect to the action $b^*_n$ given by the definition of convexity-like applied to $\frac1n\sum_{m=1}^n b_m$. On Nature's side, we can assume that her strategy is such that, at stage n, $b_n$ realizes $\inf_{b\in B'} \rho(a_n, b)$ up to $1/2^n$. A similar control holds with respect to the action $a^*_n$ given by the definition of concavity-like. As a consequence, taking n and ε to their limits yields the desired inequality for any finite subset B′. Since A is compact and ρ(·, b) is upper-semicontinuous, for every ε and B′ the corresponding set of almost-optimal actions is compact and non-empty, and this remains true for any finite intersection over different subsets. As a consequence, the whole intersection (over every ε and B′) remains compact and non-empty, and any point a in it must satisfy $\rho(a, b) \ge \inf_{b\in B}\max_{a\in A} \rho(a, b)$ for every b ∈ B; stated otherwise, $\sup_{a\in A}\inf_{b\in B}\rho(a,b) \ge \inf_{b\in B}\sup_{a\in A}\rho(a,b)$, so the game has a value.

Stronger results can be proved using internally consistent strategies. Aumann [4] defined a correlated equilibrium of a game as a distribution on the set of profiles of actions $q \in \Delta\big(\prod_{i\in I} A^i\big)$ such that, for every player i ∈ I and every action $a \in A^i$:
$$q^i[a]\,\rho^i\big(a, q^{-i}[a]\big) \;\ge\; q^i[a]\,\rho^i\big(a^*, q^{-i}[a]\big) \qquad \forall a^* \in A^i,$$
where $q^{-i}[a] \in \Delta\big(\prod_{j\neq i} A^j\big)$ is the probability induced by q conditionally on $a^i = a$, and $q^i[a]$ is the probability put on $a \in A^i$ by q (or the relative frequency of action $a \in A^i$).
In words, assume that a referee draws a lottery according to q and only tells player i the action he should play. A correlated equilibrium is then a joint distribution such that every player, when told to play action a (and assuming that the others follow their recommendations), cannot gain strictly more by playing some a* instead of a.
It is quite clear (from their very definition) that if every player follows unilaterally an internally consistent strategy then the empirical distribution of actions converges to the set of correlated equilibria (but maybe not to one specific correlated equilibrium), see Foster & Vohra [23].

Regret, (smooth) fictitious play and follow the perturbed leader
Fictitious play is a classic unilateral discrete-time dynamic in game theory. At each stage, each player computes the empirical (either joint or product) distribution of actions of his opponents and plays a best response to it. Although quite natural, this strategy is not externally consistent. On the contrary, Fudenberg & Levine [28] introduced a slight modification, called smooth fictitious play, that asymptotically has a regret smaller than ε (where ε > 0 is fixed); see also Hofbauer, Sorin & Viossat [36].
Let $\rho^\varepsilon$ denote an ε-perturbation of ρ induced by a mapping $\psi : \Delta(A) \to \mathbb R$, i.e., $\rho^\varepsilon(x, y) = \rho(x, y) + \varepsilon\psi(x)$. Since we are interested in unilateral procedures, we might as well make a change of variable by defining $U = (\rho(a, y))_{a\in A} \in [0,1]^A$, so that $\rho(x, U) = \langle x, U\rangle$. As a consequence, the mapping $\rho^\varepsilon$ can be rewritten as $\rho^\varepsilon(x, U) = \langle x, U \rangle + \varepsilon\psi(x)$. We also define the ε-best response mapping by $BR^\varepsilon(U) = \operatorname{argmax}_{x \in \Delta(A)} \langle x, U\rangle + \varepsilon\psi(x)$.
We assume that the mapping ψ : ∆(A) → R is chosen so that: i) ψ is continuously differentiable and $\|\psi\|_\infty \le 1$; ii) the ε-best response mapping $BR^\varepsilon$ is single-valued and continuous; iii) $BR^\varepsilon(U)$ does not belong to the boundary of ∆(A).
Actually, point iii) ensures that $\rho^\varepsilon$ attains its maximum at a point where its first derivative vanishes; it can therefore be weakened accordingly. The study of $\sigma(h_n) = BR^\varepsilon(\bar U_n)$, the strategy associated with this perturbation, might be simpler in continuous time. First, we introduce the mapping $W^\varepsilon(U) := \max_{x\in\Delta(A)} \rho^\varepsilon(x, U) = \rho^\varepsilon(BR^\varepsilon(U), U)$. In particular, because of point i), regret is asymptotically smaller than 2ε as soon as $\limsup_{n\to\infty} W^\varepsilon(\bar U_n) - \bar\rho_n \le \varepsilon$. As in Section 1.4.1, one associates to the discrete-time dynamics of $(\bar U_n, \bar\rho_n)$ a continuous-time dynamic. Defining $\lambda(t) = W^\varepsilon(U(t)) - \rho(t)$, one has $\dot\lambda + \lambda \le \varepsilon$, thus $\lambda(t) \le \varepsilon + Me^{-t}$ for some constant M. As a consequence, λ is a Lyapunov function with respect to the set $\{(U, \rho)\,;\, W^\varepsilon(U) - \rho \le \varepsilon\}$, which is thus a global attractor of the dynamic (see Benaïm, Hofbauer & Sorin [7]). So $(\bar U_n, \bar\rho_n)$ converges almost surely to it and the strategy is ε-externally consistent. Benaïm & Faure [6] recently proved that external consistency can be achieved (without requiring a doubling trick argument) by a smooth fictitious play with vanishing perturbations: they showed that if ε is not fixed but depends on $n \in \mathbb N$ as $\varepsilon_n = n^\gamma$, with γ < 1, then the regret asymptotically converges to zero.
Smooth fictitious play (also known as follow the regularized leader) generalizes two classes of algorithms: exponential weight algorithms and their even more general version, follow the perturbed leader (see Cesa-Bianchi & Lugosi [14], Sections 4.2 and 4.3). To recover the first class, the entropy must be used as regularization, i.e., $\psi(x) = -\sum_{a\in A} x[a]\ln x[a]$, in which case $BR^\varepsilon(U)[a]$ is proportional to $\exp(U[a]/\varepsilon)$, which is, by definition, the exponential weight algorithm. Links with follow the perturbed leader (or stochastic fictitious play, according to Fudenberg & Kreps [26]) might be a bit more tedious. This algorithm does not choose a deterministic regularization εψ but perturbs each component of $\bar U_n$ by a random quantity $\varepsilon^a_n$, such that the joint density $f : \mathbb R^A \to \mathbb R$ of the vector $(\varepsilon^a_n)_{a\in A}$ is independent of $\bar U_n$ and n. The action played at stage n+1 is any maximizer of $\bar U^a_n + \varepsilon^a_n$; in particular, a given action a is chosen at this stage with probability $X^a(\bar U_n)$, where $X^a(U) := \mathbb P\big( a \in \operatorname{argmax}_{a'\in A} U^{a'} + \varepsilon^{a'} \big)$. Follow the perturbed leader generates a discrete stochastic process $(\bar U_n, \bar\rho_n)$ which is an A.S.D. of an associated differential inclusion. This is a special case of smooth fictitious play since, as soon as f is positive and X is continuously differentiable, Hofbauer & Sandholm [35] have shown that there exists a deterministic regularization εψ such that $X(U) = BR^\varepsilon(U)$. For example, in the case where the $\varepsilon^a$ are i.i.d. with cumulative distribution $F(x) = \exp(-\exp(-\eta x - \gamma))$ (where γ is the Euler constant), follow the perturbed leader coincides exactly with the exponential weight algorithm (see e.g., Lemma 1 in McFadden [54]).
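The Gumbel case can be checked numerically: with i.i.d. Gumbel perturbations of scale 1/η, the probability that action a maximizes $U^a + \varepsilon^a$ is exactly the exponential weight $\exp(\eta U^a)/\sum_{a'}\exp(\eta U^{a'})$ (the location shift by the Euler constant does not affect the argmax). A Monte Carlo sketch in Python, where the vector U, the rate η and the sample size are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
U = np.array([0.3, 0.9, 0.5])   # hypothetical vector of average payoffs
eta = 2.0

# exponential weights
w = np.exp(eta * U)
w /= w.sum()

# follow the perturbed leader: argmax of U + Gumbel noise of scale 1/eta
samples = 200_000
noise = rng.gumbel(loc=0.0, scale=1.0 / eta, size=(samples, len(U)))
choices = np.argmax(U + noise, axis=1)
freq = np.bincount(choices, minlength=len(U)) / samples
# freq agrees with w up to Monte Carlo error
```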
As mentioned before, proofs based on A.S.D. do not exhibit rates of convergence (and this might be seen as a major drawback of these techniques). However, we only considered here strategies that do not depend on the past sequence of the player's actions (but only on the sequence of Nature's choices). So the discrete process is very close to the one induced by procedures in law (this is not the case for approachability, see Section 1.3.1), which is in turn close to the continuous-time process. And it is actually possible to quantify explicitly these relative differences, see e.g., Sorin [72] or Kwon [40], in order to recover exact rates of convergence.

Calibration
We recall that calibration is a criterion introduced by Dawid [16] in the following repeated game between a player and Nature. At each stage $n \in \mathbb N$, Nature chooses a state of the world $\omega_n$ in some finite set Ω and the player makes a prediction upon its law by choosing a probability distribution $p_n \in \Delta(\Omega)$. Strategies of the player and Nature are mappings from the set of finite histories $\bigcup_{n\in\mathbb N}(\Omega \times \Delta(\Omega))^n$ into, respectively, ∆(∆(Ω)) and ∆(Ω).
The usual example of a meteorologist who predicts each day the probability of rain corresponds to Ω = {0, 1}, with ω = 1 if it rains. This binary case is in fact much easier than the general case, as discussed in Section 3.1.2.

Finite (ε and grid) calibration
We will need the following notation. For every $p \in \Delta(\Omega)$ — seen as a subset of $\mathbb R^{\Omega - 1}$ — and ε > 0, let $N_n[p, \varepsilon] := \{ m \le n\,;\, \|p_m - p\| \le \varepsilon\}$ be the set of stages where the prediction was ε-close to p, where $\|\cdot\|$ is the Euclidean norm of $\mathbb R^{\Omega-1}$. We denote by $\bar\omega_n[p,\varepsilon] \in \Delta(\Omega)$ the empirical distribution of states on $N_n[p,\varepsilon]$ and by $\bar p_n[p,\varepsilon]$ the average prediction on it.
Definition 3.1 A strategy σ of the player is ε-calibrated if for every strategy τ of Nature, and for every p ∈ ∆(Ω),
$$\limsup_{n\to\infty}\ \frac{|N_n[p,\varepsilon]|}{n}\Big( \big\|\bar\omega_n[p,\varepsilon] - \bar p_n[p,\varepsilon]\big\| - \varepsilon \Big) \le 0, \qquad \mathbb P_{\sigma,\tau}\text{-a.s.}$$
A strategy is calibrated if it is ε-calibrated for every ε > 0.
Intuitively, a strategy is ε-calibrated if, on any set of stages of positive frequency where the prediction was ε-close to some p ∈ ∆(Ω), the empirical distribution of states is close to this specific p. Although not stated explicitly in Definition 3.1, it is possible to require that rates of convergence be independent of Nature's strategy, see Section 4.2 below. With a careful concatenation of ε-calibrated strategies, following the doubling trick, one can easily obtain a calibrated strategy, as did Foster & Vohra [24] or Fudenberg & Levine [27]. It remains to construct such strategies, which can be done using the slightly weaker concept of calibrated strategies with respect to an ε-grid of ∆(Ω), defined below.
We recall that a finite subset $\{p[\ell]\,;\,\ell\in L\}$ of ∆(Ω) is an ε-grid if every $p \in \Delta(\Omega)$ is ε-close to some $p[\ell]$; its Voronoï cells are defined by $V[\ell] := \{p \in \Delta(\Omega)\,;\, \|p - p[\ell]\| \le \|p - p[k]\|,\ \forall k \in L\}$. In words, a strategy is calibrated with respect to a grid if, on the set of stages where $p[\ell]$ is predicted, the empirical distribution of states is closer to $p[\ell]$ than to any other $p[k]$. Each Voronoï cell is a polytope, since it is defined by a finite number of linear inequalities; their union covers ∆(Ω) and any pairwise intersection has empty interior. The fact that the calibration score is non-positive means that $\bar\omega_n[\ell]$ belongs to (or converges to) the Voronoï cell $V[\ell]$.
Dawid [17] and Oakes [57] proved that there do not exist deterministic ε-calibrated strategies, based on a counterexample given in the following section. On the other hand, there exist randomized ε-calibrated strategies, as proved by Foster and Vohra [23] by exhibiting an algorithm that makes the so-called Brier score decrease to zero.
Theorem 3.1 For every grid, there exists a calibrated strategy with respect to it. As a consequence, for every ε > 0, there exist ε-calibrated strategies, and thus calibrated strategies.
To end this section, we note that finite calibration can also be defined with respect to some weights $\{\nu[\ell] \in \mathbb R\,;\, \ell \in L\}$. A strategy σ of the player is weighted-calibrated with respect to $\{(p[\ell], \nu[\ell])\,;\, \ell\in L\}$ if, for every strategy τ of Nature and every ℓ ∈ L, the corresponding weighted calibration score is asymptotically non-positive, $\mathbb P_{\sigma,\tau}$-a.s.
As in Remark 3.1, a weighted-calibrated strategy ensures that $\bar\omega_n[\ell]$ converges, as soon as the frequency of ℓ is not zero, to the weighted cell $P[\ell]$. Because the weights enter through squared norms, this set is also a polytope.

Discussion on the impossibility of deterministic calibration
When Ω = {0, 1}, Oakes [57] and Dawid [17] exhibited an example of a strategy of Nature ensuring that no deterministic ε-calibrated strategy exists. The idea is actually quite simple yet highly unstable. Define the strategy as follows: given the past history $h_n$, if $p_{n+1} \ge 1/2$ then $\omega_{n+1} = 0$, and if $p_{n+1} < 1/2$ then $\omega_{n+1} = 1$. In words, if the forecaster claims that it will rain with high probability then Nature does not make it rain, and if he claims that it will not rain, Nature makes it rain. This prevents any deterministic strategy from being ε-calibrated, but this is not immediate (and the proof, although quite simple, will shed light on the following discussion). We distinguish two cases: either the predictions exactly equal to 1/2 have an asymptotic positive frequency, or a null one. In the first case, $\bar p_n[1/2+\varepsilon, \varepsilon] \ge 1/2$ while $\bar\omega_n[1/2+\varepsilon, \varepsilon] = 0$, thus such a strategy is not ε-calibrated.
In the second case, we can assume that no prediction falls exactly at 1/2 (since their frequency goes to zero). If the predictions bigger than 1/2 have an asymptotic positive frequency, then there must exist $p^*$ such that the set of stages where predictions belong to $[p^* - \varepsilon, p^* + \varepsilon] \subset [1/2, 1]$ also has a positive frequency. And since $\bar p_n[p^*, \varepsilon] \ge 1/2$ and $\bar\omega_n[p^*, \varepsilon] = 0$, the strategy is not ε-calibrated.
If the predictions bigger than 1/2 have an asymptotic null frequency, then the predictions smaller than 1/2 have an asymptotic positive frequency, and the same argument holds (because we assumed that no prediction was exactly equal to 1/2). So no deterministic strategy can be ε-calibrated.
On the other hand, consider the deterministic strategy of the player that predicts $p_n = 1/2$ at odd stages and $p_n = 1/2 - 1/n$ at even stages. The only accumulation point of the sequence of predictions is 1/2, so for every $p \neq 1/2$ there is only a finite number of predictions ε-close to p, for every ε smaller than some $\varepsilon_p > 0$. On the other hand, for p = 1/2, no matter ε, if n is big enough then $N_n[1/2, \varepsilon]$ contains approximately all stages, half of which yield ω = 0 (the odd ones, with a prediction of exactly 1/2) and half ω = 1; so the empirical distribution of states, like the average prediction, is asymptotically 1/2. As a consequence, the calibration score is asymptotically non-positive at every p that is an accumulation point of the predictions. Obviously, this does not contradict Oakes and Dawid's counterexample: on the stages where the prediction is ε-close to $p^* = 1/2 + \varepsilon$, the average prediction is 1/2 while the empirical state is 0. But one might argue that predictions are actually never close to $p^*$ (they stay ε-away), so Oakes and Dawid's argument would fail if calibration were defined only with respect to those points p that are accumulation points of the sequence of predictions (i.e., such that there are predictions arbitrarily close to them).
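This deterministic strategy can be tested against the adversarial Nature of Oakes and Dawid (a sketch in Python; the horizon and the tolerance ε are arbitrary assumptions): on the stages ε-close to 1/2, the empirical frequency of rain and the average prediction both settle near 1/2.

```python
N = 100_000
eps = 0.05
preds, outs = [], []
for n in range(1, N + 1):
    p = 0.5 if n % 2 == 1 else 0.5 - 1.0 / n   # the deterministic forecaster
    omega = 0 if p >= 0.5 else 1               # Oakes-Dawid adversarial Nature
    if abs(p - 0.5) <= eps:                    # stages eps-close to 1/2
        preds.append(p)
        outs.append(omega)
avg_pred = sum(preds) / len(preds)
emp_freq = sum(outs) / len(outs)
# both quantities are close to 1/2: the strategy is calibrated
# at the unique accumulation point of its predictions
```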
This argument can be generalized to any stationary strategy of Nature (i.e., such that $\omega_n = f(p_n)$ for some fixed, but possibly random, mapping f). Unfortunately, we are unable to claim that there exist deterministic (ε-)calibrated strategies with respect to accumulation points, but this shows how the very concept of calibration is unstable with respect to small variations in its definition or objectives. This subject is developed further in Section 3.3.

Efficient calibration in the binary case
Foster [21] designed an algorithm that efficiently computes an ε-calibrated strategy in the binary case (although it seems that Abernethy, Bartlett & Hazan [1] were the ones to notice its efficiency). The idea is to consider a calibrated strategy with respect to the regular grid $p[\ell] := \varepsilon + 2\ell\varepsilon$, $\ell \in L := \{0, 1, \ldots, (\lfloor\varepsilon^{-1}\rfloor - 1)/2\}$. Following Foster's notation, we define, for every ℓ ∈ L, scores $e^\ell_n$ and $d^\ell_n$ so that a strategy is calibrated if, asymptotically, every $e^\ell_n$ and $d^\ell_n$ is smaller than zero. Foster's algorithm consists in finding at stage n an element $\ell^* \in L$ such that
- either both $e^{\ell^*}_n \le 0$ and $d^{\ell^*}_n \le 0$; in that case, predict $p[\ell^*]$;
- or $e^{\ell^*-1}_n > 0$ and $d^{\ell^*}_n > 0$; in that case play $p[\ell^*]$ or $p[\ell^*-1]$ with respective probabilities proportional to $d^{\ell^*}_n$ and $e^{\ell^*-1}_n$.
The existence of such an ℓ* is ensured by the fact that the first score $d^1_n$ and the last one $e^L_n$ are always non-positive. Computations then show that the calibration error converges to zero.
So the tricky remaining part consists in finding this ℓ* efficiently. To this purpose, Abernethy, Bartlett & Hazan [1] introduced, for every ℓ ∈ L, the quantity $\theta^\ell_n = e^\ell_n$ if $e^\ell_n > 0$, $\theta^\ell_n = -d^\ell_n$ if $d^\ell_n > 0$, and $\theta^\ell_n = 0$ otherwise; this is well defined since $e^\ell_n$ and $d^\ell_n$ cannot be simultaneously positive. Specifically, it always holds that $\theta^1_n \ge 0$ and $\theta^L_n \le 0$, so if either of them is equal to zero, Foster's strategy dictates to predict the corresponding point. Otherwise, one must find ℓ* such that $\theta^{\ell^*-1}_n > 0$ and $\theta^{\ell^*}_n < 0$, and the main argument is that this can be done through a binary search, thus in O(log(1/ε)) steps.
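The binary search can be sketched as follows (Python; the function name and the way zeros at the boundary are handled are illustrative assumptions):

```python
def find_crossing(theta):
    """Given theta[0] >= 0 and theta[-1] <= 0, return an index l such that
    theta[l] == 0, or theta[l-1] > 0 > theta[l], via binary search."""
    lo, hi = 0, len(theta) - 1
    if theta[lo] == 0:
        return lo
    if theta[hi] == 0:
        return hi
    # invariant: theta[lo] > 0 and theta[hi] < 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if theta[mid] > 0:
            lo = mid
        elif theta[mid] < 0:
            hi = mid
        else:
            return mid
    return hi   # theta[hi - 1] > 0 > theta[hi]

l = find_crossing([2.0, 1.0, -1.0, -3.0])   # here l = 2
```

Each step halves the interval while preserving the sign invariant, hence the O(log(1/ε)) complexity over a grid of size O(1/ε).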
Foster's strategy can be somewhat generalized to more than two outcomes (see e.g., Mannor & Stoltz [50]), although, unfortunately, at the cost of efficiency, since the binary search trick does not extend.

Generalization
Recall that, roughly speaking, a strategy is calibrated if, on the sets of stages where the prediction was close to some p, the average prediction and the empirical distribution of outcomes asymptotically coincide. More general concepts of calibration are induced by different definitions of closeness.
Let $\mathcal F$ be a family of Borel measurable subsets of ∆(Ω) and denote, for every $F \in \mathcal F$, the associated set of active stages and conditional averages. ii) Perchet [60] defined $\mathcal F$ to be some neighborhood basis of ∆(Ω).
In the first case, the min-max techniques of Rakhlin, Sridharan and Tewari [65, 66] upper-bound the calibration error at stage n (but with a strategy that depends on n) by $O\big(n^{-1/(|\Omega|+1)}\big)$, while for the last two cases the bound shrinks to $O(n^{-1/2})$. On the other hand, Mannor & Stoltz [50] and Perchet [60] obtained (actually earlier) the same results, yet in a constructive way; these are developed in Section 4.2.
Drawbacks of these definitions of calibration (which will lead to another type of generalization) are illustrated by the following examples.
Assume that Ω = {0, 1} and that the sequence of outcomes is 0, 1, 0, 1, 0, . . . (i.e., $\omega_n = 1$ iff n is even). Consider a player that predicts, at every stage, that the probability of 1 is exactly 1/2. Then this strategy is calibrated according to any of the previous definitions of calibration. On the other hand, on the set of even stages, the empirical distribution is 1 while the average prediction is 1/2, which contradicts the precepts of calibration.
Even more intricate: assume that Ω = {0, 1, 2}, that $\omega_n = 0$ with probability 1/3, and that 1 and 2 alternate on the remaining set of stages. The sequence of outcomes on any fixed subset of $\mathbb N$ asymptotically contains as many 0s as 1s and 2s, so predicting (1/3, 1/3, 1/3) at every stage is not contradicted. On the other hand, if we consider only the set of stages where the outcome was 1 or 2, then the conditional prediction is always (1/2, 1/2) while 1 and 2 alternate deterministically.
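The first example above is immediate to reproduce (a sketch in Python; the horizon is an arbitrary assumption): the constant prediction 1/2 passes the calibration tests on the whole sequence, yet fails on the subsequence of even stages.

```python
N = 10_000
outcomes = [1 if n % 2 == 0 else 0 for n in range(1, N + 1)]   # 0, 1, 0, 1, ...
# with the constant prediction 1/2, the overall empirical frequency matches...
overall = sum(outcomes) / N
# ...but on the subsequence of even stages the empirical frequency is 1
even_freq = sum(w for n, w in enumerate(outcomes, start=1) if n % 2 == 0) / (N // 2)
```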
We introduce the following concept of checking rules. Let $\mathcal U$ and $\mathcal T$ be respectively an active-universe mapping and a testing mapping, i.e., mappings from the set H of finite histories into Borel subsets of ∆(Ω) × Ω. The interpretation is that stage n+1 is active if $(p_{n+1}, \omega_{n+1})$ belongs to the active universe $\mathcal U(h_n)$; given a set of active stages, calibration compares the empirical frequency of the tested event with the average prediction of this event.
Such a pair $(\mathcal U, \mathcal T)$ forms a checking rule, and we define as before the set of active stages, the empirical probability of tested events and the average predicted conditional probability of tested events.

Definition 3.4 A strategy σ is calibrated with respect to some given checking rule $(\mathcal U, \mathcal T)$ if, for every strategy τ of Nature, the corresponding calibration score is asymptotically non-positive, with the convention that $p\{A|B\} = +\infty$ if $p\{B\} = 0$.
The following theorem (a weaker version of which first appeared in Lehrer [41]) continues the discussion of Section 3.1.1 and further weakens the scope of the counterexample of Oakes and Dawid. It shows that if checking rules do not depend on current predictions (but possibly on past ones), then deterministic calibration does exist; this is quite obvious when facing only one checking rule, but the result actually holds for an infinite number of them.
To be formal, we endow the set of checking rules independent of the current prediction (i.e., pairs of mappings from $\bigcup_{n\in\mathbb N}(\Delta(\Omega)\times\Omega)^n$ into subsets of Ω) with the cylinder topology.
Theorem 3.2 Let λ be a probability distribution on the set of checking rules independent of current predictions. Then there exists a deterministic strategy σ that is calibrated with λ-almost every checking rule.
Actually, the result that we shall prove is stronger, as we will show that, as soon as $|N_n[\mathcal U, \mathcal T]|$ goes to infinity, $\limsup_{n\to\infty} \bar q_n[\mathcal U, \mathcal T] - \bar p_n[\mathcal U, \mathcal T] \le 0$, $\mathbb P_{\sigma,\tau}$-a.s.
A similar result (which extends Foster & Vohra [24]), due to Sandroni, Smorodinsky & Vohra [68], deals with checking rules depending on current predictions, under the following extra assumptions. We assume that the calibration test compares the empirical distribution of outcomes with the average prediction on the set of active stages where predictions were in some given set F ⊂ ∆(Ω); the activeness of stages might depend on past histories. Formally, $\mathcal U(h_n)$ is either empty (so that stage n+1 is not active) or $\mathcal U(h_n) = F \times \Omega$. The mapping $\mathcal T$ is, on the other hand, constant, i.e., $\mathcal T(h_n) = F \times \{\omega\}$ for some ω ∈ Ω (at least on active stages).

Proposition 3.2 Consider a countable number of such checking rules. Then there exists a strategy of the player that is calibrated with every one of them.

Smooth calibration
Smooth calibration is another criterion (close to usual calibration) that can be satisfied by a deterministic strategy, as proved by Foster and Kakade. Even more surprisingly, it can be used to construct an almost deterministic calibrated strategy, showing again the fragility of Oakes and Dawid's impossibility result.
The idea is to smooth the definition of calibration. Indeed, notice that, given F ⊂ ∆(Ω), the calibration score can be written as a sum weighted by the indicator mapping p ↦ 1{p ∈ F}, which is not continuous. Instead, given some continuous weight mapping g and some checking rule (U, T) independent of the current predictions (so that U(h_n) and T(h_n) can be seen as subsets of Ω), this score becomes its smoothed counterpart. A weaker version of the following proposition has been proved independently by Kakade & Foster [37] and by Vovk, Nouretdinov, Takemura & Shafer [80]; the former named this property weak calibration, but we use the term weak with another meaning (i.e., when the horizon of the game is fixed and known). If µ is a probability distribution on the set of checking rules independent of current predictions, then there exists a deterministic σ such that, for µ-almost every checking rule and every continuous mapping g, the smoothed score converges to zero, no matter Nature's strategy.
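In words, the smoothed score replaces the indicator of a test set by a continuous weight mapping g. For a binary outcome it reads as follows; a minimal sketch (the function name is ours, and ω_m ∈ {0, 1} is identified with δ_{ω_m}):

```python
def smooth_calibration_score(predictions, outcomes, g):
    """Smoothed calibration score along a continuous weight mapping g:
    | (1/n) * sum_m g(p_m) * (omega_m - p_m) |, for a binary outcome
    omega_m in {0, 1} predicted by p_m in [0, 1]."""
    n = len(predictions)
    return abs(sum(g(p) * (w - p) for p, w in zip(predictions, outcomes)) / n)
```

With a constant weight g this is the ordinary (unconditional) calibration error; the smooth criterion requires the score to vanish for every continuous g.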
As noticed by Foster and Kakade, the convergence in the first part of the result can be made uniform with respect to Nature's strategy.
Actually, the most surprising and interesting property of smooth calibration is not so much that deterministic smooth calibrated algorithms exist, but that they can be used to construct an almost deterministic ε-calibrated strategy, as follows (see Kakade & Foster [37] for more details). Let ε be fixed and consider a finite ε-triangulation of ∆(Ω) whose vertices are V := {v_1, ..., v_V}. Any p ∈ ∆(Ω) belongs to some simplex of the triangulation, whose vertices we denote by V(p) (if p belongs to more than one simplex, choose one arbitrarily). The point p can be written as a convex combination of vertices in V(p), i.e., p = Σ_{v∈V(p)} µ_v(p)v, and it is even possible to decompose p = Σ_{v∈V} µ_v(p)v by setting µ_v(p) = 0 for every vertex v that does not belong to V(p). All these mappings µ_v are continuous and Lipschitz.
We construct an ε-calibrated strategy σ using a fixed deterministic smooth calibrated strategy σ_d in the following way. Whenever σ_d dictates to predict p ∈ ∆(Ω), σ predicts v ∈ V with probability µ_v(p). Immediate computations control, for every v ∈ V, the calibration score of σ at v by the smooth calibration score of σ_d along the weight µ_v. Since Σ_{v∈V} µ_v(p_m)‖p_m − v‖ ≤ ε, the expected calibration score (and the actual score, thanks to concentration inequalities) is ε-close to the smooth calibration score, hence the result.
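For a binary outcome, ∆(Ω) identifies with [0, 1] and the ε-triangulation is simply the regular grid with step ε; the randomized rounding step can then be sketched as follows (a sketch under that identification; the function name is ours):

```python
import random

def round_prediction(p, eps):
    """Randomly round a deterministic prediction p in [0, 1] to a vertex of
    the regular eps-grid, using the barycentric weights mu_v(p), so that the
    expectation of the rounded prediction is exactly p."""
    n = int(round(1 / eps))
    k = min(int(p * n), n - 1)      # simplex [k/n, (k+1)/n] containing p
    lo, hi = k / n, (k + 1) / n
    w_hi = (p - lo) / (hi - lo)     # barycentric coordinate mu_hi(p)
    return hi if random.random() < w_hi else lo
```

The prediction moves by at most ε, and in expectation not at all, which is exactly why the calibration score of σ stays ε-close to the smooth calibration score of σ_d.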
The key feature of this construction is that, although it is impossible to construct an ε-calibrated strategy deterministically (as proved by Oakes and Dawid), it becomes possible by using randomizations on arbitrarily small balls. This is why we use the term almost deterministic strategies.
Concerning the complexity of (weak) calibration, a recent result of Hazan & Kakade [34], based on an idea of Kakade & Foster [37], shows that it is a hard criterion to satisfy. Indeed, an almost deterministic strategy σ (based on some triangulation of ∆(Ω)) can be used to find ε-Nash equilibria of games. We sketch the proof below.
Consider a game between a set of players I with action sets A_i and payoff mappings ρ_i. Define Ω = Π_{i∈I} A_i and let X_i : ∆(Ω) → ∆(A_i) be a smooth ε-best response of player i (i.e., given any p ∈ ∆(Ω), if p^{−i} denotes the induced marginal of p, then X_i(p) is an ε-best response to it). We denote by p_n ∈ ∆(Ω) the prediction output at stage n by the strategy σ and we assume that player i plays according to X_i(p_n). The profile of actions actually played is ω_n ∈ Ω and one has E[ω_n] = (X_1(p_n), ..., X_I(p_n)) =: X(p_n). Since σ is ε-calibrated, a calibration bound holds for every vertex v with probability one. Concentration inequalities, and the fact that X(p_n) − ω_n and 1{p_n = v} − µ_v(p_n) are martingale differences, imply that, with probability one and after summing terms, every vertex v that is predicted with a positive density (i.e., such that lim sup_{n→∞} Σ_{m=1}^n 1{p_m = v}/n > 0) must satisfy ‖v − X(v)‖ ≤ ε. And so, by the very definition of X(·), v must be a 2ε-Nash equilibrium.
Therefore, not only does the empirical profile of actions converge to the convex hull of ε-Nash equilibria, but also, if a stage n is chosen at random then, with arbitrarily high probability, X(v_n) is a 2ε-Nash equilibrium.

From approachability to regret; the finite case
Although Blackwell [10] was the first to notice that consistent strategies can be constructed using approachability theory, we first treat Hart & Mas-Colell's [32] idea in finite dimension.
We recall that choices of actions a_n ∈ A and b_n ∈ B generate at stage n ∈ N an external regret r_n defined by r_n = r(a_n, b_n) := (ρ(1, b_n) − ρ(a_n, b_n), ..., ρ(A, b_n) − ρ(a_n, b_n)) ∈ R^A, and that a strategy is externally consistent if ‖r̄_n^+‖_∞ goes to 0 almost surely. Actually, using approachability theory, Hart & Mas-Colell [32] proved the following (Proposition 4.1): the strategy playing at each stage proportionally to the positive part of the average regret is such that, for every η > 0, P_{σ,τ}(sup_{N≥n} ‖r̄_N^+‖ ≥ η) ≤ 3 exp(−η²n/64A) as soon as η²n/32A ≥ 1.

Proof: We simply have to prove that σ is exactly Blackwell's approachability strategy to the negative orthant R^A_− (which is a cone) in the game where the vector payoff is r(a, b). This is a consequence of the following geometric property: since x_{n+1} = σ(h_n) is proportional to r̄_n^+, one gets ⟨r̄_n^+, E[r_{n+1}]⟩ ≤ 0, because one always has ⟨z^+, z^−⟩ = 0. Since r̄_n^− is the projection of r̄_n on the negative orthant, this proves that σ satisfies Blackwell's property, hence is an approachability strategy. Bounds follow from Corollary 1.15.
Once the reduction from external regret minimization to approachability of R^A_− has been made, the existence of externally consistent strategies is immediate, because R^A_− is obviously a convex approachable set. Indeed, for every y ∈ ∆(B), there exists x ∈ ∆(A) such that r(x, y) ∈ R^A_−: it suffices to take for x any best response to y. The most interesting feature of Proposition 4.1 is that the strategy is very simple and natural: the more regret a specific action has induced, the more it should be played (with a weight exactly proportional to the regret generated).
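This proportional rule can be sketched in a few lines; a minimal sketch where the interface (payoff(a, b) ∈ [0, 1], nature(n) returning Nature's action at stage n) is our assumption, not the authors' notation:

```python
import random

def regret_matching(payoff, A, rounds, nature):
    """Play each action with probability proportional to its cumulative
    positive regret (arbitrary, here uniform, when no regret is positive);
    returns the average positive regrets after the given number of rounds."""
    regret = [0.0] * A
    for n in range(rounds):
        total = sum(max(r, 0.0) for r in regret)
        if total > 0:
            probs = [max(r, 0.0) / total for r in regret]
            a = random.choices(range(A), weights=probs)[0]
        else:
            a = random.randrange(A)
        b = nature(n)
        # accumulate the external regret vector r(a_n, b_n)
        for ap in range(A):
            regret[ap] += payoff(ap, b) - payoff(a, b)
    return [max(r, 0.0) / rounds for r in regret]
```

Against a constant opponent the average positive regret quickly vanishes, as the mass concentrates on the best response.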
Generalizations to the compact case (when Nature chooses at stage n ∈ N an outcome vector U n ∈ [0, 1] A ) are immediate and omitted.
Remark 4.1 One might argue that with the exponential weights algorithm, the dependency in A in rates of convergence shrinks to log(A) instead of √A, so the strategy we exhibited might not be optimal. Actually, this argument is flawed: rates of convergence are indeed optimal, since we minimized the ℓ2-norm of the regret. It is only the ℓ∞-norm of the regret that can be upper bounded with √(log(A)/n).
Actually, Hart & Mas-Colell's strategy is an approachability strategy of R^A_− driven by the potential Φ(z) = ‖z^+‖² (which represents the ℓ2-norm of the regret), while exponential weights are driven by the soft-max potential Φ(z) = (1/η) log Σ_{a∈A} e^{ηz_a}, which is a twice differentiable surrogate of ‖z^+‖_∞. However, minimization of the infinite norm of the regret can also be reduced to approachability, see Proposition 4.2 below (following actually an idea of Blackwell [10]).
Then any approachability strategy of C (which is a convex approachable set) minimizes the ℓ∞-norm of the external regret. Proof: Convexity of C (which is actually a polytope, i.e., the intersection of a finite number of half-spaces and a compact set) is a direct consequence of its definition. Approachability of C is immediate: for every U ∈ U, choosing a to be an index of a highest component of U ensures that g(a, U) = (U^a, U) belongs to C. It remains to prove the inequalities.
Notice that if we denote by (z̄_n, Ū_n) the average vector payoff at stage n, then Ū_n is the average outcome vector and z̄_n is the average actual payoff. As a consequence, the ℓ∞-norm of the regret, ‖r̄_n^+‖_∞ = max_{a∈A} Ū_n^a − z̄_n, is exactly equal to the distance between (z̄_n, Ū_n) and (max_{a∈A} Ū_n^a, Ū_n). By definition, the latter belongs to C, therefore one has d_C(ḡ_n) ≤ ‖r̄_n^+‖_∞. Let a* ∈ argmax_{a∈A} Ū_n^a and (z̄_n^c, Ū_n^c) = Π_C(z̄_n, Ū_n); the converse inequality follows, where we use the fact that U ↦ max_{a∈A} U^a is 1-Lipschitz.
Extensions to the case where A ⊂ R A is a compact convex set are immediate as the finiteness of A is not used in the proof.
Actually, Blackwell proved this result in the finite case, where Nature chooses an action in B; in that case, stage payoffs are g′(a, b) = (ρ(a, b), δ_b) ∈ R × ∆(B) where, as usual, ∆(B) is seen as a subset of R^B. The target set is defined accordingly and, since ‖g′(a, b)‖ ≤ √2, approachability results imply that ḡ′_n converges to C′ at a rate of order √(2/n); thus the expected regret is bounded in the order of √(B/n) (because in this framework, y ↦ ρ(a, y) is √B-Lipschitz and not 1-Lipschitz). This shows that regret can be bounded, not only with respect to the number of the player's actions (i.e., in √(log(A)/n)), but also with respect to Nature's (in √(B/n)). This might lead to some improvement if the former is exponentially larger than the latter.

Remark 4.2
In the compact case, usual proofs show that Blackwell's approachability strategy ensures that d_C(ḡ_n) ≤ ‖g‖_∞/√n = √((A + 1)/n). However, there exists a consistent strategy such that ‖r̄_n^+‖_∞ ≤ 3√(log(A)/n). So this is an example where the optimal dimension dependency of rates of approachability is not ‖g‖_∞, but much smaller. There are two possible explanations: either minimizing step by step the ℓ2-distance (i.e., following Blackwell's strategy) is not optimal, or some important facts are hidden within the proofs. In Remark 4.1 we claimed that the answer was the first possibility: indeed, the final objective was to minimize the ℓ∞-distance, so minimizing the ℓ2-norm must induce an additional dimension-dependent constant. This is not the case here, because the final objective is within a constant of the ℓ2-distance.
An open and fairly natural question is whether the dimension-dependent term should depend on the specific target set C or not. In these examples, the respective sizes of the target sets C within the set of feasible payoff vectors are rather intriguing. For instance, in the framework of Proposition 4.1, the volume of co{g(a, b)} is 2^A times the volume of C, while it is only A + 1 times the volume of C in the framework of Proposition 4.2. This has to be compared with the respective sizes of the dimension-dependent constants, which were √A and √(log(A)).
We now turn to the minimization of the internal regret, with R_n = R(a_n, b_n). We recall that it is an A × A matrix whose (a, a′) component is ρ(a′, b_n) − ρ(a, b_n) if a = a_n and 0 otherwise. The generalization of Hart & Mas-Colell's strategy appeals to the concept of invariant measures of matrices.
A probability distribution λ ∈ ∆({1, ..., d}) is an invariant measure of some d × d matrix M with non-negative coefficients if, for every j, Σ_i λ_i M_{i,j} = λ_j Σ_i M_{j,i}. Their existence is a consequence of the Perron-Frobenius theorem (this also generalizes the usual invariant measures of Markov chains, see e.g. Seneta [69]).
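Concretely, such an invariant measure can be obtained from the stationary distribution of the row-normalized (stochastic) matrix; a minimal sketch for matrices with strictly positive entries (so that power iteration converges; the function name and the reduction via µ_j/r_j are ours):

```python
def invariant_measure(M, iters=500):
    """Invariant measure of a matrix M with positive entries: a probability
    vector lam with sum_i lam[i]*M[i][j] = lam[j]*sum_k M[j][k] for every j
    (inflow equals outflow, as for Markov chains).  It is recovered from the
    stationary distribution mu of the row-normalized matrix Q via
    lam[j] proportional to mu[j]/r[j]."""
    d = len(M)
    r = [sum(row) for row in M]                       # row sums (all positive)
    Q = [[M[i][j] / r[i] for j in range(d)] for i in range(d)]
    mu = [1.0 / d] * d
    for _ in range(iters):                            # power iteration mu <- mu Q
        mu = [sum(mu[i] * Q[i][j] for i in range(d)) for j in range(d)]
    lam = [mu[j] / r[j] for j in range(d)]
    s = sum(lam)
    return [x / s for x in lam]
```

The balance condition can be checked directly on the output, coordinate by coordinate.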
Sorin [71], but also Hart & Mas-Colell [31] and Foster & Vohra [25], used the existence of invariant measures to exhibit a simple internally consistent strategy.
Proof: As for external regret, we just need to prove that σ is exactly Blackwell's approachability strategy to the negative orthant. And again, this is a consequence of a geometric property: any invariant measure λ of any matrix M with non-negative coefficients satisfies the required inequality, no matter the choice of Nature's action, precisely because λ is an invariant measure of M.
Since x_{n+1} = σ(h_n) is an invariant measure of R̄_n^+, the geometric property implies that σ satisfies Blackwell's property, hence is an approachability strategy; bounds follow from Corollary 1.15.
Once again, using approachability theory to prove the existence of internally consistent strategies is immediate: the negative orthant satisfies Blackwell's property. An interesting feature of this algorithm is the simple characterization of this strategy, which is optimal for the minimization of the ℓ2-norm.
Interestingly, the reduction from externally to internally consistent strategies (see Section 2.1.3 or Stoltz & Lugosi [75]), run with the algorithm of Proposition 4.1, constructs exactly the strategy of Proposition 4.3.
So both Propositions 4.1 and 4.3 can be unified into the following theorem that deals more generally with Φ-regret. It exhibits a strategy with the same complexity as the previous internally consistent strategy, dictating to play at each stage an invariant measure of some matrix. Given a family Φ, we recall that the Φ-regret at stage n is denoted by R̄_n^Φ ∈ R^{|Φ|}; the strategy ensures that, for every η > 0, P_{σ,τ}(sup_{N≥n} ‖(R̄_N^Φ)^+‖ ≥ η) ≤ 3 exp(−η²n/64A_Φ) as soon as η²n/32A_Φ ≥ 1.
Proof: The proof follows closely those of Propositions 4.1 and 4.3. Indeed, one just has to prove that this strategy is an approachability strategy to the negative orthant. If one denotes U = ρ(·, b), then the geometric property holds since λ is an invariant measure of Θ_Φ(M). As a consequence, this strategy is exactly Blackwell's approachability strategy of the negative orthant. The result comes from the fact that R^Φ(a, b) has at most A_Φ non-zero components, each one in [−1, 1], thus ‖R^Φ(a, b)‖² ≤ A_Φ.

Remark 4.3
As usual, if techniques from approachability in infinite dimension are used instead of regular approachability, the term in √A for external and internal regret, or √A_Φ for Φ-regret, can be replaced respectively by √(log(A)) or √(A log(A)), up to some constant.

From approachability to regret; the infinite case
We turn in this section to the case where the action set A is no longer finite but some convex compact metric set, and where at stage n ∈ N Nature chooses a mapping U_n : A → [0, 1] in a set U of equicontinuous mappings. We show how previous results can be extended to this compact case (indeed, the Arzelà-Ascoli theorem ensures that U is relatively compact).

Proof: Consider an auxiliary game where the action sets of the player and Nature are A and U. Choices of a ∈ A and U ∈ U generate a payoff U[a] ∈ L²(Φ_c, λ), where λ is some fixed probability distribution over (Φ_c, ‖·‖_∞) endowed with the Borelian σ-field. The target set is not excludable by Nature; indeed, for any U ∈ U, there exists a ∈ A (any global maximizer of U) such that U[a] belongs to C. Thus it is approachable by the player, and any approachability strategy has no φ-regret, for λ-almost all mappings φ ∈ Φ_c. However, Φ_c is separable (see Rudin [67] or Stoltz & Lugosi [76]), so there exists a countable dense subset {φ_k ; k ∈ N} of Φ_c; the corresponding probability distribution we consider is λ = Σ_{k∈N} 2^{−k} δ_{φ_k}. Since U is a family of equicontinuous mappings, every mapping U ∈ U shares the same modulus of continuity ω(·); this means that, for every ε > 0, there exists δ := ω(ε) such that if d(a, a′) ≤ δ then |U(a) − U(a′)| ≤ ε, for any mapping U ∈ U. Given φ ∈ Φ_c, there exists φ_k such that ‖φ − φ_k‖ ≤ δ. Since σ has no φ_k-regret, its φ-regret is asymptotically smaller than ε, for every ε > 0; thus it has no Φ_c-regret.

Proof: Every U ∈ U is upper-semicontinuous over a compact set, hence admits a maximum. Therefore U is uniformly bounded and the set C is approachable, with respect to some probability distribution λ that remains to be defined. Denote by U_1, ..., U_m the extreme points of U. As they are upper-semicontinuous and bounded, there exists a countable subset {a_k ; k ∈ N} ⊂ A such that, for every ε > 0 and every a ∈ A, there exists a_k satisfying U_i(a_k) ≥ U_i(a) − ε, for every i ∈ {1, ..., m}. Define λ as any probability measure whose support is exactly this countable subset. The rest of the proof follows the one of Theorem 4.2.
In the finite case, approachability theory not only provides a quick and easy proof of the existence of consistent strategies, but also exhibits some of them explicitly. In fact, playing somehow proportionally to the positive part of the regret is still externally consistent in the compact case. Let λ be any positive probability measure on {a_k ; k ∈ N}, a countable dense subset of A, and denote by r̄_n^+[a_k] the external regret at stage n induced by action a_k. Consider the strategy that chooses a_k at stage n + 1 with probability proportional to λ_k r̄_n^+[a_k]. Then, as in the finite case, one can easily show that the geometric property holds; approachability in infinite dimension (along with the density argument) ensures that this strategy has no external regret.
Concerning Φ-regret, one cannot simply play according to an invariant measure of some infinite-dimensional matrix, as its existence is not ensured. However, it is still possible to discretize A finitely to get a Φ-regret smaller than ε, with ε arbitrarily small (or even equal to 0, if ε is taken as a decreasing sequence, see Proposition 1.7).
Let ω_U(·) be the common modulus of continuity of the mappings in U and let Â be a finite ω_U(ε)-grid of A. For any φ ∈ Φ, we define φ̂ : A → Â by φ̂(a) = argmin_{a′∈Â} d(φ(a), a′), with ties broken arbitrarily. As a consequence, a discretization bound holds for every a ∈ A, U ∈ U and non-negative q ∈ L²(Φ, λ). We define, for any (a, a′), Θ[q]_{a,a′} := ∫_{Φ_{a,a′}} q dλ, where Φ_{a,a′} := {φ ∈ Φ s.t. φ̂(a) = a′}. Let x be any invariant measure of the matrix Θ[q]; then the required inequality holds. This proves that L²_−(Φ_c, λ) is a B-set, hence approachable. We can only claim that the strategy we exhibited has some flavor of invariant measures.

Using regret to get calibration
We show in this section that finite calibration can easily be understood in terms of internal regret. The first idea goes back to Foster & Vohra [23] and it has been somehow clarified by Sorin [71]. Recall that, in finite calibration, Nature chooses at stage n an outcome ω_n ∈ Ω. The player formulates a prediction on ω_n by choosing a probability distribution p[ℓ_n] ∈ ∆(Ω) that must belong to a finite grid {p[ℓ] ; ℓ ∈ L}.

Theorem 4.3 There exists a strategy σ calibrated with respect to the grid {p[ℓ] ; ℓ ∈ L} such that, no matter the strategy τ of Nature, the calibration score is controlled in terms of the diameter of the grid.
Proof: The proof uses a fact (obtained simply by expanding sums) that holds for any sequence (q_m) and every ℓ, k ∈ L. Now consider the game with action spaces L and Ω where choices of ℓ and ω generate the payoff ρ(ℓ, ω) = −‖ω − p[ℓ]‖². An internally consistent strategy satisfies, by definition, that no action ℓ would have been better replaced by some other action k; this, along with the basic fact, shows that any internally consistent strategy is calibrated with respect to the grid {p[ℓ] ; ℓ ∈ L}. Rates of convergence follow from those of internal consistency.
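For a binary outcome this reduction can be implemented directly; a minimal sketch (all names are ours) where the internally consistent strategy plays an invariant measure of the matrix of positive internal regrets in the auxiliary game with payoff ρ(ℓ, ω) = −(ω − p[ℓ])², the invariant measure being computed by power iteration on a stochastic matrix with self-loops:

```python
import random

def calibrated_forecaster(outcomes, grid):
    """Calibration via internal regret: the forecasts p[l] lie on a finite
    grid of [0, 1] and the outcomes w are in {0, 1}."""
    L = len(grid)
    D = [[0.0] * L for _ in range(L)]   # cumulative internal regrets (l -> k)
    forecasts = []
    for w in outcomes:
        plus = [[max(D[l][k], 0.0) for k in range(L)] for l in range(L)]
        mu = 1.0 + max(sum(row) for row in plus)
        # stochastic matrix with self-loops: any stationary x satisfies the
        # balance condition x_l * sum_k plus[l][k] = sum_k x_k * plus[k][l],
        # i.e. x is an invariant measure of the positive-regret matrix
        Q = [[plus[l][k] / mu + ((1.0 - sum(plus[l]) / mu) if k == l else 0.0)
              for k in range(L)] for l in range(L)]
        x = [1.0 / L] * L
        for _ in range(50):
            x = [sum(x[l] * Q[l][k] for l in range(L)) for k in range(L)]
        l = random.choices(range(L), weights=x)[0]
        forecasts.append(grid[l])
        for k in range(L):  # regret of having played l instead of k
            D[l][k] += (w - grid[l]) ** 2 - (w - grid[k]) ** 2
    return forecasts
```

Against i.i.d. outcomes the empirical frequency conditional on each frequently used forecast approaches that forecast, up to the grid step.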
We stress that we actually proved a stronger result than required: the calibration score converges almost surely to zero, at a rate independent of Nature's strategy.

Remark 4.4
This proof of calibration highlights the following fact. It does not really matter that ω_m belongs to a finite set Ω and that the p_n are probability distributions over Ω. Indeed, one can just assume that the sequences ω_n and p_n belong to some compact subset of a Euclidean space, and the relevant quantity is upper bounded optimally by the exponential weights algorithm. We could as well have defined calibration in terms of the ℓ2-norm of this vector; as in regret minimization, playing an invariant measure could then improve the bounds.
The next proposition states that, quite surprisingly, there exist ε-calibrated strategies with rates of convergence independent of ε (and even of Ω, for a slightly weaker notion).
Moreover, this strategy is ε-calibrated, with a rate of convergence independent of ε, since an additional bound holds for every n ∈ N. Proof: Let ε be fixed; the strategy considered is simply a calibrated strategy with respect to some well-chosen grid of ∆(Ω). Recall that ∆(Ω) is written as a subset of R^d with d = |Ω|. Denote by e_k the unit vector of R^d whose components are all zero except the k-th, which is one. The regular grid considered is indexed by L_ε. Consider the game introduced in the proof of Theorem 4.3, except that choices of ℓ_n ∈ L_ε and ω_n generate an internal regret R′_n. As a consequence, the bound follows from the simple fact concerning averages of norms.
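The regular grid of the simplex used here can be enumerated explicitly; a minimal sketch (the integer representation and the function name are ours), listing all points (k_1/K, ..., k_d/K) with non-negative integers k_i summing to K:

```python
def simplex_grid(d, K):
    """All integer compositions (k_1, ..., k_d) with k_i >= 0 summing to K;
    dividing by K yields the regular grid with step 1/K of the simplex
    of R^d."""
    if d == 1:
        return [(K,)]
    return [(k,) + rest
            for k in range(K + 1)
            for rest in simplex_grid(d - 1, K - k)]
```

For instance, the grid of ∆({1, 2, 3}) with step 1/2 has C(4, 2) = 6 points.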
Same arguments as in the proof of Proposition 4.3 yield that playing, at stage n + 1, any invariant measure of the corresponding positive-regret matrix gives the result; high-probability bounds are classic consequences of concentration inequalities. In fact, Theorem 4.4 slightly improves the result of Mannor & Stoltz [50].

Rakhlin, Sridharan & Tewari [66] wrote the calibration problem in terms of a generalized regret, see Section 2.2.2. Formally, assume that the action spaces are respectively ∆(Ω) and Ω and that the stage game payoff is null, i.e., g(p, ω) = 0. The class of departure functions considered is {ξ_{p,λ} ; p ∈ ∆(Ω), λ > 0} where ξ_{p,λ} : ∆(Ω) × Ω → R^Ω and the evaluation mappings B_n : (R^Ω)^n → R are defined, for every n ∈ N, by ξ_{p,λ}[g](p_n, ω_n) = 1{‖p_n − p‖_1 ≤ λ}(p_n − δ_{ω_n}) and B_n(Z_1, ..., Z_n) = (1/n) Σ_{m=1}^n Z_m.
As a consequence, one easily sees that this generalized regret is upper bounded by the calibration score.

Using Approachability to get (smooth and generalized) Calibration
In this section, we show that recent results in calibration can be rewritten solely as the existence or construction of some approachability strategy. The first result we exhibit is a generalization both of a previous one of Perchet [60] (since the strategy is calibrated with respect to much larger families) and of Rakhlin, Sridharan & Tewari [66] (because the proof is constructive and not horizon dependent). Although the family of ℓ∞-balls is infinite, the number of different possible intersections of such a ball with the grid L_ε is obviously finite (it is trivially bounded by the number of its subsets, 2^{|L_ε|}). However, an ℓ∞-ball B_∞(p, λ) is rectangular and can be described by two extreme points: the lowest corner p − Σ_{k=1}^d λe_k and the highest corner (in every direction) p + Σ_{k=1}^d λe_k. The grid L_ε is regular, so this characterization carries over to intersections with ℓ∞-balls: they too are characterized by two extreme points. As a consequence, there are at most |L_ε|² ≤ ε^{−2d} different possible intersections. Consider a fixed family of ℓ∞-balls that induces exactly these different intersections, and denote it {B_∞(p[k], λ_k) ; k ∈ K}.
We introduce an auxiliary game with action spaces L_ε and Ω, a suitable payoff mapping, and consider the closed and convex target set C := B_∞(0, ε) ⊂ (R^d)^K.
Given q ∈ ∆(Ω), the pure action ℓ corresponding to a point of the grid p[ℓ] such that ‖p[ℓ] − q‖_∞ ≤ ε ensures that g(ℓ, q) belongs to C, which is therefore approachable. Moreover, since C is rectangular, one can use the approachability strategy of Corollary 1.16, adapted to a soft-max potential. Given N ∈ N such that N ≥ 2ed, a suitable choice of ε/4 = η then yields the announced bound; as usual, playing by blocks of increasing size 2^m (starting at m such that 2^m ≥ ed), the last two displays ensure the result for every n ∈ N, by construction. If d ≥ 3, since |L_ε|² ≤ ε^{−2d}/d!, the constants in Theorem 4.5 can be lowered if one is only interested in the asymptotic behavior. This result holds almost surely since, using concentration inequalities, it holds with P_{σ,τ}-probability at least 1 − δ. The statement concerns ℓ∞-balls; however, it is also possible to show that for other ℓp-balls, the number of possible intersections with the grid is polynomially bounded (see Rakhlin, Sridharan & Tewari [66]). Thus the result holds, up to some polynomial term in |Ω|, for any other ℓp-norm. This technique could actually have been used to prove Theorem 4.4, a similar result with respect to the family of Borel sets. The difference is that the number of possible intersections between Borel sets and our grid would have been of the order of 2^{1/ε^d}. After taking the logarithm, equalizing the three remaining terms in the regret, ε, η and 1/(ε^d ηn), yields ε = η = n^{−1/(d+2)}; this would have been the bound on the expected regret.
We now turn to calibration with checking rules and smooth calibration, and we show that they too can be reduced to approachability problems. We recall that, given a pair of mappings U and T, we defined the set of active stages, the empirical probability of tested events and the average predicted conditional probability of tested events. If a checking rule is independent of current predictions, then the same definitions hold with (p_m, ω_m) ∈ U(h_n) (resp. in T(h_n)) replaced by ω_m ∈ U(h_n) (resp. in T(h_n)).
Theorem 4.6 Let λ be a probability distribution on the set of checking rules independent of current predictions. Then there exists a deterministic strategy σ that is calibrated with respect to λ-almost every checking rule, with an explicit P_{σ,τ}-almost-sure rate. Proof: The proof relies essentially on approachability with activation in infinite dimension.
We define an auxiliary game where the payoff is a random variable over the set of checking rules independent of current predictions. The action set of the player is reduced to ∆(Ω)°, the interior of ∆(Ω) (so that conditional probabilities are well defined), and the payoff at stage n is 1{ω_n ∈ T(h_{n−1})} − p{T(h_{n−1})|U(h_{n−1})} if the coordinate (U, T) is active, i.e., if n ∈ N_n[U, T]. By definition, the average payoff at stage n is exactly ω̄_n[U, T] − p̄_n[U, T], and we shall construct a strategy σ that approaches the convex set {0}; that is, using Theorem 1.6, find p_{n+1} ∈ ∆(Ω)° such that, for every ω ∈ Ω, the corresponding expression is less than or equal to zero (or at least smaller than ε_n = 1/n²). To construct this p_{n+1}, we consider the game with payoff g defined on ∆(Ω)° × Ω; the integrals in the last two displayed equations coincide, so we just need to prove that there exists p_{n+1} ∈ ∆(Ω)° such that g(p_{n+1}, ω) ≤ ε_n for every ω ∈ Ω or, more generally, that inf_{p∈∆(Ω)°} sup_{ω∈Ω} g(p, ω) ≤ 0.
Since g(·, ω) might not be continuous, Lemma 5.1 does not apply. However, g is bounded and defined over ∆(Ω)° × Ω, the former being measurable and the latter finite. Therefore (see Sorin [70], Theorem A.9) this game has a value in mixed actions, and this value has to be smaller than 0 since g(p, p) = 0 for every p ∈ ∆(Ω)°.

Proposition 4.6 Let λ be a probability distribution on the set of checking rules. Then there exists a strategy σ that is calibrated with respect to λ-almost every checking rule, with a similar P_{σ,τ}-almost-sure guarantee.
The last similar reduction to approachability concerns smooth calibration. The same result holds if one adds checking rules independent of current predictions.
Proof: The set of continuous mappings from ∆(Ω) to R_+ is separable, and we denote by λ a probability distribution with support {g_k ; k ∈ N}, a dense countable family. Following the lines of the proof of Theorem 4.6, we define ω̄_n[g_k] = Σ_{m=1}^n g_k(p_m)ω_m / n and p̄_n[g_k] = Σ_{m=1}^n g_k(p_m)p_m / n.
Then, Corollary 1.4 ensures the existence of an approachability strategy such that, for every k ∈ N, ω̄_n[g_k] − p̄_n[g_k] converges to zero. Indeed, one just has to prove that {0} ⊂ L² is approachable, i.e., that for every n ∈ N there exists p_{n+1} ∈ ∆(Ω) such that a suitable inequality holds no matter ω ∈ Ω, with the convention that 0/0 = 0. The existence of such a p ∈ ∆(Ω) is again a consequence of Ky Fan's inequality, generalized in Lemma 5.4.
Since {g_k ; k ∈ N} is a dense family, ω̄_n[g] − p̄_n[g] must necessarily converge to zero for every continuous mapping g.
A close look at the first proof of existence of deterministic smooth calibrated strategies, due to Kakade & Foster [37], shows that they also constructed an ε-approachability strategy (and then used a doubling trick). We proposed here a direct (and maybe more intuitive) proof.

Using calibration to get regret and approachability
Calibration in some auxiliary game can be seen as a useful tool to construct strategies that satisfy another criterion, such as approachability, internal consistency, and so on. This idea goes back to Foster & Vohra [23] and was used, recently, by Perchet [59,61,62]; in particular, it is useful in a specific case of generalized regret (see Section 2.2.2) defined below.
But first, we focus on usual internal regret in the finite case (although it can be generalized immediately when B is any compact set). Recall that a strategy is internally consistent if the limit superior of the internal regret is non-positive; this criterion coincides with a calibration score up to a factor 2. As a consequence, any weighted-calibrated strategy with respect to {ρ(a, ·), ‖ρ(a, ·)‖² ; a ∈ A} is internally consistent. Since the scores are actually exactly the same, rates of convergence of weighted calibration give rates for regret minimization.
We now turn to generalized regret. Assume that A and B are two compact and convex sets and let G : A × B → R be any fixed evaluation mapping, which might not be linear in any of its coordinates. In this framework, a strategy has no G-external regret if lim sup_{n→∞} sup_{a*∈A} G(a*, b̄_n) − G(ā_n, b̄_n) ≤ 0.
To define internal regret, assume that a strategy only uses a finite number of actions in A, say A_L = {a[ℓ] ; ℓ ∈ L}, so that σ is actually a mapping from the set of finite histories into L, and ℓ_n = ℓ means that action a[ℓ] is played at stage n; the (L, ε)-internal regret is defined accordingly.

Proposition 4.7 If G is continuous, then for every ε > 0 there exists an (L, ε)-internally consistent strategy. However, there might not exist any (ε-)externally consistent strategy.
The first case implies that G(a, b[ℓ]) − G(a, b̄_n[ℓ]) ≤ ε/2 for every a ∈ A; thus, in both cases, one has after stage N the bound which characterizes an (L, ε)-calibrated strategy.
It remains to prove that externally consistent strategies might not exist. Define G(a, b) = (1 − 4b)a, for every a ∈ [0, 1] and b ∈ [0, 1], and assume that during the first N stages (with N large enough) b_n = 0; necessarily ā_N is arbitrarily close to 1. During the next N stages, set b_n = 1; then ā_{2N} is at least 1/2, thus the external regret is at least 1/2.
We now show how to construct an ε-approachability strategy via calibration. Given a closed convex set C ⊂ R^d and a vector payoff mapping g : ∆(A) × ∆(B) → R^d, define G(x, y) = −d_C(g(x, y)) for every x ∈ ∆(A) and y ∈ ∆(B). If C is approachable, then Blackwell's condition ensures that sup_{x*∈∆(A)} G(x*, y) = 0 for every y ∈ ∆(B). By convexity of d_C and the triangle inequality, the distance of the average payoff to C is controlled by two sums, both of which converge almost surely to zero: the first because σ has no internal regret (with respect to G) and the second because of concentration inequalities, since g(x[ℓ], b_n) = E[g(a_n, b_n)]. One can resort to the doubling trick (since uniform speeds of convergence are easily derived) to get an approachability strategy.

Using regret to get approachability
We proved in the last section how calibration and generalized regret can be used to construct approachability strategies, as noticed by Perchet [59] or Rakhlin, Sridharan & Tewari [66]. A completely different link can also be formulated between regret and approachability, as discovered recently by Abernethy, Bartlett & Hazan [1]. We recall that Blackwell's strategy consists in playing, at stage n + 1, optimally in the zero-sum projected game ⟨g(x, y) − π_C(ḡ_n), ḡ_n − π_C(ḡ_n)⟩. Abernethy, Bartlett & Hazan [1] proposed to use a regret minimization scheme to determine, stage by stage, in which projected game to play (i.e., not necessarily along the direction ḡ_n − π_C(ḡ_n)).
The formulation is rather simple when C = {0} ⊂ R^d, so we will focus only on this case. It can however be generalized to any convex cone, and therefore to any convex set in R^d (seen as a section of a convex cone in R^{d+1}). The basic idea is to notice that, for C = {0} and every n ∈ N, d_C(ḡ_n) = ‖ḡ_n‖ = sup_{θ∈B(0,1)} ⟨θ, ḡ_n⟩, where B(0,1) = {θ ∈ R^d ; ‖θ‖_2 ≤ 1}.
Assume that at stage m the player played optimally in the projected game along the direction θ_{m−1}. Since C is approachable, this zero-sum game has a negative value, hence ⟨θ_{m−1}, E[g_m]⟩ ≤ 0. As a consequence, E[d_C(ḡ_n)] = E[‖ḡ_n‖] ≤ E[sup_{θ∈B(0,1)} ⟨θ, ḡ_n⟩ − (1/n) Σ_{m=1}^n ⟨θ_{m−1}, g_m⟩].
The term inside the expectation can be seen as an external regret if the player's and Nature's action sets are respectively B(0,1) and {g(a, b) ; (a, b) ∈ A × B}. As a consequence, an approachability strategy can indeed be described as a two-step procedure. At any stage n, choose, in a first step, a direction θ_n ∈ B(0,1) following any regret minimization algorithm. Then, in a second step, play optimally in the projected zero-sum game along θ_n.
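This two-step scheme can be sketched on a toy example; here we assume (our choice, not the authors') the payoff g(a, b) = e_a − e_b in R^d, for which C = {0} is approachable, take online gradient ascent over the unit ball as the regret minimizer of the first step, and note that the optimal play of the second step is simply a_n ∈ argmin_a θ_n[a]:

```python
import math

def approach_zero(nature_actions, d):
    """Approach C = {0} for the payoff g(a, b) = e_a - e_b in R^d.
    Step 1: choose a direction theta in B(0, 1) by online gradient ascent on
    theta -> <theta, g_n>.  Step 2: play optimally in the projected zero-sum
    game <theta, g(., .)>, i.e. the coordinate minimizing theta.
    Returns the average payoff vector, which should be close to 0."""
    theta = [0.0] * d
    gbar = [0.0] * d
    for n, b in enumerate(nature_actions, start=1):
        a = min(range(d), key=lambda i: theta[i])   # step 2, at current theta
        g = [0.0] * d
        g[a] += 1.0
        g[b] -= 1.0
        gbar = [(n - 1) / n * x + y / n for x, y in zip(gbar, g)]
        # step 1: gradient step on <theta, g>, then project onto B(0, 1)
        eta = 1.0 / math.sqrt(n)
        theta = [t + eta * y for t, y in zip(theta, g)]
        norm = math.sqrt(sum(t * t for t in theta))
        if norm > 1.0:
            theta = [t / norm for t in theta]
    return gbar
```

Each stage payoff satisfies ⟨θ_{n−1}, g_n⟩ = θ[a_n] − θ[b_n] ≤ 0 by the choice of a_n, so ‖ḡ_n‖ is bounded by the average regret of the gradient scheme.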
Blackwell's strategy dictates to choose (in the first step) the direction θ_n that maximizes ⟨θ, ḡ_n⟩; in other words, this is precisely the follow-the-leader algorithm, which does not guarantee a shrinking regret (in full generality). The key point to understand this feature is that, by definition of the second step, ⟨θ_m, E[g_{m+1}]⟩ is always non-positive (no matter the choice of θ_m); so, in this auxiliary game, Nature is in fact very restricted in her choice of actions and, what is even more intricate, these restrictions depend on the player's move.

Equivalently, P(‖Z̄_T‖ ≥ φ^{−1}(δ)/√T) ≤ δ with φ(x) := (1 + x/K) exp(−x²/2K²). Actually, a weak maximal version of this inequality holds: for d = 1 one can define φ(x) = 2 exp(−x²/2), and φ(x) = 2 exp(−x²/4) otherwise. Stronger maximal inequalities for averages of martingale differences exist.

Lemma 5.2 Let (Z_t) be a martingale difference sequence with ‖Z_t‖ ≤ K; then a maximal bound holds for every δ > 0 and every integer T ≥ 1. Proof: Define ε_t = 2φ^{−1}(δt/4T)/√t. Using a peeling argument, one obtains the result.
Similarly, maximal inequalities can be derived for tail events. Lemma 5.3 Let Z_t ∈ R^d be a martingale difference sequence with ‖Z_t‖ ≤ K; then a tail bound holds for every ε > 0 and every integer T ≥ 1, and the exponential dependency in T can be reduced as soon as Tε²/2K² ≥ 1. Proof: Again, using a peeling argument, one obtains the first part of the result.
The second part of the proof follows from the same kind of arguments.