Payoff Performance of Fictitious Play

We investigate how well continuous-time fictitious play in two-player games performs in terms of average payoff, particularly compared to Nash equilibrium payoff. We show that in many games, fictitious play outperforms Nash equilibrium on average or even at all times, and moreover that any game is linearly equivalent to one in which this is the case. Conversely, we provide conditions under which Nash equilibrium payoff dominates fictitious play payoff. A key step in our analysis is to show that fictitious play dynamics asymptotically converges to the set of coarse correlated equilibria (a fact which is implicit in the literature).

Continuous-time fictitious play (FP) was first introduced by Brown [7,8], and it has since been a standard model for myopic learning, often used as a convenient reference algorithm due to its computational simplicity (see, for example, [11,31]). It has been shown to converge to Nash equilibrium in many important classes of games, such as zero-sum games [19], non-degenerate 2 × n games [3], non-degenerate quasi-supermodular games with diminishing returns or of dimension 3 × n or 4 × 4 [5,4], and others. On the other hand, convergence to Nash equilibrium (even when it is unique) is not guaranteed, as demonstrated by Shapley's famous example [27] of a 3 × 3 Rock-Paper-Scissors-like game with a stable limit cycle for FP.
The question therefore arises whether in the non-convergent case the payoff to the players along trajectories of FP compares favourably to Nash equilibrium payoff. In this paper we investigate the relation between Nash equilibrium payoff and average payoff along FP trajectories. In particular, we show that in many two-player games, FP may in the long run earn a higher payoff to both players than Nash equilibrium play, either on average, or even at all times. We also show that every two-player game is 'linearly equivalent' to one in which FP Pareto dominates Nash equilibrium (at all times, along every non-equilibrium FP orbit). Conversely, we give conditions under which FP is dominated by Nash equilibrium in terms of payoff, and show numerical examples for this (rather atypical) behaviour.
The paper is organized as follows. In Section 1 we introduce basic notation. In Section 2 we analyse the limiting behaviour of FP dynamics and show that FP converges to the so-called set of coarse correlated equilibria. In Section 3 we use this to compare the payoff along the limit sets with the Nash equilibrium payoffs. Ultimately, this allows us to show that every bimatrix game is linearly equivalent to one in which FP Pareto dominates Nash equilibrium, and we discuss the conditions governing the payoff comparison of the two. In Section 4 we present a particular family of 3 × 3 games in which FP yields a higher average payoff to both players than Nash equilibrium. In Section 5 we investigate the possibility of games in which Nash equilibrium play dominates FP. We deduce conditions for this and numerically determine examples in which it is the case. The discussion shows that such examples are relatively 'rare'. Finally, in Section 6 we discuss the implications of these results for notions of equilibrium (in the context of payoff performance of learning algorithms) and game equivalence.

Notation and standard facts
For A = (a_ij), B = (b_ij) ∈ R^{m×n}, we denote by (A, B) a bimatrix game with players A and B having pure strategies S_A = {1, ..., m} and S_B = {1, ..., n}. We call S = S_A × S_B the joint strategy space, and we call a probability distribution over S a joint probability distribution. By Σ_A ⊂ R^{1×m} and Σ_B ⊂ R^{n×1} we denote the (m − 1)- and (n − 1)-dimensional simplices of mixed strategies of the two players, and we implicitly identify the pure strategy i ∈ S_A with the ith unit vector in R^{1×m} and j ∈ S_B with the jth unit vector in R^{n×1}. We write Σ = Σ_A × Σ_B for the space of mixed strategy profiles. Note that Σ can be seen as a proper subset of the set of joint probability distributions.
The payoffs to players A and B from playing the pure strategy profile (i, j) ∈ S_A × S_B are a_ij and b_ij, respectively. By linearity, their expected payoffs from playing a mixed strategy profile (x, y) ∈ Σ = Σ_A × Σ_B are u_A(x, y) = xAy and u_B(x, y) = xBy.
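To make the bilinear payoff formulas concrete, the following minimal sketch (our own illustration, not part of the paper; the 2 × 2 Matching Pennies matrices are an illustrative assumption) evaluates u_A(x, y) = xAy and u_B(x, y) = xBy with plain Python lists.

```python
# Illustrative sketch: expected payoffs u_A(x, y) = xAy, u_B(x, y) = xBy.
# The Matching Pennies matrices below are an assumed standard example,
# not taken from this paper.

def expected_payoff(M, x, y):
    # Computes x M y for mixed strategies x (row player) and y (column player).
    return sum(x[i] * M[i][j] * y[j]
               for i in range(len(M)) for j in range(len(M[0])))

A = [[1, -1], [-1, 1]]   # Matching Pennies, player A (wants to match)
B = [[-1, 1], [1, -1]]   # Matching Pennies, player B (wants to mismatch)
x = [0.5, 0.5]           # uniform mixed strategies
y = [0.5, 0.5]
print(expected_payoff(A, x, y), expected_payoff(B, x, y))  # 0.0 0.0
```

At the mixed profile (1/2, 1/2) both expected payoffs vanish, as the bilinearity of u_A and u_B makes easy to verify by hand.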

The players' best response correspondences BR_A : Σ_B → Σ_A and BR_B : Σ_A → Σ_B are given by

BR_A(q) = argmax_{x ∈ Σ_A} xAq  and  BR_B(p) = argmax_{y ∈ Σ_B} pBy.

We further denote the maximal-payoff functions by

Ā(q) = max_{x ∈ Σ_A} xAq  and  B̄(p) = max_{y ∈ Σ_B} pBy;

the maximal payoff to player A given player B's strategy q is equal to the maximal entry of the vector Aq, and similarly for player B. For generic bimatrix games, the best response correspondences BR_A : Σ_B → Σ_A and BR_B : Σ_A → Σ_B are almost everywhere single-valued, with the exception of a finite number of hyperplanes. The singleton value taken by BR_A whenever it is single-valued is always a pure strategy of player A. When BR_A(q) is not a singleton, it is the set of convex combinations of a subset of {e_i : i ∈ S_A}, that is, a face of the simplex Σ_A, or possibly all of Σ_A. The analogous statement holds for BR_B.
It follows that Σ_A and Σ_B can be divided into respectively n and m regions (in fact, convex polytopes):

R^B_j = {p ∈ Σ_A : j ∈ BR_B(p)}, j ∈ S_B,  and  R^A_i = {q ∈ Σ_B : i ∈ BR_A(q)}, i ∈ S_A.

We will call R^A_i the preference region of strategy i for player A, as it is the subset of the second player's strategies against which player A expects the highest payoff by playing strategy i; similarly for R^B_j. For a generic game (A, B), the subset of Σ_B on which BR_A contains two distinct pure strategies i, i′ ∈ S_A (and hence all their convex combinations) is a codimension-one hyperplane piece of Σ_B:

Z^A_{ii′} = R^A_i ∩ R^A_{i′} ⊂ Σ_B,  and analogously  Z^B_{jj′} = R^B_j ∩ R^B_{j′} ⊂ Σ_A.

These hyperplane pieces are subsets of linear codimension-one subspaces of Σ_B and Σ_A, respectively. See Figure 1 for an illustration in the case n = m = 3. We call these sets the indifference sets of players A and B.

Definition 1.1. A mixed strategy profile (p̄, q̄) ∈ Σ is a Nash equilibrium, if p̄ ∈ BR_A(q̄) and q̄ ∈ BR_B(p̄). If a Nash equilibrium lies in the interior of Σ, it is called completely mixed.

Figure 1. Geometry of a 3 × 3 bimatrix game. The spaces of mixed strategies Σ_A and Σ_B are each a simplex spanned by three vertices (the pure strategies). Note the convex preference regions R^B_j ⊂ Σ_A and R^A_i ⊂ Σ_B, their intersections as indifference sets Z^B_{jj′} and Z^A_{ii′}, and the projections to Σ_A and Σ_B of the (in this case, unique) Nash equilibrium (E_A, E_B) at the intersection of all these sets.
The following lemma is a standard fact and easy to check.
Lemma 1.2. A mixed strategy profile (p̄, q̄) in the interior of Σ is a (completely mixed) Nash equilibrium of an m × n bimatrix game (A, B) if and only if, for all i, i′ = 1, ..., m and j, j′ = 1, ..., n,

(Aq̄)_i = (Aq̄)_{i′}  and  (p̄B)_j = (p̄B)_{j′}.

Note that this implies that E_A ∈ R^B_j and E_B ∈ R^A_i, for all i, j.

From the various ways to define continuous-time FP, we follow the approach taken in [15]. We define a continuous-time fictitious play process (p(t), q(t)) ∈ Σ, t ≥ 1, as a solution to the differential inclusion

(ṗ(t), q̇(t)) ∈ (1/t) · [ (BR_A(q(t)) × BR_B(p(t))) − (p(t), q(t)) ]   (1)

with some initial condition (p(1), q(1)) ∈ Σ (see, for example, [15,19]). Alternatively, as in [15], we can denote by x(t) and y(t) the strategies played by the two players at time t ≥ 0, where x : [0, ∞) → Σ_A and y : [0, ∞) → Σ_B are assumed to be measurable functions. We write the average (empirical) past play of the respective players from time 0 through t as

p(t) = (1/t) ∫_0^t x(s) ds  and  q(t) = (1/t) ∫_0^t y(s) ds.
Then continuous-time FP is given by the rule expressed in the following integral inclusions: x(t) ∈ BR_A(q(t)) and y(t) ∈ BR_B(p(t)) for t ≥ 1, and (x(t), y(t)) ∈ Σ arbitrary for 0 ≤ t < 1. Note that with an appropriate tie-breaking rule, x(t) and y(t) can be chosen to be pure strategies for any time t ≥ 1, since BR_A(q) and BR_B(p) each contain at least one pure strategy for any (p, q) ∈ Σ. Defined this way, (p(t), q(t)), t ≥ 1, is a solution of the differential inclusion (1) with initial condition (p(1), q(1)) = (∫_0^1 x(s) ds, ∫_0^1 y(s) ds).

Definition 1.3. Two bimatrix games (A, B) and (Ã, B̃) are linearly equivalent, if Ã can be obtained from A by multiplication with a positive constant c > 0 and addition of constants c_1, ..., c_n ∈ R to the matrix columns, and B̃ can be obtained from B by multiplication with d > 0 and addition of d_1, ..., d_m ∈ R to its rows:

ã_ij = c · a_ij + c_j  and  b̃_ij = d · b_ij + d_i,  for i = 1, ..., m and j = 1, ..., n.
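The FP rule above can be sketched numerically. The following is a minimal discrete-time simulation (our own illustration: an Euler-type discretisation of the continuous-time process, with a lowest-index tie-breaking rule so that x(t) and y(t) are always pure); the 3 × 3 matrices are one common presentation of Shapley's example [27] and should be treated as an assumption rather than the paper's exact normalisation.

```python
# Illustrative sketch: discrete-time fictitious play as an Euler-type
# discretisation of the continuous-time process described above.
# The Shapley-type matrices are an assumed example, not from this paper.

def best_response_row(A, q):
    # A pure best response i maximising (Aq)_i, lowest index on ties.
    vals = [sum(A[i][j] * q[j] for j in range(len(q))) for i in range(len(A))]
    return vals.index(max(vals))

def best_response_col(B, p):
    # A pure best response j maximising (pB)_j, lowest index on ties.
    vals = [sum(p[i] * B[i][j] for i in range(len(p))) for j in range(len(B[0]))]
    return vals.index(max(vals))

def fictitious_play(A, B, steps):
    m, n = len(A), len(A[0])
    p, q = [1.0 / m] * m, [1.0 / n] * n       # empirical averages p(t), q(t)
    for t in range(1, steps + 1):
        i = best_response_row(A, q)           # x(t) in BR_A(q(t))
        j = best_response_col(B, p)           # y(t) in BR_B(p(t))
        for k in range(m):                    # p <- p + (x - p)/(t+1)
            p[k] += ((1.0 if k == i else 0.0) - p[k]) / (t + 1)
        for k in range(n):                    # q <- q + (y - q)/(t+1)
            q[k] += ((1.0 if k == j else 0.0) - q[k]) / (t + 1)
    return p, q

A = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]         # assumed Shapley-type payoffs
B = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
p, q = fictitious_play(A, B, 10000)
```

The returned pair (p, q) is the empirical average play; in Shapley-type games it is expected to cycle rather than converge, in line with the non-convergence discussed above.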
The following lemma follows by direct computation.

Lemma 1.4. Two linearly equivalent games have the same best response correspondences. In particular, they have the same Nash equilibria, the same preference regions and indifference sets, and give rise to the same FP dynamics.

Limit set for FP
In this section we study the long-term behaviour of (continuous-time) FP. It has been known since Shapley's famous version of the Rock-Paper-Scissors game [27] that FP does not necessarily converge to a Nash equilibrium even when the latter is unique, and can converge to a limit cycle instead. In fact, convergence to a unique Nash equilibrium in the interior of Σ seems to be the exception rather than the rule: It is a standing conjecture that such a Nash equilibrium can only be stable for FP dynamics if the game is equivalent to a zero-sum game [19].
We will show that every FP orbit converges to (a subset of) the set of so-called 'coarse correlated equilibria', sometimes also referred to as the 'Hannan set' (see [14,31,16]). In fact, this result follows directly from the 'belief affirming' property of FP¹, shown in [22]. However, to the best of our knowledge, the conclusion that FP has its limit set contained in the set of coarse correlated equilibria has not been mentioned in the literature. We also provide a slightly different proof of this fact.
The following definition can be found in [24].

Definition 2.1. A joint probability distribution P = (p_ij) on S = S_A × S_B is a coarse correlated equilibrium (CCE), if for all i′ ∈ S_A and j′ ∈ S_B,

Σ_{i,j} a_ij p_ij ≥ Σ_{i,j} a_{i′j} p_ij  and  Σ_{i,j} b_ij p_ij ≥ Σ_{i,j} b_{ij′} p_ij.

One way of viewing the concept of CCE is in terms of the notion of regret. Let us assume that two players are (repeatedly or continuously) playing a bimatrix game (A, B), and let P(t) = (p_ij(t)) be the empirical joint distribution of their past play through time t, that is, p_ij(t) represents the fraction of time of the strategy profile (i, j) along their play. Then

[ Σ_{i,j} a_{i′j} p_ij(t) − Σ_{i,j} a_ij p_ij(t) ]_+

can be interpreted as the regret of the first player from not having played action i′ ∈ S_A throughout the entire past history of play. It is (the positive part of) the difference between player A's actual past payoff² and the payoff she would have received if she always played i′, given that player B would have played the same way as she did. Similarly, [ Σ_{i,j} b_{ij′} p_ij(t) − Σ_{i,j} b_ij p_ij(t) ]_+ is the regret of the second player from not having played j′ ∈ S_B. This regret notion is sometimes called unconditional or external regret to distinguish it from the internal or conditional regret³. In this context the set of CCE can be interpreted as the set of joint probability distributions with non-positive regret. It has been shown that there are learning algorithms with no regret, that is, such that asymptotically the regret of a player playing according to such an algorithm is non-positive for all her actions. Dynamically this means that if both players in a two-player game use a no-regret learning algorithm, the empirical joint probability distribution of actions taken by the players converges to (a subset of) the set of CCE (not necessarily to a certain point in this set).
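The regret formulas above translate directly into a membership test for the set of CCE. The sketch below is our own illustration (the Matching Pennies matrices are an assumption, not from the paper): it computes both players' unconditional regrets for an empirical joint distribution P and checks that they are all non-positive.

```python
# Illustrative sketch: unconditional (external) regrets of a joint
# distribution P = (p_ij), and the CCE test "all regrets <= 0".
# Matching Pennies is an assumed example game.

def regrets_A(A, P):
    m, n = len(A), len(A[0])
    actual = sum(A[i][j] * P[i][j] for i in range(m) for j in range(n))
    # Regret from not having always played i2 against B's marginal play:
    return [sum(A[i2][j] * P[i][j] for i in range(m) for j in range(n)) - actual
            for i2 in range(m)]

def regrets_B(B, P):
    m, n = len(B), len(B[0])
    actual = sum(B[i][j] * P[i][j] for i in range(m) for j in range(n))
    return [sum(B[i][j2] * P[i][j] for i in range(m) for j in range(n)) - actual
            for j2 in range(n)]

def is_cce(A, B, P, tol=1e-9):
    return (all(r <= tol for r in regrets_A(A, P)) and
            all(r <= tol for r in regrets_B(B, P)))

A = [[1, -1], [-1, 1]]
B = [[-1, 1], [1, -1]]
P_eq = [[0.25, 0.25], [0.25, 0.25]]    # product of the mixed equilibrium
P_pure = [[1.0, 0.0], [0.0, 0.0]]      # all mass on profile (1, 1)
print(is_cce(A, B, P_eq), is_cce(A, B, P_pure))  # True False
```

The product distribution of the mixed equilibrium passes the test, while the point mass fails it through player B's positive regret, matching the interpretation of CCE as distributions with non-positive regret.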
The concept of no-regret learning (also known as universal consistency, see [10]) and the first such learning algorithms have been introduced in [6,14]. More such algorithms have been found later on and moreover algorithms with asymptotically non-positive conditional regrets have been found (see, for example, [9,17,18]; for good surveys see [31,16]).
We now show that continuous-time FP converges to the set of CCE.
Theorem 2.2. In any bimatrix game (A, B), every FP orbit converges to the set of coarse correlated equilibria. More precisely, every limit point P = (p_ij) of the empirical joint distribution P(t) along an FP orbit satisfies

Σ_{i,j} a_ij p_ij ≥ Σ_{i,j} a_{i′j} p_ij  for all i′ ∈ S_A  and  Σ_{i,j} b_ij p_ij ≥ Σ_{i,j} b_{ij′} p_ij  for all j′ ∈ S_B,

with equality for at least one (i′, j′) ∈ S_A × S_B. In other words, FP dynamics asymptotically leads to non-positive (unconditional) regret for both players.

Remark 2.3.
(1) Note that an FP orbit (p(t), q(t)), t ≥ 1, gives rise to a joint probability distribution P(t) = (p_ij(t)) via p_ij(t) = p_i(t) · q_j(t). When we say that FP converges to a certain set of joint probability distributions, we mean that P(t) obtained this way converges to this set.
(2) In [22] a stronger result is proved: continuous-time FP is 'belief affirming' or 'Hannan-consistent'. This means that it leads to asymptotically non-positive unconditional regret for the player following it, irrespective of her opponent's play (even if the opponent is playing according to a different algorithm). We will only need the weaker statement and provide our own proof for the reader's convenience.
Proof. By the envelope theorem (see, for example, [29]), for p̄ ∈ BR_A(q(t)) we have that

(d/dt) Ā(q(t)) = p̄ A q̇(t) = (1/t) (x(t) A y(t) − Ā(q(t)))

for almost every t ≥ 1. We conclude that for T > 1,

(d/dt) [ t · Ā(q(t)) ] = x(t) A y(t),  and hence  T · Ā(q(T)) − Ā(q(1)) = ∫_1^T x(s) A y(s) ds,

and therefore

Ā(q(T)) − Σ_{i,j} a_ij p_ij(T) = (1/T) ( Ā(q(1)) − ∫_0^1 x(s) A y(s) ds ) → 0  as T → ∞,

where P(T) = (p_ij(T)) is the empirical joint distribution of the two players' play through time T, so that Σ_{i,j} a_ij p_ij(T) = (1/T) ∫_0^T x(s) A y(s) ds. On the other hand,

Ā(q(T)) ≥ (A q(T))_{i′} = Σ_{i,j} a_{i′j} p_ij(T)  for every i′ ∈ S_A,

with equality for any i′ ∈ BR_A(q(T)). Hence, every limit point P = (p_ij) of P(T) satisfies

Σ_{i,j} a_ij p_ij ≥ Σ_{i,j} a_{i′j} p_ij  for all i′ ∈ S_A, with equality for at least one i′.

By a similar calculation for B, we obtain

Σ_{i,j} b_ij p_ij ≥ Σ_{i,j} b_{ij′} p_ij  for all j′ ∈ S_B, with equality for at least one j′.

This shows that any FP orbit converges to the boundary of the set of CCE.

³ Conditional regret is the regret from not having played an action i′ whenever a certain action i has been played, that is, [ Σ_j a_{i′j} p_ij − Σ_j a_ij p_ij ]_+ for some fixed i ∈ S_A.
Let us denote the average payoffs through time T along an FP orbit as

Â(T) = (1/T) ∫_0^T x(s) A y(s) ds  and  B̂(T) = (1/T) ∫_0^T x(s) B y(s) ds.

As a corollary to the proof of the previous theorem we get the following useful result.
Theorem 2.4. In any bimatrix game, along every orbit of FP dynamics we have

lim_{T→∞} ( Â(T) − Ā(q(T)) ) = 0  and  lim_{T→∞} ( B̂(T) − B̄(p(T)) ) = 0.

Remark 2.5. This formulation of the result shows why in [22] this property is called 'belief affirming'. Since Ā(q(T)) and B̄(p(T)) can be interpreted as the players' expected payoffs given their respective opponent's play q(T) and p(T), the above theorem says that the difference between the expected and the actual average payoff of each player vanishes, so that asymptotically their 'beliefs' are 'confirmed' when playing according to FP.

FP vs. Nash equilibrium payoff
In this section we investigate the average payoff to the players in a two-player game along orbits of FP dynamics and compare it to the Nash equilibrium payoff (in particular, in games with a unique, completely mixed Nash equilibrium). We show that, in contrast to the usual assumption that players should primarily attempt to play Nash equilibrium and that learning algorithms converging to Nash equilibrium are desirable, the payoff along FP orbits can in some games be higher on average, or even Pareto dominate the Nash equilibrium payoff at all times.
Moreover, we demonstrate that to every bimatrix game with unique, completely mixed Nash equilibrium, there is a dynamically equivalent game for which this superiority of FP over Nash equilibrium holds.
Throughout the rest of this section we will assume that all games under consideration have a unique, completely mixed Nash equilibrium point (E_A, E_B) (it is a well-known fact that in such a game, both players necessarily have the same number of strategies). A first simple situation in which FP can improve upon such a Nash equilibrium is given by the following direct consequence of Theorem 2.4.

Proposition 3.1. Let (A, B) be a bimatrix game with unique, completely mixed Nash equilibrium (E_A, E_B). If Ā(q) ≥ Ā(E_B) and B̄(p) ≥ B̄(E_A) for all (p, q) ∈ Σ, then asymptotically the average payoff along FP orbits is greater than or equal to the Nash equilibrium payoff (for both players).
Remark 3.2. The hypothesis of this proposition, Ā(q) ≥ Ā(E_B) and B̄(p) ≥ B̄(E_A) for all (p, q) ∈ Σ, means that

Ā(E_B) = min_{q ∈ Σ_B} Ā(q)  and  B̄(E_A) = min_{p ∈ Σ_A} B̄(p),

that is, the Nash equilibrium payoff equals the minmax payoff of the players. For a non-zero-sum game this is a rather strong assumption, suggesting an unusually bad Nash equilibrium in terms of payoff. However, as we will show in the next result, at least from a dynamical point of view the situation is not at all exceptional.
Theorem 3.3. Let (A, B) be an n × n bimatrix game with unique, completely mixed Nash equilibrium (E_A, E_B). Then there exists a linearly equivalent game (A′, B′) such that

Ā′(q) ≥ Ā′(E_B) for all q ∈ Σ_B  and  B̄′(p) ≥ B̄′(E_A) for all p ∈ Σ_A,

with equality only for q = E_B and p = E_A.

This result states that every bimatrix game with unique, completely mixed Nash equilibrium is linearly equivalent to one in which players are better off playing FP than playing the (unique) Nash equilibrium strategy. In the proof we will need the following lemma.

Lemma 3.4. Let (A, B) be as in Theorem 3.3. For k = 1, ..., n, let

L^A_k = {q ∈ Σ_B : (Aq)_k < (Aq)_i = (Aq)_{i′} for all i, i′ ≠ k}.

Then each L^A_k is a ray from E_B (not containing E_B itself), adjacent to all of the preference regions R^A_i, i ≠ k, and writing L^A_k = {E_B + s · v^k : s > 0}, any n − 1 of the vectors v^1, ..., v^n ∈ {v ∈ R^n : Σ_i v_i = 0} are linearly independent. The analogous statement holds for the sets L^B_l ⊂ Σ_A, l = 1, ..., n, defined with respect to B.

Proof. Let π : R^n → R^{n−1} be the projection π(x_1, ..., x_n) = (x_1, ..., x_{n−1}), and note that π is invertible (on the affine hyperplane containing Σ_B) with inverse π^{−1}(y) = (y_1, ..., y_{n−1}, 1 − Σ_{k=1}^{n−1} y_k).
For q ∈ Σ_B we have that Σ_{k=1}^n q_k = 1 and therefore

(Aq)_i − (Aq)_j = Σ_{k=1}^{n−1} (a_ik − a_jk − a_in + a_jn) q_k + (a_in − a_jn),

and we define the affine map P : R^{n−1} → R^{n−1} by its components

P_l(x) = Σ_{k=1}^{n−1} (a_{l,k} − a_{l+1,k} − a_{l,n} + a_{l+1,n}) x_k + (a_{l,n} − a_{l+1,n}),  l = 1, ..., n − 1,

so that P(π(q)) = ((Aq)_1 − (Aq)_2, ..., (Aq)_{n−1} − (Aq)_n) for q ∈ Σ_B. By uniqueness of the completely mixed Nash equilibrium, π(E_B) is the unique zero of P. In particular, the affine map P is invertible and there is a unique vector v^1 ∈ {v ∈ R^n : Σ_i v_i = 0}, such that P(π(E_B + v^1)) = w^1 := (−1, 0, ..., 0). Since E_B is in the interior of Σ_B, x^1 = E_B + s · v^1 ∈ Σ_B for sufficiently small s > 0, and we have that P(π(x^1)) = (−s, 0, ..., 0). By the definition of P, this means that (Ax^1)_1 < (Ax^1)_2 = (Ax^1)_3 = ⋯ = (Ax^1)_n. Hence x^1 ∈ L^A_1. Note also that every x ∈ L^A_1 is of the form E_B + s · v^1 for some s > 0, that is, L^A_1 is a ray from the point E_B. For 1 < k < n, let w^k be the vector in R^{n−1} with (k − 1)th and kth entries equal to 1 and −1 respectively, and all other entries equal to 0. Then choose v^k such that P(π(E_B + v^k)) = w^k. Again for sufficiently small s > 0, we get x^k = E_B + s · v^k ∈ L^A_k. Finally, for k = n, let w^n = (0, ..., 0, 1) and proceed as above to get v^n and x^n = E_B + s · v^n ∈ L^A_n. Writing the affine map P as P(x) = Mx + b for some invertible matrix M ∈ R^{(n−1)×(n−1)} and b ∈ R^{n−1}, we get

w^k = P(π(E_B + v^k)) = P(π(E_B)) + M (v^k_1, ..., v^k_{n−1}) = M (v^k_1, ..., v^k_{n−1}),  k = 1, ..., n.

Since any n − 1 of the vectors w^1, ..., w^n are linearly independent and M is invertible, it follows that any n − 1 of the vectors v^1, ..., v^n are linearly independent, as claimed.
The same argument applied to the matrix B shows the analogous result for L^B_l, l = 1, ..., n, which finishes the proof.
Proof of Theorem 3.3. Let A′ ∈ R^{n×n} be such that a′_ij = a_ij + c_j for some c = (c_1, ..., c_n) ∈ R^n. Then for any q ∈ Σ_B,

Ā′(q) = max_i (A′q)_i = max_i ( (Aq)_i + c · q ) = Ā(q) + c · q.   (2)

Observe that, restricted to R^A_k, level sets of Ā are precisely the (n − 2)-dimensional hyperplane pieces in Σ_B orthogonal to a_k, the kth row vector of A: there, Ā(q) = (Aq)_k = a_k · q. So all level sets of Ā restricted to R^A_k are parallel hyperplane pieces. Figure 2 illustrates this situation for the case n = 3. By Lemma 3.4 we can choose n points P_1, ..., P_n ∈ Σ_B such that each point P_k is in the relative interior of the line segment L^A_k ⊂ Σ_B. This line segment has endpoint E_B and is adjacent to all of the regions R^A_i, i ≠ k. By the same lemma, P_1 − E_B, ..., P_{n−1} − E_B form a basis for {v ∈ R^n : Σ_k v_k = 0}. Therefore, the vectors P_1, ..., P_n form a basis for R^n.
It follows that one can choose c = (c_1, ..., c_n) ∈ R^n, such that

c · P_1 + Ā(P_1) = ⋯ = c · P_n + Ā(P_n),

and hence by (2), Ā′(P_1) = ⋯ = Ā′(P_n). Then level sets of Ā′ are boundaries of (n − 1)-dimensional simplices centred at E_B (each similar to the simplex with vertices P_1, ..., P_n). Now we show that E_B is a minimum for Ā′. The uniqueness of the completely mixed Nash equilibrium implies that there exists a vector v = (v_1, ..., v_n) ∈ R^n with Σ_k v_k = 0, such that at least one of the entries of A′v is positive. Let r(t) = E_B + t · v, t ≥ 0, be a ray from E_B in Σ_B. Since all entries of A′E_B coincide, for t_2 > t_1 ≥ 0 we get

Ā′(r(t_2)) − Ā′(r(t_1)) = (t_2 − t_1) · max_i (A′v)_i > 0.

So, along some ray from E_B, Ā′ is increasing. By the spherical structure of the level sets, this implies that Ā′ is increasing along every ray from E_B. Hence Ā′(E_B) ≤ Ā′(q) for every q ∈ Σ_B, with equality only for q = E_B.
The same reasoning shows that one can choose d_1, ..., d_n ∈ R and B′ ∈ R^{n×n}, b′_ij = b_ij + d_i, such that B̄′(E_A) ≤ B̄′(p) for every p ∈ Σ_A, with equality only for p = E_A.
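The choice of the constants c_1, ..., c_n above amounts to solving a linear system: once the points P_1, ..., P_n form a basis of R^n, fixing the common value of c · P_k + Ā(P_k) (say, to 0) determines c uniquely. The sketch below is our own illustration with hypothetical numbers, using plain Gaussian elimination.

```python
# Illustrative sketch: find c with c·P_k + abar_k equal for all k, by
# fixing the common value to 0 and solving the n x n system P c = -abar.
# The points and values below are hypothetical, not from the paper.

def solve(M, b):
    # Gaussian elimination with partial pivoting on a copy of (M | b).
    n = len(M)
    aug = [list(M[r]) + [b[r]] for r in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k]
                                for k in range(r + 1, n))) / aug[r][r]
    return x

P_pts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.3, 0.3, 0.4]]  # hypothetical P_k
abar_vals = [2.0, 1.0, 0.5]                                   # hypothetical Abar(P_k)
c = solve(P_pts, [-v for v in abar_vals])
vals = [sum(ci * pi for ci, pi in zip(c, Pk)) + a
        for Pk, a in zip(P_pts, abar_vals)]
print(vals)  # all values (numerically) equal
```

The equalised values confirm that the shifted payoff function takes the same value at all chosen points, which is the key step in flattening the level sets around the equilibrium.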
The previous results, Theorem 3.3 and Proposition 3.1, assert that every game possesses a dynamically equivalent version, in which FP Pareto dominates Nash equilibrium play. This shows that dynamical equivalence does not in general preserve the global payoff structure of a game, since there are clearly games for which Pareto dominance of FP over Nash equilibrium does not hold a priori.
In the famous Shapley game or variants of it [27,28,30], FP typically converges to a limit cycle, known as a Shapley polygon [12], and usually the payoff along this polygon is greater than the Nash equilibrium payoff in some parts of the cycle, and less in others. On average, this can be still preferable for both players compared to playing Nash equilibrium, if they aim to maximise their time-average payoffs. In a similar setting, this has been previously observed in [12]. We will show an example of this situation in the next section.
In fact, the proof of Theorem 3.3 shows that the unique, completely mixed Nash equilibrium (E A , E B ) can never be an isolated payoff-maximum, since there are always directions from E B in Σ B and from E A in Σ A along whichĀ andB are non-decreasing. Heuristically one would therefore expect that FP typically improves upon Nash equilibrium in at least parts of any limit cycle. In Section 5 we will demonstrate that this is not always the case: there are games in which FP typically produces a lower average payoff than Nash equilibrium.

FP better than Nash equilibrium: an example
Consider the one-parameter family of 3 × 3 bimatrix games (A_β, B_β), β ∈ (0, 1), given by

A_β = [ 1 0 β ; β 1 0 ; 0 β 1 ]  and  B_β = [ −β 1 0 ; 0 −β 1 ; 1 0 −β ].   (3)

This family can be viewed as a generalisation of Shapley's game [27]. In [28,30], FP dynamics of this family of games has been studied extensively, and the system has been shown to give rise to very rich chaotic dynamics with many unusual and remarkable dynamical features. The game has a unique, completely mixed Nash equilibrium (E_A, E_B) = ((1/3, 1/3, 1/3), (1/3, 1/3, 1/3)), which yields the respective payoffs

u_A(E_A, E_B) = (1 + β)/3  and  u_B(E_A, E_B) = (1 − β)/3.

To check the hypothesis of Proposition 3.1, let q = (q_1, q_2, q_3) ∈ Σ_B; then

Ā(q) = max{ q_1 + βq_3, βq_1 + q_2, βq_2 + q_3 } ≥ (1/3) [ (q_1 + βq_3) + (βq_1 + q_2) + (βq_2 + q_3) ] = (1 + β)/3 = Ā(E_B).

Moreover, equality holds if and only if

q_1 + βq_3 = βq_1 + q_2 = βq_2 + q_3,

which is equivalent to q_1 = q_2 = q_3, that is, q = E_B. We conclude that Ā(q) > Ā(E_B) for all q ∈ Σ_B \ {E_B}, and by a similar calculation, B̄(p) > B̄(E_A) for all p ∈ Σ_A \ {E_A}. As a corollary to Proposition 3.1 we get the following result.
Theorem 4.1. Consider the one-parameter family of bimatrix games (A_β, B_β) in (3) for β ∈ (0, 1). Then any (non-stationary) FP orbit Pareto dominates constant Nash equilibrium play in the long run, that is, for large times t we have

Â(t) > u_A(E_A, E_B)  and  B̂(t) > u_B(E_A, E_B).

In fact, one can say more: There is a β ∈ (0, 1) such that FP has an attracting closed orbit (the so-called 'anti-Shapley orbit' [28,30]) along which FP Pareto dominates Nash equilibrium at all times. In other words, both players are receiving a higher payoff than at Nash equilibrium at any time along this orbit. We omit the details of the proof: techniques developed in [20,26] can be used to analyse FP along this orbit, whose existence was shown in [28]. In particular, the times spent in each region R^B_j × R^A_i along the orbit can be worked out explicitly, which can be directly applied to obtain average payoffs.

We briefly compare this with a finer equilibrium notion. A joint probability distribution P = (p_ij) on S is a correlated equilibrium (CE), if

Σ_j p_ij a_ij ≥ Σ_j p_ij a_{i′j}  and  Σ_i p_ij b_ij ≥ Σ_i p_ij b_{ij′}

for all i, i′ ∈ S_A and j, j′ ∈ S_B. One interpretation of this notion is similar to that of the CCE (see the paragraph after Definition 2.1), with the notion of '(unconditional) regret' replaced by the finer notion of 'conditional regret'. If we think of P as the empirical distribution of play up to a certain time for two players involved in repeatedly or continuously playing a given game, then P is a CE if neither player regrets not having played a strategy i′ (or j′) whenever she actually played i (or j). In other words, the average payoff to player A would not be higher if she had played i′ at all times at which she actually played i throughout the history of play (assuming her opponent's behaviour unchanged), and the same for player B.
One can check that the set of Nash equilibria is always contained in the set of CE, which in turn is always contained in the set of CCE. In the game (A β , B β ) in Theorem 4.1, the Nash equilibrium (E A , E B ) is also the unique CE, which can be checked by direct computation. Hence our result shows that in this case, FP also improves upon CE in the long run.
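The direct computation mentioned above can be sketched numerically. The block below is our own illustration: it implements the conditional-regret (CE) test and applies it to the product distribution of the uniform equilibrium; the concrete matrices are an assumption (the form of the family as studied in [28,30], here with β = 1/2), not verified against the paper's own display.

```python
# Illustrative sketch: conditional-regret test for correlated equilibrium.
# The matrices A, B are an assumed form of the family (A_beta, B_beta)
# from [28,30] with beta = 0.5; treat them as an assumption.

def is_ce(A, B, P, tol=1e-9):
    m, n = len(A), len(A[0])
    for i in range(m):          # row player: deviating i -> i2 never helps
        for i2 in range(m):
            if sum(P[i][j] * (A[i2][j] - A[i][j]) for j in range(n)) > tol:
                return False
    for j in range(n):          # column player: deviating j -> j2 never helps
        for j2 in range(n):
            if sum(P[i][j] * (B[i][j2] - B[i][j]) for i in range(m)) > tol:
                return False
    return True

beta = 0.5
A = [[1, 0, beta], [beta, 1, 0], [0, beta, 1]]
B = [[-beta, 1, 0], [0, -beta, 1], [1, 0, -beta]]
P_eq = [[1 / 9] * 3 for _ in range(3)]     # product of the uniform equilibrium
P_pure = [[1.0, 0.0, 0.0], [0.0] * 3, [0.0] * 3]
print(is_ce(A, B, P_eq), is_ce(A, B, P_pure))  # True False
```

This only verifies the containment direction (the equilibrium distribution is a CE); establishing that it is the unique CE requires the further computation alluded to in the text.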

FP can be worse than Nash equilibrium
We have seen that in many games FP improves upon Nash equilibrium in terms of payoff. Moreover, we have shown that for any bimatrix game with unique, completely mixed Nash equilibrium, linear equivalence can be used to obtain dynamically equivalent examples in which FP Pareto dominates Nash equilibrium. In this section we investigate the converse possibility of FP having lower payoff than Nash equilibrium. Again we restrict our attention to n × n games with unique, completely mixed Nash equilibrium.
Let us define the sub-Nash payoff cones

P^−_A = {p ∈ Σ_A : B̄(p) ≤ u_B(E_A, E_B)},

the set of those mixed strategies of player A for which the best possible payoff to player B is not greater than Nash equilibrium payoff, and similarly

P^−_B = {q ∈ Σ_B : Ā(q) ≤ u_A(E_A, E_B)}.

By adding suitable constants to the players' payoff matrices we can assume without loss of generality that

u_A(E_A, E_B) = u_B(E_A, E_B) = 0.

Then one can see that

P^−_A = Σ_A ∩ (B^T)^{−1}(R^n_−)  and  P^−_B = Σ_B ∩ A^{−1}(R^n_−),

where R^n_− denotes the quadrant of R^n with all coordinates non-positive, and by (B^T)^{−1} and A^{−1} we mean the pre-images under the linear maps B^T, A : R^n → R^n. Therefore, P^−_A and P^−_B are convex cones in Σ_A and Σ_B with apexes E_A and E_B respectively. Now an orbit of FP is Pareto dominated by Nash equilibrium if and only if it (or its part for t ≥ t_0, for some t_0) is contained in the interior of P^−_A × P^−_B. This shows that a result like Theorem 3.3 with the roles of Nash equilibrium and FP reversed cannot hold: if a game has an FP orbit whose projections to Σ_A and Σ_B are not both contained in some convex cones with apexes E_A and E_B, then for any linearly equivalent game, along this orbit there are times at which one of the players enjoys a higher payoff than Nash equilibrium payoff. In order to find FP orbits along which payoffs are permanently lower than Nash equilibrium payoff, one therefore needs to find orbits contained in a halfspace (whose boundary plane contains the Nash equilibrium). The following lemma ensures that one can then obtain a linearly equivalent game with P^−_A × P^−_B containing this orbit.

Lemma 5.1. Let (A, B) be any n × n bimatrix game with unique, completely mixed Nash equilibrium (E_A, E_B). Then for any convex cones C_A ⊂ Σ_A, C_B ⊂ Σ_B with apexes E_A and E_B respectively, and opening angles in [0, π), there exists a linearly equivalent game (Ã, B̃) such that P^−_A = C_A and P^−_B = C_B.

The proof of this lemma follows from the proof of Theorem 3.3. Note that there, to any given game we constructed a linearly equivalent game with P^−_A = P^−_B = ∅.
By Lemma 5.1, to find an example of a game with an orbit which is Pareto worse than Nash equilibrium, it suffices to find a game with an orbit whose projections to Σ_A and Σ_B are completely contained in convex cones with apexes E_A and E_B respectively. One can then construct a linearly equivalent game for which this orbit is actually contained in the sub-Nash payoff cones. We will demonstrate one such example in the 3 × 3 case, which we obtained by randomly generating 3 × 3 games numerically and testing large numbers of initial conditions to detect orbits of the desired type.
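The numerical search just described needs a test of whether sampled orbit points lie in a halfplane whose boundary passes through a given point E (in the 3 × 3 case the simplices are two-dimensional). A minimal sketch of such a test, our own illustration with toy point configurations rather than actual FP data: points lie in a closed halfplane through E exactly when, viewed as angles around E, they leave a circular gap of at least π.

```python
# Illustrative sketch: test whether 2D points lie in a closed halfplane
# whose boundary line passes through E, via the largest circular gap of
# their angles around E. Toy configurations, not actual FP orbit data.
import math

def in_halfplane_through(points, E, tol=1e-9):
    angles = sorted(math.atan2(p[1] - E[1], p[0] - E[0])
                    for p in points
                    if abs(p[0] - E[0]) > tol or abs(p[1] - E[1]) > tol)
    if len(angles) < 2:
        return True
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(2 * math.pi - (angles[-1] - angles[0]))  # wrap-around gap
    return max(gaps) >= math.pi - tol

E = (0.0, 0.0)
# All in the upper halfplane through E:
upper = [(math.cos(t), math.sin(t)) for t in (0.1, 1.0, 2.0, 3.0)]
# Spread over more than a halfplane -- no such halfplane exists:
spread = [(math.cos(t), math.sin(t)) for t in (0.0, 2.1, 4.2)]
print(in_halfplane_through(upper, E))   # True
print(in_halfplane_through(spread, E))  # False
```

In a search of the kind described in the text, one would apply such a test to the projections of late-time orbit samples, with E the completely mixed Nash equilibrium strategy of the respective player.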
Observe that by convexity of the preference regions R^A_i, a halfspace in Σ_B whose boundary line contains the (unique, completely mixed) Nash equilibrium contains at most two of the three rays L^A_i, i = 1, 2, 3. The same holds for a halfspace in Σ_A and the rays L^B_j, j = 1, 2, 3. The matrices A and B are chosen in such a way that the Nash equilibrium payoffs are both normalised to zero: u_A(E_A, E_B) = u_B(E_A, E_B) = 0. Numerical simulations suggest that FP has a periodic orbit as a stable limit cycle, which attracts almost all initial conditions. This trajectory forms an octagon in the four-dimensional space Σ = Σ_A × Σ_B; it is depicted in Figure 3. The orbit follows an 8-periodic itinerary through the regions R^B_j × R^A_i (that is, there is a strictly increasing sequence of times (t_i)_{i≥1} along which the pair of pure best responses changes accordingly). Note that the second player's best response never changes from 1 to 2, nor vice versa. Similarly, for player A the best response never changes directly between 1 and 3 without an intermediate step through 2. Moreover, it can be seen from Figure 3 that the projections of the periodic orbit to Σ_A and Σ_B lie in halfplanes whose boundaries contain the points E_A and E_B respectively. Hence Lemma 5.1 allows us to choose the matrices A and B such that this orbit lies completely in P^−_A × P^−_B, so that the payoffs to both players are permanently worse than Nash equilibrium payoff. Figure 4 shows the (negative) payoffs to both players along several periods of the orbit, together with the higher (zero) Nash equilibrium payoff. This example was obtained through numerical experimentation. The difficulty of finding a periodic orbit with the key property of lying in a convex cone with apex at the unique, completely mixed Nash equilibrium suggests that such examples are relatively rare.
For most games with unique, completely mixed Nash equilibrium, the payoff along typical FP orbits either Pareto dominates Nash equilibrium payoff, or at least improves upon it along parts of the orbit. We formulate the following two conjectures.

Conjecture 5.3. Bimatrix games with unique, completely mixed Nash equilibrium in which Nash equilibrium Pareto dominates typical FP orbits are rare. To be precise, within the space of n × n games with entries in [0, 1], those for which typical FP orbits are Pareto dominated by Nash equilibrium form a set with Lebesgue measure at most 0.01.

Conjecture 5.4. For bimatrix games with unique, completely mixed Nash equilibrium and certain transition combinatorics (see [25]), Nash equilibrium does not Pareto dominate typical FP orbits. In particular, this is the case if BR_A(e_j) ≠ BR_A(e_{j′}) for all j ≠ j′ and BR_B(e_i) ≠ BR_B(e_{i′}) for all i ≠ i′.
Indeed, we could strengthen the above conjecture to the following statement.
Conjecture 5.5. For 'most' bimatrix games with unique, completely mixed Nash equilibrium, typical FP orbits dominate Nash equilibrium in terms of average payoff. In particular, this is the case under certain assumptions on the transition combinatorics of the game; for instance, if each pure strategy invokes a distinct pure best response (as in the previous conjecture).

Concluding remarks on FP performance
Conceptually, the overall observation is that playing Nash equilibrium might not be an advantage over playing according to some learning algorithm (such as FP) in a wide range of games, in particular in many common examples of games occurring in the literature. Even in cases where FP does not dominate Nash equilibrium at all times, it might still be preferable in terms of time-averaged payoff. In contrast, the previous section shows that there are examples in which Nash equilibrium indeed Pareto dominates FP, but the restrictive nature of the example suggests that this situation is quite rare.
Conversely, the discussion also shows that certain notions of game equivalence (for instance, linear equivalence, or the weaker best and better response equivalences, see [23,24]), which are popular in the literature on learning dynamics, are not meaningful in an economic context, as they do not preserve essential features of the payoff structure of games, even though they preserve Nash equilibria (and other notions of equilibrium) and conditional preferences of the players. While some dynamics (in particular, FP dynamics or its autonomous version, the best response dynamics [13,21,19]) are invariant under all of these equivalence relations, the actual payoffs along their orbits and the payoff comparison of different orbits can strongly depend on the chosen representative bimatrix, as becomes apparent from Theorem 3.3. This is to some extent analogous to the situation in the classical example of the 'prisoner's dilemma' and a linearly equivalent bimatrix game, which shares all essential features such as equilibria, best response structures, etc. with the prisoner's dilemma. Both games are dynamically identical, with all FP orbits converging along straight lines to the unique pure Nash equilibrium (2,2). However, the second game does not constitute a prisoner's dilemma in the classical sense: whereas in the prisoner's dilemma the Nash equilibrium is Pareto dominated by the (dynamically irrelevant) strategy profile (1,1), in the second game this is not the case and no 'dilemma' occurs. Theorem 3.3 can be interpreted in a similar vein: linear equivalence turns out to be sufficiently coarse, so that by changing the representative bimatrix inside an equivalence class, one can create certain regions in Σ in which payoff is arbitrarily high in comparison to the payoff at the unique Nash equilibrium. Since FP orbits remain unchanged, this can be done in such a way that a given periodic orbit lies completely or predominantly in these desired 'high payoff portions' of Σ.
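The prisoner's dilemma observation can be checked mechanically. The sketch below is our own construction with assumed standard dilemma payoffs (3/0/5/1; the paper's exact matrices are not reproduced here): adding constants to the columns of A and the rows of B leaves all best responses unchanged, yet removes the Pareto dominance of (1,1) over the equilibrium (2,2).

```python
# Illustrative sketch: linear equivalence preserves best responses but
# can change the Pareto comparison of profiles. Assumed standard
# prisoner's dilemma payoffs; shifts c_j = (-3, 0), d_i = (-3, 0).

def best_rows(A, j):
    # Row player's pure best responses against column j.
    col = [A[i][j] for i in range(len(A))]
    return [i for i, v in enumerate(col) if v == max(col)]

def best_cols(B, i):
    # Column player's pure best responses against row i.
    row = B[i]
    return [j for j, v in enumerate(row) if v == max(row)]

A = [[3, 0], [5, 1]]                       # assumed dilemma, player A
B = [[3, 5], [0, 1]]                       # player B
A2 = [[a + c for a, c in zip(row, [-3, 0])] for row in A]   # add c_j to columns
B2 = [[b + d for b in row] for row, d in zip(B, [-3, 0])]   # add d_i to rows

same_br = (all(best_rows(A, j) == best_rows(A2, j) for j in range(2)) and
           all(best_cols(B, i) == best_cols(B2, i) for i in range(2)))
dilemma_old = A[0][0] > A[1][1] and B[0][0] > B[1][1]     # (1,1) dominates (2,2)
dilemma_new = A2[0][0] > A2[1][1] and B2[0][0] > B2[1][1] # no longer the case
print(same_br, dilemma_old, dilemma_new)  # True True False
```

Both games induce identical FP dynamics (same best response structure), yet only the first is a 'dilemma', illustrating that linear equivalence is blind to this part of the payoff structure.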
On the other hand, it can be seen from the proof that the conditions for this to happen are not at all exceptional. Consequently, it could be argued that in many games of interest the assumption that Nash equilibrium play is the most desirable outcome might not hold and a more dynamic view of 'optimal play' might be reasonable.