Best-Response dynamics in two-person random games with correlated payoffs

We consider finite two-player normal-form games with random payoffs. Player A's payoffs are i.i.d. draws from a uniform distribution. Given p in [0, 1], for any action profile, player B's payoff coincides with player A's payoff with probability p and is an independent draw from the same uniform distribution with probability 1 − p. This model interpolates between the model of i.i.d. random payoffs used in most of the literature and the model of random potential games. First we study the number of pure Nash equilibria in the above class of games. Then we show that, for any positive p, asymptotically in the number of available actions, best response dynamics reaches a pure Nash equilibrium with high probability.


INTRODUCTION
1.1. The problem. Consider the class of two-person normal-form finite games. Some properties hold for the entire class; for instance, the mixed extension of each game in the class admits a Nash equilibrium (Nash, 1950, 1951). Some properties hold generically; for instance, generically the number of Nash equilibria is finite and odd (Wilson, 1971, Harsanyi, 1973). Some properties do not hold generically and neither does their negation; for instance, having a pure Nash equilibrium or not having one is not a generic property of finite games. Still, it may be relevant to know how likely it is for a finite game to admit a pure equilibrium. Along a similar line of investigation, how likely is a recursive procedure, such as best response dynamics, to reach a pure Nash equilibrium in finite time?
One way to formalize these questions is to assume that the game is drawn at random according to some probability measure. It is not clear what a natural probability measure is in this setting; a good part of the literature on the topic has focused on measures that make the payoffs i.i.d. with zero probability of ties. Few papers have relaxed this assumption. For instance, Rinott and Scarsini (2000) considered payoff vectors that are i.i.d. across different action profiles, but can have some positive or negative dependence within the same action profile. Amiet et al. (2021b) considered i.i.d. payoffs whose distribution may have atoms and, as a consequence, may produce ties. Durand and Gaujal (2016) studied the class of random potential games, i.e., a class of games that admit a potential having i.i.d. entries.

1.2. Our contribution. In this paper we study two-person games with random payoffs where the stochastic model for the payoffs parametrically interpolates between the case of i.i.d. payoffs with no ties and the case of random potential games. In particular, we start with a model where all payoffs are i.i.d. according to a continuous distribution function (without loss of generality, uniform on [0, 1]) and we consider an i.i.d. set of coin tosses, one for each action profile. If the toss gives heads, then the original payoff of the second player is made equal to the payoff of the first player; if the toss gives tails, the payoff remains unchanged. The relevant parameter is the probability p of getting heads in the coin toss. If p = 0, we obtain the classical model of random games with continuous i.i.d. payoffs. If p = 1, we get the model of common-interest random games. From the viewpoint of pure Nash equilibria (PNE), any potential game is strategically equivalent to a common-interest game. Therefore, the above class of games parametrically interpolates between the case of i.i.d. payoffs with no ties and the case of random potential games. When p is small, the game is close to a game with i.i.d. payoffs; when p is large, the game is close to a potential game.
For this parametric class of games we first compute the expected number of PNE as a function of p, and then study its asymptotic behavior as the numbers of actions of the two players diverge, possibly at different speeds. It is well known (Powers, 1990) that, for i.i.d. random payoffs, as the number of actions increases, the asymptotic distribution of the number of PNE is a Poisson(1) distribution. Our result shows an interesting phase transition around p = 0, in the sense that for every p > 0 the expected number of PNE diverges.
We then consider best response dynamics (BRD) for the above class of games. Durand and Gaujal (2016) considered BRD for random potential games with an arbitrary number of players and the same number of actions for each player. In this class of games a PNE is reached by a BRD in finite time. Durand and Gaujal (2016) studied the asymptotic behavior of the expectation of this random time. In our paper we first consider potential games and compute the distribution of the time that the BRD needs to reach a PNE. Moreover, we compute exactly the first two moments of this random time when the two players have the same action set. Amiet et al. (2021a) showed that, for games with i.i.d. continuous payoffs, when players have the same action set, as the number of actions increases, the probability that a BRD reaches a PNE goes to zero. Here we generalize the result of Amiet et al. (2021a) to the case of possibly different action sets for the two players. Moreover, we prove that for every positive p, asymptotically in the number of actions, a BRD reaches a PNE in finite time with probability arbitrarily close to 1. Again this shows a phase transition at p = 0 for the behavior of the BRD.

1.3. Related literature. Games with random payoffs have been studied for more than sixty years. We refer the reader to Amiet et al. (2021b) and Heinrich et al. (2021) for an extensive survey of the literature on the topic. Here we mention just some recent papers and some articles that are more directly connected with the results of our paper. Powers (1990) proved that in random games with i.i.d. payoffs having a continuous distribution, as the number of actions of at least two players diverges, the asymptotic distribution of the number of PNE is Poisson(1). Stanford (1995) computed the exact nonasymptotic form of this distribution, from which the result in Powers (1990) can be obtained as a corollary. Rinott and Scarsini (2000) relaxed the hypothesis of i.i.d. payoffs, retaining independence of payoffs corresponding to different action profiles, but allowing dependence among the payoffs within the same profile. They proved an interesting phase transition in terms of the payoffs' correlation: asymptotically in either the number of players or the number of actions, for negative dependence the number of PNE goes to 0, for positive dependence it diverges, and for independence it is Poisson(1), as proved by Powers (1990). Baldi et al. (1989) studied the distribution of the number of local maxima on a graph, which, by choosing a suitable graph, can be translated into the number of PNE in a random potential game. Pei and Takahashi (2019) studied point-rationalizable strategies in two-person random games. Since the number of point-rationalizable strategies for each player is weakly larger than the number of PNE, they were interested in the typical magnitude of the difference between these two numbers. A game is dominance solvable if iterated elimination of strictly dominated strategies leads to a unique action profile, which must be a PNE. Alon et al. (2021) used recent combinatorial results to prove that the probability that a two-person random game is dominance solvable vanishes with the number of actions.
Several papers have studied the behavior of various learning dynamics, including BRD, in games with random payoffs. For instance, Galla and Farmer (2013) studied a type of reinforcement learning called experience-weighted attraction in two-person games and showed the existence of three different regimes in terms of convergence to equilibria. Sanders et al. (2018) extended their analysis to games with an arbitrary finite number of players. Pangallo et al. (2019) compared through simulation the behavior of various adaptive learning procedures in games whose payoffs are drawn at random. Heinrich et al. (2021) compared the behavior of BRD in games with random payoffs when the order of acting players is fixed versus random; they showed that, asymptotically in either the number of players or the number of strategies, the fixed-order BRD converges with vanishing probability, whereas the random-order BRD converges to a PNE whenever one exists. Similar results were obtained by Wiese and Heinrich (2022). Coucheney et al. (2014), Durand and Gaujal (2016), and Durand et al. (2019) focused on random potential games and measured the speed of convergence of BRD to a PNE. Amiet et al. (2021a) dealt with two-person games where the players have the same action set and payoffs are i.i.d. with a continuous distribution. They compared the behavior of best response dynamics and better response dynamics. They proved that, asymptotically in the number of actions, the first reaches a PNE only with vanishing probability, whereas the second does reach one, whenever it exists. Amiet et al. (2021b) studied a class of games with n players and two actions for each player where the payoffs are i.i.d. but their distribution may have atoms. They proved that the relevant parameter for the analysis of this class of games is the probability of ties in the payoffs, called α. They showed that, whenever this parameter is positive, the number of PNE diverges as n → ∞, and proved a central limit theorem for this random variable. Moreover, using percolation techniques, they studied the asymptotic behavior of BRD as a function of α, and they showed a phase transition at α = 1/2.
Potential functions in games were introduced by Rosenthal (1973) and their properties were extensively studied by Monderer and Shapley (1996). Among these properties are the existence of PNE and the convergence to one of these equilibria of the most common learning procedures, including BRD. Goemans et al. (2005) introduced the concept of sink equilibrium. Sink equilibria are strongly connected stable sets of action profiles that are never abandoned once reached by a BRD. A sink equilibrium that is not a PNE is what in this paper is called a trap. Fabrikant et al. (2013) studied the class of weakly acyclic games, i.e., the class of games for which, from every action profile, there exists some better-response improvement path that leads from that action profile to a PNE. This class includes potential games and dominance-solvable games as particular cases.
The goal of our paper is to consider probability measures on spaces of finite noncooperative games that go beyond the usual assumption of i.i.d. payoffs. In particular, we define a parametric family of probability measures that interpolates between random games with i.i.d. payoffs and random potential games. The interpolation is achieved locally by acting on each action profile of the game and replacing, with some fixed probability and independently across profiles, the payoff of the second player with the payoff of the first player. A different interpolation could be achieved by considering a convex combination of a game with i.i.d. payoffs and a random potential game. This was done, e.g., in Rinott and Scarsini (2000), where in each action profile the payoffs are obtained by summing a Gaussian vector with i.i.d. components and an independent Gaussian vector with identical components (which plays the role of the random potential). This approach is somewhat comparable with the idea of decomposing the space of finite games proposed by Candogan et al. (2011). This decomposition was then used by Candogan et al. (2013) to analyze BRD in games that are close to potential games.

1.4. Organization of the paper. Section 2 introduces some basic game-theoretic concepts. Section 3 defines the parametric family of distributions on the space of games and deals with the number of PNE in games with random payoffs. Section 4 studies the behavior of BRD in games with random payoffs in the above parametric class, for different values of the parameter. Section 5 contains all the proofs. Conclusions and open problems can be found in Section 6. Appendix A lists the symbols used throughout the paper. Appendix B contains two well-known results about the Beta distribution.

1.5. Notation. Given an integer n, the symbol [n] indicates the set {1, . . ., n}. Given a finite set A, the symbol |A| denotes its cardinality. The symbol ⊔ denotes the union of disjoint sets. We use the notation x ∧ y := min{x, y}. The symbol →P denotes convergence in probability. Given two nonnegative sequences h_n, g_n, we use the common asymptotic notations defined in (1.5).

PRELIMINARIES
We consider two-person normal-form games where, for i ∈ {A, B}, player i's action set is [K_i] and player i's payoff function is

U_i : [K_A] × [K_B] → R, for i ∈ {A, B}. (2.2)

A pure Nash equilibrium (PNE) of the game is a pair (a*, b*) of actions such that, for all a ∈ [K_A] and all b ∈ [K_B],

U_A(a*, b*) ≥ U_A(a, b*) and U_B(a*, b*) ≥ U_B(a*, b). (2.3)

As is well known, PNE are not guaranteed to exist. A class of games that admits PNE is the class of potential games, i.e., games for which there exists a potential function Ψ : [K_A] × [K_B] → R such that, for all a, a′ ∈ [K_A] and all b, b′ ∈ [K_B],

U_A(a, b) − U_A(a′, b) = Ψ(a, b) − Ψ(a′, b), (2.4a)
U_B(a, b) − U_B(a, b′) = Ψ(a, b) − Ψ(a, b′). (2.4b)

Games of common interest, i.e., games for which U_A = U_B, are a particular case of potential games.
As far as PNE are concerned, every potential game is strategically equivalent to a common-interest game, for instance to the game where U_A = U_B = Ψ. For the properties of potential games with an arbitrary number of players, we refer the reader to Monderer and Shapley (1996). Given a finite game, it is interesting to see whether an equilibrium can be reached iteratively by allowing players to deviate whenever they have an incentive to do so. In particular, we will consider a procedure where, starting from a fixed action profile, players in alternation choose their best response to the other player's action. If the procedure gets stuck in an action profile, then it has reached a pure Nash equilibrium. In general, there is no guarantee that this occurs.
Assume that the payoffs of each player are all different. The best response dynamics (BRD) is a learning algorithm taking as input a two-player game (U_A, U_B) and a starting action profile (a_0, b_0). For each t ≥ 0 we consider the process BRD(t) on [K_A] × [K_B] with BRD(0) = (a_0, b_0) and, if BRD(t) = (a, b), then

BRD(t + 1) = (argmax_{a′} U_A(a′, b), b) for t even,
BRD(t + 1) = (a, argmax_{b′} U_B(a, b′)) for t odd. (2.9)

It is easy to see that, if, for some positive t, we have BRD(t + 1) = BRD(t), then the profile BRD(t) is a PNE. The algorithm stops when it visits an action profile for the second time. If this profile is the same as the one visited at the previous time, then a PNE has been reached.
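As a concrete illustration, the alternating procedure just described can be sketched in code. This is our own minimal Python sketch, not the authors' implementation: players alternate best responses, the walk stops when a best response leaves the profile unchanged (a PNE), and revisiting a state signals a cycle, i.e., a trap.

```python
import numpy as np

def best_response_dynamics(UA, UB, start=(0, 0)):
    """Alternating best-response walk on a bimatrix game.

    Returns (profile, converged): the final profile and True if it is a
    pure Nash equilibrium, False if the walk entered a cycle (a trap).
    Assumes no payoff ties within any column of UA or any row of UB.
    """
    a, b = start
    seen = set()
    turn = 0  # 0: player A (rows) moves, 1: player B (columns)
    while (a, b, turn) not in seen:
        seen.add((a, b, turn))
        if turn == 0:
            a2 = int(np.argmax(UA[:, b]))     # A's best response to b
            # After the first step, b is already B's best response to a,
            # so a2 == a means both players are at a best response: a PNE.
            if a2 == a and len(seen) > 1:
                return (a, b), True
            a = a2
        else:
            b2 = int(np.argmax(UB[a, :]))     # B's best response to a
            if b2 == b:                       # A just best-responded: PNE
                return (a, b), True
            b = b2
        turn = 1 - turn
    return (a, b), False                      # state revisited: a trap
```

For instance, on a 2×2 coordination game the walk stops immediately at the payoff-dominant profile, while on matching pennies it cycles forever and reports a trap.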
The definition of BRD implies that for every trap T we have |T| ≥ 4 and |T| even. Moreover, for every game (U_A, U_B) and every initial profile (a, b), the BRD eventually visits a PNE or a trap in finite time, say τ.
Even if the game admits PNE, there is no guarantee that a BRD reaches one of them; it could cycle over a trap, i.e., it could start to periodically visit the same set of action profiles and never stabilize. On the other hand, if the game is a potential game, then a BRD always reaches a PNE. This is due to the fact that at every iteration of the BRD the payoff of one player increases, and so does the potential. Since the game is finite, in finite time the BRD reaches a local maximum of the potential, which is a PNE (see, e.g., Karlin and Peres, 2017, proposition 4.4.6).
The goal of this paper is to study the number of PNE and the behavior of BRD in a "typical" game. To make sense of the above sentence, we need to formalize the meaning of the term typical. The approach that we will take is stochastic. That is, we will assume the bimatrix U to be random and drawn from a distribution that will be specified later. In any game with random payoffs, the set NE of pure Nash equilibria is a (possibly empty) random set of action profiles, i.e., a random subset of [K_A] × [K_B]. Therefore, since the game is finite, the number of PNE is an integer-valued random variable. Moreover, we will be able to speak about the probability that a BRD converges (to a PNE).

NUMBER OF PURE NASH EQUILIBRIA IN RANDOM GAMES
As mentioned in the Introduction, several attempts have been made in the literature to put a probability measure on a space of games. Most of the existing papers assume all the entries of U to be i.i.d. with a continuous marginal distribution. There are some notable exceptions to the independence assumption. Rinott and Scarsini (2000) considered a setting where the payoff vectors have a continuous distribution and are i.i.d. across action profiles, but some dependence is allowed within each profile. Durand and Gaujal (2016) studied random potential games where the entries of the potential are i.i.d. with a continuous distribution.
Our stochastic model is quite general, since, in a sense that will be made precise, it interpolates between i.i.d. payoffs and a random potential.
By definition, the concept of pure Nash equilibrium is ordinal: if all payoffs in a game are transformed according to a strictly increasing function, then the set of pure Nash equilibria remains the same. Assume that each entry of U has a marginal distribution that is uniform on the interval [0, 1]. Its distribution function will be denoted by F. The above consideration implies that this uniformity assumption is without loss of generality, i.e., any other continuous distribution would produce the same conclusions.
Start with (U_A, U_B), where all the entries are i.i.d. with distribution F. Then, for each action profile (a, b), with probability p set U_B(a, b) to be equal to U_A(a, b), independently of the other action profiles. In other words, for every pair (a, b),
• with probability 1 − p, the random payoffs U_A(a, b) and U_B(a, b) are left unchanged, hence independent;
• with probability p, the payoff U_B(a, b) is replaced by U_A(a, b).
The larger p, the closer the game is to a potential game. The smaller p, the closer the game is to a random game with i.i.d. payoffs. The game whose payoff bimatrix is obtained as above will be denoted by U(p).
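The sampling scheme just described can be written down directly. The following is our own sketch (the paper provides no code): draw two independent uniform matrices and overwrite B's entry with A's wherever an independent p-coin lands heads.

```python
import numpy as np

def sample_game(KA, KB, p, rng):
    """Sample the bimatrix U(p): payoffs i.i.d. uniform on [0, 1],
    except that, independently at each profile, B's payoff is
    replaced by A's with probability p."""
    UA = rng.random((KA, KB))
    UB = rng.random((KA, KB))
    heads = rng.random((KA, KB)) < p   # the coin tosses, one per profile
    return UA, np.where(heads, UA, UB)
```

With p = 0 the two matrices are independent; with p = 1 the sampled game is a common-interest game, U_A = U_B everywhere.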
We now compute the expected number of PNE in the above-defined class of random games.
The analysis of this class of games is quite complicated for fixed K_A, K_B. Therefore, as is done in much of the literature, we will take an asymptotic approach, letting the number of actions grow. More formally, we will consider a sequence (U_n)_{n∈N} of payoff bimatrices, where the numbers of actions in game U_n are K^A_n and K^B_n, and these two integer sequences are increasing in n and diverge to ∞. In particular, we allow the numbers of actions of the two players to diverge at different speeds.
The following proposition shows the asymptotic behavior of the random number of PNE where the parameter p may vary with n.We write p n to highlight this dependence.In what follows, every asymptotic equality holds for n → ∞.The proof uses a second-moment argument.
For every n ∈ N, let W_n be the number of PNE in the game U_n(p_n). The following corollary deals with the case of fixed p. When p = 0, i.e., the payoffs are i.i.d., the number of PNE converges in distribution to a Poisson with parameter 1 (see Powers, 1990). When the payoffs within the same action profile are positively correlated, Rinott and Scarsini (2000) showed that the number of PNE diverges. A similar phenomenon happens here when p > 0.
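The phase transition can be probed numerically. The sketch below is our own Monte Carlo illustration, not part of the paper's argument: it counts PNE as profiles where A's payoff is a column maximum and B's is a row maximum; for p = 0 the average count stays near 1, while for fixed p > 0 it grows with the number of actions.

```python
import numpy as np

def count_pne(UA, UB):
    """A profile is a PNE iff A's payoff is maximal in its column
    and B's payoff is maximal in its row (no ties almost surely)."""
    best_A = UA == UA.max(axis=0, keepdims=True)
    best_B = UB == UB.max(axis=1, keepdims=True)
    return int(np.sum(best_A & best_B))

def average_pne(K, p, reps, seed=0):
    """Monte Carlo average of the number of PNE in K-by-K games U(p)."""
    rng = np.random.default_rng(seed)
    total = 0
    for _ in range(reps):
        UA = rng.random((K, K))
        UB = np.where(rng.random((K, K)) < p, UA, rng.random((K, K)))
        total += count_pne(UA, UB)
    return total / reps
```

On deterministic examples the counter behaves as expected: a 2×2 coordination game has two PNE, matching pennies has none.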

BEST RESPONSE DYNAMICS
We now want to study the behavior of BRD in the class of random games introduced in Section 3. First notice that the continuity of F implies that the probability of ties in the payoffs of the same player is zero; as a consequence, once the game is realized, a BRD is almost surely deterministic. In this respect, the symbols P and E refer solely to the randomness of the payoffs, not to any randomness in the BRD. Moreover, the symmetry of our model implies that, without loss of generality, we can assume the starting position of the BRD to be any fixed profile. In the rest of the paper, without loss of generality, the starting point of any BRD will always be the profile (1, 1), i.e., BRD_n(0) = (1, 1) for every n.

4.1. BRD and related stopping times. As stressed above, for a given realization of the payoffs and a starting point, the BRD is a deterministic algorithm that decides its next step only on the basis of local information. In what follows we will exploit this fact by revealing the players' payoffs only when this information is needed to select the next position of the BRD. This whole process, in which the BRD moves on a sequentially sampled random game, can be thought of as a non-Markov stochastic process, and the time at which the BRD stops can be seen as a stopping time for such a stochastic process.
We will focus our attention on the distribution of the first time the BRD reaches a PNE. For the sake of brevity, we write NE_n for the (random) set of PNE in the game U_n(p_n), and we define

τ^NE_n := inf{t ≥ 0 : BRD_n(t) ∈ NE_n}. (4.2)

In words, τ^NE_n is the first time the process BRD_n(t) visits a PNE. Notice that the first step of the BRD is somewhat different from the following ones because, by the definition of the model, at time 0 no player is assumed to be already at a best response. In contrast, for any t ≥ 1 odd (resp. even) we have that the first (resp. second) player is at a best response, and the other player's action can be changed at the following step, if it is not itself a best response. As a consequence of this fact, the forthcoming definitions that depend on the step t of the BRD require a special treatment for the first few steps, i.e., t = 0, 1, 2. Clearly, this issue could be solved by assuming that at t = 0 the second player is at best response. The latter assumption has been made, e.g., in Amiet et al. (2021a). For the sake of generality, we prefer to avoid this assumption and rather treat the cases t = 0, 1, 2 separately. We will make use of the sequence of random sets R_n(t). Roughly speaking, the random set R_n(t) is the set of all rows and columns that contain one element that has been visited by the BRD up to time t. More precisely, for t ≥ 1, in order to determine where BRD_n(t) is and whether it is at a PNE or not, for each action profile (a, b) ∈ R_n(t), at least one of the payoffs U_A(a, b) and U_B(a, b) needs to be revealed at some time s ≤ t. For instance, suppose BRD_n(t) = (a, 1) for some a ∈ [K^A_n]. To determine whether this profile (a, 1) is a PNE or not, the payoff U_B(a, 1) has to be compared with all payoffs U_B(a, b), b ∈ [K^B_n]. To better understand the above definition, consider the random time τ^R_n, the first time the BRD visits a row or column that it has already visited. An example of the first steps of the BRD is given in Fig. 1. The red (respectively blue) dots are the action profiles whose payoff is compared by the row (respectively column) player. The red (respectively blue) lines help visualize the action profiles that the BRD considers when the row (respectively column) player is active in the BRD. The intersections between these lines are action profiles in which the payoffs of both the column and row player have been examined by the BRD. Such points are represented by two overlapping dots of different size: the color of the biggest (respectively smallest) one is associated with the first (respectively second) player that compares the payoff of such action profiles. The numbers indicate the positions of the BRD at the various times. The odd (respectively even) numbers are in red (respectively blue) since the associated action profiles are visited for the first time when the row (respectively column) player is active. The left side of Fig. 1 shows an instance of what occurs in case (i) in the above list, whereas the right side of the figure contains a graphical explanation of what happens in case (ii).
In general, the trajectory of the BRD for all t ≥ 0 is completely determined by the trajectory up to the random time τ^R_n. Concerning the random time τ^NE_n defined in Eq. (4.2), notice that (a) holds in the case described in (i). By Eqs. (4.3) and (4.5), and by Eq. (4.6) and (a), if a PNE is eventually reached, then the BRD must visit the set NE_n for the first time within 2K_n − 1 steps; this statement is formalized in Eq. (4.7). As shown by the following lemma, another relevant feature of Eqs. (4.3) and (4.4) is that, for all t ≥ 1, conditionally on the event {τ^R_n > t}, despite the fact that the set R_n(t) is random, its cardinality is deterministic: for every t ≥ 1, |R_n(t)| = r_n(t) almost surely on {τ^R_n > t} (Eq. (4.9)).

4.2. Potential games. We start by studying the case p = 1, i.e., the case of potential games. A potential game cannot have traps. Therefore, thanks to (a), together with Eqs. (4.6) and (4.7), we can define, for every n ≥ 1 and every t ≤ 2K_n − 1,

q_{t,n} := P(τ^NE_n = t | τ^NE_n ≥ t). (4.12)

In words, q_{t,n} represents the conditional probability that a PNE is reached at time t, given that it was not reached before. Notice that if t = 2K_n − 1, then q_{t,n} = 1. We start with a non-asymptotic result which, for every n ∈ N, provides the value of q_{t,n} for all t ≤ 2K_n − 1.
Lemma 4.2. Fix n ∈ N, let p_n = 1, and recall the definition of r_n(t) in Eq. (4.8). Then q_{t,n} can be computed explicitly for t ∈ {0, 1} and, in terms of r_n, for all t ≥ 2. As an immediate consequence of the previous lemma, we obtain the exact form of the distribution of the random time τ^NE_n.
Theorem 4.3. With the definitions of Lemma 4.2, for every t,

P(τ^NE_n = t) = q_{t,n} ∏_{j=0}^{t−1} (1 − q_{j,n}). (4.16)

The following proposition provides the asymptotic expectation and variance of τ^NE_n when the two players have the same action set. We point out that our expression for E[τ^NE_n] coincides with the one in Durand and Gaujal (2016, theorem 4), but the proof techniques exploited in Theorem 4.3 are completely different from the ones used by Durand and Gaujal (2016); moreover, our analysis considers the whole distribution of τ^NE_n and not just its expectation.
4.3. Games with i.i.d. payoffs. In a recent paper, Amiet et al. (2021a, theorem 2.3) showed that, when p_n = 0 and K^A_n = K^B_n, with high probability as n → ∞, the BRD does not converge to the set NE_n. We start by generalizing their result to the general setting.

Theorem 4.5. Let p_n = 0 for all n ∈ N. Then P(τ^NE_n = ∞) → 1 as n → ∞. (4.19)

Even though the result in Theorem 4.5 can be achieved by naturally adapting the proof of Amiet et al. (2021a, theorem 2.3) to the rectangular case, for completeness we present a detailed proof in Section 5. It is worth stressing that, in contrast to our setting, Amiet et al. (2021a) assume that the BRD starts at an action profile in which the second player is already at best response. The fact that we dispense with this assumption makes the quantities appearing in our proof slightly different from those in Amiet et al. (2021a).
4.4.The general case.The main purpose of this paper is to complement the negative result in Theorem 4.5 by showing that a tiny bit of (positive) correlation in the players payoffs, namely p n > 0, is enough to dramatically change the picture and make it similar to the case of a potential game, i.e., p n = 1.
More precisely, the following result shows that, if p_n is not too small compared to K_n, then the probability of the event {τ^NE_n = ∞} vanishes as n goes to infinity.
Theorem 4.6. Fix a positive sequence p_n satisfying the condition in Eq. (4.20). Then P(τ^NE_n = ∞) → 0 as n → ∞.

Notice that the latter result is qualitative in nature: it tells us that the BRD will eventually converge to a Nash equilibrium rather than to a trap, but does not provide any bound on the rate of convergence beyond the trivial one presented in Eq. (4.7). The following result, which implies Theorem 4.6, provides a much better bound on the time of convergence. Indeed, it states that, under the condition in Eq. (4.20), the time of convergence of the BRD to an equilibrium can be upper bounded, with high probability, by any function diverging exponentially faster than p_n^{−1}.
Proposition 4.7. Fix a positive sequence p_n satisfying Eq. (4.20). Then, for any sequence T_n satisfying the growth condition in the statement, we have P(τ^NE_n ≤ T_n) → 1 as n → ∞ (Eq. (4.21)).

Notice that Theorem 4.6 follows from Proposition 4.7 by choosing T_n = 2K_n − 1. In the particular case when p_n = p > 0 for all n ∈ N, Proposition 4.7 states that, with high probability, a PNE is reached by the BRD in at most T_n steps, no matter how slowly the sequence T_n diverges.
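The contrast between p = 0 and p > 0 can be seen in simulation. Below is our own self-contained sketch (not the paper's code): it runs the alternating BRD on sampled games U(p) and estimates the probability of reaching a PNE. For p = 1 the game is common-interest, so the dynamics must converge; for p = 0 and large K, convergence becomes rare.

```python
import numpy as np

def brd_converges(UA, UB, start=(0, 0)):
    """Run alternating best-response dynamics; True iff a PNE is reached."""
    a, b = start
    seen = set()
    turn = 0  # 0: row player A moves, 1: column player B
    while (a, b, turn) not in seen:
        seen.add((a, b, turn))
        if turn == 0:
            a2 = int(np.argmax(UA[:, b]))
            if a2 == a and len(seen) > 1:   # both at best response: PNE
                return True
            a = a2
        else:
            b2 = int(np.argmax(UB[a, :]))
            if b2 == b:
                return True
            b = b2
        turn = 1 - turn
    return False                            # state revisited: trap

def convergence_rate(K, p, reps, seed=0):
    """Fraction of sampled K-by-K games U(p) on which the BRD finds a PNE."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        UA = rng.random((K, K))
        UB = np.where(rng.random((K, K)) < p, UA, rng.random((K, K)))
        hits += brd_converges(UA, UB)
    return hits / reps
```

Since a common-interest game is a potential game, `convergence_rate(K, 1.0, reps)` equals 1 for any seed, while the rate at p = 0 decays as K grows.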
Remark 4.8. Notice that, whenever p_n < 1, the random variable τ^NE_n takes the value +∞ with positive probability. Therefore, even though Eq. (4.21) holds true, the random variable τ^NE_n cannot have a finite expectation.

Proofs of Section 3.
Proof of Proposition 3.1. By linearity of expectation and symmetry among action profiles, we have

E[W] = K_A K_B · P((1, 1) ∈ NE). (5.1)

Conditioning on the event {U_A(1, 1) = U_B(1, 1)} and using the law of total probability, we get

P((1, 1) ∈ NE) = p/(K_A + K_B − 1) + (1 − p)/(K_A K_B). (5.2)

To see that Eq. (5.2) holds, consider the following: when U_A(1, 1) = U_B(1, 1), the profile (1, 1) is a PNE if and only if U_A(1, 1) is larger than or equal to all payoffs U_A(a, 1) for a ∈ [K_A] and all payoffs U_B(1, b) for b ∈ [K_B]. By symmetry, this happens with probability 1/(K_A + K_B − 1).

On the other hand, when U_A(1, 1) ≠ U_B(1, 1), the profile (1, 1) is a PNE if and only if U_A(1, 1) is larger than or equal to all payoffs U_A(a, 1) for all a ∈ [K_A] and U_B(1, 1) is larger than or equal to all payoffs U_B(1, b) for all b ∈ [K_B]. By independence of the payoffs and by symmetry, this happens with probability 1/(K_A K_B). Plugging Eq. (5.2) into Eq. (5.1), we get the result.
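The identity just derived can be sanity-checked numerically. The closed form below follows by plugging Eq. (5.2) into Eq. (5.1); the Monte Carlo comparison is our own seeded sketch with a loose tolerance, not part of the proof.

```python
import numpy as np

def expected_pne(KA, KB, p):
    """E[W] = KA*KB * ( p/(KA+KB-1) + (1-p)/(KA*KB) ), from Eqs. (5.1)-(5.2)."""
    return KA * KB * (p / (KA + KB - 1) + (1 - p) / (KA * KB))

def mc_expected_pne(KA, KB, p, reps, seed=0):
    """Monte Carlo estimate of the expected number of PNE in U(p)."""
    rng = np.random.default_rng(seed)
    total = 0
    for _ in range(reps):
        UA = rng.random((KA, KB))
        UB = np.where(rng.random((KA, KB)) < p, UA, rng.random((KA, KB)))
        is_pne = (UA == UA.max(axis=0)) & (UB == UB.max(axis=1, keepdims=True))
        total += int(is_pne.sum())
    return total / reps
```

Note that at p = 0 the formula gives exactly 1, matching the Poisson(1) limit, while at p = 1 it gives K_A K_B/(K_A + K_B − 1), which diverges as the action sets grow.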
Proof of Proposition 3.2. By Proposition 3.1 we get the asymptotic expression for E[W_n] in Eq. (5.3), where the asymptotic equality uses the divergence of K^A_n and K^B_n. We now show that W_n concentrates around its expectation. To this end, we use an upper bound on the second moment of W_n. Notice that

E[W_n²] = Σ_{(a,b)} Σ_{(a′,b′)} P((a, b) ∈ NE_n, (a′, b′) ∈ NE_n). (5.4)

It follows immediately from Eq. (5.4) that the computation of the second moment amounts to studying probabilities of the form P((a, b) ∈ NE_n, (a′, b′) ∈ NE_n). We argue that there are only three relevant cases, treated in (5.5)–(5.7). The inequality in Eq. (5.7) stems from the fact that the event {(a′, b′) ∈ NE_n} coincides with the event in Eq. (5.8). The independence between this latter event and {(a, b) ∈ NE_n} proves the inequality. The second equality in Eq. (5.7) follows from the fact that, when U_A(a′, b′) = U_B(a′, b′), which happens with probability p_n, the event in Eq. (5.8) has the same probability as the event of picking the maximum among K^A_n + K^B_n − 3 equally probable objects; on the other hand, when U_A(a′, b′) ≠ U_B(a′, b′), the event in Eq. (5.8) has the same probability as picking independently the maximum of K^A_n − 1 equally probable objects and the maximum of K^B_n − 1 equally probable objects. The last equality stems from Eq. (5.2). In conclusion, plugging (5.5)–(5.7) into Eq. (5.4), we obtain the bound in Eq. (5.9), where the last equality again uses the divergence of K^A_n and K^B_n. By Chebyshev's inequality, we have that, for every ε > 0, the bound in Eq. (5.10) holds, where the asymptotic upper bound follows from Eq. (5.9).

Proofs of Section 4.1.
Proof of Lemma 4.1. We prove Eq. (4.9) by induction. For t = 1 we have |R_n(1)| = K^A_n + K^B_n − 1, which is trivially true. Assume now that Eq. (4.9) holds up to t − 1 < τ^R_n. Notice that, conditioning on {τ^R_n > t}, BRD_n(t) cannot visit a row or column visited at time 1, 2, . . ., t − 1. Since player A plays first, by the inductive hypothesis and thanks to the conditioning, for 0 < t < τ^R_n, Eq. (5.11) holds almost surely. Since Eq. (5.12) holds, we can rewrite the right-hand side of Eq. (5.11) as r_n(t) in Eq. (4.8). Hence, Eq. (4.9) holds.

Proofs of Section 4.2.
Proof of Lemma 4.2. Since p_n = 1, we have U^A_n = U^B_n almost surely. Therefore, to simplify the notation, we write U_n for the common payoff matrix (5.13). Moreover, in a potential game {τ^R_n = t} = {τ^NE_n = t − 1} for t > 2, as in Eq. (4.10). We start with t = 0. In this case q_{0,n} is the probability that U_n(1, 1) exceeds the maximum of the other entries U_n(a, 1), a ∈ [K^A_n], and U_n(1, b), b ∈ [K^B_n] (5.14). Let now t = 1. We have the expression in Eq. (5.15). Moreover, if we define the event

A_a := {player A's best response to action 1 is a}, (5.16)

we have Eq. (5.17), where, with an abuse of notation, we have identified a random variable with its distribution, and we have applied Proposition B.1 about the maximum of independent uniform random variables and Proposition B.2 about the probability that a Beta(a, 1) random variable is larger than an independent Beta(b, 1) random variable. This result can be applied because the payoffs, and consequently the two Beta random variables, are independent. Therefore, combining Eqs. (5.15) and (5.17), we obtain Eq. (5.18). Let now t ≥ 2. Call Π_t the set of sequences π = (π_0, π_1, . . ., π_t) of action profiles π_i = (a_i, b_i) such that
• there is no pair of distinct odd indices i, j such that b_i = b_j,
• there is no pair of distinct even indices i, j such that a_i = a_j.
Notice that the set Π_t coincides with the set of all possible trajectories of length t of BRD_n satisfying the event {τ^NE_n ≥ t}. Define the event G^π_t as in Eq. (5.20). The conditional probability P(τ^NE_n = t | G^π_t, τ^NE_n ≥ t) equals the probability that the maximum of r_n(t − 1) i.i.d. uniform random variables is bigger than an independent maximum of uniform random variables.

We split the first sum in Eq. (5.25) into two parts: t ∈ {0, . . ., T_n − 1} and t ∈ {T_n, . . ., 2K_n − 2}, where T_n = (log K_n)². We start by showing Eq. (5.26), for which it suffices to show Eq. (5.27) for t ≥ T_n. Notice that the sequence q_{t,n}, defined in Eq. (4.12), is increasing in t. Hence, for t ≥ T_n, we obtain the required bound, where in the last asymptotic equality we used q_{1,n} = 1/2 and T_n = ω(log(K_n)).
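As an aside, the two Beta facts applied in the proof above (Propositions B.1 and B.2 in the appendix: the maximum of k i.i.d. uniforms has CDF x^k, i.e., is Beta(k, 1), and P(X > Y) = a/(a + b) for independent X ~ Beta(a, 1), Y ~ Beta(b, 1)) are easy to check numerically. This is our own sketch for the reader's convenience, not part of the proof.

```python
import numpy as np

def max_uniform_tail(n, x, reps, seed=1):
    """Estimate P(max of n i.i.d. U(0,1) <= x); B.1 says it equals x**n."""
    rng = np.random.default_rng(seed)
    return float(np.mean(rng.random((reps, n)).max(axis=1) <= x))

def beta_compare(a, b, reps, seed=1):
    """Estimate P(X > Y) for independent X~Beta(a,1), Y~Beta(b,1).

    X and Y are simulated as maxima of a and b uniforms (using B.1);
    Proposition B.2 gives the exact value a / (a + b)."""
    rng = np.random.default_rng(seed)
    X = rng.random((reps, a)).max(axis=1)
    Y = rng.random((reps, b)).max(axis=1)
    return float(np.mean(X > Y))
```

With a large number of replications both estimates match the closed forms to within Monte Carlo error.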
We now prove Eq. (4.18). Call H_n the distribution function of τ^NE_n. By Eq. (5.33), we have … In what follows, we will use the following lemma, whose proof can be found, for instance, in Ogryczak and Ruszczyński (1999, Corollary 3).
Lemma 5.1. Let X be a nonnegative random variable with finite expectation µ, finite variance, and distribution function F, and set Φ(t) := ∫_0^t F(s) ds; then Φ(t) = E[(t − X)⁺].

We now go back to the proof of Eq. (4.18). Since the random variable τ^NE_n is bounded, we can apply Lemma 5.1 to get (5.38), where, thanks to Eq. (5.36), for all t ≤ T_n, … By explicit numerical integration, using Eq. (5.39) and the fact that µ := E[τ^NE_n] ∼ e − 1 ≤ T_n for all n sufficiently large, we get (5.41). Since the sequence q_{t,n}, defined in Eq. (4.12), is increasing in t and q_{1,n} = 1/2, by Eq. (5.27) we get, for all s ≥ 0, (5.42). Hence, for all t ≥ 0, (5.43) holds, and it goes to zero when n → ∞. Therefore, by Eqs. (5.36), (5.41), and (5.43), we conclude that (5.44) holds. It is now convenient to split the second integral in Eq. (5.38) as in (5.45). At this point, using Eq. (5.44), we can bound the second integral on the right-hand side of Eq. (5.45) as in (5.46). On the other hand, the first integral on the right-hand side of Eq. (5.45) can be bounded as in (5.47). In conclusion, the claim follows from (5.45)–(5.47) and numerical integration.

Proof of Theorem 4.5. First observe that either (5.49) holds, where R_n(t) is defined as in Eq. (4.8). Therefore, the statement in Eq. (4.19) is equivalent to (5.50). Notice that (5.51) holds; we have (5.52), which implies (5.53). Hence, by definition of D_n(t − 1), we get (5.57). By explicit computation (see Fig. 2), we get (5.58) for t odd. On the other hand, since D_n(t) ⊆ D_n(t − 1), (5.59) holds;
we have (5.60). The latter conditional probability can be explicitly computed (see Fig. 2), obtaining (5.61) for t odd. The proof of Proposition 5.2 relies on the following lemma.
Proof of Proposition 5.2. Define the events (5.76). Then (5.77), where the first equality is just the law of total probability, the second derives from Eq. (5.75), the first inequality is a consequence of Eq. (5.76), and the last stems from the definition of conditional probability. Let S_n be the set of action profiles that give the same payoff to the two players. Fix now a sequence as in (5.86). The event F^{s,T_n}_n occurs if there exists an interval of s consecutive steps before T_n in which the BRD visits only elements in S_n.
For every sequence (s_n)_{n∈N} such that s_n ≤ T_n for every n, we have (5.87). First we show that the first term on the r.h.s. of Eq. (5.87) goes to zero as s_n → ∞. To this end, it is enough to show that, under the event F^{s_n,T_n}_n, there exists some t ≤ T_n − s_n such that the best-responding player's payoff at time t + s_n is stochastically larger than a Beta(u, 1) random variable, with u as in (5.88)–(5.89), where in the first inequality we used Eq. (4.8) and the fact that s_n ≤ T_n; the second inequality holds for all sufficiently large n by our choice of T_n. Indeed, after the interval of s_n consecutive steps in which the BRD visits only elements of S_n, the BRD visits an action profile (a, b) such that …, which goes to zero as s_n → ∞. We now show that, under the assumption in Eq. (5.67), it is possible to find a sequence (s_n)_n such that …

Note that, in the instance depicted in Fig. 3, the event F^{s_n,T_n}_n occurs with T_n = 7 and s_n = 3. Indeed, there exists t ≤ T_n − s_n (in this case t = 4) such that the BRD visits only action profiles in S_n for s_n consecutive steps. As a consequence, the payoff of the row player at the action profile 7 is the maximum of the payoffs of the row player at the action profiles lying on the red dashed lines and of the payoffs of the column player at the action profiles lying on the blue dashed lines. Such payoffs are all i.i.d. Unif([0, 1]), except for the payoffs associated with the action profiles 4, V_1, . . . , V_5, for which we have additional information. Since the number of such exceptional action profiles is at most (T_n/2)^2 and the total number of action profiles lying on the dashed lines is r_n(s_n), the payoff of the row player at the action profile 7 is the maximum of at least u i.i.d. Unif([0, 1]) random variables, where u is defined as in Eq. (5.88).
none of them is such that the BRD visits only S_n in that subinterval. Therefore, (5.96) holds.

CONCLUSIONS AND OPEN PROBLEMS
We have considered a model of two-person games with random payoffs that parametrically interpolates between potential games and games with i.i.d. payoffs. The interpolation acts locally on each payoff profile. We have studied both the asymptotic behavior of the random number of pure Nash equilibria of the game and the asymptotic behavior of best response dynamics, as the number of actions for each player diverges. The type of model that we chose requires combinatorial tools for its analysis.
We see this paper as a first attempt to provide a parametric model for random games where the payoffs are not independent, but have some structure that depends on a locally acting parameter. Several extensions and variations of this model are conceivable and will be the object of our future research. For instance: (i) It would be interesting to have a clearer view of the phase transition taking place at p = 0. In particular, does there exist a sequence p_n converging to zero for which P(τ^NE_n = ∞) converges to some limit in (0, 1)? (ii) Games with more than two players could be studied. (iii) With more than two players, different types of deviator rules in BRD could be considered, e.g., round-robin, random order, etc. (iv) The behavior of better-response dynamics could be studied and compared to best response dynamics, along the lines of Amiet et al. (2021a). (v) As far as the number of pure Nash equilibria is concerned, we studied a form of Law of Large Numbers; the existence of a Central Limit Theorem could be explored.
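Regarding point (v), the Law of Large Numbers for the number of pure Nash equilibria can be probed numerically. The sketch below is our own illustrative code, not the paper's. For p = 0 a profile is a PNE with probability 1/(K^A K^B), so the expected count is exactly 1; for p = 1 the common payoff at a profile must beat the K^A + K^B − 2 other entries in its row and column, giving expected count K^A K^B / (K^A + K^B − 1).

```python
import random

def count_pne(kA, kB, p, rng):
    # Draw a game from the interpolating model and count its pure Nash
    # equilibria: U^A maximal in its column and U^B maximal in its row.
    UA = [[rng.random() for _ in range(kB)] for _ in range(kA)]
    UB = [[UA[a][b] if rng.random() < p else rng.random()
           for b in range(kB)] for a in range(kA)]
    col_max = [max(UA[i][b] for i in range(kA)) for b in range(kB)]
    row_max = [max(UB[a][j] for j in range(kB)) for a in range(kA)]
    return sum(1 for a in range(kA) for b in range(kB)
               if UA[a][b] == col_max[b] and UB[a][b] == row_max[a])

def mean_pne(kA, kB, p, trials, seed=0):
    # Monte Carlo estimate of the expected number of PNE.
    rng = random.Random(seed)
    return sum(count_pne(kA, kB, p, rng) for _ in range(trials)) / trials
```

For a 10 × 10 game the two benchmark values are 1 (at p = 0) and 100/19 ≈ 5.26 (at p = 1); intermediate p interpolates between them.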
We report two well-known results about Beta distributions. For the sake of completeness, we add their simple proofs.
Proof. For any t ∈ [0, 1] we have (B.5).

By the definitions in Eqs. (4.3) and (4.4), at time τ^R_n either (i) the BRD has reached an equilibrium, i.e., BRD_n(τ^R_n − 1) ∈ NE_n, or (ii) it has discovered a trap (see Fig. 1).
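The dichotomy just described (equilibrium or trap) can be checked mechanically: since the payoffs are continuous, best responses are almost surely unique, the dynamics is deterministic, and revisiting a state means the BRD has entered a trap. Below is a self-contained sketch (our own illustrative code, not the paper's; the names and the starting profile are assumptions). In the common-interest case p = 1 the common payoff strictly increases along the dynamics, so a trap is impossible and the run must end at a PNE.

```python
import random

def random_game(kA, kB, p, rng):
    # Interpolating model: B's payoff copies A's with probability p.
    UA = [[rng.random() for _ in range(kB)] for _ in range(kA)]
    UB = [[UA[a][b] if rng.random() < p else rng.random()
           for b in range(kB)] for a in range(kA)]
    return UA, UB

def brd_outcome(UA, UB):
    # Alternate best responses from (0, 0), player A (rows) first.
    # Stop at a pure Nash equilibrium, or report a trap as soon as a
    # (profile, mover) state repeats: the dynamics is deterministic,
    # so a repeated state implies an endless cycle.
    kA, kB = len(UA), len(UA[0])
    a, b, t = 0, 0, 0
    seen = {(0, 0, 0)}
    while True:
        if (UA[a][b] == max(UA[i][b] for i in range(kA)) and
                UB[a][b] == max(UB[a][j] for j in range(kB))):
            return "PNE", (a, b)
        if t % 2 == 0:
            a = max(range(kA), key=lambda i: UA[i][b])
        else:
            b = max(range(kB), key=lambda j: UB[a][j])
        t += 1
        state = (a, b, t % 2)
        if state in seen:
            return "trap", (a, b)
        seen.add(state)
```

The state space is finite, so the loop always terminates with one of the two labels.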

Figure 1. Both figures show an instance of the first seven steps of the BRD. The figure on the left describes the case in which τ^NE_n = 6. To compute BRD_n(7), the row player visits the action profiles on the red dashed lines and finds the maximum payoff at BRD_n(6); hence BRD_n(6) ∈ NE_n. In this case R_n(5) consists of all the action profiles on the solid red and blue lines in the figure. Hence, τ^R_n = 7 and, consequently, τ^R_n − 1 = τ^NE_n. The figure on the right describes the case in which the BRD discovers a trap. Since BRD_n(6) and BRD_n(3) are in the same column, we have BRD_n(7) = BRD_n(3). In this case τ^R_n = 6, because R_n(4) consists of all the action profiles on the solid lines in the figure, except for the blue line passing through the action profiles labeled 5 and 6. Hence, BRD_n(τ^R_n + 1) = BRD_n(t) for t = 3 ≤ τ^R_n − 3.

Proposition 4.4. If p_n = 1 and K^A_n = K^B_n for every n ∈ N, then …
(a) a = a′ and b = b′; (b) a = a′ and b ≠ b′, or vice versa; (c) a ≠ a′ and b ≠ b′. Case (a) is trivial: P((a, b), (a′, b′) ∈ NE_n) = P((a, b) ∈ NE_n). (5.5) The continuity of F implies that in Case (b) we have P((a, b), (a′, b′) ∈ NE_n) = 0. (5.6) To analyze Case (c), we remark that, if a ≠ a′ and b ≠ b′, then the event {(a′, b′) ∈ NE_n} depends on the event {(a, b) ∈ NE_n} only through the payoffs U^A(a′, b) and U^B(a, b′). Therefore P((a, b), (a′, b′) ∈ NE_n) = P((a, b) ∈ NE_n) …

Figure 2. The figure on the left shows an instance of the first five steps of the BRD, whereas the one on the right considers an additional step. Given the position of BRD(t) for t = 1, . . . , 5 in the figure on the left, D_n(5) coincides with the event that the payoff of player B at 5 is not the maximum of its row (the blue dashed line). Conditioning on D_n(5), the probability of D_n(6) is the product of the probability that BRD(6) is not at the action profiles M_1 and M_2, that is, (K^B_n − 3)/(K^B_n − 1), and, given this, the probability that the payoff of player A at 6 is not the maximum of its column (the red dashed line), i.e., (K^A_n − 1)/K^A_n. This explains Eq. (5.61) when t = 6. Similarly, conditioning on D_n(5), C_n(6) is the intersection of two events: the first is that the position of BRD(6) does not coincide with M_1 or M_2, which has probability (K^B_n − 3)/(K^B_n − 1); the second is that the payoff of player A at 6 is the maximum of its column (the red dashed line), which, conditioning on the first event, has probability 1/K^A_n. This justifies Eq. (5.58) when t = 6.

Figure 3. The left figure shows an instance of the first four steps of the BRD. The right figure shows how the dynamics proceeds after time 4. The numbered action profiles lying on the dashed lines, i.e., 5, 6, and 7, give the same payoff to the row and column player. Note that in this case the event F^{s_n,T_n}_n occurs with T_n = 7 and s_n = 3.

If T_n = o(K^A_n ∧ K^B_n), then the term on the r.h.s. of Eq. (5.93) goes to zero whenever lim_{n→∞} [log(T_n) − log(s_n) − s_n log(p_n^{−1})] = ∞, (5.95) or, equivalently, lim_{n→∞} log(T_n) [1 − log(s_n)/log(T_n) − s_n log(p_n^{−1})/log(T_n)] = ∞.
i.e., a Beta(k, 1) distribution function.

Proposition B.2. Let X and Y be independent random variables with distributions Beta(a, 1) and Beta(b, 1), respectively. Then P(X > Y) = a/(a + b).

Proof. We have P(X > Y) = ∫_0^1 a t^{a−1} ( ∫_0^t b s^{b−1} ds ) dt = ∫_0^1 a t^{a−1} · t^b dt = a/(a + b).

Define Z_n(t) := {BRD_n(t) ≠ BRD_n(t − 1)}, for t ≥ 1. (5.74) In words, E_n(t) represents the event that at time t the process BRD_n(t) visits a previously visited row or column, that is, E_n(t) = {τ^R_n ≤ t}, whereas Z_n(t) represents the event that BRD_n(t − 1) is not a PNE. Therefore, the sequence E_n(t) is increasing in t, i.e., E_n(t − 1) ⊆ E_n(t), and the sequence Z_n(t) is decreasing in t, i.e., Z_n(t − 1) ⊇ Z_n(t).
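Propositions B.1 and B.2 are easy to sanity-check by simulation. In the sketch below (our own illustrative code), a Beta(k, 1) variable is generated as the maximum of k i.i.d. Unif(0, 1) draws, in line with Proposition B.1, and the comparison probability of Proposition B.2 is estimated by Monte Carlo:

```python
import random

def beta_k1(k, rng):
    # Proposition B.1: the max of k i.i.d. Unif(0,1) draws has CDF t^k
    # on [0, 1], i.e., it is a Beta(k, 1) random variable.
    return max(rng.random() for _ in range(k))

def prob_x_gt_y(a, b, trials, seed=0):
    # Estimate P(X > Y) for independent X ~ Beta(a,1), Y ~ Beta(b,1);
    # Proposition B.2 gives the exact value a / (a + b).
    rng = random.Random(seed)
    return sum(beta_k1(a, rng) > beta_k1(b, rng)
               for _ in range(trials)) / trials
```

For instance, with a = 3 and b = 5 the estimate should be close to 3/8 = 0.375.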
The probability that BRD_n(t + s_n) is a PNE is bounded from below by the probability that a Beta(u, 1) random variable is larger than the maximum of K^A_n ∨ K^B_n − 1 i.i.d. random variables with a uniform distribution on [0, 1]. Hence, by Proposition B.2, we get … Indeed, after the interval of s_n consecutive steps in which the BRD visits only elements of S_n, the BRD visits an action profile (a, b) such that:
• (a, b) ∈ S_n, that is, both players receive the same payoff;
• this common payoff is the largest of a set of random variables, among which at least u are i.i.d. Unif([0, 1]) (see Fig. 3 for more details);
hence, the stochastic domination follows. Therefore, by Proposition B.1, the probability that BRD_n(t + s_n) = BRD_n(t + s_n + 1), that is, …

[K^B]: action set of player B
NE_n: set of pure Nash equilibria
p_n: probability that U^A(a, b) = U^B(a, b) in the game U_n
q_{t,n}: P(τ^NE_n = t | τ^NE_n ≥ t), defined in Eq. (4.12)
number of pure Nash equilibria in U_n
Z_n(t): {BRD_n(t) ≠ BRD_n(t − 1)}, defined in Eq. (5.74)
Π_t: set of possible paths for BRD_n up to time t
τ^NE_n: first time the BRD visits a PNE, defined in Eq. (4.2)
τ^cycle_n: first time the BRD re-visits an element of a trap, defined in Eq. (5.80)
τ^R_n: min{t ≥ 2 : BRD_n(t) ∈ R_n(t − 2)}, defined in Eq. (4.5)
Φ_n(t): ∫_0^t H_n(s) ds, defined in Eq. (5.39)
Ψ: potential function, defined in Eq. (2.4)

APPENDIX B. BETA DISTRIBUTION