
Simple Uncoupled No-regret Learning Dynamics for Extensive-form Correlated Equilibrium

Published: 18 November 2022


Abstract

The existence of simple uncoupled no-regret learning dynamics that converge to correlated equilibria in normal-form games is a celebrated result in the theory of multi-agent systems. Specifically, it has been known for more than 20 years that when all players seek to minimize their internal regret in a repeated normal-form game, the empirical frequency of play converges to a normal-form correlated equilibrium. Extensive-form (that is, tree-form) games generalize normal-form games by modeling both sequential and simultaneous moves, as well as imperfect information. Because of the sequential nature and presence of private information in the game, correlation in extensive-form games possesses significantly different properties than in normal-form games, many of which are still open research directions. Extensive-form correlated equilibrium (EFCE) has been proposed as the natural extensive-form counterpart to the classical notion of correlated equilibrium in normal-form games. Compared to the latter, the constraints that define the set of EFCEs are significantly more complex, as the correlation device (a.k.a. mediator) must take into account the evolution of beliefs of each player as they make observations throughout the game. Due to that significant added complexity, the existence of uncoupled learning dynamics leading to an EFCE has remained a challenging open research question for a long time. In this article, we settle that question by giving the first uncoupled no-regret dynamics that converge to the set of EFCEs in n-player general-sum extensive-form games with perfect recall. We show that each iterate can be computed in time polynomial in the size of the game tree, and that, when all players play repeatedly according to our learning dynamics, the empirical frequency of play after T game repetitions is proven to be a \( O(1/\sqrt {T}) \)-approximate EFCE with high probability, and an EFCE almost surely in the limit.


1 INTRODUCTION

The Nash equilibrium (NE) [38] is the most common notion of rationality in game theory, and its computation in two-player zero-sum games has been the flagship computational challenge at the interface between computer science and game theory (see, e.g., the landmark results in heads-up no-limit poker by Brown and Sandholm [3] and Moravčík et al. [35]). The assumption underpinning NE is that the interaction among players is fully decentralized. Therefore, an NE is an element of the uncorrelated strategy space of the game, that is, a product of independent probability distributions over actions, one per player. A competing notion of rationality is the correlated equilibrium (CE) proposed by Aumann [1]. A CE is defined as a probability distribution over joint action profiles—specifying an action for each player—and it is customarily modeled via a trusted external mediator that draws an action profile from this distribution and privately recommends to each player their component. The probability distribution is a CE if no player has an incentive to choose an action different from the mediator’s recommendation, because, assuming that all other players follow their recommended action, the suggested action is the best in expectation.

Many real-world strategic interactions involve more than two players with arbitrary (i.e., general-sum) utilities. In those settings, the CE is an appealing solution concept, as it overcomes several weaknesses of the NE. First, the NE is prone to equilibrium selection issues, raising the question as to how players can select an equilibrium while they are assumed not to be able to communicate with each other. Second, computing an NE is computationally intractable, being PPAD-complete even in two-player games [10, 11], whereas a CE can be computed in polynomial time.1 Third, the social welfare that can be attained by an NE may be arbitrarily lower than what can be achieved via a CE [6, 31, 42]. Last, in normal-form (that is, simultaneous-move) games, the notion of CE arises from simple uncoupled learning dynamics even in general-sum settings with an arbitrary number of players. In words, these learning dynamics are such that each player adjusts their strategy on the basis of their own payoff function, and on other players’ strategies, but not on the payoff functions of other players. The existence of uncoupled dynamics makes it possible to overcome the—often unreasonable—assumption that players have perfect knowledge of other players’ payoff functions, while at the same time offering a parallel, scalable avenue for finding equilibria. In contrast, in the case of the NE, uncoupled learning dynamics are only known in the two-player zero-sum setting [9, 22, 24] or in multi-player games with particular structures (see, e.g., the case of polymatrix games [12]). All of the above considerations contribute to the idea that CE is often a better prescriptive solution concept than NE in general-sum and multi-player settings.

Extensive-form correlated equilibrium (EFCE), introduced by von Stengel and Forges [48], is a natural extension of the correlated equilibrium to the case of extensive-form (that is, tree-form, sequential) games. Extensive-form games generalize normal-form games by modeling both sequential and simultaneous moves, as well as imperfect information. In an EFCE, the mediator draws, before the beginning of the sequential interaction, a recommended action for each of the possible decision points (also known as information sets) that players may encounter in the game, but these recommendations are not immediately revealed to each player. Instead, the mediator incrementally reveals relevant individual moves as players reach new information sets. At any decision point, the acting player is free to deviate from the recommended action, but doing so comes at the cost of future recommendations, which are no longer issued to that player if they deviate. It is up to the mediator to make sure that the recommended behavior is indeed an equilibrium—that is, that no player would be better off ever deviating from following the mediator’s recommendations at each information set. Compared to the constraints that characterize the set of CEs in normal-form games, those that define the set of EFCEs in extensive-form games are significantly more complex. Indeed, the main challenge of the EFCE case is that the mediator must take into account the evolution of beliefs of each player as they make observations throughout the game tree.

One could define a CE for an extensive-form game by allowing the mediator to draw and recommend an action for each information set to each player before the game starts. Then, each player could decide whether to follow the recommendation or deviate to an arbitrary strategy they desire. In an EFCE, players know less about the action recommendations that were sampled by the mediator than in a CE for extensive-form games, where the whole set of recommendations is immediately revealed. Therefore, by exploiting an EFCE, the mediator can more easily incentivize players to follow strategies that may hurt them, as long as players are indifferent as to whether or not to follow the recommendations. This is beneficial when the mediator wants to maximize, e.g., the social-welfare of the game.

In general-sum extensive-form games with an arbitrary number of players (including potentially the chance player modeling exogenous stochastic events), the problem of computing a feasible EFCE can be solved in polynomial time in the size of the game tree [27] via a variation of the Ellipsoid Against Hope algorithm [28, 40]. Dudík and Gordon [13] provide an alternative sampling-based algorithm to compute EFCEs. However, their algorithm is centralized and based on MCMC sampling, which limits its applicability to large-scale problems. In practice, these approaches cannot scale beyond toy problems. In contrast, methods based on uncoupled learning dynamics usually work quite well on large real-world problems, while retaining the appealing properties of uncoupled dynamics that we discussed above.

The following fundamental research question remains open:

Is it possible to devise uncoupled learning dynamics that converge to an EFCE?

We show that the answer is positive, at least in the full-information feedback model.

In the first part of the article, we formalize the notion of trigger regret, simplifying and extending an idea by Gordon et al. [20]. Trigger regret is a notion of regret suitable for extensive-form games that naturally expresses the regret incurred by each player for following the recommendations issued by the EFCE mediator, instead of deviating according to some optimal-in-hindsight strategy. Specifically, trigger regret is a particular instantiation of the framework known as phi-regret minimization introduced by Stoltz and Lugosi [44] building on previous work by Greenwald and Jafari [21]. In general, phi-regret minimization operates with a notion of regret defined with respect to a given set of linear transformations on the decision set. To define trigger regret, we identify suitable linear transformations that allow us to encode the behavior of trigger agents in the definition of EFCE, which we coin canonical trigger deviation functions. Intuitively, canonical trigger deviation functions encode all the possible ways in which a trigger agent may deviate from the recommendations issued by the EFCE mediator, and instead start playing from that point on according to a different strategy than the recommended one. Our core result on trigger regret is the following: If each player plays according to a no-trigger-regret learning algorithm, then the empirical frequency of play approaches the set of EFCEs.

In the rest of the article, we provide an efficient (that is, requiring time polynomial in the size of the game tree at each iteration) algorithm that minimizes trigger regret. The algorithm is based on the general template for constructing phi-regret minimization algorithms given by Gordon et al. [20], extending prior work by Hazan and Kale [25]. Before one can use that template, two missing pieces need to be solved: (1) constructing an efficient regret minimizer for the set of all valid canonical trigger deviation functions, and (2) showing that any convex combination of canonical trigger deviation functions admits a fixed point strategy, and that such fixed point can be computed efficiently. We solve (1) by exploiting the non-trivial combinatorial structure of the set of canonical trigger deviation functions, and (2) by giving an efficient incremental procedure to compute the fixed point strategy in a top-down traversal of the game tree. Our resulting algorithm minimizes trigger regret, guaranteeing \( O(\sqrt {T}) \) trigger regret with high probability after T iterations and requiring time polynomial in the size of the game tree at each iteration. Thus, when all players play according to the uncoupled learning dynamics defined by our algorithm, the empirical frequency of play after T game repetitions is proven to be a \( O(1/\sqrt {T}) \)-approximate EFCE with high probability, and an EFCE almost surely in the limit. These results generalize the seminal work by Hart and Mas-Colell [22] to the extensive-form game case via a simple and natural framework.

1.1 Related Work

The study of adaptive procedures leading to a CE dates back to at least the seminal works by Foster and Vohra [16], Fudenberg and Levine [17, 19], and Hart and Mas-Colell [22, 23]; see also the monograph by Fudenberg and Levine [18]. In particular, the work by Hart and Mas-Colell [22] proves that simple dynamics based on the notion of internal regret result in empirical frequencies of play that converge to the set of CEs in normal-form games. The strategy that the authors introduce—the so-called regret matching—is conceptually simple and guarantees that if all players follow this strategy, then the empirical frequency of play converges to the set of CEs (see also Cahn [5]). Other works describe extensions to the models studied in the aforementioned papers. For example, Stoltz and Lugosi [44] describe an adaptive procedure such that the resulting empirical frequency of play converges to the set of CEs in games with an infinite, but compact, set of actions, while Kakade et al. [29] consider efficient algorithms for computing correlated equilibria in graphical games.

In more recent years, a growing effort has been devoted to understanding the relationships between no-regret learning dynamics and equilibria in extensive-form games. These games pose additional challenges when compared to normal-form games, due to their sequential nature and the presence of imperfect information. While in two-player zero-sum extensive-form games it is widely known that no-regret learning dynamics converge to an NE—with the counterfactual regret minimization (CFR) algorithm and its variations being the state-of-the-art for equilibrium finding in such games [4, 33, 45, 46, 49]—the general case is less understood. Celli et al. [7] provide some variations of the classical CFR algorithm for n-player general-sum extensive-form games, showing that they provably converge to a normal-form coarse correlated equilibrium, which is based on a form of correlation that is less appealing than that of EFCE in sequential games. Indeed, normal-form coarse correlated equilibria require that the players commit to following all the recommendations issued by the mediator upfront, before the beginning of the game, which is not realistic in practice.

Finally, we mention relevant literature subsequent to the conference version of this article. In a recent paper, Morrill et al. [37] conduct a study of different forms of correlation in extensive-form games, defining a taxonomy of solution concepts. Each of their solution concepts is attained by a particular set of no-regret learning dynamics, which is obtained by instantiating the phi-regret minimization framework [20, 21, 44] with a suitably defined deviation function. As part of their analysis, Morrill et al. [37] investigate some properties of the well-established CFR regret minimization algorithm [49] applied to n-player general-sum extensive-form games, establishing that it is hindsight-rational with respect to a specific set of deviation functions, which the authors coin blind counterfactual deviations. In subsequent recent work, Morrill et al. [36] extend their prior work [37] by identifying a general class of deviations—called behavioral deviations—that induce equilibria that can be found through uncoupled no-regret learning dynamics. Behavioral deviations are defined as those specifying an action transformation independently at each information set of the game. As the authors note, the deviation functions involved in the definition of EFCE do not fall under that category. A particular class of behavioral deviation functions—called causal partial sequence deviations—induces solution concepts that are (subsets of) EFCEs. Thus, their result begets an alternative set of no-regret learning dynamics that converge to EFCE, based on a different set of deviation functions than those we use in this article.


2 PRELIMINARIES

In this section, we provide some standard definitions related to extensive-form games and regret minimization that will be employed in the remainder of the article. A more comprehensive treatment of basic concepts in the theory of extensive-form games can be found in the book by Shoham and Leyton-Brown [43], and an introduction to the theory of learning in games can be found in the book by Cesa-Bianchi and Lugosi [9].

2.1 Mathematical Notation and Algorithmic Conventions

In this article, we adopt the following notational and algorithmic conventions:

  • We denote the set of real numbers as \( \mathbb {R} \), the set of nonnegative real numbers as \( \mathbb {R}_{\ge 0} \), and the set \( \lbrace 1,2,\dots \rbrace \) of positive integers as \( \mathbb {N}_{\gt 0} \).

  • The set \( \lbrace 1,\ldots ,n\rbrace \), where \( n\in \mathbb {N}_{\gt 0} \), is compactly denoted as \( [n] \); the empty set as \( \emptyset \).

  • Given a set S, we denote its convex hull with the symbol \( \text {co}(S) \).

  • Vectors and matrices are marked in bold.

  • Given a discrete set \( S = \lbrace s_1,\dots ,s_n\rbrace \), we denote as \( \mathbb {R}^{S} \) (respectively, \( \mathbb {R}_{\ge 0}^{S} \)) the set of real (respectively, nonnegative real) \( |S| \)-dimensional vectors whose entries are denoted as \( \mathbf {x}[s_1], \dots ,\mathbf {x}[s_n] \). Given an element \( s \in S \), we denote by \( \mathbf {e}_s \in \mathbb {R}^{S} \) the canonical basis vector whose entries are all zeros except for \( \mathbf {e}_s[s] = 1 \).

  • Similarly, given a discrete set S, we denote as \( \mathbb {R}^{|S|\times |S|} \) (respectively, \( \mathbb {R}_{\ge 0}^{|S|\times |S|} \)) the set of real (respectively, nonnegative real) \( |S|\times |S| \) square matrices \( \mathbf {M} \) whose entries are denoted as \( \mathbf {M}[s_r,s_c] \) (\( s_r,s_c\in S \)), where \( s_r \) corresponds to the row index and \( s_c \) to the column index.

  • Given a discrete set S, we denote by \( \Delta ^{S} \) the simplex \( \Delta ^{S} := \lbrace \mathbf {x}\in \mathbb {R}_{\ge 0}^{S}: \sum _{s\in S} \mathbf {x}[s] = 1\rbrace \). The symbol \( \Delta ^n \), with \( n\in \mathbb {N}_{\gt 0} \), is used to mean \( \Delta ^{[n]} \).

  • Given a discrete set S, we use the symbol \( \mathbb {S}^{S} \subseteq \mathbb {R}_{\ge 0}^{|S|\times |S|} \) to denote the set of stochastic matrices, that is, nonnegative square matrices whose columns all sum up to 1. The symbol \( \mathbb {S}^n \), where \( n\in \mathbb {N}_{\gt 0} \), is used to mean \( \mathbb {S}^{[n]} \).

  • Given two functions \( f:X\rightarrow Y \) and \( g:Y\rightarrow Z \), we denote by \( g\circ f: X\rightarrow Z \) their composition \( x \mapsto g(f(x)) \).

  • Given a set S and a function f, the image of S via f is denoted as \( f(S) := \lbrace f(s): s\in S\rbrace \).

  • Given a proposition \( \text {P} \), we denote with \( 1\!\!1 [\text {P}] \) the indicator function of that proposition: \( 1\!\!1 [\text {P}] = 1 \) if \( \text {P} \) is true, and \( 1\!\!1 [\text {P}] =0 \) if not.

  • Given a partially ordered set \( (S, \prec) \) and two elements \( s,s^{\prime }\in S \), we use the standard derived symbols \( s \preceq s^{\prime } \) to mean that \( (s = s^{\prime }) \vee (s \prec s^{\prime }) \), \( s \succ s^{\prime } \) to mean that \( s^{\prime } \prec s \), and \( s \succeq s^{\prime } \) to mean that \( s^{\prime } \preceq s \). Furthermore, we use the crossed symbols \( \not\prec , \not\preceq , \not\succ \), and \( \not\succeq \) to mean that the relations \( \prec ,\preceq ,\succ \), and \( \succeq \) (respectively) do not hold.

  • Several of the algorithms presented in this article take as input, give as output, or otherwise manipulate, linear functions. Therefore, to study the complexity of our routines, it is necessary to settle on a representation for such linear functions. Unless otherwise specified, we will always assume that a linear function f is stored in memory using coordinates relative to the canonical basis of its domain and codomain, and we call that representation the canonical representation of f, denoted \( \langle f\rangle \). Specifically:

    • If f is a linear function from \( \mathbb {R}^{S} \) (for some discrete set S) to \( \mathbb {R} \), then its canonical representation \( \langle f\rangle \) is the (unique) vector \( \mathbf {v} \in \mathbb {R}^{S} \) such that \( \begin{equation*} f(\mathbf {x}) = \mathbf {v}^\top \mathbf {x}\qquad \forall \ \mathbf {x}\in \mathbb {R}^{S}, \end{equation*} \) where \( \top \) denotes transposition.

    • If f is a linear function from \( \mathbb {R}^{S} \) to \( \mathbb {R}^{S} \) (for some discrete set S), then its canonical representation \( \langle f\rangle \) is the (unique) matrix \( \mathbf {M}\in \mathbb {R}^{|S|\times |S|} \) such that \( \begin{equation*} f(\mathbf {x}) = \mathbf {M} \mathbf {x} \qquad \forall \ \mathbf {x} \in \mathbb {R}^{S}. \end{equation*} \)

    • If F is a linear functional, mapping linear functions \( \phi :\mathbb {R}^{S}\rightarrow \mathbb {R}^{S} \) to reals, then its canonical representation \( \langle F\rangle \) is the (unique) matrix \( \mathbf {\Lambda } \in \mathbb {R}^{|S|\times |S|} \) such that \( \begin{equation} F(\phi) = \sum _{s_r, s_c\in S} \mathbf {\Lambda }[s_r, s_c]\cdot \langle \phi \rangle [s_r, s_c]\qquad \forall \ \phi :\mathbb {R}^{S}\rightarrow \mathbb {R}^{S}, \end{equation} \) where \( \langle \phi \rangle \) is the canonical representation of \( \phi \).
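To make the three canonical representations concrete, the following toy sketch (illustrative Python/NumPy code; the index set S and all numerical values are invented for the example and are not from the article) stores and evaluates a linear function to the reals, a linear map, and a linear functional over linear maps.

```python
import numpy as np

# Toy index set S = {s1, s2}; positions 0 and 1 stand for s1 and s2.
v = np.array([2.0, -1.0])            # <f>: canonical representation of f(x) = v^T x
f = lambda x: v @ x

M = np.array([[1.0, 0.5],            # <g>: canonical representation of g(x) = M x
              [0.0, 2.0]])
g = lambda x: M @ x

Lam = np.array([[0.0, 1.0],          # <F>: canonical representation of a linear functional,
                [3.0, 0.0]])         # F(phi) = sum_{s_r, s_c} Lam[s_r, s_c] * <phi>[s_r, s_c]
F = lambda Phi: float(np.sum(Lam * Phi))

x = np.array([1.0, 1.0])
print(f(x), g(x), F(M))              # evaluates f and g at x, and F at the map represented by M
```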

2.2 Extensive-form Games

In this subsection, we introduce some standard concepts, terminology, and notation that we will use to deal with extensive-form games. A summary of the notation we introduce can be found in Table 1. Examples 2.1 and 2.5 demonstrate some of the notation in a simple extensive-form game.

Symbol: Description
\( \mathcal {H} \): Set of nodes of the game tree.
\( \mathcal {H}^{(i)} \): Set of nodes at which Player i acts.
\( \mathcal {A}(h) \): Actions available to the player acting at \( h \in \mathcal {H} \) (empty set if h is a terminal node).
\( {\mathcal{J}^{(i)}} \): Information partition of Player i.
\( \mathcal {A}(j) \): Set of actions available at any node in the information set j.
\( \mathcal {Z} \): Set of terminal nodes (leaves of the game tree).
\( {u^{(i)}} (z) \): Payoff of Player i at terminal node \( z \in \mathcal {Z} \).
\( p_c(z) \): Product of probabilities of all the stochastic events on the path from the root to terminal node \( z\in \mathcal {Z} \).
\( {\Sigma^{(i)}} \): Set of sequences of Player i, defined as \( {\Sigma^{(i)}} := \lbrace (j,a) : j \in {\mathcal{J}^{(i)}} , a \in \mathcal {A}(j)\rbrace \cup \lbrace \varnothing \rbrace \).
\( \varnothing \): The empty sequence, that is, the special element of \( {\Sigma^{(i)}} \).
\( {\Sigma^{(i)}} _* \): Set of sequences of Player i, excluding the empty sequence \( \varnothing \).
\( \sigma ^{(i)}(z) \): Last sequence of Player i encountered on the path from the root to node \( z\in \mathcal {Z} \).
\( \sigma ^{(i)}(j) \): Last sequence of Player i on the path from the root to any node in \( j \in {\mathcal{J}^{(i)}} \).
\( j \prec j^{\prime } \): Information set \( j\in {\mathcal{J}^{(i)}} \) is an ancestor of \( j^{\prime }\in {\mathcal{J}^{(i)}} \), that is, there exists a directed path in the game tree connecting a node \( h\in j \) to some node \( h^{\prime }\in j^{\prime } \).
\( \sigma \prec \sigma ^{\prime } \): Sequence \( \sigma \) precedes sequence \( \sigma ^{\prime } \), where \( \sigma ,\sigma ^{\prime } \) belong to the same player.
\( \sigma \succeq j \): Sequence \( \sigma =(j^{\prime },a^{\prime }) \) is such that \( j^{\prime } \succeq j \).
\( {\Sigma^{(i)}} _j \): Sequences at \( j \in {\mathcal{J}^{(i)}} \) and all of its descendants, \( {\Sigma^{(i)}} _j:= \lbrace \sigma \in {\Sigma^{(i)}} : \sigma \succeq j\rbrace \).
\( {\mathcal{Q}^{(i)}} \): Sequence-form strategies of Player i (Definition 2.2).
\( {\mathcal{Q}^{(i)}} _j \): Sequence-form strategies for the subtree rooted at \( j \in {\mathcal{J}^{(i)}} \) (Definition 2.3).
\( {\Pi^{(i)}} \): Deterministic sequence-form strategies of Player i.
\( {\Pi^{(i)}} _j \): Deterministic sequence-form strategies for the subtree rooted at \( j \in {\mathcal{J}^{(i)}} \).
\( \Pi \): Set of joint deterministic sequence-form strategies, \( \Pi := \times _{i\in [n]}{\Pi^{(i)}} \).

Table 1. Summary of Game-theoretic Notation Used in this Article

An extensive-form game is played on an oriented rooted game tree. We denote by \( \mathcal {H} \) the set of nodes of the game tree. Each node \( h\in \mathcal {H} \) that is not a leaf of the game tree is called a decision node and has an associated player that acts at that node. In an n-player extensive-form game, the set of valid players is the set \( [n]\cup \lbrace c\rbrace \), where c denotes the chance player—a fictitious player that selects actions according to a known fixed probability distribution and models exogenous stochasticity of the environment (for example, a roll of the dice or drawing a card from a deck). The player that acts at h is free to pick any one of the actions \( \mathcal {A}(h) \) that are available at h. For each possible action \( a\in \mathcal {A}(h) \), an edge connects h to the node to which the game transitions whenever action a is picked at h. Given a player \( i \in [n]\cup \lbrace c\rbrace \), we denote with \( \mathcal {H}^{(i)} \subseteq \mathcal {H} \) the set of all decision nodes that belong to Player i.

Leaves of the game tree are called terminal nodes and represent the outcomes of the game. As such, they are not associated with any acting player, and the set of actions is conventionally set to the empty set. The set of all terminal nodes in the game is denoted with the letter \( \mathcal {Z} \). So, the set of all nodes in the game tree is the disjoint union \( \mathcal {H}= \mathcal {H}^{(1)} \cup \dots \cup \mathcal {H}^{(n)}\cup \mathcal {Z} \). When the game transitions to a terminal node \( z\in \mathcal {Z} \), payoffs are assigned to each of the non-chance players by the set of functions \( \lbrace {u^{(i)}} :\mathcal {Z}\rightarrow \mathbb {R}\rbrace _{i\in [n]} \). Furthermore, we let \( p_c : \mathcal {Z}\rightarrow (0,1) \) denote the function assigning each terminal node \( z\in \mathcal {Z} \) to the product of probabilities of chance moves encountered on the path from the root of the game tree to z.

2.2.1 Imperfect Information.

To model imperfect information, the nodes \( \mathcal {H}^{(i)} \) of each player \( i\in [n] \) are partitioned into a collection \( {\mathcal{J}^{(i)}} \) of sets of nodes, called information sets. Each information set \( j\in {\mathcal{J}^{(i)}} \) groups together nodes that Player i cannot distinguish between when he or she acts there. Since a player always knows what actions are available at a decision node, any two nodes \( h,h^{\prime } \) belonging to the same information set j must have the same action set, that is, \( \mathcal {A}(h) = \mathcal {A}(h^{\prime }) \) for all \( h,h^{\prime }\in j \). For that reason, we can safely overload notation and write \( \mathcal {A}(j) \) to mean the set of actions available at any node that belongs to information set j.

As is standard in the literature, we assume that the extensive-form game has perfect recall, that is, information sets are such that no player forgets information once acquired. An immediate consequence of perfect recall is that, for any player \( i\in [n] \) and any two nodes \( h,h^{\prime } \) in the same information set \( j \in {\mathcal{J}^{(i)}} \), the sequence of Player i’s actions encountered along the path from the root to h and from the root to \( h^{\prime } \) must coincide (or otherwise Player i would be able to distinguish among the nodes, since the player remembers all of the actions they played in the past). This suggests the following partial ordering \( \prec \) on the set \( {\mathcal{J}^{(i)}} \): We write \( j\prec j^{\prime } \)—and say that \( j\in {\mathcal{J}^{(i)}} \) is an ancestor of \( j^{\prime }\in {\mathcal{J}^{(i)}} \) or equivalently that \( j^{\prime } \) is a descendant of j—if there exist nodes \( h^{\prime }\in j^{\prime } \) and \( h\in j \) such that the path from the root of the game tree to \( h^{\prime } \) passes through h.

It is a well-known consequence of perfect recall that the partially ordered set \( ({\mathcal{J}^{(i)}} ,\prec) \) is a forest for any player \( i\in [n] \), in the precise sense that, given any information set \( j\in {\mathcal{J}^{(i)}} \), the set of all of its predecessors forms a chain (that is, it is well-ordered by \( \prec \)).

2.2.2 Sequences.

For any player \( i\in [n] \), and given an information set \( j\in {\mathcal{J}^{(i)}} \) and an action \( a\in \mathcal {A}(j) \), we denote as \( \sigma =(j,a) \) the sequence of Player i’s actions encountered on the path from the root of the game tree down to action a (included) at any node in information set j. In perfect-recall extensive-form games, such a sequence is guaranteed to be uniquely determined, because paths that reach decision nodes belonging to the same information set identify the same sequence of Player i’s actions. A special element \( \varnothing \) denotes the empty sequence of Player i. Then, the set of Player i’s sequences is defined as \( \begin{equation*} {\Sigma^{(i)}} := \lbrace (j,a): j\in {\mathcal{J}^{(i)}} , a\in \mathcal {A}(j)\rbrace \cup \lbrace \varnothing \rbrace . \end{equation*} \) Moreover, we let \( {\Sigma^{(i)}} _* := {\Sigma^{(i)}} \setminus \lbrace \varnothing \rbrace \) be the set of all sequences of Player i other than the empty one.

Given an information set \( j\in {\mathcal{J}^{(i)}} \), we denote by \( \sigma ^{(i)}(j)\in {\Sigma^{(i)}} \) the last sequence (information set-action pair) of Player i encountered on the path from the root of the game tree to any node in j, also known as j’s parent sequence. If Player i does not act before reaching j, then \( \sigma ^{(i)}(j) \) is set to the empty sequence \( \varnothing \). If \( \sigma ^{(i)}(j) = \varnothing \), then we say that information set j is a root information set of Player i, while, whenever \( \sigma ^{(i)}(j)=(j^{\prime },a) \), we say that \( j^{\prime } \) is the immediate predecessor of j, or equivalently that information set j is immediately reachable from sequence \( \sigma ^{(i)}(j) \). That nomenclature is supported by the observation that \( j^{\prime }\prec j \), and that Player i does not need to take other actions after choosing a at \( j^{\prime } \) to reach j. Analogously, for all \( z\in \mathcal {Z} \), we define \( \sigma ^{(i)}(z)\in {\Sigma^{(i)}} \) as the last sequence of Player i’s actions encountered on the path from the root of the game tree to terminal node z (notice that \( \sigma ^{(i)}(z) = \varnothing \) whenever Player i never plays on the path from the root to z).

Just like information sets, there exists a natural partial ordering on sequences, which we also denote with the same symbol \( \prec \). For every \( i \in [n] \) and any pair of sequences \( \sigma ,\sigma ^{\prime } \in {\Sigma^{(i)}} \), the relation \( \sigma \prec \sigma ^{\prime } \) holds if \( \sigma =\varnothing \ne \sigma ^{\prime } \), or if the sequences are of the form \( \sigma =(j,a),\sigma ^{\prime }=(j^{\prime },a^{\prime }) \), and the sequence of Player i’s actions encountered on the path from the root of the tree to any node in \( j^{\prime } \) includes playing action a at one of information set j’s nodes. As for information sets, it is a direct consequence of the perfect recall assumption that the partially ordered set \( ({\Sigma^{(i)}} , \prec) \) is a forest. Finally, we introduce the overloaded notation \( \sigma \succeq j \) (or equivalently \( j \preceq \sigma \)), defined for any player \( i \in [n] \), information set \( j\in {\mathcal{J}^{(i)}} \), and sequence \( \sigma \in {\Sigma^{(i)}} \), to mean that the sequence of Player i’s actions that is denoted by \( \sigma \) must lead the player to pass through (some node in) j; formally, \( \sigma = (j^{\prime },a^{\prime }) \in {\Sigma^{(i)}} _* \wedge j^{\prime }\succeq j \). With that, we let \( {\Sigma^{(i)}} _j := \lbrace \sigma \in {\Sigma^{(i)}} : \sigma \succeq j\rbrace \subseteq {\Sigma^{(i)}} \) be the set of Player i’s sequences that terminate at j or any of its descendant information sets.

Example 2.1.

To illustrate some of the concepts and notation described so far, we consider the simple two-player extensive-form game in Figure 1, in which black round nodes belong to Player 1, and white round nodes belong to Player 2. The gray clusters of nodes identify the information sets. Since we chose different action numbers for different information sets, there exists a one-to-one correspondence between actions and sequences, and we will sometimes refer to sequences using the corresponding action number. For example, we will sometimes refer to sequence “3” to mean sequence \( (\text{B},{\mathsf{3}}) \), sequence “8” to mean sequence \( (\text {D},{\mathsf{8}}) \), and so on. Player 1 has four information sets—denoted A, B, C, and D—with two actions each. Player 2 only has two information sets, r and s, each with two actions. Information set D of Player 1 contains two nodes and models Player 1’s lack of knowledge of the action taken by Player 2 at information set s. The partial ordering between information sets for Player 1 is \( \text {A}\prec \text {B},\text {A}\prec \text {C},\text {A}\prec \text {D} \). Moreover, we have that \( \sigma ^{(1)}(\text {A})=\varnothing \), \( \sigma ^{(1)}(\text {B})=\sigma ^{(1)}(\text {C})={\mathsf{1}} \), and \( \sigma ^{(1)}(\text {D})={\mathsf{2}} \). For the terminal node z in the picture, \( \sigma ^{(1)}(z)={\mathsf{3}} \). Finally, we have that \( \Sigma ^{(1)}_{\text {A}}=\Sigma _*^{(1)}=\lbrace {\mathsf{1}},{\mathsf{2}},\dots ,{\mathsf{8}}\rbrace \), \( \Sigma ^{(1)}_{\text {B}}=\lbrace {\mathsf{3}},{\mathsf{4}}\rbrace \), \( \Sigma ^{(1)}_{\text {C}}=\lbrace {\mathsf{5}},{\mathsf{6}}\rbrace \), and \( \Sigma ^{(1)}_{\text {D}}=\lbrace {\mathsf{7}},{\mathsf{8}}\rbrace \).

Fig. 1.

Fig. 1. (Left) Example of an extensive-form game with two players. Black round nodes belong to Player 1, white round nodes belong to Player 2. Small white square nodes represent terminal nodes. The gray partitions represent the information sets of the game. The numbers on the edges identify each of Player 1’s actions. (Right) Forest of information sets of Player 1, corresponding to the partially ordered set \( ({\mathcal{J}^{(1)}},\prec) \).
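As a small computational aside, the subtree sequence sets \( {\Sigma^{(i)}} _j \) from Example 2.1 can be recovered mechanically from the parent-sequence structure. The sketch below is an illustrative Python fragment (the dictionary encoding of the game is ours, not the article's, with sequences identified by their action numbers) that reproduces \( \Sigma ^{(1)}_{\text {B}}=\lbrace {\mathsf{3}},{\mathsf{4}}\rbrace \) and \( \Sigma ^{(1)}_{\text {A}}=\lbrace {\mathsf{1}},\dots ,{\mathsf{8}}\rbrace \).

```python
# Information structure of Player 1 in Figure 1: infoset -> (parent sequence, actions),
# with sequences identified by their action numbers as in Example 2.1.
infosets_p1 = {"A": ("empty", [1, 2]), "B": (1, [3, 4]), "C": (1, [5, 6]), "D": (2, [7, 8])}

def sequences_at_or_below(j, infosets):
    """Compute Sigma^(i)_j: the sequences at information set j and at all of its descendants."""
    seqs = set(infosets[j][1])
    for k, (parent, _) in infosets.items():
        if parent in seqs:   # k is immediately reachable from a sequence already collected
            seqs |= sequences_at_or_below(k, infosets)
    return seqs

print(sequences_at_or_below("B", infosets_p1))  # {3, 4}
print(sequences_at_or_below("A", infosets_p1))  # {1, 2, 3, 4, 5, 6, 7, 8}
```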

2.2.3 Sequence-form Strategies.

Conceptually, a strategy for a player specifies a probability distribution over the actions at each information set for that player. So, perhaps the most intuitive representation of a strategy, called a behavioral strategy in the literature, is as a vector that assigns to each information set-action pair \( (j,a)\in {\Sigma^(i)} _* \) the probability of picking action a at information set j. That representation has a major drawback: The probability of reaching any given terminal node \( z \in \mathcal {Z} \) is expressed as the product of several entries in the vector (one per each action on the path from the root of the game tree to z), rendering critical quantities—including the expected utility of a player—a non-convex function of the behavioral strategies of the players. As is standard in the literature, to soundly overcome the issue of non-convexity, throughout this article, we will exclusively use a different representation of strategies, known as the sequence-form representation [30, 41, 47].

Like behavioral strategies, a sequence-form strategy for Player \( i \in [n] \) is a vector \( {\boldsymbol{q}} \in \mathbb {R}^{{\Sigma^{(i)}} }_{\ge 0} \). However, unlike behavioral strategies, each entry \( {\boldsymbol{q}} [(j,a)] \) of a sequence-form strategy \( {\boldsymbol{q}} \) contains the product of the probabilities of playing all of Player i’s actions on the path from the root of the game tree down to action a at information set j included. Furthermore, the entry \( {\boldsymbol{q}} [\varnothing ] \) corresponding to the empty sequence is defined as the constant value 1.

To ensure consistency, all sequence-form strategies must satisfy the probability-mass-conservation constraints \( \begin{equation*} {\boldsymbol{q}} {}[\varnothing ]=1, \hspace{28.45274pt} {\boldsymbol{q}} {}[\sigma ^{(i)}(j)]=\sum _{a\in \mathcal {A}(j)}{\boldsymbol{q}} {}[(j,a)], \hspace{8.5359pt}\forall \, j\in {\mathcal{J}^{(i)}} . \end{equation*} \) The above probability-mass-conservation constraints are linear, and therefore the set of sequence-form strategies is a convex polytope, suggesting the following definition:

Definition 2.2.

The sequence-form strategy polytope for Player \( i \in [n] \) is the convex polytope \( \begin{equation*} {\mathcal{Q}^{(i)}} := \big\lbrace {\boldsymbol{q}} \in \mathbb {R}^{{\Sigma^{(i)}} }_{\ge 0}: {\boldsymbol{q}} {}[\varnothing ]=1\hspace{8.5359pt}\text{and} \hspace{8.5359pt} {\boldsymbol{q}} {}[\sigma ^{(i)}(j)]=\sum _{a\in \mathcal {A}(j)}{\boldsymbol{q}} {}[(j,a)], \hspace{2.84544pt}\forall \, j\in {\mathcal{J}^{(i)}} \big\rbrace . \end{equation*} \)

As we mentioned in Section 2.2.2, the partially ordered set \( ({\mathcal{J}^{(i)}} ,\prec) \) is a forest. Thus, it makes sense to consider partial strategies that only specify behavior at an information set j and all of its descendants \( j^{\prime } \succ j \). We make that formal through the following definition:

Definition 2.3.

Let \( i\in [n] \) be a player and \( j\in {\mathcal{J}^{(i)}} \) be an information set for Player i. The set of sequence-form strategies for the subtree rooted at j, denoted \( {\mathcal{Q}^{(i)}} _j \), is the set of all vectors \( {\boldsymbol{q}} \in \mathbb {R}_{\ge 0}^{{\Sigma^{(i)}} _j} \) such that probability-mass-conservation constraints hold at information set j and all of its descendants \( j^{\prime } \succ j \), specifically, (2) \( \begin{equation} {\mathcal{Q}^{(i)}} _j:= \big\lbrace {\boldsymbol{q}} \in \mathbb {R}_{\ge 0}^{{\Sigma^{(i)}} _j} : \sum _{a\in \mathcal {A}(j)}\! {\boldsymbol{q}} [(j,a)]=1, \hspace{8.5359pt}\text{and}\hspace{8.5359pt} {\boldsymbol{q}} [\sigma ^{(i)}(j^{\prime })]=\sum _{a\in \mathcal {A}(j^{\prime })}{\boldsymbol{q}} {}[(j^{\prime },a)] \quad \forall \, j^{\prime }\succ j \big\rbrace . \end{equation} \)

2.2.4 Deterministic Sequence-form Strategies.

Deterministic strategies are those that select, at each information set at which the player acts, exactly one action with probability one. Since the probability mass on each action is either 0 or 1, the set of deterministic sequence-form strategies for Player i—which we denote with the capital letter \( {\Pi^{(i)}} \)—coincides exactly with the set of all sequence-form strategies whose components are all either 0 or 1.

Definition 2.4.

The set of deterministic sequence-form strategies for Player \( i \in [n] \) is the set \( \begin{equation*} {\Pi^{(i)}} := {\mathcal{Q}^{(i)}} \cap \lbrace 0,1\rbrace ^{{\Sigma^{(i)}} }. \end{equation*} \) Similarly, the set of deterministic sequence-form strategies for the subtree rooted at j is \( \begin{equation*} {\Pi^{(i)}} _j := {{\mathcal{Q}^{(i)}} _j} \cap \lbrace 0,1\rbrace ^{{\Sigma^{(i)}} _j}. \end{equation*} \)

The set of deterministic sequence-form strategies corresponds one-to-one to the game-theoretic notion of reduced normal-form strategies (e.g., von Stengel [47, Section 4]). Furthermore, Kuhn’s Theorem [32] implies that \( \begin{equation*} {\mathcal{Q}^{(i)}} = \text {co}({\Pi^{(i)}}),\quad {\mathcal{Q}^{(i)}} _j = \text {co}({\Pi^{(i)}} _j) \qquad \quad \forall \,i\in [n], j\in {\mathcal{J}^{(i)}} . \end{equation*} \)

When it is important to emphasize that an arbitrary sequence-form strategy \( {\boldsymbol{q}} \in {\mathcal{Q}^{(i)}} \) (or \( {\boldsymbol{q}} \in {\mathcal{Q}^{(i)}} _j \) for some \( j\in {\mathcal{J}^{(i)}} \)) of Player \( i \in [n] \) need not be a deterministic sequence-form strategy, we will say that \( {\boldsymbol{q}} \) is a mixed sequence-form strategy.

Given a sequence-form strategy \( {\boldsymbol{q}} \in {\mathcal{Q}^{(i)}} \), it is possible to build an unbiased sampling scheme resulting in a (random) deterministic strategy \( \mathbf {\pi }\in {\Pi^{(i)}} \) such that \( \mathbb {E}[\mathbf {\pi }]={\boldsymbol{q}} \). A natural unbiased sampling procedure is the following: Start from any root information set of Player i, that is, an information set \( j \in {\mathcal{J}^{(i)}} \) such that \( \sigma ^{(i)}(j) = \varnothing \). Given any information set \( j \in {\mathcal{J}^{(i)}} \), an action \( a_j\in \mathcal {A}(j) \) is sampled with probability \( {\boldsymbol{q}} {}[(j,a_j)]/{\boldsymbol{q}} {}[\sigma ^{(i)}(j)] \); then, the same procedure is applied recursively to all information sets immediately reachable from sequence \( (j,a_j) \), that is, the information sets \( j^{\prime } \in {\mathcal{J}^{(i)}} \) such that \( \sigma ^{(i)}(j^{\prime }) = (j,a_j) \). The process is repeated for all the root information sets of Player i. The final deterministic sequence-form strategy \( \mathbf {\pi } \) is obtained by setting \( \mathbf {\pi }[(j,a_j)]=1 \) for each information set \( j \in {\mathcal{J}^{(i)}} \) visited during the procedure, and all other entries equal to 0.
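The sampling scheme just described can be written down compactly. The following Python sketch is an illustrative implementation only (the dictionary-based encoding of \( {\boldsymbol{q}} \) and of the information-set structure is an assumption of ours, not the article's notation), where the empty sequence is represented by the string "empty".

```python
import random

def sample_deterministic_strategy(q, infosets):
    """Sample a deterministic sequence-form strategy pi with E[pi] = q.

    q        : dict mapping the empty sequence "empty" and every sequence (j, a)
               to its sequence-form probability mass.
    infosets : dict mapping each information set j to (parent_sequence, list_of_actions).
    """
    pi = {sigma: 0.0 for sigma in q}
    pi["empty"] = 1.0

    def visit(sequence):
        # Recurse into every information set immediately reachable from `sequence`.
        for j, (parent, actions) in infosets.items():
            if parent != sequence:
                continue
            # Pick one action with probability q[(j, a)] / q[parent]; random.choices
            # normalizes the weights, so the division by q[parent] is implicit.
            a_j = random.choices(actions, weights=[q[(j, a)] for a in actions])[0]
            pi[(j, a_j)] = 1.0
            visit((j, a_j))

    visit("empty")   # start from the root information sets (parent sequence = "empty")
    return pi
```

Information sets that become unreachable given the sampled actions are never visited, so their sequences keep the value 0, exactly as described above.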

Finally, we denote as \( \Pi := \times _{i\in [n]}{\Pi^{(i)}} \) the set of joint deterministic sequence-form strategies of all the players. Therefore, an element of \( \Pi \) is a tuple \( \mathbf {\pi } = (\mathbf {\pi }^{(1)},\dots ,\mathbf {\pi }^{(n)}) \) specifying a deterministic sequence-form strategy \( \pi^{(i)} \) for each player \( i \in [n] \).

Example 2.5.

Continuing Example 2.1, in Figure 2, we provide one (mixed) sequence-form strategy \( \mathbf {q}\in {\mathcal{Q}^{(1)}} \) and five deterministic sequence-form strategies \( \lbrace \mathbf {q}_{{\mathsf{135}}},\mathbf {q}_{{\mathsf{136}}},\mathbf {q}_{{\mathsf{145}}},\mathbf {q}_{{\mathsf{27}}},\mathbf {q}_{{\mathsf{28}}}\rbrace \subseteq {\Pi^{(1)}} \) for the small game in Figure 1 (left). One can check that these vectors are valid sequence-form strategies by verifying that the probability-mass-conservation constraints of Definition 2.2 hold. Let us consider the mixed sequence-form strategy \( {\boldsymbol{q}} \). There, \( {\boldsymbol{q}} [(\text {A},{\mathsf{1}})]={\boldsymbol{q}} [(\text {A},{\mathsf{2}})]=0.5 \), and therefore Player 1 will select between actions \( {\mathsf{1}} \) and \( {\mathsf{2}} \) at information set \( \text {A} \) uniformly at random. Suppose Player 1 selects action \( {\mathsf{1}} \). Then, if Player 1 reached information set B, she would select actions \( {\mathsf{3}} \) and \( {\mathsf{4}} \) with probability \( 0.25 / 0.5=0.5 \) each. However, if Player 1 reached information set C, she would choose action \( {\mathsf{5}} \) with probability \( 0.1/0.5=0.2 \), and action \( {\mathsf{6}} \) with probability \( 0.4/0.5=0.8 \). Analogously, if Player 1 played action \( {\mathsf{2}} \) at information set A, then upon reaching information set D she would play action \( {\mathsf{8}} \) with probability \( 0.5/0.5=1 \). In general, the probability of playing action a at a generic information set j can be obtained by dividing \( {\boldsymbol{q}} [(j,a)] \) by \( {\boldsymbol{q}} [\sigma ^{(i)}(j)] \). As a second example, consider the deterministic sequence-form strategy \( {\boldsymbol{q}} _{{\mathsf{136}}} \). When Player 1 plays according to that strategy, she will always choose action \( {\mathsf{1}} \) at information set A, action \( {\mathsf{3}} \) at information set B, and action \( {\mathsf{6}} \) at information set C. It is impossible for the player to reach information set D given her strategy at A, and correspondingly \( {\boldsymbol{q}} _{{\mathsf{136}}}[{\mathsf{7}}] = {\boldsymbol{q}} _{{\mathsf{136}}}[{\mathsf{8}}] = 0 \).

Fig. 2.

Fig. 2. Examples of sequence-form strategies for Player 1 in the game of Figure 1 (left).
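The consistency check mentioned in Example 2.5 can also be automated. The fragment below is an illustrative sketch (the dictionary encoding is ours; the entries of \( {\boldsymbol{q}} \) are reconstructed from the conditional probabilities reported in Example 2.5) that verifies the probability-mass-conservation constraints of Definition 2.2.

```python
# Information structure of Player 1 in Figure 1: infoset -> (parent sequence, actions),
# with sequences identified by their action numbers as in Example 2.1.
infosets_p1 = {"A": ("empty", [1, 2]), "B": (1, [3, 4]), "C": (1, [5, 6]), "D": (2, [7, 8])}

# Mixed sequence-form strategy q, reconstructed from the conditional probabilities of Example 2.5.
q = {"empty": 1.0, 1: 0.5, 2: 0.5, 3: 0.25, 4: 0.25, 5: 0.1, 6: 0.4, 7: 0.0, 8: 0.5}

def is_sequence_form(q, infosets, tol=1e-9):
    """Check the probability-mass-conservation constraints of Definition 2.2."""
    if abs(q["empty"] - 1.0) > tol:
        return False
    for parent, actions in infosets.values():
        if abs(q[parent] - sum(q[a] for a in actions)) > tol:
            return False
    return all(v >= -tol for v in q.values())

print(is_sequence_form(q, infosets_p1))  # True
```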

2.3 Extensive-form Correlated Equilibrium (EFCE)

Extensive-form correlated equilibrium has been proposed by von Stengel and Forges [48] as the natural counterpart to (normal-form) correlated equilibrium in extensive-form games. In an EFCE, before the beginning of the game the mediator draws a recommended action for each of the possible information sets that players may encounter in the game, according to some probability distribution defined over joint reduced normal-form strategies. These recommendations are not immediately revealed to each player. Instead, the mediator incrementally reveals relevant action recommendations as players reach new information sets. At any information set, the acting player is free to deviate from the recommended action, but doing so comes at the cost of future recommendations, which are no longer issued if the player deviates. In an EFCE, the recommended behavior is incentive-compatible for each player, that is, no player is strictly better off ever deviating from any of the mediator’s recommended actions.

Before introducing the formal definition of EFCE, let us mention that one could directly extend the definition of CE [1] to extensive-form games, thereby obtaining what is usually called a normal-form correlated equilibrium (NFCE) of the extensive-form game. In this case, the mediator draws and recommends a complete reduced normal-form strategy to each player before the game starts. Then, before the beginning of the game, each player can decide whether to follow the recommended plan or deviate to an arbitrary strategy they desire. For arbitrary extensive-form games with perfect recall, the following inclusion of the sets of equilibria holds: \( \text {NFCE}\subseteq \text {EFCE} \).

Multiple equivalent definitions of EFCE can be given. In this article, we follow the equivalent formulation given by Farina et al. [15] based on the concept of trigger agents introduced by Gordon et al. [20] and Dudík and Gordon [13]. In what follows, we will assume that an extensive-form game has been fixed:

Definition 2.6

(Trigger Agent).

Let \( i\in [n] \) be a player, let \( \hat{\sigma }= (j,a) \in {\Sigma^{(i)}} _* \), and let \( \hat{\mathbf {\pi }} \in {\Pi^{(i)}} _j \). The \( (\hat{\sigma },\hat{\mathbf {\pi }}) \)-trigger agent is the agent that plays the game as Player i according to the following rules:

  • If the trigger agent has never been recommended to play action a at information set j, then the trigger agent will follow whatever recommendation is issued by the mediator.

  • When the trigger agent reaches information set j and is recommended to play action a, we say that the trigger agent “gets triggered” by the trigger sequence \( \hat{\sigma }= (j,a) \). This means that, from that point on, the trigger agent will disregard the recommendations and play according to the continuation strategy \( \hat{\mathbf {\pi }} \) from information set j onward (that is, at j and all of its descendant information sets).

An EFCE is a probability distribution \( \mathbf {\mu } \in \Delta ^{\Pi } \) over \( \Pi \) such that for any player \( i \in [n] \), trigger sequence \( \hat{\sigma } = (j,a) \in \Sigma _*^{(i)} \), and continuation strategy \( \hat{\mathbf {\pi }} \in {\Pi^{(i)}} _j \), the expected utility of the \( (\hat{\sigma },\hat{\mathbf {\pi }}) \)-trigger agent is not strictly greater than the expected utility that Player i would obtain by always following all of the mediator’s recommendations.

To turn the above condition into an analytic expression, it is useful to introduce the following additional quantities: Given a distribution \( \mathbf {\mu } \in \Delta ^{\Pi } \), we let \( r_{\mathbf {\mu }}(z) \) be the probability that the game ends in terminal node \( z \in \mathcal {Z} \) when all players follow recommendations issued by the mediator according to \( \mathbf {\mu } \); in particular, for every \( z \in \mathcal {Z} \), it holds: \( \begin{equation*} r_{\mathbf {\mu }}(z) := \sum _{\begin{matrix} ({\pi^{(1)}},\dots ,{\pi^{(n)}}) \in \Pi\\ {\pi^{(i)}}[\sigma ^{(i)}(z)] = 1 \,\,\, \forall i \in [n] \end{matrix}} \mathbf {\mu }[({\pi^{(1)}},\dots ,{\pi^{(n)}})], \end{equation*} \) where the summation is over all joint strategies \( (\pi^{(1)},\dots ,\pi^{(n)}) \in \Pi \) such that terminal node z is reachable when each player \( i \in [n] \) plays according to \( \pi^{(i)} \). Additionally, given a trigger sequence \( \hat{\sigma }= (j,a) \in {\Sigma^{(i)}} _* \) for a player \( i \in [n] \) and a continuation strategy \( \hat{\mathbf {\pi }} \in {\Pi^{(i)}} _j \), we let \( r^{(i)}_{\mathbf {\mu }, \hat{\sigma } \rightarrow \hat{\mathbf {\pi }}}(z) \) be the probability with which the \( (\hat{\sigma },\hat{\mathbf {\pi }}) \)-trigger agent reaches terminal node z. In particular, for every terminal node \( z \in \mathcal {Z} \) such that \( \sigma ^{(i)}(z) \succeq j \) it holds that: \( \begin{equation*} r^{(i)}_{\mathbf {\mu },\,\hat{\sigma } \rightarrow \hat{\mathbf {\pi }}}(z) := \left(\sum _{\begin{matrix} ({\pi^{(1)}},\dots ,{\pi^{(n)}}) \in \Pi\\ {\pi^{(i^{\prime })}}[\sigma ^{(i^{\prime })}(z)] = 1 \,\,\, \forall i^{\prime }\ne i \\ {\pi^{(i)}}[\hat{\sigma }] = 1 \end{matrix}} \mathbf {\mu }[({\pi^{(1)}},\dots ,{\pi^{(n)}})] \right) \hat{\mathbf {\pi }} [ \sigma ^{(i)}(z) ] . \end{equation*} \)
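For readers who prefer code, the two reach probabilities above can be evaluated directly from their definitions. The sketch below is illustrative only: the representation of \( \mathbf {\mu } \) as a dictionary keyed by joint deterministic strategies, each encoded as a tuple of frozensets of played sequences (with the empty sequence always included), is an assumption we make for the example.

```python
def reach_mu(mu, z_sequences):
    """r_mu(z): probability that play reaches terminal node z when everyone follows mu.

    mu          : dict mapping joint deterministic strategies to probabilities; a joint
                  strategy is a tuple (one entry per player) of frozensets collecting the
                  sequences played with probability 1 (the empty sequence always included).
    z_sequences : tuple with sigma^(i)(z) for every player i.
    """
    return sum(p for profile, p in mu.items()
               if all(sig in pi for pi, sig in zip(profile, z_sequences)))


def reach_trigger(mu, z_sequences, i, trigger_seq, cont_pi):
    """r^(i)_{mu, sigma_hat -> pi_hat}(z) for a terminal node z with sigma^(i)(z) >= j.

    cont_pi : the continuation strategy pi_hat, a dict mapping each sequence in the
              subtree rooted at j to 0 or 1.
    """
    mass = sum(p for profile, p in mu.items()
               if trigger_seq in profile[i]
               and all(sig in pi
                       for k, (pi, sig) in enumerate(zip(profile, z_sequences)) if k != i))
    return mass * cont_pi[z_sequences[i]]
```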

We can now state the formal definition of EFCE and approximate EFCE.

Definition 2.7

(ε-EFCE; EFCE)

Given \( \epsilon \ge 0 \), a probability distribution \( \mathbf {\mu } \in \Delta ^{\Pi } \) is an \( \epsilon \)-approximate EFCE (or \( \epsilon \)-EFCE for short) if, for every player \( i \in [n] \), trigger sequence \( \hat{\sigma }= (j,a) \in {\Sigma^{(i)}} _* \), and continuation strategy \( \hat{\mathbf {\pi }} \in {\Pi^{(i)}} _j \), the expected utility of the \( (\hat{\sigma },\hat{\mathbf {\pi }}) \)-trigger agent exceeds the expected utility that Player i would obtain by always following all of the mediator’s recommendations by at most \( \epsilon \). In symbols, \( \begin{equation*} \sum _{\begin{matrix}z \in \mathcal {Z} \\ \sigma ^{(i)}(z) \succeq \hat{\sigma }\end{matrix}} u^{(i)}(z)\, p_c(z)\, r_{\mathbf {\mu }}(z) \ge \sum _{\begin{matrix}z \in \mathcal {Z}\\ \sigma ^{(i)}(z) \succeq j\end{matrix}} u^{(i)}(z)\, p_c(z)\, r^{(i)}_{\mathbf {\mu },\, \hat{\sigma }\rightarrow \hat{\mathbf {\pi }}}(z) - \epsilon . \end{equation*} \) A probability distribution \( \mathbf {\mu } \in \Delta ^{\Pi } \) is an EFCE if it is a 0-EFCE.
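Putting the pieces together, the inequality of Definition 2.7 can be checked for a single triple \( (i, \hat{\sigma }, \hat{\mathbf {\pi }}) \) as in the sketch below. Again, this is illustrative: it assumes the reach_mu and reach_trigger helpers from the previous sketch, plus precomputed sets encoding the relations \( \sigma ^{(i)}(z) \succeq \hat{\sigma } \) and \( \sigma ^{(i)}(z) \succeq j \); a full \( \epsilon \)-EFCE test would loop over every player, trigger sequence, and continuation strategy.

```python
def trigger_incentive_holds(mu, terminals, u_i, p_c, sigma_of_z, i,
                            trigger_seq, cont_pi,
                            seqs_after_trigger, seqs_in_subtree, eps=0.0):
    """Check the Definition 2.7 inequality for one player i, trigger sigma_hat, and pi_hat.

    terminals          : list of terminal nodes z; sigma_of_z[z] is the tuple
                         (sigma^(1)(z), ..., sigma^(n)(z)).
    u_i, p_c           : dicts mapping terminal nodes to u^(i)(z) and p_c(z).
    seqs_after_trigger : precomputed set {sigma : sigma >= sigma_hat}.
    seqs_in_subtree    : precomputed set Sigma^(i)_j = {sigma : sigma >= j}.
    """
    # Expected utility contribution when Player i follows the recommendation sigma_hat.
    follow = sum(u_i[z] * p_c[z] * reach_mu(mu, sigma_of_z[z])
                 for z in terminals if sigma_of_z[z][i] in seqs_after_trigger)
    # Expected utility of the (sigma_hat, pi_hat)-trigger agent on the relevant nodes.
    deviate = sum(u_i[z] * p_c[z] * reach_trigger(mu, sigma_of_z[z], i, trigger_seq, cont_pi)
                  for z in terminals if sigma_of_z[z][i] in seqs_in_subtree)
    return follow >= deviate - eps
```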

2.4 Regret Minimization and Phi-regret Minimization

In this article, we will make heavy use of a mathematical object—one of the core abstractions in the field of online optimization—called a regret minimizer.

Definition 2.8.

Let \( \mathcal {X} \) be a subset of a Euclidean space of suitable dimension. A regret minimizer for \( \mathcal {X} \) is an abstract model for a decision maker that repeatedly interacts with a black-box environment. At each time t, the regret minimizer interacts with the environment through two operations:

  • NextElement has the effect that the regret minimizer will output an element \( \mathbf {x}^{t} \in \mathcal {X} \);

  • ObserveUtility\( (\ell ^t) \) provides the environment’s feedback to the regret minimizer, in the form of a linear utility function \( \ell ^t : \mathcal {X}\rightarrow \mathbb {R} \) that evaluates how good the last-output point \( \mathbf {x}^t \) was. The utility function can depend adversarially on the outputs \( \mathbf {x}^1, \dots , \mathbf {x}^{t-1} \) of the regret minimizer, but not on \( \mathbf {x}^t \).

Calls to NextElement and ObserveUtility alternate: First, the regret minimizer will output a point \( \mathbf {x}^1 \), then it will receive feedback \( \ell ^1 \) from the environment, then it will output a new point \( \mathbf {x}^2 \), and so on. The decision making encoded by the regret minimizer is online, in the sense that at each time t, the output of the regret minimizer can depend on the prior outputs \( \mathbf {x}^1, \dots ,\mathbf {x}^{t-1} \) and corresponding observed utility functions \( \ell ^1,\dots ,\ell ^{t-1} \), but no information about future utilities is available. The objective for the regret minimizer is to output points so that the cumulative regret (or simply regret) (3) \( \begin{equation} R^T := \max _{\mathbf {x}^*\in \mathcal {X}} \sum _{t=1}^T \Big (\ell ^t(\mathbf {x}^*) - \ell ^t(\mathbf {x}^t) \Big) \end{equation} \) grows asymptotically sublinearly in the time T. Many regret minimizers that guarantee a cumulative regret \( R^T = O(\sqrt {T}) \) at all times T for any convex and compact set \( \mathcal {X} \) are known in the literature (see, e.g., Cesa-Bianchi and Lugosi [9]).
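As a concrete instance of this abstraction, the sketch below implements the regret matching algorithm of Hart and Mas-Colell [22] for the simplex \( \Delta ^n \), exposing the two operations of Definition 2.8 (the class and method names are ours, and utilities are passed via their canonical representation as vectors).

```python
import numpy as np

class RegretMatching:
    """Regret minimizer for the simplex Delta^n with the interface of Definition 2.8."""

    def __init__(self, n):
        self.n = n
        self.cum_regret = np.zeros(n)   # cumulative regret against each vertex of the simplex
        self.last = np.full(n, 1.0 / n)

    def next_element(self):
        positive = np.maximum(self.cum_regret, 0.0)
        if positive.sum() > 0.0:
            self.last = positive / positive.sum()        # play proportionally to positive regrets
        else:
            self.last = np.full(self.n, 1.0 / self.n)    # any point works when all regrets <= 0
        return self.last

    def observe_utility(self, ell):
        # ell is the canonical representation of the linear utility, i.e., ell(x) = ell @ x.
        self.cum_regret += ell - ell @ self.last

# Example usage: against the fixed utility vector (1, 0, 0.5), the iterates quickly
# concentrate on the first coordinate, the best fixed action in hindsight.
rm = RegretMatching(3)
for _ in range(1000):
    x = rm.next_element()
    rm.observe_utility(np.array([1.0, 0.0, 0.5]))
```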

A phi-regret minimizer is an extension of the concept of a regret minimizer introduced by Stoltz and Lugosi [44], building on previous work by Greenwald and Jafari [21].

Definition 2.9.

Given a set \( \mathcal {X} \) of points and a set \( \Phi \) of affine transformations \( \phi :\mathcal {X}\rightarrow \mathcal {X} \), a phi-regret minimizer relative to \( \Phi \) for the set \( \mathcal {X} \)—abbreviated as “\( \Phi \)-regret minimizer”—is an object with the same semantics and operations of a regret minimizer, but whose quality metric is its cumulative phi-regret relative to \( \Phi \) (or simply phi-regret relative to \( \Phi \), or \( \Phi \)-regret for short) (4) \( \begin{equation} R^T := \max _{\phi ^* \in \Phi } \sum _{t=1}^T \Big (\ell ^t(\phi ^*(\mathbf {x}^t)) - \ell ^t(\mathbf {x}^t) \Big), \end{equation} \) instead of the regular cumulative regret defined in (3). Once again, the goal for a phi-regret minimizer is to guarantee that its phi-regret grows asymptotically sublinearly as time T increases.

In the special case of the set of constant transformations \( \Phi ^\text{const} := \lbrace \mathcal {X}\ni \mathbf {x} \mapsto \hat{\mathbf {x}}: \hat{\mathbf {x}}\in \mathcal {X}\rbrace \), the definition of cumulative phi-regret (4) reduces to that of cumulative regret given in (3). So, a regret minimizer is a special case of a phi-regret minimizer.

A general construction by Gordon et al. [20] gives a way to construct a \( \Phi \)-regret minimizer for \( \mathcal {X} \) starting from any regret minimizer (in the sense of Definition 2.8) for the convex hull \( \text {co}(\Phi) \) of the set of functions \( \Phi \). Specifically, let \( \mathcal {R}_\Phi \) be a deterministic regret minimizer for the set of transformations \( \text {co}(\Phi) \) whose cumulative regret grows sublinearly, and assume that every \( \phi \in \text {co}(\Phi) \) admits a fixed point \( \phi (\mathbf {x}) = \mathbf {x}\in \mathcal {X} \). Then, a \( \Phi \)-regret minimizer \( \mathcal {R} \) can be constructed starting from \( \mathcal {R}_\Phi \) as follows:

  • Each call to \( \mathcal {R}.\text {NEXTELEMENT} \) first calls NextElement on \( \mathcal {R}_\Phi \) to obtain the next transformation \( \phi ^t\in \text {co}(\Phi) \). Then, a fixed point \( \mathbf {x}^t = \phi ^t(\mathbf {x}^t) \) is computed and output.

  • Each call to \( \mathcal {R}.\text {OBSERVEUTILITY}(\ell ^t) \) with linear utility function \( \ell ^t \) constructs the linear utility function \( L^t: \phi \mapsto \ell ^t(\phi (\mathbf {x}^t)) \), where \( \mathbf {x}^t \) is the last-output strategy, and passes it to \( \mathcal {R}_\Phi \) by calling \( \mathcal {R}_\Phi .\text {OBSERVEUTILITY}(L^t) \).

The proof of correctness of the above construction is deceptively simple, and we recall it next. Since \( \mathcal {R}_\Phi \) outputs transformations \( \phi ^1,\phi ^2,\dots \) and receives utilities \( \phi \mapsto \ell ^1(\phi (\mathbf {x}^1)), \phi \mapsto \ell ^2(\phi (\mathbf {x}^2)), \dots \), its cumulative regret \( R_\Phi ^T \) is \( \begin{equation*} R_\Phi ^T = \max _{\phi ^*\in {\text {co}(\Phi)}} \sum _{t=1}^T \Big (\ell ^t(\phi ^*(\mathbf {x}^t)) - \ell ^t(\phi ^t(\mathbf {x}^t)) \Big). \end{equation*} \) Hence, since \( \mathbf {x}^t = \phi ^t(\mathbf {x}^t) \) is a fixed point of \( \phi ^t \), we can write (5) \( \begin{align} R_\Phi ^T = \max _{\phi ^*\in {\text {co}(\Phi)}} \sum _{t=1}^T \Big (\ell ^t(\phi ^*(\mathbf {x}^t)) - \ell ^t(\mathbf {x}^t) \Big) \ge {\max _{\phi ^*\in \Phi } \sum _{t=1}^T \Big (\ell ^t(\phi ^*(\mathbf {x}^t)) - \ell ^t(\mathbf {x}^t) \Big)}, \end{align} \) where the inequality follows from the observation that \( \text {co}(\Phi) \supseteq \Phi \). The right-hand side is exactly the cumulative \( \Phi \)-regret \( R^T \) incurred by \( \mathcal {R} \), as defined in (4). So, because the regret cumulated by \( \mathcal {R}_\Phi \) grows sublinearly by hypothesis, then so does the \( \Phi \)-regret cumulated by \( \mathcal {R} \).
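In code, the construction amounts to a thin wrapper around the two problem-specific ingredients, namely a regret minimizer for \( \text {co}(\Phi) \) and a fixed-point oracle. The sketch below is schematic and illustrative, with names of our own choosing, and assumes the inner regret minimizer accepts utility functions as Python callables.

```python
class PhiRegretMinimizer:
    """Wrapper in the style of Gordon et al. [20]: turns a regret minimizer for co(Phi),
    plus a fixed-point oracle, into a Phi-regret minimizer for the set X."""

    def __init__(self, r_phi, fixed_point):
        self.r_phi = r_phi              # regret minimizer over co(Phi) (Definition 2.8 interface)
        self.fixed_point = fixed_point  # maps phi in co(Phi) to a point x in X with phi(x) = x
        self.x = None

    def next_element(self):
        phi = self.r_phi.next_element()   # next transformation phi^t in co(Phi)
        self.x = self.fixed_point(phi)    # fixed point x^t = phi^t(x^t)
        return self.x

    def observe_utility(self, ell):
        x_t = self.x
        # Forward the induced utility L^t : phi -> ell(phi(x^t)) to the inner regret minimizer.
        self.r_phi.observe_utility(lambda phi: ell(phi(x_t)))
```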


3 TRIGGER REGRET AND RELATIONSHIP WITH EFCE

In this section, we formalize a notion of trigger deviation function, building on an idea by Gordon et al. [20, Section 3]. We also introduce a connected notion of trigger regret minimization, which is an instance of phi-regret minimization, as recalled in Section 2.4. The central result of this section, Theorem 3.8, establishes a formal connection between EFCE and agents that minimize their trigger regret, thereby extending and generalizing the classic connection between correlated equilibrium and no-internal-regret in normal-form games [22] to the extensive-form game counterpart.

Definition 3.1

(Trigger Deviation Function).

Let \( \hat{\sigma }= (j,a) \in {\Sigma^(i)} _* \), and \( {\hat{\pi}} \in {\Pi^{(i)}} _j \). We call “trigger deviation function corresponding to trigger \( \hat{\sigma } \) and continuation strategy \( {\hat{\pi}} \)” any linear function \( f : \mathbb {R}^{{\Sigma^(i)} }\rightarrow \mathbb {R}^{{\Sigma^(i)} } \) whose effect on deterministic sequence-form strategies is as follows:

  • all strategies \( \pi\in {\Pi^{(i)}} \) that do not prescribe the sequence \( \hat{\sigma } \) are left unmodified. In symbols, (6) \( \begin{equation} f(\pi) = \pi \quad \qquad \forall \ \pi \in {\Pi^{(i)}} : \pi{}[\hat{\sigma }] = 0; \end{equation} \)

  • all strategies \( \pi \in {\Pi^{(i)}} \) that prescribe sequence \( \hat{\sigma }= (j,a) \) are modified so that the behavior at j and at all of its descendants is replaced with the behavior prescribed by the continuation strategy \( {\hat{\pi}} \). In symbols, (7) \( \begin{equation} f(\pi)[\sigma ] = {\left\lbrace \begin{array}{ll} \pi{}[\sigma ] & \text{if }\sigma \not\succeq j \\ {\hat{\pi}} {}[\sigma ] & \text{if }\sigma \succeq j, \\ \end{array}\right.} \quad \qquad \forall \ \sigma \in {\Sigma^(i)} , \pi \in {\Pi^{(i)}} : \pi{}[\hat{\sigma }] = 1. \end{equation} \)

Trigger deviation functions are simpler than the extensive-form transformations described by Gordon et al. [20]. Specifically, extensive-form transformations allow one to specify more than one trigger sequence (together with different continuation strategies, one for each specified trigger sequence), whereas our notion of trigger deviation functions only contemplates a single trigger sequence. Consequently, the set of all trigger deviation functions is significantly smaller, and simpler, than the set of all extensive-form transformations. The simpler structural properties of the set of trigger deviation functions, explored in Section 4.3, will enable us to construct an efficient regret minimizer for the convex hull of the set of all trigger deviation functions.

At this stage, it is technically unclear whether a linear function that satisfies Definition 3.1 exists for all valid choices of \( \hat{\sigma } \) and \( {\hat{\pi}} \). We show that this is indeed the case by explicitly exhibiting a linear function, which we call the canonical trigger deviation function. We start with a definition:

Definition 3.2.

Let \( \hat{\sigma }= (j,a)\in {\Sigma^(i)} _* \) and \( \mathbf {y}\in \mathbb {R}_{\ge 0}^{{\Sigma^(i)} _j} \). We denote with \( {\boldsymbol{M}^{(i)}_{\hat\sigma\rightarrow{\boldsymbol{y}}}} \in \mathbb {R}_{\ge 0}^{|{\Sigma^(i)} |\times |{\Sigma^(i)} |} \) the matrix whose entries are defined as (8) \( \begin{equation} {\boldsymbol{M}^{(i)}_{\hat\sigma\rightarrow{\boldsymbol{y}}}}{}[\sigma _r, \sigma _c] = {\left\lbrace \begin{array}{ll} 1 & \text{if } \sigma _c \not\succeq \hat{\sigma }\text{ and } \sigma _r = \sigma _c \\ \mathbf {y}[\sigma _r] & \text{if } \sigma _c = \hat{\sigma }\text{ and } \sigma _r \succeq j \\ 0 & \text{otherwise}, \end{array}\right.} \qquad \qquad \forall \ \sigma _r,\sigma _c \in {\Sigma^(i)} . \end{equation} \) Furthermore, we denote with the symbol \( {{\phi}^{(i)}_{\hat\sigma\rightarrow{\boldsymbol{y}}}} \) the linear map \( \mathbb {R}^{{\Sigma^(i)} }\ni \mathbf {x} \mapsto {\boldsymbol{M}^{(i)}_{\hat\sigma\rightarrow{\boldsymbol{y}}}}\,\mathbf {x}. \)

Intuitively, by recalling that the columns of a matrix are the images of the unit basis vectors under the corresponding linear map, we observe that \( {\boldsymbol{M}^{(i)}_{\hat\sigma\rightarrow{\boldsymbol{y}}}} \) maps the canonical basis vector \( \mathbf {e}_{\hat{\sigma }} \) to \( \mathbf {y} \), maps \( \mathbf {e}_\sigma \) to itself for each \( \sigma \not\succeq \hat{\sigma } \), and maps \( \mathbf {e}_{\sigma } \) to zero for every \( \sigma \succ \hat{\sigma } \).
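As an illustration of Definition 3.2, the following sketch builds the matrix \( {\boldsymbol{M}^{(i)}_{\hat\sigma\rightarrow{\boldsymbol{y}}}} \) explicitly. The integer indexing of sequences and the helper names (succeq, j_seqs) are assumptions made for the example only, not notation from the article.

```python
import numpy as np

# Sketch of the matrix in Definition 3.2 for one player.
# Assumptions (illustrative): sequences are indexed 0..n_seq-1; succeq(a, b)
# returns True iff sequence a equals b or descends from b; j_seqs lists the
# indices of the sequences at information set j and its descendants; y is a
# nonnegative vector of length n_seq supported on j_seqs.
def canonical_deviation_matrix(n_seq, sigma_hat, j_seqs, y, succeq):
    M = np.zeros((n_seq, n_seq))
    for col in range(n_seq):
        if not succeq(col, sigma_hat):
            M[col, col] = 1.0        # columns not below sigma_hat: identity
    for row in j_seqs:
        M[row, sigma_hat] = y[row]   # column sigma_hat is mapped to y
    # columns strictly below sigma_hat remain all-zero (third case of (8))
    return M
```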

In the following, we will focus on trigger deviation functions defined through the linear mapping of Equation (8). We call such deviation functions canonical trigger deviation functions.

Definition 3.3.

Let \( \hat{\sigma }= (j,a) \in {\Sigma^(i)} _* \) and \( {\hat{\pi}} \in {\Pi^{(i)}} _j \). The function \( {\phi^{(i)}_{\hat\sigma\rightarrow{\hat\pi}}} \) is called the “canonical trigger deviation function corresponding to trigger \( \hat{\sigma } \) and continuation strategy \( {\hat{\pi}} \).” Furthermore, the set of all canonical trigger deviation functions is denoted with the symbol \( \begin{equation*} \psi^{(i)} := \big\lbrace {\phi^{(i)}_{\hat\sigma\rightarrow{\hat\pi}}} : \hat{\sigma }= (j,a)\in {\Sigma^(i)} _*, {\hat{\pi}} \in {\Pi^{(i)}} _j \big\rbrace . \end{equation*} \)

Lemma 3.4.

For any \( \hat{\sigma }= (j,a) \in {\Sigma^(i)} _* \) and \( {\hat{\pi}} \in {\Pi^{(i)}} _j \), the linear function \( {\phi^{(i)}_{\hat\sigma\rightarrow{\hat\pi}}} \) as defined in Definition 3.3 is a trigger deviation function in the sense of Definition 3.1.

Proof.

The proof amounts to a direct application of the definitions. Let \( \pi \in {\Pi^{(i)}} \) be an arbitrary deterministic sequence-form strategy. By expanding the matrix-vector multiplication \( {\boldsymbol{M}^{(i)}_{\hat\sigma\rightarrow{\hat\pi}}} \pi \) using the definition in (8), we obtain that for all \( \sigma \in {\Sigma^(i)} \) (9) \( \begin{align} ({\boldsymbol{M}^{(i)}_{\hat\sigma\rightarrow{\hat\pi}}} \,\pi)[\sigma ] & = \pi[\sigma ]1\!\!1 [\sigma \not\succeq \hat{\sigma }] + {\hat{\pi}} [\sigma ] \pi[\hat{\sigma }] 1\!\!1 [\sigma \succeq j]. \end{align} \) There are only two possibilities:

  • If \( \pi[\hat{\sigma }] = 0 \), then (9) simplifies to \( \begin{equation*} ({\boldsymbol{M}^{(i)}_{\hat\sigma\rightarrow{\hat\pi}}} \,\pi)[\sigma ] = {\left\lbrace \begin{array}{ll} \pi[\sigma ] & \text{if } \sigma \not\succeq \hat{\sigma }\\ 0 & \text{otherwise}. \end{array}\right.} \end{equation*} \) Since by case hypothesis the probability of the sequence of actions from the root of the game tree down to \( \hat{\sigma } \) is zero, the probability of any longer sequence of actions \( \sigma \succeq \hat{\sigma } \) must be zero as well, that is, \( \pi[\sigma ] = 0 \) for all \( \sigma \succeq \hat{\sigma } \). So, \( {\boldsymbol{M}^{(i)}_{\hat\sigma\rightarrow{\hat\pi}}} \,\pi = \pi \) and (6) holds.

  • Conversely, assume \( \pi[\hat{\sigma }]=1 \). This means that at information set \( j \in {\mathcal{J}^{(i)}} \) action a is selected (with probability 1), and therefore \( \pi[\sigma ] = 0 \) for all \( \sigma = (j,a^{\prime }): a^{\prime } \in A_j, a^{\prime } \ne a \). This means that \( \pi[\sigma ]1\!\!1 [\sigma \not\succeq \hat{\sigma }] = \pi[\sigma ]1\!\!1 [\sigma \not\succeq j] \) for all \( \sigma \in {\Sigma^(i)} \). Substituting that equality into (9) gives Equation (7), as we wanted to show.□

Since \( {\Pi^{(i)}} _j \subseteq \lbrace 0,1\rbrace ^{{\Sigma^(i)} _j} \), and \( |{\Sigma^(i)} _j| \le |{\Sigma^(i)} | \) by definition, we have the following immediate bound on the number of trigger deviation functions:

Lemma 3.5.

The number \( |\psi^{(i)} | \) of canonical trigger deviation functions for any player \( i \in [n] \) is upper bounded by \( |{\Sigma^(i)} |\cdot 2^{|{\Sigma^(i)} |} \).

In the following example, we show how three canonical trigger deviation functions operate. In particular, we show how they modify some deterministic sequence-form strategies in a simple extensive-form game.

Example 3.6.

We build on the small extensive-form game of Figure 1, and the sequence-form strategies defined in Example 2.1, to provide some concrete intuition behind canonical trigger deviation functions as defined in Definition 3.3.

  • First, let us consider the trigger deviation function \( \phi _a := \phi^{(1)}_{\mathsf{(A,1)}\rightarrow{\hat\pi_{a}}} \), where the trigger sequence is \( \hat{\sigma }=(\text {A}, {\mathsf{1}}) \), and the continuation strategy \( {\hat{\pi}} _a \) is such that Player 1 plays action \( {\mathsf{2}} \) at information set A, and subsequently action \( {\mathsf{7}} \) at information set D. The matrix \( \mathbf {M}_a:= \boldsymbol{M}^{(1)}_{\mathsf{(A,1)}\rightarrow{\hat\pi_{a}}} \) corresponding to \( \phi _a \) is reported in Figure 3 (left). To illustrate the effect of this linear mapping on sequence-form strategies, we provide some examples using the deterministic sequence-form strategy vectors defined in Figure 2. First, we observe that any deterministic sequence-form strategy choosing action \( {\mathsf{1}} \) with probability 1 triggers a deviation that follows the continuation strategy \( {\hat{\pi}} _a \). The deviation for those sequence-form strategies results in a final deterministic sequence-form strategy equal to \( \mathbf {q}_{{\mathsf{27}}} \). For example, using some of the deterministic sequence-form strategies of Figure 2, it can be easily verified (by working out the matrix-vector product) that: \( \begin{equation*} \mathbf {M}_a \mathbf {q}_{{\mathsf{135}}} = \mathbf {M}_a \mathbf {q}_{{\mathsf{136}}} = \mathbf {M}_a \mathbf {q}_{{\mathsf{145}}} = \mathbf {q}_{{\mathsf{27}}}. \end{equation*} \) However, deterministic sequence-form strategies that do not select sequence \( {\mathsf{1}} \) are left unmodified by the linear mapping. For instance, \( \begin{equation*} \mathbf {M}_a \mathbf {q}_{{\mathsf{28}}} = \mathbf {q}_{{\mathsf{28}}} \hspace{5.69046pt} \text{ and }\hspace{5.69046pt} \mathbf {M}_a \mathbf {q}_{{\mathsf{27}}} = \mathbf {q}_{{\mathsf{27}}}. \end{equation*} \)

  • Second, we examine the trigger deviation function \( \phi _b := \phi^{(1)}_{\mathsf{(A,2)}\rightarrow{\hat\pi_{b}}} \) for trigger sequence \( \hat{\sigma }=(\text {A}, {\mathsf{2}}) \), where the continuation strategy \( {\hat{\pi}} _b \) is defined so that Player 1 plays action \( {\mathsf{1}} \) at information set A, action \( {\mathsf{3}} \) at information set B, and action \( {\mathsf{5}} \) at information set C. The corresponding matrix \( \mathbf {M}_b := \boldsymbol{M}^{(1)}_{\mathsf{(A,2)}\rightarrow{\hat\pi_{b}}} \) is reported in Figure 3 (middle). As in the previous case, all deterministic sequence-form strategy vectors that put probability 1 on action \( {\mathsf{2}} \) at information set A are modified so that the strategy at A and at its descendants B, C, D matches the continuation strategy. For example, we have that \( \begin{equation*} \mathbf {M}_b \mathbf {q}_{{\mathsf{27}}} = \mathbf {M}_b \mathbf {q}_{{\mathsf{28}}} = \mathbf {q}_{{\mathsf{135}}}. \end{equation*} \)

    Furthermore, sequence-form strategies that do not put probability 1 on sequence \( (\text {A},{\mathsf{2}}) \) are left unchanged. So, for example, \( \begin{equation*} \mathbf {M}_b \mathbf {q}_{{\mathsf{136}}} = \mathbf {q}_{{\mathsf{136}}} \hspace{2.84544pt}\text{ and }\hspace{2.84544pt} \mathbf {M}_b \mathbf {q}_{{\mathsf{145}}} = \mathbf {q}_{{\mathsf{145}}}. \end{equation*} \)

  • As a final example, Figure 3 (right) reports the deviation matrix \( \mathbf {M}_c \) corresponding to the trigger deviation function \( \phi _c:= \phi^{(1)}_{\mathsf{(B,3)}\rightarrow{\hat\pi_{c}}} \), with trigger sequence \( \hat{\sigma }=(\text {B}, {\mathsf{3}}) \) and continuation strategy \( {\hat{\pi}} _c \) selecting action \( {\mathsf{4}} \) at information set B. Here, we have that \( \mathbf {M}_c \mathbf {q}_{{\mathsf{135}}} = \mathbf {M}_c \mathbf {q}_{{\mathsf{145}}} = \mathbf {q}_{{\mathsf{145}}} \), where the second equality holds because \( \mathbf {q}_{{\mathsf{145}}} \) does not prescribe the trigger sequence \( \hat{\sigma } \).

We are now ready to define the concept of trigger regret minimization, which extends and generalizes the homonymous notion in the conference version of this article [8], as well as the notion of internal regret minimization in normal-form games.

Fig. 3.

Fig. 3. Matrices defining different canonical trigger deviation functions (Definition 3.3) for the simple extensive-form game of Figure 1. Entries highlighted in dark gray represent the entries of the matrix defined in the second case of Equation (8). Let \( \hat{\sigma }=(j,a)\in {\Sigma^{(1)}} _* \) be the trigger sequence of the trigger deviation function. All indices \( (\sigma _r,\sigma _c) \) such that \( \sigma _r,\sigma _c \succeq j \) are highlighted in light gray.

Definition 3.7.

For every \( i \in [n] \), we call trigger regret minimizer for player i any \( \psi^{(i)} \)-regret minimizer for the set of deterministic sequence-form strategies \( {\Pi^{(i)}} \).

The following theorem shows that if each player \( i\in [n] \) in the game plays according to a \( \psi^{(i)} \)-regret minimizer, then the empirical frequency of play approaches the set of EFCEs. The result is proved starting from the definition of cumulative \( \psi^{(i)} \)-regret, which is manipulated by using the definition of trigger deviation functions. Then, we replace the summation over the joint deterministic sequence-form strategies played at each \( t\in [T] \) with a weighted sum over joint strategy profiles, with weights given by the empirical frequency of play (see Equation (17)), which allows us to recover the definition of EFCE (Definition 2.7). We also remark that, to define the linear utility functions in the following theorem, we assume to be working under the full-information feedback model: The regret minimizer for Player i has access to the utility function \( u^{(i)} \) of Player i, the function \( p_c \) encoding probabilities of chance moves, and the deterministic sequence-form strategies \( \pi ^{(i^{\prime }),t} \) played by each opponent \( i ^{\prime } \ne i \). Extending the result to the bandit-information feedback setting is left as an open question.

Theorem 3.8.

For each player \( i \in [n] \), let \( \pi^{(i),1}, \pi^{(i),2}, \dots , \pi^{(i),T} \in {\Pi^{(i)}} \) be deterministic sequence-form strategies whose cumulative \( \psi^{(i)} \)-regret with respect to the sequence of linear utility functions (10) \( \begin{equation} \ell ^{(i),\,t} : {\Pi^{(i)}} \ni \pi^{(i)} \mapsto \sum _{z \in \mathcal {Z}} u^{(i)}(z) \, p_c(z)\, \Big (\prod _{i^{\prime } \ne i} \pi^{(i^{\prime }),t} {}[\sigma ^{(i^{\prime })}(z)]\Big)\,{\pi^{(i)}}{}[\sigma ^{(i)}(z)] \end{equation} \) is \( R^{(i),\,T} \). Then, the empirical frequency of play defined as the probability distribution \( \mathbf {\mu } \in \Delta ^{\Pi } \) that draws each joint profile \( (\pi^{(1)},\dots ,\pi^{(n)}) \in \Pi \) with probability \( \begin{equation*} \mathbf {\mu }[(\pi^{(1)},\dots ,\pi^{(n)})] := \frac{1}{T}\sum _{t=1}^T1\!\!1 [({\pi^{(1),t}},\dots ,{\pi^{(n),t}}) = (\pi^{(1)}, \dots ,\pi^{(n)})] \end{equation*} \) is an \( \epsilon \)-EFCE, where \( \epsilon := \frac{1}{T}\max _{i\in [n]} R^{(i),\,T} \).

Proof.

It is immediate to check that \( \mathbf {\mu } \) is indeed a valid element of the \( |\Pi | \)-simplex. Furthermore, the utility function \( \ell ^{(i),\,t} \) clearly satisfies the requirement of being independent of \( \pi^{(i),t} \), for all \( i\in [n] \). We will show that \( \mathbf {\mu } \) defines an \( \epsilon \)-EFCE by verifying that the definition holds (Definition 2.7). Fix any player \( i \in [n] \), trigger sequence \( \hat{\sigma }= (j,a) \in {\Sigma^(i)} _* \), and continuation strategy \( \hat{\mathbf {\pi }} \in {\Pi^{(i)}} _j \). Since by hypothesis the cumulative \( \psi^{(i)} \)-regret is upper bounded by \( R^{(i),\,T} \), and \( R^{(i),\,T} \le T\epsilon \) by definition of \( \epsilon \), we must have \( \begin{equation*} T\epsilon \ge \sum _{t=1}^T \ell ^{(i),\,t}\Big ({\phi^{(i)}_{\hat\sigma\rightarrow\hat\pi}} ( \pi^{(i),t} )\Big) - \ell ^{(i),\,t}\big ( \pi^{(i),t} \big). \end{equation*} \) By expanding the definition of the utility function, which was given in (10), the previous inequality is equivalent to (11) \( \begin{equation} T\epsilon \ge \sum _{t=1}^T \sum _{z\in \mathcal {Z}} \alpha _z^{(i),\,t}\cdot \Big ({\phi^{(i)}_{\hat\sigma\rightarrow\hat\pi}} (\pi^{(i),t} )[\sigma ^{(i)}(z)] - \pi^{(i),t}{}[\sigma ^{(i)}(z)]\Big), \end{equation} \) where we used the symbol \( \begin{equation*} \alpha _z^{(i),\,t}:= u^{(i)}(z) \, p_c(z)\, \Big (\prod _{i^{\prime } \ne i} \pi^{(i^{\prime }),t}{}[\sigma ^{(i^{\prime })}(z)]\Big) \end{equation*} \) to lighten the notation. Since \( {\phi^{(i)}_{\hat\sigma\rightarrow\hat\pi}} \) is a trigger deviation function (Lemma 3.4), we have that (6) and (7) hold, and in particular it follows that \( \begin{equation*} {\phi^{(i)}_{\hat\sigma\rightarrow\hat\pi}} ( \pi^{(i),t} {})[\sigma ] = \pi^{(i),t} {}[\sigma ] \end{equation*} \) for all \( t = 1,\dots , T \) and \( \sigma \not\succeq j \). So, the summation term in (11) is zero for all terminal states \( z \in \mathcal {Z} \) such that \( \sigma ^{(i)}(z) \not\succeq j \), and thus we can safely restrict the domain of the summation to the terminal states \( z \in \mathcal {Z}^{(i)}_j := \lbrace z \in \mathcal {Z}: \sigma ^{(i)}(z) \succeq j\rbrace \) only, obtaining (12) \( \begin{equation} T\epsilon \ge \sum _{t=1}^T \sum _{z\in \mathcal {Z}^{(i)}_j}\alpha _z^{(i),\,t}\cdot \Big ({\phi^{(i)}_{\hat\sigma\rightarrow\hat\pi}} ( \pi^{(i),t} )[\sigma ^{(i)}(z)] - \pi^{(i),t}{}[\sigma ^{(i)}(z)]\Big). \end{equation} \) We now study the term \( {\phi^{(i)}_{\hat\sigma\rightarrow\hat\pi}} ( \pi^{(i),t} )[\sigma ^{(i)}(z)] \) for a generic \( t\in \lbrace 1,\dots ,T\rbrace \) and \( z \in \mathcal {Z}^{(i)}_j \) by splitting into cases contingent on the value of \( \pi^{(i),t}{}[\hat{\sigma }]\in \lbrace 0,1\rbrace \). If \( \pi^{(i),t}{}[\hat{\sigma }] = 0 \), then (6) applies, and therefore \( \begin{equation*} {\phi^{(i)}_{\hat\sigma\rightarrow\hat\pi}} ( \pi^{(i),t} )[\sigma ^{(i)}(z)] - \pi^{(i),t}{}[\sigma ^{(i)}(z)] = 0. \end{equation*} \) If, on the contrary, \( \pi^{(i),t}{}[\hat{\sigma }] = 1 \), then (7) applies, and \( {\phi^{(i)}_{\hat\sigma\rightarrow\hat\pi}} ( \pi^{(i),t} )[\sigma ^{(i)}(z)] = \hat{\mathbf {\pi }}[\sigma ^{(i)}(z)] \), where we used the fact that \( \sigma ^{(i)}(z) \succeq j \) by definition of \( z \in \mathcal {Z}^{(i)}_j \). So, at all \( t = 1,\dots , T \) and for all \( z\in \mathcal {Z}^{(i)}_j \), it holds that \( \begin{equation*} {\phi^{(i)}_{\hat\sigma\rightarrow\hat\pi}} ( \pi^{(i),t} )[\sigma ^{(i)}(z)] - \pi^{(i),t}{}[\sigma ^{(i)}(z)] = \pi^{(i),t}{}[\hat{\sigma }]\Big (\hat{\mathbf {\pi }}[\sigma ^{(i)}(z)] - \pi^{(i),t}{}[\sigma ^{(i)}(z)]\Big), \end{equation*} \) and thus (12) can be equivalently written as (13) \( \begin{equation} T\epsilon \ge \sum _{t=1}^T \sum _{z\in \mathcal {Z}^{(i)}_j}\!\! \pi^{(i),t}{}[\hat{\sigma }]\, \alpha _z^{(i),\,t}\cdot \Big (\hat{\mathbf {\pi }}[\sigma ^{(i)}(z)] - \pi^{(i),t}{}[\sigma ^{(i)}(z)]\Big). \end{equation} \) We now make the crucial observation that time t appears in \( \alpha ^{(i),\,t}_z \) and (13) only as a superscript in the strategies \( \pi^{(1),t},\dots ,\pi^{(n),t} \), and nowhere else. Therefore, by introducing the functions (14) \( \begin{align} \alpha ^{(i)}_{z}: \Pi \ni (\pi^{(1)}\!,\dots ,\pi^{(n)}) & \mapsto u^{(i)}(z)\, p_c(z)\,\Big (\prod _{i^{\prime }\ne i}\pi^{(i^{\prime })}{}[\sigma ^{(i^{\prime })}(z)]\Big) \text{, and} \end{align} \) (15) \( \begin{align} v^{(i)}_{\hat{\sigma }\rightarrow \hat{\mathbf {\pi }}}: \Pi \ni \mathbf {\pi } = (\pi^{(1)}\!,\dots ,\pi^{(n)}) & \mapsto \!\!\! \sum _{z\in \mathcal {Z}^{(i)}_j}\!\!\! \pi^{(i)} {}[\hat{\sigma }]\, \alpha ^{(i)}_z(\mathbf {\pi })\cdot \Big (\hat{\mathbf {\pi }}[\sigma ^{(i)}(z)] - \pi^{(i)} {}[\sigma ^{(i)}(z)]\Big), \end{align} \) we can rewrite (13) as (16) \( \begin{align} T\epsilon & \ge \sum _{t=1}^T v^{(i)}_{\hat{\sigma }\rightarrow \hat{\mathbf {\pi }}}(\pi^{(1),t},\dots , \pi^{(n),t} ) = \sum _{t=1}^T \sum _{\mathbf {\pi }\in \Pi } 1\!\!1 [(\pi^{(1),t},\dots , \pi^{(n),t} ) = \mathbf {\pi }]\cdot v^{(i)}_{\hat{\sigma }\rightarrow \hat{\mathbf {\pi }}}(\mathbf {\pi }) \\ &= \sum _{\mathbf {\pi }\in \Pi } \Big (\sum _{t=1}^T 1\!\!1 [(\pi^{(1),t},\dots , \pi^{(n),t} )=\mathbf {\pi }]\Big) v^{(i)}_{\hat{\sigma }\rightarrow \hat{\mathbf {\pi }}}(\mathbf {\pi }) = T\sum _{\mathbf {\pi } \in \Pi }\mathbf {\mu }[\mathbf {\pi }] v^{(i)}_{\hat{\sigma }\rightarrow \hat{\mathbf {\pi }}}(\mathbf {\pi }), \end{align} \) where we used the definition of \( \mathbf {\mu } \) in the third equality. Dividing both sides of (16) by T, we can further write (17) \( \begin{equation} \epsilon \ge \sum _{\mathbf {\pi }\in \Pi } \mathbf {\mu }[\mathbf {\pi }]\, v^{(i)}_{\hat{\sigma }\rightarrow \hat{\mathbf {\pi }}}(\mathbf {\pi }). \end{equation} \) Finally, by expanding the definition of \( v^{(i)}_{\hat{\sigma }\rightarrow \hat{\mathbf {\pi }}} \) in (17), (18) \( \begin{align} \epsilon & \ge \sum _{\mathbf {\pi } =(\pi^{(1)},\dots ,\pi^{(n)})\in \Pi } \Big (\mathbf {\mu }[\mathbf {\pi }] \, \sum _{z\in \mathcal {Z}^{(i)}_j}\!\!\! \pi^{(i)} {}[\hat{\sigma }]\, \alpha ^{(i)}_z(\mathbf {\pi })\cdot \big (\hat{\mathbf {\pi }}[\sigma ^{(i)}(z)] - \pi^{(i)} {}[\sigma ^{(i)}(z)]\big)\Big) \\ & = \sum _{z\in \mathcal {Z}^{(i)}_j}\sum _{\mathbf {\pi }=(\pi^{(1)},\dots ,\pi^{(n)})\in \Pi } \mathbf {\mu }[\mathbf {\pi }]\,\pi^{(i)} {}[\hat{\sigma }]\, \alpha ^{(i)}_z(\mathbf {\pi })\cdot \big (\hat{\mathbf {\pi }}[\sigma ^{(i)}(z)] - \pi^{(i)} {}[\sigma ^{(i)}(z)]\big). \end{align} \) The right-hand side of (18) can be simplified further by noticing that, by definition of \( \alpha ^{(i)}_z(\mathbf {\pi }) \), \( \begin{align*} \pi^{(i)}[\hat{\sigma }]\, \alpha ^{(i)}_z(\pi^{(1)}, \dots , \pi^{(n)}) & = u^{(i)}(z)\,p_c(z)\, \pi^{(i)}[\hat{\sigma }]\Big (\prod _{i^{\prime }\ne i} \pi^{(i^{\prime })} [\sigma ^{(i^{\prime })}(z)]\Big) \\ & = {\left\lbrace \begin{array}{ll} u^{(i)}(z)p_c(z) & \text{if } \pi^{(i)} [\hat{\sigma }] = 1 \text{ and } \pi^{(i^{\prime })} [\sigma ^{(i^{\prime })}(z)] = 1 \,\,\forall i^{\prime } \ne i \\ 0 & \text{otherwise}. \end{array}\right.} \end{align*} \) Substituting the above expression into (18), we obtain

where, to get the last equality, we used the definition of \( r^{(i)}_{\mathbf {\mu },\,\hat{\sigma } \rightarrow \hat{\mathbf {\pi }}} \) and we dropped the factor \( \pi^{(i)} {}[\sigma ^{(i)}(z)] \in \lbrace 0,1 \rbrace \) in the second summation by adding the condition \( \pi^{(i)}{}[\sigma ^{(i)}(z)] = 1 \) to its domain. Notice that, by definition of \( \mathcal {Z}^{(i)}_j \), the first summation above is exactly the term appearing in the left-hand side of the inequality in the definition of \( \epsilon \)-EFCE (Definition 2.7). Moreover, since for any \( (\pi^{(1)},\dots ,\pi^{(n)}) \in \Pi \) it holds that \( \pi^{(i)}{}[\sigma ^{(i)}(z)] = 1 \) and \( \pi^{(i)}{}[\hat{\sigma }] = 1 \) only for terminal nodes \( z\in \mathcal {Z}^{(i)}_j \) such that \( \hat{\sigma }\preceq \sigma ^{(i)}(z) \), we can restrict the domain of the second summation above to \( z \in \mathcal {Z}: \sigma ^{(i)}(z) \succeq \hat{\sigma } \) and equivalently rewrite it as \( \begin{equation*} \sum _{\substack{z \in \mathcal {Z}\\ \sigma ^{(i)}(z) \succeq \hat{\sigma }}}u^{(i)}(z)p_c(z) \left(\sum _{\substack{ (\pi^{(1)},\dots ,\pi^{(n)}) \in \Pi \\ \pi^{(i^{\prime })}{}[\sigma ^{(i^{\prime })}(z)] = 1 \,\,\, \forall i^{\prime } \in [n] }} \mathbf {\mu }[(\pi^{(1)},\dots ,\pi^{(n)})] \right) = \sum _{\substack{z \in \mathcal {Z}\\ \sigma ^{(i)}(z) \succeq \hat{\sigma }}}u^{(i)}(z)p_c(z)\, r_{\mathbf {\mu }}(z), \end{equation*} \) which is exactly the first term appearing in the right-hand side of the inequality in the definition of \( \epsilon \)-EFCE (Definition 2.7). Thus, we obtain that \( \begin{align*} \epsilon & \ge \sum _{\substack{ z \in \mathcal {Z}\\ \sigma ^{(i)}(z) \succeq j }}u^{(i)}(z)p_c(z)\, r^{(i)}_{\mathbf {\mu },\,\hat{\sigma } \rightarrow \hat{\mathbf {\pi }} }(z) - \sum _{ \substack{z \in \mathcal {Z}\\ \sigma ^{(i)}(z) \succeq \hat{\sigma }} }u^{(i)}(z)p_c(z)\, r_{\mathbf {\mu }}(z), \end{align*} \) for all \( \hat{\sigma }= (j,a)\in {\Sigma^(i)} _* \) and \( \hat{\mathbf {\pi }} \in {\Pi^{(i)}} _j \), which is the definition of \( \mathbf {\mu } \) being an \( \epsilon \)-EFCE.□
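For intuition, the following sketch spells out the uncoupled dynamics implied by Theorem 3.8 under the full-information feedback model discussed above. The data layout (a leaves list with fields u, p_chance, seq) and the minimizer interface are illustrative assumptions, not notation from the article; the sketch only mirrors the construction of the utilities (10) and of the empirical frequency of play \( \mathbf {\mu } \).

```python
from math import prod
from collections import Counter

# Sketch of the dynamics of Theorem 3.8. Assumptions (illustrative):
# minimizers[i] is a trigger-regret minimizer with next_element() /
# observe_utility(); each returned strategy is a tuple of 0/1 entries indexed
# by that player's sequences; leaves lists terminal nodes z with fields
# u (one utility per player), p_chance, and seq (for each player, the index
# of that player's last sequence on the path to z).
def run_dynamics(minimizers, leaves, T):
    n = len(minimizers)
    counts = Counter()
    for _ in range(T):
        profile = tuple(m.next_element() for m in minimizers)
        counts[profile] += 1
        for i, m in enumerate(minimizers):
            # Coefficient of pi[seq_i(z)] in the linear utility (10) at time t.
            coef = [z.u[i] * z.p_chance *
                    prod(profile[k][z.seq[k]] for k in range(n) if k != i)
                    for z in leaves]
            m.observe_utility(lambda pi, i=i, coef=coef:
                              sum(c * pi[z.seq[i]] for c, z in zip(coef, leaves)))
    # Empirical frequency of play: by Theorem 3.8 this distribution is an
    # eps-EFCE with eps = (1/T) * max_i R^{(i),T}.
    return {prof: c / T for prof, c in counts.items()}
```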


4 EFFICIENT NO-TRIGGER-REGRET ALGORITHM

Theorem 3.8 in Section 3 immediately implies that if all players \( i\in [n] \) play according to the strategies output by a \( \psi^{(i)} \)-regret minimizer for the set of their deterministic sequence-form strategies \( {\Pi^{(i)}} \), then their empirical frequency of play converges to the set of EFCEs. Therefore, the existence of uncoupled no-regret learning dynamics that converge to EFCE can be proved constructively by showing that one such \( \psi^{(i)} \)-regret minimizer can be constructed for each player \( i \in [n] \). More precisely, in this section, we seek to solve the following problem:

Problem 1.

Given any player \( i \in [n] \), construct a \( \psi^{(i)} \)-regret minimizer for the set of the player’s deterministic sequence-form strategies \( {\Pi^{(i)}} \), such that:

  • it is efficient: The NextElement and the ObserveUtility operations both run in polynomial time in the number \( |{\Sigma^(i)} | \) of sequences of the player; and

  • it guarantees low regret: After any T observed linear utility functions and for any \( \delta \in (0, 1) \), with probability at least \( 1-\delta \) the cumulative \( \psi^{(i)} \)-regret is \( O(\operatorname{poly}(|{\Sigma^(i)} |)\cdot \sqrt {T\log (1/\delta)}) \).

The central result of this section, Theorem 4.17, establishes that the \( \psi^{(i)} \)-regret minimizer \( \mathcal {R}^{(i)} \) defined in Algorithm 6 satisfies all the requirements of Problem 1.

4.1 Overview

Before delving into the details of the construction of our \( \psi^{(i)} \)-regret minimizer for the set of deterministic sequence-form strategies \( {\Pi^{(i)}} \) of a generic player \( i \in [n] \), we give an overview of the main logical steps that we use to solve Problem 1.

  • In Section 4.2, we show that one can soundly move the attention from the set of deterministic sequence-form strategies \( {\Pi^{(i)}} \) to the set of mixed sequence-form strategies \( {\mathcal{Q}^{(i)}} = \text {co}({\Pi^{(i)}}) \). In particular, in the rest of the section, we will seek to construct a \( \psi^{(i)} \)-regret minimizer for the set \( {\mathcal{Q}^{(i)}} \) (as opposed to \( {\Pi^{(i)}} \)) that guarantees sublinear regret and polynomial-time implementation in the worst case.

  • In Section 4.3, we show that the convex hull \( \text {co}(\psi^{(i)}) \) of the set of canonical trigger deviation functions possesses a combinatorial structure that can be leveraged to construct an efficient deterministic regret minimizer for it.

  • Finally, in Section 4.4, we prove that given any \( \phi \in \text {co}(\psi^{(i)}) \), there exists a fixed-point sequence-form strategy \( \mathbf {q} \in {\mathcal{Q}^{(i)}} \) such that \( \phi (\mathbf {q}) = \mathbf {q} \), and that such a fixed-point strategy can be found in polynomial time in the number of sequences \( |{\Sigma^(i)} | \) of Player i.

Together, the last two steps enable us to apply the construction by Gordon et al. [20] described in Section 2.4 to obtain an efficient \( \text {co}(\psi^{(i)}) \)-regret minimizer for the set of sequence-form strategies \( {\mathcal{Q}^{(i)}} \) with worst-case sublinear regret guarantees. We summarize that construction in Figure 4, which can serve as a reading aid for the section. Since \( \text {co}(\psi^{(i)}) \supseteq \psi^{(i)} \), such a \( \text {co}(\psi^{(i)}) \)-regret minimizer is also a \( \psi^{(i)} \)-regret minimizer, and the construction is complete.

Fig. 4.

Fig. 4. Pictorial depiction of our \( \text {co}(\psi^{(i)}) \) -regret minimizer for the set of sequence-form strategies \( {\mathcal{Q}^{(i)}} \) . The symbol \( \otimes \) in the figure denotes a multilinear transformation of the input(s) into the output. Dashed lines denote utility functions. For notational convenience, we let \( {\Sigma^(i)} _* := \lbrace {\mathsf{1}},\dots , {m}\rbrace \) .

4.2 From Deterministic to Mixed Strategies

Suppose that a regret minimizer for a generic discrete set \( \mathcal {X} \) were sought, but only regret minimizers for the convex hull \( \text {co}(\mathcal {X}) \) were known. It seems natural to wonder whether one could take any regret minimizer for \( \text {co}(\mathcal {X}) \) and convert it into a regret minimizer for \( \mathcal {X} \) by sampling the outputs \( \bar{\mathbf {x}}^t \) of the former using an unbiased estimator \( \mathbf {x}^t \in \mathcal {X} \), with \( \mathbb {E}[\mathbf {x}^t] = \bar{\mathbf {x}}^t \). It is a folklore result, justified by a concentration argument, that this is indeed the case (see, for instance, Reference [9, page 192]). In particular, in the case of our interest where \( \mathcal {X}= {\Pi^{(i)}} \), the following can be shown:

Lemma 4.1.

Let \( i\in [n] \) be any player, and \( \bar{\mathcal {R}}^{(i)} \) be any \( \psi^{(i)} \)-regret minimizer for the set \( {\mathcal{Q}^{(i)}} \) of mixed sequence-form strategies. Consider the algorithm \( \mathcal {R}^{(i)} \) whose NextElement and ObserveUtility operations are defined as follows at all times t:

  • \( \mathcal {R}^{(i)}.\text {NEXTELEMENT} \) calls \( \bar{\mathcal {R}}^{(i)}.\text {NEXTELEMENT} \), thereby obtaining a mixed sequence-form strategy \( {\boldsymbol{q}^{(i),t}} \in {\mathcal{Q}^{(i)}} \). Then, an unbiased sampling scheme (such as the natural sampling scheme described in Section 2.2) is used to sample a random deterministic sequence-form strategy \( \pi^{(i),t} {} \in {\Pi^{(i)}} \) in linear time in the number of sequences \( |{\Sigma^(i)} | \). Finally, \( \pi^{(i),t} {} \) is returned to the caller;

  • \( \mathcal {R}^{(i)}.\text {OBSERVEUTILITY}(\ell ^{t}) \) calls \( \bar{\mathcal {R}}^{(i)}.\text {OBSERVEUTILITY}(\ell ^t) \) with the same utility function \( \ell ^{t} \).

Furthermore, assume that the linear utility functions \( \ell ^1,\ell ^2,\dots \) received as feedback by \( \mathcal {R}^{(i)} \) have range upper bounded by a constant D, that is, \( \max _{\mathbf {q},\mathbf {q}^{\prime } \in {\mathcal{Q}^{(i)}} } \lbrace \ell ^t(\mathbf {q}) - \ell ^t(\mathbf {q}^{\prime })\rbrace \le D \) for all \( t=1,\dots , T \). Then, \( \mathcal {R}^{(i)} \) is a \( \psi^{(i)} \)-regret minimizer for the set of deterministic sequence-form strategies \( {\Pi^{(i)}} \), and its cumulative \( \psi^{(i)} \)-regret satisfies, at all times T and for all \( \delta \in (0,1) \), the inequality \( \begin{equation*} \mathbb {P}\Big [R^{(i),\,T} \le \bar{R}^{(i),\,T} + 4D\sqrt {T\cdot {|{\Sigma^(i)} |}\log \big (\tfrac{1}{\delta }\big)}\Big ] \ge 1-\delta . \end{equation*} \)

Proof.

Let \( \ell ^1, \ell ^2, \dots \) be the sequence of linear utility functions observed by \( \mathcal {R}^{(i)} \), and fix any \( \phi \in \psi^{(i)} \). We introduce the discrete-time stochastic process (19) \( \begin{equation} w^t := \ell ^t(\phi ( \pi^{(i),t} {})) - \ell ^t( \pi^{(i),t} ) - \ell ^t(\phi ({\boldsymbol{q}^{(i),t}})) + \ell ^t({\boldsymbol{q}^{(i),t}}). \end{equation} \) Since (i) \( \ell ^t \) and \( \phi \) are both linear functions, (ii) \( \ell ^t \) is independent of \( \pi^{(i),t} \), and (iii) \( \pi^{(i),t} \) is an unbiased estimator of \( {\boldsymbol{q}^{(i),t}} \) at all times t by hypothesis, it follows that \( w^t \) is a martingale difference sequence. Furthermore, each increment \( |w^t| \) can be easily upper bounded, at all times t, according to (20) \( \begin{equation} |w^t| \le |\ell ^t(\phi ( \pi^{(i),t} {})) - \ell ^t( \pi^{(i),t} )| + |\ell ^t(\phi ({\boldsymbol{q}^{(i),t}})) - \ell ^t({\boldsymbol{q}^{(i),t}})| \le 2D, \end{equation} \) where the second inequality follows from the fact that \( \phi \) maps sequence-form strategies to sequence-form strategies, together with the hypothesis that \( \ell ^t \) has range upper bounded by D.

For any T, let \( R^{(i),\,T}(\phi) \) and \( \bar{R}^{(i),\,T}(\phi) \) denote the regret cumulated by \( \mathcal {R}^{(i)} \) and \( \bar{\mathcal {R}}^{(i)} \), respectively, compared to always picking transformation \( \phi \); in symbols \( \begin{equation*} R^{(i),\,T}(\phi) := \sum _{t=1}^T \ell ^t(\phi ( \pi^{(i),t} )) - \ell ^t( \pi^{(i),t} ), \qquad \bar{R}^{(i),\,T}(\phi) := \sum _{t=1}^T \ell ^t(\phi ({\boldsymbol{q}^{(i),t}})) - \ell ^t({\boldsymbol{q}^{(i),t}}). \end{equation*} \) It is immediate to see from Definition (19) of \( w^t \) that (21) \( \begin{equation} \sum _{t=1}^T w^t = R^{(i),\,T}(\phi) - \bar{R}^{(i),\,T}(\phi)\qquad \quad \forall \ T \in \lbrace 1, 2, \dots \rbrace . \end{equation} \) Using the Azuma-Hoeffding concentration inequality, it follows that, for all T, (22) \( \begin{align} \mathbb {P}\Big [R^{(i),\,T}(\phi) - \bar{R}^{(i),\,T}(\phi) \le D\sqrt {8T \log \big (\tfrac{1}{\delta }\big)}\Big ] & = \mathbb {P}\Big [\sum _{t=1}^T w^t \le D\sqrt {8T \log \big (\tfrac{1}{\delta }\big)}\Big ] \\ & \ge 1 - \exp \bigg\lbrace -\frac{2}{\sum _{t=1}^T (2|w^t|)^2} \Big (D\sqrt {8T \log \big (\tfrac{1}{\delta }\big)}\Big)^2\bigg\rbrace \\ & \ge 1 - \exp \bigg\lbrace -\frac{2}{(4D)^2 T} \Big (D\sqrt {8T \log \big (\tfrac{1}{\delta }\big)}\Big)^2\bigg\rbrace = 1 - \delta , \end{align} \) where we used (21) in the equality and (20) in the second inequality. Since (22) holds for any choice of \( \phi \in \psi^{(i)} \), we can now write (23) \( \begin{align} \mathbb {P}\Big [R^{(i),\,T} \le \bar{R}^{(i),\,T} + D\sqrt {8T \log \big (\tfrac{1}{\delta }\big)}\Big ] & = \mathbb {P}\Big [\max _{\phi \in \psi^{(i)} } \lbrace R^{(i),\,T}(\phi)\rbrace \le \max _{\phi \in \psi^{(i)} }\lbrace \bar{R}^{(i),\,T}(\phi)\rbrace + D\sqrt {8T \log \big (\tfrac{1}{\delta }\big)}\Big ]\\ & \ge \mathbb {P}\Big [\max _{\phi \in \psi^{(i)} } \lbrace R^{(i),\,T}(\phi)-\bar{R}^{(i),\,T}(\phi)\rbrace \le D\sqrt {8T \log \big (\tfrac{1}{\delta }\big)}\Big ]\\ & = \mathbb {P}\Big [R^{(i),\,T}(\phi)-\bar{R}^{(i),\,T}(\phi) \le D\sqrt {8T \log \big (\tfrac{1}{\delta }\big)}\quad \forall \,\phi \in \psi^{(i)} \Big ] \\ & \ge 1-|\psi^{(i)} |\cdot \delta , \end{align} \) where the first inequality follows from the fact that \( \max _{\phi \in \psi^{(i)} } \lbrace R^{(i),\,T}(\phi)\rbrace - \max _{\phi \in \psi^{(i)} }\lbrace \bar{R}^{(i),\,T}(\phi)\rbrace \le \max _{\phi \in \psi^{(i)} } \lbrace R^{(i),\,T}(\phi)-\bar{R}^{(i),\,T}(\phi)\rbrace \), while the second inequality follows from applying (22) and the union bound. Using Lemma 3.5, setting \( \delta ^{\prime }:= |{\Sigma^(i)} |\cdot 2^{|{\Sigma^(i)} |}\cdot \delta \), and noticing that \( \log (|{\Sigma^(i)} |\cdot 2^{|{\Sigma^(i)} |}) \le \log (2^{2|{\Sigma^(i)} |}) \le 2|{\Sigma^(i)} | \), we obtain the statement.□
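A minimal sketch of the wrapper described in Lemma 4.1 follows, assuming a \( \psi^{(i)} \)-regret minimizer for the mixed strategies and an unbiased sampling routine are available (both interfaces are illustrative and not part of the article).

```python
# Sketch of the sampling wrapper of Lemma 4.1. Assumptions (illustrative):
# `mixed_rm` is a regret minimizer for the set of mixed sequence-form
# strategies, and `sample_deterministic(q)` is an unbiased sampling scheme
# returning a deterministic sequence-form strategy pi with E[pi] = q.
class SampledRegretMinimizer:
    def __init__(self, mixed_rm, sample_deterministic):
        self.mixed_rm = mixed_rm
        self.sample = sample_deterministic

    def next_element(self):
        q_t = self.mixed_rm.next_element()    # mixed strategy q^{(i),t}
        return self.sample(q_t)               # unbiased deterministic sample

    def observe_utility(self, ell_t):
        self.mixed_rm.observe_utility(ell_t)  # forwarded unchanged
```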

Lemma 4.1 immediately implies that, to solve Problem 1, it is enough to solve the following problem:

Problem 2.

Given any player \( i\in [n] \), construct a \( \psi^{(i)} \)-regret minimizer for the set of mixed sequence-form strategies \( {\mathcal{Q}^{(i)}} \) such that:

  • it is efficient: Both the NextElement and the ObserveUtility operations can be implemented in polynomial time in \( |{\Sigma^(i)} | \); and

  • it guarantees low regret: After any T observed utilities, the cumulative \( \psi^{(i)} \)-regret is upper bounded as \( O({\operatorname{poly}(|{\Sigma^(i)} |)\cdot }\sqrt {T}) \).

The remainder of this section gives an algorithm that solves Problem 2 and, thus, indirectly, also Problem 1.

4.3 Regret Minimizer for the Convex Hull of the Set of Trigger Deviation Functions

In this subsection, we begin the construction of a phi-regret minimizer relative to the convex hull \( \text {co}(\psi^{(i)}) \) of the set of trigger deviation functions \( \psi^{(i)} \) for the set \( {\mathcal{Q}^{(i)}} \). Since \( \text {co}(\psi^{(i)}) \supseteq \psi^{(i)} \), any such \( (\text {co}(\psi^{(i)})) \)-regret minimizer is trivially also a \( \psi^{(i)} \)-regret minimizer for \( {\mathcal{Q}^{(i)}} \).

To obtain our \( (\text {co}(\psi^{(i)})) \)-regret minimizer, we will leverage the general framework due to Gordon et al. [20] that we recalled at the end of Section 2.4. In our particular case, that construction reduces to showing the following:

(1)

existence of a deterministic regret minimizer for the set of deviations \( \text {co}(\psi^{(i)}) \); and

(2)

existence of a fixed point \( \mathbf {q} = \phi (\mathbf {q}) \) for any \( \phi \in \text {co}(\psi^{(i)}) \).

In this subsection, we will focus on point (1), while in the next subsection, we will focus on point (2). Specifically, the central result of this subsection, Theorem 4.6, will constructively establish the existence of an efficient deterministic regret minimizer \( \tilde{\mathcal {R}}^{(i)} \) for the set \( \text {co}(\psi^{(i)}) \).

The starting point of our approach is the observation that, because the convex hull operation is associative, \( \text {co}(\psi^{(i)}) = \text {co}(\lbrace {{\phi}^{(i)}_{\hat\sigma\rightarrow{\boldsymbol{y}}}} : \hat{\sigma }=(j,a)\in {\Sigma^(i)} _*, {\hat{\pi}} \in {\Pi^{(i)}} _j\rbrace) \) can be evaluated in two stages: First, for each sequence \( \hat{\sigma }= (j,a)\in {\Sigma^(i)} _* \) one can define the set \( \begin{equation*} {\bar\Psi^{(i)}_{\hat\sigma}} := \text {co}{\Big (}\big\lbrace {{\phi}^{(i)}_{\hat\sigma\rightarrow{\boldsymbol{y}}}} {}: \hat{\mathbf {\pi }} \in {\Pi^{(i)}} _{j} \big\rbrace {\Big)}; \end{equation*} \) and then, one can take the convex hull of all \( {\bar\Psi^{(i)}_{\hat\sigma}} \), that is, (24) \( \begin{equation} \text {co}(\psi^{(i)}) = \text {co}{\Big (}\big\lbrace {\bar\Psi^{(i)}_{\hat\sigma}} : \hat{\sigma }\in {\Sigma^(i)} _* \big\rbrace {\Big)}. \end{equation} \) Our construction of \( \tilde{\mathcal {R}}^{(i)} \) will follow a similar structure. First, for each \( \hat{\sigma }\in {\Sigma^(i)} _* \), we will construct a regret minimizer \( \tilde{\mathcal {R}}^{(i)}_{\hat{\sigma }} \) for the set of deviations \( {\bar\Psi^{(i)}_{\hat\sigma}} \) (Section 4.3.1). Then, we will combine all the regret minimizers \( \tilde{\mathcal {R}}^{(i)}_{\hat{\sigma }} \) into a composite regret minimizer \( \tilde{\mathcal {R}}^{(i)} \) for the set \( \text {co}(\psi^{(i)}) \) (Section 4.3.2).

4.3.1 Regret Minimizer for \( {\bar\Psi^{(i)}_{\hat\sigma}} \).

Fix any \( \hat{\sigma }=(j,a)\in {\Sigma^(i)} _* \). A deterministic regret minimizer for the set \( {\bar\Psi^{(i)}_{\hat\sigma}} \) can be constructed starting from any deterministic regret minimizer for the set \( {\mathcal{Q}^{(i)}} _j \). The crucial insight lies in the observation that the mapping \( \begin{equation*} h^{(i)}_{\hat{\sigma }}: \mathbb {R}^{{\Sigma^(i)} _j} \ni \mathbf {y} \mapsto \phi{^{(i)}_{\hat\sigma\rightarrow{\boldsymbol{y}}}} \end{equation*} \) is affine, since the entries in \( {\boldsymbol{M}^{(i)}_{\hat\sigma\rightarrow{\boldsymbol{y}}}} \) are defined using only constants and linear combinations of entries in \( \mathbf {y} \) (Definition 3.2). Hence, using the properties of affine functions, we can write \( \begin{equation*} \text {co}{\bigg (}\big\lbrace {{\phi}^{(i)}_{\hat\sigma\rightarrow{\boldsymbol{y}}}} : {\hat{\pi}} \in {\Pi^{(i)}} _j\big\rbrace {\bigg)} = \text {co}(h^{(i)}_{\hat{\sigma }}({\Pi^{(i)}} _j)) = h^{(i)}_{\hat{\sigma }}(\text {co}({\Pi^{(i)}} _j)) = h^{(i)}_{\hat{\sigma }}({\mathcal{Q}^{(i)}} _j). \end{equation*} \) So, we have just proved the following characterization of the set \( {\bar\Psi^{(i)}_{\hat\sigma}} \):

Lemma 4.3.

For all sequences \( \hat{\sigma }= (j,a)\in {\Sigma^(i)} _* \), \( {\bar\Psi^{(i)}_{\hat\sigma}} \) is the image of \( {\mathcal{Q}^{(i)}} _j \) under the affine mapping \( h^{(i)}_{\hat{\sigma }} \). In symbols, \( \begin{equation*} {\bar\Psi^{(i)}_{\hat\sigma}} = \big\lbrace \phi{^{(i)}_{\hat\sigma\rightarrow{q_{\hat\sigma}}}} : {\boldsymbol{q}} _{\hat{\sigma }}\in {\mathcal{Q}^{(i)}} _j\big\rbrace . \end{equation*} \)

As a consequence of Lemma 4.3, given any \( \hat{\sigma }=(j,a)\in {\Sigma^(i)} _* \), all transformations \( \phi \in {\bar\Psi^{(i)}_{\hat\sigma}} \) are of the form \( \phi =\phi{^{(i)}_{\hat\sigma\rightarrow{q_{\hat\sigma}}}} \) for some \( {\boldsymbol{q}} _{\hat{\sigma }}\in {\mathcal{Q}^{(i)}} _j \). Thus, the cumulative regret incurred by a generic sequence of transformations \( \phi ^1 = \phi{^{(i)}_{\hat\sigma\rightarrow{q^{(1)}_{\hat\sigma}}}}, \dots , \phi ^T = \phi{^{(i)}_{\hat\sigma\rightarrow{q^{(T)}_{\hat\sigma}}}} \) against generic linear utility functions \( L^1, \dots , L^T \) can be written as

(25) \( \begin{equation} \max _{\phi ^*\in {\bar\Psi^{(i)}_{\hat\sigma}} } \sum _{t=1}^T L^t(\phi ^*) - L^t(\phi ^t) = \max _{\hat{\mathbf {q}}^* \in {\mathcal{Q}^{(i)}} _j}\sum _{t=1}^T L^t\big (h^{(i)}_{\hat{\sigma }}(\hat{\mathbf {q}}^*)\big) - L^t\big (h^{(i)}_{\hat{\sigma }}({\boldsymbol{q}} _{\hat{\sigma }}^t)\big). \end{equation} \)
Since \( L^t \) is linear and \( h^{(i)}_{\hat{\sigma }} \) is affine, their composition \( L^t\circ h^{(i)}_{\hat{\sigma }} \) is affine, and therefore the shifted function \( \begin{equation*} g^{(i),\,t}_{\hat{\sigma }} : \mathbb {R}^{{\Sigma^(i)} _j} \ni \mathbf {x} \mapsto L^t(h^{(i)}_{\hat{\sigma }}(\mathbf {x})) - L^t(h^{(i)}_{\hat{\sigma }}(\mathbf {0})) \end{equation*} \) is linear. Furthermore, from (25) it follows that (26) \( \begin{equation} \max _{\phi ^* \in {\bar\Psi^{(i)}_{\hat\sigma}} } \sum _{t=1}^T L^t(\phi ^*) - L^t(\phi{^{(i)}_{\hat\sigma\rightarrow{q^{(t)}_{\hat\sigma}}}}) = \max _{\hat{\mathbf {q}}^* \in {\mathcal{Q}^{(i)}} _j}\sum _{t=1}^T g^{(i),\,t}_{\hat{\sigma }}(\hat{\mathbf {q}}^*) - g^{(i),\,t}_{\hat{\sigma }}({\boldsymbol{q}} _{\hat{\sigma }}^t). \end{equation} \)

Equation (26) suggests that if the continuation strategies \( {\boldsymbol{q}} _{\hat{\sigma }}^t \in {\mathcal{Q}^{(i)}} _j \) are picked by a deterministic regret minimizer \( \tilde{\mathcal {R}}^{(i)}_{\mathcal {Q}, \hat{\sigma }} \) that observes the linear utility functions \( g^{(i),\,t}_{\hat{\sigma }} \) at all times t, then the regret cumulated with respect to utility functions \( L^t \) by the corresponding elements \( \phi{^{(i)}_{\hat\sigma\rightarrow{q^{(t)}_{\hat\sigma}}}} \) grows sublinearly. We make that construction explicit in Algorithm 1.
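The following Python sketch mirrors the construction just described; it is not a verbatim transcription of Algorithm 1. It assumes that the inner regret minimizer for \( {\mathcal{Q}^{(i)}} _j \) consumes the canonical (gradient) representation of its linear utilities, and that the utilities \( L^t \) arrive as their canonical matrix \( \mathbf {\Lambda }^t \); these interface choices are illustrative.

```python
# Sketch of a regret minimizer for the set of deviations \bar\Psi^(i)_sigma_hat.
# Assumptions (illustrative): `inner` is a regret minimizer for Q^(i)_j whose
# observe_utility takes a dict {sequence: coefficient}; `make_phi(q)` builds
# the canonical deviation phi_{sigma_hat -> q}; utilities arrive as Lambda_t,
# indexable by a pair of sequences; `j_seqs` lists the sequences at j and below.
class TriggerDeviationRM:
    def __init__(self, inner, make_phi, sigma_hat, j_seqs):
        self.inner = inner
        self.make_phi = make_phi
        self.sigma_hat = sigma_hat
        self.j_seqs = j_seqs

    def next_element(self):
        q_t = self.inner.next_element()     # continuation strategy in Q^(i)_j
        return self.make_phi(q_t)           # phi_{sigma_hat -> q_t}

    def observe_utility(self, Lambda_t):
        # Canonical representation of g^{(i),t}: the column of Lambda^t indexed
        # by sigma_hat, restricted to sequences at j and below.
        g_t = {s: Lambda_t[s, self.sigma_hat] for s in self.j_seqs}
        self.inner.observe_utility(g_t)
```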

Algorithm 1 can be instantiated with any deterministic regret minimizer \( \tilde{\mathcal {R}}^{(i)}_{\mathcal {Q},\,\hat{\sigma }} \) for the set of sequence-form strategies \( {\mathcal{Q}^{(i)}} _j \). The following proposition formalizes the cumulative regret guarantee when \( \tilde{\mathcal {R}}^{(i)}_{\mathcal {Q},\,\hat{\sigma }} \) is set to the CFR algorithm [49], which so far has arguably been the most widely used regret minimizer for sequence-form strategy spaces:

Proposition 4.4.

Let \( i\in [n] \) be any player and \( \hat{\sigma }=(j,a)\in {\Sigma^(i)} _* \) be any trigger sequence. Consider the deterministic regret minimizer \( \tilde{\mathcal {R}}^{(i)}_{\hat{\sigma }} \) (Algorithm 1), where \( \tilde{\mathcal {R}}^{(i)}_{\mathcal {Q},\,\hat{\sigma }} \) is set to be the CFR regret minimizer [49]. Upon observing a sequence of linear utility functions \( L^1,\dots ,L^T:\text {co}(\psi^{(i)})\rightarrow \mathbb {R} \), the regret cumulated by the elements \( \phi ^1 = \phi{^{(i)}_{\hat\sigma\rightarrow{q^{(1)}_{\hat\sigma}}}},\dots ,\phi ^T = \phi{^{(i)}_{\hat\sigma\rightarrow{q^{(T)}_{\hat\sigma}}}} \) output by \( \tilde{\mathcal {R}}^{(i)}_{\hat{\sigma }} \) satisfies \( \begin{equation*} R^T = \max _{\phi ^*\in {\bar\Psi^{(i)}_{\hat\sigma}} } \sum _{t=1}^T L^t(\phi ^*) - L^t(\phi ^t) \le D\, |{\Sigma^(i)} _j|\, \sqrt {T}, \end{equation*} \) where D is the range of \( L^1, \dots , L^T \), that is, any constant such that \( \max _{\phi ,\phi ^{\prime }\in {\bar\Psi^{(i)}_{\hat\sigma}} }\lbrace L^t(\phi)-L^t(\phi ^{\prime })\rbrace \le D \) for all \( t=1,\dots , T \). Furthermore, the NextElement and the ObserveUtility operations run in \( O(|{\Sigma^(i)} |) \) time.

Proof.

As shown in (26), the regret cumulated by \( \tilde{\mathcal {R}}^{(i)}_{\hat{\sigma }} \) upon observing linear utility functions \( L^1,\dots ,L^T \) equals the regret cumulated by the CFR algorithm upon observing linear utility functions \( g^{(i),\,t}_{\hat{\sigma }} : \mathbb {R}^{{\Sigma^(i)} _j}\ni \mathbf {x} \mapsto L^t(h^{(i)}_{\hat{\sigma }}(\mathbf {x})) - L^t(h^{(i)}_{\hat{\sigma }}(\mathbf {0})) \). Furthermore, the range of \( g^{(i),\,t}_{\hat{\sigma }} \) satisfies the inequality \( \begin{align*} \max _{\mathbf {q},\,\mathbf {q}^{\prime }\in {\mathcal{Q}^{(i)}} _j} \big\lbrace g^{(i),\,t}_{\hat{\sigma }}(\mathbf {q}) - g^{(i),\,t}_{\hat{\sigma }}(\mathbf {q}^{\prime })\big\rbrace & = \max _{\mathbf {q},\,\mathbf {q}^{\prime }\in {\mathcal{Q}^{(i)}} _j} \big\lbrace L^t(h^{(i)}_{\hat{\sigma }}(\mathbf {q})) - L^t(h^{(i)}_{\hat{\sigma }}(\mathbf {q}^{\prime }))\big\rbrace \\ & = \max _{\phi ,\,\phi ^{\prime }\in {\bar\Psi^{(i)}_{\hat\sigma}} } \big\lbrace L^t(\phi)-L^t(\phi ^{\prime }) \big\rbrace \le D. \end{align*} \) So, applying the regret bound of the CFR algorithm (Theorems 3 and 4 of Zinkevich et al. [49]), \( \begin{equation*} R^T \le D \Big (\sum _{j^{\prime } \succeq j}\sqrt {|\mathcal {A}(j^{\prime })|}\Big)\sqrt {T} \le D\Big (\sum _{j^{\prime }\succeq j} |\mathcal {A}(j^{\prime })|\Big)\sqrt {T} = D|{\Sigma^(i)} _j|\sqrt {T}, \end{equation*} \) completing the proof of the regret bound.

The complexity analysis of the NextElement operation follows directly from the fact that CFR’s NextElement operation runs in linear time in \( |{\Sigma^(i)} _j| \). So, we focus on the complexity of the ObserveUtility operation. Fix any time t, and let \( \mathbf {\Lambda }^t := \langle L^t\rangle \) be the canonical representation (defined in Section 2.1) of the linear utility function \( L^t \). Since the canonical representation of \( h^{(i)}_{\hat{\sigma }}(\mathbf {x}) \) is the matrix \( \boldsymbol{M}{^{(i)}_{\hat\sigma\rightarrow{x}}} \) for all \( \mathbf {x}\in \mathbb {R}_{\ge 0}^{{\Sigma^(i)} _j} \), from (1), we obtain \( \begin{align*} L^t(h^{(i)}_{\hat{\sigma }}(\mathbf {x})) - L^t(h^{(i)}_{\hat{\sigma }}(\mathbf {0})) & = \sum _{\sigma _r,\sigma _c\in {\Sigma^(i)} _j} \mathbf {\Lambda }^t[\sigma _r,\sigma _c]\,\Big (\boldsymbol{M}{^{(i)}_{\hat\sigma\rightarrow{x}}}{}[\sigma _r,\sigma _c] - \boldsymbol{M}{^{(i)}_{\hat\sigma\rightarrow{0}}}{}[\sigma _r,\sigma _c]\Big) \\ & = \sum _{\sigma _r\succeq j} \mathbf {\Lambda }^t[\sigma _r,\hat{\sigma }]\,\mathbf {x}[\sigma _r], \end{align*} \) where the second equality follows from expanding the definitions of \( \boldsymbol{M}{^{(i)}_{\hat\sigma\rightarrow{x}}}{}[\sigma _r,\sigma _c] \) and \( \boldsymbol{M}{^{(i)}_{\hat\sigma\rightarrow{0}}}{}[\sigma _r,\sigma _c] \) given in (8). So, the canonical representation \( \langle g^{(i),\,t}_{\hat{\sigma }}\rangle \) of \( g^{(i),\,t}_{\hat{\sigma }} \) is the vector \( (\mathbf {\Lambda }^t[\sigma _r,\hat{\sigma }])_{\sigma _r\in {\Sigma^(i)} _j} \), which can clearly be computed and stored in memory in \( O(|{\Sigma^(i)} _j|) \) time. Using the fact that CFR’s ObserveUtility operation runs in linear time in \( |{\Sigma^(i)} _j| \), the complexity bound of the statement follows.□

4.3.2 Regret Minimizer for co(ψ(i)).

Recently, Farina et al. [14] showed that a regret minimizer for a composite set of the form \( \text {co}(\lbrace \mathcal {X}_1, \dots , \mathcal {X}_m\rbrace) \) can be constructed by combining any individual regret minimizers for \( \mathcal {X}_1,\dots ,\mathcal {X}_m \) through a construction—called a regret circuit—which we describe next.

Proposition 4.5

(Farina et al. [14], Section 4.3)

Let \( \mathcal {X}_1, \dots , \mathcal {X}_m \) be a finite collection of sets, and let \( \mathcal {R}_1, \dots ,\mathcal {R}_m \) be any regret minimizers for them. Furthermore, let \( \mathcal {R}_\Delta \) be any regret minimizer for the m-simplex \( \Delta ^m := \lbrace (\lambda _1, \dots ,\lambda _m) \in \mathbb {R}_{\ge 0}^m : \sum _k\lambda _k = 1\rbrace \). A regret minimizer \( \mathcal {R}_\text{co} \) for the set \( \text {co}(\lbrace \mathcal {X}_1, \dots , \mathcal {X}_m\rbrace) \) can be constructed starting from \( \mathcal {R}_1,\dots ,\mathcal {R}_m \) and \( \mathcal {R}_\Delta \) as follows:

  • \( \mathcal {R}_\text{co}.\text {NEXTELEMENT} \) calls NextElement on each of the regret minimizers \( \mathcal {R}_1,\dots ,\mathcal {R}_m \), obtaining elements \( \mathbf {x}_1^t, \dots , \mathbf {x}_m^t \). Then, it calls the NextElement operation on \( \mathcal {R}_\Delta \), obtaining an element of the simplex \( \mathbf {\lambda }^t = (\lambda _1^t, \dots ,\lambda _m^t) \). Finally, it returns the element \( \begin{equation*} \lambda _1^t \mathbf {x}_1^t + \dots + \lambda _m^t \mathbf {x}_m^t \in \text {co}(\lbrace \mathcal {X}_1, \dots , \mathcal {X}_m\rbrace). \end{equation*} \)

  • \( \mathcal {R}_\text{co}.\text {OBSERVEUTILITY}(L^t) \) forwards the linear utility function \( L^t \) to each of the regret minimizers \( \mathcal {R}_1, \dots , \mathcal {R}_m \). Then, it calls the ObserveUtility operation on \( \mathcal {R}_\Delta \) with the linear utility function \( (\lambda _1, \dots ,\lambda _m) \mapsto L^t(\mathbf {x}^t_1)\lambda _1 + \dots + L^t(\mathbf {x}_m^t)\lambda _m \).

In doing so, the regret \( R_\text{co}^T \) cumulated by \( \mathcal {R}_\text{co} \) upon observing any T linear utility functions relates to the regrets \( R_1^T, \dots , R_m^T, R_\Delta ^T \) cumulated by \( \mathcal {R}_1,\dots ,\mathcal {R}_m,\mathcal {R}_\Delta \), respectively, according to the inequality (27) \( \begin{equation} R_\text{co}^T \le R_\Delta ^T + \max \lbrace R_1^T, \dots , R_m^T\rbrace . \end{equation} \)
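A minimal sketch of the regret circuit of Proposition 4.5 follows, assuming that elements support scalar multiplication and addition (e.g., NumPy arrays) and that the member regret minimizers and the simplex regret minimizer expose next_element/observe_utility; the interfaces are illustrative.

```python
# Sketch of the convex-hull regret circuit of Proposition 4.5.
# Assumptions (illustrative): members[k] is a regret minimizer for X_k,
# simplex_rm is a regret minimizer for the m-simplex, and utilities L are
# callables evaluated at elements of the sets X_k.
class ConvexHullRM:
    def __init__(self, members, simplex_rm):
        self.members = members
        self.simplex_rm = simplex_rm
        self.last_elements = None

    def next_element(self):
        xs = [rm.next_element() for rm in self.members]
        lam = self.simplex_rm.next_element()   # point of the m-simplex
        self.last_elements = xs
        combo = lam[0] * xs[0]                 # convex combination of the x_k^t
        for l, x in zip(lam[1:], xs[1:]):
            combo = combo + l * x
        return combo

    def observe_utility(self, L_t):
        for rm in self.members:
            rm.observe_utility(L_t)            # L^t forwarded unchanged
        values = [L_t(x) for x in self.last_elements]
        # Utility on the simplex: lambda -> sum_k L^t(x_k^t) * lambda_k.
        self.simplex_rm.observe_utility(
            lambda lam: sum(v * l for v, l in zip(values, lam)))
```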

We apply the construction described in Proposition 4.5 to obtain our deterministic regret minimizer \( \tilde{\mathcal {R}}^{(i)} \) for the set \( \text {co}(\psi^{(i)}) = \text {co}(\lbrace {\bar\Psi^{(i)}_{\hat\sigma}} :\hat{\sigma }\in {\Sigma^(i)} _*\rbrace) \) starting from the deterministic regret minimizers \( \tilde{\mathcal {R}}^{(i)}_{\hat{\sigma }} \) (Algorithm 1), one for each sequence \( \hat{\sigma }\in {\Sigma^(i)} _* \), as well as any deterministic regret minimizer \( \mathcal {R}_\Delta ^{(i)} \) for the simplex \( \Delta ^{{\Sigma^(i)} _*} \). Pseudocode is given in Algorithm 2.

Combining Proposition 4.5 and Proposition 4.4, we obtain the following result:

Theorem 4.6.

Consider the regret minimizer \( \tilde{\mathcal {R}}^{(i)} \) (Algorithm 2), where \( \mathcal {R}_\Delta ^{(i)} \) is set to the regret matching algorithm, and \( \tilde{\mathcal {R}}^{(i)}_{\hat{\sigma }} \) is instantiated as described in Proposition 4.4. Upon observing a sequence of linear utility functions \( L^1,\dots ,L^T:\text {co}(\psi^{(i)})\rightarrow \mathbb {R} \), the regret cumulated by the transformations \( \phi ^1,\dots ,\phi ^T\in \text {co}(\psi^{(i)}) \) output by \( \tilde{\mathcal {R}}^{(i)} \) satisfies \( \begin{equation*} R^T = \max _{\phi ^* \in \text {co}(\psi^{(i)})}\sum _{t=1}^T L^t(\phi ^*) - L^t(\phi ^t) \le 2D\,|{\Sigma^(i)} |\,\sqrt {T}, \end{equation*} \) where D is the range of \( L^1, \dots , L^T \), that is, any constant such that \( \max _{\phi ,\phi ^{\prime }\in \text {co}(\psi^{(i)})}\lbrace L^t(\phi)-L^t(\phi ^{\prime })\rbrace \le D \) for all \( t=1,\dots , T \). Furthermore, the NextElement and the ObserveUtility operations run in \( O(|{\Sigma^(i)} |^2) \) time.

Proof.

At all t, the range of the linear utility function \( \mathbf {\lambda } \mapsto \sum _{\hat{\sigma }\in {\Sigma^(i)} _*} \mathbf {\lambda }[\hat{\sigma }]\,L^t(\phi{^{(i)}_{\hat\sigma\rightarrow{q^{(t)}_{\hat\sigma}}}}) \) is upper bounded by D. Hence, from the known regret bound of the regret matching algorithm [22, 49], the regret cumulated by \( \mathcal {R}_\Delta ^{(i)} \) after T iterations is upper bounded as \( \begin{equation*} R^T_\Delta \le D \sqrt {|{\Sigma^(i)} _*|}\sqrt {T}\le D |{\Sigma^(i)} |\sqrt {T}. \end{equation*} \) Moreover, the regret bound in Proposition 4.4 shows that, for all \( \hat{\sigma }=(j,a)\in {\Sigma^(i)} _* \), the regret \( R^T_{\hat{\sigma }} \) cumulated by \( \tilde{\mathcal {R}}_{\hat{\sigma }}^{(i)} \) is upper bounded as \( R^T_{\hat{\sigma }} \le D |{\Sigma^(i)} _j| \sqrt {T} \). Applying (27) together with the fact that \( |{\Sigma^(i)} _j| \le |{\Sigma^(i)} | \) for all \( j\in {\mathcal{J}^{(i)}} \) yields the regret bound in the statement.

The complexity analysis of NextElement is straightforward: The regret matching algorithm produces elements in \( O(|{\Sigma^(i)} |) \) time, while each iteration of the loop over \( {\Sigma^(i)} _* \) requires \( O(|{\Sigma^(i)} |) \) time. Then, we focus on the complexity of ObserveUtility. There, the only operation whose analysis is not immediately obvious is the construction of the linear utility function \( \ell ^t_\lambda : \mathbf {\lambda } \mapsto \sum _{\hat{\sigma }\in {\Sigma^(i)} _*} \mathbf {\lambda }[\hat{\sigma }]\,L^t(\phi{^{(i)}_{\hat\sigma\rightarrow{q^{(t)}_{\hat\sigma}}}}) \), where it is necessary to check that its canonical representation (Section 2.1), given by the vector \( (L^t(\phi{^{(i)}_{\hat\sigma\rightarrow{q^{(t)}_{\hat\sigma}}}}))_{\hat{\sigma }\in {\Sigma^(i)} _*} \), can be computed and stored in memory in \( O(|{\Sigma^(i)} |^2) \) time. Fix any \( \hat{\sigma }\in {\Sigma^(i)} _* \). The canonical representation of \( \phi{^{(i)}_{\hat\sigma\rightarrow{q^{(t)}_{\hat\sigma}}}} \) is \( \boldsymbol{M}{^{(i)}_{\hat\sigma\rightarrow{q^{t}_{\hat\sigma}}}} \), which is a matrix with \( O(|{\Sigma^(i)} |) \) nonzero entries. Therefore, using (1), the evaluation of \( L^t(\phi{^{(i)}_{\hat\sigma\rightarrow{q^{(t)}_{\hat\sigma}}}}) \) via the canonical representations of \( L^t \) (given as input) and \( \phi{^{(i)}_{\hat\sigma\rightarrow{q^{(t)}_{\hat\sigma}}}} \) takes \( O(|{\Sigma^(i)} |) \) time. Thus, the representation of \( \ell _\lambda \) can be computed in \( O(|{\Sigma^(i)} |^2) \) time, confirming the analysis in the statement.□

4.4 Computation of the Next Strategy

In this subsection, we complete the construction of our \( \text {co}(\psi^{(i)}) \)-regret minimizer for \( {\mathcal{Q}^{(i)}} \) started in Section 4.3 by showing that every transformation \( \phi \in \text {co}(\psi^{(i)}) \) admits a fixed point \( \phi (\mathbf {q}) = \mathbf {q} \in {\mathcal{Q}^{(i)}} \) and that such a fixed point can be computed in time quadratic in the number \( |{\Sigma^(i)} | \) of sequences of Player i.

As a key step in our algorithm, we will use the following well-known result about stationary distributions of stochastic matrices:

Lemma 4.7.

Let \( \mathbf {A} \in \mathbb {S}^m \) be a stochastic matrix. Then, \( \mathbf {A} \) admits a fixed point \( \mathbf {A}\mathbf {x} = \mathbf {x}\in \Delta ^m \). Furthermore, such a fixed point can be computed in polynomial time in m.

Several algorithms are known for computing fixed points of stochastic matrices (see, e.g., Reference [39] for a comparison of eight different methods). Since the particular choice of method is irrelevant, in this article, we will make the following assumption:

Assumption 1.

Given any \( m \in \mathbb {N}_{\gt 0} \), we assume access to an oracle for computing a fixed point of any \( m\times m \) stochastic matrix \( \mathbf {A} \). Furthermore, we assume that the oracle requires at most \( \textsf { FP}(m) \) time in the worst case to compute any such fixed point.
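For concreteness, one possible (illustrative) instantiation of the oracle of Assumption 1 solves the linear system \( (\mathbf {A}-\mathbf {I})\mathbf {x} = \mathbf {0} \), \( \sum _k \mathbf {x}[k] = 1 \) in the least-squares sense; the article itself is agnostic to the method used.

```python
import numpy as np

# Illustrative fixed-point oracle for a column-stochastic m x m matrix A:
# find x in the simplex with A x = x by stacking (A - I) x = 0 with the
# normalization constraint sum(x) = 1 and solving in the least-squares sense.
def stationary_distribution(A):
    m = A.shape[0]
    lhs = np.vstack([A - np.eye(m), np.ones((1, m))])
    rhs = np.concatenate([np.zeros(m), [1.0]])
    x, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)
    x = np.clip(x, 0.0, None)     # remove tiny negative numerical noise
    return x / x.sum()
```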

Our algorithm for computing a fixed point of \( \phi \in \text {co}(\psi^{(i)}) \) requires that the transformation \( \phi \) be expressed as a convex combination of elements from the sets \( \lbrace {\bar\Psi^{(i)}_{\hat\sigma}} \rbrace _{\hat{\sigma }\in {\Sigma^(i)} _*} \), that is, an expression of the form (28) \( \begin{equation} \phi = \sum _{\hat{\sigma }\in {\Sigma^(i)} _*}\lambda _{\hat{\sigma }}\, \phi{^{(i)}_{\hat\sigma\rightarrow{q_{\hat\sigma}}}},\quad \text{ where } \sum _{\hat{\sigma }\in {\Sigma^(i)} _*}\lambda _{\hat{\sigma }}=1, \text{ and }\lambda _{\hat{\sigma }} \ge 0,\ \ \mathbf {q}_{\hat{\sigma }}\in {\mathcal{Q}^{(i)}} _j \quad \! \forall \ \hat{\sigma }=(j,a)\in {\Sigma^(i)} _*, \end{equation} \) in accordance with the characterization of \( \text {co}(\psi^{(i)}) \) established by (24) and Lemma 4.3. Note that our regret minimizer \( \tilde{\mathcal {R}}^{(i)} \) for the set \( \text {co}(\psi^{(i)}) \) (Algorithm 2) already outputs transformations \( \phi \) expressed in the form above.
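Given the representation (28), evaluating \( \phi (\mathbf {x}) \) is a direct convex combination of matrix-vector products, as in the following sketch (the data layout is an assumption made only for illustration).

```python
import numpy as np

# Sketch: evaluating a transformation phi in co(psi^(i)) given in the form (28).
# Assumption (illustrative): `deviations` maps each trigger sequence to a pair
# (lam, M), where lam is the convex weight and M is the matrix of Definition 3.2
# built from the continuation strategy q_{sigma_hat}.
def apply_phi(deviations, x):
    out = np.zeros_like(x, dtype=float)
    for lam, M in deviations.values():
        out += lam * (M @ x)      # convex combination of canonical deviations
    return out
```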

Our algorithm operates incrementally, constructing a fixed point sequence-form strategy \( \mathbf {q} \) for \( \phi \) information set by information set in a top-down fashion. To formalize this notion of top-down construction, we will make use of the two following definitions:

Definition 4.8.

Let \( i \in [n] \) be a player and \( J \subseteq {\mathcal{J}^{(i)}} \) be a subset of that player’s information sets. We say that J is a trunk of \( {\mathcal{J}^{(i)}} \) if, for every \( j \in J \), all predecessors of j (that is, all \( j^{\prime }\in {\mathcal{J}^{(i)}} \) such that \( j^{\prime }\prec j \)) are also in J.

Example 4.9.

In the small game of Figure 1 (left), the sets \( \lbrace \text {A}\rbrace \), \( \lbrace \text {A},\text {B}\rbrace \), \( \lbrace \text {A},\text {C}\rbrace \), \( \lbrace \text {A},\text {D}\rbrace \), \( \lbrace \text {A},\text {B},\text {C}\rbrace \), \( \lbrace \text {A},\text {B},\text {D}\rbrace \), \( \lbrace \text {A},\text {C}, \text {D}\rbrace \), and \( \lbrace \text {A},\text {B},\text {C},\text {D}\rbrace = {\mathcal{J}^{(1)}} \), as well as the empty set \( \emptyset \), exhaust all the possible trunks for Player 1. Conversely, the set \( J = \lbrace \text {B}\rbrace \) is not a trunk for Player 1, because \( \text {A} \prec \text {B} \) and yet \( \text {A} \not\in J \).
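A trunk is simply a predecessor-closed set of information sets, so checking Definition 4.8 is immediate. A minimal Python sketch, with an illustrative `predecessors` map (not a data structure from the article) chosen to be consistent with Example 4.9:

def is_trunk(J, predecessors):
    # J is a trunk iff every predecessor of every j in J is itself in J.
    return all(predecessors[j] <= J for j in J)

predecessors = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"A"}}
assert is_trunk({"A", "B"}, predecessors)        # a trunk from Example 4.9
assert not is_trunk({"B"}, predecessors)         # A precedes B but is missing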

Definition 4.10.

Let \( i\in [n] \) be a player, \( J \subseteq {\mathcal{J}^{(i)}} \) be a trunk of \( {\mathcal{J}^{(i)}} \) (Definition 4.8), and \( \phi \in \text {co}(\psi^{(i)}) \). We say that a vector \( \mathbf {x}\in \mathbb {R}_{\ge 0}^{{\Sigma^(i)} } \) is a J-partial fixed point of \( \phi \) if it satisfies the sequence-form constraints at all \( j\in J \), that is, (29) \( \begin{equation} \mathbf {x}[\varnothing ] = 1, \qquad \mathbf {x}[\sigma ^{(i)}(j)] = \sum _{a \in \mathcal {A}(j)} \mathbf {x}[(j,a)] \quad \forall \ j \in J, \end{equation} \) and furthermore (30) \( \begin{equation} \phi (\mathbf {x})[\varnothing ] = \mathbf {x}[\varnothing ] = 1, \qquad \phi (\mathbf {x})[(j,a)] = \mathbf {x}[(j,a)]\quad \forall \ j\in J,\ a\in \mathcal {A}(j). \end{equation} \)

It follows from Definition 4.10 that a \( {\mathcal{J}^{(i)}} \)-partial fixed point of \( \phi \) is a vector \( \mathbf {q}\in {\mathcal{Q}^{(i)}} \) such that \( \mathbf {q} = \phi (\mathbf {q}) \). The following simple lemma establishes an \( \emptyset \)-partial fixed point for any transformation \( \phi \in \text {co}(\psi^{(i)}) \):

Lemma 4.11.

Let \( i\in [n] \) be a player, and \( \phi = \sum _{\hat{\sigma }\in {\Sigma^(i)} _*}\lambda _{\hat{\sigma }}\phi{^{(i)}_{\hat\sigma\rightarrow{q^{\hat\sigma}}}} \) be any transformation in \( \text {co}(\psi^{(i)}) \), expressed as in (28). Then, the vector \( \mathbf {x}_0 \in \mathbb {R}_{\ge 0}^{{\Sigma^(i)} } \), whose entries are all zeros except for \( \mathbf {x}_0[\varnothing ] = 1 \), is an \( \emptyset \)-partial fixed point of \( \phi \).

Proof.

Condition (29) is straightforward. So, we focus on (30). Fix any \( \hat{\sigma }=(j,a)\in {\Sigma^(i)} _* \). The definition of \( \boldsymbol{M}{^{(i)}_{\hat\sigma\rightarrow{\hat{q}}}} \), given in (8), implies that \( \begin{equation*} \boldsymbol{M}{^{(i)}_{\hat\sigma\rightarrow{\hat{q}}}}[\sigma _r, \varnothing ] = {\left\lbrace \begin{array}{ll} 1 & \text{if }\sigma _r = \varnothing \\ 0 & \text{otherwise} \end{array}\right.} \qquad \quad \forall \,\sigma _r\in {\Sigma^(i)} . \end{equation*} \) Consequently, \( \phi{^{(i)}_{\hat\sigma\rightarrow{\hat{q}_{\hat\sigma}}}}(\mathbf {x}_0) = \boldsymbol{M}{^{(i)}_{\hat\sigma\rightarrow{\hat{q}_{\hat\sigma}}}}\,\mathbf {x}_0 = \mathbf {x}_0 \) (from expanding the matrix-vector multiplication). So, \( \phi (\mathbf {x}_0) = \sum _{\hat{\sigma }\in {\Sigma^(i)} _*} \lambda _{\hat{\sigma }}\phi{^{(i)}_{\hat\sigma\rightarrow{\hat{q}_{\hat\sigma}}}}(\mathbf {x}_0) = \mathbf {x}_0 \) and in particular \( \phi (\mathbf {x}_0)[\varnothing ]=\mathbf {x}_0[\varnothing ]=1 \). So, (30) holds, as we wanted to show.□

The key result that powers our algorithm to compute a fixed point of any \( \phi \in \text {co}{(}\psi^{(i)} {)} \) is that a J-partial fixed point can be cheaply promoted to be a \( (J\cup \lbrace j^*\rbrace) \)-partial fixed point, where \( j^* \in {\mathcal{J}^{(i)}} \setminus J \) is any information set whose predecessors are all in J. Algorithm 3 below gives an implementation of such a promotion: \( \text {EXTEND}(\phi , J, j^*, \mathbf {x}) \) starts with a J-partial fixed point \( \mathbf {x} \) of \( \phi \) and modifies all entries \( \mathbf {x}[(j^*,a)] \), \( a\in \mathcal {A}(j^*) \), so \( \mathbf {x} \) becomes a \( (J\cup \lbrace j^*\rbrace) \)-partial fixed point. Therefore, at a conceptual level, one can repeatedly invoke \( \text {EXTEND} \), growing the trunk J one information set at a time until \( J = {\mathcal{J}^{(i)}} \), starting from the \( \emptyset \)-partial fixed point \( \mathbf {x}_0 \) described in Lemma 4.11.

Before giving a proof of correctness and an analysis of the complexity of Extend, we illustrate an application of the algorithm in the simple extensive-form game of Figure 1.

Example 4.12.

Consider the simple extensive-form game of Figure 1 (left), and recall the three deviation functions \( \phi^{(1)}_{\mathsf{(A,1)}\rightarrow{\hat\pi_{a}}}, \phi^{(1)}_{\mathsf{(A,2)}\rightarrow{\hat\pi_{b}}}, \phi^{(1)}_{\mathsf{(B,3)}\rightarrow{\hat\pi_{c}}} \) considered in Example 3.6. We will illustrate two applications of Extend, with respect to the transformation \( \begin{equation*} \phi := \frac{1}{2}\, \phi^{(1)}_{\mathsf{(A,1)}\rightarrow{\hat\pi_{a}}} + \frac{1}{3}\, \phi^{(1)}_{\mathsf{(A,2)}\rightarrow{\hat\pi_{b}}} + \frac{1}{6}\,\phi^{(1)}_{\mathsf{(B,3)}\rightarrow{\hat\pi_{c}}} \in \text {co}(\psi^{(1)}). \end{equation*} \)

  • In the first application, consider the trunk \( J = \emptyset \), information set \( j^* = \text {A} \), and the \( \emptyset \)-partial fixed point described in Lemma 4.11, that is, the vector \( \mathbf {x} \) whose components are all 0 except for the entry corresponding to the empty sequence \( \varnothing \), which is set to 1. In this case, \( \sigma _p = \sigma ^{(i)}(j^*) \) (Line 1 of Algorithm 3) is the empty sequence. Since no information set \( j^{\prime } \) can possibly satisfy \( j^{\prime } \preceq \sigma _p \), the vector \( \mathbf {r} \) defined on Line 2 is the zero vector. Consequently, the matrix \( \mathbf {W} \) defined on Line 3 is

    which is a stochastic matrix. A fixed point for \( \mathbf {W} \) is given by the vector \( \mathbf {b} := (2/5, 3/5) \in \Delta ^{\lbrace {\mathsf{1}},{\mathsf{2}}\rbrace } \). So, the vector \( \mathbf {x}^{\prime } \) returned by Extend is given by \( \begin{equation*} \mathbf {x}^{\prime }[\varnothing ] = 1,\quad \mathbf {x}^{\prime }[(\text {A},{\mathsf{1}})] = \frac{2}{5},\quad \mathbf {x}^{\prime }[(\text {A},{\mathsf{2}})] = \frac{3}{5} \end{equation*} \) and zero entries everywhere else. Direct computation reveals that \( \mathbf {x}^{\prime } \) is indeed a \( \lbrace \text {A}\rbrace \)-partial fixed point of \( \phi \).

  • In the second application of Extend, we start from the \( \lbrace \text {A}\rbrace \)-partial fixed point that we computed in the previous bullet point and extend it to a \( \lbrace \text {A},\text {D}\rbrace \)-partial fixed point. Here, \( j^* = \text {D} \), and so \( \sigma _p = (\text {A}, {\mathsf{2}}) \). The only \( j^{\prime } \preceq \sigma _p \) is \( \text {A} \), and so the vector \( \mathbf {r} \) defined on Line 2 is \( \begin{equation*} \mathbf {r}[{\mathsf{7}}] = \frac{1}{5}, \quad \mathbf {r}[{\mathsf{8}}] = 0. \end{equation*} \) Consequently, the matrix \( \mathbf {W} \) defined on Line 3 is

    As expected, \( \mathbf {W} \in 3/5\, \mathbb {S}^{\lbrace {\mathsf{7}},{\mathsf{8}}\rbrace } = \mathbf {x}[(\text {A},{\mathsf{2}})]\,\mathbb {S}^{\lbrace {\mathsf{7}},{\mathsf{8}}\rbrace } \). A fixed point for \( \frac{1}{\mathbf {x}[(\text {A},{\mathsf{2}})]}\mathbf {W} = 5/3\,\mathbf {W} \) is given by the vector \( \mathbf {b} := (1, 0) \). So, the vector \( \mathbf {x}^{\prime } \) returned by Extend is given by \( \begin{equation*} \mathbf {x}^{\prime }[\varnothing ] = 1,\quad \mathbf {x}^{\prime }[(\text {A},{\mathsf{1}})] = \frac{2}{5},\quad \mathbf {x}^{\prime }[(\text {A},{\mathsf{2}})] = \frac{3}{5}, \quad \mathbf {x}^{\prime }[(\text {D},{\mathsf{7}})] = \frac{3}{5}, \quad \mathbf {x}^{\prime }[(\text {D},{\mathsf{8}})] = 0 \end{equation*} \) and zero entries everywhere else. Once again, direct computation reveals that \( \mathbf {x}^{\prime } \) is indeed a \( \lbrace \text {A},\text {D}\rbrace \)-partial fixed point of \( \phi \).
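Checks like the “direct computation” invoked in Example 4.12 can be automated. The following is a minimal Python sketch of a partial-fixed-point test for Definition 4.10, under illustrative data structures (sequences as dictionary keys, "empty" standing for \( \varnothing \), `seqs_at[j]` listing the sequences (j, a) for a in \( \mathcal {A}(j) \), and `parent_seq[j]` giving \( \sigma ^{(i)}(j) \)); `phi` is any callable mapping such a dictionary of sequence values to another.

def is_partial_fixed_point(phi, x, J, seqs_at, parent_seq, tol=1e-9):
    y = phi(x)
    if abs(x["empty"] - 1.0) > tol or abs(y["empty"] - 1.0) > tol:
        return False
    for j in J:
        # (29): the mass of the parent sequence splits among the actions at j.
        if abs(x[parent_seq[j]] - sum(x[s] for s in seqs_at[j])) > tol:
            return False
        # (30): phi leaves every entry indexed at j unchanged.
        if any(abs(y[s] - x[s]) > tol for s in seqs_at[j]):
            return False
    return True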

To prove correctness of \( \text {EXTEND} \) in Proposition 4.14, we will find useful the following technical lemma:

Lemma 4.13.

Let \( i\in [n] \) be any player, and \( \phi = \sum _{\hat{\sigma }\in {\Sigma^(i)} _*}\lambda _{\hat{\sigma }}\, \phi{^{(i)}_{\hat\sigma\rightarrow{q^{\hat\sigma}}}} \) be any linear transformation in \( \text {co}(\psi^{(i)}) \) expressed as in (28). Then, for all \( \sigma \in {\Sigma^(i)} \), \( \begin{equation*} \phi (\mathbf {x})[\sigma ] = \Big(1-\sum _{\hat{\sigma }\in {\Sigma^(i)} _*,\, \hat{\sigma }\preceq \sigma }\lambda _{\hat{\sigma }}\Big)\mathbf {x}[\sigma ]+\sum _{j^{\prime } \preceq \sigma }\sum _{a^{\prime } \in \mathcal {A}(j^{\prime })}\lambda _{(j^{\prime },a^{\prime })}\,\mathbf {q}_{(j^{\prime },a^{\prime })}[\sigma ]\, \mathbf {x}[(j^{\prime },a^{\prime })]. \end{equation*} \)

Proof.

Fix any trigger sequence \( \hat{\sigma }= (j^{\prime },a^{\prime })\in {\Sigma^(i)} _* \). By expanding the matrix-vector multiplication between \( \boldsymbol{M}{^{(i)}_{\hat\sigma\rightarrow{\hat{q}_{\hat\sigma}}}} \) (Definition 3.2) and \( \mathbf {x} \), we have that for all \( \sigma \in {\Sigma^(i)} \), (31) \( \begin{align} \phi{^{(i)}_{\hat\sigma\rightarrow{q^{\hat\sigma}}}}(\mathbf {x})[\sigma ] & = \mathbf {x}[\sigma ]1\!\!1 [\sigma \not\succeq \hat{\sigma }] + \mathbf {q}_{\hat{\sigma }}[\sigma ] \mathbf {x}[\hat{\sigma }] 1\!\!1 [\sigma \succeq j^{\prime }]. \end{align} \) Therefore, summing (31) over all trigger sequences and using the fact that \( \sum _{\hat{\sigma }\in {\Sigma^(i)} _*}\lambda _{\hat{\sigma }} = 1 \), for all \( \sigma \in {\Sigma^(i)} \), \( \begin{equation*} \phi (\mathbf {x})[\sigma ] = \sum _{\hat{\sigma }\in {\Sigma^(i)} _*}\lambda _{\hat{\sigma }}\,\phi{^{(i)}_{\hat\sigma\rightarrow{q^{\hat\sigma}}}}(\mathbf {x})[\sigma ] = \Big(1-\sum _{\hat{\sigma }\in {\Sigma^(i)} _*,\,\hat{\sigma }\preceq \sigma }\lambda _{\hat{\sigma }}\Big)\mathbf {x}[\sigma ] + \sum _{j^{\prime }\preceq \sigma }\sum _{a^{\prime }\in \mathcal {A}(j^{\prime })}\lambda _{(j^{\prime },a^{\prime })}\,\mathbf {q}_{(j^{\prime },a^{\prime })}[\sigma ]\,\mathbf {x}[(j^{\prime },a^{\prime })], \end{equation*} \)
as we wanted to show.□

Proposition 4.14.

Let \( i\in [n] \) be a player, \( \phi = \sum _{\hat{\sigma }\in {\Sigma^(i)} _*}\lambda _{\hat{\sigma }}\, \phi{^{(i)}_{\hat\sigma\rightarrow{q^{\hat\sigma}}}} \) be a linear transformation in \( \text {co}(\psi^{(i)}) \) expressed as in (28), \( \mathbf {x} \in \mathbb {R}_{\ge 0}^{{\Sigma^(i)} } \) be a J-partial fixed point of \( \phi \), and \( j^*\in {\mathcal{J}^{(i)}} \setminus J \) be an information set all of whose predecessors are in J. Then, \( \text {EXTEND}(\phi ,J,j^*,\mathbf {x}) \), given in Algorithm 3, computes a \( (J\cup \lbrace j^*\rbrace) \)-partial fixed point of \( \phi \) in time upper bounded by \( O(|{\Sigma^(i)} |\,|\mathcal {A}(j^*)| + \textsf { FP}(|\mathcal {A}(j^*)|)) \).

Proof.

We break the proof into four parts. In the first part, we analyze the sum of the entries of vector \( \mathbf {r} \) defined in Line 2 of Algorithm 3. In the second part, we prove that \( \frac{1}{\mathbf {x}[\sigma _p]}\mathbf {W} \in \mathbb {S}^{\mathcal {A}(j^*)} \), as stated in Line 3. In the third part, we show that the output \( \mathbf {x}^{\prime } \) of the algorithm is indeed a \( (J\cup \lbrace j^*\rbrace) \)-partial fixed point of \( \phi \). Finally, in the fourth part, we analyze the computational complexity of the algorithm.

Part 1: sum of the entries of \( \mathbf {r} \). In this first part of the proof, we study the sum of the entries of the vector \( \mathbf {r} \) defined on Line 2 in Algorithm 3. By hypothesis, all predecessors of \( j^* \) are in J. So, because \( \mathbf {x} \) is a J-partial fixed point, the sequence \( \sigma _p := \sigma ^{(i)}(j^*) \) satisfies \( \phi (\mathbf {x})[\sigma _p] = \mathbf {x}[\sigma _p] \) (when \( \sigma _p = \varnothing \), this is the first condition in (30)). Hence, expanding the term \( \phi (\mathbf {x})[\sigma _p] \) using Lemma 4.13, we conclude that \( \begin{equation*} \Big(1-\sum _{\hat{\sigma }\in {\Sigma^(i)} _*,\,\hat{\sigma }\preceq \sigma _p}\lambda _{\hat{\sigma }}\Big)\mathbf {x}[\sigma _p]+\sum _{j^{\prime } \preceq \sigma _p}\sum _{a^{\prime } \in \mathcal {A}(j^{\prime })}\lambda _{(j^{\prime },a^{\prime })}\,\mathbf {q}_{(j^{\prime },a^{\prime })}[\sigma _p]\, \mathbf {x}[(j^{\prime },a^{\prime })] = \mathbf {x}[\sigma _p]. \end{equation*} \) By rearranging terms, we have (32) \( \begin{equation} \Big(\sum _{\hat{\sigma }\in {\Sigma^(i)} _*,\,\hat{\sigma }\preceq \sigma _p}\lambda _{\hat{\sigma }}\Big)\mathbf {x}[\sigma _p] = \sum _{j^{\prime } \preceq \sigma _p}\sum _{a^{\prime } \in \mathcal {A}(j^{\prime })}\lambda _{(j^{\prime },a^{\prime })}\,\mathbf {q}_{(j^{\prime },a^{\prime })}[\sigma _p]\, \mathbf {x}[(j^{\prime },a^{\prime })]. \end{equation} \)

However, since \( \mathbf {q}_{(j^{\prime },a^{\prime })} \in {\mathcal{Q}^{(i)}} _{j^{\prime }} \) for all \( j^{\prime }\preceq \sigma _p, a^{\prime }\in \mathcal {A}(j^{\prime }) \), the sequence-form (probability-mass-conservation) constraints (2) imply that \( \begin{equation*} \mathbf {q}_{(j^{\prime },a^{\prime })}[\sigma _p] = \sum _{a \in \mathcal {A}(j^*)} \mathbf {q}_{(j^{\prime },a^{\prime })}[(j^*,a)] \qquad \forall \ j^{\prime }\preceq \sigma _p,\ a^{\prime }\in \mathcal {A}(j^{\prime }). \end{equation*} \) Hence, plugging the previous equality into (32), we obtain \( \begin{align*} \Big(\sum _{\hat{\sigma }\in {\Sigma^(i)} _*,\,\hat{\sigma }\preceq \sigma _p}\lambda _{\hat{\sigma }}\Big)\mathbf {x}[\sigma _p] & = \sum _{j^{\prime } \preceq \sigma _p}\sum _{a^{\prime } \in \mathcal {A}(j^{\prime })}\sum _{a\in \mathcal {A}(j^*)}\lambda _{(j^{\prime },a^{\prime })}\,\mathbf {q}_{(j^{\prime },a^{\prime })}[(j^*,a)]\, \mathbf {x}[(j^{\prime },a^{\prime })] \\ & = \sum _{a\in \mathcal {A}(j^*)} \Big(\sum _{j^{\prime } \preceq \sigma _p}\sum _{a^{\prime } \in \mathcal {A}(j^{\prime })}\lambda _{(j^{\prime },a^{\prime })}\,\mathbf {q}_{(j^{\prime },a^{\prime })}[(j^*,a)]\, \mathbf {x}[(j^{\prime },a^{\prime })] \Big) \\ & = \sum _{a\in \mathcal {A}(j^*)} \mathbf {r}[a], \end{align*} \) where the last equality follows from recognizing the definition of \( \mathbf {r} \) on Line 2 of Algorithm 3. So, in conclusion, (33) \( \begin{equation} \sum _{a\in \mathcal {A}(j^*)} \mathbf {r}[a] = \Big(\sum _{\hat{\sigma }\in {\Sigma^(i)} _*,\,\hat{\sigma }\preceq \sigma _p}\lambda _{\hat{\sigma }}\Big)\mathbf {x}[\sigma _p]. \end{equation} \)

Part 2: \( \mathbf {W} \) belongs to \( \mathbf {x}[\sigma _p]\cdot \mathbb {S}^{\mathcal {A}(j^*)} \). In this second part of the proof, we will prove that all columns of the nonnegative matrix \( \mathbf {W} \), defined on Line 3 of Algorithm 3, sum to the same value \( \mathbf {x}[\sigma _p] \). Fix any \( a_c \in \mathcal {A}(j^*) \). Using the definition of \( \mathbf {W} \), the sum of the entries in the column of \( \mathbf {W} \) corresponding to action \( a_c \) is \( \begin{align*} \sum _{a_r\in \mathcal {A}(j^*)} \mathbf {W}[a_r, a_c] & = \sum _{a_r\in \mathcal {A}(j^*)}\Big[\mathbf {r}[a_r] + \Big(\lambda _{(j^*,a_c)}\mathbf {q}_{(j^*,a_c)}[(j^*, a_r)] + \Big(1- \sum _{\hat{\sigma }\in {\Sigma^(i)} _*,\,\hat{\sigma }\preceq (j^*, a_c)}\lambda _{\hat{\sigma }}\Big)\,1\!\!1 [a_r = a_c]\Big)\,\mathbf {x}[\sigma _p]\Big] \\ & = \Big(\sum _{\hat{\sigma }\in {\Sigma^(i)} _*,\,\hat{\sigma }\preceq \sigma _p}\lambda _{\hat{\sigma }}\Big)\mathbf {x}[\sigma _p] + \mathbf {x}[\sigma _p]\,\lambda _{(j^*,a_c)}\Big(\sum _{a_r\in \mathcal {A}(j^*)}\mathbf {q}_{(j^*,a_c)}[(j^*, a_r)]\Big) + \Big(1- \sum _{\hat{\sigma }\in {\Sigma^(i)} _*,\,\hat{\sigma }\preceq (j^*,a_c)}\lambda _{\hat{\sigma }}\Big)\,\mathbf {x}[\sigma _p] \\ & = \Big(\sum _{\hat{\sigma }\in {\Sigma^(i)} _*,\,\hat{\sigma }\preceq \sigma _p}\lambda _{\hat{\sigma }}\Big)\mathbf {x}[\sigma _p] + \mathbf {x}[\sigma _p]\,\lambda _{(j^*,a_c)} + \Big(1- \sum _{\hat{\sigma }\in {\Sigma^(i)} _*,\,\hat{\sigma }\preceq (j^*, a_c)}\lambda _{\hat{\sigma }}\Big)\,\mathbf {x}[\sigma _p], \end{align*} \) where we used (33) in the second equality, and the fact that \( \sum _{a_r\in \mathcal {A}(j^*)}\mathbf {q}_{(j^*,a_c)}[(j^*, a_r)] = 1 \), since \( \mathbf {q}_{(j^*,a_c)}\in {\mathcal{Q}^{(i)}} _{j^*} \) (Definition 2.3), in the third. Using the fact that the set of all predecessors of sequence \( (j^*,a_c) \) is the union between all predecessors of the parent sequence \( \sigma _p \) and \( \lbrace (j^*,a_c)\rbrace \) itself, after rearranging terms, we can write \( \begin{align*} \sum _{a_r\in \mathcal {A}(j^*)} \mathbf {W}[a_r, a_c] & = \Big(\sum _{\hat{\sigma }\in {\Sigma^(i)} _*,\,\hat{\sigma }\preceq \sigma _p}\lambda _{\hat{\sigma }}\Big)\mathbf {x}[\sigma _p] + \mathbf {x}[\sigma _p]\,\lambda _{(j^*,a_c)} + \Big(1- \sum _{\hat{\sigma }\in {\Sigma^(i)} _*,\,\hat{\sigma }\preceq (j^*, a_c)}\lambda _{\hat{\sigma }}\Big)\,\mathbf {x}[\sigma _p] \\ & = \mathbf {x}[\sigma _p]\,\Big(1+\lambda _{(j^*, a_c)} + \sum _{\hat{\sigma }\in {\Sigma^(i)} _*,\,\hat{\sigma }\preceq \sigma _p}\lambda _{\hat{\sigma }} - \sum _{\hat{\sigma }\in {\Sigma^(i)} _*,\,\hat{\sigma }\preceq (j^*, a_c)}\lambda _{\hat{\sigma }}\Big) \\ & = \mathbf {x}[\sigma _p]. \end{align*} \) So, all columns of the nonnegative matrix \( \mathbf {W} \) sum to the same nonnegative quantity \( \mathbf {x}[\sigma _p] \) and therefore \( \mathbf {W} \in \mathbf {x}[\sigma _p]\cdot \mathbb {S}^{\mathcal {A}(j^*)} \).

Part 3: \( \mathbf {x}^{\prime } \) is a \( (J\cup \lbrace j^*\rbrace) \)-partial fixed point of \( \phi \). We start by arguing that \( \mathbf {x}^{\prime } \) satisfies the sequence-form constraints (29) for all \( j\in J\cup \lbrace j^*\rbrace \). The crucial observation is that Algorithm 3 only modifies the indices corresponding to sequences \( (j^*, a) \) for \( a \in \mathcal {A}(j^*) \) and keeps all other entries unmodified. In particular, (34) \( \begin{equation} \mathbf {x}^{\prime }[(j,a)] = \mathbf {x}[(j,a)]\qquad \quad \forall \ j\in J, a\in \mathcal {A}(j). \end{equation} \) Furthermore, because J is a trunk for Player i, the above equation implies that \( \begin{equation*} \mathbf {x}^{\prime }[\sigma ^{(i)}(j)] = \mathbf {x}[\sigma ^{(i)}(j)] \quad \quad \forall \ j\in J. \end{equation*} \) Hence, using the hypothesis that \( \mathbf {x} \) is a J-partial fixed point of \( \phi \) at the beginning of the call, we immediately conclude that the constraints (29) corresponding to \( j\in J \) hold for vector \( \mathbf {x}^{\prime } \), and the only condition that remains to be verified is that (35) \( \begin{equation} \mathbf {x}^{\prime }[\sigma _p] = \sum _{a\in \mathcal {A}(j^*)} \mathbf {x}^{\prime }[(j^*, a)]. \end{equation} \) If \( \mathbf {x}[\sigma _p] = 0 \), then all entries \( \mathbf {x}^{\prime }[(j^*, a)] \) are set to 0 (Line 5) and so (35) is trivially satisfied. However, if \( \mathbf {x}[\sigma _p] \ne 0 \), then \( \mathbf {x}^{\prime }[(j^*,a)] \) is set to the value \( \mathbf {x}[\sigma _p]\, \mathbf {b}[a] \) (Line 8), and, since \( \mathbf {b} \) belongs to the simplex \( \Delta ^{\mathcal {A}(j^*)} \), (35) holds in this case, too. So, \( \mathbf {x}^{\prime } \) satisfies (29) for all \( j \in J\cup \lbrace j^*\rbrace , \) as we wanted to show.

We now turn our attention to conditions (30). From Lemma 4.13 it follows that \( \phi (\mathbf {x})[\sigma ] \) only depends on the values of \( \mathbf {x}[(j^{\prime },a^{\prime })] \) for \( j^{\prime } \preceq \sigma , a^{\prime }\in \mathcal {A}(j^{\prime }) \). So, from (34) it follows that \( \begin{equation*} \phi (\mathbf {x}^{\prime })[(j,a)] = \mathbf {x}[(j,a)] = \mathbf {x}^{\prime }[(j,a)]\quad \qquad \forall \ j\in J, a\in \mathcal {A}(j), \end{equation*} \) and the only condition that remains to be verified is that (36) \( \begin{equation} \phi (\mathbf {x}^{\prime })[(j^*,a^*)] = \mathbf {x}^{\prime }[(j^*, a^*)] \qquad \quad \forall \ a^*\in \mathcal {A}(j^*). \end{equation} \) Fix any \( a^*\in \mathcal {A}(j^*) \). We break the analysis into two cases.

  • If \( \mathbf {x}[\sigma _p] = 0 \), then \( \mathbf {w} = \mathbf {0} \) (Line 5) and therefore \( \mathbf {x}^{\prime }[(j^*,a^*)] = 0 \). Hence, to show that (36) holds, we need to show that \( \phi (\mathbf {x}^{\prime })[(j^*,a^*)]=0 \). To show that, we start from applying Lemma 4.13: \( \begin{align*} \phi (\mathbf {x}^{\prime })[(j^*,a^*)] & = \sum _{j^{\prime } \preceq (j^*,a^*)}\sum _{a^{\prime } \in \mathcal {A}(j^{\prime })}\lambda _{(j^{\prime },a^{\prime })}\,\mathbf {q}_{(j^{\prime },a^{\prime })}[(j^*,a^*)]\, \mathbf {x}^{\prime }[(j^{\prime },a^{\prime })]. \end{align*} \) Now, using the fact that \( \lbrace j^{\prime }\in {\mathcal{J}^{(i)}} : j^{\prime }\preceq (j^*, a^*)\rbrace \) is equal to the disjoint union \( \lbrace j^{\prime }\in {\mathcal{J}^{(i)}} : j^{\prime } \preceq \sigma _p\rbrace \cup \lbrace j^*\rbrace \), and that \( \mathbf {x}^{\prime }[(j^*, a^{\prime })] = 0 \) for all \( a^{\prime }\in \mathcal {A}(j^*) \), we have (37) \( \begin{align} \phi (\mathbf {x}^{\prime })[(j^*,a^*)] & = \sum _{j^{\prime } \preceq \sigma _p}\sum _{a^{\prime } \in \mathcal {A}(j^{\prime })}\lambda _{(j^{\prime },a^{\prime })}\,\mathbf {q}_{(j^{\prime },a^{\prime })}[(j^*,a^*)]\, \mathbf {x}^{\prime }[(j^{\prime },a^{\prime })]. \end{align} \) Since \( \mathbf {q}_{(j^{\prime },a^{\prime })} \in {\mathcal{Q}^{(i)}} _{j^{\prime }} \) is a nonnegative vector, from Definition 2.3, it follows that (38) \( \begin{equation} \mathbf {q}_{(j^{\prime },a^{\prime })}[\sigma _p] = \sum _{a \in \mathcal {A}(j^*)}\mathbf {q}_{(j^{\prime },a^{\prime })}[(j^*,a)] \ge \mathbf {q}_{(j^{\prime },a^{\prime })}[(j^*,a^*)]. \end{equation} \) Hence, substituting (38) into (37), \( \begin{align*} \phi (\mathbf {x}^{\prime })[(j^*,a^*)] & \le \sum _{j^{\prime } \preceq \sigma _p}\sum _{a^{\prime } \in \mathcal {A}(j^{\prime })}\lambda _{(j^{\prime },a^{\prime })}\,\mathbf {q}_{(j^{\prime },a^{\prime })}[\sigma _p]\, \mathbf {x}^{\prime }[(j^{\prime },a^{\prime })] \\ & = \phi (\mathbf {x}^{\prime })[\sigma _p] = \mathbf {x}^{\prime }[\sigma _p] = 0, \end{align*} \) where the first equality follows again from Lemma 4.13, and the second equality follows from the inductive hypothesis that \( \mathbf {x}^{\prime } \) is a J-partial fixed point of \( \phi \). Since \( \mathbf {x}^{\prime } \) is a nonnegative vector and \( \phi \) maps nonnegative vectors to nonnegative vectors, we conclude that \( \phi (\mathbf {x}^{\prime })[(j^*, a^*)] = 0, \) as we wanted to show.

  • If \( \mathbf {x}[\sigma _p] \ne 0 \), then \( \mathbf {b} \) is a fixed point of the stochastic matrix \( \frac{1}{\mathbf {x}[\sigma _p]}\mathbf {W} \), and therefore it satisfies \( \begin{equation*} \sum _{a_c \in \mathcal {A}(j^*)} \mathbf {W}[a^*, a_c]\, \mathbf {b}[a_c] = \mathbf {x}[\sigma _p]\,\mathbf {b}[a^*]. \end{equation*} \) Hence, by using the fact that \( \mathbf {x}^{\prime }[(j^*,a^*)]= \mathbf {x}[\sigma _p]\,\mathbf {b}[a^*] \) (Line 11), we can write \( \begin{equation*} \mathbf {x}^{\prime }[(j^*,a^*)] = \sum _{a_c \in \mathcal {A}(j^*)} \mathbf {W}[a^*, a_c]\, \mathbf {b}[a_c]. \end{equation*} \) By expanding the definition of \( \mathbf {W}[a^*, a_c] \) (Line 3) on the right-hand side, \( \begin{align*} \mathbf {x}^{\prime }[(j^*,a^*)] & = \sum _{a_c\in \mathcal {A}(j^*)}\Big[\mathbf {r}[a^*] + \Big(\lambda _{(j^*,a_c)}\mathbf {q}_{(j^*,a_c)}[(j^*, a^*)] + \Big(1- \sum _{\hat{\sigma }\in {\Sigma^(i)} _*,\,\hat{\sigma }\preceq (j^*, a_c)}\lambda _{\hat{\sigma }}\Big)\,1\!\!1 [a^* = a_c]\Big)\,\mathbf {x}[\sigma _p]\Big]\mathbf {b}[a_c] \\ & = \mathbf {r}[a^*] + \Big(1- \sum _{\hat{\sigma }\in {\Sigma^(i)} _*,\,\hat{\sigma }\preceq (j^*, a^*)}\lambda _{\hat{\sigma }}\Big)\mathbf {x}^{\prime }[(j^*,a^*)] + \sum _{a_c\in \mathcal {A}(j^*)}\lambda _{(j^*,a_c)}\mathbf {q}_{(j^*,a_c)}[(j^*, a^*)]\mathbf {x}^{\prime }[(j^*,a_c)], \end{align*} \) where in the second equality we used the fact that \( \mathbf {b}\in \Delta ^{\mathcal {A}(j^*)} \), and the fact that \( \mathbf {x}^{\prime }[(j^*,a)] = \mathbf {x}[\sigma _p]\,\mathbf {b}[a] \) for all \( a\in \mathcal {A}(j^*) \) (Line 11). Expanding the definition of \( \mathbf {r} \) (Line 2), \( \begin{align*} \mathbf {x}^{\prime }[(j^*,a^*)] & = \sum _{j^{\prime }\preceq \sigma _p}\sum _{a^{\prime }\in \mathcal {A}(j^{\prime })} \lambda _{(j^{\prime },a^{\prime })}\,\mathbf {q}_{(j^{\prime },a^{\prime })}[(j^*, a^*)]\,\mathbf {x}[(j^{\prime },a^{\prime })] + \Big(1- \sum _{\hat{\sigma }\in {\Sigma^(i)} _*,\,\hat{\sigma }\preceq (j^*, a^*)}\lambda _{\hat{\sigma }}\Big)\mathbf {x}^{\prime }[(j^*,a^*)] \\ & \quad \ + \sum _{a_c\in \mathcal {A}(j^*)}\lambda _{(j^*,a_c)}\mathbf {q}_{(j^*,a_c)}[(j^*, a^*)]\mathbf {x}^{\prime }[(j^*,a_c)] \\ & = \Big(1-\sum _{\hat{\sigma }\in {\Sigma^(i)} _*,\,\hat{\sigma }\preceq (j^*,a^*)}\lambda _{\hat{\sigma }}\Big)\mathbf {x}^{\prime }[(j^*,a^*)]+\sum _{j^{\prime } \preceq (j^*,a^*)}\sum _{a^{\prime } \in \mathcal {A}(j^{\prime })}\lambda _{(j^{\prime },a^{\prime })}\,\mathbf {q}_{(j^{\prime },a^{\prime })}[(j^*,a^*)]\, \mathbf {x}^{\prime }[(j^{\prime },a^{\prime })] \\ & = \phi (\mathbf {x}^{\prime })[(j^*, a^*)], \end{align*} \) where we used the fact that \( \lbrace j^{\prime }\in {\mathcal{J}^{(i)}} : j^{\prime }\preceq (j^*, a^*)\rbrace \) is equal to the disjoint union \( \lbrace j^{\prime }\in {\mathcal{J}^{(i)}} : j^{\prime } \preceq \sigma _p\rbrace \cup \lbrace j^*\rbrace \) in the second equality, and Lemma 4.13 in the third equality.

Part 4: Complexity analysis. In this part, we bound the number of operations required by Algorithm 3.

  • Line 2: Each entry \( \mathbf {r}[a] \) can be trivially computed in \( O(|{\Sigma^(i)} |) \) time by traversing all predecessors of \( j^* \). So, the vector \( \mathbf {r} \) requires \( O(|{\Sigma^(i)} |\,|\mathcal {A}(j^*)|) \) operations to be computed.

  • Line 3: If \( a_r = a_c \), then the number of operations required to compute \( \mathbf {W}[a_r,a_c] \) is dominated by the computation of \( \sum _{\hat{\sigma }\preceq (j^*,a_c)} \lambda _{\hat{\sigma }} \), which requires \( O(|{\Sigma^(i)} |) \) operations. Otherwise, if \( a_r \ne a_c \), then the computation of \( \mathbf {W}[a_r,a_c] \) can be carried out in a constant number of operations. Hence, the computation of \( \mathbf {W}[a_r,a_c] \) for all \( a_r,a_c\in \mathcal {A}(j^*) \) requires \( O(|{\Sigma^(i)} |\,|\mathcal {A}(j^*)| + |\mathcal {A}(j^*)|^2) \) time. Since \( |\mathcal {A}(j^*)|\le |{\Sigma^(i)} | \), the total number of operations required to compute all entries of \( \mathbf {W} \) is \( O(|{\Sigma^(i)} |\,|\mathcal {A}(j^*)|) \).

  • Lines 4 to 8: If \( \mathbf {x}[\sigma _p]=0 \), then the computation of \( \mathbf {w} \) requires \( O(|\mathcal {A}(j^*)|) \) operations. If, however, \( \mathbf {x}[\sigma _p] \ne 0 \), then the computation of \( \mathbf {w} \) requires \( O(\textsf {FP}(|\mathcal {A}(j^*)|) + |\mathcal {A}(j^*)|) \) operations. Since clearly any fixed point oracle for a square matrix of order \( |\mathcal {A}(j^*)| \) needs to spend at least \( \Omega (|\mathcal {A}(j^*)|) \) time writing the output, \( O(\textsf {FP}(|\mathcal {A}(j^*)|) + |\mathcal {A}(j^*)|)=O(\textsf {FP}(|\mathcal {A}(j^*)|)) \). So, no matter the value of \( \mathbf {x}[\sigma _p] \), the number of operations is bounded by \( O(\textsf {FP}(|\mathcal {A}(j^*)|)) \).

  • Line 11: Finally, the algorithm spends \( O(|\mathcal {A}(j^*)|) \) operations to set entries of \( \mathbf {x} \).

Summing the number of operations of each of the different steps of the algorithm, we conclude that each call to \( \text {EXTEND}(\phi ,J,j^*,\mathbf {x}) \) requires at most \( O(|{\Sigma^(i)} |\,|\mathcal {A}(j^*)| + \textsf {FP}(|\mathcal {A}(j^*)|)) \) operations.□
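The following Python sketch mirrors the steps of Extend as they are referenced in the proof above (Lines 1, 2, 3, 5, 8, and 11); it is a re-implementation under illustrative data structures, not Algorithm 3 verbatim. Here `lam[t]` and `cont[t]` give the weight \( \lambda _{\hat{\sigma }} \) and the continuation strategy \( \mathbf {q}_{\hat{\sigma }} \) (as a dictionary) of each trigger sequence t in the support of the decomposition (28), `triggers_below(sigma)` returns the support trigger sequences (j', a') whose information set j' precedes sigma, `seqs_preceding(sigma)` returns the support trigger sequences \( \hat{\sigma }\preceq \sigma \), and `fp_oracle` is the oracle of Assumption 1.

import numpy as np

def extend(lam, cont, j_star, x, actions, parent_seq,
           triggers_below, seqs_preceding, fp_oracle):
    A = actions[j_star]
    sigma_p = parent_seq[j_star]                                    # Line 1
    # Line 2: mass pushed onto (j*, a) by triggers at predecessor information sets.
    r = {a: sum(lam[t] * cont[t].get((j_star, a), 0.0) * x[t]
                for t in triggers_below(sigma_p))
         for a in A}
    # Line 3: the matrix W; every column sums to x[sigma_p] (Part 2 of the proof).
    W = np.zeros((len(A), len(A)))
    for ci, ac in enumerate(A):
        slack = 1.0 - sum(lam[t] for t in seqs_preceding((j_star, ac)))
        for ri, ar in enumerate(A):
            W[ri, ci] = r[ar] + (lam.get((j_star, ac), 0.0)
                                 * cont.get((j_star, ac), {}).get((j_star, ar), 0.0)
                                 + (slack if ar == ac else 0.0)) * x[sigma_p]
    x_new = dict(x)
    if x[sigma_p] == 0.0:
        b = np.zeros(len(A))                                        # Line 5
    else:
        b = fp_oracle(W / x[sigma_p])          # fixed point of a stochastic matrix
    for ai, a in enumerate(A):                                      # Lines 8 and 11
        x_new[(j_star, a)] = x[sigma_p] * b[ai]
    return x_new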

A fixed point for \( \phi \in \text {co}(\psi^{(i)}) \) can be computed by repeatedly invoking \( \text {EXTEND} \) to grow the trunk J one information set at a time, until \( J = {\mathcal{J}^{(i)}} \), starting from the \( \emptyset \)-partial fixed point \( \mathbf {x}_0 \in \mathbb {R}_{\ge 0}^{{\Sigma^(i)} } \) introduced in Lemma 4.11. This leads to Algorithm 4, whose correctness and computational complexity is a straightforward corollary of Proposition 4.14.
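A minimal sketch of this driver loop (cf. Algorithm 4), again under illustrative interfaces: `infosets_topdown` lists \( {\mathcal{J}^{(i)}} \) in an order where predecessors come first (footnote 13), `empty_fixed_point` is the vector \( \mathbf {x}_0 \) of Lemma 4.11, and `extend_fn(j, x)` applies Extend at information set j with the transformation \( \phi \) fixed.

def fixed_point(infosets_topdown, empty_fixed_point, extend_fn):
    x = dict(empty_fixed_point)      # all zeros except for the empty sequence
    for j in infosets_topdown:       # grow the trunk one information set at a time
        x = extend_fn(j, x)
    return x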

Corollary 4.15.

Let \( i\in [n] \) be a player, and let \( \phi = \sum _{\hat{\sigma }\in {\Sigma^(i)} _*} \lambda _{\hat{\sigma }}\phi{^{(i)}_{\hat\sigma\rightarrow{q^{\hat\sigma}}}} \) be a transformation in \( \text {co}(\psi^{(i)}) \) expressed as in (28). Then, Algorithm 4 computes a fixed point \( {\mathcal{Q}^{(i)}} \ni \mathbf {q}=\phi (\mathbf {q}) \) in time upper bounded as \( O(|{\Sigma^(i)} |^2 + \sum _{j\in {\mathcal{J}^{(i)}} }\textsf { FP}(|\mathcal {A}(j)|)) \).

4.5 The Complete Algorithm

In this subsection, we put together all the pieces we constructed in the previous subsections to build a \( \psi^{(i)} \)-regret minimizer that satisfies all requirements of Problem 1.

First, we provide in Algorithm 5 our \( \text{co}(\Psi^{(i)}) \)-regret minimizer. Its correctness follows from the correctness of the construction by Gordon et al. [20], described in Section 2.4, and by using Theorem 4.6 and Corollary 4.15.

Theorem 4.16.

Let \( i \in [n] \) be any player. \( \bar{\mathcal {R}}^{(i)} \), defined in Algorithm 5, is a \( \text{co}(\Psi^{(i)}) \)-regret minimizer for the set of sequence-form strategies \( {\mathcal{Q}^{(i)}} \), whose cumulative regret upon observing linear utility functions \( \ell ^1,\dots ,\ell ^T \) satisfies \( \begin{equation*} R^T \le 2D |{\Sigma^(i)} | \sqrt {T}, \end{equation*} \) where D is any constant such that \( \max _{\mathbf {q},\mathbf {q}^{\prime }} \lbrace \ell ^t(\mathbf {q})-\ell ^t(\mathbf {q}^{\prime })\rbrace \le D \) for all \( t=1,\dots , T \). Furthermore, the ObserveUtility operation requires time \( O(|{\Sigma^(i)} |^2) \), and the NextElement operation requires time \( O(|{\Sigma^(i)} |^2 + \sum _{j\in {\mathcal{J}^{(i)}} }\textsf { FP}(|\mathcal {A}(j)|)) \) at all t.

Proof.

From the properties of Gordon et al.’s construction [20] (Section 2.4), the cumulative \( \psi^{(i)} \)-regret incurred by \( \bar{\mathcal {R}}^{(i)} \) is equal, at all times, to the cumulative regret incurred by the underlying regret minimizer \( \tilde{\mathcal {R}}^{(i)} \) for the set of deviations \( \psi^{(i)} \). So, the regret bound follows from the regret analysis of Theorem 4.6.

Similarly, the complexity analysis follows from combining the analysis of \( \tilde{\mathcal {R}}^{(i)} \) and of FixedPoint (Algorithm 4), together with the observation that the canonical representation \( \langle L^t \rangle \) of the linear utility function \( \text {co}(\psi^{(i)})\ni \phi \mapsto \ell ^t(\phi (\mathbf {q}^t)) \) is the matrix \( \langle \ell ^t\rangle (\mathbf {q}^t)^\top \), which can be trivially computed in \( O(|{\Sigma^(i)} |^2) \) time.

Since \( \text {co}(\psi^{(i)}) \supseteq \psi^{(i)} \), Algorithm 5 is in particular also a \( \psi^{(i)} \)-regret minimizer for the set \( {\mathcal{Q}^{(i)}} \), and thus Theorem 4.16 establishes that Algorithm 5 provides a solution to Problem 2.

To obtain our \( \psi^{(i)} \)-regret minimizer for the set of pure sequence-form strategies \( {\Pi^{(i)}} \) from Algorithm 5, we apply the construction described in Section 4.2. The resulting regret minimizer, \( \mathcal {R}^{(i)} \), is given in Algorithm 6. Applying Lemma 4.1 immediately yields the following corollary:

Corollary 4.17.

Let \( i \in [n] \) be any player. \( \mathcal {R}^{(i)} \), defined in Algorithm 6, is a \( \psi^{(i)} \)-regret minimizer for the set \( {\Pi^{(i)}} \), whose cumulative regret \( R^T \) upon observing linear utility functions \( \ell ^1,\dots ,\ell ^T \) satisfies \( \begin{equation*} R^T \le 2D |{\Sigma^(i)} | \sqrt {T} + {4}D \sqrt {T|{\Sigma^(i)} |\log (1/\delta)}\text{ with probability at least } 1-\delta , \end{equation*} \) for any \( \delta \in (0,1) \), where D is any constant such that \( \max _{\mathbf {q},\mathbf {q}^{\prime }}\lbrace \ell ^t(\mathbf {q})-\ell ^t(\mathbf {q}^{\prime })\rbrace \le D \) for all \( t=1,\dots , T \). Furthermore, the ObserveUtility operation runs in \( O(|{\Sigma^(i)} |^2) \) time, and the NextElement operation runs in \( O(|{\Sigma^(i)} |^2 + \sum _{j\in {\mathcal{J}^{(i)}} }\textsf { FP}(|\mathcal {A}(j)|)) \) time at all t.

Therefore, Algorithm 6 provides a regret minimizer that satisfies all requirements of Problem 1.
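Algorithm 6 needs to sample, at each iteration, a pure sequence-form strategy whose expectation is the mixed sequence-form strategy produced by Algorithm 5 (cf. Lemma 4.1 and footnote 14). The following Python sketch shows one such unbiased top-down sampling scheme under illustrative data structures; it is meant only to convey the idea, not to reproduce the scheme of Section 2.2 verbatim.

import random

def sample_pure_strategy(q, infosets_topdown, parent_seq, actions):
    pi = {s: 0.0 for s in q}
    pi["empty"] = 1.0
    for j in infosets_topdown:
        if pi[parent_seq[j]] == 0.0:
            continue                 # j is unreachable under the choices sampled so far
        # Sample action a with probability q[(j, a)] / q[parent_seq[j]]
        # (random.choices normalizes the weights automatically).
        weights = [q[(j, a)] for a in actions[j]]
        a = random.choices(actions[j], weights=weights)[0]
        pi[(j, a)] = 1.0
    return pi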


5 CONVERGENCE TO EFCE

Theorem 3.8 implies that if all players \( i\in [n] \) play the game repeatedly according to the outputs of a \( \psi^{(i)} \)-regret minimizer for \( {\Pi^{(i)}} \) that observes, at each time t, the linear utility function given in (10), then the empirical frequency of play is a \( (\frac{1}{T}\max _i R^{(i),\,T}) \)-EFCE, where \( R^{(i),\,T} \) is the regret cumulated by the \( \psi^{(i)} \)-regret minimizer for Player i.
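A minimal Python sketch of these uncoupled dynamics, with purely illustrative interfaces (`next_element`, `observe_utility`, and `utility_for` are stand-ins; the latter is assumed to build the linear utility function of (10) for player i from the sampled joint profile, and pure strategies are assumed hashable):

def self_play(players, T, utility_for):
    counts = {}                                   # joint pure profile -> count
    for t in range(T):
        profile = tuple(p.next_element() for p in players)
        counts[profile] = counts.get(profile, 0) + 1
        for i, p in enumerate(players):
            p.observe_utility(utility_for(i, profile))
    # Empirical frequency of play after T repetitions: the candidate EFCE.
    return {profile: c / T for profile, c in counts.items()}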

In particular, when all players play according to the strategies recommended by Algorithm 6, the following can be shown by combining Corollary 4.17 and Theorem 3.8:

Theorem 5.1.

When all players \( i = 1, \dots , n \) play according to the outputs of the regret minimizer \( \mathcal {R}^{(i)} \) defined in Algorithm 6, receiving as feedback at all times t the linear utility functions \( \ell ^{(i),\,t} \) defined in (10), the empirical frequency of play after T repetitions of the game is a \( \begin{equation*} \Big(D\,\frac{2|\mathcal {H}| + 4\sqrt {|\mathcal {H}|\log (n/\delta)}}{\sqrt {T}}\Big)\text{-EFCE with probability at least } 1-\delta , \end{equation*} \) for any \( \delta \in (0,1) \), where D is the difference between the maximum and minimum payoff of the game, and \( |\mathcal {H}| \) is the number of nodes in the game tree.

Proof.

Let \( R^{(i),\,T} \) be the regret cumulated by \( \mathcal {R}^{(i)} \) (Algorithm 6) up to time T. From Corollary 4.17, we have that for all \( \delta ^{\prime }\in (0,1) \), \( \begin{align*} \mathbb {P}\Big[R^{(i),\,T} \le 2D |\mathcal {H}| \sqrt {T} + 4D \sqrt {T |\mathcal {H}|\log (1/\delta ^{\prime })}\Big] & \ge \mathbb {P}\Big[R^{(i),\,T} \le 2D |{\Sigma^(i)} | \sqrt {T} + 4D \sqrt {T|{\Sigma^(i)} |\log (1/\delta ^{\prime })}\Big] \\ & \ge 1-\delta ^{\prime }, \end{align*} \) where the first inequality follows from the fact that \( |{\Sigma^(i)} | = \sum _{j\in {\mathcal{J}^{(i)}} }|\mathcal {A}(j)| \le \sum _{h\in \mathcal {H}} |\mathcal {A}(h)| \le |\mathcal {H}| \) (the number of edges in a tree is always less than the number of nodes). So, \( \begin{align*} \mathbb {P}\Big[\max _i R^{(i),\,T} \le 2D |\mathcal {H}| \sqrt {T} + 4D \sqrt {T|\mathcal {H}|\log (1/\delta ^{\prime })}\Big] & = \mathbb {P}\Big[\bigcap _i \big\lbrace R^{(i),\,T} \le 2D |\mathcal {H}| \sqrt {T} + 4D \sqrt {T|\mathcal {H}|\log (1/\delta ^{\prime })} \big\rbrace \Big] \\ & \ge 1-n\delta ^{\prime }, \end{align*} \) where the inequality follows from the union bound. Substituting \( \delta := n\delta ^{\prime } \) and using Theorem 3.8 yields the result.□

A standard application of the Borel-Cantelli lemma enables us to move from the high-probability guarantees at finite time of Theorem 5.1 to almost-sure guarantees in the limit.

Corollary 5.2.

When all players \( i = 1, \dots , n \) play infinitely many repetitions of the game according to the outputs of the regret minimizer \( \mathcal {R}^{(i)} \) defined in Algorithm 6, receiving as feedback at all times t the linear utility functions \( \ell ^{(i),\,t} \) defined in (10), the empirical frequency of play converges, almost surely, to the set of EFCEs.


ACKNOWLEDGMENTS

We thank the anonymous reviewers for their useful suggestions. We thank Dustin Morrill, Marc Lanctot, Amy Greenwald, and Mike Bowling for a useful discussion about behavioral deviation functions, and for pointing out an incorrect statement related to their recent framework [36] in a preliminary version of this article posted on arXiv. We are also grateful to the anonymous reviewers at NeurIPS 2020, where a preliminary version of this article appeared, for their useful comments.

Footnotes

  1. In normal-form games, a CE can be computed in polynomial time via linear programming. In extensive-form games, the computational complexity of computing a CE depends on the specific notion of correlation that is adopted. As discussed in more detail in the following, the problem can be solved in polynomial time for the notion studied in this article.

  2. The term “subtree” does not refer to a subtree of the game tree, but rather to a subtree of the partially ordered set \( ({\mathcal{J}^{(i)}} ,\prec) \). In other words, the term subtree here refers to the fact that the quantities are specified only at information set j and all of its descendants.

  3. Sequence-form strategies are also known under the term realization plans in the literature (e.g., Reference [47]). We will not use that latter term in this article.

  4. Throughout the article, we use the term linear and affine when referring to a function f on a domain \( \mathcal {X} \) to mean that f extends to a linear or affine function in the Euclidean space that contains \( \mathcal {X} \).

  5. More precisely, throughout the article, we make the following, standard technical assumptions about the way randomness can be leveraged by the regret minimizer and by the environment it interacts with:

    • At all t, the regret minimizer has access to a private source of randomness, which we model as a random vector with finite mean \( \mathbf {S}^t \). Similarly, the environment has access to a private source of randomness, which we model again as a random vector with finite mean \( \mathbf {E}^t \). All sources of randomness are independent, that is, \( \lbrace \mathbf {S}^1, \mathbf {E}^1, \mathbf {S}^2, \mathbf {E}^2, \dots \rbrace \) are independent random variables.

    • At all t, the output \( \mathbf {x}^t\in \mathcal {X} \) of the regret minimizer is a function of the past outputs \( \mathbf {x}^1,\dots ,\mathbf {x}^{t-1} \) and their corresponding feedbacks \( \ell ^1, \dots , \ell ^{t-1} \), as well as on the random outcome \( \mathbf {S}^t \). So, \( \mathbf {x}^t \) is measurable with respect to the \( \sigma \)-algebra generated by \( \lbrace \mathbf {E}^1, \dots ,\mathbf {E}^{t-1}, \mathbf {S}^1, \dots , \mathbf {S}^t\rbrace \).

    • At all t, the linear utility function \( \ell ^t \) constructed by the environment is a function of the past outputs \( \mathbf {x}^1, \dots , \mathbf {x}^{t-1} \) and their corresponding feedback \( \ell ^1,\dots ,\ell ^{t-1} \), as well as of the random outcome \( \mathbf {E}^{t} \). So, \( \ell ^t \) is measurable with respect to the \( \sigma \)-algebra generated by \( \lbrace \mathbf {E}^1, \dots ,\mathbf {E}^{t}, \mathbf {S}^1, \dots , \mathbf {S}^{t-1}\rbrace \).

    • Consequently, at all t, \( \ell ^t \) is conditionally independent of \( \mathbf {x}^t \), given the \( \sigma \)-algebra \( \mathcal {F}^{t-1} := \sigma (\mathbf {E}^1, \dots , \mathbf {E}^{t-1}, \mathbf {S}^1, \dots , \mathbf {S}^{t-1}) \) generated by all past random outcomes.

  6. The assumption that \( \mathcal {R}_\Phi \) is deterministic immediately guarantees that, at all times t, any linear utility function given as feedback to \( \mathcal {R}_\Phi \) is conditionally independent of the last output \( \phi ^t \), given the random outcomes used by the environments \( \mathbf {E}^1, \dots ,\mathbf {E}^{t-1} \) (cf. Footnote 5).

  7. On the surface, it might look like \( L^t \) is independent of the last output \( \phi ^t \) of the regret minimizer \( \mathcal {R}_{\Phi } \), and thus, that it trivially satisfies the requirements of Definition 2.8. However, that is not true: \( x^t \) is a fixed point of \( \phi ^t \), and, since \( x^t \) enters into the definition of \( L^t \), if \( \mathcal {R}_\Phi \) picks \( \phi ^t \) randomly, then it might very well be that \( L^t \) is not conditionally independent of \( \phi ^t \). We sidestep this issue by requiring that \( \mathcal {R}_\Phi \) is deterministic (cf. Footnote 6).

  8. As is common in the analysis of randomized algorithms, the stochastic process is adapted to the filtration of \( \sigma \)-algebras generated by all the past random outcomes of the algorithm (in our case, all past outcomes of the sources of randomness used by the regret minimizer and by the environment). In other words, using the notation of Footnote 5, \( \lbrace w^t\rbrace \) is adapted to the filtration \( \lbrace \mathcal {F}^t\rbrace \).

  9. More precisely, \( \ell ^t \) is conditionally independent of \( \pi^{(i),t} \), given \( \mathcal {F}^{t-1} \) (cf. Definition 2.8 and Footnote 5).

  10. We recall the classic Azuma-Hoeffding inequality [2, 26] for martingale difference sequences (e.g., Reference [34, Theorem 3.14]).

    Lemma 4.2 (Azuma-Hoeffding inequality). Let \( Y_1, \dots , Y_n \) be a martingale difference sequence with \( a_k \le Y_k \le b_k \) for each k, for suitable constants \( a_k, b_k \). Then for any \( \tau \ge 0 \), \( \mathbb {P}\big[\sum _k Y_k \le \tau \big] \ge 1 - e^{-2\tau ^2 / \sum _k (b_k - a_k)^2}. \)

  11. We shift \( L^t\circ h^{(i)}_{\hat{\sigma }} \) purely for technical reasons. We do it so \( g^{(i),\,t} \) is a linear utility function, and thus it can be passed in as feedback to a regret minimizer.

  12. Technically, Farina et al. [14] only prove the bound (27) for the case \( m=2 \). However, as mentioned by the authors, the extension to generic \( m\in \mathbb {N}_{\gt 0} \) is direct.

  13. That is, according to a pre-order tree traversal: If \( j \prec j^{\prime } \), then j appears before \( j^{\prime } \) in the iteration order.

  14. As discussed in Lemma 4.1, in principle, any unbiased sampling scheme will work. For the purposes of analyzing the complexity of Algorithm 6, however, we will assume that the natural sampling scheme described in Section 2.2 is used. That sampling scheme runs in linear time in \( |{\Sigma^(i)} | \).

REFERENCES

  [1] Aumann Robert J. 1974. Subjectivity and correlation in randomized strategies. J. Math. Econ. 1, 1 (1974), 67–96.
  [2] Azuma Kazuoki. 1967. Weighted sums of certain dependent random variables. Tohoku Math. J. 19, 3 (1967), 357–367.
  [3] Brown Noam and Sandholm Tuomas. 2018. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science 359, 6374 (2018), 418–424.
  [4] Brown Noam and Sandholm Tuomas. 2019. Solving imperfect-information games via discounted regret minimization. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 1829–1836.
  [5] Cahn Amotz. 2004. General procedures leading to correlated equilibria. Int. J. Game Theor. 33, 1 (2004), 21–40.
  [6] Celli A. and Gatti N. 2018. Computational results for extensive-form adversarial team games. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI).
  [7] Celli Andrea, Marchesi Alberto, Bianchi Tommaso, and Gatti Nicola. 2019. Learning to correlate in multi-player general-sum sequential games. In Proceedings of the International Conference on Advances in Neural Information Processing Systems. 13055–13065.
  [8] Celli Andrea, Marchesi Alberto, Farina Gabriele, and Gatti Nicola. 2020. No-regret learning dynamics for extensive-form correlated equilibrium. In Proceedings of the International Conference on Advances in Neural Information Processing Systems.
  [9] Cesa-Bianchi Nicolo and Lugosi Gábor. 2006. Prediction, Learning, and Games. Cambridge University Press.
  [10] Chen Xi and Deng Xiaotie. 2006. Settling the complexity of two-player Nash equilibrium. In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science. IEEE, 261–272.
  [11] Daskalakis Constantinos, Goldberg Paul W., and Papadimitriou Christos H. 2009. The complexity of computing a Nash equilibrium. SIAM J. Comput. 39, 1 (2009), 195–259.
  [12] Daskalakis Constantinos and Papadimitriou Christos H. 2009. On a network generalization of the minmax theorem. In Proceedings of the International Colloquium on Automata, Languages, and Programming. Springer, 423–434.
  [13] Dudík Miroslav and Gordon Geoffrey J. 2009. A sampling-based approach to computing equilibria in succinct extensive-form games. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence. 151–160.
  [14] Farina Gabriele, Kroer Christian, and Sandholm Tuomas. 2019. Regret circuits: Composability of regret minimizers. In Proceedings of the International Conference on Machine Learning. 1863–1872.
  [15] Farina Gabriele, Ling Chun Kai, Fang Fei, and Sandholm Tuomas. 2019. Correlation in extensive-form games: Saddle-point formulation and benchmarks. In Proceedings of the International Conference on Advances in Neural Information Processing Systems. 9229–9239.
  [16] Foster Dean P. and Vohra Rakesh V. 1997. Calibrated learning and correlated equilibrium. Games Econ. Behav. 21, 1-2 (1997), 40.
  [17] Fudenberg Drew and Levine David K. 1995. Consistency and cautious fictitious play. J. Econ. Dynam. Contr. 19, 5-7 (1995), 1065–1089.
  [18] Fudenberg Drew and Levine David K. 1998. The Theory of Learning in Games, Vol. 2. MIT Press.
  [19] Fudenberg Drew and Levine David K. 1999. Conditional universal consistency. Games Econ. Behav. 29, 1-2 (1999), 104–130.
  [20] Gordon Geoffrey J., Greenwald Amy, and Marks Casey. 2008. No-regret learning in convex games. In Proceedings of the International Conference on Machine Learning. 360–367.
  [21] Greenwald Amy and Jafari Amir. 2003. A general class of no-regret learning algorithms and game-theoretic equilibria. In Learning Theory and Kernel Machines. Springer, 2–12.
  [22] Hart Sergiu and Mas-Colell Andreu. 2000. A simple adaptive procedure leading to correlated equilibrium. Econometrica 68, 5 (2000), 1127–1150.
  [23] Hart Sergiu and Mas-Colell Andreu. 2001. A general class of adaptive strategies. J. Econ. Theor. 98, 1 (2001), 26–54.
  [24] Hart Sergiu and Mas-Colell Andreu. 2003. Uncoupled dynamics do not lead to Nash equilibrium. Amer. Econ. Rev. 93, 5 (2003), 1830–1836.
  [25] Hazan Elad and Kale Satyen. 2008. Computational equivalence of fixed points and no regret algorithms, and convergence to equilibria. In Proceedings of the International Conference on Advances in Neural Information Processing Systems.
  [26] Hoeffding Wassily. 1963. Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58, 301 (1963), 13–30.
  [27] Huang Wan and Stengel Bernhard von. 2008. Computing an extensive-form correlated equilibrium in polynomial time. In Proceedings of the International Workshop on Internet and Network Economics. Springer, 506–513.
  [28] Jiang Albert Xin and Leyton-Brown Kevin. 2015. Polynomial-time computation of exact correlated equilibrium in compact games. Games Econ. Behav. 91 (2015), 347–359.
  [29] Kakade Sham, Kearns Michael, Langford John, and Ortiz Luis. 2003. Correlated equilibria in graphical games. In Proceedings of the 4th ACM Conference on Electronic Commerce. 42–47.
  [30] Koller Daphne, Megiddo Nimrod, and Stengel Bernhard von. 1996. Efficient computation of equilibria for extensive two-person games. Games Econ. Behav. 14, 2 (1996), 247–259.
  [31] Koutsoupias Elias and Papadimitriou Christos. 1999. Worst-case equilibria. In Proceedings of the Annual Symposium on Theoretical Aspects of Computer Science. Springer, 404–413.
  [32] Kuhn H. W. 1953. Extensive Games and the Problem of Information. Princeton University Press, 193–216.
  [33] Lanctot Marc, Waugh Kevin, Zinkevich Martin, and Bowling Michael H. 2009. Monte Carlo sampling for regret minimization in extensive games. In Proceedings of the International Conference on Advances in Neural Information Processing Systems. 1078–1086.
  [34] McDiarmid Colin. 1998. Concentration. Springer, Berlin, 195–248.
  [35] Moravčík Matej, Schmid Martin, Burch Neil, Lisỳ Viliam, Morrill Dustin, Bard Nolan, Davis Trevor, Waugh Kevin, Johanson Michael, and Bowling Michael. 2017. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker. Science 356, 6337 (2017), 508–513.
  [36] Morrill Dustin, D’Orazio Ryan, Lanctot Marc, Wright James R., Bowling Michael, and Greenwald Amy R. 2021. Efficient deviation types and learning for hindsight rationality in extensive-form games. In Proceedings of the 38th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 139). PMLR, 7818–7828.
  [37] Morrill Dustin, D’Orazio Ryan, Sarfati Reca, Lanctot Marc, Wright James, Greenwald Amy, and Bowling Michael. 2020. Hindsight and sequential rationality of correlated play. In Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI’20).
  [38] Nash John F. 1950. Equilibrium points in n-person games. Proc. Nat. Acad. Sci. 36, 1 (1950), 48–49.
  [39] Paige C. C., Styan George P. H., and Wachter Peter G. 1975. Computation of the stationary distribution of a Markov chain. J. Statist. Computat. Simul. 4, 3 (1975), 173–186.
  [40] Papadimitriou Christos H. and Roughgarden Tim. 2008. Computing correlated equilibria in multi-player games. J. ACM 55, 3 (2008), 14.
  [41] Romanovskii I. 1962. Reduction of a game with complete memory to a matrix game. Soviet Math. 3 (1962).
  [42] Roughgarden Tim and Tardos Éva. 2002. How bad is selfish routing? J. ACM 49, 2 (2002), 236–259.
  [43] Shoham Yoav and Leyton-Brown Kevin. 2008. Multiagent Systems: Algorithmic, Game-theoretic, and Logical Foundations. Cambridge University Press.
  [44] Stoltz Gilles and Lugosi Gábor. 2007. Learning correlated equilibria in games with compact sets of strategies. Games Econ. Behav. 59, 1 (2007), 187–208.
  [45] Tammelin Oskari. 2014. Solving large imperfect information games using CFR+. arXiv preprint arXiv:1407.5042 (2014).
  [46] Tammelin Oskari, Burch Neil, Johanson Michael, and Bowling Michael. 2015. Solving heads-up limit Texas hold’em. In Proceedings of the International Joint Conferences on Artificial Intelligence. 645–652.
  [47] Stengel Bernhard von. 1996. Efficient computation of behavior strategies. Games Econ. Behav. 14, 2 (1996), 220–246.
  [48] Stengel Bernhard von and Forges Françoise. 2008. Extensive-form correlated equilibrium: Definition and computational complexity. Math. Oper. Res. 33, 4 (2008), 1002–1022.
  [49] Zinkevich Martin, Johanson Michael, Bowling Michael, and Piccione Carmelo. 2008. Regret minimization in games with incomplete information. In Proceedings of the International Conference on Advances in Neural Information Processing Systems. 1729–1736.
