Bilevel programming methods for computing single-leader-multi-follower equilibria in normal-form and polymatrix games

The concept of leader-follower (or Stackelberg) equilibrium plays a central role in a number of real-world applications bordering on mathematical optimization and game theory. While the single-follower case has been investigated since the inception of bilevel programming with the seminal work of von Stackelberg, results for the case with multiple followers are only sporadic and not many computationally affordable methods are available. In this work, we consider Stackelberg games with two or more followers who play a (pure or mixed) Nash equilibrium once the leader has committed to a (pure or mixed) strategy, focusing on normal-form and polymatrix games. As customary in bilevel programming, we address the two extreme cases where, if the leader’s commitment originates more Nash equilibria in the followers’ game, one which either maximizes (optimistic case) or minimizes (pessimistic case) the leader’s utility is se-N

lected. First, we show that, in both cases and when assuming mixed strategies, the optimization problem associated with the search problem of finding a Stackelberg equilibrium is N P-hard and not in Poly-APX unless P = N P. We then consider different situations based on whether the leader or the followers can play mixed strategies or are restricted to pure strategies only, proposing exact nonconvex mathematical programming formulations for the optimistic case for normal-form and polymatrix games. For the pessimistic problem, which cannot be tackled with a (single-level) mathematical programming formulation, we propose a heuristic black-box algorithm. All the methods and formulations that we propose are thoroughly evaluated computationally.

Introduction
Leader-follower (or Stackelberg) games model the interaction between rational agents (or players) when a hierarchical decision-making structure is in place. Considering, for simplicity, the two-player case, Stackelberg games model situations where an agent (the leader) plays first and a second agent (the follower) plays right after them, after observing the strategy the leader has chosen.
The number of real-world problems where a leader-follower (or Stackelberg) structure can be identified is extremely large. This is often the case in the security domain [6,19], where a defender, aiming to protect a set of valuable targets from the attackers, plays first, while the attackers, acting as followers, make their move only after observing the leader's defensive strategy. Other noteworthy cases are interdiction problems [9,23], toll setting problems [20], network routing problems [3], and (singleton) congestion games [10,22].
While, since the seminal work of von Stackelberg [31], the case with a single leader and a single follower has been widely investigated, only a few results are known for the case with multiple followers and not many computationally affordable methods are available to solve the corresponding equilibrium-finding problem.
In this paper, we focus on the fundamental case of single-leader multi-follower games with a finite number of actions per player where the overall game can be represented as a normal-form or polymatrix game-the latter is of interest as it plays an important role in a number of applications such as, e.g., in the security domain, where the defender may need to optimize against multiple uncoordinated attackers solely interested in damaging the leader. Throughout the paper, we assume the setting where the (two or more) followers play simultaneously in a noncooperative way, for which it is natural to assume that, after observing the leader's play (either as a strategy or as an action), the followers would reach a Nash Equilibrium (NE) (see [30] for a thorough exposition of this equilibrium concept). We refer to an equilibrium in such games as Leader-Follower Nash Equilibrium (LFNE).
As it is typical in bilevel programming, we study two extreme cases: the optimistic one where, if the leader's commitment originates more NE in the followers' game, one which maximizes the leader's utility is selected, and the pessimistic one where an equilibrium which minimizes the leader's utility is chosen.
In particular, the leader's utility at an optimistic equilibrium corresponds to the largest utility the leader may get assuming the best case in which the followers would (somehow) end up playing a Nash equilibrium which maximizes the leader's utility. Differently, the leader's utility at a pessimistic equilibrium corresponds to a utility value the leader could always get independently of the followers' behavior. From this perspective, a risk-taking leader would play according to an optimistic equilibrium, whereas a risk-averse leader would play according to a pessimistic equilibrium. For more types of solution concepts related to these two, we refer the reader to [2].
The original contributions of our work are as follows. 1 First, we illustrate that the optimization problem associated with the search problem of computing an LFNE in mixed strategies when the followers play an NE which either maximizes or minimizes the leader's utility is N P-hard and not in Poly-APX unless P = N P. After casting the general problem with mixed strategy commitments in bilevel terms, we propose different nonlinear and nonconvex single-level mathematical programming formulations for the optimistic case, suitable for state-of-the-art spatial-branch-and-bound solvers. For the pessimistic case, which does not admit a single-level mathematical programming reformulation of polynomial size, we propose a heuristic method based on the combination of a spatial-branch-and-bound solver with a black-box algorithm. We also briefly investigate (easier) variants of the problem obtained when restricting either the leader or the followers to pure-strategy commitments. We conclude by providing a thorough experimental evaluation of our techniques on a (normalform and polymatrix) testbed generated with GAMUT [26], also encompassing some structured games, employing different solvers: BARON, SCIP, CPLEX, SNOPT, and RBFOpt (the latter is used for black-box optimization).

Notation
Let N = {1, . . . , n} be the set of agents. For each p ∈ N , we denote by A p the agent's set of actions, with m p = |A p |. For each agent p ∈ N , we denote by x p ∈ [0, 1] mp , with e T x p = 1 (where e is the all-one vector), their strategy vector (or strategy, for short). Each component x a p of x p represents the probability by which agent p plays action a ∈ A p . We call x p a vector of pure strategies if x p ∈ {0, 1} mp , or of mixed strategies in the general case. We denote a strategy profile, i.e., the collection of the strategies each agent plays, by x = (x 1 , . . . , x n ).
For each agent p ∈ N , we define their utility function as u p : [0, 1] m1 × · · · × [0, 1] mn → R. A strategy profile x = (x 1 , . . . , x n ) is an NE if and only if, for each agent p ∈ N , u p (x 1 , . . . , x n ) ≥ u p (x 1 , . . . , x n ) for every strategy profile x where x q = x q for all q ∈ N \ {p} and x p = x p (this corresponds to assuming that no unilateral deviations would take place). We consider two game classes: Normal-Form (NF) and PolyMatrix (PM).
For NF games (see [30] for a reference), we let U p ∈ R m1×...×mn denote, for each agent p ∈ N , their (multidimensional) utility (or payoff) matrix where each component U a1,...,an p denotes the utility of agent p when all the agents play actions a 1 , . . . , a n . Given a strategy profile (x 1 , . . . , x n ), the expected utility of agent p ∈ N is equal to the multilinear function u p (x 1 , . . . , x n ) = x T p U p · q∈N \{p} x q . For PM games (see [34] for a reference), we have a utility matrix U pq ∈ R mp×mq per pair of agents p, q ∈ A. Given a strategy profile (x 1 , . . . , x n ), the expected utility of agent p is equal to the bilinear function u p (x 1 , . . . , x n ) = q∈N \{p} x T p U pq x q . We remark that, while in the NF case the degree of the polynomial corresponding to an agent's expected utility is equal to the number of agents, it is always equal to 2 in the PM case, independently of the number of agents involved. The computational impact of this property will be discussed in the paper.

Previous works
Since the original work of Nash [25], the problem of computing Nash equilibria in multi-player games (without a leader) has attracted a large interest-see the monograph [32] and [11,17] where the complexity of the problem is addressed. For more details on noncooperative game theory, we refer the interested reader to [30].
Most of the game-theoretical investigations on Stackelberg games have, to the best of our knowledge, mainly addressed the case of a single follower. In such setting, it is known that the single follower can play a pure strategy without loss of generality, i.e., that there always is a pure strategy by which they can maximize their utility, and that the optimization problem associated with the search problem of computing an equilibrium is easy with complete information [33], while it becomes N P-hard for Bayesian games [16]. Algorithms are proposed in [16].
For what concerns Stackelberg games with more than two players, some works have investigated the case with multiple leaders and a single follower, see [21]. For the problem involving a single leader and multiple followers (the one on which we focus in this paper), only a few results are available. It is known, for instance, that an equilibrium can be found in polynomial time if the followers play a correlated equilibrium in the optimistic case [15] (see [30] for more detail on correlated equilibria), whereas the associated optimization problem is N P-hard if they play sequentially one at a time (as in a classical Stackelberg game with many players) [16].

Problem statements, bilevel perspective, and computational complexity
In this section, we formalize the problem that we address in the paper, cast it in bilevel terms, and investigate its computational complexity and approximability.

Problem statements
In formal terms, the two main versions of the problem of computing an LFNE that we tackle in this paper, optimistic (O-LFNE) and pessimistic (P-LFNE), are defined as follows: O-LFNE: Given an n-agent game with n ≥ 3, find a strategy vector δ for the leader such that, after committing, the largest leader's utility over all the NE in the followers' game parameterized by δ is as large as possible. P-LFNE: Given an n-agent game with n ≥ 3, find a strategy vector δ for the leader such that, after committing, the smallest leader's utility over all the NE in the followers' game parameterized by δ is as large as possible.
When notationally convenient, we will refer to an either optimistic or pessimistic LFNE as O/P-LFNE.
We will distinguish between the cases where either the leader or the followers are restricted or not to pure strategies, considering four cases: leader in mixed and followers in mixed (LMFM), leader in pure and followers in mixed (LPFM), leader in mixed and followers in pure (LMFP), and leader in pure and followers in pure (LPFP).
In the general (mixed) case, we assume that the leader commits to a strategy, i.e., to a distribution of probability according to which they (the leader) select their action, and that, while the followers can observe the distribution chosen by the leader, they (the followers) cannot observe its realization (i.e., the action the leader plays). This is the case in, e.g., security games. The case in which the leader's strategy is pure is the converse one in which the leader's play is completely observable by the followers.
The assumption behind the followers playing mixed strategies is the same as in games without a leader (e.g., one can consider repeated games in which the leader has to commit to a single strategy before the game starts, whereas the followers can, at each iteration, draw a different action profile from their distribution of choice, thus playing mixed strategies).
For the sake of presentation, in the remainder of the paper we assume n = 3 (one leader, two followers). We remark that our results can be adapted to any n. In Section 9, we will indeed report on computational experiments carried out for games with more than two followers.
In the remainder of the paper, we assume that the last agent (the third), whom we relabel as agent , takes the role of leader. All the other agents (the followers) are compactly denoted by the set F = N \ { }. When n = 3, F = {1, 2}. For all f ∈ F , we define f := F \ {f }. We also denote x (the strategy vector of the leader) by δ and x 1 , x 2 (the strategy vectors of the followers) by ρ 1 , ρ 2 . For each p ∈ N , we let ∆ p be the simplex of strategies of player p, i.e., the set of nonnegative vectors δ, ρ 1 , or ρ 2 summing to 1.

Bilevel programming perspective
Computing an O/P-LFNE amounts to solving a bilevel programming problem.
In the optimistic case, we can compute an O-LFNE by solving the following problem: Due to Constraints (1b)-(1c), the second level problems call for a pair (ρ 1 , ρ 2 ) of followers' strategies forming an NE in the followers' game induced by the strategy δ ∈ ∆ chosen by the leader in the first level. Note that, due to the definition of NE, the pair (ρ 1 , ρ 2 ) is an NE in the game induced by δ if and only if ρ 1 (resp., ρ 2 ) maximizes player 1's (resp., player 2's) utility when assuming that player 2 (resp., player 1) plays ρ 2 (resp., ρ 1 ). Subject to these constraints, the first level calls for a triple (ρ 1 , ρ 2 , δ) maximizing the leader's utility.
In the pessimistic case, computing a P-LFNE amounts to solving to the following problem: This problem differs from its optimistic counterpart as, due to the assumption of pessimism, the leader here maximizes the minimum value taken by their utility over all pairs (ρ 1 , ρ 2 ) which are NE in the followers' game induced by δ-that is, for the chosen δ, ρ 1 and ρ 2 always correspond to a NE which minimizes the leader's utility.

Complexity results
As we will show, the optimization problem associated with the search problem of computing an LFNE is both N P-hard and inapproximable in both versions (O-LFNE and P-LFNE) in the LMFM case even with a single leader action (which implies that the result also holds for the LPFM case). This follows from the N P-hardness and inapproximability of the problem of computing, in a two-player game, a mixedstrategy NE which maximizes the sum of the players' utilities (the so-called social welfare) [17]: The problem of computing a mixed-strategy NE which maximizes the total players' utility is N P-hard and it is not in APX unless P = N P, even when the game is polymatrix.
The result is based on the fact that, for any SAT instance, it is possible to build a symmetric two-player game (U 1 , U 2 ), either NF or PM, such that: i) there is a (pure-strategy) NE in which both players play their last action and receive a utility equal to > 0, where is an arbitrarily small constant; ii) the game admits a (mixed-strategy) NE providing each player with a utility of m, where m is the number of actions, if and only if the SAT instance is a YES instance.
This implies that, in any such game, finding an NE where the players achieve a utility strictly larger than would suffice to claim that the corresponding SAT instance is a YES instance. It follows that one cannot decide in polynomial time whether such games admit an NE providing the players with a utility strictly larger than unless P = N P as, if that were the case, YES instances of SAT could be decided in polynomial time. This also shows that finding an NE which maximizes the social welfare (defined as the total players' utility) is not in APX . This is because the existence of an NE providing the players with a total utility strictly greater than 2 would suffice to conclude that the corresponding SAT instance admits answer YES.
We show that the result in [17] can be strengthened with a simple observation: The problem of computing a mixed-strategy NE which maximizes the total players' utility is not in Poly-APX unless P = N P, even when the game is polymatrix.
Proof Let = 1 2 m . On games corresponding to YES SAT instances (which admit an NE with total utility 2m), an algorithm with approximation ratio 1 α would yield an NE of total utility at least 1 α 2m. Note that, if 1 α 2m > 2 (i.e., 1 α > m ), the SAT instance is proven to be a YES instance. Therefore, there cannot be a polynomial time approximation algorithm with a factor better than m = 1 2 m m unless P = N P. Since the reciprocal of this factor is superpolynomial, the problem is not in Poly-APX .
For the problem of computing an O/P-LFNE, we show the following result: The optimization problem associated with the search problem of computing an O/P-LFNE in the LMFM and LPFM cases is N P-hard and it is not in Poly-APX unless P = N P, even when the game is polymatrix.
Proof Let us consider the O-LFNE case first. Given a game with utilities (U 1 , U 2 ) and m actions per player as defined in [17], we construct a 3-player leader-follower polymatrix game where: the leader only has one action and utility matrices U f1 = U f2 = 1, . . . , 1, 1 2 m ; player f 1 's utility matrices are U f1 = 0 and U f1f2 = U 1 ; player f 2 's utility matrices are U f2 = 0 and U f2f1 = U 2 .
Due to having a single action, the presence of the leader is immaterial (note that, therefore, the LMFM and LPFM cases coincide). Therefore, the set of followers' equilibria in the leader-follower game is the same as that of the original two-player game. It follows that SAT has answer YES if and only if the leader-follower game admits an equilibrium with leader's utility strictly larger than 1 2 m , as that corresponds to an NE in the followers' game with utility strictly larger than for each player. Along the lines of the previous proof, an algorithm with approximation factor 1 α would yield, for a YES instance, a leader utility of at least 1 α , allowing us to conclude that the instance is a YES instance if 1 α > 1 2 m . This shows that the problem of computing an O-LFNE is not in Poly-APX unless P = N P (even in polymatrix games).
For the computation of a P-LFNE, the reasoning is the same except for defining U f1 = U f2 = 1 2 m , 1 2 m , . . . , 1 2 m , 1 . We conclude the section by showing that deciding whether one of the leader's actions can be safely discarded is a hard problem, which implies that dominance-like techniques often used in game theory to reduce the search space of an equilibriumcomputing algorithm are inapplicable.

Proposition 4
In the LMFM case, deciding whether an action of the leader is played with strictly positive probability at an O/P-LFNE is N P-hard.
Proof Given a symmetric two-player game (U 1 , U 2 ) with m actions as defined in [17], we build a three-player game (U , U f1 , U f2 ) in which: the leader has two actions, while f 1 and f 2 have m actions each; when the leader plays their first action, the payoffs of all the players are 1/4; when the leader plays their second action, the payoffs of f 1 and f 2 are those in (U 1 , U 2 ) and the leader's payoffs are 1 for all the actions of f 1 and f 2 , except for the combination composed of the last action of f 1 and the last action of f 2 , in which the leader's payoff is 0.
We show that the first action of the leader can be safely discarded from the game (U , U f1 , U f2 ) if and only if the game (U 1 , U 2 ) admits a mixed-strategy NE providing the players with a utility of m, which implies that deciding whether the first action of the leader can be discarded is N P-hard. If the leader plays their first action, they receive a utility of 1/4. If the leader plays their second action, the followers play the best NE for the leader, which can be either i) the pure-strategy NE in which both play their last action providing the leader with a utility of 0 or, ii) if it exists, the mixed-strategy NE providing the leader with a utility of 1. For any mixed strategy of the leader, the behavior of the followers does not change w.r.t. the case in which the leader plays their second action as a pure strategy. This is because, when the leader randomizes between their two actions, the utility of the followers f 1 and f 2 is an affine transformation (with positive coefficients) of U 1 and U 2 , making them play exactly as in the case where the leader plays their second action as a pure strategy. Thus, at an optimistic LFNE the leader plays a pure strategy, playing their first action if (U 1 , U 2 ) does not admit a mixed-strategy NE and their second action if it does. The first action of the leader can therefore be safely discarded if and only if (U 1 , U 2 ) admits a mixed-strategy NE providing the players with a utility of m.
The proof is analogous in the pessimistic case after interchanging the leader payoffs of values 0 and 1.

Optimistic case with Leader in Mixed and Followers in Mixed (O-LMFM)
In this section, we focus on the optimistic setting in the general case where each player is allowed to play mixed strategies. We propose three different exact mathematical programming formulations for NF games and then illustrate how they can be simplified for PM games.

Exact formulations for NF games
We report the three formulations illustrating how to derive each of them in sequence.

O-NF-LMFM-I
To obtain a single level formulation for the problem, we proceed by applying a standard reformulation [30] involving complementarity constraints.
Let, for all i ∈ A 1 and j ∈ A 2 ,Ũ ij 1 := k∈A U ijk 1 δ k andŨ ij 2 = k∈A U ijk 2 δ k be the matrices of the followers' game, parameterized by δ. According to Constraint (1b), for (ρ 1 , ρ 2 ) to be a NE ρ 1 must be an optimal solution to the Linear Program (LP): Since the LP is feasible and bounded for any ρ 2 ∈ ∆ 2 , by complementary slackness we have that ρ 1 ∈ ∆ 1 is optimal if and only if there is a scalar v 1 such that the following holds for all i ∈ A 1 : v 1 can be interpreted as the best-response value of follower 1, equal to the largest utility the follower can achieve at an equilibrium. Applying a similar reasoning to ρ 2 , we obtain that ρ 2 ∈ ∆ 2 is optimal if and only if there is a scalar v 2 such that the following holds for all j ∈ A 2 : We conclude that (ρ 1 , ρ 2 ) is an NE if and only if there are v 1 , v 2 ≥ 0 such that ρ 1 and ρ 2 simultaneously satisfy these four conditions. After substituting forŨ 1 andŨ 2 their linear expressions in δ, we obtain the following constraints for player 1 and for all i ∈ A 1 : For player 2 and for all j ∈ A 2 , we obtain: By imposing such constraints in lieu of the two second level argmax constraints of Problem (1) (Constraints (1b)-(1c)), we obtain a continuous single level formulation with nonconvex trilinear terms. 2 Overall, the formulation reads: The problem contains m 1 + m 2 cubic constraints, m 1 + m 2 quadratic constraints, and a cubic objective function.

O-NF-LMFM-II
What we propose now is aimed at achieving a formulation which can be solved more efficiently. Since each term of the complementarity constraints we introduced is bounded from above and below, we can apply a simple reformulation along the lines of [28]. Let s 1 ∈ {0, 1} m1 and s 2 ∈ {0, 1} m2 be the antisupport vectors of ρ 1 and ρ 2 , (i.e., two binary vectors with m 1 and, respectively, m 2 components each of which has value 0 if and only if ρ 1 and, respectively, ρ 2 is strictly positive in that component). It suffices to impose the following constraints for all i ∈ A 1 : and the following ones for all j ∈ A 2 : M is an upper bound on the entries of U 1 , U 2 . This way, while still retaining the original trilinear objective function only bilinear constraints are needed. We obtain the following reformulation: Constraints (5), (7), (8)- (10).
At the cost of introducing binary variables, with this formulation we achieve fewer nonlinearities: only 2m 1 + 2m 2 quadratic constraints and a cubic objective function.

O-NF-LMFM-III
Ultimately, we aim to solve the problem with spatial-branch-and-bound techniques, such as those implemented in BARON and SCIP. The main strategy of such methods to handle nonlinearities is to isolate "simple" nonlinear terms (bilinear or trilinear in our case) by shifting them into a new (so-called defining) constraint to which a convex envelope is applied. We propose to anticipate this reformulation, so to be able to derive some valid constraints. First, we introduce: i) variable y jk 2 and constraint y jk 2 = ρ j 2 δ k for all j ∈ A 2 , k ∈ A , ii) variable y ik 1 and constraint y ik 1 = ρ i 1 δ k for all i ∈ A 1 , k ∈ A , iii) variable z ijk and constraint z ijk = ρ i 1 y jk 2 for all i ∈ A 1 , j ∈ A 2 , k ∈ A . By substituting each bilinear and trilinear term with the newly introduced variables, we then obtain a formulation which is linear everywhere, except for the defining constraints themselves.
The advantage of carrying out this reformulation step a priori is that we can now observe that, after introducing the new variables, the matrix {y jk 2 } jk∈A2×A is, by definition, the outer product of the stochastic vectors ρ 2 and δ and, as such, is a stochastic matrix itself. The same holds for the tensor {z ijk } ijk∈A1×A2×A , which is the outer product of the vectors ρ 1 , ρ 2 , δ and, as such, is a stochastic tensor. This implies the validity of the following three constraints: i∈A1 k∈A We remark that these inequalities are a subset of those that are obtained by applying a relaxation-linearization techniqueà la Sherali-Adams [29] to Constraints (8) and (9).
Overall, we obtain m (m 1 + m 2 ) + m m 1 m 2 quadratic constraints and a linear objective function, yielding a tighter formulation than O-NF-LMFM-II, as we will show computationally.

Exact formulations for PM games
We illustrate how the three formulations we proposed can be substantially simplified for PM games.

O-PM-LMFM-I
In PM games, the expected utility for follower 1 corresponding to an action i ∈ A 1 (which is a trilinear function for NF games with n = 3, and of order n in general) is defined as the following linear function (which is linear for any n and, in particular, for n = 3): The leader's utility is the following function, bilinear for any n: As a consequence, the PM counterpart to formulation O-NF-LMFM-I reads: Constraints (8)- (10).
Differently from the NF case, this formulation only contains m 1 + m 2 quadratic constraints and a quadratic objective (as Constraints (5) and (7) become linear here, while Constraints (4) and (6) and Objective (3) become quadratic).

O-PM-LMFM-II
Applying for the PM case the same reformulation we carried out in O-NF-LMFM-II, we obtain: Constraints (8) Besides the binary variables, this formulation contains only linear constraints and a quadratic objective.

O-PM-LMFM-III
Similarly to O-NF-LMFM-III, this formulation is derived by reformulating each multilinear term in O-PM-LMFM-II. In the latter, the only nonlinearity is in the objective function. Therefore, O-PM-LMFM-III is obtained by just reformulating the products δ i ρ j f it contains for all f ∈ F and j ∈ A f , adding valid constraints identical to those we added to O-NF-LMFM-III. We obtain:

Pessimistic case with Leader in Mixed and Followers in Mixed (P-LMFM)
Unless P = N P, it is clear that there is no single-level formulation of polynomial size (in terms of variables and constraints) for the problem of computing a pessimistic LFNE. This is because, given a triple δ, ρ 1 , ρ 2 , a single-level reformulation of polynomial size for the problem would allow for checking whether, for the given δ, the (ρ 1 , ρ 2 ) pair yields not just an NE (this can be checked in polynomial time by inspecting polynomially many constraints) but an optimal one. That is, it would allow us to verify in polynomial time whether a given solution to an N P-hard problem is optimal, which cannot be done in general unless P = N P.
For this reason, we adopt a different approach here, designing a heuristic method to tackle the pessimistic case based on a black-box solver coupled with an exact oracle. While the method is conceived to tackle the pessimistic case, it can also be used for the optimistic one (as we show in the computational results section).
The method is based on a Radial Basis Function (RBF) estimation which relies on the solver RBFOpt [18]. The idea is of exploring the leader's strategy space (variables δ) with a direct search which iteratively builds an RBF approximation of the objective function relying on the solution of an oracle formulation which is responsible for carrying out the objective function evaluation.
Given any incumbent valueδ, the oracle solves the (NF or PM) second level problem exactly after imposing δ =δ. For NF games, the oracle formulation we use is similar to O-NF-LMFM-III, employing a different reformulation with auxiliary variables y jk = ρ j 1 ρ k 2 , which yields a tighter reformulation than the original one in O-NF-LMFM-III when δ is given (as in this case). Crucially, in this formulation the sign of the objective function has to be changed so to produce a pair (ρ 1 , ρ 2 ) which minimizes the leader's objective function (rather than maximizing it) for the given δ =δ.
The oracle formulation for the optimistic and pessimistic cases reads as follows (± indicates that the sign of the objective function has to be flipped from + to − in the pessimistic case): Constraints (8) Besides the defining constraints for y ij , the other parts of the formulation are all linear.
For PM games, we can directly use formulation O-PM-LMFM-II: since each of the nonlinear terms in O-PM-LMFM-II is bilinear and it involves δ, when δ is fixed toδ the formulation corresponds to a Mixed-Integer Linear Program (MILP).

Optimistic case with Leader in Pure and Followers in Mixed (O-LPFM)
We focus now on the case in which the leader is restricted to pure strategies.

Exact formulations for NF and PF games
As it is clear, in the LPFM case the problem can be solved by using one of the formulations we proposed after imposing δ ∈ {0, 1} m . With a binary δ, though, we can obtain different formulations which contain fewer nonlinearities. We present them here for the NF and PM cases. We only consider the formulations denoted by III since they turn out to be easier to solve in practice (as we will see in the computational results).

O-NF-LPFM-III
For δ ∈ {0, 1} m , the quadratic defining Constraints (22) in O-NF-LMFM-III can be dropped in favor of the following three linear constraints: Together with y ik f ≥ 0, these constraints constitute the so-called McCormick envelope [24] , the envelope yields an exact reformulation [1]. The resulting formulation is obtained from O-NF-LMFM-III by dropping the quadratic (defining) Constraints (22) and substituting for them the linear Constraints (50)-(52). The only nonlinear constraints still present in the formulation are Constraints (23).

O-PM-LPFM-III
In O-PM-LMFM-III, the only nonlinearities are due to the quadratic (defining) Constraints (22). Due to δ ∈ {0, 1} m , by applying the McCormick envelope via Constraints (50)-(52) we can remove all the nonlinearities from the problem, obtaining an MILP.

O-NF/PM-LPFM-Implicit-Enumeration
When δ ∈ {0, 1} m , an LFNE can also be found by solving m times one of our formulations. It suffices to change the sign of the objective function in the pessimistic case, iteratively fixing δ = e k (where e k is the all zero vector with a single 1 in position k) and selecting the best outcome over all the iterations as the solution to the problem. While this method is correct for both variants (optimistic and pessimistic), in the optimistic case we can design a better algorithm, which we now introduce.
The main idea of the algorithm is pruning the search space A so to solve fewer subproblems thanks to a bounding technique. For each of the leader's actions, we compute the utility they would obtain if the followers played a Correlated Equilibrium (CE) (which can be computed in polynomial time via linear programming, see [30]). Since the set of correlated strategies is a (strict) superset of that of mixed strategies, its computation yields an upper bound (UB). We can thus iterate over i ∈ A and solve one of our formulations with δ = e k (where e k is the unit vector with a single 1 in position k) only if the UB with δ = e k is better than the best solution found thus far.
The algorithm reads: LB = max{LB, U tility(e k )} 8: end for BestCorrelatedEquilibrium(k) computes a UB with δ = e k by computing a CE in polynomial time via linear programming, along the lines of [30]. After sorting the leader's actions in decreasing order of UB via DescendingSort(A , U B), the algorithm iterates over A , computing with U tility(e k ) the exact leader's utility corresponding to playing the pure action δ = e k only if U B(k) is sufficiently promising. In our implementation, U tility(e k ) solves the same oracle formulations adopted in the black-box method.

A note on solution approaches for the remaining cases
For completeness, in this section we address the remaining cases that are obtained by restricting either the leader or the followers to pure strategies. Since all these cases can be solved fairly easily with only one exception, we will not consider them in the computational results section.

O/P-LFNE with Leader in Pure and Followers in Pure (O/P-LPFP)
The case where both the leader and the followers can only play pure strategies is trivial in both the optimistic and pessimistic versions. For its solution one can, first, construct each of the m 3 possible outcomes of the three players and, then, discard all the outcomes where the pair of followers' strategies do not induce an NE for the pure leader strategy they contain. For the optimistic case, it then suffices to compare the leader's utility corresponding to all the outcomes which have not been discarded, identifying one where the leader's utility is maximized. For the pessimistic case, an extra step is needed as one has to, first, group all the outcomes by leader strategy and then identify, in each group, an outcome corresponding to the smallest leader utility. An equilibrium is found by selecting, among all the remaining outcomes (at most one per leader's pure strategy) one which maximizes the leader's utility.

O/P-LFNE with Leader in Mixed and Followers in Pure (O/P-LMFP)
In the optimistic setting, the case in which only the followers are restricted to pure strategies can be solved by solving m 2 linear programming problems, one per followers' outcome. In each problem, we only have to impose best-response constraints on the followers' utilities guaranteeing that there is a leader's strategy δ for which the chosen outcome is an NE, maximizing the leader's utility at that outcome for δ. The follower's outcome and the corresponding δ yielding the largest leader utility is then an O-LFNE.
It is not difficult to see that the previous algorithm (which, overall, runs in polynomial time) is not correct in the pessimistic case. This is not surprising since, as shown in [12,13], the optimization problem corresponding to the equilibrium-finding problem is N P-hard in the pessimistic case even with followers restricted to pure strategies. For its solution, we can resort to the same methods proposed in this paper for the LMFM case, simply requiring ρ 1 and ρ 2 to be binary.

Computational results
For our computational experiments, we adopt a testbed composed of instances mainly taken from two GAMUT [26] classes, Uniform RandomGames (NF games) and Poly-matrixGames (PM games), generated with payoffs in [0, 100].
For simplicity, we assume that all the players have the same number of actions m, i.e., that m p = m for all p ∈ N . This is w.l.o.g., as one can always add extra actions to a player with a payoff small enough to guarantee that such actions will never be played at an equilibrium.
For the experiments on NF games in the LMFM case, we also consider eight GAMUT classes of structured normal form games, BertrandOligopoly, Bidirectional-LEGs, MinimumEffortGames, RandomGraphicalGames, DispersionGames, Covari-antGames, TravelersDilemma, and UniformLEGs, generating 10 instances with 2 followers and m = 8 actions per player for each of them.
Throughout the section, the results of our experiments are compared w.r.t. computing time (in seconds) and (multiplicative) optimality gap. 3 . For both values, we report the arithmetic average for each game class and value of m and n over the 10 3 The optimality gap is defined as min{ UB−LB LB 100, 10 5 }%, where LB and UB are, resp., the largest lower bound (corresponding to the best feasible solution) and the smallest upper bound found by the solver within the time limit. The min operator prevents an unbounded value for LB = 0. An optimality gap of 10 5 highlights that the method fails to produce a useful solution as, due to the payoffs being in [0, 100], any strategy of the leader can achieve, at least, a utility of 0.
corresponding instances. In all the boxplots that we report, the red dash indicates the median, the box extends from the 25th to the 75th percentile, and dotted lines denote the whole sample distribution. Outliers are highlighted with a red mark.
We adopt five solvers: BARON and SCIP (for globally optimal solutions to every formulation, apart from O-PM-LPFM-III which is an MILP), CPLEX (for globally optimal solutions to O-PM-LPFM-III, as well as to the oracle formulation for PM games in the implicit enumeration and black-box methods), SNOPT (for locally optimal solutions to the formulations with purely continuous variables), and RBFOpt as the backbone of our black-box heuristic for pessimistic cases of LFNE (we will, nevertheless, also experiment with it for some optimistic variants). The O-NF-LPFM-Implicit-Enumeration algorithm is implemented in C. The experiments are run on a UNIX computer with a dual quad-core CPU at 2.33 GHz, equipped with 8 GB of RAM. Each algorithm is run using a single thread within a time limit of 3600 seconds. For the exact methods, we halt the execution whenever the optimality gap reaches 10 −12 %. 4

O-NF-LMFM-I, II, and III (n = 3)
We compare the different NF formulations when solved with BARON and SCIP. For RandomGames instances, the average computing time and optimality gap for each combination of formulation and solver is reported in Figure 1 as a function of m.
The results obtained with the two solvers are quite different. BARON better performs on O-NF-LMFM-I (the formulation with purely continuous variables), while SCIP better performs on O-NF-LMFM-III (the "reformulated" formulation which contains binary variables introduced to remove nonquadratic terms from O-NF-LMFM-II, as well as extra valid constraints). These results suggest that the formulation which is solved more efficiently with each solver is O-NF-LMFM-I with BARON and O-NF-LMFM-III with SCIP. These results are in line with the general computational behavior of BARON and SCIP, as the former tends to exhibiting a better performance on highly nonlinear and mostly continuous problems whereas the latter becomes more efficient as the number of integer/binary variables of the problem increases.
Further inspecting Figure 1, we notice that, with SCIP, O-NF-LMFM-III always outperforms O-NF-LMFM-II. This shows that SCIP is incapable of automatically constructing the reformulation obtained with O-NF-LMFM-III.
As to the computing times, the largest m for which at least a game is solved to optimality by BARON within the time limit is m = 8 for O-NF-LMFM-I and m = 7 for the other formulations. With SCIP, we reach m = 9 with O-NF-LMFM-III and m = 3 with the other ones. In particular, SCIP with O-NF-LMFM-III always requires a shorter computing time than BARON with O-NF-LMFM-I for every number of actions.
In terms of optimality gaps, SCIP remarkably outperforms BARON. As one can see in Figure 1  10 5 % when m ≥ 20. This is due to the solver returning an LB of 0 due to failing find a feasible solution in the time limit. Differently, the gap achieved by SCIP with O-NF-LMFM-III is below 15% for m up to m = 25. Such results suggest that, for games of this size, one can always achieve an almost constant gap, contrarily to what the intrinsic difficulty of the problem would suggest, namely, an exponential quality degradation as the number of actions grows. Moreover, these results show that SCIP with O-NF-LMFM-III always finds a feasible solution (an NE) for the followers' game and for some leader's strategy, differently from the other pairs of solver and formulation.
These observations are substantially confirmed when experimenting with the same solver/formulation pair on the eight structured classes of NF games. The average computing times reported in Figure 2 are indeed in line with the trends we observed for RandomGames, with SCIP outperforming BARON most of the times (on average). This trend becomes different when considering DispersionGames, where SCIP performs less efficiently than for the other classes of games, achieving computing times which are considerably larger than those obtained with BARON. This is due to the solver failing to solve two game instances within the time limit. This can be better observed in Figure 3 which reports the computing times only for the instances that are solved to optimality with the two solvers, as well as the percentage of such instances. In particular, we observe that SCIP solves 91.875% of the instances on average, whereas BARON only solves 81.25%.

BertrandOligopoly
BidirectionalLEG In Figure 4, we report the computing times and the optimality gaps obtained with SCIP for games of the GAMUT class PolymatrixGames. Since the results obtained with BARON are similar to those we illustrated for NF games, we omit them for the sake of brevity. Within the time limit, the largest m for which at least an instance is solved to optimality is m = 15. For m ≤ 10, all instances are solved to within a gap of 0 (within the numerical tolerance we set). In particular, the optimality gap is always below 15% for instances with up to m = 25, showing a trend which is substantially  less steep than that for NF games. This suggests that PM games are, as expected, easier to solve.

O-NF-LMFM-I, local optimization (n = 3)
In Figure 5, we report the experimental results obtained with SNOPT for RandomGames using formulation O-NF-LMFM-I. Due to the local optimization nature of the solver for nonconvex problems, to obtain statistically more relevant results we run 30 restarts with different initial starting solutions, sampled uniformly at random from the simplices of the strategies of the three agents, and return the best solution found. Figure 5(a) shows that the computing times with SNOPT (cumulated over the 30 random restarts) are much shorter than the computing times required by BARON and SCIP, allowing for solving (to a local optimum) almost all the instances with m = 20 within the time limit. Differently, as shown in Figure 5  returned by SNOPT (measured as their ratio over the value of an optimal solution found by SCIP or BARON) is rather poor even with very few actions. Indeed, the median of the ratios is between 10% and 20% for games with up to m = 7. This emphasizes the effectiveness of our approach based on spatial-branch-and-bound methods.

O-NF/PM-LMFM-III (n ≥ 4)
In Table 1, we report the average computing times obtained with SCIP when employing formulations O-NF-LMFM-III and O-PM-LMFM-III for games with 4 players or more. In the time limit, we can solve NF games with up to m = 5 for n ≤ 4 (corresponding to up to m n = 625 different outcomes and nm n = 2, 500 different payoffs) and up to m = 4 for n ≤ 6 (corresponding to up to m n = 4, 096 outcomes and nm n = 24, 576 payoffs). Quite interestingly, with our methods we can tackle instances of a size comparable to that of the largest instances used in [27] (such instances are generated with GAMUT [26] and are comparable to the ones in our testbed) to evaluate a set of algorithms proposed to find an NE (in a single level problem), in spite of our problem being clearly harder (as it admits the former as a subproblem). With PM games, our algorithms scale much better, allowing for finding exact solutions to PM games with up to m = 10 for n ≤ 5 and up to m = 7 for n ≤ 6.

O/P-NF/PM-LMFM-BlackBox
When experimenting with the black-box method, we first consider the optimistic case for NF games as, for it, we can compare the quality of the solutions we find to either the optimal solution value or its tightest upper bound. Namely, we compare O-NF-LMFM-BlackBox to O-NF-LMFM-III, the latter solved with SCIP within the time limit. The results are reported in Figure 6. In Figure 6(a) we observe, on average and for m ≤ 10, that the black-box method yields solutions to within 90% of the optimal ones found with SCIP. This suggests that the method might be sufficiently accurate. As shown in Figure 6(b), for m ≥ 10 the burden of calling SCIP to solve the oracle formulation becomes too large, making the black-box algorithm impractical.
An interesting result, see Figure 6(a), concerns the gap between the utility of the leader at an optimistic LFNE or at a pessimistic LFNE. On the instances solved to optimality (m ≤ 5), where we can verify the quality of the heuristic solutions, we see that the gap is rather small, suggesting that, in RandomGames instances generated with GAMUT, the leader can manage to force the followers to play a strategy which provides the leader with a utility not dramatically smaller than that which they would obtain in an optimistic LFNE.
In Figure 7, we report analogous results obtained with polymatrix games. In the time limit, we compare O-PM-LMFM-III solved with SCIP to O-PM-LMFM- BlackBox. Differently from the NF case, Figure 7(b) shows that, for PM games, the computing time needed to solve the oracle formulation (which is an MILP in this case) is much smaller and scales much better with m. Except for the case of m = 2, Figure 7(a) allows us to draw comparable conclusions to those that we have drawn for the NF case, with the leader achieving, in the pessimistic case, solutions that are not too far away from the corresponding optimistic ones w.r.t. their utility. Lastly, we focus on the case where the leader is restricted to pure strategies. We report the results in terms of computing times obtained by imposing δ ∈ {0, 1} m in O-NF/PM-LPFM-III with SCIP for RandomGames in Figure 8(a,b) and with CPLEX for PolymatrixGames (for which the formulation becomes an MILP) in Figure 8(c,d).
Interestingly, by imposing a binary δ to tackle the LPFM case the size of the largest instances solvable within the time limit increases from m = 9 to m = 13 in Ran-domGames and from m = 15 to m = 25 for PolymatrixGames when compared to the LMFM case.
For both RandomGames and PolymatrixGames, a dramatic performance improvement is obtained with O-NF/PM-LPFM-Implicit-Enumeration: with it, the size of the largest instance that we can solve increases from m = 13 to m = 20 for Ran-domGames and from m = 25 to m = 50 for PolymatrixGames. As expected, the computing times for PolymatrixGames are much smaller (due to only requiring the solution of an MILP at each step), allowing us to solve to optimality much larger instances.

Conclusions and future work
We have studied game-theoretic leader-follower (Stackelberg) situations with a bilevel structure where multiple followers play a Nash equilibrium once the leader has committed to a strategy. After analyzing the complexity of the problem, we have provided different algorithms and mathematical programming formulations to find an equilibrium for the optimistic case as well as a heuristic black-box method for the pessimistic case. We have conducted a thorough experimental evaluation of the different methods we have proposed, using various optimization solvers. Our experiments suggest that spatial branch-and-bound solvers can be used as effective solution methods when coupled with our formulations, providing a reasonably good optimality gap even for large games. Future works include the study of structured games, with focus on understanding how the specific structure of a game could be exploited to obtain easier to solve formulations (as we did for polymatrix games in this work). Moreover, it would be of interest to study the adaptation of our techniques to succint games (whose normalform representation has exponential size) relying on cutting plane methods to cope with the presence of exponentially-many best-reponse constraints, possibly using notions of diversity and bound improvement within the separation problem, see [4,5,14], to achieve a faster convergence. It would also be of interest to combine stateof-the-art equilibrium-finding algorithms for such games with methods similar to the black-box one we have proposed, which would directly benefit from the existence of an efficient equilibrium-finding algorithm for reoptimizing the followers' problem after changing the leader's strategy.
Future works also include the study of equilibrium-finding methods based on support enumeration, understanding, in particular, whether games which admit Nash equilibria of small-support in the case without a leader would still admit smallsupport equilibria in the Stackelberg case.
Among the challenging problems that we are interested to address in the future, we mention the design of algorithms to find an equilibrium when the followers play either a strong Nash equilibrium, a strong correlated equilibrium, or a solution concept defined in cooperative game theory.