A priori data-driven robustness guarantees on strategic deviations from generalised Nash equilibria

In this paper we focus on noncooperative games with uncertain constraints coupling the agents' decisions. We consider a setting where bounded deviations of agents' decisions from the equilibrium are possible, and uncertain constraints are inferred from data. Building upon recent advances in the so-called scenario approach, we propose a randomised algorithm that returns a nominal equilibrium such that a pre-specified bound on the probability of violation for yet unseen constraints is satisfied for an entire region of admissible deviations surrounding it, thus supporting neighbourhoods of equilibria with probabilistic feasibility certificates. For the case in which the game admits a potential function, whose minimum coincides with the social welfare optimum of the population, the proposed algorithmic scheme opens the road to achieving a trade-off between the guaranteed feasibility levels of the region surrounding the nominal equilibrium and its system-level efficiency. Detailed numerical simulations corroborate our theoretical results.


Introduction
The study of noncooperative games plays a significant role in a panoply of applications ranging from smart grids [44] to communication [46] and social networks [2]. In these setups, agents can be modelled as self-interested entities that interact with each other and make decisions based on possibly misaligned individual criteria, while being subject to constraints (local or global) that restrict the domain of their choices. Even though a variety of systems can be analysed by means of deterministic game-theoretic tools [24,38,46], in many applications the decision-making procedure is affected by uncertainty. A number of results in the literature have explicitly addressed uncertainty in a noncooperative setting. Specifically, [7] follows a randomised approach for the special case of stochastic zero-sum games. Most results rely on specific assumptions on the probability distribution [15,47] and/or the geometry of the uncertainty set [3,27,36].
To circumvent these limitations, recent developments adopt a data-driven perspective, focusing on the connection of game theory with the so-called scenario approach [9]. This is based on the idea that an optimisation problem with constraints parametrised by an uncertain parameter-with fixed but possibly unknown support set and probability distribution-can be approximated by drawing samples of that parameter and solving the problem subject to the constraints generated by those samples only; this approximation is known as the scenario program. Standard results in the scenario approach [8,11,13] provide certificates on the probability that a new, yet unseen constraint will be violated by the randomised solution obtained from the scenario program.
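To make the idea concrete, the sketch below builds and solves a toy scenario program; the linear cost, the uncertain half-plane constraint a(δ)ᵀx ≤ 1 and the Gaussian samples are illustrative choices of ours, not the setting of the paper.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)

# Toy scenario program: minimise c^T x subject to K sampled realisations of
# an uncertain half-plane constraint a(delta)^T x <= 1, with a(delta_k) a
# perturbation of [1, 1] and delta drawn from an (unknown to the solver)
# distribution P.
K = 200
deltas = rng.normal(0.0, 0.1, size=(K, 2))
A_ub = np.tile([1.0, 1.0], (K, 1)) + deltas   # one row per sample
b_ub = np.ones(K)
c = np.array([-1.0, -1.0])                    # pushes x towards the uncertain boundary

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, 2), (0, 2)])
x_star = res.x                                # the scenario solution
```

The scenario-approach certificates then bound the probability that a fresh, unseen sample of δ produces a constraint violated by `x_star`.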
While the aforementioned results apply to uncertain convex optimisation problems, the works [12] and [14] paved the way towards the provision of data-driven robustness guarantees for solutions of more general nonconvex problems. In [22,23,37], these theoretical advancements were leveraged for the first time in a game-theoretic context, for the formulation of distribution-free probabilistic feasibility guarantees for randomised Nash equilibria. These works provide guarantees for one specific equilibrium point (often assumed to be unique); this was extended in [39,40], by providing a posteriori feasibility guarantees for the entire domain. Besides the game-theoretic context, alternative methodologies for set-oriented probabilistic feasibility guarantees have been proposed in the seminal works [5,16], which a priori characterise probabilistic feasibility regions constructed out of sampled constraints using statistical learning-theoretic results. More recently, the so-called probabilistic scaling [4,33] has been proposed to obtain a posteriori guarantees on the probability that a polytope generated out of samples is a subset of some chance-constrained feasibility region. Following an approach similar to [39], the works [17,18] deliver tighter guarantees by focusing on variational-inequality (VI) solution sets.
The results above follow a standard approach in the game-theoretic literature, where a strict behavioural assumption-the so-called rationality-is imposed on the players' decision making. Namely, the players are viewed as rational agents wishing to minimise their individual cost (expressed by some given cost function). However, studies have shown that this is unrealistic in practice [10,29,41,48] and that agents usually exhibit a boundedly rational behaviour [42], i.e., their decisions can deviate from rationality due to individual biases, behavioural inertia, restricted computational power/time, etc. The consequences of this become relevant in engineering applications, as the human role in technical systems evolves beyond mere users and consumers to active agents, operators, decision-makers and enablers of efficient, resilient and sustainable infrastructures [32].
To bridge this gap between real-world applications and the cognate literature, here we study games with uncertain constraints, where deviations from a nominal equilibrium are explicitly considered. We follow a randomised approach to approximate the coupling constraints by means of data. In this more general setting, where deviations are considered, providing guarantees for a single solution is devoid of any meaning: indeed, repetition of the game might lead to a different solution in a neighbourhood around the nominal equilibrium, irrespective of the employed dataset. Technically speaking, this renders the identification of the data samples that support the solution (cf. sample compression [34]) a challenging task. Focusing on the class of generalised Nash/Wardrop equilibrium seeking problems (GNEs/GWEs) [19], we contribute to the provision of data-driven robustness guarantees for the collection of possible deviations from the equilibrium as follows: (1) Adopting a scenario-theoretic paradigm, we establish a methodology for the provision of a posteriori probabilistic feasibility guarantees for a region around the randomised equilibrium of the game under study. This result (Theorem 1) complements [39], [40], [17], [18], which instead focus on the entire feasibility region. Focusing on a circumscribed region around a GNE/GWE allows offering tighter probabilistic bounds, while the results of [39], [40], [17], [18] can be obtained as a limiting case of Theorem 1. (2) We design a data-driven algorithm that returns a GNE/GWE and show that all points in a predefined admissible region surrounding it enjoy a priori probabilistic feasibility guarantees. This result (Theorems 2 and 3), unlike Theorem 1, offers an a priori statement valid for a region that is tunable by the user, modelling possible deviations from a nominal equilibrium that a designer wishes to take into account when incentivising a certain operation profile.
A distinctive feature of this result is that it provides a priori guarantees for a set rather than for single points [37], [22], [23]. These guarantees depend on a pre-specified quantity, which in turn can affect the location of the nominal equilibrium and the size of the region for which these probabilistic guarantees hold. As such, this region is tunable, unlike [25], where a priori guarantees for a set of solutions are provided but this set is not controlled by the user and could be arbitrarily narrow. Moreover, the results of [25] do not focus on games and follow a fundamentally different approach. Furthermore, when the game under study admits a potential function-whose minimum coincides with some social welfare optimum-our methodology provides a new perspective for trading off the probabilistic feasibility of the region surrounding the nominal equilibrium against its system-level efficiency.
(3) We propose an equilibrium seeking algorithm as the machinery to obtain a region surrounding a GNE/GWE over which the aforementioned feasibility guarantees hold. The algorithm relies on a primal-dual scheme and is inspired by seminal developments in [20]. However, the mapping that characterises the algorithm updates differs from those typically encountered in the literature (e.g., see [20, Ch. 12]). This requires showing that the ad hoc mapping enjoys certain continuity and cocoercivity properties, thus extending the proof line of [20] (see Lemmas 2 & 3, and the proof of Theorem 2), a task which is interesting per se.
Our contributions compared to the cognate literature are summarised in Table 1. The rest of the paper is organised as follows. In Section 2 we provide fundamentals of game theory and the scenario approach. In Section 3.1 we show how the feasibility guarantees for a region around the game solution can be quantified a posteriori. In Section 3.2 we propose a data-driven algorithm and prove its convergence to an equilibrium such that the considered neighbourhood of strategic deviations satisfies pre-specified probabilistic feasibility requirements. An illustrative example in Section 4 corroborates our theoretical analysis. Section 5 concludes the paper and presents future research directions. To streamline the presentation of our results, some proofs are deferred to the Appendix.

Preliminaries
Notation: All vectors are column vectors unless otherwise indicated. R^n_+ is the nonnegative orthant in R^n. For an n × n matrix A, we write A ≻ 0 (A ⪰ 0) when x⊺Ax > 0 (x⊺Ax ≥ 0) for any x ∈ R^n. We denote by 0_{q×r} the q × r null matrix, by I_r the r × r identity matrix, and by 1_r the vector of r ones; dimensions can be omitted when clear from the context. e_q is the unit vector whose q-th element is 1 and all other elements are 0, ∥·∥_p is the p-norm operator, and (·)_r denotes the r-th component of its vector argument. B_p(x, ρ) = {y ∈ R^d : ∥y − x∥_p < ρ} is the open p-norm ball centred at x with radius ρ; when p is omitted, any choice of norm is valid. For a set S, |S| denotes its cardinality, while 2^S denotes its power set, i.e., the collection of all subsets of S. Finally, given D ≻ 0, proj_{K,D}[x] := arg min_{y∈K} (y − x)⊺D(y − x) is the skewed projection of x onto the set K.
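The skewed projection proj_{K,D} can be computed numerically; the sketch below handles a box K with a projected-gradient inner loop (our own illustrative scheme, not taken from the paper), exploiting that only the symmetric part of D enters the quadratic.

```python
import numpy as np

# Illustrative computation of proj_{K,D}[x] := argmin_{y in K} (y-x)^T D (y-x)
# for a box K = [lo, hi]^d.  Only the symmetric part S = (D + D^T)/2 of D
# affects the objective, so the problem is a strongly convex QP over a box.
def proj_skewed_box(x, lo, hi, D, iters=500):
    S = 0.5 * (D + D.T)                 # symmetric part drives the quadratic
    L = np.linalg.eigvalsh(S).max()     # Lipschitz constant of the gradient
    y = np.clip(x, lo, hi)              # feasible starting point
    for _ in range(iters):
        grad = 2.0 * S @ (y - x)
        y = np.clip(y - grad / (2.0 * L), lo, hi)   # projected gradient step
    return y
```

For D = I this reduces to the ordinary Euclidean projection (componentwise clipping onto the box).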

Games with uncertain constraints
We consider a population of agents with index set N = {1, . . ., N}. The decision vector x_i of each agent i ∈ N takes value in the set X_i ⊆ R^n, and x = (x_1, . . ., x_N) ∈ R^{nN} is the global decision vector that is formed by concatenating the decisions of the entire population. The vector x_{−i} ∈ R^{n(N−1)} comprises all agents' decisions except for those of agent i. In our setup, the cost incurred by agent i ∈ N is expressed by a real-valued function J_i(x_i, x_{−i}) that depends on local decisions as well as on the decisions of other agents j ∈ N \ {i}. In the following, with a slight abuse of notation, we exchange x for (x_i, x_{−i}) to single out agent i's decision from the global decision vector. Furthermore, we consider uncertain constraints coupling the agents' decisions. These can be expressed in the form

g(x, δ) ≤ 0,

where g : R^{nN} × ∆ → R depends on some uncertain parameter δ taking values in a support set ∆ according to a probability measure P.
Feasible collective decisions under this setup can be found by letting every agent i ∈ N solve the following optimisation program, where the decisions x_{−i} of all other agents are given:

min_{x_i} J_i(x_i, x_{−i})  subject to  x_i ∈ X_i ∩ ⋂_{δ∈∆} C^i_δ(x_{−i}),  (2)

where C^i_δ(x_{−i}) is the projection of the coupling constraint on X_i for fixed x_{−i} and uncertainty realisation δ ∈ ∆. The collection of coupled optimisation programs in (2) for all i ∈ N constitutes an uncertain noncooperative game; we denote it by G.
Note that (2) follows a worst-case paradigm, taking into account all possible coupling constraints that can be realised by variations of the uncertain parameter δ ∈ ∆. This can render the solutions of G rather conservative. Furthermore, it is in general not possible to compute a solution for G without an accurate knowledge of, and/or additional assumptions on, the support set ∆ and the probability distribution P. To circumvent these limitations, we follow a data-driven paradigm and approximate G by means of a finite number of samples drawn from ∆, namely the K-multisample δ_K = (δ_1, . . ., δ_K) ∈ ∆^K. In the sequel, we hold on to the standing assumption that these samples are independent and identically distributed (i.i.d.). Apart from this, no other knowledge on the support set ∆ and the probability distribution P of the uncertain parameter is required. Then, for a given multi-sample δ_K, (2) can be rewritten as

min_{x_i} J_i(x_i, x_{−i})  subject to  x_i ∈ X_i ∩ ⋂_{k=1}^K C^i_{δ_k}(x_{−i}).  (3)

Instead of considering all possible uncertainty realisations δ ∈ ∆ as in (2), we let the data encoded in δ_K lead agents to their decision by solving (3). We refer to the collection of coupled optimisation programs in (3) as the scenario game G_K (the subscript K implies dependence on the drawn multi-sample δ_K). Under standard assumptions, a solution to the scenario game G_K exists and the problem is, in contrast to G, tractable using state-of-the-art equilibrium seeking algorithms.

Variational inequalities and game equilibria
Notably-under certain assumptions detailed next-solutions to the game G_K can be retrieved as solutions to a variational inequality (VI), for specific choices of the mapping F : X → R^{nN} [19, Thm. 3.9]: find x* ∈ Π_K such that

F(x*)⊺(x − x*) ≥ 0 for all x ∈ Π_K,  (4)

where Π_K := X ∩ ⋂_{k=1}^K C_{δ_k} denotes the problem domain; we refer to this problem as VI_K. A classic game solution concept, which encounters wide application in the literature, is the Nash equilibrium (NE) [35]. At a NE, no agent can decrease their cost by unilaterally changing their decision. Formally, this can be stated as follows.

Definition 1 A point x* ∈ Π_K is a generalised Nash equilibrium (GNE) of G_K if, for all i ∈ N, J_i(x_i*, x_{−i}*) ≤ J_i(x_i, x_{−i}*) for any x_i such that (x_i, x_{−i}*) ∈ Π_K.
For our analysis, we rely on the following conditions.

Assumption 1 For all i ∈ N, J_i(x_i, x_{−i}) is convex and continuously differentiable in x_i for any fixed x_{−i}.
Assumption 2 (1) For any multi-sample δ_K ∈ ∆^K, the domain Π_K is non-empty. (2) The set X = ∏_{i=1}^N X_i is compact, polytopic and convex.
Note that convexity of the cost function with respect to the agent's local decision is crucial for the design of tailored algorithms with theoretical convergence guarantees for Nash equilibrium seeking. Under these assumptions, we can determine a GNE as in Definition 1 by solving (4) with the pseudogradient mapping F(x) := [∇_{x_i} J_i(x_i, x_{−i})]_{i∈N}. A class of problems of common interest can be modelled by the so-called aggregative games [1,28,30], where the cost incurred by agents depends on some aggregate measure-typically the average-of the decisions of the entire population. Such a cost can be expressed in (3) by the function J_i(x_i, σ(x)), where the aggregate σ : R^{nN} → R^n is defined as the mapping x ↦ (1/N) Σ_{i=1}^N x_i. A solution frequently linked to this class of games is the Wardrop equilibrium (WE), a concept akin to the NE but specifically defined in the context of transportation networks [6]. The variational WEs of G_K can be expressed by using F_WE(x) := [∇_{x_i} J_i(x_i, z)|_{z=σ(x)}]_{i∈N}; notice that in this case the second argument of J_i is fixed and set to σ(x), consistently with the notion of WE where agents neglect the impact of their decision on others.
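As an illustration of equilibrium seeking via the pseudogradient mapping, the sketch below runs a projected pseudogradient iteration on a hypothetical two-player quadratic game; the costs J_1, J_2 and the box X = [0, 1]^2 are our own toy choices.

```python
import numpy as np

# Toy two-player game with costs J_1 = x1^2 + x1*x2 - x1 and
# J_2 = x2^2 + x1*x2 - x2, whose pseudogradient is the strongly monotone
# affine mapping F(x) = [[2, 1], [1, 2]] x - 1.
A = np.array([[2.0, 1.0], [1.0, 2.0]])
F = lambda x: A @ x - 1.0

x = np.array([0.9, 0.1])                   # arbitrary feasible start
tau = 0.2                                  # step size

for _ in range(300):
    x = np.clip(x - tau * F(x), 0.0, 1.0)  # projection onto X = [0, 1]^2
# x converges to the unique Nash equilibrium (1/3, 1/3), where F(x) = 0.
```

Since F is strongly monotone here, the fixed-point iteration contracts and the unique equilibrium is recovered.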
Assumptions 1 and 3 are standard in the game-theoretic literature [20,46]. Assumption 2 is relatively mild; the affine form of the constraints is exploited in the proposed algorithm (see Section 3) for the convergence to an equilibrium bearing the desired robustness properties.
We point out that in general only a subset of the solutions to G_K can be retrieved through (4): these are referred to as variational equilibria, and they enjoy favourable properties over nonvariational ones, as with the former the coupling constraints' burden is equally split among the agents [26,31].
The following lemma, adapted from [20, Thm. 2.3.3], formalises the connection between the solutions to VI_K and the GNEs (or GWEs) of G_K.
Lemma 1 Under Assumptions 1, 2 and 3, VI_K has a unique solution, which is also an equilibrium of G_K.
For the considered class of VIs, several algorithms from the literature can be employed to obtain a variational equilibrium of G_K; see, e.g., [19,38]. We remark that, even if not explicitly shown for ease of notation, any solution x* to G_K is itself a function of the drawn multi-sample δ_K ∈ ∆^K. Probabilistic feasibility guarantees for the unique solution of VI_K can then be provided both in an a priori and an a posteriori fashion by resorting to the results in [22,23,37]. However, these results are tailored to the provision of probabilistic feasibility guarantees for a single point (namely, the solution of a VI): any strategic deviation from the equilibrium is not covered by such guarantees. We address this issue in Section 3. First, we provide some background on the scenario approach.

Basic concepts in the scenario approach
A fundamental notion in the scenario approach is the probability of violation of an uncertain constraint.
Definition 2 (1) The probability of violation of a point x ∈ R^{nN} is defined as V(x) := P{δ ∈ ∆ : g(x, δ) > 0}. (2) The probability of violation of a set S ⊆ R^{nN} is defined as V(S) := P{δ ∈ ∆ : ∃x ∈ S such that g(x, δ) > 0}.

A data-driven decision-making process can be formally characterised by a mapping-the algorithm-that takes as input the data encoded by the samples and returns a set of decisions.
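A minimal numerical reading of the probability of violation: for a toy scalar constraint g(x, δ) = x − δ with δ uniform on [0, 1] (our choice, not the paper's setting), V can be estimated empirically with fresh samples.

```python
import numpy as np

rng = np.random.default_rng(1)

# Empirical estimate of V(x) = P{delta : g(x, delta) > 0} for the toy
# constraint g(x, delta) = x - delta, delta ~ Uniform[0, 1].  Here the
# true value is simply V(x) = x for x in [0, 1].
def violation_prob(x, n_mc=200_000):
    deltas = rng.uniform(0.0, 1.0, n_mc)   # fresh, "yet unseen" samples
    return np.mean(x > deltas)             # fraction of violated samples
```

For a set S = [x − ρ, x + ρ], the set-valued V(S) of Definition 2 would instead count samples violated by the worst point of S, i.e., `np.mean(x + rho > deltas)` in this toy case.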
Definition 3 An algorithm is a function A : ∆^l → R^{nN} × 2^{R^{nN}} that takes as input an l-multisample and returns the pair (x*, S*_l(x*)), namely, a point x* and a solution set S*_l parametrised by x*.
In the setting we consider, we have x* ∈ S*_l(x*); however, this need not be the case in general. In the following, we interpret the above definition as context-dependent, in that the size l of the input multi-sample is allowed to vary, all else remaining fixed for a given algorithm A.
A key notion, strongly linked to that of an algorithm, is the minimal compression set [34]. This concept springs from the observation that typically only a subset of the sampled data is relevant to a decision or set of decisions, while all other samples are redundant.

Definition 4 (Compression set) Consider an algorithm A as in Definition 3. A subset of samples I ⊆ δ_l is a compression set for A if A(I) = A(δ_l).

As multiple subsets of samples can exist that fulfil this property, the ones with minimal cardinality are called minimal compression sets.
If we feed the algorithm with the set of samples corresponding to a compression set, then the same decision is returned as when we feed the algorithm with the entire multi-sample. As established in [34], the compression set is related to the notion of support samples [11] and that of essential constraints [8]; under certain nondegeneracy assumptions these concepts coincide.
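The "feed the algorithm fewer samples and check that the decision is unchanged" idea can be sketched as a greedy compression search; the one-dimensional problem min x s.t. x ≥ δ_k below is a toy stand-in for the algorithm A, not the paper's game.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy algorithm A: the 1-d scenario problem min x s.t. x >= delta_k for all
# k, whose solution is max_k delta_k.  A compression set is found greedily
# by dropping samples whose removal leaves the solution unchanged.
samples = list(rng.uniform(0, 1, 50))
solve = lambda S: max(S)               # the "algorithm" applied to a sample set
x_star = solve(samples)

compression = samples.copy()
for s in list(compression):
    trial = [t for t in compression if t is not s]
    if trial and solve(trial) == x_star:   # removal does not change the decision
        compression = trial
# Here the minimal compression set is the single sample attaining the max.
```

This mirrors the sample-removal procedure referenced later in the proof of Theorem 1 (cf. [14]).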
Probabilistic feasibility of sets around equilibria

A first a posteriori result
Returning to the scenario game G_K in (3), we now consider a more general setup where agents are allowed to deviate from x* following, e.g., unmodelled changes in their cost functions; while we suppose that these deviations are feasible with respect to the local constraints, we want to study their feasibility as regards the coupling constraints obtained through sampling. Specifically, the region in which agents' strategies can deviate from the nominal equilibrium is assumed to lie within a predefined open ball B(x*, ρ), where ρ > 0 is a fixed radius that denotes the maximum possible distance of agents' deviations from x*; the latter is assumed to be unique as per Lemma 1. As such, the region of interest is

S*_K := Π_K ∩ B(x*, ρ).  (5)

This is depicted in Figure 1 using the ∞-norm (any other norm could have been used instead): an algorithm A (see Sec. 2.3) takes as input a multi-sample δ_K and returns the region S*_K around the solution x* ∈ R^2 of a game with two players whose decisions are scalar quantities. For this pictorial example, Π_K is shaped exclusively by sampled coupling constraints. Any compression set as per Definition 4 for A must be associated with the solid blue constraints (these form a compression for x*), and with the dashed red constraint that intersects B(x*, ρ), as its removal would change S*_K.
We can quantify the number of samples that form a compression set for the algorithm that returns S*_K in an a posteriori fashion, as established in Theorem 1. To this end, for a fixed confidence β ∈ (0, 1), let the violation level be defined as a function ϵ : {0, . . ., K} → [0, 1].

Theorem 1 Fix β ∈ (0, 1). Then, P^K{δ_K ∈ ∆^K : V(S*_K) ≤ ϵ(s* + M)} ≥ 1 − β, where s* is the cardinality of a compression set for x* and M is the number of facets of Π_K intersecting S*_K.

Fig. 1. The region S*_K (in green) obtained as the intersection of the set of deviations B_∞(x*, ρ) around the equilibrium x* (red dot) with the domain Π_K. The samples producing the constraints in blue are in the compression set of x*, while those associated with the red constraint are in the compression set of S*_K; discarding these does not change x*.
Proof: Let (x*, S*_K) be the solution returned by A for some given δ_K, according to Definition 3. We aim at determining a compression set for A(δ_K), and use its cardinality to reach the theorem's conclusion by means of Theorem 2 in [40]. This compression set is the union of: (i) the samples that form a compression set for x*-i.e., solving the problem using only these would result in the same equilibrium obtained by using all samples-and (ii) any other sample (not in the compression set of x*) whose removal can still lead to a change of the region S*_K.

Case (i): Determining a (possibly non-minimal) compression set for x* can be achieved, as suggested in [14], by progressively removing samples until a subset that leaves the solution unchanged is determined. We denote its cardinality by s*. With reference to Fig. 1, this set would be associated with the blue constraints active at x*.

Case (ii): We need to count the samples whose removal does not change x* but yields a larger region S*_K (red constraint in Fig. 1). Their number can be upper bounded by the number M of facets of Π_K that intersect S*_K. Hence, the number of samples that form a compression set for A(δ_K) is bounded by s* + M. Existence of a compression set I with a bound on its cardinality is sufficient for the application of Theorem 2 in [40]. The fact that |I*| ≤ |I| ≤ s* + M always holds for the minimal compression set I* then leads to the statement of the theorem. ■

It is important to stress that the application of Theorem 1 is agnostic to the choice of the equilibrium seeking algorithm. To use the result of Theorem 1, one needs to quantify (an upper bound on) the number of samples s* that form a compression set for x* and (an upper bound on) the number M of coupling constraints that correspond to facets of S*_K. While s* ≤ nN under Assumptions 1-3, an upper bound for M can in general only be obtained a posteriori, i.e., once δ_K is sampled. In the next section we show how to obtain a priori bounds for the same quantity.
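Under an H-representation with normalised rows (as adopted later in Section 3.2), the a posteriori count of M reduces to a slack check at the equilibrium; the sketch below assumes a Euclidean ball and a hypothetical pair (A, b) of our own choosing.

```python
import numpy as np

# A posteriori count of the facets of Pi_K = {x : A x <= b} (rows of A of
# unit norm) that the ball B_2(x*, rho) intersects: the hyperplane
# a_l^T x = b_l meets the ball iff the slack b_l - a_l^T x* is below rho.
def count_intersecting_facets(A, b, x_star, rho):
    slack = b - A @ x_star          # nonnegative whenever x* is feasible
    return int(np.sum(slack < rho))

# Hypothetical data: three unit-norm constraints around x* = 0.
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
b = np.array([1.0, 0.05, 1.0])
M = count_intersecting_facets(A, b, np.zeros(2), rho=0.1)
```

Here only the facet x_2 ≤ 0.05 lies within distance ρ of x*, so M = 1; enlarging ρ increases the count, and with it the compression bound s* + M.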

A priori probabilistic certificates
Consider the scenario game G_K and suppose that bounded deviations from the solution are allowed. We model such deviations as a ball of radius ρ around the equilibrium, as in Section 3.1. In contrast to the a posteriori nature of the result therein, our goal here is to achieve an a priori bound. Namely, we aim at establishing the main statement of Theorem 1 with a pre-specified violation level, which does not depend on the given multi-sample δ_K. In other words, we seek a statement-holding with known confidence-of the form V(Π_K ∩ B(x*, ρ)) < ε, with ε ∈ (0, 1) fixed a priori.
To achieve this we build upon the previous conclusions, which expose a link between the probability of constraint violation and the number M of facets of Π_K (each originated from some uncertainty sample) that B(x*, ρ) intersects. In particular, a monotonic relationship follows from (6): the smaller M, the less conservative the theoretical feasibility guarantees on constraint violation for the strategies belonging to the feasible region S*_K surrounding the equilibrium. Also, a smaller value of M can result in a larger region for which the guarantees of Theorem 1 hold, due to a smaller portion of B(x*, ρ) being cut off by the intersection with Π_K. This motivates us to study the role of M as a modulating parameter for the robustness of the feasibility certificates offered for the region S*_K, as well as for the extent of deviation from the nominal equilibrium covered by such certificates.

GNE-seeking algorithm with a priori robustness guarantees
We consider an iterative scheme to determine a solution of VI_K in (4). In particular, since the problem involves coupling constraints, we build our Algorithm 1 upon a primal-dual scheme, where constraint satisfaction is achieved by the use of Lagrange multipliers; similar developments hold for both GNE and GWE problems. To this end, we define the augmented vector y := (x, µ) ∈ R^{nN+m} by stacking the global decision vector x and the Lagrange multipliers µ = (µ_ℓ)_{ℓ=1}^m ∈ M ⊆ R^m_+.

Algorithm 1 A priori robust GNE seeking algorithm
Require: y^(0) ∈ X × M, τ > 0, ξ > 0, ρ, M
1: κ ← 0
2: repeat
3: y^(κ+1) ← proj_{X×M, D}[y^(κ) − D^{−1} T(y^(κ), ρ, M)]
4: κ ← κ + 1
5: until ∥y^(κ+1) − y^(κ)∥ ≤ ξ
6: y* ← y^(κ+1)
7: return y* and Π_K ∩ B(x*, ρ)

The set M denotes the domain of µ; in the sequel we impose some structure on M once some necessary theoretical ingredients are introduced. As deterministic constraints do not play a role in the evaluation of the robustness guarantees, suppose for ease of exposition that Π_K only comprises uncertain coupling constraints. Let

Π_K = {x ∈ R^{nN} : Ax ≤ b},  (7a)
a_ℓ⊺ a_ℓ = 1, ℓ = 1, . . ., m,  (7b)

where a_ℓ⊺ denotes the ℓ-th row of A. Eq. (7) is the irredundant H-representation of the polytopic feasibility region Π_K defined in (4), where the rows of matrix A are unit vectors. Property (7b) is key to the second statement in Lemma 2. It entails no loss of generality since, for any A, b forming an equivalent H-representation of Π_K, (7) can be obtained by normalising each row of A and the corresponding component of b by the row-vector norm. Thus, the pair (A, b) encodes the set of randomised coupling constraints that constitute facets of Π_K.
The main step of Algorithm 1 (step 3) is a projected gradient descent (ascent) update for x (µ) through the mapping T : R^{nN+m+1} × N → R^{nN+m} given by

T(y, ρ, M) := [F(x) + A⊺µ ; −(Ax − b + Q(µ, M)ρ)],  (8)

which follows from the primal-dual conditions of the game solution; see [19, Sec. 4.2], [20, Sec. 1.4.1]. F is the pseudogradient mapping defined as in Section 2.2, A, b are as in (7), ρ := cρ·1_m, where c is a constant scaling factor (see Sec. 3.2.2), and M is a nonnegative integer. In the second block-row of (8), the m − M least relevant (based on the multipliers' value) coupling constraints are tightened by an amount cρ through the mapping Q : R^m_+ × N → {0, 1}^{m×m}. Finally, the asymmetric projection matrix D ≻ 0 includes the step-size parameter τ > 0 and is defined in (9). Note that the constraint tightening performed in the second block-row of T is equivalent to preventing B(x*, ρ) from intersecting these constraints. In other words, Q ensures that the number of facets of Π_K intersecting B(x*, ρ) is at most M, which in turn enables an a priori estimate of the number of samples that form a compression set for S*_K, and hence of V(Π_K ∩ B(x*, ρ)); this is formalised by Theorems 2 and 3. Since m − M coupling constraints are tightened, smaller values of M can result in a more robust and possibly larger region S*_K; however, they can also move the location of the nominal equilibrium x* to a somewhat less efficient point towards the interior of Π_K. As we will demonstrate numerically in the sequel, this is the case with potential games [21].

Constraint tightening via mapping Q
We define the mapping Q as

Q(µ, M) := P(µ)⊺ R(M) P(µ),  (12)

where
• P : R^m → {0, 1}^{m×m} returns a permutation matrix such that P(µ)µ is the vector composed of the elements of µ arranged in decreasing order.
• R : N → {0, 1}^{m×m} takes as input the number of coupling constraints M ≤ m that we allow B(x*, ρ) to intersect and returns as output the diagonal matrix whose first M diagonal entries are zero and whose remaining m − M diagonal entries are one.

Compatibly with the definition of ρ, Q(µ, M)ρ = P(µ)⊺R(M)P(µ)ρ = cρ·P(µ)⊺R(M)1_m, where the last equality holds since all components of ρ are equal.
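Under the natural reading of this subsection (P sorts the multipliers decreasingly, R selects the m − M smallest, Q = P⊺RP), the selector can be sketched as follows; this explicit form is our reconstruction and should be taken as an assumption.

```python
import numpy as np

# Sketch of the tightening selector Q(mu, M) (assumed form P^T R P): the
# m - M constraints with the smallest multipliers -- the "least relevant"
# ones -- are marked for tightening by c*rho.
def Q(mu, M):
    m = len(mu)
    order = np.argsort(-mu)                   # indices sorting mu decreasingly
    P = np.eye(m)[order]                      # permutation: P @ mu is sorted
    R = np.diag([0.0] * M + [1.0] * (m - M))  # keep only the m - M smallest
    return P.T @ R @ P

mu = np.array([0.2, 0.9, 0.0, 0.5])
sel = np.diag(Q(mu, M=2))   # 1 marks a tightened constraint
```

With M = 2, the two largest multipliers (0.9 and 0.5) are left untouched and the constraints with multipliers 0.2 and 0.0 are tightened; note that Q(µ, M) is a diagonal 0/1 matrix, hence idempotent.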

Convergence analysis and main result
Due to Q, the mapping T is discontinuous on X × R^m_+. To circumvent this, we restrict the multipliers to the set M on which we impose some structure granting continuity of T on X × M. To this end, let Z := [ζ, +∞) ∪ {0}, for some small ζ > 0; i.e., Z ⊂ R contains all nonnegative scalars that are either zero or no smaller than ζ.
Assumption 4 Let Λ be an arbitrarily large compact set. M admits the form

M := {µ ∈ R^m_+ ∩ Λ : (P(µ)µ)_r − (P(µ)µ)_{r+1} ∈ Z for all r = 1, . . ., m − 1, and (P(µ)µ)_m ∈ Z}.  (13)

Recalling that P(µ)µ rearranges the multipliers in descending order, the set M contains all vectors where the difference between every pair of strictly positive components-and the distance of the smallest of these from zero-is no less than ζ. We note that (13) is the union of q = m! + m + 1 disjoint convex subsets of R^m_+, each of which we denote by M_j, i.e., M = ⋃_{j=1}^q M_j; Figure 2 illustrates this set for m = 3. It is therefore possible to compute the projection in line 3 of Algorithm 1 by, e.g., projecting on M_j, for j = 1, . . ., q, and then setting y^(κ+1) to be the solution among these that results in the minimum distance from y^(κ) − D^{−1}T(y^(κ), ρ, M). Still, the projection on M can be computationally intensive if q is large.
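The "project on each subdomain and keep the closest candidate" rule can be sketched on a one-dimensional toy union of an isolated point and an interval, standing in for the convex pieces M_j of M.

```python
import numpy as np

# Projection onto a union of convex sets: project onto each piece
# separately and keep the candidate closest to the query point z.
def proj_union(z, intervals):
    candidates = [np.clip(z, lo, hi) for lo, hi in intervals]
    dists = [abs(c - z) for c in candidates]
    return candidates[int(np.argmin(dists))]

# Toy stand-in for M with m = 1: {0} union [zeta, +inf) truncated by Lambda,
# here with zeta = 0.5 and Lambda = [0, 2].
M_parts = [(0.0, 0.0), (0.5, 2.0)]
```

Points with small positive value snap either to 0 or to ζ, mirroring the perturbation of the multipliers discussed below.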
Imposing on M the structure of (13) endows T with the desired nonexpansiveness properties that are exploited in the proof of Lemma 3. In the numerical implementation of the algorithm, ensuring µ ∈ M can possibly introduce small perturbations in the multipliers-compared to standard formulations where µ ∈ R^m_+-which in turn could produce a slight violation of the constraints (this can be controlled through the magnitude of ζ). We note that M is compact by construction due to the intersection with the compact set Λ in (13), which can, however, be arbitrarily large, thus not impacting the result numerically. Compactness is used in the proof of Theorem 2; Remark 1 discusses cases where this requirement can be lifted.
(2) Let D be as in (9) and set τ > 0 such that condition (15) holds. Then, for any j = 1, . . ., q, Algorithm 1 converges to a solution of VI(X × M_j, T) when the gradient step in line 3 is projected on the corresponding subdomain, for any y^(0) ∈ X × M_j.
Continuity of the mapping is essential for the theoretical convergence of Algorithm 1. The second part of Lemma 3 provides an admissible range of values for τ such that Algorithm 1 converges to a solution of VI(X × M_j, T) if at each iteration the projection in line 3 is performed on the (convex) subdomain M_j ⊂ M, j ∈ {1, . . ., q}.
The stepsize τ is chosen such that conditions standard in NE seeking are satisfied and oscillations among multiple equilibria are avoided. Still, we are interested in establishing convergence on the entire domain M, so at each iteration the projected solution might belong to a different subdomain. This does not trivially follow from the second part of Lemma 3; therefore, by means of Lemmas 2 and 3 we establish an additional condition on τ such that Algorithm 1 retrieves a solution of VI(X × M, T).
Theorem 2 Consider Assumptions 1, 2, 3 and 4. Fix 0 ≤ M ≤ m and assume that the domain Π_K is nonempty for any of the (m choose M) combinations of constraints tightened as in (12). Let D ≻ 0 be defined as in (9), where τ satisfies (15) and (16). Then Algorithm 1 converges to a solution of VI(X × M, T) for any initial condition y^(0) ∈ X × M.
Note that, as µ^(κ) → µ*, we have Q(µ^(κ), M) → Q* := Q(µ*, M). Then, the solution returned by Algorithm 1 is the equilibrium of a variant of G_K with m − M tightened constraints (this follows from (12) with Q(µ) replaced by Q*).
Remark 1 (Relaxing compactness) Theorem 2 still holds with Λ = R^m in the definition of M in (13) if, for all multi-samples, (i) the matrix A is full row-rank, or (ii) all elements of A are positive.
(i) To show this, consider the mapping T and matrix D in (9); the multipliers' update involves a (D-weighted) projection onto M. Since X is compact, there exists a subsequence {κ_i}_{i∈N} such that lim_{i→∞} κ_i = ∞ and lim_{i→∞} x^(κ_i) = x̄, for some x̄ ∈ X. It suffices to show that the sequence of multipliers {µ^(κ_i)}_{i=1}^∞ remains bounded (all arguments in the proof of Theorem 2 from (34) onwards then remain unaltered). For the sake of contradiction, assume that at least one element of µ^(κ_i) tends to infinity along the considered subsequence. Write then µ^(κ_i) = (µ_∞^(κ_i), µ_F^(κ_i)), where, by the contradiction hypothesis, lim_{i→∞} ∥µ_∞^(κ_i)∥ → ∞; taking the first elements of µ^(κ_i) to be the ones that tend to infinity only simplifies notation and is without loss of generality. Let A_∞, b_∞ denote the corresponding partition of A and b, respectively; A_∞, b_∞ are non-empty by hypothesis. To have ∥µ_∞^(κ_i)∥ → ∞, the terms that are integrated in the multipliers' update, i.e., the last two terms in (17), must be positive for all i (in fact, across a subsequence), which, since τ > 0, is equivalent to (18), where (·)_∞ denotes the elements of its argument corresponding to µ_∞^(κ_i). As such, we obtain (19). However, lim_{i→∞} x^(κ_i) = x̄ ∈ X, and (Q(µ^(κ_i), M)ρ)_∞ ≤ cρ for all i, while by Lemma 3, T is continuous over the domain of multipliers satisfying (13). Moreover, µ_F^(κ_i) contains the components of µ^(κ_i) that remain finite. Therefore, the limit as i → ∞ of the right-hand side of (19) is finite. Due to the assumed full row-rank structure of A_∞, this implies ∥µ_∞^(κ_i)∥ < ∞, establishing a contradiction and showing that the subsequence {µ^(κ_i)} remains bounded.
(ii) If all elements of A are positive, then, since a_ℓ⊺ a_ℓ = 1 for all ℓ = 1, . . . , m, all arguments of case (i) remain the same, with the only difference that the boundedness of the right-hand side of (19) follows directly.
The next result accompanies the region S*_K = Π_K ∩ B(x*, ρ) of strategic deviations from the equilibrium x* with a priori probabilistic feasibility guarantees that can be tuned by means of M. It should be noted that Theorem 2 establishes that there exists a choice of τ that guarantees convergence of Algorithm 1. The admissible range of values for τ is explicit via (15) and (16), but difficult to quantify due to R. Numerical evidence suggests that selecting a small enough value is sufficient for convergence.
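In practice, a small enough τ is typically found by trial and error. A minimal back-off heuristic of this kind can be sketched as follows; this is a common practical recipe, not part of the theory above, and run_algorithm is a hypothetical wrapper around Algorithm 1 that reports whether the iterates converged.

```python
def tune_tau(run_algorithm, tau0=1.0, shrink=0.5, max_tries=20):
    """Halve the step size tau until the equilibrium-seeking iteration
    reports convergence; return the first convergent tau and its result."""
    tau = tau0
    for _ in range(max_tries):
        converged, result = run_algorithm(tau)
        if converged:
            return tau, result
        tau *= shrink
    raise RuntimeError("no convergent step size found")
```

Any monotone shrinking schedule works here, since the admissible range in (15), (16) is an interval containing all sufficiently small positive values.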
Theorem 3 Consider Assumptions 1, 2, 3 and 4. Let x* and S*_K = Π_K ∩ B(x*, ρ) be returned by Algorithm 1, and fix ϵ ∈ (0, 1) and M. We then have that (20) holds.
By Definition 2, Theorem 3 guarantees that for any point in S*_K the probability of constraint violation is bounded by ϵ, with confidence at least equal to the quantity on the right-hand side of (20). The dependence of this term on M gives us an additional degree of freedom in trading the robustness of the solution for its associated probabilistic confidence. The choice of M can also have an effect on the size of S*_K, as well as on the location of x*, thus resulting in a trade-off between performance and robustness.
For the case in which the coupling constraints concern exclusively the aggregate variable, it can be shown that the upper limit of the summation on the right-hand side of (20) can be replaced by n + M − 1, where n is the dimension of the aggregate vector. This allows one to state (20) with a much higher confidence; for details, we refer the reader to [40], where the notion of support rank [45] is exploited.
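Since the display (20) is not reproduced above, the sketch below assumes the classical scenario-approach form of the confidence term, namely one minus a binomial tail in the sample size K, with upper summation limit J; the exact expression should be checked against (20), so treat the formula as an assumption.

```python
from math import comb

def scenario_confidence(K, eps, J):
    """Confidence 1 - beta, with beta = sum_{j=0}^{J} C(K,j) eps^j (1-eps)^(K-j),
    the standard scenario-approach binomial tail with J + 1 terms."""
    beta = sum(comb(K, j) * eps**j * (1 - eps) ** (K - j) for j in range(J + 1))
    return 1.0 - beta

# A smaller upper limit (n + M - 1 instead of m + M - 1) yields a higher
# confidence for the same sample size K; illustrative values m=20, n=2, M=3.
conf_general   = scenario_confidence(K=500, eps=0.1, J=20 + 3 - 1)
conf_aggregate = scenario_confidence(K=500, eps=0.1, J=2 + 3 - 1)
```

With K = 500 and ϵ = 0.1 both confidences are close to one, but the aggregate-constraint case dominates, which is the improvement discussed above.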

Numerical example
Consider a game with N agents whose decisions are subject to deterministic local constraints and to uncertain coupling constraints on the aggregate decision, as in (21), where C ≻ αI_n for some α > 0 and d ∈ R^n. Note that a structure similar to our numerical example has been considered in applications of aggregative games such as electric vehicle charging and traffic management under uncertainty [38], [22], [23]. We impose no knowledge of ∆ and P; we rely instead on a scenario-based approximation of the game, whereby each sample δ_k ∈ δ_K gives rise to a sampled constraint pair (a_{δ_k}, b_{δ_k}). Eq. (21) is an aggregative game in the form of (3). In this instance, we assume each agent's action has negligible effect on the aggregate, and accordingly consider a GWE-seeking problem, employing the corresponding mapping F_WE defined in Sec. 2.2. (We note that this case slightly transcends the conditions in Theorem 2, as F does not comply with Assumption 3-(1); convergence of Algorithm 1, following from the nonexpansiveness of T on each subdomain M_j, can still be ensured here due to the affine structure of F; cf. [20, Sec. 12.5.1].)
We employ Algorithm 1 to seek a WE x* such that, by fixing M, a pre-specified theoretical violation level is guaranteed for the set Π_K ∩ B(x*, ρ). Due to uniqueness of σ*, all sets B(·, ρ), parametrised by any x* solving (21), are projected onto the unique ball B(σ*, ρ/N) in the aggregate space. Also note that, by definition of σ, at most n non-redundant samples will contribute to define the domain Π^σ_K in (21). For the derivation of the robustness guarantees, we can thus restrict our attention to S*_K = Π^σ_K ∩ B(σ*, ρ/N) ⊆ R^n. As remarked at the end of Section 3.3, we can apply (20) with the upper limit of the summation involved replaced by n + M − 1. For the case n = 2, N = 50, and different choices of M, Figure 3 depicts the projected iterates {σ(x^(κ))}, κ = 1, 2, . . ., generated by Algorithm 1 on the space Π^σ_K. It can be observed how the region S*_K changes as the value of M is modified.
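To make the setup concrete, the following sketch instantiates an aggregative game of this form with illustrative data: the values of C, d and the box bounds are ours, not the paper's, and the sampled coupling constraints and the tightening step are omitted for brevity. Because the quadratic aggregate objective σ⊺Cσ + d⊺σ is shown below to be a potential for this game, a Wardrop equilibrium can be reached here by projected gradient descent on it rather than by Algorithm 1 itself.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 50, 2
C = np.array([[2.0, 0.3], [0.3, 1.5]])  # C > alpha*I for some alpha > 0
d = np.array([-1.0, -0.5])
lo, hi = np.zeros(n), np.full(n, 3.5)   # local box constraints X_i

def sigma(x):
    """Aggregate decision sigma(x) = (1/N) * sum_i x_i."""
    return x.mean(axis=0)

def potential(x):
    """Aggregate cost E(x) = sigma^T C sigma + d^T sigma."""
    s = sigma(x)
    return s @ C @ s + d @ s

x = rng.uniform(lo, hi, size=(N, n))
tau = 5.0                                    # step size, tuned by trial
for _ in range(3000):
    grad_i = ((C + C.T) @ sigma(x) + d) / N  # dE/dx_i, identical across agents
    x = np.clip(x - tau * grad_i, lo, hi)    # projected gradient step
```

At convergence the aggregate σ(x) sits at the minimiser of the aggregate cost over the feasible region, mirroring the role of σ* in Figure 3.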
It is worth noting that in this case F(x) is integrable; this can be inferred from [20, Thm. 1.3.1], since the Jacobian of the game is symmetric, i.e., ∇_x F(x) = ∇_x F(x)⊺. Therefore, a GWE x* can also be obtained by solving a convex optimisation problem: this game admits a potential function E(x) := σ(x)⊺Cσ(x) + d⊺σ(x), whose minimisers correspond to GWEs. E can be interpreted as the total cost incurred by the population of agents, and its minimisation leads to the optimal social welfare. The contour lines of E are depicted in Figure 3: since x* minimises E(·), σ* lies on the contour associated with the minimum value of E within the feasible domain. Lower values of M result in larger regions for which guarantees are provided. Figure 4 shows how the sequence {E(x^(κ))}_{κ=1,2,...} converges to the minimum potential within the possibly tightened feasibility region. It can be observed how, in this case, the efficiency of the equilibrium decreases as smaller values of M are chosen. The three panels in Figure 4 show the trade-off between system-level efficiency and the guaranteed robustness levels.
The lower the value of M, the lower the empirical constraint violation, corresponding to a better confidence bound on the right-hand side of (20).
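An empirical violation level of this kind can be estimated by Monte Carlo on fresh samples. The sketch below is a generic validation routine; the interface draw_constraint and the linear violation test a⊺p > b are our assumptions, chosen to match the half-space constraints of the example.

```python
import numpy as np

def empirical_violation(points, draw_constraint, n_samples=5000, seed=None):
    """Fraction of freshly drawn constraints (a, b) violated by at least one
    point p of the candidate region, i.e. a @ p > b for some row p of points."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_samples):
        a, b = draw_constraint(rng)     # fresh uncertainty sample delta
        if np.any(points @ a > b):      # some point in the region violates
            hits += 1
    return hits / n_samples
```

Feeding it a grid of points covering S*_K and a sampler for the true uncertainty reproduces the empirical violation levels reported for Figure 4.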

Concluding remarks
This work proposes a data-driven equilibrium-seeking algorithm that provides probabilistic feasibility guarantees for a region surrounding a game equilibrium. These guarantees are a priori, and the region accompanied by such a probabilistic certificate is tunable. For games that admit a potential function, the proposed scheme is shown to achieve a trade-off between cost and the level of probabilistic feasibility guarantees. We conjecture that our scheme returns the most efficient equilibrium for which the predefined guarantees are achieved; proving this conjecture is left for future work. Moreover, current work investigates a distributed implementation of the proposed equilibrium-seeking algorithm.

Proof of Lemma 2
Let µ, z be arbitrary vectors in M and, as in the proof of Lemma 3, define ⃗µ, ⃗z as the vectors composed by rearranging the elements of µ, z in decreasing order. According to this arrangement, let I_µ = {i_1, i_2, . . . , i_m} be the ordered set of indices of µ, i.e., i_k : µ_{i_k} = ⃗µ_k, k = 1, . . . , m; as a result, i_1 and i_m will be the indices of the largest and smallest components of µ, respectively. Applying a similar definition to z, we denote the corresponding set I_z := {j_1, j_2, . . . , j_m}. Then, the first M indices in I_µ and I_z, denoted as L_µ and L_z, respectively, are relative to the constraints not tightened by the application of Q(·, M); in other words, for all ℓ ∈ L_µ, (Q(µ, M)ρ)_ℓ = 0, and similarly for z. Vice versa, the complementary sets L^c_µ and L^c_z collect the indices of the tightened constraints. The sets L_µ ∩ L_z, L_µ ∩ L^c_z, L^c_µ ∩ L_z and L^c_µ ∩ L^c_z are pairwise disjoint and exhaust the set {1, . . . , m}; hence we can write the decomposition in (23). Now, notice that for any i ∈ L^c_µ ∩ L_z ⊆ L^c_µ and j ∈ L_µ ∩ L^c_z ⊆ L_µ, we have by definition of L_µ and L^c_µ that µ_i ≤ µ_j (which by (13) only holds with equality if µ_i = µ_j = 0). With analogous reasoning, we have z_i ≥ z_j for any i ∈ L^c_µ ∩ L_z ⊆ L_z and j ∈ L_µ ∩ L^c_z ⊆ L^c_z. Let h_1 be the cardinality of the set L^c_µ ∩ L_z, and h_2 that of L_µ ∩ L^c_z. Then h_1 = h_2 =: h with 0 ≤ h ≤ M: indeed, |L_µ| = |L_µ ∩ L_z| + h_2 and |L_z| = |L_µ ∩ L_z| + h_1, while |L_µ| = |L_z| = M. This implies U_1 ≤ 0 and U_2 ≤ 0 in (23). We can further observe that U_1 < 0 and U_2 < 0 if L_µ ∩ L^c_z and L^c_µ ∩ L_z are nonempty and the corresponding components of µ and z are nonzero. In such a case h ≥ 1 and we can bound U_1 from above, where the resulting inequality follows from (13) and the above discussion; a similar reasoning holds for U_2. Lastly, note that if µ ̸= z and h ≥ 1, then at least one of U_1 ≤ −hζ and U_2 ≤ −hζ will hold. By (23), we can thus conclude that U ≤ −hζcρ for any µ, z ∈ M, µ ̸= z. ■
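The action of Q(·, M) used throughout this proof, leaving the constraints with the M largest multipliers untightened and tightening the remaining m − M by cρ, can be sketched as follows; the vectorised form and the stable tie-breaking rule are our assumptions based on the description above.

```python
import numpy as np

def Q_rho(mu, M, c, rho):
    """Tightening vector Q(mu, M) * rho: zero at the indices of the M largest
    multipliers (the set L_mu, left untightened), c * rho elsewhere."""
    out = np.full(mu.shape, c * rho)
    L_mu = np.argsort(-mu, kind="stable")[:M]  # indices of the M largest entries
    out[L_mu] = 0.0
    return out
```

For instance, with mu = (0.9, 0.1, 0.5) and M = 1, only the first constraint is left untightened, matching the role of L_µ in the proof.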

Proof of Lemma 3
Part (1): To prove that the mapping T is continuous on its domain, we first notice that T is by construction continuous on X × M whenever the operator Q(·, M) is continuous on M (the parameter M being fixed). Therefore, it is sufficient to show that for any µ, z ∈ M and any η > 0, there exists δ > 0 such that (25) holds, where ρ = cρ 1_m ̸= 0. To this end, consider any µ, z ∈ M such that ∥µ − z∥ < ζ/2, with ζ as defined in (13). Let ⃗µ and ⃗z denote the vectors µ and z sorted in decreasing order; thus, ⃗µ_ℓ is the ℓ-th largest element of µ (and similarly for z). For any given ℓ, let i : µ_i = ⃗µ_ℓ, j : z_j = ⃗z_ℓ, and l := min{ℓ ∈ {1, . . . , m} : i ̸= j}. In words, l is the smallest index for which the ℓ-th largest elements of µ and z do not appear at the same row of their respective vectors. We then let I be the set of indices for which the ordering of the elements of µ and z agrees, i.e., for all k ∈ I, there exists ℓ < l such that i = j = k, with i : µ_i = ⃗µ_ℓ and j : z_j = ⃗z_ℓ.
We prove our statement by contradiction. Suppose there exist i, j ∉ I such that i : µ_i = ⃗µ_ℓ and j : z_j = ⃗z_ℓ for some ℓ > l, where µ_i < µ_j and z_i > z_j. First, we note that such an instance exists by hypothesis, as otherwise the only possible case is i = j, which contradicts i, j ∉ I and implies Q(µ, M) = Q(z, M). Since z ∈ M, it further holds that z_j < z_i − ζ, which leads to (26). We bound (26) from below by noting that z_j > µ_j − ζ/2 and z_i < µ_i + ζ/2, both of which hold since ∥µ − z∥ < ζ/2; combining these bounds yields µ_j − ζ/2 < z_j < z_i − ζ < µ_i − ζ/2, or equivalently µ_j < µ_i, which contradicts our hypothesis. Hence the elements of any pair of vectors µ, z ∈ M with ∥µ − z∥ < ζ/2 must follow the same ordering. By definition of P(·), this implies P(µ) = P(z) and, in turn, ∥Q(µ, M) − Q(z, M)∥ = 0. This validates (25) with δ = ζ/2 and any η > 0, establishing the continuity of Q(·, M) on M and concluding the proof of the first part.
Part (2): We show that the mapping T fulfils certain nonexpansiveness properties required for the convergence of Algorithm 1, for compatible choices of τ. In particular, we provide a sufficient condition under which the iteration
y^(κ+1) = proj_{X×M_j, D}( y^(κ) − D^{-1} T(y^(κ), ρ, M) ), (27)
converges to a solution of VI(X × M_j, T), where j ∈ {1, . . . , q} is fixed, for any y^(0) ∈ X × M_j. Notice that in (27) the skew projection is performed on the convex subdomain X × M_j, and any fixed point of (27) is a solution of VI(X × M_j, T(·, ρ, M)). To ease notation, we drop the dependence of T and T_D on ρ, M, as they remain fixed throughout the proof. According to [20, Thm. 12.5.2] (see also [49, Sec. 4.3]), to ensure convergence of (27) to a solution of VI(X × M_j, T) it is sufficient to show that T_D = T − D is β-cocoercive on U_j, i.e.,
(T_D(v) − T_D(w))⊺(v − w) ≥ β∥T_D(v) − T_D(w)∥², (28)
for some β > 1/2 and all v, w ∈ U_j, j ∈ {1, . . . , q}. In fact, we will go a step further and demonstrate that T_D is cocoercive on the entire U with β > 1/2. Due to the saddle-point structure of the mapping in (8), we adopt the procedure in [20, Prop. 12.5.4] and define D as in (9) (see also [38]). It then follows from the above definitions that T_D(w), for any w ∈ U, reduces to (29), which can be easily seen by rewriting (8) accordingly. Then, for any w_a, w_b ∈ U, we can expand (28) by using (29), obtaining (30) for all y_a, y_b ∈ X × M, where the last equality follows from the definition of U_j and by expanding the norm. Matrix W can be written as W = [W_11, W_12; W_21, W_22], where W_11 ∈ R^{nN×nN}, W_12 ∈ R^{nN×m} and W_22 ∈ R^{m×m}. Expanding the inner product in (30) with respect to the matrix blocks W_11, W_12, W_21, W_22, we obtain (31), where for the last inequality we used, in order, (i) strong monotonicity of F, (ii)-(iii) bounds that follow from the same arguments used in the proof of Lemma 2, and (iv) (p_τ + q_τ)⊺(p_τ + q_τ) ≤ 2(p_τ⊺p_τ + q_τ⊺q_τ). Expanding the term containing p_τ, q_τ in (31) we get (32), where (a) is obtained by applying the Cauchy-Schwarz inequality, and in (b) we use the Lipschitz continuity of F and the triangle inequality. Notice that, for the last term in (32), (33) holds for any choice of τ ∈ (0, max{1/∥A∥, 1}). Recall that, by invoking [20, Thm. 12.5.2], our objective is to show that (28) holds for some τ > 0 and β > 1/2. By inspecting (32) and using (33), to achieve this it is sufficient to guarantee a pair of quadratic inequalities in τ; solving these with respect to τ results in the admissible range of values in (15) (which also satisfies τ ∈ (0, max{1/∥A∥, 1}), required for (33) to hold). Therefore, for any τ satisfying this condition, T_D is cocoercive with β > 1/2 on the entire domain U, which in turn implies that cocoercivity of T_D holds on each subdomain U_j, j = 1, . . . , q, with the same modulus. By [20, Thm. 12.5.2], this is sufficient to guarantee the convergence of (27) to a solution of VI(X × M_j, T), thus concluding the proof. ■

Proof of Theorem 2
Fix any τ satisfying the conditions of Lemma 3 and (16).
Due to our contradiction hypothesis (recall that {κ_i}_{i∈N} is a subsequence), the sequence of iterates generated by Algorithm 1 would be leaving M_1 towards M_2 infinitely often. Denote then by κ̄ > κ the smallest index of the subsequence such that µ^(κ̄) ∈ M_1 but µ^(κ̄+1) ∈ M_2, i.e., after the κ̄-th iterate the original sequence would jump to M_2 (for the first time after κ). For this jump to occur, the unprojected update for the Lagrange multipliers must be "closer" to M_2 than to any other subdomain of M. To see this more formally, let D^{-1}_µ denote the lower block-row of D^{-1} = [τ I_nN, 0; 2τ²A, τ I_m], corresponding to the Lagrange multiplier update in line 3 of Algorithm 1. By definition of M, such a jump requires the Euclidean distance between the unprojected gradient step at κ̄ + 1 and µ^(κ̄) to satisfy (35). Figure 5 illustrates this construction: (35) describes the minimum distance for a jump to occur. This is the case when the ellipsoidal contour levels according to which the projection is performed (the skew projection defined by matrix D) have their major axis aligned between subdomains as in Figure 5 (solid red ellipses). For this two-dimensional example the distance would then be half the width of the white stripe, i.e., ζ/√2.
We rather impose ζ/2 (which is smaller) in (35), to account for the case where one of the subdomains is the origin (M_3). However,
∥(µ^(κ̄) − D^{-1}_µ T(y^(κ̄), ρ, M)) − µ^(κ̄)∥ = τ∥−2τA(F(x^(κ̄)) + A⊺µ^(κ̄)) + Ax^(κ̄) − b + Q(µ^(κ̄), M)ρ∥ = τ∥−2τA(F(x^(κ̄)) − F(x̄_1) + A⊺(µ^(κ̄) − µ̄_1)) + A(x^(κ̄) − x̄_1) + (−2τA(F(x̄_1) + A⊺µ̄_1) + Ax̄_1 − b + Q(µ^(κ̄), M)ρ)∥,
which is then upper-bounded as in (36). Here the first equality follows from the definition of D^{-1}_µ and T, and the second one by adding and subtracting F(x̄_1), A⊺µ̄_1 and Ax̄_1. The first inequality in (36) is due to the triangle inequality, while the last one follows from the previous one by upper-bounding (i) the first two terms using the definition of R; (ii) ∥Q(µ^(κ̄), M)ρ∥ by cρ√(m − M), based on its definition; and (iii) the last three terms using ∥F(x^(κ̄)) − F(x̄_1)∥ ≤ L_F ∥x^(κ̄) − x̄_1∥ (by Assumption 3), together with ∥x^(κ̄) − x̄_1∥ ≤ δ and ∥µ^(κ̄) − µ̄_1∥ ≤ δ. By (36), and choosing τ as in (16), we obtain (37), where K̄ is a constant emanating from the coefficient of δ in (36) when substituting for τ the upper bound in (16). Note that κ̄ is a function of δ, as it depends on κ, which in turn depends on δ. Since δ is arbitrary, taking lim sup_{δ→0} in (37) and lim inf_{δ→0} in (35) establishes a contradiction. Then µ̄_2, µ̄_1 ∈ M_1, i.e., all cluster points should be in the same subdomain of M. As Lemma 3 establishes cocoercivity of T on each subdomain X × M_j, j = 1, . . . , q, it must be that µ̄_2 = µ̄_1, i.e., Ω is a singleton, implying that Algorithm 1 converges. ■

Proof of Theorem 3
The elements of the minimal compression set I of Algorithm 1 can belong to one or both of the following sets: (1) the subset I_1 of samples that form a minimal compression for x*. Note that since Algorithm 1 converges to the point (x*, µ*) for a fixed choice of M, Q(µ*, M) will be a fixed quantity. Then Algorithm 1 will converge to a solution of the problem: find x* ∈ Π̄_K such that (40) holds, where Π̄_K denotes the polytope obtained from Π_K by tightening at most m − M coupling constraints, as dictated by (12) with Q(µ*, M). The constraints in (40) are equivalent to F(x*)⊺x ≥ F(x*)⊺x* for all x ∈ Π̄_K. Then, x* is the minimiser of min_{x ∈ Π̄_K} F(x*)⊺x, which is unique due to Lemma 1. Since the cost function is linear in x and Π̄_K is convex by Assumption 2, we obtain a scenario program as in [11], where the last equality is due to (7b); depending on the choice of norm, c = 1 if B(·, ρ) is expressed by a p-norm with p ≤ 2, and c = √n otherwise. (2) Conversely, at most M constraints can intersect B(x*, ρ) upon convergence of the algorithm. Let L(M) ⊆ {1, . . . , m} contain the indices of the M largest multipliers. Then, ℓ ∈ L(M) ⇔ (Q(µ, M)ρ)_ℓ = 0, and the second block row of T in (8) expresses

Fig. 2. Domain M of the Lagrange multipliers associated with the coupling constraints, for ζ = 0.2 and m = 3. This results in q = 10 convex subsets, including the origin and a portion of the axes.

Fig. 3. Iterates generated by Algorithm 1 (blue diamonds) for different choices of M. In this numerical instance, N = 50, ρ = 10, and X_i := {x_i ∈ R^n : x_i ∈ [x̲_i, x̄_i]}, with x̲_i = (0, 0) and x̄_i = (3.5, 3.5). The randomly generated coupling constraints form the rectangular feasibility region Π^σ_K (delineated by the solid black line). The red-shaded region represents the intersection between the latter and the ball B_1(σ*, ρ/N) around the aggregate equilibrium σ* (red diamond marker). In this instance, its volume increases as larger values for M are chosen. The value associated with the contour lines of the potential function E decreases from top-right to bottom-left, showing that σ* is the unique minimiser in the admissible region (shaded in green) after constraint tightening is performed by the algorithm (see Sec. 3.2.2).

Fig. 4. Potential function E(x^(κ)) evaluated along the iterations of Algorithm 1. Lower values of M yield better confidence on the theoretical robustness certificates for the considered region (see Thm. 3), which results in a lower empirical probability of constraint violation. On the other hand, the system-level efficiency of the equilibrium increases for higher values of M.

Fig. 5. Domain M of the Lagrange multipliers associated with the coupling constraints, for the case m = 2. Notice that the minimum distance ζ between any two subdomains of M involves the origin as one of these subdomains.