Fast generalized Nash equilibrium seeking under partial-decision information

We address the generalized Nash equilibrium seeking problem in a partial-decision information scenario, where each agent can only exchange information with some neighbors, although its cost function possibly depends on the strategies of all agents. The few existing methods build on projected pseudo-gradient dynamics, and require either double-layer iterations or conservative conditions on the step sizes. To overcome both these flaws and improve efficiency, we design the first fully-distributed single-layer algorithms based on proximal best-response. Our schemes are fixed-step and allow for inexact updates, which is crucial for reducing the computational complexity. Under standard assumptions on the game primitives, we establish convergence to a variational equilibrium (with linear rate for games without coupling constraints) by recasting our algorithms as proximal-point methods, opportunely preconditioned to distribute the computation among the agents. Since our analysis hinges on a restricted monotonicity property, we also provide new general results that significantly extend the domain of applicability of proximal-point methods. Besides, the operator-theoretic approach favors the implementation of provably correct acceleration schemes that can further improve the convergence speed. Finally, the potential of our algorithms is demonstrated numerically, revealing much faster convergence with respect to projected pseudo-gradient methods and validating our theoretical findings.


I. INTRODUCTION
Generalized games model the interaction between selfish decision makers, or agents, that aim at optimizing their individual, but inter-dependent, objective functions, subject to shared constraints. This competitive scenario has received increasing attention with the spread of networked systems, due to its numerous engineering applications, including demand response in competitive markets [1], demand-side management in the smart grid [2], charging/discharging of electric vehicles [3] and radio communication [4]. From a game-theoretic perspective, the challenge is to assign the agents behavioral rules that eventually ensure the attainment of an equilibrium. In fact, a recent part of the literature focuses on designing distributed algorithms to seek a generalized Nash equilibrium (GNE), a joint action from which no agent has an interest in unilaterally deviating [5], [6], [7], [8]. In the cited works, the computational effort is partitioned among the agents, but under the assumption that each of them has access to the decisions of all the competitors (or to an aggregate value, in the case of aggregative games). Such a hypothesis, referred to as full-decision information, requires the presence of a central coordinator that can communicate with all the agents, and might be impractical in some domains [9], [10]. One example is the Nash-Cournot competition model described in [11], where the profit of each of a group of firms depends not only on its own production, but also on the total supply, a quantity not directly accessible by any of the firms. A solution is offered by fully-distributed algorithms that can be implemented by relying on peer-to-peer communication only. Specifically, we consider the so-called partial-decision information scenario, where the agents agree on sharing their strategies with some neighbors on a network; based on the knowledge exchanged, they can estimate and eventually reconstruct the actions of all the competitors.
The partial-decision information setup has only been studied very recently. A number of approaches have been proposed for non-generalized games (i.e., in the absence of coupling constraints) [11], [12], [13], [14]. Instead, fewer works deal with the presence of shared constraints, although this is a significant extension, which arises naturally when the agents compete for common resources [5,2]. For example, in the Nash-Cournot model described above, the overall production of the firms is bounded by the market capacity. Of particular interest for this paper is the technique in [15], where the GNE problem is reformulated as that of finding a zero of a monotone operator. Indeed, the operator-theoretic approach is very elegant and convenient: several splitting methods are already well established to solve monotone inclusions, and the properties of fixed-point iterations are well understood [16, §26], thus providing a unified framework to design algorithms and study their convergence. For instance, a fully-distributed method for aggregative games with affine coupling constraints is proposed in [17], based on a preconditioned forward-backward splitting [16, §26.5]. The authors of [18] exploit results on fixed-point iterations with errors [19] to solve generalized aggregative games on time-varying networks. All the aforementioned formulations resort to (projected) gradient and consensus dynamics, and are single-layer (i.e., they require a fixed number of communications per iteration). As a drawback, due to the partial-decision information assumption, theoretical guarantees are obtained only for small (or vanishing) step sizes, which significantly affects the speed of convergence. Alternatively, the work [20] presents a proximal-point algorithm (PPA) to solve (merely monotone) GNE problems, possibly under partial-decision information, but it requires an increasing number of communications at each step.
Similarly, double-layer proximal best-response dynamics are designed in [21] for stochastic games.
However, the extensive communication required may be a performance bottleneck, both if a large number of iterations is needed to converge and if the agents have to send information multiple times at each time step. In fact, the communication time can overwhelm the time spent on useful local processing (this is a common problem in parallel computing, for example [22]). Even neglecting the time lost in transmission, sending large volumes of data over wireless networks results in an increased energy cost.
Contributions: To improve speed and efficiency, we design fast, single-layer, fixed-step, fully-distributed algorithms to solve GNE problems with affine coupling constraints, in a partial-decision information scenario. Our contributions are summarized as follows:
• We derive a novel GNE seeking preconditioned proximal-point algorithm (PPPA), with convergence guarantees under strong monotonicity and Lipschitz continuity of the game mapping. Convergence holds even if the proximal operator is computed inexactly (with summable errors). Our analysis relies on fixed-point iterations and exploits a restricted monotonicity property. Thanks to the use of a novel preconditioning matrix, our algorithm is fully-distributed and requires only one communication per iteration. To the best of our knowledge, our scheme is the first non-gradient-based, single-layer (G)NE seeking method for the partial-decision information setup (Sections III-IV);
• We tailor our method to efficiently solve aggregative games. Specifically, we design a single-layer GNE seeking PPPA where the agents only keep and exchange an estimate of the aggregative value, instead of an estimate of all the other agents' actions (Section V);
• By exploiting our operator-theoretic formulation, we apply some acceleration schemes [23] to our PPPA and provide convergence guarantees. We observe via numerical simulations that the iterations needed to converge can be reduced by up to a factor of two (Section VI);
• Via numerical simulations, we compare our approach to the pseudo-gradient method in [15], which is the only other known fully-distributed, single-layer, fixed-step GNE seeking algorithm (excluding that in [17], for the special class of aggregative games). Our simulations show that our PPPA significantly outperforms the method in [15] in terms of the number of iterations needed to converge, thus considerably reducing the communication burden, at the price of locally solving a strongly convex optimization problem, rather than performing a projection, at each time step. Moreover, our scheme only requires one communication per iteration, instead of two (Section VII).
Basic notation: $\mathbb{N}$ denotes the set of natural numbers, including 0. $\mathbb{R}$ ($\mathbb{R}_{\geq 0}$) is the set of (nonnegative) real numbers. $\boldsymbol{0}_n$ ($\boldsymbol{1}_n$) denotes the vector of dimension $n$ with all elements equal to 0 (1); $I_n \in \mathbb{R}^{n \times n}$ denotes the identity matrix of dimension $n$; the subscripts might be omitted when there is no ambiguity. For a matrix $A \in \mathbb{R}^{n \times m}$, its transpose is $A^\top$, and $[A]_{i,j}$ represents the element on row $i$ and column $j$. $\mathrm{null}(A) = \{x \in \mathbb{R}^m \mid Ax = \boldsymbol{0}_n\}$ and $\mathrm{range}(A) = \{v \in \mathbb{R}^n \mid v = Ax \text{ for some } x \in \mathbb{R}^m\}$ are the null-space and image of $A$, respectively. $\otimes$ denotes the Kronecker product. $\|A\|$ is the largest singular value of $A$, $\|A\|_\infty$ the maximum of the absolute row sums of $A$. $A \succ 0$ stands for a symmetric positive definite matrix. Given $A \succ 0$, $\langle x \mid y \rangle_A = x^\top A y$ denotes the $A$-induced inner product of the vectors $x$ and $y$, and $\|x\|_A = \sqrt{x^\top A x}$ denotes the $A$-induced norm of the vector $x$; we omit the subscript if $A = I$. If $A \in \mathbb{R}^{n \times n}$ is symmetric, $\lambda_{\min}(A) = \lambda_1(A) \leq \dots \leq \lambda_n(A) =: \lambda_{\max}(A)$ denote its eigenvalues. $\mathrm{diag}(A_1, \dots, A_N)$ denotes the block diagonal matrix with $A_1, \dots, A_N$ on its diagonal. Given $N$ vectors $x_1, \dots, x_N$, $\mathrm{col}(x_1, \dots, x_N) = [x_1^\top \dots x_N^\top]^\top$. For a differentiable function $g : \mathbb{R}^n \to \mathbb{R}$, $\nabla_x g(x)$ denotes its gradient. $\ell^1$ is the set of absolutely summable sequences.
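Operator-theoretic background: For a function $\psi : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$, $\mathrm{dom}(\psi) := \{x \in \mathbb{R}^n \mid \psi(x) < \infty\}$. A set-valued mapping (or operator) $\mathcal{F} : \mathbb{R}^n \rightrightarrows \mathbb{R}^n$ is characterized by its graph $\mathrm{gra}(\mathcal{F}) := \{(x, u) \mid u \in \mathcal{F}(x)\}$. $\mathrm{dom}(\mathcal{F}) := \{x \in \mathbb{R}^n \mid \mathcal{F}(x) \neq \emptyset\}$, $\mathrm{fix}(\mathcal{F}) := \{x \in \mathbb{R}^n \mid x \in \mathcal{F}(x)\}$ and $\mathrm{zer}(\mathcal{F}) := \{x \in \mathbb{R}^n \mid \boldsymbol{0} \in \mathcal{F}(x)\}$ denote the domain, set of fixed points and set of zeros, respectively. $\mathcal{F}^{-1}$ denotes the inverse operator of $\mathcal{F}$, defined through its graph as $\mathrm{gra}(\mathcal{F}^{-1}) := \{(u, x) \mid (x, u) \in \mathrm{gra}(\mathcal{F})\}$. $\mathcal{F}$ is ($\mu$-strongly) monotone if $\langle u - v \mid x - y \rangle \geq 0$ ($\geq \mu \|x - y\|^2$), for all $(x, u), (y, v) \in \mathrm{gra}(\mathcal{F})$. $\mathrm{Id}(\cdot)$ denotes the identity operator. For a function $\psi : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$, $\partial \psi : \mathrm{dom}(\psi) \rightrightarrows \mathbb{R}^n$ denotes its subdifferential operator, defined as $\partial \psi(x) := \{v \in \mathbb{R}^n \mid \psi(z) \geq \psi(x) + \langle v \mid z - x \rangle \text{ for all } z \in \mathrm{dom}(\psi)\}$; if $\psi$ is differentiable and convex, its subdifferential operator is its gradient. $N_S : \mathbb{R}^n \rightrightarrows \mathbb{R}^n$ denotes the normal cone operator for the set $S \subseteq \mathbb{R}^n$, i.e., $N_S(x) = \emptyset$ if $x \notin S$, and $N_S(x) = \{v \in \mathbb{R}^n \mid \sup_{z \in S} \langle v \mid z - x \rangle \leq 0\}$ otherwise. If $S$ is closed and convex, it holds that $\partial \iota_S = N_S$, where $\iota_S$ is the indicator function of $S$, and $(\mathrm{Id} + N_S)^{-1} = P_S$ is the Euclidean projection onto the set $S$. $J_{\mathcal{F}} := (\mathrm{Id} + \mathcal{F})^{-1}$ denotes the resolvent operator of $\mathcal{F}$. A single-valued operator $\mathcal{F} : \mathbb{R}^n \to \mathbb{R}^n$ is (quasi)nonexpansive if $\|\mathcal{F}(x) - \mathcal{F}(y)\| \leq \|x - y\|$, and firmly (quasi)nonexpansive if $\|\mathcal{F}(x) - \mathcal{F}(y)\|^2 \leq \|x - y\|^2 - \|(\mathrm{Id} - \mathcal{F})(x) - (\mathrm{Id} - \mathcal{F})(y)\|^2$, for all $x \in \mathbb{R}^n$, $y \in \mathbb{R}^n$ ($y \in \mathrm{fix}(\mathcal{F})$).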

II. MATHEMATICAL SETUP
We consider a set of agents, $\mathcal{I} := \{1, \dots, N\}$, where each agent $i \in \mathcal{I}$ shall choose its decision variable (i.e., strategy) $x_i$ from its local decision set $\Omega_i \subseteq \mathbb{R}^{n_i}$. Let $x := \mathrm{col}((x_i)_{i \in \mathcal{I}}) \in \Omega$ denote the stacked vector of all the agents' decisions, $\Omega = \Omega_1 \times \dots \times \Omega_N \subseteq \mathbb{R}^n$ the overall action space and $n := \sum_{i=1}^{N} n_i$. The goal of each agent $i \in \mathcal{I}$ is to minimize its objective function $J_i(x_i, x_{-i})$, which depends on both the local variable $x_i$ and on the decision variables of the other agents $x_{-i} := \mathrm{col}((x_j)_{j \in \mathcal{I} \setminus \{i\}})$.
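Furthermore, the feasible decision set of each agent depends also on the actions of the other agents via affine coupling constraints. Specifically, the overall feasible set is
$$\mathcal{X} := \Omega \cap \{x \in \mathbb{R}^n \mid Ax \leq b\}, \tag{1}$$
where $A := [A_1, \dots, A_N]$ and $b := \sum_{i=1}^{N} b_i$, with $A_i \in \mathbb{R}^{m \times n_i}$ and $b_i \in \mathbb{R}^m$ being local data. The game is then represented by the inter-dependent optimization problems:
$$\forall i \in \mathcal{I}: \quad \min_{y_i \in \mathbb{R}^{n_i}} J_i(y_i, x_{-i}) \quad \text{s.t.} \quad (y_i, x_{-i}) \in \mathcal{X}. \tag{2}$$
The technical problem we consider here is the computation of a GNE, as formalized next.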
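Definition 1: A collective strategy $x^* = \mathrm{col}((x_i^*)_{i \in \mathcal{I}})$ is a generalized Nash equilibrium if, for all $i \in \mathcal{I}$,
$$J_i(x_i^*, x_{-i}^*) \leq \inf \{ J_i(y_i, x_{-i}^*) \mid (y_i, x_{-i}^*) \in \mathcal{X} \}.$$
Next, we postulate some common regularity and convexity assumptions for the constraint sets and cost functions, see, e.g., [15, Ass. 1], [24, Ass. 1].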
Standing Assumption 1: For each i ∈ I, the set Ω i is non-empty, closed and convex; X is non-empty and satisfies Slater's constraint qualification; J i is continuous and the function J i (·, x −i ) is convex and continuously differentiable for every x −i .
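The v-GNEs are so called because they coincide with the solutions of the variational inequality VI$(F, \mathcal{X})$ (see footnote 2), where $F$ is the pseudo-gradient mapping of the game:
$$F(x) := \mathrm{col}\big((\nabla_{x_i} J_i(x_i, x_{-i}))_{i \in \mathcal{I}}\big). \tag{3}$$
Under Standing Assumption 1, $x^*$ is a v-GNE of the game in (2) if and only if there exists a dual variable $\lambda^* \in \mathbb{R}^m$ such that the following Karush-Kuhn-Tucker (KKT) conditions are satisfied [5, Th. 4.8]:
$$\boldsymbol{0}_n \in F(x^*) + A^\top \lambda^* + N_\Omega(x^*), \qquad \boldsymbol{0}_m \in -(Ax^* - b) + N_{\mathbb{R}^m_{\geq 0}}(\lambda^*). \tag{4}$$
A sufficient condition for the existence of a unique v-GNE for the game in (2) is the strong monotonicity of the pseudo-gradient mapping $F$. It implies strong convexity of the functions $J_i(\cdot, x_{-i})$ for every $x_{-i}$, but not necessarily (strong) convexity of $J_i$ in the full argument.
Standing Assumption 2: The pseudo-gradient mapping in (3) is $\mu$-strongly monotone and $\theta_0$-Lipschitz continuous, for some $\mu, \theta_0 > 0$: for any pair $x, y \in \mathbb{R}^n$, $(x - y)^\top (F(x) - F(y)) \geq \mu \|x - y\|^2$ and $\|F(x) - F(y)\| \leq \theta_0 \|x - y\|$.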
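To make Standing Assumption 2 concrete, consider the special case of an affine pseudo-gradient, $F(x) = Mx + q$, which arises for quadratic costs. The following is a minimal sketch under that assumption; the matrix `M` and the helper `monotonicity_constants` are hypothetical and do not appear in the paper. The strong monotonicity constant is the smallest eigenvalue of the symmetric part of $M$, and the Lipschitz constant is the largest singular value of $M$.

```python
import numpy as np

def monotonicity_constants(M):
    """Return (mu, theta0) for the affine map F(x) = M x + q.

    mu: smallest eigenvalue of the symmetric part of M
        (F is mu-strongly monotone if mu > 0).
    theta0: largest singular value of M (Lipschitz constant of F).
    """
    sym = 0.5 * (M + M.T)
    mu = float(np.linalg.eigvalsh(sym).min())
    theta0 = float(np.linalg.norm(M, 2))
    return mu, theta0

# Hypothetical two-agent example (illustrative data only).
M = np.array([[2.0, 0.5],
              [-0.3, 3.0]])
mu, theta0 = monotonicity_constants(M)
```

Note that $M$ need not be symmetric: only its symmetric part enters the monotonicity constant, which is why a game can satisfy the assumption without being a potential game.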

III. FULLY-DISTRIBUTED EQUILIBRIUM SEEKING
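In this section, we present an algorithm to seek a GNE of the game in (2) in a fully-distributed way. Specifically, each agent $i$ only knows its own cost function $J_i$ and feasible set $\Omega_i$, and a portion of the coupling constraints, namely $(A_i, b_i)$. Moreover, agent $i$ does not have full knowledge of $x_{-i}$, and only relies on the information exchanged locally with some neighbors over an undirected communication network $\mathcal{G}(\mathcal{I}, \mathcal{E})$. The unordered pair $(i, j)$ belongs to the set of edges, $\mathcal{E}$, if and only if agents $i$ and $j$ can mutually exchange information. We denote: $W = [w_{ij}]_{i,j \in \mathcal{I}} \in \mathbb{R}^{N \times N}$ the weighted symmetric adjacency matrix of $\mathcal{G}$, with $w_{ij} > 0$ if $(i, j) \in \mathcal{E}$, $w_{ij} = 0$ otherwise, and the convention $w_{ii} = 0$ for all $i \in \mathcal{I}$; $L = D - W$ the weighted symmetric Laplacian matrix of $\mathcal{G}$, where $D \in \mathbb{R}^{N \times N}$ is the degree matrix of $\mathcal{G}$, i.e., $D = \mathrm{diag}((d_i)_{i \in \mathcal{I}})$ and $d_i = \sum_{j=1}^{N} w_{ij}$, for all $i \in \mathcal{I}$; $\mathcal{N}_i = \{j \mid (i, j) \in \mathcal{E}\}$ the set of neighbors of agent $i$. Moreover, we label the edges $(e_\ell)_{\ell \in \{1, \dots, E\}}$, where $E$ is the cardinality of the edge set $\mathcal{E}$, and we assign to each edge $e_\ell$ an arbitrary orientation. We denote the weighted incidence matrix as $V \in \mathbb{R}^{E \times N}$, where $[V]_{\ell, i} = \sqrt{w_{ij}}$ if $e_\ell = (i, j)$ and $i$ is the output vertex of $e_\ell$, $[V]_{\ell, i} = -\sqrt{w_{ij}}$ if $e_\ell = (i, j)$ and $i$ is the input vertex of $e_\ell$, and $[V]_{\ell, i} = 0$ otherwise, so that $L = V^\top V$. In the partial-decision information scenario, to cope with the lack of knowledge, each agent keeps an estimate of all other agents' actions [27], [28], [15]. We denote $\boldsymbol{x}_i := \mathrm{col}((\boldsymbol{x}_{i,j})_{j \in \mathcal{I}}) \in \mathbb{R}^n$, where $\boldsymbol{x}_{i,i} := x_i$ and $\boldsymbol{x}_{i,j}$ is agent $i$'s estimate of agent $j$'s action, for all $j \neq i$; also, $\boldsymbol{x}_{i,-i} := \mathrm{col}((\boldsymbol{x}_{i,j})_{j \in \mathcal{I} \setminus \{i\}})$. Moreover, each agent keeps an estimate $\lambda_i \in \mathbb{R}^m_{\geq 0}$ of the dual variable and an auxiliary variable $z_i \in \mathbb{R}^m$.
Footnote 1: Informally speaking, a v-GNE is a GNE where the cost of the common limitations is fairly shared; for example, if $\Omega_i = \mathbb{R}$ and $A_i = 1$ for all $i$, the first condition in (4) means that, at a v-GNE, the marginal loss due to the presence of the coupling constraints is the same for each agent, namely $\lambda^* \geq 0$. For an overview on v-GNEs, please refer to [5], [24].
Footnote 2: For an operator $M : \mathbb{R}^n \to \mathbb{R}^n$ and a set $S \subseteq \mathbb{R}^n$, the variational inequality VI$(M, S)$ is the problem of finding a vector $\omega^* \in S$ such that $\langle M(\omega^*) \mid \omega - \omega^* \rangle \geq 0$, for all $\omega \in S$.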
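The graph matrices above can be assembled in a few lines. This is a minimal sketch, not the authors' code: the helper `graph_matrices`, the example edge list, and the square-root weighting of the incidence matrix are assumptions of the sketch (the weighting is chosen so that $L = V^\top V$ holds).

```python
import numpy as np

def graph_matrices(n_agents, weighted_edges):
    """Build W, D, L and a weighted incidence matrix V of an undirected graph.

    weighted_edges: list of (i, j, w_ij) with i != j and w_ij > 0; the order
    (i, j) fixes the arbitrary orientation of each edge.
    Assumption of this sketch: V uses +/- sqrt(w_ij), so that L = V^T V.
    """
    W = np.zeros((n_agents, n_agents))
    V = np.zeros((len(weighted_edges), n_agents))
    for ell, (i, j, w) in enumerate(weighted_edges):
        W[i, j] = W[j, i] = w          # symmetric adjacency, zero diagonal
        V[ell, i] = np.sqrt(w)         # output vertex of edge ell
        V[ell, j] = -np.sqrt(w)        # input vertex of edge ell
    D = np.diag(W.sum(axis=1))         # degree matrix
    L = D - W                          # weighted Laplacian
    return W, D, L, V

# Hypothetical 4-agent ring with arbitrary positive weights.
edges = [(0, 1, 1.0), (1, 2, 0.5), (2, 3, 2.0), (3, 0, 1.5)]
W, D, L, V = graph_matrices(4, edges)
```

For a connected graph, the null space of $L$ is spanned by $\boldsymbol{1}_N$, which is what makes the Laplacian terms in the algorithm vanish exactly at consensus.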
Our proposed dynamics are summarized in Algorithm 1, where the global parameter $\alpha > 0$ and the step sizes $\tau_i$, $\delta_i$, for all $i \in \mathcal{I}$, and $\nu$ have to be chosen appropriately (see Section IV). We note that in Algorithm 1 the agents evaluate their cost functions at their local estimates, not at the actual collective strategy.
In steady state, agents should agree on their estimates, i.e., $\boldsymbol{x}_i = \boldsymbol{x}_j$, $\lambda_i = \lambda_j$, for all $i, j \in \mathcal{I}$. This motivates the presence of consensual terms for both primal and dual variables. We denote $\boldsymbol{E}_q := \{\boldsymbol{y} \in \mathbb{R}^{Nq} : \boldsymbol{y} = \boldsymbol{1}_N \otimes y, \ y \in \mathbb{R}^q\}$ the consensual space of dimension $q$ and $\boldsymbol{E}_q^\perp$ its orthogonal complement, for any integer $q > 0$. Specifically, $\boldsymbol{E}_n$ is the estimate consensus subspace and $\boldsymbol{E}_m$ is the dual variable consensus subspace.
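As a small illustration of the consensual space, the orthogonal projection onto $\boldsymbol{E}_q$ replaces every block of a stacked vector with the average of the blocks. A minimal sketch (the helper name `consensus_projector` and the random data are hypothetical):

```python
import numpy as np

def consensus_projector(N, q):
    """Orthogonal projector onto E_q = {1_N (x) y : y in R^q}."""
    ones = np.ones((N, 1))
    return np.kron(ones @ ones.T / N, np.eye(q))

N, q = 3, 2
P = consensus_projector(N, q)
rng = np.random.default_rng(0)
y = rng.standard_normal(N * q)   # stacked vector col(y_1, ..., y_N)
y_par = P @ y                    # component in E_q: every block equals the average
y_perp = y - y_par               # component in the orthogonal complement
```

The norm of `y_perp` measures the disagreement among the agents' estimates; it is zero exactly when all the blocks coincide.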
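Remark 1: The function $J_i(\cdot, \boldsymbol{x}_{i,-i})$ is strongly convex, for all $\boldsymbol{x}_{i,-i}$, for all $i \in \mathcal{I}$, as a consequence of Standing Assumption 2. Therefore the argmin operator in Algorithm 1 is single-valued, and the algorithm is well defined.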

IV. DERIVATION AND CONVERGENCE ANALYSIS
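[Algorithm 1: at each iteration, the agents exchange the variables $\{x_i^k, \boldsymbol{x}_{i,-i}^k, \lambda_i^k\}$ with their neighbors; each agent $i \in \mathcal{I}$ then performs a distributed averaging step, followed by its local updates.]
In this section, we derive Algorithm 1 as a PPPA and show its convergence by leveraging a restricted monotonicity property. Before going into details, we need some definitions. We denote $\boldsymbol{x} = \mathrm{col}((\boldsymbol{x}_i)_{i \in \mathcal{I}})$. Besides, let us define, as in [15, Eq. 13-14], for all $i \in \mathcal{I}$,
$$\mathcal{R}_i := \begin{bmatrix} 0_{n_i \times n_{<i}} & I_{n_i} & 0_{n_i \times n_{>i}} \end{bmatrix}, \qquad \mathcal{S}_i := \begin{bmatrix} I_{n_{<i}} & 0_{n_{<i} \times n_i} & 0_{n_{<i} \times n_{>i}} \\ 0_{n_{>i} \times n_{<i}} & 0_{n_{>i} \times n_i} & I_{n_{>i}} \end{bmatrix},$$
where $n_{<i} := \sum_{j < i, j \in \mathcal{I}} n_j$, $n_{>i} := \sum_{j > i, j \in \mathcal{I}} n_j$. In simple terms, $\mathcal{R}_i$ selects the $i$-th $n_i$-dimensional component from an $n$-dimensional vector, while $\mathcal{S}_i$ removes it. Thus, $\mathcal{R}_i \boldsymbol{x}_i = x_i$ and $\mathcal{S}_i \boldsymbol{x}_i = \boldsymbol{x}_{i,-i}$. Let $\mathcal{R} := \mathrm{diag}((\mathcal{R}_i)_{i \in \mathcal{I}})$ and $\mathcal{S} := \mathrm{diag}((\mathcal{S}_i)_{i \in \mathcal{I}})$, so that $x = \mathcal{R} \boldsymbol{x}$. We define the extended pseudo-gradient mapping $\boldsymbol{F}$ as
$$\boldsymbol{F}(\boldsymbol{x}) := \mathrm{col}\big((\nabla_{x_i} J_i(x_i, \boldsymbol{x}_{i,-i}))_{i \in \mathcal{I}}\big), \tag{7}$$
and the operators $\boldsymbol{F}_{\mathrm{a}}$ and $\mathcal{A}$ [...] (8)-(9). The following lemma relates the unique v-GNE of the game in (2) to the zeros of the operator $\mathcal{A}$. The proof is analogous to [15, Th. 1] or Lemma 10 in Section V, and hence it is omitted.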
Lemma 1: The following statements hold: [...]

A. Derivation of the algorithm
Lemma 1 is fundamental, because it allows us to recast the GNE problem as that of computing a zero of the mapping $\mathcal{A}$ in (9). In turn, this can be efficiently done by applying standard operator-splitting methods [16, §26-28]. By following this approach, fully-distributed GNE seeking dynamics were developed by the authors of [15], [18]. In effect, in this section we show that Algorithm 1 is also an instance of the PPA [16, Th. 23.41], applied to seek a zero of the (suitably preconditioned) operator $\mathcal{A}$.
Nonetheless, technical difficulties arise because of the partial-decision information setup. Specifically, the operator $\mathcal{A}$ is not monotone in general, not even if strong monotonicity of the pseudo-gradient mapping $F$ holds, i.e., Standing Assumption 2. This is due to the fact that, in the extended pseudo-gradient in (7), the partial gradient $\nabla_{x_i} J_i(x_i, \boldsymbol{x}_{i,-i})$ is evaluated at the local estimate $\boldsymbol{x}_{i,-i}$, and not at the actual value $x_{-i}$. Only when the estimates $\boldsymbol{x}$ belong to the consensus subspace, i.e., $\boldsymbol{x} = \boldsymbol{1}_N \otimes x$ (namely, the estimates of each agent coincide with the actual value of $x$), we have that $\boldsymbol{F}(\boldsymbol{x}) = F(x)$. We remark that many operator-theoretic properties are not guaranteed for the resolvent $J_{\mathcal{B}} = (\mathrm{Id} + \mathcal{B})^{-1}$ of a nonmonotone operator $\mathcal{B} : \mathbb{R}^q \rightrightarrows \mathbb{R}^q$. By definition, it still holds that $\mathrm{zer}(\mathcal{B}) = \mathrm{fix}(J_{\mathcal{B}})$, but $J_{\mathcal{B}}$ may have a limited domain, or not be single-valued. In this general case, we write the PPA as
$$(\forall k \in \mathbb{N}) \quad \omega^{k+1} \in J_{\mathcal{B}}(\omega^k), \tag{10}$$
which is well defined only if $J_{\mathcal{B}}(\omega^k) \neq \emptyset$ for all $k$.
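For intuition on the PPA, consider the simplest well-behaved case: an affine, strongly monotone operator $\mathcal{B}(\omega) = M\omega - c$, whose resolvent is available in closed form, $J_{\mathcal{B}}(\omega) = (I + M)^{-1}(\omega + c)$. The following minimal sketch is a toy example with hypothetical data (`M`, `c`, and `resolvent_affine` are not from the paper, and the operator here is monotone, unlike $\mathcal{A}$); it iterates the resolvent until it reaches the unique zero.

```python
import numpy as np

def resolvent_affine(M, c):
    """Resolvent J_B = (Id + B)^{-1} of the affine operator B(w) = M w - c.

    u = J_B(w) solves u + M u - c = w, i.e. u = (I + M)^{-1} (w + c).
    """
    K = np.linalg.inv(np.eye(M.shape[0]) + M)
    return lambda w: K @ (w + c)

# Hypothetical data: the symmetric part of M is the identity, so B is
# strongly monotone even though M itself is not symmetric.
M = np.array([[1.0, 2.0], [-2.0, 1.0]])
c = np.array([1.0, -1.0])
J = resolvent_affine(M, c)

w = np.zeros(2)
for _ in range(200):
    w = J(w)                      # proximal-point step: w^{k+1} = J_B(w^k)

w_star = np.linalg.solve(M, c)    # the unique zero of B
```

The fixed points of `J` are exactly the zeros of the operator, which is the property that survives even without monotonicity; what is lost in the nonmonotone case is the guarantee that the resolvent is single-valued and has full domain.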
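Next, we show that Algorithm 1 is obtained by applying the iteration in (10) to the operator $\Phi^{-1} \mathcal{A}$, where the matrix $\Phi$ in (11) is called preconditioning matrix, and the step sizes $\bar{\tau} = \mathrm{diag}((\tau_i I_n)_{i \in \mathcal{I}})$, $\bar{\nu} = \nu I_{Em}$, $\bar{\delta} = \mathrm{diag}((\delta_i I_m)_{i \in \mathcal{I}})$ have to be chosen such that $\Phi$ is positive definite. In this case, it also holds that $\mathrm{zer}(\Phi^{-1} \mathcal{A}) = \mathrm{zer}(\mathcal{A})$. Sufficient conditions that ensure $\Phi \succ 0$ are provided in the next lemma, which follows from Gershgorin's circle theorem.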
In the following, we always assume that the step sizes in Algorithm 1 are chosen such that $\Phi \succ 0$. Then, we are able to formulate the following result.
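Lemma 3: Algorithm 1 is equivalent to the iteration
$$(\forall k \in \mathbb{N}) \quad \omega^{k+1} \in J_{\Phi^{-1} \mathcal{A}}(\omega^k), \tag{12}$$
with $\mathcal{A}$ as in (9), $\Phi$ as in (11): for any initial condition [...]
Proof. By definition of the inverse operator, we have that [...] (13). In turn, the first inclusion in (13) can be split into two components by left-multiplying both sides by $\mathcal{R}$ and $\mathcal{S}$. By noticing that $\mathcal{S} N_{\Omega} = \boldsymbol{0}_{(N-1)n}$, $\mathcal{R} \mathcal{R}^\top = I_n$ and $\mathcal{S} \mathcal{R}^\top = 0_{(N-1)n \times n}$, we get [...]. Therefore, since the zeros of the subdifferential of a (strongly) convex function coincide with the minima (unique minimum) [16, Th. 16.3], (13) can be rewritten as (14): $\forall i \in \mathcal{I}$: [...]. The conclusion follows by defining $z^k := V_m^\top v^k$, where $z_i \in \mathbb{R}^m$ are local auxiliary variables kept by each agent, provided that $z^0 = V_m^\top v^0$. The latter is ensured by $z^0 = \boldsymbol{0}_{Nm}$, as in Algorithm 1.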

Remark 2: The preconditioning matrix $\Phi$ is designed to make the system of inclusions in (13) block triangular, i.e., to remove the terms $W_n \boldsymbol{x}^{k+1}$ and $\mathcal{R}^\top A^\top \lambda^{k+1}$ from the first inclusion, and the term $V_m \lambda^{k+1}$ from the second: in this way, $\boldsymbol{x}_i^{k+1}$ and $z^{k+1}$ do not depend on $\boldsymbol{x}_j^{k+1}$, for $i \neq j$, or $\lambda^{k+1}$. This ensures that the resulting iteration can be computed by the agents in a fully-distributed fashion. Furthermore, the change of variable $z = V_m^\top v$ reduces the number of auxiliary variables and decouples the dual update in (14) from the graph structure.
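Remark 3: By Lemma 3, Remark 1 and by the explicit form of the resolvent $J_{\Phi^{-1} \mathcal{A}}$ in (14), we conclude that $J_{\Phi^{-1} \mathcal{A}}$ is single-valued and $\mathrm{dom}(J_{\Phi^{-1} \mathcal{A}}) = \mathbb{R}^{Nn + Em + Nm}$.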

B. Convergence analysis
The convergence of Algorithm 1 cannot be inferred from standard results for the PPA, because the operator $\mathcal{A}$ (or $\Phi^{-1} \mathcal{A}$) is not monotone in general. The loss of monotonicity is the main technical difficulty that arises when studying (G)NE seeking under partial-decision information, and it is due to the fact that $\mathcal{R}^\top \boldsymbol{F}(\boldsymbol{x})$ is very rarely monotone in cases of interest (see Appendix D). However, a restricted strong monotonicity property holds for the operator $\boldsymbol{F}_{\mathrm{a}}$ in (8), which was exploited, e.g., in [29], [15], [27]. Analogously, we make use of a restricted monotonicity property of the operator $\mathcal{A}$, which can be guaranteed for any game satisfying Standing Assumptions 1-3, without additional hypotheses, as formalized in the next two statements.
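Moreover, the operator $\Phi^{-1} \mathcal{A}$ retains this property, in the space induced by the inner product $\langle \cdot \mid \cdot \rangle_\Phi$.
Lemma 6: Let $\alpha_{\max}$ be as in (15) and assume that $\alpha \in (0, \alpha_{\max}]$ is chosen. Then $\Phi^{-1} \mathcal{A}$ is restricted monotone, with respect to $\mathrm{zer}(\mathcal{A})$, in the $\Phi$-induced space: for all $(\omega, u) \in \mathrm{gra}(\Phi^{-1} \mathcal{A})$ and all $(\omega^*, u^*) \in \mathrm{gra}(\Phi^{-1} \mathcal{A})$ such that $\omega^* \in \mathrm{zer}(\mathcal{A})$, it holds that
$$\langle u - u^* \mid \omega - \omega^* \rangle_\Phi \geq 0.$$
Proof. By definition, $(\omega, \Phi u), (\omega^*, \Phi u^*) \in \mathrm{gra}(\mathcal{A})$. Hence the restricted monotonicity in Lemma 5 reads as $0 \leq \langle \Phi u - \Phi u^* \mid \omega - \omega^* \rangle = \langle u - u^* \mid \omega - \omega^* \rangle_\Phi$.
Based on the restricted monotonicity property in Lemma 6, in the remainder of the section we show that the iteration in (12) converges to a point in $\mathrm{fix}(J_{\Phi^{-1} \mathcal{A}}) = \mathrm{zer}(\Phi^{-1} \mathcal{A}) = \mathrm{zer}(\mathcal{A})$. Our analysis is based on an existing result for iterations of firmly quasinonexpansive (FQNE) operators, which is reported next for readability.
Lemma 7: [...] Let $\omega^0 \in \mathcal{H}$ and set:
$$(\forall k \in \mathbb{N}) \quad \omega^{k+1} = \omega^k + \gamma^k \big(T(\omega^k) - \omega^k + e^k\big). \tag{16}$$
Then the following statements hold: [...]$_{k \in \mathbb{N}} \in \ell^1$. (iii) Suppose that every cluster point of $(\omega^k)_{k \in \mathbb{N}}$ belongs to $C$. Then, $(\omega^k)_{k \in \mathbb{N}}$ converges to a point in $C$.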
We already noted in Remark 3 that the operator $J_{\Phi^{-1} \mathcal{A}}$ is single-valued, with $\mathrm{dom}(J_{\Phi^{-1} \mathcal{A}}) = \mathbb{R}^{Nn + Em + Nm}$. However, to be able to apply the previous lemma to the iteration in (12), we still need the following two lemmas.
Proof. By Lemma 3, we can equivalently study the convergence of the iteration in (12). In turn, (12) can be rewritten as (16) with $T = J_{\Phi^{-1} \mathcal{A}}$ and $\gamma^k = 1$, $e^k = \boldsymbol{0}$, for all $k \in \mathbb{N}$, since $J_{\Phi^{-1} \mathcal{A}}$ is firmly quasinonexpansive with respect to the norm $\| \cdot \|_\Phi$ by Lemma 8 and it has full domain by Remark 3. Moreover, $\mathrm{fix}(J_{\Phi^{-1} \mathcal{A}}) = \mathrm{zer}(\mathcal{A})$ by definition of resolvent, and hence $\mathrm{fix}(J_{\Phi^{-1} \mathcal{A}}) \neq \emptyset$ by Lemma 1. The sequence $(\omega^k)_{k \in \mathbb{N}}$ is bounded by Lemma 7(i). Therefore, $(\omega^k)_{k \in \mathbb{N}}$ admits at least one cluster point, say $\bar{\omega}$; let $(k_n)_{n \in \mathbb{N}}$ denote a nondecreasing diverging subsequence such that $(\omega^{k_n})_{n \in \mathbb{N}}$ converges to $\bar{\omega}$. Since $J_{\Phi^{-1} \mathcal{A}}(\omega^{k_n}) - \omega^{k_n} \to 0$ by Lemma 7(ii), and by the continuity of $J_{\Phi^{-1} \mathcal{A}}$ in Lemma 9, it follows that $J_{\Phi^{-1} \mathcal{A}}(\bar{\omega}) = \bar{\omega}$. Therefore all the cluster points of $(\omega^k)_{k \in \mathbb{N}}$ belong to $\mathrm{fix}(J_{\Phi^{-1} \mathcal{A}})$, and the convergence of (12) to an equilibrium follows by Lemma 7(iii). The conclusion follows by Lemma 1.
Remark 4: Algorithm 1 requires each agent $i$ to solve an optimization problem to compute $x_i^{k+1}$, at each iteration. However, from Lemma 7 and by inspection of the proof of Theorem 1, it is evident that the convergence result in Theorem 1 still holds if an approximation is used in place of the exact solution of the argmin in Algorithm 1, provided that the errors with respect to the exact solution of the optimization, $e_i^k$, are absolutely summable, i.e., $(\|e_i^k\|)_{k \in \mathbb{N}} \in \ell^1$, for all $i \in \mathcal{I}$. Further, the optimization problems are strongly convex, hence they can be efficiently solved via iterative algorithms.
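The effect of summable errors can be visualized on a toy proximal-point iteration. The following minimal sketch does not implement Algorithm 1: it uses a hypothetical affine operator $\mathcal{B}(\omega) = M\omega - c$ and perturbs each resolvent evaluation with an error of norm $1/(k+1)^2$, which is an absolutely summable sequence. All names and data are illustrative.

```python
import numpy as np

# Hypothetical strongly monotone affine operator B(w) = M w - c.
M = np.array([[1.0, 2.0], [-2.0, 1.0]])
c = np.array([1.0, -1.0])
K = np.linalg.inv(np.eye(2) + M)      # resolvent: J_B(w) = K (w + c)
w_star = np.linalg.solve(M, c)        # unique zero of B

rng = np.random.default_rng(1)
w = np.array([5.0, -5.0])
errors = []
for k in range(300):
    e = rng.standard_normal(2)
    e *= 1.0 / ((k + 1) ** 2 * np.linalg.norm(e))   # ||e^k|| = 1/(k+1)^2
    w = K @ (w + c) + e                             # inexact resolvent step
    errors.append(np.linalg.norm(w - w_star))
```

The distance to the solution in `errors` still vanishes, although each step is only approximate. In practice, a decaying tolerance of this kind could be used as the stopping criterion of the inner solver.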

V. AGGREGATIVE GAMES
In aggregative games, $n_i = \bar{n} > 0$ for all $i \in \mathcal{I}$ (hence $n = N \bar{n}$) and the cost function of each agent depends only on its local decision and on the value of the average strategy $\mathrm{avg}(x) := \frac{1}{N} \sum_{i \in \mathcal{I}} x_i$. Therefore, for each $i \in \mathcal{I}$, there is a function $f_i : \mathbb{R}^{\bar{n}} \times \mathbb{R}^{\bar{n}} \to \mathbb{R}$ such that the original cost function $J_i$ in (2) can be written as
$$J_i(x_i, x_{-i}) = f_i(x_i, \mathrm{avg}(x)).$$
Since an aggregative game is only a particular instance of the game in (2), all the considerations on the existence and uniqueness of a v-GNE and equivalence with the KKT conditions in (4) are still valid. Moreover, Algorithm 1 could still be used to compute a v-GNE. This would require each agent to keep (and exchange) an estimate of all other agents' actions, i.e., a vector of $(N-1)\bar{n}$ components. In practice, however, the cost of each agent is only a function of the aggregative value $\mathrm{avg}(x)$, whose dimension $\bar{n}$ is independent of the number $N$ of agents. To reduce the communication and computation burden, in this section we introduce an algorithm specifically tailored to seek a v-GNE in aggregative games, which is scalable with the number of agents. The proposed iteration is shown in Algorithm 2, where the parameters $\alpha$, $\tau_i$, $\delta_i$ for all $i \in \mathcal{I}$, and $\nu$, $\beta$, have to be chosen appropriately and we used the notation
$$\tilde{F}_i(x_i, \xi_i) := \nabla_{x_i} f_i(x_i, \xi_i) + \tfrac{1}{N} \nabla_{\xi_i} f_i(x_i, \xi_i).$$
We note that $\tilde{F}_i(x_i, \mathrm{avg}(x)) = \nabla_{x_i} J_i(x_i, x_{-i})$.

Remark 5:
The update of the local actions x k i in Algorithm 2 is the solution of an inclusion, i.e., the problem of finding a zero of an operator. Equivalently, the problem can be reformulated as the variational inequality VI(g k i (y), Ω i ), where g k i is the mapping in the square brackets in (19). In this section, we prove that, for appropriate choices of the parameters, a solution exists and it is unique (hence the algorithm is well defined) and we suggest methods to find it. Moreover, we show how the update can be simplified for many problems of interest. For the time being, we note that the x k i update only depends on locally available variables, hence it does not require communication among the agents.
Since the agents rely on local information only, they do not have access to the actual value of the average strategy. To cope with the lack of information, each agent is equipped with an auxiliary error variable $s_i \in \mathbb{R}^{\bar{n}}$, which is an estimate of the quantity $\mathrm{avg}(x) - x_i$. Each agent aims at reconstructing the true aggregate value, based on the information received from its neighbors. In particular, it should hold that $s^k \to \boldsymbol{1}_N \otimes \mathrm{avg}(x^k) - x^k$ asymptotically, where $s := \mathrm{col}((s_i)_{i \in \mathcal{I}})$. For brevity of notation, we also denote $\sigma_i := x_i + s_i$ and $\sigma := \mathrm{col}((\sigma_i)_{i \in \mathcal{I}}) = x + s$.
Remark 6: By the updates in Algorithm 2, we can immediately infer an invariance property of the iteration, namely that $\mathrm{avg}(s^k) = \boldsymbol{0}_{\bar{n}}$, or equivalently that $\mathrm{avg}(x^k) = \mathrm{avg}(\sigma^k)$, for any $k \in \mathbb{N}$, provided that the algorithm is initialized appropriately, i.e., $s_i^0 = \boldsymbol{0}_{\bar{n}}$, for all $i \in \mathcal{I}$. In fact, the update of $\sigma^k$ (as it follows from Algorithm 2) reads as [...], where $L_{\bar{n}} := L \otimes I_{\bar{n}}$. This update can be regarded as a dynamic tracking for the time-varying quantity $\mathrm{avg}(x)$ [11].
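The invariance property of Remark 6 can be checked numerically on a generic dynamic average tracking step. The update used below, $\sigma^{k+1} = \sigma^k - \beta L_{\bar{n}} \sigma^k + (x^{k+1} - x^k)$, is an assumption of this sketch (a standard tracking form, not necessarily the exact update of Algorithm 2); the graph, the gain `beta` and the random action changes are also hypothetical. The invariance only relies on $\boldsymbol{1}_N^\top L = \boldsymbol{0}$.

```python
import numpy as np

N, nbar, beta = 4, 2, 0.2
# Hypothetical ring graph with unit weights.
W = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W
L_nbar = np.kron(L, np.eye(nbar))          # L (x) I_nbar

def avg(v):
    """Average of the N blocks of a stacked vector."""
    return v.reshape(N, nbar).mean(axis=0)

rng = np.random.default_rng(0)
x = rng.standard_normal(N * nbar)
sigma = x.copy()                           # s^0 = 0, hence sigma^0 = x^0

gaps = []
for k in range(50):
    x_next = x + 0.1 * rng.standard_normal(N * nbar)   # arbitrary action change
    # Assumed tracking step: consensus correction plus local increment.
    sigma = sigma - beta * L_nbar @ sigma + (x_next - x)
    x = x_next
    gaps.append(np.linalg.norm(avg(sigma) - avg(x)))
```

Every entry of `gaps` is zero up to rounding: the average of the local estimates always equals the true average, whatever the actions do, while the consensus term drives the individual estimates toward that average.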
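We define the extended pseudo-gradient mapping as $\tilde{\boldsymbol{F}}(x, \xi) := \mathrm{col}\big((\tilde{F}_i(x_i, \xi_i))_{i \in \mathcal{I}}\big)$ for $\xi = \mathrm{col}((\xi_i)_{i \in \mathcal{I}}) \in \mathbb{R}^n$, and the operator $\tilde{\mathcal{A}}$ [...], where $\alpha > 0$ is a fixed design parameter, $\omega := \mathrm{col}(x, s, v, \lambda) \in \mathbb{R}^{2n + Em + Nm}$ and we recall that $\sigma = x + s$ is just a shorthand notation. We remark that $\tilde{\boldsymbol{F}}(x, \boldsymbol{1}_N \otimes \mathrm{avg}(x)) = F(x)$.
The rest of the section is devoted to the convergence analysis of Algorithm 2. Similarly to Section IV, we first show that the iteration is indeed a PPPA applied to find a zero of the operator $\tilde{\mathcal{A}}$. Then, we restrict our analysis to the invariant subspace [...]. We start by characterizing the zeros of the operator $\tilde{\mathcal{A}}$.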
Lemma 10: The following statements hold: [...]
Next, similarly to Lemma 5, we show a restricted monotonicity property for the operator $\tilde{\mathcal{A}}$.
Proof. See Appendix C.
Remark 7: The update in (19) is implicitly defined by a strongly monotone inclusion (see the proof of Theorem 2). We remark that there are a number of iterative methods (with linear convergence rate) to find the unique solution of this inclusion [16, §26], or the equivalent VI [25, §12], for example the forward-backward splitting [25, §12.5.1], [16, §26.5] (similar to a projected-gradient method). We note that the agents can seek a solution in a completely decentralized way, as the computation is only local. Also, the VIs have a low dimension $\bar{n}$ (differently from [20], where a subgame of dimension $n$ has to be solved at every step). Moreover, as in Remark 4, convergence is guaranteed even if the solution is only approximated at each step (with summable error).
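As an illustration of the forward-backward splitting mentioned in Remark 7, the following minimal sketch solves a strongly monotone VI over a box with the iteration $y \leftarrow P_\Omega(y - \gamma g(y))$. The affine mapping `g`, the box $[0, 1]^2$ and the helper `projected_gradient_vi` are hypothetical and stand in for the mapping $g_i^k$ and the local set $\Omega_i$; the step $\gamma = \mu / \theta^2$ is a standard choice that makes the iteration a contraction for a $\mu$-strongly monotone, $\theta$-Lipschitz mapping.

```python
import numpy as np

def projected_gradient_vi(g, lower, upper, y0, step, iters=500):
    """Forward-backward iteration y <- P_box(y - step * g(y)) for VI(g, box)."""
    y = np.asarray(y0, dtype=float)
    for _ in range(iters):
        y = np.clip(y - step * g(y), lower, upper)   # clip = projection onto the box
    return y

# Hypothetical strongly monotone affine mapping g(y) = Q y + r.
Q = np.array([[3.0, 1.0], [-1.0, 2.0]])
r = np.array([-4.0, 1.0])
g = lambda y: Q @ y + r

mu = np.linalg.eigvalsh(0.5 * (Q + Q.T)).min()   # strong monotonicity constant
theta = np.linalg.norm(Q, 2)                     # Lipschitz constant
y_sol = projected_gradient_vi(g, 0.0, 1.0, np.zeros(2), step=mu / theta**2)
```

A point solves the VI if and only if it is a fixed point of the projected step, so the residual `y_sol - np.clip(y_sol - step * g(y_sol), 0, 1)` gives a simple stopping test for an inexact inner loop.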
[...] analogously to Lemma 3. The integrability condition always holds for scalar functions, i.e., if $\bar{n} = 1$. Another notable case is the class of aggregative games with cost functions [...] for some function $f_i$ and symmetric matrix $Q_i$; this setup models a number of applications, e.g., the Nash-Cournot game described in [11] or the resource allocation problem considered in [32]. In this case, the functions $a_i$ are [...]. Nevertheless, the agents do not need to compute an explicit expression for $a_i$, as the optimization problem in (26) can be solved via (projected) gradient-based methods.

VI. ACCELERATIONS
In Section IV, we showed that Algorithm 1 can be recast (modulo the change of variables $z = V_m^\top v$) in the form (16), i.e., with $T = J_{\Phi^{-1} \mathcal{A}}$ a single-valued continuous FQNE mapping. This compact operator representation allows for some modifications of Algorithm 1, according to well-known acceleration schemes [23], that can increase its convergence speed. For the sake of example, in this section we propose three accelerated versions of Algorithm 1, and we present convergence results. Other solutions could be obtained by combining different schemes [23].
Proof. The theorem follows directly from Lemma 7, as for Theorem 1, by choosing $\gamma^k = \eta$ for all $k \in \mathbb{N}$.

B. Inertia
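The inertial version of Algorithm 1 is summarized in Algorithm 4, obtained (modulo the change of variable $z = V_m^\top v$) by expanding the compact form
$$\tilde{\omega}^k = \omega^k + \rho (\omega^k - \omega^{k-1}), \qquad \omega^{k+1} = T(\tilde{\omega}^k),$$
with $\rho \geq 0$ the inertia parameter.
Theorem 4: Let $\alpha \in (0, \alpha_{\max}]$, with $\alpha_{\max}$ as in (15), and let the step sizes $\bar{\tau}$, $\bar{\nu}$, $\bar{\delta}$ be chosen as in Lemma 2. Then, for any $\rho \in [0, \frac{1}{3})$, the sequence $(\boldsymbol{x}^k, z^k, \lambda^k)_{k \in \mathbb{N}}$ generated by Algorithm 4 converges to an equilibrium $(\boldsymbol{x}^*, z^*, \lambda^*)$, where $\boldsymbol{x}^* = \boldsymbol{1}_N \otimes x^*$ and $x^*$ is the v-GNE of the game in (2).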
Proof (sketch). The proof is analogous to that of [33, Th. 5], which is formulated for nonexpansive operators; however, in the finite dimensional case, the proof holds for continuous quasinonexpansive operators. Also, it can be shown that $\rho$ satisfies the bound in [33, Eq. 6].

C. Alternated inertia
The proposed scheme is shown in Algorithm 5 and reads in compact form, modulo the change of variable $z = V_m^\top v$, as
$$\omega^{k+1} = \begin{cases} T(\omega^k) & \text{if } k \text{ is even}, \\ T\big(\omega^k + \rho (\omega^k - \omega^{k-1})\big) & \text{if } k \text{ is odd}. \end{cases}$$
Theorem 5: Let $\alpha \in (0, \alpha_{\max}]$, with $\alpha_{\max}$ as in (15), and let the step sizes $\bar{\tau}$, $\bar{\nu}$, $\bar{\delta}$ be chosen as in Lemma 2. Then, for any $\rho \in [0, 1)$, the sequence $(\boldsymbol{x}^k, z^k, \lambda^k)_{k \in \mathbb{N}}$ generated by Algorithm 5 converges to an equilibrium $(\boldsymbol{x}^*, z^*, \lambda^*)$, where $\boldsymbol{x}^* = \boldsymbol{1}_N \otimes x^*$ and $x^*$ is the v-GNE of the game in (2).
Proof (sketch). The proof follows analogously to the proof of [23, Lemma 3.3], by exploiting that $T$ is FQNE and continuous (the original result is formulated for firmly nonexpansive operators).
[Algorithm 3: for all $k > 0$, the agents exchange the variables $\{x_i^k, \boldsymbol{x}_{i,-i}^k, \lambda_i^k\}$ with their neighbors; each agent $i \in \mathcal{I}$ then updates its local auxiliary variables.]
[Algorithm 4: for all $k > 0$, each agent $i \in \mathcal{I}$ updates its local auxiliary variables; the agents exchange the variables $\{\tilde{x}_i^k, \tilde{\boldsymbol{x}}_{i,-i}^k, \tilde{\lambda}_i^k\}$ with their neighbors; each agent then updates its local variables.]
[Algorithm 5 (Fully-distributed v-GNE seeking via alternating inertial PPPA): for all $k > 0$, set $\tilde{\rho}^k = 0$ if $k$ is even, $\tilde{\rho}^k = \rho$ otherwise; each agent $i \in \mathcal{I}$ updates its auxiliary variables; the agents exchange the variables $\{\tilde{x}_i^k, \tilde{\boldsymbol{x}}_{i,-i}^k, \tilde{\lambda}_i^k\}$ with their neighbors; each agent then updates its local variables.]
Remark 9: An analogous analysis can be carried out to obtain accelerated versions of Algorithm 2.
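The three schemes of this section can be compared on a toy fixed-point problem. The sketch below is not an implementation of Algorithms 3-5: it takes as $T$ the resolvent of a hypothetical affine operator and applies overrelaxation, inertia and alternated inertia in their compact forms. The parameter values `eta = 1.5`, `rho = 0.3` and `rho_alt = 0.9` are illustrative choices, and the alternated scheme applies the inertial term at odd iterations only.

```python
import numpy as np

# Hypothetical monotone affine operator w -> M w - c and its resolvent T.
M = 0.05 * np.array([[1.0, 2.0], [-2.0, 1.0]])
c = np.array([1.0, -1.0])
K = np.linalg.inv(np.eye(2) + M)
T = lambda w: K @ (w + c)
w_star = np.linalg.solve(M, c)           # unique fixed point of T

def run(step, max_iter=2000, tol=1e-8):
    """Iterate w^{k+1} = step(k, w^k, w^{k-1}); return iterations to reach tol."""
    w_prev = w = np.zeros(2)
    for k in range(max_iter):
        if np.linalg.norm(w - w_star) < tol:
            return k
        w_prev, w = w, step(k, w, w_prev)
    return max_iter

eta, rho, rho_alt = 1.5, 0.3, 0.9
iters = {
    "plain": run(lambda k, w, wp: T(w)),
    "overrelaxed": run(lambda k, w, wp: w + eta * (T(w) - w)),
    "inertial": run(lambda k, w, wp: T(w + rho * (w - wp))),
    "alternated": run(lambda k, w, wp: T(w + (rho_alt if k % 2 else 0.0) * (w - wp))),
}
```

The dictionary `iters` reports how many iterations each variant needs to reach the tolerance on this toy problem; the relative gains are problem dependent and should not be read as representative of the game setting.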

VII. NUMERICAL EXAMPLES

A. Nash-Cournot game
We consider the Nash-Cournot game presented in [15,6]. $N$ firms produce a commodity that is sold to $m$ markets. Each firm $i \in \mathcal{I} = \{1, \dots, N\}$ is only allowed to participate in $n_i \leq m$ of the markets, and the decision variables of each firm are the quantities $x_i \in \mathbb{R}^{n_i}$ of commodity to be delivered to these $n_i$ markets. The quantity of product that each firm can deliver to the markets is bounded by the local constraints $\boldsymbol{0}_{n_i} \leq x_i \leq X_i$. Moreover, each market has a maximal capacity $r_k$, for $k = 1, \dots, m$. This results in the shared affine constraint $Ax \leq r$, with $r = \mathrm{col}((r_k)_{k = 1, \dots, m})$ and $A = [A_1 \dots A_N]$, where $A_i \in \mathbb{R}^{m \times n_i}$ is the matrix that expresses which markets firm $i$ participates in. Specifically, the $j$-th column of $A_i$ has its $k$-th element equal to 1 if $[x_i]_j$ is the amount of product sent to the $k$-th market by agent $i$, for all $j = 1, \dots, n_i$; all the other elements are 0. Therefore, $Ax = \sum_{i=1}^{N} A_i x_i \in \mathbb{R}^m$ is the vector of the quantities of total product delivered to each market. Each firm $i$ aims at maximizing its profit, i.e., minimizing the cost function
$$J_i(x_i, x_{-i}) = c_i(x_i) - p(Ax)^\top A_i x_i.$$
Here, $c_i(x_i) = x_i^\top Q_i x_i + q_i^\top x_i$ is firm $i$'s production cost, with $Q_i \in \mathbb{R}^{n_i \times n_i}$, $Q_i \succ 0$, $q_i \in \mathbb{R}^{n_i}$. Instead, $p : \mathbb{R}^m \to \mathbb{R}^m$ associates to each market a price that depends on the amount of product delivered to that market. Specifically, the price for the market $k$, for $k = 1, \dots, m$, is $[p(Ax)]_k = \bar{P}_k - \chi_k [Ax]_k$. We set $N = 20$, $m = 7$. The market structure is defined as in [15, Fig. 1], which specifies which firms are allowed to participate in which of the $m$ markets. Therefore, $x = \mathrm{col}((x_i)_{i \in \mathcal{I}}) \in \mathbb{R}^n$ and $n = 32$. The firms cannot access the production of all the competitors, but they are allowed to communicate with their neighbors on a randomly generated connected graph. We select randomly with uniform distribution $r_k$ in $[1, 2]$, $Q_i$ diagonal with diagonal elements in $[1, 8]$, $q_i$ in $[1, 2]$, $\bar{P}_k$ in $[10, 20]$, $\chi_k$ in $[1, 3]$, $X_i$ in $[5, 10]$, for all $i \in \mathcal{I}$, $k = 1, \dots, m$.
The resulting setup satisfies all our theoretical assumptions [15,VI]. We compute α max according to (15) as α max = 0.0043 and we choose the step sizes as in Lemma 2 to satisfy all the conditions of Theorem 1.
We compare the performance of Algorithm 1 with that of the gradient-based method in [15, Alg. 1], which is, to the best of our knowledge, the only other available single-layer scheme to solve generalized games in the partial-decision information setup. Specifically, in [15, Alg. 1] a gain $c > 1/\alpha_{\max}$ has to be chosen, together with some step sizes $\tau$, $\nu$, $\sigma$: we set all the parameters such that the step sizes are as large as possible, provided that the conditions in [15, Th. 2] are satisfied. This results in very small step sizes, e.g., $\tau^* \approx 3 \times 10^{-9}$ (and $c^* \approx 400$).
The results are illustrated in Figure 1, where the two algorithms are initialized with the same random initial conditions. With the parameters that guarantee theoretical convergence, [15, Alg. 1] is very slow, due to the small step sizes, while our PPPA method converges much faster. However, in our numerical experience, the bounds on the parameters are rather conservative, and we observe faster convergence for larger step sizes. For [15, Alg. 1], the fastest convergence is attained by setting $c$ 100 times smaller and the step sizes $10^7$ times larger than the theoretical values; for larger values of the step sizes, convergence is lost. We also remark that Algorithm 1 requires only one round of communication per iteration (the agents exchange the variables $x_i$, $\lambda_i$), while [15, Alg. 1] requires two rounds of communication (the agents exchange $x_i$, $\lambda_i$, then they update and exchange the variables $z_i$).
To improve the convergence speed of Algorithm 1, we apply the acceleration schemes discussed in Section VI. We choose the parameters that ensure convergence. The impact is remarkable, up to halving the number of iterations needed for convergence, as shown in Figure 2.

B. Charging of plug-in electric vehicles
We consider the charging scheduling problem for a group of plug-in electric vehicles, modeled by an aggregative game as in [3]. Each user $i$, $i \in \mathcal{I} = \{1, \dots, N\}$, can plan the charging of its vehicle during a temporal horizon of 24 hours, discretized into $\bar{n}$ intervals. Specifically, each user aims at choosing the energy injections $x_i \in \mathbb{R}^{\bar{n}}$ of each time interval to minimize its cost, given by the quadratic function
$$J_i(x_i, x_{-i}) = q_i x_i^\top x_i + c_i^\top x_i + (a(\mathrm{avg}(x) + d) + b\mathbf{1}_{\bar{n}})^\top x_i,$$
where $q_i x_i^\top x_i + c_i^\top x_i$ represents the battery degradation cost, while $(a(\mathrm{avg}(x) + d) + b\mathbf{1}_{\bar{n}})$ is the cost of energy, with $b > 0$ a baseline price, $a > 0$ the inverse of the price elasticity of demand and $d \in \mathbb{R}^{\bar{n}}$ the inelastic base demand (not related to vehicle charging) along the temporal horizon. We assume a maximum injection per interval for each vehicle and a desired final charge level for each user, resulting in the local constraints $\Omega_i = [0_{\bar{n}}, \bar{x}_i] \cap \{y \in \mathbb{R}^{\bar{n}} \mid \mathbf{1}_{\bar{n}}^\top y = \gamma_i\}$. Moreover, we consider the transmission line constraints $0_{\bar{n}} \le \sum_{i \in \mathcal{I}} x_i \le \bar{c}$. We choose $N = 10$, $\bar{n} = 12$. As in [3], for all $i \in \mathcal{I}$, we select $q_i$ in $[0.002, 0.006]$, $c_i$ in $[0.055, 0.095]$, $\gamma_i$ in $[0.6, 1]$ with uniform distribution; $[\bar{x}_i]_j = 0.25$ with probability 20%, $[\bar{x}_i]_j = 0$ otherwise. We set $[\bar{c}]_j$ as 0.4 if $j \in \{1, 2, 3, 11, 12\}$, as 1 otherwise (corresponding to more restrictive limitations during the daytime); $a = 0.038$, $b = 0.06$ and $d$ as in [3].
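The energy price and the constraints described above can be evaluated with a few lines of code. The following minimal sketch computes the price vector $a(\mathrm{avg}(x) + d) + b\mathbf{1}$ and checks the local and transmission line constraints for a given charging profile. The function names, the tolerance and the toy data (2 users, 3 intervals, zero base demand) are hypothetical and chosen only for illustration.

```python
def average(xs):
    """Average strategy avg(x): one value per time interval."""
    n_agents = len(xs)
    return [sum(x[t] for x in xs) / n_agents for t in range(len(xs[0]))]


def energy_price(xs, a, b, d):
    """Price vector a * (avg(x) + d) + b for each time interval."""
    avg = average(xs)
    return [a * (avg[t] + d[t]) + b for t in range(len(d))]


def locally_feasible(x_i, x_max_i, gamma_i, tol=1e-9):
    """Check 0 <= x_i <= x_max_i and sum(x_i) = gamma_i (final charge level)."""
    in_box = all(-tol <= v <= ub + tol for v, ub in zip(x_i, x_max_i))
    return in_box and abs(sum(x_i) - gamma_i) <= tol


def coupling_feasible(xs, c_bar, tol=1e-9):
    """Check the transmission line constraints 0 <= sum_i x_i <= c_bar."""
    totals = [sum(x[t] for x in xs) for t in range(len(c_bar))]
    return all(-tol <= s <= cap + tol for s, cap in zip(totals, c_bar))


# Toy instance (hypothetical data): 2 users, 3 time intervals.
xs = [[0.25, 0.25, 0.1], [0.25, 0.0, 0.25]]
x_max = [[0.25, 0.25, 0.25], [0.25, 0.25, 0.25]]
gamma = [0.6, 0.5]
price = energy_price(xs, a=0.038, b=0.06, d=[0.0, 0.0, 0.0])
local_ok = all(locally_feasible(x, ub, g) for x, ub, g in zip(xs, x_max, gamma))
loose_ok = coupling_feasible(xs, c_bar=[1.0, 1.0, 1.0])
tight_ok = coupling_feasible(xs, c_bar=[0.4, 1.0, 1.0])
```

In this toy instance both users satisfy their local constraints, but the total injection in the first interval (0.5) violates the tighter line capacity of 0.4. The agents must handle this coupling constraint jointly.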
Because of the quadratic structure of the cost functions, Standing Assumptions 1-2 hold for any choice of $a > 0$, $q_i > 0$. We let the agents communicate over a randomly generated connected graph. Figure 3 shows the performance of Algorithm 2 compared to [17, Alg. 1] (which requires two rounds of communication per iteration). Our PPPA significantly outperforms the gradient-based method when the step sizes are set at their theoretical upper bounds.

VIII. CONCLUSION
Preconditioned proximal-point methods are suitable to design fully-distributed generalized Nash equilibrium seeking algorithms. Our algorithms proved much faster than the existing pseudo-gradient-based methods, at least in our numerical

By the explicit expression of $J_{\Phi^{-1}A}$ in (14), by Lipschitz continuity of the projection and by recalling that the composition of continuous functions is continuous, we just have to show that the mappings $\omega^k = \mathrm{col}(x^k, v^k, \lambda^k) \mapsto x_i^{k+1}$ are continuous, for all $i \in \mathcal{I}$. Specifically, we have that … $w_{ij} x_{j,-i})$. By continuity of $p_i$ on $\mathbb{R}^{Nn}$ and Assumption 1, $g_i$ is a continuous function relative to the set $\mathbb{R}^{n_i} \times \mathbb{R}^{Nn+Em+Nm}$. Moreover, $g_i$ is level bounded in $y$ uniformly in $\omega$ ([34, Def. 1.16]), i.e., for all $\bar{\omega} \in \mathbb{R}^{Nn+Em+Nm}$ and $\gamma \in \mathbb{R}$ there exists a neighborhood $V$ of $\bar{\omega}$ such that the set $\{(y, \omega) \mid \omega \in V, \ g_i(y, \omega) \le \gamma\}$ is bounded in $\mathbb{R}^{n_i} \times \mathbb{R}^{Nn+Em+Nm}$ (this follows since $g_i$ is continuous and strongly convex in $y$ for every $\omega$). Therefore, the map $h_i$ is outer semicontinuous and locally bounded [34, Cor. 7.42]. This is equivalent to $h_i$ being continuous, since $h_i$ is also single-valued [34, Cor. 5.20].

C. Proof of Theorem 2
Analogously to Lemma 3, it can be shown that Algorithm 2 is equivalent to the iteration $\omega^{k+1} \in J_{\Phi^{-1}\tilde{A}}(\omega^k)$, modulo the transformation $z^k = V_m v^k$. In particular, we next show that the inclusion in (19) has a unique solution. First, we notice that the problem is equivalent to solving the variational inequality VI$(g_i(\cdot, \vartheta^k), \Omega_i)$, where $\vartheta^k = (x^k, s^{k+1}, s^k, \lambda^k)$, and $g_i(y, \vartheta^k) := \tau_i y + \alpha \tilde{F}_i(y, y + s^{k+1}$ … is continuous. Moreover, $g_i$ is strongly monotone in $y$ for any value of $\vartheta^k$. In fact, $\tilde{F}_i$ is $\tilde{\theta}$-Lipschitz, because $\tilde{F}$ is $\tilde{\theta}$-Lipschitz by Lemma 11. Therefore, we can write $(y - y')^\top (g_i(y, \vartheta^k) - g_i(y', \vartheta^k)) \ge (\tau_i - \alpha\sqrt{2}\tilde{\theta}) \|y - y'\|^2$, for any $y, y' \in \mathbb{R}^n$, and the conclusion follows from the assumption on $\alpha$. Hence, for all $k \in \mathbb{N}$, the VI$(g_i(\cdot, \vartheta^k), \Omega_i)$ (and thus the inclusion in (19)) admits a unique solution [25, Th. 2.3.3]; moreover, the solution map $\vartheta^k \mapsto x^{k+1}$ is continuous in $\vartheta^k$ [35, Th. 4.1]. Therefore, as in Lemma 9, the operator $J_{\Phi^{-1}\tilde{A}}$ is single-valued, continuous, and $\mathrm{dom}(J_{\Phi^{-1}\tilde{A}}) = \mathbb{R}^{2n+Em+Nm}$.
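The role of strong monotonicity in the uniqueness argument above can be illustrated on a scalar toy problem. The sketch below is not Algorithm 2: it solves a one-dimensional variational inequality VI$(g, [lo, hi])$ by a standard projection iteration, for the hypothetical map $g(y) = \tau y + \alpha \sin(y) - 2$, where the 1-Lipschitz function $\sin$ stands in for the Lipschitz term. This $g$ is strongly monotone with constant $\tau - \alpha > 0$. All names and numerical values are assumptions made for illustration.

```python
import math


def solve_vi_1d(g, lo, hi, mu, lip, y0, iters=500):
    """Projection iteration y <- proj_[lo,hi](y - step * g(y)).

    For a mu-strongly monotone, lip-Lipschitz map g, the step mu / lip**2
    makes the iteration a contraction, so the limit does not depend on y0.
    """
    step = mu / lip ** 2
    y = y0
    for _ in range(iters):
        y = min(max(y - step * g(y), lo), hi)
    return y


TAU, ALPHA = 1.0, 0.5  # hypothetical values with ALPHA < TAU


def g(y):
    """Toy map: TAU * y plus ALPHA times a 1-Lipschitz term, minus a constant."""
    return TAU * y + ALPHA * math.sin(y) - 2.0


mu, lip = TAU - ALPHA, TAU + ALPHA

# Interior solution on [0, 5]: both starting points reach the same zero of g.
interior_a = solve_vi_1d(g, 0.0, 5.0, mu, lip, y0=0.0)
interior_b = solve_vi_1d(g, 0.0, 5.0, mu, lip, y0=5.0)

# Constrained solution on [0, 1]: g is negative on the whole interval,
# so the unique solution sits at the upper bound.
boundary = solve_vi_1d(g, 0.0, 1.0, mu, lip, y0=0.3)
```

When $\alpha$ approaches $\tau$ the strong monotonicity constant vanishes, which mirrors why the proof needs an upper bound on $\alpha$.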

D. Monotonicity of the extended pseudo-gradient
Monotonicity of the operator R F , with F being the extended pseudo-gradient in (7), has been sometimes postulated to show convergence of (G)NE seeking algorithms under partial-decision information [27,Ass. 4(i)], [36,Ass. 5], [37,Ass. 4]. Next, we show that this is a very restrictive condition.
For simplicity, we assume that the costs J i are twice continuously differentiable, for all i ∈ I. The derivative with respect to the full argument (Jacobian) of the operator Since D is block diagonal, we can equivalently check this condition for the single blocks. Without loss of generality, let us consider only the first block (for the other blocks, the following applies modulo row and column permutations), for which . . .
We conclude that the operator R F is monotone if and only if $\frac{\partial^2 J_i}{\partial x_i \partial x_j}(x_i) = 0_{n_i \times n_j}$ for all $x_i \in \mathbb{R}^n$, all $i \in \mathcal{I}$, $j \in \mathcal{I} \setminus \{i\}$. This only holds if the cost function of each agent does not depend on the actions of the other agents (or if the part of the cost function that depends on the actions of the other agents is a separable addend).
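The conclusion above can be checked on a minimal example with two scalar agents. The sketch assumes the hypothetical cost $J_1(x_1, x_2) = x_1^2 + \kappa x_1 x_2$, and assumes that the first block of the operator places the partial gradient of $J_1$, evaluated at agent 1's estimate $(x_{1,1}, x_{1,2})$, in agent 1's own coordinate and zero elsewhere. Under these assumptions the Jacobian of the block is $[[2, \kappa], [0, 0]]$, and monotonicity fails as soon as $\kappa \neq 0$.

```python
import math


def first_block(u, kappa):
    """Assumed first block: gradient of J_1 at the estimate u = (x_11, x_12),
    placed in agent 1's own coordinate; the other coordinate is zero."""
    return (2.0 * u[0] + kappa * u[1], 0.0)


def monotonicity_gap(u, v, kappa):
    """Inner product (T(u) - T(v))' (u - v); monotonicity requires it >= 0."""
    Tu, Tv = first_block(u, kappa), first_block(v, kappa)
    return sum((a - b) * (c - d) for a, b, c, d in zip(Tu, Tv, u, v))


def min_eig_symmetric_part(kappa):
    """Smallest eigenvalue of the symmetric part [[2, k/2], [k/2, 0]] of the
    Jacobian [[2, k], [0, 0]]; the eigenvalues are 1 +/- sqrt(1 + k^2 / 4)."""
    return 1.0 - math.sqrt(1.0 + kappa ** 2 / 4.0)


# With coupling (kappa = 1) the gap is negative for a suitable pair of points.
gap_coupled = monotonicity_gap((1.0, -3.0), (0.0, 0.0), kappa=1.0)
# Without coupling (kappa = 0) the same pair gives a nonnegative gap.
gap_decoupled = monotonicity_gap((1.0, -3.0), (0.0, 0.0), kappa=0.0)
```

The smallest eigenvalue of the symmetric part is zero for $\kappa = 0$ and strictly negative otherwise, so in this example any coupling between the agents destroys monotonicity, consistent with the condition on the mixed second derivatives.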