Hierarchical Noncooperative Dynamical Systems Under Intragroup and Intergroup Incentives

In this article, a framework for hierarchical noncooperative systems with dynamic agents is proposed. In the characterized framework, agents in each group are incentivized by a corresponding group manager who represents the benefits of the group utility via an intragroup incentive mechanism. The coefficients in the intragroup incentive functions are characterized as the group manager's strategy. Update rules that the group managers can adopt are proposed based on the local state and payoff information, with either continual or intermittent observation of the state of the agents in the other groups. Sufficient conditions under which the trajectory of the agents' state converges toward the group Nash equilibrium are derived for the proposed update rules. Furthermore, to improve the social welfare of the entire system, we propose an intergroup incentive scheme at the group managers' level that allows a system governor to bring the agents' state to a target equilibrium. To deal with the system governor's uncertain information about the agents' personal payoff functions, we present sufficient conditions that guarantee the convergence of the agents' state to the target equilibrium. Three numerical examples are provided to illustrate the efficacy of our approach.


Yuyue Yan, Member, IEEE, and Tomohisa Hayakawa, Member, IEEE

Index Terms—Gradient play, incentive mechanism, information hierarchy, large-scale system, noncooperative systems.

I. INTRODUCTION
Coordination issues between individual interests and the social interest have become critically important in the study of multiagent systems in the coming smart society. To investigate such coordination issues, game theory has been used as one of the disciplines concerning the relations between human decision-making and the resulting phenomena as a whole [1], [2]. In noncooperative systems, each agent is presumed to be fully rational and selfish and, hence, aims to increase its own payoff by adjusting its individual state in the system. Under this presupposition, the selfish agents' continuous dynamic decision behaviors in a noncooperative system are typically modeled by pseudogradient dynamics.

The authors are with the Department of Systems and Control Engineering, Tokyo Institute of Technology, Meguro 152-8552, Japan (e-mail: yan.y.ac@m.titech.ac.jp; hayakawa@sc.e.titech.ac.jp).
For the aggregation of such self-interested agents, it has turned out that the imposition of explicit incentive mechanisms changes the agents' decision-making tendencies and, hence, results in endogenously cooperative behaviors in noncooperative systems. For instance, Alpcan et al. [17] designed pricing mechanisms to achieve social utility maximization for selfish agents driven by pseudogradient dynamics. Yan and Hayakawa [7], [18] presented a framework where a system manager knowing all the agents' payoff functions imposes explicit incentive mechanisms to stabilize an unstable Nash equilibrium and a socially maximum state. However, for a noncooperative system with a large number of agents, the requirement that a single system manager know all the agents' payoff functions is extremely stringent. To deal with this problem, hierarchical structures consisting of a system governor (e.g., a president) and multiple managers (e.g., mayors) often exist in our society, where the agents are divided into several groups controlled by the corresponding group managers. In those structures, the system governor usually observes only limited information from each of the groups, but the group managers know more specific information about their own groups. To our knowledge, the theoretical analysis of pseudogradient-based noncooperative dynamic systems with hierarchical incentives has not yet been considered in the literature.
Some hierarchical structures of incentive mechanisms or noncooperative systems can be found in [19], [20], [21], and [22]. For example, Ng et al. [20] considered a two-level incentive mechanism design problem to mitigate the straggler effects in federated learning training tasks. Mukaidani and Xu [21] studied incentive Stackelberg games with multiple leaders and followers for a class of stochastic linear systems with external disturbance, where several agents take the position of leaders and the rest take the position of followers, so that the outcome of the entire system depends on the state of both the leaders and the followers. Alternatively, in the economics literature, delegation games describe a different situation in which principals choose a compensation scheme for their agents while the latter play a game on behalf of the principals [22]. In such a case, the payoffs of all players (i.e., principals and agents) are determined by the actions chosen by the agents.
In this article, we focus on the social welfare improvement problem for large-scale hierarchical noncooperative dynamical systems driven by pseudogradient dynamics. Specifically, we assume that the agents in the noncooperative system belong to one of several groups and are influenced by the corresponding group managers via intragroup incentives. We characterize the situation where the group managers try to enhance the welfare of their own groups by continually updating their intragroup incentives to the group members. Different from the preliminary version [23], this article generalizes the results to the case with nonquadratic payoff functions and different sensitivities in the pseudogradient dynamics. We explore the stability of the group Nash equilibrium of the hierarchical noncooperative system and derive conditions under which the trajectory of the agents' state converges to the group Nash equilibrium under the group managers' intragroup incentives. Furthermore, we propose an intergroup incentive mechanism for a system governor that reconstructs the group utility functions at the group managers' level so as to move the group Nash equilibrium and thereby improve the social (entire) welfare. To deal with the situation where the system governor may know neither all the agents' individual payoff functions nor all the agents' states, we present sufficient conditions that guarantee the convergence of the agents' state toward a target equilibrium (suboptimal due to the lack of full information) using some macroscopic data. Our numerical example reveals the connection between the proposed three-layer hierarchical incentive structure and the two-layer incentive structure characterized in [7] for a large-scale Cournot game.
The rest of this article is organized as follows. We explain hierarchical noncooperative systems with dynamic agents under intragroup incentives in Section II. In Section III, we propose a couple of update rules for the group managers to update their intragroup incentives. Furthermore, in Section IV, we characterize the intergroup incentive mechanisms in the manager layer to increase the social welfare of the entire multiagent system. Three illustrative numerical examples are presented in Section V. Finally, Section VI concludes this article.
Notations: We use the following notations in the article. We write Z_0 for the set of nonnegative integers, Z_+ for the set of positive integers, R for the set of real numbers, R_+ for the set of positive real numbers, R^n for the set of n × 1 real column vectors, and R^{n×m} for the set of n × m real matrices. Moreover, (·)^T denotes transpose, (block-)diag[·] denotes a (block-)diagonal matrix, 0_n denotes an n × n zero matrix, and I_n and 1_n denote the identity matrix and the ones vector of dimension n, respectively. Finally, A < 0 denotes the fact that the matrix A is negative definite, [row_i(A)]_{i∈I} denotes a matrix with the rows composed of the i(∈ I)th rows of matrix A, ‖x‖ = √(x^T x) denotes the Euclidean norm of a vector x, ‖A‖ denotes the matrix norm of a matrix A, and He(·) denotes the Hermitian part of a matrix.

A. System Description
Consider the hierarchical noncooperative system consisting of an agent layer and a manager layer, where n is the number of agents, each of which belongs to one of the m (≥ 2) groups in the agent layer and is influenced by the corresponding group manager with some intragroup incentives. Let M = {1, . . ., m} denote the set of groups and let n_k (≥ 2) denote the number of agents in group k ∈ M, where Σ_{k∈M} n_k = n. The set of overall agents is denoted by N = {1, . . ., n}, and N_k ⊂ N denotes the set of agents in group k. We write x = [x_1, . . ., x_n]^T ∈ R^n for the state profile of all the agents, where x_i ∈ R denotes the state of agent i, and x^k ∈ R^{n_k} denotes the state profile of the agents in group k ∈ M. The payoff function of agent i ∈ N without incentive is denoted by J_i : R^n → R, which may depend on all the agents' states and is supposed to be continuously differentiable.
1) Incentive Structure: In this article, we assume that the m group managers try to enhance the welfare of their own groups, which they evaluate by individual group utility functions, by imposing an intragroup incentive mechanism on the agents in their own groups. The group utility functions U_k : R^n → R, k ∈ M, are defined as the weighted sum of the payoff functions of their own group members, i.e., U_k(x) ≜ Σ_{i∈N_k} η_i J_i(x), where η_i ∈ R_+, i ∈ N_k, denote the weights (priorities) of the agents evaluated by the group manager k ∈ M. Furthermore, we assume that there is a system governor who also imposes a similar intergroup incentive mechanism on the manager layer so that the welfare of the entire set of agents, defined by Π(x) ≜ Σ_{k∈M} ξ_k U_k(x) for some weights ξ_k ∈ R_+, k ∈ M, of the groups, is improved (see the structure of the hierarchical noncooperative system illustrated in Fig. 1).
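As a direct transcription of these two definitions, the welfare hierarchy can be sketched numerically; the weights and payoff values below are illustrative, not taken from the article:

```python
# U_k(x) = sum_{i in N_k} eta_i * J_i(x): weighted sum of member payoffs.
# Pi(x)  = sum_{k in M} xi_k * U_k(x): weighted sum of group utilities.
def weighted_sum(values, weights):
    return sum(w * v for w, v in zip(weights, values))

# payoffs J_i(x) already evaluated at some state profile x (illustrative)
J_group1, eta1 = [2.0, 4.0], [1.0, 0.5]   # group 1: two members
J_group2, eta2 = [1.0, 3.0], [2.0, 1.0]   # group 2: two members

U = [weighted_sum(J_group1, eta1), weighted_sum(J_group2, eta2)]
xi = [1.0, 1.0]                            # governor's group weights
Pi = weighted_sum(U, xi)                   # social welfare of the whole system
```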

2) Intergroup Incentives:
To improve the welfare Π(x) of the entire multiagent system, the system governor constructs incentivized group utility functions among the groups given by Ũ_k(x) ≜ U_k(x) + g_k(x̄), where g_k(x̄) denotes the intergroup incentive function for group k satisfying the constraint Σ_{k∈M} g_k(x̄) = 0 and x̄ denotes limited information of the state profile x, which is defined below. This constraint represents the case where the system governor serves merely as a mediator transferring payoffs among the groups. In general, the system governor may not know the specific values of the agents' states x_1, . . ., x_n, especially when n is large. In this article, we suppose that the system governor observes some kind of macroscopic data (e.g., the average of the state values) from each of the groups, so that the intergroup incentive function g_k(x̄) is a simple function mapping from R^m to R (instead of from R^n to R). The observed data x̄ is given as a linear mapping of the agents' states, x̄ ≜ Cx, C ≜ block-diag[c_1, . . ., c_m], where c_k ∈ R^{1×n_k}, k ∈ M, and C ∈ R^{m×n}. For example, if the observed data x̄_k is simply given as the average of the state values of group k, then c_k = (1/n_k)1_{n_k}^T. With a slight abuse of notation, the intergroup incentive function g_k(x̄) is written as g_k(v, x̄), parameterized by the intergroup incentive coefficient v ≜ [v_1, . . ., v_m]^T ∈ R^m. In this case, the constraint Σ_{k∈M} g_k(x̄) = Σ_{k∈M} g_k(v, x̄) = 0 is indeed satisfied. With this notation of g_k(v, x̄), we write Ũ_k(x) in (3) as Ũ_k(v, x).
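The observation map x̄ = Cx can be sketched as follows; the block-diagonal, group-averaging choice of C (i.e., c_k = (1/n_k)1^T) is the example mentioned above, and the group sizes are illustrative:

```python
import numpy as np

def observation_matrix(group_sizes):
    """Block-diagonal C with rows c_k = (1/n_k) * ones(1, n_k),
    so that xbar_k is the average state of group k."""
    m, n = len(group_sizes), sum(group_sizes)
    C = np.zeros((m, n))
    col = 0
    for k, nk in enumerate(group_sizes):
        C[k, col:col + nk] = 1.0 / nk
        col += nk
    return C

C = observation_matrix([2, 3])            # two groups: n_1 = 2, n_2 = 3
x = np.array([1.0, 3.0, 2.0, 4.0, 6.0])   # state profile of all n = 5 agents
xbar = C @ x                               # macroscopic data, one entry per group
```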
3) Intragroup Incentives: In order to improve the group utility Ũ_k(·, ·), the group managers shift the Nash equilibrium (defined in Definition 1 below) of the group through the intragroup incentive mechanism. Specifically, the incentivized payoff functions that the agents seek to increase are given by J̃_i(u^k, x) ≜ J_i(x) + p_i^k(u^k, x), where p_i^k denotes the intragroup incentive function imposed by group manager k ∈ M on the agents in N_k under its control, and u^k denotes the strategy of the group manager k. Note that the group managers serve merely as mediators transferring payoffs among the agents so that the sum of the incentive functions is zero, i.e., Σ_{i∈N_k} p_i^k(u^k, x) = 0, k ∈ M.

Definition 1: Given the strategy u = [(u^1)^T, . . ., (u^m)^T]^T ∈ R^n of the group managers, the profile x*(u) ∈ R^n is called a Nash equilibrium with respect to {J̃_i(u^k, x)}_{i∈N} given by (5), (6) if J̃_i(u^k, x_i*(u), x_{-i}*(u)) ≥ J̃_i(u^k, x_i, x_{-i}*(u)) holds for all x_i ∈ R, i ∈ N_k, and k ∈ M, where x_{-i} is the agents' state profile except agent i.

With a given u, since J̃_i(u^k, x) is continuously differentiable for all i ∈ N under (6), the Nash equilibrium x*(u) satisfies ∂J̃_i(u^k, x*(u))/∂x_i = 0 for all i ∈ N_k and k ∈ M. On the other hand, with a given v, at the Nash equilibrium x*(u), the group manager k may wish to unilaterally change its strategy u^k to benefit its own group when a different strategy yields a higher incentivized group utility Ũ_k at the resulting Nash equilibrium. This observation induces another concept of equilibrium at which no group manager can benefit its own group more by unilaterally changing its strategy for the intragroup incentives.
It is worth mentioning that both the Nash equilibrium and the group Nash equilibrium are characterized independently of the agents' underlying dynamics. Since Ũ_k(v, x) is continuously differentiable for all k ∈ M under (4), the existing group Nash equilibrium x°(v) satisfies the first-order condition ∂Ũ_k(v, x°(v))/∂x^k = 0 for all k ∈ M. The group managers' strategy profile u°(v) is called a subgame perfect equilibrium for intragroup incentives if the corresponding Nash equilibrium x*(u°(v)) coincides with the group Nash equilibrium x°(v).

4) Behavior of Agents Under
Incentives: Now we consider the situation where each agent is a selfish and dynamic decision-maker continually changing its own state by following the pseudogradient dynamics [3] in terms of the incentivized payoff functions, i.e., ẋ_i(t) = α_i ∂J̃_i(u^k, x(t))/∂x_i, i ∈ N_k, k ∈ M, where α_1, . . ., α_n denote the agent-dependent positive sensitivity parameters. The pseudogradient dynamics (9) capture the fact that the agents are concerned with their own incentivized payoffs and myopically change their states according to the current information without any foresight on the future state [6], [7], [24], [25], [26], [27]. Consequently, the agents' state dynamics (9) with the intragroup incentive functions (6) are described by the dynamics given by (10), where f(x) ≜ [∂J_1(x)/∂x_1, . . ., ∂J_n(x)/∂x_n]^T denotes the pseudogradient function characterized by the agents' individual payoff functions, and α ≜ [α_1, . . ., α_n]^T.
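A minimal simulation sketch, assuming the closed-loop dynamics (10) take the form ẋ(t) = diag[α](f(x(t)) + u(t)) with an affine pseudogradient f(x) = Ax + b as in the quadratic case of Section III; the matrices and sensitivities below are illustrative:

```python
import numpy as np

A = np.array([[-2.0, 0.5],
              [0.5, -2.0]])     # illustrative pseudogradient Jacobian (Hurwitz)
b = np.array([1.0, 1.0])
alpha = np.array([1.0, 0.5])    # agent-dependent positive sensitivities

def step(x, u, dt=1e-2):
    """Forward-Euler step of xdot = diag(alpha) (f(x) + u), f(x) = A x + b."""
    return x + dt * alpha * (A @ x + b + u)

x = np.zeros(2)
for _ in range(20000):          # no incentive: u(t) = 0
    x = step(x, u=np.zeros(2))

x_nash = -np.linalg.solve(A, b)  # equilibrium: f(x*) = 0 when u = 0
```

With a Hurwitz Jacobian, the Euler iteration settles at the nonincentivized Nash equilibrium, matching the myopic-adjustment interpretation above.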

1) Available Information for Agents:
The state profile x(·) is available to all the agents. No information on payoff functions is exchanged among the agents. The signal u_i^k(t) from group manager k is available only to agent i.
2) Available Information for Group Managers: In this article, we assume that group manager k has access to the payoff functions J_i(·), i ∈ N_k, and the states x_i(t), i ∈ N_k, in its own group. The state profile x^{-k}(t) of the other groups can be continually or intermittently observed by group manager k. No communication between the group managers is assumed, i.e., the strategies of the other group managers are unavailable. The block diagram of the information hierarchy is illustrated in Fig. 2.
3) Available Information for System Governor: We suppose that the system governor does not know the full information of the agents' states and payoff functions, but has access to the group utility functions U_k(·), k ∈ M, and the low-dimensional (macroscopic) data x̄_k, k ∈ M, from the groups.
Motivation 1: It is important to note that, for a given u(t) ≡ ū ∈ R^n, all the Nash equilibria of the noncooperative system are equilibria of the dynamics (10), since ẋ(t) ≡ 0 holds under (7) with u replaced by ū. In general, there may be multiple Nash equilibria in the noncooperative system. Some sufficient conditions for the existence of a unique Nash equilibrium can be found in [3] and [28, Ch. 2], which also guarantee global stability of the pseudogradient dynamics with u(t) ≡ 0. For example, supposing that the Jacobian matrix ∇f(x) of the pseudogradient function f is negative definite for all x ∈ R^n, it can be shown that the nonincentivized system exhibits a unique and globally asymptotically stable Nash equilibrium under the pseudogradient dynamics (with u(t) ≡ 0). Alternatively, supposing that the nonincentivized system is a strictly monotone game (i.e., the negative pseudogradient −f is strictly monotone) [28], it can also be shown that the nonincentivized system exhibits a unique and globally asymptotically stable Nash equilibrium under the pseudogradient dynamics (with u(t) ≡ 0). In these two cases, for a given u(t) ≡ ū ∈ R^n, noticing that ∇f(x) remains a negative-definite matrix or that the noncooperative system remains a strictly monotone game, the Nash equilibrium x*(ū) is the unique and globally asymptotically stable equilibrium of the dynamics (10) satisfying ẋ(t) ≡ 0. Therefore, by properly designing the strategies u^k, k ∈ M, for the intragroup incentive schemes, the group managers may be able to move the Nash equilibrium to a state possessing a better group utility than in the nonincentivized (u(t) ≡ 0) case.
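The first sufficient condition can be checked numerically: if the symmetric part of ∇f is negative definite, then diag[α]∇f is Hurwitz for every positive sensitivity vector α. A sketch with an illustrative Jacobian:

```python
import numpy as np

def sym_part_negdef(J):
    """True if J + J^T is negative definite (He(J) < 0)."""
    return bool(np.all(np.linalg.eigvalsh(J + J.T) < 0))

def hurwitz(M):
    """True if all eigenvalues of M have negative real part."""
    return bool(np.all(np.linalg.eigvals(M).real < 0))

J = np.array([[-3.0, 1.0],
              [-1.0, -2.0]])    # illustrative pseudogradient Jacobian

ok = sym_part_negdef(J)
# Hurwitz for several different positive sensitivity vectors alpha:
checks = [hurwitz(np.diag(a) @ J)
          for a in ([1.0, 1.0], [0.2, 5.0], [10.0, 0.1])]
```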
Motivation 2: In general, the group manager k is not able to obtain the group utility functions U −k (•) from the other groups.The group managers may continually change their own strategy u k (t), k ∈ M, t ≥ 0, in order to change the Nash equilibrium to a state associated with a better group utility.
Motivation 3: Given the subgame perfect equilibrium u (v) with v = 0 (i.e., without intergroup incentives), even though the agents' state may reach the group Nash equilibrium x , the entire social welfare may still not be maximized because the group managers do not cooperate with each other.
Problem: Considering the hierarchical noncooperative dynamical system, our main objectives in this article are twofold: 1) design update rules for each group manager k ∈ M to continually update its strategy u^k(t) using only the information on the agents' state x(t) and the payoff functions J_i(·), i ∈ N_k; and 2) design an update law for the intergroup incentive coefficient v for the system governor to stabilize a target equilibrium that improves the entire social welfare using the limited information x̄_k(t), k ∈ M.

III. UPDATE RULES FOR INTRAGROUP INCENTIVES
In this section, we propose update rules for group manager k to update u^k(t) for its intragroup incentive mechanism under two types of observation scenarios, i.e., continual and intermittent observations of x^{-k}(t), whereas the state information x^k(t) of its own group is available for all t. In both scenarios, we suppose that there is no intergroup incentive mechanism among the groups [i.e., v = 0 in (4)] and impose the following assumption.
Assumption 1: For each k ∈ M and any fixed x^{-k}, the group utility function U_k(x) has a unique maximizer with respect to x^k.

A. Continual Observation of x^{-k}(t)

In this section, we consider the situation where the value of x^{-k}(t) is fully observed by group manager k at every time instant t ≥ 0. Note that our main idea in constructing the update rule of u^k(t) for group manager k is to make the best-response state for group k coincide with the individual best-response states for all the group members in N_k. Specifically, we consider the
update rule (11) for the group managers, in which x̂^k(t), defined in (12), represents the best-response state of group k given the other groups' state x^{-k}(t). The update rule (11) captures the fact that the group managers are concerned with their own group utilities and myopically change their strategies according to the current information without foresight on the other groups' future states. Note that the best-response state x̂^k(t) of group k is invariant under the same priority ratio η_1 : η_2 : · · · : η_{n_k}.

1) Nonquadratic Payoff Functions: For the statement of the following results, we let the state profile x ∈ R^n be partitioned by groups. With a slight abuse of notation, we write ∇γ_+^k(x) for ∇γ_+^k(x^{-k}), and ∇γ_-^k(x) for ∇γ_-^k(x^{-k}). Before we present a theorem, we define an n × n matrix A(γ, x°(0)) for a group Nash equilibrium x°(0) ∈ R^n.

Theorem 1: Consider the noncooperative system with the pseudogradient dynamics (9) and the intragroup incentive function (6) under Assumption 1. Let the group managers' strategy u^k be updated by (11) and (12). If the matrix A_s ≜ diag[α]A(γ, x°(0)) is Hurwitz, then the group Nash equilibrium x°(0) is locally asymptotically stable and the group managers' strategy u(t) converges to the corresponding subgame perfect equilibrium as t → ∞.
Remark 1: To construct the update rule (11), each manager k only needs to observe the state profile x^{-k}(t) ∈ R^{n−n_k} of the other groups instead of observing the other managers' strategies u^{-k}(t); hence, the proposed update rule in the manager layer is certainly different from the existing Nash equilibrium seeking dynamics. Note, however, that the state profile x^k(t) ∈ R^{n_k} is also required for constructing the intragroup incentive functions (6) within group N_k.
Remark 2: Implementing the update rule (11) is understood as a reasonable and intuitive, but myopic, attempt by the group managers. None of the group managers can know the stability of the group Nash equilibrium beforehand because they never know the exact expression of the matrix A_s, as the information of x°, ∇f^{-k}(x), ∇γ_-^{-k}(x), and ∇γ_+^{-k}(x) is undisclosed to them. To guarantee stability of the hierarchical noncooperative system, the behavior of a system governor who imposes an intergroup incentive mechanism among the group managers is explored in Section IV.
The next result provides a sufficient stability condition without knowledge of the agents' personal sensitivity parameters α_1, . . ., α_n.
Proposition 1: Consider the noncooperative system with the pseudogradient dynamics (9) and the intragroup incentive function (6) under Assumption 1. Let the group managers' strategy u^k be updated by (11) and (12). If there exists ᾱ ∈ R^n_+ such that A^T(γ, x°(0)) diag[ᾱ] + diag[ᾱ]A(γ, x°(0)) < 0 holds, then the group Nash equilibrium x°(0) is locally asymptotically stable and the group managers' strategy u(t) converges to the corresponding subgame perfect equilibrium as t → ∞ for any α ∈ R^n_+.

Remark 3: For the case where there exists ᾱ ∈ R^n_+ such that (∇f(x) + ∇u(x))^T diag[ᾱ] + diag[ᾱ](∇f(x) + ∇u(x)) < 0 holds for all x ∈ R^n, with ∇u(x) defined in (39) in the Appendix, it can be shown that global asymptotic stability of the group Nash equilibrium is guaranteed. Note that (17) is a special case of this condition with x = x°(0).
2) Quadratic Payoff Functions: Now, we specialize the results to noncooperative systems with quadratic payoff functions J_i(x), i ∈ N, given by the quadratic form J_i(x) = (1/2)x^T A_i x + b_i^T x, where A_i ∈ R^{n×n} is the (symmetric) Hessian of J_i and b_i ∈ R^n. In this case the pseudogradient function is affine, f(x) = Ax + b. Since the matrix A ∈ R^{n×n} is nonsingular, for the given u it follows that there exists a unique Nash equilibrium x*(u) given by x*(u) = −A^{−1}(u + b). Hence, for a group Nash equilibrium x°, the subgame perfect equilibrium u° is given by u° = −Ax° − b. Consequently, the agents' state dynamics (9) with the quadratic payoff functions (18) and the intragroup incentive functions (6) are described by affine dynamics.
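For the quadratic case, the Nash equilibrium map and the subgame perfect incentive can be verified directly; the matrices and the target state below are illustrative:

```python
import numpy as np

A = np.array([[-2.0, 0.5],
              [0.5, -2.0]])       # nonsingular matrix A of the quadratic game
b = np.array([1.0, -1.0])

def nash(u):
    """Unique Nash equilibrium under fixed incentive u: solves A x + b + u = 0."""
    return -np.linalg.solve(A, u + b)

x_target = np.array([0.3, -0.7])  # desired group Nash equilibrium (illustrative)
u_spe = -A @ x_target - b         # subgame perfect incentive u = -A x - b
```

Substituting u_spe back into the Nash map recovers x_target exactly, which is the defining property of the subgame perfect equilibrium.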
For the following statements, for each k ∈ M, we let A^k ≜ Σ_{i∈N_k} η_i A_i and B^k ≜ Σ_{i∈N_k} η_i b_i be partitioned compatibly with the grouping of the agents, which allows the group utility (1) to be rewritten in quadratic form. Note that "*" represents some matrices with consistent orders. In this case, notice that P^k is equivalent to the matrix ∇f^k(x) defined in (13).
Remark 4: Assumption 1 requires A^k_k < 0, k ∈ M, for the quadratic payoff functions (18), so that x̂^k(t) in (12) is well defined. This implies that the matrices defined in (14) and (15) are simply given by constant matrices. Furthermore, it follows from (8) that the group Nash equilibrium is characterized by a linear equation with coefficient matrix G, which motivates the following assumption.

Assumption 2: There is a unique group Nash equilibrium x° ∈ R^n, i.e., G is invertible.
Theorem 2: Consider the noncooperative system with the pseudogradient dynamics (9), the quadratic payoff functions (18), and the intragroup incentive function (6) under Assumptions 1 and 2. Let the group managers' strategy u^k be updated by (11) and (12). Then, the group Nash equilibrium x°(0) is globally asymptotically stable and the group managers' strategy u(t) converges to the corresponding subgame perfect equilibrium u°(0) = −Ax°(0) − b as t → ∞, if and only if the matrix A_s defined in (24) is Hurwitz.

Remark 5: If A_-^k = 0 and A_+^k = 0 hold for all k ∈ M, then the pseudogradient dynamics of the agents in N_k are not mutually affected by the agents in the other groups, and hence the best-response state x̂^k(t) in (23) is in fact constant, independent of the values of x^{-k}(t), for all k ∈ M. Hence, the system is understood as a combination of m independent noncooperative systems with the sets of agents N_1, N_2, . . ., N_m incentivized by the corresponding group managers.

B. Intermittent Observation of x −k (t)
It is not always the case that the group managers are able to observe the state profile x^{-k}(t) of the other groups at every time instant t ≥ 0. In this section, we characterize the situation where group manager k has only intermittent access to x^{-k}(t) at some specific time instants, whereas the state information x^k(t) of its own group is available for all t to process the intragroup incentive function (6). It is observed in real society that governments and the public may have only intermittent access to the financial status of local companies, because those companies usually issue termly financial reports to the public or may need to go through temporary inspections at specific time instants required by the financial department of the government.
Therefore, we consider the sampled-data-based, piecewise constant update rule (11) with (x^k(t), x^{-k}(t)) replaced by (x^k(t_s), x^{-k}(t_s)) for t_s ≤ t < t_{s+1}, where {t_s}_{s=0,1,2,...} denotes the sequence of sampling instants with t_0 = 0 and lim_{s→∞} t_s = ∞. The sampling intervals between two sampling instants are defined by T_s ≜ t_{s+1} − t_s ∈ R_+, s ∈ Z_0, which may be constant or time-varying depending on the information disclosure structure. Even though we can provide slightly more general results for the case with nonquadratic payoff functions, in this section we focus mainly on the case with quadratic payoff functions. The case with nonquadratic payoff functions is discussed in Remark 6 at the end of the section.
Suppose that the payoff functions J_i(x), i ∈ N, are given as quadratic functions (18). The next result provides a sufficient stability condition for the proposed sampled-data-based update rule (11). For the statements of the following results, let Φ(T) ≜ e^{diag[α]AT}(I_n + A^{−1}K) − A^{−1}K, where K ∈ R^{n×n} is a constant matrix determined by the quadratic payoff parameters and the update rule.

Proposition 2: Consider the noncooperative system with the pseudogradient dynamics (9), the quadratic payoff functions (18), and the intragroup incentive function (6) under Assumptions 1 and 2. Let the group managers' strategy u^k be updated by the sampled-data-based update rule (11) with (x^k(t), x^{-k}(t)) replaced by (x^k(t_s), x^{-k}(t_s)). If there exists a positive-definite matrix P ∈ R^{n×n} such that Φ^T(T_s)PΦ(T_s) − P < 0 for all s ∈ Z_0, then the group Nash equilibrium x°(0) is globally asymptotically stable and the group managers' strategy u(t) converges to the corresponding subgame perfect equilibrium as t → ∞.

Proposition 2 indicates that the choice of the sampling instants {t_s}_{s=0,1,2,...} is essential in the sampled-data-based update rule (11).

TABLE I: SUMMARY OF THE RESULTS WITHOUT INTERGROUP INCENTIVES (I.E., v = 0)

The next result shows that sufficiently small sampling intervals preserve asymptotic stability when the group Nash equilibrium is asymptotically stable under the continual update rule (11).

Theorem 3: Consider the noncooperative system with the pseudogradient dynamics (9), the quadratic payoff functions (18), and the intragroup incentive function (6) under Assumptions 1 and 2. Let the group managers' strategy u^k be updated by the sampled-data-based update rule (11) with (x^k(t), x^{-k}(t)) replaced by (x^k(t_s), x^{-k}(t_s)). Suppose that the matrix A_s = diag[α]A is Hurwitz with A_s defined in (24). Then, there exists σ ∈ R_+ such that the group Nash equilibrium x°(0) is globally asymptotically stable for any sampling instants t_s, s ∈ Z_0, satisfying T_s < σ, s ∈ Z_0.
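The role of Φ(T) can be illustrated numerically: stability at the sampling instants amounts to the spectral radius of Φ(T_s) being less than one, which is what the Lyapunov condition of Proposition 2 certifies. The matrices A, K, and the sensitivities below are illustrative stand-ins, and the matrix exponential is computed by eigendecomposition for this diagonalizable example:

```python
import numpy as np

def expm(M):
    """Matrix exponential via eigendecomposition (M assumed diagonalizable)."""
    lam, V = np.linalg.eig(M)
    return (V @ np.diag(np.exp(lam)) @ np.linalg.inv(V)).real

A = np.array([[-2.0, 0.5],
              [0.5, -2.0]])          # illustrative stand-in for the matrix A
K = np.array([[0.0, 0.3],
              [0.3, 0.0]])           # illustrative stand-in for the matrix K
Da = np.diag([1.0, 0.8])             # diag[alpha]

def Phi(T):
    """Phi(T) = e^{diag[alpha] A T} (I + A^{-1} K) - A^{-1} K."""
    AinvK = np.linalg.solve(A, K)
    return expm(Da @ A * T) @ (np.eye(2) + AinvK) - AinvK

rho = max(abs(np.linalg.eigvals(Phi(0.1))))   # spectral radius at T = 0.1
```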
Even though the conditions in Proposition 2 and Theorem 3 require knowledge of the agents' sensitivities α ∈ R^n_+, it is worth noting that Theorem 3, along with Proposition 1 in Section III-A, suggests a sufficient stability condition for unknown sensitivity parameters α ∈ R^n_+.

Corollary 1: Consider the noncooperative system with the pseudogradient dynamics (9), the intragroup incentive function (6), and the quadratic payoff functions (18) under Assumptions 1 and 2. Let the group managers' strategy u^k be updated by the sampled-data-based update rule (11) with (x^k(t), x^{-k}(t)) replaced by (x^k(t_s), x^{-k}(t_s)). Suppose that there exists ᾱ ∈ R^n_+ such that A_s^T diag[ᾱ] + diag[ᾱ]A_s < 0 for the matrix A_s defined in (24). Then, there exists σ ∈ R_+ such that the group Nash equilibrium x°(0) is globally asymptotically stable for any α ∈ R^n_+ and any sampling instants t_s, s ∈ Z_0, satisfying T_s < σ, s ∈ Z_0.
Remark 6: The results in Sections III-A and III-B are summarized in Table I. Even though the results in Proposition 2, Theorem 3, and Corollary 1 are established only for noncooperative systems with quadratic payoff functions, it is worth noting that, as long as local stability around a group Nash equilibrium x°(0) is concerned, those results can be generalized to nonquadratic cases. Specifically, a (not necessarily quadratic) payoff function J_i(x) can be expressed in the form J_i(x) = (1/2)x^T A_i x + b_i^T x + ε_i(x), where ε_i(x) includes third- or higher-order terms and A_i ∈ R^{n×n} is the Hessian matrix of J_i. Since the quadratic part of (27) plays a similar role to the one in (18), the results in Proposition 2, Theorem 3, and Corollary 1 can be similarly described for the nonquadratic cases.

IV. SOCIAL WELFARE IMPROVEMENT VIA INTERGROUP INCENTIVES
In this section, we focus on how to properly design the intergroup incentive mechanism for the system governor with the macroscopic data x̄_k, k ∈ M, to improve the weighted social welfare (2) of the entire hierarchical system as much as possible.² Under the intergroup incentive mechanism (3), (4), the parameter x̂^k(t) in the group managers' strategy update rule (11) is remodeled from (12) to (28). Note that the group Nash equilibrium under the intergroup incentive mechanism depends on v and is denoted by x°(v). For a given v ∈ R^m, we suppose that there exists a unique group Nash equilibrium x°(v) satisfying (8) for all k ∈ M under the following assumption (which implies Assumption 1).

Assumption 3: The group utility function Ũ_k(v, x) with (28) is continuously differentiable with respect to x^{-k} for any v_k ∈ R and k ∈ M.
Note that there may not exist a coefficient v such that x°(v) coincides with the maximum point of Π(x), because the intergroup incentive function (4) under the observed (limited) data restricts the feasibility of making the vector ∂Ũ_k(x°(v))/∂x^k parallel to c_k for all k ∈ M so that (8) holds. Nevertheless, there may exist a best intergroup incentive coefficient maximizing the weighted social welfare Π(x°(v)) given by (2). For the following statements, we denote the best intergroup incentive coefficient by v★ ≜ arg max_{v∈R^m} Π(x°(v)) and use the corresponding group Nash equilibrium x★ ≜ x°(v★) as the target equilibrium.³ Moreover, we define quantities that are known to the system governor because the group utility functions U_1, . . ., U_m are supposed to be known to the governor.
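A hedged sketch of the governor's choice of the best coefficient: with only the low-dimensional map v ↦ x°(v) available, v★ can be approximated by a grid search over Π(x°(v)). The equilibrium map x_gne and the welfare below are illustrative stand-ins, not the article's expressions:

```python
import numpy as np

def x_gne(v):
    """Illustrative (affine) group Nash equilibrium map v -> x°(v)."""
    return np.array([0.5 + 0.2 * v[0] - 0.1 * v[1],
                     -0.3 - 0.1 * v[0] + 0.2 * v[1]])

def welfare(x):
    """Illustrative concave social welfare Pi(x), maximized at (1, -1)."""
    return -(x[0] - 1.0) ** 2 - (x[1] + 1.0) ** 2

grid = np.linspace(-5.0, 5.0, 101)
v_best = max(((v1, v2) for v1 in grid for v2 in grid),
             key=lambda v: welfare(x_gne(np.array(v))))
```

Here v_best plays the role of v★: it steers the (illustrative) group Nash equilibrium as close as the parameterization allows to the welfare maximizer.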
In addition, we define the matrix Γ(x, v) ∈ R^{n×m}.

² The system governor may not be able to maximize the social utility because of the lack of enough information, as discussed later.

³ If the gradient of Π(x) at its maximum point happens to be equal to δ_k c_k with some scaling factor δ_k ∈ R for all k ∈ M, then the target equilibrium coincides with the maximum point of Π(x).

Remark 7:
If n_k = 1 for all k ∈ M, the considered problem reduces to an incentive design problem for m-agent systems (which has been addressed in [17] and [18]).

Now, we propose a framework of how the system governor appropriately designs v(t) ∈ R^m for the intergroup incentives in the manager layer to encourage the trajectory of agents' state to converge toward the target equilibrium x*. To begin, let us suppose that v(t) ≡ v* ∈ R^m.
Corollary 2: Consider the hierarchical noncooperative system with pseudogradient dynamics (9) under Assumption 3. Let the intragroup incentive function (6) be updated by (11) with (28). If the matrix A*_s ≜ diag[α]A(γ, x*) is Hurwitz, then the intergroup incentive functions (3), (4) along with v(t) ≡ v* ∈ R^m guarantee that the target equilibrium x* is locally asymptotically stable.
Remark 8: For the case of quadratic payoff functions J_i(x), i ∈ N, as defined in (18), the parameters x̄_k(t) in (28) are given by (32), so that they do not depend on v_k and x. As a direct consequence of Theorem 2, it can be shown that the intergroup incentive function (4) along with v(t) ≡ v* ∈ R^m guarantees that the target equilibrium x* is globally asymptotically stable if and only if A_s is Hurwitz with A_s defined in (24).

It is intuitive that merely letting v be a constant vector does not guarantee convergence when the matrix A_s is not Hurwitz. Hence, it is natural for the system governor to consider a feedback controller for the intergroup incentive mechanism based on the observed data x̄(t). Specifically, consider the linear feedback controller (33). Note that the linear feedback controller (33) ensures that the target equilibrium x* is an equilibrium of the closed-loop dynamics of (10), (11), and (28) given by (34), where the group managers' strategy profile u is understood as a function solely depending on x. For the following statement, we let (35). However, it is necessary to point out that the individual payoff functions J_i(x), i ∈ N, are unknown to the system governor.
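The point of the remark above, namely that a constant v cannot stabilize the target equilibrium when A_s is not Hurwitz whereas a linear feedback on the observed state can, may be illustrated as follows; the matrices A_s, Γ, and the gain L below are hypothetical stand-ins:

```python
import numpy as np

# When A_s is not Hurwitz, a constant v cannot stabilize the target
# equilibrium; a linear feedback v(t) = v* + L(x(t) - x*) can.  All
# matrices here are illustrative stand-ins, not from the article.
A_s = np.array([[0.5, 0.0],
                [0.0, -1.0]])          # open loop: one unstable mode
Gamma = np.array([[1.0],
                  [0.0]])              # how v enters the agent dynamics
L = np.array([[-2.0, 0.0]])            # feedback gain (chosen by hand)

open_loop_stable = np.all(np.linalg.eigvals(A_s).real < 0)
closed_loop = A_s + Gamma @ L          # Jacobian under v = v* + L(x - x*)
closed_loop_stable = np.all(np.linalg.eigvals(closed_loop).real < 0)
print(open_loop_stable, closed_loop_stable)  # False True
```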
To deal with the uncertainty in J_i(x), i ∈ N, we present the following result guaranteeing asymptotic stabilization.

Theorem 4: Consider the hierarchical noncooperative system with pseudogradient dynamics (9) under Assumption 3. Let the intragroup incentive function (6) be updated by (11) with (28). Suppose that there exists α ∈ R^n_+ such that the matrix He(diag[α] block-diag[∇f_1(x*), . . ., ∇f_m(x*)] diag[α]) is negative definite. Then, the intergroup incentives (3), (4), (33) with the matrix given in (36) guarantee that the solution x(t) ≡ x* of the closed-loop dynamics given by (34) and (35) is locally asymptotically stable.
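A minimal sketch of checking the negative-definiteness condition of Theorem 4, with illustrative block Jacobians Df_k standing in for ∇f_k(x*):

```python
import numpy as np

# Verifying a Theorem-4-type condition: the symmetric part
# He(M) = M + M^T of M = diag[alpha] blockdiag(Df_1, ..., Df_m) diag[alpha]
# must be negative definite for some alpha with positive entries.
# The block Jacobians Df_k below are illustrative stand-ins.
def He(M):
    return M + M.T

Df1 = np.array([[-3.0, 1.0],
                [0.0, -2.0]])
Df2 = np.array([[-4.0]])

B = np.zeros((3, 3))                 # block-diagonal assembly
B[:2, :2] = Df1
B[2:, 2:] = Df2

alpha = np.array([1.0, 0.5, 2.0])    # any positive weights
D = np.diag(alpha)
neg_def = np.all(np.linalg.eigvalsh(He(D @ B @ D)) < 0)
print(neg_def)  # True
```

Since D has positive diagonal entries, He(DBD) = D He(B) D is congruent to He(B), so here any positive α certifies the condition once He(B) itself is negative definite.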

Remark 9:
The conditions in Theorem 4 can be simplified for the case where the payoff functions are quadratic and given by (18). Specifically, it suffices to suppose that there exists α ∈ R^n_+ such that (38) holds; this is because of the structure of T_k and (32). Note that (38) guarantees global asymptotic stability for the target equilibrium.

V. ILLUSTRATIVE NUMERICAL EXAMPLES
In this section, several numerical examples are provided to demonstrate the efficacy of our proposed approach.
Example 1: Consider the 4-agent hierarchical noncooperative market with the agents' sets N_1 = {1, 2}, N_2 = {3, 4}, the payoff functions (18), and the other unmentioned parameters being zero. Suppose that there is no system governor coordinating the two subgroups. Let the priorities evaluated by the group managers be equal, e.g., η_1 = η_2 = 1 and η_3 = η_4 = 1. Letting the sensitivity parameters be given by α = (1, 1, 1, 1), the group Nash equilibrium without intergroup incentive is given by x*(0) = [−1.3350, 0.2341, 4.3729, −3.6594]^T and the matrix A_s is Hurwitz. Then, it follows from Theorem 2 that the group Nash equilibrium x*(0) is globally asymptotically stable under the pseudogradient dynamics (9) incentivized by the intragroup incentive scheme (6), (11). On the other hand, let the sampling instants t_s, s ∈ Z_≥0, satisfy T_s = t_{s+1} − t_s ∈ {0.15, 0.09} for the sampled-data-based update rule. In this case, Φ(T_s) = e^{T_s A}(I_4 + A^{−1}K) − A^{−1}K with K given by (25) satisfies (26) for both T_s = 0.15 and T_s = 0.09.

Consider next a noncooperative market in a country where the price of the products is determined by λ(x) = λ_0 − Σ_{i=1}^n β_i x_i, where x_i ∈ R_+ denotes the quantity of the products, β_i ∈ R_+ denotes the market power of firm i, and λ_0 ∈ R_+ is a market-specific parameter representing the cap price. In this country, firms compete in quantities rather than prices (Cournot game) according to the payoff functions given by J_i(x) = λ(x)x_i − C_i(x_i), i ∈ N, where C_i(·) is the production cost of firm i given by C_i(x_i) = a_i x_i^2 + b_i x_i, i ∈ N, with a_i ≥ 0 and b_i > 0.
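A gradient-play simulation of this Cournot market can be sketched as follows, using the parameter values a_i = 10, b_i = 3, and λ_0 = 8 stated in the example; β_i = 0.1 is one admissible choice from the stated range (0, 0.2), and n = 4 firms are used for brevity:

```python
import numpy as np

# Gradient-play sketch of the Cournot market: price
# lambda(x) = lam0 - sum_j beta_j x_j, cost C_i = a x_i^2 + b x_i,
# payoff J_i = lambda(x) x_i - C_i(x_i).  Parameters a = 10, b = 3,
# lam0 = 8 follow the example; beta_i = 0.1 and n = 4 are assumptions.
n = 4
a, b, lam0, beta = 10.0, 3.0, 8.0, 0.1 * np.ones(n)

def pseudogradient(x):
    # dJ_i/dx_i = lam0 - sum_j beta_j x_j - beta_i x_i - 2 a x_i - b
    return lam0 - beta @ x - beta * x - 2 * a * x - b

x = np.ones(n)                     # initial production quantities
for _ in range(5000):              # forward-Euler gradient play
    x = x + 0.01 * pseudogradient(x)

# Closed form for the symmetric Nash equilibrium of this instance:
x_nash = (lam0 - b) / (beta.sum() + beta[0] + 2 * a)
print(x, x_nash)                   # all entries converge to x_nash
```

The simulated quantities settle at the symmetric Nash equilibrium, which is positive here because the cap price λ_0 exceeds the marginal cost parameter b.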
The gross sales value of production in city k ∈ M is given as the group utility function U_k(x) defined in (1) with η_i = 1, i ∈ N_k, whereas the gross domestic product is given as the social welfare function Π(x) defined in (2). In terms of the incentives, each firm in city k is influenced by the production tax/subsidy (intragroup incentive) p^k_i(u_k, x_k) depending linearly on the firm's production quantity, given by (6) and administered by the mayor (group manager). Likewise, each city k ∈ M is influenced by the transaction tax/subsidy (intergroup incentive) g_k(v, x) depending linearly on the sum x̄_k = 1^T_{n_k} x_k of the firms' production quantities in city k, given by (4) and administered by the national economic administration (system governor), i.e., c_k = 1^T_{n_k}, k ∈ M. Note that the role of the incentive functions p^k_i(u_k, x_k) and g_k(v, x) is to suggest the modified payoff structures J̃_i(u_k, x) and Ũ_k(x) to the agents and the group managers so that those players follow the pseudogradient dynamics (34) and the update rule (11) associated with the modified payoff functions (5) and (3). Different from the objective of mayor k, which is to maximize the incentivized group utility Ũ_k, the objective of the national economic administration is to maximize the gross domestic product Π(x) using the observed data x̄_k, k ∈ M, and the gross sales values U_k(x), k ∈ M, of production. Now, let n = 60 and suppose that the number of firms is the same in every city, satisfying m n_k = 60, k ∈ M.

Fig. 5(a) shows the gross domestic product (social welfare) Π(x*) at the (unique) group Nash equilibrium x*(0) without intergroup incentive for a_i = 10, i ∈ N, b_i = 3, i ∈ N, λ_0 = 8, and β_i ∈ (0, 0.2), i ∈ N, satisfying Assumptions 1 and 2, with m = 1, 2, 3, 4, 5, 6, 10, 12, 15, 20, 30, and 60, where the number of firms in each city k ∈ M is given by n_k = 60, 30, 20, 15, 12, 10, 6, 5, 4, 3, 2, and 1, respectively. [How the values of Π(x*) for the cases of (m, n_k) = (1, 60) and (60, 1) are calculated is explained below.] Fig. 5(b) shows the difference between the gross domestic product Π(x*) at the group Nash equilibrium x*(v) with v = 0 and with v = v* ≜ arg max_{v ∈ R^m} Π(x*(v)), which indicates the improvement made by the national economic administration via constructing the intergroup incentives.
When (m, n_k) = (1, 60) and (60, 1), it turns out that the proposed three-layer hierarchical incentive structure reduces to the incentive structure characterized in [7] because only one of the mechanisms of the intergroup incentives (4) and the intragroup incentives (6) is essentially working. Specifically, the intragroup incentive functions (6) for (m, n_k) = (1, 60) constructed by the single group manager are exactly the same as the intergroup incentive functions (4) for the case of (m, n_k) = (60, 1) constructed by the system governor. Here, note that the system governor or the group manager constructing the incentive mechanisms happens to possess complete information from the agent layer for maximizing the social welfare for the entire society [i.e., x*(0) and x*(v*) coincide with the maximum point of Π(x) for (m, n_k) = (1, 60) and (60, 1), respectively].
However, in general, obtaining the full information of all the agents in a centralized manner is extremely costly, and hence the hierarchical incentive structure has to be established. Even though degradation of the social welfare can be seen in the three-layer hierarchical incentive structure for (m, n_k) other than (1, 60) and (60, 1) due to the lack of information, the orange line in Fig. 5(a), which represents Π(x*(v*)) with the best intergroup incentive coefficient v*, is understood as the maximum value of the social welfare that the national economic administration can help to reach. It is interesting to note from Fig. 5(b) that the larger the number m of groups is, the larger the difference between Π(x*(v*)) and Π(x*(0)) becomes, so that the role the system governor plays becomes more important.

VI. CONCLUSION
In this article, we investigated the stability and stabilization problems for hierarchical noncooperative systems. In the characterized framework, agents selfishly make their decisions under intragroup incentives, which are controlled by the group managers and updated by our proposed update rules. We explored the stability of the group Nash equilibrium of hierarchical noncooperative systems with dynamic agents and derived conditions under which the trajectory of agents' state converges to the group Nash equilibrium under the group managers' intragroup incentives. Furthermore, we proposed an intergroup incentive mechanism for a system governor that reconstructs the group utility functions in the group managers level and moves the group Nash equilibrium so that the social welfare is improved. To deal with the situation where the system governor may not know all the agents' individual payoff functions and all the agents' state, we presented sufficient conditions that guarantee the convergence of agents' state toward a target equilibrium using macroscopic data. Even though we assumed that the system governor obtains only 1-D data from each group, the case where richer (higher-dimensional) information is available to the system governor is expected to yield a target equilibrium with higher welfare. Finally, we provided three numerical examples demonstrating stability and stabilization of the group Nash equilibrium for 4-agent and 60-agent hierarchical noncooperative systems. Future extensions include incentive design with time-varying payoff dependency and cognitive hierarchy, member affiliation problems, and security problems with malicious attackers in the agent layer.

APPENDIX
Proof of Theorem 1: First, note that the group Nash equilibrium x*(0) is an equilibrium of the closed-loop dynamics of (10)-(12). Note that the vector u_k(t) in the update rule (11) can be expressed by (39), shown at the bottom of this page, because of (14) and (15). Therefore, the Jacobian matrix of the closed-loop dynamics of (10)-(12) at x*(0) is given by diag[α](∇u(x*(0)) + ∇f(x*(0))) = A_s. The result then follows immediately from Lyapunov's indirect method.
Proof of Proposition 1: First, let x̃ ≜ x − x*(0). Recall that linearizing the system dynamics (9) around x*(0) yields d x̃(t)/dt = A_s x̃(t). Consider the Lyapunov function candidate V(x̃) = x̃^T P x̃ with P ≜ diag[α]^{−1}. Since A(γ, x*(0)) + A^T(γ, x*(0)) < 0 is satisfied, it follows using the linearized dynamics that V̇(x̃(t)) = x̃^T(t)(A_s^T P + P A_s)x̃(t) < 0 around x*(0), and hence the group Nash equilibrium x*(0) is locally asymptotically stable for any α ∈ R^n_+.

Proof of Theorem 2: First, note that the sufficiency is a direct consequence of Theorem 1. For necessity, (11), (22), and (23) yield u(t) = Kx(t) + l − b, where the matrix K and the vector l are defined in (25) and (41), respectively. Now, the closed-loop dynamics of (11) and (19) are given by ẋ(t) = diag[α]((A + K)x(t) + l). Since the group Nash equilibrium x*(0) is a unique equilibrium of the closed-loop dynamics under Assumption 2, it follows that x*(0) is globally asymptotically stable if and only if A_s is Hurwitz. The convergence result for u(t) is also immediate since u(t) = Kx(t) + l − b holds.
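The Hurwitz test invoked in the proof of Theorem 2 can be certified numerically by solving a continuous-time Lyapunov equation; the matrix A_s below is an illustrative stand-in:

```python
import numpy as np

# Lyapunov-equation certificate for the Hurwitz property: A_s is Hurwitz
# iff A_s^T P + P A_s = -Q has a positive-definite solution P for Q > 0.
# A_s below is an illustrative stand-in, not from the article.
A_s = np.array([[-2.0, 0.5],
                [0.5, -2.0]])
Q = np.eye(2)

# Solve the Lyapunov equation via the (row-major) Kronecker identity
# (A^T (x) I + I (x) A^T) vec(P) = -vec(Q).
n = A_s.shape[0]
M = np.kron(A_s.T, np.eye(n)) + np.kron(np.eye(n), A_s.T)
P = np.linalg.solve(M, -Q.flatten()).reshape(n, n)

print(np.allclose(A_s.T @ P + P @ A_s, -Q))  # True
```

Positive definiteness of the resulting P (checkable via its eigenvalues) then certifies global asymptotic stability in the quadratic case.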
Proof of Proposition 2: First, note that the entire profile u in the pseudogradient dynamics (19) characterized from the managers' intragroup incentive schemes is given by u(t) = Kx(t_s) + l − b, t ∈ [t_s, t_{s+1}), where the matrix K and the vector l are defined in (25) and (41). In this case, the closed-loop dynamics of (11) and (19) are given by (42) with x̃ ≜ x − x*(0), where we used the fact (A + K)x*(0) + l = 0. Then, the solution of the continuous-time dynamics (42) satisfies (43) for t ∈ [t_s, t_{s+1}) with τ ≜ t − t_s ∈ [0, T_s), which indicates (44) for any positive-definite matrix Q ∈ R^{n×n}. It follows from (43) that the state can be expressed as x̃(t) = Φ(τ)x̃(t_s), t ∈ [t_s, t_{s+1}), with τ ≜ t − t_s ∈ [0, T_s). Since Φ(t) is continuous and Φ(0) = I_n holds, there exists T ∈ R_+ such that Φ(τ) is invertible for all τ < T. Hence, it follows from Φ^{−1}(τ)x̃(t) = x̃(t_s) that (45) holds. Since Φ(τ) − I_n → 0 as τ → 0, there exists ζ ∈ R_+ such that (46) holds for all τ < ζ, where w ≜ λ_min(Q) > 0 is the minimum eigenvalue of Q. Now, let σ ≜ min(ζ, T) and suppose T_s < σ, so that T_s < ζ and T_s < T for all s ∈ Z_≥0. Consider the Lyapunov function candidate V(x̃) = (1/2) x̃^T P x̃. Then, it follows from x̃^T Q x̃ > w ‖x̃‖^2, (45), (46), and (47) that the time derivative of V(x̃) along the trajectories of (42) satisfies

V̇(t) = x̃^T(t) P diag[α](A x̃(t) + K x̃(t) + K x̃(t_s) − K x̃(t))
= −x̃^T(t) Q x̃(t) + x̃^T(t) P diag[α] K (x̃(t_s) − x̃(t))
≤ −w ‖x̃(t)‖^2 + ‖x̃(t)‖ ‖P diag[α] K‖ ‖x̃(t_s) − x̃(t)‖

and hence the result is immediate.
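The sampled-data stability test underlying this proof (and used in Example 1), namely checking that the intersample transition matrix Φ(T_s) = e^{T_s A}(I + A^{−1}K) − A^{−1}K has spectral radius less than one, can be sketched numerically; the matrices A and K below are illustrative stand-ins for those in (25):

```python
import numpy as np

# Sampled-data sketch: between sampling instants the shifted state evolves
# as x(t) = Phi(t - t_s) x(t_s) with
# Phi(tau) = e^{tau A}(I + A^{-1}K) - A^{-1}K, and a discrete-time
# stability check is rho(Phi(T_s)) < 1.  A and K are illustrative.
def expm(M, terms=60):
    # Taylor-series matrix exponential (adequate for small, well-scaled M)
    E, P = np.eye(M.shape[0]), np.eye(M.shape[0])
    for k in range(1, terms):
        P = P @ M / k
        E = E + P
    return E

A = np.array([[-3.0, 1.0], [0.5, -2.0]])
K = np.array([[0.5, 0.0], [0.0, 0.5]])
Ainv_K = np.linalg.solve(A, K)

def Phi(tau):
    return expm(tau * A) @ (np.eye(2) + Ainv_K) - Ainv_K

for Ts in (0.15, 0.09):
    rho = max(abs(np.linalg.eigvals(Phi(Ts))))
    print(Ts, rho < 1.0)   # both True for these sampling periods
```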
Proof of Corollary 1: The result is a direct consequence of Theorem 3 by noting from Proposition 1 that the matrix A_s = diag[α]A(γ, x*(0)) is Hurwitz for any α ∈ R^n_+.

Proof of Corollary 2:
The proof is a direct consequence of Theorem 1 since the Jacobian matrix at x* is A*_s.
Proof of Theorem 4: First, note that the linearized closed-loop dynamics with the shifted state x̃ ≜ x − x* are given by d x̃(t)/dt = diag[α](∇f(x*) + ∇ũ(x*, v*)) x̃(t), where ∇ũ(x*, v*) is the Jacobian matrix of the function u(x, v(x)) with respect to x at (x*, v*). It then follows using the linearized dynamics that V̇(x̃(t)) = −2 x̃^T(t) R x̃(t) < 0 around x*, and hence the target equilibrium x* is locally asymptotically stable for any matrices ∇f_k(x*), k ∈ M, and any α ∈ R^n_+.

Manuscript received 14 October 2022; revised 20 October 2022; accepted 27 January 2023. Date of publication 6 February 2023; date of current version 14 June 2024. This work was supported in part by the JST Moonshot R&D Program under Grant JPMJMS2021 and in part by JSPS KAKENHI under Grant 21K04117. The work of Yuyue Yan was supported by the Chinese Scholarship Council (CSC). Recommended by Associate Editor B. Gharesifard. (Corresponding author: Tomohisa Hayakawa.)

Fig. 1. Hierarchical noncooperative system with m groups of agents. The network in the agent layer represents payoff dependencies. The system governor (e.g., central government) appears at the top of the hierarchy and specifies the social utility function. The group managers and the agents are considered as the selfish players taking into account the intergroup and the intragroup incentives, respectively.

Fig. 2. Block diagram of available information between the three layers. The strategy u^k_i of group manager k and the intergroup incentive coefficient v_k are understood as the control signals to the agents in N_k and to group manager k, respectively.

Fig. 3. Trajectories of x(t) and u(t) influenced by the managers' intragroup incentives and the update rule (11). Dash-dotted: under the continual update rule (CUR). Solid: under the sampled-data-based update rule (SUR). The black dash-dotted lines in (a) represent the group Nash equilibrium x*. (a) Trajectories of agents' state. (b) Trajectories of group managers' strategies.