Topology control with IPD network creation games

Network creation games couple a two-player game with the evolution of network structure. A vertex player may increase its own payoff by a change of strategy or by a modification of its edge-defined neighbourhood. Referring to the iterated prisoner's dilemma (IPD) game, we show that this evolutionary dynamics converges to network-Nash equilibria (NNEs), where no vertex is able to improve its payoff. The resulting network structure exhibits a strong dependence on the parameters of the payoff matrix. Degree distributions and cluster coefficients are also strongly affected by the specific interactions chosen for the neighbourhood exploration. This allows network creation games to be seen as a promising artificial-social-systems approach for a distributive topology control of complex networked systems.


DEUTSCHE PHYSIKALISCHE GESELLSCHAFT
Basic concepts of network creation games are presented in section 2, together with four different schemes of neighbourhood exploration. The IPD is briefly described in section 3. Convergence into and perturbation of NNEs are discussed in section 4. The structural properties of the asymptotically stationary NNEs are presented in section 5; degree distributions and cluster coefficients reveal a clear dependence on the parameters of the payoff function and on the different schemes of neighbourhood exploration. Conclusions are given in section 6.

Basics of network creation games
Network creation games couple a two-player game with the evolution of network structure. The game is played in subsequent rounds. During one round a randomly chosen vertex is activated while all other vertices remain passive. The active vertex is interested in increasing its own payoff. It can do so by a change of strategy, a modification of its edge-defined neighbourhood, or both.
In the first place, an active vertex i plays with its up-to-then frozen strategy s_i^old against all its current k_i neighbours n ∈ N_i, which are those vertices connected to i by an edge. Due to their passive status, the strategies s_n^frozen of the neighbours are all frozen. The game interaction of i with n is evaluated by the payoff function π(s_i^old, s_n^frozen). From all its neighbours player i collects the average payoff

π_i^old = (1/k_i) Σ_{n∈N_i} π(s_i^old, s_n^frozen).    (1)

For the case k_i = 0 the interaction-less player i gets no payoff, i.e. π_i^old = 0. Vertex i is interested in maximizing its average payoff, which it could do either in a random or a deterministic way. In the random case, the vertex would test a randomly chosen strategy and accept/reject it upon a payoff increase/decrease. We prefer the deterministic case, in which the vertex rationally determines the average payoff for every possible strategy. The best-response strategy s_i^opt = arg max_{s_i} π_i(s_i) yields the largest average payoff

π_i^opt = max_{s_i} (1/k_i) Σ_{n∈N_i} π(s_i, s_n^frozen).    (2)

If the best-response strategy is not equal to the old strategy s_i^old, then vertex i changes its strategy to s_i^opt. In the case of degeneracy, the vertex has two options: if its old strategy belongs to the current degenerate set of optimal strategies, it again sticks to the old one; otherwise it chooses one of the degenerate best-response strategies at random.
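As an illustrative sketch (the function and variable names are ours, not the paper's code), the deterministic best-response step of (1) and (2) can be written as follows; `payoff` stands for the two-player payoff function π, and `frozen` maps each vertex to its frozen strategy:

```python
import random

def average_payoff(payoff, s_i, frozen, neighbours):
    """Average payoff (1): vertex i plays s_i against the frozen strategies
    of its current neighbours; an isolated vertex (k_i = 0) collects 0."""
    if not neighbours:
        return 0.0
    return sum(payoff(s_i, frozen[n]) for n in neighbours) / len(neighbours)

def best_response(payoff, strategies, s_old, frozen, neighbours, rng):
    """Deterministic best response (2). On degeneracy, keep s_old if it is
    among the optima; otherwise pick one of the degenerate optima at random."""
    payoffs = {s: average_payoff(payoff, s, frozen, neighbours) for s in strategies}
    top = max(payoffs.values())
    optima = [s for s, p in payoffs.items() if p == top]
    if s_old in optima:
        return s_old, top
    return rng.choice(optima), top
```

For an isolated vertex every strategy yields payoff 0, so the degeneracy rule applies and the old strategy is kept, exactly as described above.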
The vertex might even gain more payoff by modifying a part of its neighbourhood during a third and final step of its active round. We will discuss four different schemes of neighbourhood explorations. All of them are confined to edge exchange, which keeps the overall number of edges in the network constant. We denote them as (i) XOR-new-vertex-exploration (XOR-NVE), (ii) OR-new-vertex-exploration (OR-NVE), (iii) XOR-new-edge-exploration (XOR-NEE), and (iv) OR-new-edge-exploration (OR-NEE).
Within the XOR-NVE scheme the active vertex sticks to its best-response strategy s_i^opt obtained from (2). One after the other, the active vertex then plays against all its non-neighbours m ∉ N_i. Only if the maximum of these payoffs, max_m π(s_i^opt, s_m^frozen), turns out to be larger than the average payoff (2) does the active vertex build a new edge to one of the maximum-producing vertices m and remove the edge to one of the old neighbours which has provided i with the lowest payoff min_n π(s_i^opt, s_n^frozen). Otherwise the active vertex i keeps its neighbourhood as before. In the case that the active vertex has had no neighbour before, it of course accepts the new neighbour, but in order to conserve the overall number of edges an old edge somewhere in the network is taken out at random.
The OR-NVE scheme is similar to the XOR-NVE scheme. In addition, the active vertex is now also allowed to modify its strategy during the new-vertex exploration in order to search for the best combination of new strategy and new neighbour. If max_{s_i} max_m π(s_i, s_m^frozen) > π_i^opt, then the active vertex adopts the new strategy as well as the new neighbour, and disconnects from its worst old neighbour.
The third and fourth neighbourhood exploration schemes XOR-NEE and OR-NEE are fully analogous to the first and second schemes XOR-NVE and OR-NVE, except that the new-vertex exploration is replaced by a new-edge exploration: an edge of the network is picked and the active vertex tests the two vertices attached to it. In case of an increase of its average payoff, the active vertex first extends its neighbourhood by these two vertices, only then to remove the two worst out of its temporarily k_i + 2 edges. Note that it might happen that one of the two new edges is immediately removed again. Some further special situations have to be distinguished. If one of the two vertices attached to the explored edge is already a first neighbour of the active vertex, then in the case of an increase of its average payoff only one edge, to the worst old neighbour, is removed. If the active vertex has had no neighbours before, then the two vertices attached to the explored edge are accepted as new neighbours and two other edges somewhere in the network are taken out at random.
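A minimal sketch of one XOR-NVE exploration, under assumed data structures (an adjacency dict of vertex sets and a strategy dict; the names are ours): the active vertex keeps its best-response strategy and rewires only if the best non-neighbour payoff beats its current average payoff (2). The random removal of an edge elsewhere for the k_i = 0 case is omitted for brevity.

```python
def xor_nve_step(payoff, i, s_opt, strat, adj, avg_opt):
    """One XOR-NVE exploration for active vertex i (sketch).
    strat: vertex -> frozen strategy; adj: vertex -> set of neighbours;
    avg_opt: best-response average payoff (2). Returns True if rewired."""
    non_nb = [m for m in adj if m != i and m not in adj[i]]
    if not non_nb:
        return False
    best_m = max(non_nb, key=lambda m: payoff(s_opt, strat[m]))
    if payoff(s_opt, strat[best_m]) <= avg_opt:
        return False                                  # keep the old neighbourhood
    if adj[i]:                                        # disconnect worst old neighbour
        worst = min(adj[i], key=lambda n: payoff(s_opt, strat[n]))
        adj[i].discard(worst)
        adj[worst].discard(i)
    adj[i].add(best_m)                                # connect the best new vertex
    adj[best_m].add(i)
    return True
```

The OR variants would wrap this step in an additional maximization over the strategies s_i, as described above.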
In all four neighbourhood exploration schemes the vertex i has a global information horizon to locally change the topology of the network. Generalizations towards a local information horizon are straightforward, but come with a disadvantage: during a neighbourhood update the network may disintegrate into disconnected clusters. With a local information horizon a reconnection is then excluded during all following updates, whereas a global information horizon allows for a future reconnection. This is the reason why we adopt the global information horizon.
During one round of the network creation game it might occur that the active vertex is unable to improve its average payoff beyond the initial value (1), neither by a change of strategy nor by a new-neighbourhood exploration. In subsequent rounds this might also happen to the other vertices once they become active. These vertices stick to their old strategy and their old neighbourhood. If no vertex of the network is able to improve its payoff either by a change of strategy or by a modification of its neighbourhood, the network state will no longer change. Such stable network states are called network-Nash equilibria (NNEs) [7]-[12].
The existence of NNEs is not clear a priori. In section 4 we will see by extensive simulations that such NNEs do in fact exist for the network creation game based on the IPD.

IPD
The prisoner's dilemma (PD) is a two-player game where each player can choose either the strategy cooperate (C) or defect (D). The game is characterized by its payoff matrix

π(C, C) = R,  π(C, D) = S,  π(D, C) = T,  π(D, D) = P,    (3)

where π(s_i, s_j) is the payoff of player i with strategy s_i playing against opponent j with strategy s_j. The entries of the payoff matrix are called sucker's payoff (S), punishment (P), reward (R) and temptation to defect (T). They obey the inequalities

S < P < R < T,  2R > S + T.    (4)

We fix

S = 0,  P = 1,  R = 3    (5)

and keep T as a free parameter that we will vary within the bounds 3 < T < 6 implied by (4). The dilemma arises because the global optimum π(C, C) = R is ruled out by the selfish behaviour of the individuals. Since π(D, s_j) > π(C, s_j) holds independent of the opponent's strategy, the strategy D is dominant. Caused by this selfishness, both players will therefore choose to defect, producing a payoff P for each of them. As long as the opponent sticks to D, neither player is able to improve its payoff with a change of strategy. This state defines the only Nash equilibrium of the PD game. This only-D Nash equilibrium carries over to the PD game played on fixed networks: no matter how the initial distribution of strategies across the vertices of the network looks, after a sufficiently long sequence of rational payoff-maximizing updates all vertices will have chosen strategy D.
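With the fixed values S = 0, P = 1, R = 3 of (5), the dominance of D can be checked in a few lines (a minimal sketch; the function names are ours):

```python
def pd_payoff(s_i, s_j, T, R=3.0, P=1.0, S=0.0):
    """One-shot PD payoff of player i against player j; T is the free
    temptation parameter, assumed to satisfy 3 < T < 6."""
    return {('C', 'C'): R, ('C', 'D'): S, ('D', 'C'): T, ('D', 'D'): P}[(s_i, s_j)]

def d_is_dominant(T):
    """D beats C against either opponent strategy, hence D is dominant."""
    return all(pd_payoff('D', s, T) > pd_payoff('C', s, T) for s in 'CD')
```

The check succeeds for every T in the admissible range, which is exactly why both rational players end up in the defective (D, D) Nash equilibrium.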
For a network creation game the PD is not very exciting. The only-D Nash equilibrium of fixed networks also carries over to the NNEs of a PD network creation game: every vertex will choose strategy D. If one of them is forced by perturbation to adopt the C strategy, then the active D vertices will all stick to their strategy and build up a new edge to the C vertex, thus increasing their individual average payoff. Once the C vertex becomes active again, it will not change its neighbourhood but switch its strategy from C to D to obtain a larger payoff. After this a new NNE is reached, where again all vertices have adopted the D strategy. Note that for 3 < T < 6 the transition from one NNE to the next is completely independent of the parameter T. Consequently, the ensembles of NNEs obtained from a PD network creation game for different T values result in identical network structure properties. A topology control by changing the value of T is not possible.

This is no longer the case for the IPD game [16]. In its simplest version it comes with the eight strategies listed in table 1. Each strategy s = (s_0, s_C, s_D) consists of an opening move s_0 as well as the direct responses s_C and s_D to a previous C and D move of the opponent.

Table 1. Each of the eight strategies of the IPD game comes with an opening move s_0 as well as the direct responses s_C and s_D to a previous C and D move of the opponent; among them are strategy 1 = (D,D,D), always defect, strategy 6, generous anti tit-for-tat, strategy 7 = (C,C,D), generous tit-for-tat, and strategy 8 = (C,C,C), always cooperate.

If for example a player chooses the generous tit-for-tat strategy (C,C,D) and its opponent plays the always defect strategy (D,D,D), then C meets D in the opening move and D meets D in all subsequent moves. In terms of payoff for the generous tit-for-tat player this translates into S for the opening move and P for all subsequent moves. Averaged over their infinitely repeated encounter, this produces the payoff π_∞(7, 1) = P.
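The average payoff of the infinitely repeated encounter can be computed exactly: two deterministic three-bit strategies drive the joint moves into a cycle, and the transient opening drops out of the infinite average. A sketch (the names and the payoff values R = 3, P = 1, S = 0 follow our reconstruction of (5)):

```python
def response(strategy, opp_prev):
    """Direct response of a three-bit strategy s = (s0, sC, sD)."""
    s0, s_c, s_d = strategy
    return s_c if opp_prev == 'C' else s_d

def avg_payoff_infty(s_i, s_j, T, R=3.0, P=1.0, S=0.0):
    """Average payoff of player i in the infinitely repeated game: iterate
    the joint moves until they repeat, then average over the cycle."""
    pay = {('C', 'C'): R, ('C', 'D'): S, ('D', 'C'): T, ('D', 'D'): P}
    mi, mj = s_i[0], s_j[0]            # opening moves
    seen, history = {}, []
    while (mi, mj) not in seen:
        seen[(mi, mj)] = len(history)
        history.append((mi, mj))
        mi, mj = response(s_i, mj), response(s_j, mi)
    cycle = history[seen[(mi, mj)]:]   # the periodic part of the encounter
    return sum(pay[move] for move in cycle) / len(cycle)

GTFT, ALLD = ('C', 'C', 'D'), ('D', 'D', 'D')   # strategies 7 and 1
```

For generous tit-for-tat against always defect this reproduces π_∞(7, 1) = P, and π_∞(7, 7) = R > π_∞(1, 1) = P, the dominance relation used in the next paragraph.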
Analogously, we arrive at the full 8×8 average payoff matrix π_∞(s_i, s_j) (6). According to (6), the clashes of the defective strategies 1 with 1, 1 with 5, and 5 with 5 all represent Nash equilibria. Also 7 with 7 represents a Nash equilibrium. The latter Nash equilibrium is stricter than the former, because π_∞(7, 7) > π_∞(1, 1) = π_∞(1, 5) = π_∞(5, 5). In other words, compared to the defective strategies the generous tit-for-tat strategy is dominant.
This has immediate consequences for the IPD game played on fixed networks. Almost independent of the initial distribution of strategies across the vertices of the network, the payoff-maximizing updates drive the strategies away from the defective regime and into the cooperative regime. The cooperative strategy regime mostly consists of generous tit-for-tat; strategies like always cooperate and suspicious cooperate might also occur due to stabilizing effects between the network structure and the prevailing generous tit-for-tat strategy. For more details about the IPD game on fixed networks, including a discussion of strategy-avalanche statistics between subsequent Nash equilibria, we refer to [16, 17].

Existence and perturbation dynamics of NNEs
Upon using the two-player IPD game (6) for the network creation game of section 2, the first thing to demonstrate is the existence of NNEs. This will be shown by simulation. For demonstration, the variant with the XOR-NVE scheme is applied to an initial BA scale-free network [1], where each vertex has been randomly assigned one of the eight possible IPD strategies. For the payoff parameter T = 4, figure 1(a) illustrates the degree distribution obtained for the first NNE. It differs significantly from the initial scale-free distribution. A similar finding holds for the strategy distribution: as can be seen in figure 2, it is no longer homogeneous. The by far most frequent strategy is generous tit-for-tat. Two other cooperative strategies occur with small frequency, while defective strategies are highly suppressed. It is the generous tit-for-tat strategy which successfully fights against the defective strategies and cooperates with the cooperative strategies. Without a perturbation the first NNE state is stable.

As a perturbation, the strategy of a randomly picked vertex is randomized. The evolutionary dynamics of the network creation game again sets in and converges to a second NNE. This sequence of perturbation and subsequent convergence into the next NNE is then repeated over and over again. A distance measure (7) between the degree distribution p_k(t) after the t-th perturbation and its asymptotic form quantifies the difference between the probability distributions and is illustrated in figure 1(b). The two curves represent different initial network structures, namely a BA scale-free network and a random Poisson network with the same average degree ⟨k⟩. Independent of the initial network structure, the asymptotic form p_k(∞) is reached at approximately t ≈ 500. We refer to this asymptotic regime as the stationary NNEs.

It is interesting to study the dynamics from one stationary NNE to the next in more detail. We define a strategy-based Hamming distance

H_strategy = Σ_i (1 − δ_{s_i(t), s_i(t+1)})    (8)

between two subsequent NNEs. As shown in figure 3(a), this quantity turns out to be very small.
Despite a weak dependence on the payoff parameter T, its value lies around H_strategy ≈ 2. The reason for this small value becomes clear by looking again at figure 2: for the stationary NNEs the strategy distribution is almost entirely peaked at the generous tit-for-tat strategy. Consequently, almost all vertices adopt this strategy in two subsequent NNEs.

Figure 3. (a) Strategy-based Hamming distance (8), (b) edge-based Hamming distance (9) and (c) average avalanche size between two consecutive stationary NNEs obtained from the IPD network creation game with the XOR-NVE scheme. Each curve has been sampled over 20 independent realizations. The size of the network has been fixed to N = 100 vertices and an average degree of ⟨k⟩ = 8.
Similar to (8), we also define an edge-based Hamming distance

H_edge = Σ_{i<j} |a_ij(t+1) − a_ij(t)|    (9)

between two subsequent NNEs, where a_ij represents the adjacency matrix of the network. Figure 3(b) reveals that the edge-based Hamming distance is significantly larger than its strategy-based counterpart. It also shows a strong increase with the payoff parameter T. The two Hamming distances (8) and (9) only look at the difference between the two subsequent NNEs; the average avalanche size of figure 3(c) also accounts for the strategy and edge changes in between. The avalanche size is defined as the number of active vertices which evoke either a change of strategy or a change of neighbourhood. As can be seen in figure 4, the fluctuations around the average avalanche sizes turn out to be large.

Figure 4. Fluctuations of the avalanche size between two consecutive stationary NNEs. Each curve has been sampled over 20 independent realizations. The size of the network has been fixed to N = 100 vertices and an average degree of ⟨k⟩ = 8.
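The two Hamming distances can be sketched as follows (the exact normalization of (8) and (9) is not shown explicitly in the text, so counting changed vertices and changed edges is our assumption):

```python
def strategy_hamming(s_prev, s_next):
    """Strategy-based Hamming distance (8): number of vertices whose
    strategy differs between two subsequent NNEs."""
    return sum(1 for i in s_prev if s_prev[i] != s_next[i])

def edge_hamming(a_prev, a_next):
    """Edge-based Hamming distance (9): number of vertex pairs whose
    adjacency-matrix entry a_ij differs between two subsequent NNEs."""
    n = len(a_prev)
    return sum(a_prev[i][j] != a_next[i][j]
               for i in range(n) for j in range(i + 1, n))
```

Since the adjacency matrix is symmetric, only pairs i < j are counted, so each rewired edge contributes once.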

Properties of stationary NNEs
Qualitatively, the results of section 4 also carry over to other payoff parameter values and to the other neighbourhood exploration schemes. We will now examine the quantitative differences of the respective stationary NNEs. Figure 5 depicts the degree distributions. For all neighbourhood exploration schemes they show a dependence on the chosen value of the payoff parameter T. For the XOR-NVE scheme the degree distribution is close to a Poissonian for T = 3, becomes broader for larger T and evolves into a two-hump structure for T = 6. For the OR-NVE scheme the trend is opposite: for T = 3 the distribution is broadest and it evolves in the direction of a Poissonian for larger T. Note here that we have included the two boundary values T = 3 and T = 6 in the discussions belonging to figures 5 and 6.
Note that the Poissonian has to be viewed as a reference distribution: it is the stationary result of a local random edge exchange, where a randomly picked vertex builds up an edge to a randomly picked non-neighbour and removes an edge to a randomly picked neighbour [18]. In this respect, the results of figures 5(a) and (b) clearly show that the game component of the IPD network creation game introduces a new quality, which goes beyond a simple geometric restructuring of the network.
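The reference model of [18] can be sketched as follows (a minimal implementation under our own data structures; `adj` maps each vertex to its set of neighbours):

```python
import random

def local_edge_exchange(adj, steps, rng):
    """Local random edge exchange [18]: a randomly picked vertex builds an
    edge to a random non-neighbour and removes the edge to a random
    neighbour; the total number of edges stays constant."""
    vertices = sorted(adj)
    for _ in range(steps):
        i = rng.choice(vertices)
        non_nb = [m for m in vertices if m != i and m not in adj[i]]
        if not adj[i] or not non_nb:
            continue                      # no exchange possible for this vertex
        new = rng.choice(non_nb)
        old = rng.choice(sorted(adj[i]))
        adj[i].remove(old); adj[old].remove(i)
        adj[i].add(new); adj[new].add(i)
    return adj
```

Each accepted step removes exactly one edge and adds exactly one, so the edge count is conserved by construction, and the stationary degree distribution approaches the Poissonian reference.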
The degree distributions resulting from the two NEE schemes are illustrated in figures 5(c) and (d). The XOR variant produces little variation for most T values; only close to T = 6 does a noticeable variation occur. All distributions come with a pronounced tail towards large degrees, but are not scale-free. A reference distribution is also shown, which results from a purely geometric new-edge randomization, where a randomly picked vertex builds up new edges to the two vertices of a randomly picked edge and removes the same number of edges to randomly picked old neighbours. Although some small differences are noticeable, this reference distribution remains close to the game-based distributions.

Another structural property of the stationary NNEs to look at is the degree-dependent cluster coefficient C|k, which counts the relative number of triangles attached to a vertex with degree k. The results for the four different neighbourhood exploration schemes of the IPD network creation game are shown in figure 6. For the variant with the OR-NVE scheme the cluster coefficient does not show a k dependence. However, we notice a dependence on the payoff parameter: the cluster coefficient is a decreasing function of T and for T ≈ 5 approaches the value C ≈ ⟨k⟩/N set by the reference Poissonian network obtained from a local random edge exchange. This trend is in accordance with the trend observed in the respective degree distribution, which has been exemplified in figure 5(b).
Except for T = 3, an analogous picture arises for the variant with the XOR-NVE scheme. The cluster coefficient shows almost no dependence on the vertex degree. It is an increasing function of T. For T slightly larger than three the cluster coefficient is close to the value C ≈ ⟨k⟩/N set by the reference Poissonian network obtained from a local random edge exchange. So far, this trend is consistent with the trend observed in the respective degree distribution of figure 5(a). However, the payoff value T = 3 appears to be singular: the cluster coefficient reveals a power-law dependence C|k ∼ k^(−δ) with exponent δ ≈ 0.66. We have carefully checked by extensive simulations that this power law is immediately lost once the payoff parameter T is chosen just slightly larger than three. We cannot offer an explanation for this singular finding.
The cluster coefficients obtained from the NEE schemes turn out to be much larger than those of the new-vertex exploration schemes. This is of course clear, since the acceptance of a newly explored edge introduces a new triangle. For all payoff parameters T, although with differing quality, the cluster coefficient turns out to be a power-law function of the vertex degree. The exponent δ ≈ 0.4 as well as the order of magnitude of the cluster coefficient are the same as for the purely geometric new-edge randomization, where a randomly picked vertex builds up new edges to the two vertices of a randomly picked edge and removes the same number of edges to randomly picked old neighbours. Note, however, that for both new-edge exploration schemes the absolute value of the cluster coefficient reveals a noticeable dependence on the payoff parameter T.
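The degree-dependent cluster coefficient can be estimated from an adjacency structure as follows (a standard local-clustering average, sketched with our own names; vertex labels are assumed comparable so neighbour pairs can be counted once):

```python
def clustering_by_degree(adj):
    """Degree-dependent cluster coefficient C|k: the local clustering
    coefficient (relative number of triangles at a vertex), averaged over
    all vertices of degree k."""
    samples = {}
    for i, nb in adj.items():
        k = len(nb)
        if k < 2:
            c = 0.0                        # no triangle possible for k < 2
        else:
            links = sum(1 for u in nb for v in nb if u < v and v in adj[u])
            c = 2.0 * links / (k * (k - 1))
        samples.setdefault(k, []).append(c)
    return {k: sum(cs) / len(cs) for k, cs in samples.items()}
```

A log-log plot of this dictionary against k is what reveals the power-law dependence C|k ∼ k^(−δ) discussed above.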

Conclusion and outlook
By referring to the IPD, we have shown that network creation games produce stationary NNEs, where no vertex is able to improve its payoff either by a change of strategy or by a modification of its neighbourhood. The resulting network structures show a strong dependence on the payoff parameter as well as on the explicit scheme used to explore a modified neighbourhood.
This represents a self-organizing route towards a distributive topology control of complex networks. In other words: give a game to the vertices of a network and let them play all by themselves according to the given policy; after some time they find a self-organized NNE topology. Once a different policy is given to them, the vertices self-organize into another NNE topology with different structural properties.
For evolutionary engineering the stability of the reached NNE is very attractive. A future challenge will be to find and design distributive game policies which are directly coupled to a global optimization objective. In the context of social and biological systems other processes besides those leading to a stable NNE are of interest. A more dynamic coevolution of network structure and game strategy has been recently proposed in [19].