Differential evolution particle swarm optimization algorithm based on good point set for computing Nash equilibrium of finite noncooperative game

Abstract: In this paper, a hybrid differential evolution particle swarm optimization (PSO) method based on a good point set (GPDEPSO) is proposed to compute the Nash equilibrium of finite N-person noncooperative games. Stochastic functional analysis is used to prove the convergence of the algorithm. First, an ergodic initial population is generated by using a good point set. Second, a simplified PSO update is utilized as the variation operator within the variation-crossover-selection cycle of differential evolution (DE). Finally, the experimental results show that the proposed algorithm has better convergence speed, accuracy, and global optimization ability than other existing algorithms in computing the Nash equilibrium of N-person noncooperative games. In particular, the algorithm is markedly more efficient at determining the Nash equilibrium of games with a high-dimensional payoff matrix.


Introduction
Game theory is a mathematical theory that studies interactions between decision makers; it mainly studies how players use the information they have to make decisions. Game theory is widely applied in economics, political science, biology, and other fields [1][2][3][4][5][6]. The noncooperative game is the main branch of game theory, and the Nash equilibrium is its core concept. However, reaching an equilibrium requires players to form predictions step by step during the game, so finding an equilibrium can, to a certain extent, be reduced to a computational problem.
At present, many numerical methods have been proposed to solve for the Nash equilibrium, such as the Lemke-Howson algorithm [7], the global Newton method [8], projection-like methods [9], and trust region algorithms [10]. In recent years, as game problems have grown more complex, these traditional numerical methods have become nearly unusable because of the increasing difficulty of finding a solution and the growing computation time. As a result, more and more scholars have turned to intelligent algorithms that simulate biological behavior. Because such algorithms imitate rational biological behavior, they carry the implicit rational characteristics of games and can therefore be regarded as paths toward the Nash equilibrium.
Pavlidis [11] verified the effectiveness of three computational intelligence algorithms, namely, covariance matrix adaptation evolution strategies, particle swarm optimization (PSO) and differential evolution (DE), in computing the Nash equilibrium of finite strategic games. Boryczka [12] compared DE with two well-known algorithms, simplicial subdivision and the Lemke-Howson algorithm, and showed that DE can obtain an approximate Nash equilibrium solution. Chen [13] used the genetic algorithm (GA) to find the Nash equilibrium of an N-person noncooperative game; through bimatrix game examples, the implementation of the GA was discussed and shown to be effective. Qiu [14] proposed an immune algorithm (IA) for solving game equilibria, and its advantages in solving game problems and its stable convergence were verified through examples. Franken [15] investigated coevolutionary training techniques based on PSO to evolve strategies for the iterated prisoner's dilemma (IPD). Jia [16] proposed an immune particle swarm optimization (IPSO) for computing finite N-person noncooperative games; the results show that this algorithm is superior to the immune algorithm and the original particle swarm algorithm. Yang [17] proposed the fireworks algorithm (FWA) to compute the Nash equilibrium of N-person noncooperative games, and computer simulations demonstrate that it is effective and superior to IPSO.
The above studies all demonstrate the advantages of intelligent algorithms in solving Nash equilibrium problems. However, several issues remain, such as high algorithmic complexity, slow convergence speed and low accuracy; in particular, the convergence of these algorithms has not been proven theoretically. The goal of this paper is to propose an efficient hybrid algorithm, GPDEPSO, to solve the game problem and then to prove its convergence. The paper is organized as follows. In Section 2, we summarize the concept of a noncooperative N-person game. In Section 3, we propose the hybrid algorithm GPDEPSO to compute the Nash equilibrium of noncooperative N-person games. First, we initialize the population with a good point set to ensure that the initial particles are distributed globally, which helps the algorithm avoid local convergence. Then, the position updating formula of PSO is simplified into a new form without a velocity term and used as the variation operator in the variation-crossover-selection cycle of DE. The convergence of GPDEPSO is proved by stochastic functional analysis in Section 4. Section 5 is devoted to computational experiments; by comparing the proposed algorithm with other algorithms, its superiority is demonstrated.

2. Game and Nash equilibrium definitions

Definition 1 [18]. We consider an N-person finite strategic game Γ = (N, {S_i}_{i∈N}, {U_i}_{i∈N}), where
(1) N = {1, . . . , n} is the set of players and n is the number of players;
(2) S_i = {s_{i1}, . . . , s_{im_i}}, ∀i ∈ N, is the pure strategy set of player i, m_i is the number of strategies available to player i, S = ∏_{i=1}^n S_i is the Cartesian product of the pure strategy sets of all players, and each pure strategy profile satisfies (s_1, s_2, . . . , s_n) ∈ S;
(3) U_i : S → R, ∀i ∈ N, is the payoff function of player i;
(4) X_i = {x_i = (x_{i1}, . . . , x_{im_i}) : x_{ij} ≥ 0, ∑_{j=1}^{m_i} x_{ij} = 1}, ∀i ∈ N, is the set of mixed strategies of player i, where x_{ij} is the probability that player i adopts s_{ij} for j = 1, . . . , m_i; X = ∏_{i=1}^n X_i is the Cartesian product of the mixed strategy sets of all players, and each mixed strategy profile satisfies (x_1, x_2, . . . , x_n) ∈ X;
(5) f_i : X → R, ∀i ∈ N, is the expected payoff function of player i:

f_i(x) = ∑_{k_1=1}^{m_1} · · · ∑_{k_n=1}^{m_n} U_i(s_{1k_1}, . . . , s_{nk_n}) ∏_{j=1}^{n} x_{jk_j},

which is the expected payoff of player i when the mixed strategy profile x = (x_1, . . . , x_n) ∈ X is played, and U_i(s_{1k_1}, . . . , s_{nk_n}) is the payoff of player i when each player j chooses the pure strategy s_{jk_j} ∈ S_j, j = 1, . . . , n.

Definition 2. If a mixed strategy profile x* ∈ X satisfies

f_i(x*) ≥ f_i(x_i, x*_{î}), ∀x_i ∈ X_i, ∀i ∈ N,

then x* is the Nash equilibrium point of the N-person finite noncooperative game, where î = N\{i}, ∀i ∈ N.

Conclusion 1. A mixed strategy x* is the Nash equilibrium point if and only if, for every player i and every pure strategy s_{ik_i} ∈ S_i,

f_i(x*) ≥ f_i(x* ‖ s_{ik_i}),

where (x* ‖ s_{ik_i}) is the profile in which only player i replaces their own strategy with s_{ik_i}, and the other players do not change their own strategies under the condition of the equilibrium solution x*.
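The expected payoff function in item (5) can be evaluated directly by enumerating all pure strategy profiles. A minimal sketch (the function name and tensor layout are our illustrative choices, not from the paper):

```python
import itertools
import numpy as np

def expected_payoff(U_i, x):
    """Expected payoff f_i(x) of player i.

    U_i : n-dimensional payoff tensor; U_i[k_1, ..., k_n] is player i's
          payoff when each player j plays pure strategy k_j.
    x   : list of mixed strategies; x[j] is a probability vector over
          player j's pure strategies.
    """
    total = 0.0
    # Sum U_i(s_{1k_1}, ..., s_{nk_n}) * prod_j x_{j k_j} over all profiles.
    for profile in itertools.product(*(range(len(xj)) for xj in x)):
        prob = np.prod([x[j][k] for j, k in enumerate(profile)])
        total += U_i[profile] * prob
    return total

# Matching pennies, player 1's payoff matrix; uniform mixing gives value 0.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
x = [np.array([0.5, 0.5]), np.array([0.5, 0.5])]
print(expected_payoff(A, x))  # 0.0
```

For large n this enumeration grows exponentially, which is one reason the paper turns to population-based search rather than exhaustive methods.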
In particular, Γ is a bimatrix game when n = 2; (x*, y*) is a Nash equilibrium solution if and only if x*Ay*ᵀ ≥ xAy*ᵀ for all x and x*By*ᵀ ≥ x*Byᵀ for all y, where A and B are the payoff matrices of the two players.
Theorem 1. A mixed strategy x* ∈ X is the Nash equilibrium point of a game Γ if and only if x* is an optimal solution of the following optimization problem, and the optimal value is 0:

min f(x) = ∑_{i=1}^{n} ∑_{k_i=1}^{m_i} [max(f_i(x ‖ s_{ik_i}) − f_i(x), 0)]^2, s.t. x ∈ X. (2.1)

Proof. Necessity: suppose that x* is the Nash equilibrium point. According to Conclusion 1, f_i(x*) ≥ f_i(x* ‖ s_{ik_i}) for every player i and every pure strategy s_{ik_i}, so every term max(f_i(x* ‖ s_{ik_i}) − f_i(x*), 0) vanishes and f(x*) = 0, which is the optimal value. Sufficiency: assume that x* is a solution of problem (2.1) with f(x*) = 0. Since x* satisfies the constraints of (2.1) and every term of f is nonnegative, each term must equal 0, that is, f_i(x*) ≥ f_i(x* ‖ s_{ik_i}) for all i and k_i; by Conclusion 1, x* is the Nash equilibrium point. For the two-person bimatrix game, the above optimization problem can be simplified as

min f(x, y) = ∑_i [max(A_i yᵀ − xAyᵀ, 0)]^2 + ∑_j [max(xB_j − xByᵀ, 0)]^2,

where A_i is the ith row of matrix A and B_j is the jth column of matrix B. Then (x*, y*) is a Nash equilibrium solution of the two-person noncooperative game if and only if f(x*, y*) = 0.
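The simplified bimatrix objective can be coded directly; at a Nash equilibrium it evaluates to 0. A sketch (the function name is ours; A_i yᵀ and xB_j are the payoffs of unilateral pure deviations):

```python
import numpy as np

def nash_fitness(x, y, A, B):
    """f(x, y) = sum_i max(A_i y - xAy, 0)^2 + sum_j max(x B_j - xBy, 0)^2."""
    xAy = x @ A @ y
    xBy = x @ B @ y
    gain_x = np.maximum(A @ y - xAy, 0.0)  # player 1's pure-deviation gains
    gain_y = np.maximum(x @ B - xBy, 0.0)  # player 2's pure-deviation gains
    return np.sum(gain_x ** 2) + np.sum(gain_y ** 2)

# Matching pennies: the unique equilibrium is the uniform mixture.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
B = -A
x_star = y_star = np.array([0.5, 0.5])
print(nash_fitness(x_star, y_star, A, B))  # 0.0
```

This nonnegative function is exactly the fitness that the population-based methods below minimize.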
3. Differential evolution particle swarm optimization algorithm based on good point set (GPDEPSO)

Good point set
The good point set was originally proposed by Hua Luogeng et al. [19] and is defined as follows:
(1) Let G_s denote the unit cube in s-dimensional Euclidean space, G_s = {(x_1, x_2, . . . , x_s) : 0 ≤ x_k ≤ 1, k = 1, . . . , s}.
(2) For r = (r_1, r_2, . . . , r_s) ∈ G_s, let P_n(i) be the set of n points P_n(i) = {({r_1 · i}, {r_2 · i}, . . . , {r_s · i}) : i = 1, 2, . . . , n}, where {·} represents the decimal part of the value.
(3) For any given point r = (r_1, r_2, . . . , r_s) ∈ G_s, let N_n(r) = N_n(r_1, r_2, . . . , r_s) represent the number of points in P_n(i) that satisfy 0 ≤ x_k < r_k, k = 1, . . . , s, and let |r| = r_1 r_2 · · · r_s. Then ϕ(n) = sup_{r∈G_s} |N_n(r)/n − |r|| is called the deviation of the point set P_n(i). If ϕ(n) → 0 as n → ∞, then P_n(i) is said to be uniformly distributed on G_s with deviation ϕ(n).
(4) If the deviation satisfies ϕ(n) = C(r, ε) n^{−1+ε}, where C(r, ε) is a constant related only to r and ε (ε is an arbitrarily small positive number), then P_n(i) is called a good point set and r is called a good point.
In this paper, we take the exponential good point r = ({e^1}, {e^2}, . . . , {e^s}). In the following, we generate two distribution maps (Figures 1 and 2) of 500 initial individuals with the random point method and the exponential good point set method, respectively.
Figure 1. Two-dimensional initial population generated by random points.
Figure 2. Two-dimensional initial population generated by the exponential sequence.
As shown, the good point sequence is more uniform and global than the random point method. In addition, the good point method is independent of the spatial dimension, so it adapts well to high-dimensional problems. It is also stable: the distribution is identical every time the number of points is the same.
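A good point initial population based on the exponential sequence can be generated as follows (a sketch; the function name and the scaling onto the search box [x^L, x^U] are our additions):

```python
import numpy as np

def good_point_init(n, s, lower, upper):
    """Generate n points in [lower, upper]^s from the exponential good point
    r = ({e^1}, ..., {e^s}); the ith point is ({r_1 * i}, ..., {r_s * i})."""
    r = np.exp(np.arange(1, s + 1)) % 1.0   # good point r: fractional parts of e^k
    i = np.arange(1, n + 1).reshape(-1, 1)
    pts = (i * r) % 1.0                     # P_n(i) on the unit cube G_s
    return lower + pts * (upper - lower)    # map onto the search box

pop = good_point_init(500, 2, 0.0, 1.0)
print(pop.shape)  # (500, 2)
```

The construction is deterministic, which matches the stability observation above: calling it twice with the same n and s yields the same point set. (Note that e^k overflows double precision for very large k; for the dimensions used here this is not an issue.)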

Differential evolution algorithm (DE)
DE is a simple and robust evolutionary algorithm that was first introduced by Storn and Price [20]. DE consists of four operations: initialization, variation, crossover, and selection.
(1). Initialization. Let each individual in the population be X_i = (x_{i1}, . . . , x_{iD}), i = 1, . . . , N, where N and D represent, respectively, the population size and the space dimension. In the study of DE, the initial population is generally assumed to follow a uniform probability distribution:

x_{ij}(0) = x_j^L + rand[0, 1] · (x_j^U − x_j^L),

where rand[0, 1] represents a random value in the range [0, 1], and x_j^U and x_j^L represent, respectively, the upper and lower bounds of the jth parameter variable.
(2). Variation. The variation operation is what mainly distinguishes DE from other evolutionary algorithms. The variation individual V_i = (v_{i1}, . . . , v_{iD}) is generated by the following equation:

V_i = X_{r1}^t + F(X_{r2}^t − X_{r3}^t), (3.6)

where r1, r2, and r3 are distinct integers between 1 and N that are also different from i, F is a constriction factor controlling the size of the difference between two individuals, and t is the current iteration.
(3). Crossover. A crossover between the parent and the variation individual generates, with a given probability, the new trial individual U_i = (u_{i1}, . . . , u_{iD}):

u_{ij} = v_{ij} if rand(j) ≤ CR or j = rnbr(i); otherwise u_{ij} = x_{ij}^t, (3.7)

where rand(j) is a random value in the range [0, 1], CR ∈ [0, 1] is the crossover operator, and rnbr(i) ∈ {1, . . . , D} is a randomly selected index, which ensures that the new individual obtains at least one component from the variation vector.
(4). Selection. The offspring X_i^{t+1} is generated by selecting between the trial individual and the parent according to the following equation:

X_i^{t+1} = U_i if f(U_i) ≤ f(X_i^t); otherwise X_i^{t+1} = X_i^t, (3.8)

where f(·) is the fitness function value (for the minimization problems considered in this paper).
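One DE generation combining the variation, crossover, and selection formulas (3.6)-(3.8) can be sketched as follows (minimization; the helper names and parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def de_step(pop, fit, f_obj, F=0.5, CR=0.9):
    """One generation of DE: variation (3.6), crossover (3.7), selection (3.8)."""
    N, D = pop.shape
    for i in range(N):
        # variation: three mutually distinct indices, all different from i
        r1, r2, r3 = rng.choice([k for k in range(N) if k != i], 3, replace=False)
        v = pop[r1] + F * (pop[r2] - pop[r3])
        # crossover: inherit at least one component from the variation vector
        j_rand = rng.integers(D)
        mask = rng.random(D) <= CR
        mask[j_rand] = True
        u = np.where(mask, v, pop[i])
        # greedy selection
        fu = f_obj(u)
        if fu <= fit[i]:
            pop[i], fit[i] = u, fu
    return pop, fit

sphere = lambda x: float(np.sum(x ** 2))  # simple test objective
pop = rng.uniform(-5, 5, (20, 3))
fit = np.array([sphere(x) for x in pop])
for _ in range(100):
    pop, fit = de_step(pop, fit, sphere)
print(fit.min())
```

Because selection is greedy, the best fitness in the population is nonincreasing over generations, a property the convergence analysis in Section 4 relies on.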

Particle swarm optimization algorithm (PSO)
PSO was proposed by Eberhart and Kennedy [21]. It is a random search algorithm inspired by the flocking behavior of birds. PSO uses a population of individuals called particles, and there are two main operations in PSO, speed updating and position updating:

v_{ij}^{t+1} = ω v_{ij}^t + c_1 r_1 (x_{pbest}^t − x_{ij}^t) + c_2 r_2 (x_{gbest}^t − x_{ij}^t),
x_{ij}^{t+1} = x_{ij}^t + v_{ij}^{t+1},

where ω is the inertia weight, c_1 and c_2 are acceleration constants, r_1 and r_2 are random values in the interval (0, 1), v_{ij}^t is the ith particle's velocity in generation t, x_{ij}^t is the ith particle's position in generation t, x_{pbest}^t is the personal best position of particle i before generation t, and x_{gbest}^t is the global best position in the search history [22].
Speed updating can be further explained as follows: ω v_{ij}^t is the current state of the particle and balances global and local search; c_1 r_1 (x_{pbest}^t − x_{ij}^t) is the cognitive component of the particle, which represents its ability to learn from itself and endows particles with strong local search capability; c_2 r_2 (x_{gbest}^t − x_{ij}^t) represents the social cognition component and information sharing among particles, that is, the ability to learn from the entire population, which endows particles with strong global search ability. The position update then moves the particle to its new position.
A simplified position transformation formula without the velocity term is expressed as follows [23]:

x_{ij}^{t+1} = ω x_{ij}^t + c_1 r_1 (x_{pbest}^t − x_{ij}^t) + c_2 r_2 (x_{gbest}^t − x_{ij}^t). (3.10)
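The velocity-free update (3.10), which GPDEPSO later uses as a variation operator, applies one weighted combination per particle. A sketch (function name and parameter values are ours):

```python
import numpy as np

rng = np.random.default_rng(1)

def pso_variation(pop, pbest, gbest, w=0.7, c1=1.5, c2=1.5):
    """Simplified PSO position update (3.10): no velocity term."""
    N, D = pop.shape
    r1 = rng.random((N, D))
    r2 = rng.random((N, D))
    return w * pop + c1 * r1 * (pbest - pop) + c2 * r2 * (gbest - pop)

pop = rng.uniform(-1, 1, (4, 2))
new = pso_variation(pop, pbest=pop.copy(), gbest=pop[0])
print(new.shape)  # (4, 2)
```

Note the sanity check implied by (3.10): when a particle coincides with both its personal best and the global best, both difference terms vanish and the update reduces to x ← ωx.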

GPDEPSO experimental steps and its implementation
The steps of GPDEPSO are described as follows:
Step 1: Set the parameters of GPDEPSO, such as N, D, CR, F_0, ω_min, ω_max, x^L, x^U, c_1, c_2, and set the maximum number of iterations T and the accuracy ε.
Step 2: Generate the initial population P(0) of N individuals by using a good point set.
Step 3: Calculate the fitness function value f(x) of each individual in population P(t) and determine x_{pbest}^t and x_{gbest}^t.
Step 4: The population P1(t) is generated by the PSO variation of formula (3.10), and the population P2(t) is generated by the DE variation of formula (3.6).
Step 5: The population P′(t) is generated by the crossover of formula (3.7).
Step 6: According to formula (3.8), the offspring population P(t + 1) is selected from populations P(t) and P′(t), and the fitness function values of population P(t + 1) are calculated.
Step 7: If the accuracy ε is reached or the maximum number of iterations T is exceeded, stop and output the optimal value; otherwise, return to Step 3.
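Putting Steps 1-7 together, the main loop can be sketched as follows. This is a self-contained illustration under our own parameter choices, not the authors' exact implementation; in particular, the pairing in Step 5 is not fully specified in the text, so we cross the DE-varied population P2 with the PSO-varied P1 as one plausible reading, and we minimize a generic fitness f:

```python
import numpy as np

rng = np.random.default_rng(2)

def good_point_init(n, d, lo, hi):
    # exponential good point set mapped onto [lo, hi]^d
    r = np.exp(np.arange(1, d + 1)) % 1.0
    i = np.arange(1, n + 1).reshape(-1, 1)
    return lo + ((i * r) % 1.0) * (hi - lo)

def gpdepso(f, D, N=30, T=200, F=0.5, CR=0.9, w=0.7, c1=1.5, c2=1.5,
            lo=-5.0, hi=5.0, eps=1e-8):
    pop = good_point_init(N, D, lo, hi)                  # Step 2
    fit = np.array([f(x) for x in pop])
    pbest, pbest_fit = pop.copy(), fit.copy()
    for t in range(T):
        gbest = pbest[pbest_fit.argmin()]                # Step 3
        # Step 4: P1 by PSO variation (3.10), P2 by DE variation (3.6)
        r1, r2 = rng.random((N, D)), rng.random((N, D))
        P1 = w * pop + c1 * r1 * (pbest - pop) + c2 * r2 * (gbest - pop)
        idx = np.array([rng.choice([k for k in range(N) if k != i], 3,
                                   replace=False) for i in range(N)])
        P2 = pop[idx[:, 0]] + F * (pop[idx[:, 1]] - pop[idx[:, 2]])
        # Step 5: crossover (3.7) between P2 and P1
        mask = rng.random((N, D)) <= CR
        mask[np.arange(N), rng.integers(D, size=N)] = True
        trial = np.where(mask, P2, P1)
        # Step 6: greedy selection (3.8) against the current population
        tfit = np.array([f(x) for x in trial])
        better = tfit <= fit
        pop[better], fit[better] = trial[better], tfit[better]
        improved = fit < pbest_fit
        pbest[improved], pbest_fit[improved] = pop[improved], fit[improved]
        if pbest_fit.min() < eps:                        # Step 7
            break
    return pbest[pbest_fit.argmin()], pbest_fit.min()

best, best_fit = gpdepso(lambda x: float(np.sum(x ** 2)), D=5)
print(best_fit)
```

For the Nash equilibrium application, f would be the objective of problem (2.1) and the individuals would additionally be projected back onto the mixed-strategy simplices after each update.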
The pseudo code of GPDEPSO is as follows:

Algorithm 1 GPDEPSO
Input: Parameters N, D, CR, F_0, T, ω_min, ω_max, x^L, x^U, c_1, c_2, ε
Output: The best vector (solution)
t ← 1 (initialization with a good point set)
· · ·

4. Convergence analysis of GPDEPSO

For the minimal optimization problem, the GPDEPSO evaluation sequence {f(X_{best}^t) : 1 ≤ t ≤ T} is a monotonically nonincreasing sequence. The convergence of GPDEPSO is illustrated by He's stochastic functional analysis method in the literature [24]. One iteration of GPDEPSO is abstracted as the composite random mapping of a DOPSO (the variation and crossover operator combining DE and PSO) and an SO (the selection operator).

Definition 3. The process of generating the trial vector U by the DOPSO is to recombine and transform every one-dimensional component of the target vector with probability θ = CR + 1/D. The process can be described as a random mapping on the solution space, Ψ_1 : Ω × S → S^2, where (Ω, A, µ) is the complete probability measure space, Ω is a nonempty abstract set whose elements ω are basic events, A is the σ-algebra composed of some subsets of Ω, µ is the probability measure on A, and S is the solution space.

Definition 4. The selection operator SO is the process of selecting the optimal individual from the trial vector U_i and the target vector X_i according to the greedy selection method. It is a mapping on the solution space, Ψ_2 : S^2 → S.

Combining the above two definitions, one iteration of GPDEPSO is equivalent to applying the mapping Ψ = (Ψ_2 ∘ Ψ_1) : Ω × S → S (Ω × S → S^2 → S) to the current population P(t), where Ψ is the reverse-order composite mapping corresponding to the DOPSO and SO mappings. The new population P(t + 1) can then be expressed as P(t + 1) = Ψ(ω, P(t)) = Ψ_2(Ψ_1(ω, P(t))), 0 ≤ t ≤ T − 1.
Let f(X_{best}^t) be the fitness function value of the best individual X_{best}^t in P(t). Under the action of Ψ, each new generation produced by GPDEPSO is at least as good as the previous one. Therefore, the sequence {f(X_{best}^t)}_{1≤t≤T} is necessarily monotonically nonincreasing (assuming that f(X) is the minimization objective in this paper). In addition, the evolutionary process of GPDEPSO can also be characterized by the optimal individual, and the mapping Ψ can be redefined as the mapping corresponding to the process of generating the optimal individual, that is, X_{best}^{t+1} = Ψ(ω, X_{best}^t) = Ψ_2(Ψ_1(ω, X_{best}^t)).
Lemma 1. Let λ : S × S → R be the distance defined on S satisfying λ(X_i, X_j) = |f(X_i) − f(X_j)|, ∀X_i, X_j ∈ S; then (S, λ) is a complete separable metric space.
Theorem 2. The random mapping Ψ = (Ψ_2 ∘ Ψ_1) : Ω × S → S formed by GPDEPSO is a random contraction operator.
Proof. According to the definitions of the DOPSO and SO operations, the new population generated by GPDEPSO in each iteration is at least as good as the previous one. Therefore, for the random mapping Ψ = (Ψ_2 ∘ Ψ_1) : Ω × S → S, there exists a random variable with nonnegative real value 0 ≤ K(ω) < 1 such that

λ(Ψ(ω, X_{best}^t), Ψ(ω, Y_{best}^t)) ≤ K(ω) λ(X_{best}^t, Y_{best}^t), ∀X_{best}^t, Y_{best}^t ∈ S.

Therefore, the mapping Ψ : Ω × S → S formed by GPDEPSO is a random contraction operator.
According to Theorem 2, the random mapping Ψ is a random contraction operator. By the conclusion of Lemma 2, Ψ(ω) must have a unique random fixed point. The convergence criterion of GPDEPSO then follows, which means that GPDEPSO is asymptotically convergent.

Results and discussion
Tables 1-4 present the computational results of the above examples obtained by GPDEPSO. From these four tables, we can see that the Nash equilibrium can be obtained under the given parameters by using the GPDEPSO proposed in this paper. In addition, the accuracy of the fitness function values is approximately 10^4 times and 10^2 times higher than that of Refs. [16] and [17], respectively. The speed of the algorithm is not affected by increasing the population size (N = 50). On the contrary, the results of Examples 1, 3, and 4 show that the number of iterations is reduced significantly, with Example 4 being the most obvious case: compared with Ref. [16], the number of iterations for Example 4 is reduced by a factor of approximately 16, and by a factor of approximately 8 compared with Ref. [17]. Through the above analysis, GPDEPSO is superior to the algorithms presented in the existing literature in terms of iteration counts and accuracy of results. Furthermore, although the population is larger than in the previous literature, this does not affect the speed or accuracy of the algorithm but plays a powerful role in finding the global optimal solution.
The following two figures compare methods for solving Example 5, a high-dimensional payoff matrix game. Figure 3 compares GPDEPSO with the hybrid differential evolution particle swarm optimization algorithm (DEPSO). Figure 4 compares GPDEPSO, the differential evolution algorithm based on a good point set (GPDE), and the particle swarm optimization algorithm based on a good point set (GPPSO). The Nash equilibrium solution calculated by all methods is x* = y* = (0.1, . . . , 0.1)_{1×10}. As shown in Figures 3 and 4, GPDEPSO solves the high-dimensional payoff matrix game better. Compared with Ref. [17], not only is the calculation speed greatly improved, but the accuracy of the fitness function value is also better. Figure 3 shows that GPDEPSO converges faster and more smoothly than DEPSO, which indicates that the good point set helps avoid falling into a local optimum during the calculation process. Comparing GPDEPSO with GPPSO and GPDE in Figure 4, we find that GPDEPSO combines the ability of PSO to find the global optimum quickly with the fast convergence of DE. In Example 5, GPDEPSO finds the neighborhood of the global optimal solution within the first 50 iterations, and an approximate exact solution is obtained after 150 iterations. Therefore, GPDEPSO has a clear advantage in solving Nash equilibrium problems with a high-dimensional payoff matrix.

Conclusions
We propose GPDEPSO from the point of view of DE, considering the different characteristics of DE and PSO, and we use a good point set to make the initial data more uniform. This algorithm combines the advantages of DE and PSO, which not only ensures the simple operation, easy implementation, and fast convergence of DE, but also enhances its global optimization ability. By solving the Nash equilibrium of noncooperative games, we find that the proposed algorithm is superior to the comparative algorithms in terms of the calculation accuracy and convergence. In particular, for a high-dimensional payoff matrix game, the efficiency of the algorithm is remarkable. In the future, since a Nash equilibrium problem is a complex, NP-hard problem, it would be interesting to consider the influence of different variation operations and selection strategies on the solution of the Nash equilibrium, as well as to consider the Nash equilibrium problems for more complex multiobjective games and multiple games.