A COEVOLUTIONARY MULTIOBJECTIVE EVOLUTIONARY ALGORITHM FOR GAME ARTIFICIAL INTELLIGENCE

Recently, the growth of Artificial Intelligence (AI) has provided a set of effective techniques for designing computer-based controllers that perform tasks autonomously in games, specifically for producing intelligent, optimal controllers for playing video and computer games. This paper explores the use of a competitive fitness strategy, K Random Opponents (KRO), in a multiobjective approach for evolving Artificial Neural Networks (ANNs) that act as controllers for the Ms. Pac-man agent. The Pareto Archived Evolution Strategy (PAES) algorithm is used to generate a Pareto optimal set of ANNs that optimize the conflicting objectives of maximizing game scores and minimizing neural network complexity. Furthermore, an improved version, PAESNet_KRO, is proposed which, in contrast to its predecessor PAESNet, incorporates the KRO strategy. The results of the two algorithms are compared. PAESNet_KRO is found to provide better solutions than PAESNet: it evolves a set of nondominated solutions that cover the solutions of PAESNet.


INTRODUCTION
In recent years, there has been an increasing interest in bio-inspired computing (Mange & Tomassini, 1998; Sipper, 2002; Teo, 2003; Teuscher et al., 2003; De Castro & Von Zuben, 2005; Floreano & Mattiussi, 2008). It is a broad area encompassing disciplines such as evolutionary algorithms and artificial neural networks that transform biological ideas into computer operations and algorithms. Evolution and learning (Nolfi & Floreano, 1999) are two mechanisms that bio-inspired algorithms in computational intelligence use to find the best and most effective solutions to problems arising in science, engineering and finance in noisy, dynamic and complex environments. According to Nolfi and Floreano (1999), evolution is defined as "a form of adaptation capable of capturing relatively slow environmental changes that might encompass several generations", while learning is defined as a process that "allows an individual to adapt to environmental changes that are unpredictable at the generational level".
Evolutionary Algorithms (EAs) (Poli & Logan, 1996; Deb, 2001; Eiben & Smith, 2007; Maragathavalli, 2011) are stochastic optimization methods that search for a set of promising solutions to complex problems, based on the basic principles of biological evolution such as selection, crossover and mutation, as shown in Figure 1. Coevolutionary Algorithms (CAs) are a class of EAs in which the fitness of an individual (or population) depends on its interactions with other individuals (or populations). There are two basic forms of CAs in the literature: competitive coevolution and cooperative coevolution (Coello Coello & Sierra, 2004). In competitive coevolution (Rosin & Belew, 1997), individual fitness is evaluated by competing with other individuals to survive in a series of competitions. In cooperative coevolution (Potter & De Jong, 1994), by contrast, individual fitness is determined by cooperating with other individuals to solve the problem.
Artificial Neural Networks (ANNs) (Haykin, 2009) are a learning paradigm inspired by the operation of biological nervous systems, functioning analogously to the human brain. Traditionally, ANNs are trained using learning algorithms such as backpropagation (Rumelhart et al., 1986) to determine the optimal connection weights between nodes. However, such methods are gradient-based techniques, which tend to have two major drawbacks when optimizing the connection weights: slow learning speed and a tendency to become trapped in local minima (Zhu et al., 2005; Burse et al., 2011). A large volume of published studies describes the role of EAs in ANNs. Evolutionary approaches have been proposed as an alternative method for optimizing the connection weights that overcomes the issues described above. ANNs evolved in this way are referred to as Evolutionary ANNs (EANNs). In the literature, research into EANNs generally takes one of three approaches:
1. Evolving the weights of the network (Belew et al., 1990; Fogel et al., 1990).
2. Evolving the architecture (Miller et al., 1989; Kitano, 1990).
3. Evolving both simultaneously (Koza & Rice, 1991; Angeline et al., 1994; Teo & Abbass, 2004).
The primary objective of this study is to investigate the effects of multiobjective competitive coevolution on artificial neural networks in dynamic and unpredictable video game environments. A well-known Multiobjective Evolutionary Algorithm (MOEA), the Pareto Archived Evolution Strategy (PAES), is integrated with the K Random Opponents (KRO) competitive fitness strategy in order to evolve both the architecture and the connection weights (including biases) of ANNs. The aim is to show that the evolved controllers are able to play the commercial video game Ms. Pac-man autonomously. This game is an interesting, non-deterministic and challenging test-bed for evaluating machine as well as human intelligence (Lucas, 2005). It is therefore an ideal benchmark for testing and analyzing whether computer-based controllers can play the game in an intelligent manner similar to that of a human player.

METHODS
This section is divided into three subsections to present and describe the PAES, the Pareto Archived Evolution Strategy Neural Network (PAESNet) and the integration of PAESNet with a competitive fitness strategy respectively.

PARETO ARCHIVED EVOLUTION STRATEGY
The Pareto Archived Evolution Strategy (PAES), first introduced by Knowles and Corne (1999), is one of the simplest yet most effective MOEAs. The mutation operator (for example, Cauchy or Gaussian mutation) plays a major role in this algorithm by altering the genes of each chromosome in the population. Additionally, PAES implements elitism by preserving the best individuals from every generation, and an archive stores all the nondominated solutions along the Pareto front. A crowding method, which works by recursively dividing the objective space into d-dimensional grids, maintains the diversity of the nondominated solutions in the archive. There are three basic forms of PAES: (1+1)-PAES, (1+λ)-PAES and (µ+λ)-PAES (Knowles & Corne, 2000). The (1+1)-PAES generates a single offspring from a single parent through mutation, and the offspring then competes with the parent for survival. In the (1+λ)-PAES, a set of λ offspring is created from a single parent and the fittest individual is chosen among the λ offspring and the parent. In the (µ+λ)-PAES, a set of λ offspring is generated from µ parents, and the next generation consists of the µ best individuals selected from the union of the µ parents and λ offspring. Overall, the (1+1)-PAES is the most popular of the three forms because of its simplicity, and it has also served as a baseline algorithm for multiobjective optimization problems.

PARETO ARCHIVED EVOLUTION STRATEGY NEURAL NETWORK
In this subsection, the Pareto Archived Evolution Strategy Neural Network (PAESNet) is discussed. The proposed system involves two objectives. The first objective, $F_1$, is to maximize the game score in the Ms. Pac-man game, as shown in Equation 1, whereas the second objective, $F_2$, is to minimize the number of hidden neurons in the feed-forward ANN, as shown in Equation 2.
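The interplay between the two objectives and the archive of nondominated solutions can be illustrated with a minimal sketch. It assumes that each solution is summarized by a hypothetical pair (game score, number of hidden neurons); the names `dominates` and `update_archive` are illustrative and do not come from the original implementation, and the grid-based crowding of PAES is deliberately left out.

```python
from typing import List, Tuple

# Assumed representation: (game score to maximize, hidden neurons to minimize).
Objectives = Tuple[float, int]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b in both objectives and better in at least one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def update_archive(archive: List[Objectives], candidate: Objectives) -> List[Objectives]:
    """Return the archive after offering it one candidate.

    The candidate is rejected if an archive member dominates or equals it.
    Otherwise it is added and every member it dominates is dropped.
    Crowding and archive size limits are omitted in this sketch.
    """
    if any(dominates(member, candidate) or member == candidate for member in archive):
        return archive
    kept = [member for member in archive if not dominates(candidate, member)]
    kept.append(candidate)
    return kept
```

Offering the archive a network with a higher score and fewer hidden neurons than an existing member removes that member, while a small network with a modest score can coexist with a large, high-scoring one because neither dominates the other.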
The initial number of hidden neurons is set to 20. At the start of the initialization phase, the ANN weights, biases and active hidden neurons in the hidden layer are encoded into a chromosome whose values are drawn from a uniform distribution over the range [-1, 1]; this chromosome acts as the parent and its fitness is evaluated. Subsequently, the polynomial mutation operator with a distribution index of 20.0 is used to create an offspring from the parent, and the fitness of the offspring is evaluated. The fitness values of the offspring and the parent are then compared. If the offspring performs better than the parent, the offspring replaces the parent as the new parent for the next evaluation. Otherwise, the offspring is eliminated and a new mutated offspring is generated. If the parent and the offspring are incomparable, the offspring is compared with the set of previously found nondominated individuals in the archive. The proposed algorithms are run 10 times with 5000 evaluations in each run. Figure 2 shows the flowchart of PAESNet.
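The loop described above can be sketched as follows. This is a simplified illustration rather than the authors' implementation: `evaluate` is a hypothetical stand-in for playing Ms. Pac-man with the decoded network, the per-gene mutation rate of 1/n is an assumption not stated in the text, the mutation is the basic form of polynomial mutation, and an incomparable offspring is accepted whenever no archive member dominates it (the crowding comparison of PAES is omitted).

```python
import random
from typing import Callable, List, Sequence, Tuple

Objectives = Tuple[float, float]  # (score to maximize, hidden neurons to minimize)
Member = Tuple[List[float], Objectives]


def dominates(a: Objectives, b: Objectives) -> bool:
    return a[0] >= b[0] and a[1] <= b[1] and (a[0] > b[0] or a[1] < b[1])


def polynomial_mutation(parent: Sequence[float], rng: random.Random,
                        eta: float = 20.0, low: float = -1.0,
                        high: float = 1.0) -> List[float]:
    """Basic polynomial mutation with distribution index eta."""
    rate = 1.0 / len(parent)  # assumed per-gene mutation rate
    child = list(parent)
    for i, gene in enumerate(child):
        if rng.random() >= rate:
            continue
        u = rng.random()
        if u < 0.5:
            delta = (2.0 * u) ** (1.0 / (eta + 1.0)) - 1.0
        else:
            delta = 1.0 - (2.0 * (1.0 - u)) ** (1.0 / (eta + 1.0))
        # Perturb the gene and clip it back into [low, high].
        child[i] = min(high, max(low, gene + delta * (high - low)))
    return child


def one_plus_one_paes(evaluate: Callable[[Sequence[float]], Objectives],
                      n_genes: int, evaluations: int, seed: int = 0) -> List[Member]:
    """Simplified (1+1) loop: one parent, one mutated offspring per evaluation."""
    rng = random.Random(seed)
    parent = [rng.uniform(-1.0, 1.0) for _ in range(n_genes)]
    parent_fit = evaluate(parent)
    archive: List[Member] = [(parent, parent_fit)]
    for _ in range(evaluations - 1):
        child = polynomial_mutation(parent, rng)
        child_fit = evaluate(child)
        if dominates(parent_fit, child_fit):
            continue  # offspring is eliminated
        if any(fit == child_fit or dominates(fit, child_fit) for _, fit in archive):
            continue  # duplicate of, or dominated by, an archived solution
        # Offspring is nondominated: prune the archive and promote the offspring.
        archive = [(c, fit) for c, fit in archive if not dominates(child_fit, fit)]
        archive.append((child, child_fit))
        parent, parent_fit = child, child_fit
    return archive
```

With a distribution index of 20, most perturbations are small relative to the range [-1, 1], so the search proceeds mainly through fine adjustments of the parent.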
$$F_1 = \max \sum_{n=1}^{N} \mathrm{Score}_n \qquad (1)$$

$$F_2 = \min \sum_{i=1}^{M} h_i \qquad (2)$$

where $\mathrm{Score}_n$ is the score obtained with the $n$-th life, $N$ is the number of lives in a full game, $M$ is the number of hidden neurons in the feed-forward ANN and $h_i$ indicates whether the $i$-th hidden neuron is active.

PARETO ARCHIVED EVOLUTION STRATEGY NEURAL NETWORK WITH K RANDOM OPPONENTS
In this subsection, a competitive coevolutionary variant of PAESNet, the Pareto Archived Evolution Strategy Neural Network with K Random Opponents (PAESNet_KRO), is presented for creating a Ms. Pac-man agent that solves the two-objective optimization problem. The framework of the PAESNet_KRO model is similar to that of PAESNet, shown in Figure 2. The main difference between PAESNet_KRO and PAESNet is the addition of two methods to the parent selection process: opponent selection and reward assignment. The opponent selection method selects individuals as opponents according to the KRO strategy. The fitness of each individual is measured against K random opponents, without self-play, as shown in Figure 3. With this strategy, the opponents are selected at random from the archive. K is set to 2 in this study. After the opponent selection process, each individual competes against the entire set of opponents. During the tournament, a reward value is calculated for each competition by the reward function shown in Equation 3. The reward assignment method sums the reward values to give the fitness score of the individual. The individual with the highest fitness score is selected as the next parent and the iteration continues. The predefined maximum number of evaluations serves as the termination criterion of the loop. In this study, the number of runs is set to 10 and each run consists of 5000 consecutive evaluations.

[Equation 3: reward function; only its min(·) and max(·) terms survive in the source.]

The coverage (C) metric is used to compare the dominance relationship between two Pareto fronts. As stated in Zitzler (2000), the formal definition is as follows.
• Let $P_1, P_2 \subseteq P$ be two sets of nondominated solutions.
• The function $C$ maps the ordered pair $(P_1, P_2)$ to the interval [0, 1]:

$$C(P_1, P_2) = \frac{\left|\{\, u_2 \in P_2 \mid \exists\, u_1 \in P_1 : u_1 \succeq u_2 \,\}\right|}{|P_2|}$$

where $u_1 \succeq u_2$ holds if $u_1$ dominates $u_2$ or $u_1$ is equal to $u_2$.
A value of $C(P_1, P_2) = 1$ means that every solution in $P_2$ is dominated by or equal to a solution in $P_1$, whereas $C(P_1, P_2) = 0$ means that no solution in $P_2$ is covered by $P_1$. Since $C(P_1, P_2)$ is not necessarily equal to $1 - C(P_2, P_1)$, both directions have to be considered; if $C(P_1, P_2)$ is higher than $C(P_2, P_1)$, then $P_1$ is better than $P_2$. Figure 4 shows the graphical presentation of the coverage metric. In each rectangle, the scale runs from 0 (no coverage) at the bottom to 1 (total coverage) at the top.
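The coverage metric defined above is straightforward to compute. The following is a minimal sketch for the two objectives used in this study, assuming each solution is represented by a hypothetical pair (game score to maximize, hidden neurons to minimize); the function names are illustrative.

```python
from typing import Sequence, Tuple

Objectives = Tuple[float, float]  # (score to maximize, hidden neurons to minimize)


def covers(u1: Objectives, u2: Objectives) -> bool:
    """True if u1 dominates u2 or is equal to it (u1 is no worse in both objectives)."""
    return u1[0] >= u2[0] and u1[1] <= u2[1]


def coverage(p1: Sequence[Objectives], p2: Sequence[Objectives]) -> float:
    """C(p1, p2): fraction of the solutions in p2 covered by at least one solution in p1."""
    if not p2:
        raise ValueError("p2 must contain at least one solution")
    covered = sum(1 for u2 in p2 if any(covers(u1, u2) for u1 in p1))
    return covered / len(p2)
```

For two made-up fronts `p1 = [(20000, 5), (15000, 3)]` and `p2 = [(14000, 6), (12000, 3)]`, `coverage(p1, p2)` evaluates to 1.0 and `coverage(p2, p1)` to 0.0, which corresponds to the situation in which the first front completely covers the second.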

EXPERIMENTAL RESULTS AND DISCUSSIONS
Table 1 shows the best scores over 5000 evaluations in each of the 10 runs. A paired-samples t-test was conducted to ascertain whether there was a significant difference between the scores of PAESNet and PAESNet_KRO. There was a significant difference between the scores for PAESNet_KRO (M = 19617, SD = 2182.5523) and PAESNet (M = 14795, SD = 2024.4629); t(9) = -4.7987, p = 0.0010 < 0.05 (two-tailed), as shown in Table 2. These results suggest that the coevolutionary approach has a significant positive effect on the game scores achieved by PAESNet_KRO. Additionally, the coverage metric is used to compare the dominance relationship between the two sets of nondominated solutions. From the data in Table 3, it is apparent that the nondominated solutions obtained by PAESNet are clearly dominated by the nondominated solutions obtained by PAESNet_KRO. The global Pareto fronts for PAESNet_KRO and PAESNet are shown in Figure 5. On average, 93% of the PAESNet solutions are dominated by PAESNet_KRO solutions, whereas only 2% of the PAESNet_KRO solutions are dominated by PAESNet solutions. It is interesting to note that almost all the values of C(PAESNet, PAESNet_KRO) are equal to 0, which indicates that in those runs no solution found by PAESNet_KRO is dominated by any solution found by the original PAESNet. Conversely, the majority of the values of C(PAESNet_KRO, PAESNet) are equal to 1, which means that in those runs all PAESNet solutions are dominated by PAESNet_KRO solutions. The boxplots in Figure 6 visualize the distribution of these samples. As can be seen from the chart, C(PAESNet_KRO, PAESNet) has a markedly higher median than C(PAESNet, PAESNet_KRO). Overall, the results show that PAESNet_KRO is capable of solving the multiobjective problem in dynamic game environments and of achieving better nondominated solutions. A possible explanation is that the KRO strategy is more effective at selecting the best nondominated solution from the archive as the parent for creating the offspring of the next generation.
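The parent selection step referred to in this explanation can be sketched as follows. The sketch makes several assumptions: the reward function is passed in as a parameter because its exact form (Equation 3) is not reproduced here, opponents are drawn without replacement, and ties are resolved in favour of the earliest archive member. The names `kro_fitness` and `select_parent` are illustrative only.

```python
import random
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")
Reward = Callable[[T, T], float]  # reward(individual, opponent), assumed signature


def kro_fitness(index: int, archive: Sequence[T], reward: Reward,
                k: int, rng: random.Random) -> float:
    """Sum of rewards of archive[index] against up to k random opponents, no self-play."""
    others = [j for j in range(len(archive)) if j != index]
    opponents = rng.sample(others, min(k, len(others)))
    return sum(reward(archive[index], archive[j]) for j in opponents)


def select_parent(archive: Sequence[T], reward: Reward, k: int = 2,
                  rng: Optional[random.Random] = None) -> T:
    """Return the archive member with the highest summed reward."""
    if not archive:
        raise ValueError("archive must not be empty")
    rng = rng or random.Random()
    scores = [kro_fitness(i, archive, reward, k, rng) for i in range(len(archive))]
    best = max(range(len(archive)), key=scores.__getitem__)
    return archive[best]
```

Any pairwise comparison can be supplied as `reward`; for instance, a purely illustrative stand-in that returns 1 when the individual has the higher game score and 0 otherwise makes the selection favour high-scoring archive members.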

CONCLUSIONS
This paper has presented PAESNet_KRO, an improved elitist multiobjective evolutionary algorithm that, unlike its predecessor PAESNet, employs a competitive coevolutionary approach. Two comparisons of PAESNet_KRO with PAESNet have been carried out in the Ms. Pac-man game domain. The key results of the comparisons are that (1) PAESNet_KRO performs better than its predecessor PAESNet in controlling the behaviour of the Ms. Pac-man agent to play the game autonomously and (2) the coverage metric indicates clear advantages of PAESNet_KRO over PAESNet. In conclusion, the coevolutionary method has proven to be effective in improving the performance of the multiobjective optimizer.

TABLE 1. The best game scores over 5000 evaluations in 10 runs

TABLE 2. t-test (paired two sample for means)