The Evolution of Cooperative Behaviours in Physically Heterogeneous Multi-robot Systems

Evolutionary Robotics  (ER)  is  a methodology that  uses  evolutionary  computation  to  develop controllers  for autonomous  robots. The  existing works focus  on  single  robot  systems  or  physically homogenous  multi‐robot  teams,  although  physical heterogeneousness  is more prevalent  in the real world. It  is  therefore  instructive  to  examine  whether cooperative  behaviours  can  be  synthesized  using artificial  evolution  for  a  team  of  physically heterogeneous  robots. This paper makes  an  important contribution  in  answering  the  question  of  whether robots  with  distinct  capabilities  can  synthesize  their control  strategies  to  accommodate  their  own capabilities without human intervention. We present an empirical analysis of the collaboration mechanisms and suggest  guidelines  about  how  to  choose  appropriate evolution methods. Simulated experiments with a team of  e‐puck  robots  show  that  evolution  can  lead  to effective controllers for robots with distinct capabilities.


Introduction
The unknown and dynamic characteristics of environments may mean that explicitly designing a control system for a single robot is difficult, let alone for multi-robot systems (MRS). Evolutionary robotics (ER) is a field of research that applies artificial evolution to generate control systems for autonomous robots [1,2]. By adopting artificial evolution, it is possible to learn and synthesize the control systems of a group of robots that are able to display expected behaviours without human supervision. In addition, explaining the initiation and maintaining of cooperation in multi-robot systems will accelerate our understanding of biological systems and human societies [3,4].
What has not been shown is that ER methods can be extended to generate robot controllers capable of complex autonomous behaviours; in particular, no ER work has yet shown that it is possible to evolve complex controllers in general cases or for generalized tasks [8]. Waibel et al. [20] distinguished three types of cooperative tasks and suggested guidelines for the optimal choice of genetic team composition and level of selection.
Although the above-mentioned works have their strengths and weaknesses, none of them fully satisfies our expected design requirements because they deal with a single robot system (e.g., developmental robotics [21,22]) or physically homogenous multi-robot teams. In this paper, we focus on physically heterogeneous multi-robot teams that interact to solve cooperative multi-agent tasks. Here, "physically heterogeneous" refers to hardware (or capability, e.g., onboard sensors and actuators) diversity and "genetically heterogeneous" refers to genetics (i.e., control rules) diversity. Dynamically formed heterogeneous robot teams executing coordinated tasks may have little prior information about the tasks, other robots, and the environments in which they will operate.
Cooperative co-evolutionary algorithms (CCAs) [23][24][25][26] are applied by manually decomposing problem representations into subcomponents and then creating separate populations of individuals for each subcomponent. This allows a team to be composed of specialized sub-groups and corresponds to biological coevolution of multiple species. Many characteristics differentiate the approach proposed in this paper from CCAs. While CCAs require user-specified decomposition of the problem, we do not rely on this decomposition. Moreover, these CCAs evolve the controller of physically homogenous robots (or software agents) under different sub-groups with different fitness evaluation functions; we evolve the controller of physically heterogeneous robots under different sub-groups sharing the same fitness evaluation functions.
Unlike the existing works in this field, our work seeks to answer the question of whether robots with distinct capabilities can synthesize their control strategy to accommodate their own capabilities without human intervention. To attain this objective, we have designed an evolution framework to synthesize cooperative behaviours for physically heterogeneous MRS and introduce the important factors that can influence the efficiency of evolution. We investigate this phenomenon in an attempt to understand these enabling features.
Several options for the key steps are presented to evaluate the synthesis process of controllers using artificial evolution under different experimental conditions. Accordingly, we compare the performance and suggest guidelines about how to choose appropriate evolution methods.
The rest of this paper is structured as follows. In the next section, we introduce the design framework of evolution and present a detailed discussion on the implementation of the proposed method. The experimental set-up in Section 3 includes the description of simulated environment and experimental conditions. In Section 4 we discuss the experimental results and evaluate the performance of synthesized controllers, and conclude with a summary and outlook in Section 5.

Evolution of Cooperation
While evolving control strategies for MRS consisting of multiple robots with distinct capabilities, it is expected that the team can display cooperative behaviours, and that each type of robot can synthesize its control strategy to accommodate its own capabilities. Firstly, we focus our attention on design framework in Algorithm 1, and introduce the important factors that can influence the efficiency of evolution. Algorithm 1. The design framework of evolution. There are several different implementations of the steps in Algorithm 1. In order to verify the effectiveness, the proposed framework is applied to evolve desired behaviours for physically heterogeneous multi-robot systems. We present different options for the key steps, evaluate the synthesis process of controllers using articial evolution under different experimental conditions, and suggest guidelines about how to choose appropriate evolution methods.

Neural Controller
Neural networks have been proved in many studies to be efficient controllers of robots. The neural network may have hidden neurons [20], the connections and neuron units can be added or removed during evolution [15], and modular network architecture can be used to reduce interference in the adaptation of different parts of the controller system [27]. However, a simple feed-forward neural network with two layers is competent in our scenario, where only the weight connections of the neural controller are encoded and evolved. The neural network architecture can be chosen according to the actual situations.
The e-puck robots are small and two-wheeled, and equipped with eight infrared sensors to detect walls and other robots up to a distance of 60 mm, one colour camera, three directional microphones and a virtual gripper. Each robot is provided with two motors that control the speed of the two responding wheels and a loudspeaker that emits a sound (the loudspeaker has only two statuses: on or off) (see Figure 1). The e-puck robot can be configured to different types of robot with distinct capabilities. The high-level controller is implemented as a feedforward neural network with 18 and 3 nodes in the input and output layers, respectively (see Figure 2). The weights of the neural network are evolved using evolutionary algorithms. The 18 sensory neurons encode the state of the eight proximity infrared sensors, three sound sensors, five vision sensors, a gripper and a bias unit (i.e., a unit whose activation state is always 1.0). Each sensory neuron is directly connected to two motor neurons and a speaker neuron. Thus, the neural controller is made up of 18×3=54 connects, each associated with a weight ranging in the interval [-10,+10] and represented in the genotype with 8 bits. Therefore, the genotype is composed of 54×8=432 bits.

Evolution
Each type of robot has its own subpopulation P i (i=1, … , Ntype), and robots of the same type share the same genes (i.e., they are genetically homogeneous). Each subpopulation is co-evolved in a round-robin fashion, that is, the fitness of a subpopulation member is evaluated by combining it with the representatives from the remaining (temporarily frozen) subpopulations (see Figure 3). There are many possible methods for choosing representatives with which to collaborate, including greedy and random choice strategies. Three typical methods are used for choosing representatives in subpopulations. The "static-select" method selects fixed partners as the representatives, that is, the jth individual in subpopulation P i is evaluated by combining it with the jth individuals in the remaining subpopulations P k (k=1,…,Ntype, k≠i). However, the "bestselect" method selects the best individuals in the remaining subpopulations P k (k=1,…,Ntype, k≠i) to be combined with the jth individual in subpopulation P i , which is being evaluated. The "Pr-select" method is a more ecological approach, in which representatives are chosen non-deterministically based on their fitness. The probability for selecting the jth individual from subpopulation P i is given by where λ is a positive parameter called the inverse temperature, v i j is the fitness of the jth individual from subpopulation P i , and Npop is the subpopulation size. Low λ causes (nearly) equi-probable selection of all individuals, while high λ causes selection of the highest performance with probability close to one. We empirically set λ=t/100 at the generation t.
The combined group is allowed to live for four epochs, and each epoch consists of 120/0.05=2400 cycles (120 is the time each epoch lasts in seconds, and 0.05 is the length of the simulation step).
The evolutionary algorithm used to evolve the controllers utilizes a population of 100 randomly generated binary genotypes for every subpopulation. At every generation, the best 10% of genotypes will be directly copied to the new generation, and the best 30% of genotypes will be used to create new individuals, with a 20% probability of crossover. Each offspring is mutated with a 2% probability of flipping each bit. One evolutionary process lasts 400 generations, and five independent runs were performed starting with different randomly generated initial populations.

Performance Function Specification
We consider the different capabilities of robots in two facets: task discovering and task solving. The first facet focuses on the robots' movement and sensor ranges, and the second concerns the state transition of tasks being executed. We abstract the task into three different states (i.e., S0, S1 and S2) inspired by fire extinction. The start state of a task is S0, and it will transition to the next state S1 if the task in state S0 has been continuously executed by any robot for Tg seconds. Similarly, the transition from state S1 to S2 occurs once any robot continues executing the task in state S1 for To seconds, and we say the task is completed when it is in state S2. However, the task's state reverts from S1 to S0 after Tr seconds if no robot continues to execute it (see Figure 4). We use colour change as the environment cue to simulate the state transition of tasks in experiments, facilitating robots to evolve their behaviours. Robot teams consist of two types of e-puck robots, namely EP1 and EP2. The EP1 robots have greater speed and longer vision sensor ranges than EP2, but their task solving capabilities are quite poor compared to EP2: T 1 g=T 2 g and T 1 o>>T 2 o (e.g., T 1 g=T 2 g=2s, T 1 o=50s , T 2 o=5s -see Figure 4). In order to make it easier for robots to evolve their controllers, we introduce a virtual gripper to every robot. When a robot is standing before a task spot within 10 mm, its gripper is closed; otherwise, it is open. Although this particular scenario involves only two types of robots, the actual number of types may be larger.
We take two different options to set the fitness function for subpopulations. The first "shared-fitness" option means that every subpopulation shares the same fitness function: where Σ � is the total number of tasks found, Σ � is the total number of completed tasks, and α and β are scalar weighting factors. The values of the factors are empirically set to 2 and 5, respectively.
A robot will be rewarded only when it finds a task for the first time, and all robots finding the same task again will not be rewarded. This rule is set in case the abnormal behaviour of repeating to find the same task in order to gain reward is synthesized.
The second "independent-fitness" option means that two subpopulations have the separate fitness function: where f � � is the fitness function for EP1 robots, and f � � is for EP2 robots. Σ � is the total cumulative number of times robots choose to emit sound when they are around a task, and Σ � is the total cumulative number of times that robots choose to emit sound when they are not around any task. γ and δ are scalar weighting factors, which are empirically set to 2 and -2, respectively. γ � Σ � indicates that a robot will be rewarded if it chooses to emit sound when it is around a task, and δ � Σ � indicates that the robot will be punished otherwise.

Experimental Setup
To perform experiments with a large team of robots, we evaluated our approach in realistic simulation experiments (see Figure 5) on an open evolutionary framework called Teem [28]. The robots can perceive the intensity of sound using three sound sensors that simulate three directional microphones using a set of equations [9], and the total perceived amplitude (TPA), i.e., the result of the contribution of different sound sources corresponding to different robots is computed according to the proposed function. To simulate the robot and the environment as accurately as possible, short range detection of obstacles or other robots is achieved by using eight proximity sensors, simulated using a sampling technique [1]. Every robot has a circular camera installed at height H, and the camera will not see any object at a lesser height. The data collected by the cameras are the pixel and depth buffer for objects' colour and distance in field of view (FOV). The FOV and sensor range of the cameras are tuneable to the various sensor capabilities of different robots. A fog model is used to compute light attenuation with the function where �������� is the original data, �������� � is the computed data, ���������� is the density of fog, and ���������� is the colour depth.
For example, in Figure 6, two robots face each other and each robot's camera visualizer reflects other objects' colour, position and distance in FOV. The grey value, spatial difference and/or average value of the whole image and/or section are assigned to the neural controller's inputs. We evaluate the synthesis process of cooperative behaviours using articial evolution under different experimental conditions, as shown in Table 1. With the virtual gripper, a robot can easily estimate whether it is close to a task spot or not. Hence, the virtual gripper option is introduced to investigate the effect of this input information in the evolution of cooperative behaviours. With the speaker as an output, a robot's controller can emit a sound to attract other robots. In contrast, when the option is off, no robot can use the sound to invite other robots to participate in solving the cooperative task, and the evolution of cooperative behaviours may come to be more difficult.

Conditions
Options Comments Virtual gripper with-gripper Use the gripper as input.

withoutgripper
Do not use the gripper as input.
Representative selecting method static-select Afixed partner is selected as the representative. best-select The best one is selected as the representative.

Pr-select
The representative is selected non-deterministically based on the fitness.  At the beginning of each epoch the robots and tasks are placed in randomly generated positions and orientations within the arena. Each task will be randomly placed in a new position immediately after it is completed. Hence, the total number of tasks in the environment remains unchanged. There are four tasks, five EP1 and five EP2 robots, and the size of environment is 200×200.

Experimental Results
As we have mentioned above, the robot team consists of two types of e-puck robots, namely EP1 and EP2. The EP1 robots have greater speed and longer vision sensor ranges than the EP2, but their task solving capabilities are quite poor compared to the EP2. Hence, we expect the robots' control strategies to accommodate their own capabilities and maximum the global revenues; i.e., the EP1 robots should prefer to search for new tasks and emit sounds when they find them, and EP2 robots should be attracted by these sounds and follow them to help execute the new task. EP1 robots should be uninterested in following these sounds and instead choose to spread out to find other tasks.
We conducted Experiments 1-6 and the obtained results are presented in Figures 7-12. Table 2 lists all the comparative experiments that we have carried out. In order to remain concise, the default "with-speaker" option in Experiments 3-6 is not listed in the table. Note that the fitness plotted in Figures 7 and 8  The best performance of the robot team with and without speakers (i.e., in Exp 1 and Exp 2) did not differ significantly, as shown in Figure 7. The evolution of cooperative behaviours do not benefit from the "staticselect" method. However, one interesting observation is that the robot team with speakers outperforms the other with respect to the mean performance (see Figure 8). In fact, cooperative behaviours occasionally take place in the robot team with speakers. Consequently, we conclude that the robots are able to find tasks after evolution, and the sound sensors and speakers work well. The reason cooperative behaviours do widely emerge is that the probability of EP1 and EP2 robots with high fitness partnering up is very low under "static-select".
At the beginning of each epoch of each combined group from different subpopulations, the robots and tasks are placed in randomly generated positions. This explains why the best performance over 400 generations fluctuates significantly in Figure 7 and the mean performance is relatively steady in Figure 8.   1  22  43  64  85  106  127  148  169  190  211  232  253  274  295  316  337  358 379 400

Fitness
Generation without-gripper& static-select& shared-fitness& without-speaker without-gripper& static-select& shared-fitness& with-speaker We carried out Exp 3 and Exp 4 in addition to Exp 2, and present the performance comparison in Figure 9. The results illustrate that the cooperative behaviours are successfully synthesized under "best-select" and "Prselect". Exp 3 and Exp 4 display very similar performance, which is significantly better than Exp 2.
Moreover, Exp 3 outperforms Exp 4 in the early generations; however, the opposite is true in the later generations. This suggests that "best-select" is a good option for the early, unstable phase, and "best-select" is suitable for the later, stable phase. Figure 9. Evolution of the best performance using different selection method.   1  22  43  64  85  106  127  148  169  190  211  232  253  274  295  316  337  358  379  400 without-gripper& best-select& independent-fitness without-gripper& best-select& shared-fitness -10,00 Generation without-gripper& best-select& independent-fitness with-gripper& best-select& independent-fitness In Exp 3 and Exp 4, we observed the emergence of interesting evolution behaviours. Over the first few generations, robots do not even know how to find and execute tasks, let alone cooperate with another type of robot. Then, after robots learn how to accomplish tasks, some EP1 robots emit sounds, and other robots (including EP1 and EP2 robots) are attracted by the sound and help to carry out the tasks. EP1 robots find they do not benefit when they are attracted and engage in helping other robots; hence, they begin to neglect such sounds. In the end, EP1 robots concentrate on discovering task spots and attracting other robots, and EP2 robots are inclined to be attracted and help to execute tasks. In a word, every robot finally finds the appropriate niche in which it can make a useful contribution.
To test the hypothesis that each type of robot can synthesize its control strategy to accommodate its own capabilities even when all the robots share the same fitness function, we performed Exp 5 and compared performance to Exp 3 in Figure 10. The results illustrate that different options of fitness functions do not lead to significant changes in group fitness.It is important to note that in Exp 3 and in Exp 5 are compared, because has no meaning in comparison with . We are always concerned with the total number of found and completed tasks.
The significance of the results in Figure 10 is the suggestion that using the same fitness function for different sub-groups is an alternative to using independent ones that are difficult to design. However, this conclusion requires further study before it can be widely applied.
As mentioned above, we introduce a simulated gripper to every robot in order to make it easier for robots to evolve the controllers. The expected result is obtained for EP1 robots as shown in Figure 11: EP1 robots are more likely to generate controllers to emit sounds to attract other robots to solve tasks with the help of gripper. In contrast, Figure 12 shows interesting and counterintuitive results with respect to the EP2's fitness. The performances of EP2 with gripper and without gripper do not differ significantly. A possible reason could be that EP2 robots are able to synthesize effectively under both conditions. The results suggest that a good awareness of the position between tasks and robots may help robots to synthesize the desired controllers faster.

Verification of Controllers
Without loss of generality, the synthesized controllers from Exp 6 are used to demonstrate the controller's stability. Although there are many definitions of stability, we are concerned with whether the outputs of the system remain bounded under different initial conditions. Three different cases are used in the demonstration, as shown in Table 3, where Case 1 is the original case used for evolving the controllers. Each case was run 10 times to determine the mean and standard deviation of the total number of completed tasks for achieving statistical significance. The start positions of robots and tasks can be identical or different in the 10 runs for each case. Note that the results can even vary for 10 runs with identical start positions, because each task will be placed again randomly to a new position after it is completed. The results presented in Figure 13 suggest that the number of completed tasks is almost free from the impact of the settings of start positions of robots and tasks. It also  1  22  43  64  85  106  127  148  169  190  211  232  253  274  295  316  337  358  379  400 Fitness Generation without-gripper& best-select& independent-fitness with-gripper& best-select& independent-fitness shows that the controllers are able to scale with the team size to ensure acceptable solution qualities. The standard deviation in each data set is very small. These data and the observation reveal that the controllers demonstrate relatively stable performance with different start position settings. In order to further evaluate the synthesized controllers from Exp 6, the aforementioned scenario has also been implemented and tested on a team of physical robots with different capabilities. The set of robots forming our application scenario is composed of two CCNT-A and two CCNT-B robots, as illustrated in Figure 14. Each robot is equipped with a Logitech QuickCam Pro 4000 camera, BSWA MPA 416 array microphones, infrared distance sensors, front and rear bumpers, and a beep speaker. For the sake of consistency, the capabilities of the CCNT-A robots are configured to be compatible with EP1, and the CCNT-B robots correspond to EP2. In the experiment, the robots are charged with three tasks in an outdoor environment. In contrast to the simulated environment, the tasks are removed from the environment manually and will not be inserted into the environment again when completed. As expected, all the robots spread out to find tasks at the beginning of the experiment. A CCNT-B robot found a task first, and went to solve the task by itself. The reason why the CCNT-B robot did not emit sounds to attract other robots to help it to solve the task is that it was capable of doing the work alone efficiently. Then, a CCNT-A robot found another task, and it preferred to emit sounds to attract others to help to execute the task. Two CCNT-B robots were both attracted by the sound and went to help the CCNT-A robot, while another CCNT-A robot was not attracted and continued looking for other tasks. Finally, the last task was found by the CCNT-A robot and completed with the help of CCNT-B robots.
The results were therefore consistent with those observed in the simulated environment. This verifies the feasibility of using evolution methods to synthesize controllers for physically heterogeneous MRS.

Conclusions
In this paper, we have considered synthesizing cooperative behaviours using articial evolution for a team of physically heterogeneous robots. The method has been validated through a series of experiments with a team of e-puck robots. Satisfactory results have been obtained for the coordinated tasks. For future work, we will make a systematic comparison and evaluation of controller evolution under different conditions, and we also plan to reduce the computation cost by creating the necessary infrastructure to support distributed evolutionary computing.