Using Centroidal Voronoi Tessellations to Scale Up the Multi-dimensional Archive of Phenotypic Elites Algorithm

The recently introduced Multi-dimensional Archive of Phenotypic Elites (MAP-Elites) is an evolutionary algorithm capable of producing a large archive of diverse, high-performing solutions in a single run. It works by discretizing a continuous feature space into unique regions according to the desired discretization per dimension. While simple, this algorithm has a main drawback: it cannot scale to high-dimensional feature spaces since the number of regions increase exponentially with the number of dimensions. In this paper, we address this limitation by introducing a simple extension of MAP-Elites that has a constant, pre-defined number of regions irrespective of the dimensionality of the feature space. Our main insight is that methods from computational geometry could partition a high-dimensional space into well-spread geometric regions. In particular, our algorithm uses a centroidal Voronoi tessellation (CVT) to divide the feature space into a desired number of regions; it then places every generated individual in its closest region, replacing a less fit one if the region is already occupied. We demonstrate the effectiveness of the new"CVT-MAP-Elites"algorithm in high-dimensional feature spaces through comparisons against MAP-Elites in maze navigation and hexapod locomotion tasks.


Using Centroidal Voronoi Tessellations to Scale Up the Multi-dimensional Archive of Phenotypic Elites Algorithm Vassilis Vassiliades, Konstantinos Chatzilygeroudis, and Jean-Baptiste Mouret
Abstract-The recently introduced Multi-dimensional Archive of Phenotypic Elites (MAP-Elites) is an evolutionary algorithm capable of producing a large archive of diverse, high-performing solutions in a single run. It works by discretizing a continuous feature space into unique regions according to the desired discretization per dimension. While simple, this algorithm has a main drawback: it cannot scale to high-dimensional feature spaces since the number of regions increase exponentially with the number of dimensions. In this paper, we address this limitation by introducing a simple extension of MAP-Elites that has a constant, pre-defined number of regions irrespective of the dimensionality of the feature space. Our main insight is that methods from computational geometry could partition a high-dimensional space into well-spread geometric regions. In particular, our algorithm uses a centroidal Voronoi tessellation (CVT) to divide the feature space into a desired number of regions; it then places every generated individual in its closest region, replacing a less fit one if the region is already occupied. We demonstrate the effectiveness of the new "CVT-MAP-Elites" algorithm in high-dimensional feature spaces through comparisons against MAP-Elites in maze navigation and hexapod locomotion tasks.

I. INTRODUCTION
Evolution started from a common ancestor [1] and gave rise to the biodiversity we see today, which is estimated to amount to 1 trillion species [2]. Inspired by this observation, the field of evolutionary robotics has recently seen a shift from evolutionary algorithms (EAs) that aim to return a single, globally optimal solution, to algorithms that explicitly search for a very large number of diverse, high-performing individuals [3], [4], [5], [6], [7], [8], [9], [10].
EAs have traditionally been used for optimization purposes [12] with the aim of returning a single solution that corresponds to the global optimum of the underlying search space ( Fig. 2A). Various forms of diversity maintenance (niching) techniques have been designed to supply EAs both with the ability to avoid premature convergence to local optima and to perform multimodal optimization [13], [14], [15], [16], [17]. In the latter case, the aim is to return multiple solutions that correspond to the peaks of the search space (Fig. 2B).
This work was supported by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (Project: ResiBots, grant agreement No 637972).

MAP-Elites
CVT-MAP-Elites This means that the number of niches grow exponentially with the number of additional dimensions or discretizations, thus, it cannot be used in high-dimensional feature spaces. In contrast, Centroidal Voronoi Tessellation (CVT) -MAP-Elites, uses a CVT [11] to partition the feature space into k homogeneous geometric regions, where k is the pre-specified number of niches. Here, MAP-Elites uses 5 discretizations per dimension, resulting in 25 niches, whereas, CVT-MAP-Elites uses 7 niches.
A recent, alternative perspective in the field of evolutionary computation views EAs more as diversifiers, rather than as optimizers [8]. Research on such EAs was initiated by the introduction of the Novelty Search (NS) algorithm [19], [20], which looks for individuals that are different from those previously encountered in some behavior space or feature space (we use both terms interchangeably). The behavior space is defined by features of interest that describe the possible behaviors of individuals over their lifetime [21], [8]. For example, points in this space could be the final positions of a simulated robot in a 2D maze environment. By rewarding novelty instead of fitness, NS promotes behavioral diversity and accumulates potential stepping stones for building more complex solutions [22]. This algorithm relies on the intuition that it can be more beneficial to encourage exploration in the behavior space rather than the genotype space, which is confirmed experimentally for several domains [23], [24].
Yet, purely exploring the behavior space without considering the task performance is not practical in many cases. For example, we might not only be interested in finding controllers that make a robot reach different points in the environment, but also the fastest controller for each point. To address this issue, algorithms like NS with Local Competition [3] and Multidimensional Archive of Phenotypic Elites (MAP-Elites) [  Algorithms for global optimization aim to find a single global optimum of the underlying parameter space in which they operate (A). Multimodal optimization algorithms, on the other hand, aim at finding multiple optima (B). Illumination algorithms, such as MAP-Elites (C) go one step further by aiming to discover significantly more solutions, each one being the elite of some local neighborhood defined in some feature space of interest. For finding the solutions of the function illustrated, we used (A) Covariance Matrix Adaptation Evolution Strategies [18], (B) Restricted Tournament Selection [14], and (C) MAP-Elites [6]. them locally. For this reason they are called illumination algorithms [6] or quality diversity (QD) algorithms [7], [8]. It is important to note that in multimodal optimization, the primary interest is optimization, i.e., we are interested in maintaining diversity in order to find the peaks of the underlying genotype or phenotype space. On the contrary, the primary goal of illumination algorithms is not optimization, but diversity [8].
Whereas NS and its variants are governed by dynamics that continually push the population to unexplored regions in behavior space, MAP-Elites (Alg. 2) uses a conceptually simpler approach: it discretizes the d-dimensional behavior space into unique bins ( Fig. 1 left) by dividing each dimension into a specified number of ranges (n 1 , ..., n d ); it then attempts to fill each of the d i=1 n i bins through a variation-selection loop with the corresponding genotype and its fitness, replacing a less fit one if it exists. The final result is an archive that contains the elite solution of every niche in behavior space. Being an illumination algorithm, MAP-Elites aims to return more information than multimodal optimization algorithms, i.e., solutions that are the best in their local region, but not necessarily ones where the fitness gradient is (near) zero (Fig. 2C). The maximum number of solutions that can be returned (i.e., the number of niches) is controlled by the number of discretization intervals (i.e., k = d i=1 n i ) and is a userdefined input to the algorithm. Put differently, the problem solved by MAP-Elites is "find k (e.g., 10,000) solutions that are as different and as high-performing as possible".
MAP-Elites has been employed in many domains. For instance, it has been used to produce: behavioral repertoires that enable robots to adapt to damage in a matter of minutes [25], [26], perform complex tasks [27], or even adapt to damage while completing their tasks [28], [29]; morphological designs for walking "soft robots", as well as behaviors for a robotic arm [6]; neural networks that drive simulated robots through mazes [7]; images that "fool" deep neural networks [30]; "innovation engines" able to generate images that resemble natural objects [31]; and 3D-printable objects by leveraging feedback from neural networks trained on 2D images [32].
The grid-based approach of MAP-Elites requires the user to only specify the number of discretization intervals for each dimension, making the algorithm conceptually simple and straightforward to implement. However, this approach suffers from the curse of dimensionality, since the number of bins increase exponentially with the number of feature dimensions. The increase in the number of niches results in reduced selective pressure, making the algorithm unable to cope with high-dimensional feature spaces even when memory is not a problem. For this reason, MAP-Elites has only been employed in settings with low-dimensional feature spaces (2 to 6 dimensions). However, scaling to high dimensions is a desirable property, as this would potentially allow MAP-Elites to be used with more expressive descriptors and create archives of better quality and diversity.
A way to address this limitation is by employing a method that maximally spreads a desired number of niches in feature spaces of arbitrary dimensionality. In this paper, we achieve this using a technique from computational geometry known as centroidal Voronoi tessellations (CVTs) [11]. In particular, the contribution of this paper is two-fold: (1) we introduce a new algorithm that we call "CVT-MAP-Elites" (Fig. 1 right) and demonstrate its advantage over MAP-Elites in a maze navigation task and the simulated hexapod locomotion task of [25]; (2) we propose a new methodology for assessing the quality of the archives produced by illumination algorithms.

II. CENTROIDAL VORONOI TESSELLATION MAP-ELITES
A Voronoi tessellation [33] is a partitioning of a space into geometric regions based on distance to k pre-specified points which are often called sites. Each region contains all the points that are closer to the corresponding site than to any other. If the sites are also the centroids of each region (and the space is bounded), then the Voronoi tessellation is the CVT [11] of the space (Fig. 1 right). CVTs have found application in problems ranging from data compression to modeling the territorial behavior of animals [11].
There exist various algorithms for constructing CVTs [35]. Lloyd's algorithm [36] is the most widely used in 2D spaces and consists of repeatedly constructing the Voronoi tessellation of the k sites, computing the centroids of the resulting Voronoi regions and moving the sites to their corresponding centroids. However, explicitly constructing Voronoi tessellations in highdimensional spaces involves complex algorithms [33]. An C ←− update centroids(I) 7: return centroids C alternative, simpler approach is to use Monte Carlo methods to obtain a close approximation to a CVT of the feature space [34]. In this work, we follow this approach and construct such an approximation using Alg. 1 (adapted from [34]). The algorithm first randomly initializes k centroids (line 2) and generates K random points (K >> k) (line 3) uniformly in the feature space according to the bounds of each dimension (e.g., the feature space could be defined in [0, 1] d ). The algorithm then alternates between assigning each point to the closest centroid and updating each centroid to be the mean of its corresponding points (lines 4-6). This procedure is analogous to using a clustering algorithm (such as k-means [37]) to find k clusters in a dataset that contains many well-spread points. Therefore, constructing a CVT can intuitively be seen as forcing the k sites to be well-spread in the space of interest. x = selection(X ) 8: x = variation(x) 9: ADD TO ARCHIVE(x , X , P) 10: return archive (X , P) 11: procedure ADD TO ARCHIVE(x, X , P) 12: 14: if P(c) = null or P(c) < p then 15: CVT-MAP-Elites partitions the d-dimensional feature space into k Voronoi regions and then attempts to fill each of the regions through a selection-variation loop. Algorithmically, it first obtains the coordinates of the k centroids (C; Alg. 3, line 2) by constructing the CVT as described above (Alg. 1). It then creates an empty archive with capacity k (X and P store the genotypes and performances, respectively). The algorithm then evaluates G random parameter vectors (x), simultaneously recording their performance (p) and feature descriptor (b), i.e., their location in feature space (Alg. 3, 4-6). Next, it finds the index (c) of the centroid in C that is closest to b (Alg. 3, line 14), which implicitly gives information about its Voronoi region. If the region is free, then the algorithm stores the parameter vector x in that region; if it is already occupied, (X , P) ←− create empty archive(k) 4: for i = 1 → G do Initialization: G random x 5: x = random solution() 6: ADD TO ARCHIVE(x, X , P) 7: for i = 1 → I do Main loop, I iterations 8: x = selection(X ) 9: x = variation(x) 10: ADD TO ARCHIVE(x , X , P) 11: return archive (X , P) 12: procedure ADD TO ARCHIVE(x, X , P) 13: if P(c) = null or P(c) < p then 16:  [38], [33] or O(k) in the worst case.

III. EVALUATION
To assess the scalability of CVT-MAP-Elites, we experiment with maze navigation and hexapod locomotion tasks. Both classes of tasks have been successful in testing the ability of illumination algorithms to explicitly generate archives that contain many diverse and high-performing solutions [25], [8].
A few metrics have been used in the literature to evaluate illumination algorithms [6], [7], [8], many of which utilize the MAP-Elites grid. For example, "coverage" [6] measures the expected number of cells an algorithm can fill using a specific descriptor, while "quality diversity score" [7], [8] projects the descriptors to a certain feature space and calculates the sum of the fitness values stored in each cell.
These metrics, however, have two disadvantages that prevent us from using them. First, they are dependent on the behavior space and a particular discretization of MAP-Elites, whereas we would like to compare not only different EAs, but also spaces of different dimensionality (e.g., 2D vs 1000D). Second, they do not explicitly assess if the archives contain the "right" diversity, where "right" here is task-specific. For example, in our maze navigation experiments, we would like the resulting archives to contain controllers that take the robot from the starting position to the goal using different trajectories. In our hexapod locomotion experiments, we are interested in collecting diverse and high-quality solutions that would be useful even if the dynamics of the robot change due to some damage.
Since the aforementioned metrics cannot work in our case, we propose the following methodology for assessing the quality of the archives produced by each EA-descriptor pair P i : in an analogy with supervised learning scenarios, training is done during the standard evolutionary phase, while testing for generalization is done during an evaluation phase in settings not experienced during evolution. Each evaluation setting slightly modifies the simulator in order to test whether different classes of behaviors are found by P i , without however changing the fitness function (i.e., the way individuals are rewarded). For example, in the maze navigation experiments, an evaluation scenario e would correspond to changing the environment by blocking certain paths of the maze in order to test whether controllers that achieve a particular trajectory are found by P i . In the hexapod locomotion experiments, e would correspond to changing the dynamics of the robot by removing a leg, in order to test whether P i has found controllers that perform well in spite of this damage. For each P i , we generate multiple archives from independent evolutionary trials, in order to make statistical assessments, and use the following metrics: A. Expected best performance of an EA-descriptor pair in an evaluation scenario e In order to calculate the "best performance" of an archive returned by P i , we exhaustively 1 evaluate all solutions contained in the archive by simulating the given evaluation scenario e, and return the fitness value of the fittest solution. To calculate the "expected best performance" of P i on e, we perform this procedure on the archives returned from all evolutionary runs and take the median value. For example, in the maze experiment where e corresponds to allowing only a single open path towards the goal, this measure would capture whether P i was able to find at least one solution (controller) that makes the robot follow this path, even if this solution is the only one in the archive that achieves this. Intuitively, this metric asks: can P i find a solution for a test problem?

B. Expected quality of an EA-descriptor pair
The "expected best performance" metric is the extreme case of a more general metric that asks: how many solutions can P i find for a test problem? Since in our case "solving a problem" is not a discrete event, i.e., the fitness values are continuous variables, we can define this metric to be the probability that the fitness value X drawn from the archives produced by P i is less than or equal (if we are minimizing; greater if we are maximizing) to a certain value x, where x is task-specific. If we consider not just one value for x, but a range of values, we can generalize this metric by generating the cumulative distribution function (CDF), for a minimization problem, or the complementary CDF (CCDF), for a maximization problem. That is: We calculate these functions by first creating a set of possible fitness values for the given task (e.g., x ∈ {0, 1, 2, ..., 400}), and then for each of these values (x) we calculate the ratio of solutions from the archive that have a fitness value less than or equal to x in the case of CDF, or greater than x in the case of CCDF. We perform this procedure on the archives returned from all independent evolutionary runs of P i (in order not to depend on a particular archive), as well as all evaluation scenarios, and record the median ratio for each possible fitness value. For example, if there are 20 evolutionary runs and 10 evaluation scenarios, this means that the median ratio is calculated over 20 × 10 = 200 numbers, for each possible fitness value.
By querying the CDF or CCDF for a certain fitness value that is considered "good" in a given problem is akin to asking: what is the expected percentage of good solutions returned by P i ? For example, suppose that we are comparing P 1 and P 2 in our hexapod locomotion task (i.e., maximize walking speed). If we consider a walking speed of at least 0.3 m/s to be wellperforming in our task, we can query the CCDF tables of 3), this means that P 2 is not as effective as P 1 for the same performance level, thus, we can say that the quality of P 2 is lower than the quality of P 1 for a walking speed of at least 0.3 m/s in the considered scenario.

IV. MAZE NAVIGATION EXPERIMENTS
A. Experimental Setup 1) Simulation and Fitness Function: We use a maze navigation task where a simulated mobile robot (Fig. 3a) is controlled by an artificial neural network whose architecture and parameters are evolved (for parameter settings, see Appendix C). The robot starts from the bottom of the arena and needs to reach the goal point at the center (Fig. 3b). At every simulation step, the Euclidean distance between the current position of the robot and the goal point is measured. The fitness function is the smallest distance achieved over the robot's lifetime [39], which is set to 3000 simulation steps.
2) Evaluation phase: We evaluate the archives of the EAdescriptor pairs in 16 different environments (shown in Fig. 4). These environments are created by selectively blocking the openings of the "open" maze environment to effectively allow only one realizable trajectory to the goal per environment.  4) Generating centroids: The CVT algorithm relies on randomly sampled points that are clustered to find well-spread centroids. For the maze task, we cannot generate random trajectories by straightforwardly connecting random points in the arena because such trajectories would not match the physical constraints of the robot, which cannot teleport from any point to another in a single time step; and we cannot use a basic random walk because it would need too many samples to cover the space well. Instead, we generate a random point of the trajectory (e.g., the 42nd time step) within the bounds that are physically possible with the robot (here, 42 times ±2 units from the starting point, while staying within the bounds of the arena, that is, 1000 × 1000), then we generate another random point of the trajectory (e.g., the 23rd) with updated constraints (23 times ±2 units from the starting point, and 19 times ±2 units from the 42nd point), etc. We continue this process until we have chosen all the points of the trajectory.

B. Results
For all experiments, we use 30 independent evolutionary trials and 200k function evaluations. The results show that 2 See Fig. 8 for how the niches look like in 2D. the expected best performance of both MAP-Elites with the 2D, 10D and 20D descriptors, and CVT-MAP-Elites with the additional 50D, 250D and 1000D descriptors remains constant over all the evaluation scenarios (Fig. 4). The median distance is less than 2 units in most scenarios, while for the 8 th and 14 th environments, the median distance is still less than 10 units which is equal to the radius of the robot.

V. HEXAPOD LOCOMOTION EXPERIMENTS
A. Experimental Setup 1) Simulation and Fitness Function: We use the hexapod locomotion task of [25] using the Dynamic Animation and Robotics Toolkit 3 . The controller is designed to be a simple, open-loop oscillator that actuates each servo by a periodic signal of frequency 1Hz, parameterized by 3 values: the amplitude of oscillation, its phase shift and its duty cycle (i.e., the fraction of each period that the joint angle is positive). Each leg has 3 joints, however, only the movement of the first 2 is defined in the parameters 4 [25]. Therefore, there are 6 parameters per leg, thus, 36 parameters for controlling the whole robot. The fitness function is the forward distance covered in 5 seconds (thus, we maximize walking speed).
2) Evaluation phase: During the evolutionary phase, we use a model of an intact robot to generate the archives, whereas, during the evaluation phase, we evaluate 6 damage cases that correspond to removing a different leg from the hexapod robot (see Fig. 7). Fig. 4: Best performance (distance to the goal, thus, lower is better) for each algorithm-descriptor pair in the "open" maze environment (leftmost column) and all 16 evaluation environments which are created by selectively blocking certain paths that lead to the center of the maze (the goal). The box plots show the median (black line) and the interquartile range (25 th and 75 th percentiles) over 30 solutions; the whiskers extend to the most extreme data points not considered outliers, and outliers are plotted individually. In all cases, the median is below 10 units, indicating that the algorithms have good overall performance with all behavioral descriptors in these 1000 × 1000 square unit environments. The behavior of CVT-MAP-Elites does not deteriorate in high-dimensional spaces (e.g., 1000D), illustrating that the algorithm can scale well. The 8 th and 14 th evaluation environments, where there are big outliers, are symmetrical and seem to be the most difficult ones, as they require an almost full clockwise turn followed by an almost full counterclockwise turn and vice versa.
where b is the descriptor, C i (t) denotes the Boolean value of whether leg i is in contact with the ground at time t (i.e., 1: contact, 0: no contact), recorded at each time step (every 15 ms) and averaged over the number of time steps T of the simulation. For MAP-Elites, we use 5 discretizations per dimension, resulting in 15625 bins. • 12D subset of controller parameters that contains the phase shift for each of the 12 joints. For MAP-Elites, we use 3 discretizations per dimension (thus, 3 12 bins). • 24D subset of controller parameters that contains the amplitude of oscillation and the phase shift for each of the 12 joints. For MAP-Elites, we use 2 discretizations per dimension (resulting in nearly 17 million bins).
• Controller Parameters (36D): In this case, the behavior space is the same as the parameter space. MAP-Elites cannot be employed since even using 2 discretizations per dimension requires more than 256 GB of RAM.

B. Results
For all experiments, we use 20 independent evolutionary runs 5 and 75k generations. Overall, our results indicate that during the evolutionary phase (Fig. 6A), the median performance of the best individuals of CVT-MAP-Elites is approximately the same when using the 12D, 24D and 36D descriptors, despite the increase in dimensionality. When using the 6D descriptor, the performance is slightly better, however, this is likely due to the fact that this descriptor is calculated in behavior space, whereas the others in parameter space [23], [24]. With MAP-Elites, the performance of the best individuals when using the 6D descriptor is approximately the same as in CVT-MAP-Elites. However, when the dimensionality of the descriptor increases, the performance of MAP-Elites deteriorates significantly.  The results of both the undamaged and damaged conditions indicate that MAP-Elites-6D has higher probability of finding better solutions than all other EA-descriptor pairs (Fig. 6B). CVT-MAP-Elites-6D does not perform as well, most likely due to the fact that it uses 10k niches during evolution, whereas MAP-Elites-6D uses 1.5 times more (5 6 ); this indicates that the number of niches for CVT-MAP-Elites could be better tuned, though this is beyond the scope of this study.
The expected quality of the archives produced by MAP-Elites significantly decreases with higher-dimensional descriptors, whereas with CVT-MAP-Elites it is not (Fig. 6B). For all the damaged cases and in particular for a walking speed of 0.2 m/s, the expected percentage of solutions returned by MAP-Elites that have at least this value is 21.3% for 6D, 0% for 12D and 0% for 24D; for CVT-MAP-Elites these are 13.3% for 6D, 6.5% for 12D, 10.8% for 24D and 10.5% for 36D. Comparing the two algorithms using the 12D and 24D descriptors reveals that the differences are highly significant (p < 10 −44 , Mann-Whitney U test). Thus, randomly choosing among the solutions returned by both algorithms, we obtain higher quality ones with CVT-MAP-Elites, as the descriptor dimensionality increases.

VI. DISCUSSION AND CONCLUSION
We have shown that CVT-MAP-Elites can be applied in problems where the dimensionality is prohibitive for MAP-Elites (e.g., 1000 dimensions in the maze experiments). We have additionally demonstrated that in the hexapod locomotion tasks, CVT-MAP-Elites found gaits that were on average 1.7 to 2.1 times faster (during the evaluation settings) with the corresponding increase in feature space dimensionality. This is because CVT-MAP-Elites has a more precise control over the balance between diversity and performance. For example, there is more selective pressure for performance when randomly selecting a parent from an archive of 1000 elites than from an archive of 1 million because the niches are bigger in the former case than in the latter (an elite from a big niche "reign" on more solutions). In the extreme case of a single niche, MAP-Elites and CVT-MAP-Elites would act like a stochastic hill climber, that is, with a very strong pressure for performance. In MAP-Elites the increase in dimensionality exponentially increases the number of niches, and as these niches get filled with solutions, selective pressure for performance decreases. In CVT-MAP-Elites, by having fewer niches (thus, solutions) and keeping the same selection method (e.g., 100 parents uniformly at random at every generation), we effectively increase selective pressure for performance.
In all our experiments, we chose to perform distance calculations using the Euclidean norm because it is the distance function used in other quality diversity studies [40], [7], [3]; however, "fractional" norms (Minkowski distance of order p < 1) might be more appropriate for high-dimensional spaces, because they provide a better contrast between the farthest and the nearest neighbor [41], [42]. Nevertheless, additional experiments revealed that results are not qualitatively affected by a change to fractional norms, and, in particular, that there is no significant difference between a Euclidean and a fractional norm when considering the generalization performance (see Fig. 10, 11B, 13, 14B).
CVT-MAP-Elites is a natural extension of MAP-Elites since it behaves like the latter in low-dimensional spaces, if given the same amount of well-spread niches. In addition, it does not require any major modifications over MAP-Elites. This is because the CVT routine (not part of the main EA) is responsible for generating the centroids and only needs to run once, before the EA starts; thus, the EA only needs to load these centroids. One can easily substitute the CVT routine with their own implementation, without any change in the EA. Interestingly, such a routine could be designed in a way that places more centers (thus, more variation) along certain dimensions. To ease deployment, we provide a python script for generating the centroids (see Appendix B).
The computational complexity of the method we provide for constructing the CVT (in Alg. 1) is O(ndki), where n is the number of d-dimensional samples to be clustered, k is the number of clusters and i is the number of iterations needed until convergence. From our experiments, we noticed that the clustering does not need to be very precise when using such a large number of clusters (i.e., thousands), thus, the number of iterations i can be fixed to limit the overall complexity of CVT-MAP-Elites. Nevertheless, one could resort to other CVT construction methods, which could offer significant speedups [35]. Furthermore, finding the centroid closest to a given behavior descriptor can be accelerated using data structures such as k-d trees [38] or others that exploit the characteristics of the Voronoi tessellation (e.g., see [43], [33]).
A key factor that enabled CVT-MAP-Elites to scale to 1000 dimensions in the maze experiments is the proper sampling of trajectories to generate the CVT and, as a consequence, the generation of centroids that more closely approximate trajectories that could be followed by the robot. A naive approach for sampling trajectories would sample each element along the trajectory from [0, 1000] 2 in the 1000 × 1000 arena. This, however, would result in unrealistic centroid trajectories, because the robot cannot move in a single time step to any point of the arena (it is constrained by its maximum speed). The behavior descriptors of the hexapod experiments do not have this problem, as they are properly defined in [0, 1] d (i.e., the behavior space is dense). Nevertheless, care needs to be taken when sampling the behavior space of interest during CVT construction: the centroids extracted from the samples have to be representatives of potential behaviors that make sense in the task.
While it is easy to sample behavior descriptors in many evolutionary robotics tasks, there exist many others for which sampling realistic behavior descriptors might prove challenging, thus, complicating the use of CVT-MAP-Elites. This is typically the case for behavior descriptors that involve trajectories of dynamical systems in high-dimensional state spaces. To put it clearly, CVT-MAP-Elites allows MAP-Elites to scale up to high-dimensional spaces, but it is still a gridbased algorithm [40], which is a category of quality diversity algorithms that is especially effective when the behavior space can be divided beforehand [40]. By contrast, if the bounds of the space are unknown before evolution, or if the set of possible behavior descriptors is heavily constrained 6 , then vanilla CVT-MAP-Elites is probably not the most appropriate algorithm.
This issue can be potentially mitigated by letting the solutions generated by the evolutionary algorithm define the bounds of the space of interest. Thus, instead of pre-calculating the CVT, we can periodically recalculate it throughout the evolutionary run based on whether the bounds have changed [44]. In addition, it is possible to change the bounding volume to a different shape than a hyper-rectangle, as well as to vary the density of the niches so that to have a greater number on the outer part of the volume: such a change could potentially put more pressure towards expanding the volume and explore novel regions in behavior space. At any rate, other quality diversity algorithms could be more effective when nothing is known about the behavior space beforehand [40]. In these cases, it is worth investigating algorithms that rely on distance computations between different generated points [45] (such as Novelty Search with Local Competition [3] or Restricted Tournament Selection [14]), rather than between points and fixed centroids.

APPENDIX A ADDITIONAL EXPERIMENTS
In this section, we report additional experiments where we vary the number of niches k and the distance metrics used both when generating the centroids (during the CVT construction) and when finding the closest centroid (throughout the evolutionary run). Number of niches: In the maze navigation task, we vary k ∈ {5, 50, 500, 5000} and show results with 3 different descriptors (2D, 20D, 250D). In the hexapod locomotion task, we vary k ∈ {10, 100, 1000, 10000, 100000} using the duty factor (6D) behavioral descriptor.
Distance metrics: We use the following formula which defines the Minkowski distance of order p between two points x ∈ R d and y ∈ R d : By setting p = 2, we obtain the Euclidean distance metric. As shown in [41], as p increases, the contrast between the farthest and nearest neighbor becomes poorer, especially in higher dimensional spaces. By setting p < 1, we obtain a "fractional distance metric" [41] which has been shown to be better suitable in high dimensional spaces [41], despite not being a true metric (as it violates the triangle inequality). In both tasks, we compare the Euclidean distance metric (p = 2) with a fractional one of order p = 0.5. More specifically, in the maze navigation task, we perform such a comparison using 3 different descriptors (50D, 250D, 1000D); in the hexapod locomotion task, we use the full controller space (36D) as the behavior space. Maze Navigation Results: In the maze navigation task, the expected best performance increases with the increase of k (Fig. 9). The worst performance is achieved with k = 5, however, this is expected as it is equivalent to using a population size of at most 5. Parameter k can be set according to the user's preferences: for example, it makes sense to have a high value when visualizing search spaces [6]. The cumulative distribution function (CDF) in Fig. 11A shows that when using a k = 50, the percentage of "good" solutions (i.e., where the distance to the goal is less than 100) is higher than when using k = 500 or k = 5000. This is expected since having more niches allows the algorithm to retain more locally optimal solutions. This also shows that CVT-MAP-Elites can work well in this problem even with a population size of at most 50. Fig. 11A additionally shows that the expected quality of the archives is higher when using the higher dimensional descriptors (i.e., 20D or 250D); this is expected since the 2D descriptor cannot capture multiple trajectories reaching the same 2D location. There is no significant difference between the two distance metrics as shown in Fig. 10 and Fig. 11B.
Hexapod Locomotion Results: In the hexapod locomotion task, the expected best performance is not significantly affected when the robot is undamaged and when k ∈ {10, 100, 1000, 10000}; however, when k = 100000 there is a significant decrease (Fig. 12). In the damaged cases, the best performance is achieved with k = 10000, while k = 10 has the worst results. The complementary CDF (CCDF) shown in Fig. 14A shows that it is more preferable to use smaller k if we care about the evolutionary performance (undamaged robot). However, when we care about the generalization performance (evaluation settings -damaged cases), there is no significant difference when k ∈ {10, 100, 1000, 10000}, while k = 100000 has the worst expected performance.
The expected best performance in the undamaged case is slightly greater when using the fractional norm (Fig. 13). This is also apparent from the CCDF in Fig. 14B illustrating that the fractional norm has advantages when we care about the evolutionary performance. In the damaged cases, however, there is no significant difference between the performance values of the two norms in both the expected best performance (Fig. 13) and the expected quality of the solutions (Fig. 14B).

APPENDIX B SOURCE CODE
The source code of the experiments can be found in https: //github.com/resibots/vassiliades 2017 cvt map elites.

APPENDIX C PARAMETERS OF EVOLUTIONARY ALGORITHMS
The following table summarizes the parameters used for the evolutionary algorithms in the experiments reported in the main text.  Fig. 9: Comparison of different values of cluster parameter k (number of niches/centroids) for the expected best performance (distance to the goal) in the "open" maze environment (leftmost column) and all 16 evaluation environments each of which permits a single path to the goal (center). The box plots show the median and the interquartile range over 30 solutions, apart from the rightmost column which is calculated from the medians over all 17 environments. As k increases, the expected best performance increases as well, however, there is no significant difference between k = 500 and k = 5000. Fig. 10: Comparison of distance metrics for the expected best performance (distance to the goal) in the "open" maze environment (leftmost column) and all 16 evaluation environments each of which permits a single path to the goal (center). The box plots show the median and the interquartile range over 30 solutions, apart from the rightmost column which is calculated from the medians over all 17 environments. There is no significant difference between the two distance metrics. There is no significant difference between the two distance metrics. Fig. 12: Comparison of different values of cluster parameter k (number of niches/centroids) for the expected best performance (walking speed) in the hexapod locomotion task of the evolutionary setting (undamaged robot, leftmost column) and the evaluation settings (damaged robot). The box plots show the median and the interquartile range over 20 independent solutions (after 75k generations). Fig. 13: Comparison of distance metrics for the expected best performance (walking speed) in the hexapod locomotion task of the evolutionary setting (undamaged robot, leftmost column) and the evaluation settings (damaged robot). The box plots show the median and the interquartile range over 20 independent solutions (after 25k generations).