Efficient ab initio schemes for finding thermodynamically stable and metastable atomic structures: Benchmark of cascade genetic algorithms

A first-principles based methodology for efficiently and accurately finding thermodynamically stable and metastable atomic structures is introduced and benchmarked. The approach is demonstrated for gas-phase metal-oxide clusters in thermodynamic equilibrium with a reactive (oxygen) atmosphere at finite pressure and temperature. It consists of two steps. At first, the potential-energy surface is scanned by means of a global-optimization technique, i.e., a massive-parallel first-principles cascade genetic algorithm for which the choice of all parameters is validated against higher-level methods. In particular, we validate a) the criteria for selection and combination of structures used for the assemblage of new candidate structures, and b) the choice of the exchange-correlation functional. The selection criteria are validated against a fully unbiased method: replica-exchange molecular dynamics. Our choice of the exchange-correlation functional, the van-der-Waals-corrected PBE0 hybrid functional, is justified by comparisons up to highest level currently achievable within density-functional theory, i.e., the renormalized second-order perturbation theory, rPT2. In the second step, the low-energy structures are analyzed by means of ab initio atomistic thermodynamics in order to determine compositions and structures that minimize the Gibbs free energy at given temperature and pressure of the reactive atmosphere.


I. INTRODUCTION
A functional material, such as a solid-state catalyst, is in general very different from the pristine material which is initially introduced into the reactive environment. After a so called "induction period", nanostructures of various shapes and compositions, point defects, extended defects such as steps, dislocations, and stacking faults, can result from and will be modified by interaction of the surface of the catalyst with the reactive environment.
Therefore, for a functional material design, it is crucial to reliably predict stoichiometry and structure of the material under realistic conditions. However, the compositional and structural degrees of freedom at the material-gas interface result in an enormous (combinatorial) increase of the configurational space.
However, in the presence of a reactive atmosphere, clusters can adsorb species from the gas phase, changing their stoichiometry. Under certain conditions this new state, not the initial cluster, can become the active (functional) material. Thus, in order to understand the functional properties of clusters in a reactive atmosphere, it is important to know which structures and stoichiometries are energetically accessible.
In this paper, we introduce a robust first-principles methodology for the determination of (meta)stable structures at realistic environmental conditions. In particular, we address the problem of validation of methods for global structure optimization. No existing method provides criteria for identifying a local minimum on a potential-energy surface as a global one. We show how this problem can be addressed, and describe a methodology for selecting a structure optimization scheme that is more likely to find the global energy minimum for a given system with less iterations.
The paper proceeds as follows. The first step is an extensive and efficient scanning of the potential-energy surface (PES) by global structure optimization; subsequently, the influence of the experimental conditions (here, temperature and pressure of a reactive atmosphere) is included. As model system, we study gas-phase Mg M O x clusters, where M = 1, 2, 3, 10, 18 (see Ref. 7 for results on this system at other values of M ) and x is the composition which minimizes the cluster free energy at given environmental conditions. Up to now, different methodologies for global structure optimization and levels of theory for the inter-atomic interactions have been employed to study only stoichiometric, (MgO) M , clusters. 2, The previous studies employed ab initio or empirical potentials for the evaluation of the total and relative energies. Both approaches have certain advantages and drawbacks. Calculations based on empirical potential or force fields are computationally cheap and fast enough to thoroughly scan the PES, but results greatly depend upon the choice of the parameters used 12 (see our analysis and discussion below). In contrast, ab initio calculations entail a fully quantum-mechanical description of the chemical bonding and of bond breaking/forming. Thus, ab initio calculations are generally more accurate. However they are computationally expensive and the accuracy of the results depends on the exchange-correlation functional 23 , which must be approximated. This approximation must be validated against higher-level methods.
As it will become clear along the paper, the model system that we chose is representative of a class of systems for which very accurate evaluation of the potential-energy surface (PES) is mandatory for meaningful predictions (for instance, we will demonstrate that hybrid exchange-correlation functionals are required). Since high-accuracy calculations are computationally very expensive, a scheme for an efficient scanning of the PES is necessary in order to avoid wasting time with high-accuracy calculations in uninteresting regions of the configurational phase space.
The first step of our methodology, i.e., the PES scanning, was chosen to be performed via a massive-parallel cascade genetic algorithm (cGA). In such global scan, we look for the set of coordinates (geometrical structure) as well as spins that minimize the total energy of the cluster. An alternative scheme, where parallel searches at fixed spins are performed is discussed in section II A 8. The term "cascade" is used because the global-optimization is designed as a multi-step procedure involving increasing levels of accuracy for the evaluation of the globally-optimized quantity, which, in our case, is total energy.
In section II A we introduce our cGA scheme. The method is not unique, as many choices are possible at every sub-step of its implementation. Therefore, we propose a validation scheme for the identification of those crucial parameters that need a careful tuning in order to achieve an efficient and accurate estimate of the desired quantities (e.g., structure and energy of the global minimum at each cluster size). The details of this validation, which constitute the core message of this paper, are described in sections III A and III B.
As second step, the structures found via the cGA are post-processed by applying the concepts of ab initio atomistic thermodynamics (ai AT) 34 in order to analyze the (T, p O 2 ) dependence of the stable compositions and structures of the various sizes M of the Mg M O x clusters in the oxygen atmosphere. A first-principles description of the environmental effects on materials has been previously developed and appplied extendively for predicting the relative stabilities of different phases in bulk semiconductors and surface of semiconfductors and oxides. [35][36][37][38][39] In section II D we present a self-contained introduction of ai AT, here adapted to the study of the relative stabilities of small Mg M O x clusters at a finite temperature and pressure. 40 .
After showing the phase diagram of Mg 2 O x , we discuss the issue of "coexistence regions".
By coexistence regions, we mean areas of the (T, p) phase diagram where different compositions and/or different isomers of the same composition coexist, as their free energies are nearly degenerate.

A. Parallel Global Search
We have implemented a parallel algorithm for global search, tailored for cluster structures.
Building on the known genetic algorithm (GA) [41][42][43][44] approach, we designed a cascade genetic algorithm (cGA) for finding the global-minimum-(GM) and lowest local-minimum-energy structures of a cluster of certain size and composition.
GA is a global-optimization technique based on the principles of natural evolution. In general, a GA includes the following steps: An initial population is formed with a group of individuals created randomly. The individuals in the population are then evaluated via a so-called fitness function, designed to quantify how well they satisfy chosen criteria. This (scalar) fitness is the quantity to be optimized. Two individuals are then randomly selected, with weight based on their fitness, the higher the fitness, the higher the chance of being selected. These individuals "mate", i.e., are combined to create one candidate offspring, that can be in turn "mutated" randomly. In the next selection for "mating", this new individual is included in the pool of candidates and can be selected on the basis of its fitness . This continues until a suitable solution has been found or a certain number of iterations have passed, depending on convergence criteria.
In our cascade scheme successive steps employ higher level of theory and each of the next level takes information obtained at the immediate lower level. It proceeds by first performing an extensive GA pre-scanning of structures by means of a computationally inexpensive classical force field (here, ReaxFF 45,46 ). The low-energy structures found with this scanning are used as initial guess for further, extensive GA for the minimization of the ab initio total energy of the clusters. In this second GA search, we look for global minima (at different stoichiometries) of the PES's described by the the van-der-Waals-corrected 47 PBE0 48 hybrid functional (PBE0+vdW). In order to alleviate the computationally (prohibitively) expensive direct relaxation (local optimization) of structures with a hybrid functional, we found that a very efficient strategy is to relax the candidate structures by means of a lowerlevel functional, namely PBE+vdW (the van-der-Waals-corrected 47 PBE 49 functional) but the PBE+vdW energetics are not used for the evaluation of fitness function. The fitness function is evaluated after finishing an additional step of calculating total energy using PBE0+vdW at the geometry fixed to the one found in the former step (see below for details and discussion of an alternative scheme). Such an algorithm yields reliable predictions only if local equilibrium PBE+vdW structures are close to hybrid xc-functional one. This assumption has been thoroughly tested for selected Mg M O x clusters (namely, Mg 10 O 10 , MgO 4 , where we found that the difference between PBE0+vdW binding energies for structures relaxed at the GGA xc-functional level and that of at the hybrid xc-functional level are negligible. One major possible source of inefficiency in using a DFT-based search from scratch is related to the creation of the initial pool. Since the aim of the GA approach is to perform an unbiased scan of the structures, the initial pool should be ideally composed of (constrained 66 ) random structures at a given composition. However, running a geometrical optimization by starting form a random structure can require several hundreds if not thousands of optimization steps. Performing such optimization fully at the DFT level is impractical. Furthermore, a lot of computational effort may then be spent in converging the forces for configurations very far from a local minimum, where high accuracy is not needed.
A way of overcoming this problem is to perform local optimization of randomly formed structures by means of a force field. In principle this pre-relaxation could serve already as a starting point for DFT-based GA. In practice, due to the inexpensiveness of a classical force field when compared to DFT calculations, we performed a thorough GA scan with the force field and the low-energy structures generated in this way served as initial pool for the DFT scan. This choice, however, carries a potential danger, i.e., that the optimization performed at the force-field (FF) level biases the initial pool for the DFT scan, such that a relevant portion of the configurational phase space remains unexplored. In the following we show that, when using a FF as flexible as reaxFF for the FF-based initial GA, no such bias is introduced. However, we also show that the reaxFF results are subject to significant errors in the relative energies and structures. Even if a DFT post-relaxation is performed on the pool obtained via reaxFF, important structures (including the GM) can be missed.
Schematically, our cGA algorithm proceeds as follows (all terms typeset in italic will be explained afterwards): (1) Selection of a composition of the clusters and formation of an initial pool of (constrained) random structures, locally optimized by a FF.
(2) Evaluation of the fitness function for all structures (using FF binding energy).
(3) GA global optimization using the FF. This consists of the following steps (i)-(v), which are iterated until convergence.
(i) Selection of two structures (in GA jargon, parents).
(ii) Assemblage of a new trial structure (child) through crossover and mutation.
(iii) Local optimization (force minimization) of the child structure using the FF.
(iv) Evaluation of the fitness function. Comparison of the optimized child with existent structures, reject if similar: Jump to (i). If not rejected, the next step is performed.
(v) A check whether convergence has been reached. If so, FF-GA is stopped and the next step, i.e., DFT-GA, is performed.  (c) Local optimization of the child structure with PBE+vdW, low-level settings.
(d) Comparison of the optimized child with existent structures. Early rejection if similar: Jump to (a). If not rejected, the next step is performed.
(e) Further local optimization of the child with PBE+vdW, high-level settings.
(f) Evaluation of fitness function based on PBE0+vdW total energy, with the geometry found in the former step.
(g) Check whether convergence has been reached. If so, stop.
In the following, we explain one by one the keywords introduced in the scheme above.

Initial random pool
We generate structures with atoms randomly distributed on the surface of an ellipsoid with axes of random length, with the constraint that the closest distance between neighbours is larger than a certain threshold value and for Mg-O, O-O, and Mg-Mg, the threshold values were set to 1.5, 1.0 and 2.75Å, respectively. These threshold values are set as 75% of the equilibrium bond distance of the three isolated dimers. The constrained random generation algoritm is built in order to get structures in three flavours: nearly spherical (three axes of almost equal length), prolate (two short axes, one long) and oblate (two long axes and one short). The limit of prolate structures are linear structures and the limit of oblate ones are planar structures.
For benchmarking, we strictly obeyed these rules. However, in practice it is sometimes beneficial to accompany the random structures with some other structures constructed by using any available prior knowledge or chemical intuition of the system. One (obvious) example in the case of the Mg M O x clusters is constructing a parallelepipedal rocksalt-like structure for a stoichiometric cluster with a suitable number of atoms (e.g., 2×2×2, 3×4×5 ...).

Fitness function
Each cluster i in the population is assigned a normalized fitness value, ρ i , based on its total energy (binding energy for the FF): and i is the relative energy of the i th cluster as defined below: Where E i is the total energy of the i th cluster of the population and E min , E max correspond to the dynamically updated lowest and highest total energies in the population, respectively.
With this definition, low (more negative) energy clusters have high fitness and high (less negative) energy clusters have low fitness.

Selection rule
We use a "roulette-wheel" selection criterion 50 with selection probability proportional to the value of the normalized fitness function. The idea is that the lower the total (or binding) energy (i.e., large negative value) of a certain configuration, the larger the probability to be chosen from the population. A cluster is picked at random and is selected for mating if its normalized fitness value (ρ i , defined in the previous paragraph) is greater than Rand[0, 1], a randomly generated number, uniform in the interval [0, 1].
A subtle problem is related to a possible "poisoning" of the selection pool with many structures that are all similar too each other. We have noticed that, frequently, a basin in the PES contains many local minima. These minima are different enough from each other to be judged as not similar by the geometric criterion defined below; on the other hand, some persistent topological feature is shared among all such minima. In such cases, the genetic pool may be flooded by a large number of alike structures, energetically close to the running GM, due to the high likelihood that mating among similar structures produces similar structures. GA takes then significantly long time to reach another basin in the PES.
This problem is known as "too early convergence" (or pre-convergence) 43 and, traditionally, the only suggested strategy to obviate this problem is to restart GA using a set of completely different random initial structures. However, we found (see below) that by selecting with small but non-negligible probability one "bad" (high-energy) structure in the population helps in moving out to a different basin. Therefore, we define also a complementary fitness functionρ j = (1 − ρ j ) so that, with properly tuned frequency (see section III A for details), we allow for the selection of one structure with highρ j , which is combined via crossover with another structure selected via the usual high-ρ i criterion. In section III A, we show that this choice greatly helps the convergence of the GA scheme and we also show how we optimized the mixing ratio among different selection rules.

Crossover
The crossover operator takes care of combining the parent clusters selected as explained above. We have implemented three rather different crossover operators. All such operators are applied to two (one could use more than two, in principle) selected parents and the first step is always to apply to both parent clusters a random rotation around their center of geometry, which is fixed at the origin of the coordinate axes for convenience. The crossover schemes differ in the way the hard constraint of imposing to the newly assembled cluster (the child) the same stoichiometry of the parent clusters is implemented.
(i) Crossover-1: This is a combined crossover and mutation (see below). The strategy is to decouple the atomic coordinates from their species. Let us consider that atomic coordinates of a cluster are listed in a "geometry" file where each line contains the four coordinates (three space plus one spin coordinate) of an atom and a label for its species. The sequence of atomic species in such geometry file is defined once and for all throughout the GA scan.
This fixes the stoichiometry. If the cluster contains an even number N of atoms (N = M +x), a child is built by taking from parent A, after the random reorientation, the coordinates of the N/2 atoms that are above a z = z A plane , where z A is chosen in order to have precisely N/2 atoms above it. From parent B, the coordinates of the N/2 atoms that are below a suitable z = z B plane are taken. In case N is odd, the parent with the highest fitness, let's say A, contributes with one extra atom, selected by adjusting the cutting plane z = z A to the purpose. This procedure produces a list of N lines containing the coordinates. To this list, it is pasted the fixed column of the labels indicating the atomic species. Thus, some of the atoms may swap species.
After assemblage one or more pair of atoms may be found to be "too close" (closeness is defined with the same three thresholds mentioned above). This can happen only at the interface between the two pieces of the parents. In this case the two halves are pushed away along the z-axis until the minimum distance between pairs is satisfactory. Also this adjustment operation is regarded as mutation (see below).
This approach is efficient in proposing new structures with the correct composition, but often, due to the interchange of atomic species, it can destroy those local features that determine a high fitness for the parent clusters. We have seen that the use of this crossover operator promotes a rapid decrease in energy of the running GM. However, due to the builtin large variation of local composition at the interface between the halves of the parent clusters, the finding of the actual GM may be hindered.
According to our sampling, also the total spin of the clusters is left free to evolve together with the spatial coordinates of the atoms. In this way we sample on equal footing the configurational space of atomic coordinates and the spin. The crossover of the spin coordinates is performed in the following way: When we create a new child by grabbing the atomic coordinates from the parents as explained above, we also make note of the atom-projected spin moments (via Hirshfeld partitioning of the electron density) for each atom. Such spin moments are given as initial moments of the individual atoms of the child. During the optimization process, these atom-projected moments are left free to change.
(ii) Crossover-2: This is close to the cut-and-splice crossover operator of Deaven and Ho 51 .
After the reorientation, atoms with positive z-value are selected from one cluster and atoms with negative z-value are selected form the other cluster. These complementary fragments are spliced together. In this way the stoichiometry is not necessarily preserved. The choice, here, is to accept the child if the stoichiometry is preserved, otherwise reject it, select new parents, and iterate the procedure until the child has the required stoichiometry 41,43 . The advantage of this procedure is that it helps to maintain winning features of the parent molecule but most of the time it takes many attempts to obtain a valid child, even for a moderately sized cluster. In case one or more pairs of atom are too close, we adopt the same remedy as for crossover-1. The spin coordinates are taken care of the same way as in the crossover-1 case.
(iii) Crossover-3: After re-orientation of the selected parent clusters we take all the metal (Mg) atoms from one parent molecule and all the oxygen atoms from the other parent molecule. This crossover helps introducing diversity in the genetic pool, but the rate of rejection during the assemblage of the child can be rather high due to the high likelihood that two atoms are too close.
Other crossover rules can be designed for this particular system and a totally different one may be needed for a different system. For instance, we have extended our scheme to periodic structures, in order to treat, e.g., structures adsorbed on (not necessarily clean) surfaces and defects in bulk materials. This extension will be presented elsewhere 52 .
The three mentioned schemes are thus not intended to form an exhaustive set of crossover operators, but rather to propose a sufficiently diversified set, so that we can in the following meaningfully test one scheme or combinations of schemes against another in order to find which scheme, or combination, is more robust in finding the GM of the test PES's.

Mutation
After crossover, which generates a child, mutation is introduced. As for crossover, different mutation operators can be defined. We have adopted a) a rigid translation of the two halves of the clusters relative to each other (this is performed if atoms coming from the two different parents find themselves too close upon splicing of the two halves) and b) exchange of the atom species without perturbing their coordinates (this is included in crossover-1, but not performed after crossover-2 and -3). We have purposely not introduced any such mutation after crossover-2 and -3. Since we mix different kinds of crossover (as explained in section-III A) along with different selection schemes (as discussed in section-II A 3 and III A), there is always the chance that a species exchange is performed, via crossover-1.

Similarity of structures
In order to decide whether a newly found structure was already seen previously during the GA scan, after the local optimization we a) compare the energy of the new structure with that of all the others seen before and b) use a criterion based on the distances between all the atoms' pairs. In practice, we construct a coarse-grained radial distribution function (rdf) of the clusters, consisting of 14 bins conveniently spaced. Each bin contains the (normalized) number of atom pairs whose mutual distance is included between the two distances that define the boundaries of the bin. For each cluster we have then a 14-dimensional rdf-array and the euclidean distance (i.e., the square root of the sum of the squared difference between corresponding elements in the two arrays) between the arrays arranged for two clusters is evaluated. If this distance (note that it is a pure number) is greater than a convenient threshold (we used 0.01), then the structures are considered as different 67 In the opposite case, if also the energies of the two clusters are within 0.005 eV, the two structures are considered as similar.
We notice that this very simple similarity criterion uses a descriptor of the cluster, the 14dimensional rdf-array, that is invariant upon rigid translations and rotations of the cluster, and upon permutation of the atoms of the same species.

Local optimization and early-rejection scheme
We start by noticing that the child-assemblage step is usually a computationally cheap part of the algorithm (not always, see below), because it requires just some I/O manipulation. The local optimization step is the computationally expensive part of the algorithm, in particular at the ab initio level, because one needs several energy and force evaluations.
For a wide range of clusters, fully ab initio based global optimization search is computationally expensive, and becomes prohibitive rapidly for larger cluster sizes. Moreover, if the initial members of the population (randomly generated and then relaxed clusters) are far away from the actual GM, the convergence becomes extremely time-consuming. Therefore, we have adopted a classical FF (namely ReaxFF) 45,46 for performing a computationally inexpensive pre-scan of the PES of the clusters. 68 For DFT-based local optimization, we are using the "trust-radius-method enhanced version of the BFGS optimization algorithm", as implemented in the full-potential, all-electron numerical-atomic-orbitals-based code FHI-aims, 53 which is the code we also chose for the evaluation of energies and forces at the DFT level.
In the low-level settings DFT calculations, forces were converged to less than 0.01 eV/Å, using the van-der-Waals-corrected 47 PBE exchange and correlation functional 49 , henceforth labelled PBE+vdW. The grid settings were set to "light" and the basis set to tier-1 53 .
The main reason of this intermediate step is the implementation of the early-rejection scheme. Although, as shown in sections III B and III C, the geometry and the energy of the structures is not fully converged with PBE+vdW @ low-level settings, we have realized that there is a one-to-one mapping between the structures found at this level and those fully converged. In other words, if two structures are similar (according to the criterion described in section II A 6) with geometries locally optimized with the PBE+vdW @ low-level settings, then they are similar also when the geometries are further optimized at the PBE+vdW @ high-level settings (see below).
In practice, the early-rejection scheme consists in rejecting those structures that, when optimized with low-level settings, result in similar to already known structures or are more than 1.5 eV higher in energy than the current GM. With this (rather conservative) choice, we avoid the risk of rejecting structures that would eventually result in the GM or close to it. Note that child structures that are rejected at this stage because of high energy are not forgotten. For them, the energy at the PBE0+vdW @ high-level settings is in any case calculated (without further optimization) and their fitness evaluated. Thus, there is a chance that they are selected as parents (in particular in the mixed ρ i /ρ j scheme). This PBE0+vdW evaluation is necessary in order to have a comparison with the fully optimized structures using consistent quantities. Furthermore, while passing from PBE to PBE0 can change considerably the relative energies (see below), further optimizing from low-level settings to high-level settings lowers the energy by no more than 0.2 eV. Thus the rejected structure enters the competition for the selection with a fitness evaluated with a large error bar, but still acceptable. On the other hand, it would be unwise to forget such structure, as variety is one necessary ingredient of a successful GA scan.
In the PBE+vdW, high-level settings optimization, atomic forces were converged to less than 10 −6 eV/Å. The grid settings where set to "tight" and the basis set was tier-2 53 .
In cascade, for the structure optimized with PBE+vdW, high-level settings, we evaluate (without further optimization) the PBE0+vdW energy with the tier-2 basis set. This energy is later used for the calculation of fitness of that particular cluster. The difference between binding energy for an isomer optimized with PBE0+vdW forces (tight / tier 1 / forces converged to 10 −5 eV/Å) and the binding energy of the same isomer optimized with PBE+vdW (tight / tier 2 / forces converged to 10 −5 eV/Å), when the energy of both geometries is evaluated via PBE0+vdW (tight / tier 2), is small, i.e. at most 0.04 eV among all cases we checked. The computational cost of the PBE0+vdW further optimization would be thus not worthy (we estimated a gain of up to a factor 2 of overall computational time just by skipping the latter optimization) A two-level scheme, reminiscent of our cascade approach, was already introduced in Ref. 44, but our approach goes beyond that, e.g., by including the initial pre-screening of structures performed through FF-GA. Furthermore, the fine tuning of the several levels of our cascade approach (choice of the functionals, of the levels of increasing accuracy) is here motivated by carefully selected benchmark tests, discussed in section II B.
8. Fixed-spin scheme and perturbative PBE0, based on PBE orbitals An alternative to our scheme, where the spin is left free to evolve during the electronicstructure optimization, is to perform several parallel and independent searches at a given stoichiometry with different fixed spins. In this way, the spin at all levels along the cascade would be fixed, in particular PBE and PBE0 electronic structure calculations (with the same geometry) will have the same spin (as well as the same geometry). This feature allows for a fascinating possible short-cut for the estimate of the PBE0 (or, in general, of the hybrid xc-functional) energy, i.e., calculating PBE0 xc energy by using PBE orbitals, without performing the self-consistent field (scf) optimization. In practice, this would be a perturbative PBE0 calculation, using the PBE orbitals 69 as starting point. Given the perturbative nature of this energy evaluation, the spin of converged PBE and perturbative PBE0 must agree, otherwise the orbitals would be too different.
Following this approach, provided that the non-scf PBE0 (relative) energies are a good approximation of the converged PBE0 energies, the advantage in terms of CPU time is evident.
However, a) the fact that perturbative PBE0 yields reliable relative energies among the different isomers must be tested for each system and b) the speed up for the single PBE0 calculation is obtained at the expense of the necessity of performing more than one fixed spin GA for each stoichiometry, in fact c) it is not always easy to estimate the full list of possible low-energy spin states for a given stoichiometry. In summary, for those stoichiometric clusters for which no isomers with degenerate spin are found, the fixed spin + perturbative PBE0 is a practical route with computational cost lower than the variable-spin alternative. Therefore, the perturbative PBE0 energies are reliable enough for the GA scheme, in terms of energy-based likelihood of selection of isomers for the crossover. However, accurate converged PBE0 energetics must be evaluated at the end of the GA scan, although only for the low energy clusters.
For all the other cases, the large absolute errors make it impossible to rely on the perturbative-PBE0 energetics for constructing the fitness function. Furthermore, for the non-stoichiometric clusters, we observe a systematic lower number of iterations for converging the electronic structure when the spin is left free to evolve, compared to the fixed-spin case. In other words, for non-stoichiometric Mg M O x clusters, the free-spin scf optimization is more efficient that the fixed-spin one.

Parallelization
As it should be clear form the analysis of the GA algorithm steps, the operation of selecting from the genetic pool two structures for the mating and the subsequent local optimization of the child, is an operation that can be performed at any moment also when a local optimization of a child is already running. The algorithm is thus suitable for a very efficient parallelization.
It should be noted here that FHI-aims is very efficiently parallelized 54

Global convergence
A robust criterion for the convergence of a global scanning of a high-dimensional PES does not exist. An operative criterion is to continue the search until, for a "long while", no new structure with better fitness than the current optimal one is found. As "long while", one can set a time that is several times the time employed to find the current optimal structures.
In the present work we scanned for at least double the time needed to find the actual GM.
A GA algorithm applied to structure optimization is thus configured as a sequence of local optimizations, followed by large jumps in the configurational space (the generation of the child by combining two parent structures). The goal of the scanning is to find the GM, but also a large set if not all structures energetically close to the GM. In fact, it is not necessary to know a single GM structure, but also whether other nearly degenerate structures are present. The underlying assumption behind the crossover scheme is that that piecewise the geometrically localized features of the high fit structures (the target of the search) can be randomly formed during the scan (already present in the random initial pool or randomly hit by mutation) and that those "winning" local features have a high chance of being propagated, i.e., structures containing those local features have higher fit and have high chance to be selected for generating a new structure. By "local feature" we literally mean the local arrangement (e.g., bond distances and angles with first nearest neighbours) of an atom or a group of atoms.

Mechanical stability of the structures (harmonic analysis)
After the DFT-GA search is concluded, we perform a calculation of the harmonic vibrations of each structure within an energy window of 1 eV from the GM. This step has two purposes. The immediate purpose is to identify unstable structures, i.e., structures having imaginary vibrational frequencies. The other purpose is to store the frequencies for the stable structures for the evaluation of the vibrational free energy (see section II D). In case a structure is found unstable, we perturb its geometry along the unstable mode (i.e., the mode related to the imaginary frequency) and then proceed by locally optimizing it. 71 Within FHI-aims, harmonic vibrational frequencies are computed via finite differences of the analytic forces. It should be mentioned here that for MgO clusters we have found many structures that, at the usual recommended level of accuracy (really-tight grid settings, forces converged to 10 −6 eV/Å), showed soft modes mixed with the rigid-body motions. By inspection, these soft modes were identified as finite rotations of moieties such as O 2 with respect to the rest of the clusters. In order to numerically resolve the soft modes, we found that a force convergence criterion at 10 −7 eV/Å is needed as well as grid tighter than reallytight. The latter was achieved by setting a radial multiplier 53 equal to 4 (instead of the default 2) on top of the really-tight grid settings.
where β i,j = 1/k B T i,j and E i,j are the total energies of the two selected structures. This rule ensures that the sampling remains canonical at all temperatures.
In REMD, the high temperature replicas allow the sampling to quickly reach regions in the We have ensured that after finding the GM, the system was subsequently run for sufficiently long time and no new low-energy structure was found. In practice, the GM was typically found within 10 ns, i.e., the total run was at least 10 times longer than the time needed to find the GM.
Here, the purpose of our REMD sampling is to find the geometrically optimized local minima. Therefore, we have locally optimized structures coming from the low temperature replicas (thereby closer to the local minima), by extracting them at constant (frequent) intervals. Afterwards, we applied the same geometrical criterion for sorting out the similar structures as for cGA and thus constructed a pool of low-energy local minima.
The reason for the reliability granted to this scheme is that REMD samples the system without any bias, exploiting the dynamics of the system i.e., following the integration of the equations of motion. Only if the system is not ergodic (loosely speaking, its configurational space can be partitioned into disconnected regions, such that no trajectory at finite temperature can go from one such region to another), REMD is not able to explore the whole configurational space. The highest temperatures of the REMD sampling were chosen so that the cluster were molten; actually we went to near vaporization and we avoided that occasional vaporization of the clusters scattered the atoms by imposing confining walls 72 .
The result of the comparison between REMD and our cGA are reported in section III A. In order to clarify these issues, we performed two tests. In section III D we show the results for Mg 2 O x clusters, i.e., the same system on which we base most of the tutorial sections on phase diagrams.
In the first test, the lowest-energy FF-GA structures were relaxed at DFT level and the latter structures were compared to the structures found by DFT-GA for the same system.
In the second test, we compared the structures obtained with FF/DFT-GA (i.e., DFT-GA by starting with structures previously obtained by FF-GA) with DFT-only GA (i.e., DFT-GA by starting from random structures).

D. Ab initio Atomistic Thermodynamics
The determination of (meta)stable structures by means of ab initio thermodynamics has been introduced elsewhere in its original formulation for defects in semiconductors [35][36][37] , adsorption on metal-oxide surfaces 38,39 , and gas phase clusters 40,59 . Here we briefly review this method, adapting the notation to the case of Mg clusters immersed in an atmosphere composed of gas-phase O 2 . This ligand may adsorb onto the clusters: The Where µ O = 1 2 µ O 2 . Therefore, those structures that minimize Eq. 5 at given (T, µ O 2 ) will be the preferred ones at those experimental conditions. The application of ai AT to study Mg M O x clusters proceeds along the following steps: (i) Given a number of Mg atoms M , for all oxygen compositions calculate the free energy of the low-energy isomers as found in the GA scan. (ii) Compare the relative thermodynamical stability of those structures ( both stoichiometric and non-stoichiometric compositions), as a function of (T, p), i.e., construct a phase diagrams. (iii) Identify the most relevant composition that comes out to be the most stable at a given experimental condition (T, p). 40 All the free-energy terms in Eq. 5 can be evaluated at the ab initio level, below we show in detail how.
In order to evaluate ∆G f (T, p O 2 ) from Eq. 5, one needs to compute the free energies of the cluster+ligands and of the pristine cluster, and the chemical potential of oxygen. For gas-phase clusters, this can be done in terms of their corresponding partition 60 functions that requires the consideration of translational, rotational, vibrational, electronic, and configurational degrees of freedom. 34,40 . In summary, the free energy of the cluster is calculated as : 73 where E DFT is the DFT total energy, m is the mass, I A,B,C are the three inertia moments of the cluster, the ν i are the harmonic vibrational frequencies, M is the spin multiplicity, and σ is the symmetry number. Here, a term ln V (V is a reference volume), which appears in F translational is dropped, due to the fact that it cancels out when taking the difference The chemical potential of oxygen is calculated as: where ν OO is the O-O stretching frequency. The latter relation allows for the one-to-one mapping of the reactant's (oxygen) chemical potential to its partial pressure.

A. FF-GA vs FF-REMD simulation: Validation of (FF-)GA scanning
In Fig. 1 we plot the total number of force evaluations needed by two different global PES scanning methods (namely, FF-REMD and FF-GA, respectively) to locate the same global minimum. Unsurprisingly, we see in Fig. 1 that to find the GM, GA takes a significantly smaller number of force evaluations than REMD 74 . It is important to mention here that the average number of force evaluations to get one minimum using FF-REMD could depend on the provided initial random structures. In this case we have given the same initial structures to both GA and REMD. In order to alleviate the effect of the initial random structures on REMD output, we have tried three completely different set of initial random structures and performed 100ns REMD run each. What we show in Fig. 1 is the average over these three initial sets. Note however, that the order of magnitude of the number of force evaluations (i.e., MD steps) needed to find the global minimum was found to not change with the initial conditions. We have compared the performance of different FF-GA schemes among themselves and against REMD. For all the FF-GA schemes, we set a maximum number of force evaluations to 2 × 10 6 . We have found that for systems like Mg 3 O 9 , i.e., with very high O-coverage, the GM is found using crossover-1 (as discussed in methods section), but not with crossover-2 and -3 alone, as shown in the top two panles of Fig. 2. For these plots, we ran FF-GA for 2 × 10 6 force evaluations and we checked whether the GM was indeed found by comparing to 100 ns long REMD scan. In some other cases (e.g., Mg 10 O 10 ), this situation is changed.
Here, crossover-2 could find the global minimum, while crossover-1 and -3 could not. Of course, when we say that the GM is not found by some crossover operator, it means that it is not found within the defined number of force evaluations. It may well be found over (much) longer times. However, what we point out is that one of the schemes finds the GM within the given time, while the other(s) do not, over a time that is at least twice longer than the time required by the scheme that finds the GM to find it. The lesson we learn from this comparisons is that one should then combine crossover operators in order to enhance the chance to find the GM.
We note that when designing the GA scheme, one needs to realize that also the average time spent in FF-GA for generating new candidate structures is not the same for different crossover schemes. For instance, in crossover-1 there is no rejection of candidate structures, because the stoichiometry is forced by construction, while crossover-2 may use several trials before hitting the right stoichiometry. The additional time required to obtain a successful candidate using crossover-2 becomes increasingly large with cluster size, in particular for very unbalanced stoichiometries, i.e., number of oxygen (much) larger or smaller than the number of magnesium atoms. In the case of FF-GA, the time spent in creating a successful candidate with crossover-2 becomes so large that it is even comparable to the time spent in the subsequent local optimization. In Fig. 3, we have compared the number of force evaluations performed by crossover-1 and crossover-2 after a given CPU time, for a FF-GA scan. It is clear that crossover-2 always performs less number of force evaluations, irrespective of size of the cluster. The additional time needed by FF-GA using crossover-2 is due to many rejection moves when the correct stoichiometry is not matched after performing the cut-and-splice operation. In the case of DFT-GA this problem is less important, since force evaluations are orders of magnitude more expensive than for FF-GA. Since however any single replica in DFT-GA is performed on several CPUs, if the crossover move is performed on a single CPU then also in this case the time spent in blindly producing the right stoichiometry may be noticeable.
The choice of the crossover scheme is not the only tuning parameter that determines the efficiency of the GA scheme. Another crucial parameter is the selection criterion. If the parent clusters are selected for the crossover with probability proportional to their fitness function, and many similar structures populate the high fitness (low energy) region of the genetic pool, then the selection always fishes among very similar structures and escaping from the basin may be very difficult, also with different mutation schemes. The situation in which there are many similar structures with similar energy is not unusual for clusters.
In facts, we have observed it in many cases. We have investigated the benefits of adding the possibility for selecting low fitness function structures, in order to ensure more diversity in the crossover scheme. We show the results for the case of Mg 18 O 18 . In practice, since the fitness function has values between 0 and 1, we have defined a complementary fitness function,ρ i , such that the higher-energy structures have high likelihood to be selected, and vice versa for the low-energy structures. We have tried a scheme with 50% likelihood to select two structures according to the fitness function ρ i , and 50% probability that one structure i is selected by using the usual fitness function ρ i and the other structure j by using the complementary fitness functionρ j = 1 − ρ j . Another scheme uses 10% of the ρ i /ρ j selection criterion and 90% of the usual ρ i /ρ j criterion. These two variants are compared to the usual scheme (100% the ρ i /ρ j criterion), using REMD as reference for the GM. The results are shown in Fig. 2, bottom panel. We find that, while neither the normal nor the 50%-mixing scheme is able to find the global minimum in the allotted time, the 10% scheme eventually finds the global minimum as predicted by REMD. We conclude that it is beneficial that a small percent of high energy structures are used to form new candidate structures.

B. Ab initio GA and DFT functionals
As already mentioned in section II A, we performed a cascade algorithm, using increasing level of accuracy. The preliminary scan is performed using FF-GA. The set of lowest bindingenergy structures, within 4 eV from the GM, obtained from FF-GA is used as initial pool for a DFT-based GA structure search. To give a flavor, this threshold selects for the DFT-GA initial pool 20 structures for a small system like MgO 4 and 300 for the largest system we report here, Mg 10 O 10 . In practice, the selected structures are at first optimized at the DFT level. In some cases (e.g., for Mg 10 O 10 ), this operation already yields the actual GM, as confirmed by our DFT-GA and also found in other studies for stoichiometric clusters. 11,12 But, in most of the cases the lowest energy of such structures turned out to be more than 2 eV higher in energy than the actual global minimum.
In Fig. 4 we elucidate the reason why we need to go down to the highest level of ac- Here we underline that our GA performs the optimization in the coordinate and spin space at the same time. We find that while stoichiometric clusters are always singlet in the ground state, non-stoichiometric clusters exist typically in different spin states, almost degenerate. This delicate issue is the object of a different publication 7 . is defined as: where E(MgO x ), E(Mg) and E(O 2 ) are the total energies of MgO x , Mg and O 2 , respectively. At each stoichiometry, the same geometry was used ofr all functional, namely the PBE0+vdW GM.
where for both MgO x and MgO x−2 the GM structures were used.
We find that PBE+vdW strongly overestimates the adsorption energy at larger x, resulting in a qualitatively incorrect prediction that O 2 adsorption would be favored over desorption up to a large excess of oxygen. Such behavior is not confirmed by hybrid func- This is a remarkable failure of a strategy which post-optimizes via DFT a set of FF structures. We further analyze this aspect with the aid of the bottom panel in Fig. 8. The GM as found by DFT (structure A) is quite different from the second most stable in (DFT) energy hierarchy (B). The latter is also found by optimizing some of the low-energy FF structures. In order to show that structure A cannot be found by post-optimizing via DFT FF-optimized structures, we have performed the following test. Isomer A was optimized via FF. This gave isomer C'. The latter was then optimized via DFT. This gave isomer C, which is indistinguishable from C' upon visual inspection, but rather different from A. We conclude that the FF used in the example is totally blind to Mg 2 O 6 DFT-GM. Note that for those "DFT-GA" structures that are also found by minimizing "reaxFF-GA" structures, The tests were performed also at other sizes, but all the main features are shown in the presented example. In particular reaxFF is in general blind to the DFT-GM (in the sense clarified above) for non-stoichiometric structures. This example is strictly valid only for reaxFF, so one could think in principle that a way to avoid DFT-GA is to use a better potential or to fit a potential on-the-fly with a training-set based on the DFT structures already found. The caveat we put forward, though, is that for clusters and in particular small clusters, a classical potential may be simply never be able  In a pure oxygen environment, from Eq. 5, we see the µ O variations (i.e., ∆µ O ) is restricted to a finite range and the oxide will decompose into bulk cluster and oxygen, if ∆µ O is less than the so called "O-poor limit". 39 Therefore, the oxide is only stable if the following relation is valid: We now rewrite µ O with respect to µ ref as below: where being the zero point energy of oxygen molecule. Thus, Eq. 10 becomes: In the (T, p O 2 ) phase diagrams, the "O-poor" limit is the boundary between Mg The "O-rich limit" 39 , ∆µ O = 0, refers to a condition where oxygen gas is in equilibrium with O 2 droplets condensed on the clusters. This leads to the other restriction: Combining the above two equations we get the range of ∆µ O : From Eq. 7, we get the another expression for ∆µ O as a function of T and p O 2 : I.e., by using µ O 2 (T, p O 2 ) as expressed in Eq. 7: Since, as we see from this above expression, oxygen condensation point depends on the chosen temperature and pressure, the "O-rich" line is not a straight line. The limiting boundary, ∆µ O = 0, is shown as a black line in Fig. 9.

G. Coexistence regions and reactivity of the clusters
In the phase diagram as shown in Fig. 9, the unlabeled regions in sand color are those where clusters with different stoichiometry are found with free-energy difference less than 3k B T . When the difference in free energy between two structures is 3k B T , the ratio of the Boltzmann factors is ∼ 20, i.e., the system is 20 times more often in the low-energy state that in the high energy one. If these two are the only states, this ratio is reflected in a population of ∼ 5% in the high energy state and ∼ 95% in the low-energy one. The value of 3k B T was indeed chosen in order to have a significant population of the high energy state. This is one manifestation of the unusual properties of matter at the (sub-)nanoscale. At the thermodynamic limit, coexistence regions are collapsed to a line, while for systems with few degrees of freedom, these are typically extended regions. The compositions competing in a given coexistence region are at least those adjacent to such region. However, more compositions can contribute to the population. In order to find them, one has to resort to a more detailed diagram, like those of Figs. 9.
Besides the regions where different stoichiometries compete, we have also marked those region at fixed stoichiometry where two or more isomers compete.
In both cases the dynamic transformation between structures (the transition barriers and rates among structures will be assessed elsewhere, but we have noticed that these barriers are typically quite low for small clusters) may enhance the reactivity of the involved clusters. If a second reactant (i.e., besides oxygen) is added to the gas phase after equilibrium between Mg clusters and oxygen is achieved, in the coexistence regions the new reactant will find a highly dynamical environment. Extending the observations of Reuter and Scheffler 63 made on extended surfaces, it is therefore arguable that the reactivity of the clusters towards the new reactant will be enhanced in the coexistence regions.

IV. CONCLUSIONS
In summary, we introduce and thoroughly benchmark a methodology for the efficient and accurate scanning of ab initio free-energy surfaces.
We demonstrate our methodology by studying gas-phase metal-oxide clusters in thermodynamic equilibrium in an oxygen atmosphere. This choice is representative of a class of systems for which extensive and very accurate knowledge of the potential-energy surface is mandatory for meaningful predictions.
At first, we scan the potential energy surface of the clusters at different stoichiometries by employing a massive-parallel "cascade" genetic algorithm (cGA), together with ab initio atomistic thermodynamics (ai AT The cGA framework is thoroughly validated by comparing it with a reliable and unbiased scheme for the sampling of the PES, i.e., replica-exchange molecular dynamics.
We find that only with certain choices of the parameters the cGA scheme is able to always locate the GM. In particular, we have shown the benefits derived from using mixed schemes for selection, crossover, and mutation operations. Alternative and possibly more efficient selection, crossover, and mutation operations, in particular beyond the fixed composition constraint are under study in our group.
The cGA output structures are then examined by applying the concept of ab initio atomistic thermodynamics aiAT in order to analyze the temperature and pressure dependence of the composition, structure, and stability of the various isomers for each size of the clusters.
In order to exemplify our methodology, we have applied it to the case of small Mg M clusters in oxygen atmosphere.
The methodology we introduce is a necessary step in order to understand the (meta-)stability of the clusters in view of their employment for heterogeneous catalysis.
Here, we show as an example that Mg 2 O x cluster display large areas in the (p, T ) phase diagrams at various size in which clusters with different compositions and/or structures can coexist. These regions are suggested as promising regions for an enhanced reactivity of the ensemble of clusters. In Fig. 11 we show, for the case of Mg 2 O x the effect of the inclusion of the terms F translational , F rotational , and F vibrational in the free energy of the clusters (see Eqn.6). We find that neglecting all terms, or including only some of those, results only in slight changes in the phase diagrams, comparable to the differences between PBE0+vdW and beyond-RPA diagrams. Focusing on the effect of including or not F vibrational , we note that a minor shift of the stability regions of the different compositions. This shift is such that higher stoichiometries become stable at lower temperatures and higher pressures when F vibrational is switched on (i.e., stability boundaries shift "up and to the left" in the plots). This is due to the fact that the adsorption of one extra oxygen molecule causes a lowering of F vibrational as new vibrational modes are added to the system. We also considered the exclusion of F vibrational for Mg 10 O x and also in this case the perturbation to the phase diagram was barely visible, but in the same direction as for Mg 2 O x . Thus, the observation does not change with the size of the cluster. In this paper, we consider only the case of harmonic contributions to the vibrational free energy. This is not always a good approximation 40 , in particular at higher temperatures. Strictly relying on harmonic free-energy is thus a possible limitation of our methodology: An efficient strategy when large anharmonic contributions are present, but the structures do not melt, is to evaluate the excess anharmonic free energy through thermodynamic integration, as outlined in Ref. 7 . This approach will be analyzed in detail elsewhere.

Effect of O 2 -binding-energy accuracy
As can be seen in Fig. 6 The check of the stability of a structure could be performed also within the GA scan, namely between step 6.e and 6.f. In this case, one would accept stable structures and proceed, while unstable structures would be perturbed along the unstable mode and further geometry optimization would be performed, until a local minimum is found. However, the evaluation of the harmonic frequencies, at the necessary accuracy, is extremely expensive, and it is more convenient to accept some possible saddle points in the pool and perform the stability analysis afterwards. 72 In practice we imposed a smooth confining potential that is activated only when the distance of one atom from the center of mass of the whole system is larger than a certain relatively large distance, namely 3 times the typical bond distance multiplied by the cubic root of the number of atoms. With the idea that particles stay within a sphere that grows with a volume proportional to the number of atoms 73 The expression for F rotational is valid for non-linear structures; for the (few, in our case) linear structures, it is modified into: −k B T ln( 8π 2 I A k B T h 2 ) 74 REMD gives much more information than the local minima 40 . This information was not used