MeGen - generation of gallium metal clusters using reinforcement learning

The generation of low-energy 3D structures of metal clusters depends on the efficiency of the search algorithm and the accuracy of inter-atomic interaction description. In this work, we formulate the search algorithm as a reinforcement learning (RL) problem. Concisely, we propose a novel actor-critic architecture that generates low-lying isomers of metal clusters at a fraction of computational cost than conventional methods. Our RL-based search algorithm uses a previously developed DART model as a reward function to describe the inter-atomic interactions to validate predicted structures. Using the DART model as a reward function incentivizes the RL model to generate low-energy structures and helps generate valid structures. We demonstrate the advantages of our approach over conventional methods for scanning local minima on potential energy surface. Our approach not only generates isomer of gallium clusters at a minimal computational cost but also predicts isomer families that were not discovered through previous density-functional theory (DFT)-based approaches.


Introduction
Metal clusters are aggregates of atoms, and they exhibit substantially different properties from their corresponding bulk metals [1,2]. They also differ from the nano-particles, and their extreme size sensitivity reflects in almost all their properties like melting point, reactivity, etc [3][4][5]. Clusters, as opposed to their bulk counterpart, have numerous local minima, and the number of such minima increases exponentially with the size of the cluster. It has been demonstrated that many properties of these clusters depend on their ground state (GS) structure. Hence, there has been a constant drive to develop better algorithms for the search of GS and energetically low-lying structures [6]. For example, random search provides an unbiased configuration sampling [7] while genetic algorithm combines and propagates useful structural markers [8,9]. Basin-hopping [10,11] represents an efficient technique for escaping from local minima and mapping the potential energy surface (PES) and particle swarm optimization [12,13] relies on the population information for navigating the energy landscape. However, when these techniques are combined with ab-initio methods for the description of inter-atomic interaction, they become computationally expensive and result in an insufficient sampling of PES [14,15]. On the other hand, classical potentials fall short of describing the problem with enough accuracy. Further, the search of atomic structure will be biased with the range or form of the potential used [16]. Hence faster and more reliable models need to be develop for searching the GS and low-lying isomers of metallic clusters.
Recent machine learning applications have accelerated the search for new molecules with specific desired properties. Much progress has been made in developing deep generative models for 2D molecule generation. Such as Kusner et al [17], Gómez-Bombarelli et al [18], Simonovsky and Komodakis [19], Jin et al [20], Dai et al [21] used variational autoencoder (VAE) and its variants for generation of molecules, Guimaraes et al [22], De Cao and Kipf [23] used generative adversarial network (GAN), Segler et al [24] used recurrent neural network (RNN), Popova et al [25] used reinforcement learning (RL)-RNN and Maziarka et al [26] developed Mol-cycle-GAN for generation of molecules. Most of these studies use SMILES string representation or graph representation of molecules. SMILES have various limitations, such as not capturing molecular similarity; for example, two similar chemical structures may have very different SMILES representations, hindering VAEs from learning smooth molecular embeddings. Furthermore, methods that use SMILES to generate SMILES strings are prone to generate strings that do not correspond to a valid molecule. Hence Jin et al [20], Simonovsky and Komodakis [19], Maziarka et al [26] have shown that operating directly on graphs helps generate more valid molecules.
The domain of 3D structure generation is lacking in comparison to 2D structure generation. Moreover, most of the work in 2D structure generation is SMILES string or graph-based, where an atom is considered a node, and the bond between the atom is the edge. One of the major disadvantages of string or graph-based representation is that they do not contain the inter-atomic distance or 3D information about the molecule. In contrast, chemical properties depend on the 3D structure of the molecule. Recently efforts have been made to develop a model to generate 3D structures and bias the structure generation based on chemical property. One such attempt would be to combine the method described by Mansimov et al [27] to sample 3D conformers given a molecule with graph-based generative models to generate 3D structures in a two-step procedure. Gebauer et al [28] developed an autoregressive deep generative model based on SchNet [29] which is capable of generating a variety of C 7 O 2 H 10 3D structures which are close to equilibrium. Gebauer et al [30] made improvements in the previous model to develop G-SchNet, which can deal with an arbitrary number of atom species. In inorganic chemistry, the 3D structure generation model is even less common than it is in organic chemistry. Notable mentions are iMaTGen by Noh et al [31], which learns latent space of inorganic structures using 3D images as input to generate crystal structures Kim et al [32], who is also one of the authors of iMatgen proposed a new generative model to generate crystal structures using GAN; they used a coordinate-based inversion-free crystal representation inspired by point clouds [33]. Nouira et al [34] developed CrystalGAN to produce stable ternary structures by using GAN. Court et al [35] introduced a deep-representation learning autoencoder-based generative pipeline for geometrically optimized 3D crystal structures, which also predicts the values of target properties.
In this work, we propose a 3D structure generator model named as MeGen (Metal cluster structure Generation). MeGen generates energetically low-lying 3D structures of Ga clusters in Cartesian coordinates using RL. Our model exploits the rotationally covariant state representation for 3D structure generation. We integrate this state representation into an actor-critic neural network architecture with a rotationally covariant auto-regressive policy, where the position of the atom to be placed is modeled through a flexible distribution based on spherical harmonics. The reward function is based on a fundamental physical property (energy). We use the previously developed machine learning (ML) model DART [36] to calculate the energies of various conformers/isomers generated during training. Contrary to other models that use structural information, our DART model uses features that capture topological/connectivity information to predict metallic clusters energies. As the descriptor consists of just the atom counts in each shell, it captures topological information and is known as topological atomic descriptor. DART model provides accuracy comparable to DFT at minimal computational cost. In a nutshell, MeGen is an RL-based algorithm that generates low-energy 3D structures of metal clusters with biased structure generation towards low-energy structures. MeGen employs DART-predicted energy to evaluate the generated structures. Including DART reduces the number of compute-intensive optimizations and adds a bias in the model's learning by rewarding low-energy generated structures. This makes MeGen more efficient than algorithms like basin hopping, random search, and genetic evolutions that do not have such a bias and require local structure optimization.
Our main contributions are as follows: 1. We introduce a new dataset called as Gallium Neutral Clusters, GNC version 2.0. The dataset comprises of optimized, low energy isomers of gallium clusters with size ranging from 2 atoms to 70 atoms and their corresponding energies. 2. We propose a model based on RL for generation of structures in Cartesian coordinates. 3. We accelerate the workflow of generating low energy isomers of gallium clusters and in the process we also hit the GS. 4. We demonstrate that this method generates novel structural motifs along with previously reported ones and increases the probability of finding new unique low-lying isomers for clusters.  Figure 1 provides the schmatic representation of the overall workflow proposed in this manuscript. This section describes various components of the proposed framework as shown in figure 2 to generate 3D structures in Cartesian coordinates for gallium clusters.

Dataset
In our previous work, DART is trained over GNC_31-70 data-set to predict energy for a give Ga cluster. The GNC_31-70 comprised of low energy isomers and their corresponding energies with sizes ranging from 31 to 70 atoms [36]. In this work, we added new optimized geometries of neutral Ga n clusters (size n = 2-31) to data-set GNC_31-70. The new data-set called GNC version 2.0., comprises of optimize low energy isomers of gallium clusters with size ranging from 2 to 70 atoms. The dataset has total of 8166 structures of gallium cluster. This new data set GNC 2.0. is used to train the DART which is integrated as a reward function in MeGen model. All the calculations were carried out within the Kohn-Sham formulation of DFT. Projector augmented wave potential [37,38] was used, with Perdew-Burke-Ernzerhof approximation [39,40] for the exchange-correlation and generalized gradient approximation, as implemented in planewave, pseudopotential based code, Vienna ab initio simulation package (VASP) [41][42][43].

RL formulation
RL formulations, in general, have an agent which interacts with the environment to maximize its reward. In principle, the formulation can be described as a Markov decision process (MDP).
• State s ∈ S, S is set of all possible states.
• Action a ∈ A, A is the set of possible actions.
By solving the RL problem, we aim to learn a stochastic policy π : S × A −→ R, which is a conditional probability density over actions given the current state, such that the expected cumulative reward is maximized. In other words, we aim to find a mapping from states to action, called a policy (denoted as π), with maximum achievable total reward. Most often, the policy π is parameterized using a neural network.
Similar to Simm et al [44], we begin with formulating 3D structure generation as a RL problem. As shown in figure 3, we generate gallium clusters by adding a single atom from the bag B to the existing seed structure of the gallium cluster present on the 3D canvas C. In principle, we are generating N atoms cluster from N − 1 atoms seed cluster already present on canvas C. The idea is to allow the model to learn the process of addition of a single atom to N − 1 atom cluster to generate N atom cluster. The reward function R a (s, s ′ ) is given in equation (4) as the negative of the energy calculated using DART model [36]. This allows the model to learn the most efficient ways of adding atoms in the presence of different structural motifs across different sizes/classes of gallium clusters to get low-energy gallium clusters. This will also allow the  model to access the structural motifs, which are hard to find in some classes but easily found in others. As introduced by Simm et al [44] to model action a using policy π, we need to model sub-action such as given a state s choose focal atom f to which new atom type e will be added at a particular distance d and orientation x as given in equation (1), (1) In our case equation (1) gets modified to equation (2). This is because we randomly choose focal atom f and atom type e of next atom is always gallium, π(a/s) = π(x, d, f |s) = p(x|d, f, s) p(d|f, s). (2)

State representation
As shown in figure 4, we concatenate a vectorized representation of the bag B with each atom on the canvas C. This concatenated representation is fed to the state embedding network CORMORANT [45], which then transforms bag B and canvas C to get a rotationally covariant representation s cov . We further obtain s inv from s cov by applying a combination of transformations known as τ inv from Anderson et al [45]. An invariant representation s inv is essential as the choice of distance d between the focal atom and the new atom must be invariant to rotation and translation. The position of the new atom w.r.t the focal atom needs to be covariant under rotation and dependents on distance d. Therefore we condition s cov on distance d to preserve rotational covariance (see figure 5). The number of channel τ is set to four (τ 1 , τ 2 , τ 3 , τ 4 ) in our work. Starting from seed structures, we choose one seed structure at random. We use CORMORANT to obtain rotation-covariant (scov) and invariant(s inv ) state representations. τ1, τ2, τ3, τ4 are four channels and τ inv is the combination of learnable transformation taken from Anderson et al [45] to obtain invariants from covariant features.

Actor-critic model
As shown in figure 5, the choice of focal atom f is random such that there is enough exploration and an increased number of gallium clusters generated with different structural motifs. Since the bag B contains only gallium atoms, there is not much choice in choosing the atom type for the next atom e to be placed on canvas C. The choice of distance d between the focal atom and the new atom should be invariant to rotation and translation; hence to achieve this, we use s f,e inv and a mixture density network (MDN). Distance distribution between the focal atom and the next atom to be placed is modeled using Gaussian mixtures as shown in equation (3), where π m is the mixing coefficient of the mth Gaussian N (µ m , σ 2 m ). The mixing coefficients π m , and means µ m are predicted using a MDN as shown in figure 5. The standard deviations σ m are global parameters. We clip values below zero to ensure that the sampled distances are positive, As shown in figure 5 the orientation x of the atom depends on the chosen distance d hence we condition s cov f,e on distance d to obtain orientation x. The critic needs to compute a value V for the state s that is invariant under translation and rotation.

Reward calculator-DART Model
We use DART model [36], which predicts the energy of gallium clusters as reward function R a (s, We use proximal policy optimization (PPO) to learn the optimal policy. To increase the exploration, we randomly choose a focal atom. We further have distance-based restrictions on the placement of new atom. We calculate the three nearest neighbors (nns) of the new atom. The first nn should be between distance 2.3-3.1 Å. The second nn should be between 2.5 and 3.1 Å, and the third nn should be between 2.5 and 5.5 Å. The rationale behind choosing the distance cutoffs is based on the Ga-Ga pairwise distance and the min-max observed for each of these nns. For example, the first nns min-max was 2.3-3.1 Å. The Ga-Ga pairwise distances were calculated over a range of Ga clusters. If these distance restrictions are violated, we give a penalty of −10.

Computational resources
MeGen is implemented using PyTorch [46]. It took us 6 h to train the MeGen model using a single compute node of our high-performance computing (HPC), which has Nvidia RTX-2080 Ti GPU and Intel 40 cores/CPUs (central processing unit). Once the model is trained, generating structures takes a few minutes. Further filtering and DFT optimizations of the generated structures take a day on an HPC consisting of 4 nodes, each equipped with 16 cores/CPUs. In principle, the time required to obtain low-energy structures of the clusters using a conventional DFT-based search method depends on the number of atoms in a cluster. For example, with the existing computational power, optimizing a Ga cluster with about 50-70 atoms takes about an hour or two. However, it took us about two months to generate low-energy structures of Ga2-Ga30 on an HPC consisting of 4 nodes, each equipped with 16 cores/CPUs. The computational time will only increase with the increase in the cluster size. Whereas the computational time for our workflow is independent of cluster size; hence there will be hardly an increase in the computational time.
Removal of duplicate structures from MeGen generated structures in a non-trivial task. To capture the relative positions and orientations of the atoms in a structure we calculate bond-lengths (pairwise-distance) and angle. On the basis of these bond-lengths and angle we remove duplicate structures. Removal of duplicate structures is performed as follows: 1. Calculate all pairwise distances. We get upd = N×(N−1) 2 , number of unique pairwise-distance for N atom cluster. 2. Calculate bond cutoff as shown in equation (5) where b i 1 and b i 2 are the ith bonds of structure 1 and 2, respectively, if B c <= 0 we consider the structures to be similar and place them in same group and pick only one structure from each group thereby removing the duplicate structure. 3. Further we remove duplicates using angles. 4. Calculate the angle (x) between the atom of interest and its two nns. 5. Do step 4 for all the N atoms in the cluster. 6. Sort the angles (x) and calculate cutoff angle (A c ) as shown in equation (6) where x i 1 and x i 2 are the ith angles of structure 1 and 2, respectively. 7. For an N atom cluster if A c < N Å we consider the structures to be similar and place them in same group and pick only one structure from each group thereby removing the duplicate structure,

Results
In our previous work, we trained a DART model to predict energy for a given structure. In this work, we have developed a model which predicts structures of (N) atom clusters based on information about (N − 1) atom clusters. Our MeGen model could be employed in two ways. Isomers for an (N) atom cluster could be generated using isomers of (N − 1) atom cluster as seed. Alternatively, isomers for N atom clusters can be generated from scratch, i.e. starting from two atom clusters generating three atom clusters, which would be used as the seed structures to generate four atom clusters and so on and so forth. In what follows, we demonstrate both these approaches.
To demonstrate structure generation using seed structure, MeGen is employed to predict structures of four different sizes Ga [8,18,30,57] from seed structures of Ga [7,17,29,56] respectively. The number of unique structures generated from DFT and MeGen are tabulated in table 1. The number of structures generated from our model is significantly more than the one generated through DFT based search and MeGen has always hit the GS for all sizes. Further in figure 6 we compare structural motifs of isomers generated through DFT vs that of MeGen model for Ga 8 . As can be seen in both approaches GS is identical. Even for a small size like Ga 8 , MeGen model predicts some structural motifs which are missed out by DFT based search. Further the high energy isomers of Ga 8 are observed as building blocks of larger clusters with more than 30 atoms [47].   In our second experiment we generated structures of Ga 11 from seed structures of Ga 10 . However to generate isomers of Ga 12 instead of using DFT generated structures we have used MeGen predicted optimize structures of Ga 11 as seed. This was repeated till Ga 15 , where seed structures were also generated from MeGen. Table 2 demonstrate the number of unique isomers generated through DFT vs MeGen model. As was the previous case number of generated low energy isomers through MeGen is much more than that of DFT.
Further we also investigated distinct structural families present in these unique structures. Our analysis clearly demonstrate that our model has captured structural motifs which were not captured previously through DFT simulations. For example DFT based search has predicted only two distinct structural motifs for Ga 11 where as MeGen has predicted one more. For next three sizes from Ga 12 to Ga 14 only one structural motif was captured in DFT based searches where as MeGen has predicted many more distinct structural motifs. All these structural motifs along with their energies with reference to their respective GS are presented in figure 7. The second experiment is the proof of concept that we can generate isomers along with GS from scratch. And the model is successful in capturing various structural motifs. A finer/detailed analysis of these isomer families also brings out the growth pattern in this size range.

Conclusion
In this study, a RL model, namely MeGen, was developed to generate 3D low-energy isomers of Ga clusters. MeGen is significantly more efficient than the conventional workflow for generating GS geometries as well as low lying isomers in terms of time and computational resources. Integration of the DART model as a reward function to calculate energy allows us to efficiently learn the process of addition of a single atom to (N − 1) atom cluster to generate (N) atom low energy cluster. Investigating the structure generated using MeGen indicates that the model has learned the underlying cluster growth pattern by its ability to identify valid low energy structures incorporated by the reward function of the RL model. Furthermore, the MeGen model can be extended to any metal cluster.

Data availability statement
All data that support the findings of this study are included within the article (and any supplementary files).