Machine Learning Nucleation Collective Variables with Graph Neural Networks

The efficient calculation of nucleation collective variables (CVs) is one of the main limitations to the application of enhanced sampling methods to the investigation of nucleation processes in realistic environments. Here we discuss the development of a graph-based model for the approximation of nucleation CVs that enables orders-of-magnitude gains in computational efficiency in the on-the-fly evaluation of nucleation CVs. By performing simulations on a nucleating colloidal system mimicking a multistep nucleation process from solution, we assess the model’s efficiency in both postprocessing and on-the-fly biasing of nucleation trajectories with pulling, umbrella sampling, and metadynamics simulations. Moreover, we probe and discuss the transferability of graph-based models of nucleation CVs across systems using the model of a CV based on sixth-order Steinhardt parameters trained on a colloidal system to drive the nucleation of crystalline copper from its melt. Our approach is general and potentially transferable to more complex systems as well as to different CVs.

Here $\nabla F_t(s)$ is the mean force in $s$, $p^b_t(s)$ is the biased probability density sampled in the time interval of length $\tau$ beginning at time $t$, and $\beta = (k_B T)^{-1}$, with $k_B$ the Boltzmann constant and $T$ the temperature. In order to merge the force contributions generated by iterative, history-dependent updates of a metadynamics bias potential into a single estimate of the force in $s$, we adopt a weighted sum, as proposed in the Umbrella Integration method [1,2], in which the biased probability density $p^b_t(s)$ is used as a position-dependent weight function to localise the estimate of the force in the region of $s$ sampled in the time interval $[t; t+\tau]$. The FES in $s$ is computed by numerically integrating the mean force $\langle \nabla F_t(s) \rangle_t$. In the left column of Figure S1 we report, for all the metadynamics simulations performed in this work, the FES reconstructed by integrating the mean force in the space of the GNN-predicted CVs, indicated as model n and n(Q6), respectively.
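The mean-force integration procedure described above can be illustrated with a minimal one-dimensional sketch. It is not the implementation used in this work, and it rests on several assumptions: the per-window mean force is taken to follow the standard umbrella-sampling relation $\nabla F_t(s) = -\beta^{-1} \nabla \ln p^b_t(s) - \nabla V_t(s)$; the weighted sum is taken to be $\langle \nabla F_t(s) \rangle_t = \sum_t p^b_t(s) \nabla F_t(s) / \sum_t p^b_t(s)$; and $p^b_t(s)$ is estimated with a Gaussian kernel density. The function names `window_mean_force` and `integrate_mean_force`, and the `bandwidth` parameter, are hypothetical.

```python
import numpy as np

def window_mean_force(samples, grid, bias_grad, beta, bandwidth):
    """Estimate p^b_t(s) and the mean force on `grid` for one time window.

    Assumptions: p^b_t(s) is a Gaussian kernel density estimate of the CV
    values `samples` visited in the window, and `bias_grad` holds dV_t/ds
    evaluated on `grid`.
    """
    diff = grid[:, None] - samples[None, :]      # s - s_i, shape (n_grid, n_samples)
    kern = np.exp(-0.5 * (diff / bandwidth) ** 2)
    norm = len(samples) * bandwidth * np.sqrt(2.0 * np.pi)
    p_b = kern.sum(axis=1) / norm                # biased density on the grid
    dp_b = (-diff / bandwidth**2 * kern).sum(axis=1) / norm  # its analytic derivative
    # Assumed relation: dF/ds = -(1/beta) * dln(p_b)/ds - dV_t/ds
    dlnp = np.divide(dp_b, p_b, out=np.zeros_like(dp_b), where=p_b > 0)
    return p_b, -dlnp / beta - bias_grad

def integrate_mean_force(windows, grid):
    """Combine per-window forces with p^b_t(s) as weights, then integrate.

    `windows` is an iterable of (p_b, force) pairs defined on `grid`.
    Returns the weighted mean force and the FES shifted so its minimum is 0.
    """
    num = np.zeros_like(grid, dtype=float)
    den = np.zeros_like(grid, dtype=float)
    for p_b, force in windows:
        num += p_b * force
        den += p_b
    # Grid points never visited (den == 0) are assigned zero force.
    mean_force = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    # Cumulative trapezoidal integration of the mean force along the grid.
    steps = 0.5 * (mean_force[1:] + mean_force[:-1]) * np.diff(grid)
    fes = np.concatenate(([0.0], np.cumsum(steps)))
    return mean_force, fes - fes.min()
```

In a typical use of this sketch, `window_mean_force` would be called once per bias-update interval and the resulting pairs passed together to `integrate_mean_force`. In more than one dimension, the cumulative integral would have to be replaced by a suitable multidimensional integration scheme.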

Reweighting to obtain the FES as a function of the analytical CVs
Here, to perform reweighting, we take advantage of the approach outlined in Marinova and Salvalaglio [1], where it is noted that integrating the mean force decouples the calculation of the free energy surface $F(s)$ from the calculation of the time-dependent component of the metadynamics bias, $c(t)$ [5,6]. Hence we perform reweighting by assigning a weight to any configuration extracted at time $t$ from a given biased trajectory. This weight depends on $V_t(s)$, the instantaneous total bias potential acting on the biased CVs $s$, and on $c(t)$, which is computed from its definition using $F(s)$ estimated from the integral of the mean thermodynamic force $\langle \nabla F_t(s) \rangle_t$ (Eq. S2). This approach to reweighting applies to both Well-Tempered and standard metadynamics and can therefore be used to postprocess all the simulations performed to validate the applicability of the GNN-model CVs to metadynamics calculations.

In the central column of Figure S1, we report the time series of the analytical and model CVs. The two are very well correlated, but statistical noise prevents them from matching perfectly. In the right column of the same figure, we display the FES as a function of the analytical CVs, obtained by reweighting each of the independent simulations performed in this study. The setup of the four independent simulations is summarized in Table S1; the corresponding PLUMED [7] input files are available in PLUMED-NEST (https://www.plumed-nest.org/, plumID:23.026) [8].

Merging Independent Metadynamics Simulations with MFI
Eq. S2 can be extended naturally to include the contribution from $M$ independent simulations, as shown in Marinova and Salvalaglio [1]. The mean force can be obtained as a weighted average of the contributions of the different simulations, following the approach proposed by Umbrella Integration [2]. In such a case, each simulation $k$, of length $T_k$ steps, is assigned a weight, and the combined mean force from the $M$ independent simulations is the correspondingly weighted average. Combining the mean force obtained from multiple metadynamics simulations enables the progressive refinement of the local mean force estimate across $s$, and therefore leads to an estimate of the FES that accounts for the contribution from all simulations without the need to arbitrarily align FESs obtained from independent runs. This approach can be extended to reweighted FESs by defining the weight of a configuration obtained at time $t$ from the $k$-th simulation in terms of $V_{k,t}(s)$, the total bias potential acting on $s$ in simulation $k$ at time $t$; $T_k$, the total time of simulation $k$; and $c_k(t)$, the time-dependent constant $c(t)$ (Eq. S4) computed for simulation $k$.

In Figure S2, we report the convergence of the reweighted FES obtained by combining an increasingly large number of frames sampled from the four independent simulations reported in Figure S1 and Table S1. In the left column, we show the combined FES obtained by reweighting the samples in the space of the model variables. In the right column, the same result is shown for the analytical variables. The FES obtained by combining all samples from the four independent metadynamics simulations is reported in Figure 7 of the main manuscript.
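The reweighting and merging steps described above can be sketched as follows. This is a hedged illustration under stated assumptions, not the procedure as implemented in this work. It assumes the commonly used form $c(t) = \beta^{-1} \ln \left[ \int e^{-\beta F(s)} ds \,/ \int e^{-\beta (F(s) + V_t(s))} ds \right]$, a frame weight proportional to $e^{\beta (V_t(s) - c(t))}$, a uniform grid in $s$, and a combined mean force that uses the accumulated biased density of each simulation as its weight. Any additional normalisation by $T_k$ is not reproduced here. The function names are hypothetical.

```python
import numpy as np

def _logsumexp(x):
    """Numerically stable log(sum(exp(x)))."""
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

def c_of_t(fes, bias, beta):
    """Time-dependent constant c(t) on a uniform grid (assumed form).

    `fes` is F(s) from mean force integration and `bias` is V_t(s), both on
    the same uniform grid, so the grid spacing cancels in the ratio.
    """
    return (_logsumexp(-beta * fes) - _logsumexp(-beta * (fes + bias))) / beta

def frame_weight(bias_at_frame, c_t, beta):
    """Assumed unnormalised weight of a frame: exp(beta * (V_t(s) - c(t)))."""
    return np.exp(beta * (bias_at_frame - c_t))

def combine_mean_forces(densities, forces):
    """Weighted average of per-simulation mean forces.

    `densities[k]` is the accumulated biased density of simulation k on the
    grid and `forces[k]` its mean force; both have shape (M, n_grid).
    """
    densities = np.asarray(densities, dtype=float)
    forces = np.asarray(forces, dtype=float)
    num = (densities * forces).sum(axis=0)
    den = densities.sum(axis=0)
    # Grid points sampled by no simulation are assigned zero force.
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)
```

With these weights, a reweighted FES in any other variable would follow from a weighted histogram of that variable over the stored frames. A grid point visited by only one simulation inherits that simulation's force estimate unchanged.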

Figure S1: Sampling and Free Energy Surfaces from Well-Tempered and standard metadynamics. The left column reports the FES computed as a function of the biased collective variables, namely the GNN-approximated n and n(Q6). The central column shows the time series of both model (light color) and analytical (dark color) collective variables. Introducing a bias potential that is a function of the GNN-approximated variables (the model CVs) enables reversible sampling of transitions between the liquid-droplet and crystal states. The right column displays reweighted free energy surfaces as a function of the analytical CVs, which are computed a posteriori every 500 steps at a negligible overhead cost. All simulation setups tested achieve reversible sampling.

Figure S2: Convergent behaviour of the reweighted FES with an increasing number of samples from four independent metadynamics simulations. The left column shows the FES in the space of the model CVs. The right column displays the same result for the analytical CVs.

Table S1: Settings for the time-evolution of the bias potential $V_t(s)$ in metadynamics simulations 1-4, analysed in Fig. S1.