Transferable Implicit Solvation via Contrastive Learning of Graph Neural Networks

Implicit solvent models are essential for molecular dynamics simulations of biomolecules, striking a balance between computational efficiency and biological realism. Efforts are underway to develop accurate and transferable implicit solvent models and coarse-grained (CG) force fields in general, guided by a bottom-up approach that matches the CG energy function with the potential of mean force (PMF) defined by the finer system. However, practical challenges arise due to the lack of analytical expressions for the PMF and algorithmic limitations in parameterizing CG force fields. To address these challenges, a machine learning-based approach is proposed, utilizing graph neural networks (GNNs) to represent the solvation free energy and potential contrasting for parameter optimization. We demonstrate the effectiveness of the approach by deriving a transferable GNN implicit solvent model using 600,000 atomistic configurations of six proteins obtained from explicit solvent simulations. The GNN model provides solvation free energy estimations much more accurately than state-of-the-art implicit solvent models, reproducing configurational distributions of explicit solvent simulations. We also demonstrate the reasonable transferability of the GNN model outside of the training data. Our study offers valuable insights for deriving systematically improvable implicit solvent models and CG force fields from a bottom-up perspective.


S-19
Table S1: Interval, RMSD cutoff, and numbers of removed configurations for each protein for the selection of data samples.The RMSD removed row indicates the number of frames removed based on the selected RMSD cutoff, while the front and end removed rows indicate the number of configurations discarded from the beginning and the end of each ensemble.This selection process resulted in 100,000 data configurations for each protein.10,596 a If listed, a 2nd interval was used to select frames after RMSD-based, front, and end configuration removal.b RMSD cutoff selected based on RMSD ranges from umbrella sampling MD (Table S2).
Selecting data samples from explicit solvent MD simulations Atomistic TIP3P S1 explicit solvent molecular dynamics (MD) simulations for chignolin CLN025 (CLN025), Trp-cage, BBA, Villin, WW domain, and NTL9 were conducted by Lindorff-Larsen et al.S2 From each of the six ensembles, we selected 100,000 configurations as data samples for training.The process for selecting data samples for each protein was as follows: first configurations were selected at a specific interval, next a small number of configurations beyond a specific peptide backbone heavy atom root-mean-squared-deviation (RMSD) distance cutoff from the folded structure x • were removed, and lastly a number of configurations were removed from the beginning and the end of the ensembles to ensure exactly 100,000 configurations for all proteins.Configurations beyond the RMSD cutoff were removed as these fell outside of the range covered by the noise configurations.Few configurations fell beyond RMSD the cutoff, if at all.All selection intervals, RMSD cutoffs, and beginning (front)-and end-of-ensemble removed frames are detailed in Table S1.
Due to the 200 ps save interval used by Lindorff-Larsen et al.S2 , our selection procedure results in the following time intervals between each data configuration: 1 ns for CLN025 and Villin, 2 ns for Trp-cage, 3.2 ns for BBA, 10 ns for WW domain, and 26.8 ns for NTL9.
k-means clustering (using the cluster.kmeansfunction of the pytraj package S3 ) was used to identify x • for each protein.The centroid frame of the most populated of k = 10 computed clusters was selected as x • for each protein.A similar process was used to identify x • in the original study by Lindorff-Larsen et al.S2 .The structures we identified as x • match those identified by Lindorff-Larsen et al.S2 Umbrella sampling simulation set-up RMSD-biased umbrella sampling MD simulations were conducted for each protein in the training set using the GBn2 implicit solvent model model.S4 We also performed similar simulations using the trained SchNet implicit solvent model.
The RMSD was computed in reference to the folded structures, x • , identified using a procedure detailed in the previous section Selecting data samples from explicit solvent MD simulations.Only peptide backbone heavy atoms were used to compute the RMSD.A harmonic RMSD-biasing potential S1)   was applied to the ith umbrella window.A total of M windows were sampled for each protein.The RMSD centers, RMSD • i , for each protein were evenly spaced at an interval of 0.05 nm within the RMSD range indicated in the tables that follow.RMSD ranges were selected to cover the full configurational space of each protein (Figure S15).
For all umbrella sampling simulations (with exceptions detailed later), a force constant k i RMSD = 15, 000 kJ/mol/nm 2 was applied to the first two (i = 1 and i = 2) and the last two (i = M − 1 and i = M ) windows within the indicated RMSD ranges.k i RMSD = 12, 500 kJ/mol/nm 2 was applied to the next two (i = 3 and i = 4) and the next-to-last two (i = M −3   and i = M −2) windows within the indicated ranges.For all other windows, k i RMSD = 10, 000 kJ/mol/nm 2 was applied.
Minimization and equilibration were performed prior to all production simulations.Minimization was performed using the minimizeEnergy method of the Simulation class in OpenMM.S5 Equilibration was performed by iteratively increasing k i RMSD to its production value.For each protein, the tables that follow detail the total number of minimization and equilibration steps and time steps, respectively.
Table S3: Selection ranges and intervals of noise sample configurations.The process for selecting noise sample configurations was as follows: first configurations were removed from the beginning of each umbrella window, next the remaining configurations were concatenated into a single umbrella ensemble, and lastly configurations were selected from the umbrella ensemble at a specific interval.Ranges of selected configurations are shown for each umbrella window, while the interval at which frames were selected are shown for the entire umbrella ensemble.An additional frame was occasionally removed from some windows to ensure 100,000 configurations for each protein.Umbrella simulations with GBn2 to generate noise configurations RMSD-biased umbrella sampling MD simulations were conducted to generate noise samples for the following training set proteins: CLN025, Trp-cage, BBA, Villin, WW domain, and NTL9.Table S2 details the simulation parameters, number of minimization steps, number of equilibration time steps, RMSD ranges, and total number of umbrella windows M used for these umbrella sampling simulations.For each protein, the interval and range of noise sample configurations selected from each umbrella ensemble are detailed in Table S3.From the resultant umbrella ensembles, 100,000 configurations for each protein were used as noise samples for training with the potential contrasting S10 method.

System CLN025
Note that for CLN025 and Trp-cage, each umbrella trajectory is of significantly longer timescale than for the remaining proteins in the training set (Table S2).While it was initially unknown what timescale would be necessary to achieve convergence of the free energy profile with respect to RMSD for each protein, later analysis revealed that such long timescales were unnecessary.As such, trajectories for the remaining proteins are of shorter timescale and save configurations at a higher frequency.Figure S15 suggests that our noise configurations sufficiently sample the RMSD phase space covered by the data configurations.Free energy profile convergence for each protein is demonstrated in Figure S16, while the process for

MD simulations for proteins outside of the training set
Table S4 details the simulation parameters, number of minimization steps, number of equilibration time steps, RMSD ranges, and total number of umbrella windows M for umbrella sampling simulations used to generate atomistic, TIP3P explicit solvent and GBn2 implicit solvent configurations for the following proteins: chignolin 1uao (PDB ID: 1uao), S13 a T8P mutant of chignolin 1uao that does not favor the native hairpin structure, S14 and a structurally-active 10-mer of the C-Jun amino-terminal kinase-interacting protein 1 (JIP1).S12 These proteins were used to evaluate the transferability of the SchNet implicit solvent model outside the training set.The explicit solvent configurations were generated by Ojaghlou et al.S12 for the JIP1 10-mer and by new umbrella sampling MD simulations for 1uao and the T8P mutant.GBn2 configurations were also generated using umbrella sampling MD simulations for all three proteins.Since there are 18 individual structures associated with the PDB ID 1uao, the structure with the lowest gas-phase energy (the 15 th structure) was used as x • .The same structure (albeit with the T8P mutation) was used as x • for the T8P mutant.The T8P mutant was generated by using the PDB Reader & Manipulator tool in CHARMM-GUI S15-S17 to mutate the selected 1uao PDB structure.While the JIP1 10-mer lacks a distinct folded state, the compact structure of Conformer 1 as identified by Ojaghlou et al.S12 was used as x • .Free energy profile convergence for umbrella sampling simulations of these proteins is displayed in Figure S17.

ML-MD simulations using the SchNet implicit solvent
Table S5 details the simulation parameters, number of minimization steps, number of equilibration time steps, RMSD ranges, and total number of umbrella windows M used for RMSD-biased umbrella sampling simulations using the SchNet implicit solvent.Umbrella sampling was conducted to enable the calculation of free energy profiles with respect to RMSD from x • for comparison to the results from atomistic TIP3P and GBn2 simulations.
Note that while our SchNet implicit solvent model was developed with the intention of using it with C36, ff14SB S11 was used for the JIP1 10-mer since the explicit solvent configurations were generated by Ojaghlou et al.S12 using ff14SB.Free energy profile convergence for these simulations is displayed in Figure S18.

Benchmarking simulations
Table S6 details the parameters of simulations used to evaluate the simulation speed of the SchNet implicit solvent.All simulations were run with a harmonic RMSD biasing potential (Eq.S1) centered at 0.0 nm from the folded structures.Compared with more extended conformations, folded structures produce more nearest neighbors per atom when defining the graph, creating a worst-case scenario for the SchNet implicit solvent model.

Loss functions
The root-mean-squared-error (RMSE) loss function was used for pre-training the SchNet S22 model to fit E GBn2 .This was implemented in PyTorch S19 as the square root of the MSELoss where N set is the total number of configurations in the training / testing sets.The dataset used to pre-train SchNet was composed of 100,000 atomistic configurations and 100,000 GBn2 configurations per protein.These were the same configurations used to train SchNet with the potential contrasting method.The testing set held aside when pre-training SchNet was composed of 50,000 configurations.As such, for the training set N set = 150, 000 and for the testing set N set = 50, 000.
The objective function used in potential contrasting is defined as where k indices over proteins in the training set.
∆F is a free parameter during training.When training SchNet with potential contrasting, the bias parameter of the final layer of the feed-forward NN is held constant to prevent it from counter-acting ∆F .u k q (x) is the potential energy function corresponding with the noise distribution of the kth protein, while u p (x; θ) = U C36 (x) + E SchNet (x; θ) is the potential energy corresponding with the learned CG distribution that approximates the atomistic distribution.U C36 is simply the potential energy computed from the CHARMM force field.S6,S7 We used the same number of data samples N k p = 100,000 and noise samples N k q = 100,000.
As such ν k = 1 for all proteins in our training set and will be omitted from all equations that follow.Equation S3 was implemented in PyTorch as the sum of individual binary cross entropy (BCE) with log-odds-ratios (logits) loss functions for each protein where σ (z) = 1/ [1 + exp (−z)] is the sigmoid function and y is a label that indicates whether x ki is a data sample (y = 1) or noise sample (y = 0).N k tot = N k p + N k q was the total number of configurations for the kth protein.Each individual BCE with logits loss term was implemented with the BCEWithLogitsLoss class in PyTorch.By inserting the logit of x ki for our learned p (x; θ) and our noise q k (x) distributions, into σ (z), it is clear that equations S3 and S4 are effectively equivalent.

An L2 regularization term λ
Nparms i=1 θ 2 i (where N parms is the total number of trainable parameters in SchNet, θ i represents the ith trainable parameter, and λ represents the regularization constant hyperparameter) was added to both loss functions (equations S2 and S4)   during training to help prevent overfitting.This was implemented in PyTorch by specifying a weight decay value within the optim.Adam class.
As discussed in the main text, the cutoff distance used for defining nearest neighbors (r cut ) and the number of interaction blocks (N IB ) are the most impactful hyperparameters for the SchNet architecture.Increasing the embedding width only appeared to increase memory usage, and as such no rigorous evaluation of this hyperparameter was performed.
With all other optimal hyperparameters selected (Table S7 - For models that we attempted to run ML-MD simulations with, the optimal value of λ was selected by running umbrella simulations for CLN025 and computing free energy profiles with respect to RMSD from x • .For the hyperparameter combination used to generate the ML-MD simulation data discussed in the main text (N IB = 3, r cut = 1.8 nm), CLN025 free energy profiles computed from ML-MD for models with different values of λ are shown in (λ = 1 × 10 −3 ), it was inaccurate and unstable for Trp-cage (Figure S22).While we also attempted simulations with larger models, we did not search for an optimal λ for these.Note that we also did not search for an optimal λ when pre-training SchNet to fit E GBn2 , as we did not run ML-MD simulations with pre-trained models.We instead simply used our initially selected value.
Both pre-training and potential contrasting were performed across two Nvidia Volta V100 GPUs using the DistributedDataParallel class.All configurations for three proteins were loaded onto each individual GPU.

Noise potential & free energy profile calculations
The multi-state Bennett acceptance ratio S24 (MBAR) and the unbinned weighted histogram analysis method S25 (UWHAM), as implemented in the FastMBAR package, S26 were respectively used to compute the noise potential u q (x) as defined in the potential contrasting loss function (Eq.S3) and the free energy profile from the states sampled in biased umbrella sampling simulations.Free energy profiles were also computed from long timescale unbiased MD simulations.The methods used to compute u q (x) and these free energy profiles are detailed in the subsections below.

Constructing the noise potential using FastMBAR
As detailed in section Umbrella sampling simulation set-up and shown in equation S1, each window in RMSD-biased umbrella sampling simulations has a different biasing potential u i bias (x) applied.The potential energy of the ith umbrella window in simulations with the GBn2 implicit solvent model is where U ff (x) is the vacuum potential energy of the solute computed from the chosen protein force field and E GBn2 (x) is the solvation free energy computed from GBn2.
Since a different potential energy function u i (x) was used for each umbrella window, a different distribution was sampled in each window.However, for training with potential contrasting, S10 the noise samples must be drawn from a single noise distribution q (x) yielded by a noise potential u q (x).As explained in the works of Ding et al.S26 and Ding and Zhang S10 , by pooling all configurations from each umbrella window, they can treated as if they had been drawn from a single generalized ensemble p gm (λ = i, x) ∝ exp (−β [u i (x) + v i ]).Here, λ is simply a discrete value representing the index of the ith state of p gm (λ = i, x).The marginal probability density p gm (x) on x in p gm (λ = i, x) is yielded by the potential energy where v i is a fitted biasing energy and M is the total number of states.Fitting of v i is conducted so that the relative free energy G i of the ith state of p gm (λ = i, x) matches the relative population of configurations in the ith state of p gm (λ = i, x).Thus, v i must satisfy where G * i is the relative free energy of the proteinsolvent system in the ith thermodynamic state / umbrella window with the potential u i (x).
These v i 's are fit by minimizing the convex objective function where here N tot = M i=1 n i .By solving for v i we can compute u gm (x) as shown in equation S7 and treat the marginal probability density p gm (x) as our noise distribution q (x).Thus, u gm (x) is our noise potential energy u q (x).
In practice, fitting v i and computing u q (x) was done using the FastMBAR package.S26 This process of fitting v i to compute u q (x) was done for each protein in our training set.As such, the kth protein has its own noise distribution q k (x) yielded by u k q (x).These u k q (x)'s were used in the potential contrasting objective function (Eq.S3).

Free energy profile calculations
Free energy profiles were computed with respect to backbone heavy atom RMSD from the folded configuration x • and with respect to the radius of gyration (Rg) of the backbone heavy atoms.The methods used to compute these free energy profiles are detailed below.evaluated using TICA, and low quality noise should be replaced with more realistic noise generated using more rigorous enhanced sampling techniques.
To further investigate causes of decreased accuracy for BBA, WW domain, and NTL9, we trained three protein-specific SchNet models as implicit solvents using the potential contrasting method.We first pre-trained the three protein-specific models to fit E GBn2 for BBA, WW domain, and NTL9 configurations, respectively.The same hyperparameters (r cut = 5 nm, N IB = 3) used to train the transferable model were used for the protein-specific models.
Free energy profiles for the protein-specific models are displayed in Figure S3.Significant improvement over the transferable model is noted for the NTL9-specific model.While some improvement is noted for the BBA-specific model, the previously discussed poor overlap between the BBA noise and data distributions likely prevents any more substantial improvement.Interestingly, while little improvement is noted for the WW domain-specific model, the free energy profile for this model closely aligns with GBn2 from 0 to ∼10 Å RMSD from the folded state.Some alignment between protein-specific models and GBn2 also occurs for certain RMSD ranges for BBA and NTL9.This alignment is likely the result of our pre-training and parameter freezing scheme.While we have demonstrated that this scheme is essential to prevent non-physical results, it is possible that freezing all of the interaction blocks results in too much of a loss of flexibility for some of the larger proteins in our training set.
To further investigate the effect of model flexibility, we trained several WW domainspecific models using the potential contrasting method while unfreezing 1 interaction block.
Training was initialized with the same set of parameters obtained from pre-training the WW domain-specific model discussed in the previous paragraph.While all three of the resultant free energy profiles (Figure S4) are noisy to some extent, the model with the 1 st interaction block (i.e., closest to the embedding layer and furthest from outputting the final updated atomic featurizations used as inputs to the energy-predicting NN) unfrozen most closely matches the explicit solvent free energy profile.Unlike the transferable model, all three Results for the SchNet model were determined using a re-weighting scheme as detailed in the Methods section.For reference, the free energy profiles computed from umbrella simulations with the GBn2 implicit solvent model are also shown as solid lines.Results for the SchNet model were determined using a re-weighting scheme as detailed in the Methods section.
Figure S3: Re-weighted free energy profiles computed for protein-specific (denoted as "singleprotein") SchNet models trained as implicit solvents with the potential contrasting method.
Results for the SchNet models were determined using a re-weighting scheme as detailed in the Methods section.For reference, free energy profiles computed from configurations from TIP3P explicit solvent simulations conducted by Lindorff-Larsen et al.S2 , free energy profiles computed from umbrella simulations with the GBn2 implicit solvent, and free energy profiles for the transferable SchNet model (as shown in Figure 5, denoted here as "transferable") are displayed.The NTL9-specific model more closely matches the TIP3P results than the transferable model.Interestingly, from 0 to ∼10 Å RMSD, the WW domain single-protein free energy profile closely aligns with the GBn2 results.See section Possible causes of lower accuracy for BBA, WW domain, & NTL9 for more details.
Figure S4: Re-weighted free energy profiles computed for WW domain-specific SchNet models trained with the potential contrasting method while unfreezing 1 interaction block.Results for the SchNet models were determined using a re-weighting scheme as detailed in the Methods section.For reference, the free energy profile computed from configurations from TIP3P explicit solvent simulations conducted by Lindorff-Larsen et al.S2 and the free energy profile computed from umbrella simulations with the GBn2 implicit solvent are displayed.While all models correctly capture the results around the TIP3P global minimum at ∼1 Å RMSD from the folded state, the model trained with the 1 st interaction block unfrozen most closely matches the TIP3P results.See section Possible causes of lower accuracy for BBA, WW domain, & NTL9 for more details.
Figure S5: Free energy profiles for SchNet models (r cut = 5 nm, N IB = 3) trained with the potential contrasting method with different amounts of input data.The free energy profiles computed from simulations by Lindorff-Larsen et al.S2 in TIP3P explicit solvent are shown in red for comparison.Results for the SchNet models were determined using a re-weighting scheme as detailed in the Methods section.The free energy profiles for the model denoted as "200K samples" are the same as those displayed in Figure 5. From the original training data of 200,000 configurations per protein, every 10 th configuration was selected for use as training data for the data-limited model.The free energy profiles for the data-limited model are denoted as "20K samples".Due to the sharpness of the minimum in the SchNet free energy profiles for CLN025, the free energy profile was uniformly shifted downwards by 1.5 k B T for better visualization.Despite training with less data, all noise configurations were used to compute the free energy profiles for the data-limited model.The free energy profiles generated from the data-limited model are similar to those generated from the original model without data limitations, with the exception of NTL9.RMSD is computed with respect to peptide backbone heavy atom positions from x • .All TIP3P configurations for these 6 proteins were generated by Lindorff-Larsen et al.S2 Noise configurations from the GBn2 umbrella sampling ensembles fully cover the RMSD ranges sampled by the TIP3P data configurations, while also including configurations outside of these ranges.This follows best-practice for generating noise samples for training with potential contrasting.S10 Figure S16: Converged free energy profiles for all training set proteins.Umbrella sampling windows were split 5 ways to show that umbrella sampling GBn2 free energy profiles were converged.These splits were created according the following example for CLN025: S1 0 -212 ns, S2 0 -424 ns, S3 0 -636 ns, S4 0 -848 ns, and S5 0 -1,060 ns.For all proteins, the final split includes all configurations for each window.
Figure S19: Impact of the L2 regularization constants, λ, on the performance of SchNet models (N IB = 3, r cut = 1.8 nm) trained with potential contrasting.We used the protein CLN025 as an example.Parameters in the interaction blocks were frozen during optimization with potential contrasting.The explicit solvent (TIP3P) free energy profile was computed from the configurations generated by Lindorff-Larsen et al.S2 , while configurations from umbrella sampling simulations were used to compute the GBn2 and SchNet free energy profiles.The SchNet model trained with λ = 1 × 10 −6 produces a free energy profile that closely matches the explicit solvent result.See text Section: Hyperparameters for further discussion.
Figure S20: SchNet models (N IB = 3, r cut = 1.8 nm) trained using potential contrasting without freezing interaction block parameters performed poorly.We used the protein CLN025 as an example.The explicit solvent (TIP3P) free energy profiles was computed from the configurations generated by Lindorff-Larsen et al.S2 , while configurations from umbrella sampling simulations were used to compute the GBn2 and SchNet free energy profiles.No λ resulted in a SchNet free energy profile that closely matched the explicit solvent result.SchNet models with λ < 1 × 10 −4 result in simulation instability.
Figure S21: Impact of the L2 regularization constants, λ, on the performance of SchNet models (N IB = 2, r cut = 1.8 nm) trained with potential contrasting.We used the protein CLN025 as an example.The explicit solvent (TIP3P) free energy profiles was computed from the configurations generated by Lindorff-Larsen et al.S2 , while configurations from umbrella sampling simulations were used to compute the GBn2 and SchNet free energy profiles.The SchNet model with λ = 1 × 10 −3 results in a free energy profile that most closely matches the explicit solvent result.Figure S22: Free energy profiles for CLN025 and Trp-cage using a SchNet implicit solvent model with a smaller number of interaction blocks (N IB = 2, r cut = 1.8 nm, λ = 1 × 10 −3 ).All explicit solvent (denoted as TIP3P) free energy profiles were computed from the configurations generated by Lindorff-Larsen et al.S2 , while configurations from umbrella sampling simulations were used to compute the GBn2 and SchNet free energy profiles.The SchNet free energy profile for CLN025 more closely matches the TIP3P free energy profile than the GBn2 free energy profile.Simulations for Trp-cage were too unstable to continue beyond 4 ns of production MD, and as such the Trp-cage SchNet free energy profile is not likely to be converged.Figure S23: Explicit solvent (denoted as TIP3P) and GBn2 configuration TICs.TICA dimensionality reduction was performed according to the procedure detailed in section Possible causes of lower accuracy for BBA, WW domain, & NTL9 .All TIP3P configurations for these 6 proteins were generated by Lindorff-Larsen et al.S2 Noise configurations from the GBn2 umbrella sampling ensembles sufficiently cover the ranges sampled by the TIP3P data configurations for all proteins, except for BBA.
S8), several SchNet implicit solvents with different values of the L2 regularization constant λ were trained.While not found to affect re-weighted free energy profiles, λ had a significant effect on the performance of the SchNet implicit solvent model in ML-MD simulations.Unlike when performing reweighting, computing free energy profiles from ML-MD simulations involves entirely new configurations (i.e., not included in the training set).Since it is well-known that regularization can improve the ability of ML models to generalize to samples outside of the training set, this outsized role of λ is not unexpected.

Figure S19 .
Figure S19.The most accurate model (λ = 1×10 −6 ) was used for all simulations discussed in the main text.For a SchNet implicit solvent model trained with the same hyperparameters but without holding the interaction block parameters constant, free energy profiles are shown in Figure S20.Lastly, for a smaller SchNet model (N IB = 2, r cut = 1.8 nm), free energy profiles are shown in Figure S21.While one of these models was accurate for CLN025

Figure S1 :
Figure S1: A SchNet model S22 with larger r cut but smaller N IB outperforms a model with smaller r cut but larger N IB .Both models were trained to fit E GBn2 .Free energy profiles for the model with larger r cut are shown in the dotted line, while free energy profiles for the model with a larger N IB are shown in the dashed lines.For reference, the free energy profiles computed from umbrella simulations with the GBn2 implicit solvent model are also shown as solid lines.The x axes in the plots correspond to the RMSD from the folded structures.Results for the SchNet model were determined using a re-weighting scheme as detailed in the Methods section.

Figure S2 :
Figure S2: SchNet models trained to fit E GBn2 with different values of N IB perform similarly.Free energy profiles are shown for all models.The cutoff distance is held constant at r cut = 1.8 nm.The x axes in the plots correspond to the RMSD from the folded structures.For reference, the free energy profiles computed from umbrella simulations with the GBn2 implicit solvent model are also shown as solid lines.Results for the SchNet model were determined using a re-weighting scheme as detailed in the Methods section.

Figure S6 :
Figure S6: Free energy profiles for SchNet models trained with potential contrasting with different values of N IB .The cutoff distance is held constant at r cut = 1.8 nm.The x axes in the plots correspond to the RMSD from the folded structures.For reference, the free energy profiles computed from simulations conducted by Lindorff-Larsen et al.S2 in TIP3P explicit solvent are shown as red solid lines.Results for the SchNet model were determined using a re-weighting scheme as detailed in the Methods section.

Figure S7 :
Figure S7: Free energy profiles for SchNet models trained with potential contrasting with different values of r cut .The number of interaction blocks is held constant at N IB = 3.The x axes in the plots correspond to the RMSD from the folded structures.For reference, the free energy profiles computed from simulations conducted by Lindorff-Larsen et al.S2 in TIP3P explicit solvent are shown as red solid lines.Results for the SchNet models were determined using a re-weighting scheme as detailed in the Methods section.

Figure S8 :
FigureS8: Free energy profiles for SchNet models trained with potential contrasting with values of r cut from 1 to 2 nm.The number of interaction blocks is held constant at N IB = 3.The x axes in the plots correspond to the RMSD from the folded structures.For reference, the free energy profiles computed from simulations conducted by Lindorff-Larsen et al.S2   in TIP3P explicit solvent are shown as red solid lines.Results for the SchNet models were determined using a re-weighting scheme as detailed in the Methods section.

Figure S9 :
Figure S9: A SchNet model (r cut = 1.8 nm, N IB = 3) trained with the potential contrasting method without prior pre-training fails to match the free energy profiles from explicit solvent simulations.Free energy profiles for the model trained for 30 and 60 epochs are shown in light blue and light green, respectively.The x axes in the plots correspond to the RMSD from the folded structures.Performance is still poor despite the model trained for 60 epochs reaching a similar value of the potential contrasting loss function (ℓ tot = 0.041) to the model used for ML-MD simulations in the main text (r cut = 1.8 nm, N IB = 3; ℓ tot = 0.038).For reference, the free energy profiles computed from simulations conducted by Lindorff-Larsen et al.S2 in TIP3P explicit solvent are shown as red dashed lines, and the free energy profiles computed from umbrella simulations with the GBn2 implicit solvent model are shown as blue dashed lines.Results for the SchNet model were determined using a re-weighting scheme as detailed in the Methods section.

Figure S10 :
Figure S10: A SchNet model (r cut = 1.8 nm, N IB = 3) trained with the potential contrasting method and with prior pre-training, but without holding the interaction block parameters (θ F ) constant, fails to match the free energy profiles from explicit solvent simulations.The free energy profiles for this model are shown in light blue.The x axes in the plots correspond to the RMSD from the folded structures.Notably, this model reaches a lower value of the potential contrasting loss function (ℓ tot = 0.11) than the model used for ML-MD simulations in main text (r cut = 1.8 nm, N IB = 3; ℓ tot = 0.038).For reference, the free energy profiles computed from simulations conducted by Lindorff-Larsen et al.S2 in TIP3P explicit solvent are shown as red dashed lines, and the free energy profiles computed from umbrella simulations with the GBn2 implicit solvent model are shown as blue dashed lines.Results for the SchNet model were determined using a re-weighting scheme as detailed in the Methods section.

Figure S11 :
Figure S11: The optimal SchNet model predicts solvation free energy (E SchNet ) distributions that are similar in scale to the GBn2 solvation free energy distributions (E GBn2 ).Solvation free energies are shown for a SchNet model with r cut = 1.8 nm and N IB = 3.The E GBn2 distributions are displayed as filled in, while the E SchNet distributions are shown as solid lines.

Figure S13 :
FigureS13: Re-weighted free energy profiles closely match those computed from ML-MD simulations.Explicit solvent (denoted as TIP3P), GBn2, and the SchNet implicit solvent free energy profiles (both re-weighted and from ML-MD simulations) for chignolin CLN025 and Trp-cage are displayed.The x axes in the plots correspond to the RMSD from the folded structures.TIP3P free energy profiles were computed from the configurations generated by Lindorff-Larsen et al.S2  while GBn2 and SchNet free energy profiles were computed by solving the MBAR equation for configurations from umbrella sampling simulations.The re-weighted SchNet free energy profiles (r cut = 1.8 nm, N IB = 3) were computed following the re-weighting procedure detailed in the Methods section.

Figure S14 :
Figure S14: Comparison between the free energy profiles computed from explicit solvent simulations (TIP3P), implicit solvent simulations (GBn2), and the optimized SchNet model.Results for the SchNet model were determined using ML-MD umbrella simulations.The x axes in the plots correspond to the radius of gyration (Rg) of the backbone heavy atoms.(A) Free energy profiles for chignolin CLN025 and Trp-cage.(B) Free energy profiles for chignolin 1uao and a T8P mutant.TIP3P and GBn2 free energy profiles were computed from umbrella simulations.

Figure S15 :
Figure S15: Explicit solvent (denoted as TIP3P) and GBn2 configuration RMSD histograms.RMSD is computed with respect to peptide backbone heavy atom positions from x • .All TIP3P configurations for these 6 proteins were generated by Lindorff-Larsen et al.S2  Noise configurations from the GBn2 umbrella sampling ensembles fully cover the RMSD ranges sampled by the TIP3P data configurations, while also including configurations outside of these ranges.This follows best-practice for generating noise samples for training with potential contrasting.S10 See text section Selecting data samples from explicit solvent MD simulations for further details.

Table S2 :
Parameters for RMSD-biased umbrella sampling MD simulations for proteins within the training set.
An additional frame was removed from windows in range 1 -1.45 nm.
bAn additional frame was removed from windows in range 1 -1.75 nm.

Table S4 :
Parameters used for TIP3P and GBn2 RMSD-biased umbrella sampling MD simulations for proteins outside of the training set.
e Length of each umbrella window traj.f Total #

Table S6 :
Parameters for MD and ML-MD simulations used for benchmarking.

Table S7 :
Hyperparameters explored for SchNet.S22Bolded values indicate hyperparameters for the best performing model as determined from re-weighted free energy profiles (see Figure5), while underlined values indicate the hyperparameters of the smaller model used in ML-MD simulation.

Table S8 :
Hyperparameters explored for training set-up.Hyperparameters are indicated for both pre-training (i.e., training SchNet to fit E GBn2 ) and potential contrasting.Bolded values indicate optimal hyperparameters.In cases where only a single value is listed, no further optimization was necessary beyond the initially selected value.Separate learning rate used for ∆F k 's.b Total # of configurations held out of the training set for each protein.Different random sets of configurations were held out every epoch.