Calculations of Absolute Solvation Free Energies with Transformato—Application to the FreeSolv Database Using the CGenFF Force Field

We recently introduced transformato, an open-source Python package for the automated setup of large-scale calculations of relative solvation and binding free energy differences. Here, we extend the capabilities of transformato to the calculation of absolute solvation free energy differences. After careful validation against the literature results and reference calculations with the PERT module of CHARMM, we used transformato to compute absolute solvation free energies for most molecules in the FreeSolv database (621 out of 642). The force field parameters were obtained with the program cgenff (v2.5.1), which derives missing parameters from the CHARMM general force field (CGenFF v4.6). A long-range correction for the Lennard-Jones interactions was added to all computed solvation free energies. The mean absolute error compared to the experimental data is 1.12 kcal/mol. Our results allow a detailed comparison between the AMBER and CHARMM general force fields and provide a more in-depth understanding of the capabilities and limitations of the CGenFF small molecule parameters.


Bootstrapping statistics
For all statistical measures reported in this work (root mean squared error (RMSE), mean absolute error (MAE), Pearson correlation (r), and Spearman's rank correlation), we report error estimates obtained by bootstrapping. Values were selected randomly with replacement from the original dataset 1,000 times, and all statistical measures were computed for each random selection. The resulting resampled metrics were stored in a list. Subsequently, the 95% confidence interval was computed by determining the 2.5th and 97.5th percentiles of the resampled metrics. These were used as lower and upper bounds for the statistical measures.
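The resampling procedure can be illustrated with a minimal sketch that uses only the Python standard library. It is not the code used in this work. The function names (`bootstrap_ci`, `average_ranks`, and so on) are hypothetical, and the sketch assumes that calculated and experimental values are resampled as pairs, that each resample has the size of the original dataset, and that percentiles are linearly interpolated.

```python
import math
import random


def rmse(x, y):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def mae(x, y):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def pearson(x, y):
    """Pearson correlation; undefined if either sequence has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions.

    Ties matter here because resampling with replacement duplicates points.
    """
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def percentile(sorted_values, q):
    """Linearly interpolated percentile (q in [0, 100]) of a sorted list."""
    pos = (len(sorted_values) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def bootstrap_ci(calc, expt, metric, n_resamples=1000, seed=0):
    """95% confidence interval of `metric` from paired resampling."""
    rng = random.Random(seed)
    n = len(calc)
    resampled = []
    for _ in range(n_resamples):
        # Draw n indices with replacement; keep (calc, expt) pairs together.
        idx = [rng.randrange(n) for _ in range(n)]
        resampled.append(metric([calc[i] for i in idx], [expt[i] for i in idx]))
    resampled.sort()
    return percentile(resampled, 2.5), percentile(resampled, 97.5)
```

Calling `bootstrap_ci(calc, expt, mae)` returns the lower and upper bound for the MAE; passing `rmse`, `pearson`, or `spearman` instead gives the corresponding interval for the other measures.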

Default switching function in OpenMM
The standard switching function of OpenMM, referred to as OMMvswi in this manuscript, is defined as follows:

$$S(x) = 1 - 6x^5 + 15x^4 - 10x^3,$$

where $x = (r - r_\mathrm{switch}) / (r_\mathrm{cutoff} - r_\mathrm{switch})$. It decreases from 1 at $r = r_\mathrm{switch}$ to 0 at $r = r_\mathrm{cutoff}$.
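As a numerical illustration, the following minimal sketch evaluates the quintic polynomial that the OpenMM documentation gives for its standard switching function, S = 1 − 6x⁵ + 15x⁴ − 10x³. The function name `openmm_switch` and the distances in the usage note (a switching region from 1.0 to 1.2 nm) are assumptions chosen for illustration, not settings taken from this work.

```python
def openmm_switch(r, r_switch, r_cutoff):
    """Scaling factor applied to the Lennard-Jones energy at distance r.

    Equals 1 up to r_switch, 0 from r_cutoff onwards, and follows the
    quintic polynomial in the reduced coordinate x in between.
    """
    if r <= r_switch:
        return 1.0
    if r >= r_cutoff:
        return 0.0
    x = (r - r_switch) / (r_cutoff - r_switch)
    return 1.0 - 6.0 * x**5 + 15.0 * x**4 - 10.0 * x**3
```

With the illustrative distances, `openmm_switch(1.1, 1.0, 1.2)` evaluates the polynomial at x = 0.5, the midpoint of the switching region, where S = 0.5.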

Switching function in CHARMM
Since it is somewhat hidden in a relatively old publication, we also give the equations for the vswitch function of CHARMM as described originally by ?:

$$S(r^2, r_\mathrm{on}^2, r_\mathrm{off}^2) = \begin{cases} 1 & r \le r_\mathrm{on} \\ \dfrac{(r_\mathrm{off}^2 - r^2)^2 \, (r_\mathrm{off}^2 + 2r^2 - 3r_\mathrm{on}^2)}{(r_\mathrm{off}^2 - r_\mathrm{on}^2)^3} & r_\mathrm{on} < r \le r_\mathrm{off} \\ 0 & r > r_\mathrm{off} \end{cases}$$

Note that this function is used as $S(r^2, r_\mathrm{on}^2, r_\mathrm{off}^2)$, where $r$ is the distance between the two particles, $r_\mathrm{off}$ the cutoff distance, and $r_\mathrm{on}$ the distance where the switching region starts. The function decreases from 1 at $r = r_\mathrm{on}$ to 0 at $r = r_\mathrm{off}$.

For all compounds studied in this work, the LRC was estimated in a postprocessing step ("post-calculated LRC", orange crosses); cf. the main manuscript. For eleven compounds, the LRC was also calculated more rigorously as follows: for each of these molecules, the ASFE was recomputed from scratch with OpenMM's LRC active during all MD simulations, so that the LRC also contributed to the virial. The difference between the ASFE including the full isotropic LRC computed in this manner and the ASFE without LRC is referred to as "reference LRC" and plotted with a blue x. As one can see, the two LRCs agree reasonably well in all cases.
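Returning to the vswitch function of this section, the following minimal sketch evaluates it numerically. It assumes the commonly quoted form of the CHARMM switching function, which operates on squared distances. The function name `charmm_vswitch` and the distances in the usage note (10 Å and 12 Å) are illustrative assumptions rather than settings taken from this work.

```python
def charmm_vswitch(r, r_on, r_off):
    """CHARMM-style switching factor, evaluated on squared distances.

    Equals 1 up to r_on, 0 from r_off onwards, and
    (off2 - r2)^2 * (off2 + 2*r2 - 3*on2) / (off2 - on2)^3 in between.
    """
    r2, on2, off2 = r * r, r_on * r_on, r_off * r_off
    if r2 <= on2:
        return 1.0
    if r2 >= off2:
        return 0.0
    return (off2 - r2) ** 2 * (off2 + 2.0 * r2 - 3.0 * on2) / (off2 - on2) ** 3
```

For example, `charmm_vswitch(11.0, 10.0, 12.0)` gives 529 · 86 / 85184 ≈ 0.534. The value at the midpoint of the switching region is not exactly 0.5 because the function is a polynomial in r² rather than in r.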

Figure S3: The initial comparison between results from transformato (TF) and reference results from the literature (refs. 1, 2) before ensuring that identical force field parameters were used.

Figure S4: Comparison of ASFEs calculated with transformato (TF) when using two different switching functions for the Lennard-Jones interactions (OMMvswi and OMMvfswi); cf. the main manuscript. Values for all compounds in Fig. S2, including the ones for which no experimental values are available, are shown. The green dashed line represents the linear regression line y = 0.98x + 0.25.

Figure S6: The average long-range correction ∆E_LRC as a function of the number of atoms per molecule, including hydrogens, is shown as blue squares. The spread of the correction for all molecules consisting of the same number of atoms is shown as error bars. If there are fewer than three molecules with the same number of atoms, no error bars are shown.

Figure S7: Expanded statistical measures for the results reported in this study (blue) and by Mobley and Guthrie (ref. 3) (orange). From top to bottom: mean absolute errors (MAEs), root-mean-squared errors (RMSEs), and Pearson correlations. The molecules are classified according to their primary functional group. In each plot, the crosses, corresponding to the second y-axis on the right, indicate the number of molecules belonging to each group.