Assessing the Accuracy of Inhomogeneous Fluid Solvation Theory in Predicting Hydration Free Energies of Simple Solutes

Accurate prediction of hydration free energies is a key objective of any free energy method that is applied to modeling and understanding interactions in the aqueous phase. Inhomogeneous fluid solvation theory (IFST) is a statistical mechanical method for calculating solvation free energies by quantifying the effect of a solute acting as a perturbation to bulk water. IFST has found wide application in understanding hydration phenomena in biological systems, but quantitative applications have not been comprehensively assessed. In this study, we report the hydration free energies of six simple solutes calculated using IFST and independently using free energy perturbation (FEP). This facilitates a validation of IFST that is independent of the accuracy of the force field. The results demonstrate that IFST shows good agreement with FEP, with an R2 coefficient of determination of 0.99 and a mean unsigned difference of 0.7 kcal/mol. However, sampling is a major issue that plagues IFST calculations and the results suggest that a histogram method may require prohibitively long simulations to achieve convergence of the entropies, for bin sizes which effectively capture the underlying probability distributions. Results also highlight the sensitivity of IFST to the reference interaction energy of a water molecule in bulk, with a difference of 0.01 kcal/mol changing the predicted hydration free energies by approximately 2.4 kcal/mol for the systems studied here. One of the major advantages of IFST over perturbation methods such as FEP is that the systems are spatially decomposed to consider the contribution of specific regions to the total solvation free energies. Visualizing these contributions can yield detailed insights into solvation thermodynamics. An insight from this work is the identification and explanation of regions with unfavorable free energy density relative to bulk water. These regions contribute unfavorably to the hydration free energy. Further work is necessary before IFST can be extended to yield accurate predictions of binding free energies, but the work presented here demonstrates its potential.


■ INTRODUCTION
Inhomogeneous fluid solvation theory 1 (IFST) is a statistical mechanical framework for calculating the effect of a solute on the free energy of the surrounding solvent relative to its bulk state. 2 The solute can be a protein, 3 peptide, 4 or small molecule 5 and the solvent is commonly water. 6 IFST has found particular use in the pharmaceutical industry with the advent of Schrodinger's WaterMap software. 7 It has also been developed in the solvation theory of ordered water (STOW) package, 8 and recent work using grid inhomogeneous solvation theory (GIST) has explored the results of performing IFST calculations on a Cartesian grid. 9 One of the useful features of IFST is that the free energy changes are calculated for small subvolumes surrounding the solute and this allows the contribution of different regions of space to be calculated and visualized. This has been used to understand the determinants of binding affinity 10 and design new inhibitors in the hit-to-lead and lead optimization stages of drug development. 11 Work in this lab has focused on the importance of modeling solvation at protein surfaces 12 and the data requirements for convergence of the thermodynamic quantities computed by IFST. 13 One important issue that has not been addressed in detail is the quantitative accuracy of IFST. Initial work suggested a reasonable comparison with the experimental solvation free energy of methane 5 and recent work notes solvation free energies as a key benchmark for the method. 9 While a direct comparison with experiment is interesting, the results rely on the force field and particularly the water model that is used. 14 Thus, a more useful comparison is with an equivalent computational technique. This decouples testing of the method from testing of the parameters. In this work, we compute solvation free energies for six simple molecules using IFST and compare the results with solvation free energies calculated using free energy perturbation (FEP). 15

■ SIMULATION DETAILS
Hydration free energies for six simple molecule were calculated using two statistical mechanical computational methods, FEP and IFST. FEP calculates the hydration free energies directly, whereas IFST calculates the hydration enthalpies and entropies and these are combined to yield the predicted hydration free energy. All molecular dynamics (MD) simulations in this work were performed with the TIP4P-2005 water model. 16 Systems Setup. The six solutes studied were acetamide, benzene, isobutane, methane, methanol, and N-methylacetamide. The parameters for acetamide, benzene, isobutane, methanol, and N-methylacetamide were taken from the CHARMM36 force field. 17 The parameters for methane were not present in the CHARMM36 force field and were thus adapted from ethane. The methane carbon atom was assigned an atom type of CG331 and a partial charge of −0.36. The methane hydrogen atoms were assigned an atom type of HGA3 and a partial charge of +0.09. The bond lengths, bond angles, and dihedral angles were set to their force field equilibrium values for all molecules.
Water Setup. To generate a reasonable initial water density, a water shell of radius 50.0 Å was generated around each biomolecule with the SOLVATE program version 1.0 from the Max Planck Institute. 18 The resulting water globules were then cut to rhombic dodecahedral unit cells with side lengths of 25.0 Å. To standardize the geometries of the water molecules, every hydrogen atom was deleted and all the necessary hydrogen atoms and lone pairs were built using the appropriate geometry for TIP4P-2005 water. The boxes contained 384, 371, 376, 394, 376, and 374 water molecules for the molecules acetamide, benzene, isobutane, methane, methanol, and N-methylacetamide, respectively. No ions were included in the systems.
IFST Equilibration. Equilibration was performed for 1.0 ns in an NPT ensemble at 300 K and 1 atm using Langevin temperature control and Nose−Hoover 19 Langevin piston pressure control. 20 All systems were brought to equilibrium before continuing, by verifying that the energy fluctuations were stable. MD simulations were performed using an MD time step of 2.0 fs. Electrostatic interactions were modeled with a uniform dielectric and a dielectric constant of 1.0 throughout the equilibration and production runs. van der Waals interactions were truncated at 11.0 Å with switching from 9.0 Å. Electrostatics were modeled using the particle mesh Ewald method, 21 and the systems were treated using rhombic dodecahedral periodic boundary conditions. All solute atoms were fixed for the entirety of the equilibration and production simulations.
IFST Simulation. 100.0 ns of production simulation in an NPT ensemble were performed at 300 K and 1 atm for each system. System snapshots were saved every 10.0 fs, yielding 10 000 000 snapshots in total for each system. MD simulations for IFST were performed using NAMD 22 version 2.8 compiled for use with CUDA-accelerated GPUs.
IFST Implementation. An important decision in implementing IFST is the choice of subvolumes over which to perform the calculations. In protein binding sites, water molecules commonly cluster in distinct locations and the concept of a hydration site is useful to calculate contributions to the free energy from spherical regions. In the context of small molecule solvation, the water molecules do not cluster in distinct locations and the concept of a hydration site is not useful. Two approaches have been used to account for this in previous work. In the Lazaridis work on the solvation of methane, 5 the region surrounding the solute was split into subshells at different distances from the origin. For more complex molecules, spherical polar coordinates could usefully be used to further split these subshells. In the recent work on GIST, the region surrounding the solute was split into cubic voxels on a Cartesian grid. 9 This has the advantage that each subvolume has the same volume. There are a number of other possibilities for partitioning a volume into subvolumes and these different methods of honeycombing space are likely to affect the performance of IFST. In this work, we have used a Cartesian grid to honeycomb the volume and we use the term voxel to specify one cubic unit in this space. We consider all voxels within 12.0 Å of the origin in the calculation, where the centroid of each solute lies at the origin. In this study, we have not taken advantage of the symmetry of the solute molecules, which would provide more data, as we are interested in applying IFST to systems that do not have such symmetry in the future.
IFST Calculations. IFST calculates the difference in interaction energy (ΔE IFST ) and correlation entropy (ΔS IFST ) between each subvolume (v) and the equivalent number of bulk water molecules (n). These can be termed the local quantities. The contribution of each subvolume to ΔE IFST was calculated from the mean solute−water interaction energy (E sw ), the mean water−water interaction energy (E ww ), and the mean interaction energy of a bulk water molecule (E bulk ) as follows: For the TIP4P-2005 water model, E bulk is calculated from a 100 ns NPT simulation of bulk water at 300 K and 1 atm as −11.5813 kcal/mol. In this work, E bulk and E ww are defined as half the interaction energy of one water molecule with all other water molecules. Previous work have defined E bulk and E ww as the total interaction energy of one water molecule with all other water molecules and included factors of one-half in eq 1. 13,23 The two approaches are identical. The difference in correlation entropy is calculated from solute−water entropy (S sw ), the water−water entropy (S ww ), and the entropy of a bulk water molecule (S bulk ) as follows: All entropies calculated in this work exclude vibrational entropy changes. For each subvolume, S sw is calculated from the translational and orientational contributions using the translational and orientational correlation functions, g sw (r) and g sw (ω| r).
sw sw sw (4) symmetry. The orientational correlation functions were assumed to be independent of the position within the hydration sites. The default grid resolution was 0.5 Å and the default angular bin size was 45°. This leads to 8, 4, and 4 angular bins for α, cos β and γ and thus 128 angular bins in total. The ΔS ww term can also be calculated from translational and orientational contributions. A rigorous treatment would use the inhomogeneous water−water pair-correlation functions g ww (r,r′) and g ww (ω,ω′|r,r′) from the system and the homogeneous water− water pair-correlation functions g ww (R) and g ww (ω rel |R) from bulk water. R and ω rel are the distance and relative orientation between two water molecules in bulk water.
However, converging calculations of the inhomogeneous water−water pair-correlation functions requires a very large amount of data and thus the Kirkwood superposition approximation (KSA) is used. 24 The KSA provides an approximation of a third-order correlation function in terms of lower order correlation functions. In this context, the KSA is applied to the homogeneous triplet solute−water−water correlation function. The inhomogeneous water−water correlation function is then assumed to be independent of the solute−water correlation functions and thus equal to the water−water correlation function in bulk solvent. The KSA is used for both g ww (r,r′) and g ww (ω,ω′|r,r′): These correlation functions can be converged in bulk water due the increased data available from it being a homogeneous and isotropic system. Previous work on the solvation of methane suggests that these KSAs lead to reasonable results due to the cancellation of errors between more and less structured regions. 5 In this work, S ww was only calculated for pairs of voxels within 3.6 Å. This is likely a good assumption for the translational contributions, which in bulk water are derived almost exclusively from the excluded volume and the first solvation shell. 25 The orientational contributions to S ww (and S bulk ) are expected to be underestimated, as only approximately 75% of the orientational entropy in bulk water is derived from the first solvation shell. 25,26 Calculations for g ww (R) and g ww (ω rel |R) were performed on pairs of voxels on the Cartesian grid from the 100 ns NPT simulation of bulk water at 300 K and 1 atm. The difference in free energy (ΔG IFST ) for each voxel is then calculated from ΔE and ΔS.
FEP Protocol. The equilibrated systems generated for the IFST simulations were used as the start points for the FEP systems. These systems consist of the solute in water and correspond to the λ = 0.0 states. FEP calculations were performed in both forward and backward directions to yield corresponding predictions for annihilation (λ = 0.0 to λ = 1.0) and creation (λ = 1.0 to λ = 0.0) of the solutes. Each annihilation and creation was split into 24 steps to yield 48 λ windows per system. The lambda schedules are reported in Table 1. Two measures were adopted to avoid the numerical instabilities termed "end-point catastrophes" that occur when λ approaches 0.0 or 1.0. First, a soft-core potential was employed with a van der Waals radius-shifting coefficient of 5.0. 27,28 Second, van der Waals interactions were scaled down to zero between λ = 0.0 and λ = 1.0 whereas electrostatic interactions were scaled down to zero between λ = 0.0 and λ = 0.4. 29 MD simulations for FEP were performed using NAMD version 2.8. 22 FEP Equilibration. Starting with the equilibrated systems generated for the IFST simulations, further equilibration was performed for 800 ps in an NPT ensemble for each lambda window.
FEP Simulation. 7.0 ns of production simulation in an NPT ensemble were performed at 300 K and 1 atm for each lambda window.
The free energy change for each small step (ΔG a→b ) is calculated using the partition functions (Q) for the two states which are calculated using the Hamiltonians (H) for the two states.
We employed the Bennett acceptance ratio (BAR) method, 30 a technique that combines forward and backward FEP simulations to provide efficient estimates of free energy changes. 29 BAR was implemented using the ParseFEP Plugin from VMD. The ParseFEP Plugin is described in "A Toolkit for the Analysis of Free-Energy Perturbation Calculations" and the free energy and the statistical error are computed using eqs 11 and 14 from that paper. 31 The estimated statistical error in the FEP free energy predictions using BAR was less than 0.1 kcal/ mol in all cases. Additional Considerations. IFST calculations equate the change in enthalpy (ΔH) with the change in potential energy (ΔE). For calculations in an NPT ensemble, there is an additional contribution to the free energy of PΔV, where P is the pressure and ΔV is the change in volume. However, for solvation of small molecules in water under ambient conditions, PΔVis significantly smaller than ΔE and can be safely neglected. 32 In addition, the FEP and IFST calculations in this study both use a nonpolarizable force field. The parameters for the solutes are for prepolarized molecules and thus the free energy change associated with polarization of the solutes is ignored. 14 The calculated values are thus expected to differ from the experimental values. As noted previously, such calculations also ignore the difference in polarization of the water molecules between bulk water and the solution. 14 An additional consideration when comparing IFST calculations to experiment are terms that arise from the thermal motion of the solute and the change in ideal solvent entropy upon solute insertion. The additional terms are commonly termed the liberation contributions (ΔE lib and ΔS lib ) and the volume entropy contribution (S ve ). 33 Collectively, they are termed the nonlocal contributions and are distinct from the correlation terms in eqs 1 and 2. 1 The local contributions ΔE IFST , ΔS IFST , and ΔG IFST represent Ben-Naim's standard energy, entropy, and free energy of solvation. 34 The liberation enthalpy ΔE lib is calculated from the solvent's thermal expansion coefficient α and isothermal compressibility κ: For solvation in water under standard ambient conditions, ΔE lib is approximately +0.05 kcal/mol. IFST is formulated in such a way that portions of ΔS lib and S ve cancel with some of the correlation terms. 1 The entropy from the nonlocal contributions ΔS nonlocal is calculated from the remaining terms: For solvation in water under standard ambient conditions ΔS nonlocal makes a contribution of +0.55 kcal/mol to the free energy. In this work, we compare the predictions of FEP and IFST with Ben-Naim's experimentally determined standard energy, entropy, and free energy of solvation. These do not include the nonlocal contributions, which are thus excluded from our calculations.
The standard energies, entropies, and free energies of solvation of solvation for benzene, isobutane, methane, and methanol are taken from the same source. 35 The standard free energies of solvation for acetamide and N-methylacetamide are derived from Wolfenden 36 and taken from the same source. 37 The enthalpies of solvation for acetamide and N-methylacetamide are taken from the same source 38 and then modified to yield the standard enthalpies of solvation. 39 The entropies of solvation were determined from the standard energies and free energies.

■ RESULTS AND DISCUSSION
The simulation of bulk water is a key part of this work, as ΔE IFST depends critically upon the calculated value of E bulk . This is because the volume of solvent we are considering (voxels within 12.0 Å of the origin) contains approximately 240 water molecules. Thus, a difference of 0.01 kcal/mol in E bulk leads to a difference of 2.4 kcal/mol in ΔE IFST and ΔE FEP . This is significant in the context of predicting solvation free energies. A simple calculation suggests that E bulk must be correctly converged to within 0.0002 kcal/mol for ΔE IFST and ΔE FEP to be correctly calculated to within 0.1 kcal/mol. This extreme dependence means that it is thus vital that the calculation of E bulk is converged. Figure 1 shows the convergence of E bulk for the 100.0 ns NPT simulation. After approximately 75 ns, the running average remains within 0.0002 kcal/mol of the final calculated value, yielding a final E bulk of −11.5813 kcal/mol for the TIP4P-2005 water model. The simulation of bulk water also allows calculation of the excess entropy (S bulk ). Numerous works have reported the excess translational entropy of bulk water based on IFST The RDF between 0.0 and 3.6 Å from the bulk water simulation of TIP4P-2005 (blue line) and an "effective RDF" computed for voxels at the given distance from the origin (red line). The effective RDF is calculated by first computing the distance between the origin and the center of each voxel. Voxels at the same distance from the origin are then grouped together. The value of g(r) at each distance is calculated from the average number of water molecules in the group of voxels at that distance and their summed volume.   25,26,40−43 Due to bulk water being isotropic, the excess translational entropy can be calculated from the radial distribution function (RDF). Using the RDF with a radial bin size of 0.1 Å, the excess translational entropy in the first solvation shell (between 0.0 and 3.6 Å) is calculated to be −3.29 cal/mol/K for the TIP4P-2005 water model. This makes a contribution of +0.98 kcal/ mol to the excess free energy. It is interesting to compare this with the translational entropy based on IFST calculations using a Cartesian coordinate system. Using the default grid resolution of 0.5 Å, the excess translational entropy for voxels within the first solvation shell (between 0.0 and 3.6 Å) is calculated to be −2.10 cal/mol/K. This makes a contribution of +0.63 kcal/mol to the excess free energy. The true RDF and an "effective RDF" using the Cartesian coordinate system can be seen in Figure 2.
The curve is much flatter for the Cartesian coordinate system, with a significantly reduced contribution from the excluded volume. This suggests that a Cartesian grid of 0.5 Å may be too coarse to fully capture the translation probability density function. However, similar results are observed for the solute water RDFs (data not shown) and the errors in the translational entropy for the bulk system and the solute−water systems largely cancel. A similar approach can also be taken for the orientational entropy. Using spherical polar coordinates with a radial bin size of 0.1 Å and an angular bin size of 10°, the excess orientational entropy in the first solvation shell (between 0.0 and 3.6 Å) is calculated to be −7.28 cal/mol/K for the TIP4P-2005 water model. This makes a contribution of +2.17 kcal/mol to the excess free energy. For the Cartesian coordinate system using a grid resolution of 0.5 Å and an angular bin size of 10°, the excess orientational entropy in the first solvation shell (between 0.0 and 3.6 Å) is calculated to be −4.70 cal/mol/K. This makes a contribution of +1.40 kcal/mol to the excess free energy. Figure 3 shows the cumulative orientational contributions to S bulk between 0.0 and 3.6 Å for the radial and Cartesian coordinate calculations. These findings show that S bulk is underestimated using a grid resolution of 0.5 and this suggests that S ww is likely to be misestimated.
After calculating the necessary values for the bulk water simulation, we tested the method by using the bulk simulation   as a basis for IFST calculations. The IFST quantities were calculated for all voxels within 12.0 Å of the origin. In all the following work, we report the entropic contributions to the free energy (−TΔS IFST ) in kcal/mol rather than the entropies (ΔS IFST ) in cal/mol/K, as this allows a direct comparison with the enthalpic contributions to the free energy. The second column of Table 2 shows the contributions from the different IFST quantities for the bulk water simulation, which should all be zero when there is no perturbation from a solute. It is clear that the orientational contributions to −TS sw are overestimated by this method and this is due to the inherent bias in the histogram method. The overestimate is very small for each voxel (approximately 0.000 015 kcal/mol) but there are approximately 60 000 voxels and this leads to a significant overestimate in total. For this reason, we expect that the orientational contributions to S sw will be overestimated for the solute systems. The other contributions to the free energy are close enough to zero to be acceptable. We then moved on to consider the solute−water systems. A key question that we have previously attempted to address is the size of the solvation shell around a solute that is affected by its presence. 13,23 However, previous studies were incomplete because they considered the per voxel values of ΔE IFST , ΔS IFST , and ΔG IFST with increasing distance from the origin rather than the total ΔE IFST , ΔS IFST , and ΔG IFST . These total values are affected by the volume increasing with the cube of the distance from the origin and are more revealing. We have calculated these values for the six solutes to more thoroughly address this issue. The results can be seen in Figure 4 for acetamide.
The total ΔS IFST approaches its final value at around 5.0 Å from the origin whereas the total ΔE IFST and thus the total ΔG IFST approach their final value at around 9.0 Å from the origin. The slight peaks in ΔS IFST at 5.0 and 9.0 Å are due to the first and second solvation shells, as the plot is of the distance from the origin and acetamide has an approximate "radius" of 3.0 Å. When the size of the solutes is considered, the results indicate that the solvent is perturbed to a distance of approximately 7.0 Å from the surface of a solute. This corresponds to the first two solvation shells, with the first solvation shell contributing around 80% to ΔG IFST . As discussed previously, these distances may be much larger for charged systems and they may also be different for larger solutes. Indeed, studies on a 16-residue peptide suggest that significant perturbations may extend to the third solvation shell at around 10 Å from the surface. 44 We also wished to study the convergence of the IFST quantities. As previously, we distinguish between the convergence with increased simulation time for a given sampling frequency and the convergence with increased sampling frequency for a given simulation time. 13,23 The convergence with increased simulation time can be seen for acetamide in Table 3 for a sampling frequency of 10 fs. The result for ΔE IFST after 20.0 ns differs from the result after 100 ns by only 0.1 kcal/mol but the result for TΔS IFST differs by more than 0.1 kcal/mol, even after 50.0 ns. It is the difference in the orientational component of S sw that is notably different as the simulation time is decreased. This is not unexpected, as there are 128 times as many histogram bins for the orientational component of S sw as for the translational component. These calculations suggest that converged predictions for IFST require at least 100 ns of simulation with a grid resolution of 0.5 Å and an angular bin size of 45°b ut that simulations of 20 ns may be sufficient to predict ΔE IFST . The convergence with increased sampling frequency for acetamide can be seen in Table 4 for a simulation time of 100 ns. The result for ΔE IFST using a sampling frequency of 1000 fs is the same as the result using a sampling frequency of 10 fs to one decimal place, and the result for TΔS IFST using a sampling frequency of 20 fs is the same as the result using a sampling frequency of 10 fs to one decimal place. Again, it is the orientational component of TS sw that is notably different as the sampling frequency is decreased. These calculations suggest that converged predictions for IFST using a histogram method require at least 5 000 000 samples from a 100 ns simulation with a grid resolution of 0.5 Å and an angular bin size of 45°.
Other methods, such as the k-nearest neighbors (k-NN) algorithm, 45 may require shorter simulations and less sampling. 9,42 We also wished to consider the effect of the grid resolution and angular bin size on the orientational and translational components of S sw . The results in Tables 3 and 4 indicate that the translational component of S sw is well converged for a grid resolution of 0.5 Å. We also studied the results for grid resolutions of 0.25 and 0.75 Å to assess whether the translational component of S sw is fully quantified with the default grid resolution of 0.5 Å. The results for acetamide are  reported in Table 5 and indicate that the answer is no. A grid resolution of 0.25 Å yields a converged value of 4.3 kcal/mol for the translational component of −TS sw in comparison with a value of 3.7 kcal/mol for a grid resolution of 0.5 Å. Smaller grid resolutions, which may yield a higher prediction, were not tested. Testing the orientational component of −TS sw in this fashion is more complicated because it requires a grid resolution in addition to an angular bin size. In this work, we have only explored the variation in entropy with respect to the angular bin size at the default grid resolution of 0.5 Å. We studied the results for angular bin sizes of 36°and 60°to assess whether the translational component of −TS sw is fully quantified with the default angular bin size of 45°. The results for acetamide are reported in Table 6. An angular bin size of 36°yields a value of 3.6 kcal/mol for the orientational component of −TS sw in comparison with a value of 2.9 kcal/ mol for an angular bin size of 45°. However, it is not clear that the results for an angular bin size of 36°are fully converged from 10 000 000 samples over 100 ns and thus the question cannot be answered. Indeed, the results from the simulation of bulk water presented in the first column of Table 2 suggest that the orientational component of −TS sw is not fully converged from 10 000 000 samples over 100 ns for an angular bin size of 45°. Due to computational restrictions, we have not tested effect of the grid resolution and angular bin size on the orientational and translational components of ΔS ww . However, the results from the simulation of bulk water suggest that the effects on S sw and ΔS ww may counterbalance one another. The IFST predictions can be seen in Table 2. Calculations suggest that the translational and orientational contributions to the solvation entropy are of a similar magnitude in all cases, though the orientational contributions are always slightly larger. In addition, the solute−water terms are predicted to be 5−8fold larger than the water−water terms. This is partly because the contribution of the translational water−water term is negative due to the solute excluded volume, but the orientational water−water term is also small. However, there are two issues to be considered. The first is that the water− water terms depend on the validity of the KSAs, which have not been thoroughly explored. The second is that our calculations suggest that the water−water terms are misestimated using a Cartesian grid with a resolution of 0.5 Å. It is interesting to note that the solute−water interactions (E sw ) are highly favorable in all cases, at the expense of an unfavorable change in water− water interactions (ΔE ww ). This leads to a favorable change in overall energy (ΔE IFST ) that is approximately half the magnitude of E sw for the majority of solutes.
Assessing the data generated using IFST, it is clear that there are a number of issues with the method. The selection of grid resolution and angular bin size affects all the calculated entropies and in addition the use of the KSAs has not been comprehensively validated. However, these issues affect all solutes and thus we might expect a systematic misestimation of the entropies in comparison with FEP. We proceed, with this in mind. The results from experiment along with the predictions from FEP and IFST can be seen in Table 7. The correlation plots between experiment, IFST, and FEP can be seen in Figure  5.
Of primary interest is the agreement between FEP and IFST. The mean unsigned difference in the solvation free energies is 0.7 kcal/mol, with the IFST predictions tending to be less favorable than the FEP predictions. FEP and IFST both yield reasonable predictions of the experimental quantities with mean unsigned errors for the solvation free energies of 1.3 and 1.8 kcal/mol, respectively. N-Methylacetamide is particularly poorly predicted by both methods. Agreement of IFST with the experimental enthalpic and entropic contributions is reasonable, with mean unsigned errors of 0.9 and 1.2 kcal/ mol, respectively. However, the results suggest that the entropic component −TΔS tends to be overestimated by IFST, with a mean signed error of +1.2 kcal/mol. These results may explain why the IFST predictions of the hydration free energies tend to be less favorable than the FEP predictions. They are also commensurate with the overestimation of TΔS by 1.0 kcal/mol for the simulation of bulk water.
One of the main advantages of IFST calculations is the ability to break the solvation free energies into contributions from different regions and visualize these contributions. This promotes understanding of the system being studied and an insight into the thermodynamics of solvation. Figure 6 shows the contributions from different regions to the hydration free energy of acetamide at different contour levels, visualized using VMD. 46 The region with the highest relative free energy density is a ring surrounding the carbonyl oxygen ( Figure 6A). The carbonyl group is actually predicted to contribute approximately −6.0 kcal/mol to the hydration free energy of acetamide ( Figure 6B). At lower contour levels, the contributions from the amide hydrogens are visible, with each  a The orientational component of −TS sw for acetamide, calculated from 1 000 000, 2 000 000, 5 000 000, and 10 000 000 samples with angular bin sizes of 30°, 45°, and 60°. contributing approximately −1.5 kcal/mol to the hydration free energy ( Figure 6C). It is worth noting that the favorable regions surrounding the amide hydrogens would be well described by spherical hydration sites but the favorable regions surrounding the carbonyl oxygen would not. Contouring from positive values, the regions contributing most unfavorably to the hydration free energy are above and below the plain of the amide bond ( Figure 6D). This effect has been noted in previous studies using WaterMap 47 and agrees with observations from the Cambridge Crystallographic Data Centre on the hydrophobic character above and below the plane of amide bonds (personal communication with Oliver Korb). It may also explain why defectively packed backbone hydrogen bonds, which have been termed dehydrons, lead to hotspots at protein surfaces. 48 At lower contour levels, the contributions from the methyl group are visible, again contributing unfavorably to the    Figure 6E). At very low contour levels, one can observe the small and unfavorable contributions from water molecules in the second solvation shell ( Figure 6F). The presence of regions with high number density that contribute unfavorably to a favorable hydration free energy may seem counterintuitive. The reason why a vacuum is not present within such regions can be understood using a simple example. Consider water molecules in a voxel with weak solute−water interactions (E sw = −1.0 kcal/mol) and water−water interactions that are more favorable than in bulk (ΔE ww = −2.0 kcal/mol) that is strongly ordered (−TS sw = +3.0 kcal/ mol and −TΔS ww = +0.5 kcal/mol). For this voxel, ΔG IFST = +0.5 kcal/mol and the region contributes unfavorably to the solvation free energy. One could view these water molecules as "icelike" given their unfavorable entropy and favorable enthalpy compared with bulk water. However, if such a region is vacated (without allowing solvent rearrangement) then all surrounding voxels also lose their pair interactions and correlations with water molecules inside the voxel. The total change in free energy (going to this unphysical state) can thus be unfavorable. If subsequent solvent rearrangement around the vacuum region cannot overcome this loss, then the voxel will be occupied despite contributing unfavorably to the solvation free energy. Clearly, there are cases where water can rearrange around a vacuum, but water molecules at the vacuum interface are likely to show reduced water−water interactions. It is worth noting that the unfavorable contributions are smaller in magnitude than the favorable contributions. Considering all voxels in the case of acetamide, the most favorable relative free energy density is −0.28 kcal/mol/Å 3 whereas the least favorable relative free energy density is +0.04 kcal/mol/Å 3 . Because the exact free energy densities calculated will depend on the grid resolution and angular bin size used, it will be important to test this principle in future work. If these results accurately reflect the thermodynamics, an interesting consequence is the potential for hydration sites in protein binding sites that contribute unfavorably to the hydration free energy. Such sites would be major hotspots of binding.
There is one additional issue which must be mentioned. ΔE IFST , −TΔS IFST , and ΔG IFST are calculated as the sum of contributions from each voxel inside a given volume V. The contributions from each voxel include pairwise contributions due to correlations with every other voxel. In the IFST framework, the pairwise contributions are split equally between the two voxels of the pair. Importantly, there are pairs where one of the voxels is not inside the volume V and thus there are regions outside the volume V which contribute to ΔE IFST , −TΔS IFST , and ΔG IFST . This must be considered when implementing IFST by counting pair contributions consistently. However, the contribution of voxels outside the volume V is expected to be very small if the volume V is significantly larger than the perturbing solute. In this case, a voxel outside V will only have significant correlations with voxels inside V that are close to the periodic boundary. These will have bulklike properties, leading to E sw ≈ 0, ΔE ww ≈ 0, S sw ≈ 0, ΔS ww ≈ 0, and ΔG IFST ≈ 0.

■ CONCLUSIONS
IFST is a useful method to model solvation and has found particular application in understanding hydration phenomena in biological systems. 3,7,10,12 However, there has been little analysis to assess the quantitative predictions of IFST. 3 This study addresses this issue by comparing hydration free energies from IFST with hydration free energies from FEP. The findings of this study are that IFST calculations show a good agreement with FEP calculations, with a mean unsigned difference of 0.7 kcal/mol in the predicted hydration free energies. Comparison of IFST predictions with the experimental hydration enthalpies and entropies suggests that there is a systematic overestimation of the entropies, which leads to a systematic underestimation of the free energies. For the CHARMM36 force field and the TIP4P-2005 water model, the agreement with the experimental hydration free energies is reasonable for FEP and IFST, with mean unsigned errors of 1.3 and 1.8 kcal/mol, respectively. Both IFST and FEP rely on the force field and are thus subject to the same problems in this respect. For IFST, the fine dependence of ΔE IFST on E bulk means that it is thus vital that the calculation of E bulk is converged correctly. E bulk will depend on the force field, water model, simulation package, and simulation parameters employed. Due to cancellation, this fine Figure 6. Visualizations of the relative free energy density relative to bulk water for regions surrounding acetamide. Six views of acetamide, with the relative free energy density contoured at different levels. Favorable relative free energy density is contoured in red and unfavorable relative free energy density is contoured in yellow. The contouring for the favorable relative free energy increases from (A) to (B) to (C). The contouring for the unfavorable relative free energy decreases from (D) to (E) to (F). dependence will be of less importance in calculations of hydration free energy differences or binding energies, but only if the number of water molecules considered is similar. In addition, the systematic overestimations of −TΔS IFST are expected to be similar if the grid resolution and bin sizes are the same. Thus, predictions for different molecules and different regions are directly comparable for a given IFST implementation, but not necessarily between different IFST implementations.
In addition to these results, there are a number of important points raised by the study. IFST has a number of limitations that should be addressed before widespread quantitative application. Current implementations are only rigorously valid for a fixed solute and this limits the utility. Flexible solutes can be considered in the IFST framework by performing an analysis on a number of solute conformations, 9 but this will require yet more data. One of the major approximations in IFST is the use of the KSAs for the water−water entropy calculations. Previous work on the solvation of methane suggests that these KSAs lead to reasonable results due to the cancellation of errors between more and less structured regions. However, this may not be the case for polar solutes and further testing of the KSAs should be a focus of future work. Sampling is also a problem for IFST. Previous work has explored the data requirements for convergence of the solvation energy and entropy. 13,23 However, the comparison with FEP in this work allows a more thorough analysis. The results suggest that converged energies can be effectively sampled from 10 000 snapshots over a 20 ns simulation whereas the orientational component of the solute− water entropy term requires at least 5 000 000 snapshots over a 100 ns simulation to reach a converged prediction even with a modest angular bin size of 45°. However, the histogram method introduces a systematic overestimating bias to the entropies and this will scale with the volume of the system. Other work has shown that it is also difficult to estimate entropies using perturbation methods such as FEP and thermodynamic integration (TI). 49 In the context of IFST, assessment of the entropies may be more feasible using the k-NN algorithm, 9,42 particularly for the increased sampling requirements of considering a highly flexible solute. It is difficult to fairly assess the computation times required, because the FEP calculations are integrated into NAMD whereas the IFST calculations are not. This requires postprocessing of largetrajectory files at considerable computational cost. Comparing just the simulation times, each solute was simulated for a total of 336 ns for FEP and 100 ns for IFST. While FEP is inherently more parallelizable than IFST in its current format, it should be possible to apply IFST to multiple independent MD simulations, which would parallelize it effectively.
While suffering a number of drawbacks noted above, IFST does have a number of advantages over perturbation methods such as FEP and TI. Because IFST calculations are performed on a single simulation, there is no need for a lambda window schedule and no need to monitor overlap between lambda windows. In addition, there are no end points and thus there are no end-point catastrophes and no need for soft-core potentials. Importantly, there is no pathway and thus small and large perturbations can be considered with the same degree of sampling. However, perhaps the most useful feature of IFST is the spatial decomposition of the solvation free energies. This yields results that are more insightful than the total solvation free energies. Such visualization has also proved useful in 3D-RISM. 50 Indeed, it would be interesting to compare IFST results with the results of 3D-RISM 51 and cell theory, 52,53 which can provide spatially decomposed predictions of solvation free energies using different methods.
This work represents a preliminary study on the quantitative application of IFST using a very small set of solutes. Future work should explore a number of avenues. A wide range of solutes would provide a sterner test for the methods. The inclusion of charged solutes may necessitate the use of a polarizable force field. Prior to this, it would be very useful to test the validity of the KSAs for a number of solutes, using the true inhomogeneous water−water correlation functions or via comparison with lower order entropy approximations. This should be done in concert with a thorough determination of the time scales and sampling requirements that are necessary for effective convergence of the thermodynamic predictions at the desired level of accuracy. In addition, care must be taken to employ methods that effectively capture the underlying probability distributions. The combination of ΔG IFST with an assessment of protein−ligand interactions will facilitate the calculation of binding free energies. This will augment binding affinity predictions based on a single unbound state simulation, 54 albeit at increased computational cost. To achieve this, it will be necessary to incorporate solute flexibility into IFST. 9 In conclusion, the agreement between hydration free energy predictions for IFST and FEP is very encouraging. It is clear that the entropic contributions are overestimated by IFST and further work is needed. However, the insight gained by visualizing the relative free energy densities provides a major advantage of IFST that make this a worthwhile objective.

Author Contributions
The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript.