Thermodynamic Properties of Water Molecules at a Protein–Protein Interaction Surface

Protein–protein interactions (PPIs) have been identified as a vital regulator of cellular pathways and networks. However, the determinants that control binding affinity and specificity at protein surfaces are incompletely characterized and thus unable to be exploited for the purpose of developing PPI inhibitors to control cellular pathways in disease states. One of the key factors in intermolecular interactions that remains poorly understood is the role of water molecules and in particular the importance of solvent entropy. This factor is expected to be particularly important at protein surfaces, and the release of water molecules from hydrophobic regions is one of the most important drivers of PPIs. In this work, we have studied the protein surface of a mutant of the protein RadA to quantify the thermodynamics of surface water molecules. RadA and its human homologue RAD51 function as recombinases in the process of homologous recombination. RadA binds to itself to form oligomeric structures and thus contains a well-characterized protein–protein binding surface. Similarly, RAD51 binds either to itself to form oligomers or to the protein BRCA2 to form filaments. X-ray crystallography has determined that the same interface functions in both interactions. Work in our group has generated a partially humanized mutant of RadA, termed MAYM, which has been crystallized in the apo form. We studied this apo form of MAYM using a combination of molecular dynamics (MD) simulations and inhomogeneous fluid solvation theory (IFST). The method locates a number of the hydration sites observed in the crystal structure and locates hydrophobic sites where hydrophobic species are known to bind experimentally. The simulations also highlight the importance of the restraints placed on the protein in determining the results. Finally, the results identify a correlation between the predicted entropy of water molecules at a given site and the solvent-accessible surface area and suggest that correlations between water molecules only need to be considered for water molecules separated by less than 3.2 Å. The combination of MD and IFST has been used previously to study PPIs and represents one of the few existing methods to quantify solvent thermodynamics. This is a vital aspect of molecular recognition and one which we believe must be developed.


' INTRODUCTION
ProteinÀprotein interactions (PPIs) are essential in controlling cellular networks and play an important role in many disease states. 1 Significant efforts are now being focused on understanding the nature of the intermolecular interactions in PPIs, and computational methods are a key aspect of increasing our understanding. 2,3 In addition, PPIs are now increasingly being targeted for drug development, and computational methods are commonly combined with structural data in virtual screening and lead optimization for PPI targets. 4 One aspect of molecular interactions that is particularly important for understanding PPIs is hydrophobic association driven by desolvation of nonpolar protein surfaces. Water molecules form significant hydrogen bonding interactions in bulk water and are somewhat ordered. Conversely, water molecules at a hydrophobic surface have reduced hydrogen bonding interactions and have differing levels of order, dependent upon the environment. The balance of these components is one of the key factors that controls the thermodynamics of binding. This has been proposed as the principal driving force for binding in a number of systems and also impacts protein folding and stability. 5 In this study, we apply solvation thermodynamics to a prototypical PPI surface.
Recombinase Biology. Recombinases such as RadA and RAD51 are key factors in the process of homologous recombination (HR) to repair broken double strand breaks (DSBs) in DNA. 6 The human RAD51 recombinase is known to form an oligomeric structure in the cell, where it is sequestered until needed for HR. Shortly after DNA replication, RAD51 is loaded onto DNA around DSBs by association with the so-called BRC repeats of the regulatory BRCA2 protein. 7 RadA, the archaeal homologue of RAD51, is sequestered in oligomeric structure in the cell but appears to bind DNA as a helical filament without the presence of a regulatory protein. 8 The interface for oligomerization has been identified in RadA and RAD51 by crystallography. 9,10 The key determinant of binding is the presence of a hydrophobic pocket on the surface that binds a phenylalanine residue. 11 Another smaller pocket is found in close proximity and binds an alanine residue. These pockets are termed the phenylalanine pocket and the alanine pocket. RadA and RAD51 oligomerize by bringing together their hydrophobic surfaces with an FMRA and an FTTA sequence, respectively. The BRC repeats of BRCA2 also exploit these pockets to bind RAD51 with a Journal of Chemical Theory and Computation ARTICLE conserved FXXA motif. 12 In addition, these pockets are surrounded by a surface dotted with hydrophobic patches, as shown in Figure 1a for MAYM and Figure 1b for RAD51. This surface is thus typical of a PPI and provides a good test case to explore the thermodynamics of solvation and how it contributes to proteinÀprotein association.
Inhomogeneous Fluid Solvation Theory. Inhomogeneous fluid solvation theory (IFST) was developed by Lazaridis as a method to study hydrophobic hydration 13 by calculating interactions and correlations between water molecules through an analysis of molecular dynamics (MD) or Monte Carlo (MC) simulations. IFST was initially used to study pure water, 14 but the theory was then extended to consider small hydrophobic solutes 15 and then to consider protein binding sites. 16 In IFST, bulk water is considered as a reference state, and other molecules perturb this state, resulting in a change in enthalpy and entropy. 17 This is quantified by calculating the interaction energies and the correlation functions between the water molecules and the solute. 15 Regions of high water density are identified and then analyzed to compare the enthalpy and entropy with water molecules in bulk solvent. The methodology is described in detail below. IFST has been used to analyze a number of ligand binding sites to elucidate the role of water molecules. 16,18,19 IFST has also shown success in predicting binding affinities and has recently been implemented in Schrodinger's WaterMap software. 20,21 WaterMap has also been applied to explain binding affinities and specificities for PDZ domain 22 and for the polo-box domain of the mitotic kinase PLK1. 23 It has also been employed recently by Zielkiewicz to study water molecules around simple polypeptides. 24 Here, we apply IFST to the protein surface of the RadA MAYM mutant and explore the thermodynamic properties of water molecules at a PPI interface. This analysis quantifies the intermolecular interactions that underlie PPIs and allows the identification of potential binding hotspot regions.

' MATERIALS AND METHODS
We performed MD simulations of bulk water and of the apo MAYM protein using NAMD 25 using a number of simulation protocols.
Crystallography. The crystal structure of RadA was taken from a protein construct of Pyrococcus furiosus RadA (accession number AF052597) containing residues 108À349 (Marsh et al., unpublished). Residues 288À300 in the L2 loop were replaced by a single Asn residue, and residues 108À286, 304À329, and 336À349 have assigned density. The MAYM form of RadA has four humanizing mutations: I169M, Y201A, V202Y, K221M. The crystal structure contains one DMSO solvent molecule and one phosphate group. This protein construct lacks an N-terminal domain and thus does not oligomerize. However, the N-terminal domain is located over 15 Å from the phenylalanine and alanine pockets 9,26 and is thus unlikely to affect the properties of this surface.
Structure Preparation. The protein structure was initially prepared as follows. Atom coordinates for the protein and the water molecules were taken from the X-ray crystal structure. The DMSO solvent molecule and the phosphate group were deleted from the structure. The hydrogen-atom positions for the protein and the water molecules were then built using the PSFGEN mode of VMD 27 with the CHARMM27 energy function. 28,29 Histidine residues were then manually checked for protonation state. His210, His243, and His269 were assigned as epsilon protonated. All remaining histidines were assigned as delta protonated. The residues lysine, arginine, aspartate, glutamate, cysteine, and tyrosine were also analyzed to check their protonation state. There was no evidence of any unusual protonation states, and thus all lysine and arginine residues were assigned as positively charged, all aspartate and glutamate residues were assigned as negatively charged, and all cysteine and tyrosine residues were assigned as neutral. The terminal residues 304 and 336 were patched with an N-acetyl group, and the terminal residues 286 and 329 were patched with an N-methyl amide group. The atomic charges were assigned from the CHARMM27 forcefield. 28,29 All water molecules were modeled with the TIP4P/2005 water model. 30 The next stage was to solvate the protein with water molecules. All the water molecules observed in the crystal structure were retained. Solvation was performed with the SOLVATE program 31 version 1.0 from the Max Planck Institute to generate a solvation sphere of radius 50 Å around the center of the protein. No ions were included in the solution, as the protein has a net charge of zero. The system was then cut to form a rhombic dodecahedron (RHDO) with an edge length of 60 Å using the CHARMM program (version 34b1). 32 Equilibration. During all simulations with the RHDO, the protein atoms were fixed, the RHDO was treated using periodic boundary conditions, and the electrostatics were modeled using the particle mesh Ewald method. 33 The water molecules in the RHDO were first subjected to energy minimization for 10 000 steps using NAMD. This was followed by MD equilibration for

Journal of Chemical Theory and Computation
ARTICLE 100 ps in an NPT ensemble and then MD equilibration for 100 ps in an NVT ensemble. This stage of preparation was undertaken to equilibrate the density of the water molecules at the surface. The density of the water molecules plays an important role in IFST and is thus important to converge accurately. We ensured that the system was brought to equilibrium before continuing our simulations by verifying that the system reached a point where the energy fluctuations were stable. In the next stage, the RHDO was cut to form a sphere of water molecules around the binding pocket of interest using the CHARMM program. The solvent sphere of radius 20 Å was centered at the coordinates of the CA atom of Ala201. This is defined as the centroid of the solvent sphere. The resulting system containing the protein and a sphere of water molecules was then treated with three protocols. For each protocol, the system was subjected to MD equilibration for 100 ps using NAMD with spherical boundary conditions. 34 Again, we ensured that the system was brought to equilibrium before beginning the MD simulation by verifying that the system reached a point where the energy fluctuations were stable for each protocol. The three protocols are as follows: (1) Fixed: All protein atoms were kept fixed.
(2) Restrained: All atoms of any residue partially or completely outside the 20 Å sphere were fixed in place. All heavy atoms of any residue completely inside the 20 Å sphere were restrained using a 1.0 kcal/mol/Å 2 harmonic force. (3) Free: All atoms of any residue partially or completely outside the 20 Å sphere were fixed in place. All atoms of any residue completely inside the 20 Å sphere were not constrained. Molecular Dynamics. Production simulations were performed for 10.0 ns at 300 K. All MD simulations were performed using the NAMD program version 2.7b3 32 with the CHARMM27 force field 28,29 using an MD time step of 2.0 fs. Electrostatic interactions were modeled with a uniform dielectric and a dielectric constant of 1.0 throughout the setup and production runs. Van der waals interactions were truncated at 12.0 Å with switching from 8.0 Å. Bulk solvent was simulated as a periodic box of edge length 25 Å for a period of 8 ns using the same methods, parameters, and equilibration procedures detailed above.
Clustering. The 10.0 ns MD runs were first analyzed to cluster the water molecules into distinct spherical regions of high number density. These regions have been termed hydration sites in previous work using IFST, 20 and we retain this terminology here. We employed a radius of 1.2 Å for these hydration sites, in line with prior work. 18 The hydration sites were selected by sampling 1000 snapshots from the MD trajectory. All 1000 snapshots were superposed to generate a profile of the water density. Within the complete water density profile, we identified the oxygen atom of the water molecule with the largest number of water molecules within a 1.2 Å radius. The 1.2 Å sphere around the position of this oxygen atom was defined as a hydration site. This water molecule and all of its neighboring water molecules within 1.2 Å from any snapshot were excluded from further consideration. The process was then repeated to identify more hydration sites, allowing no new hydration sites within 1.2 Å of a previously defined hydration site. This iteration was terminated once when the density of an identified hydration sites fell below 1.5 times the number density of bulk water, which corresponds to an occupancy of 0.36 in the sphere of radius 1.2 Å. Only hydration sites within 12.0 Å of the solvation sphere center were considered. The resultant set of hydration sites was then subjected to energy and entropy calculations using IFST.
Energy Calculations. The interaction energy of each hydration site was calculated by sampling 5000 snapshots, taken every 2 ps from the 10.0 ns simulation. For each snapshot, we computed the average interaction energy with both the protein and all the other water molecules with VMD version 1.8.7 using the namdenergy plugin. This was then compared with the interaction energy of a water molecule determined from the bulk water simulation (À23.62 kcal/mol) to calculate the energy difference ΔE shown in eq 1.
In this equation, ΔE is the energy difference, E water/protein surface is the mean interaction energy between a water molecule in the hydration site and the protein, E water/water surface is the mean interaction energy between a water molecule in the hydration site and all of the other water molecules, and E water/water bulk is the mean total interaction energy of a water molecule in bulk.
Entropy Calculations. The entropy of each hydration site was calculated by sampling 100 000 snapshots, taken every 100 fs from the 10.0 ns simulation. The entropy difference between a water molecule at a hydration site and in bulk was calculated from the contributions of the proteinÀwater term (S pw ), the waterÀwater reorganization term (S ww ), and a term arising from the change in density (S density ). 35 These terms can be calculated by integrating over the proteinÀwater g pw (r,ω) and waterÀwater g ww (r,ω,r 0 ,ω 0 ) correlation functions, where the variable r represents the position of the water molecule with respect to the center of the hydration site, and the Euler angles ω represent the orientation of the water molecule in the fixed protein reference frame. As in previous work, only correlations between two species were considered. 18,20 The proteinÀ water correlations function were calculated using a bin size of 0.06 Å for the radial component and 18°for the angular components. The proteinÀwater and contribution to the entropy of changing the number density 35 can be calculated for each hydration site using eqs 2 and 3, where k is Boltzmann's constant, F is the number density of bulk solvent, F site is the number density of the hydration site being considered, and Ω is the integral over the Euler angles ω.
g sw ðr;ωÞln g sw ðr; ωÞdrdω ð2Þ As in previous work, the proteinÀwater term was separated into translational, S trans pw , and orientational, S orient pw , entropic contributions, and the orientational distributions were assumed to be independent of the position of the water molecules within the sites. 18 The entropies were calculated using eqs 4 and 5, where g trans pw (r) and g trans pw (ω) are the translational and orientational correlation functions.
S orient The waterÀwater reorganization term was calculated for each pair of hydration sites within a distance of 3.5 Å. This distance corresponds to water molecules in the first solvation shell of a Journal of Chemical Theory and Computation ARTICLE water molecule in bulk. The waterÀwater correlation functions were calculated using a bin size of 0.1 Å for the radial component and 18°for the angular components. For a given hydration site, the total reorganization entropy was calculated as the sum of the pairs of proximal sites. This term was then compared with the entropy of a water molecule from the bulk water simulation due to other water molecules within 3.5 Å (11.24 cal/mol/K). The entropies were calculated using eq 6.
ΔS ww is the waterÀwater entropy change, S w,w 0 is the pair entropy between a water molecule in the hydration site and a water molecule in another hydration site and S bulk ww is the pair entropy of a water molecule in bulk. The contribution to the enthalpy from waterÀwater correlations was also split into translational and orientational contributions. However, because of the vast amount of data required to accurately calculate the multidimensional waterÀwater correlation functions, we employed two approximations first proposed by Li and Lazaridis. 18 The first is that the waterÀwater correlation functions can be treated as dependent only on the relative orientation of the two water molecules and the distance between the centers of the two hydration sites. This correlation function can, in turn, be separated into translational and orientational contributions. g ww ðr;r 0 ; ω;ω 0 Þ ¼ g ww ðR;ω rel Þ ð 7Þ g ww ðR;ω rel Þ ¼ g ww ðRÞg ww ðω rel jRÞ ð 8Þ In these equations, g ww are the waterÀwater correlation functions, r 0 represents the position of the second water molecule with respect to the center of its hydration site, ω 0 represents the orientation of the second water molecule in the fixed protein reference frame, the variable R is the distance between the centers of the two hydration sites, and ω rel |R is the relative orientation of two water molecules at a distance R. The second approximation is that the waterÀwater correlation functions for the bound waters are the same as the waterÀwater correlation functions in bulk water. This leads to eqs 9, 10, and 11, where the variables θ 1 , θ 2 , χ 1 , χ 2 , and j are the five angles that specify the relative orientation of two water molecules. 14 g ww ðRÞ ¼ g bulk ww ðRÞ ð 9Þ g ww ðω rel jRÞ ¼ g bulk ww ðω rel jRÞ ð 10Þ g bulk ww ðω rel jRÞg bulk ww ðθ 1 ;θ 2 ;χ 1 ;χ 2 ;jjRÞ ð 11Þ Application of these approximations leads to eqs 12 and 13.
S trans ww ¼ À The waterÀwater correlation functions were calculated from the 8 ns simulation of bulk water, using all available water pairs. All calculations were performed using the Darwin Supercomputer of the University of Cambridge High Performance Computing Service (http://www.hpc.cam.ac.uk/) and were funded by the EPSRC under grant EP/F032773/1. All MD simulations were performed using NAMD compiled for use with CUDAaccelerated GPUs.

' RESULTS
The initial stage of the analysis was to cluster the water molecules from the MD trajectories to identify the hydration sites. To assess the predictions, we compared the positions of the hydration sites to the experimental positions of the oxygen atoms of water molecules from the crystal structure. The experimental sites should represent regions of high water density. We counted the number of predictions where the hydration sites were within 1.2 Å of the crystal structure oxygen atom position. Density was assigned to 38 water molecules in the crystal structure of apo MAYM within 12 Å of the site centroid. Each MD methodology produced a different number of hydration sites. This data can be found in Table 1. The fixed protein simulation predicts the largest number of hydration sites (78) and identifies the largest number of water molecules from the crystal structure. 21 The sites are predicted with an rmsd of 0.62 Å from the crystal structure positions. However, the restrained simulation also performs well, identifying 65 hydration site and 20 water molecules from the crystal structure with an rmsd of 0.64 Å. The correctly predicted hydration sites (blue) and crystal structure water molecules (red) for the restrained simulation are shown in Figure 2. Some water molecules and some hydration sites lie under the surface and thus do not appear in the figure. The water molecules labeled A, B, and a The effect of the MD protocol on the hydration site clustering and the accuracy with respect to the crystal structure water molecules. The percentage of predictions correct is the percentage of predictions made that are correct. The percentage crystal waters matched is the percentage of the crystal water molecules that were correctly identified. Journal of Chemical Theory and Computation ARTICLE C are in close proximity to neighboring crystal units in the X-ray structure (3.60, 5.27, and 4.74 Å to the closest heavy atoms, respectively), and their positions may thus be affected. The free simulation compares less favorably with the crystal structure, identifying 18 water molecules from the crystal structure with an rmsd of 0.76 Å and 52 hydration sites in total. It is important to note that the two metrics of the number of crystal structure water molecules identified and the rmsd of the water molecules are reliant on assigning X-ray density to specific points, which is an artifact of crystallography. In addition to comparing the positions of the hydration sites with the crystal structure, we calculated the effect of the three schemes on the calculated occupancy and thermodynamic properties of the hydration sites. The results of this analysis can be seen in Table 2, which details the calculated properties of five hydration sites. In general, despite small differences in the number of predicted sites and their position and occupancy, the fixed and restrained schemes agree reasonably well on the majority of the hydration sites. However, the free scheme yields quite different results, with markedly lower occupancies for all the hydration sites. There is also a key disparity that it is interesting to note. When restraints on the protein are removed, the hydrophobic phenylalanine pocket is filled by two methionine residues for a significant portion of the simulation. These two methionine residues form one side of the phenylalanine pocket. This reduces the apparent occupancy of the four water molecules within the pocket to an average of 0.19 in the free simulation. This low occupancy means that they are not identified as hydration sites under the clustering protocol. These four sites have appreciable occupancies of 0.94 and 0.90 on average from the fixed and restrained simulations. This prediction is not completely unexpected, as the opening and closing of hydrophobic pockets on protein surfaces has been observed. 36 Furthermore, these two methionines have relatively high average B-factors of 15.99 Å 2 and 11.93 Å 2 , suggesting high mobility. Because of the limitations of MD and of crystallography, it is difficult to assess whether the phenylalanine pocket spends an appreciable time in a closed conformation. However, as this clearly affects the MD simulations and the subsequent IFST analysis, it is a very important consideration. If the protein structure is treated as fully flexible, the energy function must be accurate or the predictions of IFST will be misleading. Previous implementations of this methodology have treated the protein as fixed 18 or as restrained. 21 We predict that this can have a significant effect on the location and occupancies of hydration sites. It also has a significant effect on the calculated thermodynamic properties, as can be seen in Table 2 and Table 3. Table 2 details the interaction energy, entropy, and free energy for the three different MD protocols for ten hydration sites. For many of the hydration sites, the three schemes agree both qualitatively and quantitatively. However, some hydration sites are predicted to have different thermodynamic properties in the three schemes. This is true for sites A and F in Table 2, where the predictions for the free energies vary by 1.80 and 2.23 kcal/mol, respectively. Such a difference impacts the conclusions of the modeling and would affect any quantitative treatment of the results. Table 2 shows that the hydrophobic sites C, D, and G have a free  a Details on the thermodynamic properties for the hydration site lying within the alanine pocket, calculated using the restrained MD scheme. The proteinÀwater terms are denoted pw, and the waterÀwater terms are denoted ww. The translational contributions are denoted trans, and the orientational contributions are denoted orient. E is the interaction energy, and F is the free energy.

Journal of Chemical Theory and Computation
ARTICLE energy with respect to bulk of +4.62, +4.19, and +6.33 kcal/mol. This agrees very well with previous applications of IFST to hydrophobic sites, where the maximum free energy with respect to bulk was approximately 5 kcal/mol. 20,21 Table 3 provides more specific details on the thermodynamic properties for the hydration site lying within the alanine pocket. The proteinÀwater entropy decreases from the free scheme to the restrained scheme and then to the fixed scheme. This trend occurs throughout the results. Fixing or restraining the protein also restrains the surrounding water molecules, and this has a direct effect on the entropies. The ten hydration sites shown in Table 2 are illustrated in Figure 3. For the hydration site labeled A, the three schemes agree closely with one another in position and also agree with the crystal structure position. However, the fixed scheme has a markedly different thermodynamic profile from the other schemes. This is due to the increased order in the fixed scheme at this hydration site, with the resulting decreased entropy leading to a less favorable free energy with respect to bulk. Hydration sites B, C, D, and E lie in the phenylalanine pocket and form a conserved square network with few hydrogen bonds per water. This is most marked for hydration sites C and D at the base of the pocket, which have very reduced interaction energies with respect to bulk. However, these hydration sites do not have a high overall entropy with respect to bulk water because of the reduction in waterÀwater correlations in the pocket. Hydration site G lies on the surface of the protein in the same location as the DMSO solvent molecule in the crystal structure. The highly unfavorable free energy for this hydration site may explain why a DMSO molecule is found there in the apo state. Hydration site H also lies on the protein surface above a backbone amide group but is mostly exposed to solvent. It has a more favorable interaction energy than in bulk due to hydrogen bonding, and the reduced waterÀwater correlations at the surface also lead to a favorable entropy with respect to bulk. Displacement of a water molecule from this hydration site by a ligand is predicted to contribute unfavorably to the binding free energy. Formation of a strong hydrogen bond between the ligand and the backbone amide group at this site could lead to a net favorable contribution to the binding free energy whereas a hydrophobic group would lead to a net unfavorable contribution. Hydration site I lies in the alanine pocket, and water molecules within this site have a strong degree of orientational ordering due to the formation of hydrogen bonds with two backbone carbonyls. Hydration site J is on a flat hydrophobic surface and makes weak interactions with the protein. However, its overall interaction energy is only 0.27 kcal/mol higher than in bulk water due to favorable interactions with other water molecules. However, these interactions lead to a strong degree of order and unfavorable proteinÀwater entropy (+1.79 kcal/mol) and waterÀ water entropy (+1.77 kcal/mol) terms. The property of increased ordering around hydrophobic solutes to yield favorable interactions has been likened to the formation clathrate cages and has been used previously to explain the hydrophobic effect. 37 The surface of RadA along with the predicted hydration sites from the restricted simulation can be seen in Figure 4. The sites are colored by hydrophobicity from hydrophobic in blue to hydrophilic in red. Such a view has been used previously to study protein binding sites and to explain binding affinity and selectivity. 22,23 Here it can be used to identify binding hotspot regions and provide a quantitative comparison. The phenylalanine and alanine pockets are clearly visible with blue hydrophobic sites on the left-and right-hand sides, respectively.
As well as studying the effect of the three simulation schemes, we have also considered the effect of other computational parameters in the IFST methodology. In this study we only considered waterÀwater entropies for pairs of hydration sites up to 3.5 Å apart, because of the high computational cost of considering a large number of pairs. We thus looked at the correlation in the waterÀwater pair distance and the waterÀ Figure 3. The molecular surface of the MAYM mutant showing the positions of ten water molecules in the crystal structure and the predicted hydration sites from the three simulation schemes. The oxygen atoms of the crystal structure water molecules are colored red, the hydration sites from the free simulation are colored green, the hydration sites from the restrained simulation are colored dark blue, and the hydration sites from the fixed simulation are colored cyan.  Journal of Chemical Theory and Computation ARTICLE water pair entropy. A graph of the waterÀwater pair distance against the waterÀwater pair entropy for the restrained scheme can be seen in Figure 5. Because of the dependence on the radial distribution function in bulk, the significant pair entropies are found when the distance between the hydration sites is similar to the maximum in the radial distribution function (2.7 Å). No significant pair entropies are found for hydration sites separated by more than 3.2 Å using this methodology. The majority of the pair entropies result from the orientational term, with the largest translational term being only 0.006 kcal/mol. With sufficient data, it would be very instructive to repeat this calculation without the approximations to the correlation functions.
As a final test, we also calculated the change in solventaccessible surface area (ΔSASA) of a carbon atom placed at the centroid of each hydration site. The ΔSASA upon binding is commonly employed as an estimate of the contribution of the hydrophobic effect to binding, so we were interested in how it correlates with the thermodynamic properties of the hydration sites. Figure 6 shows the plot of ΔSASA against the entropic contribution to the free energy (ÀTΔS) for all 65 hydration sites in the restrained simulation. The coefficient of determination between ΔSASA and ÀTΔS is 0.52, suggesting a reasonable correlation, with buried sites tending to have more negative entropies and thus more unfavorable contributions to the free energies. The coefficients of determination for ΔSASA with the interaction energy (0.06) and the total free energy (0.31) were not as high. The ΔSASA for a shape comprised of all 65 hydration spheres was 2167.46, and the sum of the entropic contributions to the free energies for the 65 sites was 62.14 kcal/mol. This corresponds to a value of 28.67 cal/mol/Å 2 , which is consistent with previous estimates used in MMPBSA (38) and MMGBSA (39) of between 5.0 and 50.0 cal/mol/Å 2 .
In summary, the results of this study highlight the importance of the molecular dynamics scheme on the results of IFST and illustrate how the predictions from IFST can be used to understand the thermodynamics of hydration at a protein surface.
' DISCUSSION This paper describes the application of IFST to a prototypical PPI surface. In particular, we studied the effect of freezing or restraining the protein structure during the simulation. This approximation has been applied previously, and we were interested in the effect. The free, fixed, and restrained schemes perform comparably in terms of correctly predicting the location of water molecules in the crystal structure. The fixed and restrained schemes identify the primary hotspot in the phenylalanine binding site as three hydration sites that are entropically unfavorable and strongly enthalpically unfavorable. However, these sites are not identified in the free simulation, as the protein shifts to close the pocket with two methionine residues. This may be due to inaccuracies in the forcefield, but it may, however, represent a lowly populated state of the apo protein that is incorrectly scored and thus overly populated. It may also be due to incorrect pressure in the MD simulation. Creation of the spherical boundary region and simulation in an NVT ensemble are likely to affect the pressure and the density of the water, which could lead to cavitation. All three schemes predict a secondary hotspot in the alanine binding site and also locate a third hotspot, which is filled by a solvent DMSO molecule in the crystal structure.
In general, the locations of the hydration sites are very similar with the three schemes. However, the results predict that fixing the protein significantly restricts movement of water molecules at the surface, and this impacts the predicted density and thermodynamic properties of the hydration sites. In particular, the proteinÀwater entropies decrease when the protein is frozen, and this leads to less favorable free energies with respect to bulk water. Incorporating at least some protein flexibility into the simulation seems to be very important, and this is consistent with recent implementations of IFST. 20,21 However, the effect of the degree and nature of the restraints have not been fully explored, and this remains as an important task for future work. In particular, quantifying hydration thermodynamics in highly flexible protein regions is a significant challenge but a very important one. The findings of our study also suggest that waterÀwater pair entropies need only be calculated for pairs that are less than 3.5 Å apart for this implementation of IFST, as contributions from more distant pairs were found to be negligible. However, due to the dependence on the radial distribution function in bulk, this may not be true in a more complete treatment of waterÀwater pair correlations and should be investigated in further work. It is also interesting that the degree of burial of a hydration site correlates to some degree with the entropy but not with the interaction energy. This suggests that the surface area term of MMGBSA and MMPBSA approaches to calculating binding free energy captures some aspects of solvent entropy changes.
IFST is one of the most important methods to quantify solvent thermodynamics, and it has numerous important potential applications. As shown here, it is ideally suited to scanning a protein surface to locate binding hotspots, and it can also be used to predict PPI surfaces on proteins of unknown function. When combined with a scoring function to compute proteinÀligand interactions, it can also be applied to molecular docking and the computation of proteinÀligand binding affinities. 21,35 This also allows it to be applied to molecular design algorithms for increasing binding affinities. However, in common with other methodologies that utilize MD, this method is highly sensitive to implementation details. This work details one aspect of the implementation that is very important and suggests a number of others. The utility of the method depends on using accurate forcefields, water models, restraints, and simulation parameters. However, the potential of IFST to greatly improve prediction of proteinÀligand binding affinities makes the development of this method a very important goal of computational modeling. Figure 6. A plot of the change in SASA when a carbon atom is placed at each of the 56 hydration sites predicted by the restrained simulation against the calculated total ÀTΔS of that site with respect to bulk water.