Combining Solvent Thermodynamic Profiles with Functionality Maps of the Hsp90 Binding Site to Predict the Displacement of Water Molecules

Intermolecular interactions in the aqueous phase must compete with the interactions between the two binding partners and their solvating water molecules. In biological systems, water molecules in protein binding sites cluster at well-defined hydration sites and can form strong hydrogen-bonding interactions with backbone and side-chain atoms. Displacement of such water molecules is only favorable when the ligand can form strong compensating hydrogen bonds. Conversely, water molecules in hydrophobic regions of protein binding sites make only weak interactions, and the requirements for favorable displacement are less stringent. The propensity of water molecules for displacement can be identified using inhomogeneous fluid solvation theory (IFST), a statistical mechanical method that decomposes the solvation free energy of a solute into the contributions from different spatial regions and identifies potential binding hotspots. In this study, we employed IFST to study the displacement of water molecules from the ATP binding site of Hsp90, using a test set of 103 ligands. The predicted contribution of a hydration site to the hydration free energy was found to correlate well with the observed displacement. Additionally, we investigated if this correlation could be improved by using the energetic scores of favorable probe groups binding at the location of hydration sites, derived from a multiple copy simultaneous search (MCSS) method. The probe binding scores were not highly predictive of the observed displacement and did not improve the predictivity when used in combination with IFST-based hydration free energies. The results show that IFST alone can be used to reliably predict the observed displacement of water molecules in Hsp90. However, MCSS can augment IFST calculations by suggesting which functional groups should be used to replace highly displaceable water molecules. Such an approach could be very useful in improving the hit-to-lead process for new drug targets.


■ INTRODUCTION
Water molecules are a key component of biological systems and act as ordered structural elements at binding interfaces. 1 The mediation of ligand binding by water molecules can have important consequences for binding affinity and specificity. Numerous examples of water-mediated protein−ligand interactions are known, including peptide binding in tyrosine kinase (Src), 2 binding of inhibitors to proteases, 3 and carbohydratebinding proteins. 4 The consideration of individual water molecules in ligand design hinges on an accurate assessment of opposing thermodynamic contributions. This includes the entropic gain of displacing a highly ordered water molecule and the enthalpic loss of breaking water−protein hydrogen bonds. 5 However, assessing the role of individual water molecules at the binding interface is a complex problem, as highlighted by the prediction that there is no direct correlation between the free energy of water molecules in the binding site and the affinity of bound ligands. 6 Despite the difficulty in predicting and interpreting the roles of such water molecules, several attempts have been made at classifying binding site water molecules with respect to the likelihood of their "displaceability." 7−9 High displaceability in this context corresponds to displacement of the water molecule by a suitable chemical group on the ligand with an accompanying favorable change in the binding affinity. There are a variety of methods for identifying and ranking water molecules in binding sites, including physics-based methods and empirical methods. A physical method based on the double decoupling method employing thermodynamic integration (TI) to replica exchange Monte Carlo simulations was found to be successful in classifying water molecules as displaceable or conserved. 6 However, such an approach requires extremely time-intensive calculations that must be performed on each water molecule individually. This drawback also affects freeenergy perturbation (FEP) techniques, which have also been used to successfully predict the ease of displacement of ordered water molecules in protein binding sites. 10 FEP predictions were found to correlate with the change in affinity for structural modifications that displaced the water molecules. However, the study also noted that a complete thermodynamic analysis is needed in order to accurately compute the effects of ligand modifications. Another approach employed machine learning to create a probabilistic water classifier. 11 This was found to be very good at predicting the location of water molecules and shown to have reasonable predictive power in classifying water molecules as displaceable or conserved. The structural features of water molecules in X-ray crystallographic structures, such as B-factors, accessibility to bulk solvent, number, and strength of protein-water hydrogen bonds, were used to develop multivariate logistic regression models for predicting the displacement of water molecules, yielding a prediction efficiency of 67%. An empirical method has also been developed that employs pseudo-Bayesian statistical analysis on predictions from the HINT scoring function 12 and the Rank algorithm, 13 which is based on the number and geometric quality of hydrogen bonds for each water molecule. The method was found to be particularly useful for identifying strongly conserved water molecules. 14 An alternative to the approaches described above is the use of inhomogeneous fluid solvation theory (IFST). 15 IFST can be used to create a thermodynamic profile of the solvent surrounding a protein, 16,17 which can be used to identify binding hotspots. 18 The free energies calculated by IFST are the contributions of regions of three-dimensional space to the solvation free energy of the solute. 19 When this region is a highly occupied hydration site, ΔG can be thought of as the contribution of a specific water molecule to the solvation free energy of the solute. IFST has shown significant utility in drug development via Schrodinger's WaterMap software. WaterMap has been used to explore the hydrophobic effect, 20 understand SAR, 21 explain kinase selectivity, 22 and predict binding affinities. 23 The advantage of IFST over other physics-based methods such as TI and FEP is that one simulation yields predictions for all the water molecules in a system. Recent work on IFST has focused on binding site prediction, 24 discretization on a Cartesian grid, 25 and investigating the effect of using different water models. 26 In this study, we employ IFST to predict the experimentally observed displacement of water molecules in a protein binding site.
When displacing a water molecule from a binding site in molecular design, consideration must be given to the functional group that replaces it. Multiple Copy Simultaneous Search (MCSS) is one of the oldest methods to predict energetically favorable positions of functional groups in protein binding sites. 27 The MCSS scoring function is based on the CHARMM force field 28 with interaction energies calculated from the difference between the bound and unbound states. The solvent effects are commonly taken into account by an approximation of dielectric screening from a distance-dependent dielectric model. An alternative approach to account for solvent effects is to couple the molecular mechanical energy component with polar and nonpolar solvation free energies derived from implicit solvent formalism in MM-PB/SA or GB/SA methods. 29,30 Implicit solvent approaches have gained widespread application in virtual screening, rescoring of docking poses, and estimation of ligand binding energies in ligand series. 31−36 The sorting of MCSS minima based on MM-PB/SA derived free energy estimates has yielded encouraging results. 37 MCSS analysis using probe molecules with different chemical characters provides what have been termed functionality maps. These functionality maps have been used for construction of ligands for numerous protein targets. 38 Other approaches have also been developed for computational solvent mapping and probe analysis. FTMap searches for favorable probe positions on protein surfaces using an empirical energy function including a desolvation term. 39 A clustering procedure is used to identify consensus probe binding sites which are identified as fragment-binding hotspots. 14 The integral equation theory of liquids, known as the threedimensional reference interaction site model (3D-RISM) 40,41 has also been used to find most probable positions of functional groups and small ligands on protein surfaces. 42 This method has been applied to thermolysin with solvent probes for which experimental data were available, and reasonable agreement with experimental results was obtained. 43 More recently, methods employing molecular dynamics (MD) simulations using a mixture of explicit water and functional group probes have been introduced. 44 Guvench and MacKerell used a solvent mixture consisting of propane−benzene−water to construct probability density maps of probe binding preferences which corresponded to known ligand functional group preferences. 45 Seco and co-workers used a water−isopropanol mixture and evaluated druggability from their binding propensities on protein surfaces. 46 In other studies, more diverse sets of probe molecules have been used to assess the druggability of allosteric protein targets 47 and binding hotspots. 48 An important feature of these developments was the combined use of probe clustering and water displacement as indicators of hotspots. As summarized above, MD simulation based protocols using solvent-probe mixtures have been used previously to identify binding hotspots and predict target druggability. However, a direct assessment of relative displaceability of ordered water molecules has not been addressed in these studies. An effective method for providing such assessments would prove valuable in the hit-to-lead and lead optimization stages of drug development. In addition, an accurate ranking of water molecules making bridging interactions between the protein and the ligand can be also be used to make judicious choices about keeping or removing water molecules prior to molecular docking methods for virtual screening. 49−52 The estimation of thermodynamic properties of water molecules at binding interfaces can provide very useful information for ligand design. The main aim of this study was to investigate the combination of hydration site free energies calculated by IFST with the MCSS scores of probe molecules in order to predict the observed displacement of water molecules. To calculate an experimental measure of displaceability, we used a large collection of ligand-bound structures of Heat shock protein 90 (Hsp90). The observed displacement was calculated from the frequency with which water molecules from the apo structure were displaced by a ligand. Hsp90 is well-known for its function as a molecular chaperone. Due to its important role in assisting protein folding, preventing self-aggregation, and cell cycle progression, it has been established as a valuable target in anticancer drug development. The structure of Hsp90 contains a highly conserved N-terminal domain that is linked, via a highly flexible linker, to a middle domain and a C-terminal domain. 53 The N-terminal domain contains an adenosine binding pocket, responsible for its ATPase activity, and has an unusual motif known as Bergerat fold, which is characterized by a rigid adenosine binding site and a flexible loop in phosphate binding region. 54 The ATPase activity of the N-terminal domain drives structural transitions required for chaperone functioning, and therefore it has been targeted in several drug discovery campaigns. 55 Water molecules have been a key aspect of structural studies on Hsp90, 56 and a key consideration in ligand design programs has been the role of four key water molecules in the binding site that are highly conserved in several complex structures. 57,58 Their systematic displacement and the resulting consequences on ligand optimization have also been the subject of recent investigations. 59,60 These factors make Hsp90 an ideal test case to assess the predictions of IFST.

■ MATERIALS AND METHODS
Crystal Structure Analysis. A data set consisting of 103 ligand-bound X-ray crystallographic structures of Hsp90 (Nterminal domain) was compiled, based on Roughley and Hubbard's work. 58 The structures can be organized into four groups: those based on a resorcinol substructure (41), those based on a (benz)-amide substructure (14), those based on an aminopyr(im)idine substructure (39), and those based on "second-site" fragments (9). Due to modifications around central scaffolds, this data set provides a systematic exploration of water displacement in the binding site. The set of crystal structures used in this work is reported in the Supporting Information. In addition, a crystal structure of the unbound closed-state conformation of Hsp90 (1YER) was used as the apo form of the receptor. All the structures were moved to the same reference frame by a rigid-body superimposition onto the ligand-bound crystal structure from 1UY6. From the superimposition, any water molecule in the apo structure lying within 2.0 Å of any of the ligand atoms in any structure was selected for subsequent analysis. For each of these reference water molecules, the observed displacement fraction (D) was defined from the fractional conservation (F).
F is obtained by counting the number of ligand-bound structures where a corresponding water molecule was present within 1.0 Å (N hydrated ) and then representing it as a fraction of total structures in the data set (N total ). Structure Preparation. The protein structure of the apo form was initially prepared as follows. Atom coordinates were taken from the Protein Databank 61 entry 1YER. 62 This structure is of Hsp90 in the closed state. All crystallographic water molecules were deleted from the structure. The orientations of asparagine and glutamine residues were then checked using Schrodinger's Preparation Wizard. 63 Residues Asn40, Asn105, Gln123, Gln133, and Gln194 were altered by switching the nitrogen and oxygen atoms to improve the hydrogen-bonding patterns. Histidine residues were also checked for orientation and protonation state using Schrodinger's Preparation Wizard. Residues His154, His189, and His210 were flipped and assigned as epsilon protonated. All remaining histidines were assigned as delta protonated, but residue His77 was flipped. The residues lysine, arginine, aspartate, glutamate, cysteine, and tyrosine were also analyzed to check their protonation state. There was no evidence of any unusual protonation states, and thus all lysine and arginine residues were assigned as positively charged, all aspartate and glutamate residues were assigned as negatively charged, and all cysteine and tyrosine residues were assigned as neutral. The hydrogen-atom positions were then built using the HBUILD facility of CHARMM with the CHARMM27 energy function, 64,65 and the atomic charges were assigned from the CHARMM27 force field. 64,65 MCSS Calculations. The prepared protein structure was then subjected to MCSS calculations using eight different probe molecules. The probes were chosen to ensure that energetically favorable positions of polar, charged, hydrophobic, and aromatic functional groups were sampled. Two representative probes from each group were selected: methanol and ammonia (polar), ammonium ions and acetate ions (charged), methane and propane (hydrophobic), plus phenol and benzene (aromatic). All parameters for the probes were assigned using the MMFF94 force field. 66 The binding site region for MCSS calculations was defined as a 12.0 Å sphere around the centroid of all overlaid ligands. For each probe molecule, 500 copies were randomly placed in this sphere where any copy within 1.2 Å of any protein atom was removed. Energy minimizations were carried out in stages, and duplicate copies (within 0.2 Å of another copy) were removed at the end of each stage. The first stage was 500 steps of steepest descent; the second stage was 300 steps of steepest descent, and the final stages were 20 repetitions of 500 steps of conjugate gradient minimization. All minimizations were performed using a distance-dependent dielectric model with the default dielectric constant of 1.0. At the end of the MCSS run, probe copies with CHARMM interaction energies greater than +10.0 kcal/mol were removed. Finally, probe copies within 1.0 Å of each other were clustered together, and the copy with the lowest interaction energy was selected as the cluster representative. All of the above calculations were performed using the MCSS implementation in Accelrys Discovery Studio 3.1.
The raw probe scores from MCSS consist of the CHARMM interaction energy between the probe molecule and protein.
These raw scores were size-normalized by multiplying by the ratio of water to probe molecular volume. This allowed us to compare the scores of probes with different sizes together. In two variations of the protocol, the normalized MCSS scores were combined with two different desolvation penalties: the experimental hydration enthalpy (MCSS-HE) and the experimental hydration free energy (MCSS-HFE). The experimental values of hydration enthalpy and hydration free energy for each probe were obtained from the literature. 67−70 The MCSS derived probe positions were compared with the reference water molecules using the following procedure. A probe position was identified as corresponding to a reference water molecule if any of the probe heavy atoms was less than 1.5 Å from the oxygen atom of the water molecule. When multiple probe positions were obtained for a given water molecule, the best scoring pose was selected.
MM-GBSA Calculations. In a variation of the scoring protocol for MCSS-derived probe positions, the binding energy of each probe was calculated using an MM-GBSA approach: 71 The free energy of each of the above terms is calculated from E MM is the energy calculated from the standard CHARMm energy function, which includes bonded and nonbonded terms. G elec and G np represent the electrostatic and nonpolar components of solvation free energy. G elec represents the electrostatic component of solvation energy and was calculated using a variation of the standard GBSA model, referred to as the Generalized Born method with Molecular Volume integration (GBMV). The nonpolar contribution (G np ) to solvation free energy was calculated on the basis of the Surface Area (SA) model, which assumes a linear relationship between G np and the solvent accessible surface area. 72 A value of 5.0 cal/ mol/Å 2 was used for the surface area coefficient in the SA model. The change in the solute entropy (TS) terms was assumed to be zero in this case. The MM-GBSA scores were also size-normalized using the method detailed above.
IFST Setup. For the MD simulation, the apo protein was first solvated with water molecules. Solvation was performed with the SOLVATE program, 73 version 1.0 from the Max Planck Institute to generate a sphere of radius 50 Å around the protein center. The system was then cut to form a rhombic dodecahedron (RHDO) with an edge length of 60 Å using the CHARMM program (version 34b1). 28 Water molecules were modeled with the TIP3P-Ewald, 74 TIP4P-2005, 75 and TIP5P-Ewald 76 water models in three separate systems. The water model has been shown to affect free-energy calculations in recent studies. 77−79 Six sodium ions were included in all three systems to yield a net charge of zero. The ions were placed far from the Hsp90 binding site.
Equilibration. During all MD simulations, the heavy atoms were harmonically restrained using a 1.0 kcal/mol/Å 2 harmonic force, the RHDO was treated using periodic boundary conditions, and the electrostatics were modeled using the particle mesh Ewald method. 80 The system was first subjected to MD equilibration for 100 ps in an NPT ensemble. This stage of preparation was undertaken to equilibrate the density of the water molecules. The density of the water molecules plays an important role in IFST and is thus important to equilibrate. The system was then subjected to MD equilibration for 1 ns in an NVT ensemble. We ensured that the system was brought to equilibrium before continuing by verifying that the system reached a point where the energy fluctuations were stable.
Molecular Dynamics. Production simulations were performed for 5.0 ns at 300 K. All MD simulations were performed using the NAMD program version 2.8 28 with the CHARMM27 force field 64,65 allowing an MD time step of 2.0 fs. Electrostatic interactions were modeled with a uniform dielectric and a dielectric constant of 1.0 throughout the setup and production runs. van der Waals interactions were truncated at 11.0 Å with switching from 9.0 Å. All MD simulations were performed using NAMD compiled for use with the CUDAaccelerated GPUs. The 5.0 ns MD simulations required approximately 15, 20, and 25 h for the TIP3P-Ewald, TIP4P-2005, and TIP5P-Ewald water models on two four-processor GPU cores.
Hydration Site Identification. The first stage of the IFST analysis was to identify regions within the binding site with a high number density of water, termed hydration sites. 17 We defined the center of the binding site as the centroid of the two water molecules W331 and W440 and considered the region within 12.5 Å of this centroid. We employed a radius of 1.2 Å for these hydration sites, in line with prior work. 16 Hydration sites were identified by two methods. The first method was to place hydration sites at crystallographically determined water positions, and the second was to cluster water molecules from the MD simulation. For clustering, hydration sites were selected by sampling 5000 equally spaced snapshots from the MD trajectory and calculating the number density at Cartesian gridpoints within the binding site. A grid with a resolution 0.5 Å was generated around the centroid of the binding site. The number of water molecules within 1.2 Å from all 5000 snapshots was used to calculate the number density, and hydration sites were selected iteratively, starting at the gridpoint with the highest number density. No gridpoints were selected within 2.4 Å of an existing gridpoint, and the iteration terminated when the remaining gridpoints had a number density lower than 50% of the bulk value. The hydration site positions were then calculated as the mean position of all water molecules within 1.2 Å of each selected Cartesian gridpoint. The clustering required approximately 21, 27, and 34 h for the TIP3P-Ewald, TIP4P-2005, and TIP5P-Ewald water models on a single processor.
Energy Calculations. The interaction energy of each hydration site was calculated by sampling 1000 snapshots with one every 5.0 ps from the 5.0 ns simulation. The average interaction energy of every water molecule within the site was computed from the interaction with the solute (E sw ) and the other water molecules (E ww ). This was then combined with the average interaction energy of a water molecule determined from the bulk water simulation (E bulk ) to calculate the energy difference (ΔE) and binding energy difference (ΔE bind ) as shown in eqs 5 and 6, respectively: The average number of water molecules in a hydration site is termed n. From a corresponding simulation of bulk water, E bulk takes values of −9.81 kcal/mol, −11.33 kcal/mol, and −9.67 kcal/mol for the TIP3P-Ewald, TIP4P-2005, and TIP5P-Ewald water models. The binding energy can also be termed the normalized energy or the per water energy.
Entropy Calculations. The entropy of each hydration site was calculated by sampling 250 000 snapshots with one every 20 fs from the 5.0 ns simulation. The entropy difference was calculated from the contributions of the solute−water term (S sw ) and the water−water reorganization term (ΔS ww ). These terms can be calculated by integrating over the solute−water g sw (r,ω) and water−water g ww (r,r′,ω,ω′) correlation functions. The variable r represents the position of the water molecule with respect to the center of the hydration site, and the set of angles ω represents the orientation of the water molecule in the fixed protein reference frame. As in previous work, only contributions from two particle correlations were considered, 16,17 and vibrational entropy changes are not considered. Furthermore, the solute-water term was separated into translational (S sw,trans ) and orientational (S sw,orient ) contributions, and the orientational distributions were assumed to be independent of the position within the sites. 16 These terms can be calculated for each hydration site as shown in eqs 7 and 8, respectively: ∫ ω

Journal of Chemical Information and Modeling
In these equations, k is Boltzmann's constant, ρ°is the number density of bulk solvent, and Ω is the integral over the angles ω. The solute−water translational correlation functions were calculated using a bin size of 0.03 Å for the radial component, 10°for the φ angle and 1/18 for cos θ. The solute−water orientational correlation functions were calculated using a bin size of 10°for the φ and ψ angles and 1/18 for cos θ. The water−water reorganization term was also separated into translational (S ww,trans ) and orientational (S ww,orient ) contributions. The water−water translational pair probability densities were calculated as previously between water molecules in the hydration site and a set of subvolumes surrounding the hydration site. 26 A sphere of radius 3.6 Å around the hydration site was split into subvolumes using a bin size of 0.1 Å for the radial component, 12°for the φ angle and 1/15 for cos θ. The 3.6 Å cutoff is used because contributions to the translational entropy outside the first solvation shell are expected to be small. 81 Due to the vast amount of data required for calculation of the solute−water−water triplet correlation function, a bulk water−water radial distribution function g ww (R) was assumed as previously. 15 g sw (r′) is the translational probability density with respect to bulk water within a subvolume. For each hydration site, S ww,orient was calculated from the mutual information (I ww ) between the orientations of water molecules in the hydration site and the orientations of water molecules in subvolumes surrounding the hydration site. A sphere of radius 3.6 Å around the hydration site was split into subvolumes using a bin size of 1/4 for cos θ and a bin size of 45°for the other Euler angles. The mutual information was then calculated between the hydration site and each of the subvolumes: S ww (ω rel ) is the water−water relative orientational entropy, where ω rel is the relative orientation of two water molecules using a bin size of 10°. The full five-dimensional relative orientational entropies were estimated by using the second order entropy approximation generated by a truncation of the mutual information expansion. 82,83 These solute−water and water−water terms were then compared with the entropy of a water molecule determined from the bulk water simulation (S bulk ) due to other water molecules within 3.6 Å to calculate the entropy difference (ΔS) and entropy of binding (ΔS bind ) using eqs 13 and 14, respectively:  (14) TS bulk takes values of +3.54 kcal/mol, +3.94 kcal/mol, and +3.85 kcal/mol for the TIP3P-Ewald, TIP4P-2005, and TIP5P-Ewald water models based on the calculations described here. It is important to note that the values of a half in eq 13 have been removed from the definitions for S ww,trans and S ww,orient in eqs 9 and 10 (which has been the convention in previous work) such that eqs 13 and 14 correspond to eqs 5 and 6.
Free Energy Calculations. For each hydration site, the difference in free energy (ΔG) and the free energy of binding (ΔG bind ) is then calculated using eqs 15 and 16, respectively: ΔE is assumed to be a good approximation for ΔH, as the PV term is expected to be negligible. The IFST calculations required approximately 7 min per hydration site.
Linear Regression Analysis. We hypothesized that the combination of best probe score and the corresponding water score should be reliable predictors of the observed displacement measure defined in this study. In order to assess this relationship, multiple linear regression analysis was performed in the R statistical computing package: where β o , β 1 , and β 2 are coefficients and e represents the error term. Several models of this form were constructed using different formulations of the probe and hydration site scores. The probe scores corresponded to MCSS, MCSS-HE, MCSS-HFE, and MM-GBSA scores. For the hydration site score, seven IFST calculated properties were used: ΔG, ΔG bind , ΔE, ΔE bind , ΔS, ΔS bind , and E sw . The predictive power for each model was expressed as the adjusted coefficient of determination (R̅ 2 ), which is given by where n is the number of samples and p is the number of variables in the model. The use of R̅ 2 corrects for improvement in correlation arising just by the increase in number of variables. In all other instances, where correlation between any two variables is reported, we used the standard R 2 values.

■ RESULTS
In the initial IFST analysis, hydration sites were placed on each crystallographic water molecule. This facilitates a direct comparison of observed displacement with various thermodynamic properties. The results of this analysis using water model TIP4P-2005 are summarized in Table 1. All of the crystallographic water positions bound favorably to the protein surface, as indicated by the corresponding hydration sites with negative ΔG values. Favorable ΔG values have also been noted in previous applications of IFST to proteins 16 and model systems. 77 The free energy of hydration sites exhibited a moderately strong correlation with the observed displacement as indicated by an R 2 value of 0.57. Interestingly, some of the hydration sites with a highly favorable free energy corresponded to water molecules (e.g., W301 and W323) that are involved in bridging interactions and are retained by most of the ligands (Figure 1). Among the other thermodynamic properties of hydration sites calculated from IFST, E sw and ΔE also demonstrated strong correlation with the observed displacement, with R 2 values of 0.62 and 0.58, respectively. The correlation was weak for other binding quantities ΔG bind , ΔE bind , and ΔS bind . This is not unexpected, as the binding quantities do not consider the occupancy of the hydration site, which can be low or high. The correlation was also weak for ΔS, and this was not expected. However, the predictions are that enthalpic contributions tend to outweigh entropic contributions in determining the free energy. This is an interesting result that has also been noted in previous applications of IFST to proteins 16 and model systems. 77 However, it may not be general for all systems or for all force fields. It is interesting to note that one of the hydration sites (W336) consistently appeared as an outlier in all correlation plots (Figure 2). The removal of this hydration site from the data set resulted in substantial improvement of correlation between observed displacement and thermodynamic properties, as indicated by R 2 values of 0. 78  presumably because the strong electrostatic interactions are necessary to overcome the strong interactions made by the water molecule. Thus, displacement of W301 must overcome even stronger water interactions and is rarely observed. This explanation provides the rationale for employing MCSS calculations, to allow the strength of interactions with water to be compared with the strength of interactions with probe molecules.
We also compared the effect of other water models in obtaining these quantities for hydration sites. The detailed results for each water model are given in the Supporting Information (Table S2). A comparison of the linear correlation between observed displacement and ΔG values from different models is shown in Figure 2. All three water models produced very similar results, with TIP3P-Ewald producing a very slightly lower correlation and TIP4P-2005 producing a very slightly higher correlation. From this point onward, the discussion of IFST results is based entirely on the TIP4P-2005 model for this reason.
In addition to placing hydration sites at crystallographic water positions, we also assessed the performance of the clustering protocol. The density cutoff was set to half the bulk density in order to identify all hydration sites. It was noted that some crystallographically observed water molecules have approximately the same number density as bulk water and were not predicted using a higher cutoff. Such hydration sites exhibit clustering of water molecules, leading to translational ordering and an unfavorable entropy contribution, but have a weakly favorable enthalpic component and thus a free energy that is approximately the same as bulk water. In terms of reproducing the crystallographic water positions from the simulation, the success rate (expressed as a percentage of crystallographic water molecules reproduced within 1.0 Å) was similar across different water models with TIP4P-2005 having the highest success rate of 80% (Table 1 and Table S2). Clustering identified 18 of the 20 displaceable water molecules within 1.5 Å and 16 within 1.0 Å. The failures all corresponded to crystallographic water positions with low occupancy and small free energy contributions as computed by IFST from the experimentally determined positions. This resulted in a slightly lower correlation with observed displacement (R 2 of 0.50) and suggests that placing hydration sites at crystallographic water positions may be more predictive.  The results of the MCSS calculations are summarized in Table 2. For each water position, the best scoring probe is reported using different formulations of MCSS. The raw MCSS scores reflect the CHARMM interaction energy between the probe and the protein, and desolvation penalties were applied to these raw scores. We tested the use of the hydration enthalpy (MCSS-HE) and the hydration free energy (MCSS-HFE) as desolvation penalties. The resulting scores were then normalized according to the relevant probe molecular volume. Finally, volume scaled MM-GBSA scores are also reported. The original MCSS output indicated a strong preference for charged groups for the majority of the hydration sites. This is most likely due to the crude approximation of solvent-screened electrostatic interactions in the distance-dependent dielectric model. The addition of desolvation penalties somewhat corrected for this trend by replacing nonpolar or polar probes in at least eight water positions. The MM-GBSA scores demonstrated the most significant differences, identifying polar molecules such as methanol and phenol as the most suitable probes for the majority of the hydration sites.
The important point to note from the MCSS analysis was that none of the different formulations had a significant correlation with observed displacement (Table 3). This is not unexpected, since capturing displaceability of water molecules solely from probe binding is unlikely, especially when probe scores contain crude approximations for solvation and entropic effects. The correlation between the different formulations of probe score and the predicted thermodynamic properties of water molecules was also mostly weak or insignificant. Volumecorrected MCSS scores showed R 2 values of 0.29 and 0.39 with ΔG and ΔG bind of the corresponding hydration site. When best scoring MCSS probes are divided into polar and nonpolar categories, the correlation of scores with most of the IFST based properties is in opposite directions. For instance, polar probes have a correlation of 0.21 and 0.24 with ΔG and ΔH, respectively. On the other hand nonpolar probes have a correlation of −0.60 and −0.59 with ΔG and ΔH, respectively. This partially explains the nearly insignificant correlation when probes are considered together. In general, it was noticed that hydration sites with a large enthalpic component were associated with charged or polar probes whereas hydration sites with a small enthalpic component were associated with nonpolar probes.
In order to qualitatively assess the MCSS results, we compared the distribution of probes in the binding site with that of the ligands. For each hydration site, ligand atoms within 1.0 Å of the crystallographic water molecule were obtained, which provided the distribution of atoms that most likely displaced it. These atoms were then divided into polar (N, O, S, P) and nonpolar (C) groups. The same procedure was then repeated for probes which provided the predicted distribution of functional groups around each hydration site. In Figure 3, these distributions are represented as an occupancy map on a 1 Å grid, where each grid point contains the amount of time it was occupied by an atom, represented as a fraction of the total number of atoms within 1 Å of the hydration site. First, we selected a set of hydration sites with highly favorable free energy ( Figure 3A and B) and relatively high observed displacement. For at least three of these hydration sites (W325, W346, and W412), polar atoms comprised more than a 1/3 fraction of the displacing groups. W324 and W529 were displaced less frequently by polar groups. When compared with probe occupancies, this trend was somewhat reproduced. An important difference was that probe polar atoms more frequently occupied W324 and W529. On the other hand, probe polar atoms were less frequently observed for W346, which is frequently displaced by a polar ligand atom. The overlap between the ligand and probe occupancy maps was more pronounced for a set of less favorable hydration sites ( Figure 3C and D), which were most frequently displaced by nonpolar functional groups (both in crystallographic ligands and probes). This can be rationalized on the basis that the optimal van der Waals interactions between (ligand/probe atoms and protein atoms) are more accurately reproduced by MCSS than the electrostatic interactions. Taken together, these results showed that hydration sites with highly favorable free energy (owing mostly to a strong enthalpic component) were displaced frequently by polar ligand atoms, and MCSS-based predictions for probes showed a similar trend. Conversely, hydration sites with less favorable free energy (with a decreased enthalpic but comparable or even large entropic component) were mostly displaced by nonpolar ligand and probe atoms. The number of distinct probes occupying the same energetically favorable site has been used before as an indicator of binding hot spots. 47,48 Prior to selection of the best scoring probe, we obtained the number of distinct probes identified at each hydration site and compared it with the observed displacement. The resulting correlation plot gives a weak R 2 value of 0.11. Hence, a clear relationship between diversity of probe binding and the likelihood of water displacement was not established from these data.
The results obtained from IFST calculations showed that the predicted free energy of hydration sites was a fairly reliable predictor of observed displacement. In order to assess whether the combination of IFST and MCSS probe analysis could improve the predictions, we used multiple linear regression models based on different combinations of hydration site thermodynamic properties and probe scores. The resulting data are summarized in Table 4, where the adjusted coefficient of determination is reported for each model. In most cases, the combination did not yield any improvement. The use of desolvation penalties derived from experimental data also did not lead to any improvement in predictions. However, as noted above, when the outlier W336 is removed from the data set, the adjusted R 2 values increased in some cases. For example, the highest value of adjusted R 2 (0.80) was obtained from a model combining E sw values from IFST and probe GBSA scores. This also indicated that the use of GBSA to estimate probe desolvation penalties provided better results than constant desolvation penalties based on experimental hydration enthalpy or free energy values. An important consideration in ligand design is the prioritization of ordered water molecules in the binding site for displacement and concomitant affinity gains. We selected a set of four water positions to assess the applicability of solvation thermodynamics and probe analysis in order to address such questions (Figure 1). Three of these water positions are already included in the set of reference water molecules derived from the apo structure (1YER). However, one additional water molecule is visible only in ligand bound Hsp90 (e.g., 1UY6). For consistency with previous studies, we refer to these water positions as W1, W2, W3, and W4. 60 The free energy of hydration sites corresponding to these water molecules, the best scoring probes (based on GBSA scores), and associated data are summarized in Table 5. These water molecules form a buried and tightly coordinated network of hydrogen bonds with the protein and/or ligand (Figure 1). The combined regression model based on IFST and MCSS predicts W2 to be the most displaceable in this group, whereas W2 and W3 are predicted to be moderately displaceable, and W1 is predicted to be very difficult to displace. The probe analysis predicts that W2 and W4 could be replaced by methane, whereas the W1 and W3 sites were most favorably filled by ammonia and methanol, respectively ( Figure 4B and D). A collective picture emerges from the probe analysis that the binding site region containing W2 and W4 accommodates functional groups of nonpolar character, but W3 needs to retain hydrogen bonding interactions made by the displaced water molecules. This is also suggested by the good interaction energy (E sw ) for the W3 site.
The IFST predictions agree well with experimental data on binding affinities for a pyrrolopyrimidine scaffold. Commensu-   rate with this is a high predicted displacement, and indeed displacement of W2 by a nitrile group was shown to increase binding affinity approximately 250-fold ( Figure 4C). 59 MCSS suggests a methyl probe is the most favorable replacement for W2 (with an MMGBSA score of −4.55 kcal/mol), but the methanol probe also scores highly (with an MMGBSA score of −4.13 kcal/mol), and this is consistent with displacement by a nitrile group that makes a hydrogen bond with Asn51. Thus, the experimental observations can be explained by IFST and MCSS due to the low ΔG for W2 in concert with a good probe score (Table 5). For the nitrile derivative, further displacement of W3 by a methyl group did not increase binding affinity. This was attributed to the attenuation of affinity gain by the loss of hydrogen bonding interactions for W3 and the clash of the new side chain. The rationalization based on IFST predictions is that the lack of significant gain in affinity is because W3 is not as easy to displace as W2 and that a nonpolar group is not the optimal replacement. Displacement of W3 alone by a polar substituent is predicted to be more fruitful. Further displacement of W4 by extending the methyl group to an ethyl group was shown to increase binding affinity 3-fold ( Figure 4C). IFST and MCSS again rationalize this, with W4 predicted to be moderately displaceable by a nonpolar group such as methane.
In addition, there is a small nonpolar cavity behind W3 that does not contain a water molecule. Such evacuated regions can provide unexpected boons in binding affinity, 84 and extension of the ethyl group to an isopropyl group may prove beneficial. It is interesting to note that the model predicts that W2, W3, and W4 are significantly more displaceable than has been observed experimentally. The recent experimental data 59 suggest that IFST is able to identify untapped potential at these sites. This would be very useful information in other established drug development projects. In order to further investigate the relationship between X-ray crystallographic ligands and thermodynamic properties of hydration sites along with associated functional groups, we selected a subset of unfavorable hydration sites including W338, W357, W379, W381, W385, W405, W435, W476, W536, and W547. The choice of these hydration sites was based on the fact that most of these are almost always displaced by the ligands (Table 1), and their calculated free energies lie in the range −0.5 kcal/mol to −2.5 kcal/mol. The remaining hydration sites had free energies less than −4.0 kcal/mol. The positions of the corresponding X-ray crystallographic water molecules and the most favored MCSS probes, based on MM-GBSA scores, are shown in Figure 5. The majority of sites are occupied by nonpolar groups, though some lower energy hydration sites are occupied by polar groups. This suggests an overall trend of prioritizing lower energy hydration sites for displacement with polar groups and high energy hydration sites with nonpolar groups. We performed a retrospective analysis of the different ligand series used in this study to see if a similar trend could be observed.   In addition, we noticed that W405 and W435, which are relatively favorable hydration sites in this subgroup, are located in a hydrophobic cavity lined by Ala55, Ile96, Gly97, and Ile107. In the X-ray structure, the only hydrogen bond observed for this pair of water molecules is between W435 and the backbone carbonyl of Leu107. The MCSS predicted probes reproduce this pattern by favoring a methane in place of W405 and a phenol group in place of W435, retaining the hydrogen bond of W435 with the backbone carbonyl of Leu107. Such binding motifs where hydrogen bonding functionalities are enclosed in a hydrophobic environment can provide binding hotspots. 17 The combination of IFST and MCSS can provide valuable information in this regard. Many of the potent Hsp90 inhibitors displace this pair of water molecules with a combination of polar and nonpolar groups, as predicted by probe analysis (Figure 6B, 7C,D). Figures 6−8 provide different scenarios in ligand design where predicted displacement of water alongside probe analysis could prove useful. We emphasize that binding site hydration was not explicitly addressed during ligand design in these  examples from the literature. However, the results are useful in understanding the utility of IFST and MCSS. In Figure 6, representative ligands from various stages of design (fragment/ virtual screening hits 6A and B plus the clinical candidate 6C) are shown, which highlight the observation that stepwise disruption of binding site hydration (although not attempted in this particular ligand series) is a suitable route toward optimization. It is notable that all of the ligands in this study displace W357, which IFST predicts to make the least favorable contribution the solvation free energy among the 20 waters studied. W536 is predicted to make the second least favorable contribution to the solvation free energy and also appears to be a clear binding hotspot. The weaker inhibitors in Figure 6A and B do not displace it, whereas the high affinity clinical candidate in Figure 6C does displace it. This suggests that further affinity can be gained by molecules that do not displace W536, such as the inhibitor in Figure 8B. A systematic approach to ligand optimization should involve an assessment of binding site water molecules and the chemical functionalities suitable for displacing them. Figure 7 provides an example where core scaffolds from two distinct chemotypes (resorcinol in Figure 7A and B plus benzamide in Figure 7C and D) bind at similar locations, displacing the same hydration sites, W357 and W405, representing potent hydration sites for displacement. Apart from these two sites, different functional groups in each of these optimized candidates displace different water molecules. Consequently, the similarities between these functional groups and MCSS-predicted probes are interesting, e.g., methanol probes at W385 and W536 compared against the ligand in Figure 7A and phenol at W381 compared against the ligand in Figure 7C. Finally, in fragment linking approaches (shown in Figure 8A and B), the choice of linker can benefit from consideration of probes favorably displacing a water molecule. For example, the methylsulfonamide linker between the two fragments shown in Figure 8A displaces W381 in Figure 8B. MCSS favors a polar phenol probe at this site ( Figure 6B), and the overlap of polar atoms between ligand and probe is noteworthy To summarize, the results suggest that IFST and MCSS can be favorably combined to provide a useful binding site profile. The clustering of probes at a given location on the protein surface has been used as an indicator of binding hotspots, which can give rise to substantial gains in affinity for protein−ligand interactions. However, the use of probe calculations alone can result in false positives as they do not accurately consider the effects of solvation. IFST provides explicit accounting for water molecules, and hence a combination of both approaches can lead to better identification of hotspots in concert with advantageous functional groups for binding. Conversely, high scoring probe positions from MCSS that correspond to energetically favorable hydration sites can indicate strategically important sites where designing bridging interactions will be more appropriate.

■ DISCUSSION
In this study, we have combined IFST calculations of solvent thermodynamics with MCSS probe analysis to predict the observed displacement of water molecules from protein hydration sites on Hsp90 and the best type of functional moiety to replace them, in the context of small molecule ligand design. IFST calculations of the hydration site free energies yielded a moderate correlation with the observed displacement, an encouraging result given the number and variety of ligands considered. Interestingly, the correlation with the total interaction energy (ΔE) is slightly better than the correlation with the free energy (ΔG), and the correlation with the protein interaction energy (E sw ) is even better. This can be rationalized on the basis that the enthalpic contributions to the solvation free energies are calculated to be significantly higher than the entropic contributions and that the loss of water−water interactions is less significant in the context of a ligand in the binding site that has already displaced the neighboring water molecules. This result suggests that approaches such as MCSS, which can calculate the interaction energy of water molecules with the protein, should work well in predicting observed displacement. However, there are two issues with this. The first issue is that the protein interaction energy from IFST is calculated from an explicit water simulation, and calculations based on a single water molecule that is not part of an ensemble will be different. The second issue is that the protein interaction energy from IFST is scaled by the occupancy of the site, and a calculation of the highest scoring single water molecule is not. Such a calculation is more analogous to the IFST binding quantities, which we have shown to have a weaker correlation with observed displacement.
To explore the application of IFST, we also calculated predictions based on three different water models, and in this case the results are very similar. All three water models predict hydration sites at very similar locations, and the R 2 between free energy predictions for the different water models is greater than 0.9 in all cases. In addition to predicting the observed displacement of water molecules, we considered whether MD simulations can reproduce crystallographic water positions. For TIP4P-2005, 18 out of the 20 crystallographic water positions were reproduced within 1.5 Å ( Table 1). The other two water models had similar performance. It is important to note that a complete reproduction of crystallographic water positions is not always to be expected, as crystallographic water positions rely on an assignment of density from sometimes ambiguous data. We also compared two methods for placing hydration sites. The first was to place hydration sites at the positions of crystallographic water molecules, and the second was to place hydration sites by clustering the positions of water molecules from the MD simulations. The two approaches yield hydration sites at very similar positions with almost identical thermodynamic properties. This is shown by the high correlation between the free energies of the set of 18 equivalent hydration sites that are common between the two approaches (R 2 = 0.99). However, placing hydration sites at crystallographic water positions is slightly more predictive of the displacement, because the clustering procedure failures to identify all the crystallographic hydration sites. The IFST calculations show another interesting result, which is that the enthalpic contributions to the total free energy are commonly of greater magnitude than the entropic contributions. This has been noted previously 16,18 and may be a general result of IFST calculations. In contrast to the results of IFST, MCSS proved unsuccessful at predicting displacement and was unable to improve the predictions of IFST when the two were used in concert. However, MCSS does demonstrate a synergy with IFST when applied to ligand design. When IFST identifies a hydration site as highly displaceable, MCSS can predict the ligand functionality that should be used to displace it. Such information is harder to glean from IFST calculations alone, although two points should be noted. The first is that the positions of hydration sites predicted to have a highly favorable free energy are commonly found to overlap with polar ligand atoms. This is easily rationalized, as strong hydrogen bonding is the cause of both effects. In addition, the predicted orientation of the hydrogen bonding interactions for a water molecule is often recapitulated by the geometry of ligands, and this observation could be used in design. Conversely, hydration sites predicted to have a small free energy are commonly found to overlap with nonpolar ligand atoms.
Water displaceability is a difficult metric to capture from experimental data because displacement is Boolean for any given complex. In fact, almost every water molecule should be displaceable by some ligand, but this may be deleterious to the binding affinity. For example, displaceability data may be confounded if medicinal chemists have deliberately attempted to displace particular water molecules in cases where this is deleterious to the binding affinity. This will increase the frequency of displacement for a given water molecule in a case where displacement is not advantageous. For this reason, thermodynamic data may prove more useful in quantifying displaceability, but one still cannot quantify the specific contribution of a single water molecule to the binding affinity. Thus, we consider that the frequency of displacement from a large and varied data set of complexes is the best experimental data available. However, such data are still imperfect due to the bias of design by medicinal chemists. In addition, there are many cases where water molecules are subtly shifted rather than displaced, and this blurs such data. Despite these issues, IFST has proven to be a useful method for studying solvation thermodynamics, providing a perspective that is not available using other computational methods by decomposing solvation free energies into contributions from specific spatial regions. A similar perspective can be gained from FEP, by decomposing solvation free energies into contributions from specific species, including the water molecules. 87,88 FEP can also be used to selectively excite the individual degrees of freedom of water molecules and assess their contribution to the physicochemical characteristics of water. 89 In this work, we have shown that IFST can be used to identify binding hotspots and predict which water molecules should be advantageously displaced from a protein binding site. While a larger study would be needed to show that this is a general result, early results are promising and suggest that such an analysis performed on a new drug target would provide very useful information for ligand design. It may also show utility in identifying untapped potential for well-known drug targets. As noted previously, accurately predicting the effect of a ligand modification upon binding affinity cannot be achieved by considering any one factor and requires a comprehensive thermodynamic analysis of the ligand binding. 10 However, given the vast number of potential ligands to be considered, there is a significant advantage in deriving useful information from IFST, which can be calculated from a single MD simulation.
■ ASSOCIATED CONTENT * S Supporting Information 2-D diagrams of all the ligands in the four series investigated in this study are shown in Table S1, alongside the PDB codes of the corresponding Hsp90 X-ray crystal structures. IFST results for the reference water molecules based on the TIP3P-Ewald and TIP5P-Ewald water models are shown in Tables S2 and S3, respectively. This information is available free of charge via the Internet at http://pubs.acs.org.