Abstract
The calculation of relative free-energy differences between different compounds plays an important role in drug design to identify potent binders for a given protein target. Most rigorous methods based on molecular dynamics simulations estimate the free-energy difference between pairs of ligands. Thus, the comparison of multiple ligands requires the construction of a “state graph”, in which the compounds are connected by alchemical transformations. The computational cost can be optimized by reducing the state graph to a minimal set of transformations. However, this may require individual adaptation of the sampling strategy if a transformation process does not converge in a given simulation time. In contrast, path-free methods like replica-exchange enveloping distribution sampling (RE-EDS) allow the sampling of multiple states within a single simulation without the pre-definition of alchemical transition paths. To optimize sampling and convergence, a set of RE-EDS parameters needs to be estimated in a pre-processing step. Here, we present an automated procedure for this step that determines all required parameters, improving the robustness and ease of use of the methodology. To illustrate the performance, the relative binding free energies are calculated for a series of checkpoint kinase 1 inhibitors containing challenging transformations in ring size, opening/closing, and extension, which reflect changes observed in scaffold hopping. The simulation of such transformations with RE-EDS can be conducted with conventional force fields and, in particular, without soft bond-stretching terms.
Similar content being viewed by others
Introduction
Rigorous free-energy calculations using molecular dynamics (MD) simulations have become an important tool to estimate binding free energies of novel compounds for lead optimization in drug discovery [1,2,3]. Although computationally relatively expensive, these methods are needed to properly account for entropic contributions introduced by protein/ligand conformational changes, entropy–enthalpy compensation, and desolvation of the ligand [4].
Computational free energy calculations typically make use of thermodynamic cycles. For instance, to estimate the binding free energy of five compounds, a “state graph” can be constructed (Fig. 1), where the nodes represent the end states and the edges the free-energy differences between them. Although not impossible [5], the direct calculation of (absolute) binding free-energies (\(\varDelta G^\text{bind}_i\)) is generally very challenging to achieve computationally [1]. A simpler alternative is to calculate the alchemical free-energy differences between two compounds i and j in a given environment (\(\varDelta G_{ji}^{env}\)) and then compare the relative binding free energy \(\varDelta \varDelta G^\text{bind}_{ji}\) with the difference of the \(\varDelta G^\text{bind}_i\) obtained from experiment [6, 7],
Conventional free-energy methods such as thermodynamic integration (TI) [8] and free-energy perturbation (FEP) [9] introduce a coupling parameter \(\lambda\) to define a pathway from end state i (\(\lambda =0\)) to end state j (\(\lambda =1\)). In practice, simulations at discrete intermediate \(\lambda\)-points are performed to obtain converged free-energy differences.
If a (large) series of N compounds is investigated, the free-energy difference for all \((N(N-1))/2\) pairs of ligands would in principle have to be calculated. To reduce the computational cost, automatic schemes have been developed to identify the edges in the state graph (Fig. 1) with the smallest perturbations such that all nodes (for a given environment) are connected [10,11,12]. It is thereby important to include some cycles as cycle closure is a frequently used measure to assess convergence. Nevertheless, manual optimizations may sometimes be required to determine the best sampling strategy [13]. Furthermore, calculating only a subset of the edges leads to a larger uncertainty in the estimated free-energy difference for pairs that are no longer directly connected. As \(\varDelta \varDelta G^\text{bind}_{ji}\) values are often relatively small, the increased uncertainty may negatively impact the usefulness of such calculations in practical applications.
An attractive and more efficient alternative to path-dependent methods is to simulate a reference state, which includes all N end states simultaneously, without the specification of pathways (green rings in Fig. 1). Such a reference state is provided by the enveloping distribution sampling (EDS) [14,15,16,17] method. The EDS reference state can be tuned for optimal sampling with parameters. Note that cycle closure is guaranteed by definition in this approach. In order to enhance sampling further, combinations of EDS with enhanced sampling methods were developed such as replica-exchange EDS (RE-EDS) [18,19,20] and accelerated EDS [21, 22].
In this study, we present an improved automated workflow for RE-EDS simulations that was restructured into two phases. The first phase aims to automatically estimate method parameters that otherwise had to be provided by the user. The second phase automatically optimizes the estimates from the first phase to retrieve a robust parameter set. The final production phase calculates the relative binding free energies of multiple ligands from a single simulation per environment. The robustness and versatility of the RE-EDS workflow are demonstrated on a series of five inhibitors of human checkpoint kinase 1 (CHK1) [23]. These ligands were selected by Wang et al. [24] as a challenging benchmarking set for FEP calculations since the changes between these ligands exemplify different types of core-hopping transformations (i.e. ring size change, ring opening/closing, and ring extension). Special soft bond-stretching terms were developed to be able to handle these transformations [24]. In contrast to other methods, no such special soft bonds are required with RE-EDS as we can use a “dual topology” approach [17] in a straightforward manner.
Theory
Enveloping distribution sampling (EDS)
In EDS, free-energy differences between multiple end states are obtained by sampling a reference-state Hamiltonian, i.e. without the definition of specific alchemical paths [14, 15, 17]. Given N end states, the potential energy function V of the EDS reference state R is defined as,
where \(\beta = (k_\mathrm {B} T)^{-1}\) with \(k_\mathrm {B}\) being the Boltzmann constant and T the absolute temperature. The smoothing parameter s and the energy offsets \(\mathbf{E}^R\) were introduced to enable tuning of the reference state for optimal sampling of all end states [14, 15].
A smoothness parameter \(s=1.0\) gives a reference potential-energy landscape that contains all the relevant minima of the end states. However, these might be separated by high barriers. For \(s < 1\), the energy barriers between different end states \(V_i\) are smoothed in the reference state \(V_R\), increasing the transition rates between the different minima (Fig. 2a) [15]. However, if s is chosen too small, \(V_R\) consists of a global unphysical minimum, which does not correspond to any of the end states. In the limit of \(s\rightarrow 0\), all end states contribute equally to the potential-energy function of the reference state [25], which can lead to unphysical configurations. The situation with a too small s has been termed “undersampling” [17].
The energy offsets \(\mathbf{E}^R\) are used to ensure equal weighting of all end states \(V_i\) in \(V_R\) (Fig. 2b). Note that the optimal values of s and \(\mathbf{E}^R\) are not independent of each other (as can be seen in Eq. (2)) [15]. Different schemes have been proposed to determine optimal reference-state parameters [16, 17, 26], however, these are only applicable to systems with two end states.
The force on a particle k in the EDS reference state is calculated as [15],
For s-values close to one, the reference-state forces are dominated by the one end state, for which the current coordinates are most favourable, while the other end states give high energies and therefore contribute little (i.e. “dummy states”). For small s-values (undersampling situation), all end states contribute effectively to the forces, resulting in the global unphysical minimum.
The free-energy difference between two end states A and B can be calculated by employing the Zwanzig equation twice forming a path via the reference state R [9, 14, 15],
Replica-exchange EDS (RE-EDS)
The recently introduced RE-EDS method [19, 20] is a type of Hamiltonian replica exchange [28, 29] with the smoothness parameter s as the exchange dimension (\(1\geq s>0\)), which was inspired from constant pH simulations by Lee et al. [18, 30]. The approach is shown schematically in Fig. 3. RE-EDS does not require a single (optimal) s-value. Instead, enhanced sampling is achieved by exchanging between the replicas with different smoothness levels. This simplifies the parameter choice problem and thus, the method can be applied to systems with more than two end states [19, 20].
For the pairwise exchanges between neighboring replicas k and l, a Metropolis-Hastings criterion [31] is used [19, 29],
where \(H_{R_k}\) and \(H_{R_l}\) are the reference-state Hamiltonians of the respective replicas, \(\mathbf{r}_k\) and \(\mathbf{r}_l\) are the current coordinates of the replicas.
Replicas are placed between \(s=1.0\) and a lower bound of s, where the reference state is in undersampling. The replicas with low s-values facilitate the transitions between the low-energy regions of the different end states. Especially for systems with slowly adapting environments (e.g. protein binding pockets), regions in s-space with very low acceptance probability can occur. Thus, to ensure sufficient exchanges between all pairs of replicas, a local variant of the round-trip time optimization algorithm [32, 33] was developed to optimally place the replicas in s-space [20]. It was found that a single set of energy offsets can be used for all replicas [19]. However, it is important that these energy offsets are chosen well to avoid “leakage” effects, resulting in one or more end states not being properly sampled [19]. The final free-energy differences are estimated from the replica at \(s=1.0\), which represents the physical minima of the end states.
Automatic parameter optimization
To facilitate the determination of the energy offsets and s-parameter distribution, we have extended and further automated the previous [20] RE-EDS workflow (Fig. 4).
The initial input for a system with N end states consists of a prepared EDS system (i.e. topology, perturbation topology, initial coordinates, and distance restraints), a list of energy offsets of length N with \(E_i^R = 0; ~ \forall ~ i \in [1,...,N]\), and a list of s-parameters, which are logarithmically distributed in the range \(s_i \in [1, 10^{-5}]\). Typically, we use 21 initial s-values.
The parameter exploration consists of three substeps: (i) determining the lower bound for the s-distribution (newly introduced), (ii) obtaining optimized coordinates within the EDS set-up for each end state (newly introduced), and (iii) estimation of an initial set of energy offsets (as done previously in Ref. [19]).
To enable sampling of all end states at \(s=1.0\), some replicas have to be in undersampling to facilitate transitions. However, for efficiency reasons (and numerical stability) the number of replicas M in undersampling should be small and the lowest s-value should be as high as possible. From a short simulation with the initial s-distribution between \([1, 10^{-5}]\), the highest smoothing parameter \(s_{M_\mathrm {us}}\) at which undersampling still occurs is determined and used in the following as a lower bound for the s-distribution. The s-distribution for the next step is then defined by logarithmically distributed replicas between \(s=1.0\) and the automatically determined lower bound.
Optimized coordinates for each end state in the EDS setup can be obtained from short parallel simulations, where one end state in turn is favoured by setting an arbitrarily large energy offset for this state. The optimized coordinates allow the user to start RE-EDS simulations from different end states and are needed for the subsequent parameter optimization.
In the last substep, the \(E_{\text{i}}^{\text{R}}\) estimation, the previously developed parallel energy offset estimation (PEOE) [19] scheme is used to estimate the initial set of energy offsets. This is done based on a short simulation with the initial parameters. For each replica k in the undersampling region, the energy offsets are extracted using [19],
The energy offsets that were extracted in parallel for the k replicas are subsequently averaged and used as initial set of energy offsets. These energy offsets should provide a first solution that is close to the optimal choice of energy offsets, which leads to an optimal state sampling of all end states in the RE-EDS simulation. As the initial energy offsets are obtained from the replicas in undersampling, they may not be exactly optimal and require fine-tuning in the next phase.
In the second step of the RE-EDS workflow, first the s-distribution is optimized and subsequently the energy offsets are fine tuned. The s-distribution is improved by minimizing the round-trip time \(\tau\) and increasing the number of round-trips with the multistate local round-trip time optimization (N-LRTO) algorithm [20]. The optimization is performed in an iterative manner with short simulations. This step is required as exchange bottlenecks between two replicas might occur leading to a very slow round trip time or to no round trips at all. In the N-LRTO algorithm, new replicas are inserted in each iteration by linear interpolation in the s-regions with exchange bottlenecks, while the replica positions of the previous iteration are retained. Adding replicas theoretically increases the round-trip time due to a longer path between the top and bottom replicas. However, the addition of intermediate replicas also increases the exchange probability between neighboring replicas, thus reducing the round-trip time. With the optimization algorithm, we aim to determine the balance between the length of the replica path and the likelihood of exchange between replicas for minimal round-trip time. The exchange bottlenecks are identified for each end state separately (i.e. multistate). The number of replicas added can be chosen by the user. The iteration is stopped when the average round-trip time \({\overline{\tau }}\) converges. The N-LRTO variant is needed for systems for which severe bottlenecks are observed with the initial logarithmic s-distribution (e.g. protein binding pockets). For systems with smaller perturbations, the global multistate variant (N-GRTO) [20] can be more efficient as this algorithm re-distributes the replicas in s-space according to the exchange statistics. In this study, we started with the same number of replicas as used for the PEOE scheme above and added four replica positions per iteration in the N-LRTO algorithm.
After optimizing the number of round trips and \(\tau\), the distribution of the state sampling is improved. To reach the ideal situation that each end state is sampled to an equal amount, the initial energy offsets need to be fine tuned, while keeping the round trips approximately constant. For this, we introduce here the energy offset rebalancing scheme. To avoid overshooting, a correction factor is calculated and applied iteratively,
where \(f_i^{\text{mc}}\) is the current sampling fraction (or estimated probability) of an end state contributing to \(V_R\), and \(f^{\text{mc,ideal}}\) is the ideal sampling fraction (see “Analysis” section). To make the approach more robust, a pseudo count c is introduced to avoid singularities with zero sampling, which is defined as,
with the intensity factor x. The default of the pseudo count was chosen to result in a maximal correction of \(\varDelta E^{corr}_i=8.43\) kJ mol−1, corresponding to a minimum 30-fold reduced sampling compared to the expected optimal sampling.
After optimizing the RE-EDS parameters, the production run is performed for a chosen length. The free-energy differences are subsequently calculated using the replica at \(s=1.0\) with Eq. (5).
Starting state mixing
The sampling in RE-EDS simulations can be further improved by using starting coordinates for the replicas corresponding to the different end states (i.e. replica 1 starts in a low-energy configuration for end state 1, replica 2 in a low-energy configuration for end state 2, etc.). This technical approach is called “starting state mixing” (SSM) in the following and is also used with Hamiltonian replica-exchange TI calculations (see e.g. [34, 35]). The optimized coordinates obtained in the parameter exploration step can be used for SSM. We compare RE-EDS simulations with SSM and with a single set of starting coordinates (abbreviated as 1SS).
Analysis
Three types of metrics were used to quantify the sampling in RE-EDS simulations. The first metric determines for each end state i the sampling fraction where it is maximally contributing to the reference state, i.e. \(f_i^{\text{mc}}\). A maximally contributing state is defined as the end state with the lowest potential energy minus its energy offset in a frame. As can be seen in Eq. (3), maximally contributing end states have the largest impact on the reference-state sampling at a given time point.
Optimal sampling in a RE-EDS system is achieved when all end states are sampled as maximally contributing states to an equal extent at \(s=1.0\), i.e.
The second metric is the estimated sampling fraction of “physical occurrence” of an end state i, i.e. \(f_i^{\text{occur}}\). As a result of phase-space overlap with the current maximal contributing end state, other end states in the EDS system might be sampled simultaneously. An end state is counted as “occurred” when its potential energy is below the threshold \(V_i \le T_{i}^{\text{phys}}\) at a time point t. These thresholds are estimated during the second substep of the parameter exploration phase. If end states show no phase-space overlap, \(f_i^{\text{occur}}\) will be (nearly) the same as \(f_i^{\text{mc}}\).
Undersampling is detected with a third metric using the thresholds \(T_{i}^{\text{us}}\). These thresholds are determined in the first substep of the parameter exploration phase from the simulation with the lowest s-value. If all end states have a potential energy below their respective \(V_i - E^R_i \le T_{i}^{\text{us}}\), the current frame is characterized as undersampling [19].
Methods
Model system
To showcase the performance of RE-EDS, a system of five inhibitors (L1, L17, L19, L20 and L21) of checkpoint kinase 1 (CHK1) taken from Ref. [23] was chosen (Fig. 5). The numbering of the compounds is according to Ref. [23]. The same system was studied in Ref. [24] as part of a series of scaffold hopping systems. Although the five ligands share a common substructure, they were considered to exemplify different types of core-hopping transformations (i.e. ring size change, ring opening/closing, ring extension) and R-group modifications [24].
For the protein, the GROMOS 54A7 force field [36] was used. For the ligands, topologies were generated using the parametrization by the ATB server [37] as an initial guess. The bonded terms were manually harmonized and adjusted to match the parameterization of similar functional groups in the GROMOS 54A7 force field. Partial charges were generated with our previous machine learning approach [38] (\(\epsilon\) = 4) and manually arranged into charge groups. The input files can be retrieved from: https://github.com/rinikerlab/reeds/tree/main/examples/systems.
System preparation
The crystal structure of CHK1 in complex with ligand L1 (PDB ID:3U9N) was used as starting structure. The initial coordinates for ligands L17, L19, L20, L21 were generated with the ConstrainedEmbed() functionality in the RDKit [39], where the common substructure was kept fixed in the crystal conformation. The coordinates of each ligand and those of the protein were subsequently energy minimized in vacuum using the steepest descent [40] approach implemented in the GROMOS software package [41].
A “dual topology” approach was used for the RE-EDS simulations, i.e. each ligand is present in the system separately [17]. Thus, each end state comprises of one active ligand and \(N-1\) inactive (dummy) ligands. To avoid spatial drifting of the dummy ligands, eight distance restraints per ligand pair were defined within the common substructure (Fig. 5) to connect all ligands in a ring with the help of the RestraintMaker program (https://github.com/rinikerlab/restraintmaker) (order: -L1-L17-L19-L20-L21-). The reference distance was set to 0.0 nm and the force constant to 1000 kJ mol\(^{-1}\) nm\(^{-2}\). The combined topology file was generated with the program prep_eds in the GROMOS++ [42] package. The EDS system was solvated in a cubic box of simple-point-charge (SPC) [43] water (resulting in 1’848 solvent molecules for the ligands in water and 15,639 solvent molecules for the protein-ligands complex). An energy minimization was carried out with the steepest descent algorithm [40], where all solute atoms were position restrained with a force constant of 25,000 kJ mol\(^{-1}\) nm\(^{-2}\).
Simulation details
All simulations were performed with the GROMOS software package [41] (freely available on http://www.gromos.net). The equilibrations and production runs were carried out under isothermal-isobaric (NPT) conditions using the leap-frog integration algorithm [44] and a time step of 2 fs. Bond lengths were constrained with SHAKE [45] using a tolerance of \(10^{-4}\). The nonbonded contributions were calculated with a twin-range scheme using a short-range cutoff of 0.8 nm and a long-range cutoff of 1.4 nm. The electrostatic nonbonded contributions beyond the long-range cutoff were calculated with the reaction-field [46] approach and a dielectric permittivity of 66.7 [47] for water.
The temperature was kept constant at 300 K using the weak coupling scheme [48] and a coupling time of 0.1 ps\(^{-1}\). The pressure was kept at 1.031 bar (1 atm) with the same type of algorithm and a coupling time of 0.5 ps and an isothermal compressibility of \(4.575 \times 10^{-4}\) (kJ mol\(^{-1}\) nm\(^{-3}\))\(^{-1}\). Rotation and translation of the center of mass of the simulation box were removed every 2 ps. Energies were written to file every 20 steps and coordinates every 5000 steps. In the RE-EDS simulations, replica exchanges was attempted every 20 steps.
RE-EDS workflow
The new Python code to manage the RE-EDS workflow, including the analysis steps, can be retrieved from: https://github.com/rinikerlab/reeds. The workflow starts with the energy-minimized coordinates of the EDS system (all N ligands plus environment, maximally contributing end state is L20) into the parameter exploration step, which is used as equilibration phase. A RE-EDS simulation of 0.2 ns length was performed with 21 logarithmically distributed replicas between \(s=1.0\) and \(10^{-5}\) and all energy offsets set to zero. The thresholds \(T_{i}^{\text{us}}\) were estimated from replicas with very low s-values. Undersampling was observed when each end state occurred with a fraction \(f_{i}^{\text{occur,us}} \ge 0.75\) during the simulation period. To be conservative, the lower bound for the following steps was set to the s-value two levels below the highest replica with undersampling.
To optimize the coordinates of the system for each end state, an EDS simulation of 2 ns length was performed for each end state i with \(s=1.0\) and \(E^R_i=500\) kJ mol\(^{-1}\) while the energy offsets of all other end states were set to \(-500\) kJ mol\(^{-1}\). L20 was the initial maximally contributing end state in the starting configuration. The coordinates were considered to be optimized when the desired end state was constantly sampled as the maximally contributing state in the last 30% of the simulation.
To determine the energy offsets, a 1.5 ns RE-EDS simulation was carried out with 12 logarithmically distributed replicas for the ligands in water and 17 for the protein-ligands complex between \(s=1.0\) and the lower bound (determined above). The first 0.4 ns of the simulation were discarded as equilibration. This simulation was performed in two manners: (i) using the final coordinates from the lower-bound determination as starting configuration for all replicas (1SS approach), or (ii) using the different optimized coordinates from the previous substep for the replicas in an alternating manner (SSM approach). For the PEOE [19] scheme, the following parameters were used: fraction \(f_{i}^{\text{us}} \ge 0.9\) and the potential thresholds determined in the lower bound exploration \(T_{i}^{\text{us}}\).
The iterative optimization of the s-distribution with the N-LRTO [20] algorithm was started with the energy offsets and the final coordinates of the previous substep. Four replicas were added per iteration. The simulation length of the first iteration was 0.5 ns, and subsequently increased by 0.5 ns at each iteration until a maximum length of 1.5 ns was reached.
The iterative optimization of the \(f_i^{\text{mc}}\) distribution was carried out with the described scheme. The scheme used short 0.5 ns simulations, and adjusted in each step the energy offsets \(E^R\) with a pseudo-count intensity factor \(x = 30\).
The optimization was considered converged here when all end states were sampled as maximally contributing states at \(s=1.0\), the number of round trips per ns was above zero, and the improvement of the round-trip time was below \({\overline{\tau }}/nRT < 0.5\) ns.
The production run with constant reference-state parameters was performed for 3.5 ns.
Simulation of single states
The input coordinates for the simulations of the individual end states were extracted from the RE-EDS starting coordinates and subsequently energy minimized. Next, a production run of 4 ns was performed.
Analysis
Free-energy differences were calculated with the program dfmult from the GROMOS++ [42] package. Statistical analysis and handling of the workflow steps are based on the Python packages pandas [49], Matplotlib [50], NumPy [51], SciPy [52], and PyGromosTools [53].
Results and discussion
The chosen model system of five inhibitors of CHK1 kinase exemplifies different core-hopping transformations (i.e. ring size change, ring opening/closing, ring extension) and R-group modifications [24], increasing the complexity compared to the systems previously studied with RE-EDS. Furthermore, the performance can be directly compared to the results obtained with FEP+ and OPLS3 in Ref. [24] as well as with QligFEP results in Ref. [13].
Parameter exploration and parameter optimization
The RE-EDS workflow was started by estimating the lower bound for the s-distribution. Using the above mentioned undersampling criterion (see “Methods” section), a lower bound of \(s=0.01\) was determined for the protein-ligands complex and \(s=0.0056\) for the ligands in water.
Optimized coordinates were obtained for all five ligands, as verified by comparing the potential-energy distribution from the EDS simulation with the one extracted from a standard MD simulation of the respective ligand (Fig. S1 in Supporting Information). From these same steps, the potential-energy thresholds for the occurrence sampling (\(T_{i}^{\text{phys}}\)) and undersampling (\(T_{i}^{\text{us}}\)) were determined.
The energy offsets \(\mathbf{E}^R\) were estimated from a short RE-EDS simulation with the PEOE [19] scheme and are listed in Table 1. For \(s=1.0\), the energy offsets should ideally be equal to the free energy of the corresponding state (i.e. \(\varDelta E^R_{ji} = \varDelta G_{ji}\)) such that the partition function of the reference state is the sum of the partition functions of the end states [15]. Therefore, the comparison between the relative estimated energy offsets in water and in complex (\(\varDelta \varDelta E^R_{ji} = \varDelta E^R_{ji,\text{complex}} - \varDelta E^R_{ji,\text{water}}\)) and the relative binding free energy \(\varDelta \varDelta G^\text{bind}_{ji}\) can be used to (roughly) assess the quality of the estimated energy offsets. As shown in Fig. S2 in Supporting Information, the energy offsets estimated from the SSM simulations are in better agreement with the experimental relative binding free energies than those estimated from the 1SS simulations.
The optimization of the s-distribution was performed with the N-LRTO [20] algorithm, thereby minimizing the average round-trip time \({\overline{\tau }}\) in the replica graph. For the 1SS approach in the complex, four optimization iterations were used. For the other systems, three iterations were used.
In the first iteration, the total number of observed round trips was very low or zero for all approaches. In the following iterations, this quantity increased, and the average round-trip time decreased for all simulations (Fig. 6). The number of round trips was generally smaller in the complex than in water due to a more pronounced gap region [20]. Already after the second iteration, the round-trip time was reduced in all approaches. The improvement of the \({\overline{\tau }}\) over the iterations can also be seen in Fig. S3 in Supporting Information. As can be seen in the third row of Fig. 6, the optimization algorithm increases the density of the replicas around \(s = 0.041\), where the major gap region lies.
The s-optimization was stopped after a sufficiently high number of round trips and low round-trip time was reached. This resulted in 20 replicas for the ligands in water after three s-optimization iterations. For the protein-ligands complex, the fourth s-optimization iteration was chosen for the 1SS approach, and the third iteration for the SSM approach, resulting in 29 and 25 replicas, respectively. The average round-trip time after convergence was \({\overline{\tau }} = 0.4 \pm 0.2\) ns for all simulations.
After the s-optimization, the energy offset rebalancing scheme was applied to improve the state sampling.
During the rebalancing steps, no further replicas were added to the s-distribution. It is essential for the success of the rebalancing scheme that round trips occur. Therefore, the number of round trips and average round-trip time were monitored. In all systems, the number of round trips and \({\overline{\tau }}\) remained relatively stable over the four rebalancing steps. For the RE-EDS 1SS approach in water, the number of round trips slightly decreased but never dropped to zero.
Across the optimization steps, also the sampling of the end states as maximally contributing states at \(s=1.0\) was monitored. During the s-optimization, some end states “vanish” and are no longer sampled as maximal contributing states. This leakage effect can occur when the initially estimated \(E^{\text{R}}\) are not exactly optimal [19]. With energy offset rebalancing, the sampling of each end state can be recovered, and the sampling distribution approaches the ideal case. After rebalancing, all end states showed a \(f_i^{\text{mc}} > 0\) and the mean absolute deviation of the sampling distribution from ideal decreased from 20–25% to approximately 7–12% (Fig. S4 in Supporting Information).
Free-energy calculation
After successfully optimizing the RE-EDS parameters, the production runs were performed for 3.5 ns.
Both in water and in complex, the potential-energy distributions of the end states generally agree well with the corresponding distributions from the standard MD simulations of the single end states (Fig. 7). Only for the 1SS approach in the complex, a deviation can be seen for L17, with a slight shift to higher potential energies. This is due to insufficient sampling of L17 in this case (see below). The analysis of the maximally contributing end states at \(s=1.0\) shows that in water all end states were sampled close to the ideal equal distribution (Fig. S5 in Supporting Information). In the simulation of the protein-ligands complex, there are still differences in sampling. Especially with the 1SS approach, L19 is generally sampled too often, while L17 is not sampled enough. The situation is improved with the SSM approach. Comparing \(f_i^{\text{occur}}\) and \(f_i^{\text{mc}}\) in Fig. S5 indicates that the end states in the CHK1 system are clearly separated (i.e. no phase-space overlap).
From the replica at \(s=1.0\), the free-energy differences were calculated using Eq. (5) and the resulting \(\varDelta \varDelta G^\text{bind}_{ji}\) were compared with the experimental results taken from Ref. [23]. The results are shown graphically in Fig. 8 and numerically in Table 2. The individual free-energy differences are given in Table S3 in the Supporting Information. The RMSE with RE-EDS 1SS is 4.8 kJ mol\(^{-1}\) and the MAE is \(3.9\pm 2.8\) kJ mol\(^{-1}\). The main deviations stem from ligand L17 in the RE-EDS 1SS approach, which can be explained by the insufficient sampling of this ligand in the complex (see Figs. 7 and S5 in Supporting Information).
The performance was substantially improved using the SSM approach with RE-EDS, giving an RMSE of 3.3 kJ mol\(^{-1}\) and an MAE of \(2.8 \pm 1.7\) kJ mol\(^{-1}\). Only two values (L21–L11) and (L21–L19) deviate more than 4.184 kJ mol\(^{-1}\) (i.e. 1 kcal mol\(^{-1}\)) from experiment. The Spearman correlation coefficient for RE-EDS 1SS is \(r_{\text{Spearman}}=-0.01\) and for RE-EDS SSM \(r_{\text{Spearman}}=0.69\).
Next, we assessed the convergence of the \(\varDelta G_{ji}\) values as a function of simulation time (Fig. S6 in Supporting Information). For the RE-EDS 1SS approach, all free-energy differences appeared converged after 2.5 ns in water and after 2.7 ns in the complex. For the RE-EDS SSM approach, convergence was observed after 2.5 ns in water and after 2.9 ns in the complex.
By applying the RE-EDS methodology to the same system of five CHK1 inhibitors as studied by Wang et al. [24] and later on also Jespers et al. [13], a direct comparison with FEP+ and QligFEP is possible (Table 2). Note that the quality metrics were calculated over all possible pairs of ligands and in both directions, not only those directly calculated by FEP+ and QligFEP. For FEP+, we obtained an RMSE of 2.4 kJ mol\(^{-1}\) and an MAE of \(1.8 \pm 1.2\) kJ mol\(^{-1}\) with a Spearman correlation coefficient of \(r_{\text{Spearman}}=0.67\). Including cycle closure correction (CC) [24] reduced the RMSE to 2.1 kJ mol\(^{-1}\) and the MAE to \(1.9 \pm 1.0\) kJ mol\(^{-1}\). The Spearman correlation coefficient increased to \(r_{\text{Spearman}}=0.73\). Jespers et al. [13] reported free-energy differences with QligFEP as an average over ten independent replicas, each with significantly less simulation time per \(\lambda\)-window than in Ref. [24]. For QligFEP, an RMSE of 2.3 kJ mol\(^{-1}\), an MAE of \(2.0 \pm 1.2\) kJ mol\(^{-1}\), and a Spearman coefficient of \(r_{\text{Spearman}}=0.61\) was obtained.
Overall, the performance of RE-EDS SSM is comparable with the pairwise methods. The results with FEP+ CC and QligFEP showed a slightly higher accuracy compared to experiment, likely due to the different force fields used. The Spearman correlation coefficient is comparable with the other methods for the RE-EDS SSM approach.
In terms of computational cost, the RE-EDS approach (with 3.5 ns per replica) resulted in about a quarter of the total simulation time (in ns) than reported for the FEP+ calculations in Ref. [24] (Table 2). Overall, the QligFEP approach is the one with the lowest simulation time consumption. A major advantage of the simultaneous simulation of multiple ligands in a single RE-EDS simulation is that all \(N(N-1)/2\) transformations are sampled directly, leading to low statistical errors and removing the need for a state graph. This advantage increases with increasing number of ligands. The current workflow of RE-EDS uses a relatively large amount of simulation time for parameter optimization. Future work will focus on further optimizing the workflow to reduce the pre-processing time.
From the calculated relative binding free energies, \(\varDelta G_{i}^{\text{bind}}\) can be obtained by using one experimental value as anchor point. This allows us to generate a ranking of the five ligands. To avoid any bias from the selected experimental anchor point, all possibilities were calculated and the resulting values averaged (Table 3). While the RMSE is generally low for all approaches (< 1 kcal mol\(^{-1}\) = 4.184 kJ mol\(^{-1}\)), the ranking of the ligands as measured by \(r_{\text{Spearman}}\) is not very good. This observation is not uncommon for ligand series with small differences in binding free energy [11, 55]. Note that the uncertainties of the individual values have increased compared to the relative binding free energies due to the anchoring and averaging procedure.
Conclusion
This study reports the recent developments for the multistate free-energy method RE-EDS, which omits the definition of alchemical transition paths. The automated workflow for RE-EDS was improved in robustness, and was applied to estimate the relative binding free energies of five CHK1 inhibitors containing typical core-hopping transformations. This system was investigated previously with FEP+ and QligFEP, allowing for a direct comparison of RE-EDS with state-of-the-art pairwise free-energy methods. Using different starting configurations representing all end states (SSM approach) in the parameter optimization of the RE-EDS workflow improved the sampling, convergence, and the accuracy of the resulting free-energy differences. The performance of RE-EDS SSM was found to be comparable with FEP+ and QligFEP, and shows that RE-EDS with a “dual topology” approach can be readily applied to challenging ligand transformations like ring size change, ring opening/closing, and ring extension.
In terms of computational efficiency, the total production run time with RE-EDS (3.5 ns per replica) was about a quarter of that reported for FEP+ with this system. As multiple ligands are simulated simultaneously in a single RE-EDS simulation, this sampling enhancement will increase with increasing number of ligands. However, the pre-processing phase in the RE-EDS workflow currently uses a relatively large amount of simulation time. Making these steps more efficient will be addressed in future work. In addition, further automatization of the dual topology approach with distance restraints is ongoing.
Data availability
The Python code for the RE-EDS workflow is provided on Github https://github.com/rinikerlab/reeds and can be used with the current version of GROMOS, freely available from http://www.gromos.net. The input files for the simulations can be retrieved from https://github.com/rinikerlab/reeds/tree/main/examples/systems.
References
Cournia Z, Allen B, Sherman W (2017) Relative binding free energy calculations in drug discovery: recent advances and practical considerations. J Chem Inf Model 57:2911
Armacost KA, Riniker S, Cournia Z (2020) Novel directions in free energy methods and applications. J Chem Inf Model 60:1
Cournia Z, Allen BK, Beuming T, Pearlman DA, Radak BK, Sherman W (2020) Rigorous free energy simulations in virtual screening. J Chem Inf Model
Chodera JD, Mobley DL (2013) Entropy–enthalpy compensation: role and ramifications in biomolecular ligand recognition and design. Ann Rev Biophys 42:121
Aldeghi M, Heifetz A, Bodkin MJ, Knapp S, Biggin PC (2016) Accurate calculation of the absolute free energy of binding for drug molecules. Chem Sci 7:207
Jorgensen WL, Buckner JK, Boudon S, Tirado-Rives J (1988) Efficient computation of absolute free energies of binding by computer simulations. Application to the methane dimer in water. J Chem Phys 89(6):3742
Merz KM (1991) Carbon dioxide binding to human carbonic anhydrase II. J Am Chem Soc 113:406
Kirkwood JG (1935) Statistical mechanics of fluid mixtures. J Chem Phys 3:300
Zwanzig RW (1954) High-temperature equation of state by a pertubation method. I. Nonpolar gases. J Chem Phys 22:1420
Liu S, Wu Y, Lin T, Abel R, Redmann JP, Summa CM, Jaber VR, Lim NM, Mobley DL (2013) Lead optimization mapper: automating free energy calculations for lead optimization. J Comput Aided Mol Des 27:755
...Wang L, Wu Y, Deng Y, Kim B, Pierce L, Krilov G, Lupyan D, Robinson S, Dahlgren MK, Greenwood J, Romero DL, Masse C, Knight JL, Steinbrecher T, Beuming T, Damm W, Harder E, Sherman W, Brewer M, Wester R, Murcko M, Frye L, Farid R, Lin T, Mobley DL, Jorgensen WL, Berne BJ, Friesner RA, Abel R (2015) Accurate and reliable prediction of relative ligand binding potency in prospective drug discovery by way of a modern free-energy calculation protocol and force field. J Am Chem Soc 137:2695
Yang Q, Burchett W, Steeno GS, Liu S, Yang M, Mobley DL, Hou X (2020) Optimal designs for pairwise calculation: an application to free energy perturbation in minimizing prediction variability. J Comput Chem 41:247
Jespers W, Esguerra M, Åqvist J, Gutiérrez-De-Terán H (2019) Qligfep: an automated workflow for small molecule free energy calculations in Q. J Cheminf 11:26
Christ CD, van Gunsteren WF (2007) Enveloping distribution sampling: a method to calculate free energy differences from a single simulation. J Chem Phys 126:184110
Christ CD, van Gunsteren WF (2008) Multiple free energies from a single simulation: extending enveloping distribution sampling to nonoverlapping phase-space distributions. J Chem Phys 128:174112
Christ CD, van Gunsteren WF (2009) Simple, efficient, and reliable computation of multiple free energy differences from a single simulation: a reference Hamiltonian parameter update scheme for enveloping distribution sampling (EDS). J Chem Theory Comput 5(2):276
Riniker S, Christ CD, Hansen N, Mark AE, Nair PC, van Gunsteren WF (2011) Comparison of enveloping distribution sampling and thermodynamic integration to calculate binding free energies of phenylethanolamine N-methyltransferase inhibitors. J Chem Phys 135:24105
Lee J, Miller BT, Damjanović A, Brooks BR (2014) Constant pH molecular dynamics in explicit solvent with enveloping distribution sampling and Hamiltonian exchange. J Chem Theory Comput 10:2738
Sidler D, Schwaninger A, Riniker S (2016) Replica exchange enveloping distribution sampling (RE-EDS): a robust method to estimate multiple free-energy differences from a single simulation. J Chem Phys 145:154114
Sidler D, Cristòfol-Clough M, Riniker S (2017) Efficient round-trip time optimization for replica-exchange enveloping distribution sampling (RE-EDS). J Chem Theory Comput 13:3020
Perthold JW, Oostenbrink C (2018) Accelerated enveloping distribution sampling: enabling sampling of multiple end states while preserving local energy minima. J Phys Chem B 122:5030
Perthold JW, Petrov D, Oostenbrink C (2020) Toward automated free energy calculation with accelerated enveloping distribution sampling (A-EDS). J Chem Inf Model 60:5395
Huang X, Cheng CC, Fischmann TO, Duca JS, Yang X, Richards M, Shipps GW (2012) Discovery of a novel series of CHK1 kinase inhibitors with a distinctive hinge binding mode. ACS Med Chem Lett 3:123
Wang L, Deng Y, Wu Y, Kim B, LeBard DN, Wandschneider D, Beachy M, Friesner RA, Abel R (2017) Accurate modeling of scaffold hopping transformations in drug discovery. J Chem Theory Comput 13:42
König G, Glaser N, Schroeder B, Hünenberger PH, Riniker S (2020) An alternative to conventional \(\lambda\)-intermediate states in alchemical free energy calculations: \(\lambda\)-enveloping distribution sampling. J Chem Inf Model 60:5407
Hansen N, Dolenc J, Knecht M, Riniker S, van Gunsteren WF (2012) Assessment of enveloping distribution sampling to calculate relative free enthalpies of binding for eight netropsin-DNA duplex complexes in aqueous solution. J Comput Chem 33:640
Ries B, Linker SM, Hahn DF, König G, Riniker S (2021) Ensembler: a simple package for fast prototyping and teaching molecular simulations. J Chem Inf Model 61:560
Hansmann UHE (1997) Parallel tempering algorithm for conformational studies of biological molecules. Chem Phys Lett 281:140
Sugita Y, Kitao A, Okamoto Y (2000) Multidimensional replica-exchange method for free-energy calculations. J Chem Phys 113:6042
Lee J, Miller BT, Damjanović A, Brooks BR (2015) Enhancing constant-pH simulation in explicit solvent with a two-dimensional replica exchange method. J Chem Theory Comput 11:2560
Hastings WK (1970) Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57:97
Katzgraber HG, Trebst S, Huse DA, Troyer M (2006) Feedback-optimized parallel tempering Monte Carlo. J Stat Mech P03018
Nadler W, Meinke JH, Hansmann UH (2008) Folding proteins by first-passage-times-optimized replica. Exch Phys Rev 8:061905
Graf MMH, Maurer M, Oostenbrink C (2016) Free-energy calculations of residue mutations in a tripeptide using various methods to overcome inefficient sampling. J Comp Chem 37:2597
Hahn DF, König G, Hünenberger PH (2020) Overcoming orthogonal barriers in Alchemical free energy calculations: on the relative merits of \(\lambda\)-Variations, \(\lambda\)-extrapolations, and biasing. J Chem Theory Comput 16:1630
Schmid N, Eichenberger AP, Choutko A, Riniker S, Winger M, Mark AE, van Gunsteren WF (2011) Definition and testing of the GROMOS force-field versions: 54A7 and 54B7. Eur Biophys J 40:843
Malde AK, Zuo L, Breeze M, Stroet M, Poger D, Nair PC, Oostenbrink C, Mark AE (2011) An automated force field topology builder (ATB) and repository: version 1.0. J Chem Theory Comput 7:4026
Bleiziffer P, Schaller K, Riniker S (2018) Machine learning of partial charges derived from high-quality quantum-mechanical calculations. J Chem Inf Model 58:579
RDKit: Cheminformatics and machine learning software (2021). http://www.rdkit.org. Accessed March 2021
Ruder S (2016) An overview of gradient descent optimization algorithms. arXiv preprint. arXiv:1609.04747
Schmid N, Christ CD, Christen M, Eichenberger AP, van Gunsteren WF (2012) Architecture, implementation and parallelization of the GROMOS software for biomolecular simulation. Comp Phys Commun 183:890
Eichenberger AP, Allison JR, Dolenc J, Geerke DP, Horta BAC, Meier K, Oostenbrink C, Schmid N, Steiner D, Wang D, van Gunsteren WF (2011) The GROMOS++ software for the analysis of biomolecular simulation trajectories. J Chem Theory Comput 7:3379
Berendsen HJC, Postma JPM, van Gunsteren WF, Hermans J (1981) Interaction models for water in relation to protein hydration. Interaction models for water in relation to protein hydration (Reidel, 1981), pp 331–342
Hockney RW (1970) The potential calculation and some applications. Methods Comput Phys 136–210
Ryckaert JP, Ciccotti G, Berendsen HJC (1977) Numerical integration of the Cartesian equations of motion of a system with constraints: molecular dynamics of n-alkanes. J Comput Phys 23:327
Tironi I, Sperb R, Smith PE, van Gunsteren WF (1995) A generalized reaction field method for molecular dynamics simulations. J Chem Phys 102:5451
Glättli A, Daura X, van Gunsteren WF (2002) Derivation of an improved simple point charge model for liquid water: SPC/A and SPC/L. J Chem Phys 116:9811
Berendsen HJC, Postma JPM, van Gunsteren WF, DiNola A, Haak JR (1984) Molecular dynamics with coupling to an external bath. J Chem Phys 81:3684
McKinney W (2010) Data structures for statistical computing in Python. In: Proceedings of the 9th Python in Science Conference 445, 51
Hunter JD (2007) Matplotlib: a 2D graphics environment. Comput Sci Eng 9:99
Van Der Walt S, Colbert SC, Varoquaux G (2011) The NumPy array: a structure for efficient numerical computation. Comput Sci Eng 13:22
Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, Burovski E, Peterson P, Weckesser W, Bright J, van der Walt SJ, Brett M, Wilson J, Millman KJ, Mayorov N, Nelson AR, Jones E, Kern R, Larson E, Carey CJ, Polat I, Feng Y, Moore EW, VanderPlas J, Laxalde D, Perktold J, Cimrman R, Henriksen I, Quintero EA, Harris CR, Archibald AM, Ribeiro AH, Pedregosa F, van Mulbregt P, Vijaykumar A, Bardelli AP, Rothberg A, Hilboll A, Kloeckner A, Scopatz A, Lee A, Rokem A, Woods CN, Fulton C, Masson C, Häggström C, Fitzgerald C, Nicholson DA, Hagen DR, Pasechnik DV, Olivetti E, Martin E, Wieser E, Silva F, Lenders F, Wilhelm F, Young G, Price GA, Ingold GL, Allen GE, Lee GR, Audren H, Probst I, Dietrich JP, Silterra J, Webber JT, Slavič J, Nothman J, Buchner J, Kulick J, Schönberger JL, de Miranda Cardoso JV, Reimer J, Harrington J, Rodríguez JLC, Nunez-Iglesias J, Kuczynski J, Tritz K, Thoma M, Newville M, Kümmerer M, Bolingbroke M, Tartre M, Pak M, Smith NJ, Nowaczyk N, Shebanov N, Pavlyk O, Brodtkorb PA, Lee P, McGibbon RT, Feldbauer R, Lewis S, Tygier S, Sievert S, Vigna S, Peterson S, More S, Pudlik T, Oshima T, Pingel TJ, Robitaille TP, Spura T, Jones TR, Cera T, Leslie T, Zito T, Krauss T, Upadhyay U, Halchenko YO, Vázquez-Baeza Y (2020) SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods 17:261
Ries B, Lehner MT (2021) rinikerlab/pygromostools: Pygromostools\_v1 (2021). https://doi.org/10.5281/zenodo.4621710
Bennett CH (1976) Efficient estimation of free energy differences from Monte Carlo data. J Comput Phys 22:245
Schindler CEM, Baumann H, Blum A, Böse D, Buchstaller HP, Burgdorf L, Cappel D, Chekler E, Czodrowski P, Dorsch D, Eguida MKI, Follows B, Fuchs T, Grädler U, Gunera J, Johnson T, Lebrun CJ, Karra S, Klein M, Knehans T, Koetzner L, Krier M, Leiendecker M, Leuthner B, Li L, Mochalkin I, Musil D, Neagu C, Rippmann F, Schiemann K, Schulz R, Steinbrecher T, Tanzer EM, Lopez AU, Follis AV, Wegener A, Kuhn D (2020) Large-scale assessment of binding free energy calculations in active drug discovery projects. J Chem Inf Model 60:5457
Acknowledgements
The authors thank David F. Hahn and Dominik Sidler for helpful discussions about the methodology and Kay Schaller for the help on the project and the code. The authors gratefully acknowledge financial support by the Swiss National Science Foundation (Grant Number 200021-178762).
Funding
Open access funding provided by Swiss Federal Institute of Technology Zurich.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Ries, B., Normak, K., Weiß, R.G. et al. Relative free-energy calculations for scaffold hopping-type transformations with an automated RE-EDS sampling procedure. J Comput Aided Mol Des 36, 117–130 (2022). https://doi.org/10.1007/s10822-021-00436-z
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10822-021-00436-z