Free energy perturbation calculations of tetrahydroquinolines complexed to the first bromodomain of BRD4

Alchemical free energy perturbation (FEP) theory is widely used nowadays to calculate protein–ligand binding energies, often in support of drug discovery endeavours. We assess the accuracy and sensitivity of absolute FEP binding energies with respect to the CHARMM/CGenFF and the AMBER/GAFF force field parameterisations for a set of tetrahydroquinoline inhibitors of the first bromodomain of BRD4, a target of keen interest for the development of anti-cancer drugs. We find that AMBER/GAFF is better able than CHARMM/CGenFF to cover the range of and to distinguish between the relative binding energies of the 16 ligands. GRAPHICAL ABSTRACT


Introduction
Developing a new drug compound is an expensive task that is becoming increasingly more difficult. The current drug-design paradigm faces challenges that can be ameliorated by taking advantage of the latest technologies and algorithms. There is a demand for new solutions that are sophisticated enough to meet the demands of the drug CONTACT  development market, while also being robust and sufficiently accurate to deal with the challenges presented by new emerging infectious diseases. For the past three decades, drug-design has been a proving ground for the application of machine learning, using different strategies [1,2]. Among these different paradigms, quantitative structure-activity relationships (QSAR) and AI-assisted drug-design have seen success in fragment-based drugdesign [3][4][5]. These data-driven methods to predict drug potency can benefit from calibration against physicsbased protocols, such as molecular dynamics (MD) simulations.
One of the most prominent and state-of-the-art protocols for the calculation of protein-ligand binding energies is alchemical free energy perturbation (FEP) theory [6][7][8]. Within the FEP framework, one determines the binding energy of a molecule to a given receptor by simply measuring the change in the Gibbs free energy caused by the ligand-protein interaction while in solution. The alchemical method is based on a non-physical thermodynamic cycle, where the binding free energy is computed as the sum of multiple steps where the ligand is 'inserted', 'removed' or 'transmuted' while in the pocket (or in solution) as shown in Figure 1.
Two well-documented and successful strategies are: (a) absolute FEP calculations, and (b) relative FEP calculations. The latter refers to the evaluation of the relative binding free energies between two congeneric ligands through the perturbation (transmutation) of one ligand into another, whereas absolute FEP is the evaluation of the reversible work of transferring the ligand from the binding site into solution. Note that 'absolute' does not refer to the actual absolute free energy (G) but to the difference between two states ( G). In that spirit, one could consider the relative FEP measure of 'difference between differences' ( G). While relative FEP can produce quick and accurate results, it suffers from some limitations regarding what transformations provide reliable results. Large perturbations, net charge changes, charge movement, linker changes, ring creation, and ring size changes during the transmutation are known to be problematic [9].
Both absolute and relative FEP calculations have been applied to systems of theoretical interest for several decades, enabling reliable predictions. Jayachandran et al. [10] reported a root mean squared (RMS) error of 1.6 kcal mol −1 between FEP and experimental binding affinities for a series of eight FKBP12 inhibitors. Wang et al. [11] obtained FEP calculations with similar quality (RMS errors ranging from 2.0 to 2.5 kcal mol −1 ) for the binding affinities for FKBP12 inhibitors. More accurate (0.4 kcal mol −1 ) calculations have been reported by Fujitani et al. [12] for a different series of FKBP12 inhibitors, albeit with a large offset prediction error of 3.5 kcal mol −1 .
The bromodomain-containing protein 4 (BRD4) is a member of the Bromodomain and Extra-Terminal (BET) family of proteins [13]. The BRD4 is an epigenetic protein that primarily recognises acetylated lysine residues (often present in histones) and is involved in DNA repair [14]. The suppression of BRD4 can quench the growth of several types of cancer, including NUT midline carcinoma [15], prostate cancer [16], acute myeloid leukaemia [17], and breast cancer [18]. Consequently, novel BRD4 inhibitors are highly desirable [19], due to their application in medicine and basic biological research, and as such they have recently received renewed attention [20]. There are two bromodomains of BRD4: BD1 and BD2.
Inhibitors of BRD4-BD1 have received attention from different computational research groups, who have exploited the wealth of structural data in the context of, for example, docking studies [21] and to enhance fragment-based discovery approaches [22]. Most pertinently, for our investigation, Wan et al. [23] used relative FEP to calculate binding energies of a series of 16 tetrahydroquinoline (THQ) derivatives complexed to BRD4-BD1, with errors ranging from 1.0 to 1.7 kcal mol −1 . Aldeghi et al. [24] have published a study on 11 BRD4-BD1 inhibitors across several classes of compounds, using absolute FEP calculations. Their work represents the most accurate protocol, with an RMS error of 0.8 kcal mol −1 A non-equilibrium absolute binding free energy method has recently been applied to study the latter 11 inhibitors binding to BRD4-BD1 [25].
The accuracy of these FEP calculations (or any MD simulation) relies on the quality of the parameterisation of the force field describing the ligands and the protein. Most computational investigations of systems of biological interest make use of well-known force fields, such as CHARMM [26], AMBER [27], OPLS [28] and GROMOS [29]. These force fields take advantage of the polymeric nature of proteins, as peptide residues act as building blocks for which parameters can readily be transferred amongst similar proteomic systems. The parameterisation of small molecules is, however, a more complex issue that impedes the use of MD tools to assist the development of new drugs. Some of the extensions of parameterisation to encompass small molecules include the CHARMM General Force Field (CGenFF) [30] and the Generalised AMBER Force Field (GAFF) [31]. Even though both force field strategies are successful at describing a large range of relevant drug candidates, one cannot expect that a limited library of parameters can cover the diverse chemical space often tackled by the drug discovery industry. When the transferability between these parameters is poor, re-parameterisation against quantum chemical information is required.
In that regard, the quantitative performance of absolute FEP methods needs further testing and the assessment of the quality of force field parameterisation is an area that would benefit from greater attention. Some effort has previously been applied into probing the quality of the AMBER/GAFF force field in reproducing experimental binding energies, but with a limited scope of molecules [24]. In the current study, we compare the performance of the CHARMM/CGenFF and the AMBER/GAFF parameterisations for the series of THQ inhibitors of BRD4-BD1 that was recently studied by Wan et al. [23] using relative binding affinities. However, in this work, we are going to make use of a methodology that resembles that of Aldeghi et al. [24], to calculate absolute G values (with some modifications to improve computational efficiency) to perform the first benchmarking comparison between the AMBER and CHARMM force fields for BRD4-BD1 and 16 THQ ligands ( Figure 2) applied to FEP calculations.

System preparation
The initial conformation for the BRD4 receptor was taken from the published coordinates, PDB ID 4BJX [32], in a complex with a THQ ligand (I-BET726 [33]). Five key crystallographic water molecules in the binding site, found to be crucial by our clustering analysis [21], were kept. All organic molecules that were not the ligand of interest were removed. The coordinates for all 16 ligands were obtained from the study of Wan et al. [23], except for L7, which was re-docked. The docked pose for L7 binds to the BRD4 Asn140 residue through its isoxazole ring, in contrast to the amide group in all other ligands.
For the AMBER investigation, ligand parameterisation was obtained with the AMBER force field (GAFF) and AM1-BCC charges [34] using the acpype online server [35]. The Amber99SB-ILDN force field [36] and the TIP3P model [37] were used for the protein and water molecules, respectively. The MD simulations were performed with GROMACS 2018.1 [38]. For the MD simulations employing the CHARMM and CGenFF force fields, GROMACS 2020.3 [37] was used. The CHARMM36 parameters [39] were used to describe the protein atoms and CGenFF [30,40] as used for ligand atom parameterisation, with TIP3P to describe water molecule [37].
Complexes containing ligands L1 to L9 were solvated in a dodecahedral box with a minimum distance between the solute and the box of 3 nm. The charged ligands (L10 to L16) were solvated in a cubic box, as done by Aldeghi et al. [24], to mitigate changes to the net charge of the box, which is detrimental to computing G, caused by annihilating the charged ligands. This is due to the finite size effect error which is increased by the asymmetry of the box. Sodium and chloride ions were added to neutralise the systems. The receptor has a net charge of +1, L1 to L9 are neutrally charged (so one chloride ion was added to the solvated complex), L10 to L12 and L16 have a net charge of +1 (so two chloride ions were added to the solvated complex) and L13 to L15 have a net charge of -1 (so no ions were added to the solvated complex). For each system, 100,000 energy minimisation steps were carried out using a steepest descent algorithm. The systems were then subsequently simulated for 0.2 ns in the canonical (NVT) ensemble with harmonic position restraints applied to the solute heavy atoms with a force constant of 1,000 kJ mol −1 nm −2 . Temperature coupling was achieved by Langevin dynamics [41] with 298.15 K as the reference temperature. A 2 ns position restrained simulation in the isothermal-isobaric ensemble was then performed using the Berendsen weak coupling algorithm. The production runs were obtained by performing 1.5 ns unrestrained Hamiltonian-exchange Langevin dynamics with a 2 fs time-step in the NpT ensemble with the Parrinello-Rahman pressure coupling scheme [42]. For all simulations the particle mesh Ewald (PME) algorithm [43] was used for electrostatic interactions with a real space cut-off of 12 Å, a spline order of 6, a relative tolerance of 10 −6 and a Fourier spacing of 1.0 Å. The LINCS algorithm was used to constrain bonds with hydrogen atoms. For the solvated ligands, MD simulations with the above parameters were employed using a cubic solvent box (with a minimum solute to box distance of 3 nm) and time lengths of 0.2, 2 and 1.5 ns for, respectively, the NVT equilibration, NpT equilibration and NpT production runs.

Free energy calculations
The FEP production runs were analyzed with the Bennett Acceptance Ratio (BAR) method [44] as implemented in GROMACS. The final binding free energy for each ligand is the difference between the decoupling of the ligand from the water solution and from the solvated complex. The two-stage MD/FEP protocol applied is described as follows. In the first step, the van der Waals and Coulomb interactions of the ligand ( G Coul+vdW bind ) were decoupled from the protein using a linear alchemical pathway with λ = 0.1 for both, with a coarser λ = 0. In each case a λ value of zero corresponds to ligand interactions in the system turned on, with the order of interaction switch-off proceeding as bonding, Coulomb and then van der Waals. A similar procedure was executed for the ligand in water to obtain its desolvation energy ( G Coul+vdW desolvation ). The van der Waals and Coulomb interactions of ligand with the solvent were similarly decoupled, but the additional restraints were not required. The Coulomb λ values were 0.00, 0.05, 0.10, In each case, a λ value of zero corresponds to ligand interactions in the system being turned off, with the order of interaction switch-on proceeding as van der Waals and then Coulomb. There was a total of 32 and 22 λ windows (i.e. separate MD simulations) for each ligand/protein and ligand/solvent system, respectively. The complete alchemical cycle is illustrated in Figure 3.
The relative position and orientation of each ligand with respect to the BRD4 protein was restrained by one bond, two angles and three dihedral harmonic potentials with force constants of 4 kJ mol −1 nm −2 (bond harmonic potential), 1 kJ mol −1 rad −2 (angle) and 1 kJ mol −1 rad −2 (improper dihedral). The contribution of these restraints to the non-interacting ligand in solution ( G rest desolvation ) was estimated by the analytical expression proposed by Boresch et al. [45] (Equation 1), while for the ligand/protein complex ( G rest bind ) it was evaluated numerically (as described above): where R is the ideal gas constant, T is the temperature in Kelvin, V 0 is the volume corresponding to one molar standard state, or 1,660 Å 3 , r 0 is the reference restrained bond distance, θ A,0 and θ B,0 are the reference angular restraints, and K n are the force constants to apply to the distance (r 0 ), the angles (θ A and θ B ), and the dihedrals (φ A , φ B and φ C ). To obtain the binding energy for the system one must consider the vibrational and rotational energy penalty imposed by the restraints ( G rest desolvation and G rest bind ) making the expression for the total binding free energy as follows (Equation 2).

Quality of ligand parameterisation
To evaluate the quality of the ligand parameterisation, the GAFF, GAFF2 and CGenFF force fields were investigated for all 16 ligands of the THQ series. The GAFF2 force field is the second generation of the general AMBER force field. Improvements include: (i) new van der Waals parameters to reproduce high quality interaction energies and key liquid properties, non-metal elements and common metals; and (ii) new parameters for bonded terms (bond stretching, bond angle bending and torsional twisting) obtained from high quality ab initio data. Transferability is also one of the main motivations behind the development of GAFF2, as many new parameters for molecules were included. Based on atom types and parameter assignments, both the GAFF/GAFF2 and CGenFF schemes apply their force fields to a given molecule in an automated fashion. However, in the event of the given atom or parameter being unavailable ('missing') in either of these force fields, an assignment is made based on atom similarity and analogy. Based on the accuracy of the approximation, a 'penalty score' is returned for every bonded parameter and charge, quantifying the dissimilarity between atom types. The penalty values can be interpreted as follows. Penalties lower than 10 indicate the analogy is fair; penalties between 10 and 50 mean some basic validation is recommended; and penalties higher than 50 indicate poor analogy and mandate extensive validation/optimisation. For the GAFF and GAFF2 force fields, the only type of parameter that has any penalty score associated with it are the dihedral angles and only four of them have values larger than 10, i.e. require any type of validation. Furthermore, for the THQ molecules studied here, the only angle to be improved in the second iteration of GAFF2 was the H-N-C-C dihedral, which has a penalty score of 41.2 estimated by the GAFF force field, as shown in Table 1.
Comparing the CGenFF force field with GAFF and GAFF2, the set of parameters implemented in the CHARMM force field is more restricted. For all THQ derivatives studied, there are charge, bond, angle and dihedral penalty scores associated with the parameters that score above 10, i.e. require at least some basic validation. Table 2 summarises the penalties scores for ligand L1. The maximum penalty score for any parameter is 91.5 (for dihedral angles), which is significantly less than 300, the largest value for both GAFF and GAFF2. However, in the CGenFF parameterisation, there are several dihedral angle parameters that have a penalty score larger than 50, and their values sum to 751.7. The same observation can be made for the charge parameters; they have a maximum value of 44.2, but their sum exceeds 250. On the other hand, both bond and angle parameters have fair analogies within the CGenFF force field. Table 3 shows the binding free energies obtained with the AMBER/GAFF force field in this absolute FEP study of THQ ligands complexed with BRD4-BD1. For the AMBER/GAFF results, most of the calculations agree very well with the experimental data [23]. 12 out of the 16 of our predictions have errors below or equal to  1.0 kcal mol −1 . Ligand L9 will be excluded from the quantitative analysis that follows since its experimental binding energy was determined only as a range ( > -5.9 kcal/mol). The largest deviations from experiment are seen for ligands L7 and L13, with deviations of 2.0 and -1.9 kcal mol −1 , respectively. For the series, a mean absolute error (MAE) of 0.9 ± 0.4 kcal mol −1 , an RMS error of 1.0 ± 0.5 kcal mol −1 and an R 2 of 0.58 were obtained. Our calculations are significantly more accurate than those performed by Wan et al. [22] (where the errors ranged from 1.0 to 1.7 kcal mol −1 ) but are less accurate for charged species (L10 to L15). Our calculations have a similar accuracy to the most robust state-of-the-art calculations performed by Aldeghi et al. [24] (MAE 0.6 ± 0.1 kcal mol −1 , and RMS 0.8 ± 0.2 kcal mol −1 ), albeit they predicted absolute binding energies of a diverse set of compounds (of many different series) and performed much longer production runs (of 10 ns per lambda window).

Absolute free energy calculations
The BAR results from the CHARMM/CGenFF force field FEP MD simulations (Table 4) agree less well with experiment compared to the AMBER/GAFF results. For the CHARMM/CGenFF computed binding free energies, a MAE of 1.8 kcal mol −1 , an RMS error of 2.2 kcal mol −1 and an R 2 of 0.07 with respect to experiment were found. Five out of the 16 of the CHARMM/CGenFF predictions have errors below or equal to 1.0 kcal mol −1 . The largest deviations from experiment for CHARMM/CGenFF computed binding energies are for ligands L3, 11, L13 and L16, which, respectively, have errors of -3.3, -4.9, -3.0 and -3.7 kcal mol −1 . For each of the 16 ligands, the error is larger than for the corresponding AMBER/GAFF result for most of the ligands, apart from ligands L7 and L15 (Tables 3 and 4).
The ligand with the lowest relative binding energy to BRD4-BD1 from experiment is ligand L5 ( G = -10.8 ± 0.05 kcal mol −1 ). For this ligand, AMBER/GAFF predicts a binding energy of -9.9 ± 0.6 kcal mol −1 and CHARMM/CGenFF predicts a binding energy of -9.8 ± 0.6 kcal mol −1 . For both force fields, ligand L5 is not predicted to be the most potent binder out of the 16 ligands (Tables 3 and 4). From experiment, ligand L9 is the weakest binder ( G > -5.9 kcal mol −1 ). AMBER/GAFF and CHARMM/CGenFF predict binding energies of, respectively, -4.7 ± 0.6 and -9.0 ± 0.5 kcal mol −1 for this ligand. AMBER/GAFF correctly predicts the binding of ligand L9 to BRD4-BD1 to have the highest relative binding energy, whereas CHARMM/CGenFF predicts ligand L12 to have the highest relative binding energy (-8.6 ± 0.3 kcal mol −1 ). AMBER/GAFF is better able than CHARMM/CGenFF to cover the range of and to distinguish between the relative binding energies of the 16 ligands ( Figure 4). In addition, AMBER/GAFF can effectively discriminate between the ligands showing an experimental binding energy below -8.9 kcal/mol and those with a binding energy greater than -8.0 kcal/mol, a result which is important in drug discovery settings as often medicinal chemists are interested in separating strongly-binding from weakly-binding compounds. CHARMM/CGenFF did not predict any binding energies of higher (i.e. less negative) than -8.6 ± 0.3 kcal mol −1 , whereas AMBER/GAFF predicts six ligands to have binding energies higher than this value. From experiment, seven ligands are found to have binding energies higher than -8.6 kcal mol −1 .

Discussion
The BAR results suggest that the AMBER/GAFF force field is preferable (MAE of 0.9 ± 0.4 kcal mol −1 and an RMS error of 1.0 ± 0.5 kcal mol −1 ) over the CHARMM/CGenFF force field (MAE of 1.9 kcal mol −1 and an RMS error of 2.3 kcal mol −1 ) for computing absolute binding free energies from FEP MD simulations for the BRD4/THQ series. The AMBER/GAFF approach may be further enhanced when the more accurate GAFF2 parameters for ligand molecules are made available.
The superior AMBER/GAFF performance could be attributed, at least partially, to the better ligand parameterisation stemming from GAFF since it has fewer missing parameters when compared to CGenFF, as demonstrated earlier in this work when analysing the dihedral and charge parameters. Both forcefields, however, have been extensively used in drug discovery, as CGenFF performs very well for most ligands, except for a few noticeable outliers, most likely due to the inaccurate  parameterisation (specifically the dihedral angles showed in Table 1) and the finite-size effect in the charged ligands. The latter assertion is supported by some additional simulations on one of the charged ligands, L13, performed in a larger simulation cell with a minimum solute to box distance of 5 nm. The calculated binding free energies (Table  S2), especially in the case of the CHARMM /CGenFF force field, are closer to the experimental one.
For the AMBER/GAFF force field, larger force constant constraints (40 kJ mol −1 nm −2 for the bond, 10 kJ mol −1 rad −2 for the angles, and 10 kJ mol −1 rad −2 for the dihedrals) give MAE and RMSE of, respectively, 1.1 and 1.7 kcal mol −1 with respect to experiment for the 16 THQs complexed with BRD4-BD1. These mean errors are slightly larger than the mean errors for the binding energies predicted using the smaller force constant constraints. These larger force constraints alter the order of the binding free energies (Table S1 and Figure  S1). L9 has the highest relative binding energy (-5.1 ± 0.3 kcal mol −1 ), as predicted using the smaller force constant restraints, but L16 is predicted to have the lowest binding energy (-12.4 ± 0.3 kcal mol −1 ) when the larger force constant constraints are used. The binding energy for L16 predicted using the larger force constant constraints has an error with experiment of 5.0 kcal mol −1 , which may be due to the G prot term being relatively more negative (Table S1) than that of the smaller force constant constraints ( Table 1). The smaller force constant constraints give L15 the lowest binding energy (-11.5 ± 1.3 kcal mol −1 ), the ligand that has the third lowest binding energy (-10.1 ± 0.4 kcal mol −1 ) when the larger constraints are employed.
The endpoints of an FEP MD simulation, i.e. λ = 1.0 for protein-ligand van der Waals interactions in this study, can lead to some technical challenges, due to their non-physical nature which give rise to some numerical instability. For our system of interest, this was not an issue, as the BRD4 pocket is quite shallow, relatively solvent-exposed and migration of water in and out of the binding cavity was routinely observed.
FEP MD simulations are often time consuming and troublesome to setup, run and manage -particularly for a range of ligands (and force fields) as presented here. A recent effort has been made to incorporate GROMACS into a user-friendly procedure using opensource software to enable FEP MD simulations [46], which should encourage research groups to engage the FEP MD technique to assist in their drug-design studies.
Future work could consider a more comprehensive set of force fields, e.g. OpenFF [47]. In addition, it has been suggested [48] that a consensus calculation based on more than one force field might improve the overall accuracy. We have examined this for the systems considered here, but in this case it does not offer benefit.