Towards good correlation between fragment molecular orbital interaction energies and experimental IC50 for ligand binding: A case study of p38 MAP kinase

We describe several procedures for the preprocessing of fragment molecular orbital (FMO) calculations on p38 mitogen-activated protein (MAP) kinase and discuss the influence of the procedures on the protein–ligand interaction energies represented by inter-fragment interaction energies (IFIEs). The correlation between the summation of IFIEs for a ligand and amino acid residues of protein (IFIE-sum) and experimental affinity values (IC50) was poor when considered for the whole set of protein–ligand complexes. To improve the correlation for prediction of ligand binding affinity, we carefully classified data set by the ligand charge, the DFG-loop state (DFG-in/out loop), which is characteristic of kinase, and the scaffold of ligand. The correlation between IFIE-sums and the activity values was examined using the classified data set. As a result, it was confirmed that there was a selected data set that showed good correlation between IFIE-sum and activity value by appropriate classification. In addition, we found that the differences in protonation and hydrogen orientation caused by subtle differences in preprocessing led to a relatively large difference in IFIE values. Further, we also examined the effect of structure optimization with different force fields. It was confirmed that the difference in the force field had no significant effect on IFIE-sum. From the viewpoint of drug design using FMO calculations, various investigations on IFIE-sum in this research, such as those regarding several classifications of data set and the different procedures of structural preparation, would be expected to provide useful knowledge for improvement of prediction ability about the ligand binding affinity.


Introduction
The fragment molecular orbital (FMO) method [1][2][3] allows for thequantum mechanical study of large biomolecules. It provides not only accurate protein-ligand interactions but also their energetic components for each fragment pair, which is called the inter-fragment interaction energy (IFIE). The FMO method was applied to several protein-ligand systems such as nuclear receptors, kinases, proteases, and protein-protein interaction systems, which are promising drug targets [4][5][6][7][8][9][10][11][12]. The application to the estrogen receptor-ligand binding system [4], for example, shows a good correlation between the binding energy and experimental relative binding affinity. However, for the complex structure used as the input for the FMO calculation, it has not been sufficiently studied how to prepare the structure, including the addition of hydrogen atoms, the necessity of structure optimization, and the selection of force field. Therefore, in this study, to find a relevant prescription of structure preparation suitable for the bioactivity prediction with FMO method, the structures were prepared according to several protocols and consequently the IFIEs were estimated. This is the first example to see the correlation of (IFIE-sum) obtained from various structures including the differences of the ligand charges and/or tertiary structures for the same target protein. In our experience, it is often difficult to observe that the IFIE-sum and bioactivity values are completely or successfully correlated using all of the calculated data. Therefore, the correlation is investigated while carefully trying to classify the data according to charge state of the ligand, skeleton and tertiary structure of the protein. To evaluate the capability of FMO approach for binding-affinity prediction and the influence of structure preparation protocol on it, we chose the p38 mitogen-activated protein (MAP) kinase [13][14][15] (Fig. 1A) as a target protein. The reason is that this protein has many entries of Xray crystal structures and experimental activity data; the 95 structures with IC 50 data in the ChEMBL database [16] were available in the Protein Data Bank (PDB) [17].
In this paper, we analyze 78 PDB structures because the FMO calculations for these structures successfully converged and provided IFIE values, while the calculations for the other structures did not converge for some structural reasons. Because all inhibitors which we dealt with have corresponding PDB structures and are placed in the same binding pocket, these inhibitors are specific inhibitors. Thus, we denote these inhibitors as 'ligands'. We also regard the IC 50 value as the measure of ligand binding affinity in this study. The kinases play important roles in the functional expression of various cellular processes, which are involved in cell aging and autoimmune diseases, and are activated by phosphorylation under external stress such as heat, osmotic pressure and ultraviolet radiation [13]. It is well known that the p38 MAP kinase has two main stable structures: the DFG (Asp-Phe-Gly)-in-loop and the DFG-out-loop forms (Fig. 1B). In addition, several structures include a DFG-intermediate-loop form in our dataset (Fig. 1B).
In the following sections, the preparation protocols are first described in Sec. 2. Next, we evaluated IFIEs between the p38 MAP kinase and its various inhibitors using the FMO method, and compared the IFIE-sum with experimental data in Sec. 3.1. We also categorized the proteins in terms of their DFG-in/out-loop configuration and the ligands according to their scaffold. We then evaluated the difference in IFIEs with and without complementation of missing residues. Additionally, we investigated the difference in IFIEs when different force fields based on molecular mechanics (MM) model were used to optimize the complexes. These results are presented in Sec. 3.2.

Materials and computational methods
The structures of p38 MAP kinase-ligands complexes used in this study were prepared and processed according to the protocols below (summarized in Fig. 2). First, the X-ray structures of the complexes were downloaded from the PDB repository [17]. If the protein structure has missing atoms, they were completed with every possible missing side-chain and main-chain atoms. For the missing main chain atoms, the "Complement Main Chain" function of the BioStation Viewer [3] was employed. In this process, we employed PDB ID: 3GC7 for the DFG-in conformation and 3D83 for the DFG-out conformation, respectively, as template structures because they have the best resolution and no missing residues. For the side chain atoms, the "Structure Preparation" function in Molecular Operation Environment (MOE) [18] was employed. All crystallographic water molecules were removed except for the following cases: the water molecule forming a hydrogen bond with Asp168 or Lys53 in the DFG-in conformation and the water molecule forming a hydrogen bond with Asp168 in the DFG-out conformation, because the numbers of crystallographic water molecules in binding pocket were different for each PDB structure. Sometimes the molecular modeling software fails to assign the bond order of ligands, which in turn fails to assign the correct protonation state of the ligand. When this issue arose, we manually corrected the bond order of the ligand. We carefully addressed this issue because a difference in protonation state frequently leads to large differences in IFIEs. The "Proton-ate3D" function [19] in MOE and GBpK [20] in Discovery Studio were employed for the determination of protonation states. All histidine residues except for His312 were set to have neutral charges. For His312, we chose the HIP state, which is positively charged state and has hydrogen atoms on both nitrogen atoms of the imidazole ring, for all complexes because Protonate3D provided a charged state for this residue for more than 50% of the representative structures and the side chain of His312 forms a salt bridge with Glu317, as shown in Fig. 3. The positions of the hydrogen atoms and of the added atoms were optimized with three classical force fields, AMBER10:EHT, CHARMM27, and AMBER99, where AMBER10:EHT is a typically used force field. The 78 structures were distributed among the four research groups of different institutions which contributed to the study for preparation and optimization, and each group used a different modeling tool, where PDB structures for one ligand type were dealt with by multiple institutions. We made 6 structure sets of A, B, C, C′, D and D' containing 38, 8, 25, 25, 15 and 15 complexes, respectively (see Table 1), where several complexes overlapped between the sets. Each institution did not have the same modeling tool; thus, preprocessing before the FMO calculations was performed in slightly different ways. Structures A, C, and D were named by each modeling tool. Differences were introduced in each step owing to the different tools employed for the determination of the protonation state, the force fields used for structure optimization, and the way that the missing residues were added. In addition, as these differences were concurrently introduced, we could not distinguish the effect of the change in a single step. Thus, to evaluate the effect of such differences on the IFIE values, we built Structures C′ and D′ to evaluate the influence of the force field, and Structure B to evaluate the effect of the added missing residues. Details of the preparation procedure for each structure are provided in the Appendix. We performed FMO calculations using ABINIT-MP [3,21,22] version 6+ and evaluated the inter-fragment interaction energies (IFIE) as where E′ X is the monomer (X = I, J) or dimer (X = IJ) energy without the environmental electrostatic energy, V IJ is the electrostatic potential of the fragment pair IJ, and ΔD IJ is the difference density matrix. The FMO-MP2/6-31G* [3,23] level was employed for the calculations. We treated each amino acid residue as one fragment and the ligand as well. The IFIE-sum was defined as the sum of the IFIE between a ligand and an amino acid residue, and the IFIE between the ligand and a water molecule was not included in the IFIE-sum. We used the K computer (in Kobe, Japan) for the FMO calculations. An example of calculation time was 8.6 h (for PDB ID: 1BL6), which mainly depended on the number of atomic orbitals in the ligand fragment.  (AMBER10:EHT). Fig. 4A shows the relationship between the pIC 50 and IFIE-sums of all complexes with unique 78 compounds, which has a low correlation coefficient of R 2 = 0.00008. Because the difference in IFIEsum between the neutral and charged ligands is more than 100 kcal/ mol, as shown in Fig. 4A, it is difficult to collectively compare these data. Thus, we investigated the correlation between the pIC 50 and IFIE-sums for ligands with similar net charges. Generally, comparing the IFIE values of neutral and charged fragments was difficult because the absolute value of the IFIE for charged fragment pairs was overestimated. Fig. 4B and C show the correlation between the pIC 50 and IFIE-sums for complexes with charged and neutral ligands, respectively. Even when separately considering the neutral and charged ligands, low and nearly no correlations were obtained, respectively.

Results and discussion
Second, we classified the proteins in terms of their two characteristic conformations: the DFG-in-loop and DFG-out-loop (Fig. 5). Fig. 5A and B show the relationship between the pIC 50 and IFIE-sums of the DFG-in and DFG-out conformations, respectively. In both cases, the IFIE-sums are not correlated with the pIC 50 . Additionally, these results were separated according to ligand charge, as shown in Fig. 5C-F. The IFIE-sums of the charged ligands show inversed correlations with the pIC 50 for both the DFG-in and DFG-out conformations ( Fig. 5C and D). However, the DFG-in conformation with neutral ligands shows a good correlation (R 2 = 0.4277), as seen in Fig. 5E. On the other hand, no correlation was obtained (R 2 = 0.0003) for the DFG-out conformation with neutral ligands, as shown in Fig. 5F. Such a poor correlation for the DFG-out structures would be caused by strong interaction between the ligand and Glu71 because experimental IC 50 values do not always represent significant inhibition when the IFIE value between ligand and Glu71 indicates strong stabilization [24]. In the next section, we investigate the correlation for the neutral ligands, and the relationship between the pIC 50 and IFIE-sums could be understood in terms of ligand scaffold.

Categorizing in terms of ligand scaffold
The correlation between the experimental pIC 50 and the IFIE-sums of 64 neutral ligands (where there are 64 PDB structures with neutral ligands in 78 calculated PDB structures) is shown in Fig. 4C. Contrary to our expectations, there is no correlation between the experimental pIC 50 value and the calculated IFIE-sum values for these ligands (R 2 = 0.0002). However, the IFIE-sums of the p38 MAP kinase in the DFG-in conformation with 42 neutral ligands were significantly correlated to the pIC 50 values, as shown in Fig. 5E. To understand the origin of this relationship, we divided the 64 neutral ligands into five categories based on the scaffold of the ligand by visual inspection, where the proteins were not separated by the DFG-loop conformation. The five types are defined as follows: (A) biphenyl amides, (B) three linked aromatic rings, (C) fused aromatic rings, (D) ureas, and (E) others (see Fig. 6). Fig. 7 shows the relationship between the pIC 50 and IFIE-sums for each ligand type. As shown in Fig. 7A, B, C, and E, the IFIE-sums exhibited good correlations with pIC 50 . On the other hand, the correlation coefficient was extremely poor for urea-type ligands. The X-ray crystal structures showed that all complexes of urea ligands and p38 MAP kinases consisted of DFG-out-loop structures. This data has a behavior similar   to that of the data in Fig. 5F, which presents the correlation between the pIC 50 and IFIE-sums of the DFG-out proteins with neutral ligands. In our current dataset, the type of ligand scaffold has its preferable protein conformations for DFG-loop.
3.2. Influence of structure preparation on the IFIE values 3.2.1. Influence of the complementation procedure for missing residues In this section, we describe the influence of the preparation of the structures on the IFIE values. For this purpose, we employed the biphenyl amide ligands as an example dataset, which included nine entries with PDB IDs: 2ZB1, 3D7Z, 3D83, 3DT1, 3IPH, 3RIN, 3ROC, 2ZB0, and 3DS6. Because the ligand in 3DT1 was a charged species, we only considered the neutral ligands for the discussion (Fig. 8).

Complementation of missing residues
First, we investigate the treatment of the missing residues. Occasionally, the original structure from the PDB repository has missing residues and atoms, and there are two possible ways to treat the missing sections: (i) by adding the missing residues following a template and (ii) by capping the termini. We compared the results obtained with both procedures, as shown in Fig. 9. The correlation between IFIE-sum and pIC 50 based on non-complemented and complemented structures is illustrated in Fig. 9A and B, respectively. Furthermore, to separate the effects of complementation, we evaluated IFIE-sums with excluding the IFIE values of missing residues using the complemented structure data shown in Fig. 9C. Note that the employed structures are identical in Fig. 9A and C and the IFIE-sum both in Fig. 9B and C contains only the contribution from the non-complemented region while that in complemented structure. This difference can be separated into two contributions such as the IFIE values from complemented residues (in missing region) and the changes (caused by the complementation) of IFIE values from non-missing region. Surprisingly, Fig. 9C shows the latter contribution was larger than the former.

Hydrogen orientation and protonation state
In this section, we investigate the differences in IFIE values of complemented and non-complemented structures using 3ROC as an example (Fig. 10). In the non-complemented structures, the Nterminus and C-terminus near the incomplete region were capped with NH 2 and COOH, respectively. The complemented structures were built with template structure (PDB ID: 3GC7) using BioStation Viewer and Discovery Studio. The IFIE-sums of the complemented and the non-complemented structures were -108.8 and − 92.6 kcal/mol (data not shown), respectively. Table 2  Next, we discuss the origin of the difference of about 10 kcal/mol [= (16.2-5.31) kcal/mol] in the binding energy other than the contribution of missing residues. The example structure 3ROC has 11 missing amino acid residues consisting of sequences from Ala172 to Tyr182. Table 3 shows the IFIE values affected by the difference of protonation state between complement and non-complemented structures in this region. First, from Fig. 11, we note that the protonation states of Lys118 in complemented and non-complemented structures are deprotonated and protonated states, which are the neutral and charged states, respectively. The difference of protonation states affected the IFIE of ligand with Lys118 and those around them by about several kcal/mol. To understand the other causes of the difference in IFIE between completed and non-completed structures, we show the protein structure around the Thr185 residue in Fig. 12. The side chain of Thr185 in 3ROC has missing atoms. The orientation of Thr185 is different in the complemented and non-complemented loop structures. The protonation states of the Lys152 and Arg186 residues, which are close to Thr185, changed so that these residues formed hydrogen bonds with different orientations with respect to Thr185 (Table 3). However, as shown in Figs. 11 and 12, the remaining missing groups influenced the other remaining groups and unrelated segments of the protein.

Influence of different force fields on the IFIE values
We employed three force fields, AMBER10:EHT, AMBER99, and CHARMM27, to investigate the influence of the force field used for the   10. Ribbon representation of the structure which was built by BioStation Viewer with starting from the 3ROC structure in PDB. The red residues show attraction against the ligand (yellow) and the blue residues show repulsion. The green colour indicates the missing residues. The distance from the ligand to these missing residues was 10 Å or more even at the closest residues (residue number 172-182). geometry optimization upon IFIE-sum on ligand. The IFIE-sum obtained with AMBER10:EHT was used as a reference for comparison. Fig. 13 shows that using a different force field for geometry optimization has little effect on the ultimate IFIE-sum. The IFIE-sums obtained with each force field are listed in Table 4, and no significant differences are observed with certain exceptions. To understand the origin of the exceptions, we also list the residues with the largest differences in IFIE between force fields in Table 5. This table says Glu71 and Thr106 frequently provide large contributions to the difference in IFIE-sum. As shown in Fig. 14, the Glu71 and the urea type ligands make the hydrogen bonds. The differences in IFIE of Glu71 become larger especially in the cases of the urea type. The structures around Glu71 and the urea-type ligand were, however, almost unchanged by difference of force field (Fig. 14). This would be explained by the fact that the IFIE values between the urea-type ligands and Glu71 were large negative values (from −45 to −60 kcal/mol) in comparison to those of other fragment pairs. These values were sensitive to the slight structural difference of hydrogen atoms due to the different force fields. This sensitivity may be related to the poor correlation between IFIE and pIC 50 for the urea type (Fig. 7D). Next, the 3D structures around Thr106 minimized by using different force fields are shown in Fig. 15, where 3GCU is employed as an example. In structure C′ with AMBER10:EHT force field, the OH group in the side chain of Thr106 faces the ligand, while in structure C with CHARMM27 force field, it faces the main chain of His107. Optimizing the structures under the different conditions to determine the direction of the OH group, we found that the final direction was determined by its initial position. We conclude that the difference in IFIE-sum between different force fields for geometry optimization was almost negligible by 5 kcal/mol or less as far as starting from X-ray crystallographic atomic coordinates, specifically when the structures were optimized into the same local stable geometry; the dependency on force field appears in IFIE modulation for length of hydrogen bonds and orientation of hydroxyl groups.

Conclusion
In this study, we performed an FMO-IFIE analysis of the intermolecular interactions between the amino acid residues of the p38 MAP kinase and its inhibitors. First, a good correlation between the experimental IC 50 values and the calculated IFIE-sums for neutral ligands and proteins in the DFG-in conformation was obtained, in contrast to that of charged ligands and DFG-out conformations, where the  IFIE values between the ligands and Glu71 caused the poor correlation between the IFIE-sum and pIC 50 . Moreover, when different types of ligands were bound to the protein, the correlation coefficients were moderate except for urea ligands, thus suggesting that the type of ligand also affects the calculated IFIEs. In addition, the correlation coefficients varied by 0.23 when different approaches to complementing the missing residues were used. This difference is composed of the contribution from complemented residues as well as change of protonation state or change in hydrogen orientation. Furthermore, the difference in IFIE obtained with different force fields was less than 5 kcal/mol, except for some cases, when the same local minimum energy structures were used. One of the reasons for the exceptions was that the IFIEs between Glu71 and the urea type ligands were significantly higher than those of other fragment pairs. This results in fluctuating tendency of IFIEs between Glu71 and the urea-type ligands due to the slight difference in position of hydrogen atom. The other reasons were that complementing the missing region or adding hydrogen atoms might change the hydrogen orientation or protonation state and thus cause a difference in IFIE. Although this difference did not make significant effects for main conclusion of this research, this potential for different protonated state caused by difference of modeling procedure may produce large differences in other research objects. Thus, sufficient care should be taken for modeling around ligand pocket.
This study provides a practical way to understand the relationship between different preprocessing procedures and the calculated IFIE values of the protein-ligand complexes and thus to achieve a good correlation between calculation and experiment. We expect that these results will contribute to the practical application of the FMO method to drug design in pharmaceutical industries. In addition, this research was limited to discuss the choice of preparation procedure, or the correlation between experimental IC 50 and FMO derived IFIEsum for some parts of PDB structures. We did not use more theoretically sophisticated approaches such as informatics approaches, for example, singular value decomposition [24] or physicochemical approaches especially for incorporation of solvation effects: FMO based polarizable continuum model (FMO-PCM) [25], FMO based Poisson-Boltzmann surface area (FMO-PBSA) [26,27] and FMO method with molecular mechanics Poisson-Boltzmann surface area (FMO+MM-PBSA) [10], because these methods require more    computation resources; for example, FMO-PBSA requires the computational time by about 20 times. Furthermore, we discussed neither the efficacy of FMO3 method [8,28], which would be important for bridging water between protein and ligand [29], nor the effects of structural sampling. These more sophisticated approaches, including higher-order electron correlation methods and higher basis sets (for example, cc-pVDZ) [30][31][32], may be promising to improve the correlation between the calculated IFIE values and experimental pIC 50 data. using the K computer (project ID: hp150160, hp160103, hp170183, and hp180147 Missing residues were complemented with the "Complement Main Chain" function in BioStation Viewer. We employed PDB structures 3GC7 and 3D83 as template structures for the DFG-in conformation and DFG-out conformation, respectively, because these structures have no missing residue and exhibit the best resolution of 1.7 Å (3GC7) and 1.9 Å (3D83).

(4) Missing atoms
Missing atoms were complemented with the "Structure Preparation" function in MOE. (5) Setting the bond order and ionization of the ligand We fixed bond orders and ionization of ligand according to the 2D chemical structures. (6) Crystal water molecules All crystallographic water molecules were removed except for the water molecule forming a hydrogen bond with Asp168 and Lys53 in the DFG-in conformation and the water molecules forming a hydrogen bond with Asp168 in the DFG-out conformation. (7) Hydrogen atom addition We used the "Protonate3D" function in MOE for pH = 7.0. In addition, the N-terminus and C-terminus are NH 3 + and COO − , re-

spectively. (8) Tautomeric states and protonation of histidine
We dealt with His312 as HIP. Other His residues were treated as neutral residues. The tautomeric states of His such as HIE or HID were determined with Protonate3D in MOE.
(9) Assignment of ligand charge We assigned ligand charges according to their protonation state at pH = 7.0 in water. For aliphatic amines, we set the formal charges to +1. For carboxylic groups, we set the formal charge to −1, and the formal charge of aromatic groups such as anilines was set to zero. (10) Structure optimization.
For the geometry optimization, the heavy atoms whose coordinates were determined by X-ray crystallography were fixed; the hydrogen atoms used to complement the missing residues and missing atoms were not fixed. MOE with the AMBER10: EHT force field was used.
Structure B (built by institution "a" with MOE; structure with noncomplementation for missing residues) (1) Obtaining X-ray structures Same as structure A. (2) Sequences for calculations Same as structure A.

(3) Missing residues
We did not complement missing residues. For the C-terminus, we employed COOH and for the N-terminus, we employed NH 2 . Structure C built by institution "b" and "c" mainly with Discovery Studio using CHARMM27 force field) (1) Obtaining X-ray structures Same as structure A. (2) Sequences for calculations Same as structure A. (3) Missing residues First, missing residues were complemented with the "Complement Main Chain" function in BioStation Viewer. If the structures were significantly distorted or deeply contacted, we complemented the missing residues with Discovery Studio. (4) Missing atoms We complemented the missing atoms with Discovery Studio (5) Setting the bond order and ionization of the ligand.
Same as structure A. (6) Crystal water molecules Same as structure A. (7) Hydrogen atom addition Similar to structure A. Here, "GBpK" function in Discovery Studio was employed instead of "Protonate3D" function in MOE. (8) Histidine tautomeric state Similar to structure A. Here, "GBpK" function in Discovery Studio was employed instead of "Protonate3D" function in MOE. (9) Assignment of ligand charge Same as structure A. (10) Structure optimization The optimization conditions were the same as for structure A.
The CHARMM27 force field of Discovery Studio was employed for the optimization.
Structure C′(similar to structure C and hydrogen atoms were reoptimized with AMBER10:EHT force field) (1) to (10) Same as structure C. (11) Structure optimization (II) We used AMBER10:EHT for the optimization of structure C for hydrogen atoms.
Structure D (built by institution "d" mainly with MOE using Amber99 force field; MOE was partly employed for the complementation of missing residues.) (1) Obtaining X-ray structures Same as structure A. (2) Sequences for calculations Same as structure A. (3) Missing residues 1) If the overlaps with the template were large, we copied a template structure using MOE.
2) If the difference in the conformations of the activation loop were too large, we used the "Build loop" function of MOE.  The optimization conditions were the same as for structure A. The AMBER99 force field of MOE was employed for the optimization.
Structure D′ (similar to structure D and hydrogen atoms were reoptimized with AMBER10:EHT force field) (1) to (10) Same as structure D. (11) Structure optimization (II) We used AMBER10:EHT for the optimization of structure D for hydrogen atoms.