Performance of protein-ligand docking with CDK4/6 inhibitors: a case

It is widely believed that tertiary protein-ligand interactions are essential in determining protein function. Currently, the structure sampling and scoring function in traditional docking methods still have limitations. Therefore, new methods for protein-ligand docking are desirable. The accurate docking can speed up the early-stage development of new drugs. Here we present a multi-source information-based protein-ligand docking approach (pmDock). In the CDK4/6 inhibitor case study, pmDock produces a substantial accuracy increases between the predicted geometry centers of ligands and experiments compared to AutoDock and SwissDock alone. Also, pmDock improves predictions for critical binding sites and captures more tertiary binding interactions. Our results demonstrate that pmDock is a reliable docking method for accurate protein-ligand prediction.


Introduction
The cyclin-dependent kinases 4/6 (CDK4/6) is one of the most important regulators for the cell cycle. Aberrant activity of CDK4/6 may cause uncontrolled cell proliferation, leading to malignant tumor diseases. For example, breast cancer is often associated with the abnormal accumulation of CDK4/6 [1,2]. The identification of the inhibitors would be essential to breast cancer treatment.

Docking by integrating multi-source information
To obtain a more precise docking result, we provide one hybrid docking approach by integrating multi-source information to effectively generate more native-like conformations and evaluate the structures more accurately (Figure 1).

Figure 1.
Flowchart of multi-source information-based protein-ligand docking in pmDock. Structure sampling is executed to filter the conformations with the distance between the geometric centers of the two methods less than 3 Å. Structure evaluation is evaluated by calculating the distance between the conformation's geometric center and the native ligand. Clustering is to find the best conformation in pmDock. The pmDock is better than AutoDock and SwissDock in structure evaluation.
AutoDock and SwissDock are the most widely used docking methods [41,42]. In the first step, AutoDock allows the receptor side chains and ligand to rotate the torsional angles for flexible docking [43]. SwissDock treats the receptor as rigid but splits the ligand into fragments. The fragments are then docked and reconstructed to the receptor based on the tree-based sampling algorithm [44].
In the second step, AutoDock uses a semi-empirical scoring function. It uses the Amber force field to evaluate the enthalpic contributions, empirical methods to calculate the solvation and entropy contributions [45]. The force field includes evaluations ( ) and an estimate of the conformational entropy lost upon binding (△ ): where , , , are the weighting constants from dispersion/repulsion, hydrogen bonding, electrostatic, and desolvation obtained by experimental calibration. The first term is a typical 6-12 potential for dispersion/repulsion interactions, where the parameters A and B are taken from the Amber force field [46]. The second term is a directional H-bond term based on a 10-12 potential [47]. And the parameters C and D are assigned to give a maximal well depth of 5 kcal/mol at 1.9 Å for O-H and N-H and a depth of 1 kcal/mol at 2.5 Å for S-H. The directionality of the hydrogen bond interaction E(t) is dependent on the angle t away from ideal bonding geometry. The third item is the electrostatic interactions [48]. The last term is the desolvation potential based on the volume (V) of the atoms surrounding the given atom and is weighted by the solvation parameter (S). In contrast, the exponential term is based on distance [49]. And the distance weighting factor σ is set to 3.5 Å.
The term for the loss of torsional entropy upon binding (△ ) is directly proportional to the number of rotatable bonds in the molecule ( ): the is conformational entropy determined by experimental calibration. The numbers of rotatable bonds include all torsional degrees of freedom.
where the potential energy, ( ⃗ ), is a sum over individual terms. The parameters , , and are their respective force constants, and the variables with a subscript of 0 are their equilibrium values [53]. The first term is (b). The second term is ( ). The third term is − (UB, S). For three bonded atoms A-B-C, this term is a quadratic function of the distance S between atoms A and C [53]. The fourth term is ℎ ( ), n is the multiplicity or periodicity of the dihedral angle and, δ is the phase shift. The fifth item is ( ). The sixth is − terms include Coulombic interactions between the point charges ( and ) and the Lennard-Jones (LJ) 6-12 term, which is used for the treatment of the core-core repulsion and the attractive van der Waals dispersion interaction. And is represents the well depth, where i and j are the indices of the interacting atoms, is the interatomic distance, and is the distance at which the LJ term has its minimum [53]. The last term is backbone torsional correction ( , , ), it is used to perform a numerical correction for the backbone [53].
To improve the docking accuracy, we integrated the AutoDock and SwissDock information in four steps. The detailed workflow of pmDock is as follows. First, we performed structure sampling by using AutoDock and SwissDock. AutoDock produces 2000 conformation samplings while SwissDock produces 5000 binding models. Second, we ranked the docking results and selected the top 200 docking results from both AutoDock and SwissDock. Third, we calculated the distances between the geometry center of the predicted ligands. We removed the ligand conformations in AutoDock (SwissDock) if the distance is larger than 3 Å to the conformations in SwissDock (AutoDock). Then, we further clustered the conformations using the K-Means algorithm [54,55]. The K-Means algorithm is mainly divided into the following steps: (a) Selects an initial K = 6 and initializes their respective cluster centers randomly. (b) Calculates the distances between the ligand geometry centers and cluster centers. The K-Means algorithm divides the ligands into the corresponding cluster according to the closest distance between the ligands and cluster centers. (c) Updates the cluster centers by the average positions of the ligands in each cluster. (d) Repeats the steps (b) and (c) until the sum of squared errors between the empirical average of a cluster and the points in the cluster is minimized among all K clusters: ( ) is the sum of the squared error over all K clusters. Let = * +, = 1, … , be the set of n 3-dimensional points clustered into a set of K clusters. = * , = 1, … +, µ is the mean of cluster . (e) Sorts the ligands according to the distances between the ligand positions and the cluster center.

Sequence conservation analysis
The CDK homology sequences were extracted from ConSurf-DB [56,57]. The HMMER [58] is used to search for homologous sequences similar to the CDK structure. Then, the multiple sequence alignment of these homologous sequences is calculated by MAFFT [59]. The position-specific conservation scores of each amino acid position in the alignment were computed using the Rate4Site program [60]. The Rate4Site algorithm assigns a conservation level for reach amino acid using an empirical Bayesian inference. The continuous conservation scores are divided into a discrete scale of 9 grades. Grade 1 indicates the most variable positions. Grade 9 shows the most conserved positions.

Case study dataset
We use the FDA approved CDK4/6 inhibitors as the case study dataset. These three inhibitors block the cell cycle progression by inhibiting the hyperphosphorylation of RB protein in sensitive breast cancer cells [2]. Here, we used the available CDK6-inhibitor complex structures to test the performance of pmDock. The inhibitors (Palbociclib, Abemaciclib, and Ribociclib) and corresponding CDK6 structures were extracted from the PDB database (PDB codes: 5L2I, 5L2S, and 5L2T) [61] (Table S3). We used OpenBabelGUI 2.4.1 [62] to convert the PDB files into mol2 files.

Case study: docking of breast cancer inhibitors
The performance of pmDock was tested against traditional AutoDock and SwissDock with the FDA approved breast cancer inhibitors Palbociclib (PD0332991), Ribociclib (LEE011), and Abemaciclib (LY2835219). The experimental CDK6-inhibitor complex structures are shown in Figure 2. These three inhibitors are ATP-competitive compounds to inhibit the CDK6 function and therefore interfere with tumor cells' growth [63]. We compared the three corresponding CDK6 structures (PDB codes: 5L2I, 5L2S, and 5L2T) ( Figure S1). The three CDK6 structures' structural differences are very small, with an average RMSD of 0.27 ± 0.06 Å. The workflow of the case study is divided into the following steps ( Figure 3). First, we select the top 200 docking conformations according to the lowest binding energy in AutoDock and SwissDock calculations. We calculated the distances between the predicted ligands' geometry center and removed the conformations if the distance is larger than 3 Å. There are 240, 323, and 204 predicted conformations for Palbociclib, Abemaciclib, and Ribociclib. Then, we calculate each ligand's geometric center and re-rank the docking results by using the K-Means clustering algorithm [54,55] (please see section 2.1 for detailed information). We compare the distance between the predicted ligand geometry centers and experiments. The results show that the distance accuracy of pmDock produces a substantial increase of 80 and 373% compare with AutoDock and SwissDock (Table S4). The pmDock is able to get more accurate docking results with the native-like geometry center as the crystal structures.

Analysis of binding sites
The inhibitor needs to interact with the critical residues to inhibit the CDK6 function. The prediction of the binding site can provide a better understanding of the protein structure and function [64][65][66][67]. Previous research showed Palbociclib and Abemaciclib bind to the residues of Tyr24, Lys43, Phe98, His100, Asp104, and Thr107 on CDK6 (Figure 4). The Ribociclib binds to the residues of Lys43, Phe98, His100, Asp104, and Thr107 on CDK6 [61]. The average distance of these interactions is 3.69 Å. Therefore, we use a distance of 4 Å to calculate the accuracy of the interactions between the ligands and the above-mentioned critical binding sites. The accuracy is defined as: where N is the critical binding sites determined experimentally (Palbociclib = 6, Abemaciclib = 6, and Ribociclib = 5). The n is the number of binding sites. For Palbociclib and Abemaciclib = 0,1,2, … ,6 . And for Ribociclib = 0,1,2, … ,5 . TP (FP) represents true positives (false positives). For example, TP (FP) is the total number of ligands whose distance between two binding sites is less (larger) than 4 Å when n = 2.  In the AutoDock calculation, the interaction accuracy of Palbociclib, Abemaciclib, and Ribociclib are 0.78, 0.91, and 0.52 (Table S5A). In the SwissDock, the interaction accuracy of Palbociclib, Abemaciclib, and Ribociclib are 0.26, 0.52, and 0.19 (Table S5B). For pmDock, the accuracy values of Palbociclib, Abemaciclib, and Ribociclib are 0.77, 0.81, and 0.64 (Table S5C). The mean and standard deviation accuracy values of AutoDock, SwissDock, and pmDcok are 0.74 ± 0.20, 0.32 ± 0.17, and 0.74 ± 0.09, respectively (Figure 5a, Table S5D). The results indicate pmDcok can provide accurate interaction predictions with a small standard deviation.

Analysis of binding energy
We further calculated the binding energy between CDK6 and the three inhibitors. The experimental binding energies of Palbociclib, Abemaciclib, and Ribociclib are −10.0 kcal mole -1 , −9.6 kcal mole -1, and −8.3 kcal mole -1 , respectively. First, we compared the differences between the experiment and theoretical predictions. The mean and standard deviation of energy differences of AutoDock, SwissDock, and pmDock are 0.66 ± 0.29, 1.41 ± 0.47, and 0.75 ± 0.15 (Table S6), respectively. The binding energy differences of AutoDock and pmDock are relatively small. We also selected the top 200 pmDock prediction conformations using binding energy ( Figure S2). The results show that the average binding energy of pmDock is closer to the experimental binding energy. The low energy difference and highest correlation show that pmDock is able to provide a more accurate structure evaluation.

Conservation analysis
We performed sequence conservation analysis to infer the structural or functional important residues. The evolutionary conservation scores were identified using the ConSurf-DB. The continuous conservation scores are divided into a discrete scale of 9 grades. Grade 1 indicates the most variable positions, and grade 9 indicates the most conserved positions. The residue conservations of CDK6 are listed in Table S7. There are four conserved residues in the binding sites: Tyr24 (8), Lys43 (9), Phe98 (7), and Asp104 (8). The two non-conserved residues in the binding sites are His100 (5) and Thr107 (5). We calculated the prediction probability of the critical conserved (non-conserved) residues with the following equation: where M is the critical conserved residue (non-conserved residue) determined experimentally (Palbociclib = 4 (2), Abemaciclib = 4 (2), and Ribociclib = 3(2)). The m is the number of conserved residues (non-conserved residue). For Palbociclib and Abemaciclib = 0,1,2, … ,4. (0,1,2. ). For Ribociclib = 0,1,2,3.(0,1,2.). TP (FP) represents true (false) positives. For example, TP (FP) is the number of correct ligands whose distance between two conserved residues (non-conserved residues) is less (larger) than 4 Å when m = 2. The correct ligand refers to the ligand whose distance from the native ligand is less than 4 Å in the previous description.

Discussions
To further validate the accuracy, we calculated the positive predictive value and RMSD. Figure S3a shows that the PPV of pmDock has increased by 9 and 121% compared with original AutoDock and SwissDock. We also performed clustering calculations for pmDock results. The RMSDs of average conformations for the first clusters in pmDock (3.57Å) are much smaller than AutoDock (4.84 Å) and SwissDock (12.37 Å) (shown in Figure S3b). Together, the results demonstrate that the pmDock approach is useful for further studies. We also performed the MM-GBSA calculation to predict the binding energy of the pmDock docking conformations. We filtered out the conformations less than 0 kcal mole -1 . Figure S4 shows the energy versus distance accuracy plot of the (a) Palbociclib, (b) Abemaciclib, and (c) Ribociclib predictions. The results show pmDock can provide native-like predictions.
CDK4/6 targeted therapy is one important treatment for HR+/HER2-advanced or metastatic breast cancer [68][69][70]. Computational molecular docking is able to speed up the early-stage development of new drugs. Generally, docking is mainly divided into two steps: structure sampling and scoring function. We have used a multi-source information-based docking approach to improve the prediction accuracy. In the case study of Palbociclib-CDK6, Abemaciclib-CDK6, and Ribociclib-CDK6, pmDock performs consistently better than the traditional AutoDock and SwissDock methods. Therefore, we can use pmDock to screen promising inhibitors that potentially target CDK6. We used the three CDK4/6 inhibitors (Palbociclib, Abemaciclib, and Ribociclib) as references to screen all the non-CDK inhibitors in the ChEMBL DB. The Palbociclib and Ribociclib have three important chemical groups: 3N-pyridine, exocyclic NH of the side chain, and carbonyl. For Abemaciclib, there are two critical chemical groups: exocyclic NH of the side chain and carbonyl ( Figure 6) [61,71]. We have screened three potential compounds: CHEMBL272332 [47], CHEMBL205409 [48][49][50] and CHEMBL257665 [47]. These compounds are screened by SwissSimilarity [51] from ChEMBL DB [52,53]. We can obtain their SDF files directly from the ChEMBL DB. We also convert their SDF files into PDB and mol2 files by OpenBabelGUI 2.4.1. The predicted compounds CHEMBL272332 and CHEMBL257665 have the same chemical groups as the reference inhibitors. But CHEMBL205409 only has one similar chemical group: exocyclic NH of the side chain (Figure 7). The similarity scores of CHEMBL272332, CHEMBL205409, and CHEMBL257665 to Palbociclib, Abemaciclib, and Ribociclib are 0.77, 0.41, and 0.22, respectively (Table S10A). The predicted binding probability of CHEMBL272332, CHEMBL205409, and CHEMBL257665 are 1.00, 0.64, and 1.00, respectively (Table S10B). We will conduct biological activity experiments to explore the above compounds' CDK6 inhibition effects in the future.

Conclusions
Here, we provide one multi-source information-based protein-ligand docking approach. The hybrid method is based on two well-known docking methods, AutoDock and SwissDock. Combining these two methods allows us to generate better prediction conformations compared to the original docking methods. The prediction results of pmDock are not very impressive compare to AutoDock, but pmDock can filter out more accurate native-like conformations. Together, pmDock can be implemented for virtual screening and evaluation for the early drug discovery development stage. In the future, we will incorporate additional features as the dynamical motions and co-evolution information to enhance the prediction accuracy.