De Novo Design of Protein Kinase Inhibitors by in Silico Identification of Hinge Region-Binding Fragments

Protein kinases constitute an attractive family of enzyme targets with high relevance to cell and disease biology. Small molecule inhibitors are powerful tools to dissect and elucidate the function of kinases in chemical biology research and to serve as potential starting points for drug discovery. However, the discovery and development of novel inhibitors remains challenging. Here, we describe a structure-based de novo design approach that generates novel, hinge-binding fragments that are synthetically feasible and can be elaborated to small molecule libraries. Starting from commercially available compounds, core fragments were extracted, filtered for pharmacophoric properties compatible with hinge-region binding, and docked into a panel of protein kinases. Fragments with a high consensus score were subsequently short-listed for synthesis. Application of this strategy led to a number of core fragments with no previously reported activity against kinases. Small libraries around the core fragments were synthesized, and representative compounds were tested against a large panel of protein kinases and subjected to co-crystallization experiments. Each of the tested compounds was active against at least one kinase, but not all kinases in the panel were inhibited. A number of compounds showed high ligand efficiencies for therapeutically relevant kinases; among them were MAPKAP-K3, SRPK1, SGK1, TAK1, and GCK, for which only a few inhibitors are reported in the literature.

Phosphorylation is the most important and widespread covalent modification of proteins. It is used to control enzyme activity in cellular processes and thereby plays a major role in cell signaling and is fundamental to all aspects of cell behavior and organization. 1 Protein kinases catalyze the transfer of the γ-phosphate group from ATP to recognized amino acids of proteins. Kinases have implications for many diseases including cancer, diabetes, and Alzheimer's disease and constitute the second most exploited group of drug targets with many ongoing drug discovery efforts. 2 Despite the extensive research over the past two decades, selective chemical tools are still needed to dissect the complex nature of kinase regulation. 2,3 A wealth of structural information has revealed the general architecture of protein kinases, their binding sites, and complex regulation. 4,5 The ATP-binding sites of most protein kinases share similar features (Figure 1a). 6,7 A key recognition motif is the hinge region that forms hydrogen bonds to the adenine moiety of ATP and is targeted by many kinase inhibitors. Often, inhibitors also address one or both of the adjacent hydrophobic pockets I and II. These are more variable between different kinases than the hinge region, and the differences can be exploited to achieve selectivity. 8 Kinase inhibitors are commonly discovered by high-throughput, virtual, or fragment-based screening, often using compound libraries sourced from commercial suppliers. 9−13 While successful in delivering hit compounds, these libraries offer only limited template diversity. In order to tackle this issue, various research groups have developed approaches to expand their libraries with proprietary compounds. 14−20 Libraries that contained compounds with heterocycles, which have the potential to interact with the hinge region of the kinase binding site but no previously reported activity against kinases, were of particularly high value. 14−18
A difficulty in expanding the kinase libraries was to assess synthetic feasibility of the suggested compounds, especially if they contained novel cores. 14,17 Here, we report on the structure-based de novo design of protein kinase inhibitors. The approach is centered on fragments that have precedence for synthesis but are not commercially available with the required substitution pattern. Libraries around six core fragments without previously reported activity against kinases were synthesized, and selected compounds were screened against a panel of 117 kinases. In addition, the crystal structure of one novel inhibitor in complex with cSrc was determined. Every tested compound was active against at least one kinase. While predicting general activity against kinases on a scaffold level was highly successful, predicting selectivity on a compound level failed. Ligand-efficient inhibitors were identified for a number of kinases, which have implications in a range of diseases but for which only a few inhibitors have been reported to date.

■ RESULTS AND DISCUSSION
Structure-Based Design of Novel Protein Kinase Inhibitor Libraries. An in silico screening cascade was established for the design of novel kinase inhibitor libraries (Figure 1b). This approach consisted of the following four principal steps: core fragment extraction from commercially available compounds, selection of candidate core fragments, docking of core fragments, and fragment expansion.
A core fragment was defined as a ring system plus the directly attached heteroatom containing functional groups. 11 Starting from over two million compounds, about 84,000 unique core fragments were extracted. In the next step these core fragments were filtered for fragment-like properties, the absence of unwanted functionalities, and limited complexity. The resulting 11,000 core fragments were subsequently filtered using a 3D pharmacophore to remove scaffolds that did not contain a hinge-binding motif (Supplementary Figure S1). About 6,000 core fragments that passed this filter step were docked into the binding sites of a panel of 46 different protein kinases (Supplementary Table S1). These proteins were chosen based on the availability in the MRC Protein Phosphorylation Unit in Dundee for compound testing and access to a crystal structure in the public domain. 21 In order to make the docking scores comparable for the different kinases, they were normalized relative to the best score for any fragment in the active site of each kinase in a similar way as carried out previously. 22 Following this approach the top 100 nonselective core fragments were visually inspected. The vast majority of these were predicted to form two or three hydrogen bonds with the kinase hinge region. Six out of the top scoring core fragments or fragments closely resembling these scaffolds had been co-crystallized with a kinase as part of a larger compound ( Figure 2). Comparing the docked poses with the binding modes of these compounds confirmed that they were placed correctly in the binding site in at least one kinase of the docking panel (RMSD of maximum common substructure <2 Å). The novelty of the high-ranking hinge binders was assessed by using the core fragments as substructure queries for ChEMBL 23 and SciFinder (Chemical Abstracts Service, Columbus, OH). 
It turned out that 73 of the top 100 nonselective core fragments were already reported in the literature as part of known kinase inhibitors. Six novel fragments out of the remaining 27 fragments were short-listed for library enumeration based on synthetic considerations ( Figure 3). None of the selected core fragments was available as part of commercial compounds with either the required substitution pattern or the desired diversity at the time of the study.
In the final step, suitable substitution points to attach additional moieties to interact with the hydrophobic pockets I and II (Figure 3) were assigned. For all short-listed core fragments, more than one binding mode was predicted for the kinases in the docking panel. Typically, in the different orientations several hydrogen bonds with the hinge region were formed, but the interacting atoms of the cores differed. It was therefore ensured that after adopting these alternative binding modes the R-groups would still be placed into desired regions of the binding site. The libraries were enumerated using commercially available building blocks. The reaction products were filtered to remove molecules with unwanted functionalities and non-drug-like molecules according to Lipinski's rule of five. 24 A final selection was made based on diversity by visual inspection and in-house availability of the building blocks. Due to synthetic considerations, libraries containing the core fragments A and F were generated by varying either R1 or R2 (Supplementary Table S2, Figure 3), whereas for core fragments B and D, R1 and R2 were varied simultaneously. Core fragments C and E contained only one R-group for enumeration. In total, 265 compounds were short-listed for synthesis. All libraries were synthesized using parallel synthesis in a maximum of six steps. To speed up synthesis, the synthetic routes were designed to introduce diversity as late as possible (Scheme 1), and 186 compounds representing all six core fragments were successfully prepared (yield 7−98%, purity >90%).
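The two enumeration modes described above (varying one substitution point at a time versus varying both simultaneously) can be illustrated with a minimal Python sketch. It is not the Pipeline Pilot protocol used in the study: the function names, the substituent labels, and in particular the idea of holding the other position at a fixed default substituent are assumptions made only for illustration.

```python
from itertools import product
from typing import List, Sequence, Tuple

Product = Tuple[str, str, str]  # (core, R1, R2) labels, hypothetical representation


def enumerate_simultaneous(core: str, r1_groups: Sequence[str],
                           r2_groups: Sequence[str]) -> List[Product]:
    """Vary R1 and R2 together: full combinatorial product."""
    return [(core, r1, r2) for r1, r2 in product(r1_groups, r2_groups)]


def enumerate_one_at_a_time(core: str, r1_groups: Sequence[str],
                            r2_groups: Sequence[str],
                            default_r1: str, default_r2: str) -> List[Product]:
    """Vary either R1 or R2 while the other position keeps a default group.

    The fixed default substituent is an assumption of this sketch.
    """
    varied_r1 = [(core, r1, default_r2) for r1 in r1_groups]
    varied_r2 = [(core, default_r1, r2) for r2 in r2_groups]
    # dict.fromkeys removes duplicates while preserving order
    return list(dict.fromkeys(varied_r1 + varied_r2))


# Hypothetical substituent labels, for illustration only.
r1 = ["Me", "Ph"]
r2 = ["H", "OMe", "Cl"]
full_library = enumerate_simultaneous("B", r1, r2)
sparse_library = enumerate_one_at_a_time("A", r1, r2, default_r1="Me", default_r2="H")
```

Simultaneous variation grows with the product of the two substituent lists, whereas one-at-a-time variation grows only with their sum, which is one reason the latter is attractive when synthesis is the limiting step.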
Inhibition Profiles of 15 Compounds against a Panel of 117 Protein Kinases. A subset of library compounds was tested against a panel of 117 protein kinases as a proof-of-concept study (Figure 4). The compounds were chosen to cover all six core fragments and to be fragment-sized (MW < 300 Da), therefore having a higher probability of inhibiting a kinase compared to more elaborated library compounds, with the potential drawback of having only moderate binding affinity. 25 Any compounds showing ≥75% inhibition at 100 μM were defined as active, between 40 and 75% as moderately active, and below 40% as inactive (Supplementary Table S3). According to these definitions, all assayed compounds were at least moderately active against at least one kinase (Figure 5a). The most selective compounds were E1 and F2, which both inhibited only one kinase with >40% inhibition (MSK1 and MKK1, respectively). The least selective compounds were B1 and A2, which inhibited 63 and 55 different kinases with >40% inhibition, respectively.
Figure caption: Predicted binding modes with respect to the hinge region for six high-ranking core fragments (A−F) for which binding to protein kinases was not reported in the literature. Only the most frequent binding mode for each core fragment is shown. The substitution points that target the hydrophobic pockets I and II and that have been selected for diversifying the cores are indicated as R1 and R2, respectively. (The binding sites are oriented as depicted in Figure 1a.)
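The activity thresholds defined for the single-point screen can be written down as a small Python sketch. The function names and the example values are hypothetical; only the 75% and 40% cutoffs come from the text, and the ≥40% boundary for "moderately active" is assumed here to be inclusive.

```python
from typing import Dict


def classify_inhibition(percent_inhibition: float) -> str:
    """Label one single-point result (percent inhibition at 100 uM)."""
    if percent_inhibition >= 75.0:
        return "active"
    if percent_inhibition >= 40.0:  # inclusive lower bound is an assumption
        return "moderately active"
    return "inactive"


def count_kinases_hit(profile: Dict[str, float], cutoff: float = 40.0) -> int:
    """Count the kinases that one compound inhibits at or above the cutoff."""
    return sum(1 for value in profile.values() if value >= cutoff)


# Hypothetical percent-inhibition values, for illustration only.
example_profile = {"KINASE_A": 82.0, "KINASE_B": 55.0, "KINASE_C": 12.0}
labels = {name: classify_inhibition(v) for name, v in example_profile.items()}
```

Counting hits per compound at a given cutoff, as in `count_kinases_hit`, is the kind of tally that underlies the selectivity comparison between compounds.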
The kinases represented in the kinase panel were not equally inhibited by the 15 compounds selected to represent the different core fragments (Figure 5b, Supplementary Tables S3 and S4). Using a ≥40% inhibition cutoff value, 26 kinases (22%) in the panel were not inhibited by any of the tested compounds. When a cutoff of ≥75% inhibition was used, this number increased to 75 (64%).
Potency Determinations for Selected Compounds. Six compounds (A1, A2, B1, C2, D1, and F3) with inhibition of >50% for certain kinases were selected for IC50 determinations (Table 1). Activity for all compounds was confirmed with IC50 values ranging from 4 to 470 μM. Additionally, all compounds were characterized by high ligand efficiencies (>0.30 kcal/mol heavy atom) 26 for at least one protein kinase, with B1 showing the best ligand efficiencies across a range of kinases.
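For orientation, ligand efficiency can be estimated from an IC50 value with a short Python sketch. It assumes the commonly used definition, LE = −ΔG / N(heavy atoms) with ΔG ≈ RT ln(IC50), i.e., the IC50 is taken as a stand-in for the dissociation constant at 298.15 K. The text does not spell out the exact formula used, and the heavy-atom count in the example is hypothetical.

```python
import math

R_KCAL_PER_MOL_K = 1.987e-3  # gas constant in kcal/(mol*K)


def ligand_efficiency(ic50_molar: float, heavy_atoms: int,
                      temperature_k: float = 298.15) -> float:
    """Approximate ligand efficiency in kcal/mol per heavy atom.

    Assumes delta_G ~ R*T*ln(IC50), treating IC50 as an estimate of Kd.
    """
    if ic50_molar <= 0 or heavy_atoms <= 0:
        raise ValueError("IC50 and heavy-atom count must be positive")
    delta_g = R_KCAL_PER_MOL_K * temperature_k * math.log(ic50_molar)
    return -delta_g / heavy_atoms


# IC50 of 21 uM with a hypothetical count of 19 heavy atoms.
example_le = ligand_efficiency(21e-6, heavy_atoms=19)
```

Because the free energy is divided by the number of heavy atoms, a fragment-sized compound with only micromolar potency can still reach the 0.30 kcal/mol per heavy atom threshold.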
Confirmation of Binding Mode. The binding mode of B1 in complex with cSrc was determined using X-ray crystallography. In the most frequent binding mode that was generated for core fragment B in the kinases of the docking panel, the exocyclic amino group pointed toward the gatekeeper and the nitrogen atom bridging the heterocycles was placed next to the hinge region (Figure 3). This orientation closely matches the one found in the cSrc-B1 complex structure (Figure 6, Supplementary Table S5, RMSD = 0.77 Å for non-hydrogen atoms).

Comparison of Predicted and Observed Inhibition Profiles.
To compare the modeling results with the experimental inhibition data, we docked the 15 compounds that were profiled in our kinase panel (Figure 4) into the binding site of the kinases in the docking panel (Supplementary Table S1). The rank obtained for the least favorable scoring inhibitor among the 15 tested compounds for each kinase was compared to the number of identified hits for that target (Supplementary Table S6). If docking performance were perfect, both numbers would be identical (e.g., if two hits were identified for a kinase, the ligand with the lowest score should have rank 2 with the other ligand being on rank 1). While for some targets the worst rank corresponded closely to the number of identified hits, for most targets separation between active and inactive compounds was not possible, whether a cutoff value of ≥40% (data not shown) or ≥75% inhibition was used. Similarly, docking failed to predict the selectivity profile of the inhibitors (Supplementary Table S7). For this comparison, the scores for all 15 compounds docked into all 46 protein structures were normalized relative to the best score for any fragment in the active site of each kinase obtained when docking the library of candidate core fragments. Compounds were considered to be predicted as active if their normalized score was ≥1.0 (equal to or better than the best score for a core fragment in a particular kinase). At the chosen cutoff level, there was little agreement between predicted and confirmed activity. For instance, compound B1, the least selective ligand in the panel, was predicted to be active for only nine out of the 31 kinases for which activity was found. In contrast, E1 and F2, the most selective compounds in the panel, were predicted to bind to 10 and 21 kinases, respectively. However, neither compound demonstrated activity against these kinases in a biochemical assay. The only compound for which the profile was predicted correctly was B4, which did not inhibit any kinase in the docking panel.
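The rank-based comparison described above can be sketched in a few lines of Python. The sketch assumes that a more negative docking score is more favorable (as for an energy score) and that ties do not occur; the function names and all scores are hypothetical.

```python
from typing import Dict, Set


def worst_hit_rank(scores: Dict[str, float], hits: Set[str]) -> int:
    """Rank (1 = best score) of the least favorably scored experimental hit.

    Assumes lower (more negative) scores are better.
    """
    if not hits:
        raise ValueError("at least one experimental hit is required")
    ranked = sorted(scores, key=lambda compound: scores[compound])
    ranks = {compound: i + 1 for i, compound in enumerate(ranked)}
    return max(ranks[compound] for compound in hits)


def perfectly_separated(scores: Dict[str, float], hits: Set[str]) -> bool:
    """True if every hit outranks every non-hit for this kinase."""
    return worst_hit_rank(scores, hits) == len(hits)


# Hypothetical docking scores for four compounds against one kinase.
example_scores = {"c1": -30.0, "c2": -25.0, "c3": -20.0, "c4": -10.0}
```

With `hits = {"c1", "c2"}` the worst hit sits at rank 2 and separation is perfect; with `hits = {"c1", "c3"}` the worst hit sits at rank 3 although only two hits exist, so an inactive compound has been scored above an active one.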
Overall, the chosen de novo design strategy (Figure 1b) was highly successful. Starting with a database of core fragments, six scaffolds for which activity against kinases was not reported previously were identified, and synthetic routes were developed to prepare focused libraries around each core. All tested compounds were active against at least one kinase (Supplementary Table S3). We attribute this success to the following reasons: (1) Focusing on core fragments that occur in commercially available compounds but are not available with the required substitution pattern ensured that the libraries were synthetically accessible and at the same time novel. (2) Exploiting the wealth of kinase crystal structures when shortlisting the cores canceled differences in the kinase binding sites and increased the chances that highly ranked cores would actually bind to a kinase when part of a larger compound.
Evaluating a subset of the library compounds in a large kinase panel confirmed some previous findings but also highlighted differences (Supplementary Table S4). For instance, we deliberately concentrated our synthetic efforts on predicted unselective fragments. It turned out that derivatives of each core displayed distinct selectivity profiles (Figure 5). This observation is in tune with the notion that kinase selectivity often does not stem from the core itself but from its substituents. 13,27 Further, we observed inhibition of PIM1 and PIM3 (Table 1). These kinases possess an altered hinge binding region incompatible with forming the hydrogen-bonding interactions typically observed in kinase-ligand complexes (Figure 1a). 28 It was speculated that interactions with different amino acids in the binding site require a pharmacophore similar to that of the interactions with the hinge region, which presumably leads to high hit rates of fragment-like kinase inhibitors for these atypical kinases. 13 Our data confirm this hypothesis. However, Posy et al. 27 and Bamborough et al. 29 reported low hit rates for ASK1 (0 and 0.2%, respectively). In contrast, by testing just 15 compounds against this kinase we found two hits containing two different scaffolds. Anastassiadis et al. 30 found no hits for NEK6 and MAPKAP-K3, whereas in our exercise two inhibitors containing the same scaffold and seven inhibitors containing three different scaffolds were discovered, respectively. One of the MAPKAP-K3 inhibitors had an IC50 value of 21 μM and a ligand efficiency of 0.33 kcal/mol heavy atom (Table 1). The previous studies had concluded that these kinases were less tractable. Our data contradict these findings, which is another warning that profiling data can be interpreted only in relation to the chemical space covered by the screening library.
Molecular docking was very successful when predicting binding modes of fragments and general activity against kinases on a core fragment level but less so when individual compounds for individual kinases were considered. Six high-ranking core fragments had been co-crystallized with kinases as part of more elaborated compounds (Figure 2). All of the observed binding modes for the core fragments were among the ones generated in the kinase docking panel. Also, the binding mode of B1 in cSrc was predicted correctly (Figure 6). Remarkably, 73 out of the top 100 ranked core fragments had activity against kinases reported in the literature. This unusually high enrichment was presumably caused by exclusively docking compounds that fulfilled a pharmacophore required for hinge-binding fragments (Supplementary Figure S1a) and by using consensus scoring across a panel of kinase structures (Supplementary Table S1). This success provided confidence that the remaining high-ranking fragments for which no activity against kinases was reported would also be suitable hinge-binding scaffolds. At least for the core fragments that were chosen for library synthesis and for which examples were tested in a kinase panel, this turned out to be the case. When trying to predict the selectivity profiles of the tested library compounds, docking failed to give useful guidance (Supplementary Table S7). This failure is driven by the inability to correctly rank the tested compounds according to their affinity for given kinases (Supplementary Table S6). Small changes that alter the selectivity of the compounds, such as adding a nitrogen atom to the heterocycles (E1 vs F1, Figure 4), are not accurately captured by the scoring function used, and furthermore discrimination between more drastic changes, such as ranking structurally unrelated compounds, failed in most cases. In addition, predicting the effect of the subtle differences in the kinase binding sites on compound affinity remains a challenge. 14 In the past, more promising results were obtained by using structural and inhibition data to train predictors. 31−34 However, due to their nature, these approaches do not allow predictions to be made for new compound classes or kinases.
A number of compounds were chosen for IC50 determinations against selected kinases (Table 1). All tested compounds were characterized by high ligand efficiencies (≥0.30 kcal/mol heavy atom) for at least one kinase, rendering them in general promising starting points for drug discovery for a range of targets with high medical interest. 26 Among others, compounds with high ligand efficiencies were identified for MAPKAP-K3, SRPK1, SGK1, TAK1, and GCK (Table 1). For all of these kinases, fewer than 30 compounds with affinities ≤100 μM are reported in ChEMBL Kinase SARfari (as of April 2012). 23 They are thought to be targets for the treatment of cancer, systemic inflammatory response syndrome, rheumatoid arthritis, other inflammatory diseases, and hypertension. 35−39 The hits identified by our in silico study can serve as starting points to further explore these targets.
Summary and Conclusion. A de novo design approach for novel kinase inhibitors was established that was centered on core fragments having precedence for synthesis but no reported activity against kinases. Using a wide range of methods at the interface of chemistry and biology, e.g., structure-based design, chemical synthesis, X-ray crystallography and biological profiling, the approach was validated. Six small molecule libraries were synthesized. Selected compounds were profiled in a large kinase panel, and dose−response curves were determined. In addition, the binding mode of one compound was confirmed using X-ray crystallography. Whereas predicting binding modes and general activity against kinases was very reliable, correctly predicting selectivity solely based on docking to crystal structures failed. Nevertheless, this study demonstrated the overall success of the chosen design strategy. In addition, a number of inhibitors with good binding efficiency were identified for a range of kinases with therapeutic relevance and can now serve as starting points for chemical tools to further explore these targets.

■ METHODS
Core Fragment Extraction. Core fragments were derived from an in-house database containing more than two million commercially available compounds using a method described previously. 11 In brief, core fragments were defined as ring atoms plus the atoms of directly attached polar functional groups. Polar functional groups were specified as polar hetero atoms (S, N, P, or O) and polar hetero atoms double or triple bonded to carbon atoms or linked to other polar hetero atoms or carbonyl groups. Functional groups linking ring atoms were added to both resulting scaffolds. Fragments containing only carbon atoms or carbon atoms and aromatic sulfur or oxygen atoms were disregarded.
Fragment Selection. Three filters were used to eliminate inappropriate or undesirable scaffolds from the generated database of core fragments. First, fragments that contained unwanted functionalities as described previously were removed. 11 Second, fragments that did not comply with a modified "Rule of Three" were removed from the database. 40 Here, only ring fragments with MW < 300 Da, number of hydrogen bond donors ≤3, number of hydrogen bond acceptors ≤6, and clogP ≤ 3 were retained. Additionally, core fragments with the total charge ≤ −1 or ≥1 were removed. Finally, compounds containing more than two fused rings or more than six rotatable bonds were rejected.
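The property-based part of this filter cascade can be summarized as a minimal Python sketch operating on precomputed descriptors. The class and field names are hypothetical, the descriptor values would have to come from a cheminformatics toolkit (not shown), and the substructure check for unwanted functionalities is reduced to a boolean flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FragmentDescriptors:
    """Precomputed properties of one core fragment (hypothetical container)."""
    mol_weight: float        # Da
    h_bond_donors: int
    h_bond_acceptors: int
    clogp: float
    total_charge: int
    fused_rings: int
    rotatable_bonds: int
    has_unwanted_group: bool = False


def passes_fragment_filters(frag: FragmentDescriptors) -> bool:
    """Apply the modified 'Rule of Three' and complexity limits from the text."""
    return (
        not frag.has_unwanted_group
        and frag.mol_weight < 300.0
        and frag.h_bond_donors <= 3
        and frag.h_bond_acceptors <= 6
        and frag.clogp <= 3.0
        and -1 < frag.total_charge < 1   # charges <= -1 or >= 1 are removed
        and frag.fused_rings <= 2
        and frag.rotatable_bonds <= 6
    )


# Hypothetical descriptor values, for illustration only.
neutral_fragment = FragmentDescriptors(180.2, 2, 3, 1.4, 0, 2, 1)
charged_fragment = FragmentDescriptors(180.2, 2, 3, 1.4, 1, 2, 1)
```

Keeping each criterion as a separate term makes it easy to see which threshold a rejected fragment violates.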
Pharmacophore Search. The 3D pharmacophore search was carried out using the UNITY tool available with the SYBYL-X 1.0 package (Tripos Inc.). The default file sln3d_macros.def that contained the definition of predefined feature types (macros) such as hydrogen-bond acceptor or hydrogen-bond donor was modified to also account for C−H···O hydrogen bonds as often observed in protein kinases (Supplementary Figure S1b). Otherwise, default settings were used when converting the filtered core fragments to a 3D UNITY database.
The pharmacophore was generated based on the crystal structure of CDK2 in complex with an imidazole piperazine inhibitor (PDB code 2w05). The resulting pharmacophore model was made up of four features (Supplementary Figure S1a). One hydrogen-bond acceptor and two hydrogen-bond donor features (radius 0.6 Å) were defined to interact with the hinge region. To consider the directionality of the hydrogen bonds, the features were linked with their hydrogen-bond binding partners in the protein. The aromatic feature (radius 1.5 Å) was added to take into account that the vast majority of protein kinase inhibitors contain an aromatic ring system that is placed within the adenine binding region. 14 To fulfill the pharmacophore, all hits were required to contain the aromatic feature and two out of the possible three hydrogen-bonding features.
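The partial-match rule stated above can be expressed as a short Python sketch. It captures only the counting logic (aromatic feature required, at least two of the three hydrogen-bonding features matched); the geometric matching itself, which UNITY performs, is not reproduced, and the function name and arguments are hypothetical.

```python
from typing import Sequence


def fulfils_pharmacophore(aromatic_matched: bool,
                          hbond_matched: Sequence[bool]) -> bool:
    """Return True if a fragment satisfies the partial-match rule.

    hbond_matched holds one flag per hinge hydrogen-bonding feature
    (one acceptor and two donors in the model described in the text).
    """
    if len(hbond_matched) != 3:
        raise ValueError("expected flags for exactly three hydrogen-bond features")
    return aromatic_matched and sum(bool(m) for m in hbond_matched) >= 2
```

For example, a fragment matching the aromatic feature, the acceptor, and one donor passes, whereas a fragment matching all three hydrogen-bonding features but lacking the aromatic ring does not.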
Receptor Preparation. Crystal structures of protein kinases that were available in the MRC Protein Phosphorylation Unit, University of Dundee, for profiling at the beginning of the study were downloaded from the PDB (Supplementary Table S1). All crystal structures representing these kinases exhibit the DFG-in conformation. 6 The structures were aligned with CDK2 (PDB code 2w05) using the hinge region as matching atoms. Hydrogen atoms of polar groups were added using MOLOC (Gerber Molecular Design), and their positions were minimized with the MAB force field as implemented in MOLOC. 41 Subsequently, all nonprotein atoms were removed from the crystal structures. Compounds were placed into the binding site and scored using DOCK 3.5.54. 42,43 The required sphere set to define the region with a low dielectric constant was composed of the bound ligands' atoms. If necessary, the set was manually modified to cover the entire buried part of the binding site. The sphere set used as matching points for docking was manually generated by placing matching atoms into the adenine binding region close to the hinge region. The same set was used for all protein kinases. To favor interactions between ligand and hinge region, partial charges of the carbonyl oxygen atoms in the amide backbone of the hinge region were decreased by 0.4, and the partial charge of the amide hydrogen atom was increased by 0.4 (Figure S1a). The total charge was then balanced by distributing the charge difference among the remaining hinge region atoms. Grids to store information about excluded volumes, van der Waals potential, and ligand desolvation were calculated as described previously. 44,45
Ligand Preparation. Tautomers, stereoisomers, and protonation states were enumerated and converted to a suitable database format as described previously. 44
Docking Protocol. Multiple conformations and orientations of each ligand were docked into the kinase binding sites using DOCK 3.5.54. 42,43 Ligand and receptor overlap bins were set to 0.4 Å. Distance tolerance for matching ligand atoms to receptor was set to 1.2 Å. All fragments were docked into the ATP binding pocket, and each docking orientation was filtered for steric fit. Only fragments that were able to pass the steric fit filter were scored for electrostatic, van der Waals, and desolvation energies. The best scoring conformation and representative (tautomer, protonation state) of each compound was used for final ranking.
To compare fragments between different kinases, the originally calculated energy scores were normalized by the formula $S_{ij}^{\mathrm{norm}} = S_{ij} / S_{j}^{\mathrm{best}}$, where $S_{ij}^{\mathrm{norm}}$ is the normalized score for fragment $i$ in the active site of kinase $j$, $S_{ij}$ is the original score, and $S_{j}^{\mathrm{best}}$ is the best score for any ligand in the active site of kinase $j$. All negative $S_{ij}^{\mathrm{norm}}$ scores were set to zero. Thus, $S_{ij}^{\mathrm{norm}} = 1$ if the fragment scores best of all docked fragments for a particular kinase, and $S_{ij}^{\mathrm{norm}} = 0$ if it scores worst.
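The normalization can be illustrated with a minimal Python sketch. It assumes that the raw scores are energies, so the best score for a kinase is the most negative one, and it aggregates the normalized scores into a consensus value by taking the mean across kinases. The mean is an assumption of this sketch, since the aggregation used for the consensus ranking is not specified here, and all names and numbers are hypothetical.

```python
from typing import Dict

Scores = Dict[str, Dict[str, float]]  # kinase -> fragment -> score


def normalize_scores(raw: Scores) -> Scores:
    """Divide each score by the best (most negative) score for that kinase.

    Negative normalized values (unfavorable raw scores) are set to zero.
    """
    normalized: Scores = {}
    for kinase, fragment_scores in raw.items():
        best = min(fragment_scores.values())
        if best >= 0:
            raise ValueError(f"no favorable score for kinase {kinase}")
        normalized[kinase] = {
            frag: max(0.0, score / best) for frag, score in fragment_scores.items()
        }
    return normalized


def consensus_scores(normalized: Scores) -> Dict[str, float]:
    """Mean normalized score per fragment across kinases (assumed aggregation)."""
    totals: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for fragment_scores in normalized.values():
        for frag, value in fragment_scores.items():
            totals[frag] = totals.get(frag, 0.0) + value
            counts[frag] = counts.get(frag, 0) + 1
    return {frag: totals[frag] / counts[frag] for frag in totals}


# Hypothetical raw docking scores for three fragments in two kinases.
raw_scores = {
    "K1": {"f1": -40.0, "f2": -20.0, "f3": 5.0},
    "K2": {"f1": -10.0, "f2": -30.0, "f3": -15.0},
}
normalized = normalize_scores(raw_scores)
consensus = consensus_scores(normalized)
```

In this toy example f1 is the top scorer in K1 but weak in K2, so the fragment that scores reasonably in both kinases (f2) ends up with the highest consensus value, which is the behavior wanted when searching for nonselective hinge binders.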
Library Enumeration. The reagents were selected through a web interface based on Pipeline Pilot (Accelrys Software Inc.). Only in-house reagents that were commercially available were used for the enumeration process. Enumeration of libraries was performed using the "core plus R-groups" method from Pipeline Pilot, which was again presented through a customized web interface. All reaction products were filtered to remove molecules with unwanted functionalities, >5 hydrogen-bond donors, >10 hydrogen-bond acceptors, molecular weight >500 Da, clogP > 5, and polar surface area >140 Å². Finally, all remaining reaction products were docked into the binding site of three kinases (CSK, RSK1, and GSK3β) to exclude those compounds which did not sterically fit into a typical ATP binding site.
Chemistry. All synthesized compounds had a purity of greater than 90% (measured on an analytical HPLC-MS system). The synthetic details for the library compounds together with M⁺ and ¹H NMR data to confirm compound identity and purity are listed in the Supporting Information.
Kinase Assay. Selected compounds were screened against a panel of mammalian kinases routinely run in the MRC Protein Phosphorylation Unit at the University of Dundee (www.kinasescreen.mrc.ac.uk). 21 Compounds were supplied in DMSO and screened in duplicate at 100 μM concentration using a radioactive (³³P-ATP) filter-binding assay. For hit validation and all subsequent IC50 determinations, selected compounds were solubilized in DMSO at a top concentration of 51 mM and serially diluted to achieve a 10-point titration of final assay concentrations from 1 mM to 30 nM. All IC50 determination assays were done in duplicate, and the IC50 values were calculated using Prism (GraphPad Software, Inc.). All biochemical assays were run below the $K_{m}^{\mathrm{app}}$ for ATP for each enzyme, allowing comparison of inhibition across the panel.
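The stated titration range can be reproduced approximately with a short Python sketch. A 10-point series from 1 mM down to about 30 nM is consistent with half-log (√10-fold) dilution steps, but the step factor is an inference made for illustration and is not stated in the text; the function name is hypothetical.

```python
from typing import List


def dilution_series(top_molar: float, points: int, step_factor: float) -> List[float]:
    """Final assay concentrations for a serial dilution, highest first."""
    if points < 1 or step_factor <= 1.0:
        raise ValueError("need at least one point and a step factor > 1")
    return [top_molar / step_factor ** i for i in range(points)]


HALF_LOG = 10 ** 0.5  # assumed dilution factor per step (~3.16-fold)
series = dilution_series(1e-3, points=10, step_factor=HALF_LOG)
lowest_nanomolar = series[-1] * 1e9
```

Under this assumption the lowest concentration comes out slightly above 30 nM, in line with the rounded range quoted above.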
Crystallization and Structure Determination of cSrc-B1. Fragment B1 was co-crystallized with cSrc using conditions similar to those previously reported by Michalczyk et al. 46 Diffraction data of the cSrc-B1 complex crystals were collected at the PX10SA beamline of the Swiss Light Source (PSI, Villigen, Switzerland) to a resolution of 2.5 Å, using wavelengths close to 1 Å. The data set was processed and refined as described previously. 46 Detailed data, refinement, and Ramachandran statistics are provided in Supplementary Table S5.

■ ASSOCIATED CONTENT

Supporting Information
Additional tables with percent inhibition data for the discussed compounds, hit rates in various kinase screens, and data collection and refinement details for the crystal structure of cSrc-B1; a figure of the pharmacophore used for virtual screening; the synthetic routes to the described compounds; NMR spectra of key compounds. This material is available free of charge via the Internet at http://pubs.acs.org.

Accession Codes
The crystal structure of cSrc in complex with B1 has been deposited into the PDB with the code 4fic.