Tyrosine kinase drug discovery: what can be learned from solved crystal structures?

Understanding molecular recognition during ligand binding is crucial for rational drug design. Protein kinases are a well-known pharmacological target, especially in the therapeutic area of cancer. Solved crystal structures of tyrosine kinase domains in complex with ligands provide insight into the binding event. These experimental data were collected and analysed by means of molecular modelling techniques, and common molecular recognition patterns were identified and studied across this class of proteins. The results of this analysis and their consequences for the rational design of inhibitors are presented.


Introduction
Protein kinases are enzymes that transfer the γ-phosphate group of ATP to a peptidic substrate. These proteins are divided into two subfamilies based on the residue targeted by phosphorylation: serine/threonine kinases and tyrosine kinases transfer phosphates to the hydroxyl groups of serine/threonine and tyrosine residues, respectively.
Tyrosine kinases are classified as receptor and non-receptor proteins according to their function, their localization in the cell and their sequence similarity. [1] Both are implicated in cell signalling: they amplify, translate and integrate signals from inside and outside the cell. [2] The epidermal growth factor receptor (EGFR), platelet-derived growth factor receptor (PDGFR), fibroblast growth factor receptor (FGFR), vascular endothelial growth factor receptor (VEGFR) and insulin receptor (InsR) are examples that illustrate the role of these proteins in cell growth, differentiation and development. Deregulation of their catalytic activity is correlated with oncogenic and inflammatory processes, as well as with diabetes and other diseases. [3]
Several types of domains make up a functional protein tyrosine kinase. Extracellular and transmembrane domains are present only in receptors, while SH2, SH3 and other domains, which up- and down-regulate the kinase catalytic activity under physiological conditions and interact with several cell components, can occur in both classes. The conserved part of the members of the protein kinase family is the catalytic domain, the so-called kinase domain, whose overall three-dimensional structure is depicted in figure 1.
The kinase domain is characterised by structural plasticity, as shown by crystallographic studies that captured it in several conformations. [4] The catalytic domain undergoes a conformational change to become enzymatically productive. [5] In line with its catalytic function, the kinase domain has two recognition sites: one for ATP (the phosphoryl donor), located in a cleft between the N- and C-terminal lobes, and one for the peptide substrate (the phosphoryl acceptor), located within the C-terminal lobe near the C-terminal end of the activation loop (figure 1).
Tyrosine kinase inhibitors that interfere with ATP binding are of interest for anticancer treatment. [7][8] In recent years, a wealth of biochemical and structural data on the binding of these molecules has been produced. Many ligand-kinase domain complexes have been crystallised and their three-dimensional structures solved by X-ray crystallography. The coordinates of these complexes are publicly available at the Protein Data Bank (PDB, http://www.rcsb.org/pdb). In this manuscript we report the results of a comparison of these structures, combined with multiple sequence alignments of tyrosine kinase domains. Common characteristics of the complexes are described and the minimal requirements for binding are deduced.

Results and Discussion
The kinase domain
The overall three-dimensional structure of the kinase domain, depicted in figure 1, is conserved throughout the protein kinase family. The N-terminal lobe is a twisted β-sheet of five antiparallel β-strands and one α-helix, and the C-terminal lobe is composed of eight α-helices and four β-strands. The two lobes are connected by a segment called the hinge region, and together they delimit the cleft that forms the ATP-binding site. Other important parts of this domain are the nucleotide-binding loop, the αC helix, the catalytic loop and the activation loop (see figure 1).
For ease of understanding, residue numbering and representations refer to the sequence and crystal structures of the insulin receptor (PDB codes: 1IR3 [5] and 1IRK [9]).

Variety of ligands
The PDB was screened and the Cartesian coordinates of 44 structures of the catalytic domain of several tyrosine kinases in complex with ligands were retrieved (the PDB entries are listed in the supplementary information section). These ligands have IC50 values ranging from nM to µM. [10][11][12][13][14][15][16][17][18][19][20][21] A first analysis revealed 15 structures of kinase domains in complex with ATP-like molecules. Interestingly, two of these are kinase domains in complex with an ATP analogue chemically linked to a peptide, which behaves as a bisubstrate ligand.
The remaining 29 structures are kinase domains in complex with various types of inhibitors belonging to several chemical families (figure 2). The pyrimidine ring of adenine, which characterises the ATP-like ligands, is also the basic moiety of several groups of inhibitors: aminopyrimidine, pyrazolopyrimidine, pyridopyrimidine and quinazoline derivatives.
Dihydroindolone and one of its isomers characterise two further classes of compounds. Staurosporine, a well-known ATP-competitive inhibitor with a broad range of action among protein kinases [17], and its analogues are dihydroisoindolone-based molecules.
Recently, tyrosine kinase domains have been crystallised in complex with two aminoxazole derivatives.Finally, the last two inhibitors taken into consideration are a flavonol (quercetin) and an azepinone (debromohymenialdisine).
To analyse the protein-ligand interactions, the complexes were inspected by computer-aided visualization. The ligands interact with the kinase domain via hydrogen bonds to the hinge region and via hydrophobic interactions of varying extent with the region extending from the hinge towards the nucleotide-binding loop and/or the αC helix.
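The hydrogen bond to the hinge region can also be detected geometrically. The sketch below is a minimal, distance-only illustration and not the authors' procedure, which relied on visual inspection. It assumes a 3.5 Å donor-acceptor cutoff, considers only N and O atoms, ignores hydrogen-bond angles and donor/acceptor typing, and uses a hypothetical `Atom` record with invented coordinates.

```python
import math
from typing import NamedTuple


class Atom(NamedTuple):
    residue: str   # e.g. "MET1079"; "LIG" for ligand atoms (hypothetical labels)
    name: str      # PDB-style atom name, e.g. "N", "O", "N1"
    element: str   # chemical element, e.g. "N", "O", "C"
    xyz: tuple     # Cartesian coordinates in Å


BACKBONE_POLAR = {"N", "O"}  # main-chain amide N and carbonyl O


def hinge_hbonds(ligand, protein, hinge_residues, cutoff=3.5):
    """List candidate H-bonds between polar ligand atoms (N, O) and
    main-chain N/O atoms of hinge residues, using a distance-only test."""
    pairs = []
    for lig in ligand:
        if lig.element not in ("N", "O"):
            continue  # only N/O ligand atoms are treated as potential partners
        for prot in protein:
            if prot.residue in hinge_residues and prot.name in BACKBONE_POLAR:
                d = math.dist(lig.xyz, prot.xyz)
                if d <= cutoff:
                    pairs.append((lig.name, f"{prot.residue}:{prot.name}", round(d, 2)))
    return pairs


# Invented coordinates, for illustration only
ligand = [Atom("LIG", "N1", "N", (0.0, 0.0, 0.0)),
          Atom("LIG", "C2", "C", (1.3, 0.0, 0.0))]
protein = [Atom("MET1079", "N", "N", (2.9, 0.0, 0.0)),
           Atom("MET1079", "O", "O", (5.0, 0.0, 0.0)),
           Atom("ALA1028", "CB", "C", (0.0, 3.8, 0.0))]
print(hinge_hbonds(ligand, protein, {"MET1079"}))
# -> [('N1', 'MET1079:N', 2.9)]
```

A production analysis would add an angle criterion and explicit hydrogens; the distance test is kept here only to show the idea.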

Common residues in the complexes
For each complex, the residues interacting with the ligand were identified by examining the protein region within 4.5 Å of the ligand. The amino acids found were then compared at the sequence level by means of a multiple sequence alignment of all the structures considered.
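The 4.5 Å selection can be expressed as a simple neighbour search. This is a sketch under stated assumptions: a residue is counted if any of its atoms lies within the cutoff of any ligand atom, and both the `(residue_id, (x, y, z))` input format and the `contact_residues` helper are hypothetical. Parsing real PDB files is left out.

```python
import math


def contact_residues(ligand_xyz, protein_atoms, cutoff=4.5):
    """Return the sorted residue IDs having at least one atom within
    `cutoff` Å of any ligand atom.

    ligand_xyz:    list of (x, y, z) ligand atom coordinates
    protein_atoms: iterable of (residue_id, (x, y, z)) pairs, one per atom
    """
    hits = set()
    for residue_id, xyz in protein_atoms:
        if residue_id in hits:
            continue  # residue already counted through another atom
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_xyz):
            hits.add(residue_id)
    return sorted(hits)


# Invented coordinates, for illustration only
ligand_xyz = [(0.0, 0.0, 0.0)]
protein_atoms = [("LEU1002", (3.9, 0.0, 0.0)),
                 ("VAL1010", (0.0, 4.4, 0.0)),
                 ("GLY1003", (6.0, 0.0, 0.0)),
                 ("LEU1002", (10.0, 0.0, 0.0))]
print(contact_residues(ligand_xyz, protein_atoms))
# -> ['LEU1002', 'VAL1010']
```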
Comparing the contact residues across the alignment highlighted the residues involved in conserved interactions between the ligands and the kinase domains. In all complexes, the common ligand-protein interaction pattern consists of hydrophobic interactions involving four residues and a single hydrogen bond.
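One way to find the residues common to all complexes is to map each complex's contact residues onto alignment columns and intersect the resulting sets. The sketch below assumes consecutive residue numbering with no missing residues or insertion codes (real structures usually need the SEQRES and ATOM records reconciled first); the helper names and toy sequences are hypothetical.

```python
def residue_to_column(aligned_seq, first_resnum):
    """Map residue numbers of one gapped alignment row to column indices,
    assuming residues are numbered consecutively from `first_resnum`."""
    mapping = {}
    resnum = first_resnum
    for column, aa in enumerate(aligned_seq):
        if aa != "-":  # gaps consume a column but no residue number
            mapping[resnum] = column
            resnum += 1
    return mapping


def conserved_contact_columns(complexes):
    """Alignment columns that contact the ligand in every complex.

    complexes: list of (aligned_seq, first_resnum, contact_resnums) tuples
    """
    column_sets = []
    for aligned_seq, first_resnum, contacts in complexes:
        mapping = residue_to_column(aligned_seq, first_resnum)
        column_sets.append({mapping[r] for r in contacts if r in mapping})
    if not column_sets:
        return []
    return sorted(set.intersection(*column_sets))


# Toy alignment rows and contact lists, invented for illustration
complexes = [("AG-LV", 10, [10, 12, 13]),
             ("AGKL-", 20, [20, 23])]
print(conserved_contact_columns(complexes))
# -> [0, 3]
```

The surviving columns can then be translated back to insulin receptor numbering through the insulin receptor row of the alignment.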
In the insulin receptor, the residues forming this common pattern are Ala1028 (from the core of the N-terminal lobe), Met1139 (from the C-terminal lobe), Leu1002 and Val1010 of the nucleotide-binding loop, and Met1079 of the hinge region, whose main chain provides the hydrogen bond (figure 3).

Defining and determining the ligand core
Superposition and visualisation of the complexes showed where in the protein the conserved residues interacting with the ligand are located. The common interactions occur at the same binding region in all structures, a region called the adenine region in a previous work. [23] The moiety of the ligand interacting with this region was examined and defined as the "ligand core". The results of this analysis are summarised in figure 4.
Ligand cores could generally be clustered by ligand family, with each family having a single ligand core. For STI-571, however, two ligand cores were defined because of the two different binding modes found in four complexes, i.e. one involving the pyridine and the other the pyrimidine moiety [namely STI-571 bound to Abl (PDB codes: 1IEP [15] and 1OPJ [13]), Kit (1T46 [19]) and Syk (1XBB [20])].
Figure 4. Ligand cores. In the drawn structures, R denotes the rest of each molecule as shown in figure 2. Arrows indicate, for each ligand core, the atom involved in the hydrogen bond to the hinge region.

Amino acid variability around the ligand core
For the four residues involved in hydrophobic interactions with the ligand cores, a consensus on the amino acid type was expected. Indeed, the hydrophobic character of these residues was confirmed by a multiple sequence alignment (data not shown) of the catalytic domains of 90 tyrosine kinases. [1,24] The amino acid variability at the four positions was calculated from this multiple sequence alignment and is reported in table 1.
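Per-position variability can be computed by counting residue types in an alignment column. A minimal sketch, assuming gaps (`-`) are excluded from the percentages; the four-sequence alignment and the `column_composition` helper are invented and stand in for the 90-kinase alignment used in the study.

```python
from collections import Counter


def column_composition(alignment, column):
    """Percentage of each amino acid at one alignment column, gaps excluded.

    alignment: list of equal-length gapped sequences (one per kinase domain)
    """
    residues = [seq[column] for seq in alignment if seq[column] != "-"]
    if not residues:
        return {}  # column contains only gaps
    counts = Counter(residues)
    return {aa: round(100 * n / len(residues), 1) for aa, n in counts.most_common()}


# Toy four-sequence alignment, invented for illustration
alignment = ["LAVM", "LAIM", "VAVL", "LGVM"]
print(column_composition(alignment, 0))  # -> {'L': 75.0, 'V': 25.0}
```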
Table 1 shows that at each position one amino acid type is clearly dominant. The alternative residues always have hydrophobic side chains, so the variability concerns only their bulkiness. The amino acid variability values reported previously for protein kinases [23] differ slightly from the data presented here. This is most likely due to the different data sets: here only protein tyrosine kinases, a subfamily of the protein kinases, are considered. Nevertheless, the relative preference of amino acids shown here corresponds to that of the previous report. [23]

Active/inactive: residue movements
The superposition of the apo and the ATP-bound crystal structures of the kinase domain of the insulin receptor revealed that a conformational change occurs upon ligand binding (see figure 5).
When the ligand is bound, the N-terminal lobe moves towards the C-terminal lobe, narrowing the cleft of the ATP-binding site.
To estimate the closure of the domain around the ligand core, distances between amino acids in the N-terminal and C-terminal lobes were measured (table 2). The selected residues are those involved in the common interactions: distances were measured from the Cα of Met1139 (C-terminal lobe) to the Cα atoms of Leu1002, Val1010 and Ala1028 (N-terminal lobe).
The distance variations shown in table 2 (range 0.8-2.2 Å) indicate a conformational change between the productive and the unproductive structures, whereas the variation of distances measured in other regions of these two complexes is much smaller (0.1-0.3 Å).
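The lobe-closure measurement reduces to comparing Cα-Cα distances between the apo and ligand-bound structures; because these are intra-structure distances, they do not depend on how the two structures are superposed. A sketch with invented coordinates (they are not the values of table 2); the dictionary input, mapping a residue ID to its Cα coordinates, and the helper names are assumptions.

```python
import math


def ca_distances(ca, anchor, partners):
    """Cα-Cα distances (Å) from `anchor` to each partner residue."""
    return {p: math.dist(ca[anchor], ca[p]) for p in partners}


def lobe_closure(ca_apo, ca_bound, anchor="MET1139",
                 partners=("LEU1002", "VAL1010", "ALA1028")):
    """Distance change (apo minus bound) for each partner residue; a positive
    value means the two residues are closer in the ligand-bound structure."""
    d_apo = ca_distances(ca_apo, anchor, partners)
    d_bound = ca_distances(ca_bound, anchor, partners)
    return {p: round(d_apo[p] - d_bound[p], 2) for p in partners}


# Invented Cα coordinates, for illustration only
ca_apo = {"MET1139": (0.0, 0.0, 0.0), "LEU1002": (0.0, 0.0, 15.0),
          "VAL1010": (0.0, 12.0, 0.0), "ALA1028": (10.0, 0.0, 0.0)}
ca_bound = {"MET1139": (0.0, 0.0, 0.0), "LEU1002": (0.0, 0.0, 14.0),
            "VAL1010": (0.0, 11.5, 0.0), "ALA1028": (9.0, 0.0, 0.0)}
print(lobe_closure(ca_apo, ca_bound))
# -> {'LEU1002': 1.0, 'VAL1010': 0.5, 'ALA1028': 1.0}
```

Calling `lobe_closure` with a different anchor and partner set gives the kind of control distances, taken in other regions of the domain, that the text compares against.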

Conclusions
In the present study, by comparing the structures of tyrosine kinase domains in complex with ligands, we could show that binders of this protein family share a common feature, which we called the "ligand core". Consequently, the interactions involving this part of the molecule can be regarded as the minimal requirements for binding: a hydrogen bond to a main-chain atom of the hinge region and apolar interactions with the side chains of four residues of conserved hydrophobic character.
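The two minimal requirements can be written as a check on a single ligand pose. This is a hypothetical sketch, not a published tool. It assumes a 3.5 Å hydrogen-bond distance cutoff, a 4.5 Å apolar contact cutoff (borrowed from the contact selection used in this study), insulin receptor numbering with Met1079 as the hinge residue, and tuple-based inputs; the function name and the coordinates are invented.

```python
import math

# Insulin receptor numbering, as used in the text
HYDROPHOBIC = ("LEU1002", "VAL1010", "ALA1028", "MET1139")
BACKBONE = {"N", "CA", "C", "O"}


def check_minimal_requirements(ligand, protein, hinge=("MET1079",),
                               hbond_cutoff=3.5, contact_cutoff=4.5):
    """Test a ligand pose against the two minimal requirements.

    ligand:  list of (element, (x, y, z)) tuples
    protein: list of (residue_id, atom_name, element, (x, y, z)) tuples
    Returns (has_hinge_hbond, sorted list of hydrophobic residues missed).
    """
    polar = [xyz for el, xyz in ligand if el in ("N", "O")]
    apolar = [xyz for el, xyz in ligand if el == "C"]
    # Requirement 1: a polar ligand atom near a hinge main-chain N or O
    has_hbond = any(
        res in hinge and name in ("N", "O") and math.dist(xyz, p) <= hbond_cutoff
        for res, name, el, xyz in protein for p in polar)
    # Requirement 2: a side-chain C/S atom of each hydrophobic residue
    # lies close to at least one ligand carbon
    touched = {
        res for res, name, el, xyz in protein
        if res in HYDROPHOBIC and name not in BACKBONE and el in ("C", "S")
        and any(math.dist(xyz, c) <= contact_cutoff for c in apolar)}
    missing = sorted(set(HYDROPHOBIC) - touched)
    return has_hbond, missing


# Invented coordinates, for illustration only
ligand = [("N", (0.0, 0.0, 0.0)), ("C", (0.0, 0.0, 1.4))]
protein = [("MET1079", "N", "N", (2.9, 0.0, 0.0)),
           ("LEU1002", "CD1", "C", (0.0, 4.0, 1.4)),
           ("VAL1010", "CG1", "C", (0.0, 0.0, 5.6)),
           ("ALA1028", "CB", "C", (0.0, -3.9, 1.4)),
           ("MET1139", "SD", "S", (-4.0, 0.0, 1.4))]
print(check_minimal_requirements(ligand, protein))  # -> (True, [])
```

A check of this kind could be applied to poses produced by docking, one of the uses of the ligand cores proposed in these conclusions.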
This minimal pharmacophore is less complex than that of ATP, which binds the hinge region via two hydrogen bonds. The closure of the domain upon ligand binding, calculated for the insulin receptor structures, suggests that the ligand is likely to stabilise the protein conformation, with the ligand core playing the major role. Other tyrosine kinase domains have also been crystallised in both apo and ligand-bound forms, and their conformational changes were checked by measuring the same distances. These distance variations are smaller and less pronounced than those reported for the insulin receptor. This can be attributed to several experimental factors (size of the ligand, presence of other domains interfering with the catalytic one, phosphorylation of the activation loop, crystallisation conditions, etc.) that may lead to differences in the degree of closure of the two lobes, in agreement with the general notion of protein "breathing". [25]
In 2004 a similar study was published by Vulpetti and Bosotti. [23] The approach, considerations and point of view of their study and ours are slightly different, leading to different and complementary results.
While we focused on defining the ligand core, those authors identified 38 residues of the catalytic domain of protein kinases that make up the ATP pocket, and studied the amino acid variance within the protein family with a focus on achieving ligand-binding selectivity. This work, and the list of ligand cores, will grow as more ligand-tyrosine kinase structures are solved. Knowledge of the ligand cores may be useful in drug design when hypothetical ligand-binding modes are proposed, for example when validating the results of docking methods.
The supplementary table collects the PDB codes and other information for all tyrosine kinase complexes considered in the present study.

Figure 1. The three-dimensional structure of the kinase domain. The tertiary structure of the kinase domain of the insulin receptor in complex with adenylyl imidodiphosphate (AMP-PNP) and a peptide substrate (ball-and-stick and stick representations, respectively; PDB code: 1IR3) is depicted. The main regions are coloured: nucleotide-binding loop (yellow), αC helix (red), hinge region (green), activation loop (violet) and catalytic loop (cyan).

Figure 2. Families of compounds. 2D molecular structures of the ligands found in complex with the kinase domain. The names of compounds and families are reported. Annotations of the ligand cores (referring to figure 4) are given in square brackets.

Figure 3. Common interaction pattern in the ligand-kinase domain complexes. The secondary structure of the kinase domain of the insulin receptor (1IR3) is represented as a tube (magenta), the ATP analogue in ball-and-stick representation and the common residues as colour-coded sticks (carbon in cyan for the ligand and magenta for the protein, oxygen in red, nitrogen in blue, sulphur in yellow and phosphorus in orange; the hydrogen bond is shown as a black dashed line).

Figure 5. Conformational changes around the core. The superposition of the active and inactive conformations of the insulin receptor is depicted. The protein main chain (tube representation) and the common residues (stick representation) are coloured magenta (1IR3) and green (1IRK). The adenine of AMP-PNP is also drawn as sticks (nitrogen atoms in blue, carbon atoms in light grey); its pyrimidine moiety represents the ligand core.

Table 1. Amino acid variability of the hydrophobic pattern interacting with the ligand core. Distribution (in percent) of amino acid types at each position (numbering refers to the insulin receptor).

Table 2. Distances between residues of the N-terminal lobe and Met1139. Distances (in Å) from the Cα of Met1139 to the Cα atoms of the N-terminal lobe residues are reported.