Evolutionary Divergence in the Conformational Landscapes of Tyrosine vs 1 Serine/Threonine Kinases 2

13 Inactive conformations of protein kinase catalytic domains where the DFG motif has a “DFG- 14 out” orientation and the activation loop is folded present a druggable binding pocket that is 15 targeted by FDA-approved “type-II inhibitors” in the treatment of cancers. Tyrosine Kinases 16 (TKs) typically show strong binding affinity with a wide spectrum of type-II inhibitors while 17 Serine/Threonine Kinases (STKs) usually bind more weakly which we suggest here is due to 18 differences in the folded to extended conformational equilibrium of the activation loop between 19 TKs vs. STKs. To investigate this, we use sequence covariation analysis with a Potts 20 Hamiltonian statistical energy model to guide absolute binding free-energy molecular dynamics 21 simulations of 94 protein-ligand complexes. Using the calculated binding free energies together 22 with experimental values, we estimated free-energy costs for the large-scale (~17-20Å) 23 conformational change of the activation loop by an indirect approach, circumventing the very 24 challenging problem of simulating the conformational change directly. We also used the Potts 25 statistical potential to thread large sequence ensembles over active and inactive kinase states. The structure-based and sequence-based analyses are consistent; together they suggest TKs evolved to have free-energy penalties for the classical “folded activation loop” DFG-out conformation 2 relative to the active conformation that is, on average, 4-6 kcal/mol smaller than the 3 corresponding values for STKs. Potts statistical energy analysis suggests a molecular basis for 4 this observation, wherein the activation loops of TKs are more weakly “anchored” against the 5 catalytic loop motif in the active conformation, and form more stable substrate-mimicking 6 interactions in the inactive conformation.


3
STKs are an ancient class of protein kinases that predate the divergence of the three domains of 1 4 life (bacteria, archaea, eukaryote) 3 , whereas TKs are a more recent evolutionary innovation, 1 5 having diverged from STKs about 600 million years ago during early metazoan evolution 4,5 .

6
Kinases are important therapeutic targets in a large number of human pathologies and cancers.

7
Both TKs and STKs share a striking degree of structural similarity in their catalytic domains, 1 8 owing to evolutionary selective pressures that preserve their catalytic function; in particular, the 1 9 location and structure of the ATP binding site is highly conserved which raises significant 2 0 challenges for the design of small-molecule ATP-competitive inhibitors that are both potent for 2 1 their intended target(s) and have low off-target activity for unintended kinase targets. The latter 2 2 is referred to as the "selectivity" of an inhibitor, a property which is difficult to predict and 1 control but is nonetheless very important for developing drugs with minimal harmful side effects.

2
A particular class of ATP-competitive kinase inhibitors which were proposed to have a high 3 potential for selectivity are called "type-II inhibitors" which only bind when the kinase adopts an 4 inactive "DFG-out" conformation. "DFG" is a conserved catalytic motif located at the N-5 terminus of a ~20 residue activation loop that is highly flexible and controls the activation state 6 of the kinase and the structure of the substrate binding surface. In both TKs and STKs, this 7 "activation loop" undergoes a large-scale (~17Å) conformational change when the DFG motif 8 flips from the active "DFG-in" conformation where the activation loop is "extended", to the 9 classical DFG-out conformation where the activation loop is "folded". The flip of the DFG motif 1 0 from "in" to "out" opens a back pocket which is connected to the conserved ATP binding site 1 1 through a "gatekeeper" residue. Type-II inhibitors typically have a long chemical fragment that 1 2 allows them to bind across the gatekeeper and form interactions with residues in the "back 1 3 pocket" which is formed due to the DFG flip. In contrast, type-I inhibitors (the majority of kinase 1 4 drugs) occupy the ATP pocket but not the back pocket and can bind to either DFG-in or DFG-1 5 out. For these reasons, it has been proposed that type-II inhibition holds greater potential for the 1 6 design of highly selective drugs [6][7][8] ; it has been shown that different kinase sequences have 1 7 different propensities to adopt DFG-out in the absence of inhibitor 9,10 , and the DFG-out back 1 8 pocket has been suggested to have a lesser degree of sequence and structural homology between 1 9 kinases 11 . However, the notion that type-II inhibitors developed to-date are more selective than 2 0 type-I inhibitors has been brought into question 12,13 , suggesting that further consideration of the 2 1 energetic contributions described above is required.

2
In order to fully exploit the target-selective potential of type-II inhibitors it is necessary to 1 understand the underlying sequence-dependent principles that control the conformational 2 preferences of their kinase targets, and the extent to which this has been diversified by evolution. 3 This can, in principle, be directly approached using free-energy simulations to estimate the 4 reorganization free-energy required for different kinases to adopt DFG-out, although this is 5 computationally very expensive and of uncertain reliability for conformational changes involving 6 large-scale loop reorganizations, such as the ~17 Å "folding" of the activation loop that 7 accompanies the transition from active DFG-in to the inactive, classical DFG-out state. To 8 accommodate this limitation, we employ modern sequence-based computational methods to 9 characterize the conformational selection process over the entire kinome, and combine the 1 0 sequence-based results with structure-based free energy simulations with the goal of identifying 1 1 evolutionarily divergent features of the energy landscape that control the preference of individual 1 2 kinases for the active (DFG-in) vs inactive (DFG-out "folded activation loop") states. To this 1 3 end, we report evidence that TK catalytic domains have a molecular evolutionary bias that shifts 1 4 their conformational equilibrium towards the inactive "folded activation loop" DFG-out state in 1 5 the absence of activation signals. In contrast, STKs as a class have a more stable active 1 6 conformation which is favored over the DFG-out state due to sequence constraints in the absence 1 7 of other signals.

8
As described below, our analysis of a previously published kinome-wide assay suggests that 1 9 TKs have properties which privilege the binding of type-II inhibitors in comparison to STKs, 2 0 which leads us to hypothesize an evolutionary divergence in their conformational energy 2 1 landscapes. To investigate this, we used a Potts Hamiltonian statistical energy model derived 2 2 from residue-residue covariation in a Multiple Sequence Alignment (MSA) of protein kinase 2 3 sequences to probe their conformational equilibria as previously described 9 . Using an approach 1 that involves "threading" a large number of kinase sequences onto ensembles of DFG-in and 2 DFG-out structures from the PDB and scoring them using the Potts Hamiltonian, we are able to 3 view the evolutionary divergence in TK and STK conformational landscapes. To validate our 4 results, we used the Potts statistical energy threading calculations to guide target selection for a 5 set of more computationally intensive free-energy simulations. These simulations use type-II 6 inhibitors as tools to probe kinase targets that have already reorganized to DFG-out, allowing us 7 to estimate the free-energy of reorganization Δ ‫ܩ‬ as the excess between the absolute binding 8 free-energy calculated from simulations and the standard binding free-energy measured 9 experimentally in vitro, which already includes the cost to reorganize. Although our methods

5
Insights into the sequence-dependent conformational free-energy landscape 1 6 The binding of type-II inhibitors is achieved once a protein kinase has reorganized to the DFG- The results of the molecular dynamics binding free energy simulations when combined with 5 experimental binding affinities, reveal significant differences in the conformational free-energy conformational space (Fig. 4). From this relationship, we can approximate a scale for the Potts is extremely unlikely to be observed by chance (ܲ , can also be written as a sum over pairs of Methods for details). We find that ~75% of the total contribution to Δ Δ ‫ܧ‬ strands are peeled apart (Fig. 6B and Fig. 6C). The Potts model suggests this conformational 5 change is energetically penalized in STKs due to favorable contacts between position  require access to the DFG-out conformation.

9
The "RD-pocket" 50 is a conserved basic pocket formed by the Arg and Asp residues of the active conformation in RD-STKs (Fig. 6B left), appearing in 28% of RD-STKs and only 2% of 2 0 RD-TKs. In RD-TKs, the residue at this position is usually Arg or Lys which appears to be 2 1 repelled from the RD-pocket (Fig. 6C left) and is suggested by the Potts model to result in a less 2 2 stable active conformation.

3
The largest Δ Δ ‫ܧ‬ term, which contributes 16% of the total difference in average Potts 1 conformational penalties between STKs and TKs, comes from an interaction pair in the C- replaced with Ala, with the exception of a few TKs (e.g. c-Src) which have instead adopted Arg terminal anchor, leading to a less stable active conformation in TKs as compared with STKs. in 33% of STKs, but never in TKs (Fig. S4D). The Potts coupling between these residues is 1 highly favorable. In contrast, we observe ‫,161ܮ(‬ ‫ܯ‬ 1 6 6 ) in 40% of TKs but never in STKs (Fig.   2 S4H), which have weaker coupling. The bulky sidechains of ‫,161ܮ(‬ ‫ܯ‬ 1 6 6 ) observed in TKs 3 causes the activation loop to "bulge" in this C-terminal region which has been previously 4 identified as a feature of TKs that helps shape the substrate binding site to accommodate Tyr 5 residues 50 . In addition to this paradigm, our analysis suggests that the C-terminal bulge results in 6 weaker structural constraints on the active conformation relative to STKs. DFG-out conformation with this property, and instead the activation loop is typically found to be 1 7 unresolved and/or projecting outwards towards solvent (Fig. 6B right). The Potts model suggests In this work, we have combined sequence and structure-based approaches to analyze the identified in this study reveals that the conformational landscape has a strong sequence 1 6 dependence with STKs having a ~4 kcal/mol conformational free energy bias favoring the active 1 7 state over the inactive state relative to TKs (Fig. 4). We note that the most potent type-II TKs, despite the substantial additional reorganization penalty that STKs must overcome. This as TKs (10,345 raw sequences, 1,069 effective sequences) and those that satisfy the condition for STKs at 2 0 position 126 and are non-overlapping with the TK condition at position 128 were classified as STKs 2 1 (210,862 raw sequences, 22,893 effective sequences). The effective number of sequences in each class 2 2 were calculated by summing over sequence weights, where each sequence was assigned a weight defined 2 3 as the fraction of the number of sequences in the same class that are within 40% identity. In this way, we 2 4 correct for the effects of phylogenetics in the calculation of sample size as well as other quantities (see 1 below). 2 Human kinome dataset. 497 human kinase catalytic domain sequences were acquired from ref. 2 3 (excluding atypical kinases). These sequences were aligned using a Hidden Markov Model (HMM) that 4 contains 259 columns (L = 259), which was constructed from the same MSA used to derive our Potts 5 model. 447 human kinases remained after filtering-out sequences with 32 or more gaps. 6 Classification of gatekeeper size. The designation of gatekeeper residues as "large" or "small" was 7 based on sidechain van der Waals volumes 68 (Fig. S6), where small gatekeepers have a volume of < 110 8 Å 3 (Gly, Ala, Ser, Pro, Thr, Cys, Val) and large gatekeepers have a volume of > 110 Å 3 (Asn, His, Ile, 9 Leu, Met, Lys, Phe, Glu, Tyr, Gln, Trp, Arg). 1 0 PDB dataset and conformational states. X-ray crystal structures of tyrosine, serine/threonine, and dual-1 1 specificity eukaryotic protein kinases in the PDB were collected from http://rcsb.org on July 30 th , 2020. 1 2 The protein sequences of 7,919 chains were extracted from 6,805 PDB files by parsing the SEQRES 1 3 record and aligned to the MSA used to construct our Potts model, using a Hidden Markov Model (HMM). 1 4 For this work, our classification of the active DFG-in and classical DFG-out conformational states is 1 5 based on ref 24, which we describe in further detail in the Supplementary Information. 1 6 Contact frequency differences. Each of the clustered PDB structures were converted to an adjacency 1 7 matrix of binary contacts (1 for "in-contact", 0 otherwise in Eq. 9, we are 9 identifying position pairs that cause STKs to have higher penalties than TK in our Potts threading 1 0 calculations for the active DFG-in to DFG-out conformational change (Fig. S5 were taken from the available crystal structure. The absence of crystal structure prompted us to model the 1 structure of the ligand into the active site of the kinase by superimposing over the binding pose of the 2 available holo crystal structure. 3 Decoupling of the ligand was achieved by first turning off the coulombic intermolecular 5 interactions followed by Lennard-Jones intermolecular interactions from both the legs. This allows DDM 6 to estimate the free energy, i.e., in the presence of protein (ΔG is the electrostatic energy contribution towards the total absolute binding free energy, 1 3 and Δ Δ G is the non-polar energy contribution. 1 4 In this study, depending on the system's convergence, either 20 or 31 total Rocklin and coworkers for treating charged ligands during DDM simulations were adopted 73 . In this 3 regard, it is well documented that the use of a finite-sized periodic solvent box during DDM simulations 4 can lead to non-negligible electrostatic energy contribution towards the calculated total ABFE. Thus, 5 calculated (ΔG ௗ