Resistor: an algorithm for predicting resistance mutations using Pareto optimization over multistate protein design and mutational signatures

Resistance to pharmacological treatments is a major public health challenge. Here we report Resistor—a novel structure- and sequence-based algorithm for drug design providing prospective prediction of resistance mutations. Resistor computes the Pareto frontier of four resistance-causing criteria: the change in binding affinity (ΔKa) of the (1) drug and (2) endogenous ligand upon a protein’s mutation; (3) the probability a mutation will occur based on empirically derived mutational signatures; and (4) the cardinality of mutations comprising a hotspot. To validate Resistor, we applied it to kinase inhibitors targeting EGFR and BRAF in lung adenocarcinoma and melanoma. Resistor correctly identified eight clinically significant EGFR resistance mutations, including the “gatekeeper” T790M mutation to erlotinib and gefitinib and five known resistance mutations to osimertinib. Furthermore, Resistor predictions are consistent with sensitivity data on BRAF inhibitors from both retrospective and prospective experiments using the KinCon biosensor technology. Resistor is available in the open-source protein design software OSPREY.


Introduction
Acquired resistance to therapeutics is a pressing public health challenge that affects maladies from bacterial and viral infections to cancer [1][2][3][4][5][6].There are several different ways cancer cells acquire resistance to treatments, including drug inactivation, drug efflux, DNA damage repair, cell death inhibition, and escape mutations, among others [2].Accurate, prospective prediction of resistance mutations could allow for design of drugs that are less susceptible to resistance.While it is unlikely that medicinal chemists will be able to address all of the resistance-conferring mechanisms in cancer cells, progress can be made by the incorporation of increasingly accurate models of the above contributing factors to acquired resistance, leading to the development of more durable therapeutics.To that end, several structure-based computational techniques for therapeutic design and resistance prediction have been proposed.
One such technique is based on the substrate-envelope hypothesis.In short, the substrate-envelope hypothesis states that drugs designed to have the same interactions as the endogenous substrate in the active site will be unlikely to lose efficacy because any mutation that ablates binding to the drug would also ablate binding to the endogenous substrate [7].C. Schiffer and B. Tidor's labs developed the substrate-envelope hypothesis for targeting drug-resistant HIV strains [7][8][9][10].Their design technique has been successfully applied to develop compounds with reduced susceptibility to drug-resistant HIV proteases [10].
Another computational technique is to use ensemble-based positive and negative design [11,12].There are two specific ways that point mutations can confer resistance to therapeutics: they can decrease binding affinity to the therapeutic or they can increase binding to the endogenous ligand [11,13].Protein design with the goal of decreasing binding is known as negative design, and increasing binding is known as positive design.As a concrete example, consider the case of a drug that inhibits the tyrosine kinase activity of the epidermal growth factor receptor (EGFR) to treat lung adenocarcinoma.Here, an active site mutation could sterically prevent the inhibitor from entering the active site [14].On the other hand, a different mutation might have no effect on an enzyme's interactions with the drug but instead increase affinity to its native ligands, resulting in increased phosphorylization of downstream substrates [15,16].Because these two distinct pathways to therapeutic resistance exist, it is necessary to predict resistance mutations using both positive and negative design.In other words, predicting resistance can be reduced to predicting a ratio of the change in K a upon mutation of the protein:endogenous ligand and protein:drug complexes.
K a is an equilibrium constant measuring the binding and unbinding of a ligand to a receptor.It is defined as: where k on and k off are the on-and off-rate constants, and [RL], [R], and [L] the equilibrium concentrations of, respectively, the receptor-ligand complex, unbound receptor, and unbound ligand.K a is the reciprocal of the disassociation constant K d .K * is an algorithm implemented in the OSPREY computational protein design software that provably approximates K a [17,18].See the supplementary materials for further explanation of the K * algorithm.
Our lab developed a provable, ensemble-based method using positive and negative K * design to computationally predict and experimentally validate resistance mutations in protein targets [11].We then applied this methodology to prospectively predict resistance mutations in dihydrofolate reductase when Staphylococcus aureus was treated with a novel antifolate [13] (also confirmed in vivo [13,19]), demonstrating the utility of correctly predicting escape mutations during the drug discovery process.For cancer therapeutics, Kaserer and Blagg used OSPREY to combine positive and negative K * design with mutational signatures and hotspot identification to both retrospectively and prospectively predict clinically significant resistance mutations [20].Their technique combined sequence, in the form of trinucleotide mutational probabilities, and structures, in the form of positive and negative K * design, to predict resistance mutations.
From these previous works, it is clear that multiple criteria must be combined to decide whether a mu-tation confers resistance.Often it is the human designers themselves who must choose arbitrary weights for different criteria.Yet multi-objective, or Pareto, optimization techniques would allow designers to combine multiple criteria without choosing arbitrary decision thresholds.Pareto optimization for protein design has been employed by Chris Bailey-Kellogg, Karl Griswold, and co-workers [21][22][23][24][25][26].One such example is PEPFR (Protein Engineering Pareto FRontier), which enumerates the entire Pareto frontier for a set of different criteria such as stability vs. diversity, affinity vs. specificity, and activity vs. immunogenicity [27].
Algorithmically, PEPFR combined divide-and-conquer with dynamic or integer programming to achieve an algorithm where the number of divide-and-conquer "divide" steps required for the search over design space is linear only in the number of Pareto optimal designs.To our knowledge, Pareto optimization has yet to be applied to predicting resistance mutations, making RESISTOR the first algorithm to employ Pareto optimization for predicting resistance mutations.Instead of merely finding a single solution optimizing a linear combination of functions, Pareto optimization finds all consistent solutions optimizing multiple objectives such that no solution can be improved for one objective without making another objective worse.Specifically, let Λ be the set of possible solutions to the multi-objective optimization problem, and let λ ∈ Λ.Let F be a set of objective functions and f ∈ F , where f : Λ → R is one objective function.A particular solution λ is said to dominate another solution λ ′ when f (λ ) ≤ f (λ ′ ) for all f ∈ F , and (2) g(λ ) < g(λ ′ ) for at least one g ∈ F . ( A solution λ is Pareto optimal if it is not dominated.RESISTOR combines ensemble-based positive and negative design, cancer-specific mutational signature probabilities, and hotspots to identify not only the Pareto frontier, but also the Pareto ranks of all candidate sequences. The inclusion of mutational signature probabilities in Pareto optimization is possible because distinct mutational processes are operating in different types of cancers [28,29].Specifically, these mutational processes drive the type and frequency of DNA base substitutions.Each different signature is postulated to be associated with a biological process (such as ABOPEC activity [29]) or a causative agent (such as tobacco use), although not all associations are definitively known.What is certain is that particular signatures tend to appear in particular types of cancer.For example, 12 single-base substitution signatures, 2 double-base substitution signatures, and 7 indel signatures were found in a large set of melanoma samples, with many of those signatures associated with ultraviolet light exposure [29].Building on the work of Alexandrov et al. [28], Kaserer and Blagg combined the multiple signatures found in each cancer type to generate overall single-base substitution probabilities [20].RESISTOR uses these probabilities to compute the overall probability that mutation events will occur in a gene independent of changes to protein fitness.This amino acid mutational probability is one of the axes we optimize over.
The most computationally complex part of provable, ensemble-based multistate design entails computing the K * scores of the different design states.This is largely because for biological accuracy it is necessary to use K * with continuous sidechain flexibility [30,31].Though OSPREY has highly-optimized GPU routines for continuous flexibility [18], energy minimization over a combinatorial number of sequences in a continuous space is, in practice, computationally expensive.Having a method to reduce the number of sequences evaluated would greatly decrease the computational cost.COMETS is an empirically sublinear algorithm that provably returns the optimum of an arbitrary combination of multiple sequence states [32].RESISTOR uses COMETS to prune sequences whose predicted binding with the drug improves and binding with the endogenous ligand deteriorates.While COMETS does not compute the full partition function, it provides a useful method to efficiently prune a combinatorial sequence space.This is particularly important when one considers resistant protein targets with more than one resistance mutation.By virtue of pruning using COMETS, RESISTOR inherits the empirical sublinearity characteristics of the COMETS sequence search, rendering RESISTOR, to our knowledge, the first provable structure-based resistance prediction algorithm that is sublinear in the size of the sequence space.
The tyrosine kinase EGFR and serine/threonine-protein kinase BRAF are two oncogenes associated with, respectively, lung adenocarcinoma and melanoma.Both kinases are conformationally flexible, but two conformations are particularly determinative to their kinase activity-the "active" and "inactive" conformations.Oncogenic mutations to EGFR include L858R and deletions in exon 19, both of which constitutively activate EGFR [33,34].Likewise, V600E is the most prevalent constitutively activating mutation in BRAF [35].Numerous drugs have been developed to treat the EGFR L858R and BRAF V600E mutations.The first generation inhibitors erlotinib and gefitinib competitively inhibit ATP binding in EGFR's active site, whereas binding by the third generation osimertinib is irreversible [36][37][38].For BRAF, the therapeutics dabrafenib, vemurafenib, and encorafenib were designed to target the V600E mutation and are in clinical use, and PLX8394 is in clinical trials [39][40][41][42].Use of RESISTOR to predict resistance mutations to these drugs would provide strong validation of the efficacy of this novel approach.
By presenting RESISTOR, this article makes the following contributions: 1.A novel multi-objective optimization algorithm that combines four axes of resistance-causing criteria to rank candidate mutations.2. The use of COMETS as a provable and empirically sublinear pruning algorithm that removes a combinatorial number of candidate sequences before expensive ensemble evaluation.3. A validation of RESISTOR that correctly predicted eight clinically significant resistance mutations in EGFR, providing explanatory ensemble-bound structural models for acquired resistance.4. Prospective predictions with explanatory structural models and experimental validation of resistance mutations for four drugs targeting BRAF mutations in melanoma.5. Newly modelled structures of EGFR and BRAF bound to their endogenous ligands and inhibitors in cases where no experimental structures exist.6.An implementation of RESISTOR in our laboratory's free and open source computational protein design software OSPREY [18].

Methods
The Pareto optimization in RESISTOR optimizes four axes: structure-based positive design, structure-based negative design, sequence-based mutational probabilities, and the count of resistance-causing mutations at a given amino acid location.Briefly, we chose these four criteria because they identify mutations that 1) increase affinity to the endogenous ligand in such a way that it outcompetes the inhibitor; 2) decrease the efficacy of the drug by reducing its binding (leading to the same effect); 3) are predicted to occur based on the DNA sequence and excludes those that are unlikely to arise; and, 4) are located at residue positions where many mutations are predicted to confer resistance, thus identifying a position of relative importance.We believe these criteria to be the minimal requirements a cancer clone must fulfill to confer resistance, and we've had success predicting retrospective and prospective resistance mutations in a previous study using these four criteria [20].It should be mentioned that, as a generalizable method, additional resistance-causing criteria could be trivially added to RESISTOR for further refinement.The Pareto optimization objective function maximizes the ∆K a of the positive design (the protein bound to the endogenous ligand), minimizes the ∆K a of the negative designs (the protein bound to the drug), maximizes the mutational probability, and maximizes the count of resistance-causing mutations per amino acid.Positive design (affinity to the endogenous ligand) is also an indication of clonal fitness, i.e. whether the mutated protein can still provide the function that the cancer cell so critically depends on.If the binding to the endogenous ligand is disrupted, then it's unlikely the clone will survive.Fig. 1 shows an overview how these axes are implemented in our algorithm.
Fig. 1: An example RESISTOR workflow with EGFR.RESISTOR finds the Pareto frontier from OSPREY positive and negative designs, mutational probabilities, and resistance hotspots.(A) Two structures are required as input to OSPREY to compute postive and negative design K * scores.The structure for positive design is EGFR (green) bound to its endogenous ligand ATP (blue), for the negative design EGFR is bound to the drug erlotinib (pink).The goal of positive (resp.negative) design is to improve (resp.ablate) binding affinity.A mutation is resistant when its ratio of positive to negative K * scores increases.(B) All residues within 5 Å (purple) of the drug are allowed to mutate to any other amino acid.(C) COMETS is used as an efficient, sublinear algorithm to quickly prune infeasible mutations.BWM * is used with a fixed branch width to compute a polynomial-time approximation to the K * score.(D) Candidate mutations that pass the COMETS pruning step have their positive and negative K * scores computed in OSPREY.We recommend using the BBK * with MARK * algorithm as it is the fastest for computing K * scores.(E) Candidate resistance mutations are pruned when their ratio of positive to negative K * scores indicates a mutation does not cause resistance or if the target amino acid requires a mutation in all three DNA bases.(F) RESISTOR computes mutational probabilities using a protein's coding DNA along with cancer-specific trinucleotide mutational probabilities for lung adenocarcinoma (abbreviated as LuAd), sliding a window (G) over 5 ′ -and 3 ′ -flanked codons.(H) RESISTOR employs a recursive graph algorithm to compute the probability that a particular amino acid will mutate to another amino acid (I).(J) Finally, RESISTOR uses Pareto optimization on the positive and negative K * scores, the mutational probabilities, and hotspot counts to predict resistance mutants.

Structure-based Positive and Negative Design
We use the K * algorithm in OSPREY to predict an ε-accurate approximation to the binding affinity (K a ) in four states: 1) the wildtype structure bound to the endogenous ligand; 2) the wildtype structure bound to the therapeutic; 3) the mutated structure bound to the endogenous ligand; and 4) the mutated structure bound to the therapeutic.This ε-accurate approximation is called the K * score [17,18].In order to calculate the K * score of a protein:ligand complex, it is necessary to have a structural model of the atomic coordinates.Experimentally-determined complexes have been solved for EGFR bound to an analog of its endogenous ligand (PDB id 2itx), to erlotinib (1m17), gefitinib (4wkq), and to osimertinib (4zau) [43][44][45][46].Similarly, we used the crystal structure for BRAF bound to dabrafenib (4xv2) and vemurafenib (3og7) [47,48].Experimentally-determined complexes of BRAF bound to encorafenib, PLX-8394, and an ATP analog in an active conformation do not exist, so we instead modelled the ligands into BRAF in its activated conformation (for additional details on model selection and preparation see the STAR methods).We used these predicted complex structures for our resistance predictions.
We added new functionality into OSPREY that simplifies the process of performing computational mutational scans.A mutational scan refers to the process of computing the K * score of every possible amino acid mutation within a radius of a ligand.RESISTOR uses this functionality to calculate the four K * scores for each amino acid within a 5 Å radius of the drug or the endogenous ligand.This generated a search space of 2471 sequences.We then set all residues with sidechains within 3 Å of the mutating residue to be continuously flexible for the RESISTOR K * designs.Each sequence has an associated conformation space size dependent on the total number of mutable and flexible residues, which one can use as a heuristic to estimate the difficulty of computing a complex's partition function.The average conformation space size of each sequence was ∼ 5.9 × 10 10 conformations, thus computing the partition functions is only possible using OSPREY's pruning and provable ε-approximation algorithms [18,30,49].Empirical runtimes of the positive-and negative-K * designs are shown in supplementary figure S1.The change in the K * score upon mutation for the endogenous ligand (positive design) and drug (negative design) become two of the four axes of optimization.These two axes also form the basis of a pruning step (described in Sec.2.4).

Computing the Probability of Amino Acid Mutations
To convert the trinucleotide to trinucleotide probabilities into amino acid to amino acid mutational probabilities, RESISTOR constructs a directed graph with the trinucleotides as nodes and the probability that one trinucleotide mutates into another trinucleotide as directed edges.It then reads the cDNA of the protein in a sliding window of 5 ′ -and 3 ′ -flanked codons, since the two DNA bases flanking a codon are necessary to determine the probabilities of either the first or third base of a codon mutating.We designed a recursive algorithm to traverse the graph and find all codons that can be reached within n single-base mutations, where n is an input parameter.The algorithm then translates the target codons into amino acids and, as a final step, sums the different probabilities on each path to an amino acid into a single amino acid mutational probability (see Fig. 1F-I).One can either (a) precompute a cancer-specific codon-to-codon lookup table consisting of every 5 ′ -and 3 ′ -flanked codon to its corresponding amino acid mutational probabilities, or (b) read in a sequence's cDNA and compute the mutational probabilities on the fly.The benefit of (a) is it only needs to be done once per cancer type and can be used on an arbitrary number of sequences.On the other hand, when assigning mutational probabilities to proteins that have strictly fewer than 4 5 amino acids, it is faster to compute the amino-acid specific mutational signature on the fly.In both cases, the algorithm is strictly polynomial and bounded by O(kn 9 ), where k is the number of codons with flanking base pairs (upper-bounded by 4 5 ) and n is the number of mutational steps allowed, which in the case of RESISTOR is 2.An implementation of this algorithm is included in the free and open source OSPREY GitHub repository [18].

Identifying Mutational Hotspots
After calculating the positive and negative change in affinity ∆K a and determining the mutational probability of each amino acid, RESISTOR prunes the set of candidate mutations (see section 2.4).Post-pruning, it counts the number of mutations at each amino acid location.This count is important to determine whether a residue location is likely to become a "mutational hotspot", namely a residue location where many mutations are predicted to confer resistance.Correctly identifying mutational hotspots is vital because they indicate that a drug is dependent on the wildtype identity of the amino acid at that location, and it is likely that many mutations away from that amino acid will cause resistance.Consequently, the fourth axis used in RESISTOR's Pareto optimization is the count of predicted resistance-conferring mutations per residue location.

Reducing the Positive Prediction Space
Prior to carrying out the multi-objective optimization to identify predicted resistance mutations, we prune the set of candidates.First, we introduce a cut-off based on the ratio of K * scores of positive and negative designs, adapted from Kaserer and Blagg's 2018 cut-off [20].We determine the average of the K * scores for the drug and endogenous ligand across all of the wildtype designs for the same protein.The cut-off c is: where c 0 is a user-specified constant, K * L is the average of the K * scores for the wildtype protein bound to the endogenous ligand, and K * D is the average of the K * score for the wildtype protein bound to the drug.We recommend in practice to set c 0 to be greater than the range K * max − K * min of wildtype K * scores-we set it to for the tyrosine kinase inhibitor (TKI) predictions. 1 A mutation m is predicted to be resistant when: where K * L (m) is the K * score of the endogenous ligand bound to the mutant, and K * D (m) is the K * score of the drug bound to the mutant.
We further prune the predicted resistance mutation candidates by removing all mutations that cannot arise within two DNA base substitutions.Whether an amino acid can be reached within two DNA base substitutions is determined by the algorithm described in section 2.2, and if it cannot, then that particular mutation is assigned a mutational probability of 0 and pruned.

RESISTOR Identifies 8 Known Resistance Mutations in EGFR
We evaluated a total of 1257 sequences across the three TKIs for EGFR.Among these sequences, the average conformation space size for computing a complex's partition function was ∼ 1.3 × 10 7 .After we ran the RESISTOR algorithm on these sequences, a total of 108 mutants were predicted as resistance-conferring candidates for all three inhibitors combined from a purely thermodynamic and probabilistic basis, i.e. these mutations were required to lower affinity of the drug in relation to the endogenous ligand (K * Positive and Negative Design, Fig. 1A-D) and could be formed in patients by less than three base pair exchanges (Calculating Mutational Probabilities, Fig. 1F-I).To further prioritize mutations and identify those that are most likely to be clinically relevant, we then computed the Pareto frontier over the four axes for each drug (Fig. 1J).Remarkably, out of these 108 candidates, RESISTOR correctly prioritized eight clinically significant resistance mutants, with 7 of the 8 in the Pareto frontier of the corresponding inhibitor and the remaining mutant in the 2 nd Pareto rank (see Table 1).A detailed description of the result for each inhibitor is included in the sections below.

RESISTOR Identifies Clinically-Relevant Resistance Mutations in EGFR Osimertinib
Erlotinib Gefitinib Table 1: RESISTOR correctly identified 8 resistance mutations in EGFR to erlotinib, gefitinib, and osimertinib.For osimertinib, G796R, G796S, G796D, and G796C were on the RESISTOR-identified Pareto frontier.L792H was in the 2 nd Pareto rank.For erlotinib, both T790M and G796D were on the Pareto frontier.For gefitinib, T790M was also on the Pareto frontier.Previous studies have documented all of these resistance mutations as occurring in the clinic [50][51][52][53][54][55][56][57].† indicates that RESISTOR predicted the mechanism of resistance to be improved binding of the endogenous ligand to the mutant.# indicates that RESISTOR predicted the mechanism of resistance to be decreased binding of the drug to the mutant.Note that these predicted mechanisms are only attributed here if the predicted change in the log 10 (∆K * ) ≥ 0.5.

EGFR Treated with Erlotinib and Gefitinib
Of the 462 sequences evaluated for the TKI erlotinib, RESISTOR identified 50 as candidate resistance mutations.Pareto ranking placed 19 sequences on the frontier, 13 sequences in the second rank, and 11, 6, and 1 sequences in the third, fourth, and fifth ranks, respectively.RESISTOR correctly identified two clinically significant mutations, T790M and G796D, as being on the Pareto frontier [50,51].This is encouraging because T790M is, by far, the most prevalent resistance mutation that occurs in lung adenocarcinoma treated with erlotinib [58].Similarly, for gefitinib, RESISTOR evaluated 438 sequences and identified 22 as candidate resistance mutants.The most relevant clinical mutant, T790M, is found on the Pareto frontier.

EGFR and Osimertinib
RESISTOR evaluated 357 OSPREY-predicted structures of EGFR bound with osimertinib and EGFR bound with its endogenous ligand.Of those, 36 were predicted as resistance candidates.Pareto optimization placed 16 sequences on the frontier, 2 sequences in rank 2, 8 sequences in rank 3, 1 sequence in rank 4, and 5 sequences in rank 5. RESISTOR correctly identified five clinically significant resistance mutations to osimertinib: L792H, G796R, G796S, G796D, and G796C [52][53][54][55][56][57], and while L792H was in the 2 nd Pareto rank, all of the other correctly predicted resistance mutations are on the Pareto frontier.
Two osimertinib resistance mutations in particular stand out: L792H and G796D (see Fig. 2).Both of these mutants have appeared in the clinic [52][53][54]57].OSPREY generated an ensemble of the bound positive and negative complexes upon mutation, providing an explanatory model for how resistance occurs.In both cases, the mutant sidechains are much bulkier than the wildtype sidechain (Fig. 2A and D) and thus are predicted to clash with the original osimertinib binding pose (Fig. 2B and E).Consequently, in both cases the ligand is predicted to translate and rotate to create additional space for the mutant sidechains (Fig. 2C and F).We hypothesize that this movement weakens the other molecular interactions osimertinib makes in the EGFR active site.
In the case of G796D, there are additional factors that contribute to acquired resistance.First, the mutation to aspartate introduces a negative charge, which probably leads to electrostatic repulsion with the carbonyl oxygen of the osimertinib amide (Fig. 2F, highlighted with a dashed oval).In addition, the exit vector of the hydrogen bound to the amide nitrogen does not allow a hydrogen bond with the aspartate.Second, the allyl-group of osimertinib must be in close proximity to C797 for covalent bond formation.In fact, C797 is so important to osimertinib's efficacy that mutations at residue 797 confer resistance [59,60].Even if osimertinib still binds to G796D, the allyl group would have to move away from C797 (Fig. 2F, highlighted with a black arrow).This would prevent covalent bond formation and thus reduce the efficacy of osimertinib considerably.Lastly, it is likely that the mutation away from glycine reduces the conformational flexibility of the loop, incurring an entropic penalty while also plausibly making it more difficult to properly align osimertinib and C797.

RESISTOR Predicts New Resistance Mutations in BRAF and Provides Structural Models
In addition to retrospective validation by comparison to existing clinical data for EGFR, we used RESISTOR to predict how mutations in the BRAF active site could confer resistance.Specifically, we used RESIS-TOR to predict which of 1214 BRAF sequences would be resistant to four TKIs-vemurafenib, dabrafenib, encorafenib, and PLX8394.On the Pareto frontier for vemurafenib are 13 mutations, for dabrafenib 16 mutations, for encorafenib 15 mutations, and for PLX8394 15 mutations.The full sets of predictions are included in the supplementary tables S4-S7.To validate RESISTOR's predictions, we compared its predictions with two sources of experimental data: a saturation mutagenesis variant effect assay from Wagenaar et al. [61] and a cell-based kinase conformation reporter assay termed KinCon by Stefan and colleagues [62,63].Furthermore we carried out new KinCon experiments on a number of RESISTOR predictions to validate RESISTOR's predictive capabilities.

Retrospective and prospective validation of RESISTOR predictions using the BRAF KinCon biosensor reporter
KinCon, developed by Stefan and colleagues, is an in-cell protein-fragment complementation assay (PCA) that provides a readout of the activity conformation change of full-length BRAF upon mutation or exposure to different inhibitors [64].KinCon's bioluminescence assay functions by appending parts of a luceriferase enzyme to the N-and C-termini of full-length BRAF and observing the amount of bioluminescence, indicating whether BRAF favors an open, catalytically active or a closed, autoinhibited conformation (see Fig. 3A) [64].Stefan and colleagues have demonstrated that activation of BRAF either via upstream regulators such as EGFR and GTP activated Ras or via tumorigenic mutations cause BRAF to favor an open conformation [62,63].The inhibitors bind to BRAF in the ATP binding site and cause BRAF's N-and C-termini to interact, shifting BRAF back towards a more closed, intermediate state (see Fig. 3A) [62][63][64].This implies that for inhibitor binding and BRAF closing to occur, a mutation (or a combination of mutations and/or upstream signaling events) needs first to induce an open conformation.Not all clinically observed BRAF mutations cause opening, even if they activate the MAPK pathway (e.g.L472C) [63,65].In the same vein, not all BRAF resistance mutants show increased kinase activity, in fact several are classified as kinase impaired [63,65,66].One prominent mutation that shows both increased kinase activity and induces an open conformation is V600E (Fig. 3B).Inhibitor treatment shifts the V600E conformational equilibrium towards a more closed state [62,63].In contrast, the gatekeeper mutations T529M and T529I do not confer opening of the kinase conformation and are thus insensitive to inhibitor treatment [62].However, in combination with V600E these mutations do confer resistance to BRAF inhibitors to varying degrees.Given that we model a state that is permissive of ligand binding at the outset (i.e., the ligand-bound BRAF complex), our RESISTOR calculations align very well with the reported KinCon measurements of double mutants (e.g.In case of the G796D mutation, the carboxylate moiety of D796 is predicted to be in close proximity to the osimertinib amide oxygen (highlighted with the dashed circle), thus leading to electrostatic repulsion.This mutation site is adjacent to C797, which reacts with the allyl-moiety of osimertinib to form a covalent bond in the wildtype.Due to the steric and electrostatic properties of the G796D mutant, the allyl group is located further away from C797 in the model, thus preventing covalent bond formation.The movement of the allyl group is indicated by the black arrow.

V600E/T529M and V600E/T529I).
Specifically, the RESISTOR predictions of resistance concord with the previous KinCon biosensor results for V600E/T529M and V600E/T529I for three of the four inhibitors: vemurafenib, dabrafenib, and PLX8394 [62].In the case of vemurafenib treatment, the proportion of open to closed conformations in the V600E/T529I mutant is not significantly different from the untreated V600E mutant, indicating vemurafenib treatment is not closing the conformational distribution in the double mutant [62].These data agree with the RESISTOR calculation of the ratios of the log 10 K * scores, which predict that both double mutants are resistant to vemurafenib, with V600E/T529M more resistant.Treatment of BRAF with PLX8394 follows the same pattern as vemurafenib, namely the V600E/T529I mutant's closed population increases only 1.2 fold compared to the untreated mutant, and the PLX8394-treated V600E/T529M mutant does not noticeably alter the conformational distribution [62].In contrast, the PLX8394-treated V600E mutant's closed population increases ∼3 fold compared to the untreated population, indicating V600E sensitivity to PLX8394 (see Fig. 3C).RESISTOR correctly predicted the V600E/T529I and V600E/T529M double mutants are resistant to PLX8394, with the change in the ratio of the log 10 K * scores of the two mutants suggesting that V600E/T529M confers greater resistance.In the case of dabrafenib, treatment of the V600E/T529I mutant closed the conformational distribution (2.4 fold more closed compared to untreated) more than treatment of the V600E mutation (2 fold more closed compared to untreated), whereas dabrafenib treatment of the V600E/T529M mutant increased the closed conformational population less effectively than the V600E mutant alone (1.4 fold vs. 2 fold).This again agrees with the RESISTOR predictions, namely that V600E/T529I remains sensitive to dabrafenib but V600E/T529M is resistant.RESISTOR predicted that the V600E/T529I and V600E/T529M mutants would be resistant to encorafenib, but the KinCon data indicates that these mutants may actually retain sensitivity to encorafenib, as the inhibitor induces BRAF's closed state.
In addition, all inhibitors except dabrafenib were predicted to be sensitive against the G466V mutation and showed closing the of kinase conformation [63].However, in case of dabrafenib, the response was comparable to vemurafenib, although vemurafenib was classified as sensitive.Notably, previous KinCon experiments showed that G466V (and G466R and G466E [66], see below) impaired kinase function consistent with the reduced endogenous ligand binding predicted by RESISTOR (see "All BRAF Predictions" supplementary table) [63].
In addition to the above retrospective validation, we chose a few mutations and evaluated them using the KinCon reporter.We selected the mutants G466E, G466R, V471F, L505H, and G593D because they were prioritized by RESISTOR for at least one of the investigated inhibitors and were reported as patient mutations in either the COSMIC [58] or cBioPortal (using the curated set of non-redundant studies) [67,68] databases (see Table 2).The expression-normalized basal biosensor signal suggests that both G466E and G466R mutants shift the conformation to an opened state, comparable to the highly oncogenic V600E variant and similar to the effect of the common non-small-cell lung cancer mutation G466V [63].The V471F, L505H and G593D mutations, in contrast, did not appear to induce a change in the active conformation (Fig. 3B).When exposed to BRAF inhibitors (Fig. 3C), G466E and G466R mutants showed the highest fold increase of the biosensor signal for all four inhibitors tested.The majority of inhibitors, three out of four, were predicted as sensitive against these mutants.RESISTOR predicted G466E and G466R to be resistant to dabrafenib, and while RESISTOR predicted dabrafenib had lower sensitivity compared to encorafenib and PLX8394 (which is consistent with the KinCon results), dabrafenib-treated mutants shifted to a closed comformation at least as much as vemurafenib-treated mutants did.The L505H and G593D KinCon mutants were not affected by any inhibitors, as those mutations do not shift the kinase into an active opened kinase conformation which is required for inhibitor binding.While vemurafenib and dabrafenib do not appear to affect the V471F mutant, encorafenib and PLX8394 did induce a closing of the kinase, suggesting that the structural properties of the inhibitor determine the binding affinity to this mutant.This is particularly intriguing, given that the V471F mutation was selected because we predicted it would confer resistance to encorafenib and PLX8394.While the KinCon results suggest that these two compounds still retain binding to the V471F mutant, the mutant itself did not induce a significant opening of the kinase confirmation required for ligand binding.For the latter three mutations (i.e.L505H, G593D, and V471F), it would therefore be required to induce the open conformation some other way, for example by introducing the V600E mutation similar to T529I and T529M described above, to investigate whether resistance would develop to the inhibitors [62].

Retrospective validation of RESISTOR predictions using BRAF saturation mutagenesis experiments
Wagenaar et al's 2014 study examined the effects of BRAF inhibitor binding site mutations on inhibitor efficacy [61].To do so, they carried out targeted saturation mutagenesis on the BRAF vemurafenib binding site in the A375 human melanoma cell line and challenged the mutants with vemurafenib over a three week period [61].They then sequenced the emergent clones and measured the IC 50 values of a subset of the mutants.Importantly, they demonstrated correlation between a mutant's deep sequencing enrichment, i.e. the increase in the amount of an amino acid sequence in a sample before and after the addition of an inhibitor, and its IC 50 value [61].We therefore compared their enrichment data to the RESISTOR predictions and determined RESISTOR's vemurafenib resistance prediction specificity to be 91%.There were five RESISTOR-predicted resistance mutations that had increased enrichment over the three week period: T529M already discussed above (enriched 47.96 fold above the V600E baseline, which was the experiment's largest change in enrichment), T529L (enriched 18.57 fold above baseline), T529F (enriched 7.87 fold above baseline), G593I (enriched 4.84 fold above baseline), and L514E (enriched 3.73 fold above baseline).Furthermore, Wagenaar et al. determined the relative IC 50 values of T529M, T529L, and G593I which were, respectively, 2.05, 2.16, and 3.19 times larger than the IC 50 for vemurafenib applied to the V600E mutant.The IC 50 of T529F and L514E were not determined.
To further elucidate the molecular mechanisms conferring resistance to the G593I and L514E mutants, we analyzed the OSPREY-predicted structural models.While neither mutant requires a movement of vemurafenib (Fig. 4A) akin to what was observed in the EGFR and osimertinib structures (Fig. 2), the mutations still lead to a loss of favorable interactions and/or the introduction of energetically unfavorable contacts.The residue G593 (Fig. 4B) may facilitate structural adaptions required for BRAF to accommodate the vemurafenib propyl sulfonamide moiety in the rear of the ATP binding site and the G593L mutations may thus constrain the flexibility of this loop region.In addition, the leucine side chain may project near to the fluoro-substituted central phenyl ring and introduce steric clashes (Fig. 4C).The neighboring D594 backbone interacts with the vemurafenib sulphonamide-nitrogen (Fig. 4B), and this interaction would be weakened in the G593L mutant.Furthermore, residue L514 makes a range of hydrophobic contacts with vemurafenib (Fig. 4D), including the central phenyl ring and the propyl-chain, which are lost in the L514E mutant (Fig. 4E).

Complexity
There are a number of distinct steps in RESISTOR, each of which has its own complexity.While there are sublinear K * algorithms, such as BBK * [69] with MARK * [49], these algorithms so far have only been applied to positive and negative design with optimization of specific multiple objectives, such as minimizing/maximizing the bound (respectively unbound) state partition functions and their ratios for computing binding affinity or stability.COMETS [32] provably does multistate design optimizing arbitrary constrained linear combinations of GMEC energies, but COMETS does not model the partition functions required for calculating binding affinity.A provable ensemble-based algorithm analogous to COMETS for arbitrary multistate design optimization is yet to be developed.Thus, general multistate K * design remains, unfortunately, a problem linear in the number of sequences and thus exponential in the number of mutable residues.
Computing K * itself, as a ratio of partition functions built from the thermodynamic ensembles of the bound to unbound states, can be expensive [70][71][72].In order to reduce the number of K * problems to solve, COMETS is employed as a pruning mechanism for all sequences in which there are more than one mutation.Without COMETS, RESISTOR would need to compute sN K * scores, where s is the number of states and N is the number of sequences.With COMETS, RESISTOR is able to avoid computing many of these K * scores, as COMETS has been shown in practice to reduce the number of required GMEC calculations by over 99% and to reduce N for continuous designs by 96%, yielding an overall speedup of over 5 × 10 5 -fold [32].Since in this study we considered only single residue mutations we omitted the COMETS pruning step, but in any use of RESISTOR that considers multiple simultaneously mutable residues we believe COMETS' empirical sublinearity will make the difference between feasible and infeasible searches.
Moreover, by using an approximation containing fixed partition function size and sparse residue interaction graphs, we can use the BWM * algorithm [73] to compute the K * scores in time O(nw 2 q 3 2 w + kn log q), where w is the branch-width and q the number of rotamers per residue.When we have w = O(1) this is polynomial time.In this study we found that the ε-approximation algorithms using adaptively-sized partition functions, such as BBK * with MARK * , were fast enough.However, for larger problems the sparse approximations allow us to approximate the necessary K * scores for resistance prediction in time exponential only in the branch-width, and thus polynomial time for fixed branch-widths.

Discussion
In this work, we report RESISTOR, a computational algorithm to systematically investigate protein mutations and identify those that have a high likelihood of lowering drug potency in comparison to native substrates.In addition, we analyze the probability that such a mutation is generated in cancer patients and thus likely of clinical importance.Our algorithm is novel, bringing the power of Pareto optimization and computational protein design together and applying them for the first time to predict resistance mutations.The Pareto ranking provides an objective way of prioritizing the most relevant mutations for experimental testing.In addition, we used computationally predicted input structures of ligand-target complexes whenever experimental data was lacking.This is an important step towards expanding the applicability of RESISTOR, as we have found that, perhaps surprisingly, the availability of high-resolution experimental ligand-target structures still can present a major bottleneck in computational protein design.
We have applied RESISTOR to two case studies, EGFR and BRAF, in a retrospective manner and, in case of BRAF, also included prospective experimental data for validation.In EGFR and BRAF, the algorithm correctly identified resistance mutations.Using the vemurafenib data by Wagenaar and colleagues [61], which is the most comprehensive dataset on BRAF mutations and vemurafenib resistance, we determined RESISTOR's vemurafenib resistance prediction specificity and sensitivity to be 91% and 31%, respectively.In a data-rich setting such as proteomics (e.g.[74]), the sensitivity could be regarded as low.However, the prediction of antineoplastic resistance mutations is a sparse data problem.Comprehensive datasets on drug resistance mutations on specific targets are virtually non-existent.We speculate that the reason for this can be found in the large number of individual mutants that must be generated and tested.For example, in our study we used RESISTOR to investigate 462, 438, and 357 individual mutants for Erlotinib, Gefitinib, and Osimertinib, respectively.While this is computationally feasible, it far exceeds the testing capacities of most experimental groups.Clinical resistance data is even more limited.Furthermore, even for those mutations that have been confirmed to confer clinical resistance in patients, the underlying molecular mechanisms often remain uninvestigated.
RESISTOR prioritizes escape mutations causing ablation of inhibitor binding and/or tighter substrate binding (the latter as a proxy for K M ).However, mutations affecting the drug target could also mediate resistance via other molecular processes such as altering the stability of conformational states or protein-protein interactions.In addition, clinical resistance is caused by several different mechanisms, of which the relative importance of escape mutations can vary greatly.In some kinases, such as c-Abl, EGFR, and FLT3, active site escape mutations are the main cause of acquired resistance [75].In other kinases, such as BRAF, escape mutations are not the main mechanism of acquired resistance [76].Rather, splice variants, amplification, and mutations in related genes such as N-RAS, MEK1, MEK2, IGF-1R, and AKT comprise the majority of cases of clinical resistance [76].From this perspective, the specificity of RESISTOR for BRAF and vemurafenib is remarkable and the sensitivity is in line with the fraction of resistance mutations whose aetiology is definitively escape via active site mutation.
We believe that the remaining gap can be closed in future work by modelling additional conformational flexibility, kinetics, and the protein-protein interactions of additional effectors.Yet, despite these limitations, RESISTOR is able to prioritize mutations that are demonstrated to confer resistance in patients.Specifically, our results show that detailed and combinatorial thermodynamic computations can form the basis for predicting escape mutations to TKIs.In the future, since some resistance mutations exploit kinetic phenomena, kinetics could be incorporated for a more comprehensive model.

Conclusions
RESISTOR fills an important void in the science of predicting resistance mutations by providing an algorithm to enumerate the entire Pareto frontier of multiple resistance-causing criteria.It is, to our knowledge, the first algorithm to apply Pareto optimization to predicting resistance mutations.By categorizing predicted resistance mutations by their Pareto rank, it allows the drug discovery community to prioritize escape mutations on the Pareto frontier.RESISTOR also provides structural justification for the mechanism of each predicted escape mutation by generating an ensemble of predicted structural models upon mutation.In this study, we have applied RESISTOR to predict resistance mutations in EGFR and BRAF for a number of different therapeutics.We demonstrate that RESISTOR can also be applied to computationally generated input structures, although the accuracy of the results may be somewhat diminished compared to experimentally determined structures of target-ligand complexes.However, computationally-derived models can still provide useful insights, especially when considering that the availability of experimental structures appears as major bottleneck.While RESISTOR as described herein optimizes over 4 objectives, as a general method any number of diverse objectives could be added.RESISTOR can be applied not only to cancer therapeutics, but also to antimicrobial or antiviral drug design.Thus it provides an important tool to the drug discovery community to design drugs that are less prone to resistance.

Box 1: Progress and Potential
Targeted cancer drugs developed over the past two decades have been instrumental in treating certain types of cancer and extending patient lifespans.These drugs include kinase inhibitors targeting EGFR and BRAF, two important enzymes of the mitogen-activated protein kinase pathway whose dysregulation can lead to many types of cancer, including melanoma and non-small cell lung cancer.The inhibitors are effective for a period of time but the tumors often develop resistance to the drugs, leading once again to cancer progression.The ability to predict how an enzyme target can develop drug resistance would allow for a proactive, resistance-aware approach to drug design.Our new algorithm RESISTOR uses structure-based computational design to predict how different mutations in an enzyme will affect a drug's efficacy.It pairs these predictions with empirical data on how likely a mutation is to occur in a given cancer type, which allows researchers to identify "mutational hotspots," or particular places where mutations are most likely to cause drug resistance.These predictions provide designers new insights during the drug development process that should allow for the quicker development of more durable and longer-lasting cancer therapeutics.binding site was defined as 6 Å around the ligand and the water molecule HOH905 was set to toggle and spin.The default settings of all other parameters were used.An experimental structure of the endogenous ligand ADP was available, however, BRAF adopted in inactive conformation in this complex.Apo BRAF in its active conformation (PDB id 4mne [83]) was thus combined with ANP-bound protein kinase c-src (PDB id 2src [84]) to generate an active, endogenous ligand-bound BRAF complex.This model was used as template to build a BRAF:ADP homology model in the Molecular Operating Environment [85] using the default settings.This included refinement steps to resolve potential steric clashes in the rather crude ANP-BRAF input template.
For all complexes, water molecules not involved in mediating interactions between the ligand and the target were deleted and only residues with a 12 Å radius around the ligand were kept in the final input structures.

Evaluation of Ligand Affinity:
The command line interface of OSPREY was used to generate distinct YAML design files for each residue within 5 Å of a ligand.These YAML design files specify the input structures, the mutable residues, the flexible residues, and connectivity templates for OSPREY.To create the forcefield parameters files for the inhibitors and endogenous ligands, we used the Antechamber program in the AmberTools software package [86].Then, to calculate the K * scores we used OSPREY with the following command input: osprey affinity --design <YAML design file> --epsilon 0.63 --frcmod <force field modification file> --stability-threshold -1 where <YAML design file> was replaced with the individual YAML design file and <force field modification file> was replaced with the AmberTools-generated file.The YAML design and forcefield modification files used in this study are available in the Harvard Dataverse (see Key Resources Table ).
Luciferase PCA analyses: We transiently overexpressed indicated versions of the Rluc-PCA-based KinCon biosensors in 24-well plate formats.Experiments were performed 48h post transfection.For the luciferase-PCA measurements, the growth medium was carefully removed and the cells were washed with phosphate-buffered saline (PBS).Cell suspensions were transferred to 96-well plates and subjected to luminescence analysis using the PHERAstar FSX (BMG Labtech).Luciferase luminescence signals were integrated for 10 seconds following addition of the Rluc substrate benzyl-coelenterazine (NanoLight, #301).Cell lysates were prepared post RLU measurements.Expression levels of the biosensor were determined via western blot analysis.

Quantification and Statistical Analysis:
In Fig. 3, the student's T-test was used to evaluate whether the mean of the RLU of a mutant was significantly different from that of the relative DMSO control.The SEM was used with n = 4. Significance was defined to three different p-levels, where *p < 0.05, **p < 0.01, and ***p < 0.001.S1.The K * algorithm K * is an ε-accurate algorithm for computing a provable approximation to the affinity constant K a .It is implemented in the OSPREY computational protein design software package [18,87].K * is defined as the quotient of the bound to unbound partition functions of a protein:ligand system for a given amino acid sequence.For a proof that K * approximates K a see Appendix A of [87].
K * calculates an ε-accurate partition function for three structures: the bound protein:ligand complex (denoted PL), the unbound protein (denoted P), and the unbound ligand (denoted L).Let X be an arbitrary state, X ∈ {P, L, PL}.The partition function is a summation of the Boltzmann-weighted energies for all of the conformations in the thermodynamic ensemble of X.Let s denote an arbitrary amino acid sequence, then the partition function of s in state X (which we donate as q X (s)) is: where Q X (s) is the entire conformational ensemble of sequence s in state X, and c is a single conformation from that ensemble.E(c) is the energy of conformation c.R is the ideal gas constant and T is the temperature in absolute Kelvin.The K * score for a sequence s approximates K a : By using an A * search over Q X (s) to generate an ordered, gap-free list of low energy conformations, the K * algorithms generates an ε-approximation of the partition function q X (s) and the ensemble-complete K * value.This approximation is known as the K * score.

S2. Empirical RESISTOR runtimes
The RESISTOR computation entails three stages: 1) computing the positive and negative K * designs; 2) assigning mutational signature probabilities to each mutation, and; 3) run Pareto optimization over the four axes.Steps 2 and 3 empirically take a negligible amount of time, on the order of seconds.Step 1, however, computes two partition functions for each sequence and can take more time.Figure S1 shows the empirical runtime (in seconds) that it took our computers to run the positive and negative K * designs, where a design mutated a residue to each of the 19 other possible amino acids.

S3. EGFR Pareto Frontier
Table S1: All RESISTOR resistance mutation predictions for EGFR with erlotinib."Pos" is the position of the residue."WT AA" is the wildtype identity of the amino acid."Mut AA" is the resistance mutation."Sig Prob" is the mutational signature probability for the mutation from "WT AA" to "Mut AA" in lung adenocarcinoma."ATP WT" and "ATP Mut" are the K * scores of the endogenous ligand with the wildtype and mutant residues, respectively."Drug WT" and "Drug Mut" are the K * scores of erlotinib with the wildtype and mutant residues, respectively."Count" is number of resistance mutations at the position."Rank" is the Pareto rank of the mutation.Note: K * scores are in log 10 units where possible and 0 where there is predicted to be no binding.Pos  Table S3: All RESISTOR resistance mutation predictions for EGFR with osimertinib."Pos" is the position of the residue."WT AA" is the wildtype identity of the amino acid."Mut AA" is the resistance mutation."Sig Prob" is the mutational signature probability for the mutation from "WT AA" to "Mut AA" in lung adenocarcinoma."ATP WT" and "ATP Mut" are the K * scores of the endogenous ligand with the wildtype and mutant residues, respectively."Drug WT" and "Drug Mut" are the K * scores of osimertinib with the wildtype and mutant residues, respectively."Count" is number of resistance mutations at the position."Rank" is the Pareto rank of the mutation.Note: K * scores are in log 10 units where possible and 0 where there is predicted to be no binding. Pos

S4. BRAF Pareto Frontier
Table S4: All RESISTOR resistance mutation predictions for BRAF with dabrafenib."Pos" is the position of the residue."WT AA" is the wildtype identity of the amino acid."Mut AA" is the resistance mutation."Sig Prob" is the mutational signature probability for the mutation from "WT AA" to "Mut AA" in melanoma."ATP WT" and "ATP Mut" are the K * scores of the endogenous ligand with the wildtype and mutant residues, respectively."Drug WT" and "Drug Mut" are the K * scores of dabrafenib with the wildtype and mutant residues, respectively."Count" is number of resistance mutations at the position."Rank" is the Pareto rank of the mutation.Note: K * scores are in log 10 units where possible and 0 where there is predicted to be no binding.Table S5: All RESISTOR resistance mutation predictions for BRAF with vemurafenib."Pos" is the position of the residue."WT AA" is the wildtype identity of the amino acid."Mut AA" is the resistance mutation."Sig Prob" is the mutational signature probability for the mutation from "WT AA" to "Mut AA" in melanoma."ATP WT" and "ATP Mut" are the K * scores of the endogenous ligand with the wildtype and mutant residues, respectively."Drug WT" and "Drug Mut" are the K * scores of vemurafenib with the wildtype and mutant residues, respectively."Count" is number of resistance mutations at the position."Rank" is the Pareto rank of the mutation.Note: K * scores are in log 10 units where possible and 0 where there is predicted to be no binding.

Fig. 2 :
Fig.2: Structural models predicted by OSPREY agree with experimental data and explain mechanisms of Osimertinib resistance to EGFR mutations L792H and G796D.Structural models predicted by OSPREY of EGFR wildtype (blue) and resistance mutations (red) bound to osimertinib (yellow sticks).The histidine (A) and glutamate (D) side chains (red sticks) in the EGFR L792H (A) and G796D (D) mutations are bulkier than the wildtype leucine (A) and glycine (C) residues (blue sticks).They clash with osimertinib in its original binding pose as highlighted by the sphere representation in panels B and E. (C+F) To allow for accommodation of osimertinib in the modelled EGFR mutant structures (red sticks), the inhibitor's position within the binding pocket moves from the experimentally determined binding pose (yellow sticks).Movements are indicated by black arrows.(F) In case of the G796D mutation, the carboxylate moiety of D796 is predicted to be in close proximity to the osimertinib amide oxygen (highlighted with the dashed circle), thus leading to electrostatic repulsion.This mutation site is adjacent to C797, which reacts with the allyl-moiety of osimertinib to form a covalent bond in the wildtype.Due to the steric and electrostatic properties of the G796D mutant, the allyl group is located further away from C797 in the model, thus preventing covalent bond formation.The movement of the allyl group is indicated by the black arrow.

Fig. 3 :
Fig. 3: (A) Schematic depiction of Renilla luciferase (RLuc; F1: fragment 1, F2: fragment 2) PCA-based BRAF kinase conformation (KinCon) reporter system.Conformational rearrangement of the reporter upon (de)activation of the kinase are indicated.Closed kinase conformation induces complementation of RLuc PCA fragments resulting in increased RLuc-emitted bioluminescence signal.(B) Domain organization of the BRAF-KinCon reporter (top) and basal bioluminescent signals of the BRAF-wt (black), V600E (red), and novel mutant (grey) KinCon biosensors.Bars represent the mean signals, relative to BRAF-wt, in relative light units (RLU) with SD of four independent experiments (nodes).Raw bioluminescence signals were normalized on reporter expression levels, determined through western blotting.Asterisk indicates level of significance versus the wild type BRAF biosensor.(C) BRAF-KinCon biosensor dynamics, induced via treatment with respective BRAFi (1µM for 1h) prior to bioluminescence measurement.BRAF wt and V600E KinCon variants serve as control (left).The novel mutants are shown in a separate bar chart (right).Bars represent the mean signals, relative to the DMSO control, in relative light units (RLU) with SEM of four independent experiments (nodes).All experiments were performed in HEK293T cells 48 hours post transfection.*p < 0.05; **p < 0.01; ***p < 0.001; n.s., not significant by t-test.

Fig. 4 :
Fig. 4: Structural analysis of BRAF mutations G593I and L514E.(A) No major movements were required for vemurafenib to bind to the G593I (yellow) and L514E (orange) mutation in comparison to the wild type binding pose (blue).(B) BRAF G593 is located on the N-terminus of the activation loop and may facilitate conformational changes required to accommodate the vemurafenib propyl sulfonamide moiety in the back of the pocket.The backbone of the neighboring D594 residue interacts with the sulfonamide nitrogen of vemurafenib as indicated by black dashed lines.(C) Mutation of G593 to L not only restricts flexibility of the loop, but also puts the leucine side chain in too close proximity to the fluoro-substituted phenyl ring (highlighted with the dashed circle).(D) Residue L514 is involved in a variety of hydrophobic contacts with vemurafenib (indicated by yellow arrows), which are lost in the L514E mutant (E).

Fig. S1 :
Fig. S1: Positive and negative design runtimes for a residue location.Each dot represents the amount of time (in seconds) that RESISTOR took to compute the positive and negative K * designs for a given mutation location.This means that each dot represents the computation of 40 K * scores.The computation times range from 813 seconds to 972465 seconds, with the average being 40630 seconds or 1015 seconds per sequence.The different colors represent the particular kinase/inhibitor pair.The designs were run on a 24-core, 48-thread Intel Xeon processor with 4 Nvidia Titan V GPUs.

Table 2 :
Prioritized BRAF mutations selected for experimental testing.We selected these mutants because they were prioritized by RESISTOR for at least one of the investigated inhibitors and were reported as patient mutations in either the COSMIC or cBioPortal.The numbers in the first four columns indicate the RESISTOR-predicted Pareto rank with melanoma mutational probabilities.The numbers in the last two columns indicate the number of patient samples containing the mutation reported in the respective database (access date 12/01/2022).Absence of a Pareto rank indicates RESISTOR predicted the mutant would remain sensitive to the drug.

Table S2 :
WT AA Mut AA Sig Prob ATP WT ATP Mut Drug WT Drug Mut Count Rank All RESISTOR resistance mutation predictions for EGFR with gefitinib."Pos" is the position of the residue."WT AA" is the wildtype identity of the amino acid."Mut AA" is the resistance mutation."Sig Prob" is the mutational signature probability for the mutation from "WT AA" to "Mut AA" in lung adenocarcinoma."ATP WT" and "ATP Mut" are the K * scores of the endogenous ligand with the wildtype and mutant residues, respectively."Drug WT" and "Drug Mut" are the K * scores of gefitinib with the wildtype and mutant residues, respectively."Count" is number of resistance mutations at the position."Rank" is the Pareto rank of the mutation.Note: K * scores are in log 10 units where possible and 0 where there is predicted to be no binding.