Introduction

Missense mutations play a central role in the onset and progression of cancer1. Examples of relevant molecular mechanisms include oncogenic activation/inactivation of proteins1, disruption of the contacts between proteins and their macromolecular interaction partners2,3,4,5, or emergence of cancer drug resistance6. The last has been previously addressed by a computational protocol6,7 predicting likely resistance mutations in the pharmacological target of targeted cancer drugs. Alteration of protein-protein interactions plays a major role in oncogenic signalling and is thus the focus of many experimental and computational studies8,9,10,11. These network-based studies investigate the cancer-related interactome with the aim to e.g. identify novel therapeutic strategies or provide novel insights into cancer biology. In addition, several studies mapped known cancer-associated mutations on proteins, including protein-protein interfaces9,12,13,14,15,16 or investigated disruption of protein-protein interactions outside of a cancer context17. However, to the best of our knowledge, no theoretical framework exists to systematically evaluate mutations within the interaction interfaces of critical signalling and regulatory components to identify disrupting mutations involved in the aetiology and progression of cancer.

We suggest that such mutations (1) have a high likelihood to be formed in a particular cancer type and (2) affect the molecular interactions formed by interaction partners, i.e. disrupt in the investigated cases. We report here the development and validation of a computational protocol, Disruptor, to address these aspects.

Results and discussion

Disruptor builds upon our previous work6 where we systematically evaluate the impact on binding affinity for all possible mutations within the binding interface (Fig. 1a, b) using experimental structures of central protein complexes. In addition, we combine gene sequences and mutational signatures18 to calculate the relative probability with which a specific mutation is formed. Results of these analyses are used to predict and rank mutations that have a high probability to become clinically relevant for carcinogenesis (Fig. 1b).

Fig. 1: Overview of the workflow and computational and experimental p16-CDK6 results.
figure 1

a Schematic representation of the molecular mechanism, where the two binding partners are presented in blue and green. Upon mutation (red), binding is disrupted. b Overview of the computational workflow. The inset shows the interaction between p16 (green) and CDK6 (blue), with the interaction interface coloured yellow (p16) and violet (CDK6). c Heatmap showing the changes in the Osprey30 Log10 K* score for mutations (Y-axis, H1-3 correspond to different histidine protonation states) compared to wildtype (wt, marked black) p16 residues (X-axis). Triple point mutations are marked grey. Hotspot residues predicted to disrupt interaction with CDK6 are coloured red on the p16 surface below. Arrows indicate the p16 residue position. d Schematic depiction of the LUMIER assay for the detection of protein:protein interactions. A p16 protein tagged with the NanoLuc Luciferase (NLuc) is transiently expressed in HEK293T cells together with Flag-tagged CDK6. The complex is immunoprecipitated with Flag antibodies and the emission of light is detected on-bead upon substrate (benzylcoelenterazine) addition if the bait protein is present. Expression profiles have been validated by Western Blot as shown in Supplementary Figs. 1 and 2. Introduction of dimerisation interfering mutations to either CDK6 or p16 lower the detected luciferase signal. e LUMIER assay of Flag-tagged CDK6 variants in the presence of wildtype or mutated p16-NLuc. Please note, we introduced the p16 mutations into mouse, not human, p16 and thus G23S, G55D, and P81L correspond to mouse G15S, G47D and P73L. The bioluminescence signals were normalised on the corresponding input signals. Bars represent the luciferase intensity relative to wild-type CDK6 and p16 interactions. Error bars represent SEM of n = 6 independent experiments. Significance was determined by two sided t-test *p < 0.05; **p < 0.01; ***p < 0.001. Structures were visualised using PyMOL42.

We have tested Disruptor on five well-studied targets involving p53:DNA (a consensus recognition sequence), p53:ASPP2 (also known as 53BP2), ERK2:DUSP6, p16 (also known as INK4a or CDKN2A):CDK6 (Fig. 1b, c), and Smad4:Smad2 complexes. In all cases, Disruptor predicted clinically relevant mutations, which have been demonstrated to disrupt binding to their respective interaction partner (Table 1). For example, this includes highly prevalent p53 hotspot mutations, e.g. at residues R248, R249, and R273, which are known to interfere with binding of the transcription factor to its DNA response element and thus hamper transactivation2,5. Furthermore, transactivation data deposited in the International Agency for Research on CANCER (IARC) TP53 database (version R20, July 2019)19 confirmed that 31% (67 out of 215) of our predicted mutations were indeed non-functional or only partially functional. In contrast, only 4% of mutations (10 out of the 215) showed transactivation activity despite their classification as disruptive. Unfortunately, for the vast majority of predicted p53 mutations (64%) within the DNA binding site, no functional or mechanistic data were available. This lack of data was not limited to p53, which is a thoroughly investigated target, but we could observe it for all investigated examples: for each interaction pair, we identified several mutations, that have not yet been investigated experimentally despite their detection in cancer patients, sometimes even multiple times (Supplementary Tables 18). To investigate some of these understudied mutations in more detail, we selected three p16 (G23S, G55D, and P81L, Fig. 1c) and two CDK6 (D102N and D110N) patient mutations predicted by our method, based on Log10 K* scores in comparison to the wildtype score (ΔK* (Log10) score) and exclusion of triple contiguous mutations in a sequence, for experimental validation. Please note, we introduced the p16 mutations into mouse, not human, p16 and thus G23S, G55D, and P81L correspond to mouse G15S, G47D and P73L in Fig. 1c. Intriguingly, in a biochemical assay for quantifying binary complex formation of cellularly expressed proteins (termed LUMIER assay;20 Fig. 1d) all five of our selected mutants showed a significant decrease in their binary interaction with the binding partner when compared to the non-mutated complex of p16:CDK6 (Fig. 1e).

Table 1 Examples of top-ranked computationally predicted patient mutations confirmed to disrupt complex formation.

Besides providing validation of Disruptor, this indicates that there may be many more overlooked disease-relevant mutations in patients that occur only at low rates and thus only affect a small patient population or even individuals. We therefore suggest that our method could be an especially valuable tool in precision medicine.

However, some limitations of the current approach should be noted. There are many other mechanisms by which mutations can affect protein function that are not addressed in this computational framework. For example, many p53 mutations also exert a gain-of-function phenotype, e.g. via changes in protein stability or reprogramming of DNA or protein-protein interaction preferences21. In addition, Disruptor requires structural data as input, which may not always be available. We are thus working on the extension of our computational toolbox towards additional molecular mechanisms and are investigating the suitability of computationally derived structural models as starting point for our analyses. Unfortunately, the lack of a comprehensive dataset describing the impact of all possible mutations within the interaction site of a protein on binding to its partner as well as their relevance for cancer patients prevents us from evaluating the performance of Disruptor in a systematic manner. However, we are encouraged by the fact that we could identify disruptive patient mutations for all case studies investigated in our retrospective as well as our prospective experiments.

Taken together, we report a computational protocol to prospectively predict protein mutations affecting binding to macromolecular interaction partners. It can be applied to investigate data on novel patient mutations, guide selection of mutants for subsequent wet lab experiments, and even predict a potential mode of action on a molecular level. In addition, Disruptor can not only be used to systematically investigate all mutations within the interaction interface of a given target of interest, but also identify those that will most likely emerge in the clinic. Moreover, we highlight an adaptable computational workflow for anticipating and unveiling the functional relevance of less common and overlooked patient mutations.

Methods

Preparation of input structures

The following PDB entries were used for the analysis: 1tup (p53:DNA complex)22, 1ycs (p53:ASPP2)23, 2fys (ERK2:DUSP6)24, 1bi7 (p16:CDK6)25, and 1u7v (Smad4:Smad2)26. All structures were prepared using the default parameters of the Protein Preparation Wizard27 in Maestro release 2020-328 and all water molecules and buffer components were deleted. In case of CDK6, ERK2, and p53:ASPP2, only residues within 12 Å of the interaction interface were included, and chains A and B were used for the calculation for both p53:ASPP2 and Smad4:Smad2. All three p53 copies were analysed in case of 1tup.

Computational evaluation of mutations

Structures and definitions of mutable residues and allowed mutations were submitted in YAML file format. In case of histidine mutations, all three protonation states were considered. Mutable residues were either investigated alone or in pairs. Mutable residues were set to continuously flexible, all other residues were kept rigid. ZAFF29 force field parameters were added for zinc ions and zinc coordinating residues and downloaded here: https://ambermd.org/tutorials/advanced/tutorial20/ZAFF.htm (access date 2021/04/07). Template coordinates and force field parameters for phosphoserines were calculated using Antechamber 19.0. An example input file for each of the interaction pairs is provided in Supplementary Data 1. Osprey version 3.27,30 was used for calculating K* scores, which predicts low-energy structural ensembles and provides an approximation to binding affinity. It does this by computing provable bounds on the partition function values for molecular ensembles of the protein, interaction partner, and bound complex31. The proof of the epsilon accuracy of the algorithm is presented in Appendix A of Lilien et al.31, and a recent summary of the algorithm’s details can be found in the Method Details section of Guerin et al.7. The stability threshold was disabled and an epsilon of 0.03 was used.

Calculation of relative probabilities

A detailed description of the calculation of relative probabilities has been reported previously6,7,32. Briefly, mutational signatures and their contribution to the mutational burden in a particular cancer type18 have been combined to calculate cancer-specific values for single base exchanges within a defined trinucleotide context. These have been used to calculate relative probabilities for generation of the DNA sequence mutations encoding for the investigated protein amino acid mutation. We only calculated relative probabilities for mutations that could be generated with single- or double base pair exchanges, because we considered mutation of the whole trinucleotide codon required for triple contiguous mutations in a sequence as extremely unlikely6. Colorectal and cervical cancer associated relative probabilities have been calculated for ERK2, and melanoma and colorectal cancer associated relative probabilities have been calculated for p16 and smad4, respectively. No cancer-associated relative probabilities have been calculated for p53, given that p53 mutations have been observed in the majority of cancer types.

Data analysis

Mutations with a change of Log10 K* scores > –3 in comparison to wildtype scores from the same run were considered to disrupt the interactions. Please note, that this cut-off can be adapted at the user’s discretion and according to the project requirements. Histidine mutations were included only if all three protonation states disrupted binding. Triple contiguous mutations in a sequence requiring mutations of all three bases of the codon were discarded. This led to a final set of mutations we considered clinically relevant (see Supplementary Data 2). Importantly, we only need to correctly classify whether a mutation disrupts the investigated interaction and how likely its formation is in patients in relation to the other mutations under consideration. Accurate calculation of the exact binding affinities or probabilities of mutation formation is thus not required. To prioritise mutations further, the number of individual mutations included for each residue position were analysed. Protein residues with the highest number of predicted individual mutations were considered as “mutational hotspots” and cancer-associated probabilities for all mutations at these positions were calculated to prioritise individual mutants further. An overview of the top-three mutational hotspots, and the individual mutations and their relative probabilities are reported in Supplementary Tables 18. Data were analysed using Microsoft Excel version 2019, Prism 6 for Windows - version 6.07 (GraphPad), and Python 3.7.6.

Selection of mutants to be tested

Two of the three p16 mutations (human G23S and G55D (mouse G15S and G47D)) were chosen because they were prioritised by our protocol (Fig. 1c, Supplementary Table 6) and both have been associated with hereditary melanoma33,34. P81L (corresponding to P73L in mouse) was included, because within the dataset of computationally predicted mutations (Supplementary Data 2) it was among the most frequently reported in cancer patients (29 times). In contrast, CDK6 generally appears to be mutated at a very low rate, with only 97 unique missense mutations reported in COSMIC35 (access date 2022/04/25) in total and the most common mutations observed only five times in patient samples. For comparison, the p16 H83Y mutation is reported 128 times and one of 387 unique missense mutations deposited. We therefore focused on two CDK6 mutations (i.e. D102N and D110N, Supplementary Data 2) that were also observed in the clinic.

Cell culture and antibodies

HEK293 cells (ATCC, CRL-11268) were grown in Dulbecco’s modified Eagle’s medium (DMEM) supplemented with 10% fetal bovine serum (FBS). Transient transfections were performed with TransFectin reagent (Bio-Rad, #1703352). Antibody used for LUMIER experiments was mouse anti-FLAG (Sigma-Aldrich, #F3165). The expression constructs were cloned using cDNA as PCR templates for amplifying the inserts (CDK6: Gene ID 12571, p16: Gene ID 12578), digestion with restriction enzymes and ligation into a Flag or NLuc vector.

Western blotting

Expression of indicated Flag-tagged CDK6 constructs in HEK293 cells were determined via Western blotting with indicated Flag antibody [mouse anti-FLAG® M2-tag 1:1000 (Sigma-Aldrich, St. Louis, MO, USA, F3165-1MG)]. 5× SDS loading buffer was added to the lysate to reach a final concentration of 1× SDS LB. Western Blots are shown in Supplementary Figs. 1 and 2.

LUMIER experiments

HEK293 cells were transiently transfected with wild type or mutated p16-NLuc (NanoLuciferase) and 3×Flag-tagged wild type CDK6 constructs. Subsequent to homogenising the cells with a syringe [lysis buffer: 150 mM NaCl, 10 mM sodium phosphate (pH 7.2), 0.05% Triton-X100 supplemented with standard protease inhibitors] the lysates were clarified by centrifugation at 13,000 rpm for 20 min. Cell extracts were incubated on an overhead shaker with anti-flag antibody (0.6 μg per sample) and protein G–Sepharose beads or IgG beads for 3 h at 4 °C. Isolated complexes were washed three times with lysis buffer and three times with PBS. Probes were transferred to 96-well white-walled plates and subjected to bioluminescence analysis using the PHERAstar FSX luminometer (MARS, version 3.30 (BMG Labtech). As substrate benzylcoelenterazine is used. NLuc bioluminescence signals were integrated for 1.2 s following addition of luciferase substrate. Raw Data Bioluminescence signals are shown in Supplementary Fig. 1 and raw data values are provided in Supplementary Data 3.

Statistics and reproducibility

LUMIER results were reported from six independent experiments. Significance was determined by two sided t-test *p < 0.05; **p < 0.01; ***p < 0.001.