Molecular Systems Biology Peer Review Process File Systematic Analysis of Somatic Mutations in Phosphorylation Signaling Predicts Novel Cancer Drivers Transaction Report

(Note: With the exception of the correction of typographical or spelling errors that could be a source of ambiguity, letters and reports are not edited. The original formatting of letters and referee reports may not be reflected in this compilation.) Thank you again for submitting your work to Molecular Systems Biology. I apologize for the delay in getting back to you-we received the report of reviewer #3 only yesterday. We have now finally heard back from the three referees who agreed to evaluate your manuscript. As you will see from the reports below, the referees find the topic of your study of potential interest. They raise, however, substantial concerns on your work, which should be convincingly addressed in a major revision of the work. The rigor of the statistical analysis should be considerably improved to strengthen the conclusions you wish to draw from your analysis. Further analyses seem to be required to demonstrate the power of your integrative approach and characterize better novel cancer drivers identified in this study and highlight the 'actionable results' of your study. If you feel you can satisfactorily deal with these points and those listed by the referees, you may wish to submit a revised version of your manuscript. Please attach a covering letter giving details of the way in which you have handled each of the points raised by the referees. A revised manuscript will be once again subject to review and you probably understand that we can give you no guarantee at this stage that the eventual outcome will be favorable.

Thank you again for submitting your work to Molecular Systems Biology. I apologize for the delay in getting back to you--we received the report of reviewer #3 only yesterday. We have now finally heard back from the three referees who agreed to evaluate your manuscript. As you will see from the reports below, the referees find the topic of your study of potential interest. They raise, however, substantial concerns on your work, which should be convincingly addressed in a major revision of the work.
The rigor of the statistical analysis should be considerably improved to strengthen the conclusions you wish to draw from your analysis. Further analyses seem to be required to demonstrate the power of your integrative approach and characterize better novel cancer drivers identified in this study and highlight the 'actionable results' of your study.
If you feel you can satisfactorily deal with these points and those listed by the referees, you may wish to submit a revised version of your manuscript. Please attach a covering letter giving details of the way in which you have handled each of the points raised by the referees. A revised manuscript will be once again subject to review and you probably understand that we can give you no guarantee at this stage that the eventual outcome will be favorable.

1.
A key motivation for integrating so much data is to gain statistical power to detect driving events that are common to multiple cancer types. The data for each of the individual cancer types have been presumably already considered for phospo-mutations by the authors involved in those studies. The novelty here comes from integrating all 800 samples from different cancer types. This approach has not yet been fully explored and it should be relevant here and should produce oncogenes that are only nominated once the full public data set is brought to bear.
2. Figure 2a, should include a figure showing the results for all of the data combined. That figure should be discussed primarily in the text, since it should be the main result.
3. Justification for the study in the introduction should include examples where druggable driving point mutations showed up across cancer types, like the druggable V600E mutation in melanoma and hairy cell leukemia. Such mutations recurring across cancer type are important for identifying drug targets that are only worth drugging, because they become significantly mutated across all cancers. This is especially important since many studies are not yielding many novel drug targets, and genes like CDK12 are showing up as significantly mutated in both prostate and ovarian cancer. 4. It is not clear to me why the author's presented a novel statistical method to find phospho-sites. Finding phospho-sites should be a straight-forward application of the mutational significance calculation developed by the group at the Broad, using the phospho-sites as your target set, rather than exomes or whole genomes. At least that is the expected approach. The authors need to justify taking a novel approach, since this is the norm in the field. 5. The data on TP53, EGFR and PTEN are worth presenting as examples demonstrating the utility of your approach. They are not inherently interesting as novel findings, so less attention should be paid to them.
6. Finally, the two examples of novel findings should be developed further. At least they should be the major results of the study, since they are the most actionable. This paper: Kan Z, Jaiswal BS, Stinson J, et al. Diverse somatic mutation patterns and pathway alterations in human cancers. Nature. 2010 Aug 12;466(7308):869-73 does an outstanding job of surveying for recurrent druggable mutations, focusing on two interesting cases. Your study is very similar in outline and intention to this work and this study suggests a much better organization of the presented results.
7. Some results that should be present in your study are structural models. If your findings are convincing, then they should be enriched in homologous sets of sequences, like they are for GRM and BAI in the Kan study above. They should also be convincing at the level of protein structure. The leap to pathway integration is not the immediate next step and there are likely more specific actionable findings that can be explored this way. If FLNB or PRKCZ are truly driving, you should be able to marshal more support using this data.
Reviewer #2 (Remarks to the Author): Systematic analysis of somatic mutations in phosphorylation signalling predicts novel cancer drivers This manuscript describes a very interesting new method (PhosphoDriver) to identify novel cancer driving genes from existing SNV data. There are several pitfalls in the statistics described in this analysis that I will point out below. In addition, the manuscript would greatly profit from more characterization of novel cancer driving genes and less repetition of known facts. As general comment, the text is disturbed by a large number of outcomes of statistical tests. I would suggest to create some supplementary tables where these could be found by interested readers in order to make the text more readable.
1) Phosphorylation sites are more prominent in unstructured/loop regions of proteins. As these parts of proteins are less restricted spatially they also tend to evolve faster than structured regions. Based on this observation I would expect that phosphorylation sites would also evolve faster than the proteins overall and that more SNV occur in these regions. Thus taking as background the proteincoding sequence length of all involved cancer samples (page 11) would bias the result in such a way that the enrichment of SNVs in phosphosites would be overestimated. The authors should rather split the regions into structured/unstructured regions and take the correct parts of protein sequences as background distribution.
2) Page 14: "the enrichment and depletion of phosphosite mutations was contrasted to the gene-wide mutation rate in two one-tailed tests, and the better p-value was selected as the outcome". If there is no prior hypothesis for depletion or enrichment, a two-tailed test should be done. In the way the authors have performed their analysis, p-values are overestimated.
3) Page 14: "The total gene-based significance was computed as the product of p-values from all phosphosites of a gene". This is an incorrect way of comparing two groups of values, the authors should have rather done a t-test to test significance in one go between the two groups. 4) Page 15: "We developed a greedy algorithm to search the kinase-substrate network for kinasesubstrate interaction modules that cover pSNVs that are maximally informative of a given clinical outcome." This method will completely overtrain on the examples. The authors should have kept a training and testing set separately, define the informative modules on the training set and report clinical outcome on the test set.

5)
Given that both small-scale and large-scale datasets are used it is not surprising that the most highly studied cancer genes involved in signalling come out from the algorithm. The authors should probably repeat the analysis based on unbiased large-scale data only, in order to not bias the whole study towards cancer genes from the beginning. 6) As the authors consider SNVs that alter kinase domains (page 4) as pSNVs and there is a large class of drugs directed against kinases, it is not surprising that pSNVs are enriched in drug targets. 7) Page 6: Could the authors describe in more detail what would be the effect of the pSNVs on the function filamin-B? 8) Page 6: I would say G6PD is not a completely novel cancer driver. There are now several publications describing the role of G6PD in cancer. As G6PD is an enzyme in the pentose phosphate pathway (PPP) and the major source of NADPH and NADPH maintains the level of glutathione that helps protect against oxidative damage, this could be a mutation that is involved early in the cancer process, allowing by oxidative damage more mutations to occur. Not only that, in cancer cells primarily glycolysis is used for ATP production (Warburg effect), in this way cancer cells can direct glucose to biosynthesis. P53 normally suppresses the PPP by binding to G6PD, but P53 mutants lack this inhibitory activity (Jiang et al, Nat Cell Biol 2011). The G6PD phosphomutant could have the same effect. In order to strengthen this manuscript, this would need to be confirmed by an experiment studying the activity of the G6PD mutant enzyme. 9) Page 7: the pathway analysis may reveal pathways that were not found by studying the overall occurrence of SNVs. However, the authors do not name any novel pathways or complexes nor novel explanations. They could therefore remove the whole paragraph. 10) Page 8: Network analysis. "deeper investigation of others is likely to reveal novel mechanisms of cancer biology and interesting therapeutic targets" If this is the case why do the authors constrict the analysis to repeating known kinases and not do this deeper investigation? 11) Heuristic network search. As stated above, I doubt that the statistics of this analysis has been performed correctly.
Minor comments: Glucose-6 phosphate-1 hydrogenase should be Glucose-6 phosphate-1 dehydrogenase Reviewer #3 (Remarks to the Author): This is an extremely well-written paper on a topic of critical importance, namely the understanding of how somatic mutations are a)selected, b)identified in the context of driver mutations, and c)perhaps most importantly -how these mutations are manifest in the functional signaling architecture. This linkage is most important because a vast majority of the targeted therapy pipeline are directed to kinase and protein enzyme inhibition-mostly phosphorylation driven.
The only significant weakness of the report, which is mainly an informatic analysis of public database sets is that the phosphoprotein/phosphosite databases are likely highly biased and is likely incomplete. This is because these databases are derived from a)large tumors whereby enough material was available for unbiased phosphoproteomic analysis, and thus small highly malignant tumors are not likely well-represented in these databases. b)contaminated with cellular compartments derived from stromal cells, infiltrating immune cells, tumor and non-tumor epithelium, endothelial cells, etc. Consequently, tying the somatic mutational load with phosphoproteins that may come from varied and unknown cellular compartments could be problematic. The authors need to address this issue and discuss how there results are not impacted by these issues, or if these issues limit the impact of their findings 1 We would like to thank the reviewers for their helpful and constructive comments.

Reviewer's comment
This is an ambitious study that highlights the different potential utilities of focusing on phosphorylation. It is impressive in its breadth of topics.

Response
Thank you for the positive note.

Reviewer's comment
This also leads to an inherent weakness of this study in that it sometimes is superficial in nature and could be considered diffuse. Also, the specific novel results of driving mutations in FLNB and PRKCZ get lost and it is not clear why the power of 800 tumor samples is not yielding any new insights as to specific novel oncogenes and thereby drug targets. That said the author's should be commended for the volume of analysis carried out in this work.

Response
This is a very helpful comment. We have made an effort to make the manuscript more concise and tried to emphasize novel results. We have also added new analysis regarding drug targets to suggest actionable findings. Our actions are detailed in the points below.

Reviewer #1 -comment 1
Reviewer's comment 1. A key motivation for integrating so much data is to gain statistical power to detect driving events that are common to multiple cancer types. The data for each of the individual cancer types have been presumably already considered for phospo-mutations by the authors involved in those studies. The novelty here comes from integrating all 800 samples from different cancer types. This approach has not yet been fully explored and it should be relevant here and should produce oncogenes that are only nominated once the full public data set is brought to bear.

Response
Interestingly, previous studies do not seem to consider phospho-mutations, which is one reason we chose to focus on such events. However, the reviewer raises an excellent point about integrating all 800 samples. We thus repeated our analysis of phosphosite mutations with a merged dataset of SNVs from 9 cancer genomics projects. The analysis identified 14 additional genes that are not found by investigating individual cancer types. Remarkably, all but one of these additional genes are novel candidate genes and two are known drug targets. These results are now presented as a separate panel in Figure 2a and discussed in Results p7.

Manuscript changes
The reviewer's comment applies to the second section of our analysis (identification of genes with significant phosphosite mutations). Other sections regarding general trends, pathways and networks already discuss phosphosite mutations in a multi-cancer context. Additional gene-based phosphosite analysis with merged collection of SNVs presented in Figure 2a and Results p7. About one third (21) of detected genes have phosphosite-specific mutations in multiple types of cancer (Figure 2a), potentially highlighting general cancer 2 driver mechanisms. As integration approaches are less explored in cancer genomics studies, most of our additional findings from the merged dataset analysis represent novel candidate cancer genes. Further, IRS1 (insulin receptor substrate 1) and GRM1 (glutamate receptor, metabotropic 1) are known drug targets (Knox et al., 2011) and therefore may be directly actionable for therapy development. KRAS in the only well-recognized cancer gene in the merged analysis and is listed due to less-than-expected number of pSNVs (FDR p = 4.6 × 10−3 from ActiveDriver), indicating the role of phosphorylation signaling in its oncogenic activities. These results illustrate the utility of data integration from multiple cancer genomics projects.

Response
We have improved Figure 2a as suggested by the reviewer, and discussed the results as mentioned above. A separate panel of Figure 2a now indicates additional genes that were found as statistically significant according to ActiveDriver analysis of the merged set of cancer mutations. In addition, we have added an extra panel below gene scores to show the cancer types in which phosphosite mutations are seen in detected genes. This highlights genes relevant in multiple cancer types. We still maintain the analysis of the individual cancer types, as clinical colleagues we've discussed our work with are very interested in cancer type specific results, thus these results are also valuable.

Response
This is an excellent point. We have added the suggested justification to the introduction of our manuscript.

Manuscript changes
Updated manuscript in Introduction p3.
Of particular interest are driver mutations that occur in multiple cancer types, for instance the druggable BRAF V600E mutation in melanoma and hairy cell leukemia (Tiacci et al., 2011;Chapman et al., 2011). Targeted drug development for such mutations may lead to effective multi-cancer therapies.

Reviewer #1 -comment 4
Reviewer's comment 4. It is not clear to me why the author's presented a novel statistical method to find phospho-sites. Finding phospho-sites should be a straight-forward application of the mutational significance calculation developed by the group at the Broad, 3 using the phospho-sites as your target set, rather than exomes or whole genomes. At least that is the expected approach. The authors need to justify taking a novel approach, since this is the norm in the field.

Response
We would like to thank the reviewer for this comment. We have now clarified why a novel method is required.
The main reason is that our method has different and novel aims. Previous methods, like MutSig, are global in that they seek genes whose mutation rate is considerably different than global background rate estimated from genomes or exomes. Our method (now called ActiveDriver) is different because is genecentered, that is, it compares a particular site on a gene to the background of a single gene. Thus, frequently mutated genes with no special phosphosite mutations are not detected by our method, while those with infrequent albeit specific pSNVs may gain statistical significance. In addition to above, we consider different types of information relevant for a gene-centered approach that studies mutations in functional sites. In particular, our model focuses on missense mutations and considers protein structure (disordered regions, as pointed out by the second reviewer, see comment 1 of Reviewer #2) and phosphosite position (number and distance of phosphosites with respect to mutations). These are modeled as confounding factors using a generalised linear regression statistical approach. MutSig does not seem to consider any protein or site level information and apparently uses a simpler binomial statistic based approach.
To test the effect of these differences on our results, we implemented a MutSiglike global analysis of missense mutations using binomial statistics, and tested all relevant genes in a cancer type-specific manner. We concluded that most genes highlighted by ActiveDriver, including several well-known cancer genes, would not be considered significant according to this global strategy. Results of this analysis are now shown in Supplementary Figure 4. Unfortunately, we could not use MutSig directly, since it is not publicly available. We contacted the authors who told us they would not share it until it was separately published, but they have no timeline for this.
Note: We assume that the reviewer means 'finding significantly mutated phosphosites' rather than 'finding phosphosites'.

Manuscript changes
Add section of method comparison to Results pp8-9. Update Supplementary Figure 4 to include global significance estimation of ActiveDriver results.
Conventional cancer genomics analysis focuses on frequently mutated cancer genes and ignores the long tail of genes with rare mutations. In contrast, our 'gene-centric' model considers each individual gene and detects signaling sites whose mutations are unexpected given the gene-wide mutation rate. We therefore find many genes that harbor infrequent, albeit highly specific mutations that are missed when considering mutation frequency alone ( Figure 2c). In particular, our results include nine known cancer genes not found among the top 58 genes ranked by mutation frequency (median rank 319, Supplementary Figure 3 . As all these were also found with ActiveDriver, we conclude that modeling protein disorder and phosphosite position is important for estimating cancer gene significance. Our method is therefore more sensitive than standard approaches, as it highlights eleven additional cancer genes and many novel candidates with highly specific phosphosite mutations.

Reviewer #1 -comment 5
Reviewer's comment 5. The data on TP53, EGFR and PTEN are worth presenting as examples demonstrating the utility of your approach. They are not inherently interesting as novel findings, so less attention should be paid to them.

Response
We have now made the manuscript more concise and emphasize novel data. First, we have highlighted the mosaic nature of somatic mutations in TP53 and their association to phosphosites. We also show that phosphosite mutations in TP53 are correlated with increased patient survival in two cancer types, now also when comparing patients with wildtype TP53 (Results pp9-10, Figure 3b-c). Second, we discuss the finding of 11 recurrent glioblastoma mutations at EGFR V289 that are part of a poorly characterised phosphosite in the extracellular domain at T290. Extracellular phosphorylation is little studied to date, but has gained more attention recently, for instance in a recent paper in Science (PMID: 22582013). Our finding suggests that phospho-targeted experiments at T290 may help researchers interpret this highly recurrent mutation in brain cancer. (Results p12, Figure 5b-c). These results are novel, to our knowledge.

Manuscript changes
Made sections on TP53 and EGFR more concise (Results pp9-10, p12  Figure 3b). The survival correlation is evident even in comparison to patients with wildtype TP53, suggesting that such pSNVs might be beneficial for tumor suppression. A similar correlation between pSNVs and better prognosis is seen among glioblastoma patients with long-term survival (more than one year), although its statistical significance is low due to small sample sizes (log-rank test p = 0.066, Cox regression p = 0.17, Figure 3c). In agreement with these data, phosphorylation of T155, S183, S269, T284 by casein and aurora kinases has been associated with TP53 inhibition via post-translational degradation and transcriptional repression (Bech-Otschir et al., 2001;Wu et al., 2011). The highlighted pSNVs in 118 patients potentially inhibit phosphorylation of these sites, meaning that TP53 will no longer be degraded if mutated. Taken together, our data suggest a double negative mechanism in which phosphorylation-mediated inhibition of TP53 is potentially disrupted by pSNVs, leading to reduced inhibition of TP53 tumor suppressor function and increased survival. In contrast to the above mutation hotspots, phosphosites in TP53 termini appear as highly significant mutation deserts (3 pSNVs observed, 67 ± 8 expected, p = 2.8×10−30 from ActiveDriver). The observed negative selection indicates the importance of these regions as tumorigenic signaling interfaces. The N-terminus of TP53 is the interaction interface of its primary inhibitor MDM2, and TP53 phosphorylation S15 and S20 inhibits MDM2 binding and leads to stabilization and activation of TP53 (Chehab et al., 1999). The absence of mutations in the region suggests that the sequence is required for successful docking of MDM2 and inhibition of apoptosis. This example demonstrates the utility of ActiveDriver in interpreting somatic mutations in known cancer genes. EGFR results (p12): Analysis of kinase-centric modules provides phosphorylation-associated interpretation of recurrent mutations in well-established cancer genes. For example, the majority (56% = 29) of EGF receptor (EGFR) missense point mutations associate with phosphorylation and appear in a tissue-specific, mutually exclusive pattern (Figure 5b). The kinase domain of EGFR is characterized by 15 lung cancer pSNVs (FDR p = 4.9 × 10−8 from ActiveDriver), including the well-studied L858R mutation that affects EGFR autophosphorylation and is used as a clinical marker for therapeutic outcome (Sharma et al., 2007). In contrast, 14 SNVs in glioblastoma are associated to EGFR phosphorylation sites, suggesting that the mutations play a role in posttranslational activation this oncogene. Eleven pSNVs alter the residue A289 that directly flanks a novel putative phosphosite T290 (Ruse et al., 2008) in the extracellular domain of the protein (FDR p = 8.5x10−16 from ActiveDriver, Figure 5c). While extracellular phosphorylation mechanisms are generally poorly understood, a recent study has characterized a novel family of secreted kinases with role in human disease (Tagliabracci et al., 2012). Further study of the role of phosphorylation at this site may help explain the mechanism of highly recurrent glioblastoma mutations.

Response
We thank the reviewer for highlighting this relevant paper. This paper focused on studying recurrent mutations across 441 cancers of multiple types in a subset of 1507 genes. Genes with recurrent mutations were highlighted and followed up with comparison to other types of mutations (copy number), gene family analysis, drug target analysis and an experimental study of the effect of MAP2K4 mutations on growth phenotypes in cell lines. Our analysis is similar, but introduces a number of novel computational methods useful and reusable in other cancer genomics projects. We now have made an effort to include additional analysis for novel findings, including a comprehensive drug target analysis, although we have decided to focus on missense mutations and leave integration of other mutation data and experimental follow up to future work.

Reviewer #1 -comment 7
Reviewer's comment 7. Some results that should be present in your study are structural models. If your findings are convincing, then they should be enriched in homologous sets of sequences, like they are for GRM and BAI in the Kan study above. They should also be convincing at the level of protein structure. The leap to pathway integration is not the immediate next step and there are likely more specific actionable findings that can be explored this way. If FLNB or PRKCZ are truly driving, you should be able to marshal more support using this data.

Response
Thank you for the recommendation, we have now found additional evidence from protein family analysis to support our findings of cancer gene candidates.
To investigate the presence of SNV in homologous proteins, we retrieved protein family members of highlighted candidate cancer genes (PRCKZ, FLNB, POU2F1, GRM1) from the Ensembl database, and conducted pSNV enrichment analysis using the Poisson statistical tests as in our pathway enrichment analysis. We found that homologs of PRKCZ are enriched in pSNVs (n=15 samples, p=2.30x10-13), but also all non-specific SNVs are more frequent than expected (n=27, p=1.98x10-9). In addition we found that the family of metabotropic glutamate receptors (paralogs of GRM1) is generally enriched in SNVs (n=13, p=3.2x10-3).

Manuscript changes
We have added these arguments into our manuscript in Results p8, p13.

Reviewer's comment
This manuscript describes a very interesting new method (PhosphoDriver) to identify novel cancer driving genes from existing SNV data. There are several pitfalls in the statistics described in this analysis that I will point out below. In addition, the manuscript would greatly profit from more characterization of novel cancer driving genes and less repetition of known facts. As general comment, the text is disturbed by a large number of outcomes of statistical tests. I would suggest to create some supplementary tables where these could be found by interested readers in order to make the text more readable.

Response
Thank you for the positive remarks. We have now addressed the statistical comments, made the text more concise, and added more information about our novel findings. Specific actions are described in the points below.

Reviewer #2 -comment 1
Reviewer's comment 1) Phosphorylation sites are more prominent in unstructured/loop regions of proteins. As these parts of proteins are less restricted spatially they also tend to evolve faster than structured regions. Based on this observation I would expect that phosphorylation sites would also evolve faster than the proteins overall and that more SNV occur in these regions. Thus taking as background the protein-coding sequence length of all involved cancer samples (page 11) would bias the result in such a way that the enrichment of SNVs in phosphosites would be overestimated. The authors should rather split the regions into structured/unstructured regions and take the correct parts of protein sequences as background distribution.

Response
This is an excellent suggestion. We have now updated the ActiveDriver method (previously called PhosphoDriver) and incorporated protein disorder as a confounding factor into the Poisson regression model (Results p6, Figure 2a, Methods pp18-19). Structured and disordered protein sequence regions, as predicted by the DISOPRED2 method, are now considered separately using distinct mutation rates. While the number of significantly phosphomutated genes is now smaller (58 vs. 144 earlier), we still find a number of supporting enrichments such as known cancer genes (n=15, p=1.1x10-11), transcription factors (n=15, p=3x10-4), kinase proteins (n=9, p=3.8x10-5) and drug targets (n=14, p=4.2x10-3). These observations validate our updated model and suggest that we are finding many genes imporant in cancer biology.

Manuscript changes
Updated ActiveDriver model to incorporate protein disorder information as a confounding factor (Results p6, Methods pp18-19). Recomputed set of significantly phospho-mutated genes and presented these in Figure 2a. Reevaluated enrichment statistics of significantly phospho-mutated genes (Results p6).
Results p6: ActiveDriver considers information on pSNV position within a phosphosite, protein structured and unstructured regions and cancer type specific mutation rate. [..] Second, post-translational modifications are known to occur in unstructured (disordered) regions of proteins, and such regions are also believed to evolve more rapidly than structured regions. Therefore we consider separate mutation rates for disordered and order and ordered protein regions, and encode 8 this information as confounding factors in the null and alternative regression models.

Methods pp18-19: The null hypothesis was expressed as the following interceptonly model h0 :E(Y)=exp(β0 +β(s) +X(s)), in which X(s) is set to one if the sequence position corresponds to disordered protein sequence, and equals zero if the corresponding region is structured (non-disordered)
. According to our alternative hypothesis, the mutations in the phosphosite region q are generated by rates λ that are significantly different from the baseline rate λ0 while considering protein disorder: h1 :E(Y|X)=exp(β0 +β(s) +X(s) +β(q) +X(q)).

Reviewer #2 -comment 2
Reviewer's comment 2) Page 14: "the enrichment and depletion of phosphosite mutations was contrasted to the gene-wide mutation rate in two one-tailed tests, and the better p-value was selected as the outcome". If there is no prior hypothesis for depletion or enrichment, a twotailed test should be done. In the way the authors have performed their analysis, p-values are overestimated.

Response
Thank you for pointing out the mistake in computing these p-values. We have now replaced the previous strategy with a single two-tailed test, and updated Results p9 and Methods p15 accordingly.
The test was only used in a comparative analysis between ActiveDriver and a simplified gene-centric strategy to find significantly mutated phosphosites. The two-tailed test expectedly provided fewer significant genes in the gene-centric analysis (only EGFR, TP53, IDH1, KRAS, FLNB were found with FDR p ≤ 0.05), therefore strengthening our conclusions regarding the ActiveDriver method.

Manuscript changes
Replaced the previous strategy with one binomial two-tailed test. Updated information regarding benchmarks in Results p9.
We further re-implemented ActiveDriver using binomial statistics in place of our disorder-corrected regression model, with all other factors unchanged, to test the effectiveness of our model versus standard methods. This simplified strategy only found five highly mutated genes as statistically significant (EGFR, TP53, IDH1, KRAS, FLNB; FDR p ≤ 0.05), the first four of which are cancer genes (p = 2.0 × 10−6, Fisher's exact test). As all these were also found with ActiveDriver, we conclude that modeling protein disorder and phosphosite position is important for estimating cancer gene significance. Our method is therefore more sensitive than standard approaches, as it highlights eleven additional cancer genes and many novel candidates with highly specific phosphosite mutations.

Reviewer #2 -comment 3
Reviewer's comment 3) Page 14: "The total gene-based significance was computed as the product of p-values from all phosphosites of a gene". This is an incorrect way of comparing two groups of values, the authors should have rather done a t-test to test significance in one go between the two groups. 9

Response
We agree that computing a product of site-specific p-values is not statistically rigorous, as the requirement of p-value independence is not met (background mutation rates for different sites are partially dependent). However we believe that such scoring is still useful to identify site-specific significance, and we present four arguments to justify our approach. 1. We only consider significant sites (p ≤ 0.05) in the gene-based p-value product, thus non-significant sites do not contribute towards the score. This was documented in the Methods, and we have now clarified this in the Results.
2. Comparing two groups of values (phosphosites vs. non-phosphosites), as recommended by the reviewer, provides less resolution for probing phosphosite mutations. A simple two-group test compares the average mutation rate of all phosphosites with the mutation rate of the remaining protein sequence. This assumes that all phosphosites, often covering a considerable fraction of the protein, are functionally important, and tends to 'average out' signal from few sites where mutations indeed indicate functional significance. As an extreme example, our manuscript discusses TP53 in which certain phosphosites are mutation hotspots while others are depleted of mutations. In the two-group analysis, the TP53 hotspot-coldspot mosaic would be averaged out. To test the proposed two-group comparison, we conducted our Poisson regression exactly as described, except that we merged all phosphosites simultaneously as a single factor in the alternative model. (Note that the proposed t-test is not appropriate here because low mutation counts are not normally distributed; t-test also prohobits use of confounding factors such as disorder). The results suggest that a naive two-group test fails to find genes found by site-specific p-value products. We only found five frequently mutated cancer genes using this approach (IDH1, EGFR, CTNNB1, KRAS, TP53; FDR p≤0.05). 3. We also tried a more complex approach where individual sites were detected first using the site-specific analysis presented in the manuscript. Then a twogroup comparison using Poisson regression was conducted such that only the significant sites were merged as a single factor in the alternative model. While this model is not prone to averaging from non-mutated sites, it is clearly less powerful in detecting genes with mixed types of mutations (direct, proximal flanking, distal flanking) in several sites. Specifically, this two-group analysis identified 17 (29%) of genes shown in the manuscript and found no additional genes (FDR p≤0.05; BOD1L, TP53, CCDC165, HTATSF1, G6PD, IDH1, EGFR, RBM27, FLT4, PKN3, RELA, ABL1, FLNB, PRRC2A, KRAS, CTNNB1, TMCC2). As an example, the tumor suppressor PTEN has direct (Y68C), proximal flanking (R173C, F241L) and distal flanking (T131A, G132D, R159K, T160I, T167S, T167S) phosphosite mutations in different regions of the protein (glioblastoma TCGA dataset). Our site-specific scoring in ActiveDriver highlights PTEN as significant (p=8.8x10 -4 , FDR p=0.041), however the twogroup model does not (p=0.015,FDR p=0.18). Similarly, the two-group test failed to highlight 13 of 14 candidate cancer genes that are only found in the sitespecific analysis of merged somatic mutations from merged cancer datasets (see comments 1-2 of Reviewer #1). This indicates that the power of gene-wide twogroup test is reduced, as more complex models are needed to reflect phosphosite mutation specificities of all sites simultaneously, while sparsity of mutations easily leads to overfitting, reducing statistical significance. 4. Finally, we attempted to model phosphosites with separate factors (a multigroup analysis) and used forward model selection to define a final, optimal 10 alternative hypothesis. The initial alternative hypothesis included one factor corresponding to the top-ranking site from the site-specific analysis as well as protein disorder. The selection space of additional factors included all other phosphosites as well as variables encoding direct and flanking regions of mutations. Details of the Poisson regression and significance estimation remained consistent with the original analysis. The multi-group analysis provided comparable results with the above two-group study, revealing 10 (17%) of genes presented in the manuscript and finding no additional genes compared to our proposed ActiveDriver approach (FDR p≤0.05; TP53, CCDC165, IDH1, RBM27, EGFR, FLNB, KRAS, CTNNB1, HTATSF1, TMCC2). In summary, we claim that our proposed strategy of site-specific significance estimation and p-value products is a more powerful method that better reflects the complexity of phosphosite mutations compared to two-group tests.

Reviewer #2 -comment 4
Reviewer's comment 4) Page 15: "We developed a greedy algorithm to search the kinasesubstrate network for kinase-substrate interaction modules that cover pSNVs that are maximally informative of a given clinical outcome." This method will completely overtrain on the examples. The authors should have kept a training and testing set separately, define the informative modules on the training set and report clinical outcome on the test set.

Response
This is a fair point. We present two arguments to justify our method. 1) Use of training and test sets to find modules. Our current method is limited to clinically motivated network search and exploration rather than classification and/or predicting clinical outcome or previously unseen mutations. We added discussion regarding this limitation to Results p14. The major reason for this limitation is data sparsity -most pSNVs are seen in very low numbers (often single mutations), restricting the predictive power of any computational clinical test. Another reason relates to topological properties of the kinase-substrate network -the network has a small diameter, meaning that most proteins are connected to one another within few interactions, and sparse mutations in neighbouring proteins are not sufficient classifiers. Our original goal was the interpretation of Gene Ontology categories and pathways for pSNV-associated clinical correlations; however no significant signals were identified. The network search approach was designed to find significant correlations using network information that would otherwise not be found using pre-defined gene sets or pathways.
2) Use of permutation tests to adjust for overtraining. Briefly, in our original manuscript we implemented a permutation test to account for the potential overtraining of our method. Using topologically identical networks with randomly reshuffled nodes, we estimated the expected signal of survivalassociated modules and used this as a null distribution to empirically evaluate the significance of modules observed in the true network. This enables us to only consider highly significant results. Please see our response to Comment 11 of Reviewer #2.

Manuscript changes
Added discussion regarding the limitations of predictive power to Results p14. While the output of our network search algorithm is not directly applicable for predicting clinical outcome, as it is challenged by infrequent mutations and small-world property of interaction networks, it is useful as an exploratory tool 11 that helps discover and interpret rare, specific mutations in signaling networks that are significantly correlated with clinical outcome.

Reviewer #2 -comment 5
Reviewer's comment 5) Given that both small-scale and large-scale datasets are used it is not surprising that the most highly studied cancer genes involved in signalling come out from the algorithm. The authors should probably repeat the analysis based on unbiased large-scale data only, in order to not bias the whole study towards cancer genes from the beginning.

Response
Because we use ActiveDriver to perform mutation enrichment analysis on each cancer type separately, we already consider the effects of large and small studies. In other words, the biased background of a small study does not affect the results coming from any other study and the resulting genes are presented in a cancertype specific way (Figure 2a). Moreover, ActiveDriver involves a gene-centric model in which phosphosite-specific mutation rate is contrasted to a gene-wide mutation rate. As no global (genome-or exome-wide) mutation rate is included in the model, every gene is also tested independently. The only possible bias would come from multiple testing procedures with FDR, as large-scale datasets involve more genes and p-values are subject to more stringent correction. However FDR is also applied in a cancer type-specific manner and excluding small-scale datasets does not impact this aspect. To study this further, we constrained our resulting gene lists to only those originating from large-scale studies (ovarian, pancreatic, glioblastoma, 18 genes and 389 samples) and tested the enrichment of known cancer genes (p=9.5x10-4; n=4), transcription factors (p=0.024, n=5), protein kinases (p=0.014, n=3), and drug targets (p=0.12, n=4). As we capture important genes by only analysing large-scale data, we suggest that there is little, if any, bias coming from the inclusion of small-scale studies.
In the revised manuscript, we have included a pan-cancer analysis by pooling together mutations from all nine datasets (as suggested by Reviewer #1 in comment 1, see Figure 2a). ActiveDriver analysis of the pooled dataset revealed 14 additional genes with significant phosphosite mutations that remained unseen in individual analyses. As 13 of these are novel candidate cancer genes, we suggest that the reviewer's concern regarding the bias towards known cancer genes is not well justified.

Reviewer #2 -comment 6
Reviewer's comment 6) As the authors consider SNVs that alter kinase domains (page 4) as pSNVs and there is a large class of drugs directed against kinases, it is not surprising that pSNVs are enriched in drug targets.

Response
This is a good point, but we had already considered this in the original analysis.
In the enrichment test of pSNVs among known drug targets, we only considered phosphosite-associated missense mutations (pSNVs) and disregarded mutations in kinase domains. We apologize for the confusion and have now clarified our statistical statement.

Manuscript changes
Clarified analysis in Results p5.
Finally, genes with phosphosite pSNVs are enriched in known drug targets from the DrugBank database (n = 109, p = 2.5 × 10−11, Knox et al., 2011). 12 Druggable genes with pSNVs cover nearly two thirds (284) of phospho-mutated cancer samples and 36% of all cancer samples, indicating the potential for therapy development targeting affected genes.

Reviewer #2 -comment 7
Reviewer's comment 7) Page 6: Could the authors describe in more detail what would be the effect of the pSNVs on the function filamin-B?

Response
Thank you for highlighting the lack of detail in this section. We have conducted a literature analysis of FLNB function and proposed some potential effects of FLNB pSNVs on tumor biology. FLNB is now discussed in a separate section dedicated to novel candidate cancer genes (GRM1, POU2F1).

Manuscript changes
Discussion on FLNB and other proposed candidates has been extended in Results p7-8.
Our findings also include genes whose cancer-specific roles are less recognized. The highest-ranking candidate cancer gene is filamin-B (FLNB, FDR p = 5.4 × 10−7 from ActiveDriver) that has been highlighted in the breast cancer genome project due to frequent mutations (Sjoblom et al., 2006). All four SNVs in breast cancer (A1565G, T703K, N663K, R566Q) alter phosphosite flanking regions, although evidence for phosphorylation events comes from large-scale screens and no targeting kinases are currently known. FLNB is an intracellular signaling protein involved in organization of actin cytoskeleton as well as skeletal and neuronal development (Lu et al., 2007). In particular, knockdown of FLNB has been shown to inhibit VEGF-induced angiogenesis (Del Valle-Perez et al., 2010), a hallmark of tumor cells. Further, knockdown of filamin genes has been shown to reduce cell migration in cancer cell lines (Baldassarre et al., 2009), and germline variants observed in human developmental disorders have been linked to gain-of-function phenotypes manifested in increased F-actin binding (Sawyer et al., 2009). We therefore speculate that the observed pSNVs may also act as gain-of-function mutations that enhance angiogenesis, invasion or metastasis of tumor cells. The G-protein coupled receptor GRM1 (glutamate receptor metabotropic 1) is identified in the merged dataset analysis of pSNVs due to two flanking pSNVs in glioblastoma (R684C) and colorectal cancer (R696W) (FDR p = 0.047 from ActiveDriver). The latter mutation directly flanks a phosphorylation site of protein kinase C alpha (PRKCA) at T695 that is involved in receptor desensitization in a feedback loop (Francesconi & Duvoisin, 2000). We propose that the pSNV disrupts inhibition of the receptor activity, leading to aberrant activation of tumorigenic processes such as growth, survival and proliferation via the down-stream phosphatidylinositol 3-kinase (PI3K) pathway. Overexpression of GRM1 was shown to induce melanoma in mouse models (Pollock et al., 2003), and GRM1 mutations were linked to melanoma in a human genetic association study (Ortiz et al., 2007).  (Segil et al., 1991). POU2F1 is highlighted in our model for two pSNVs in glioblastoma (R296Q) and in breast cancer (S111F). The latter mutation directly modifies a phosphosite targeted by PRKDC kinase, involved in promoting cell survival in response to DNA damage (Schild-Poulter et al., 2007). The detection of many known cancer genes validates our approach and the additional highlighted genes, some known to be druggable, provide novel hypotheses for detailed, functional experiments.

Reviewer #2 -comment 8
Reviewer's comment 8) Page 6: I would say G6PD is not a completely novel cancer driver. There are now several publications describing the role of G6PD in cancer. As G6PD is an enzyme in the pentose phosphate pathway (PPP) and the major source of NADPH and NADPH maintains the level of glutathione that helps protect against oxidative damage, this could be a mutation that is involved early in the cancer process, allowing by oxidative damage more mutations to occur. Not only that, in cancer cells primarily glycolysis is used for ATP production (Warburg effect), in this way cancer cells can direct glucose to biosynthesis. P53 normally suppresses the PPP by binding to G6PD, but P53 mutants lack this inhibitory activity (Jiang et al, Nat Cell Biol 2011). The G6PD phosphomutant could have the same effect. In order to strengthen this manuscript, this would need to be confirmed by an experiment studying the activity of the G6PD mutant enzyme.

Response
As G6PD has been linked to cancer in multiple publications, we have reduced the emphasis on this result.

Reviewer #2 -comment 9
Reviewer's comment 9) Page 7: the pathway analysis may reveal pathways that were not found by studying the overall occurrence of SNVs. However, the authors do not name any novel pathways or complexes nor novel explanations. They could therefore remove the whole paragraph.

Response
Our pathway analysis did identify some novel results which we have now further emphasized and clarified. 1. We provide detailed discussions of two protein complexes (HCF1, ASF1) whose phosphosite mutations appear as novel candidate cancer drivers (Results pp10-11). We have also visually highlighted drug targets as actionable findings (Figure 4b,c,d).
2. While our findings expectedly associate to hallmark cancer properties such as immune suppression, we do find novel aspects. Most pSNVs do not occur in canonical cancer genes/pathways and are therefore less understood. On the other hand, pSNVs that do appear in canonical cancer genes provide novel phosphorylation-associated interpretations. Most of our results appear as statistically significant combinations of rare mutations affecting the same system in multiple cancer types. Many of these mutations also occur in druggable genes, suggesting possible therapeutic use. We now discuss this in Results pp10-11.

Manuscript changes
Discuss novelty of rare pSNVs affecting components of the same underlying system, and the relation to potential drug therapies (Results pp10-11). Discuss two novel protein complexes with putative cancer driver function (Results pp10-11, Figure 4d). Highlight drug targets in detected pathways and complexes 15

Reviewer #2 -comment 11
Reviewer's comment 11) Heuristic network search. As stated above, I doubt that the statistics of this analysis has been performed correctly.

Response
We have revised the sections of our manuscript that describe our analysis to clarify the statistics (Results pp13-14, Methods p22-23) and hope that the reviewer is convinced that these have been performed correctly. We have also added a new figure as Figure 6c to illustrate the significance estimation procedure of the top survival-associated module using empirical p-values computed from random networks. In brief, to correct for the overtraining property of our network search method, we employ a permutation test that allows us to estimate the null distribution of survival correlation in a particular network neighbourhood. To assess the significance of a given module detected in the true network, we compare its survival correlation (the p-value from Cox regression) with the empirical distribution of Cox p-values gained from 10,000 searches in random networks (using the same search method). The significance of a given module is the fraction of p-values from 10,000 random searches that are lower than the module's p-value. As a particular network neighbourhood may result in multiple modules, empirical p-values are additionally corrected for multiple testing and only significant modules are reported (FDR p≤0.05). The random networks for permutation testing are constructed for every individual network neighbourhood (sub-network of paths of length two originating from a 16 central node). The topology of each random network is identical to the original network, but the network nodes (genes/proteins with associated pSNVs) are shuffled globally across the whole kinase-substrate network. Particular genespecific sets of mutations are also left intact. This means that permutation disrupts the topological proximity of survival-associated proteins, but retains the survival correlation of each individual protein.
Our permutation strategy controls for the bias introduced by highly interacting proteins (hubs), because hub-centered network neighbourhoods cover most of the global kinase-substrate network and permutations likely include many survivalassociated proteins, leading to high p-values even in random networks. The final modules we report are significant after considering all of these statistical factors.

Manuscript changes
Updated method descriptions in Results p10 and Methods p17. Added Figure 6c to illustrate estimation of significance with permutation testing. Reviewer #2 -minor comment 1 18

Reviewer's comment
This is an extremely well-written paper on a topic of critical importance, namely the understanding of how somatic mutations are a)selected, b)identified in the context of driver mutations, and c)perhaps most importantly -how these mutations are manifest in the functional signaling architecture. This linkage is most important because a vast majority of the targeted therapy pipeline are directed to kinase and protein enzyme inhibition-mostly phosphorylation driven.

Response
Thank you for these positive comments.

Reviewer's comment
The only significant weakness of the report, which is mainly an informatic analysis of public database sets is that the phosphoprotein/phosphosite databases are likely highly biased and is likely incomplete. This is because these databases are derived from a)large tumors whereby enough material was available for unbiased phosphoproteomic analysis, and thus small highly malignant tumors are not likely well-represented in these databases. b)contaminated with cellular compartments derived from stromal cells, infiltrating immune cells, tumor and non-tumor epithelium, endothelial cells, etc. Consequently, tying the somatic mutational load with phosphoproteins that may come from varied and unknown cellular compartments could be problematic. The authors need to address this issue and discuss how there results are not impacted by these issues, or if these issues limit the impact of their findings.

Response
This is a very good point. We indeed study a collection of public phosphoproteomic datasets originating from various healthy and diseased cell types and tissues. This represents the set of knowledge of as many phosphosites as we can collect. Admittedly, not all of these phosphosites are relevant to cancer. For instance, they may not be present in a given cancer type. Further, a cancer SNV that does not directly affect the phosphorylated residue has an unknown effect on the phosphosite -it could disable the site, strengthen it or have no effect. One advantage of our method is that it assigns a higher score to phosphorylation sites that are mutated multiple times in different samples and also mutated in multiple sites within the same protein. Thus we assume that the highest scoring proteins from our analysis are more likely to be cancer relevant due to the repeated and statistically significant mutation pattern. We also agree that analysis of the current cancer genomics studies is biased to certain types of tumors. Our method is general and now that it is developed, it can be automated and applied to new cancer genomics studies as they become available. We have now expanded our discussion to include these points.

Manuscript changes
Extended discussion on method limitations in Discussion p15: Second, our analysis treats all phosphosites equally, but phosphorylation is only functional in a cell if the appropriate signaling machinery is properly coexpressed and co-localized. Also, our phosphorylation site dataset is derived from a collection of publications, represents a mixture of different cell types, tissues and experimental conditions, and is not specific to the cancer types we study. Cancer-specific phosphoproteomics experiments will hopefully be available in the future to address this limitation. One advantage of our method is Acceptance letter 06 December 2012 Thank you again for sending us your revised manuscript. We have heard back from reviewer #1 and #2 who are satisfied with the modifications made and are now fully supportive. I am pleased to inform you that your paper has been accepted for publication.
Thank you very much for submitting your work to Molecular Systems Biology. Sincerely,

Editor Molecular Systems Biology
-------Reviewer #2 (Remarks to the Author): The manuscript has been greatly improved. The statistical methods are much better developed than in the previous version. Further improvements have been made by emphasizing more novelty and less known biological facts.