Robust genetic interactions in cancer are enriched in protein-protein interaction pairs

Genetic interactions, where the mutation of one gene can be associated with altered sensitivity to the inhibition of another gene, can be exploited for the development of targeted therapies in cancer. Many large-scale genetic perturbation screens in tumour cell lines have been used to identify such genetic interactions, including both synthetic lethal and resistance relationships. Despite these efforts, relatively few of the candidate genetic interactions identified have been reproduced across multiple studies. Here, we develop a computational approach to identify genetic interactions that can be reproduced across independent experiments and across non-overlapping cell line panels. We identified 220 such robust genetic interactions and found that they are enriched for gene pairs whose protein products physically interact. This suggests that protein-protein interaction networks may be used not only to understand the mechanistic basis of genetic interaction effects, but also to prioritise robust candidates for further development.


Introduction
Large scale tumour genome sequencing efforts have provided us with lists of driver genes that are recurrently altered in tumours 1 . In a small number of cases, these genetic alterations have been associated with altered sensitivity to targeted therapies in cancer. Examples of such targeted therapies already in clinical use include approaches that exploit oncogene addictions, such as the increased sensitivity of BRAF mutant melanomas to BRAF inhibitors 2 , and approaches that exploit non-oncogene addiction or synthetic lethality, such as the sensitivity of BRCA1/2 mutant ovarian or breast cancers to PARP inhibitors 3 . An ongoing challenge is to associate the presence of other driver gene alterations with sensitivity to existing therapeutic agents 4,5 or to identify additional proteins whose inhibition may provide therapeutic benefit to patients with specific mutations (i.e. new drug targets). Towards this end, multiple groups have performed large-scale loss-of-function genetic perturbation screens in panels of tumour cell lines to identify vulnerabilities that are associated with the presence or absence of specific driver gene mutations (i.e. genetic interactions) [6][7][8][9][10][11] . Others have performed screens in 'isogenic' cell line pairs that differ only by the presence of a specific oncogenic alteration 12,13 . Despite these large-scale efforts, very few genetic interactions have been identified in more than one study (recently reviewed 14 ). Even in the case of cancer driver genes subjected to multiple screens, such as KRAS, few genetic interactions have been identified in more than one screen 15,16 . This lack of reproducibility may be due to technical issues, e.g. false positives and false negatives due to inefficient gene targeting reagents 17 , and/or real biological issues, such as the context specificity of genetic interactions 14 . 
We refer to those genetic interactions that can be reproduced across multiple screens and across distinct cell line contexts as robust genetic interactions. Given that tumours exhibit considerable molecular heterogeneity both within and between patients there is a real need to: (i) identify robust genetic interactions that can be reproduced across heterogeneous cell line panels, reasoning that these reproducible effects will be more likely to be robust in the face of the molecular heterogeneity seen in human cancers; (ii) prioritise these robust genetic interactions for further therapeutic development; and (iii) understand the characteristics of robust genetic interactions in cancer as a means to predict new therapeutic targets.
To achieve this, we developed, and describe here, a computational approach that leverages large-scale cell line panel screens to identify those genetic interactions that can be reproducibly identified across multiple independent experiments. We found 220 such reproducible genetic interactions. In investigating the nature of these robust genetic interactions, we found that they are significantly enriched among gene pairs whose protein products physically interact. This suggests that incorporating prior knowledge of protein-protein interactions may be a useful approach to guide the selection of reproducible "hits" from genetic screens as candidates worth considering as therapeutic targets in cancer.

A "discovery and validation" approach to the analysis of loss-of-function screens identifies reproducible genetic dependencies
We first wished to identify genetic interactions that could be independently reproduced across multiple distinct loss-of-function screens. To do this, we obtained gene sensitivity scores from four large-scale loss-of-function screens in panels of tumour cell lines, including two shRNA screens (DRIVE 6 , DEPMAP 7 ) and two CRISPR-Cas9 mutagenesis screens (AVANA 8 , SCORE 11 ). We harmonised the cell line names across all studies, so they could be compared with each other and also with genotypic data 4,5 . In total, 917 tumour cell lines were screened in at least one loss-of-function study. Only 50 of these cell lines were common to all four studies while 407 cell lines were included in only a single study (Figure 1A). It is the partially overlapping nature of the screens that motivated the subsequent approach we took for our analysis. We used a 'discovery set' and 'validation set' approach to identifying genetic interactions across multiple screens: we first identified associations between driver gene alterations and gene inhibition sensitivity in the discovery study and then tested the discovered associations in the validation study (Figure 1B). However, to ensure that any reproducibility observed was not merely due to cell lines common to both datasets, we first removed cell lines from the validation dataset if they were present in the discovery dataset. For example, when using DEPMAP as the discovery dataset and AVANA as the validation dataset, we performed the validation analysis on the subset of cell lines that were present in AVANA but not in DEPMAP. In doing so, we ensured that any genetic interactions discovered were reproducible across different screening platforms (either distinct gene inhibition approaches, i.e. shRNA vs CRISPR, or distinct shRNA/CRISPR libraries) and also robust to the molecular heterogeneity seen across different cell line panels.
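The overlap-removal step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the analysis code used in the study: the function name `validation_only_lines` and the cell line names are hypothetical.

```python
def validation_only_lines(discovery_lines, validation_lines):
    """Return validation cell lines that were not screened in the discovery dataset."""
    return sorted(set(validation_lines) - set(discovery_lines))


# Hypothetical cell line names, for illustration only.
depmap_lines = ["LINE_A", "LINE_B", "LINE_C"]
avana_lines = ["LINE_B", "LINE_C", "LINE_D", "LINE_E"]

# DEPMAP as discovery, AVANA as validation: keep lines in AVANA but not in DEPMAP.
print(validation_only_lines(depmap_lines, avana_lines))
```

Swapping the two arguments gives the cell lines to use when the roles of the two datasets are reversed.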
Similar to our previous work 9,18 , we integrated copy number profiles and exome sequencing data to annotate all cell lines according to whether or not they featured likely functional alterations in any one of a panel of cancer driver genes 1 (see Methods, Supplementary Table 1). We then identified associations between driver gene alterations and sensitivity to the inhibition of specific genes using a multiple regression model that included tissue type as a covariate to reduce the possibility of confounding by tissue type (see Methods). We focused this analysis on 'selectively lethal' genes, i.e. those genes whose inhibition killed some, but not all cell lines (Methods, Supplementary Table 2). We analysed each pair of screens in turn and considered a genetic association to be reproducible if it was validated in at least one discovery/validation pair. Using this approach, we identified 229 reproducible genetic associations (Supplementary Table 3, Supplementary Figure 1).
Of these genetic associations, nine were 'self vs. self' associations, where the alteration of a gene was associated with sensitivity to its own inhibition. The majority of these 'self vs. self' associations indicated oncogene addictions, e.g. BRAF mutant cell lines were sensitive to BRAF inhibition and ERBB2 amplified cell lines were sensitive to ERBB2 inhibition (Figure 1A). Similarly, we also identified robust 'self vs. self' associations involving the CTNNB1 (β-Catenin), KRAS, NRAS, EGFR and PIK3CA oncogenes (Supplementary Figure 1B). However, we also identified two examples of 'self vs. self' dependencies involving tumour suppressors: TP53 (aka p53) and CDKN2A (aka p16/p14arf) (Supplementary Figure 1B). This type of relationship has previously been reported for TP53: TP53 inhibition appears to offer a growth advantage to TP53 wild type cells but not to TP53 mutant cells 19 . Consequently, we observed an association between TP53 status and sensitivity to TP53 inhibition. Similar effects were seen for CDKN2A (Supplementary Figure 1C). These 'self vs. self' dependencies, in particular the oncogene addictions, serve as evidence that our approach could identify well characterised genetic associations. However, as our primary interest was in genetic interactions between different genes, we excluded 'self vs. self' interactions from further analysis, leaving us with 220 robust genetic interactions (Figure 2A).

Many robust genetic interactions reflect known pathway structure
A number of the reproducible genetic interactions we identified have been previously reported, including both sensitivity relationships (such as increased sensitivity of PTEN mutant cell lines to inhibition of the phosphoinositide 3-kinase-coding gene PIK3CB 20 ) and resistance relationships, such as an increased resistance of TP53 mutant cell lines to MDM2 inhibition ( Figure 2A, Figure 2B).
Amongst the set of 220 robust genetic interactions, we identified two previously reported 'paralog lethalities', i.e. synthetic lethal relationships between duplicate genes (paralogs) [21][22][23] (Figure 2C). We found a robust association between mutation of the tumour suppressor ARID1A and sensitivity to inhibition of its paralog ARID1B 21 and also an association between mutation of SMARCA4 and sensitivity to inhibition of its paralog SMARCA2 22,23 . Both pairs of genes (ARID1A/ARID1B and SMARCA4/SMARCA2) encode components of the larger SWI/SNF complex 24 .
Some of the robust genetic dependencies could be readily interpreted using known pathway structures. For instance, many of the robust dependencies associated with the oncogene BRAF could be interpreted in terms of BRAF's role in the MAPK pathway. BRAF mutation was associated with increased sensitivity to inhibition of its downstream effectors MEK (MAP2K1) and ERK (MAPK1), and increased resistance to inhibition of the alternative RAF isoform gene CRAF (RAF1) and the MAPK regulator PTPN11 (Figure 3A, 3B) 25 . BRAF mutation was also associated with increased sensitivity to inhibition of PEA15, presumably a result of the requirement of PEA15 for ERK dimerisation and signalling activity 26,27 .
Mutation or deletion of the tumour suppressor RB1 (Retinoblastoma 1, Rb) was associated with increased sensitivity or resistance to inhibition of multiple Rb pathway members (Figure 3C, 3D). We found that RB1 loss was reproducibly associated with resistance to inhibition of its negative regulators CDK4 and CDK6, consistent both with the known Rb pathway structure and with preclinical data suggesting that RB1 mutation confers resistance to CDK4/6 inhibitors 28,29 . Rb is a negative regulator of multiple E2F transcription factors, and we found that RB1 loss was reproducibly associated with increased sensitivity to both E2F1 and E2F3 inhibition (Figure 3C, 3D). RB1 loss was also associated with robust sensitivity to inhibition of SKP2, a binding partner of Rb 30 first identified as an RB1 synthetic lethal partner in retinoblastoma 31 and more recently as a highly penetrant RB1 synthetic lethal partner in triple negative breast cancer 32 (Figure 3C, 3D). Finally, RB1 loss was reproducibly associated with increased sensitivity to inhibition of Cyclin Dependent Kinase 2 (CDK2), suggesting that it may be a useful biomarker for CDK2-specific inhibitors 33 .

Robust genetic interactions are enriched in protein-protein interactions
In seeking to understand what particular characteristics robust genetic interactions might have, we noted that many of the robust genetic interactions we identified involved gene pairs whose protein products operate in the same pathway (e.g. the Rb pathway) or protein complex (e.g. SWI/SNF), suggesting that genetic interactions between gene pairs whose protein products physically interact may be more robust than other genetic interactions. To test this hypothesis, we compared the robust genetic interactions we identified with protein-protein interactions from the STRING protein-protein interaction database 34 . We found that, when considering the set of all gene pairs tested, gene pairs whose protein products physically interact were more likely to be identified as significant genetic interactors in at least one dataset (Figure 4A) (Odds Ratio (OR) = 4.0, p<2×10⁻¹⁶, Fisher's Exact test). Furthermore, of the genetic interactions identified as significant in at least one dataset, those that are supported by a protein-protein interaction were significantly more likely to be reproduced in a second dataset (Figure 4A) (OR = 3.9, p<1×10⁻¹³). We therefore concluded that protein-protein interaction pairs are more likely to be significant hits in one dataset and even more likely to be reproduced across multiple datasets, suggesting this might be a feature of robust synthetic lethal effects.
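As a worked illustration of the enrichment statistics quoted above, the sketch below computes an odds ratio and a two-sided Fisher's Exact test p-value for a 2x2 contingency table (rows: gene pair has a protein-protein interaction or not; columns: gene pair is a significant genetic interaction or not). It is a standard-library sketch, not the code used in this study; the function names are hypothetical, the counts are toy values chosen for readability rather than data from the paper, and `math.comb` assumes Python 3.8 or later.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Sample odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's Exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed table.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


# Toy counts (NOT from the paper):
# 8 interacting pairs are hits, 2 are not; 1 non-interacting pair is a hit, 5 are not.
print(odds_ratio(8, 2, 1, 5))                        # 20.0
print(round(fisher_exact_two_sided(8, 2, 1, 5), 4))  # 0.035
```

In practice a library routine such as `scipy.stats.fisher_exact` would normally be used instead of a hand-written test.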
We noted that a large number (n = 132) of robust genetic interactions involved TP53, presumably as a result of the high number of TP53 mutant tumour cell lines in the datasets (and its high mutation frequency in human cancer) and the associated increased statistical power to detect TP53-related genetic interactions. We therefore considered whether the large number of TP53-related genetic interactions in our dataset could confound our analyses, especially as TP53 is also associated with a disproportionately high number of protein interactions (>1700 medium confidence interactors in the STRING database alone, compared to a median of 37 medium confidence interactions across all proteins). However, even after excluding genetic interactions involving TP53, the observation that robust genetic interactions were enriched in protein-protein interaction pairs was still evident (Figure 4B); known protein interaction pairs were more likely to be identified as significant genetic interactions in one screen (OR = 3.8, p<2×10⁻¹⁶) and among the significant genetic interactions discovered in one screen, those involving protein-protein interaction pairs were more likely to be reproduced in a validation screen (OR = 9.3, p<2×10⁻¹⁶). The same effects were observed when considering genetic interactions observed at different false-discovery rate (FDR) thresholds (Supplementary Figure 2A and 2B) and using different sources of protein-protein interaction data (Supplementary Figure 2C and 2D, Supplementary Table 4) 35,36 .
The increased reproducibility of genetic interactions associated with protein-protein interactions across different genetic perturbation screen datasets could have two distinct causes: increased reproducibility across distinct technologies or libraries (e.g. CRISPR/shRNA) or increased reproducibility/robustness of genetic interactions in cell line panels with distinct molecular backgrounds. To test the former possibility, we repeated our discovery/validation approach but focused on the set of cell lines that were common to different genetic perturbation screen datasets. Using this approach, the molecular backgrounds (i.e. cell lines) tested were the same, but the screening approach or library used differed. Upon doing this, we found that genetic interactions between gene pairs whose protein products physically interact were significantly more reproducible across studies (Figure 4C, OR=6.1 and p<2×10⁻¹⁰ when compared to discovered genetic interactions) (Supplementary Table 4). To test reproducibility using the same screening approach across molecularly distinct cell lines, we artificially split individual datasets into non-overlapping discovery and validation sets of cell lines. Again, we found that genetic interactions between gene pairs whose protein products physically interact were more reproducible across distinct cell line panels (Figure 4D, OR=8.0 and p<1×10⁻¹² when compared to discovered genetic interactions) (Supplementary Table 4). We therefore concluded that genetic interactions supported by protein-protein interactions were more reproducible across different screening approaches and across distinct cell line contexts, suggesting that these interactions are, overall, more robust.
The robust synthetic lethal interactions identified make more promising starting points for the development of targeted therapeutics than interactions reported in only specific cell line panels or with specific gene inhibition reagents. Three promising synthetic lethalities that are supported by a protein-protein interaction, but have not to our knowledge previously been validated, are highlighted in Figure 5. Mutation or deletion of the tumour suppressor kinase STK11, also known as LKB1, is associated with increased sensitivity to inhibition of its substrate kinase MARK2 37 . We have previously observed this synthetic lethality by analysing an siRNA screen of ovarian cancer cell lines 9 . That the association between STK11 loss and MARK2 sensitivity is observed across three distinct gene inhibition modalities (siRNA, shRNA, CRISPR) and across distinct cell line panels suggests that it is a robust synthetic lethality. We also found that loss of the tumour suppressor SMARCA4 was associated with increased sensitivity to inhibition of the Bromodomain PHD Finger Transcription Factor BPTF. Recent work has identified an increased sensitivity of SMARCA4 mutant tumour models to bromodomain inhibitors, although it is not yet clear which individual bromodomain proteins or combination of proteins is responsible for the observed effects 38 . Finally, we found a robust association between EP300 mutation and sensitivity to inhibition of its paralog CREBBP. To our knowledge this synthetic lethality has not previously been reported. However, work in lung and hematopoietic tumour models has demonstrated the reciprocal synthetic lethality: CREBBP mutation is associated with increased sensitivity to inhibition of EP300 39 . As these three interactions were reproduced across multiple screens and also supported by protein-protein interaction evidence, they would appear to be strong candidates for further study.

Prioritising robust synthetic lethal interactions from chemogenetic screens
As an alternative to genetic perturbation screening in large cell line panels, genetic interactions can also be identified using chemogenetic screens, where loss-of-function screens are performed in the presence and absence of specific small molecule inhibitors whose targets are relatively well defined. Based on the observations made earlier, we hypothesised that genetic interactions identified in chemogenetic screens that involved genes whose protein products physically interact with the target of the inhibitor should both be more likely to be identified as genetic interaction partners in one screen and also more likely to be reproduced across multiple screens (i.e. to be more robust). To test this hypothesis, we analysed the results of a recent chemogenetic screen performed to identify genes whose loss is synthetic lethal with ATR inhibition 40 . In this study, genome-wide CRISPR-Cas9 screens in three cell lines from different histologies (breast, kidney, colon) were used to identify genes whose inhibition is selectively essential in the presence of a small molecule ATR kinase inhibitor 40,41 ( Figure 6A).
As predicted, we found that protein interaction partners of ATR are more likely than random genes to be identified as a significant synthetic lethal interactor of ATR in at least one cell line ( Figure 6B). Furthermore, we found that among the synthetic lethal interactions identified in at least one cell line, those involving known ATR protein interaction partners were significantly more likely to be reproduced in a second or even third cell line ( Figure 6B). This suggests that, of the candidate genes identified in one screen, those that encode protein-protein interaction partners of ATR are significantly more likely to validate in additional contexts than genes with no known functional relationship to ATR.

Discussion
Here, we have developed an approach to identify robust genetic interactions that are reproduced across multiple screening libraries and across molecularly distinct cell line panels. We identified a set of 220 robust genetic interactions and found that these robust genetic interactions are enriched among gene pairs whose protein products physically interact, suggesting a means by which we might prioritise the most promising candidates for follow on studies.
We do not claim that our set of robust genetic interactions is comprehensive, as there are many reasons that real robust genetic interactions may not be identified by our approach. There are many driver genes that we have not included in our analysis because they are infrequently mutated in the datasets studied. Consequently, we can report no interactions for these genes. We have also focussed only on identifying interactions associated with mutation or copy number changes to cancer driver genes. There are likely to be dependencies associated with altered gene/protein expression of driver genes that will be missed by this approach. There are also likely to be dependencies associated with the alteration of non-driver genes that we have missed (e.g. passenger gene deletions 42 ). Furthermore, for the genes that we do analyse, it is likely that some real interactions are not detected due to a lack of statistical power. Finally, of the dependencies identified in a discovery screen but absent in a validation screen, false negatives due to reagents with poor gene targeting ability likely play a significant role 17 .
We have exclusively focussed on identifying dependencies that are evident across panels of cell lines from multiple cancer types ('pan-cancer dependencies'). It is likely that there are robust dependencies only evident within specific cancer types, but it is difficult to use our approach to identify them due to the restricted number of cell lines available for each cancer type. Even with a relatively common mutation (e.g. KRAS mutation in non-small cell lung cancer) it is challenging to partition the available cell lines into distinct discovery and validation sets while maintaining statistical power to identify potential dependencies. This issue may be alleviated by efforts to create large numbers of new tumour cell lines 43 or through using isogenic models for discovery and cell line panels for validation 14 .
Our results suggest that knowledge of protein-protein interactions could be used to improve the design and analysis of loss-of-function screens for synthetic lethal interactions. One option would be to screen target libraries for specific driver genes based on their known protein interaction partners. Alternatively, for the analysis of unbiased screening libraries, protein-protein interactions could be used as a covariate for hypothesis-weighted multiple testing correction approaches 44 . Regardless of the approach used to identify candidate synthetic lethal interactions in a large-scale screen, our results suggest that candidates supported by a protein-protein interaction should be prioritised for follow-on study.
Previous work has shown that genetic interactions between gene pairs whose protein products physically interact are more highly conserved across species [45][46][47] . Our analysis here suggests that the same principles may be used to identify genetic interactions conserved across genetically heterogeneous tumour cell lines. Although we have not tested them here, other features predictive of between-species conservation may also be predictive of robustness to genetic heterogeneity 45,47 . Our set of robust genetic interactions may serve as the starting point for such analyses and may also serve as a training set for computational approaches to predict synthetic lethality 48 .

Methods
All data analysis was performed using Python 3.7, Pandas 0.24 49 and StatsModels 0.9.0 50 .

Loss of function screens
Different scoring systems have been developed for calculating 'gene level' dependency scores from loss-of-function screens performed with multiple gene targeting reagents per gene (i.e. shRNAs or gRNAs). For the analysis of all loss-of-function screens we used the original authors' own preferred approaches. CERES dependency scores 8 for AVANA (release 18Q4) were obtained from the DepMap portal (https://depmap.org/portal/download/), while DEMETER v2 gene dependency scores for the DEPMAP shRNA screen 7 were obtained from the same resource. For the DEPMAP screen, some genes were only screened in a subset of cell lines and these were excluded from all analyses. Quantile normalized CRISPRcleaned 51 depletion log fold changes for Project SCORE 11 were obtained from the Project SCORE database (https://score.depmap.sanger.ac.uk/). ATARIS 52 scores for the DRIVE dataset 6 were obtained from the authors. 28 of the 398 cell lines screened in DRIVE had missing gene scores for ~25% of genes screened and these cell lines were excluded from further analysis. All screens were mapped to a common cell line name format (that followed by the Cancer Cell Line Encyclopedia 5 ) using the Cell Model Passports resource where appropriate 53 .

Identifying selectively lethal genes
Similar to previous work 6,7 , to reduce the burden of multiple hypothesis testing we focused our analysis on genes whose inhibition appeared to cause growth defects in subsets of the cancer cell lines screened. That is, rather than testing for associations with genes whose inhibition was always lethal or never lethal, we focused our analyses on genes that could be associated with distinct sensitive and resistant cell line cohorts. We first identified a set of 'selectively lethal' genes using the AVANA dataset 8 : those with a gene dependency score <-0.6 in at least 10 cell lines but no more than 259 cell lines (half of the screened cell lines). We augmented this with a list of 65 'outlier genes' identified by the authors of the DRIVE study as having a skewed distribution suggesting distinct sensitive and resistant cohorts 6 . Finally, from the combined list we removed genes known to be commonly essential in cancer cell lines 54 . This resulted in a set of 2,470 selectively lethal genes (Supplementary Table 1) which were used for all association analyses.
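The score-based part of this filter can be sketched as follows. This is an illustrative standard-library version, not the study's code: the function name and the toy gene names are hypothetical, and the DRIVE 'outlier gene' augmentation and the removal of commonly essential genes are not shown. The defaults mirror the thresholds stated above (score < -0.6 in at least 10 and at most 259 cell lines).

```python
def selectively_lethal_genes(scores, threshold=-0.6, min_lines=10, max_lines=259):
    """Return genes whose score is below `threshold` in min_lines..max_lines cell lines.

    `scores` maps a gene name to a list of dependency scores, one per cell line.
    """
    selected = []
    for gene, gene_scores in scores.items():
        n_sensitive = sum(1 for s in gene_scores if s < threshold)
        if min_lines <= n_sensitive <= max_lines:
            selected.append(gene)
    return selected


# Toy data with six cell lines, so smaller bounds are used for the illustration.
toy_scores = {
    "GENE_ALWAYS_LETHAL": [-1.0, -1.1, -0.9, -1.2, -0.8, -1.0],
    "GENE_NEVER_LETHAL": [0.0, 0.1, -0.1, 0.2, 0.0, -0.2],
    "GENE_SELECTIVE": [-1.0, -0.9, 0.1, 0.0, -0.1, 0.2],
}
print(selectively_lethal_genes(toy_scores, min_lines=2, max_lines=3))
```

Only the gene with distinct sensitive and resistant cell lines passes the filter; the always-lethal and never-lethal genes are dropped.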

Identifying driver gene alterations from copy number and exome profiling
For all cell lines we obtained sequencing data (CCLE_DepMap_18q3_maf_20180718.txt) and copy number profiles (public_18Q3_gene_cn_v2.csv) from the DepMap portal. These datasets contain integrated genotyping data from both the Cancer Cell Line Encyclopedia and GDSC resources 4,5,55 . We used this to identify likely functional alterations in a panel of cancer driver genes 1 restricting our analysis to those genes that were subject to targeted sequencing as part of the Cancer Cell Line Encyclopedia 5 .
For most oncogenes we considered the gene to be functionally altered if it contained a protein altering mutation at a residue that is recurrently altered in either the COSMIC database or the Cancer Genome Atlas. For a small number of oncogenes (ERBB2, CCND1, MDM2, MDM4, MYC and MYCN) we considered them to be functionally altered only if they were amplified. For all tumour suppressors we considered loss of function mutations (e.g. premature truncations), recurrent missense mutations, and homozygous deletions to be functional alterations. The matrix of functional alterations is presented in Supplementary Table 2.
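These annotation rules can be summarised as a small decision function. The sketch below is illustrative only: the function name, the `role` labels and the boolean flags are hypothetical, and it assumes that each flag (e.g. whether a mutation falls at a recurrently altered residue) has already been derived upstream from the mutation and copy number data.

```python
# Oncogenes treated as functionally altered only when amplified (listed in the text).
AMPLIFICATION_ONLY_ONCOGENES = {"ERBB2", "CCND1", "MDM2", "MDM4", "MYC", "MYCN"}


def is_functionally_altered(gene, role, amplified=False, hotspot_mutation=False,
                            loss_of_function_mutation=False,
                            recurrent_missense=False, homozygous_deletion=False):
    """Apply the driver gene annotation rules to pre-computed boolean flags."""
    if role == "oncogene":
        if gene in AMPLIFICATION_ONLY_ONCOGENES:
            return amplified
        # Other oncogenes: protein altering mutation at a recurrently altered residue.
        return hotspot_mutation
    if role == "tumour_suppressor":
        return loss_of_function_mutation or recurrent_missense or homozygous_deletion
    raise ValueError(f"unknown role: {role!r}")


print(is_functionally_altered("ERBB2", "oncogene", amplified=True))
print(is_functionally_altered("BRAF", "oncogene", hotspot_mutation=True))
print(is_functionally_altered("RB1", "tumour_suppressor", homozygous_deletion=True))
```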

Identifying genetic dependencies in individual datasets
We wished to identify associations between driver gene mutations and gene sensitivity scores that were not confounded by tissue specific gene sensitivity effects (e.g. SOX10 sensitivity scores can be naively associated with BRAF mutational status because SOX10 is essential in melanoma cell lines and BRAF mutation is common in melanoma). Thus, we wished to model gene sensitivity after first accounting for tissue type. To this end, associations between individual driver genes and gene sensitivity scores were identified using an ANOVA model that incorporated both tissue type and mutational status as covariates, similar to the method previously developed for identifying pharmacogenomic interactions in cancer cell line panels 4,56 . As recent work 11 has highlighted that some dependencies (e.g. WRN) can be associated with microsatellite instability rather than individual driver genes, we also incorporated microsatellite instability 57 as a covariate in our model. The model had the form 'gene_X_sensitivity ~ MSI_status + C(Tissue) + driver_gene_Y_status' and was used to test the association between each recurrently mutated driver gene Y and all gene sensitivity scores X assayed in a given dataset. Driver genes were included in this analysis if they were functionally altered in at least five cell lines in the dataset being analysed. Correction for multiple hypothesis testing was performed using the Benjamini and Hochberg false discovery rate 58 .
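For readers unfamiliar with the multiple testing correction step, the Benjamini and Hochberg procedure can be written in a few lines. The sketch below is a standard-library illustration, not the code used in this study (StatsModels offers an equivalent routine, `statsmodels.stats.multitest.multipletests` with `method="fdr_bh"`); the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from four association tests.
p_values = [0.01, 0.04, 0.03, 0.5]
print([round(q, 4) for q in benjamini_hochberg(p_values)])  # [0.04, 0.0533, 0.0533, 0.5]
```

An association is then called significant when its adjusted value falls below the chosen FDR threshold.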

Identifying genetic dependencies common to multiple datasets
When comparing a pair of datasets, we used one dataset as a discovery dataset and a second as a validation set, as outlined in Figure 1B. The discovery analysis was limited to the set of interactions that could be tested in both datasets, i.e. associations between the set of sensitivity scores for genes screened in both studies and the set of driver genes recurrently altered in both studies. An initial set of genetic interactions was identified in the discovery dataset at a specific FDR threshold and these associations were then tested in the validation set. We considered interactions to be reproduced in the validation dataset if: (1) the FDR was less than the threshold; (2) the uncorrected p-value was < 0.05; and (3) the sign of the association (sensitivity / resistance) was the same in both discovery and validation set. An FDR of 0.2 was used for all analyses presented in the main text but additional FDR thresholds (0.1, 0.3) were tested to ensure that all findings were robust to the exact choice of FDR (Supplementary Figure 2).
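The three reproducibility criteria can be expressed as a single predicate. The sketch below is an illustration under stated assumptions, not the study's implementation: the `Association` record, its field names and the example values are hypothetical, the sign of `effect` is assumed to encode sensitivity versus resistance, and criterion (1) is read as applying to the FDR computed in the validation dataset.

```python
from dataclasses import dataclass


@dataclass
class Association:
    fdr: float      # Benjamini-Hochberg adjusted p-value
    p_value: float  # uncorrected p-value
    effect: float   # signed effect size; the sign distinguishes sensitivity from resistance


def is_reproduced(discovery, validation, fdr_threshold=0.2, p_threshold=0.05):
    """Check whether a discovered association is reproduced in the validation set."""
    discovered = discovery.fdr < fdr_threshold
    validated = validation.fdr < fdr_threshold and validation.p_value < p_threshold
    same_sign = (discovery.effect > 0) == (validation.effect > 0)
    return discovered and validated and same_sign


# Hypothetical results for one driver gene / target gene pair.
disc = Association(fdr=0.05, p_value=0.001, effect=-0.8)
val = Association(fdr=0.10, p_value=0.01, effect=-0.5)
print(is_reproduced(disc, val))
```

Re-running the same predicate with `fdr_threshold` set to 0.1 or 0.3 corresponds to the sensitivity analysis at alternative thresholds.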

Protein-protein interactions
Protein-protein interactions were obtained from STRING v10.5 34 , BIOGRID 3.5.170 35

The groups represent all gene pairs tested, gene pairs found to be significantly interacting in at least one screen (FDR < 20%), and gene pairs found to reproducibly interact across multiple screens (i.e. a discovery and validation screen). Stars (*) indicate significant differences between groups, all significant at P<0.001 using Fisher's Exact Test. Odds ratios and p-values are provided in Supplementary Table 4. B) As A but with interactions associated with TP53 removed. C) As B but here the discovery and validation sets contain the same cell lines screened in different studies (e.g. 'AVANA ∩ DEPMAP' as discovery and 'DEPMAP ∩ AVANA' as validation). Consequently, reproducibility here means 'technical reproducibility' using different screening platforms. D) Similar to B but here the discovery and validation sets contain single datasets partitioned into non-overlapping cell line sets (e.g. 'AVANA \ DEPMAP' as discovery and 'AVANA ∩ DEPMAP' as validation). Consequently, reproducibility here means 'genetic robustness': the same association between gene pairs is observed across distinct genetic backgrounds.
Figure 5. Novel synthetic lethal relationships between gene pairs whose protein products interact. Boxplots are shown for selected reproducible synthetic lethal interactions between gene pairs whose protein products physically interact. Top row shows discovery dataset, bottom row shows the validation results.
Boxplots showing tumour suppressor genes whose inhibition provides a growth advantage to cells that have no genetic alteration of those genes.

Supplementary Figure 2. Reproducible genetic interactions are enriched in protein-protein interaction pairs at different thresholds and using different databases. A) Barchart showing the percentage of protein-protein interacting pairs observed among different groups of gene pairs. The groups represent all gene pairs tested, gene pairs found to be significantly interacting in at least one screen (FDR < 10%), and gene pairs found to reproducibly interact across multiple screens (i.e. a discovery and validation screen). Stars (*) indicate significant differences between groups, all significant at P<0.001 using Fisher's Exact Test. Due to the high percentage of protein-protein interaction pairs among the reproducible hits at this FDR, the y-axis uses a different maximum value to all other charts. B) Same as A but with interactions identified at an FDR of 30%. C) Similar to main text Figure 4B but here the protein-protein interaction pairs are obtained from the HIPPIE database. D) Similar to main text Figure 4B but here the protein-protein interaction pairs are obtained from the BioGRID database.
[Figure 1B workflow text] Step 1. Identify genetic associations using all cell lines in the discovery dataset (output: discovered associations). Step 2. Test discovered genetic associations using cell lines in the validation dataset but [...]. Step 3. Repeat steps 1 and 2 for additional dataset pairs.