Drugging the R-loop interactome: RNA-DNA hybrid binding proteins as targets for cancer therapy

Unravelling the origin of genetic alterations, from point mutations to chromosomal rearrangements, was greatly enhanced by the discovery of RNA-DNA hybrids (R-loops) that behave as hotspots of genomic instability in a variety of organisms. Current models suggest that uncontrolled R-loops are a hazard to genome integrity; therefore, identifying proteins that are involved in recognising and signalling R-loop structures is of key importance. Herein we analysed key RNA-DNA hybrid binding proteins in humans, taking advantage of large-scale gene expression, survival rate, and drug-sensitivity data from cancer genomics databases. We show that expression of RNA-DNA hybrid binding proteins in various cancer types is associated with survival and may have contrasting outcomes in responding to therapeutic treatments. Based on the revealed pharmacogenomic landscape of human RNA-DNA hybrid binding proteins, we propose that R-loops and R-loop binding proteins are potentially relevant new epigenetic markers and therapeutic targets in multiple cancers.


Introduction
R-loops are special, three-stranded nucleic acid structures, composed of an RNA-DNA hybrid and a non-template, single-stranded DNA. R-loops have been implicated in a number of human diseases including repeat-expansion disorders, neurological syndromes, and cancer [1][2][3]. The molecular symptoms of cancer resemble the genome instability phenotype of human cell lines that accumulate R-loops and undergo replication/transcriptional stress-induced DNA damage [4]. The oncogenic HRAS V12 mutation, for instance, has been shown to increase the protein levels of endogenous RNaseH1 (the enzyme that specifically degrades RNA-DNA hybrids) [5], which further supports a mechanistic link between R-loop formation and tumorigenesis.
There is evidence that R-loops are targetable by anticancer drugs to revert pathological phenotypes [24,25]. For instance, in synovial and Ewing sarcoma (SS/ES), the use of clinical ATR inhibitors (ATRi) led to the accumulation of R-loops and increased sensitivity to chemotherapy [26,27]. PARP inhibitors augmented the antitumour activity of ATRi in SS cells, suggesting that combination therapies using ATRi are promising new approaches to treat sarcomas [26]. The anti-tumour drugs trabectedin and lurbinectedin have been shown to induce replicative stress and cell death in an R-loop dependent manner [28]. In a cell line system where R-loops were stabilized by depleting THOC1 or BRCA2, cells with increased R-loop levels were more sensitive to trabectedin treatment. Consistently, cancers that accumulate R-loops in the absence of BRCA1/2 or Fanconi anaemia proteins show higher sensitivity to trabectedin therapy [29,30].
The EWS-FLI fusion protein has been shown to increase R-loop formation and to inactivate BRCA1, which makes Ewing sarcoma cell lines hypersensitive to genotoxic drugs such as etoposide, camptothecin, and PARP inhibitors [31]. Since mutations in EWSR1 and its homologues are associated with several therapeutically challenging cancers, clastogenic agents that augment R-loop mediated stress could be administered as potentially effective co-therapeutic treatments in various tumours that are associated with EWSR1 mutations.
Specific G4 ligands (PDS, Braco-19, and FG) induce R-loop-mediated DNA damage and cell death in human cancer cells [18], establishing a link between the toxic effects of G4 ligands and R-loop formation. CX-5461 is another G4 ligand showing specific toxicity against BRCA-deficient cancer cells and patient-derived xenografts [16]. Repair of DNA damage induced by CX-5461 required BRCA and NHEJ repair pathways. CX-5461 is now in an advanced phase I clinical trial for patients with BRCA1/2 deficient tumours (NCT02719977).
EZM2302 is a selective inhibitor of the histone arginine methyltransferase CARM1, an enzyme that affects R-loop homeostasis by recruiting TOP3B to RNA-DNA hybrid structures [32]. EZM2302 exhibits antiproliferative effects both in vitro and in vivo [33]. GSK3368715 is a specific inhibitor of the PRMT1 histone arginine methyltransferase, which has also been linked to R-loop metabolism [32], and is being developed for the treatment of diffuse large B cell lymphoma and solid tumours (phase I clinical trial; NCT03666988).
Several other compounds have also been described to increase R-loop levels, including topoisomerase 1 inhibitors [34] (promoting R-loops by modulating the superhelicity of DNA), spliceosome inhibitors [35][36][37][38] (promoting R-loops through the retention of intronic sequences), and reactive aldehydes [39]. Furthermore, the RNA-DNA hybrid degrading enzyme RNase H2 has been recognised as a putative anticancer drug target [40]. With the recently identified RNase H2 inhibitors [41], RNase H2 might serve as an effective cancer target to stabilize RNA-DNA hybrids. Finally, a series of anticancer sulphonamides have been shown to induce proteasomal degradation of the U2AF-related splicing factor coactivator of activating protein-1 in human cancer cell lines [37,42], designating targeted protein degradation by E3 ubiquitin ligases as a potent strategy for selective inactivation of splicing factors to increase R-loop levels for cancer therapy.
In the current study we aimed to investigate whether key R-loop-binding proteins are associated with cancer survival and drug sensitivity. We performed a systematic pharmacogenomic analysis to identify these associations, which may suggest that R-loop formation processes in cancer cells could be exploited as biomarkers and therapeutic targets, and could be used to sensitize certain tumours to chemotherapeutic treatments. Our results offer new avenues for epigenetic therapies that are based on modifying R-loop levels in tumours.

Tumour types included in the analysis
We included in our analysis 33 primary cancer types available from the Cancer Genome Atlas (TCGA) [43]. These were:

Gene expression and survival data analysis
Gene expression data (RNA-sequencing) from the tumour samples and overall survival time data were downloaded from the TCGA project [43,44]. Healthy tissue gene expression data were obtained from the website of the Genotype-Tissue Expression (GTEx) project [45]. Survival analysis was performed and Kaplan-Meier plots were made using the "survival" package of the R software. The level of significance was p < 0.05. All p-values and adjusted p-values (corrected for multiple testing using the Benjamini-Hochberg method) are included in Table S1.
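As an illustration of the survival comparison described above, the following is a minimal sketch of a Kaplan-Meier estimate for low- and high-expressor groups. It is written in plain Python for readability and is not the R "survival" workflow used in the study. The median split, the function names (`kaplan_meier`, `split_by_median`), and the toy data are assumptions made for this example only; the text does not state how expressor groups were defined.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with an observed death.

    times  -- follow-up time per patient
    events -- 1 if death was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # every patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


def split_by_median(expression):
    """Label each sample 'low' or 'high' relative to the median (assumed cut-off)."""
    ordered = sorted(expression)
    n = len(ordered)
    if n % 2:
        median = ordered[n // 2]
    else:
        median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2
    return ["high" if x > median else "low" for x in expression]


# Toy data (hypothetical): expression score, follow-up in months, death flag.
expression = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
months = [30, 24, 36, 10, 12, 8]
died = [1, 0, 1, 1, 1, 0]

groups = split_by_median(expression)
for label in ("low", "high"):
    idx = [i for i, g in enumerate(groups) if g == label]
    curve = kaplan_meier([months[i] for i in idx], [died[i] for i in idx])
    print(label, curve)
```

In a full analysis the two curves would be compared with a log-rank test, and the resulting p-values adjusted across all gene and cancer-type pairs.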

Cell line analyses
IC50 values of cancer cell lines upon treatment with 276 anticancer drugs were obtained from the Genomics of Drug Sensitivity in Cancer (GDSC) database as described and analysed previously [46]. We also downloaded additional cell line and drug information from the web page of the GDSC project [47]. Drug sensitivity and gene expression associations of cancer cell lines were presented as the Spearman correlation of IC50 values and mRNA expression scores of the R-loop genes analysed. We considered both negative and positive correlations with p-values < 0.05. All p-values and adjusted p-values using Benjamini-Hochberg correction are included in Table S1.
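The two statistics named above can be sketched as follows. This is a minimal, dependency-free Python illustration of the Spearman rank correlation (computed as the Pearson correlation of average ranks) and of the Benjamini-Hochberg adjustment; it is not the authors' pipeline, and the function names and toy values are hypothetical. The correlation p-value itself is not computed here; in practice it would come from a statistics library.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for pos, idx in enumerate(order):
        rank = m - pos  # 1-based rank among ascending p-values
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy data (hypothetical): one gene's expression and one drug's IC50
# across five cell lines.
expression = [2.1, 3.4, 5.0, 6.2, 7.9]
ic50 = [0.8, 1.1, 2.5, 2.4, 4.0]
rho = spearman_rho(expression, ic50)
# rho > 0: higher expression goes with higher IC50 (lower sensitivity);
# rho < 0: higher expression goes with lower IC50 (higher sensitivity).
print(round(rho, 3))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

The sign convention in the comments follows the text: a positive correlation between expression and IC50 indicates resistance, a negative one indicates sensitivity.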

Results and discussion
To examine the relationship between R-loop genes and cancer, we first determined how many of the recently identified RNA-DNA hybrid binding proteins [20] are represented in the registry of cancer genes [75], the largest repository, comprising 2372 genes whose somatic mutations have cancer driver roles. Of the 448 R-loop genes examined, 92 were cancer genes (Table S1_1), showing statistically significant enrichment (p = 0.005792; two-proportion z-test). Next, we selected a core set of R-loop genes for a detailed analysis of tumorigenesis (n = 36), representing prominent molecular pathways with clear R-loop association that are also implicated in the formation of cancer (e.g. DNA topology; RNA-DNA hybrid ribonucleases and helicases; RNA processing, splicing and export; DNA damage; chromatin modifications; Table S1_2). The mRNA expression of these R-loop genes was extracted from the Cancer Genome Atlas (TCGA) database, allowing us to identify differences in the gene expression signatures of healthy tissues and primary tumours (Fig. 1). For instance, expression of BUB3, DHX9, PRMT1, THOC4, THOC7, U2AF1 and ZNF207 (BUGZ) was increased in several primary tumours compared to normal tissues, while SRSF1 (ASF/SF2) was downregulated in most cancers (except for acute myeloid leukaemia; LAML). Next, we asked whether R-loop gene expression levels correlate with the survival rate of cancer patients. Taking advantage of gene expression (RNA-seq) and overall survival (OS) data from the TCGA, we generated 12,862 Kaplan-Meier survival curves in our analysis and identified numerous cases showing significant survival association with R-loop gene expression levels (179 at p < 0.05; 33 at FDR < 0.05; Fig. 2, Supplementary Fig. S1, and Table S1_3). In 70% of cases (123 at p < 0.05), low expression of R-loop genes was found to be associated with prolonged survival of cancer patients.
Low expression of RNASEH2A, THOC6, PRMT1, and PIF1, for instance, was associated with significantly longer survival time in mesothelioma (MESO), while low FANCM mRNA level was advantageous for breast cancer survival (BRCA; representative survival plots of low- and high-expressor groups are shown in Fig. 3). In 30% of cases (52 at p < 0.05), high expression of R-loop genes was associated with better survival; for example, high TREX1 and BUB3 expression was beneficial for cervical squamous cell carcinoma and endocervical adenocarcinoma survival (CESC; Fig. 3 and Table S1_3). In the case of 10 R-loop genes, long-term survival was observed exclusively in the low-expressor group of patients, irrespective of cancer type (ATXN2 (1), BRCA2 (5), CARM1 (6), DDX19A (2), RNASEH1 (6), THOC2 (4), THOC3 (4), TOP1 (5), U2AF1 (2), ZNF207 (3)). For the remaining 26 R-loop genes, high or low expressing groups varied by cancer type (Supplementary Fig. S2). Most survival associations were observed for adrenocortical carcinoma (19 genes) and mesothelioma (18 genes) (Fig. 2B), and these correlated with low expression of R-loop genes (Table S1_3). According to the number of survival associations (Fig. 2B), RNASEH2A, BLM, BRCA1, BUB3, and PIF1 appear to be generally important for survival in multiple cancers, while the effects of ATXN2, DDX19A and U2AF1 are limited to specific types of cancer.
To show that our approach can identify real relationships, we included an oncogene (ERBB2/HER2) and a tumour suppressor (TP53) in our analysis as positive controls that have known cancer associations but no established R-loop function (Fig. 2, highlighted in blue). R-loop gene expression was associated with survival in a similar number of cancers (or more in a few cases) as ERBB2 and TP53, which demonstrates the relevance of our observations on RNA-DNA hybrid binding proteins.
In order to identify R-loop genes that might serve as potential drug targets for therapeutic intervention, we sought to find synergistic interactions between the mRNA expression status of R-loop genes in a large collection of cancer cell lines and sensitivity to chemotherapeutics approved by the US Food and Drug Administration (FDA). Pharmacogenomic (RNA-seq and drug sensitivity) data for 267 anticancer compounds across 1065 cancer cell lines were extracted from the GDSC database [47], covering a wide range of biological pathways including protein kinase signalling, cytoskeleton, DNA replication, DNA repair, and cell cycle control. The significant interactions varied considerably in the number of drug-specific associations (22,414 at p < 0.05; 508 at FDR < 0.05; range: 70-230 interactions per drug; Supplementary Fig. S3 and Table S1_4) and in the molecular pathways/drug targets involved (Fig. 4). Highly represented pathways included the Ser/Thr protein kinase pathway and PI3K/RTK/MAPK signalling (Fig. 4, left panel), while the most common (top 10) drug targets were MEK1/2, BRAF, PARP1/2, HSP90, AKT1/3, GSKs, PI3Ks, IGF1R, ROCK1/3, and EGFR (Fig. 4, right panel). We found significant drug sensitivity associations in 80% of the studied R-loop genes (29 at p < 0.05; except for BRCA2, BUGZ, DDX19, RNASEH1, RTEL1, THOC3, and THOC4; Table S1_4), of which CARM1, EWSR1, DHX9, and THOC1 showed the highest number of drug interactions (Fig. 5; Table S1_4). This highlights the importance of considering R-loop gene mRNA expression levels, as these may affect drug response. However, we observed significant variability in the number of drug interactions between various cancer cell lines. For example, lung small cell carcinoma and ovarian cancer cells were sensitive to most of the compounds tested, but B-cell leukaemia, Hodgkin's lymphoma, head and neck cancer, and Ewing sarcoma cells exhibited significantly lower drug efficacy (Supplementary Fig. S4, green bars). These differences indicate that RNA-DNA hybrid binding proteins may have contrasting outcomes in responding to various chemotherapeutic treatments depending on genomic context/cancer type. For example, in many cancer cell lines, high expression of TREX1 was associated with high IC50 values (half maximal inhibitory concentration) for most drug treatments, indicating worse efficacy (i.e. a higher drug concentration is needed to achieve half-maximal response) when TREX1 is overexpressed. This is consistent with a recent study demonstrating that reducing the level of TREX1 leads to improved sensitivity of glioma and melanoma cells to the anticancer drugs topotecan, nimustine, and fotemustine [76]. Chondrosarcoma, lymphoblastic T cell leukaemia, biliary tract cancer, breast cancer, and pancreatic cancer cell lines showed exceptionally high IC50 values (resistance) to drug treatments in the case of increased BUB3 expression. In contrast, significant negative correlations were observed between BUB3, EWSR1, and SETX expression levels and IC50 values in most (but not all) cancer cell lines, i.e., overexpression of these R-loop genes typically made cancer cells more sensitive to drug treatment. These results collectively indicate that targeting TREX1, BUB3, SETX, and EWSR1 to reduce or increase their expression levels in the above tumours may help increase the efficacy of cancer chemotherapeutics.
Fig. 4. Drug sensitivity associations of R-loop genes grouped by molecular pathways and drug targets. Left: Distribution of drug interactions over molecular pathways. Green and red colours indicate higher or lower drug sensitivity, respectively, associated with the mRNA expression levels of R-loop genes. The level of significance is p < 0.05. The number of significant associations is 22,414. Right: Statistical representation of drug targets showing significant drug sensitivity interactions with the R-loop genes analysed. Circle sizes are proportional to drug target frequencies.
Since cancer cell lines, which derive from natural tumours, recapitulate the genomic context and tissue type of primary cancers [47,77], we narrowed down the identified drug interactions to the fraction of R-loop genes that showed significant survival associations in the matching primary cancer (drug and survival data in non-matching cell types were excluded from further analysis; Fig. 6). Collectively, we identified 1,630 significant survival and drug associations (at p < 0.05) related to the expression of 29 R-loop genes (Table S1_4; seven R-loop genes were omitted because there was no survival or drug association, or the primary tumour could not be matched with the corresponding tumour cell line). We observed the following trends: i) most survival interactions (top 10) were related to BLM, RNASEH2A, ATXN1, BRCA1, BUB3, CARM1, GADD45A, FANCD2, THOC1, and THOC2 associated with various cancers (Fig. 6). ii) High expression of THOC2 in sarcoma cell lines was associated with a high IC50 value to CX-5461 (G4 ligand/RNA polymerase I inhibitor), suggesting that CX-5461 may be less effective for the treatment of sarcomas showing high THOC2 expression. On the other hand, high expression of PIF1, FANCD2 and BRCA1 was associated with low IC50 (sensitivity) to CX-5461 and low survival in mesothelioma (Table S1_4). Somewhat differently, low IC50 to CX-5461 was associated with high expression of ATXN1 and TREX1 and increased survival in melanoma and endometrial cancer, respectively (Table S1_4). iii) BRCA1 expression in oesophageal carcinoma cell lines was associated with high IC50 (resistance) to 5-fluorouracil (5-FU), while low BRCA1 level was associated with better survival in oesophageal carcinoma. It follows that 5-FU treatment may be efficient for the therapy of BRCA1(-) oesophageal carcinomas.
Similarly, BRCA1 expression in glioma cell lines was associated with reduced efficacy of (5Z)-7-oxozeaenol, GDC0941, refametinib, and selumetinib, while low BRCA1 level was beneficial for the survival of brain lower grade glioma. Moreover, mesothelioma cell lines are resistant to doxorubicin, OSU-03012, or thapsigargin when expressing BRCA1. These drugs are potentially effective chemotherapeutics for BRCA1(-) gliomas and mesotheliomas. Importantly, 5-FU is being tested in a clinical trial related to oesophageal carcinoma (NCT00052910), selumetinib is being investigated in low-grade glioma (NCT01089101), and doxorubicin is being evaluated in patients with mesothelioma (NCT00634205). iv) Most drug/gene associations were observed for RDEA119, selumetinib, and olaparib, with most cell lines showing resistance to RDEA119 and selumetinib, and sensitivity to olaparib, associated with high R-loop gene expression levels (Supplementary Fig. S3). For instance, selumetinib treatment was ineffective in stomach adenocarcinoma cell lines overexpressing AQR (NCT02448290 -  . vii) In glioma cell lines, low CARM1 expression was associated with better efficacy of SN38 treatment (type I topoisomerase inhibitor, active metabolite of irinotecan), while glioma patients with low CARM1 levels showed longer survival. It is possible that CARM1(-) cancers are more susceptible to SN38 drug treatment, leading to inhibition of TOP3B and TOP1, which could in turn increase R-loop levels [32] and induce cell death.
The above pharmacogenomic associations of human RNA-DNA hybrid binding proteins support the role of R-loop genes in tumorigenesis and in determining the efficiency of tumour therapies. We note, however, that our analytical approach represents a hypothesis-generating exercise performed on multi-experiment observations to identify all possible associations of R-loop genes and human malignancies, which should be experimentally validated in further high-content analysis projects. Our data suggest that modulating the expression levels of R-loop genes may affect clinical responses to anticancer drug treatments, and these expression changes could be used to define patient groups who are most likely to benefit from a therapy. Based on the revealed pharmacogenomic interactions, we propose that R-loops and R-loop binding proteins are potentially relevant new epigenetic markers and therapeutic targets in multiple cancers. Further exploration of the recognised associations is expected to improve drug effectiveness and identify potential combination therapeutics.