High-throughput validation of ceRNA regulatory networks

Background MicroRNAs (miRNAs) play multiple roles in tumor biology. Interestingly, reports from multiple groups suggest that miRNA targets may be coupled through competitive stoichiometric sequestration. Specifically, computational models predicted and experimental assays confirmed that miRNA activity is dependent on miRNA target abundance, and consequently, changes in the abundance of some miRNA targets lead to changes to the regulation and abundance of their other targets. The resulting indirect regulatory influence between miRNA targets resembles competition and has been dubbed competitive endogenous RNA (ceRNA). Recent studies have questioned the physiological relevance of ceRNA interactions, our ability to accurately predict these interactions, and the number of genes that are impacted by ceRNA interactions in specific cellular contexts. Results To address these concerns, we reverse engineered ceRNA networks (ceRNETs) in breast and prostate adenocarcinomas using context-specific TCGA profiles, and tested whether ceRNA interactions can predict the effects of RNAi-mediated gene silencing perturbations in PC3 and MCF7 cells._ENREF_22 Our results, based on tests of thousands of inferred ceRNA interactions that are predicted to alter hundreds of cancer genes in each of the two tumor contexts, confirmed statistically significant effects for half of the predicted targets. Conclusions Our results suggest that the expression of a significant fraction of cancer genes may be regulated by ceRNA interactions in each of the two tumor contexts. Electronic supplementary material The online version of this article (doi:10.1186/s12864-017-3790-7) contains supplementary material, which is available to authorized users.


Background
Identifying regulatory interactions that mediate the effects of genomic alterations is a necessary step for interpreting the function of trans-acting variants in complex diseases, including cancer [15,16]. Among these, miRNA dysregulation, arising from alterations targeting their transcriptional [17] or biogenesis regulators [18], plays an established role in tumorigenesis [1]. Recently, multiple groups have reported on gene products that modulate miRNA activity [5][6][7][19][20][21][22][23][24][25][26], including RNA species that can alter the abundance of other RNAs in trans through ceRNA interactions. These studies show that targets of the same miRNAs are coupled and that up-or down-regulation of one target may alter the expression of other cognate targets by sequestering or releasing their shared miRNA molecules, respectively ( Figure 1A).
Since the discovery of ceRNA regulation in human cells [21,22] multiple reports questioned the physiological relevance of ceRNA interactions, researcher's ability to predict them, and the number of genes that are affected in each context [9][10][11]. To address these concerns, we proceeded to test genome-wide ceRNA predictions made by the information-theoretic Hermes algorithm [5]. For the sake of generality, we performed this analysis in two distinct tumor contexts, using a set of large-scale and high-throughput shRNA-mediated perturbation assays in model cell lines assembled by the Library of Integrated Network-based Cellular Signatures (LINCS) [14]. Specifically, we inferred ceRNETs using TCGA profiles of prostate and breast adenocarcinomas and tested them using a LINCS compendium of perturbation profiles, representative of the shRNA-mediated silencing of >3,000 genes in PC3 and MCF7 cells [12][13][14]. We propose that the high validation rates of these assays can inform on the accuracy of computational predictions, and will help estimate the number of genes that are modulated by ceRNA in representative tumor contexts.

Inference method
We used an extended version of the Hermes algorithm, which we had previously introduced to discover and validate glioma-specific ceRNAs [5], to systematically discover ceRNA interactions in prostate (PRAD) and breast (BRCA) adenocarcinomas, using matched miRNA and mRNA expression profiles of the corresponding TCGA cohorts. While the ceRNA inference component of the algorithm was unchanged, the new algorithm also supports the identification of the specific miRNAs that mediate each interaction (mediators); these miRNAs are predicted to target both mRNAs in a ceRNA interaction, and their activity is affected by modulation of target mRNA abundance. This extension is useful for generating more specific hypotheses for future functional testing.
We note that Hermes-inferred ceRNA interactions are independent of the co-expression of coupled mRNAs; rather, they are based on assessing whether the abundance of one mRNA species modulates abundance of the other (and vice-versa), via their shared miRNA program. This assessment is based on the statistical significance of the mutual information between the abundance of one mRNA species and one or more miRNA, given the abundance of another mRNA targeted by the same miRNAs. We note that a majority of predicted ceRNA interactions involved mRNAs that are not significantly coexpressed, and co-expression did not implicate genes as ceRNA interacting partners.
Hermes predicts ceRNA-coupling between two mRNAs based on the relative size of their shared miRNA regulatory program, as predicted by the Cupid algorithm [5], and the conditional mutual information between one of the mRNAs and each of their shared miRNAs, given the other mRNA. Namely, given genes Ti and Tj, and the set of miRNAs that regulate them Π ( ) and Π ( ), their shared program is identified by taking the intersection Π ( i ; j ) = Π ( ) ∩ Π ( ). First, Hermes tests that the size of Π ( ; ) relative to the sizes of the individual programs is statistically significant at stand for conditional mutual information and mutual information respectively, and the variables indicate the expression of the corresponding RNA species [5].
The CMI is estimated using an adaptive partitioning algorithm [27] [5]. In addition, the presence of transcripts with alternative 3' UTRs is expected to reduce the sensitivity of prediction.
In order to identify miRNA mediators in addition to ceRNA interactions, we modified Hermes to perform greedy addition of miRNA mediators and to optimize the combined p-value for each predicted interaction. Namely, for each candidate interaction, we searched for the minimum combined p-value through the greedy forward inclusion of individual miRNAs. Additional miRNAs were included as candidate mediators only if they improve the joint p-value, as estimated using Fisher's method. MiRNAs failing to improve the joint p-value lack functional evidence for mediating the interaction and were thus excluded from the analysis.

Inferred ceRNETs
We used Hermes to construct ceRNETs using matched TCGA gene and miRNA expression profiles of breast (BRCA) and prostate (PRAD) adenocarcinomas. For Page | 6 PRAD, this included data from 140 samples, representing 23,614 genes and 367 miRNAs [28]; while for BRCA, we used 207 samples, representing 18,748 genes and 524 miRNAs [29]. Predicted ceRNA networks included 476,456 and 447,011 interactions, for the PRAD and BRCA ceRNETs, respectively; see Tables S1-2, where each ceRNA interaction is defined by two RNAs and the miRNAs that couple them.
Due to their size, experimental validation of reverse-engineered networks is often challenging. Consequently, validation is generally performed only on a handful of interactions [30,31] or on small subnetworks [32,33]. To validate our inferred interactions on a more realistic scale, we used a large collection of shRNA-mediated silencing assays in the Library of Integrated Network-based Cellular Signatures (LINCS) database [14].

The LINCS database
The LINCS database includes Luminex-based multiplexed assays to measure the  (Tables S3 and S4). Gene expression was measured using the L1000 assay at two time points (96h and 144h), following each perturbation. To achieve adequate statistical power, we limited our tests to genes with six or more silenced Hermes-inferred ceRNA regulators. In total, we evaluated predicted ceRNA targeting of 405 genes (of which 365 were validated at 96h and 398 at 144h) in MCF7 cells and 419 genes (of which 363 were validated at 96h and 376 at 144h) in PC3 cells.
In LINCS perturbation assays, while some data points are missing due to quality control metrics, the expression of most genes was profiled in triplicates at both 96h and 144h after shRNA transduction. On average, 3.3 unique shRNA hairpins were used to silence each of 1,845 breast and prostate oncogenes and tumor suppressors (cancer genes) in our networks. By definition, for each interaction, the ceRNA regulator is the one targeted by the shRNA perturbation and the ceRNA target is the one profiled after silencing to determine any interaction mediated change. Fold-change for each target, in response to the silencing of one of its regulators, was estimated by averaging across all shRNA hairpins targeting this regulator. The identities of the specific shRNA hairpins, regulators, and targets are provided in Table S5.
In total, 9,055 and 9,800 predicted BRCA interactions were tested in MCF7 cells at 96h and 144h after shRNA-mediated silencing, respectively. Similarly The size of this dataset allows for rigorous controls that avoid bias due to gene expression variability and off-target effects of RNAi-mediated perturbations. To estimate the statistical significance of target responses to gene perturbations, we examined fold change in target expression following silencing of its predicted regulator, as well as the effect of silencing a specific regulator across all of its inferred targets. For the latter, we associated each candidate target with a rank vector representing its percentile ranked relative fold-change at a specific time point following each gene silencing perturbation.
We then ranked expression fold-changes of all profiled genes at a specific time point following silencing of any given gene and compared ranks across different perturbations. In this way, each candidate target was associated with a rank vector, representing its relative fold-change following each perturbation. Considering each target in isolation, we compared its response to perturbations of its predicted regulators as well as its response to shRNA-mediated silencing of other genes. Mann-Whitney U test was used to determine whether the rank of a target following silencing of its predicted regulators was lower than that following silencing of other genes.
Specifically, Fold change measurements for up to 1,171 genes in response to a given perturbation allowed ranking of the profiled genes based on the strength of the Page | 8 response. First, we assigned significance to the response of a gene to the silencing of its predicted regulators by comparing the set of its scores associated with perturbations of predicted regulators to the scores of all other genes, i.e. the gene's ranks following silencing of its predicted regulators vs. its ranks following silencing of all other genes.
Then we used a Mann-Whitney test to determine whether the ranks of a target after silencing of its regulators was significantly lower than its ranks following silencing of all other genes. Given the number of gene perturbations (up to 1,171 gene silencing experiments), the two sets of ranks were expected to be normally distributed and can be approximated by a z-score and a corresponding p value. On average, for each target gene, the number of perturbations targeting its regulators was 1% of the total number of perturbations tested.  Table S5. When comparing responses of ceRNA targets to controls, as described in Figure 1B-C, we randomly assembled control sets composed of as many non-ceRNA regulators as the number of ceRNA regulators for each target. We then calculated the average fold change and the error-propagated standard error after silencing each of non-ceRNA regulator, and estimated the significance of fold changes using a two-tailed rank-sum test. This process was repeated 1,000 times to obtain averaged fold changes, propagated standard errors, and averaged p values based on the negative controls.

Predicted cancer genes are affected by silencing of their ceRNA regulators
Results from these assays confirmed that Hermes predictions are highly enriched in bona fide ceRNA interactions in both BRCA and PRAD and that these interactions may and 62% of Hermes-inferred targets were significantly down-regulated (p < 0.05 by U test) following shRNA-mediated silencing of their Hermes-inferred ceRNAs in MCF7 and PC3, respectively, at least at one time point; see Figure 4 and Table S5. Fold-change and p-values were measured by comparing average differential expression of a gene following shRNA-mediated silencing of its inferred ceRNA regulators, compared to silencing of all other genes not predicted to be ceRNA targets of these regulators. This guaranteed the most unbiased selection of negative control assays possible. Moreover, our efforts to control for specific effects that could potentially bias this comparisonincluding shRNA off-target effects and outliers-reaffirmed analysis results.

Accounting for systematic biases
To ensure that comparisons of target responses to predicted regulators and nonregulators are free of bias, we repeated the analysis presented in Figure 4 for MCF7 and PC3 after controlling several properties. These included re-analyses after (1) eliminating outlier responders, i.e. ceRNA interactions associated with the most significant ceRNA-validating responses; (2) eliminating shRNAs that can act as human miRNAs [34] and produce off-target effects; and after accounting for (3) 3' UTR length and (4) CG content, (5) RMA-normalized expression, (6) expression variability, and (7) expression correlation with the predicted target. Results are presented in Figure S1 and suggest that accounting for these potential biases had relatively little effect on the number of targets that were found to be significantly down regulated by silencing their predicted regulators. We note that the average gene expression change across all interactions in LINCS is 1.0.
When eliminating outliers we removed predicted ceRNA interactions where the inducing effects were greater than Q1-1.5*STD by percentile rank from the analysis. By Page | 10 discarding the strongest ceRNA-like effects, we eliminated any chance that the test may be biased by a few outlier events. To eliminate potential off-target effects caused by miRNA-like behavior, we eliminated all shRNAs whose 7-base seed subsequence (2nd

Integrative Statistical Evaluation
To evaluate the significance of all tested interactions at each time point in both cell lines, we used a one-sample Kolmogorov-Smirnov test [36]. This normality test evaluates whether z scores obtained from fold-change rank comparisons follow a standard normal distribution with µ = 0 and σ = 1, thus assigning significance against the null hypothesis that z scores are selected at random. We ranked targets based on z scores and calculated the expected p value when assuming that z scores were drawn from a standard normal distribution. In total, 342 tested ceRNA targets were significantly down regulated in at least one assay, at an average of more than 2 assays per target, while none were significantly upregulated (p < 0.05). In aggregate, down-regulation of predicted ceRNA targets was highly significant (p ≤ 6.0E-84 for each time point and cell line). This analysis constitutes the largest scale validation of a regulatory network to date and suggests that hundreds of cancer genes may be altered through competition for miRNA regulation in BRCA and PRAD. Critically, these results are not merely a reflection of intrinsic coupling of gene expression in cellular systems. Indeed, equivalent numbers of interactions selected at random from non-predicted ceRNAs produced no statistically significant trends. In total, nearly 50% (50% in MCF7 and 47% in PC3) of all predicted targets were significantly down regulated by shRNA-mediated silencing of their predicted ceRNA. We note that when considering individual interactions, 31% of targets were down regulated following silencing of their predicted regulators.

Comparisons with other ceRNA prediction methods
To test whether Hermes predictions are uniquely enriched for down regulation in LINCS data, we used LINCS assays to test predictions by MuTaME [23] and cefinder [37]. Our results suggest that while Hermes significantly outperformed both methods ( Figure S2 is not downloadable. Ala et al. used MuTaME to predict DICER1-regulating ceRNAs [3], but only 4 genes that were predicted to interact with DICER1 were targeted in LINCS. Consequently, we chose to compare the three methods when predicting PTEN regulators and targets (predictions by MuTaME are bidirectional), but genome-wide comparisons were made between Hermes and cefinder only. The comparisons suggest that Hermes outperforms MuTaME and cefinder when predicting PTEN targets and regulators ( Figure S2), and it significantly outperforms cefinder on genome wide tests ( Figure S3).

Discussion
We proposed and re-implemented Hermes, a highly-selective context-specific method for predicting ceRNA interactions [5]. When analyzing TCGA profiles of prostate [38] and breast [39] adenocarcinomas Hermes inferred nearly 500K ceRNA interactions in each of the two tumor types.
In total, Hermes produced expression-based evidence for the regulation of over 5,000 genes by ceRNA interactions. Conclusions from perturbation assays that tested hundreds of these genes as potential targets are that half of them are dysregulated by targeting their Hermes-inferred ceRNA regulators. Put together, the results suggest that thousands of genes can be dysregulated be ceRNA interactions in each of the two contexts through exogenous perturbations or through genomic alterations that target their ceRNA regulators.

Conclusions
Computational evidence in conjunction with high-throughput biochemical assays, suggest that ceRNA regulation is the norm and not an exception in cancer cells. While ceRNA interactions can be easily detected and validated in extreme cases-as in MYCN-amplified neuroblastomas [6] or binding-site rich RNAs [26]-they affect the expression of thousands of genes and have the potential to synergistically dysregulate drivers of tumorigenesis in multiple tumor contexts.

Page | 13
Declarations Ethics approval and consent to participate. Not applicable.

Consent for publication. Not applicable
Availability of data and materials. Computational methods and analyses results are fully described here. Hermes2.0 can be downloaded free of charge from its Columbia University site http://califano.c2b2.columbia.edu/hermes. LINCS assays are included in supplementary tables and can be downloaded from the LINCS website at http://www.lincsproject.org/LINCS/data.

Competing interests.
The authors declare that they have no competing interests.      Hermes outperforms cefinder at P < 5E-07 and P < 2E-44, for MCF7 and PC3, respectively; p-values based on two-sample Kolmogorov-Smirnov tests of ceRNAtarget fold changes.

Page | 20
Supplementary Tables   Tables S1-S2. Inferred ceRNA networks. Predicted interactions in the two tumor contexts: Table S1 for BRCA, and Table S2 for PRAD. Each table describes