Construction of a lncRNA-mediated feed-forward loop network reveals global topological features and prognostic motifs in human cancers

Long non-coding RNAs (lncRNAs), transcription factors and microRNAs can form lncRNA-mediated feed-forward loops (L-FFLs), which are functional network motifs that regulate a wide range of biological processes, such as development and carcinogenesis. However, L-FFL network motifs have not been systematically identified, and their roles in human cancers are largely unknown. In this study, we computationally integrated data from multiple sources to construct a global L-FFL network for six types of human cancer and characterized the topological features of the network. Our approach revealed several dysregulated L-FFL motifs common across different cancers or specific to particular cancers. We also found that L-FFL motifs can take part in other types of regulatory networks, such as mRNA-mediated FFLs and ceRNA networks, and form the more complex networks in human cancers. In addition, survival analyses further indicated that L-FFL motifs could potentially serve as prognostic biomarkers. Collectively, this study elucidated the roles of L-FFL motifs in human cancers, which could be beneficial for understanding cancer pathogenesis and treatment.


INTRODUCTION
Long non-coding RNAs (lncRNAs, > 200 nucleotides in length) [1] are pervasive across the genome [2,3] and dysregulation of their expression is associated with many human diseases [4,5], including cancer [6]. The expression of lncRNA is regulated at the transcriptional level by transcription factors (TFs) [7] and at the post-transcriptional level by microRNAs (miRNAs) [8][9][10], leading to differential expression in different development and disease statuses. While both TFs and miRNAs regulate lncRNAs, TFs also regulate miRNAs [7]. This lncRNA-mediated feed-forward loop (L-FFL) is an important regulatory network motif underlying many biological processes, such as muscle cell differentiation [11] and cancer [12]. Indeed, many lncRNAs, TFs and miRNAs are dysregulated in various types of cancer [13,14]. However, there has been no large-scale attempt to identify L-FFL network motifs and their specific roles in human cancers.
Large-scale cancer genomics projects, such as the Cancer Genome Atlas (TCGA), have provided the global expression profiles of some lncRNAs, TFs and miRNAs in large samples [15,16]. In addition, several databases have been developed to facilitate the construction of L-FFL network motifs. For example, SNP@lincTFBS and ChiPBase identify TF-lncRNA interactions from ChIP-Seq data [17,18]. DIANA-LncBase and starBase v.2.0 provide experimentally verified miRNA-lncRNA interactions from Ago CLIP-seq data [19,20]. TransmiR manually collects experimentally supported TF-miRNA regulatory relationships from literature and publications [21]. However, these databases provide limited insight into the structure and function of the L-FFL network.

Research Paper
We hypothesized that the integration of large-scale cancer expression profiling data and an L-FFL network to identify cancer-associated L-FFL motifs may reveal novel regulatory mechanisms in human cancers as well as potential therapeutic targets. For this study, we developed a computational approach that integrates interaction and expression data of lncRNAs, TFs and miRNAs to identity dysregulated L-FFL motifs common across and specific to six types of human cancer. Survival analyses suggest that L-FFL motifs may could potentially serve as prognostic biomarkers.

The topological characteristics of the L-FFL network
We integrated multiple data sources to identify 623 L-FFL network motifs, each consisting of a TF, an miRNA and their common target lncRNA and then constructed a global L-FFL network containing 240 nodes (96 lncRNAs, 17 TFs and 127 miRNAs) and 878 edges ( Figure 1A). We found that this network approximated the scale-free network topology of a transcriptional regulatory network. The TFs held a larger degree than the lncRNAs and miRNAs and participated in more regulatory relationships ( Figure 1B). We also analyzed connectivity, topological coefficient, and clustering coefficient of nodes. All of these features followed a scale-free distribution ( Figures  1C, 1D, 1E), indicating that the L-FFL network behaves like a small-world phenomenon [22]. The neighborhood connectivity distribution provides the average of the neighborhood connectivity of lncRNAs, TFs, and miRNAs with k neighbors (k =0, 1…n). The constant decrease in the topological coefficient as the degree increases per lncRNA, TF, and miRNA indicates that the networks may have a hierarchical modularity. If the distribution decreases, then most of the edges in the network connect yellow, blue and red, respectively. Known disease-associated nodes are marked by black circles. B-E. The basic features of the network include degree, connectivity, topological coefficients, and clustering coefficients of lncRNA, miRNA and TF. F. Two L-FFL motifs were associated with lung cancer and breast cancer. www.impactjournals.com/oncotarget low-degree nodes with high-degree nodes, indicating that the network consists of sub-networks [23]. Some other networks have similar topological features. For example, the ceRNA network in cancer usually shows the features of multiple layers and being scale-free [24].
In addition, some nodes and edges in the L-FFL network were found to be associated with human cancer in previous studies. We identified six lncRNAs, 17 TFs and 127 miRNAs that were associated with different types of human cancers using data from lncRNADisease [25], HMDD [26] and the literature ( Figure 1A, Supplementary  Table S1). For example, the lncRNA JPX, the TFs MYC and E2F1 and the miRNA let-7 play crucial roles in human cancers [27,28]. More importantly, we found that some L-FFL motifs provide novel information on cancer regulation. For example, MYC is an onco-protein family comprised of c-myc, N-myc and L-myc, all of which contribute to pathogenesis in many human cancers. The lncRNA H19, which is regulated by c-myc, participates in embryonic development and tumorigenesis. Upregulation of H19 promotes cell proliferation correlates with poor prognosis in non-small-cell lung cancer [29]. MiR-29a/b/c, miR-106a/b and miR-19b are known diseaserelated miRNAs, especially in lung cancer [30][31][32]. Our L-FFL network is composed of L-FFL motifs in lung cancer, which may represent a new mechanism in cancer ( Figure 1F). As another example, the lncRNA JPX is an Xist activator, yet Xist expression is downregulated in breast cancer [33]. The TF MEF2C is expressed in normal mammary epithelial cells and in breast cancer cell lines [34]. MiR-193a/b, miR-145, and miR-197 were previously reported to promote breast cancer [35][36][37]. The TF MEF2C interacts with the lncRNA JPX, miR-193a/b, miR-145, and miR-197, and every transcript in these structures is associated with breast cancer ( Figure 1F), suggesting they may play critical roles in tumorigenesis in combination with an L-FFL motif.

Some L-FFL motifs are significantly dysregulated in cancer
To further understand the functional significance of L-FFL motifs in human cancer, we identified the significantly dysregulated L-FFL network motifs for different cancer types in the L-FFL network (see Materials and Methods). Based on these motifs, we constructed significantly dysregulated L-FFL sub-networks for each cancer type (  with eight lncRNAs, six TFs and 13 miRNAs in breast cancer ( Figure 2B); 14 L-FFL motifs with seven lncRNAs, four TFs and eight miRNAs in kidney cancer ( Figure 2C); 11 L-FFL motifs with seven lncRNAs, five TFs and eight miRNAs in lung adenocarcinoma ( Figure 2D); 15 L-FFL motifs with eight lncRNAs, five TFs, and 13 miRNAs in lung squamous cell carcinoma ( Figure 2E); and 12 L-FFL motifs with seven lncRNAs, four TFs and seven miRNAs in uterine corpus endometrioid carcinoma ( Figure 2F).
Analyzing these sub-networks, we found that some nodes and edges were present in multiple types of cancer, suggesting they play important roles in human carcinogenesis. To investigate these nodes, we first constructed a highly dysregulated L-FFL sub-network consisting of all of the dysregulated motifs for all six types of cancer. This network included 21 lncRNAs, nine TFs and 34 miRNAs ( Figure 2G). The nodes with larger degrees were usually associated with more types of cancer and tended to be network hubs, indicating they might promote pan-cancer development. For example, the TF E2F1, the lncRNA H19 and the miRNA miR-106 had high degrees and were dysregulated in many types of cancer. Indeed, these transcripts are known to promote cancer development [38,39]. Furthermore, we performed an enrichment analysis to investigate the function of this sub-network using all of the TFs [40]. These TFs were enriched in several cancer-related GO terms and pathways, such as the cancer pathway (Supplementary Figure S1). Significant GO terms included transcriptional regulation, transcription factor activity and positive regulation of RNA metabolic processes (Supplementary Table S3).

Common and specific dysregulated L-FFL motifs across cancer types
Here, L-FFL motifs dysregulated in at least two cancer types are designated as "common", while those dysregulated in only one cancer type are considered "specific". In total, 48 L-FFL motifs were identified as cancer specific, and 13 L-FFL motifs were common between different cancer types ( Figure 3A). Some common L-FFL motifs were dysregulated in at least three types of cancer ( Figure 3B, Supplementary Table  S4). For example, an L-FFL motif consisting of the TF E2F1, the miRNA miR-15b and the lncRNA IQCH-AS1 was dysregulated in breast cancer, lung squamous cell carcinomas, and lung adenocarcinoma. Previous studies have demonstrated that E2F1 can regulate the expression of genes in the cell cycle and act as a tumor suppressor in many types of cancer [41]. Also, miR-15b plays a critical role in many types of cancer and miR-15 families known as oncomiRs have tumor suppressor or oncogene functions [42]. We found that the lncRNA IQCH-AS1 interacts with E2F1 and miR-15b, suggesting it might be implicated in cancer regulation through the L-FFL network. Thus, the L-FFL might underlie cancer development processes common to many types of cancer. In addition, we also found that the L-FFL motif comprised of E2F1, miR-195 and SNHG12, and the L-FFL motif comprised of E2F1, miR-106b and KB-1732A1.1, were dysregulated in three types of cancers ( Figure 3B). E2F1 appeared many times in common L-FFL motifs and therefore might play different roles in different types of cancers by interacting with different lncRNAs and miRNAs [38]. On the other hand, most motifs were specific ( Figure 3C, Supplementary Table S4). For example, the MYC, miR-98 and LINC00665 motif was dysregulated only in breast cancer, while the MEF2C, miR-145 and JPX motif was dysregulated only in bladder cancer.

L-FFL motifs as prognostic biomarkers for cancers
We tested whether the expression of each L-FFL motif correlated with cancer survival (see Materials and Methods). Indeed, the expression of some L-FFL motifs correlated with survival in all types of cancer tested, excluding lung squamous cell carcinomas ( Figure  4, Supplementary  [43,44]. Also, the MEF2C, miR-145 and JPX correlated with the survival of bladder cancer patients (P=0.026). In addition, the MYC, miR-429 and MAPKAPK5-AS1 motif correlated with kidney cancer patient survival (P=0.065); the E2F1, miR-106b and ZNF718 motif correlated with lung adenocarcinoma patient survival (P=0.045); and, finally, two motifs correlated with endometrial cancer patient survival (P=0.019 and P=0.002). Importantly, survival analyses using only one type of transcript had no prognostic power. For example, the MYC, miR-429 and MAPKAPK5-AS1 motif did not correlate with patient survival when analyzing each of its components separately. These results suggest that L-FFL motifs may serve as prognostic biomarkers for different cancers.

L-FFL motifs may participate in complex biologic network regulation
We investigated whether there is cross-talk between the L-FFL motif and other types of network motifs in human cancer. First, we identified dysregulated mRNAmediated FFL (M-FFL) motifs in different types of cancers by following the same pipeline as with the L-FFL motif identification. Then, we generated dysregulated motifs that included the TFs and miRNAs shared by both the L-FFL and the M-FFL motifs (L-M-FFL motif). For example, we found that an L-FFL motif constituted by SNHG12, E2F1, and miR-16, and an M-FFL motif constituted by E2F1, miR-16 and AURKB could form a www.impactjournals.com/oncotarget complex regulation motif by sharing common TF E2F1 and miRNA miR-16, which were highly dysregulated in breast and bladder cancers ( Figure 5A, Supplementary  Table S6). In addition, some L-M-FFL motifs were dysregulated in only one type of cancer and could be regarded as cancer-specific motifs. For example, an L-M-FFL motif comprised of KB-1732A1.1, MYC, miR-93, and CCND1 was only dysregulated in lung squamous cell carcinomas ( Figure 5B), and an L-M-FFL motif including LINC00662, MYC, miR-34a and VEGF was only dysregulated in kidney cancer ( Figure 5C). Interestingly, we found cases of L-M-FFL motifs for which only the L-FFL motif exhibited functional relevance. For example, an L-FFL motif constituted by E2F1, miR-15b, and IQCH-AS1, and an M-FFL motif made of E2F1, miR-15b and BCL2 formed a complex L-M-FFL motif in uterine corpus endometrioid carcinoma, but only the L-FFL motif correlated with patient survival (P=0.002, Figure 5D). The same result was also found in lung adenocarcinoma cancer ( Figure 5E). These results indicate that although different types of motifs participate in cross-talk within the network, only parts of the network motif have a specific function.
We also obtained ceRNA motifs associated with cancer from the LncACTdb database [16]. We found that some dysregulated L-FFL and ceRNA motifs shared common miRNAs and lncRNAs in some types of cancers. For example, an L-FFL motif comprised of the lncRNA H19, the TF MYC and the miRNA miR-29c, and an ceRNA motif comprised of the mRNA COL3A1, the lncRNA H19 and the miRNA miR-29c, were both dysregulated in breast cancer ( Figure 5F). In this case, the L-FFL and ceRNA motifs might exhibit complex regulatory functions and cross-talk in cancer.

DISCUSSION
The cross-talk between different types of regulatory transcripts (e.g., mRNAs, lncRNAs, TFs and miRNAs) forms complex network motifs that may underlie carcinogenesis in some human cancers. In this study, we focused on a novel network motif, the L-FFL motif, and developed a computational approach to study it by integrating interaction and expression data of lncRNA, mRNAs and miRNAs. To validate our approach, we performed the same analyses using an independent microarray dataset from Gene Expression Omnibus (GEO, GSE36295) that included 45 breast cancer samples and eight normal samples. We obtained the TF, miRNA and lncRNA expression data for these samples based on the re-annotation of the Affymetrix Human Gene 1.0 ST Array and identified six dysregulated L-FFL motifs. There were three common dysregulated L-FFL motifs between GEO microarray and TCGA RNA-seq data, and TFs, miRNAs and lncRNAs were differentially expressed between breast cancer and normal samples ( Figure 6).
In this study, we also identified dysregulated L-FFL motifs that were common and specific to several cancers, consistent with previous studies on the tissue specificity of miRNA and lncRNA [45]. Tissue-specific L-FFL motifs could aid the development of drugs that target specific cancer tissues while minimizing side effects. Our functional and survival analyses suggest that L-FFL motifs may serve as prognostic biomarkers for cancer. Similar to previous studies highlighting the use of lncRNAs as cancer biomarkers [46], we found that the expression of lncRNAs that participate in L-FFL motifs correlates with prognosis.
There are other types of L-FFL motifs different from the ones we focused on in this study, such as those in which miRNAs and lncRNAs regulate TFs or lncRNAs regulate miRNAs [47]. Under different mechanisms and conditions, TFs can activate or repress the expression of target lncRNAs and miRNAs in the L-FFL motifs [21]. These alternate L-FFL motifs have also been shown to participate in the development of several types of cancer [48,49], and can also be constructed with the help of databases [50]. Further analyses of these types of motifs and their complex regulatory patterns could also reveal novel mechanism underlying carcinogenesis. In summary, our study provides novel insights into the mechanisms underlying the function of lncRNAs in cancer. As shown here, the analysis of the L-FFL network can help to understand the relevance of multi-level cross-talk regulation in cancer by revealing functional motifs that are common across or specific to different cancer types.

Expression data of lncRNA, TF and miRNA
We obtained the data from six types of cancers and normal expression data of lncRNAs, TFs and miRNAs as previously described [16]. Briefly, we downloaded the raw read counts for each exon from the TCGA level 3 dataset. Then, we mapped these exons to the annotation of human TFs and lncRNAs that was derived from GENCODE [51]. We recalculated the reads per kilobases per million reads (RPKM) values for the TFs and lncRNAs, leading to expression data of human TF and lncRNA. The miRNA sequencing data (Illumina HiSeq miRNASeq) for six types of cancer were downloaded from TCGA (level 3) [52]. Cancer samples with clinical follow-up information were retained for further analysis. The cancer name abbreviations of TCGA and the number of cancer and normal samples are listed in Supplementary Table S7.

Constructing a global L-FFL network
L-FFL motifs consist of a TF, an miRNA and their common target lncRNA, in which the TF regulates the expression of the miRNA, and both the TF and the miRNA regulate a common set of target lncRNAs [7]. To construct a global L-FFL network, three types of regulatory interaction are needed: miRNA-lncRNA, TF-lncRNA and TF-miRNA. The regulatory interactions chosen in this study were downloaded from starBase v2.0 [20], which provides comprehensive CLIP-Seq experimentally supported miRNA-lncRNA interaction data. The TF-lncRNA regulatory interactions were downloaded from SNP@lincTFBS [18], which identifies transcription factor binding sites (TFBSs) of lncRNA using genomewide ChIP-Seq data. Finally, the TF-miRNA regulatory interactions were downloaded from TransmiR [21], which manually collects experimentally supported TF-miRNA regulatory relationships from literature and publications.

Topological measurements of the L-FFL network
We investigated several common topological measurements to reveal the characteristics of the L-FFL network. For the whole L-FFL network, we analyzed the degrees, connectivity, topological coefficients, and clustering coefficients of nodes (lncRNA, TF and miRNA). The box plots show the differential expression of each L-FFL motif, and TFs, miRNAs and lncRNAs are colored red, blue and yellow, respectively.

Identifying L-FFLs dysregulated in cancer
Using both the L-FFL network and expression profiling data, we developed an integrative pipeline to detect L-FFL motifs dysregulated in different types of cancer (Supplementary Figure S2). First, for each L-FFL motif, we computed t-tests or paired t-tests and P-values to measure the differences in the expression levels of lncRNAs, TFs and miRNAs between cancer and normal samples. For each miRNA-lncRNA, TF-lncRNA and TF-miRNA regulatory interaction in each L-FFL motif, we calculated Pearson correlation coefficients (PCCs) for the cancer and normal samples, as well as the difference between them. We used the absolute difference between cancer and normal PCCs to represent association between interactions. We integrated the differential expression P-value and PCCs to calculate two comprehensive scores (score diff and score cor ) for L-FFL as follows (Supplementary Figure S3): where p t , p m , and p l are the differential expression P-values of TF, miRNA and lncRNA, respectively, in each L-FFL, and score diff corresponds to the difference in expression of an L-FFL motif between the cancer and normal samples. C tm , C tl , and C ml are the PCCs for the TF and miRNA, TF and lncRNA, and miRNA and lncRNA pairs, respectively, in the cancer samples, while N tm , N tl , and N ml are the PCCs for the TF and miRNA, TF and lncRNA, and miRNA and lncRNA pairs, respectively, in the normal samples. Score cor corresponds to the absolute difference in the correlation level of an entire L-FFL motif between the cancer and normal samples. Second, we ranked all of the L-FFL motifs for each cancer type using score diff and score cor based on an equally-weighted, multidimensional ranking method [53]. After ranking by each score, we computed a final ranking score for each L-FFL motif by integrating its two rank positions in a two-rank list. A higher ranking corresponds to larger dysregulation in cancer. Third, we measured the significance of each L-FFL motif's ranking by comparing its final score with that of L-FFL motifs calculated by randomly permuting the sample labels for expression profiling. With 1000 permutations, we referred to previous studies and generated dysregulated L-FFL motifs for each type of cancer based on the permutation P-value (P < 0.05) [47].

Survival analysis
For each dysregulated L-FFL motif in each type of cancer, a Cox multiple regression analysis was used to evaluate the association between expression and cancer survival. The corresponding regression coefficient of each lncRNA, miRNA and TF in an L-FFL motif was used as a weighting coefficient to produce an integrated score. We used the integrated score to classify the samples into high-risk and low-risk groups. Then, a Kaplan-Meier survival analysis was performed for the two clustered groups, and statistical significance was assessed using the log-rank test. All of the analyses were performed within the R 2.15.3 framework.

Gene set enrichment analysis
Gene Ontology (GO) enrichment and KEGG pathways enrichment analyses were performed by the DAVID functional annotation web server using default parameters [40]. We obtained enriched GO terms (FDR < 0.05) and KEGG pathways (P < 0.1).

CONFLICTS OF INTEREST
No conflicts of interest to disclose.