Integration analysis of long non-coding RNA (lncRNA) role in tumorigenesis of colon adenocarcinoma

Colon adenocarcinoma (COAD) is one of the most common gastrointestinal cancers globally. Molecular aberrations of tumor suppressors and/or oncogenes are the main contributors to tumorigenesis. However, the exact underlying mechanisms of COAD pathogenesis are clearly not known yet. In this regard, there is an urgent need to indicate promising potential diagnostic and prognostic biomarkers in COAD patients. In the current study, level 3 RNA-Seq and miR-Seq data and corresponding clinical data of colon adenocarcinoma (COAD) were retrieved from the TCGA database. The “limma” package in R software was utilized to indicate the differentially expressed genes. For in silico functional analysis, GO and KEGG signaling pathways were conducted. PPI network was constructed based on the STRING online database by Cytoscape 3.7.2. A ceRNA network was also constructed by “GDCRNATools” package in R software. Kaplan-Meier survival analysis (log-rank test) and ROC curve analysis were used to indicate the diagnostic and prognostic values of the biomarkers. The differential expression data demonstrated that 2995 mRNAs, 205 lncRNAs, and 345 miRNAs were differentially expressed in COAD. The GO and KEGG pathway analysis indicated that the differentially expressed mRNAs were primarily enriched in canonical processes in cancer. The PPI network showed that the CDKN2A, CCND1, MYC, E2F, CDK4, BRCA2, CDC25B, and CDKN1A proteins were the critical hubs. In addition, the Kaplan-Meier analysis revealed that 215 mRNAs, 14 lncRNAs, and 39 miRNAs were associated with overall survival time in the patients. Also, the ceRNA network data demonstrated that three lncRNAs including MIR17HG, H19, SNHG1, KCNQ1OT1, MALAT1, GAS5, SNHG20, OR2A1-AS1, and MAGI2-AS3 genes were involved in the development of COAD. Our data suggested several promising lncRNAs in the diagnosis and prognosis of patients with COAD.


Background
Colon adenocarcinoma (COAD) is one of the most common gastrointestinal (GI) cancers and is the second leading cause of cancer-related death, globally [1,2]. It is demonstrated that COAD occurs in approximately 5% of overall population at any given time in the world [3]. Despite the current screenings and therapies such as endoscopic resection and radical surgery, nearly half of the patients are diagnosed as advanced cases of COAD, experiencing tumor recurrence and relapse. COAD tumorigenesis has complicated multi-step processes including colon epithelial cell proliferation, aberration in differentiation, apoptosis resistance, survival, and invasion mechanisms [4]. Molecular aberrations of tumor suppressors and/or oncogenes are also one of the main contributors in different types of tumors especially COAD tumorigenesis [5]. However, due to complicacy of the underlying molecular pathways, the exact pathogenic contributors of COAD have not yet been clarified. Hence, there is an urgent need to indicate promising diagnostic and prognostic biomarkers for COAD. Recent investigations have highlighted the role of non-coding RNAs in the tumorigenesis of various malignancies. Among different kinds of non-coding RNAs, long noncoding RNA (lncRNA) is a putative class of non-coding RNA with more than 200 nucleotides in length, without any open-reading-frame (ORF) to encode proteins [5,6]. Interestingly, a large body of evidence indicates that lncRNAs plays critical roles in a variety of biological processes including cell proliferation, cellular development, differentiation, carcinogenesis, and metastasis through modulating gene expression at the transcriptional and posttranscriptional levels directly or by recruiting chromatin remodeling factors [6][7][8]. Aberrant expression of lncRNAs has been well-documented in different sorts of cancers [9]. Dysregulation of lncRNA HOTAIR, H19, MALAT1, SNHG7, GAS8-AS, and NEAT1 were extensively well-studied and have been demonstrated to contribute in tumorigenesis and poor prognosis [5,[9][10][11][12][13]. Numerous investigations have shown that the lncRNAs can exert their function by competing endogenous RNA (ceRNA) crosstalk. For instance, it has been shown that lncRNA SCARNA2 was overexpressed in COAD tissues and it remarkably correlated with chemoresistance. Mechanistically, SCARNA2 via targeting miR-342-3p, upregulates EGFR and BCL2 expression in COAD cells [14]. Furthermore, overexpression of lncRNA SNHG1 has been shown to promote epithelial-mesenchymal transition (EMT) by binding to miR-497/miR-195-5p in COAD cells [15]. Also, lncRNA BDNF-AS was downregulated in COAD patients and served as a tumor suppressor gene. Unsurprisingly, ectopic expression of BDNF-AS suppressed cell proliferation and migration via epigenetically downregulating GSK-3β expression through EZH2 [16]. Moreover, several investigations have considered lncRNAs as therapeutic opportunities in COAD. For instance, it has been NA Not Applicable demonstrated that overexpression of LINC00152 can promote Fascin actin-bundling protein 1 (FSCN1) expression via sponging miR-632 and miR-185-3p, which consequently leads to proliferation and metastasis in COAD [17]. A recent study has demonstrated that targeting lncRNA FLANC by 1,2-dioleoyl-sn-glycero-3phosphatidylcholine nanoparticles loaded with a specific small interfering RNA, decreased metastasis without any significant toxicity. They proposed that FLANC may act as a novel therapeutic strategy in COAD [18]. Additionally, many researches have suggested the potency of lncRNAs as biomarkers in the blood and serum. They suggested microvesicles and exosomes as carriers, being protected and stabilized in circulation [19]. In the current study, we comprehensively investigate lncRNAs, miRNAs, and mRNAs expressions from a public database, "Cancer Genome Atlas (TCGA)" and we constructed a ceRNA network in COAD. Also, we proposed novel potential biomarkers for COAD.

Sample and data collection
Clinical data of COAD were retrieved from the TCGA database (https://portal.gdc.cancer.gov/repository). The inclusion criteria were: (1) the histopathological diagnosis was COAD; (2) having complete demographic data including age, vital status, race, ethnicity, pathological stage, TNM classification, and overall survival time.   Table 1.

RNA-Seq and miR-Seq data analysis
RNA-Seq and miR-Seq Level 3 data were collected from the TCGA database. The raw count of the reads of RNA-Seq and miR-Seq data was normalized by Voom and TMM normalization methods. All the analyses were conducted in R software. The "limma" package in R software was utilized to indicate the differentially expressed mRNAs (DEmRNAs), lncRNAs (DElncRNAs), and miR-NAs (DEmiRNAs) between normal solid tissues and primary tumors. The concluded data were filtered based on the |log2 fold change (FC)| > 1 for DEmRNA, DElncRNA, and DEmiRNA. P-value < 0.05 and false discovery rate (FDR) < 0.05 were considered as significant thresholds.

Functional enrichment analysis and protein-protein interaction (PPI) network
For in silico functional enrichment analysis, gene ontology (GO) in three domains including biological processes, cellular components, and molecular functions, in addition to Kyoto Encyclopedia of Genes and Genomes (KEGG) signaling pathways were conducted. The GO and KEGG outputs were visualized by R software (ggplot2 package). The PPI network was constructed based on the STRING online database by Cytoscape 3.7.2. Molecular Complex Detection (MCODE) was used to analyze and predict the interactions (score value > 0.4).

Statistical analysis
All the differentially expressed data were analyzed by using R software (3.5.2) through the "GDCRNATools" package. Kaplan-Meier survival analysis (log-rank test) was used to indicate the relation between over or downregulation of RNA, based on median expression with patient's survival time. ROC curve analysis, univariate, and multivariate Cox regression analysis were conducted by SPSS v21. P-value < 0.05 was considered as a significant threshold.

Differentially expressed genes
Our data demonstrated that 2995 mRNAs (1094 upregulated and 1901 down-regulated) were differentially expressed in COAD. Moreover, 205 lncRNAs (128 upregulated and 77 down-regulated) were identified that were deferentially expressed in patients. Three hundred and forty-five miRNAs containing 183 up-regulated and 162 down-regulated have been found with differential expression in the COAD samples. The data are presented in Figs. 1, 2 and Tables 2, 3.

GO enrichment and KEGG pathway analysis
GO enrichment analysis demonstrated that the differentially expressed mRNAs were enriched in different biological processes such as leukocyte migration, extracellular matrix organization, T cell activation, mitotic nuclear division, and adaptive immune response. Furthermore, GO analysis in cellular component revealed that the differentially expressed mRNAs predominantly contributed to collagen-containing extracellular matrix, basement membrane, microvillus, apical part of cell, and external side of plasma membrane. GO molecular function domain indicated that the genes were mainly enriched in glycosaminoglycan binding, heparin binding, sulfur compound binding, extracellular matrix structural constituent, and cytokine activity. GO outputs are presented in Fig. 3. In addition, KEGG pathway analysis indicated that the differentially expressed genes in the COAD patients remarkably participated in pathways involving in cancer, cell cycle, PPAR signaling pathway, PI3K-Akt signaling pathway, Wnt signaling pathway, and p53 signaling pathway (Fig. 4).

PPI network construction
The PPI network was constructed based on the STRING database to better understand the roles of the differentially expressed mRNAs. The data demonstrated that CDKN2A, CCND1, MYC, E2F, CDK4, BRCA2, CDC25B, and CDKN1A were the protein-protein interaction (PPI) critical hubs (Fig. 5).

Kaplan-Meier survival analysis of differentially expressed genes
Kaplan-Meier survival analysis was used to indicate the association of differentially expressed mRNAs, lncRNAs, miRNA, and prognosis of COAD patients. The data showed that 215 mRNAs, 14 lncRNAs, and 39 miRNAs  Table 4.

Diagnostic analysis of differentially expressed lncRNAs
AUC analysis was conducted to demonstrate the diagnostic value of each lncRNAs in the COAD samples. All 205 differentially expressed lncRNAs indicated significant diagnostic values. The top 50 hits of the lncRNAs are summarized in Table 5.

Novel lncRNA biomarkers
After merging the overall survival, and the diagnostic value data, we found that 14 lncRNAs had high ranks in prognostic and diagnostic areas which can be considered as COAD biomarkers. The data are presented in Table 6. Kaplan-Meier and ROC curve analysis were conducted for the top three lncRNAs (AC087388.1, SLC16A1-AS1, and ELFN1-AS1) from aforementioned analysis shown in Fig. 6. Moreover, univariate and multivariate analysis were conducted to demonstrate the power of the lncRNAs and to diminish the covariate effects. Univariate and multivariate analysis are summarized in Table 7.
LncRNA-miRNA-mRNA ceRNA network construction According to ceRNA hypothesis, which implicates that lncRNAs regulate mRNA expression level by competing

Discussion
LncRNAs regulate critical and canonical biological functions in different types of normal human cells and in a variety of tumor cells [20]. An escalating number of investigations have reported the function of lncRNAs in tumor proliferation, cell invasion and migration, chemotherapy resistance, and stemness capability in tumorigenesis and progression of COAD [21][22][23]. However, the exact underlying mechanisms of lncRNAs in progression of COAD are still unclear. So far, several different biological regulatory functions have been proposed for lncRNAs. Some previous studies have demonstrated that lncRNAs regulate mRNA expression via binding and sponging miRNA known as competing endogenous RNA theory, which generates a new aspect in the lncRNA regulatory mechanism [24,25]. To the best of our knowledge, only a few investigations have displayed ceRNA networks between lncRNAs and miRNAs in COAD. Thus, a clear image of lncRNAs-miRNAs links still remains uncharacterized. In the current study, we studied the differentially expressed genes including lncRNAs, miRNAs, and mRNAs in the COAD patients based on TCGA database. Gene set enrichment by GO and KEGG signaling pathway identified the differentially expressed genes which were significantly enriched in cell proliferation, differentiation, protein phosphorylation, and signaling pathways. Furthermore, KEGG signaling pathway analysis demonstrated several canonical signaling pathways including Wnt, PI3K/Akt and PPAR signaling pathways that have been shown to contribute in tumor progression [26,27]. A mounting of evidence has emphasized on Wnt/ β-catenin signaling pathway, promoting tumor growth, invasion and metastasis, and chemoresistance in COAD [28,29]. For instance, it has been demonstrated that lncRNA H19 overexpression induces the EMT of colorectal cancer (CRC) cells by sponging miR-29b-3p to directly upregulate PGRN and activate Wnt axis [30]. Moreover, the up-regulation of lncRNA colorectal cancer-associated lncRNA (CCAL) promotes CRC progression through suppressing the activator protein 2α (AP-2α) to initiate Wnt/β-catenin signaling pathway [31]. In the present study, the KEGG analysis indicated that the peroxisome proliferatoractivated receptor (PPAR) pathway contributes in Wnt signaling. It has been shown that the PPAR signaling pathway reduces cell proliferation and inhibits tumorigenesis in different types of cancers. Down-regulation of PPAR-α has been correlated with poor clinicopathological features of CRC that was remarkably higher in well to moderately differentiated adenocarcinoma than in mucinous adenocarcinoma [32]. In addition, lncRNA TINCR modulates PPAR signaling pathway through binding to miR-107 to up-regulate CD36 in CRC [33]. Recently, the PPAR aberration expression and its prime   [34]. It has been shown that PI3K/Akt signaling pathway had prominent roles in carcinogenesis of a variety of cancers particularly COAD. LncRNA AB073614 can take under control CRC growth and invasion by PI3K/Akt signaling pathway [35]. In addition, lncRNA SNHG7 elevated GALNT7 level and induced PI3K/Akt/mTOR pathway by sponging miR-34a in CRC cells [36]. Our ceRNA network data demonstrated important lncRNAs including MIR17HG, H19, SNHG1, KCNQ1OT1, MALA T1, GAS5, SNHG20, OR2A1-AS1, and MAGI2-AS3 which previously have been highlighted in the development of COAD. LncRNA MAGI2-AS3 have been discovered to play a crucial role as a tumor suppressor in breast cancer by targeting Fas/FasL in tumor cells [37]. Moreover, MAGI2-AS3 hampers hepatocellular carcinoma cell growth and its invasion through sponging miR-374b-5p to up-regulate SMG1 axis [38]. On the other hand, overexpression of MAGI2-AS3 has been explained to promote tumor progression by absorbing miR-141/ 200a and consequently, up-regulating ZEB1 which is an EMT promoting transcription factor, in gastric cancer cells [39]. MAGI2-AS3 up-regulation has also been shown to induce CRC proliferation and migration by modulating miR-3163 through upregulating TMEM106B [40].
LncRNA SNHG1 is a prominent lncRNA that is involved in a variety of cancers. SNHG1 expression is associated with unfavorable overall survival and tumor recurrence in patients with COAD. Moreover, SNHG1 promote cell growth and cell migration via upregulating EZH2 and miR-154a-5p in COAD [41]. LncRNA KCNQ1OT1 can promote EMT by decreasing miRNA-217 expression to upregulate ZEB1 axis in COAD [42]. Furthermore, KCNQ1OT1 has been demonstrated to promote chemoresistance of oxaliplatin by iR-34a/ATG4B pathway and it is associated with poor prognosis in COAD [43]. A previous study showed that lncRNA MALAT1 was remarkably upregulated in COAD cells. MALAT1 can promote metastasis of COAD via RUNX2 as a survival factor in tumor cells [44]. MALAT1 evokes EMT and angiogenesis via sponging miR-1265p to upregulate VEGFA, SLUG, and TWIST [45]. Several investigations demonstrated that lncRNA GAS5 can act as a tumor suppressor gene by different actions. It has been illustrated that GAS5   inhibited angiogenesis and metastasis via regulating Wnt signaling pathway in COAD cells [46]. Finally, lncRNA SNHG20 has been reported overexpressed prominently in CRC tissues in comparison to normal ones. Overexpression of SNHG20 was correlated with poor prognosis in the patients [47]. Although, there are several similar studies, the novelties of the current study include; an extensive exploration of lncRNA, mRNA and miRNA signatures, revealing the diagnostic and prognostic value of lncRNA, and constructing a COAD lncRNA-miRNA-mRNA ceRNA network. Hence, our data elucidated that, the suggested lncRNAs can be considered as potential promising biomarkers, which could drive tumorigenesis through hijacking canonical signaling pathways in COAD.

Conclusions
Our data highlighted the importance of lncRNA regulatory networks that might provide a promising therapeutic approach for clinical application by considering lncRNA hubs as potential efficient biomarkers.