Construction of competing endogenous RNA network and identification of novel molecular biomarkers in colon cancer

Abstract Colon cancer patients suffer from high incidence and mortality rates worldwide. More novel molecular biomarkers should be used for the diagnosis and treatment of colon cancer. Long noncoding RNAs (lncRNAs) are found to be involved in colon cancer tumorigenesis and metastasis. This study aimed to identify novel lncRNAs in colon cancer. Two independent datasets (GSE70880 and GSE110715) were downloaded from the Gene Expression Omnibus database and merged with the sva package. R software was used to distinguish differentially expressed lncRNAs and mRNAs in the merged dataset. The competing endogenous RNA (ceRNA) network was constructed using differentially expressed lncRNAs and mRNAs with Cytoscape. Differentially expressed RNAs in the ceRNA network were further verified using the Cancer Genome Atlas database. Gene oncology analysis, Kyoto Encyclopedia of Genes and Genomes pathway enrichment and survival analysis were also performed to identify hub genes. A total of 99 differentially expressed lncRNAs and 95 differentially expressed mRNAs were identified in the merged database. Ten lncRNAs, 8 miRNAs, and 6 mRNAs were involved in the ceRNA network, in which LINC00114 and UCA1 were highly expressed in colon cancer. They were both associated with early tumor stages and might be used for the early diagnosis of colon cancer. High expression of LINC00114 can lead to poor overall survival of colon cancer patients. Furthermore, new pathways such as LINC00114/miR-107/PCKS5, UCA1/miR-107/PCKS5, and UCA1/miR-129-5p/SEMA6A were identified. Two novel lncRNAs (LINC00114 and UCA1) in colon cancer were identified by bioinformatics analysis. They might contribute to the occurrence and development of colon cancer. In addition, LINC00114 may be involved in the overall survival of colon cancer patients.


Introduction
Colon cancer is one of the most common malignant gastrointestinal tumors. Worldwide the incidence of colon cancer was in the third place lower than lung and breast cancer. The mortality rate of colon cancer was in the second place lower than that of lung cancer. [1] The incidence (11.13%) and mortality (8.15%) rate of colon cancer were both in fifth place in China. [2] Colon cancer patients are divided into different stages according to the tumor/node/metastases system. Most patients are found in advanced stages because of their asymptomatic nature and usually have a poor prognosis. [3] Although molecular biomarkers such as carcinoembryonic antigen (CEA) and carbohydrate 19-9 (CA19.9) are associated with early diagnosis and prognostic assessment of colon cancer, they are not currently used in clinical practice as they are unreliable. [4] Therefore, novel biomarkers need to be identified and applied to the diagnosis or treatment of colon cancer.
Researchers have focused on identifying DNA, RNA, or proteins as molecular biomarkers to detect cancers. Recently, studies have investigated the regulation of non-coding RNAs, which may have an impact on development of cancers. [5] LncRNAs, which contain more than 200 nucleotides belong to noncoding RNA families. They have no protein-coding ability and were once considered to be transcriptional noise. [6,7] The aberrant expression of lncRNAs in cancers has been demonstrated to influence apoptosis, proliferation, transfer, and migration of cancer cells by regulating DNA, RNA, protein expression and interaction between them. [8] In addition, mounting evidence indicates that lncRNAs are implicated in a variety of cancer biological processes, including chromatin interaction, transcription regulation, and so on. [9,10] Generally, lncRNAs exert their function through regulating underlying target genes expression at the epigenetic, transcriptional, and posttranscriptional levels. [11] LncRNAs can competitively bind with miRNA to regulate mRNA expression which is considered as ceRNA theory. CeRNA plays an important role in the occurrence and development of cancer. [12] Qiong Wu et al found that lncRNA MALAT1 regulated HMGB1 expression by sponging miR-129-5p to induce colon cancer development. [13] Changwei Lin et al found that LINC01234 upregulated SHMT2 expression by competitively binding to miR-642a-5p to promote colon cancer proliferation. [14] GSE70880 and GSE110715 were downloaded from the Gene Expression Omnibus (GEO) database in our study. After gene reannotation and batch normalization, we performed gene expression analysis to identify differentially expressed genes (DEGs). Then the ceRNA network was constructed using differentially expressed lncRNAs and mRNAs. The results were validated using the Cancer Genome Atlas (TCGA) database. Eventually we find that LINC00114 and UCA1 are upregulated in colon cancer tissues compared to normal tissues. High expression of LINC00114 leads to improved overall survival (OS) in colon cancer patients. The expression of LINC00114 is higher in stage I and UCA1 is lower, indicating that they can be used for early colon cancer diagnosis. In addition, LINC00114/ miR-107/PCSK5, UCA1/miR-107/PCKS5, and UCA1/miR-129-5p/SEMA6A axes may play an important role in colon cancer carcinogenesis and require further research.

Gene expression microarray datasets
We downloaded the RNA expression profiles from the GEO database. Datasets that met these criteria were considered eligible: 1. Studies focused on colon cancer patients; 2. The technology and platform information were available for studies; 3. Studies contained colon cancer tissues and adjacent nontumorous normal tissues.
Two datasets GSE70880 (including 20 pairs of colon cancer samples and adjacent non-tumor samples from platform GPL19748) and GSE110715 (including 6 pairs of colon cancer samples and adjacent non-tumor samples from platform GPL18180) were included for further study. The details of the two datasets are shown in Table 1.

Differentially expressed genes
We merged two datasets into one that included 26 pairs of colon cancer samples and adjacent non-tumor samples. To eliminate bias between the two datasets, we used sva packages with R software (version 3.6.2) to batch-normalize the merged dataset. We then used the limma package to identify DEGs. Adjusted Pvalue < .05 and log j fold change (FC) j > 1 were the selection criteria. DEGs that met the criteria were considered statistically significant. The results were visualized by volcano plots and heatmaps using R software.  Table 2 The clinical information of colon cancer patients in TCGA database.

Ethics and dissemination
The study protocol was approved by the Ethics Committee of Nanjing Jiangbei People's Hospital. All data were downloaded from public databases. No patients were involved in our study and there was no written informed consent.

Gene expression profile from GEO database
Two gene expression profiles (GSE70880 and GSE110715) and their platform information were downloaded from the GEO database. There were 505 downregulated genes and 379 upregulated genes in GSE70880 (Fig. 1A). GSE110715 contained 299 downregulated genes and 199 upregulated genes (Fig. 1B). We merged these two datasets into one. After batch normalization, there were 111 downregulated genes and 83 upregulated genes in the merged dataset (Fig. 1C). The names of these DEGs are listed in Table 3. The merged dataset contained 99 differentially expressed lncRNAs (up: 53, down: 46, Fig. 2A) and 95 differentially expressed mRNAs (up: 30, down: 65, Fig. 2B).

CeRNA network
Ten lncRNAs, eight miRNAs, and six mRNAs were involved in the ceRNA network (Fig. 3). There were 24 nodes and 39 lines in the ceRNA network. Six miRNAs interact with lncRNA ADAMTS9-AS1, indicating that it might play an important role in the ceRNA network.

Go terms and KEGG pathway analysis and PPI
We performed GO terms and KEGG pathway analysis to determine the potential functions of mRNAs included in the ceRNA network. We found that biological processes were associated with metabolic processes, multicellular organismal processes and biological regulation. Cellular components gathered in the membrane, extracellular space and membrane-enclosed lumen. Molecular functions were associated with protein binding. All these results are shown in Figure 4A. KEGG pathway analysis showed that mRNAs were mainly involved in dipeptide transmembrane transport and dipeptide transport (Fig. 4B). However, the results were not statistically significant, and the false discovery rate (FDR) > 0.05. PPI results showed that SLC7A11 had the highest connectivity and three mRNAs interacted with it (Fig. 4C).

Survival analysis
Gene expression profiles and clinical information of 437 samples (398 tumors and 39 adjacent normal tissues) of colon cancer patients were obtained from the TCGA database. Nineteen tumor samples were deleted because of a lack of follow-up time.
Eventually, 379 colon tumor samples were included in the analysis. In order to identify prognostic genes of colon cancer patients in the ceRNA network, we performed survival analysis based on gene expression and survival time of patients in the TCGA database. We find that LINC00114 is associated with OS in colon cancer patients (P = 4.255eÀ03, Fig. 5). Colon cancer patients with high LINC00114 expression have longer survival time than those with low expression. In mRNAs, SLC16A9 is found to be associated with OS in colon cancer patients (P = 2.248eÀ02, Fig. 6). High expression of SLC16A9 leads to improved OS in colon cancer patients.

LncRNAs expression levels and clinicopathological variables
The expression levels of LINC00114 and UCA1 were validated using the TCGA database. LINC00114 and UCA1 are both highly expressed in colon cancer tissues compared to normal tissues in the TCGA database ( Fig. 7A and B). The results are consistent when using paired colon cancer tissues from the TCGA database ( Fig. 7C and D). We then investigated the correlation between lncRNAs expression levels and clinicopathological characteristics and find that the expression of LINC00114 decreased with increasing tumor stage (P = .003, Fig. 7E). However, the relationship between UCA1 expression and tumor stage is the opposite (P = .029, Fig. 7F). These results indicates that both LINC00114 and UCA1 can distinguish colon cancer patients from different stages. In conclusion, LINC00114 has the potential to be a prognostic biomarker and they can be used for early diagnosis of colon cancer.

Discussion
In 2011, the ceRNA theory was proposed by Salmena et al. They found that miRNAs could target the same miRNA response elements (MREs) in different RNAs. LncRNA and mRNA sharing the same MRE competed with each other for limited miRNA and mRNA expression was depressed. [12] LncRNAs act as ceRNA that play an important role in cellular biological processes and tumorigenesis. [15] The ceRNA network has been constructed in different types of cancers including gastric cancer, [16] tongue squamous cell carcinoma, [17] hepatocellular carcinoma, [18] breast cancer, [19] and so on. In 2018, Zhang et al constructed a lncRNA-associated ceRNA network in colon cancer using the TCGA database to identify biomarkers. Finally, they constructed a ceRNA network including 64 lncRNAs, 18 miRNAs, and 42 mRNAs and found that mRNAs were enriched in many cancer-related pathways. [20] In our study, we constructed a ceRNA network using the GEO database and verified it in the TCGA database. We included 10 lncRNAs, 8 miRNAs, and 6 mRNAs in the ceRNA network. Among these lncRNAs, LINC00114 is confirmed to be associated with improved OS in colon cancer patients. LINC00114 and UCA1 are confirmed to be associated with the early stages of colon cancer. Go analysis reveals that mRNAs were associated with many cancer-related processes.
Few studies have focused on LINC00114 which mapped on chr21. Lv et al found that LINC00114 is highly expressed in colorectal cancer. It can bind to EZH2 and DNMT1 to induce promoter methylation of miR-133b and miR-133b expression was downregulated. It also sponges miR-133b to regulate NUP214 expression. [21] Han et al demonstrated that LINC00114 plays an important role in nasopharyngeal carcinoma (NPC). It was upregulated in NPC and could be used as a diagnosis marker. It competitively sponges miR-203 to regulate the ERK/JNK pathway to promote the progression and radioresistance of NPC. [22] Our study predicts that LINC00114 is upregulated in colon cancer tissues, which is consistent with Lv's study. It may regulates PCSK5 expression by binding to miR-107. The LINC00114/miR-107/PCKS5 axis may plays an important role in colon cancer tumorigenesis and requires further research.
LncRNA UCA1 which mapped on chr19 has been demonstrated involved in many different tumors such as non-small cell lung cancer, [23] renal cell carcinoma, [24] prostate cancer, [25] and so on. Few studies have focused on the function of UCA1 in colon cancer. Cui et al find that UCA1 promotes colon cancer progression by binding to miR-28-5p and regulates HOXB3 expression. [26] Cao et al find that UCA1 inhibits the carcinogenesis and metastasis of colon cancer by regulating the miR-185-5p/ MAPK14/MAPKAPK2/HSP27 axis. [27] In our study, we find that UCA1 expression is upregulated in colon cancer and is associated with early tumor stage. We also predict two novel pathways, UCA1/miR-107/PCKS5 and UCA1/miR-129-5p/SEMA6A in colon cancer.
SLC16A9, also called monocarboxylate transporter 9 (MCT9) is a plasma membrane transporter belonging to the monocarboxylic acid transporter family SLC16. SLC16A9 is responsible for carnitine and creatine, which is presumably transported at the basolateral membrane of enterocytes. [28] SLAC16A9 has also been reported to be associated with some cancer types. Fernandez-Ranvier et al found that SLAC16A9 could be a good diagnostic target for distinguishing benign from malignant adrenocortical tumors. [29] Lim et al found that it was highly expressed in diffuse large B-cell lymphoma. [30] What's more, an informatic analysis have found SLC16A9 had potential diagnostic value for indicating the occurrence of colorectal cancer. [31] In our study, we find that SLC16A9 is highly expressed in colon cancer which leads to improved OS. More related molecular experiments should be performed to explore the role of SLC16A9 in colon cancer.
However, there are some limitations in our study. First, although we batch-normalized the datasets, the bias might still exist because they come from two different platforms. Secondly, we only included two datasets and there were 26 pairs of samples. More samples need to be included to obtain more precise results. Last but not the least, there were only six mRNA in ceRNA network which led to poor result of KEGG analysis.
In conclusion, we have constructed a ceRNA network in colon cancer using the GEO database and verified it in the TCGA database. We find that LINC00114 and UCA1 are upregulated in colon cancer which to be associated with the early tumor stage. They have the potential to serve as new biomarkers for colon cancer.