Comprehensive Analysis of Therapy-Related Messenger RNAs and Long Noncoding RNAs as Novel Biomarkers for Advanced Colorectal Cancer

Colorectal cancer (CRC) is one of the most common types of human cancers. However, the mechanisms underlying CRC progression remained elusive. This study identified differently expressed messenger RNAs (mRNAs), long noncoding RNAs (lncRNAs), and small nucleolar RNAs (snoRNAs) between pre-therapeutic biopsies and post-therapeutic resections of locally advanced CRC by analyzing a public dataset, GSE94104. We identified 427 dysregulated mRNAs, 4 dysregulated lncRNAs, and 19 dysregulated snoRNAs between pre- and post-therapeutic locally advanced CRC samples. By constructing a protein–protein interaction network and co-expressing networks, we identified 10 key mRNAs, 4 key lncRNAs, and 7 key snoRNAs. Bioinformatics analysis showed therapy-related mRNAs were associated with nucleosome assembly, chromatin silencing at recombinant DNA, negative regulation of gene expression, and DNA replication. Therapy-related lncRNAs were associated with cell adhesion, extracellular matrix organization, angiogenesis, and sister chromatid cohesion. In addition, therapy-related snoRNAs were associated with DNA replication, nucleosome assembly, and telomere organization. We thought this study provided useful information for identifying novel biomarkers for CRC.


INTRODUCTION
Colorectal cancer (CRC) is one of the most common types of human cancers (Ma et al., 2014). The morbidity and mortality of CRC have increased rapidly in recent years (Budai et al., 2004). In 2016, a total of 134,490 new cases of CRC and 49,190 deaths caused by CRC were reported worldwide. In the past decades, the diagnostic technologies and therapeutic strategies of CRC have made significant progress (Ress et al., 2015). However, the prognosis of CRC remained poor with 5-year survival rates being only 10-15%, and the recurrent disease rates of CRC remained high. Therefore, there was still an urgent need to understand the mechanisms underlying CRC progression and identify novel potential biomarkers for the prognosis of CRC.
Emerging studies had demonstrated that noncoding RNAs played crucial roles in the progression of CRC (Rezanejad Bardaji et al., 2018), including microRNAs, long noncoding RNAs (lncRNAs), and small nucleolar RNA (snoRNAs). The important roles of microRNAs in CRC had been studied clearly (Zhang et al., 2012a). lncRNAs are a large class of transcripts longer than 200 bases, with no protein-coding potential. Previous studies had showed that lncRNAs were associated with CRC progression and prognosis. For example, overexpression of lncRNA TUSC7 reduces cell migration and invasion in CRC (Xu J. et al., 2017). lncRNA KCNQ1OT1 enhanced the methotrexate resistance of CRC cells by regulating miR-760/PPP1R1B (Sunamura et al., 2011). LINC01354 interacting with hnRNP-D contributes to the proliferation and metastasis in CRC through activating Wnt/βcatenin signalling (Zhang et al., 2016). Recent studies have also indicated that snoRNAs were also associated with the progression of CRC, for example, Yoshida et al. (2017).
In the present study, we re-annotated a Gene Expression Omnibus (GEO) dataset GSE94104 to identify CRC-related mRNAs and lncRNAs. Bioinformatics analysis was also performed to understand the potential roles of these lncRNAs in CRC. This study could provide novel clues to prove that CRCrelated lncRNAs could serve as biomarkers for CRC.

MATERIALS AND METHODS lncRNA Classification Pipeline
We used a pipeline described by Zhang et al. to re-annotate microarray data using the following criteria (Zhang et al., 2012b). Briefly, first, GPL570 platform of Affymetrix Human Genome U133 Plus 2.0 Array (Affymetrix Inc., Santa Clara, California, USA) probe set ID was mapped to the NetAffx Annotation Files (HG-U133 Plus 2.0 Annotations, CSV format, release 31, 08/23/10). The annotations included the probe set ID, gene symbol, and Refseq transcript ID. Second, the probe sets that were assigned with a Refseq transcript ID in the NetAffx annotations were extracted. In this study, we only retained those labeled as "NR_" (NR indicates noncoding RNA in the Refseq database). Finally, 2,448 annotated lncRNA transcripts with corresponding Affymetrix probe IDs were generated.

Microarray Data and Data Preprocessing
By screening colon cancer-related public datasets in GEO database, we selected GSE94104 dataset for further study, which contained the largest number of therapy-related colon cancer samples. In the present study, we downloaded GSE94104 datasets (Tsukamoto et al., 2011) from GEO database to identify differently expressed mRNAs and lncRNAs. A total of 40 matched formalinfixed paraffin-embedded pre-therapeutic locally advanced rectal cancer biopsy and post-therapeutic locally advanced rectal cancer biopsy samples were included in this study. All samples were provided by the Northern Ireland Biobank and arrayed using the Illumina HumanHT-12 WG-DASL V4 expression beadchip. The raw data were normalized using robust multi-array average method under R 3.4.2 statistical software with affy package from BioConductor. Normalization was separately performed for LCM dataset and homogenized tissue dataset. The normalized gene expression levels were presented as log2-transformed values by robust multi-array average. lncRNAs with fold changes ≥2 and P values <0.05 were considered as differentially expressed lncRNAs.

Co-Expression Network Construction and Analysis
In this study, the Pearson correlation coefficient of different expressed gene-lncRNA pairs was calculated according to the expression value of them. The co-expressed differentially expressed gene-lncRNA pairs with the absolute value of Pearson correlation coefficient ≥0.6 were selected, and the co-expression network was established by using cytoscape software.

Functional Group Analysis
The DAVID system (http://david.ncifcrf.gov/) was used to perform Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses. GO analyses included biological process, cellular component, and molecular function. GO terms and KEGG pathways with a P value of <0.05 were considered as significantly enriched function annotations.

Protein-Protein Interaction Network and Module Analysis
STRING online software was used to construct a protein-protein interaction (PPI) network (Liu et al., 2009) (https://string-db.org/cgi/ input.pl?sessionId = AUH42ZEZwajP&input_page_show_search = on). PPI with the combined score >0.4 was considered as significant. Cytoscape software was used to visualize the PPI network.

Transcriptional Analysis of Therapy-Related Messenger RNAs in Pre-Therapeutic Biopsies and Post-Therapeutic Resections of Locally Advanced Colorectal Cancer
The present study aimed to identify therapy-related mRNAs in advanced CRC using a public dataset, GSE94104. A total of 40 pre-therapeutic advanced CRC samples and 40 post-therapeutic advanced CRC samples were included in this dataset. We identified 427 dysregulated mRNAs between pre-and post-therapeutic locally advanced CRC (LACC) samples, including 235 upregulated mRNAs and 192 downregulated mRNAs after therapy in LACC. Hierarchical clustering was used to show differentially expressed mRNAs in posttherapeutic LACC ( Figure 1A).

Transcriptional Analysis of Therapy-Related Noncoding RNAs in Pre-Therapeutic Biopsies and Post-Therapeutic Resections of Locally Advanced Colorectal Cancer
Next, we focused on identifying noncoding RNAs between preand post-therapeutic LACC samples. A total of 19 snoRNAs and 4 lncRNAs were found to be differently expressed ( Figure 1B).

Protein-Protein Interaction Network Analysis of Therapy-Related Messenger RNAs in Locally Advanced Colorectal Cancer
In order to reveal the relationships among therapy-related mRNAs in LACC, we constructed PPI networks using STRING database.
The combined score >0.4 was used as the cutoff criterion. As shown in Figure 2, a total of 348 nodes and 1,047 edges were included in this PPI network. The nodes that had higher degrees were identified as hub genes, including FN1, CDC20, SPP1, HIST1H3B, ZWINT, CENPF, HIST1H3C, CXCR4, HIST1H3G, and RFC3.

Construction of Therapy-Related Long Noncoding RNAs and Small Nucleolar RNAs Regulating Co-Expression Network in Locally Advanced Colorectal Cancer
In order to reveal the potential functions of therapy-related lncRNAs and snoRNAs in LACC, we first performed Pearson correlation calculation between lncRNAs or snoRNAs and mRNAs in LACC. Based on the correlation analysis results, we constructed mRNA-lncRNA/snoRNAs co-expression networks (p-value < 0.05 and absolute value of correlation coefficient >0.7).
As shown in Figure 3, the mRNAs-lncRNAs co-expression network included 4 lncRNAs (MIR100HG, SERTAD4-AS1, KRTAP5-AS1, and PCAT18) and 226 mRNAs. MIR100HG was FIGURE 1 | Identification of therapy-related mRNAs and ncRNAs in CRC. (A) Hierarchical clustering analysis showed differential mRNAs expression in the CRC by using GSE94104. (B) Hierarchical clustering analysis showed differential ncRNAs expression in the CRC by using GSE94104.

Bioinformatics Analysis of Therapy-Related Messenger RNAs in Locally Advanced Colorectal Cancer
Furthermore, we performed GO and KEGG analysis for therapyrelated mRNAs in LACC (Figures 5 A, B). Bioinformatics analysis showed that the therapy-related mRNAs were mainly involved in regulating nucleosome assembly, chromatin silencing at recombinant DNA (rDNA), negative regulation of gene expression, DNA replication-dependent nucleosome assembly, extracellular matrix organization, cellular protein metabolic process, telomere FIGURE 2 | Construction of therapy-related PPI networks in CRC. We constructed therapy-related PPI networks in CRC, including a total of 348 nodes and 1,047 edges.

Bioinformatics Analysis for Related Long Noncoding RNAs and Small Nucleolar RNAs in Locally Advanced Colorectal Cancer
Then, bioinformatics analysis for related lncRNAs and snoRNAs in LACC was performed using their regulating targets in LACC (Figures 5 C-F). GO analysis showed that differentially expressed lncRNAs were associated with cell adhesion, extracellular matrix organization, angiogenesis, sister chromatid cohesion, positive regulation of transcription, apoptotic process, chromatin silencing at rDNA, epithelial cell differentiation, cell division, and cellular protein metabolic process. KEGG pathway analysis indicated therapy-related lncRNAs were associated with ECM-receptor interaction, transcriptional misregulation in cancer, focal adhesion, pathways in cancer, and PI3K-Akt signaling pathway.
GO analysis showed that differentially expressed snoRNAs were associated with chromatin silencing at rDNA, DNA replication, nucleosome assembly, telomere organization, regulation of gene silencing, muscle organ development, cellular FIGURE 3 | Construction of therapy-related lncRNA regulating co-expression networks in CRC. We constructed therapy-related lncRNA regulating co-expression networks in CRC, including a total of 4 lncRNAs and 226 mRNAs. Red node, lncRNAs; blue nodes, mRNAs.
protein metabolic process, protein heterotetramerization, extracellular matrix organization, and cell division. KEGG pathway analysis indicated that therapy-related snoRNAs were associated with systemic lupus erythematosus, alcoholism, drug metabolism, ECM-receptor interaction, and transcriptional misregulation in cancer.

Expression of Key lncRNAs Were Dysregulated in Colorectal Cancer Samples
In order to investigate the prognostic value of key lncRNAs in CRC, we analyzed an independent public dataset, the Gene Expression Profiling Interactive Analysis (GEPIA) database. By analyzing the GEPIA database, we found that the expression levels of MIR100HG, SERTAD4-AS1, and PCAT18 were significantly downregulated; however, KRTAP5-AS1 was upregulated in both colon adenocarcinoma (COAD) and rectum adenocarcinoma (READ) samples compared with that in normal tissues (Figure 6).

DISCUSSION
CRC is one of the most common types of human cancer, which is caused by multiple genetic and epigenetic aberrations. However, the mechanisms underlying CRC remained largely unclear. This study identified differently expressed mRNAs, lncRNAs, and snoRNAs between pre-therapeutic biopsies and post-therapeutic resections of locally advanced CRC by analyzing a public dataset, GSE94104. Then, we constructed a PPI network to identify FIGURE 4 | Construction of therapy-related snoRNAs regulating co-expression networks in CRC. We constructed therapy-related snoRNAs regulating co-expression networks in CRC, including a total of 19 snoRNAs and 360 mRNAs. Red node, snoRNAs; blue nodes, mRNAs.
key therapy-related proteins in LACC. Next, we constructed snoRNAs and lncRNAs regulating co-expression networks to identify key therapy-related snoRNAs and lncRNAs in LACC. Finally, GO and KEGG pathway analyses were conducted to predict their potential functions in LACC.
The present study identified a total of 235 upregulated mRNAs and 192 downregulated mRNAs after therapy in LACC. Bioinformatics analysis showed that these mRNAs were associated with nucleosome assembly, chromatin silencing at rDNA, negative regulation of gene expression, and DNA replication. Furthermore, a PPI network including 348 proteins and 1,037 edges were constructed to reveal the relationship among therapy-related proteins. Ten proteins were identified as key regulators in this network, including FN1, CDC20, SPP1, HIST1H3B, ZWINT, CENPF, HIST1H3C, CXCR4, HIST1H3G, and RFC3. FN1 is a novel protein involved in regulating cancer progression (Ifon et al., 2005). FN1 was found to be dysregulated in multiple human cancers, including colon cancer (Cai et al., 2018). In CRC, a single nucleotide polymorphism in FN1 was found to be associated with tumor shape. FN1 was transcriptionally activated by HMGA2, and the suppression of FN1 inhibited CRC growth and metastasis. CDC20 is a key E3 ligase that binds to APC and recognizes D-box or KEN box substrates to promote proteasomal degradation (Paul et al., 2017). CDC20 was frequently overexpressed in malignant tumors, such as prostate cancer, hepatocellular carcinoma, and ovarian cancer. SPP1 was reported to be overexpressed in numerous tumors, such as lung cancer, colon cancer, breast cancer, and prostate cancer (Xu C. et al., 2017). SPP1 was associated with tumor metastasis in gastric cancer and esophageal adenocarcinoma. Zwint is an important regulatory protein for chromosome movement and mitotic checkpoints FIGURE 5 | Bioinformatics analysis for therapy-related mRNAs, lncRNAs, and snoRNAs in CRC. (A) GO analysis showed therapy-related mRNA-associated biological processes. (B) KEGG pathway analysis showed therapy-related mRNA-associated pathways. (C) GO analysis showed therapy-related lncRNA-associated biological processes. (D) KEGG pathway analysis showed therapy-related lncRNA-associated pathways. (E) GO analysis showed therapy-related snoRNAassociated biological processes. (F) KEGG pathway analysis showed therapy-related snoRNA-associated pathways. (Kasuboski et al., 2011). Previous studies have identified Zwint overexpression in breast and ovarian cancers. CENPF is a part of the centromere-kinetochore complex and is a component of the nuclear matrix during G2 of interphase (Sugimoto et al., 1999). Recent studies showed that CENPF played crucial roles in the progression of human cancers. For example, the altered phosphorylation of CENPF affected glutamine uptake in colon cancers (Michalak et al., 2019). CXCR4 is a transmembrane G-protein-couple receptor and played a central role in the neurotropism of cells (Xu et al., 2015). RFC3 was a member of RFC family, which played a key role in DNA replication, DNA damage repair, and checkpoint control. Multiple studies indicated RFC3 was overexpressed and correlated to the progression of human cancers (Shen et al., 2014). These reports together with our findings suggested that these key regulators may play key roles in regulating the therapy-related biological processes in LACC.
Recent studies showed ncRNAs were involved in regulating multiple cancer-related biological processes, such as cell proliferation, apoptosis, and invasion. For example, HAND2-AS1 was observed to suppress CRC proliferation though sponging miR-1275 (Zhou et al., 2018). SNORA21 played as an oncogenic snoRNA in CRC with a prognostic biomarker potential. However, the ncRNAs involved in CRC therapy remained largely unclear. The present study identified 4 lncRNAs and 19 snoRNAs as therapy-related ncRNAs in LACC. Next, lncRNA-mRNA and snoRNAs-mRNA co-expression networks were constructed. Four lncRNAs, including MIR100HG, SERTAD4-AS1, KRTAP5-AS1, and PCAT18, were found to play crucial roles in this progression. KRTAP5-AS1 was reported as a potential biomarker for papillary thyroid carcinoma. PCAT18 was found to be associated with the progression of gastric cancer (Foroughi et al., 2018) and prostate cancer. For example, PCAT18 silencing inhibited prostate cancer proliferation, migration, and invasion (Zhan et al., 2018). MIR100HG was identified as a key regulator in LACC (Li et al., 2019). A recent study showed that MIR100HG regulates cell cycle by modulating the interaction between HuR and its target mRNAs (Sun et al., 2018). The present study showed that MIR100HG regulated more than 200 mRNAs, including FZD1, FGFR1, FN1, and KLF9. These genes had been demonstrated to be related to CRC progression. For example, KLF9 prevents CRC through inhibition of interferon-related signaling. Downregulation of FN1 suppressed CRC proliferation, migration, and invasion. FZD1 was a key regulator of wnt signaling and involved in regulating CRC metastasis. SERTAD4-AS1 was involved in regulating COL16A1, ISLR, ZNF626, COL6A2, SOX15, FRMD6, PCDHGA9, CDC6, TPM2, and C1R. Among these genes, TPM2 knockdown had been reported to promote CRC progression upon RhoA activation. We also found KRTAP5-AS1 might regulate DPYD, GAL3ST2, CSTL1, HSD11B2, LRCH2, DIAPH3, FERMT1, MRPL4, NEGR1, and LAMA2 in CRC. Among these mRNAs, DPYD variants was reported to be a predictor of 5-fluorouracil toxicity in adjuvant colon cancer treatment. FERMT1 promoted colon cancer metastasis and epithelial-mesenchymal transition progression via modulation of β-catenin transcriptional activity. By analyzing the GEPIA database, we found that the expression levels of MIR100HG, SERTAD4-AS1, and PCAT18 were significantly downregulated; however, KRTAP5-AS1 was upregulated in both COAD and READ samples compared with that in normal tissues. Furthermore, we conducted bioinformatics analysis for these therapy-related lncRNAs and snoRNAs. Our results showed therapy-related lncRNAs were associated with cell adhesion, extracellular matrix organization, angiogenesis, and sister chromatid cohesion. In addition, therapy-related snoRNAs were associated with DNA replication, nucleosome assembly, and telomere organization.
Of note, several limitations should be noted in this study. First, the number of samples used in present study were limited. In the further study, more samples should be included to identify therapy-related lncRNAs, snoRNAs, and mRNAs.
Second, the detail molecular functions and mechanisms of these key lncRNAs and snoRNAs were unclear. The further validation of these genes should be further investigated. Finally, with the development of next-generation sequence methods, RNA-seq would be a more powerful method to identify novel therapy-related lncRNAs, snoRNAs, and mRNAs in LACC.
In conclusion, we identified 427 dysregulated mRNAs, 4 dysregulated lncRNAs, and 19 dysregulated snoRNAs between pre-and post-therapeutic LACC samples. By constructing a PPI network and co-expressing networks, we identified 10 key mRNAs, 4 key lncRNAs, and 7 key snoRNAs. Bioinformatics analysis showed that therapy-related mRNAs were associated with nucleosome assembly, chromatin silencing at rDNA, negative regulation of gene expression, and DNA replication. Therapy-related lncRNAs were associated with cell adhesion, extracellular matrix organization, angiogenesis, and sister chromatid cohesion. Furthermore, therapy-related snoRNAs were associated with DNA replication, nucleosome assembly, and telomere organization. We think this study provided useful information for identifying novel biomarkers for CRC.

DATA AVAILABILITY
The datasets generated for this study can be found in the GSE94104.