Functional analysis of lncRNAs based on competitive endogenous RNA in tongue squamous cell carcinoma

Backround Tongue squamous cell carcinoma (TSCC) is the most common malignant tumor in the oral cavity. An increasing number of studies have suggested that long noncoding RNA (lncRNA) plays an important role in the biological process of disease and is closely related to the occurrence and development of disease, including TSCC. Although many lncRNAs have been discovered, there remains a lack of research on the function and mechanism of lncRNAs. To better understand the clinical role and biological function of lncRNAs in TSCC, we conducted this study. Methods In this study, 162 tongue samples, including 147 TSCC samples and 15 normal control samples, were investigated and downloaded from The Cancer Genome Atlas (TCGA). We constructed a competitive endogenous RNA (ceRNA) regulatory network. Then, we investigated two lncRNAs as key lncRNAs using Kaplan–Meier curve analysis and constructed a key lncRNA-miRNA-mRNA subnetwork. Furthermore, gene set enrichment analysis (GSEA) was carried out on mRNAs in the subnetwork after multivariate survival analysis of the Cox proportional hazards regression model. Results The ceRNA regulatory network consists of six differentially expressed miRNAs (DEmiRNAs), 29 differentially expressed lncRNAs (DElncRNAs) and six differentially expressed mRNAs (DEmRNAs). Kaplan-Meier curve analysis of lncRNAs in the TSCC ceRNA regulatory network showed that only two lncRNAs, including LINC00261 and PART1, are correlated with the total survival time of TSCC patients. After we constructed the key lncRNA-miRNA -RNA sub network, the GSEA results showed that key lncRNA are mainly related to cytokines and the immune system. High expression levels of LINC00261 indicate a poor prognosis, while a high expression level of PART1 indicates a better prognosis.


INTRODUCTION
Tongue squamous cell carcinoma (TSCC) is one of the most lethal malignancies of oral cancers, and it is the most common oral cancer (Mannelli et al., 2018;Kamali et al., 2017). TSCC is highly invasive and prone to recurrence and early metastasis, often endangering the lives of patients, and treatment is more difficult. Therefore, exploring the molecular mechanism of the occurrence and development of TSCC, studying the molecular markers of TSCC and finding effective molecular therapeutic targets play an important role in improving the survival rate of patients with TSCC.
In recent years, a regulatory network composed of long noncoding RNAs (lncRNAs), microRNAs (miRNAs) and messenger RNAs (mRNAs) has gained interest in the study of molecular biological mechanisms involved in the process of tumor occurrence and progression. lncRNA is a noncoding RNA with a length greater than 200 nucleotides. Studies have shown that lncRNA plays an important role in many physiological activities, such as the dosage compensation effect, epigenetic regulation, cell cycle regulation and cell differentiation regulation (Peng, Koirala & Mo, 2017;Koch, 2017). In recent years, increasing evidence has shown that lncRNA has important potential application prospects in the treatment and diagnosis of malignant tumors and other human diseases (Fan et al., 2018;Zhang et al., 2018c).
miRNA is a kind of endogenous small RNA with a length of approximately 20-24 nucleotides that plays a variety of important regulatory roles in cells. miRNA is involved in the regulation of gene expression after transcription in cells. It can act on RNA and render it untranslatable, which leads to the silencing of corresponding genes (Ye et al., 2008). Researchers have shown that miRNA is directly related to many important physiological and biochemical processes and diseases, including tumor occurrence and metastasis (Wang, Hu & Liu, 2015;Amirkhah et al., 2015).
Some RNAs, pseudogenes, long-chain noncoding RNA (lncRNA) and cyclic RNA (circRNA) contain some binding sites for miRNAs, which can competently bind to the same miRNAs through miRNA response elements (MREs), thus eliminating or alleviating the inhibition of miRNAs on target genes and regulating the expression of target genes. This mechanism is referred to as the competitive endogenous RNA (ceRNA) hypothesis (Qi et al., 2015). An increasing number of studies have shown that in the process of tumorigenesis, lncRNA can be used as ceRNA, which can inhibit or promote tumorigenesis by competitive binding of the same tumorigenic or antitumorigenic miRNAs (Karreth & Pandolfi, 2013). For example, a study showed that lncRNA Gas5 acts as a ceRNA to regulate PTEN expression by sponging miR-222-3p in papillary thyroid carcinoma (Zhang, Ye & Zhao, 2017). In addition, it has been showed that lncRNA SNHG1 can antagonize the effect of miR-145a-5p on the downregulation of NUAK1 expression in nasopharyngeal carcinoma cells (Lan & Liu, 2019). In order to diagnose and treat oral tumors more accurately, more and more scholars are studying the mechanism of lncRNA in the development of oral tumors. One study explored the role of lncRNA in OSCC by conducting a series of analyses of data downloaded from GEO databases and conducting related experiments to verify the conclusions. Finally, the authors found that lncRNA H1, as a ceRNA, can promote the development of tumors by inhibiting the expression of miR-138 and EZH2, which is helpful for the treatment of OSCC (Hong et al., 2018). In addition, some scholars have found that lncRNA OIP5-AS1 is closely related to undifferentiated oral tumors, which can regulate downstream target genes. Moreover, lncRNA OIP5-AS1 can be transactivated by stemness-associated transcription factors in cancer, resulting in poor prognosis (Arunkumar et al., 2018).
There are an increasing number of studies on the construction of ceRNA networks for cancer research, such as gastric, lung, renal and kidney networks (Xia et al., 2014;Wang et al., 2018;Qu et al., 2016). However, until now, there has been nearly no research on lncRNAs and miRNAs related to TSCC based on high-throughput sequencing and a large-scale sample size. The purpose of this study was to analyze the expression of lncRNAs in TSCC and their correlation with clinical indicators based on the competitive endogenous RNA (ceRNA) theory and to explore the possible molecular mechanism of lncRNAs in TSCC by gene set enrichment analysis (GSEA). We aimed to provide new indicators for the early diagnosis and prognosis of TSCC and provide new targets for the treatment of TSCC.

Selection of datasets and data collection
Gene expression data of TSCC were downloaded from The Cancer Genome Atlas (TCGA) (https://gdc-portal.nci.nih.gov/). TCGA is a project co-supervised by the National Cancer Institute and the National Human Genome Research Institute. It aims to apply highthroughput genome analysis technology to help people have a better understanding of cancer and improve the ability to prevent, diagnose and treat cancer. As the largest cancer gene information database at present, TCGA not only contains many cancer types, but also contains abundant multigroup data. We downloaded RNA-Seq V2 of tongue samples from TCGA, and this data is raw count, which comes from 162 tongue samples. Meanwhile, the downloaded miRNAs-seq came from 169 tongue samples.

RNA sequence data processing
We downloaded all RNA sequences of 162 tongue samples from TCGA. In addition, we organized the downloaded data into gene expression matrix files and divided the 162 samples into 147 tumor samples and 15 normal control samples by the programming code written in Perl software. And we used the same method to convert the miRNA seq into miRNA matrix files and divided the 169 samples into 154 tumor samples and 15 normal control samples. And we also count the clinical data of the samples. The results are shown in the Table 1.
In this study, we mainly used programming code written in Perl software and R language to analyze the RNA data.

Analysis of DEGs, DElncRNAs, and DEmiRNAs in TSCC
Ensembl (version 89; https://uswest.ensembl.org/info/website/archives/index.html) (Aken et al., 2016) is a software system capable of automatic annotation and maintenance of eukaryotic genomes. It is co-operated by the Wellcome Foundation of Sanger Institute and the European Institute of Bioinformatics, a division of the European Molecular Biology Laboratory. We used the Ensembl database to screen for RNA and lncRNA and exclude the RNA and lncRNA that do not exist in the database.
Before conducting differential expression analysis, we processed the data and deleted outliers samples. Using ''WGCNA'' package in R software (R Core Team, 2018), clustering analysis was carried out on the expression data of mRNA, lncRNA and miRNA respectively, and outlier samples were deleted, the results are shown in supplemental files. Next, we obtained differentially expressed mRNAs (DEmRNAs), lncRNAs (DElncRNAs) and miRNAs (DEmiRNAs) using the ''edgeR'' package in R software. The P-value was obtained using the false discovery rate (FDR) to correct the statistical significance of multiple tests. We specified fold changes (log2 absolute) > 2 and an FDR adjusted to P < 0.01 as statistically significant.

Construction of a ceRNA regulatory network
miRcode database is a software to predict microRNA targets. Currently, it covers a complete set of transcripts annotated by GENCODE, including 10,419 registered lncRNAs. When predicting the binding sites of miRNAs, we used the miRNA family defined by Target Scan v6. The miRNAs under the same family have the same binding site type. For the predicted region of the binding sites of miRNAs, the results are filtered and screened according to the gene type, the conservativeness of binding sites and the distribution of transcripts. First, the DElncRNAs and DEmiRNAs were matched in the miRcode database (Jeggari, Marks & Larsson, 2012). Unmatched lncRNAs and miRNAs were deleted. StarBase is used to annotate the suffix of miRNA. miRTarBase (Chou et al., 2016), miRDB (Wong & Wang, 2015) and TargetScan (Fromm et al., 2015) databases are all websites that can predict miRNA target genes. In order to make the results more accurate, we used the three databases to predict the target genes of DEmiRNAs. After that, the ''Venn Diagram'' package in R software was used to screen the target genes of DEmiRNAs by crossing the differentially expressed genes with the target genes, in order to prepare for the construction of ceRNA network. In order to measure the binding, a correlation analysis is necessary. We used the code in R software to detect the correlation between miRNAs and mRNAs, and between miRNAs and lncRNAs. P < 0.05 was considered statistically significant. The results of correlation analysis were provided in the supplemental files. Then, we obtained the ceRNA regulatory network of lncRNA-miRNA-mRNA and mapped it with Cytoscape software (http://www.cytoscape.org/).

Survival analysis
We used the ''survival'' package in R software for survival analysis of all RNAs in the ceRNA network. The expression of lncRNAs, mRNAs and miRNAs correlated with overall survival was identified by the univariate Cox proportional hazards regression model. The expression of RNAs was sequenced from low to high. The first 50% of the samples were used as the low expression group, and the second 50% as the high expression group. Then, we used the log-rank test to compare the significant differences in the univariate analysis between subgroups for the overall survival rates. P<0.05 was considered statistically significant.

Construction of the key lncRNA-miRNA-mRNA subnetwork
We used lncRNAs after survival analysis as key lncRNAs and extracted their linked mRNAs and miRNAs to reconstruct the subnetwork in Cytoscape software.

Multivariate survival analysis of cox proportional risk regression model
The Cox proportional hazards regression model plays a key role in the clinical diagnosis of cancer and can be used to detect the risk of key genes. We first downloaded gene expression data from TCGA and then merged the data with Perl script to obtain relevant survival information. Then, mRNAs in the key lncRNA-miRNA-mRNA subnetwork were regarded as key genes. Next, we performed multivariate survival analysis of the key genes using the ''survival'' package in R software, and the risk level of key genes was obtained. We provided the multi-variate analysis in the Supplemental Files.

Gene Set Enrichment Analysis (GSEA)
GSEA connects expression profile chip data with biological significance through statistical analysis, which enables researchers to interpret chip results more easily and reasonably.
To further explore the role of the ceRNA regulatory network in the occurrence and development of TSCC, after multivariate analysis, the key genes were divided into two groups according to the risk level as follows: h represents the high gene expression group and l represents the low gene expression group. Then, the results were imported into the GSEA software. This analysis used two gene sets as follows: the KEGG gene set and the gene ontology gene set, including biological processes, cellular components and molecular functions. FDRs (false discovery rates) <25% for the gene sets indicated significantly enriched gene sets using the GSEA software. In the GSEA results, the gene set with FDR < 25% is the enrichment set by default. Because it is possible to generate meaningful hypotheses from these functional genes to facilitate further research. In most cases, the choice of FDR less than 25% is appropriate.

DERNAs between primary tumor and control samples
Before the differential expression analysis, the clustering analysis were performed to check the outlier samples. Finally, one cancer sample was deleted from the mRNA data, four control samples were deleted from the lncRNA data, and one control sample was deleted from the miRNA data, the results are shown in Supplemental Files. Based on

Construction of the ceRNA regulatory network
To better understand the role of DElncRNAs in TSCC and further elaborate on the interaction between DElncRNAs and DEmiRNAs, a ceRNA regulatory network map related to lncRNA-miRNA-mRNA was constructed using 627 DElncRNAs and 75 DEmiRNAs obtained from the abovementioned studies. In the ceRNA regulatory network of TSCC, the interaction between 627 DEIncRNAs and 75 DERNAs was predicted based on human gene data in the downloaded microcode database using the Perl program. A total of 45 pairs of interacting lncRNAs and microRNAs were obtained. According to the microTarBase, TargetScan and microRNADB databases, 99 DEmiRNAs targeting mRNAs were searched. Then, eight differentially expressed and targeted mRNAs were screened out by comparing the targeted mRNAs with TSCC DEmRNAs from TCGA. Cytoscape was used to construct a ceRNA regulatory network including six DEmiRNAs, 29 DElncRNAs and six DEmRNAs in TSCC based on the abovementioned RNAs (Fig. 2).

Correlation analysis of survival in the ceRNA network
To further determine whether the expression of genes in the regulatory network is related to the prognosis of TSCC patients, Kaplan-Meier curve analysis of lncRNAs in the TSCC ceRNA regulatory network showed that only two lncRNAs, including LINC00261 and PART1, were correlated with the overall survival time of TSCC patients (P < 0.05), as shown in Fig. 3. Moreover, the expression levels of LINC00261 were negatively correlated with the overall survival rate, while the expression levels of PART1 were positively correlated with the overall survival rate.
The key lncRNA-miRNA-mRNA sub-network LncRNAs and mRNAs appear in the coexpression patterns of ceRNA. Thus, we used two lncRNAs, including LINC00261 and PART1, as the key lncRNAs. We extracted their linked miRNAs and mRNAs and reconstructed a new subnetwork using Cytoscape software. As shown in Fig. 4, The lncRNA PART1-miRNA-mRNA subnetwork was composed of one lncRNA node, three miRNA nodes, and four mRNA nodes. Functional prediction of lncRNAs based on the key lncRNA-miRNA-mRNA subnetwork In the network, one or more mRNAs are centered on a lncRNA and connected with it. We can better speculate the function of lncRNA according to the function of the mRNA linked to the lncRNA. Therefore, after multifactor analysis of the mRNAs in the constructed subnetwork, the subnetwork was divided into a high-expression group and a low-expression group according to the risk value, and GSEA analysis was then carried out to explore the biological pathways involved in lncRNA. The GSEA results showed that in the high-expression group of PART1 (Fig. 5), six significantly enriched gene sets were screened, including one leukocyte migration gene set, two leukocyte chemotaxis gene sets, one immune system process gene set, one innate immune response gene set and one tyrosine peptide phosphorylation gene set.

DISCUSSION
In recent decades, TSCC is the most common oral squamous cell carcinoma, and its treatment is difficult. Currently, basic research and clinical treatment of tongue cancer are the focuses of head and neck cancer surgery. The overall clinical treatment of tongue cancer has improved in recent decades. The 5-year survival rate is 50-60% (Sano & Myers, 2007;Torre et al., 2015), but it has been stagnant. Neoadjuvant therapies, such as biotherapy and targeted therapy, have been gradually applied in clinical practice. It has been shown that ceRNAs alter gene expression through a mechanism mediated by microRNAs, thereby affecting cell function, which may lead to cancer (Qi et al., 2015). However, only a few studies have been performed on ceRNA in TSCC. More than 10,000 kinds of lncRNAs have been shown to have potential ceRNA characteristics (Tay, Rinn & Pandolfi, 2014). In recent years, an increasing number of studies have confirmed that lncRNAs, as a competitive platform for miRNAs and mRNAs, is involved in the regulation of the cell cycle and cell death in various malignant tumors, such as breast, gastric, liver, lung and esophageal cancers (Luan et al., 2017;Zhang et al., 2018a;Wang et al., 2017a;Jiao et al., 2016). LncRNA effects the invasion and metastasis of tumors, thus playing an important role in the occurrence and development of tumors. In the study of TSCC, many lncRNAs have been shown to be associated with the occurrence, development and prognosis of TSCC. In a study published in 2018, a high level of lncRNA KCNQ1OT1 expression was shown to be closely correlated with poor prognosis in TSCC. Furthermore, KCNQ1OT1 was shown to promote TSCC proliferation (Zhang et al., 2018b).
Another study showed that lncRNA MALAT1 expression is upregulated in TSCC tissues by investigating the expression and function of MALAT1 in TSCC tissues and cells, and MALAT1 was shown to be related to cervical lymph node metastasis. Finally, MALAT1 was shown to induce cell migration, invasion and EMT and inhibit apoptosis by modulating the Wnt/ β-catenin signaling pathway (Liang et al., 2017). These results may provide new insights for the treatment of TSCC.
Therefore, to study the functional roles of lncRNAs as ceRNA in the occurrence and development of TSCC and the regulatory mechanism of lncRNAs, we used data downloaded from TCGA to create a lncRNA-miRNA-mRNA network and carried out a series of analyses, which is very relevant for the treatment of TSCC. In our study, two lncRNAs, including LINC00261 and PART1, were shown to be closely related to overall survival. Patients with high expression levels of LINC00261 had a poor prognosis, while PART1 had the opposite effect. The results of other studies are different from this study. A study (Yu et al., 2017) showed that patients with low expression levels of LINC00261 had a higher recurrence rate than those with high expression levels of LINC00261 in gastric cancer, suggesting that patients with low expression levels of LINC00261 have a poor prognosis. We hypothesize that the results are different because of the small number of samples in the present study.
Currently, there are few studies on the two kinds of key lncRNAs in TSCC, but many scholars have indicated that these key lncRNAs have an impact on the occurrence and development of other diseases. LINC00261 plays an important role in gastric, lung and colon cancers (Liu, Xiao & Xu, 2017;Wang et al., 2017c). Many studies have proven that LINC00261 can inhibit cell proliferation and invasion, thus promoting the development of cancer (Sha et al., 2017;Wang et al., 2017b). There are fewer studies on PART1 than LNC00261, and only a few studies have investigated its role in prostate cancer and esophageal squamous cell carcinoma (Sun et al., 2018;Kang et al., 2018). A 2018 study showed that lncRNA PART1 modulates the toll-like receptor pathway to influence cell proliferation and apoptosis in prostate cancer cells. PART1 can be used as a novel biomarker and target in the treatment of prostate cancer (Sun et al., 2018).
The results of enrichment analysis of the lncRNA-miRNA-mRNA subnetwork showed that PART1 is mainly related to cytokines and the immune system and could regulate the immune response and migration of cytokines, thus affecting the occurrence and development of tumors. However, little research has been performed on the effects of these two lncRNAs on immunity and cytokines.

CONCLUSIONS
Based on the ceRNA theory, we constructed a ceRNA regulatory network, which for the first time enabled an overall perspective and analysis of the lncRNA-associated ceRNAmediated genes in the development of TSCC at a system-wide level. Our research showed that lncRNAs affect the development of TSCC. Furthermore, the constructed key lncRNA-miRNA-mRNA subnetwork also showed that PART1 can provide new indicators for the early diagnosis and prognosis of TSCC and provide new targets for its treatment. However, the molecular mechanism of the two lncRNAs in TSCC remains unclear. Although the