A prognostic long non-coding RNA-associated competing endogenous RNA network in head and neck squamous cell carcinoma

Background This study aimed to develop multi-RNA-based models using a competing endogenous RNA (ceRNA) regulatory network to provide survival risk prediction in head and neck squamous cell carcinoma (HNSCC). Methods All long non-coding RNA (lncRNA), microRNA (miRNA), and mRNA expression data and clinicopathological features related to HNSCC were derived from The Cancer Genome Atlas. Differentially expressed RNAs were calculated using R. Prognostic factors were identified using univariate Cox regression analysis. Functional analysis was performed using GO, KEGG pathways, and PPI network. Based on the results, we derived a risk signature and compared high- and low-risk subgroups using LASSO regression analysis. Survival analysis and the relationship between risk signature and clinicopathological features were performed using log-rank tests and Cox regression analysis. A ceRNA regulatory network was constructed, and prognostic lncRNAs and miRNA expression levels were validated in vitro and in vivo. Results A list of 207 lncRNAs, 18 miRNAs and 362 mRNAs related to overall survival was established. Five lncRNAs (HOTTIP, LINC00460, RMST, SFTA1P, and TM4SF19-AS1), one miRNA (hsa-miR-206), and one mRNA (STC2) were used to construct the ceRNA network. Three prognostic models contained 13 lncRNAs, eight miRNAs, and 17 mRNAs, which correlated with the patient status, disease-free survival (DFS), stage, grade, T stage, N stage, TP53 mutation status, angiolymphatic invasion, HPV status, and extracapsular spread. KEGG pathway analysis revealed significant enrichment of “Transcriptional misregulation in cancer” and “Neuroactive ligand-receptor interaction.” In addition, HOTTIP, LINC00460, miR-206 and STC2 were validated in GTEx data, GEO microarrays and six HNSCC cell lines. Conclusions Our findings clarify the interaction of ceRNA regulatory networks and crucial clinicopathological features. These results show that prognostic biomarkers can be identified by constructing multi-RNA-based prognostic models, which can be used for survival risk prediction in patients with HNSCC.


INTRODUCTION
Head and neck squamous cell carcinoma (HNSCC) has the ninth highest global incidence of all malignancies, and accounted for approximately 53,260 new cases and approximately 10,750 deaths in 2019 in the United States of America (Ward et al., 2019;Siegel, Miller & Jemal, 2020;Thrift & Gudenkauf, 2019). Despite advances in surgical procedures, chemoradiotherapy, and targeted therapy, clinical outcomes have remained unchanged for decades, with the five-year survival rate ranging from 40% to 50% over the past three decades (Braakhuis, Leemans & Visser, 2014). HNSCC is characterized by a remarkably aggressive biological behavior with a high incidence of lymph node and distant metastasis that leads to poor prognosis (Ding et al., 2019;Sun et al., 2017). The 8th edition of American Joint Commission Cancer (AJCC) introduced depth of invasion and extranodal extension as criteria for T and N stages, respectively (Amin et al., 2017). TNM staging are useful for determination of prognosis and development of strategic treatment plans based on the anatomical situation (Asare et al., 2015). However, even patients with the same TNM stage require different therapeutic approaches. Therefore, identification of additional molecular biomarkers is required to improve prognosis by accounting for the biological heterogeneity of HNSCC.
Differential expression of long non-coding RNAs (lncRNAs) might affect the corresponding function and play an important role in cancer pathogenesis. LncRNAs and microRNAs (miRNAs) are important subclasses of competing endogenous RNAs (ceRNAs), which regulate the protein levels of target genes, thus, participating in all aspects of various cellular biological processes (Lin & Gregory, 2015;Schmitt & Chang, 2016). Accumulating evidence suggests that the ceRNA network could be beneficial for prognosis prediction in patients with HNSCC Yin et al., 2019;Pan et al., 2019;Fang et al., 2018;Zhao et al., 2018). Zhao et al. (2018) established a ceRNA regulation network related to HNSCC that included eight differently expressed mRNAs (DEmRNAs), 53 differentially expressed lncRNAs (DElncRNAs), and 16 differentially expressed miRNAs (DEmiRNAs). However, that study and others did not construct a prognostic model for HNSCC that assessed the relationship between ceRNA network RNAs and overall survival (OS). In addition, several important characteristics are closely related to the clinicopathological features of HNSCC, such as TP53 mutation status, angiolymphatic invasion (ALI), human papillomavirus (HPV) status, perineural invasion (PNI), and extracapsular spread (ECS). However, no previous studies have, to the best of our knowledge, assessed the relationship between ceRNA network RNAs and the above clinicopathological characteristics.
In the present study, we comprehensively evaluated the prognostic value of differentially expressed RNAs and developed multi-RNA-based models that predicted OS via the risk signature of HNSCC. In addition, we systematically investigated the relationship between prognostic RNAs and HNSCC clinicopathological features.

Study population
The RNA sequence data of HNSCC were obtained from The Cancer Genome Atlas (TCGA) database using the Data Transfer Tool, and the corresponding clinical information of patients with HNSCC was downloaded. Sequenced information was provided by the Illumina HiSeq miRNAseq and Illumina HiSeq RNA-seq platforms. This study conforms to the publication guidelines of TCGA.

Differentially expressed RNAs related to HNSCC
The design and analytical approach of the present study are shown in the flow chart (Figs. 1A and 1B). lncRNA-seq and mRNA-seq data were obtained from 546 samples consisting of 502 HNSCC and 44 adjacent normal tissues. MiRNA-seq information was obtained from 569 samples consisting of 525 HNSCC and 44 adjacent normal tissue samples. According to the cutoff criteria of |log2 (fold-change [FC]) | > 2 and adjusted p < 0.05, DEmRNAs and DEmiRNAs in the HNSCC and normal tissues were identified using the ''edgeR'' package in R. Next, DElncRNAs were screened out using the same method above in R with the thresholds of |log 2 FC| > 2.0 and adjusted p-value < 0.05. Then, the DElncRNAs were annotated and defined using the Encyclopedia of DNA Elements (ENCODE). The statistical significance of multiple comparisons was corrected using FDR (false discovery rate) and FDR < 0.05 was considered statistically significant.

Survival analysis
Patients with HNSCC with an OS greater than 100 days were included in the survival analysis. Univariate Cox regression analysis was conducted to screen for the prognostic DElncRNAs, DEmRNAs, and DEmiRNAs for OS. Last absolute shrinkage and selection operator (LASSO) regression analysis was conducted with DElncRNAs and DEmRNAs with p < 0.001, and DEmiRNAs with p < 0.05 for establishment of risk signatures in univariate Cox regression analysis. The risk score (RS) was evaluated using the following formula: RS = n i=1 Coef(i)R(i) Where n stands for the number of RNAs, Coef (i) represents the coefficient, and R(i) represents the relative gene expression of z-score-transformed estimated by LASSO regression analysis. If the RS of a sample was greater than the mean value of the RS, the sample was assigned to the high-risk group; otherwise, it was assigned to the low-risk group. The differences between high-and low-risk patients and OS were evaluated using Kaplan-Meier survival plots and log-rank tests. Moreover, the sensitivity and specificity for survival prediction were estimated by receiver operating characteristic (ROC) curves and areas under the ROC curves (AUC).

Functional analysis and PPI network
Functional enrichment analyses, including KEGG and GO enrichment analyses, were conducted using the ''clusterProfiler'' package. The Search Tool for the Retrieval of Interacting Genes (STRING, http://string.embl.de/) was used to build the PPI network according to a combined score greater than 0.4 as the threshold to evaluate the interactive relationships of prognostic DEmRNAs. The nodes with substantially more connections (>5) were considered hub proteins.

Validation and prognostic values
To test the reliability of the results, the expression levels and prognostic values of the prognostic lncRNAs and mRNAs in the ceRNA network were verified using Gene Expression Profiling Interactive Analysis (GEPIA, http://gepia.cancer-pku.cn/index.html), which was used to analyze the RNA sequencing data from TCGA and Genotype-Tissue Expression (GTEx) dataset projects. The threshold |Log 2 FC| for prognostic factors was determined through boxplot analysis using log 2 |TPM+1|. The DEmiRNA of miR-206 in the ceRNA network was validated with a microarray from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/) between tumor and normal tissues samples (in vivo), while was validated in cell lines by qRT-PCR (in vitro). Then, the DEmRNA of STC2 in the ceRNA network between tumor and normal tissues samples was validated with six gene expression datasets obtained from ONCOMINE TM (https://www.oncomine.org/resource/login.html) and the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/).

Statistical analysis
The expression of each RNA correlating with clinical characteristics was estimated using the Mann-Whitney rank sum test. Survival curves were plotted using the Kaplan-Meier method. Prognostic RNAs were screened using univariate and LASSO regression analyses. R software (version 3.6.0), Cytoscape software (version 3.5.1), and GraphPad Prism 7 were used to plot figures. Statistical significance was set at p < 0.05.

Identification of DElncRNAs, DEmRNAs and DEmiRNAs
Each sample from patients with HNSCC was analyzed to obtain the expression profiles of lncRNA, mRNA, and miRNA for comprehensive integrated analysis. A total of 1052 DElncRNAs, 82 DEmiRNAs, and 2001 DEmRNAs were differentially expressed between HNSCC and normal tissue samples. Of these, 766 lncRNAs (72.81%) were upregulated

Screening of prognostic factors associated with OS
We assessed 474 HNSCC samples selected from TCGA against the following criteria: (1) has clinical characteristics data, (2) has whole RNA-seq data, and (3) OS of more than 100 days. DElncRNAs, DEmiRNAs, and DEmRNAs related to OS were screened using univariate Cox regression analysis. In total, 207 lncRNAs, 18 miRNAs, and 362 mRNAs identified by univariate Cox regression analysis (p < 0.05). Forthermore, 19 lncRNAs, 18 miRNAs, and 25 mRNAs were selected for the LASSO regression analysis to establish the risk signatures (DElncRNAs and DEmRNAs with p < 0.001, and DEmiRNAs with p < 0.05) (Table S1).

Functional analysis
GO enrichment analysis showed that on the BP, the differential expression genes were associated with ''keratinization'', ''anterior/posterior pattern specification'', etc. For MF, the genes were related to ''receptor ligand activity'', ''receptor regulator activity'', etc. CC were closely correlated with ''intermediate filament'', ''intermediate filament cytoskeleton'', etc (Fig. 3A). KEGG enrichment analysis revealed that mRNAs were dramatically enriched in pathways related to ''transcriptional misregulation in cancer'' and ''neuroactive ligandreceptor interaction'', which are significantly correlated with carcinogenesis and metastasis (Fig. 3B). Further, correlations of the top 185 most significant mRNAs (p < 0.05) were analyzed by constructing PPI networks (Fig. 3C). The results showed that these mRNAs play key roles during the initiation and progression of HNSCC.

Predictive model for OS
In total, 19 lncRNAs, 18 miRNAs, and 25 mRNAs identified in univariate Cox regression analysis were selected for the LASSO regression analysis. Of these, 13 lncRNAs (LINC00460,  (Tables S2-S4). Based on the powerful prognostic RNAs, three risk signatures were constructed (Figs. 4A-4F). LASSO regression analysis of the 13 lncRNAs, eight miRNAs, and 17 mRNAs was conducted, and the RS was calculated according to the coefficients shown in Table 1. To evaluate the prognostic performance of the risk signature, patients with HNSCC were divided into low-and high-risk groups based on the median RS. The results suggested that patients in the high-risk group had worse prognoses than those in the low-risk group (Figs. 4G-4I). In addition, specificity and sensitivity were assessed using a time-dependent ROC curve, and the AUC values were 0.62, 0.619, and 0.688, respectively, suggesting better prediction performance (Figs. 4J-4L).
The expression levels of differentially expressed RNAs related to clinicopathological characteristics, including the three risk signatures, are shown in the heat map (Figs. 5A-5C). Classification into the high-risk group was closely related to patient died status, TP53 mutation status, a high T stage, HPV negative, PNI, and disease recurrence/progression   Table 2). Thus, age, stage, T stage, N stage and the risk score were strongly related to OS in univariate analysis (Figs. 6A-6C); risk score and age, in the lncRNA and mRNA models; and risk score, age, and N stage, in the miRNA model (multivariate analysis) (Figs. 6D-6F) (p < 0.05). These results showed that the risk score could be regarded as an independent prognostic factor in HNSCC.

Verification of the prognostic values of ceRNA
The verified results of GEPIA showed that HOTTIP, LINC00460, and STC2 were significantly associated with OS, strengthening the reliability of our findings (Figs. 8A-8P).

DISCUSSION
Determining the potential molecular mechanisms of the onset and development of HNSCC and new prognostic targets will help improve the survival rate of patients. To date, few studies have reported single or multiple biomarkers for HNSCC due to its multiple genetic alternations and mutations and their complicated interactions. Therefore, identification of specific and sensitive lncRNA biomarkers and prognostic targets for HNSCC is urgently needed (Guglas et al., 2017). ceRNA regulatory networks are composed of lncRNAs, miRNAs, and mRNAs and have attracted great interest in recent years with respect to their contribution to the molecular mechanisms underlying tumor development and subsequent malignant progression (Deng et al., 2015). The present study explored how the HNSCC-specific lncRNAs function as ceRNAs to regulate their target genes and constructed RNA-based prognostic models for HNSCC. Many studies have implicated the ceRNA regulatory network comprising lncRNAs, miRNAs, and mRNAs in the development of HNSCC Yin et al., 2019;Pan   et al., Fang et al., 2018;Zhao et al., 2018). All the above studies are based on data from the TCGA database for the construction of a ceRNA regulatory network and to obtain survival-related RNAs of HNSCC. The present study screened HOTTIP, LINC00460, hsa-miR-206, and STC2 and, similar to the results of previous studies, a comparison with which indicated that our research methods were credible. However, these studies only assessed the relationship between signal lncRNA/miRNA/mRNA and survival and did not construct a prognostic model. Conventional methods are often directly used to screen for differentially expressed RNAs while constructing a ceRNA regulatory network. In contrast to other studies, in this study, we first analyzed differentially expressed RNAs to identify prognosis-related RNAs using univariate Cox regression analysis, and then selected the prognosis-related DElncRNAs, DEmiRNAs, and DEmRNAs for the construction of the ceRNA network. This approach ensured that all RNAs incorporated into the network were associated with survival.
Another study by Yin et al. (2019) showed that 71 lncRNAs, eight miRNAs, and 16 mRNAs were established an HNSCC-specific ceRNA network. HOTTIP was proven to be the most powerful prognostic factor and was markedly associated with clinical stage and histological grade in three patients with HNSCC. Moreover, our study used univariate Cox regression analysis and LASSO analysis to establish RNA-based risk signatures. Compared with multivariate Cox regression analysis, LASSO analysis may be more beneficial in reducing the occurrence of data model overfitting. However, a previous study identified only lncRNAs and mRNA prognostic signatures among the ceRNA network and could not establish a miRNA-based model. Forthermore, we analyzed the relationship between the prognostic model and different clinicopathological features, and thereby strictly explained the predictive role of risk signatures. More importantly, our study not only established three prognostic models consisting of prognostic RNAs, but also assessed the relationship between these prognostic models and conventional clinicopathological features (including TNM stage and histological grade) and specific HNSCC clinicopathological features (including TP53 mutation status, ALI, HPV status, PNI, and ECS).
Previous publications have focused in part on ceRNAs in laryngeal squamous cell carcinoma (LSCC) (Kong et al., 2019;Liu & Ye, 2019;Gao et al., 2019;Sun et al., 2018;Feng et al., 2016), tongue squamous cell carcinoma (TSCC) (Sturgis & Cinciripini, 2007), and nasopharyngeal carcinoma (NPC) (Fullerton et al., 2020;Agrawal et al., 2011). These studies have identified differentially expressed RNAs by lncRNA and mRNA integrated microarrays of cancer tissue or by immunohistochemistry (Gao et al., 2019;Sun et al., 2018;Feng et al., 2016). The different key RNAs screened may be attributed to the patients included from different databases and the different tumor types. The evaluation ability was 0.62 for the lncRNA-based model, 0.619 for the miRNA-based model, and 0.688 for the mRNA-based model. Importantly, while establishing prognosis-related prediction models, we also analyzed the relationship between the three prognostic models and clinicopathological features, including the relationship between the expression of each independent RNA in the model and clinicopathological features. This has, to the best of our knowledge, never been explored. Our results showed that the three prognostic models significantly correlated with clinicopathological features, such as OS, DFS, TP53 mutation status, and HPV status. Furthermore, the RS of the three risk signatures as an independent predictor of prognosis plays an important role in prognostic prediction in HNSCC. Notably, HPV is regarded as a crucial risk factor for the development of oropharyngeal cancer (Sturgis & Cinciripini, 2007). New studies have shown that patients with HPV-positive and HPV-negative oropharyngeal cancer have dramatically different rates of both head and neck carcinoma mortality and competing-cause mortality. Compared with HPV-positive patients, HPVnegative patients have a higher risk of two-year cumulative incidence of all-cause mortality and a lower risk of both head and neck carcinoma-specific mortality and competing-cause mortality (p < 0.0001) (Fullerton et al., 2020). Therefore, we believe that the HPV status is a critical determinant in oropharyngeal cancer. TP53 mutation was observed in 60% to 80% of HPV-negative HNSCC patients and was associated with poor prognosis, increased resistance to standard therapy (mainly cisplatin and radiation), and the shortest time to distant metastases (Agrawal et al., 2011;Stransky et al., 2011;Neskey et al., 2015). The significance of TP53 mutations is in predicting the efficacy of chemotherapy, targeted therapy, and radiotherapy.
In our KEGG pathway analysis, the prognostic mRNAs were significantly enriched in ''Transcriptional misregulation in cancer'' and ''Neuroactive ligand-receptor interaction''. The pathways related to ''Transcriptional misregulation in cancer'' included the P53 signaling pathway, renal cell carcinoma, thyroid cancer, and acute myeloid leukemia.  reported the same KEGG pathways to be enriched in HNSCC.
We further validated the RNAs in the ceRNA regulatory network using the TCGA database, both in vivo and in vitro. HOTTIP and LINC00460 were significantly associated with OS validated by GTEx data and several gene microarrays. They also showed varying degrees of upregulation in six HNSCC cell lines. HOTTIP is a newly identified lncRNA encoded by chromosome 7p15.2 located at the 5 -end of the HOXA cluster . Many studies have reported that higher expression of HOTTIP is associated with positive lymph node metastasis and unfavorable prognosis in various cancer types (Lian et al., 2016;Chen et al., 2017). Our previous studies have confirmed that HOTTIP is also involved in the construction of a ceRNA regulatory network in renal clear-cell carcinoma . Zhang et al. (2015) reported that the overexpression of HOTTIP is an independent poor prognostic factor for patients with TSCC. Xue et al. (2019) showed that the downregulation of LINC00460 can facilitate cancer cell apoptosis and autophagy in HNSCC. Our previous studies have confirmed that the lncRNA signature and tumor grade are independent prognostic factors and that the 3-lncRNA (KTN1-AS1, LINC00460, and RP5-894A10.6) signature could be a novel prognostic biomarker for HNSCC (Cao et al., 2017). Jiang et al. (2019) proposed that LINC00460 could enhance the proliferation and metastasis of HNSCC cells and promote EMT by accelerating PRDX1 entry into the nucleus. LINC00460 contributes to the progression of NPC by sponging miR-149-5p to upregulate IL6 (Kong et al., 2018).
In this study, only one prognostic miRNA (hsa-miR-206) involved in ceRNAs network. In terms of hsa-miR-206, it has been reported to be downregulated and associated with tumor progression and worse prognosis in several malignancies such as gastric cancer, osteosarcoma and colorectal carcinoma (Arigami et al., 2013;Bao et al., 2013;Liu et al., 2017). In HNSCC, it was demonstrated that hsa-miR-206 plays a critical role progression by targeting HDAC6 via PTEN/AKT/mTOR pathway, which might be a potential target for diagnostic and therapeutic (Vickers et al., 2012). The target prognostic mRNA of hsa-miR-206 was STC2 in ceRNAs regulatory network in present study. STC2 (Stanniocalcin 2) is a secreted glycoprotein with important functions in gastric cancer, which could be a powerful marker of poor prognosis (Yang et al., 2013;Yang et al., 2017). Yang et al. concluded that STC2 may promote HNSCC metastasis via PI3K/AKT/Snail signaling axis. Targeted therapy against STC2 may become a novel strategy to effectively treat patients with metastatic HNSCC (Yokobori et al., 2010).
This study has various advantages and limitations. The advantage of bioinformatics technology is that it can provide novel insights into the potential molecular mechanisms involved in the development and progression of HNSCC (Yu et al., 2014). However, a welldesigned and verified classifier model is needed to ensure the reliability of our findings. The highlights of this study are as follows: First, we established three multi-RNA-based prognostic models and a prognostic ceRNA regulatory network for HNSCC, all of which can increase the effectiveness of prognosis prediction. Second, we comprehensively analyzed the relationship between the prognosis model and each RNA included in the prediction model and conventional clinicopathological features (including TNM stage and histological grade) and specific HNSCC clinicopathological features (including TP53 mutation status, ALI, HPV status, PNI, and ECS) to further verify the significant correlation with survival and clinical therapeutic outcomes. lncRNAs can regulate gene expression at several levels and are involved in the evolution and progression of many cancers. However, the present study only identified one regulatory mechanism. Therefore, further clinical investigations and scientific research are required to validate these conclusions.

CONCLUSIONS
In conclusion, the present study analyzed an independent cohort of patients with HNSCC from the TCGA database, constructed a ceRNA regulatory network, and established three multi-RNA-based prognostic models with potential prognostic value, and the findings may provide a basis for further experimental and clinical research.

Grant Disclosures
The following grant information was disclosed by the authors: • Jialiang Liu and Hao Wu analyzed the data, prepared figures and/or tables, and approved the final draft.
• Siyi Li and Chenping Zhang conceived and designed the experiments, authored or reviewed drafts of the paper, and approved the final draft.

Data Availability
The following information was supplied regarding data availability: Data is available at TCGA, search terms: TCGA, Project: TCGA-HNSC, Diease Type: squamous cell neoplasms.

Supplemental Information
Supplemental information for this article can be found online at http://dx.doi.org/10.7717/ peerj.9701#supplemental-information.