Identification of a six microRNA signature as a novel potential prognostic biomarker in patients with head and neck squamous cell carcinoma

The 5-year survival rate of patients with head and neck squamous cell carcinoma (HNSCC) was only 40%-50%. To investigate the prognostic and predictive value of specific mircoRNAs (miRNAs) in HNSCC. We identified 19 miRNAs associated with over survival (OS) of patients with HNSCC in different clinical classes between 492 HNSCC tissues and 44 normal tissues from The Cancer Genome Atlas (TCGA) dataset. A signature of six miRNAs was identified by the supervised principal component method in the training set. The AUC of the ROC curve for the six microRNA signature predicting 5-year survival was 0.737 (95%CI, 0.627-0.825) in the testing set and 0.708 (95%CI, 0.616-0.785) in the total dataset. In the multivariate Cox regression analysis, patients with high-risk scores had shorter OS (HR, 2.380, 95%CI, 1.361-4.303) than patients with low-risk scores in the total dataset. Therefore, these results provided a new prospect for prognostic biomarker of HNSCC.


INTRODUCTION
Head and neck squamous cell carcinoma (HNSCC) originates from the upper aerodigestive tract including oral cavity, oropharynx, hypopharynx and larynx, which is the sixth most common cancer. Approximately 600,000 new patients are diagnosed, and 350,000 patients died of this disease worldwide annually [1]. Furthermore, 5-year survival rate of patients with this disease was only 40 -50% [2]. However, the prognosis of HNSCC patients has not any significantly improvement in the past decade [3]. Recent studies have found that tobacco use and HPV status in patients with HNSCC had significant prognostic importance [4].
MicroRNAs (miRNAs) are small non-coding RNAs of 18-25 nucleotides in length that regulate gene expression by binding to the 3'-untranslated region (3'-UTR) of their target mRNAs, resulting in mRNA degradation and/or inhibition of mRNA translation [5]. MiRNAs play a critical role in many processes such as cellular proliferation, apoptosis, migration, and differentiation [6][7][8]. Studies had previously found that some circulating miRNAs strongly associates with the invasion, and metastasis of cancer [9]. Therefore, circulating miRNAs may be a potential novel biomarkers for cancer detection and prognosis. For instance, a 4-miRNAs panel was associated with overall survival of patients with non-small cell lung cancer [10]. In breast cancer, higher levels of circulating hsa-miR-122 can predictively indicate metastatic recurrence in patients with stage II and stage III [11]. Hsa-miR-25 is associated with poor survival of gastric cancer patients by inhibiting transducer of ERBB2 [12]. Hsa-miR-7 expression is associated with poor prognosis via EGFR regulation in colorectal cancer [13]. Other studies showed that high expression www.impactjournals.com/oncotarget of hsa-miR-19a and hsa-miR-21 and low expression of hsa-miR-375 were associated with shorter overall survival in laryngeal squamous cell carcinoma patients [14][15][16].
However, most of studies were based on a small number of patients, or inconsistent in these sets of miRNA markers due to the heterogeneous of disease and variations in the approaches for selection of miRNAs. The Cancer Genome Atlas (TCGA) released a large number of miRNA sequencing data for HNSCC patients. Therefore, the aim of this study was thus to assess systematically the predictive value of specific miRNA signature for 5-year survival of patients with HNSCC from a large dataset of TCGA.

Characters of the datasets
A total of 525 miRNA expression profiles (level 3 data) were obtained from TCGA. In miRNA tumor profiles, the expression of 1046 human miRNAs in HNSCC samples was assessed using the Illumina HiSeq Systems (n = 488) and Genome Analyzer (n = 37). MiRNA expression profiles for normal tissues (n = 44) were also analyzed using the Illumina HiSeq System. The clinical data for those patients obtained from TCGA were available. According to included criteria, a total of 492 HNSCC patients were finally enrolled in the study. For subsequent analysis, we randomly divided the total patients into the training set (n=246) and testing set (n=246) respectively. There was no significant difference on the clinical covariates between the two sets (P > 0.05). (Table 1)

Association of miRNAs expression and clinical features with OS of HNSCC patients
We conducted univariate Cox regression between clinical covariates and HNSCC to confirm the prognostic significance of the clinical covariates. After the analysis, clinical variables of age, smoking status, lymphnodes positive, perineural invasion present, pathologic N stage, pathologic T stage, pathologic disease stage, and tumor grade were significantly associated with OS. However, we did not find other clinical variables of gender, clinic N stage, clinic T stage, clinic M stage, clinic disease stage, HPV status, and alcohol history significantly associated with OS. Kaplan-Meier survival curves and log rank test for these variables were shown in Supplementary Figure  S1- Figure S15.
Then, we conducted univariate Cox regression to identify common miRNAs associated with OS within each of the following independent classes: age, smoking status, lymphnodes positive, perineural invasion present, pathologic N stage, pathologic T stage, pathologic disease stage, and tumor grade. Within each subset of clinical characteristics, the patient subclasses represented non-overlapping sets, respectively. MiRNAs were selected as candidate markers if they were associatedsignificance with OS in at least two independent categories for each covariate. The respective HRs for the association of miRNA with OS in each subclass were shown in Figure 1. A total of 19 miRNAs were identified in this analysis.

Definition of miRNA prognostic model
We selected six miRNAs by the supervised principal component method in the training set. And then, we developed a miRNA prognostic model. The miRNAs expression level was as the log2 reads per million of total aligned miRNA reads. The prognostic score was calculated as follows: Prognostic-score = (-1.155×expression level of hsa-let-7c) + (-1.063×expression level of hsa-miR-125b-2) + (1.217×expression level of hsa-miR-129-1) + (1.137×expression level of hsa-miR-337) + (1.013×expression level of hsa-miR-654) + (-1.157×expression level of hsa-miR-99a). We classified the samples into high risk or low risk group using the best cutoff point of miRNA scores with optimum sensitivity and specificity according to ROC curve for predicting 5-year survival in the training set. The cutoff point was -16.070 with 71.48% sensitivity and 70.20% specificity.

Prognostic value of the six microRNA signature in HNSCC
The six microRNA signature showed greater predicting prognosis capacity for predicting 5-year survival in HNSCC with an AUC of 0.737 (95%CI, 0.627-0.825) in the testing set (Supplementary Figure S16A) and an AUC of 0.708 (95%CI, 0.616-0.785) in the total HNSCC patients (Figure 2A), respectively.   Compared with patients with low-risk scores, patients with high-risk scores in the TCGA HNSCC cohort had significantly shorter OS (HR, 2.380, 95%CI, 1.361-4.303) after adjusted age, perineural invasion, pathologic T stage, pathologic N stage by multivariate Cox proportional hazards regression analysis according to the backward stepwise method of screening variables in Table 2.

Target prediction and functional enrichment of the six microRNA signature in HNSCC
The numbers of the target genes of the six miRNAs were 8437, 7253, and 1119, which were predicted by three data sets using miRanda, Targetscan, and PicTar programs, respectively. A total of 314 target genes were included in the three data sets (Supplementary Figure S20). We performed enrichment analyses to elucidate the biological function of target genes of the six microRNA signature. Finally, Gene ontology (GO) analysis revealed that there were 144 of the proteins were associated with biological process (BP), 37 of the proteins with cellular component (CC), and 31 of the proteins with molecular function (MF), respectively. The top ten enriched functional analysis was shown in Figure 4. The top enriched biological process was enzyme linked receptor protein signaling pathway. The top enriched cellular component and molecular function was plasma membrane part and chromatin binding, respectively. Therefore, a total of 23 KEGG pathways were enriched by the six microRNA signature. The top enriched KEGG pathway was the MAPK signaling pathway. The top 15 functional enrichment of target genes for six microRNA signature were summarized in the Supplementary Table S2.

DISCUSSION
MiRNAs have been reported in all stages of neoplastic progression including apoptosis, proliferation, progression, metastasis, invasion, and relapse [17]. Recent studies have found that miRNAs were associated with lymph node metastasis, T classification, clinical stage, and prognosis in patients with HNSCC [18][19][20]. Shen et al. found that the expression of hsa-miR-34a was positively correlated with survival rate and could independently predict survival in patients with laryngeal squamous cell carcinoma [21]. Hsa-miR-203 may be used as a predictive marker of survival in laryngeal squamous cell carcinoma patients because the expression of hsa-miR-203 was associated with lymph node metastasis, advanced clinical stages, and decreased 5-year survival rate [22,23]. Hou et al. found that circulating hsa-miR-99a may be used as biomarkers to assess the efficacy of therapy and the prognosis of HNSCC  D C www.impactjournals.com/oncotarget [24]. Therefore, we applied a miRNAs signature as a potential biomarker for predicting survival in HNSCC using TCGA datasets.
In the present study, we have identified a six microRNA signature consisting of hsa-let-7c, hsa-miR-125b-2, hsa-miR-129-1, hsa-miR-337, hsa-miR-654, and hsa-miR-99a, which was validated as an independent predictor for HNSCC patient survival. The AUC of the ROC curve for the six microRNA signature predicting 5-year survival was 0.737 (95%CI, 0.627-0.825) in the testing set and 0.708 (95%CI, 0.616-0.785) in the total dataset. The six microRNA signature have a good performance for predicting 5-year survival in HNSCC.
Among the six miRNAs, low expression of hsa-miR-99a was associated with poor prognosis in patients with HNSCC. On the contrary, overexpression of hsa-miR-99a distinctly inhibited cell proliferation and induced apoptosis by down-regulating the expression level of IGF1R and mTOR genes in oral cancer cells [25][26][27]. Accordingly, overexpression of IGF1R and activation of mTOR signaling path have been observed in HNSCC, and were often associated with poor prognosis [28]. Hsa-miR-125b controls the apoptotic pathway by targeting the BAK1 gene in HNSCC [29]. Down-regulated hsa-let-7c was significantly associated with metastasis and poor survival of patients, and hsa-let-7c inhibits cancer metastasis by degrading ITGB3 and MAP4K3 in non-small cell lung cancer [30]. Hsa-let-7c was down-regulated in HNSCC [31,32]. To date, to our knowledge, this is the first to report the association of hsa-miR-129-1, hsa-miR-337, and hsa-miR-654 with OS in HNSCC.
The carcinogenic process of HNSCC is a multistep process driven by a series of genetic and epigenetic alterations in tumor suppressor genes and oncogenes resulting in the progression from a normal cell to a cancer cell [33]. Through the enrichment and function analysis of DAVID database, we found that the target genes of these miRNAs may participate in many important biological processes, including regulation of transcription, positive regulation of transcription, and positive regulation of gene expression. In the KEGG pathway analysis, these target genes also take part in many important signaling pathways such as MAPK signaling pathway, cancer pathways, and TGFbeta signaling pathway. Aberrant activation of the p38 MAPK pathway may halt HNSCC cells growth and metastasis by reducing tumor-induced inflammation and angiogenesis [34]. TGF-β signaling causes the proproliferative and tumor-suppressive responses according to the biological context [35]. SMAD4 alterations may explain the decreased tumor suppressive effect of TGF-β signaling in HNSCC [36,37].
There were a number of limitations in this study. Firstly, the data in this study was from a single source (TCGA) and randomly assigning samples to training set and testing set for the development and assessment of the prognostic model. The performance of these miRNAs signature would be more reliable if validation is performed in an independent external data sets with long-term follow up. Secondly, the TCGA HNSCC cohort had a relatively high censored rate, which may affect the reliability of the Kaplan-Meier estimates.
In conclusion, our results showed that the six microRNA signature significantly predicts the 5-year survival of HNSCC patients in the TCGA cohort, indicating that it may be a novel potential biomarker for prognosis of HNSCC. This finding requires further confirmation in independent larger cohorts in future studies.

Patient characteristics and miRNA dataset in TCGA
The miRNA expression profiles (level 3 data) and corresponding clinical data for HNSCC patients were obtained from TCGA data portal (January 2013, https:// tcga-data.nci.nih.gov/tcga/tcgaHome2.jsp). Both the miRNA profiles data and clinical data of HNSCC are publically available and open-access. These patients' extended demographics were characterized by the TCGA. The patients were included in the study to meet the following criteria: (1) patients with fully characterized (clinical data and miRNA profiles) tumors; (2) patients with at least 1 month of over survival (OS).

Identification of differentially expressed miRNAs between HNSCC and normal tissue
To identify miRNAs differentially expressed between HNSCC and normal tissues, the raw counts of miRNA expression obtained from the TCGA dataset (492 HNSCC samples and 44 normal tissue) were normalized by a weighted trimmed mean of the log expression ratios (TMM, Trimmed mean of M values method [38]). One miRNA expression filter was miRNAs expressed in at least two normal or tumor samples and with at least 100 counts per million in the profile. The batch effect was removed using a generalized linear model (GLM) [39]. The expression differences were characterized by logFC (log 2 fold change) and associated P-values. The logFC >1 and logFC < −1 with FDR-adjusted (FDR adjust: Benjamini & Hochberg) P < 0.05 respectively represented up-regulated and down-regulated miRNAs. The analysis was performed using the R/Bioconductor package of edgeR [40].

Survival analysis
Clinical covariates for HNSCC patients are summarized in Table 1. The univariate Cox regression was performed to test the association between clinical covariates and OS. The equality for survival distributions in different groups were test by the Kaplan-Meier and log rank method. The miRNA expression level was as the log2 reads per million of total aligned miRNA reads. Univariate Cox regression was used to identify common miRNAs related to OS within each of clinical characteristics that were significantly associated with OS. Within each group of clinical characteristics, the patient subclasses represented non-overlapping sets. Common miRNAs associated with OS in at least two independent categories for each covariate were selected as candidate markers, using a P-value of 0.05 as the cutoff for miRNA selection. The hazard ratio was the ratio of hazards for a twofold change in the gene expression level. All reported P values were two-sided. The analysis was performed using R Packages of survival.

Definition of prognostic model and ROC curve
The miRNAs selected as candidate markers under survival analysis were conducted to the supervised principal component analysis [41]. 10-fold cross-validation was used to estimate the optimal feature threshold in supervised principal components. The threshold of 10-fold crossvalidation is equal to 1 serving as the optimal feature threshold in supervised principal components. The gene weights for the linear miRNA risk predictor were computed using the supervised principal component method. We used the linear miRNA prognostic model obtained from the training set to calculate a miRNAs signature prognostic score for each of 492 patients. We chose the best cutoff values with optimum sensitivity and specificity of the miRNAs signature prognostic scores to divide the patients into the high risk or low risk group in the ROC curve for predicting 5-year survival of the training set. Kaplan-Meier curves were used to estimate the survival for patients with high risk scores or low risk scores. In addition, the prognostic value of the miRNAs signature for 5-year survival of patients was also assessed in different clinical characteristics including smoking status, pathologic T, pathologic disease stage, and tumor grade.
The prognostic performance was measured using time-dependent receiver operating characteristic (ROC) curves. Since the majority of events occurred before 60 months, the ability of models to predict outcome at and around 60 months was assessed. Bootstrap 95%CI of AUC were calculated from 1000 bootstrap of the survival data. We used a multivariable analysis to evaluate the contribution of miRNAs as independent prognostic factors of patient survival. The backward stepwise method was employed to select the best model. All analysis were performed using R (Packages: survival, survivalROC, boot, and superpc).

Target prediction and enrichment analysis
The target genes of miRNAs were predicted by three programs including miRanda, Targetscan, and PicTar. The target genes were selected by miRanda if the mirSVR score ≤ -0.1 and by Targetscan if the total context score ≤-0.1. The final target genes were selected, which were included in all the three data sets. The enrichment analysis of these target genes was analyzed using DAVID online analysis [42] (https://david.ncifcrf.gov/). The gene sets containing less than 5 genes overlapping were removed from the DAVID analysis, and analysis for significance was determined when P values were corrected for FDR.