Comprehensive analysis of ubiquitin-proteasome system genes related to prognosis and immunosuppression in head and neck squamous cell carcinoma

The ubiquitin-proteasome system (UPS) with a capacity of degrading multiple intracellular proteins is an essential regulator in tumor immunosurveillance. Tumor cells that escape from recognition and destruction of immune system have been consistently characterized an important hallmark in the setting of tumor progression. Little know about the exact functions of UPS-related genes (UPSGs) and their relationships with antitumor immunity in head and neck squamous cell carcinoma (HNSCC) patients. In this study, for the first time, we comprehensively identified 114 differentially expressed UPSGs (DEUPSGs) and constructed a prognostic risk model based on the eight DEUPSGs (BRCA1, OSTM1, PCGF2, PSMD2, SOCS1, UCHL1, UHRF1, and USP54) in the TCGA-HNSCC database. This risk model was validated using multiple data sets (all P < 0.05). The high-risk score was found to be an independently prognostic factor in HNSCC patients and was significantly correlated with T cells suppression. Accordingly, our risk model can act as a prognostic signature and provide a novel concept for improving the precise immunotherapy for patients with HNSCC.


INTRODUCTION
Head and neck squamous cell carcinoma (HNSCC) affects more than 800,000 people each year worldwide, leading patients with HNSCC about sixth in global incidence [1,2]. HNSCC is a collection of malignancies with complex local autonomy, given that it originates from the squamous epithelium of the upper aero-The Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) provide potent resources for directly obtaining gene expression profiles from patient tissues. Therefore, this study aims to provide a systematic investigation of the expression patterns via bioinformatics analysis and a prognostic risk model based on the UPSGs. We proposed that UPSGs are associated with the prognosis of patients with HNSCC. Importantly, our finding demonstrated that HNSCC patients with poor prognoses have high-risk scores that are associated with impaired T cell antitumor responses.

Identification of differentially expressed UPSGs in HNSCC tissues
We analyzed the expression of 804 UPSGs (Supplementary Table 1) that were distributed through all chromosomes in 498 head and neck tumor tissues and 44 adjacent tissues. 114 differentially expressed UPSGs (DEUPSGs) were identified, as shown in Supplementary Figure 1, where 97 upregulated and 17 downregulated DEUPSGs were observed (FDR < 0.05 and |Fold change| > 1.5, Figure 1A, 1B). The genomic information of DEUPSGs was shown in Supplementary Table 2.

Functional analysis of DEUPSGs in HNSCC
To assess the potential functions of DEUPSGs, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway were analyzed using the TCGA database. The most significant GO terms were enriched for determining DEUPSGs functions, with respect to biological process (BP), cellular component (CC), and molecular function (MF), were shown in Figure 2A-2C. Enriched terms were significantly correlated with the proteasomal protein catabolic process, proteasome complex, and ubiquitin-like protein transferase activity. The KEGG pathway enrichment analysis indicated that ubiquitin-mediated proteolysis and proteasome pathways may play essential roles in the formation and development of HNSCC ( Figure 2D).

Construction and identification of the prognostic risk model
To identify the prognostic value of DEUPSGs in HNSCC patients, Univariate Cox regression analysis was used to confirm the expression patterns of 114 DEUPSGs in the TCGA training set. The forest plots displayed the eight prognosis-related DEUPSGs in HNSCC ( Figure 1C). The prognostic risk model was established based on the eight prognosis-related DEUPSGs using LASSO regression analysis (Supplementary Figure 2). The coefficient values of the eight DEUPSGs were shown in Table 1. The risk score was calculated for each sample as the following equation: Risk score = BRCA1 * (-0.037) + OSTM1 * 0.0294 + PCGF2 * 0.0502 + PSMD2 * 0.0058 + SOCS1 * (-0.0157) + UCHL1 * 0.0022 + UHRF1 * (-0.0496) + USP54 * (-0.0887) AGING Samples were classified into two groups based on the median risk score (0.0622) of the training set. OS analysis indicated that the low-risk group has a markedly better prognosis than the high-risk group (P < 0.0001, Figure 3A). The receiver operating characteristic (ROC) curve indicated that the area under the ROC (AUC) value was 0.664 ( Figure 3B). The risk scores and OS were performed by risk plots and scatter AGING plots ( Figure 3C, 3D). Expression patterns of risk genes in the high-and low-risk groups were depicted in Figure  3E, showing that high expression levels of PSMD2, OSTM1 (Osteoclastogenesis associated transmembrane protein 1), PCGF2 (polycomb group ring finger 2), and UCHL1 (ubiquitin C-terminal hydrolase L1) can be considered as risk factors were correlated with a highrisk score. Furthermore, high expression levels of BRCA1 (BRCA1 DNA repair associated), SOCS1 (suppressor of cytokine signaling 1), UHRF1 (ubiquitinlike with PHD and ring finger domains 1), and USP54 (ubiquitin specific peptidase 54) were associated with a low-risk score.
An independent validation data set was involved in the evaluation of the risk model. Based on the prognostic risk model, patients with HNSCC were divided into low-and high-risk groups. Survival analysis performed by Kaplan-Meier displayed significant prognostic differences between the two groups (P < 0.001, Supplementary Figure 3A). The relationship between the eight DEUPSGs and risk scores was shown in Supplementary Figure 3B. At the same time, similar results were found in the TCGA training set and TCGA all data set (Supplementary Figure 3C, 3D).
The GEO (GSE65858) database was used as an external data set for validating the classification performance of the risk model. Samples in the GSE65858 were segregated into low-and high-risk groups. The classification performance of the risk model and the expression pattern of the risk genes were consistent with the training set (Supplementary Figure 4).

Clinical independence of the risk model
To assess the independence of the risk model in clinical application, univariate and multivariate Cox regression analyses were subjected to clinical parameters from the TCGA training set, TCGA test set, GSE65858 database,   These data demonstrated that the risk model has an effectively prognostic power and exhibits an independent predictive value. A nomogram contained both risk score and clinical features was shown in Figure 4E.

Gene set enrichment analysis in the high-risk and low-risk groups
Gene set enrichment analysis (GSEA) was employed for pathways identification in the TCGA database for detecting twenty pathways from the high-and low-risk groups (Supplementary Table 3). The selected signaling pathways showed significant differences between the two groups (FDR < 0.25, NOM p < 0.05) ( Table 2), in which glycan metabolism, extracellular matrix, and proteolysis were significantly enriched in the high-risk group ( Figure 5A, 5B). In contrast, DNA repair, fatty acid metabolism, other metabolisms, and immunerelated pathways were significantly enriched in the lowrisk group ( Figure 5C-5E). Interestingly, the T cell receptor (TCR) signaling pathway was enriched in the low-risk group ( Figure 5F), which may indicate a potential relationship of the high-risk core with impaired immunosurveillance and T cell antitumor response in HNSCC. The rest of GSEA graphs were shown in Supplementary Figure 6.

Exploration of the relationship between the risk score and antitumor immunity
To determine whether the high-risk score was correlated with impaired T cell activity in HNSCC, we employed the ESTIMATE algorithm to estimate immune and stromal scores based on the TCGA database. The immune score was higher in the low-risk group than the high-risk group (P < 0.001, Figure 6A), and the risk score had a negative correlation with the immune score in HNSCC patients (R = -0.19, P < 0.0001, Figure 6B). However, the stromal score was higher in the high-risk group than the low-risk group (P < 0.01, Figure 6C), and the risk score had a positive correlation with the stromal score in HNSCC samples (R = 0.17, P < 0.001, Figure 6D).
We subsequently analyzed the fraction of both tumorinfiltrated innate and adaptive immune cells in HNSCC samples by CIBERSORT. The relationships of risk score with those immune cells were shown in Figure  6E. The results showed that the fraction of CD8 T cells (P < 0.001), CD4 memory activated T cells (P < 0.001), and follicular helper T cells (P < 0.01) in samples from the high-risk group were lower than the low-risk group, indicating that high-risk score was associated with immunosuppressive phenotypes that may result from impaired T cell response and activation.

Correlation between the genes of the risk model and the four T cell subpopulations
According to the potential relationship between the risk model and the four T cell subpopulations ( Figure 7A), we further tested the correlation between the eight genes and the four T cell subpopulations ( Figure 7B-7I). Consistent with the profile of these genes in the risk model, decreased CD8 T cells were associated with the upregulated OSTM1 (P < 0.01), PCGF2 (P < 0.05), PSMD2 (P < 0.01), UCHL1 (P < 0.05) and the downregulated SOCS1 (P < 0.01). Additionally, decreased CD4 memory-activated T cells were related to the upregulated UCHL1 (P < 0.01) and the downregulated SOCS1 (P < 0.01). Similarly, the asthenia of follicular helper T cells may be caused by high expressed OSTM1 (P < 0.05), PSMD2 (P < 0.001) and low expressed USP54 (P < 0.001). Moreover, the increase of CD4 memory resting T cells was associated with the high expressed OSTM1 (P < 0.01). Thus, the UPS-related genes OSTM1, PCGF2, PSMD2, UCHL1, SOCS1, and USP54 were associated with immunosuppression in HNSCC.

DISCUSSION
Tumor cells that escape from immunosurveillance and attack have been the characteristics of immunosuppression in the tumor progression, which is largely associated with the impaired or dysregulated host immune system [27]. By managing protein degradation and abundance, the UPS pathway has been reported as an essential regulator in immunosurveillance of the tumor microenvironment [28]. Accumulating evidence has demonstrated that E3 dysfunction perturbs antitumor immune responses [29][30][31]. Current knowledge on the functions of the UPS pathway in the development of HNSCC remains obscure. In this regard, an improved understanding of the UPSGs functions in the text of HNSCC can provide a therapeutic value.
This study analytically conferred prognostic evidence by identification of 114 DEUPSGs and establishment of the eight prognosis-related DEUPSGs. A prognosis risk model was established based on the eight DEUPSGs. Previous studies have reported the abnormal expression of UPSGs in renal cancer and breast cancer [32,33]. Analyses performed at both mRNA and protein levels can improve the reliability of the results, whereas the limited sample size of these analyses may lead to a bias AGING in the results, because only abnormal gene expression is analyzed, and verification of clinical significance or potential functional analysis is lacking.
In this risk model, high expression levels of PSMD2, OSTM1, PCGF2, and UCHL1 were risk factors, while high expression levels of BRCA1, SOCS1, UHRF1, and USP54 acted as protective factors. Overexpression of PSMD2 has been previously shown to be involved in the development of lung adenocarcinoma, breast cancer, and hepatocellular carcinoma [24][25][26]. UCHL1 has been reported a potential oncoprotein in colorectal cancer, breast cancer, and uterine serous cancer. It promotes proliferation and metastasis of cancer cells and leads to a radioresistant phenotype via regulation of β-catenin/TCF and HIF-1α pathway [34][35][36][37][38]. In line with those previous studies, our results also suggested that PSMD2 and UCHL1 may have therapeutic potential as targets against cancer. Accumulating evidence has shown that BRCA1 mutation is a crucial risk factor in multiple cancers, including breast cancer, skin cancer, ovarian cancer, and colorectal cancer [39][40][41][42][43]. SOCS1 acts as an antioncogene in various tumors, arresting the cell cycle, inhibiting cancer cell migration and invasion, and attenuating tumor growth [44][45][46][47]. Nevertheless, non-coding RNA (ncRNA) regulating SOCS1 promotes the occurrence and development of cancers [48][49][50]. Similarly, UHRF1 also acts as an antioncogene in multiple cancers and inhibits tumor development and progression [51][52][53][54]. Therefore, the abovementioned studies suggest that BRCA1, SOCS1, and UHRF1 serve as antioncogenes, consistent with our results.
Interestingly, our GSEA results showed that the TCR signaling pathway was highly enriched in the low-risk group, which suggested the impaired T cell activation presented in the high-risk group. TCR engagement initiates a central signaling pathway that is crucial for T cell proliferation, survival, and differentiation into killer cells for adaptive immunity. Abnormalities of TCR signaling could result in immunodeficiency [55][56][57][58].
Consequently, according to the relationship between the risk score and the distribution of T cell subpopulations, we found that the high-risk score has a decreased immune score, which may suggest that the high-risk score is critical for evaluating whether tumor-infiltrated T cells have an immunosuppressive status and dysregulated antitumor response. Moreover, the high expression of OSTM1, PCGF2, PSMD2 combined with the low expression of UCHL1, SOCS1, and USP54 were engaged in the suppression of T cell proliferation and activation in the HNSCC microenvironment. It has been previously shown that differential expression of UCHL1 was majorly involved in the proliferation and differentiation of T cells, including CD8 memory CTLp, circulating TCR-gamma/delta+ lymphocytes, and mature T cells [59]. SOCS1 has been identified as   AGING an inhibitor against cytokine release and a growth factor receptor, it plays a crucial role in regulating T cell homeostasis, development, and homeostasis activation [60][61][62][63][64]. However, the function of OSTM1, PCGF2, PSMD2, and USP54 have not been investigated in the antitumoral immune response. The present studies suggest that the high expression of OSTM1, PCGF2, and PSMD2 and low expression of UCHL1 were associated with immunosuppressive phenotypes.
Generally, we identified an eight -UPSGs based risk model according to the TCGA-HNSCC database and analyzed its biological functions, demonstrating that the risk score obtained from this model was significantly correlated with immunosuppressive status. However, some limitations in the context of the study should be acknowledged. First, this study is performed by bioinformatics analysis alone, the predictive results could be insufficient. Second, this is a retrospective study rather than a prospective one, further validations with large clinical cohort and actual experiments are needed.

CONCLUSIONS
In summary, we present a risk model constructed by DEUPSGs that can be considered as potential prognostic biomarkers and associated with an immunosuppressive status and impaired antitumor response of T cells in the HNSCC microenvironment. This systematic analysis on the interaction of UPSGs based risk model and immune profile provides a novel understanding of the precise immunotherapy for HNSCC patients.

Data downloading and processing
The most recent transcriptional data and clinical features on 500 HNSCC samples and 44 adjacent samples were obtained from the TCGA database (https://portal.gdc.cancer.gov/) and the cBio Cancer Genomics portal (https://www.cbioportal.org/) ( Table  3). We chose 498 HNSCC samples with follow-up data and randomly divided them into two groups: a training set (n=298) and a test set (n=298), as shown in Supplementary Tables 4, 5. In addition, we obtained the GSE65858 data setwith 270 HNSCC samples as an external set from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/).

Identification of differentially expressed genes
The differentially expressed UPS-related genes (DEUPSGs) between head and neck tumor tissues and adjacent tissues were screened out through the Wilcoxon signed-rank test using the limma R package, according to the following cut-off values: FDR < 0.05 and |Fold change| > 1.5. Subsequently, a heat map was generated using the "pheatmap" package, and volcano dot plots were created to present the DEUPSGs. We used the OmicCircos R package to show the distribution of the DEUPSGs [65].

Functional enrichment analyses
The enrichplot R package was built for the GO and KEGG pathway enrichment analyses to identify the functions of the abovementioned DEUPSGs [66], containing BP, CC, and MF terms and pathways. Statistical significance was defined as both P-value and FDR < 0.05.

Prognostic risk model establishment
We used univariate Cox regression analysis to screen out DEUPSGs significantly associated with OS. P < 0.05 was chosen as the threshold. We further used Lasso regression analysis to establish a multi-gene prognostic model. The risk score of each patient was calculated, employing the regression coefficient of each selected gene based on the following equation: where n is the number of prognostic genes, gene i is the i th prognostic gene, Coef is the regression coefficient of genes, and Exp i is the expression value of the prognostic genes. Then, the HNSCC patients were divided into high-risk and low-risk groups according to the median risk score. Moreover, we used rms package to generate a nomogram.

Gene set enrichment analysis
Gene set enrichment analysis (GSEA, https://www. gsea-msigdb.org/) is a powerful computational tool used to determine statistical differences of specific functional gene sets between two biological states [67]. Here, we performed the GSEA analysis using GSEA v4.0.3 to obtain the differentially expressed genes from the highand low-risk groups. After 1000 repeats for each analysis, gene sets with p-value < 0.05 and FDR < 0.25 were identified as enriched sets.

Identification of immune scores and tumorinfiltrating immune cells
To identify infiltrated immune cell numbers and stromal cell numbers, the R software package "ESTIMATE algorithm" was used for calculating immune and AGING  [68]. Additionally, CIBERSORT was applied to acquire the composition of the infiltrated immune cells in HNSCC samples [69].

Statistical analysis
R package (v4.0.1) was subjected to all analyses. Kaplan-Meier curve was drawn with the log-rank test.

AGING
Univariate and multivariate Cox regression was used to determine the independence of gene markers. Spearman's rank correlation test was utilized to identify the variables. Differences in the distributions of the variables were analyzed by the Chi-square test or Fisher's exact test. P < 0.05 was considered statistically significant.

AUTHOR CONTRIBUTIONS
YT and XZ designed this project and approved the manuscript. JW and ZL contributed to the data analysis and carried out the manuscript. JL, LZ, YQ, FZ, RH, HC and YT co-contributed to revising and polishing the manuscript and figures. All authors have read and approved the final submitted manuscript.  Table 3.