HPV infection related immune infiltration gene associated therapeutic strategy and clinical outcome in HNSCC

Head and neck squamous cell carcinoma (HNSCC) is the sixth most common tumor in human. Research has shown that HPV status HNSCC is a unique prognosis factor, which may due to its immune infiltration landscape. But the underlying mechanism is unclear. In this study, we used a combination of several bioinformatics tools, including WCGNA, ssGSEA, CIBERSORT, TIDE,etc., to explore significant genes both related to HPV infection status and immune cell infiltration in HNSCC patients. Combined with several bioinformatics algorithms, eight hub genes were identified, including LTB, CD19, CD3D, SKAP1, KLRB1, CCL19, TBC1D10C and ARHGAP4. In HNSCC population, the hub genes had a stable co-expression, which was related to immune cell infiltration, especially CD8+ T cells, and the infiltrative immune cells were in a dysfunctional status. Samples with high hub genes expression presented with better response to immune check point block (ICB) therapy and sensitivity to bleomycin and methotrexate. The eight hub genes we found presented with a stable co-expression in immune cell infiltration of HPV + ve HNSCC population. The co-expression of hub genes related to an immune microenvironment featuring an increase in immune cells but high degree of immune dysfunction status. Patients with high hub gene expression had a better response to ICB treatment, bleomycin and methotrexate. The co-expression of hub genes may be related to immune infiltration status in patients. The concrete molecular mechanism of hub genes function demands further exploration.


Background
Head and neck squamous cell carcinoma (HNSCC) is the sixth most common tumor in human [1], numbering eighth in the list of causes of tumor-related death [2] and accounting for over 500,000 new cases each year worldwide [3,4].
The two traditional main risk factors for HNSCC are alcohol and smoking, while the past decades witnessed an increasing population of HNSCC patients with persistent infection of human papillomavirus (HPV) [5,6], which is lately responsible for 60-80% of the oropharyngeal cancer incidence in the United States and Europe [7][8][9]. Patients who develop HPV + ve HNSCC generally end up with a better prognosis when compared to patients with HPV-ve HNSCC [10][11][12][13] Previous studies have demonstrated that immune system plays a significant role in the development of HNSC C [5], and there are significant differences in the composition of tumor microenvironment (TME) [14][15][16] between HPV-infected and non-HPV-infected HNSCC. TME, consisting of tumor tissue, micro vessels, cytokine and tumor-infiltrating immune cells (TICs) etc., are essential for tumor progression. Differences in the compositions of TICs, including cytotoxic T cells, helper T cells, dendritic cells (DCs), as well as related inflammatory pathways lead to different clinical tumor behavior [17]. However, immune landscape and its interaction with HPV status is still unclear.
Currently, the majority of patients are commonly provided with the standard treatment of surgery, radiotherapy, chemotherapy, or a combination of these therapies, but recurrence and resistance to following treatment will occur in about 40-60% of treated patients [16]. Therefore, the 5-year OS rate of HNSCCs hasn't changed much over the past decade [16]. Studies on the immune infiltration feature and underlying molecular mechanism of HNSCC population may provide potential molecular targets that improve therapeutic selection and better predict the therapeutic response.
With the development of bioinformatics tools, it is possible to process a huge scale of data at one time. In terms of immune system, a great quantity of algorithms has emerged, such as Estimation of STromal and Immune cells in Malignant Tumor tissue Expression (ESTIMATE), Single sample Gene Set Enrichment analysis (ssGSEA), Cibersort, Tumor Immune Dysfunction and Exclusion (TIDE),etc. Weighted gene co-expression network analysis (WGCNA) has been applied in various types of tumor, in which it is used to look for interaction between gene expression features and clinical characteristics. ESTIMATE is used to assess the extent of immune cells and stromal cells infiltration of tumor tissue by gene expression signatures.
In this study, we obtained HNSCC samples containing both HPV status and sequencing data from public available databases. Combined with the algorithms mentioned, we aimed to identify hub genes associates with immune cell infiltration and HPV status in HNSCC population. We also assessed the infiltrative immune cell type and function, willing to explore the effect of hub genes on immune microenvironment of this specific tumor subtype. In addition, we analyzed the response to immune check point treatment and drug sensitivity in samples with high hub genes expression.

Data procession of TCGA
We filtered the clinical characteristics and gene expression data of the Cancer Genome Atlas Project database (TCGA, https://cancergenome.nih.gov/) HNSCC cohort, in which we collected 102 samples with both HPV status imformation and sequencing data (30 HPV + ve, 72 HPV-ve). We used Variance Stabilizing Transformation function of R package DEseq2 to normalize data and do differential expression analysis between HPV-infection and none-HPVinfection group.

Immune microenvironment assessment
Among TICs, the two major type of cells are immune and stromal cells. Immune and stromal scores were calculated by analyzing specific gene expression signature, and combined to represent a measurement of tumor purity. In this study, we used R package ESTI MATE [18] to assess the immune infiltration of HNSCC samples.
We implemented CIBERSORT [19] (http://cibersort. stanford.edu/) to quantify the tumor infiltrating lymphocytes (TILs) in HNSCC samples. CIBERSORT is a deconvolution algorithm. By using signature matrix (referring to gene expression values), which minimally represents each cell type, as well as support vector regression, it measures the population of each type of cell from bulk tumor samples. We uploaded the standardized gene expression matrix to the website of CIBER-SORT (mentioned above), then ran the algorithm under LM22 signature and 1000 permutations. The LM22 gene signature includes over 500 genes which features high sensitivity and specification of common human immune cell types such like neutrophils, NK cells, DCs, macrophages, T cells, eosinophils, B cells, etc. Patients with P value under 0.05 were enrolled for further analysis.
We then quantified the infiltration levels of immune cell types by ssGSEA in R package gsva [11,20]. The ssGSEA uses the scoring result to individual tumor samples. In our study, we enrolled 24 immune cells of both innate immunity and adaptive immunity. We further analysised the function status of TILs by Tumor Immune Dysfunction and Exclusion (TIDE) algorithm [21].

Weighted co-expression network construction
The gene expression microarray GSE65858 was further used for WGCNA. We selected the top 25% genes (ranked by SD from high to low) to construct coexpression network. We calculated the adjacency coefficient (aij) through R package WGCNA, which represented the association extent between each two genes as follows: Supposing that X i and X j refer to expression value vectors for gene i and j, s ij refer to the Pearson's correlation coefficient of X i and X j , and is exponentially transformed into a ij , which measured the network association extent between gene i and j. The soft-thresholding borderline was set by soft threshold function of R package WGCNA to construct a scale-free network. The power of β = 4 (scale-free R 2 = 0.95) was set as the parameter of soft-thresholding. We gathered genes with high absolute Fig. 1 Construction of co-expression modules and identification of clinically significant modules. a) Analysis of the scale-free fit index and mean connectivity for different soft thresholding powers, and four was the most fit power value. b) The cluster dendrogram of top 25% genes ranked by SD from large to small in GSE65858. Each branch represents one gene, and each color represents one co-expression module. c) Heatmap of the correlation between module eigengenes (MEs) and the clinical features of HNSCC samples. d) The turquoise module was selected as the most significant module for further analysis value of correlations in one module in the network. We calculated the topological overlap measure (TOM) by the following equation: Modules are identified through hierarchical clustering of the weighting coefficient matrix. TOM refer to the overlap in neighboring genes of i and j.

HPV/immune-associated key modules and gene signature identification
We analyzed the relation between modules and their clinical or genetic features in order to identify the HPV/ immune-associated key modules. We first assessed the relation between module eigengenes (MEs) and certain feature. ME is the first principal component of each module, of which the expression represents all genes in the module. Then, we calculated the gene significance (GS) respectively for each gene (GS = lgP) in the linear regression between sample features and gene expression. Then we calculated the module significance (MS), Fig. 2 We downloaded the single cell sequencing data of profile GSE103322 and gene signatures of several tumor-related pathways from CancerSEA. The expression of tumor-associated pathways (including apoptosis, angiogenesis, differentiation, cell cycle, DNA damage, EMT, DNA repair, inflammation, invasion, quiescence and stemness pathways) in tumor cell was analyzed by through Gene Set Enrichment Analysis(GSEA). The red color referred to the pathway with high expression in the cell, while blue color referred to a low expression. The expression of turquoise module was related to some critical cell signaling pathways referring to the average GS of all genes in a module. We selected the module with highest absolute MS value to be the most relevant module of selected sample features.
We downloaded the single cell sequencing data of profile GSE103322 and gene signatures of tumor-related pathways from CancerSEA (http://biocc.hrbmu.edu.cn/ CancerSEA/). GSE103322 included 2105 tumor cells. The expression of tumor-associated pathways (including apoptosis, angiogenesis, differentiation, cell cycle, DNA damage, EMT (epithelial to mesenchymal), DNA repair, inflammation, invasion, quiescence and stemness pathways) in tumor cell was analyzed by through Gene Set Enrichment Analysis (GSEA). We applied k-NN (k = 15) algorithm from R package "FNN" to calculate the nearest neighbor of each cell at gene expression level and generate the nearest neighbor graph. The neighbors of each cell were taken as input for the visualization. Fruchterman Reingold algorithm was applied to calculate the force-directed layout on the k-NN graph via Gephi Toolkit. The red color referred to the pathway with high expression in the cell, while blue color referred to a low expression.
Hub genes correlated with HPV-infection and immune infiltration microenvironment were expected to be candidate genes. We identified the hub genes by two steps. Firstly, in GSE65858, the module membership (MM) of each gene in key module was calculated. The genes with MM over 0.8 was considered to be associated with the module features, as is described above. Secondly, we used R package limma for GSE40774 and DEseq2 for TCGA HNSCC to do differential expression analysis respectively. The cut off value was set as log2FC > |1|, and adjusted P-value < 0.01. We picked the intersection between the results from GSE65858, GSE40774 and TGCA database, among which we identified top eight hub genes. Differential expression analysis of hub genes was performed between tumor and normal samples, and also among different grade of tumor. Fig. 3 a) ROC showed that high expression of hub genes were closely related to HPV infection status. b) Hub genes were highly expressed in normal samplescompared with HNSCC samples (grey represents no statistical significance). c) Hub genes were highly expressed in high grade HNSCC

Prediction of therapy response
The expression degree of hub genes in GSE65858, GSE40774 and TCGA HNSCC were selected for further analysis. We used GEPIA2 (http://gepia2.cancer-pku.cn/) to anlyze survival outcome of hub genes; P under 0.05 was considered to be statistically significant.
Immune checkpoint blockade (ICB) is a novel tumor therapy, but the effective population is still unsure. To measure ICB response, we used the TIDE algorithm and subclass mapping. Two immune blockade drugs, CTLA4 and PD-1, were chosen. Besides, in order to explore whether hub genes could add information to exsisting criteria in therapy chosen, we applied a multivariate analysis. We firstly divided the patients into 6 subgroups(p16 positive/negative, CD274 score high/low, immune cell score high/low). Then we divided each subgroup into two groups based on the expression of these hub genes. Finally, we applied subclass mapping algorithm and pRRophetic algorithm for each subgroup in three independent datasets to test the independency and robustness of the hub gene marker.
We predicted the chemotherapeutic response of our samples on the pharmacogenomics database [the Genomics of Drug Sensitivity in Cancer (GDSC), https:// www.cancerrxgene.org/]. Four widely used chemotherapeutic agents for HNSCC, bleomycin, docetaxel, methotrexate, cisplatin, were selected. We used R package "pRRophetic" for prediction. The half-maximal inhibitory concentration (IC50) of the samples, as the parameter, was calculated by ridge regression. Rediction accuracy was assessed by 10-fold cross-validation, basing on the GDSC training set. Subgroup analysis was performed to test the independency and robustness of the hub gene marker in chemotherapy. The subgroups chosen was immune cell score high/low, HPV + ve/−ve and CD274 score high/low.

Weighted co-expression network construction and key modules identification
After processing the raw data, we performed WCGNA in order to screen out the genes related with both HPV infection status and immune cell infiltration. Initially, we clustered the samples GSE65858 by average linkage method, and no outsider sample was found (Fig. 1b).
Then we selected the soft-thresholding power to develop a scale-free network. Figure 1a showed that through thresholding powers from 1 to 20, we measured the network topology, and selected the mean connectivity and balanced scale. Then, the power of β =4 (scalefree R2 = 0.95) was selected. We finally identified 17 modules in total (Fig. 1c) and we removed genes in grey which did not belong to any modules. Therefore, we selected the turquoise module which had the most significant relation with Immune score and HPV infection status for further analysis (Fig. 1d).
In order to further validate the function of genes in turquoise module, we downloded single cell sequencing data profile GSE103322, including 2105 HNSCC tumor cells, and evaluated the performance of tumor-related pathways (including apoptosis, angiogenesis, differentiation, cell cycle, DNA damage, EMT, DNA repair, inflammation, invasion, quiescence and stemness pathways) through Gene Set Variation Analysis (GSVA). The overall view of all samples was shown in Fig. 2, including the expression of turquoise module and the expression of different tumor-related pathways respectively. It was shown that in samples with high turquoise module expression, the quiescence pathway expressed highly as well. The angiogenesis, differentiation, EMT, inflammation and invasion pathway were also significantly expressed in samples with high turquoise module expression.

Hub genes identification
To further identify the candidate genes in turquoise module, we defined MM over 0.8 and GS of immune infiltration over 0.5 as the cutoff threshold. We then identified 127 candidate genes in total.
To screen out the genes significantly related to HPV infection, we did differential expression analysis in two independent cohorts (TCGA HNSCC, GSE40774) between HPV + ve and HPV-ve samples (log2FC > |1|and adjusted P-value < 0.01). Then, we selected the intersection of genes in turquoise module and DEGs. In total we got 8 genes that was significantly associated with immune infiltration in HPV + ve HNSCC population, including LTB, CD19, CD3D, SKAP1, KLRB1, CCL19, TBC1D10C, ARHGAP4. The receiver operating characteristic curve showed that these hub genes were closely related to HPV status. (Figure 3a) hub genes were found to be highly expressed in normal samples compared with tumor samples (Fig. 3b). Besides, these gene signature were significantly highly expressed in high grade HNSCC (Fig. 3c).
The overall survival of high and low hub genes expression is shown in Fig. 4b. Since the expression of these genes could be different with a certain HPV status, we further explored independancy of the hub gene in prognosis. In the HPV negative cohort, these hub genes are significant Fig. 5 In the HPV negative cohort, eight hub genes had significant correlation with prognosis in two independent cohorts (p <0.05). Hub gene could be independently related to a better prognosis correlation with prognosis in two independent cohorts (p < 0.05) (Fig. 5). To sum up, the high expression of hub genes could be related to the better prognosis (Fig. 4c).

Hub genes were highly associated with tumor immune microenvironment
In order to further explore the infiltration landscape immune cells in HNSCC population with hub genes expression and HPV infection, we used the Gene Set Variation Analysis (GSVA) and CIBERSORT algorithm. To comprehensively assess the expression of the eight hub genes, we used GSVA to create a gene signature of them, and then used CIBERSORT to measure the infiltration of TILs. The result was shown in Fig. 6. In HNSCC population with high hub genes expression, the mostly infiltrative immune cells were CD8+ T cell, Treg, macrophage M1 and naïve B cell (Fig. 6a-f). And the most common immune cell types in HPV infected HNSCC population are CD8+ T cell, follicular helper T cell, Treg and γδT cell. In HPV-ve samples, the following types of immune cell are highly infiltrated: activated mast cell, resting dendritic cell, activated dendritic cell, Fig. 6 a-f) The infiltrative immune cell type of HNSCC samples with HPV infected (upper row) and high hub gene expression (down row) shared a similar result, especially CD8+ T cell. a) In GSE65858, correlation between TILs and HPV status. b) In GSE65858, correlation between TILs and hub gene expression. c) In GSE40774, correlation between TILs and HPV status. d) In GSE40774, correlation between TILs and hub gene expression. e) In TCGA HNSCC, correlation between TILs and HPV status. f) In TCGA HNSCC, correlation between TILs and hub gene expression. g-i) Heatmap of ssGSEA algorithm showed that hub genes are in positive correlation with both HPV infection status and hub genes in three data sets resting CD4+ T memory cell (Fig. 6a-f). There was a great similarity of immune cell infiltration type in HPV + ve and high hub genes expression cohorts in form of cell infiltration, especially CD8+ T cell, indicating that hub genes expressionwere closely related to immune cell infiltration of HNSCC population.
To further validate the result, we used ssGSEA to calculate the relation among hub gene expression, infection status and infiltration of different immune cells in HNSCC patients. The ssGSEA outcome showed a high positive association between the three aspects (Fig. 6g, h,  i), and the most relevant cells were T cells.
To sum up, hub genes expression may indicate a positive relation to immune cell infiltration, such as CD8+ T cell, follicular helper T cell, Treg and γδT cell, especially CD8+ T cell. The outcomes are similar and stable among different sample sets and methods.
We then performed TIDE algorithm to further explore the function of the TILs and the result was shown in Fig. 7a. We found that the expression of hub genes had a positive association with immune dysfunction in tumor (P < 0.01, correlation coefficient was 0.46).

Therapeutic strategies innovation
The assessment of immune cell function lead to the result that hub genes expression may indicate the condition of HPV infection associated immune microenvironment. To explore its relation in clinical therapeutic strategies, we used TIDE21 and subclass mapping [22] algorithm to analyze the ICB treatment response of population with high hub genes expression. In Fig. 7b, c, d, the relation between hub genes expression and PD1 targeting therapy response showed a significant trend with increasing expression of hub genes. (P < 0.05) This relation was consistent across all three independent data sets. We also applied a multivariate analysis of hub genes in order to investigate whether they could add Fig. 7 a) Correlation analysis by TIDE algorithm showed a positive association with T cell dysfunction in tumor and hub gene expression. The TIDE result also showed that the expression of hub genes was significantly associated with the activation of CD274 (PD-L1) b, c, d) PD1-R referred to patients with good response to PD1/PD-L1 therapy, while PD1-noR referred to patients with limited response to this therapy. The longitudinal axis showed patients with high and low hub gene expression. In each figure, the two lines at the bottom were the adjusted results of the upper two lines. The subclass mapping algorithm showed that patients with high hub gene expression had a similarity in genome with patients in PD1-R groups (p <0.01), indicating that patients with high hub gene expression may had better response to PD-1/PD-L1 therapy. This relation was consistent across all three data sets more imformation to the already existing criteria for therapy decision making. The result of multivariate analysis showed that these hub gene marker have a stable performance in ICB prediction regardless of criteria subgroups (Fig. 8). The result of chemotherapeutic response prediction is shown in Fig. 9. Among all the drugs, we found a positive and robust association between hub gene expression and the response of bleomycin and methotrexate. Subgroup analysis was performed, resulting in that the hub genes could predict the response of bleomycin and methotrexate independently in all subgroups. However, the response of docetaxel might be influenced by these subgroups (Fig. 10).

Discussions
In this study, we used a combination of several bioinformatics tools, including WCGNA, ssGSEA, CIBERSORT, TIDE,etc., to explore significant genes both related to HPV infection status and immune cell infiltration in HNSCC patients. A total of eight hub genes was identified, including LTB, CD19, CD3D, SKAP1, KLRB1, CCL19, TBC1D10C and ARHGAP4. By analyzing downloaded single cell sequencing data, we identified the underlying function of these hub genes. We found that in the HNSCC population, the hub genes had a stable co-expression, which was related to immune cell infiltration, especially CD8+ T cells, and the infiltrative cells were in a dysfunctional status, which had corresponded with several previous studies [23,24]. HPV + ve. Samples with high hub genes expression presented with better response to immune check point block (ICB) therapy, bleomycin and methotrexate.
In total we got 8 hub genes, which were LTB, CD19, CD3D, SKAP1, KLRB1, CCL19, TBC1D10C, ARHGAP4. Previous studies has researched on some of these genes respectively. LTB gene encodes a type II membrane protein of the TNF family lymphotoxin-β. Hsu DS [25] has reported that LTB Interacts with Methylated EGFR to mediate acquired resistance to Cetuximab in HNSCC. KLRB1 protein is expressed by NK cells and may be involved in the regulation of NK cell function. Previous studies shows that high expression of KLRB1 is closely related to the better prognosis of HNSCC patients [26]. CCL19, as the ligand of CCR7, induces several CCR7 activated pathways in metastatic HNSCC [27][28][29][30][31][32][33][34][35][36][37][38]. TBC1D10C protein is an inhibitor of both the Ras signaling pathway and calcineurin, and is reported to be related to the immune response, inflammatory response and formation of the tumor microenvironment [39]. In our research, we developed a combination of these eight genes, in which the hub genes had a stable coexpression in HNSCC population among different sample sets. The co-expression of hub genes may indicate immune microenvironment of HNSCC patients. Besides, some studies also found other underlying genetic markers. Previous studies [40,41] demonstrated the relation of NF-kappaB proteins and AP-1 super-family proteins with oral cancer. An activation and differential expression of NF-kappaB proteins in HPV infected oral cancer was found, especially p50 and p65, leading to heterodimerization of p50/p65, which may be related to improved differentiation and better prognosis. P50:p50/ NF-kappaB homodimer formation may has a cross-talk together with overexpression of AP-1 pathways in oral cancers. The relation of NF-κB proteins and AP-1 family proteins with HPV infection in tongue squamous cell carcinoma (TSCC) was mentioned by Gupta S et al. [42,43] as well. Further exploration of the association between hub genes and NF-κB proteins, AP-1 family proteins need to be applied.
We found that expression of hub genes has a positive association with T cell dysfunction in tumor. HPV + ve HNSCC population has been found to have a typical T cell infiltration in, especially CD4+ and CD8+ T cells [14][15][16], and feature a high level of immune cells infiltration but high degrees of immunosuppression [23,24,44]. Our result corresponded previous studies. The hub genes we found related to CD8+ T cell infiltration in HPV + ve HNSCC, and the infiltrative immune cells were in dysfunction status. While HPV infection status could divide HNSCC patients into two clinically, genetically, and immunologically different subtypes, current treatment for both subtypes are still similar [10]. Immune cell infiltration have been shown to be significant in the progression of HPV + ve HNSCC and in the favorable prognosis of HPV + ve HNSCC. Thus, ICB therapy could be a more promising treatment. HPV + ve HNSCC was reported to express higher PD-L1 proteins compared with HPV-ve HNSCC [45]. Based on this situation, we further assessed PD1/PD-L1 targeting therapy response in HPV + ve HNSCC patients. The TIDE result showed that hub genes expression was significantly associated with the activation of CD274(PD-L1), and the subclass mapping result was similar. This suggested that HPV + ve HNSCC population may be more suitable to ICB treatment. We also predicted the chemotherapy response of patients with high hub gene expression. Among all common used drugs, we found a positive association between HPV infection and response to bleomycin and methotrexate, which may be related to the better prognosis of HPV + VE HNSCC. To summarize, the eight hub genes we found stably co-express in immune cell infiltration of HPV + ve HNSCC population. The co-expression of hub genes related to an immune microenvironment featuring an increase in immune cells, especially CD8+ T cells, but high degree of immune dysfunction status. Patients with high hub gene expression had a better response to ICB treatment, bleomycin and methotrexate. The concrete mechanism of hub genes function demands further exploration.
This study has several strengths. We applied a combination of several bioinformatic tools to explore the underlying genetic mechanism of the better prognosis in HPV + ve HNSCC. Besides, we found the relation of hub genes with therapeutic response, which may help with immunotherapy and chemotherapy chosen. The weaknesses are as follow: Firstly, different prognosis may exist among HPV + ve samples with different hub genes expression. However, due to the incidence of HPV negative HNSCC is higher, we didn't find an appropriate cohort containing enough HPV positive samples with survival information to draw a reliable conclusion. Secondly, hub genes expression has positive association with T cell dysfunction leading to treatment failure and tumor progression, but in contrast HPV + ve HNSCCs show better prognosis following standard treatment. The correlation between T cell dysfunction and the better prognosis of HPV + ve HNSCC toward standard treatment should be further studied. Finally, the relation between hub genes expression and response of docetaxel was still unclear according to our study. Since docetaxel has been part of one-line therapy for HNSCC, further research is needed to explore the possibility of hub genes expression in predicting the response to docetaxel.

Conclusions
The eight hub genes we found presented with a stable co-expression in immune cell infiltration of HPV + ve HNSCC population. The co-expression of hub genes related to an immune microenvironment featuring an increase in immune cells but high degree of immune dysfunction status. Patients with high hub gene expression had a better response to ICB treatment, bleomycin and methotrexate. The co-expression of hub genes may be related to immune infiltration status in patients. The concrete molecular mechanism of hub genes function demands further exploration.