Data mining on identifying diagnosis and prognosis biomarkers in head and neck squamous carcinoma

Head and neck squamous carcinoma (HNSC) induces high cancer-related death worldwide. The biomarker screening on diagnosis and prognosis is of great importance. This research is aimed to explore the specific diagnostic and prognostic biomarkers for HNSC through bioinformatics analysis. The mutation and dysregulation data were acquired from UCSC Xena and TCGA databases. The top ten genes with mutation frequency in HNSC were TP53 (66%), TTN (35%), FAT1 (21%), CDKN2A (20%), MUC16 (17%), CSMD3 (16%), PIK3CA (16%), NOTCH1 (16%), SYNE1 (15%), LRP1B (14%). A total of 1,060 DEGs were identified, with 396 up-regulated and 665 downregulated in HNSC patients. Patients with lower expression of ACTN2 (P = 0.039, HR = 1.3), MYH1 (P = 0.005, HR = 1.5), MYH2 (P = 0.035, HR = 1.3), MYH7 (P = 0.053, HR = 1.3), and NEB (P = 0.0043, HR = 1.5) exhibit longer overall survival time in HNSC patients. The main DEGs were further analyzed by pan-cancer expression and immune cell infiltration analyses. MYH1, MYH2, and MYH7 were dysregulated in the cancers. Compared with HNSC, their expression levels are lower in the other types of cancers. MYH1, MYH2, and MYH7 were expected to be the specific diagnostic and prognostic molecular biomarkers of HNSC. All five DEGs have a significant positive correlation with CD4+T cells and macrophages.


Results
Mutations in HNSC patients. The mutation profile of HNSC patients from the TCGA database was obtained. Figure 1A showed that the frequency distribution of different variant types was divided into seven according to the impact on protein-coding. We found that missense mutation accounts for the majority of the variant types. In these variant types, the number of SNPs is significantly more than that of INS and DEL (Fig. 1B). Besides, C > T transversion was the primary type of single nucleotide variants (SNV) in HNSC (Fig. 1C). We show genes with a mutation frequency of top25 in Fig. 1D. The top ten genes with a mutation frequency were TP53 (66%), TTN (35%), FAT1 (21%), CDKN2A (20%), MUC16 (17%), CSMD3 (16%), PIK3CA (16%), NOTCH1 (16%), SYNE1 (15%), LRP1B (14%).
Differentially expression genes calculation. In these patients, the expression levels in two groups were compared, and a total of 1061 DEGs were identified, with 396 up-regulated and 665 downregulated ( Fig. 2A,B). Then, we retained the DEGs with more than 1% mutation frequency. After the screening, 187 common DEGs were obtained (Fig. 2C), including 66 up-regulated and 121 down-regulated.
PPI and key module analyses on DEGs. PPI analysis was performed on the common DEGs with more mutation frequency. The PPI network showed 144 nodes and 489 interaction pairs (Fig. 3). The DEGs with more showed Waterfall plot of the top 25 mutated genes in TCGA HNSCC cohort. The first part is the heatmap in the middle, where each row represents a gene and each column represents a sample, showing the distribution of different mutation types for each sample, and the second part is a stacked bar chart on the right, representing the different mutation types on each gene The frequency distribution of the loci, the third part is the stacked histogram above, which represents the frequency distribution of the loci of different mutation types in each sample. In addition, we listed the key DEGs in sub-network modules in Table 1.

Function analysis on DEGs in sub-network modules.
We further analyzed the function of DEGs by GO-BP and KEGG. We found that the DEGs in module A are mainly involved in GO-BP terms, including extracellular matrix organization, extracellular structure organization, collagen fibril organization, endodermal cell differentiation, endoderm formation, endoderm development, formation of the primary germ layer, cellular response to amino acid stimulus, collagen-activated tyrosine kinase receptor signaling pathway, and collagen-activated signaling pathway (Fig. 4A). They are also involved in KEGG pathways of protein digestion and absorption, ECM-receptor interaction, AGE-RAGE signaling pathway in diabetic complications, amoebiasis, focal adhesion, human papillomavirus infection, PI3K-Akt signaling pathway, Relaxin signaling pathway, small cell lung cancer, and platelet activation (Fig. 4B). The DEGs in module B mainly participated in muscle contraction, muscle system process, myofibril assembly, cellular component assembly involved in morphogenesis, striated muscle cell development, cardiac muscle cell development, cardiac cell development, muscle filament sliding, actin-myosin filament sliding, and cardiac muscle fiber development (Fig. 4C). In addition, KEGG pathways of hypertrophic cardiomyopathy, dilated cardiomyopathy, cardiac muscle contraction, thyroid hormone signaling pathway, adrenergic signaling in cardiomyocytes, cGMP-PKG signaling pathway, viral myocarditis, and arrhythmogenic right ventricular cardiomyopathy were enriched (Fig. 4D).
As for MYH2, MYH7, and NEB were significantly positively correlated with CD4+T cell (P < 0.001; Fig. 7C-E). They showed consistent correlation in the immune cells among the total HNSC, HNSC-HPV positive and negative samples; Some slight correlation opposition was observed in HNSC-HPV positive or negative with total HNSC, but not significant. Therefore, we concluded that no significant differences were identified in HNSC-HPV positive or negative with total HNSC in terms of immune cell infiltration of ACTN2, MYH1, MYH2, MYH7, and NEB.

Discussion
HNSC, a dreadful opponent, has become one of the world's most deadly diseases. Treatment for HNSC is costly, has a long cycle, and is not affordable. Therefore, the precise and early prediction of HNSC outcomes is critical for therapy options and prognosis improvement. Our present study focused on gene mutation and dysregulation expression in HNSC, expecting to identify essential biomarkers for HNSC.
In the present study, we explored the landscape of mutation and dysregulation in HNSC patients, showing that TP53, TTN, FAT1, CDKN2A, and MUC16 were the most predominant mutated genes. The mutation identification was consistent with Jiang et al. 14 . Among human genes, TP53 is a critical tumor suppressor gene, with low expression in normal cells and high expression in malignant tumors, regulating cell proliferation, apoptosis, angiogenesis, and DNA repair 15 . Mutated TP53 was found in various cancers and is correlated with reduced O.S. Furthermore; it showed that HNSC patients with TP53 mutations have a bleak prognosis than TP53-wildtype HNSC 16 . Our present study also found that mutated TP53 is observed in HNSC.
Then, the dysregulated genes were identified, with 187 DEGs having more mutation sites. We conducted the PPI analysis on these DEGs and identified two modules. We further analyzed these DEGs in these modules by GO-BP and KEGG analyses. The DEGs in module A are involved in the PI3K-Akt signaling pathway, a classic intracellular signaling pathway regulating cell apoptosis, proan essential pathway that regulates cell apoptosis, prognosis, and metastasis in HNSC [17][18][19] . The DEGs in module B participated in the cGMP-PKG signaling pathway. In addition, Tuttle et al. 20 demonstrated that the cGMP-PKG signaling pathway is one of the therapeutic targets of HNSC.
The DEGs of ACTN2, MYH1, MYH2, MYH7, and NEB in module B were significantly associated with O.S. in HNSC. ACTN2 is a member of the spectrin gene superfamily that includes varying groups of cytoskeletal proteins. In breast cancer patients, mutated ACTN2 was related to invasive ductal carcinoma and suggested a worse O.S. than ductal carcinoma in situ 21 . Sun et al. 22 demonstrated that ACTN2 is one of the hub genes selected by bioinformatics methods in PTEN mutation prostate cancer. Xu et al. 23 revealed that negative ACTN2 expression www.nature.com/scientificreports/ contributed to a better O.S. in HNSC patients, consistent with our present study. Therefore, ACTN2 might be an essential biomarker for predicting O.S. of HNSC, which might be employed in clinical. Through pan-cancer analysis, we found that MYH1, MYH2, and MYH7 are dysregulated in cancers, including HNSC. However, compared with HNSC, they were lower expressed in other cancers, revealing that these three genes can be considered single HNSC biomarkers. So far, different mutations in multiple members of the MYH gene family are associated with human hereditary diseases 24 . Among them, MYH2 mutations can cause a class of skeletal muscle diseases characterized by ophthalmoplegia. MYH7 mutations can cause skeletal muscle diseases, including myosin deposition myopathy, and Distal Laing myopathy is also closely related to hypertrophic cardiomyopathy. However, MYH1, MYH2, and MYH7 functions in HNSC are rarely mentioned. Further study into the interaction of the MYH gene family and HHSC appears to be an intriguing research topic.
As we all know, HPV infection is the most common causative factor for HNSC. Considering the significant influence of HPV on HNSC, some molecular research on HNSC have divided the samples into HNSC-HPV positive and negative groups. For example, Zhang et al. 25 found that Dedicator of cytokinesis 8 (DOCK8) is identified as one prognosis biomarker associated with immune infiltration HPV-positive HNSC. Our present study identified five prognosis DEGs that are significantly related to the O.S. in HNSC. To determine whether the HPV infection had significant influences on the results. We further analyzed their expression in HPV-positive and -negative HNSCs, and no significant differences were observed. Moreover, we also tested the immune cell infiltration of DEGs in both HPV-positive and -negative HNSCs. Although studies have demonstrated that HPV influences the immune infiltration situation 26 , it is not observed in ACTN2, MYH1, MYH2, MYH7, and NEB.As genes highly expressed in the tumor cells are expected to affect tumor purity positively, the association between the expression of prognosis DEGs and six immune infiltrates was evaluated. We found that all five DEGs have a significant positive correlation with CD4+T cells and macrophages. Therefore, we speculate that genetic mutations and differential expression of ACTN2, MYH1, MYH2, MYH7, and NEB genes in HNSC cells may be important drivers of CD4+T cell and macrophage infiltration.

Conclusions
In our present study, we identified the biomarker of HNSC in terms of diagnosis and prognosis. TP53 was found as the most mutated gene in HNSC. ACTN2, MYH1, MYH2, MYH7, and NEB genes were significantly associated with poor prognosis. Moreover, MYH1, MYH2, and MYH7 were expected to be the specific diagnostic and prognostic biomarkers molecular biomarkers of HNSC. Screening of high-frequency mutation genes. The maftools 28 was used to summarize the HNSC mutation information downloaded from the TCGA official website and screen the high-frequency mutation genes. The top 25 high-frequency genes with the mutation frequency were exhibited with a waterfall chart.

Methods
Differentially expressed gene (DEG) identification. The DEGs in tumor tissues were identified according to the FPKM value by limma package 29 (Version 3.10.3), using P-value < 0.05 and log 2 fold change (F.C.) > 2 as thresholds. The Pheatmap R package was employed for drawing the heatmap and volcano.
Then, the common DEGs with high-frequency mutation genes that exhibit a mutation frequency of more than 1% were retained for further study.
Protein-protein interaction (PPI) and module analyses. The PPI analysis was performed on all the common DEGs using STRING 30 (Version: 10.0, http:// www. string-db. org/) database, with a 0.4 (medium confidence) parameter. The network was constructed by Cytoscape (version: 3.2.0) 31 . The most significant clustered modules in the PPI network were analyzed using the Cytoscape plug-in MCODE 32 (Version 1.4.2, http:// apps. cytos cape. org/ apps/ MCODE) method, with a threshold of score ≥ 10.