DNA Methylome Distinguishes Head and Neck Cancer from Potentially Malignant Oral Lesions and Healthy Oral Mucosa

New, reliable biomarkers of head and neck squamous cell carcinoma (HNSCC) are urgently needed because of the poor prognosis and high mortality of this disease. The aim of this study was to identify potential biomarkers in the DNA methylome that distinguish HNSCC and potentially premalignant oral lesions from healthy oral mucosa. DNA methylation was profiled with the Infinium MethylationEPIC BeadChip in 32 oral samples: nine healthy oral mucosa samples, 13 HNSCC samples, and 10 oral lesions. Compared with healthy oral samples, HNSCC samples showed a panel of genes that were significantly hypermethylated in their promoters or at specific CpG sites. These were mainly oncogenes, receptor genes, and transcription factor genes, or genes involved in the cell cycle, transformation, apoptosis, and autophagy. Genes hypomethylated in HNSCC relative to healthy oral mucosa were mainly involved in the host immune response and transcriptional regulation. Gene methylation also differed significantly between HNSCC and potentially premalignant oral lesions, and a further set of differentially methylated genes discriminated oral lesions from healthy mucosa. These methylation panels point to novel potential biomarkers for the early diagnosis of HNSCC and of potentially premalignant oral lesions.


External database validation
Our methylation findings were compared with TCGA Illumina HiSeq RNA-seq data from the TCGA-HNSC project using Wanderer (http://maplab.imppc.org/wanderer/), an interactive viewer for exploring DNA methylation and gene expression data in human cancer. RNA-seq expression estimates for the top 15 hypo- and hypermethylated gene promoters in our results were visualized in Wanderer. The TCGA dataset included 497 tumor and 43 normal tissue samples. Where the direction of the expression change did not correspond to our methylation change, we also visualized the complementary TCGA Illumina 450K DNA methylation array results for the same genes. The summary table (Table S2) and box plots for individual genes are provided below. Of the top 15 hypermethylated genes in our study, 10 were either under-expressed or hypermethylated in TCGA cancer cases, as expected, while one had no measurable expression and was covered by only a single CpG site on the Illumina 450K DNA methylation array. Of the top 15 hypomethylated genes in our study, 12 were either over-expressed or hypomethylated in TCGA data; the remaining three lacked annotated data or probes on the Illumina 450K DNA methylation array (Díez-Villanueva, Anna, Izaskun Mallona, and Miguel A. Peinado. "Wanderer, an Interactive Viewer to Explore DNA Methylation and Gene Expression Data in Human Cancer." Epigenetics & Chromatin 8, no. 1 (June 23, 2015): 22. https://doi.org/10.1186/s13072-015-0014-8).
Table S2. Comparison of our methylation data with previous TCGA expression data for the same genes. Where the expression change was discrepant, TCGA methylation data were assessed. Bold font indicates expected results or confirmed genes; italic font indicates discrepancies.
Figure S3. Public TCGA expression data for the top 15 genes found to be hypermethylated in our study. Graphs were generated interactively on the Wanderer web server from TCGA-HNSC Illumina HiSeq RNA-seq data.
Figure S4. Public TCGA methylation data for genes found to be hypermethylated in our study but without decreased expression in TCGA RNA-seq data. Graphs were generated interactively on the Wanderer web server from TCGA-HNSC Illumina 450K DNA methylation array data.
Figure S5. Public TCGA expression data for the top 15 genes found to be hypomethylated in our study. Graphs were generated interactively on the Wanderer web server from TCGA-HNSC Illumina HiSeq RNA-seq data.
Figure S6. Public TCGA methylation data for genes found to be hypomethylated in our study but without increased expression in TCGA RNA-seq data. Graphs were generated interactively on the Wanderer web server from TCGA-HNSC Illumina 450K DNA methylation array data.
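The decision rule used for the TCGA comparison can be sketched as follows. TCGA expression is checked first, and TCGA 450K methylation is consulted only when the expression direction disagrees or is unavailable. This is a minimal illustration under stated assumptions, not the procedure actually run, because Wanderer was used interactively. The `GeneCheck` record, the `classify` helper, the log2 fold-change and beta-difference fields, and the placeholder genes are all hypothetical. The sketch also compares signs only and applies no significance threshold.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class GeneCheck:
    """One gene from a top-15 list, with hypothetical TCGA summary values."""
    gene: str
    our_direction: str                  # "hyper" or "hypo" in HNSCC vs. healthy mucosa
    tcga_expr_log2fc: Optional[float]   # tumor vs. normal expression change; None if not measurable
    tcga_meth_delta: Optional[float]    # tumor minus normal 450K beta; None if no usable probe


def classify(check: GeneCheck) -> str:
    # Hypermethylation is expected to go with lower expression, hypomethylation with higher.
    expected_expr_sign = -1 if check.our_direction == "hyper" else 1
    if check.tcga_expr_log2fc is not None and check.tcga_expr_log2fc * expected_expr_sign > 0:
        return "confirmed (expression)"
    # Expression is discrepant or missing: fall back to TCGA 450K methylation,
    # whose change should share the sign of our methylation change.
    if check.tcga_meth_delta is not None:
        if check.tcga_meth_delta * -expected_expr_sign > 0:
            return "confirmed (methylation)"
        return "discrepant"
    return "no data" if check.tcga_expr_log2fc is None else "discrepant"


if __name__ == "__main__":
    # Hypothetical placeholder genes and values, not taken from Table S2.
    checks = [
        GeneCheck("GENE_A", "hyper", -1.2, None),
        GeneCheck("GENE_B", "hyper", 0.8, 0.15),
        GeneCheck("GENE_C", "hypo", -0.5, 0.1),
        GeneCheck("GENE_D", "hyper", None, None),
    ]
    for c in checks:
        print(c.gene, classify(c))
    print(Counter(classify(c) for c in checks))
```

Tallying the labels per list gives the kind of counts reported above, for example how many of 15 genes were confirmed by either expression or methylation.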

Gene set enrichment analysis
The lists of differentially methylated CpG sites and promoter regions were combined and assessed to determine whether the affected genes are enriched for specific functions or pathways. Only sites and regions with an assigned RefGene name, indicating a nearby or overlapping gene, were included in this analysis. The analysis was performed with the WebGestalt functional enrichment analysis web tool (Liao, Yuxing, Jing Wang, Eric J. Jaehnig, Zhiao Shi, and Bing Zhang. "WebGestalt 2019: Gene Set Analysis Toolkit with Revamped UIs and APIs." Nucleic Acids Research. Accessed 28 May 2019. https://doi.org/10.1093/nar/gkz401).
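The filtering step can be sketched as below: combine CpG-level and promoter-level hits, drop those without a RefGene name, and collapse the rest to a unique gene list. This sketch assumes the annotation arrives as a semicolon-separated string, as in the `UCSC_RefGene_Name` column of Illumina methylation array manifests, where a symbol is often repeated once per transcript. The function names and the example feature IDs are hypothetical.

```python
from __future__ import annotations

from typing import Iterable, Optional, Tuple


def refgene_symbols(field: Optional[str]) -> set[str]:
    """Split a semicolon-separated RefGene annotation into unique gene symbols."""
    if not field:
        return set()  # no RefGene name assigned: excluded from enrichment
    return {name.strip() for name in field.split(";") if name.strip()}


def genes_for_enrichment(
    cpg_hits: Iterable[Tuple[str, Optional[str]]],
    promoter_hits: Iterable[Tuple[str, Optional[str]]],
) -> list[str]:
    """Combine CpG-level and promoter-level hits and keep only annotated genes."""
    genes: set[str] = set()
    for hits in (cpg_hits, promoter_hits):
        for _feature_id, refgene in hits:
            genes |= refgene_symbols(refgene)
    return sorted(genes)


if __name__ == "__main__":
    # Hypothetical feature IDs and annotations.
    cpg = [("cg_hypothetical_1", "GENE_A;GENE_A"), ("cg_hypothetical_2", ""),
           ("cg_hypothetical_3", "GENE_B;MIR_X")]
    prom = [("promoter_hypothetical_1", "GENE_B"), ("promoter_hypothetical_2", None)]
    print(genes_for_enrichment(cpg, prom))
```

Deduplicating at the gene level matters because ORA counts genes, not probes. A gene hit by several CpG sites should therefore enter the query list only once.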
Methylation data were explored separately for the hypo- and hypermethylated genes of three comparisons: HNSCC vs. healthy tissue, HNSCC vs. oral lesions, and oral lesions vs. healthy tissue. Two analysis approaches were used: Over-Representation Analysis (ORA) and Gene Set Enrichment Analysis (GSEA). For both ORA and GSEA, the Gene Ontology (GO) biological process (non-redundant) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway databases were selected.
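ORA asks whether a category contains more of the query genes than expected by chance, given a reference gene set. The usual formulation is a one-sided hypergeometric test followed by Benjamini-Hochberg FDR correction, and to our understanding this is also WebGestalt's default. The sketch below implements that formulation with the Python standard library only. It is an illustration, not the WebGestalt code: the function names, category names, and toy genes are hypothetical, and the real analysis used the GO and KEGG databases.

```python
from __future__ import annotations

from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N reference genes, K in category, n query genes)."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg FDR-adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def ora(query: set[str], categories: dict[str, set[str]], universe: set[str]):
    """Return (category, overlap, p, FDR) tuples sorted by p-value."""
    query = query & universe  # genes outside the reference set cannot be counted
    N, n = len(universe), len(query)
    rows = []
    for name, members in categories.items():
        members = members & universe
        overlap = len(query & members)
        rows.append((name, overlap, hypergeom_sf(overlap, N, len(members), n)))
    fdr = benjamini_hochberg([p for _, _, p in rows])
    results = [(name, ov, p, q) for (name, ov, p), q in zip(rows, fdr)]
    return sorted(results, key=lambda r: r[2])


if __name__ == "__main__":
    universe = {f"G{i}" for i in range(1, 11)}  # hypothetical 10-gene reference set
    categories = {"cat_A": {"G1", "G2", "G3", "G4"}, "cat_B": {"G8", "G9"}}
    for row in ora({"G1", "G2", "G3"}, categories, universe):
        print(row)
```

In this toy run, cat_A captures all three query genes, so P = C(4,3)/C(10,3) = 4/120 ≈ 0.033. The choice of `universe` directly changes N. This is why the reference set, which was the whole genome in this study, has to be stated.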
For ORA, the reference gene set was the whole genome, since many differentially methylated regions were related to miRNAs and other non-coding sequences. For GSEA, gene promoters were ranked by the mean methylation difference, and this value was supplied together with the gene symbol. Unless indicated otherwise, default parameters were used. The results are presented below. The ORA (GO biological processes) for consistently hypomethylated gene promoters and/or CpG sites in A) HNSCC compared with healthy control tissue, B) HNSCC compared with potentially premalignant oral lesions, and C) potentially premalignant oral lesions compared with healthy control tissue is shown in Figure 4 of the main publication, which displays the top 10 categories and their False Discovery Rate (FDR)-adjusted significance (colored bar).
A: HNSCC tissue vs. healthy oral tissue.
C: Oral lesions vs. healthy oral tissue.
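For GSEA, the input is a ranked list of gene symbols with a score, here the mean methylation difference between groups. The sketch below performs three steps:

1. It computes that score from per-sample beta values.
2. It writes the ranked list as tab-separated gene/score text, which is assumed to match the common .rnk convention.
3. It computes the weighted running-sum enrichment score at the core of GSEA, using exponent 1, the conventional default.

The function names, group inputs, and genes are hypothetical. The permutation-based significance testing that GSEA also performs is omitted.

```python
from __future__ import annotations

from statistics import mean


def rank_promoters(betas: dict[str, tuple[list[float], list[float]]]) -> list[tuple[str, float]]:
    """Score each gene by mean beta (case) minus mean beta (control), highest first."""
    scores = {gene: mean(case) - mean(control) for gene, (case, control) in betas.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def to_rnk(ranked: list[tuple[str, float]]) -> str:
    """Two-column, tab-separated gene/score text without a header."""
    return "".join(f"{gene}\t{score:.4f}\n" for gene, score in ranked)


def enrichment_score(ranked: list[tuple[str, float]], gene_set: set[str], weight: float = 1.0) -> float:
    """GSEA running-sum statistic: hits step up by |score|^weight, misses step down evenly."""
    hits = [gene in gene_set for gene, _ in ranked]
    n_miss = len(ranked) - sum(hits)
    norm = sum(abs(score) ** weight for (_, score), hit in zip(ranked, hits) if hit)
    if norm == 0 or n_miss == 0:
        return 0.0  # degenerate gene set: no meaningful walk
    running = best = 0.0
    for (_, score), hit in zip(ranked, hits):
        running += abs(score) ** weight / norm if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running  # ES is the maximum deviation from zero
    return best


if __name__ == "__main__":
    # Hypothetical beta values: (case samples, control samples) per gene promoter.
    betas = {
        "GENE_A": ([0.8, 0.7], [0.2, 0.3]),
        "GENE_B": ([0.1, 0.2], [0.5, 0.6]),
        "GENE_C": ([0.4, 0.4], [0.4, 0.4]),
    }
    ranked = rank_promoters(betas)
    print(to_rnk(ranked), end="")
    print(enrichment_score(ranked, {"GENE_A"}))
```

Ranking by signed mean difference places hypermethylated promoters at the top of the list and hypomethylated promoters at the bottom. A positive enrichment score therefore points to a category concentrated among hypermethylated genes, and a negative score to one concentrated among hypomethylated genes.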