Integrative Analysis of DNA Methylation and Gene Expression in Skin Cutaneous Melanoma by Bioinformatic Approaches

Skin cutaneous melanoma is the most life-threatening skin cancer. Finding key methylation genes of prognostic value is an under-explored but intriguing eld in the research of skin cutaneous melanoma. This work is aimed to identify survival related methylated genes and their specic methylation sites in skin cutaneous melanoma via an integrative analysis with bioinformatic approaches. The original data, including gene methylation and expression les, were downloaded from the Cancer Genome Atlas database. Statistical analysis revealed that skin cutaneous melanoma patients with highly expressed and hypomethylated HHEX had a better outcome than patients with lowly-expressed and hypermethylated HHEX. In addition, fteen methylation sites of HHEX were identied to be signicantly correlated with HHEX expression changes. In various pathological stages, the expression levels of HHEX were different, and exhibited a downward trend from stage (cid:0) to stage (cid:0). Therefore, we speculate that the driven gene HHEX may play an important role in the survival of skin cutaneous melanoma. This nding provides novel epigenetic molecular clues and potential detection targets for early prediction of the prognosis of skin cutaneous melanoma.


Introduction
Derived from uncontrolled overgrowth of abnormal melanocytes, skin cutaneous melanoma (SKCM) assumes the primary responsibility for skin cancerrelated deaths due to its potent metastatic power 1,2 . Thickness of primary tumor, presence of ulceration, lymph node diffusion, mitoses, and distant metastases are features of middle or late stage of SKCM. And these indices are applied to determine the prognosis of the disease 3 . Numerous researches were carried out to nd novel biomarkers of prognostic value, which included circulating melanoma cells, exosomes, abnormally expressed proteins, mutated genes and non-coding RNAs [4][5][6][7] . However, these biomarkers were not speci c or early enough, and had limited use in clinical practice. Identifying new molecular that can be used for early and accurate prediction of the outcomes of the SKCM patients remains a goal in current days.
DNA methylation changes at the CpG site is the most widespread and stable epigenetic modi cation in malignant tumors 8 . Abnormal DNA methylation is regarded as a vital mechanism to promote the genesis and progression of tumors 9, 10 , while the phenomenon of DNA hypomethylation is less common compared with DNA hypermethylation in tumorigenesis 11 . Cumulative ndings indicated that melanoma development was prompted by alterations in DNA methylation, which has a strong in uence on the expression of numerous melanoma-associated genes by silencing tumor suppressor genes. DNA methylation changes also revealed a prognostic utility in patients with localized or metastatic melanoma [12][13][14][15][16] . For instance, RASSF1A (RAS association domain family 1 isoform A) is susceptible to methylation and its expression is often down-regulated in primary uveal melanoma 17,18 . SKCM is known to be virulent, even relatively small SKCM has the potential to metastasize, resulting in extremely high mortality 19,20 . Thus identifying relations of the DNA methylation and gene expressions in SKCM may provide new clues for the study of the prognosis of the disease.
The Cancer Genome Atlas (TCGA) database catalogues genomic pro les of more than 30 kinds of human tumors 21 . It is a database that is publicly available, free of patient consent or ethics committee approval. In this work, through a series of bioinformatic approaches, we downloaded data from TCGA and made an integrative analysis of DNA methylation and gene expression in SKCM, with an aim to nding new prognostic methylation genes and their methylation sites, and identifying more meaningful biomarkers for the evaluation of the prognosis of SKCM.

Data source and preprocessing
All the raw data were collected from the TCGA data portal website (https://portal.gdc.cancer.gov/) 22

Screening for differentially methylated genes
Original methylation matrix was normalized and processed with R package software. Differentially methylated genes were screened with the R -limma package. T-test and Benjamini -Hochberg's method were used to for statistical analysis of the genes. The differentially methylated genes were screened out by the following selection criteria: 1) | log2 (fold-change) | >1, and 2) the P-values < 0.05. The heatmap for the differentially methylated genes were created with R -heatmap package.
Correlation analysis of methylation levels of differentially hypermethylated genes and the corresponding mRNA expressions To evaluate the relationship between methylation levels of differentially hypermethylated genes and the corresponding mRNA expressions. Differently methylated genes with log2 (fold-change) > 0 were de ned as up-regulated methylation genes, namely hypermethylated genes. The correlation between the methylation levels of up-regulated methylation genes and the corresponding mRNA expressions was analyzed with R software. Absolute value of correlation coe cient (| cor |) > = 0.3 and P-values < 0.05 were considered to be signi cantly relatively high correlation. The correlation test was carried out with Pearsontest.
Correlation analysis of methylation sites of driven genes and the corresponding mRNA expressions To further assess the correlation between methylation levels of driven genes' methylated sites and the corresponding mRNA expressions, we analyzed the correlation between the methylation levels of speci c methylated sites of driven genes and the corresponding mRNA expressions by R packages. (| cor |) > = 0.3 and P-values < 0.05 were identi ed as a signi cantly high correlation, and Pearson-test was used to calculate the correlation test.

Joint survival analysis for driven genes
To explore the prognostic value of candidate genes, joint survival analysis for the methylation levels of those genes and genes expressions were conducted. Kaplan -Meier survival curves were used to illuminate and compare the prognostic information of a certain gene that is hypermethylated with low expression and hypomethylated with high expression. The P-value cutoff was 0.05. Exploratory analysis for corresponding mRNA of survival -methylated genes Expression analysis: GEPIA (http://gepia.cancer-pku.cn/index.html) is an expression analysis tool based on TCGA and GTEX data that quickly delivers critical interactive and customizable functionalities 23 . An expression analysis in diverse pathological stages of survival -methylated genes with the "Single Gene Analysis" and "Stag Plots" module of GEPIA was made using the "SKCM" dataset. Student's T-test was used for obtaining a P-value for analysis, and the Pvalues < 0.05 were still the criteria. Enrichment analysis: In brief, the STRING (https://string-db.org/) was visited with a single protein name ("HHEX") and organism ("Homo sapiens") according to the following parameters: minimum required interaction score ["Low con dence (0.150)"], meaning of network edges ("evidence"), max number of interactors to show ("no more than 50 interactors") and active interaction sources ("experiments"). Finally, 50 HHEX -binding proteins were obtained. Besides, the "Similar Gene Detection" module of GEPIA2 was used to get the top 100 targeting genes correlated to survival -methylated genes based on the datasets of SKCM tissues. Gene Ontology (GO) and pathway enrichment analysis was conducted for the two sets of data via R packages. Adjusted P-values < 0.05 was of statistical signi cance.

Study design
The work ow chart of our study design is shown in Fig. 1. Our initial goal is to make an integrative analysis of DNA methylation and gene expression in SKCM to identify pivotal methylation genes with prognostic value. Above all, data preprocessing was performed based on DNA methylation and mRNA expression pro les of SKCM downloaded from TCGA public database, so as to obtain normalized matrices of gene methylation and expression for subsequent analysis.
Genes with differential hypermethylation between SKCM samples and normal samples were extracted with the limma package of R software. Correlation analysis between the methylation levels of those hypermethylated genes and the corresponding mRNA expression levels were conducted to screen out driven genes, which were for further correlation analysis between methylation levels in their sites and genes expression levels. Then the combined survival analysis was carried out on driven genes. And a series of additional analyses based on prognostic genes were performed.
Identi cation of the differentially methylated genes SKCM patients and normal controls were analyzed, |Log 2 FC| greater than 1 and P-values less than 0.05 were considered as a criteria to screen out the differentially methylated genes. A total of 83 signi cantly differentially methylated genes were obtained, among which 21 were down-regulated and 62 were up-regulated (supplementary material 1). The heatmap of these genes were displayed in Fig. 2. 62 up-regulated methylation genes and 21 down-regulated methylation genes were identi ed in 475 samples, which could basically differentiate cancer samples from non-cancer samples.
Identi cation of differentially hypermethylated genes that were signi cantly correlated with the corresponding mRNA expressions DNA methylation is an epigenetic mechanism that is essential for regulating gene expression without changing DNA sequences 25 . DNA hypermethylation tends to inhibit the expression of downstream genes 26 . The correlation analysis for differentially hypermethylated genes and mRNA expressions were as follows: 1) Calculating the intersection of differentially hypermethylated genes and mRNA expressions. 2) Identifying the number of genes whose expressions were related to their methylation levels. A total of 15 differentially hypermethylated genes were related to corresponding mRNA expressions, of which 6 methylated genes, namely driven genes, were negatively correlated with mRNA expressions (Fig. 3). In addition, the correlation values of driven genes were − 0.668, -0.599, -0.424, -0.415, -0.385 and − 0.333 respectively (Table 1). Identi cation of the methylation sites of driven genes that were signi cantly correlated with the corresponding mRNA expressions Correlation between methylation sites of driven genes and the corresponding mRNA expressions was analyzed. As shown in supplementary material 2, a total of 54 sites of driven genes were signi cantly correlated with mRNA expressions. To be speci c, there were 15 sites for methylated HHEX (Fig. 4), 6 sites for methylated LINC00506 (Table 2), 12 sites for methylated NUP62CL, 10 sites for both methylated TMEM255A and methylated CH25H, and 1 site for methylated C2CD4A. The results indicate that these methylation sites may play a major role in the regulation of corresponding mRNA expressions. Identi cation of the prognosis -relevant methylated genes with joint survival analysis To further identify the prognosis -relevant methylation genes and evaluate their prognostic value in SKCM, the joint survival analysis of driven genes' methylation levels and their expression levels were performed. Results showed that HHEX and LINC00506 were signi cantly related to the prognosis of the disease, with corresponding P -values of 0.02 and 0.005, respectively. As illustrated in Kaplan -Meier survival curves in Fig. 5, hypomethylated HHEX and LINC00506 with high expression showed a preferable survival rate to hypermethylated HHEX and LINC00506 with low expression. As listed in Table 3, the ve- year survival rate of high-expressed and hypomethylated HHEX (0.629) was signi cantly higher than low-expressed and hypermethylated HHEX (0.540). Highexpressed and hypomethylated LINC00506 (0.665) was also better for survival than low-expressed and hypermethylated LINC00506 (0.488). Exploratory data analysis of corresponding genes of survival -methylated genes Expression analysis: Among corresponding genes of survival -methylated genes, HHEX was singled out as a gene of interest for further analysis in various pathological stages of SKCM. The correlation between the HHEX expressions and the pathological stages of SKCM patients was assessed and there was a signi cant correlation between the expressions of HHEX (p = 3.78e-04) and pathological stages of the disease (Fig. 6). The expression of HHEX in early stage was remarkably higher than in the later stage, which might be closely related to hypermethylation of HHEX. These data suggest that hypermethylated HHEX probably plays a key role in the progression of SKCM.
Immune in ltration data analysis: Over the years, it has become increasingly appreciated that tumor -in ltrating immune cells were closely linked to the occurrence, progression, and metastasis of various cancers [27][28][29] , highlighting the signi cant role of in ltrating immune cells in tumor growth. Here, EPIC, TIMER, QUANTISEQ, XCELL, CIBERSORT, CIBERSORT-ABS and MCPCOUNTER algorithms were applied to investigate the potential correlation between the gene

expressions of HHEX and in ltration levels of different immune cells (CD8 + T-cells, CD4 + T-cells, DC, macrophages, neutrophils and NK cells) in SKCM.
There was a positive correlation between HHEX expression and the immune in ltration of macrophages, neutrophils and DC in SKCM (Fig. 7) on the ground of most algorithms. As showed in Fig. 7, the scatterplot data of SKCM was produced using two algorithms. The expression level of HHEX in SKCM is positively correlated with the in ltration level of neutrophils (cor = 0.414, P = 2.44e − 20; cor = 0.151, P = 1.23e − 03) based on the TIMER and MCPCOUNTER algorithm, respectively.
Enrichment analysis of HHEX-related partners: To further investigate the molecular mechanism and biological function of the HHEX gene in the progression and prognosis of SKCM, enrichment analysis was made for the two datasets. Based on the GEPIA2 tool, top 100 HHEX expression-correlated genes and 50 targeting HHEX-binding proteins were screened out (supplementary material 3, Fig. 8A). According to the results of the enrichment analysis from the R software ( Fig. 8B-C), as for the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, "Viral carcinogenesis", "Human T-cell leukemia virus 1 infection" and "B cell receptor signaling pathway" are signi cantly enriched in. In terms of GO for the biological process (BP), "Myeloid cell differentiation", "Lymphocyte differentiation" and "Regulation of leukocyte differentiation" are signi cantly enriched in. Among the cellular component (CC), "Transcription regulator complex" and "RNA polymerase II transcription regulator complex" are the most concentrated; As well as in the molecular function (MF), "Nuclear import signal receptor activity" and "Nucleocytoplasmic carrier activity" are noticeably enriched in (Table 4).   34,35 . Therefore, the correlation between the methylation levels of genes and the corresponding mRNA expressions is an underlying mechanism for the outcome of SKCM. In this paper, among 62 differentially hypermethylated genes, 15 were related to the corresponding mRNA expressions. And there were 6 methylated genes, namely HHEX, LINC00506, NUP62CL, TMEM255A, CH25H and C2CD4A (Fig. 3, Table 1), were identi ed as driven genes. With the Kaplan -Meier curves for prognostic analysis, highly-expressed and hypomethylated HHEX (p = 0.02) and LINC00506 (p = 0.005) showed better outcome than lowly-expressed and hypermethylated HHEX and LINC00506 (Fig. 5), which revealed that HHEX and LINC00506 were of signi cantly prognostic value in SKCM. To be more speci c, the ve-year survival rate of HHEX with high-expression and hypomethylation (0.629; 95% CI, 0.553 to 0.716) was signi cantly higher than that of HHEX with low-expression and hypermethylation (0.540; 95% CI, 0.457 to 0.639). LINC00506 with high-expression and hypomethylation (0.665; 95% CI, 0.589 to 0.752) seemed to have a better survival rate than with low-expression and hypermethylation (0.488; 95% CI, 0.406 to 0.586) ( Table 3). HHEX has been extensively demonstrated to be associated with the migration and invasion of multiple malignancies, such as breast cancer, hematological malignancy, liver cancer, prostate cancer, colorectal cancer and thyroid cancer 36, 37 . It has been proved to serve as a prognostic marker for various malignant tumors 38 . Thus it was chosen as a possible prognosis-relevant methylation gene of interest for the subsequent analysis.
HHEX (hematopoietically expressed homeobox), also named as HEX, HOX11L-PEN, and PRHX, encodes several transcription factors of the homeobox family, which are implicated in many kinds of tumor development processes, including cell proliferation, invasion, metastasis and survival 39,40 . Surprisingly, in SKCM, while extensive researches have explored its epigenetic alterations, the involvement the methylation levels and sites of HHEX in the outcome of tumor have yet to be fully elucidated. DNA methylation regulate the corresponding mRNA expression and has an impact on the molecular function of translated proteins.
Aberrant HHEX methylation may be a crucial factor to pathophysiologic aggression and worse survival of cancer, which is supported in plenty of researches 36, 37,41 . Notably, the hypermethylation of HHEX has been reported to be associated with poor overall and disease-speci c survival among patients with acral melanoma (AM) 42 . However, AM is just a subtype of cutaneous melanoma localize in the glabrous skin of the palms, soles, or nail apparatus 43 . In our study, evaluated by Kaplan-Meier survival curves, in contrast to hypermethylation, hypomethylation of HHEX displays a signi cant better outcome in cutaneous melanoma, which is consistent with previous studies on other cancers 41 . Therefore, we hypothesize that a hypermethylated driven gene of HHEX may be a potential predictor for impaired survival in patients with SKCM.
However, according to our ndings, the expression of HHEX varies in different pathological stages of SKCM (Fig. 6). It is at a relatively low level in stage zero, in which tumors are con ned to the epidermis or mucosal epithelium without invasion or distant metastasis, while in stage -, tumors have in ltrated into the dermis and deeper layers, and the expression of HHEX shows a downward trend with advanced stages. As we have known, invasive melanomas are among the most immunogenic tumors 44 , and some immunocytes are deemed to be involved in SKCM, which includes macrophages, neutrophils and dendritic cells etc. [45][46][47][48] . Here, with multiple immune deconvolution approaches, we observed a positive correlation between HHEX expressions and the immune in ltration of the above mentioned immunocytes in the SKCM (Fig. 7). And the enrichment analysis of HHEX and its related partners further displays that a group of HHEX genes are mainly enriched in a set of immune -related processes, such as differentiation regulation of myeloid cells, lymphocytes and leukocytes (Fig. 8, Table 4). A possible explanation is that activated immune response, together with increasing hypermethylation, co-regulate the expression of HHEX. And the expression of HHEX and the immune response may be interactive in the development of SKCM. Therefore, when SKCM is in situ, the immune response may not be fully activated, so the hypermethylation status of HHEX may not be interfered by the immune response. However, when SKCM develops into invasive cancer, the immune response is correspondingly activated to some extent, resulting in gradually decreasing methylation levels of HHEX in stage -. In addition, with the deterioration of the tumor and the gradual weakening of the body's immune response, the methylation level of HHEX increased and the corresponding mRNA level decreased. The result seems to be in line with the joint survival analysis of different pathologic stages of SKCM. Therefore, HHEX is reasonable enough to act as a potential candidate for prognostic biomarker for patients with aggressive SKCM.
Identifying the sites of methylation is a requisite for decoding the roles of methylation in gene expression regulation, which has been implicated in the pathological processes of tumor occurrence and development. Numerous studies have corroborated that methylation at speci c sites of some crucial genes affects the prognosis of various cancers [49][50][51] . Based on the prognostic value of HHEX in SKCM, the correlation between speci c methylation sites of HHEX and the corresponding mRNA expression were further analyzed. In our work, a total of 15 methylation sites (cg14689537, cg24787755, cg23009123, cg18599843, cg11055493, cg26979504, cg00487187, cg16508068, cg03330490, cg02185052, cg00876273, cg07616394, cg14283380, cg09721427 and cg04413153) have been identi ed (Fig. 4, Table 2). Since their methylation levels are closely related to the corresponding mRNA contents, we consider that these methylation sites may indirectly play a role in disturbing certain gene expression, then make an impact on the outcome of melanoma patients.
Epigenetic changes are potentially reversible in melanoma 52,53 , which seems to provide a target for the prevention and treatment of the tumor. Accordingly, we speculate that not only survival -methylated HHEX, but also its methylation loci may become potential targets for SKCM therapies.
Although bioinformatics can provide researchers with vast amounts of data and a variety of ways to study disease, there are still some limitations that cannot be ignored. In our study, nearly all related raw data were downloaded from TCGA. However, in the majority of cases, the normal sample size was relatively too small to the tumor sample sizes. And the clinical information on individuals was insu cient, which appeared to be an underlying restriction. Accordingly, patient's material derived from TCGA database was not absolutely representative of any real SKCM population. The results of our analysis were virtual and should be con rmed in future experiments.
Taken together, our multilayer data mining and comprehensive bioinformatics analysis revealed that the methylation of HHEX may be a novel molecular tool    The expression levels of the HHEX gene in various pathological stages of SKCM (stage 0, stage I, stage II, stage III, and stage IV) . Log2 (TPM + 1) was applied for log-scale.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download. Supplementaryinformation.docx