Integrated data analysis reveals potential drivers and pathways disrupted by DNA methylation in papillary thyroid carcinomas

Papillary thyroid carcinoma (PTC) is a common endocrine neoplasm with a recent increase in incidence in many countries. Although PTC has been explored by gene expression and DNA methylation studies, the regulatory mechanisms of the methylation on the gene expression was poorly clarified. In this study, DNA methylation profile (Illumina HumanMethylation 450K) of 41 PTC paired with non-neoplastic adjacent tissues (NT) was carried out to identify and contribute to the elucidation of the role of novel genic and intergenic regions beyond those described in the promoter and CpG islands (CGI). An integrative and cross-validation analysis were performed aiming to identify molecular drivers and pathways that are PTC-related. The comparisons between PTC and NT revealed 4995 methylated probes (88% hypomethylated in PTC) and 1446 differentially expressed transcripts cross-validated by the The Cancer Genome Atlas data. The majority of these probes was found in non-promoters regions, distant from CGI and enriched by enhancers. The integrative analysis between gene expression and DNA methylation revealed 185 and 38 genes (mainly in the promoter and body regions, respectively) with negative and positive correlation, respectively. Genes showing negative correlation underlined FGF and retinoic acid signaling as critical canonical pathways disrupted by DNA methylation in PTC. BRAF mutation was detected in 68% (28 of 41) of the tumors, which presented a higher level of demethylation (95% hypomethylated probes) compared with BRAF wild-type tumors. A similar integrative analysis uncovered 40 of 254 differentially expressed genes, which are potentially regulated by DNA methylation in BRAFV600E-positive tumors. The methylation and expression pattern of six selected genes (ERBB3, FGF1, FGFR2, GABRB2, HMGA2, and RDH5) were confirmed as altered by pyrosequencing and RT-qPCR. DNA methylation loss in non-promoter, poor CGI and enhancer-enriched regions was a significant event in PTC, especially in tumors harboring BRAFV600E. In addition to the promoter region, gene body and 3’UTR methylation have also the potential to influence the gene expression levels (both, repressing and inducing). The integrative analysis revealed genes potentially regulated by DNA methylation pointing out potential drivers and biomarkers related to PTC development.


Background
Thyroid cancer is the most common tumor of the head and neck region, with the highest incidence among the endocrine neoplasias [1]. Papillary thyroid cancer (PTC) is the histological subtype with higher incidence (80% of cases) worldwide [2].
The thyroid carcinogenesis involves a constitutive activation of two major pathways associated to tyrosinekinase, including mitogen-activated protein kinase (MAPK) and phosphatidylinositol 3-kinase (PI3K) [3]. The activation of these pathways occurs mainly due to point mutations in BRAF and RAS and chromosomal rearrangements in RET [3]. MAPK signaling pathway activated by genetic alterations is frequently involved in cell proliferation, growth, and survival [4].
Approximately 60% of PTC cases are characterized by T1799A BRAF transversion nucleotide change (over 95% of the mutations), resulting in V600E mutant protein with constitutive activation of BRAF kinase [5][6][7]. BRAF mutation has been associated with unfavorable prognosis including large primary tumors, lymph node and vascular invasion, advanced stage, extrathyroidal extension, distant metastases, and recurrence [8,9]. However, there are no consensus in literature, since many studies have not found this association [10][11][12].
Recently, large-scale approaches were used to investigate the methylation profile of thyroid cancer [17][18][19][20][21][22][23]. These technologies allow the assessment of not only CpG islands (CGI) and promoter regions, as previously reported, but unveiled novel regions involved in neoplastic process, such as CGI shores/shelf, non-CGI promoter and enhancers [24,25]. However, the methylation as a regulatory mechanism of gene expression is poorly explored in PTC, even in the multiplatform robust study performed by The Cancer Genome Atlas (TCGA) [21].
To our knowledge, the current study is the first to assess a substantial matched-sample subset with methylation and expression data addressing the available data from TCGA study to cross-validate the results. The genes signature potentially regulated by methylation inferred the role of this epigenetic event in PTC development.

Sample population
Snap frozen PTC samples stored at tissue biobank of the A.C. Camargo Cancer Center, SP, Brazil, were obtained retrospectively. Forty one papillary thyroid carcinomas of patients treated with total thyroidectomy followed by radioiodine therapy and matched non-neoplastic adjacent tissues (NT) samples were included in this study. Cases with incomplete clinical data in medical records or diagnosed with previous or synchronic malignancies were excluded (except basal skin cell carcinoma). Clinical and histopathological data are summarized in Table 1.

Nucleic acids extraction and analysis of somatic mutations
Genomic DNA extraction was performed according to conventional protocol using enzymatic degradation with proteinase K followed by purification with organic solvents (phenol/chloroform). RNA was isolated as previously reported [26].

DNA methylation and gene expression profiling
Five hundred nanograms of DNA (Qubit® dsDNA BR Assay no Qubit® 2.0 Fluorometer (Life Technologies, Carlsbad, CA, USA) were bisulfite-modified using EZ-DNA Methylation-Gold Kit (Zymo Research, Irvine, CA, USA) according to the manufacturer's recommendations. Converted DNA was used for the genome-wide methylation assays (Infinium Human Methylation450 BeadChip array-Illumina, San Diego, CA, USA). Arrays were scanned by HiScan system (Illumina), and methylation data were analyzed as β values. Genome-wide DNA methylation data processing was done as reported previously [27] (detailed in Additional file 1). Limma package [28] was used to identify significant probes adopting adjusted (Bonferroni) p value <0.05 and mean delta β value (Δβ) of 0.15 as a threshold for differential DNA methylation. A hypergeometric test (p < 0.05) was performed from phyper function of STATS package in R language to compare differentially methylated probes in relation to genomic regions (Illumina 450K array annotation).
Gene expression data were obtained from our previously reported study (GEO accession number GSE50901) [26]. The unsupervised hierarchical clustering analysis was performed using the most variable probes (interquartile range >0.2 to methylation and >1.0 to gene expression). Euclidean distance with average linkage method was used in all clustering analysis by BRB array tool software (https://brb.nci.nih.gov/BRB-ArrayTools/download.html). Student t test was assessed to verify the association between methylation/expression data of selected genes with clinical parameters.

Integrative analysis and cross-study validation
All probes differentially methylated (|Δβ| 0.15 and adjusted p < 0.05) and expressed (FDR <5% and fold change >2) were subjected to an integrative analysis, using a Pearson correlation test (34 PTC evaluated by both analysis), aiming to obtain negative and positive significant correlations (p < 0.05).
A cross-study validation was performed to confirm the results using DNA methylation microarray and RNA sequencing data from TCGA database (https://tcgadata.nci.nih.gov/docs/publications/tcga/). Similar parameters of the internal analysis were adopted to compare all conditions in the external dataset (t test p < 0.05, FDR <5% and Pearson correlation test p < 0.05) (details in Additional file 1). Figure 1 summarizes the strategy and results obtained in this analysis.

In silico molecular interactions analysis
Disease and biological function and canonical pathway analysis including genes found in the integrative analysis were performed using Ingenuity pathway analysis (IPA-Ingenuity® Systems) and KEGG Orthology Based Annotation System (KOBAS-http://kobas.cbi.pku.edu.cn/) software version 2.0. Genes confirmed by TCGA database with negative correlation between expression and methylation probes were included to obtain a highly trustworthy analysis.

Data confirmation by quantitative bisulfite pyrosequencing and RT-qPCR
In addition to the independent data confirmation using TCGA database, four genes (ERBB3, FGF1, GABRB2, and HMGA2) presenting methylation in the body gene region and two in the promoter regions (FGFR2 and RHD5) were selected to be evaluated for quantitative bisulfite pyrosequencing and RT-qPCR analysis. The selection criteria of these genes are presented in Additional file 1.
The CpG methylation pattern was assessed by quantitative bisulfite pyrosequencing using the Pyromark Q96 ID system (Qiagen, Valencia, CA, USA) in samples   The pyrosequencing and RT-qPCR results were compared to tumor/normal status and according to clinical, pathological and genetic features by Student's t test (p < 0.05) using GraphPad Prism 5.0 Software (Inc., La Jolla, CA, USA). Bonferroni correction was applied to adjust the P value by multiple hypotheses testing.

DNA methylation and gene expression profiles in PTC
To identify differential methylation in PTC, we analyzed CpG methylation status comparing PTC with NT (p < 0.05; |Δβ| 0.15). This analysis revealed 6070 CpGs probes differentially methylated, of which 89% (5425) were hypomethylated. A supervised hierarchical clustering analysis revealed two main clusters, one comprised exclusively PTC samples and the other included all NT and six PTC cases (Fig. 2a).
A total of 4995 of 6070 available probes in the TCGA database was confirmed as differentially methylated (Additional file 2: Table S1, Additional file 3: Figure S1A), highlighting the robustness of our methylome analysis. An enrichment of the identified probes was detected in nonpromoter regions (76 vs. 59% represented by the platform, p < 0.0001), mapped far from the CpG island called "open sea" (66 vs. 33% represented by the platform, p < 0.0001), and enhancers regions (54 vs. 21% represented by the platform, p < 0.0001) (Fig. 2b).
Using our previous genome-wide expression data [26] in 61 PTC versus five NT, 1657 differentially expressed genes Fig. 1 Workflow representative of the strategy used in the integrative analyses and in the cross-study validation. a Genome-wide methylation analysis revealed 6070 differentially methylated probes, and large-scale gene expression analysis identified 1657 differentially expressed genes in PTC (the last from a previous study). Corresponding probes/genes were submitted to a Pearson correlation test (34 PTC analyzed by both platforms) revealing 214 genes presenting probes with negative correlation and 49 genes with positive correlation. A total of 247 genes were classified as potentially regulated by DNA methylation in PTC. b A total of 4563 differentially methylated probes and 333 differentially expressed genes were identified in PTC according to BRAFV600E mutation. The Pearson correlation test revealed 69 and 17 genes with negative and positive correlation, respectively. Eighty three genes were classified as potentially regulated by DNA methylation in PTC BRAF mutated. *Tumor samples were initially corrected by NT samples (ΔβPTC-ΔβNT) and then BRAF positive and negative tumors were compared; § Some genes presented both methylation probes negatively and positively correlated. # Unadjusted p value were found (FDR <5% and FC >2). The comparison of these results with TCGA database, confirmed the involvement of 1446 genes differentially expressed (Additional file 4: Table S2, Additional file 3: Figure S1B).

Integration of DNA methylation and gene expression profiles in PTC
A powerful tool used in the identification of novel driver alterations in cancer is the combined analysis of different Fig. 2 Classification of the differentially methylated probes in PTC. a Supervised hierarchical clustering analysis showed 6070 differentially methylated probes in papillary thyroid carcinoma (PTC) versus normal thyroid (NT) tissues, mostly hypomethylated in PTC. The first cluster shows all normal samples (purple) and six PTC (orange), and the second is composed exclusively by tumor samples (orange). The beta values vary between zero (green) and one (red). b Methylation probes identified in PTC versus NT and those detected in the integrative analysis with negative (r−) and positive correlation (r+) according to the functional genomic distribution, CpG content, and neighborhood context and enhancer representation molecular platforms (21). Accordingly, an integrative DNA methylation and gene expression analysis were performed, highlighting genes potentially regulated by DNA methylation. The methylation analysis revealed 867 probes representing 420 genes differentially expressed. A total of 214 and 49 genes were identified as negatively and positively correlated (185 and 38 confirmed in the TCGA), respectively, with the corresponding methylation probe (Additional file 5: Table S3). Curiously, about half of the negatively correlated probes (163) was found covering promoter regions and 2% were mapped in CpG islands, contrasting to only 10% (6) of the positively correlated probes in promoter regions (p < 0.0001) and 21% (11) in CpG islands (p < 0.0001). No differences were observed in the enhancer regions (p = 0.299) (Fig. 2b).
The genes uncovered by the integrative analysis (negatively correlated and confirmed as altered in the TCGA portal) were distributed through all autosomal chromosomes (Fig. 3a). Cellular movement, growth, proliferation, and survival were the most significant molecular functions (IPA software) involving these genes (Additional file 6: Table S4). The FGF (Additional file 3: Figure S2) and retinoic acid signaling (IPA and KOBAS 2.0, p < 0.01) were among the main canonical pathways involved in PTC (Additional file 6: Table S4).

DNA methylation and gene expression profiling according to BRAF mutation
The BRAF (BRAFV600E) point mutation was detected in 28 of 41 PTC (68%) while no RAS (HRAS, KRAS, and NRAS) mutations were observed. The RET rearrangements (RET/PTC1 and RET/PTC3) were found in five of 41 cases (12%). As expected, the alterations were mutually exclusive.
An unsupervised hierarchical clustering analysis, comprising 34 PTC evaluated by both methylation and expression arrays, revealed two distinct clusters (Fig. 3b, c). A substantial overlapping between methylation and expression clusters was observed. One cluster was enriched by BRAFV600E tumors in methylation and expression analysis (p = 0.034 and p = 0.013, respectively, Fisher's exact test). The importance of BRAF mutation in the methylation and expression profiles was evaluated using a similar approach described in the integrative analysis using PTC versus NT samples. A differential methylation profile (unadjusted p < 0.05; |Δβ| 0.1) was observed in PTC BRAF mutated (3312 hypomethylated and 1251 hypermethylated probes) compared with PTC BRAF wild-type (Additional file 7: Table S5). The expression profile unveiled 333 altered transcripts in BRAF-mutated tumors (FDR <5% and FC >1.5) (Additional file 8: Table S6).
The comparison with TCGA database showed similar methylation and gene expression pattern in 29 and 82% of the genes, respectively. Integrative analysis revealed 69 and 17 genes with significant negative and positive correlation (36 and 5 of them were also found in the TCGA), respectively (Additional file 9: Table S7).

Discussion
The purpose of this study was to characterize the DNA methylation pattern of PTC and to elucidate its effect on gene expression deregulation and the biological pathways associated with the disease. By comparing PTC with matched NT, a global hypomethylation was detected, as previously reported [19,20,23]. Loss of DNA methylation throughout the genome has been related to genomic instability, somatic driver mutations, chromosomal breaks, and rearrangements. Papillary thyroid tumors have been characterized by a low number of structural rearrangements and frequent somatic driver mutations [17,19,20,23].
CpG islands (CGI) are well described in literature as often found in promoters and associated with gene downregulation when hypermethylated [25]. In our study, half of the methylated probes presenting inverted patterns compared with gene expression (negative Recently, epigenetic studies have uncovered that methylation alterations at other regulatory regions such as CGI shores, non-CGI promoter, and enhancers might also play a role in tumorigenesis [17][18][19][20][21][22][23]25]. In our study, these regions were more frequently detected, agreeing with previous reports in PTC samples [20]. Gene bodies are reported as having a limited number of CpG islands and broadly methylated. Interestingly, this region harbors multiple repetitive and transposable elements [25]. However, the biological significance of DNA methylation in this region is poorly clarified. Gene body methylation does not necessarily block the transcription as observed in promoter regions, but might increase the transcriptional activity, stimulate the transcription elongation, and impact in the splicing [25,29]. Accordingly, 80% of the probes presenting positive correlation between methylation/expression were annotated in gene body or 3'UTR. Interestingly, differentially methylated probes exclusively mapped in body or in promoter regions presented a similar proportion of differentially expressed genes (14 and 18%, respectively). Furthermore, an opposite relation between methylation and expression (hypomethylation/overexpression or hypermethylation /downexpression) of probes mapped exclusively in the promoter or in body gene regions were also detected (73 and 93%, respectively). These findings suggest that body gene methylation is a process involved in gene expression regulation, similar to those described in the promoter regions.
From 1446 genes differentially expressed (internal and external data), only 212 were considered as regulated by DNA methylation, suggesting the involvement of other transcription-regulator mechanisms. Nonetheless, one of the most significant results herein described was the enrichment of methylation disruption in enhancer regions (54% of the methylated probes). Enhancers are non-coding regulatory sequences able to recruit transcription factors and activate promoters. These regions are located from a few kilobases to more than a megabase of the transcription start site of the target gene [30]. Although the exact mechanism of the regulatory proteins binding according to DNA methylation in enhancers is still unclear, hypomethylation seems to result in increased activity of the target gene [31]. Therefore, even if only a few genes were directly associated with the regulation of DNA methylation, several genes might be influenced by the altered methylation in enhancer regions. Aran and Hellman [32] suggested that gene expression variation could be explained by methylation in enhancers.
The in silico analysis comprising 185 genes found in the integrative analysis (inverted methylation/expression pattern and confirmed by the cross-study validation) highlighted the importance of epigenetic alterations in thyroid carcinogenesis. The predicted effects in the biological functions (cellular movement, growth, proliferation, and survival) were previously reported as associated with thyroid cancer development [33]. Furthermore, deregulated canonical pathways including the retinoic acid-and FGF signaling pathways were unveiled as associated to PTC. The retinoic acid (RA) is involved in cell differentiation and plays a fundamental role in preventing neoplastic growth [34]. The RA biosynthesis involves the RDH5 enzyme, which reversibly oxidizes all-transretinol to all-trans-retinaldehyde and them irreversibly oxidized to RA by retinoid-active aldehyde enzymes (ALDH1A) [35,36]. We found RDH5 hypomethylation and overexpression and ALDH1A1 downexpression in PTC, suggesting a dysregulation of RA metabolism. Retinoic acid has been implicated in re-differentiation of thyroid cells by the induction of sodium-iodide symporter (NIS) expression, which is responsible for the iodine internalization [37,38]. Loss of NIS expression has been associated to a low uptake of iodine and interference in the efficacy of the radioiodine therapy in thyroid tumors [39]. According to our results, the methylation changes are potentially involved in the dedifferentiation of PTC cells, as a result of RA signaling pathway disruption.
Fibroblast growth factor signaling pathway, involved in angiogenesis and tumorigenesis [40], was significantly altered in our PTC samples. FGF1 and FGF2 have been reported as over-expressed in differentiated thyroid tumors, but their receptors present contrasting results [41][42][43]. In our study, FGF1 was found hypomethylated and over-expressed and FGFR2 was hypermethylated and downexpressed. According to Kondo et al, FGFR2 hypermethylation promotes downexpression and its re-expression acts blocking the BRAF/MAPK pathway in thyroid cancer [44].
Similar to the literature, mutually exclusive somatic alterations were found in BRAF (68% of PTC samples) and RET (12%), while RAS had no mutations [18,45,46]. The supervised clustering analysis of the methylation profiling revealed six PTC grouped with NT samples, two of them were BRAFV600E and one RET-PTC3 positive. The inclusion of PTC samples in NT enriched clusters was previously reported in methylation profiling studies [19]. This finding could be explained by nontumor cells contamination, as also observed in the TCGA study [21].
The hierarchical clustering analysis showed a substantial overlapping between transcripts and methylation profiles. A cluster enriched with of BRAFV600E tumors was detected, in agreement with previous studies using methylation [20] and gene expression analysis [45]. Kikuchi et al [18] evaluated the methylation profile (Infinium Human-Methylation27K Illumina) of 14 PTC and 10 normal thyroid tissues. Among the 25 differentially methylated genes, six (HIST1H3J, POU4F2, SHOX2, PHKG2, TLX3, and HOXA7) were selected for data confirmation. The authors described hypermethylation and downexpression of these genes in an additional set of cases and an association with BRAF/RAS mutations. From these genes, only PHKG2 was found hypomethylated (promoter region) of our gene list. The supervised analysis revealed an even more evident hypomethylated state in tumors harboring BRAFV600E, where 73% of the probes were less methylated compared to wild-type BRAF tumors (95% considering the TCGA confirmed probes). The integrative analysis according to BRAF mutation uncovered 69 genes showing negative correlation, 36 of them were confirmed by external data (TCGA). Similarly, Mancikova et al [19] reported genes with inverse correlation between methylation and gene expression mainly associated to MAPK pathway (MAPK13, DUSP5, and RAP1GA1) and apoptosis (LCN2, RIPK1, and LGALS1). The stratification of PTC into two entities, the BRAF-like (classical or tall variant) and RAS-like (composed mainly by the follicular variant), based on molecular landscape was generated from omics integrative analysis in a cohort of 496 PTCs [21]. Considering only the DNA methylation levels, the authors described four groups, two of them enriched by H/K/ NRAS-mutated follicular variant PTC (follicular and CpG island methylated) and two enriched by BRAF-mutated classical and tall cell PTC (classical 1 and classical 2). Similar to our findings, composed largely by BRAF-mutated tumors, the TCGA classical/tall cell PTC-enriched cluster was distinguished by low levels of methylation in CpG normally methylated outside of islands [21].
In addition to RDH5 (RA pathway), FGF1, and FGFR2 (both from FGF signaling pathway), ERBB3, GABRB2, and HMGA2 were previously reported as over-expressed in PTC [43][44][45][46][47][48][49]. Methylation and expression pattern of these genes were also confirmed by pyrosequencing and RT-qPCR. Previously, we pointed out GABRB2 and HMGA2 as potential diagnostic makers in thyroid tumors [26]. HMGA2 interacts with the transcription machinery (acts in the chromatin structure and regulates the transcription) and in the epithelial-mesenchymal transition (by repression of E-cadherin) [47,50,51]. Although HMGA2 and GABRB2 were not associated with clinical features or somatic alterations in the largescale expression analysis, the independent validation demonstrated that GABRB2 hypomethylation and overexpression were significantly altered in BRAFV600E tumors. Tumors harboring this mutation were associated with ERBB3 overexpression [45,48] but not with DNA methylation. A plausible explanation is the involvement of different mechanisms associated with the ERBB3 regulation in BRAF tumors, including the involvement of miRNAs or other post-transcription regulation events [49,52]. Overexpression and oncogenic activation of ERBB3 have been associated to treatment resistance with RAF/MEK inhibitors in melanoma and thyroid cancer, especially in the specific context of BRAFV600E [53][54][55].