Integrative epigenomic analysis of differential DNA methylation in urothelial carcinoma

Urothelial carcinoma of the bladder (UC) is a common malignancy. Although extensive transcriptome analysis has provided insights into the gene expression patterns of this tumor type, the mechanistic underpinnings of differential methylation remain poorly understood. Multi-level genomic data may be used to profile the regulatory potential and landscape of differential methylation in cancer and gain understanding of the processes underlying epigenetic and phenotypic characteristics of tumors. We perform genome-wide DNA methylation profiling of 98 gene-expression subtyped tumors to identify between-tumor differentially methylated regions (DMRs). We integrate multi-level publically available genomic data generated by the ENCODE consortium to characterize the regulatory potential of UC DMRs. We identify 5,453 between-tumor DMRs and derive four DNA methylation subgroups of UC with distinct associations to clinicopathological features and gene expression subtypes. We characterize three distinct patterns of differential methylation and use ENCODE data to show that tumor subgroup-defining DMRs display differential chromatin state, and regulatory factor binding preferences. Finally, we characterize an epigenetic switch involving the HOXA-genes with associations to tumor differentiation states and patient prognosis. Genome-wide DMR methylation patterns are reflected in the gene expression subtypes of UC. UC DMRs display three distinct methylation patterns, each associated with intrinsic features of the genome and differential regulatory factor binding preferences. Epigenetic inactivation of HOX-genes correlates with tumor differentiation states and may present an actionable epigenetic alteration in UC.


Background
Urothelial carcinoma of the bladder (UC) is one of the most common epithelial malignancies in the industrialized world and is characterized by heterogeneity in terms of the underlying molecular mechanisms. With respect to histopathology, UC can broadly be subdivided into non-muscle-invasive (NMI, stages Ta and T1) and muscle-invasive (MI, stage ≥ T2) disease. NMI disease is generally associated with a good prognosis despite frequent recurrences while MI disease has a decidedly worse prognosis [1]. Pioneering studies on phenotypic characterization of tumors using gene expression profiling have provided valuable insight into tumor biology and allowed for clinically relevant patient stratification with respect to targeted therapies [2,3]. We have previously established a molecular classification system for UC based on global gene expression patterns (Lund subtypes), and defined five major biologically distinct classes of tumors [4]. The Lund gene expression subtypes of UC include: (1) the low stage and grade Urobasal A tumors characterized by frequent FGFR3 mutations and a good prognosis; (2) high stage and grade Urobasal B tumors that are likely progressed Urobasal A tumors; (3) Genomically Unstable tumors characterized by high tumor grade and genomic instability; (4) the poor prognosis squamous cell carcinoma-like (SCC-like) tumors characterized by expression of basal cell markers; and (5) Infiltrated tumors in which the intrinsic gene expression subtype is partially confounded by infiltrating immune and stromal cells [4,5].
Alterations in DNA methylation and chromatin modification patterns are linked features that underlie many of the phenotypic changes observed in cancer cells [6]. In recent years, the interrelations between the gene expression phenotype, the genome, as well as the DNA methylation landscape has been extensively investigated across different malignancies [7,8]. Few studies have investigated the epigenomic landscape of UC. These have highlighted aberrant expression of epigenetic writers, silencing of developmental genes, as well as topological effects on the level of histone modifications as prominent features of aggressive UC [9][10][11]. Importantly, a broad range of epigenetic modifiers are frequently inactivated by somatic mutations in UC, further highlighting the role of epigenetic perturbations in UC development and disease progression [12][13][14]. In a recent landmark publication on MI UC by The Cancer Genome Atlas project (TCGA), 34% of tumors were found to exhibit a CpG Island methylator phenotype [15], consistent with previous reports by us and others [11,16]. The TCGA study confirmed many of our findings on the gene expression subtypes of UC and validated their subtypes using our data. Although this study used mRNA expression data to stratify the tumors, it did not report on the interrelations between the gene expression phenotype and the underlying DNA methylation subtypes of UC, but instead focused on the mutation and genomic landscapes.
To address this gap and investigate the interrelations between gene expression and DNA methylation profiles, we identified differentially methylated regions (DMRs) from methylated DNA immunoprecipitation on chip (MeDIP-chip) data generated for 98 UC tumors. We show that DMR methylation patterns stratify UC tumors into clinically and biologically coherent subgroups, and provide a detailed description of associations to gene expression subtypes of UC. Our main findings were validated using TCGA data. To characterize the underlying regulatory potential of UC DMRs and show that differential methylation occurs in distinct sequence contexts, we leverage ENCODE data on chromatin states across nine cell lines [17]. Furthermore, by integrating multi-level genomic data, we are able to assign genomic context regarding chromosomal distribution, chromatin state preference, and regulatory factor (RF) binding potential to the methylation subgroup defining DMRs. These intrinsic features of the genome may have the potential to dictate the observed DNA methylation changes [18], and provide a description of the genomic processes underlying differential DNA methylation in cancer. Finally, we characterize an epigenetic switch involving the HOXA/HOXB loci, previously described in the context of stem-cell differentiation [19], and show that the state of the epigenetic switch correlates with the level of tumor differentiation and aggressiveness.

Tumor samples
In total 98 tumor and four macroscopically normal urothelium samples were included in the study. Detailed sample selection criteria and collected sample annotations are described in Additional file 1. Informed consent was obtained from all patients in accordance with national statutes and use of the patient material is approved by the ethical review board at Lund University. The study conformed to the Declaration of Helsinki. Gene expression data generated on Illumina HT-12 expression arrays (Illumina, San Diego, CA, USA) was available for all samples included in the study and normalized for technical biases as previously described [4]. Gene expression data processing steps as well as DMR matching procedures are described in Additional file 1.

Methylated DNA immunoprecipitation and array hybridization
Methylated DNA immunoprecipitation [20], quality control, and purification steps, as well as PCR amplifications were performed as described in Additional file 1. Sample labeling and hybridizations to NimbleGen Human DNA Methylation 3 × 720 K CpG Island Plus RefSeq Promoter Arrays (Roche Nimblegen, Madison, WI, USA) were performed by the NimbleGen genomics facility on Iceland.

Data filtering, normalization, and variance-based detection of DMRs
The raw probe signal intensities of the full array (Cy5 and Cy3) were extracted for each sample. Probes mapping to the 22 autosomes were kept and the remaining probes discarded from further processing. A five-step normalization scheme was applied to the data and a permutation based approach that controls for local CpG density was used to define between-tumor differentially methylated regions (Additional file 1).

Clustering of tumor samples and DMRs
As the individual DMRs contained varying numbers of probes, we calculated the mean value of the probe scores for each DMR and tumor. We applied a bootstrap hierarchical clustering method [21] to derive stable methylation subgroups of UC tumors (Additional file 1). All calculations of sample and class enrichment and depletion with respect to clinicopathological and molecular annotations were performed using a one-versus-rest Fisher's exact test. Survival analysis with respect to HOX-cluster subtypes was performed using the logrank test on the entire cohort with disease specific survival as endpoint and was not corrected for clinicopathological variables or treatment.

Annotation of genomic features to DMRs
RefSeq tracks for genes, CpG-islands, chromatin tracks for nine ENCODE cell lines [17], as well as the Repeat-Masker track for hg18 were downloaded from the UCSC genome browser. Evolutionarily constrained elements throughout the genome, defined using the GERP algorithm [22], were obtained from the Sidow-lab web page. The MSigDB v3.1 database was downloaded from the GSEA web page. All processing steps related to genomic feature annotation of UC DMRs are described in Additional file 1.

TCGA data validation
For all samples included in the study (N = 234), data on DNA methylation (Illumina Infinium HM450 arrays), somatic variants (exome sequencing), and gene-level RNA sequencing were downloaded from the TCGA ftp server. We also obtained methylation data for 21 adjacent normal samples from TCGA. All raw file names are listed in Additional file 2: Table S1. The normalized gene-level expression estimates were processed by adding the constant 1 to all expression estimates followed by log2 transformation and median centering. The methylation data matrix was filtered for all probes with a SNP-annotation or a missing value in any tumor sample (final N = 322,425 probes). For the somatic mutation data, silent variants were filtered out and the mutation status of each gene was dichotomized on the sample level. For details on data processing steps, see Additional file 1.

Processing and analysis of ENCODE data
For compatibility reasons the UC DMRs were mapped to the hg19 build of the human genome using the UCSC liftover tool and resulted in a successful conversion for all but three DMRs (N = 5,450). Data on regulatory factor ChIP-seq peak calls as well as DNaseI-sites generated by the ENCODE consortium were obtained through the UCSC genome browser and processed as described in Additional file 1.

Statistical analyses and data visualization
All data manipulation, normalization, and calculation steps were carried out in the R statistical programing environment. All data visualization was produced using base graphics in R, and the UCSC genome browser.

Data access
The data generated through this study have been deposited into the Gene Expression Omnibus (GEO) under the accession number GSE58256. The previously published gene expression data are available under the accession number GSE32894. The data generated by TCGA are available through the project ftp site and ENCODE data through the UCSC genome browser.

Tumor-intrinsic DMR methylation patterns define four UC subgroups
We profiled 98 UC samples representative of the full clinicopathological spectrum using MeDIP followed by hybridization to Nimblegen 3 × 720 K RefSeq promoter and CpG Island (CGI) arrays to identify between-tumor differentially methylated regions (DMRs). We identified 5,453 high-confidence regions throughout all autosomes (Median size 780 bp, range 500 to 4,610 bp; Methods, Additional file 3: Table S2).
To define subgroups of UC based on their DMR methylation profiles, we used the 25% most varying DMRs (N = 1,363) and a bootstrap hierarchical clustering approach for tumor sample clustering [21]. The approach yielded four robust methylation subgroups (subgroups 1 to 4, Methods) which differed with respect to both clinical (pathological stage and grade) and molecular characteristics, including the Lund gene expression and the Lauss et al. [11] DNA methylation epitypes (epitype A-C) of UC as well as FGFR3 and TP53 mutation frequencies (Table 1) [4,11].
Subgroups 1 and 2 were enriched for stage Ta tumors (P <5 × 10 -5 and P = 0.047, respectively, Fisher's exact test) and tumors of lower pathological grade (subgroup 1 for grade 1, P <2 × 10 -5 and subgroup 2 for grade 2, P = 0.038). Both groups exhibited enrichment for the Urobasal A gene expression subtype (P <6 × 10 -8 and P = 0.0071, respectively) and the previously defined epitype A (P <8 × 10 -5 and P = 0.047). Moreover, subgroup 1 was enriched for activating FGFR3 mutations (P = 0.0026). Subgroup 3 was enriched for pathological grade 3 tumors (P = 0.0008) and exhibited a strong association with the Genomically Unstable gene expression subtype (P <2 × 10 -6 ) and epitype C tumors (P <4 × 10 -7 ). This subgroup did not differ significantly with respect to TP53 mutation frequency, but was depleted of FGFR3 mutations (1 of 24 tumors, P = 0.0002). Subgroup 4 was characterized by an association to MI (P = 0.0043) and pathological grade 3 (P = 0.012) disease, enriched for the poor prognosis SCC-like gene expression subtype (P = 0.011) and also included the majority (78%) of epitype B samples (P = 0.021). Subgroup 4 did not differ significantly from the other subgroups with respect to FGFR3 or TP53 mutation status. The identified methylation subgroups therefore correspond well to previously established molecular and epigenetic subtypes, as well as to pathologically, clinically, and genetically distinct classes of tumors.

The genomic characteristics of subgroup defining DMRs
We then applied ANOVA on the full set of 5,453 UC DMRs to identify regions with subgroup-specific methylation patterns. ANOVA significant DMRs (N = 2,697, P <0.05, FDR corrected) exhibited a higher median CpG density compared to DMRs without subgroup-specific methylation patterns (median = 0.022 and 0.011 CpG/bp, respectively, P <7 × 10 -82 , Mann-Whitney U test, Additional file 4: Figure S1A). We observed a significant difference in CpG density between UC DMRs that are hyper-(high CpG density) and hypomethylated (low CpG density) in tumor samples compared to normal urothelium (Additional file 4: Figure S1B). Hierarchical clustering of the subgroup specific DMRs revealed three main methylation patterns across the data ( Figure 1 and Additional file 5: Figure S2A to C), hereafter referred to as methylation pattern 1 (672 DMRs), 2 (650 DMRs), and 3 (1,375 DMRs).
Hypomethylation of DNA has been shown to occur at LINE1 and LTR elements in immortalized fibroblasts and at chromosome ends in a subset of glioblastoma tumors with potential implications for genome function [25,26]. To explore this further, we quantified the local (2 kb window) LINE1 and LTR element content and mapped the distance to the nearest chromosome end for each DMR. In total 1,523 (28%) of all DMR regions contained LINE1 or LTR repetitive elements and the median overlap, when present, was 326 bp. When considering only the subgroup-specific DMRs; pattern 1 DMRs showed significant enrichment of repetitive sequence overlaps (P <3 × 10 -11 ), while pattern 2 DMRs were strongly depleted of local repetitive element overlaps (P <4 × 10 -18 , Additional file 6: Figure S3A). Pattern 1 DMRs exhibited enrichment in subtelomeric regions of the genome measured as distance to nearest chromosome end (P <3 × 10 -34 , Kruskal-Wallis test, Additional file 5: Figure S3B), or relative enrichment of elements within the first or last 5 Mb of chromosomes (P <2 × 10 -32 , Fisher's exact test, Additional file 5: Figure S3C). Taken together, by integrating data from multiple levels, we show that subgroup-specific differential DNA methylation occurs in distinct genomic contexts.
The biology of DMR-associated genes and correlations to gene expression Overall we were able to match 3,685 (68%) of all DMRs to Illumina microarray gene expression data [4] (Methods). We used a resampling-based method to derive empirical significance thresholds for correlations between DMR methylation and gene expression (Methods). In total, 708 DMR-gene correlation coefficients, mapping to 668 unique DMRs, were found to be significant (18% of all gene expression matched DMRs). Among the 708 significant DMRgene correlations, 477 (67%) were negative. When only considering gene expression matched DMRs, 16.2% of the pattern 1 DMRs and 14.2% of pattern 2 DMRs exhibited significant correlations to gene expression. By contrast, 31.4% of pattern 3 DMRs exhibited significant correlations to gene expression. We observed a substantial difference between pattern 2 and pattern 3 DMRs with respect to methylation-gene expression correlations in loci marked by the 'Inactive/Poised Promoter' state in H1ESC. Pattern 3 'Inactive/Poised Promoter' state DMRs were four times more likely to be significantly associated with gene expression than were pattern 2 'Inactive/Poised Promoter' state DMRs (37.9% vs. 9.4%, P <5 × 10 -13 ) compared to approximately two-fold when considering pattern 2 and 3 DMRs irrespective of chromatin state in H1ESC. An analysis of biological themes related to the methylation patterns observed across UC DMRs was conducted using the signatures contained in the MSigDB v3.1 database (Methods) and highlighted differential enrichment of stem cell and developmental gene related signatures [27,28] (Additional file 7: Table S3). Taken together, pattern 3 DMR methylation highlights regions of active transcriptional regulation in UC, whereas pattern 2 DMRs accrue methylation in a subset of tumors with an absence of corresponding effects on gene expression patterns. Pattern 1 DMR methylation did not affect coordinated and biologically relevant gene expression programs.

Validation of DMR methylation patterns using TCGA data
We sought to validate the main observed methylation patterns in independent data generated by TCGA. We obtained methylation data for 234 MI tumors as well as 21 adjacent normal samples. We confirmed the high variance nature of UC DMR methylation by comparing the standard deviation of DMR overlapping (N = 9,969) and non-overlapping probes (N = 308,871). Probes within DMRs exhibited substantially higher variability in the external dataset (Figure 2A, P <2.2 × 10 -16 , Mann-Whitney U test). We extracted the most variable probe for all covered UC DMRs (3,361 probes) and investigated the average methylation state in tumors versus adjacent normal tissue (21 samples). These results confirm the platform (Illumina vs. Nimblegen) and cohort (NMI and MI vs. MI only) independent nature of our findings with respect to the three DMR patterns ( Figure 2B), that is, demethylation of pattern 1 DMR overlapping probes, hypermethylation of pattern 2 DMR overlapping probes, as well as elevated methylation levels of pattern 3 DMR overlapping probes.
To evaluate whether UC DMR methylation is representative of the overall variation structure of the full dataset, we clustered the top 25% most varying DMRoverlapping probes (N = 841 of 3,361) as well as the 2,000 most varying probes from the full platform using K-means consensus clustering [29]. We were able to derive stable subgroups using both datasets and found that a three-group split captured well the main structure of the data ( Figure 2C). Despite a limited overlap between the two probe sets (N = 279), the respective tumor coclustering results were highly concordant (P <3 × 10 -65 , Chi-square test, Figure 2D). These results confirm that the UC DMRs defined in our MeDIP-experiment capture the main variation structure in an exclusively MI cohort generated on a different platform.
Clustered heatmap visualization of the 2,000 most varying probes in relation to consensus cluster subgroups derived using the same data revealed two opposing patterns driving sample stratification: de novo methylation of high CpG density regions and demethylation of CpGs in lower density regions. The major methylation patterns were robustly associated with H1ESC chromatin states and the UC DMR methylation patterns defined in the discovery MeDIP-set. The row (probe) dendrogram indicated four major branches with 448, 840, 258, and 454 probes, respectively ( Figure 2E). We tested each of the four branches for skewness with regard to UC DMR overlaps. We found that two branches (N = 840 and 454 probes) marked by H1ESC active/weak/poised promoter (states 1 to 3) or transcription and enhancer (states 9 to 11 and 6 to 7) associated states respectively, were enriched for pattern 3 DMRs (P <0.0003 and P <0.0007, respectively (Fisher's exact test). A branch (N = 448) that was dominated by probes in regions with bivalent marks (state 3) in H1ESC displayed enrichment of pattern 2 DMRs (P <5.4 × 10 -21 ). Finally a H1ESC 'heterochromatin/low signal'-dominated branch (N = 258 probes) showed a significant enrichment of pattern 1 DMR overlaps (P <9.6 × 10 -13 ), confirming the existence of at least three distinct and pervasive sequence contexts that are differentially methylated in UC.
Thus we were able to validate our main findings on differential methylation in an independent data-set of 234 MI UC tumors run on Illumina Methylation 450 K arrays by TCGA. We show that probes overlapping UC DMRs display higher variance than non-overlapping probes, that probes overlapping pattern 1 to 3 DMRs exhibit the same directionality of differential methylation with respect to adjacent normal tissue, that UC DMR overlapping probes capture the variation structure of the data as a whole, and that unsupervised hierarchical clustering of probes identifies pattern 1 to 3 methylation in the validation data.

Chromatin state characterization of UC DMRs across ENCODE cell lines
To further characterize the regulatory potential associated with UC DMRs, we mapped the overlaps of H1ESC chromatin states UC DMRs. The distribution of chromatin states at UC DMRs exhibited a clear pattern of decreasing 'Heterochromatin/low signal' marks and a corresponding increase in states associated with promoter and gene regulatory regions towards DMR midpoints ( Figure 3A). The 'Inactive/poised promoter' as well as 'Weak promoter' and 'Weak enhancer' states showed increased frequencies towards the midpoint of DMRs while the 'Weak transcription' , 'Polycomb repressed' , and ' Active promoter' state overlaps exhibited decreases.
We also mapped the chromatin context at autosomal RefSeq transcription start sites in aggregate  (TSSs, Figure 3B) and in regions with UC DMRs ( Figure 3C). Average chromatin state frequencies differed between promoters with DMRs and all autosomal RefSeq promoters ( Figure 3B and C), in that an increase of 'Inactive/poised promoter' marks, mainly at the expense of the ' Active promoter' mark, was observed in areas having DMR overlaps. The average pattern of DMR overlaps around TSSs revealed a steady increase towards the TSS and a clear plateau covering approximately -700 to +700 relative to the TSS ( Figure 3C). Ernst et al. [17] also derived chromatin state maps for eight additional cell lines representative of different embryonal lineages and differentiation states. We mapped alternate chromatin states in these cell lines onto UC DMRs, yielding eight transition states for each UC DMR (Methods). The co-occurrence frequencies of alternate states in relation to the H1ESC states are shown in Figure 3D and highlight the relative stability of a subset of chromatin states, for example, ' Active promoter' and 'Heterochromatin/low signal' , as well as the dynamic nature of others such as the 'Weak enhancer' and 'Inactive/ poised promoter'. As UC DMRs were frequently marked by the 'Inactive/poised promoter' state in embryonic stem cells ( Figure 3A), we investigated the range of states of H1ESC 'Inactive/poised promoter'-marked loci in the additional eight cell lines. The analysis revealed that most UC DMRs marked by 'Inactive/poised promoter' marks resolve to a monovalent, predominantly H3K27Me3 marked ('Polycomb repressed') state across all eight cell lines ( Figure 3E), consistent with observations in stem-cell differentiation [23,27]. In contrast to the other cell lines, the two cancer-derived cell lines (HepG2 and K562) frequently exhibit heterochromatin marks at H1ESC bivalent loci ( Figure 3E). Our results highlight the dynamic nature of chromatin modifications across cell types and differentiation states while providing independent evidence of regulatory function for UC DMRs.

Regulatory factor occupancy of UC DMRs
We then utilized ChIP-seq data generated by the EN-CODE consortium [30], to gain further insights into the basic gene regulatory and genomic context of regions associated with methylation changes in UC and to analyze our data from a regulatory factor (RF)-based perspective. We focused on the five cell lines for which the largest amount of data on RFs was available and restricted our analysis to UC DMRs (Methods). We extracted a core set of 18 RFs for which data were available in all five cell lines (90 RF-cell tracks in total) and constructed a binary matrix of DMR-ChIP-seq peak overlaps, which we subsequently clustered ( Figure 4A, Methods). Fifty-five percent of all UC DMRs had at least one overlapping ChIPseq peak. While 88% of pattern 2 and 71% of pattern 3 DMRs had at least one overlapping ChIP-seq peak, this was only the case for 27% of pattern 1 DMRs. A subset of UC DMRs clustered together mainly due to the influence of CTCF and RAD21 homolog (RAD21) binding ( Figure 4A). CTCF/RAD21 co-binding may implicate these regions as functional elements in a process such as cohesin recruitment [31]. Specific co-occurrence of CTCF and RAD21 peaks at UC DMRs was assessed by identifying all instances in which a DMR overlapped both CTCF and RAD21 peaks in at least one cell line (N = 516). Pattern 2 DMRs exhibited significant enrichment for CTCF-RAD21 co-binding (P <3 × 10 -12 , Fisher's exact test, Figure 4B), and the enrichment was mainly attributable to the H1ESC cell-line.
In total, 1,548 DMRs (28.4%) were bound by RNA polymerase 2 (POLR2A) in at least one cell line. POLR2A binding exhibited three main patterns across the UC DMRs: (1) ubiquitous and exclusive POLR2A binding characterized by enhancer states in all cell lines; (2) ubiquitous POLR2A binding with active promoter states and frequent RF binding across multiple cell lines; and (3) patterns of POLR2A co-binding with RFs and enhancer/promoter states in a cell type specific context ( Figure 4A). Nearly 40% (37.3%) of the DMRs overlapping POLR2A sites exhibited pattern 3 methylation (P <3 × 10 -36 , Figure 4B), while at the other extreme, pattern 1 DMRs were strongly depleted of POLR2A binding (P <6 × 10 -30 , Figure 4B). The transcription-associated RFs TATA-binding protein (TBP) and Transcription initiation factor TFIID subunit 1 (TAF1) always clustered together in a cell-type specific manner ( Figure 4A). For each of the five cell-lines, co-binding of all three transcription-associated RFs in any of the cell lines was recorded and enrichment statistics were calculated for the three DMR categories. Only pattern 3 DMRs exhibited significant enrichment for regions bound by all three factors (P <3 × 10 -8 , Figure 4B).
For sites bound by the RE1-Silencing Transcription Factor (REST) in any of the cell lines (N = 511), pattern 2 DMRs exhibited significant enrichment for overlaps (P <4 × 10 -6 , Figure 4C). The enrichment was further accentuated in the subset of DMRs that were bound by REST in all five cell lines (N = 60, P <3 × 10 -8 ). A large number of UC DMRs exhibited EZH2 binding, consistent with chromatin state annotations showing frequent overlaps with bivalent state marked regions in H1ESC ( Figure 4A). As expected pattern 1 DMRs were strongly depleted of EZH2 binding (P <2 × 10 -17 , Figure 4B) while pattern 2 DMRs exhibited robust enrichment (P <7 × 10 -155 , Figure 4B). In addition to establishing the differential binding landscape of RFs with respect to UC DMRs, we highlight the connection between RF binding and chromatin states in both a cell-type specific and independent context. Our results provide evidence in favor of multiple genomic processes underlying the DMR methylation patterns 1 to 3 observed across UC tumors, and implicate pattern 1 DMRs as infrequent sites of RF binding, pattern 2 DMRs as frequent sites of CTCF/RAD21 as well as EZH2 binding, and pattern 3 DMRs as sites of frequent POLR2A occupancy.

Spatial patterns of regulatory factor binding at UC DMRs
DNaseI hypersensitive sites (DHSs) define regions of open chromatin and are frequently associated with regulatory factor binding. We mapped DHS-peak bases locally in a 10 kb window centered on UC DMRs and explored the spatial patterns of ENCODE ChIP-seq RF binding in relation to DMR positioning. Chromatin accessibility, as measured by DHS, increased towards DMR midpoints and the most frequent UC DMR-DHS overlaps were observed in the H1ESC cell line ( Figure 5A). A general trend of decreasing DHS levels across all UC DMR patterns was observed in the more differentiated and cancer-derived-cell lines compared to H1ESC, however this was most accentuated among pattern 2 DMRs ( Figure 5A). While pattern 1 DMRs did not exhibit specificity in DHS peak distributions when assessed by aggregation plots (APs), pattern 2 DMRs were centered on DHS sites, and pattern 3 DMRs showed a consistent tendency towards a local depletion of DHSs towards the DMR midpoints. EZH2 binding was strongly associated with DHSs in H1ESC and exhibited a sharp peak centered on pattern 2 DMR midpoints, a feature seen to a lesser extent in GM12878 and HepG2 but lacking entirely in K562 and HeLa-S3 cells ( Figure 5B). The observed patterns are consistent with H3K27-trimethylation-mediated repressive/poised state as 'default' for pattern 2 DMRs in ESC with a successive transition to stable modes of repression in response to differentiation cues or immortality ( Figure 4A). POLR2A binding across pattern 2 DMRs was associated with local DHS density in all cell lines except H1ESC, indicating a decoupling of open chromatin status and gene transcription in ESCs ( Figure 5C).
We observed a tendency towards local depletion of POLR2A peak coverage towards pattern 3 DMR midpoints, a feature also observed among a number of RFs  across all cell lines and resulting in decidedly bimodal APs ( Figure 5C and D). Bimodal AP-plots are commonly the result of non-oriented feature alignments [32]. However, the depletion of RF-binding at pattern 3 DMR midpoints may reflect true features of genome organization as the RF-binding patterns are recapitulated in the patterns of CGI and CGI-shore base overlaps ( Figure 5E and F).
In order to capture the nature of the switch-like appearance and to derive aggregate sample-level HOXAconsensus profiles, we performed k-means clustering (k = 3) on all DMRs contained within the HOXA locus ( Figure 6B). The consensus profiles captured the aggregate structure of the locus and provided readily interpretable methylation patterns. In terms of sample stratification, tumors could be subdivided based on the methylation patterns into those that display 'posterior-only' , 'anterior-only' , and 'pan' HOXA DMR methylation ( Figure 6B). A strong link was also observed between anterior HOXA (HOXA1-6) gene expression and expression patterns across the entire HOXB-locus ( Figure 6C).
Tumor stratification based on HOXA-DMR methylation was also reflected in the global DNA methylation patterns described above. The 'posterior-only' group of tumors displayed significantly higher levels of pattern 1 methylation compared to each of the other two HOXmethylation-based groups (both P <0.001, t-test, FDR corrected). Pattern 2 methylation was significantly higher in both 'anterior-only' as well as 'pan-HOXA' compared to 'posterior-only' tumors (both P <0.05) but did not differ significantly between the former two, indicating that HOX-methylation does not simply recapitulate global methylation patterns. High levels of pattern 3 methylation was characteristic of both 'anterior-only' and 'pan-HOXA' tumors, and differentiated both groups from the tumors displaying 'posterioronly' methylation (both P <5 × 10 -11 ). The absence of a clear difference in pattern 2 methylation suggests that the processes underlying the different epigenetic states within the HOX-gene loci are distinct from the ones giving rise to pattern 2 methylation.
With respect to the Lund gene expression subtypes, 'posterior-only' tumors corresponded to the Urobasal A gene expression subtype (33/36, P <4 × 10 -13 , Fisher's exact test) and belonged to methylation subgroups 1 and 2 (31/36, P <4 × 10 -13 ). Within the 'posterior-only' tumors, loss of anterior HOXA gene expression and increasing posterior HOXA-associated DMR methylation was evident and agreed well with the original subdivision of Urobasal A tumors into two subgroups; MS1a (Molecular subtype 1a) and MS1b (Molecular subtype 1b) [4]. Therefore, anterior HOXA gene expression is a feature of the MS1a subset of Urobasal A tumors, while the remaining tumors (mainly MS1b) only exhibit sporadic HOXA gene expression ( Figure 6C). The notion of HOX-cluster methylation patterns being related to tumor differentiation states was substantiated by the observation that 'posterior-only' tumors were almost exclusively of pathological grade 1 or 2 (34/36). In addition 83% were of pathological stage Ta. ' Anterior-only' tumors differed with respect to HOXA gene expression patterns. Although near ubiquitous HOXA9-expression was the common denominator among these tumors, HOXA10-13 gene expression was absent in tumors of the Lund SCC-like gene expression subtype of UC ( Figure 6C). As expected from the Lund subtype as well as clinical associations, the methylation profiles across the HOXAlocus stratify the tumors into low-('posterior-only') as well as high-risk ('pan-HOXA' and 'anterior-only') groups in terms of disease-specific survival ( Figure 6D, P = 8.7 × 10 -5 , logrank test). Thus the coordinated shift in HOXA/ HOXB loci methylation is strongly associated with a similar shift at the HOXA/B expression levels, with genome wide methylation patterns, as well as with previously described molecular (gene expression) subtypes of UC.

Expression of retinoic acid responsive genes correlates with HOXA methylation patterns
The observed pattern of anterior-posterior HOXA expression has previously been described in the setting of retinoic acid (RA) induced neuronal differentiation of pluripotent progenitor cells in which undifferentiated cells express the posterior-while repressing the anterior HOXA-genes and vice versa [19]. To further investigate the HOX-cluster methylation patterns, we performed t-tests for differential methylation and gene expression using the 'posterior-only' as a reference group against which 'anterioronly' and 'pan-HOXA' tumors were compared.
The dominant pattern for significant pairs (gene expression and methylation P <0.01, FDR corrected) was coordinated increased methylation and reduced gene expression (72/98 in 'pan-HOXA' and 59/76 in the 'anterior-only' comparison, Figure 6E and F, Additional file 8: Table S4). In addition to HOX genes HOXB2-5, HOXB8, and HOXA1, additional genes with concomitant gain of methylation and loss of expression in both 'pan-HOXA' and 'anterior-only' tumors included the retinoic acid responsive genes GPRC5C [33] and ITM2B [34] as well as the transcription factors SMAD3 and SLIT3. A gene significantly methylated and repressed in 'pan-HOXA' tumors only was PHF23, a frequent fusion partner with NUP98 in acute myeloid leukemia (AML) that has been shown to enforce HOXA9-10 expression by protecting activating H3K4Me3 marks and blocking EZH2 mediated HOX-gene repression [35]. Tumors with 'pan-HOXA' methylation patterns also displayed downregulation and methylation of FABP5, a key protein in directing the cellular response to RA [36]. Genes exhibiting specific methylation and downregulation in 'anterior-only' tumors included additional HOX-genes HOXA5, HOXD4, and HOXB7 as well as AHR which has been shown to modulate retinoic acid receptor/retinoid X receptor (RAR/RXR) mediated cellular responses to RA [37]. Consistent with developmental gene silencing through methylation being the primary factor underlying HOXA methylation patterns, few genes exhibited lower methylation levels with concomitant high gene expression levels in either of the two comparisons. Five genes exhibited this pattern for 'pan-HOXA' tumors; CDCA3, FBN2, GRM8, CDKN2A, and KRT20, the latter a marker of terminal urothelial differentiation. For 'anterioronly' tumors the same pattern was observed for HOXA9 and HOXA11 as well as CDKN2A. In addition, we noted that within the 'posterior-only' set of tumors, one of the most significantly upregulated genes among tumors expressing the anterior HOXA genes (Lund MS1a) versus tumors without anterior HOXA expression was RXRA (P <5 × 10 -7 ), providing further evidence in favor of a link between retinoic acid signaling and HOXA-gene expression patterns in UC.

KDM6A mutations are depleted in HOXA9-expressing tumors
We validated our observations on HOXA/B cluster methylation and gene expression patterns in external data generated by the TCGA consortium (Methods). Ordering of tumors with respect to the balance of anterior and posterior HOXA-DMR methylation validated our observations on the dynamics of HOX-gene expression. As the validation dataset only consists of high grade MI tumors, the low-grade associated anterior HOX gene expression pattern could only be observed in a small subset of samples, but with retained HOXB gene expression as in our own data ( Figure 7A).
The trithorax complex and its vertebrate homologs are crucial regulators of homeotic gene expression [38]. The H3K27 demethylase KDM6A is among the most frequently mutated genes in UC [14] and its homolog Utx has been shown to be a trithorax group regulator in Drosophila [39]. In addition, the trithorax complex component MLL-genes, encoding H3K4 methyltransferases, are frequently mutated in UC [15]. We therefore investigated the relationship between the HOXA methylation subgroups and the trithorax complex linked genes MLL, MLL2, and MLL3, as well as KDM6A. While little skewness in MLL-gene mutation rates could be observed, KDM6A mutations were depleted in tumors that exhibited unmethylated posterior HOXA DMRs ( Figure 7A). We also noted that KDM6A mutated tumors exhibited significantly lower HOXA9 gene expression levels (P = 0.00035, Wilcoxon rank sum test) and that methylation of the HOXA9 promoter DMR exhibited a strong negative correlation to gene expression ( Figure 7B, Spearmans Rho = -0.78, P <2.2 × 10 -16 ). In summary, we validate HOX-gene methylation-and expression patterns in an independent cohort of MI UC, and highlight a connection between HOXA9 gene expression patterns and KDM6A mutations.

Discussion
DNA methylation is a multifaceted process with context dependent functions in genome regulation and wideranging clinical implications [40,41]. Previous studies of epigenetic alterations in UC have been conducted on low-coverage platforms and have been focused on markers of aggressive disease [10,16,[42][43][44]. Studies that have explored the interrelation of global changes on the epigenetic and gene expression levels have often restricted their analyses to the individual CpG-gene level instead of addressing the associations between global phenotypes [9,11,43,44]. The current study aims at describing the links between gene expression and DNA methylation subtypes of UC as well as investigating the RF binding and chromatin state associations of UC DMRs.
We conducted a comprehensive analysis of differential methylation using MeDIP-chip on 98 UC tumor samples subtyped according to the Lund molecular taxonomy for UC [4]. Bootstrap hierarchical clustering analysis stratified the samples into four subgroups with distinct associations to histopathological groups (stage and grade), mutations (FGFR3 and TP53 mutations), as well as Lund gene expression subtypes (Urobasal A, Urobasal B, Genomically Unstable, or SCC-like). The present cluster analysis highlights a clear split between the low grade, non-invasive Urobasal A tumors and the high grade, invasive tumors characterized by genomic instability or a keratinized phenotype (Genomically Unstable and SCClike tumors, respectively). However, the analysis also revealed that differences in DNA methylation patterns can exist within a group of tumors of the same gene expression phenotype, for example, the presence of Genomically Unstable tumors in methylation subgroups 3 and 4. Importantly we were able to validate our findings in a platform (Nimblegen vs. Illumina) and cohort (Lund vs. TCGA) independent dataset.
Our previous characterization of DNA methylation patterns on low-coverage Illumina 27 K methylation arrays revealed three main methylation subgroups, termed epitypes (A to C) [11]. In the present investigation, subgroup 1 and 2 corresponded to epitype A and exhibited similar histopathological (low pathological stage and grade) and mutational (frequent FGFR3 and infrequent TP53 mutations) associations. Subgroup 3 tumors were highly enriched for epitype C tumors, linking this methylation phenotype to the Genomically Unstable gene expression subtype of UC. Finally, subgroup 4 was enriched for epitype B, characterized by extensive demethylation of low CpG density promoter, as well as tumors of the SCC-like gene expression subtype [4,11].
Previous studies into epigenetic changes in UC have mainly been focused on characterizing differential methylation [11,16,[42][43][44]. However, the functional genomic context of differential methylation remains less well studied. We used multi-level genomic data generated through the ENCODE consortium to characterize the regulatory potential of UC DMRs and show that the identified regions exhibit biologically coherent chromatin state and RF-binding preferences in ENCODE cell-lines. We found that subgroup-defining DMRs exhibit three distinct patterns of methylation across tumors (summarized in Figure 8A). Pattern 1 DMRs are located in low CpGdensity, repeat-rich, subtelomeric regions of the genome and are depleted of functional chromatin states and RFbinding across ENCODE cell lines. Methylation of pattern 1 DMRs is inversely correlated with pathological grade and may represent stochastic demethylation of heterochromatic DNA through a loss of a maintenance-like process, or may be a product of the formation of partially methylated domains (PMDs) [45] in a subset of tumors. The implications of subtelomeric and repetitive sequence demethylation for genome stability are not well understood but may contribute to UC pathogenesis and disease progression.
De novo methylation of high CpG-density positions is a common feature of an aggressive subset of UC tumors [11,16]. Pattern 2 DMRs are enriched for conserved, high CpG-density (CGI), repeat-depleted, regions marked by bivalent domains in embryonic stem cells. This pattern of DMR methylation does not correlate with gene expression, is present in a subset of high grade tumors and is tightly linked to the Genomically Unstable gene expression subtype of UC. We identified pattern 2 DMRs as sites of EZH2 and REST binding, as well as CTCF/RAD21 binding, in H1ESC (summarized in Figure 8B). EZH2 is a core component of PRC2 that mediates polycomb silencing of developmental genes [23,24,27,46]. REST is involved in repression of differentiation associated genes in the neural lineage, is essential for embryonic development [47] and has been implicated in the process of carcinogenesis [48,49]. Evidence for a direct role for DNA methylation in NRSF/REST mediated gene suppression has also been reported [50] and a connection between REST binding and polycomb mediated gene repression has been established [51]. The CTCF/RAD21 binding patterns may implicate disruption of cohesin function as either the cause or consequence of cluster 2 methylation at a subset of UC DMRs. Whereas a large proportion of CTCF/ RAD21 marked sites were devoid of additional RF binding, a subset displayed near ubiquitous POLR2A and RF binding with accompanying active marks in the four cell lines for which chromatin tracks were available. The observed patterns may reflect different modes of cohesin involvement in gene regulation [52]. In support of a functional role for differential methylation at sites of CTCF and cohesin co-localization, Parelho et al. have shown that differential DNA methylation of CTCF motifs at cell type specific cohesin sites can abrogate CTCF mediated cohesin binding [31]. As we found specific enrichment of CTCF/RAD21 colocalization across pattern 2 DMRs, this methylation pattern may identify a subset of tumors with actionable defects in cohesin function [53]. Our findings link pattern 2 methylation to developmental gene silencing as well as disruption of factors mediating higher-order chromatin structure. expression patterns, further investigations into differential HOX-gene expression patterns are warranted in light of confounding factors such as tumor heterogeneity and gene functional redundancy. Inactivation of polycomb-related epigenetic modifiers through gene mutations are likely early events in UC formation [13,14]. The selection pressures and processes leading to differential mutation and epigenetic landscapes across tumor subgroups are however unknown. Inquiries connecting the developmental biology of the bladder with the tumor biology of UC are beginning to provide insight into these basic questions [57,[63][64][65] and future investigations should be directed at understanding epigenetic changes in the context of molecular subgroups and underlying biological processes.