Integrated epigenome, exome and transcriptome analyses reveal molecular subtypes and homeotic transformation in uterine fibroids

Uterine fibroids are benign myometrial smooth muscle tumors of unknown etiology that when symptomatic are the most common indication for hysterectomy in the USA. We conducted an integrated analysis of fibroids and adjacent normal myometria by whole exome sequencing, Infinium MethylationEPIC array, and RNA-sequencing. Unsupervised clustering by DNA methylation segregated normal myometria from fibroids, and further separated the fibroids into subtypes marked by MED12 mutation, HMGA2 activation (HMGA2hi) and HMGA1 activation (HMGA1hi). Upregulation of HMGA2 expression in HMGA2hi fibroids did not always appear to be dependent on translocation, as has been historically described, and was associated with hypomethylation in the HMGA2 gene body. Furthermore, we found that expression of HOXA13 was highly upregulated in fibroids and that overexpression of HOXA13 in a myometrial cell line induced expression of genes classically associated with uterine fibroids. Transcriptome analyses of the most differentially expressed genes between cervix and myometrium also showed that uterine fibroids and normal cervix clustered together and apart from normal myometria. Together, our integrated analysis shows a role for epigenetic modification in fibroid biology and strongly suggests that homeotic transformation of myometrium cells to a more cervical phenotype is important for the etiology of the disease.


INTRODUCTION
Uterine fibroids (also known as leiomyomas) are benign tumors that develop in the smooth muscle of the uterine myometrium and are estimated to occur in up to 75% of reproductive age women. While mostly asymptomatic, approximately 25% of women with fibroids suffer from clinically significant symptoms, including pelvic discomfort, menstrual bleeding, menorrhagia to preterm labor, recurrent pregnancy loss and infertility (1,2). There is a strong racial disparity in the disease, with a lifetime prevalence estimated to be 3 times higher in women of African descent (3), who also have earlier clinical onset, higher tumor burden, and greater severity of symptoms.
Non-surgical, hormone-based therapies for fibroids offer only short-term mitigation of symptoms, and their use is limited due to significant associated side-effects. Surgical intervention is often the last resort for women seeking permanent relief from the disease. Symptomatic fibroids are the most common indication for hysterectomy in the United States (4).
Approximately 30-40% of fibroids have been reported as having karyotypic abnormalities, the most commonly reported of which are translocations at chromosome regions of 12q15 and 6q21, leading to over-expression of high mobility group AT-hook genes, HMGA2 and HMGA1, respectively (5,6). The HMGA family of proteins are nonhistone, chromatin-binding proteins that regulate transcription by influencing DNA conformation and in the process, accessibility of DNA binding proteins. Due to their various DNA binding properties, they influence diverse cellular processes including cell growth, proliferation, and cell death (7,8).
Whole-exome approaches have recently identified a somatic mutation that is found largely in exon 2 of the Mediator complex subunit 12 (MED12) gene and occurs in around 50-70% of the fibroids (9). MED12 is located on the X chromosome and encodes a highly conserved 250 kDa protein that forms part of the Mediator RNA polymerase II pre-initiation complex. Together, MED12 mutation (MED12mt) and HMGA1 or HMGA2 overexpression (HMGA1hi and HMGA2hi, respectively) encompass approximately 80-90% of genetic alterations present in all fibroids (10,11). However, the precise mechanisms disrupted during fibroid development or progression have yet to be determined.
Subtype classification of fibroids based on their mutation status or gene expression characteristics have been proposed (12), but the DNA methylation profiles of these fibroid subtypes have not been reported. Additionally, in some cases, the subtyping was performed without consideration of fibroids from African American women (13). Methylation of cytosine nucleotides 5' to a guanine (CpG) in DNA is among the most well-established epigenetic marks known to influence gene expression (12), but how these epigenetic modifications might affect transcriptional activity in fibroids has not been well described nor has cytosine methylation in a non-CpG context (CpH methylation) (14,15). The major goal of this study was to delineate the molecular landscape of fibroids based on integrated genome-wide analysis of DNA methylation and mRNA transcription within the context of their mutational status for subtype categorization and identify possible targetable mechanisms for therapeutic intervention.

Fibroid Subtype Determination
We applied an integrated approach to study uterine fibroid subtypes by combining DNA methylation array hybridization, whole exome sequencing (exome-seq), and RNA sequencing (RNA-seq) to determine driver mechanisms underlying subtype determination. Ten normal myometrial (5 each Caucasian and African American) and 24 fibroid (12 each Caucasian and African American) samples were collected for DNA methylation analyses. Methylomes for normal myometria and fibroids were profiled using the Infinium MethylationEPIC array (EPIC) (16). Epidemiological studies have well documented a strong racial disparity in the disease, with African-American women presenting with greater incidence, age of onset, and severity (17). Self-identified race for the African American samples was confirmed using EPIC SNP probes (Fig S1A and   B) in a constructive predictive model (16) to mitigate confounding effects from possible misidentification because of admixture. To assess cellular composition of the samples, promoter methylation of miR200c/miR141, which are methylated in mesenchymal cells (18) but unmethylated in epithelial cells, was analyzed ( Fig S1C). The methylation beta values, corresponding to the fraction of methylated probe signals, suggested very low contamination in the normal myometrium, and approximately 80-90% smooth muscle cells in the fibroids. Further examination of α SMA promoter methylation, which is mostly unmethylated in myofibroblast cells (19), showed consistent results (Fig S1C). Similarly, flow cytometry analysis of human myometrial and fibroid tissue identified approximately 70% and 90% α SMA-positive smooth muscle cells, ensuring a relatively homogeneous population of cells without the need for any further enrichment (Fig S1D). This indicates that our methylation study is a good reflection of the CpG landscape of smooth muscle cells and are not unduly influenced by contaminating cells. Origins of fibroid and normal myometrial samples from specific patients were confirmed by SNP analysis that identified distinct branches in paired normal myometrium and fibroids, while isolated branches were restricted to the unpaired samples ( Fig S1E).
Unsupervised clustering of DNA methylation data (Fig 1A) of the most variable 1% CpG sites (Table S1) revealed segregation by disease status (normal and fibroid).
Fibroids were further split into three major clades in the dendrogram. Whereas, the HMGA1hi fibroids clustered closer to the normal myometrial samples, the MED12mt and HMGA2hi fibroids clustered closer to each other. Consensus clustering analysis (20) with 1000 iterations showed that the three discovered methylation clusters of fibroids were robust as demonstrated by both sample-based and cluster-based stability scores ( Fig 1B). One of the MED12mt fibroids showed a high tendency to be clustered with HMGA2hi fibroids but predominantly clustered with other MED12mt fibroids.
We analyzed the fibroids for MED12 mutation status by Sanger sequencing, and the hotspot, exon 2 mutations in MED12 (9) were detected in most of the cases ( Fig   1C). However, an unreported C>T mutation and a 24 bp deletion from two separate fibroids collected from the same patient (MP136) were also detected using exome-seq ( Fig 1D). RNA-seq also showed a clear drop in MED12 reads in the deleted region of these two fibroids, compared to the rest of the first exon, and to the same region in their matched normal myometrial sample (Fig 1E). Further analysis of the cDNA from these fibroids confirmed the C>T mutation and deletion. Splicing of intron 1 did not appear to have been affected, but the observed in-frame deletion did result in an mRNA with a predicted translated protein missing 8 amino acids (Fig 1D). HMGA1 and HMGA2 overexpression marked the two other groups in the non-MED12mt fibroids (Fig 1F). MED12 expression levels were not significantly different between normal myometrium and fibroid subtypes (Fig 1F).
Multidimensional scaling of the whole exome sequencing results showed tight clustering of the myometria and fibroids by patient, confirming that these were matched samples ( Fig S2A). We also confirmed that the mutant MED12 allele was the expressed allele in a few MED12mt fibroids ( Fig S2B). With both exome sequencing and RNA sequencing data, we were able to investigate whether the paired fibroids from the same patients (MP111 and MP136) came from a single cell or had separate origins by examining X chromosome inactivation patterns. One of the two X chromosomes is randomly inactivated early in development, and fibroids with different inactive X chromosome would be unlikely to come from the same cell of origin. MP111F1 and F2 fibroids expressed alternative alleles for heterozygous loci on chromosome X (Fig S2C), as well as poorly correlated methylation patterns (see Fig S2D, also Fig 1A and Fig   S4B), suggesting that they originated independently from two cells with a different chromosome X inactivated ( Fig S2E). In contrast, MP136F1 and F2 expressed the same alleles (Fig S2F), as well as highly similar methylation patterns (see Fig S2G, also Fig 1A and Fig S4B), suggesting a probable single cell of origin ( Fig S2H). Conflicting results regarding clonality have been reported; however, previous exome sequencing of fibroids (11) and these results suggest that clonality can vary among the fibroids.
Mutational burden of fibroids was generally less than 0.5/MB (Fig S3A), except for MP164F (2.5/MB), and comparable to that of pediatric leukemias and lymphomas, which represent some of the lowest in human cancers profiled to date (21). A prevalence for C>A mutation in some of the fibroids (Fig S3B) was observed, different from the common C>T mutation in CpG context. However, association of the fibroids with mutation signatures did not reveal any overt differences within and between the MED12mt and HMGA2hi fibroid subtypes (Fig S3C), neither were we able to identify complex chromosomal rearrangements in our MED12mt samples by available bioinformatic tools suggesting that mutational burden is not a major contributor to the fibroid phenotype in our samples.

DNA Methylation Landscape Is Altered in Fibroids.
Overall distribution of CpG and CpH (or non-CpG) methylation were largely unremarkable in all samples (Fig S4A). At loci with fibroid-specific hypermethylation compared to normal myometria, HMGA1hi fibroids were closest to normal myometrial samples (Fig S4B), similar to the clustering results with the top 1% most variably methylated sites (Fig 1A). HMGA2hi and MED12mt fibroids had elevated levels of methylation at CpH sites ( Fig S4C). HMGA2hi had the highest-level gain of methylation at CpG and CpH sites among all groups, which is consistent with the observation that they also had the highest expression level of the de novo DNA methyltransferase DNMT3A ( Fig S4D).
We examined the sites interrogated on this array by various genomic compartments. CpG islands were largely unmethylated in both normal myometria and fibroids, and highly methylated domains (HMDs) in myometria remained highly methylated in fibroids (Fig S4E). We also did not observe significant hypomethylation in the partially methylated domains (PMDs), even though loss of methylation within PMDs, particularly in the context of WCGW (where W=A or T) without neighboring CpGs (dubbed solo-WCGW), has been suggested to track accumulation of cell divisions in normal cells and is commonly observed in tissues that have undergone extensive clonal expansion like cancer (22). Methylation at enhancer regions, however, exhibited a small shift in the overall distribution between normal myometria and fibroids from PMDs to HMDs (Fig S4E).
Enhancer activity landscape can be inferred from DNA methylation profiles (23), with unmethylated distal regions usually marking active enhancers. A large fraction of the probes on the EPIC array interrogate distal elements (defined as +/-2kb away from the TSS) containing at least one binding site for each of 158 transcription factors (TFs) that we previously annotated (16), based on ENCODE ChIP-seq data (24). We assessed enrichment or depletion of differentially methylated cytosines (DMCs) in binding sites for the TFs by hypergeometric testing (FDR =1*10-6), comparing all fibroid and each of the fibroid subtypes to normal myometria ( Fig S5). HMGA1hi fibroids were similar to normal myometria and had few DMCs. MED12mt and HMGA2hi fibroids exhibited similarity in TFBS enrichment at their hypermethylated distal loci, further indicating that they could share similar transcriptional rewiring. Notably, binding sites for EZH2 and SUZ12, components of the PRC2 complex (25), were highly enriched in the hypermethylated, and likely closed off, cytosines. In contrast, estrogen receptor-α (ERα) binding sites were enriched in distal sites that lose methylation and are presumably activated in the fibroids.

DNA Hypomethylation in the HMGA2 Gene Body Correlates with Higher HMGA2
Expression.
Compared to normal myometria, two adjacent CpG sites in a CpG island within the HMGA1 promoter gained DNA methylation in MED12mt and HMGA2hi fibroids, but remained hypomethylated in HMGA1hi fibroids, in line with the high expression level of HMGA1 observed in HMGA1hi fibroids ( Fig S6; Fig 1). In contrast, a segment of the gene body of HMGA2 (measured by 13 consecutive DNA methylation probes) was hypomethylated in HMGA2hi fibroids, compared with other fibroids and to normal myometria (Fig 2A). Upregulation of HMGA2 expression has been generally attributed to rearrangements at 12q14-15 (26,27). However, when fluorescent in situ hybridization (FISH) was performed on fresh-frozen tumor sections or exponentially growing fibroid cell cultures of samples identified as HMGA2hi, 12q14-15 rearrangement was not detected in either GO535F1 or MP120F2 with probes over 600 KB upstream of HMGA2 ( Fig S7A and B). qRT-PCR was performed to confirm the assignment of these fibroids to the HMGA2hi subtype by the RNA-seq results ( Fig S7C). The 3' end distal hypomethylated CpG site was located within a binding motif for CTCF, as determined by ENCODE ChIP-seq data (Fig 2A) (28). CTCF is involved in forming long range chromatin loops that alter the 3D structure of chromosomes and acts as an insulator of transcriptional activity of the encompassed genes (29). To locate CTCF binding regions upstream of, and within the HMGA2 gene body, we analyzed enhancer-promoter interactions from FANTOM5 (black lines), and overlaid it with available ChIA-PET interaction data (red line). We also inferred open and closed compartments (A/B compartment) (30) using DNA methylation profiles surrounding the HMGA2 region ( Fig  2B). Indeed, the chromatin was open for this locus in the HMGA2hi subtype specifically and closed in the others. These results suggest that epigenetic alteration could be an additional mechanism allowing for overexpression of HMGA2, at least in some fibroids.

Between Fibroid Subtypes.
As with DNA methylation, global RNA-seq analyses showed that, in addition to clustering separately from normal myometrial samples, the MED12mt, HMGA1hi, and HMGA2hi subtypes also clustered separately from each other and independently of patient origin (Fig 3A). Gene set enrichment analyses of the RNA-seq results between the MED12mt or HMGA2hi fibroids, when each was compared with normal myometria, showed a large number of shared activated and repressed genes among the top-ranked gene sets (Fig 3B). More than half of the up-regulated genes in HMGA2hi fibroids were similarly regulated in the MED12mt fibroids (Fig 3C), and nearly half of the downregulated genes in HMGA2hi fibroids were also down regulated in MED12mt fibroids ( Fig 3D). Gene ontology analyses of the differentially expressed genes showed a high concordance of dysregulated genes ( Fig 3E). KEGG pathway analyses of the differentially expressed genes showed that a few of the pathway changes were shared between MED12mt and HMGA2hi fibroid subtypes. Concordant with previous transcriptomic profiling, we identified elevated expression of RAD51B, PLAG1 and PAPPA2 in our MED12mt, HMGA2hi and HMGA1hi fibroids, respectively (Table S2) (13). These RNA-seq results suggest that the MED12mt and HMGA2hi fibroid subtypes are more alike transcriptomically than they are different.
In Fig 1F, we showed that the average HMGA2 expression level was also significantly elevated in MED12mt fibroids (log 2 FC=3.2), though to a lesser degree than in HMGA2hi fibroids (Log2 FC=11.6). Further analysis of the RNA-seq results identified that, although most of the MED12mt fibroids did not express any HMGA2 transcripts, 3 MED12mt fibroids did express HMGA2 (Fig S8), suggesting that there might be two subtypes of MED12mt fibroids, MED12mt and MED12mt/HMGA2 expressing, with different transcriptional profiles. However, these MED12mt/HMGA2 expressing fibroids did not cluster closer to the HMGA2hi fibroids in either the RNA-seq ( Fig 3A) or DNA methylation heatmaps ( Fig 1A).
To identify genes dysregulated due to altered promoter methylation level, we integrated DNA methylation and RNA-seq profiles for each fibroid subtype (Fig 4).
Genes with significantly altered promoter CpG methylation (absolute delta β values>0.25; p<0.05) and associated gene expression change between normal myometrium and fibroids are listed in Table S3. A few hypomethylated and induced genes were identified, but most (VCAN, RAD51B, COL1A1, etc.) have been previously reported. KRT19, which has also been previously reported to be silenced by promoter hypermethylation in fibroids (31), was the most down-regulated gene in all fibroids compared normal myometria using this analysis ( Fig 4A). The heatmap of beta values for the EPIC probes in the KRT19 gene show that most of the hypermethylation in the fibroids is occurring in the promoter region ( Fig 4B). We also identified many additional genes similarly silenced by promoter methylation, including genes involved in the retinoic acid pathway (ADH1B), WNT pathway (WNT2B), and stem cell function (GATA2, KLF4) ( Fig 4A). Many of these are known tumor suppressor genes, silencing of which could be important for fibroid growth. More granular analysis of KLF4 showed that it was hypermethylated and downregulated in each of the fibroid subtypes compared to normal myometria ( Fig 4C). We then analyzed coordinated differential methylation and gene expression by fibroid subtypes. In HMGA1hi fibroids SMOC2 as the most hypermethylation and down regulation gene ( Fig 4D). SMOC2 can stimulate endothelial cell proliferation and migration, which is consistent with the observed hypoxia in fibroids (32,33). MED12mt fibroids ( Fig 4E) had a differential gene pattern similar to that of total fibroids. Several of the differentially regulated genes (PAPPA2, PLAG1, for example) in HMGA2hi fibroids have also been previously described.
However, HOXA13 was identified to be hypomethylated and upregulated in HMGA2hi fibroids compared to normal myometria ( Fig 4F). Given the importance of this HOX gene in female reproductive tract development, we continued to further investigate the HOXA loci.

HOXA13 Induces a Homeotic Transformation in Myometrium.
The Clustering analysis based on this differential gene set showed that uterine fibroids clustered more closely to normal cervix than to myometria (Fig 5E), strongly suggesting that the development of uterine fibroids is in fact a homeotic transformation into cervical tissue results from induced expression of HOXA13 in normal myometrium.

DISCUSSION
In an attempt to understand the tumorigenic role of MED12 mutation in fibroid  (9), including the 8-amino acid deletion described here (Fig 1), and the clustering of these fibroids in both DNA methylation and RNA-seq analyses with other MED12 mutants ( Fig   1; Fig 3), suggests that a loss of function is more likely. However, given the hotspot MED12 mutations in fibroids, and now the in-frame deletion described in Fig 1, further study will be necessary to determine the precise mechanisms disrupted by MED12 mutation in fibroids.
Over-expression of HMGA2, a gene normally not highly expressed in normal myometrium, is the second most common phenomena that is known to occur in fibroids (5). HMGA2 over-expression has been attributed to genetic alterations involving translocations or aberrant splicing within the coding region (47, 48). HMGA2 expression has also been shown to be potentially regulated by the microRNA Let-7 family (49,50).
A gain of function mechanism involving fusion of RAD51B to HMGA2 has also been identified to contribute to overexpression (51), however, we were unable to detect RAD51B-HMGA2 fusion or aberrant splicing in our exome and RNA-seq analyses. Among the HOX genes that we analyzed, we identified HOXA13 to be uniquely upregulated within the HOX gene family in uterine fibroids, which has not been previously appreciated. Our results strongly suggest that expression of HOXA13 in uterine fibroids drives aberrant gene expression in myometrial cells, transforming them into uterine fibroids with a gene expression profile that is very characteristic of cervix.
Much like fibroids, the cervix, a collagen dense tissue that protects the uterus and, if gravid, the fetus from pathogenic assault, undergoes significant remodeling postpartum.
Thus, development and resolution of fibroids could be controlled by mechanisms similar to those observed in the cervix. For example, progesterone can keep the cervix stiff and help prevent cervical softening or ripening (76), and is normally required for fibroid growth and development (77). At term, mechanisms driving cervical ripening, characterized by partial dissolution of the collagen matrix is necessary for delivery (78), could also be driving the decrease in fibroid burden observed in postpartum uteri (79).
Relaxin is a peptide hormone associated with cervical ripening at parturition (80) In conclusion, using an integrative approach of RNA-seq, DNA methylation array and exome sequencing, we characterized the genetic and epigenetic profiles of uterine fibroids, which clearly allows for their subtyping by mutation and altered methylation.
Our analysis also identified hypomethylation of HMGA2 gene body as a possible mode of regulation that is independent of translocation. We also showed high correlation with gene expression and DNA methylation, highlighting the regulatory potential of altered DNA methylation driving development of uterine fibroids. Transcription factor binding analysis further identifies fibroid subtype specific regulators and hint at critical role of these regulators in fibroid tumorigenesis. Finally, a deeper characterization of our RNAseq results identified expression of a HOX gene, HOXA13, deregulation of which influences a number of genes characterized in fibroid biology. Importantly, we identified a homeotic transformation of normal myometrium to cervix-like tissue, probably through the subtype-independent upregulation of HOXA13 expression. Future work will validate the mechanisms altered by HOXA13 overexpression in fibroid biology and its possible targeting for therapeutic intervention for symptomatic patients.

Sample processing
Fibroid and matched myometrial samples were collected from hysterectomies of consented patients using Spectrum Health or Northwestern University IRB approved protocols for secondary use of biobank tissues. Samples were aliquoted upon arrival for DNA and RNA isolation, and cell separation. Race was confirmed essentially as described in (16).   (20). Sample-and cluster-based stability score were then calculated. Fibroids-specific methylation profiling was generated using CpG probes unmethylated in normal myometria but methylated (defined by a beta value of no less than 0.3) in at least one sample within each fibroid subtype. Hierarchical clustering based on betas within each subtype was then performed to show the specific methylation patterns across different subtypes.
Differentially methylated cytosines (DMCs) were called using R package DMRcate (88) by comparing all the fibroids, and each fibroid subtype to myometria, based on its build-in default p-value cutoff and an absolute beta-value difference threshold at 0.2. DMCs were mapped to genes captured by RNA sequencing, if they were located within gene promoters (defined as 2 kb flanking regions surrounding transcription start sites). We then performed enrichment analysis of transcription factors binding sites (TFBSs) at distal regulatory elements (i.e., enhancers). Distal probes were identified as probes located within TFBSs but not within gene promoters. For hyper-or hypo-DMC set generated from each comparison, hypergeometric test was applied to calculate the enrichment or depletion of binding sites for each TF within DMC set at those distal probes. Significance cutoff was made at 1e-6 after false discovery (FDR) correction.

Evaluating clonality with X inactivation
For patients with more than one fibroids sample examined, we evaluated the possibility that they arose as independent clones or from the same origin. We performed this analysis by integrating exome-seq and RNA-seq data. For each patient, we identified germline SNPs on the X chromosome from exome-seq data using GATK. We restricted the analyses to those SNPs that remained heterozygous in DNA in both tumors. We then examined which alleles (A or B) were expressed for these SNPs using RNA-seq data. Alternative expressed alleles would indicate separate cellular origins, as random inactivation of one X chromosome occurs early in development.
All statistical analyses were conducted using R software with versions newer