Integrated bioinformatics analysis of chromatin regulator EZH2 in regulating mRNA and lncRNA expression by ChIP sequencing and RNA sequencing

Enhancer of zeste homolog 2 (EZH2), a dynamic chromatin regulator in cancer, represents a potential therapeutic target showing early signs of promise in clinical trials. EZH2 ChIP sequencing data in 19 cell lines and RNA sequencing data in ten cancer types were downloaded from GEO and TCGA, respectively. Integrated ChIP sequencing analysis and co-expressing analysis were conducted and both mRNA and long noncoding RNA (lncRNA) targets were detected. We detected a median of 4,672 mRNA targets and 4,024 lncRNA targets regulated by EZH2 in 19 cell lines. 20 mRNA targets and 27 lncRNA targets were found in all 19 cell lines. These mRNA targets were enriched in pathways in cancer, Hippo, Wnt, MAPK and PI3K-Akt pathways. Co-expression analysis confirmed numerous targets, mRNA genes (RRAS, TGFBR2, NUF2 and PRC1) and lncRNA genes (lncRNA LINC00261, DIO3OS, RP11-307C12.11 and RP11-98D18.9) were potential targets and were significantly correlated with EZH2. We predicted genome-wide potential targets and the role of EZH2 in regulating as a transcriptional suppressor or activator which could pave the way for mechanism studies and the targeted therapy of EZH2 in cancer.


INTRODUCTION
Dynamic regulation of chromatin regulators at enhancers and promoters plays an important role in modulation of gene expression and cell fate determination [1,2]. EZH2, the catalytic subunit of Polycomb repressive complex 2 (PRC2), methylates lysine 27 of histone H3 (H3K27) to promote transcriptional silencing [3]. Converging lines of investigation have implicated that EZH2 is likely to be an important mediator of tumor cell plasticity and involves in development and progression of serval cancers. The earliest and intensive study of a role for EZH2 in cancer was that EZH2 overexpression was associated with worse progression of prostate cancer [4]. Similar findings were also observed in other tumors such as breast cancer, bladder cancer, melanoma, as high level of EZH2 were shown to correlate with aggressiveness and advanced disease in each of these cancer types [5][6][7]. Given its role as a transcriptional regulator, researchers focus on the identification of EZH2 regulated target genes or pathways with great efforts. In prostate cancer and breast cancer, EZH2 has been shown to repress the expression of E-cadherin [8] and RUNX3 [9], resulted in promotion of EMT and invasive phenotype and increased cell proliferation, respectively. Moreover, independently, ectopic EZH2 expression has been found to confer a proliferative advantage upon noncancerous cells [5].
The canonical role of EZH2 is mainly to transcriptionally silence of tumor suppressor genes which depends on PRC2. However, several recent studies showed the non-canonical functions of EZH2, such as transcriptional activation of target genes, which are sometimes associated with malignant progression. In ERpositive breast cancer cells, EZH2 interacts with β-catenin

Research Paper
and ER, and functionally enhances gens expression which is independent of PRC2 [10].
These researches show the multifaceted role of EZH2 in human malignancies and challenges will be to understand not only the uniqueness in the context of different tumors, but also the complexity of roles of enzymatic activity as a transcriptional suppressor versus non-enzymatic activity as a transcriptional activator. However, current studies of EZH2 in cancer limited in hormonal tumors such as prostate cancer and breast cancer [18]. Regulation by EZH2 in other cancer types remains largely unstudied, especially for the lncRNA targets.
Based on the high throughput technology, such as ChIP sequencing and RNA sequencing, we can analyze the global changes in genome-wide DNA interaction sites and gene expression [19]. In this study, we highlighted the role of EZH2 in regulating mRNA and lncRNA expression by integrating ChIP sequencing analysis in 19 normal and cancer cell lines and co-expression analysis in ten cancer types from TCGA data. Our finding provided a global view on cell specific EZH2 transcription regulation and canonical and non-canonical roles of EZH2. These results revealed the involvement of EZH2 in transcriptional regulation of gene expression and provided potential targets for further research on EZH2.

EZH2 binding peaks in multiple normal and cancer cell lines and distribution features
Based on MACS2 peak calling results, EZH2 binding peaks in different cell line ranged from 1,046 for HSMMtube to 28,588 for LNCaP with a median of 7,621 per cell line (Table 1). We analyzed distribution features of EZH2 binding sites on chromosome ( Figure 1A and 1B). The results showed that EZH2 binding sites in most cell lines were centered on the promoter region (<= 3kb to TSS region) except LNCaP, K562 and HeLa cell lines which were centered on distal intergenic region (> 3kb to the TSS region) ( Figure 1A). Among the EZH2 binding sites relative to promoter region, most were centered on 1kb region to the TSS region ( Figure 1B), whereas binding sites in LNCaP, K562 and HeLa were located with 5-10 kb region to the TSS region ( Figure 1B). Moreover, we also analyzed the overlapping of EZH2 binding peaks among different cell lines. Peak intervals with a separation of maxgap of 1,000bp or less were considered to be overlapped. 19 cell lines were divided into five groups. Cancer cell lines including LNCaP, abl, VCaP and VCaP-DHT were in group 1 while HeLa, HepG2 and K562 were in group 2. The other normal cell lines were divided into group 3-5 ( Figure 1C). Venn diagram showed that there were hundreds to thousands of overlapping binding peaks in different group ( Figure 1C). Further analysis showed that there was two overlapping binding peaks in all 19 cell lines ( Figure 1C) one of which was located in the intron of coding gene COMMD3-BMI1 by annotation ( Figure 3A).

Peak annotation, GO and KEGG enrichment analysis
Using ChIPseeker package, we detected the EZH2regulated target genes around binding peaks. For mRNA targets, the identified target coding genes ranged from 737 in HSMMtube to 7,255 in abl cell with a median of 4,672 per cell line (Table 1 and Supplementary Table S1). For lncRNA targets, the identified target lncRNAs ranged from 703 in HSMMtube to 6,284 in HMEC cell with a median of 4,024 per cell line (Table 1 and Supplementary  Table S2). Venn diagram showed that there were 20 overlapping mRNAs and 27 overlapping lncRNAs targets in all 19 cell lines ( Figure 2). Notably, we also discovered a number of cancer or cell specific target mRNAs or lncRNAs. For instance, 286 target mRNAs and 273 target lncRNAs were cancer specific which were only enriched in the four prostate cancer cell lines. Binding of EZH2 at representative genomic loci were visualized using IGV ( Figure 3A).
Combination with GO database, we conducted GO enrichment analysis for EZH2-regulated target genes across all the 19 cell lines (Supplementary Figure  S1) (P<0.05). We discovered that EZH2 regulated the developmental process (biology process), plasma membrane (cellular component) and DNA binding (molecular function) in almost all the 19 cell lines (Supplementary Figure S1). For KEGG pathway enrichment analysis, signaling pathways regulation pluripotency of stem cells and pathways in cancer were significantly enriched in most of the cell lines while important pathways Hippo, Wnt, MAPK and PI3K-Akt signaling pathways were enriched in different cell lines ( Figure 3B). Notably, PI3K-Akt signaling pathway were only enriched in two prostate cancer cell VCaP and VCaP-DHT ( Figure 3B). In total, EZH2-regulated target genes showed shared similarities with differences of diverse characteristics and cell specificity. www.impactjournals.com/oncotarget

EZH2 binding patterns around promoter region
To evaluate the binding patterns around the promoter region, we next systematically examined EZH2 binding sites across all promoters. 8,620 promoters were selected by excluding those where there were rare binding sites within. We then clustered the binding patterns based on their localization patterns across these loci and different cell lines. Figure 4 showed that the EZH2 binding sites could be divided into strong positioning signal and weak localization signal depending on ChIP sequencing signal intensity. The intensity of binding sites among these selected promoters varied from sample to sample and five clusters of promoters were generated ( Figure 4). EZH2 binding sites in most of the cell lines showed signal within cluster 1-4 while only EZH2 binding sites in abl cell showed strong signal in cluster 5. Moreover, binding sites in cluster 2 were located in the flanking region of TSSs while binding sites in other clusters were mainly located in TSSs region. KEGG pathway enrichment analysis were also applied to the clusters separately. These results were largely consistent with KEGG pathway enrichment in Figure 3B. Signaling pathways associated with neuro development were enriched in all five clusters while important pathways such as Wnt, MAPK, TGF-beta and calcium signaling pathway were enriched in cluster 1 and 2. Pathways enriched in cluster 5 were ribosome, spliceosome, RNA degradation and cell cycle ( Figure 3B).

Co-expression analysis of EZH2 in TCGA data
Co-expression analysis of transcriptome data may provide interesting insights in understanding the gene and transcript level regulations in biological samples. In order to find out the potential targets of EZH2 and validation of ChIP sequencing data, co-expression analysis between EZH2 and other genes were applied.
For co-expressed mRNAs, the correlation coefficients between EZH2 and 19,697 coding genes were calculated by Spearman correlation test in ten tumor types (Supplementary Table S3) and was presented as heatmap in Supplementary Figure S2A. Genes with high coefficients were largely consistent among the ten tumor types, which is, EZH2 was either positively or negatively correlated with these genes in all ten tumors (Supplementary Figure S2A). The top ranked positively or negatively correlated 20 co-expressed genes were selected by calculating the row sum of the coefficients in ten tumors and were shown in Figure 5A. We also validated the correlation between the ChIP sequencing data and coexpressing data by checking whether the top 20 positively or negatively correlated co-expressed genes were the potential targets in the ChIP sequencing results ( Figure  5B). Figure 5A and 5B showed that RRAS was negatively correlated with EZH2 in most tumors especially in LUSC, PRAD and LAML and was potential target in nine cell lines including abl, HepG2, K562 and LNCaP. TGFBR2, as another example, was negatively correlated with EZH2  in LUAD, LUSC and BRCA and was potential target in five cell lines including abl, LNCaP, and VCaP-DHT. NUF2 and PRC1 were positively correlated with EZH2 in TCGA data and were targets in six and two cell lines, respectively ( Figure 5A and 5B). Binding of EZH2 at RRAS, TGFBR2, NUF2 and PRC1 loci in 19 cell lines were shown Supplementary Figure S3A. For co-expressed lncRNAs, Co-expression analysis between EZH2 and lncRNAs were also applied using TCGA data from Co-LncRNA database. Spearman correlation coefficient in nine tumor types (LAML was not available) was obtained and NA values were represented as zero in the heatmap (Supplementary Table S4, Figure  5C and Supplementary Figure S2B). 1,401 lncRNAs with coefficients available at least in one tumor were shown in Supplementary Figure S2B, EZH2 co-expressed lncRNAs showed more diversity and cell specificity than mRNA, which is, some lncRNAs were positively correlated in some tumors while were negatively correlated in other tumors (Supplementary Figure S2B). Top ranked 20

Figure 3: Binding of EZH2 at representative genomic loci and KEGG pathway enrichment of EZH2 target genes. A.
Binding of EZH2 at representative genomic loci. Binding peaks around or within shared target genes of COMMD3-BMI1, FOXI3, NTNG2 and PAX2 were presented in 19 cell lines. B. KEGG pathway analysis of EZH2 target genes in 19 cell lines. positively and negatively correlated lncRNAs were shown in Figure 5C. We also checked whether these lncRNAs were potential targets in the ChIP sequencing data and results were shown in Figure 5D. Figure 5C and 5D showed that LINC00261 were negatively correlated with EZH2 in LUAD and LIHC and was potential target in 14 cell lines including H1, HMEC, HUVEC and NHLF cells. DIO3OS, another example, was negatively correlated with EZH2 in PRAD, BRCA and GBM and was potential target in 13 cell lines including K562, VCaP, VCaP-DHT, abl, HepG2 and NH-A ( Figure 5C and 5D). RP11-307C12.11 and RP11-98D18.9 were another two examples which were positively correlated with EZH2 and were shown in Figure 5C and 5D. Binding of EZH2 at these four lncRNAs loci were shown in Supplementary Figure S3B.

DISCUSSION
It is widely accepted that epigenetic alterations are associated with different stages of tumor formation and progression in many cancers. Therefore, epigenetic abnormalities in cancers are emerging as important biomarkers and may have therapeutic potential. As an integral component of PRC2, there are two possible molecular mechanisms for EZH2 action based on its role as a transcriptional repressor or activator [18,20]. The canonical role of EZH2 is that of a histone methyltransferase which depends on PRC2 while the noncanonical role of EZH2 is less known as a transcriptional inducer which is independent to PRC2 [18].
These targets above have already been validated by functional experiments by different researchers, thus we tested if our bioinformatics analysis correctly predict these validated targets. Prostate cancer was chosen as an example since ChIP sequencing results were available in  four prostate cancer cell lines in our data. An increased EZH2 expression in normal prostatic epithelial cells can suppress DAB2IP gene expression [28], DAB2IP was predicted as target in 14 cell lines including abl and LNCaP cell lines and was significantly correlated with EZH2 (Coef= -0.38, P<0.05) in RNA sequencing data in 497 prostate cancer patients (Supplementary Tables  S1 and S3). Similarly, TIMP3 is suppressed in prostate cancer cells by EZH2 which promoted the degradation of the extracellular matrix (ECM) and metastasis [29], TIMP3 was a potential target in VCaP, VCaP-DHT, abl and LNCaP cell lines and negatively correlated with EZH2 (Coef= -0.52, P<0.05) in prostate cancer in our analysis (Supplementary Tables S1 and S3). In fact, CDKN2A, ADRB2, CDH1 [4,30] and SLIT2 [31] were reported to be suppressed by EZH2 in prostate cancer cells and were also targeted by EZH2 in at least one prostate cell line in our ChIP sequencing analysis (Supplementary Table S1). These results proved that our bioinformatic predictions were reliable to a large extend and these validated EZH2 targets in prostate cancer might also be applied into other cell lines. It is worth noting that the correlations might not be compatible between the ChIP and RNA sequencing experiments results since they were conducted in different cells/tissues and under different conditions.
Interaction between EZH2 and lncRNAs in tumor also draws great attention to researchers in recent years. HOTAIR, one of the most well-known and studied lncRNA, was firstly found to recruit PRC2 in modulating the cancer epigenome in prostate cancer which depend on the enzymatic activity of EZH2 [32]. However, these studies were focused on how lncRNAs recruit EZH2 in re-targeting gene expression. Regulation of lncRNA expression by EZH2 remains largely unstudied. We systematically investigated the potential involvement of EZH2 in targeting lncRNA in 19 cell lines and revealed thousands of potential lncRNA targets for the first time.
Our study was an integrated ChIP sequencing and co-expression analysis study in 19 cell lines and ten cancer types, respectively. Moreover, this study was the first to reveal the potential lncRNA targets of EZH2 by ChIP sequencing. By integrating ChIP sequencing and co-expression analysis results, we revealed genome-wide shared and tissue/cell specific potential targets, researchers can predict potential targets in a specific cell line or cancer type and the underlying role of EZH2 in regulating as a suppressor or an activator. Thus, studies of EZH2 in tumorigenesis could be accelerated to understand the mechanism of EZH2, to identify novel markers for cancer diagnosis or treatment, and to prevent drug resistance in the future.
Given the evidence for EZH2 being a cancer driver, the development of EZH2-specific inhibitors has been an active area of investigation. Promising preclinical results have been obtained and human phase 1 trails are now underway with early results suggesting potential clinical activity [33][34][35]. Studies revealing the mechanisms of EZH2 in tumorigenesis may provide insights into the drug interaction or resistance in cancer treatment.
In conclusion, these divergent roles of EZH2 in human malignancies suggest context and tumor cell-type specificities. The contribution of hyperactive or hypoactive EZH2 during tumor formation may reflect the complex and important roles that EZH2-associated genes play in cell fate decisions [36]. Our study was the first to evaluate the role of EZH2 in regulating mRNA and lncRNA expression in tumor by integrating ChIP sequencing and co-expressing analysis. With more insights into the mechanisms of EZH2 in more cancer types, targeted therapy of EZH2 may apply to more cancer types and combination of other chemotherapies could be considered to overcome the drug resistance in clinical.

ChIP sequencing data of EZH2 and RNA sequencing data in TCGA
For ChIP sequencing data, a comprehensive datasets search of Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo) was performed with search terms included "EZH2" combined with "ChIP". To be eligible for inclusion in the study, ChIP sequencing datasets were selected under the following criteria: (1) antibody of EZH2 was used for chromatin enrichment; (2) IgG control was available; (3) high throughput sequencing rather than microarray was applied; (4) raw data was available. The ChIP sequencing data in 19 cell lines were obtained from GEO under the GSM number listed in Table 1. Cell line information was also listed. For RNA sequencing data, cancer types relevant with cell lines included in ChIP sequencing data were selected correspondingly. Level 3 RNA sequencing data were downloaded from The Cancer Genome Atlas (TCGA) Research Network (http://cancergenome.nih.gov/) using FireBrowseR [37]. Ten cancer types and sample numbers were listed in Table 2.

EZH2 binding peaks calling
Mapping of reads was done using bowtie (version 1.1.2) [38] on the hg19 genome. Bowtie's output was converted to BAM format using SAMtools (version 1.4) [39]. MACS2 (version 2.0.10) [40] was used to identify enriched ChIP regions of EZH2, and peak calling was employed to infer the actual binding sites from the positional distribution of sequenced DNA fragments. The q value (minimum FDR) cutoff to call significant regions was 0.01 as default setting.

Potential target gene annotations, GO and KEGG pathway analysis
Bioconductor package ChIPseeker (version 1.6.7) [41] within R program (version 3.2.4) was facilitated to detect annotation of the enriched peaks of EZH2 binding sites. For potential mRNA targets, binding peaks were annotated by "TxDb.Hsapiens.UCSC.hg19.knownGene" package while for potential lncRNA targets, binding peaks were annotated by long noncoding RNA GTF file downloaded from GENECODE database (http://www. gencodegenes.org/) (Release 24). Transcription start site (TSS) region, by default TSS is defined from -3kb to +3kb.
Gene Ontology (GO) enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis were conducted using clusterProfiler package (version 2.4.3) [42]. Annotated peaks were further analyzed for differences of dynamic variation of EZH2 binding sites using ChIPseeker.

Heatmap of ChIP sequencing signal of EZH2 binding sites
SeqMINER-1.3.3e, an integrated user-friendly platform that allows addressing central questions in the ChIP sequencing analysis work flow [43], was used to generate the heatmap of ChIP sequencing signal of EZH2 binding sites and conduct K-means clustering analysis. This fine-scale analysis highlights differential binding sites of individual TSSs and flanking regions, as well as differential relations to gene activity.

Co-expression analysis of EZH2 in TCGA data
Level 3 mRNA sequencing data were downloaded from TCGA and Spearman correlation test was preformed between the RSEM of EZH2 and other coding genes. Co-expression analysis between EZH2 and lncRNAs were also applied using TCGA data from Co-LncRNA database (http://www.bio-bigdata.com/Co-LncRNA/). Spearman correlation coefficients in nine tumor types (LAML was not available) were obtained, NA values were detected and replaced as zero due to the relatively low expression and cell specificity of lncRNAs in different cancer types. Top ranked co-expressed mRNAs and lncRNAs were defined as the maximum of row sum of coefficients in ten or nine cancer types, respectively. Heatmaps of coefficients between EZH2 and mRNAs and lncRNAs were also generated, respectively.