Genome-Wide Open Chromatin Methylome Profiles in Colorectal Cancer

The methylome of open chromatins was investigated in colorectal cancer (CRC) to explore cancer-specific methylation and potential biomarkers. The epigenome-wide methylome of open chromatins was studied in CRC tissues using the Infinium DNA MethylationEPIC assay. Differentially methylated regions were identified using the ChAMP Bioconductor package. Our stringent analysis led to the discovery of 2187 significant differentially methylated open chromatins in CRCs. Hypomethylated probes outnumbered hypermethylated probes, and the trend was similar across all chromosomes. The largest number of hyper- and hypomethylated probes in open chromatin was on chromosome 1. Our unsupervised hierarchical clustering analysis showed that 40 significant differentially methylated open chromatins were able to segregate CRC from normal colonic tissues. Receiver operating characteristic analyses of the top 40 probes revealed several significant, highly discriminative, specific and sensitive probes such as OPLAH cg26256223, EYA4 cg01328892, and CCNA1 cg11513637, among others. OPLAH cg26256223 hypermethylation is associated with reduced gene expression in CRC. This study reports many open chromatin loci with novel differential methylation statuses, some of which have potential as candidate markers for diagnostic purposes.


Introduction
Cancer is a persistent global burden, with more than 18.1 million new cases estimated and the incidence projected to increase in the next decade [1]. Of these, colorectal cancer (CRC) contributes around 1.1 million (6%) of total cases and is ranked as the fourth most common cancer in the world [1]. CRC can arise through the accumulation of multiple genetic and epigenetic changes. Somatic mutations in APC, BRAF, KRAS, PIK3CA and TP53 [2,3] are frequently observed in CRC and are considered drivers of CRC formation. Despite many kinds of research based on their personal adenoma or cancer history [17]. In addition, to the best of our knowledge, very few studies have investigated the global DNA methylation profile using this beadchip with an emphasis on the open chromatin areas in CRC. Therefore, we aim to provide readers with a comprehensive dataset of DNA methylation of open chromatins in CRC.

Clinical Specimens
The specimens from 51 CRC patients diagnosed at the UKM Medical Centre (UKMMC) were retrieved from the UMBI-UKMMC Biobank (23 pairs of tumour and adjacent normal fresh frozen colon tissues and 28 CRC tissues). The specimens were collected according to the procedures approved by the UKM Research Ethics Committee, and all patients gave informed consent for their specimens to be stored and used for future research. The tissues were dissected, snap-frozen and stored in liquid nitrogen. All samples were cryosectioned and stained with haematoxylin and eosin, and the tumour and normal cell content was assessed by a pathologist. Only tumour samples with at least 80% cancerous cells and normal adjacent colon tissues with less than 20% necrosis were selected. Nucleic acids were extracted from the tissues using the Allprep DNA/RNA/miRNA Universal Kit (Qiagen, Hilden, Germany) according to the manufacturer's recommendations. The integrity of DNA and RNA was assessed using agarose gel electrophoresis and the RNA 6000 Kit (Agilent Technologies, Santa Clara, CA, USA). The quantity and purity of both DNA and RNA were assessed using the NanoDrop 2000c Spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA).

Bisulfite Conversion
Five hundred nanograms (500 ng) of DNA was chemically modified to convert all unmethylated cytosines to uracil using the EZ DNA Methylation-Gold Kit (Zymo Research, Irvine, CA, USA) according to the manufacturer's protocol. The effectiveness of bisulfite conversion was determined using the Universal Methylated DNA Standard & Control Primers (Zymo Research, Irvine, CA, USA) according to the manufacturer's protocol.
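
The principle behind bisulfite conversion can be illustrated in silico. The following is a minimal sketch, not part of the laboratory protocol: the function name `bisulfite_convert` and the toy sequence are hypothetical. Unmethylated cytosines are deaminated to uracil, which is read as thymine after PCR, while methylated cytosines are protected and remain cytosine.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite treatment followed by PCR on one DNA strand.

    seq: DNA sequence (string of A, C, G, T).
    methylated_positions: 0-based indices of methylated cytosines.
    Unmethylated C becomes U and is read as T after amplification;
    methylated C is protected and stays C.
    """
    protected = set(methylated_positions)
    converted = []
    for index, base in enumerate(seq.upper()):
        if base == "C" and index not in protected:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


# Toy example (hypothetical sequence): only the C at index 1 is methylated.
print(bisulfite_convert("ACGTCCGA", methylated_positions=[1]))  # ACGTTTGA
```

Comparing the converted read with the reference sequence reveals which cytosines were methylated, which is the basis of both the microarray and the bisulfite sequencing readouts.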

Methylation Microarray
The Infinium DNA MethylationEPIC assay was performed on 12 patients according to the manufacturer's specifications and the beadchips were scanned using iScan (Illumina, Inc., San Diego, CA, USA). The Illumina Infinium DNA MethylationEPIC assay examines the DNA methylation status of >850,000 CpG dinucleotides distributed over the whole genome, including >220,000 open chromatins.

Microarray Data Analysis
The raw idat files obtained from the methylation microarray were analysed using GenomeStudio V1.9.0 and the ChAMP Bioconductor package [18]. Filters were applied to all datasets: CpG sites with a detection p-value greater than 0.01 in one or more samples were excluded from further analysis, and probes on sex chromosomes were also removed. The raw intensities were SWAN-normalized to reduce the technical biases inherent in the probe design before statistical analysis [19]. In addition, to remove variation related to the beadchip and/or position, ComBat normalization was implemented [20]. Once normalization had been performed, β-values were extracted and subjected to further statistical analysis. The heatmap was generated using the online tool Morpheus (https://software.broadinstitute.org/morpheus/, Cambridge, MA, USA).
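
The probe filtering rules described above can be sketched as follows. This is an illustrative sketch only, not the ChAMP implementation: the function `filter_probes`, the probe labels and the chromosome naming ("chrX", "chrY") are assumptions made for the example.

```python
def filter_probes(detection_p, probe_chrom, p_cutoff=0.01,
                  excluded_chroms=("chrX", "chrY")):
    """Return the probe IDs that pass both filters.

    detection_p: dict mapping probe ID -> list of per-sample detection p-values.
    probe_chrom: dict mapping probe ID -> chromosome name.
    A probe is dropped if its detection p-value exceeds p_cutoff in one or
    more samples, or if it lies on a sex chromosome.
    """
    kept = []
    for probe, pvalues in detection_p.items():
        if any(p > p_cutoff for p in pvalues):
            continue  # failed detection in at least one sample
        if probe_chrom.get(probe) in excluded_chroms:
            continue  # sex-chromosome probe
        kept.append(probe)
    return kept


# Hypothetical probes and p-values, for illustration only.
detection_p = {
    "probe_a": [0.001, 0.002, 0.0005],
    "probe_b": [0.001, 0.03, 0.002],   # fails in the second sample
    "probe_c": [0.0001, 0.0001, 0.0001],
}
probe_chrom = {"probe_a": "chr1", "probe_b": "chr7", "probe_c": "chrX"}
print(filter_probes(detection_p, probe_chrom))  # ['probe_a']
```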

Gene Expression Analysis
The expression of OPLAH was validated in 51 CRC patients using Thunderbird SYBR qPCR Mix (Toyobo Co. Ltd., Osaka, Japan) on the CFX96™ Real-Time PCR Detection System (Bio-Rad, Hercules, CA, USA). GAPDH served as the reference gene. All primers were obtained from Integrated DNA Technologies (Hs.PT.58.22507981 for OPLAH and Hs.PT.39a.22214836 for GAPDH). Fold change of expression was calculated using the 2^(−ΔΔCT) method [21].
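
For readers unfamiliar with the 2^(−ΔΔCT) calculation, a minimal sketch is given below. The CT values are invented for illustration, and the function names are hypothetical. Reporting downregulation as a negative reciprocal (for example, 0.5 as −2.0) is an assumed convention, not one stated in the methods.

```python
def fold_change_ddct(ct_target_tumour, ct_ref_tumour,
                     ct_target_normal, ct_ref_normal):
    """Relative expression of the target gene (tumour vs normal) by 2^(-ddCT)."""
    delta_tumour = ct_target_tumour - ct_ref_tumour    # dCT in tumour
    delta_normal = ct_target_normal - ct_ref_normal    # dCT in normal
    delta_delta = delta_tumour - delta_normal          # ddCT
    return 2 ** (-delta_delta)


def signed_fold_change(fold_change):
    """Express ratios below 1 as negative reciprocals (assumed convention)."""
    return fold_change if fold_change >= 1 else -1 / fold_change


# Invented CT values: the target amplifies one cycle later in tumour.
fc = fold_change_ddct(ct_target_tumour=26, ct_ref_tumour=18,
                      ct_target_normal=25, ct_ref_normal=18)
print(fc, signed_fold_change(fc))  # 0.5 -2.0
```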

Bisulfite Sequencing Validation
The methylation of OPLAH cg26256223 was validated in an additional 27 CRC patients. Due to the limited DNA available, we were unable to perform the validation in the same samples subjected to microarray. Methyl Primer Express Software v1.0 (Thermo Scientific, Massachusetts, USA) was used for primer design; the forward primer sequence is 5′-CSRTTTYGGGGTTAAATTAAA-3′ and the reverse primer sequence is 5′-CCCCTAATCTCTCTAAACTCCTC-3′. PCR amplification was performed using 30 ng/µL bisulfite-converted DNA and HotStar Taq Master Mix (Qiagen, Hilden, Germany). The amplified PCR products were purified, cloned and sequenced. The fasta files were analysed using BioEdit v7.0.5.3 [22] and BISMA [23].

In Silico Validation of OPLAH Methylation
The methylation of OPLAH cg26256223 was validated based on in silico analysis of TCGA COAD dataset [3] using Wanderer [24].

Statistical Analysis
Differentially methylated CpG sites were determined using t statistics from the limma Bioconductor package [25,26]. We further applied the filtering criteria of adjusted p-value < 0.05 and |∆β| ≥ 0.3 to identify significant differentially methylated open chromatins. To verify the accuracy and specificity of the differentially methylated probes, their discriminative performance was assessed by receiver operating characteristic (ROC) curves, and the area under the ROC curve (AUC), sensitivity, and specificity at the optimal cut-offs were calculated. ROC analysis was performed using GraphPad Prism V8 (GraphPad Software, Inc., San Diego, CA, USA).
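
The ROC quantities named above can be sketched in a few lines. This is not the GraphPad Prism procedure: the β-values are invented, the functions are hypothetical, the probe is assumed to be hypermethylated in tumour (higher β means "tumour"), and Youden's J is used here as one common, assumed definition of the optimal cut-off.

```python
def roc_auc(tumour, normal):
    """AUC as the probability that a tumour beta exceeds a normal beta.

    Ties count as half a win (Mann-Whitney formulation of the AUC).
    """
    wins = 0.0
    for t in tumour:
        for n in normal:
            if t > n:
                wins += 1.0
            elif t == n:
                wins += 0.5
    return wins / (len(tumour) * len(normal))


def best_cutoff(tumour, normal):
    """Cut-off maximising Youden's J = sensitivity + specificity - 1.

    A sample is called tumour when beta >= cut-off. The first cut-off
    reaching the maximum J is returned with its sensitivity and specificity.
    """
    best = None
    for cutoff in sorted(set(tumour) | set(normal)):
        sensitivity = sum(t >= cutoff for t in tumour) / len(tumour)
        specificity = sum(n < cutoff for n in normal) / len(normal)
        j = sensitivity + specificity - 1
        if best is None or j > best[0]:
            best = (j, cutoff, sensitivity, specificity)
    return best[1], best[2], best[3]


# Invented beta-values for one probe.
tumour_beta = [0.62, 0.55, 0.71, 0.48]
normal_beta = [0.12, 0.20, 0.50, 0.15]
print(roc_auc(tumour_beta, normal_beta))      # 0.9375
print(best_cutoff(tumour_beta, normal_beta))  # (0.48, 1.0, 0.75)
```

For a hypomethylated probe the comparison direction would be reversed, so that lower β-values indicate tumour.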

Demography
The majority (n = 38) of the patients were between 60 and 70 years old, while the remainder (n = 13) were above 70. Thirty-five patients were at stage T3, followed by stage T2 (n = 6), T4 (n = 6) and T1 (n = 4). There was a balanced distribution of male and female patients. All of the tumours were well-differentiated adenocarcinomas.

The Output from Infinium DNA MethylationEPIC Assay
The genome-wide CpG methylation profiles of 12 pairs of CRC tissue and adjacent normal mucosa were generated using the Infinium DNA MethylationEPIC assay. The methylation level at each locus was assessed using β-values generated by the Illumina GenomeStudio software, which are based on the intensities of the methylated and unmethylated probes. Prior to downstream analysis, the Detection Score filter was used, leaving only loci with significantly higher mean signal intensities from multiple probes for a given CpG locus than those of the negative control in the same set of chip data (detection p-value < 0.05). The number of CpG loci with detection p-value < 0.05 ranged from 863,728 to 865,500, and the call rates from 99.6% to 99.8%. Next, the control information was retrieved from the built-in Control Dashboard, and all of our samples passed the quality controls, which include Staining Controls, Extension Controls, Hybridisation Controls, Target Removal Controls, Bisulfite Conversion Controls, Specificity Controls, Non-Polymorphic Controls, Negative Controls and Restoration Control. No samples were identified as outliers, suggesting uniform amplification and hybridization conditions for all samples. Further hierarchical clustering analysis of the raw data successfully grouped the samples into cancer and normal groups. The raw methylation microarray data can be obtained from GEO under accession GSE149282.
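
As an illustration of how a β-value relates to probe intensities, the sketch below uses the commonly cited Illumina definition, β = M / (M + U + offset), with a stabilising offset of 100. The text above does not state the exact formula or offset used by GenomeStudio in this study, so both are assumptions here, and the intensities are invented.

```python
def beta_value(methylated, unmethylated, offset=100):
    """Beta-value from methylated (M) and unmethylated (U) signal intensities.

    Negative background-corrected intensities are clipped to zero; the
    offset (assumed to be 100) stabilises beta when both signals are low.
    """
    m = max(methylated, 0)
    u = max(unmethylated, 0)
    return m / (m + u + offset)


# Invented intensities: a mostly methylated locus and a mostly unmethylated one.
print(round(beta_value(4500, 400), 3))  # 0.9
print(round(beta_value(300, 5600), 3))  # 0.05
```

β-values therefore range from 0 (unmethylated) towards 1 (fully methylated), which is why differences between groups are expressed as ∆β.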

Locations of Differentially Methylated Open Chromatins
We compared the methylation status of 12 CRC tissue samples with that of the 12 adjacent cancer-free colonic tissue samples. Only differentially methylated regions with absolute delta β-values of at least 0.3 at adjusted p-value < 0.05 were reported. From the list of differentially methylated probes, we further filtered for those located in the open chromatin regions. Here, we found that the CRC-associated differentially methylated probes are located at 2187 open chromatin regions. The distribution of the significant differentially methylated open chromatins across genomic and gene-related regions differed between hyper- and hypomethylated probes. Overall, 517 probe sites were hypermethylated and 1670 probe sites were hypomethylated. Figure 1A shows that the largest portion of hypomethylated sites (57%) was in the opensea, with smaller portions in the other categories (shore 24%, shelf 14% and island 5%). In contrast, more than half (65%) of the significantly hypermethylated open chromatins were in the island, followed by the shore (32%), opensea (2%) and shelf (1%) (Figure 1B). Meanwhile, Figure 1C shows that the largest portion of the significantly hypomethylated loci was in the gene body (39%), closely followed by intergenic regions (36%), TSS1500 (10%), 5′ UTR (7%), 3′ UTR (3%), TSS200 (3%) and 1st exon (2%). In contrast, more than a quarter (28%) of the significantly hypermethylated loci were not associated with any genes, while the rest were mainly located in TSS1500 (24%), gene body (20%) and, to a lesser extent, in other gene categories including 5′ UTR, 1st exon, TSS200 and 3′ UTR (Figure 1D). Chromosome-wise, chromosome 1 has the highest number of hyper- and hypomethylated loci in open chromatin (47 and 169, respectively), followed by chromosome 7 (46 and 143, respectively) (Figure 1E).
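
The reporting thresholds used in this section (adjusted p-value < 0.05 and an absolute delta β-value of at least 0.3) can be expressed as a simple classification rule. The sketch below is illustrative only: the function name and the input values are hypothetical, and the adjusted p-values are assumed to have been computed beforehand by the statistical package.

```python
def classify_probe(mean_beta_tumour, mean_beta_normal, adjusted_p,
                   delta_cutoff=0.3, p_cutoff=0.05):
    """Label a probe using delta-beta (tumour minus normal) and adjusted p-value."""
    delta_beta = mean_beta_tumour - mean_beta_normal
    if adjusted_p >= p_cutoff or abs(delta_beta) < delta_cutoff:
        return "not significant"
    return "hypermethylated" if delta_beta > 0 else "hypomethylated"


# Hypothetical probes: (mean beta in tumour, mean beta in normal, adjusted p).
examples = {
    "probe_a": (0.70, 0.20, 0.001),  # large gain of methylation
    "probe_b": (0.25, 0.80, 0.010),  # large loss of methylation
    "probe_c": (0.50, 0.40, 0.001),  # significant but small difference
    "probe_d": (0.90, 0.10, 0.200),  # large difference, not significant
}
for probe, values in examples.items():
    print(probe, classify_probe(*values))
```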

Differentially Methylated Open Chromatins
Significant methylation differences for open chromatin regions are illustrated in the heat map in Figure 2. Unsupervised hierarchical clustering based on Euclidean distance resulted in a distinct separation between cancer and normal tissues (Figure 2). The 40 topmost significant differentially methylated probes in open chromatins are illustrated in Table 1 while Figure

Validation of OPLAH cg26256223 Methylation and Global Expression in CRC
The methylation pattern of OPLAH across 27 CRC samples, determined by bisulfite sequencing, was in concordance with the methylation profiling. The validation showed that OPLAH exhibited an average methylation of 51.9% in CRCs, which is close to the methylation value obtained from the microarray.
In order to investigate whether OPLAH hypermethylation, in particular the locus cg26256223, resulted in changes in gene expression, a qPCR analysis was performed. In line with our hypothesis, the expression of OPLAH was significantly reduced in CRC compared to normal tissues (p < 0.0001, fold change = −2.167) (Figure 6).

Figure 6.
Significant OPLAH downregulation was observed in the cancer tissues (CRC) compared to the normal colon. OPLAH cg26256223 hypermethylation was also validated in our validation samples.

In Silico Validation Using TCGA Dataset
We also performed in silico validation of OPLAH methylation to determine whether our finding agrees with the TCGA study. Fifteen CpG loci in OPLAH were found to be significantly hypermethylated in 302 CRCs versus 38 normal tissues, including our locus of interest (Figure 7).

Figure 7.
In silico validation of OPLAH methylation in the TCGA COAD dataset. The red arrow indicates the specific hypermethylated locus identified in our study, which is also found to be significantly hypermethylated in the tumour compared to the normal colon.

Discussion
Array-based analysis is a simple, practical, and cost-effective tool for genome-wide DNA methylation screening. The latest DNA methylation microarray, Infinium DNA MethylationEPIC, provides substantially greater genomic coverage than earlier arrays, permitting the identification of novel methylated CpG sites that have not been previously discovered. In this study, we focused on the genome-wide methylation patterns of open chromatins among CRC patients. Data analysis showed that there were more hypomethylated loci than hypermethylated loci, an observation supported by several other studies [10,14,16]. Our finding is further supported by a review [27] which postulates that hypomethylated genes include those involved in nucleosome and chromatin formation, the latter being the focus of our study. We also noticed that the hypomethylated sites were predominantly in the opensea, while hypermethylation was more common in the island. Since no published literature specifically focuses on the open chromatins using Infinium DNA MethylationEPIC in CRC, we are unable to verify this finding. Nevertheless, from an overall perspective, our finding is in line with Baharudin et al. [10] and Naumov et al. [28]. Openseas are the regions located more than 4 kilobases from CpG islands, where hypomethylation is the prevalent form of aberrant methylation [29]. CpG islands, on the other hand, are short stretches of CpG-rich sequence that are often aberrantly hypermethylated in cancers [30].
The largest portion of the significantly hypomethylated loci in this study was in the gene body, while more than a quarter (28%) of the significantly hypermethylated loci were in the intergenic regions and hence not associated with any genes. Our findings disagree with another study [28], which reported that the largest portion (30.6%) of the significantly hypomethylated CpG sites was in intergenic regions, while the largest portion of hypermethylated CpGs (25.9%) was in the 1st exon. DNA methylation in the intergenic regions has been shown to regulate microRNA expression [31] and stabilize the genome [32]. Furthermore, Hanley and colleagues discovered that intergenic methylated loci are enriched for transcription factor binding sites, particularly those of the AP-1 transcription factor family, which regulates important cellular functions including apoptosis, proliferation, and differentiation [33]. The only published study on CRC based on Infinium DNA MethylationEPIC [16] did not describe the analysis at the gene-region level; therefore, we were unable to make a comparison. Nevertheless, gene body hypomethylation has been reported in cancer and was shown to be linked with reduced transcription activity compared to normal cells [34-36].
We identified 2187 differentially methylated sites, of which 1443 (66%) were mapped to 1025 genes. The involvement of several of those genes in CRC has been previously reported, although the contribution of other genes to colorectal carcinogenesis remains a subject for further research. It is noteworthy that the hypermethylation profiles of OPLAH cg26256223, EYA4 cg01328892 and CCNA1 cg11513637, cg25264081 in this study were similar to the patterns reported elsewhere [28,37,38]. Luo and colleagues [37] compared the methylomes of normal colon mucosa, tubular adenomas and CRC, and reported hypermethylation of EYA4 cg01328892 in CRC and adenoma compared to normal. There is no evidence supporting the hypermethylation of cg01578017 in CRC. Conversely, CHST10 hypermethylation in CRC compared to normal has been reported in several studies [39,40], which disagrees with our finding that CHST10 cg18845236 was hypomethylated. It is unclear which specific loci those studies referred to. Additionally, the hypomethylation status of cg03683132, LY9 cg13904520, PDGFD cg18289710 and SH2D3C cg14582501 in CRC has not been reported.
OPLAH (human 5-oxoprolinase) is a gene involved in ATP hydrolysis, located on chromosome 8q24.3 [41]. Our preliminary finding suggested that the differentially methylated OPLAH cg26256223 had a significant effect on gene regulation, suggesting a possible contribution to CRC through transcriptome alteration. OPLAH cg26256223 has also been reported as hypermethylated in CRCs [28,38]. Despite the limited literature pertaining to OPLAH methylation in CRC, several patents have already been filed for its application in cancer detection (https://link.lens.org/id5DmKPvsRe, Canberra, Australia). Forty-six patents have been granted, with another 244 patents filed. The biggest applicant is the Mayo Foundation for Medical Education & Research, with a total of five patents granted and 17 patents applied for. One of their patents, US 10,370,726 B2, was granted for the use of OPLAH, among other genes, to detect CRC in individuals younger or older than 50 years old or in Lynch Syndrome patients [42]. Evidently, the diagnostic potential of OPLAH in CRC has already been recognized. Further studies on its role in disease prognosis, its association with treatment response and its druggable potential are warranted.
Barrow and colleagues [43] performed an epigenome-wide analysis of DNA methylation in CRC with different smoking statuses: 36 never smokers, 47 former smokers and 13 active smokers, as well as adjacent mucosa from 49 never smokers, 64 former smokers and 18 active smokers. The authors reported significant hypomethylation of four loci associated with the TNXB gene in tissue from active smokers. In our study, we identified 19 hypomethylated TNXB loci in CRC compared to the normal colon; however, the association with smoking status could not be established due to the lack of information. The TNXB (tenascin XB) gene encodes a tenascin, which exhibits an anti-adhesive effect [44]. It was first implicated in Ehlers-Danlos syndrome [45], but its role in malignancy has also been established in several cancers including nasopharyngeal carcinoma [46] and mesothelioma [47], most likely by promoting the epithelial-to-mesenchymal transition (EMT) through the activation of latent transforming growth factor-β [48]. More recently, TNXB was denoted as one of the triple-evidenced genes, displaying superior predictive ability in cancer diagnosis and prognosis [49].
The HRNBP3 (RNA binding fox-1 homolog 3) gene encodes an RNA-binding protein that regulates the alternative splicing of pre-mRNA. A meta-analysis revealed that the HRNBP3 gene was one of the most commonly hypomethylated genes in hepatocellular carcinoma [50,51]. In contrast with our finding, Hua et al. [38] reported modest hypermethylation of HRNBP3 at one locus from The Cancer Genome Atlas' rectal carcinoma study. Other studies of this gene in CRC are lacking; therefore, to our knowledge, we are the first to report HRNBP3 hypomethylation in CRC, with 14 loci in this gene hypomethylated. In line with its function as an RNA-binding protein, the majority of the hypomethylation was in the 5′ UTR, a region important for the regulation of translation [52].

Conclusions
Our work gives a detailed assessment of the DNA methylation pattern of open chromatins and reveals epigenetically regulated candidate genes in CRC carcinogenesis. Specifically, our results provide the first evidence of HRNBP3, cg03683132, LY9 cg13904520, PDGFD cg18289710 and SH2D3C cg14582501 hypomethylation in CRC. This is the first insight into the open chromatin methylation profile in Malaysian CRC patients. The new knowledge from this study can be utilized to further increase our understanding of CRC methylomics, particularly of the open chromatins. To minimize the effect of confounding factors, methylome studies should be performed on cancer and adjacent normal tissues collected from the same individual, as demonstrated in this study, instead of on cancer and normal tissues collected from different individuals.
However, our study is not without limitations. Although our sample size is small and functional studies are lacking, the hypo- and hypermethylation of the genes reported in this study are relevant to carcinogenesis, as reported in several studies. In future, the association with survival and other clinicopathological data should be examined. Given the heterogeneity of bulk tissue methylomics, single-cell epigenomics should be performed to obtain higher-resolution, cancer-specific methylation changes in order to better understand this cancer at the cellular level. In conclusion, the prognostic and diagnostic roles of the differentially methylated open chromatins warrant future investigation.