Cross-cell DNA methylation annotation and analysis for pan-cancer study

Abstracted/indexed in Academic Search Complete, Asia Journals Online, Bangladesh Journals Online, Biological Abstracts, BIOSIS Previews, CAB Abstracts, Current Abstracts, Directory of Open Access Journals, EMBASE/Excerpta Medica, Google Scholar, HINARI (WHO), International Pharmaceutical Abstracts, Open J-gate, Science Citation Index Expanded, SCOPUS and Social Sciences Citation Index; ISSN: 1991-0088


Introduction
Pan-cancer studies can uncover many of the cell- and tissue-specific genomic loci and regions that carry underlying biological functions (Kristensen et al., 2014; Leiserson et al., 2015; The Cancer Genome Atlas Research Network et al., 2013; Witte et al., 2014). They also provide meaningful insights through genome-wide, cross-cell analysis and annotation.
To date, however, biomedical researchers and clinicians have no systematic reference source describing the functional association between DNA methylation and transcriptional regulation that could guide wet-lab experiment design and post-experiment validation. Such a resource is therefore essential for biologists and biomedical researchers seeking to improve the outcomes and efficiency of their research (Bock and Lengauer, 2008; Roadmap Epigenomics Consortium et al., 2015).
Here, we used our online curated reference source, the DNA Methylation Annotation Knowledgebase (DMAK), to implement cross-cell analysis for a pan-cancer study. The knowledgebase provides multiple ready-to-use analysis results and annotation information for pan-cancer interrogation and cross-validation.
We deposited the curated knowledgebase and the related analysis results on GitHub, where they can be downloaded and used free of charge. Appropriate citation is requested for any use, reanalysis or refinement.
As depicted in Figure 1, the first level of DMAK is the curation of raw data from the ENCODE Consortium portal. For the case study in this work, we focused on cross-cell DNA methylation profiles to detect differentially-methylated features and patterns in the breast cancer cell type T-47D. This level also includes a summary of the analysis procedure and the fundamental functions discussed in the following sections.
The second level focuses on integrative analysis of the curated DNA methylation data in RRBS format (Blattler et al., 2014; Ziller et al., 2013). We annotated the methylated CpG sites, identified differentially-methylated regions (DMRs), and classified the hyper- and hypomethylated regions, or differential DMR candidates (Kemp Christopher et al., 2014). The detailed analysis procedure and results are given in the following section.
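RRBS methylation calls report, for each CpG, the read coverage and the numbers of Cs and Ts (Figure 2), from which a per-site methylation level is derived before any differential analysis. A minimal sketch of that step is shown below, assuming a simple record layout and a hypothetical minimum coverage of 10 reads; neither the field names nor the coverage cutoff come from the DMAK pipeline.

```python
from dataclasses import dataclass


@dataclass
class CpGSite:
    """One CpG position from an RRBS methylation call (hypothetical layout)."""
    chrom: str
    pos: int
    num_cs: int  # reads reading C after bisulfite conversion (methylated)
    num_ts: int  # reads reading T after bisulfite conversion (unmethylated)

    @property
    def coverage(self) -> int:
        return self.num_cs + self.num_ts

    @property
    def methylation_pct(self) -> float:
        # Percent methylation = Cs / (Cs + Ts) * 100.
        # Call only after coverage filtering so coverage is never zero.
        return 100.0 * self.num_cs / self.coverage


def filter_by_coverage(sites, min_coverage=10):
    """Keep sites with enough reads; min_coverage=10 is an assumed cutoff."""
    return [s for s in sites if s.coverage >= min_coverage]


sites = [
    CpGSite("chr1", 10_468, num_cs=18, num_ts=2),
    CpGSite("chr1", 10_471, num_cs=3, num_ts=1),  # only 4 reads, dropped
    CpGSite("chr1", 10_484, num_cs=5, num_ts=15),
]
for s in filter_by_coverage(sites):
    print(s.chrom, s.pos, s.coverage, round(s.methylation_pct, 1))
```

Filtering on coverage first keeps low-depth sites, whose percentages are dominated by sampling noise, out of the downstream differential tests.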
The third level mainly covers visualization and function analysis of the annotated results, including the functional association networks for tumor suppressor genes identified from the hyper- and hypo-DMRs detected in the analysis above.
We curated the information and constructed the knowledgebase mainly from ENCODE Consortium portal data, together with other commonly used tools and our own scripts and programs.

Annotation and Analysis Procedure in DMAK
This section discusses the functions and analysis procedure of DMAK, covering the fundamental functions of the DMAK reference source, listed as follows.
3) Analysis and annotation results for the significant differentially-methylated CpG sites (SDMC) with reference to one cell type; here we selected T-47D as the case study. The results are further filtered with a stricter methylation difference threshold (at least 25% methylation difference between the paired groups).
The SDMC list contains 106,252 DMCs (Akalin et al., 2012a; Akalin et al., 2012b), together with the related statistical p-values and adjusted q-values, as shown in Figure 4.
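The SDMC filter combines a significance cutoff with the 25% methylation-difference threshold. The sketch below is illustrative only: the source does not state which multiple-testing correction produced the q-values or which q-value cutoff was applied to individual CpGs, so Benjamini-Hochberg adjustment and `max_q=0.01` are stand-in assumptions, and the input tuples are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted q-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone in p.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def select_dmcs(records, min_abs_diff=25.0, max_q=0.01):
    """Keep CpG sites passing both the q-value and difference cutoffs.

    records: (site_id, meth_diff_pct, p_value) tuples, where meth_diff_pct is
    the methylation difference between the paired groups in percentage points.
    max_q=0.01 is an assumption; only the 25% difference is from the source.
    """
    qvalues = benjamini_hochberg([p for _, _, p in records])
    return [
        (site, diff, q)
        for (site, diff, _), q in zip(records, qvalues)
        if abs(diff) >= min_abs_diff and q <= max_q
    ]


records = [
    ("cpg1", 40.0, 1e-4),
    ("cpg2", -30.0, 2e-3),
    ("cpg3", 10.0, 1e-5),  # significant, but difference below 25
    ("cpg4", 50.0, 0.2),   # large difference, but not significant
]
for site, diff, q in select_dmcs(records):
    print(site, diff, f"{q:.2e}")
```

Filtering on the absolute difference keeps both hyper- and hypomethylated sites; the sign is only used later when regions are classified.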
4) Statistical analysis and annotation results for the differentially-methylated regions (DMRs) with reference to one cell type; for consistency, we again selected T-47D as the case. We identified 16,277 DMR candidates from all the DMCs, using thresholds on the adjusted q-value and on methylation difference.

Visualization and Function Analysis for the Annotation Results
This section discusses the visualization and function analysis of the annotated results. We sought to determine whether there is any functional association (Szklarczyk et al., 2015) between the genes identified from hyper-DMRs and hypo-DMRs that could qualitatively explain their differential expression, especially for genes that are tumor suppressor genes (TSGs) (Bedi et al., 2014; Blattler et al., 2014; Zhao et al., 2013).
We therefore annotated the genes identified in the DMRs with TSG information, filtered out those from unknown sources, and constructed separate TSG functional association networks for the hyper-DMRs and hypo-DMRs.
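The TSG annotation and network construction step can be sketched as follows. STRING (Szklarczyk et al., 2015) reports combined association scores between proteins; here the edge-list format, the TSG catalog with its evidence-source labels, and the 0.4 score cutoff (STRING's default "medium confidence" level) are assumptions, and the gene names are placeholders rather than results from this study.

```python
def build_tsg_network(dmr_genes, tsg_catalog, associations, min_score=0.4):
    """Build an undirected TSG association network for one DMR direction.

    dmr_genes:    gene symbols annotated to hyper- or hypo-DMRs
    tsg_catalog:  hypothetical mapping gene -> TSG evidence source
    associations: (gene_a, gene_b, combined_score) tuples, scores in [0, 1]
    min_score:    assumed cutoff, not taken from the DMAK pipeline
    """
    # Keep genes that are annotated TSGs and whose source is known.
    tsgs = {
        g for g in dmr_genes
        if g in tsg_catalog and tsg_catalog[g] != "unknown"
    }
    adjacency = {g: set() for g in tsgs}
    for a, b, score in associations:
        # Only edges between two retained TSGs at or above the cutoff.
        if a in tsgs and b in tsgs and a != b and score >= min_score:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


# Placeholder inputs for illustration only.
hyper_genes = ["G1", "G2", "G3", "G4", "G5"]
catalog = {"G1": "curated", "G2": "curated", "G3": "unknown", "G4": "curated"}
edges = [("G1", "G2", 0.9), ("G2", "G3", 0.95), ("G1", "G4", 0.3), ("G4", "G5", 0.8)]
net = build_tsg_network(hyper_genes, catalog, edges)
print(net)
```

The same function would be called once with the hyper-DMR genes and once with the hypo-DMR genes to obtain the two networks.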
Because of space limitations, Figure 8 illustrates the functional association structures of 20 TSGs each for the hyper- and hypo-DMRs. To validate the fidelity of the analysis results, the 20 TSGs in each case were randomly selected from the TSG list.
Interestingly, we found that most of these TSG nodes are functionally associated and form clusters. In the hyper-DMR case (Figure 9A), only 4 of the 20 TSGs are disconnected from the TSG cluster; the hypo-DMR network (Figure 9B) is comparatively loosely connected, with 10 of the 20 TSGs not linked to the TSG cluster.
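The "dissociated" counts above can be quantified by sampling 20 TSGs, taking the subnetwork induced on them, and counting nodes outside its largest connected component. Treating the largest connected component as "the TSG cluster" is our assumption, as are the function names and the sampling seed; the toy network in the test is hypothetical.

```python
import random


def connected_components(adjacency):
    """Connected components of an undirected graph given as dict[node, set]."""
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            comp.add(node)
            for nb in adjacency[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(comp)
    return components


def sample_and_count_dissociated(adjacency, k=20, seed=0):
    """Sample k TSGs; return (nodes outside the largest cluster, k)."""
    rng = random.Random(seed)  # assumed seed, for reproducibility only
    chosen = set(rng.sample(sorted(adjacency), k))
    # Induced subnetwork: keep only edges between sampled TSGs.
    sub = {g: adjacency[g] & chosen for g in chosen}
    largest = max(connected_components(sub), key=len)
    return len(chosen - largest), len(chosen)
```

Under this reading, Figure 9A would correspond to a result of (4, 20) and Figure 9B to (10, 20). Note that `random.sample` raises `ValueError` if fewer than `k` TSGs are available.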
The complete TSG functional association network structures for the hyper-DMRs and hypo-DMRs are provided in the DMAK package deposited on GitHub. The TSGs in these structures are highly connected through physical and functional associations in the DMRs of our T-47D breast cancer case.
Figure 10 depicts the Gene Ontology (GO) analysis results (Sherman et al., 2007) for the two functional protein association networks inferred for the TSGs. The upper panel (A) is for the hyper-DMRs; its terms highlight processes such as transcription regulation, differentiation, mutation, activator, pathway in cancer and tumor suppressor, which are closely related to the consequences of tumor suppressor gene hypermethylation. The lower panel (B) is for the hypo-DMRs; its GO terms include positive regulation of transcription and gene expression, and differentiation, which to a certain extent supports their connection to the consequences of TSG hypomethylation.
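GO enrichment tools such as DAVID (Sherman et al., 2007) test whether a term is over-represented in a gene list relative to a background. A minimal sketch of the one-sided hypergeometric (Fisher's exact) test for a single term is shown below; the optional `ease` flag applies the EASE-score penalty of removing one gene from the list hits, which is how DAVID's modified Fisher test is usually described. The function and argument names are hypothetical, and in practice p-values across many terms would also be corrected for multiple testing.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(N=population, K=successes, n=draws)."""
    upper = min(successes, draws)
    total = comb(population, draws)
    # comb() returns 0 when the lower argument exceeds the upper one.
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / total


def go_term_pvalue(list_hits, list_size, term_size, background_size, ease=False):
    """One-sided enrichment p-value of one GO term for a gene list.

    list_hits:       genes in the list annotated with the term
    list_size:       genes in the list (e.g. TSGs from one DMR direction)
    term_size:       background genes annotated with the term
    background_size: all annotated background genes
    ease:            if True, subtract one list hit (EASE-style penalty)
    """
    k = max(list_hits - 1, 0) if ease else list_hits
    return hypergeom_sf(k, background_size, term_size, list_size)


print(go_term_pvalue(list_hits=8, list_size=20, term_size=300, background_size=20000))
```

The EASE penalty makes terms supported by very few list genes less likely to appear significant, which is a conservative choice for small networks such as the 20-TSG examples.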

Conclusion
Our cross-cell DNA methylation annotation and analysis provide a systematic knowledgebase for pan-cancer study. It contains curated, ready-to-use reference results for sharing and rapid reanalysis.
The first level of the knowledgebase covers preprocessing of raw data collected from the ENCODE Consortium portal. The second level covers annotation and function analysis; in this case study, we focused on DNA methylation in the breast cancer cell type T-47D, annotated and identified the differentially-methylated sites and regions, and further identified the tumor suppressor genes within those regions. The third level covers visualization and validation. We constructed functional association networks for the identified tumor suppressor genes and annotated the networks with Gene Ontology information, which provides statistically significant evidence for the hypermethylated and hypomethylated processes in the breast cancer context.
Our work provides a versatile and comprehensive platform for biomedical researchers, especially genome-wide data analysts, to interrogate and validate their hypotheses efficiently and consistently.
In future, further annotation and analysis results from pan-cancer analysis will be added to the knowledgebase; in particular, we aim to provide an interactive environment in which biomedical researchers can query and use the knowledgebase.
Figure 1: Schematic illustration of the DMAK structure. The first level contains ENCODE data preprocessing (namely, cell curation and data format processing); the second level includes integrative analysis of the ENCODE data, namely annotation of DNA methylation CpGs and identification of differentially-methylated CpGs and regions; the third level covers result visualization and further multi-scale interrogation of biological functions.

For item 4), the 16,277 DMR candidates were identified with an adjusted q-value ≤0.01, a CpG base methylation difference cutoff of 25%, and a DMR mean methylation difference cutoff of 20%. Of these candidates, 8,936 are hypermethylated and 7,341 are hypomethylated. With stricter thresholds, namely an adjusted q-value ≤0.001 and a differentially-methylated CpG base count ≥5, we further detected 7,537 significant DMRs (Sig-DMRs), of which 3,512 are significantly hypermethylated (Sig-Hyper-DMRs) and 4,025 are significantly hypomethylated (Sig-Hypo-DMRs). The output format is shown in Figure 5.
5) Statistical analysis and annotation results for the significantly hypermethylated DMRs (Sig-Hyper-DMRs) with reference to the T-47D cell type; the output format is shown in Figure 6.
6) Statistical analysis and annotation results for the significantly hypomethylated DMRs (Sig-Hypo-DMRs) with reference to the T-47D cell type (Figure 7).
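The two DMR tiers described for item 4) can be expressed as simple threshold rules. This is a sketch under stated assumptions: the `DMR` record and `classify_dmrs` function are hypothetical, a positive mean difference is taken to mean hypermethylation in T-47D, and Sig-DMRs are drawn from the candidate set (consistent with 7,537 being a subset of 16,277). The thresholds themselves (adjusted q-value ≤0.01 and mean difference 20 for candidates; q-value ≤0.001 and at least 5 differentially-methylated CpG bases for Sig-DMRs) are the ones reported above.

```python
from dataclasses import dataclass


@dataclass
class DMR:
    """Hypothetical DMR record; field names are not from DMAK."""
    region: str       # e.g. "chr1:1000-2000"
    mean_diff: float  # mean methylation difference, percentage points
                      # (assumed sign: positive = hypermethylated in T-47D)
    q_value: float
    n_dmcs: int       # CpG bases passing the per-base difference cutoff


def classify_dmrs(dmrs, cand_q=0.01, cand_diff=20.0, sig_q=0.001, sig_min_dmcs=5):
    tiers = {"hyper": [], "hypo": [], "sig_hyper": [], "sig_hypo": []}
    for d in dmrs:
        if d.q_value > cand_q or abs(d.mean_diff) < cand_diff:
            continue  # not a DMR candidate
        direction = "hyper" if d.mean_diff > 0 else "hypo"
        tiers[direction].append(d.region)
        # Assumed: significant DMRs are a stricter subset of the candidates.
        if d.q_value <= sig_q and d.n_dmcs >= sig_min_dmcs:
            tiers["sig_" + direction].append(d.region)
    return tiers


dmrs = [
    DMR("chr1:1000-2000", 35.0, 1e-4, 8),  # candidate and significant
    DMR("chr2:500-900", -28.0, 5e-3, 6),   # candidate, q too high for Sig
    DMR("chr3:10-400", -45.0, 2e-4, 3),    # candidate, too few DMCs for Sig
    DMR("chr4:70-300", 12.0, 1e-4, 9),     # difference below 20, dropped
    DMR("chrX:5-60", -22.0, 5e-4, 7),      # candidate and significant
]
print({name: len(regions) for name, regions in classify_dmrs(dmrs).items()})
# {'hyper': 1, 'hypo': 3, 'sig_hyper': 1, 'sig_hypo': 1}
```

If these rules match the authors' pipeline, applying them to the full candidate table would yield lists of 8,936, 7,341, 3,512 and 4,025 regions respectively.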

Figure 2: Schematic illustration of the statistical information derived from RRBS sequencing: read coverage and the numbers of Cs and Ts.

Figure 6: Schematic illustration of the identified hyper-DMRs with reference to the T-47D cell type. The mean methylation difference of each DMR is annotated with a red bar.