Research Article
DEEPAligner: Deep encoding of pathways to align epigenetic signatures
Introduction
Pathway analysis has become the method of choice for understanding large biological networks, mainly because of its explanatory power and its ability to draw on existing knowledge bases (Khatri et al., 2012). Most computational methods first assess the statistical significance of pathways in order to validate experimental results. Knowledge-base-driven pathway analysis is now essential for inferring aberrant modifications caused by genetic or epigenetic factors. Epigenetic studies deal with heritable changes in the function and behavior of genes that cannot be explained by changes in the gene sequence (Lim et al., 2010). DNA Methylation is one form of epigenetic modification: it refers to the addition of a methyl (CH3) group to the fifth carbon of a cytosine, mostly at cytosine-guanine (CpG) dinucleotides in the human genome (Zhang et al., 2015). Epigenetic mechanisms in general, and DNA Methylation in particular, are known to perturb pathways through Differential Methylation (DM) (Zhang et al., 2015). DM manifests itself as dense subnetworks within each pathway, which can be viewed as signatures of that pathway. These epigenetic signatures are faithfully propagated over multiple cell divisions, which makes epigenetic regulation a key mechanism of cellular differentiation. Because DNA Methylation responses to drug therapy are more stable than those of mRNA and protein complexes, methylation is a promising candidate for targeted therapeutics studies, at least in cancer epigenetics (Jones et al., 2016). Fig. 1 shows the mechanism of DNA Methylation. This paper addresses the problem of computationally determining conserved epigenetic signatures across pathways of different cell types in an organism.
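To make the notion of a dense DM subnetwork concrete, the following minimal sketch compares the edge density of a whole pathway with that of the subgraph induced by its differentially methylated genes. The toy pathway, the gene labels and the set of DM genes are hypothetical, and edge density is only one simple way to quantify "dense"; it is not necessarily the criterion DEEPAligner uses.

```python
def edge_density(nodes, edges):
    """Fraction of possible undirected edges present among `nodes` (induced subgraph)."""
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    # Keep each undirected edge once, and only if both endpoints are in `nodes`
    induced = {frozenset((u, v)) for u, v in edges
               if u != v and u in nodes and v in nodes}
    return len(induced) / (n * (n - 1) / 2)


# Hypothetical toy pathway with genes A-F; A, B, C and D are flagged as DM
pathway_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("B", "D"),
                 ("D", "E"), ("E", "F")]
all_genes = {g for edge in pathway_edges for g in edge}
dm_genes = {"A", "B", "C", "D"}

print(edge_density(all_genes, pathway_edges))  # 7 of 15 possible edges
print(edge_density(dm_genes, pathway_edges))   # 5 of 6 possible edges
```

A DM subnetwork that is markedly denser than its pathway as a whole is the kind of pattern the text describes as a pathway signature.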
The characteristics of epigenetic patterns have received little attention in the literature. High-throughput sequencing techniques now offer an unprecedented opportunity to analyze and compare the methylation profiles of different cell types. Network alignment can reveal conserved regions of differential epigenetic activity within cellular pathways.
Zhang et al. (2015) constructed a differential methylation correlation network to study the characteristics of methylation patterns across different cancer types, and identified a large number of cancer-specific and across-cancer methylation patterns. PathBLAST (Kelley et al., 2004) is a network alignment and search tool for comparing the protein-protein interaction (PPI) networks of multiple species. It searches for high-scoring alignments between pairs of interacting protein paths specified by the user and returns a ranked list of such paths. NetworkBLAST (Sharan et al., 2005) builds on the observation that pathways correspond to highly interacting groups of proteins; it mines high-scoring dense subgraphs that are conserved across the PPI networks of multiple species. AlignNemo (Ciriello et al., 2012) goes a step further with a greedy approach that starts from an initial seed and iteratively expands an alignment graph, exploring the local topology at each step. AlignNemo also proposed a new strategy for weighting the edges of the alignment graph so that they reflect the confidence of protein interactions, including interactions beyond direct ones.
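As an illustration of the correlation-network idea in Zhang et al. (2015), the sketch below links two genes when the rank correlation of their methylation values across samples is strong. It assumes Kendall's rank correlation (Kendall's "A new measure of rank correlation" appears in this article's reference list, but this excerpt does not say where it is used), computes the simple tau-a variant that ignores ties, and uses a hypothetical cut-off of 0.6 with made-up gene profiles.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs; tied pairs count as neither."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    s = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        if prod > 0:
            s += 1   # concordant pair
        elif prod < 0:
            s -= 1   # discordant pair
    return s / (n * (n - 1) / 2)


def correlation_network(profiles, threshold=0.6):
    """Edge (g1, g2, tau) for every gene pair with |tau| >= threshold (hypothetical cut-off)."""
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        tau = kendall_tau_a(profiles[g1], profiles[g2])
        if abs(tau) >= threshold:
            edges.append((g1, g2, tau))
    return edges


# Hypothetical per-sample methylation values (e.g. delta-beta) for three genes
profiles = {
    "G1": [0.10, 0.20, 0.30, 0.40, 0.50],
    "G2": [0.12, 0.25, 0.28, 0.45, 0.60],
    "G3": [0.50, 0.10, 0.40, 0.20, 0.30],
}
print(correlation_network(profiles))
```

Only G1 and G2, which rise together across samples, are connected; G3 is weakly anti-correlated with both and stays isolated.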
All of the above methods follow a local alignment strategy, in which the emphasis is on identifying conserved subnetworks rather than on maximizing the overall similarity of the compared networks. Recently, this tendency has begun to shift. IsoRank (Singh et al., 2008) introduced the idea that incorporating PPI data into ortholog prediction yields a better global alignment of multiple PPI networks. ModuleAlign (Hashemifar et al., 2016) uses local topology information to define a module-based homology score; functionally coherent proteins in the same module are then clustered hierarchically to obtain global alignments between the input PPI networks. ModuleAlign improves on HubAlign (Hashemifar and Xu, 2014), which estimates the functional significance of a protein from global topology information and uses a minimum-degree heuristic algorithm to compute the mapping.
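The global strategy of IsoRank can be summarized by its fixed-point equation R = αAR + (1 - α)E, where R_ij scores the match between node i of one network and node j of the other, A spreads each score over neighboring node pairs, and E holds prior (for example, sequence) similarities. The following is a small illustrative re-implementation of that idea by power iteration, followed by a simple greedy one-to-one extraction. It is not the original IsoRank code; the value α = 0.8, the toy star graphs and the uniform prior are assumptions.

```python
def isorank_scores(adj1, adj2, prior, alpha=0.8, iters=100, tol=1e-10):
    """Power iteration for R = alpha * A R + (1 - alpha) * E (IsoRank-style sketch).

    adj1, adj2: dict mapping each node to the set of its neighbors.
    prior: dict (i, j) -> non-negative prior similarity; uniform if empty.
    """
    pairs = [(i, j) for i in adj1 for j in adj2]
    total = sum(prior.get(p, 0.0) for p in pairs)
    E = {p: (prior.get(p, 0.0) / total if total else 1.0 / len(pairs)) for p in pairs}
    R = dict(E)
    for _ in range(iters):
        new = {}
        for i, j in pairs:
            # Each neighboring pair (u, v) shares its score among its own neighbor pairs
            spread = sum(R[(u, v)] / (len(adj1[u]) * len(adj2[v]))
                         for u in adj1[i] for v in adj2[j])
            new[(i, j)] = alpha * spread + (1 - alpha) * E[(i, j)]
        norm = sum(new.values())
        new = {p: r / norm for p, r in new.items()}  # keep R summing to 1
        delta = sum(abs(new[p] - R[p]) for p in pairs)
        R = new
        if delta < tol:
            break
    return R


def greedy_match(R):
    """Take the highest-scoring pairs first, using each node at most once."""
    used1, used2, mapping = set(), set(), {}
    for (i, j), _ in sorted(R.items(), key=lambda kv: -kv[1]):
        if i not in used1 and j not in used2:
            mapping[i] = j
            used1.add(i)
            used2.add(j)
    return mapping


# Hypothetical toy networks: two star graphs with hubs "a" and "x"
star1 = {"a": {"b", "c", "d"}, "b": {"a"}, "c": {"a"}, "d": {"a"}}
star2 = {"x": {"y", "z", "w"}, "y": {"x"}, "z": {"x"}, "w": {"x"}}
R = isorank_scores(star1, star2, prior={})
print(greedy_match(R))  # the hubs are paired; leaf pairs tie, so their order is arbitrary
```

With a uniform prior, the scores are driven purely by topology, so the two hubs are matched first.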
The problem of identifying conserved epigenetic signatures across pathways appears to have received little attention. Apart from a few notable works, existing studies consider the impact of aberrant methylation patterns only within the context of a single metabolic or signaling pathway. Computing protein network alignments from sequence similarity is of little use in epigenetic profiling, because methylation changes do not alter the DNA sequence itself. Inferring the epigenetic marks that are conserved across different cellular pathways, or across identical cell types of different species, throws light on the epigenetic diversity of cellular pathways, which in turn can support more accurate prognosis and diagnosis of cancer subtypes.
The remainder of this paper is organized as follows: Section 2 gives a detailed description of the proposed methodology. Section 3 gives the experimental results. Section 4 gives the discussion of the results and comparison with existing alignment methods. Section 5 concludes the paper with an eye on future prospects of epigenetic studies.
DNA Methylation, gene expression and pathway data
Human DNA Methylation data were obtained from the publicly available NCBI Gene Expression Omnibus (Matthews et al., 2002). The data consist of normalized beta values measured with the Illumina HumanMethylation27 and HumanMethylation450 BeadChip arrays across four different cancer types: lung cancer (Wilson et al., 2014), prostate cancer (Aryee et al., 2013), breast cancer (Di Cello et al., 2013) and colorectal adenoma (Naumov et al., 2013). The InfiniumMethylation hg19 annotation database in R library (
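Normalized beta values lie between 0 (unmethylated) and 1 (fully methylated). The sketch below shows one common way to flag differentially methylated genes from such data: average the tumor-minus-normal difference in mean beta (delta-beta) over the probes mapped to each gene, and keep genes whose mean delta-beta reaches a cut-off in absolute value. The probe IDs, gene symbols, sample values and the 0.2 cut-off are hypothetical. The M-value transform M = log2(β / (1 - β)), often preferred for statistical testing, is included only for reference; the paper's exact preprocessing is not given in this excerpt.

```python
import math
from statistics import mean


def m_value(beta, eps=1e-6):
    """M = log2(beta / (1 - beta)); beta is clipped to [eps, 1 - eps] to avoid log(0)."""
    b = min(max(beta, eps), 1 - eps)
    return math.log2(b / (1 - b))


def differential_genes(tumor, normal, probe_to_gene, delta_cutoff=0.2):
    """Return {gene: mean delta-beta} for genes with |mean delta-beta| >= delta_cutoff.

    tumor, normal: dict probe_id -> list of beta values, one per sample.
    probe_to_gene: dict probe_id -> gene symbol (e.g. from an annotation package).
    """
    per_gene = {}
    for probe, gene in probe_to_gene.items():
        if probe in tumor and probe in normal:
            delta = mean(tumor[probe]) - mean(normal[probe])
            per_gene.setdefault(gene, []).append(delta)
    return {g: mean(d) for g, d in per_gene.items() if abs(mean(d)) >= delta_cutoff}


# Hypothetical probes and samples
tumor = {"cg0001": [0.80, 0.85, 0.90], "cg0002": [0.75, 0.70, 0.80], "cg0003": [0.30, 0.35, 0.25]}
normal = {"cg0001": [0.20, 0.25, 0.30], "cg0002": [0.30, 0.25, 0.35], "cg0003": [0.28, 0.32, 0.30]}
probe_to_gene = {"cg0001": "GENE_A", "cg0002": "GENE_A", "cg0003": "GENE_B"}

result = differential_genes(tumor, normal, probe_to_gene)
print(result)                      # only GENE_A is hypermethylated in this toy data
print(m_value(0.5), m_value(0.8))  # beta 0.5 maps to M = 0
```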
Results
To evaluate the proposed DEEPAligner method, experiments were carried out on the four cancer datasets listed in Table 1. For epigenetic signature networks, it is not sufficient to evaluate alignments on topological similarity alone. Here, we evaluate the biological consistency of the aligned signatures with three measures, FC, GOC and DAC, defined in Section 2.5.2.
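Section 2.5.2, where FC, GOC and DAC are defined, is not part of this excerpt. As a hedged illustration only, one form of GO consistency used in the network-alignment literature averages, over all aligned gene pairs, the Jaccard similarity of the two genes' GO term sets. The sketch below implements that form with hypothetical genes and term labels; the paper's own FC, GOC and DAC definitions may differ.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def go_consistency(mapping, go1, go2):
    """Mean Jaccard similarity of GO term sets over aligned pairs (u -> v).

    Pairs in which either gene has no GO annotation are skipped.
    """
    scores = [jaccard(go1[u], go2[v]) for u, v in mapping.items()
              if go1.get(u) and go2.get(v)]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical alignment between two signatures and toy GO annotations
mapping = {"A": "X", "B": "Y", "C": "Z"}
go1 = {"A": {"t1", "t2"}, "B": {"t3"}, "C": set()}
go2 = {"X": {"t1", "t2", "t4"}, "Y": {"t3"}, "Z": {"t5"}}
print(go_consistency(mapping, go1, go2))  # mean of 2/3 and 1; C has no annotation
```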
Alignments across breast and prostate cancer training datasets are given in
Identification of cancer-specific and across-cancer methylation signatures
Aligning pathway signatures by their differential epigenetic activity helps us understand the impact of methylation on different molecular subtypes of cancer. To analyze how methylation signatures relate to aberrant activity in different cancer types, we explored the Gene Ontology (GO) annotations of the genes that constitute aberrant signatures. The aligned signatures show very similar differential epigenetic activity in terms of the GO biological terms involved. Table 7 shows the most
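Exploring the GO annotations of signature genes is often done with an over-representation test. The sketch below computes a one-sided hypergeometric p-value with the standard library: the probability of seeing at least k annotated genes in a signature of n genes when K of the N background genes carry the term. The counts are hypothetical, this excerpt does not state which test the authors used, and in practice p-values over many GO terms would also need multiple-testing correction.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated, n drawn).

    k: signature genes annotated with the GO term
    n: signature size
    K: background genes annotated with the GO term
    N: background size
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical: 3 of 4 signature genes carry a term that 5 of 20 background genes carry
print(hypergeom_enrichment_p(3, 4, 5, 20))
```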
Conclusion
Biological network alignment has become the method of choice for computational biologists studying functionally consistent interactions in pathways. A novel signature-based alignment method, called Deep Encoded Epigenetic Pathway Aligner (DEEPAligner), is proposed to identify conserved methylation patterns across pathways. The proposed work addresses an important open question: whether methylation changes are seen clustered as dense subnetworks at the molecular
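The conclusion attributes DEEPAligner's alignments to deep encoding, and the citing article listed after the references describes the method as using autoencoders (AEs). Neither excerpt gives the architecture, so the following is only a generic NumPy sketch, not the authors' model: a one-hidden-layer autoencoder with tied weights, trained by plain gradient descent on mean squared reconstruction error, whose codes are then compared across two groups of signatures by cosine similarity. The input features, the code size of 3, the learning rate, the epoch count and the cosine-matching step are all assumptions for illustration.

```python
import numpy as np


def train_autoencoder(X, hidden=3, lr=0.1, epochs=500, seed=0):
    """Tied-weight autoencoder: H = tanh(X W + b), X_hat = H W^T + c. Returns (W, b, c, losses)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(scale=0.1, size=(d, hidden))
    b = np.zeros(hidden)
    c = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W + b)           # encoder
        X_hat = H @ W.T + c              # decoder shares (tied) weights W
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the mean squared error
        g_out = 2 * err / err.size       # dL/dX_hat
        g_c = g_out.sum(axis=0)
        g_pre = (g_out @ W) * (1 - H ** 2)   # through the decoder, then tanh
        g_W = X.T @ g_pre + g_out.T @ H      # encoder + decoder contributions
        g_b = g_pre.sum(axis=0)
        W -= lr * g_W
        b -= lr * g_b
        c -= lr * g_c
    return W, b, c, losses


def encode(X, W, b):
    """Latent codes of the rows of X."""
    return np.tanh(X @ W + b)


def cosine_matrix(A, B):
    """Pairwise cosine similarity between the rows of A and the rows of B."""
    A_n = A / np.maximum(np.linalg.norm(A, axis=1, keepdims=True), 1e-12)
    B_n = B / np.maximum(np.linalg.norm(B, axis=1, keepdims=True), 1e-12)
    return A_n @ B_n.T


# Hypothetical feature vectors: rows 0-3 are signatures from one cancer type,
# rows 4-7 from another; columns are per-gene methylation features.
rng = np.random.default_rng(1)
X = rng.random((8, 6))
W, b, c, losses = train_autoencoder(X)
codes = encode(X, W, b)
similarity = cosine_matrix(codes[:4], codes[4:])
print(losses[0], losses[-1])
print(similarity.argmax(axis=1))  # candidate partner for each first-type signature
```

Comparing signatures in a learned low-dimensional code, rather than gene by gene, is one way to match signatures whose member genes differ; whether DEEPAligner matches codes this way is not stated in this excerpt.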
References (35)
- et al. Mismatch repair and DNA damage signalling. DNA Repair (2004).
- et al. Identifying epigenetically dysregulated pathways from pathway–pathway interaction networks. Comput. Biol. Med. (2016).
- et al. DNA methylation alterations exhibit intraindividual stability and interindividual heterogeneity in prostate cancer metastases. Sci. Transl. Med. (2013).
- et al. AlignNemo: a local network alignment method to integrate homology and topology. PLoS ONE (2012).
- et al. A (Sub)graph isomorphism algorithm for matching large graphs. IEEE Trans. Pattern Anal. Mach. Intell. (2004).
- et al. Methylation of the claudin 1 promoter is associated with loss of expression in estrogen receptor positive breast cancer. PLoS ONE (2013).
- et al. autoencoder: Sparse Autoencoder for Automatic Learning of Representative Features from Unlabeled Data (2015).
- et al. Reactome knowledgebase of human biological pathways and processes. Nucleic Acids Res. (2009).
- et al. HubAlign: an accurate and efficient method for global alignment of PPI networks. Bioinformatics (2014).
- et al. ModuleAlign: module-based global alignment of protein–protein interaction networks. Bioinformatics (2016).
- Targeting the cancer epigenome for therapy. Nat. Rev. Genet.
- Extracting coordinated patterns of DNA Methylation and gene expression in ovarian cancer. JAMIA.
- KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res.
- PathBLAST: a tool for alignment of protein interaction networks. Nucleic Acids Res.
- A new measure of rank correlation. Biometrika.
- Ten years of pathway analysis: current approaches and outstanding challenges. PLoS Comput. Biol.
- Vitamin D signalling pathways in cancer: potential for anticancer therapeutics. Nature.
Cited by (1)
Autoencoded DNA methylation data to predict breast cancer recurrence: Machine learning models and gene-weight significance
2020, Artificial Intelligence in Medicine
Citation excerpt: A later logistic regression classifier, trained with the encoded latent features, was able to accurately classify cancer subtypes. Visakh et al. [22] also proposed an innovative alignment method that made use of AEs to find functionally consistent and topologically sound alignments of epigenetic signatures from pathway networks. Later, those epigenetic signatures were applied to characterise several types and subtypes of breast, lung, colorectal, and prostate cancer.