DNA methylation and the core pluripotency network

From the onset of fertilization, the genome undergoes cell division and differentiation. All of these developmental transitions and differentiation processes include cell-specific signatures and gradual changes of the epigenome. Understanding what keeps stem cells in the pluripotent state and what leads to differentiation are fascinating and biomedically highly important issues. Numerous studies have identified genes, proteins, microRNAs and small molecules that exert essential effects. Notably, there exists a core pluripotency network that consists of several transcription factors and accessory proteins. Three eminent transcription factors, OCT4, SOX2 and NANOG, serve as hubs in this core pluripotency network. They bind to the enhancer regions of their target genes and modulate, among others, the expression levels of genes that are associated with Gene Ontology terms related to differentiation and self-renewal. Also, much has been learned about the epigenetic rewiring processes during these changes of cell fate. For example, DNA methylation dynamics is pivotal during embryonic development. The main goal of this review is to highlight an intricate interplay of (a) DNA methyltransferases controlling the expression levels of core pluripotency factors by modulation of the DNA methylation levels in their enhancer regions, and of (b) the core pluripotency factors controlling the transcriptional regulation of DNA methyltransferases. We discuss these processes both at the global level and in atomistic detail based on information from structural studies and from computer simulations.


Keywords: DNA methylation; Stem cells; Transcription factor; Differentiation

Main text
Recent review articles provide excellent introductions to the fields of pluripotency (Li and Belmonte, 2017; Martello and Smith, 2014), stem cell differentiation (Alvarado and Yamanaka, 2014; Dixon et al., 2015; Jang et al., 2017; Keller, 2005), cellular programming (Prasad et al., 2016; Shi et al., 2017; Weinberger et al., 2016), and the epigenetic changes taking place upon these transitions. Here, we will focus on the current mechanistic understanding of how epigenetic modifications and drugs targeting epigenetic enzymes exert their effects at the atomistic level. We will limit the discussion to the physiological processes related to pluripotency and not discuss epigenetic aberrations that are detected in cultivated human pluripotent stem cells as reviewed by Bar and Benvenisty (2019). Due to space limitations and also our own research interests, we will concentrate on the discussion of DNA methylation changes and will largely omit the equally important field of histone marks. We will start by briefly reviewing what pluripotent cells and DNA methylation are. Then, we will introduce the players of the core pluripotency network, followed by the role of cytosine methylation in DNA conformation and in DNA-protein interactions. The review is rounded off by a quick look at available medications and drug candidates targeting DNA methyltransferases.

The pluripotent state
Pluripotent stem cells (PSCs) are cells that have the ability to self-renew indefinitely and to differentiate into any cell type of an adult organism. The developmental potential declines from the totipotent cell state following the onset of fertilization. Before blastocyst implantation, the epigenome of the developing embryo is transformed to the ground state of pluripotency (Reik et al., 2001), which is also termed the 'naïve' state. Cells that exhibit this form of pluripotency can be isolated from the inner cell mass (ICM). ICM cells that are not exposed to the fluid cavity will adopt the epiblast fate and eventually differentiate into embryonic tissue. The pre-implantation epiblast is in the ground state of pluripotency. However, after implantation, the epiblast switches into the so-called 'primed' state of pluripotency that is adopted by epiblast stem cells, or epiSCs (Brons et al., 2007; Tesar et al., 2007). Subsequently, in the early germ line most parental imprints are erased in primordial germ cells (PGCs) (Lee et al., 2002).
Murine pluripotent cells have been extensively studied in vivo. In comparison, we only have an incomplete understanding of how human pluripotent cells behave in vivo (Iurlaro et al., 2017), which is in part due to ethical considerations that limit the accessibility of human ESCs. Thus, in an attempt to understand the processes of reprogramming and differentiation, human and murine cells are grown in vitro to recapitulate the ICM state. Naïve pluripotent embryonic stem cells (ESCs) and embryonic germ cells (EGCs) are derived from the pre-implantation epiblast, respectively Smith, 2009, 2012). Historically, murine ESCs (mESCs) have been grown in a medium containing fetal bovine serum and leukemia inhibitory factor (LIF). This resulted in a hypermethylated population of cells that resemble the early epiblast in terms of epigenetic and transcriptional aspects, but that are nonetheless in a primed state of pluripotency. Epigenetic reprogramming is currently investigated using a (serum-to-2i) model where naïve hypomethylated mESCs are derived from primed hypermethylated mESCs in a serum-free medium, to which two small molecule inhibitors (2i), PD0325901 (PD) and CHIR99021 (CHIR) (Ying et al., 2008), are added (van den Berg et al., 2010). The former substance inhibits mitogen-activated protein kinase kinase (MAPKK) that targets fibroblast growth factor/extracellular signal-regulated kinase (FGF/ERK), whereas the latter one inhibits glycogen synthase kinase 3 (GSK3) signaling cascades and also activates signaling via the canonical Wnt pathway (Galonska et al., 2015). mESCs grown under these conditions are in their naïve state and are globally hypomethylated, but sustain methylation at imprinted regions (Ficz et al., 2013;Leitch et al., 2013).
Throughout this manuscript we will adopt the following nomenclature: Human gene symbols are generally italicized with all letters in upper-case (e.g., SHH). Murine gene symbols generally are italicized too, however, starting with a capital letter, whereas the remaining letters are in lower-case (Shh). In contrast, murine and human protein names are typically not italicized and all letters are upper-case (SHH). To distinguish human and murine proteins, we will use a short superscript prefix h SHH and m SHH, respectively, whenever needed.
Cells in the naïve state of pluripotency can differentiate into all three germ layers in vitro, form teratomas and chimaeric embryos in vivo, and are capable of tetraploid complementation (Evans and Kaufman, 1981;Martin, 1981;Nagy et al., 1993). PSCs derived from post-implantation epiblasts (epiSCs) are in the primed state of pluripotency and can only contribute to chimaera formation in the post-implantation stage (Brons et al., 2007;Huang et al., 2012;Kojima et al., 2014;Rossant, 2008;Tesar et al., 2007).
It is believed that the process of reprogramming in (serum-to-2i) medium can yield a general molecular understanding of how the state of pluripotency is established. Essentially, pluripotency is promoted by epigenetic features such as global DNA hypomethylation, DNA hypermethylation of imprinted gene loci (Okita et al., 2007), silencing of retroviral transgenes (Cherry et al., 2000), reactivation of the X chromosome in female iPS cells, and reorganization of chromatin fibers (Fussner et al., 2011; Meshorer and Misteli, 2006), e.g. in the promoters and enhancers of genes that are regulated during development (Mikkelsen et al., 2008; Spivakov and Fisher, 2007). One crucial change during reprogramming is the reactivation of endogenous pluripotency genes (Takahashi et al., 2007; Takahashi and Yamanaka, 2006). Beyond this, Li and Belmonte recently reviewed various types of post-transcriptional controls, particularly those induced by RNA-binding proteins and alternative splicing, as a further important regulatory layer of pluripotency (Li and Belmonte, 2018). Fig. 1 illustrates schematically how epigenetic marks differ between the naïve and primed/differentiated states.

Core pluripotency factors
Core pluripotency factors are those key transcription factors and cofactors, such as microRNAs and other proteins, that are able either to reprogram cells into the pluripotent state or to maintain cells in this state (Orkin et al., 2008). Experimental and computational studies have characterized a tightly connected set of core transcription factors that maintain murine ESC self-renewal under defined conditions (Dunn et al., 2014; MacArthur et al., 2012; Niwa et al., 2009). Three eminent transcription factors, h/m OCT4, h/m SOX2 and h/m NANOG, serve as hubs in this pluripotency network and are being used as genetic markers for induced pluripotent stem cells (iPSCs) (Huangfu et al., 2008; Park et al., 2008; Takahashi et al., 2007; Takahashi and Yamanaka, 2006; Yu et al., 2007). m OCT4 (also called POU5F1) is the octamer-binding transcription factor 4 that is crucial for both in vitro and in vivo pluripotency (Nichols et al., 1998; Scholer et al., 1989). OCT4 is a member of the family of POU proteins, which consist of two well-conserved DNA-binding domains connected by a variable linker region (Tantin, 2013). m SOX2 (short for SRY-box 2) regulates OCT4 expression in ESCs and is involved in the process of epiblast formation (Avilion et al., 2003; Masui et al., 2007). The presence of m NANOG promotes the acquisition of pluripotency in the inner cell mass (Silva et al., 2009), and m NANOG is critically involved in self-renewal of pluripotent cells (Mitsui et al., 2003). m NANOG also drives leukemia inhibitory factor (LIF)-independent self-renewal in pluripotent cells (Chambers et al., 2003, 2007; Mitsui et al., 2003; Silva et al., 2009; Suzuki et al., 2006).
S. Shanak, V. Helms Developmental Biology 464 (2020) 145-160
In addition, genome-wide binding of m NANOG alone can induce the state transition from naïve ESCs to primed epiblast-like cells independent of bone morphogenetic protein 4 ( m BMP4), and this is associated with epigenetic resetting of regulatory genes and activation of the enhancers of key germline transcription factors (Murakami et al., 2016). Fig. 2 illustrates the auto-regulatory and cross-regulatory effects among Oct4, Sox2, and Nanog. Under specific conditions, some of these three core factors may also drive cell differentiation. For example, overexpression of Oct4 or Sox2 leads to differentiation of the germ layer. On the other hand, trophectoderm differentiation is promoted by knocking out Oct4 or downregulation of Sox2 (Ivanova et al., 2006;Nichols et al., 1998). The core pluripotency gene regulatory network involves protein-protein and protein-DNA interactions that may stabilize any of the two states, i.e., pluripotency or differentiation. One of the key interests in the stem cell field is to unravel what types of molecular changes (overexpression or downregulation of key genes and microRNAs) induce transitions between these two states (Boyer et al., 2005;Dunn et al., 2019).
In addition, core pluripotency genes can show differential enhancer binding in primed versus naïve pluripotency, where some factors switch from the Oct4 proximal (serum) to the distal (2i) element (Tesar et al., 2007). For example, upon being switched to 2i medium, the proximal enhancer of the Oct4 gene shows decreased binding to Nanog, whereas the distal enhancer shows an increased binding to Sox2 under the same conditions (Galonska et al., 2015). Whereas all three core transcription factors ( m OCT4, m SOX2 and m NANOG) are critically involved in maintaining the pluripotent state, there exist clear differences between m NANOG on the one hand, and the other two transcription factors, m OCT4 and m SOX2, on the other hand. Oct4 and Sox2 are strongly expressed in both the naïve and the primed state of pluripotency, whereas Nanog is only highly expressed in the naïve state of pluripotency (Chambers et al., 2007). Other murine pluripotency genes are also expressed in the absence of Nanog (Silva and Smith, 2008). When Nanog is knocked down in pluripotent stem cells, they can still self-renew but, unlike wild-type cells, have a high propensity for differentiation (Chambers et al., 2007; Festuccia et al., 2012). As such, Nanog seems to protect pluripotent cells from signals that induce differentiation.
Dunn and co-workers recently presented a computational model describing state transitions (on/off) in a core pluripotency network of 13 murine transcription factors (Dunn et al., 2019). They set up a so-called Boolean network that describes maintenance of naïve state ESCs. Interestingly, the model was also able to predict transcription factor behavior and potency during resetting from primed pluripotency. Computationally generated gene activation profiles were experimentally confirmed (with a predictive accuracy of 77%) at single-cell resolution by RT-qPCR.
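To make the notion of a Boolean network concrete, the following minimal sketch enumerates the stable states of a three-gene toy network. It is not the 13-factor model of Dunn et al.: the update rules, the restriction to Oct4, Sox2 and Nanog, and the external `signal` input (standing in for naïve-state culture conditions) are hypothetical choices made purely for illustration.

```python
from itertools import product

GENES = ("Oct4", "Sox2", "Nanog")


def step(state, signal):
    """Synchronously update all genes once; `state` maps gene name -> bool.

    The rules below are illustrative assumptions, not inferred from data.
    """
    dimer = state["Oct4"] and state["Sox2"]  # both partners of the heterodimer are ON
    return {
        "Oct4": dimer or state["Nanog"],
        "Sox2": dimer or state["Nanog"],
        "Nanog": dimer and signal,  # Nanog additionally needs the external input
    }


def fixed_points(signal):
    """Enumerate all 2**3 states and keep those that map onto themselves."""
    points = []
    for values in product([False, True], repeat=len(GENES)):
        state = dict(zip(GENES, values))
        if step(state, signal) == state:
            points.append(values)
    return points


if __name__ == "__main__":
    for signal in (True, False):
        print(f"signal={signal}:", fixed_points(signal))
```

With these toy rules, the all-off state is stable under both conditions, the all-on state is stable only when the input is present, and removing the input leaves a stable state with Oct4 and Sox2 on but Nanog off. Realistic models involve many more nodes and rules constrained by experimental data.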

Co-occupancy of core pluripotency transcription factors
The three core pluripotency transcription factors bind to the enhancer regions of their target genes and modulate, among others, the expression levels of genes that are associated with Gene Ontology terms related to differentiation and self-renewal (Young, 2011). Cooperative binding was found between h/m OCT4 and h/m SOX2, which were shown to form heterodimers (see below). The two transcription factors jointly bind and regulate the expression of many other genes (Chang et al., 2017), including the OSN triad itself (OCT4, SOX2 and NANOG) in human and in mouse (Okumura-Nakanishi et al., 2005; Rodda et al., 2005; Tomioka et al., 2002). Many murine genes regulated by m OCT4 and m SOX2 are also bound by m NANOG.
Molecular details of the formation of the m SOX2/ m OCT4 complex were explored via single-molecule imaging. First, m SOX2 dynamically engages with the target DNA sites and prepares them for m OCT4 binding. Then, binding of m OCT4 stabilizes the heterodimeric m OCT4-m SOX2 complex on the target binding sites (Chen et al., 2014). The highly conserved h OCT4:Lys156 residue provides stability to the h OCT4 protein, and a salt bridge between h OCT4:Lys151 and h SOX2:Asp107 contributes to the stability of the h OCT4-h SOX2 complex (Pan et al., 2016). In bladder cancer patients, post-translational modifications of Lys156 impaired the Lys151-Asp107 salt bridge and the h OCT4-h SOX2 interaction. This resulted in the upregulation of mesendodermal genes and a subsequent epithelial-mesenchymal transition (Pan et al., 2016). The h OCT4-h SOX2 heterodimer can bind to DNA in two alternative ways. In the first arrangement (Fig. 3(a)), the two TF binding motifs are separated by three base pairs. This conformation is seen, for example, in the Fgf4/FGF4 promoter that is involved in embryonic development and morphogenesis (Jauch et al., 2011; Li and Belmonte, 2017; Tapia et al., 2015). In the other arrangement, the so-called no-gap motif, the binding motifs of h OCT4 and h SOX2 are arranged next to each other (Fig. 3(b)). This conformation is the canonical one found in the regulatory motifs of h OCT4, h NANOG, and h UTF1 (encoding undifferentiated embryonic cell transcription factor 1) (Pan et al., 2016), and it is crucial for somatic cell reprogramming and pluripotency in mouse and human (Li and Belmonte, 2017; Tapia et al., 2015).
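The two arrangements of the composite site (half-sites separated by three base pairs, or directly adjacent) can be illustrated with a small sequence scan. The sketch below is a simplification under stated assumptions: the two half-site strings are illustrative consensus-like sequences chosen by us and are not taken from this review, only exact matches on the forward strand are reported, and the function name is hypothetical. Real analyses would typically use position weight matrices and scan both strands.

```python
import re

# Illustrative half-site strings (assumptions, not taken from the review).
SOX_MOTIF = "CATTGTT"   # SOX-like half-site
OCT_MOTIF = "ATGCAAAT"  # octamer-like half-site


def find_composite_sites(seq, gaps=(0, 3)):
    """Return sorted (start, gap, matched_sequence) tuples for composite sites.

    A composite site is the SOX half-site, followed by `gap` arbitrary
    bases, followed by the OCT half-site, on the forward strand only.
    """
    seq = seq.upper()
    hits = []
    for gap in gaps:
        # A lookahead is used so that overlapping sites are reported as well.
        pattern = re.compile(f"(?=({SOX_MOTIF}[ACGT]{{{gap}}}{OCT_MOTIF}))")
        for match in pattern.finditer(seq):
            hits.append((match.start(), gap, match.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    # Synthetic test sequence containing one no-gap and one 3-bp-gap site.
    demo = "GG" + SOX_MOTIF + OCT_MOTIF + "CC" + SOX_MOTIF + "TGG" + OCT_MOTIF + "AA"
    for start, gap, site in find_composite_sites(demo):
        print(f"start={start:2d} gap={gap} site={site}")
```

The `gaps` parameter makes the spacer length explicit, so that other spacings can be examined by passing different values.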
Human and murine OSKM (OCT4, SOX2, KLF4, and MYC) are expressed both in vitro and in vivo in the inner cell mass of the blastocyst. Several other transcription factors, cofactors, and co-repressors serve as additional layers of the pluripotency regulatory circuitry by occupying the regulatory sequences of hundreds of target genes, including their own promoters and enhancers. As a result, several cascades that comprise feedback and feedforward loops stabilize the pluripotent state (Adamo et al., 2011; Hackett and Surani, 2014; Young, 2011). Two prominent examples are the transcription factors m c-MYC and m KLF4. c-Myc is a proto-oncogenic target gene of m LIF-m STAT3 signaling that stimulates cell proliferation and self-renewal (Cartwright et al., 2005). Klf4 is a pluripotency factor whose expression may promote LIF-independent self-renewal. Furthermore, m LIN28 regulates stem cell metabolism and conversion to the primed pluripotent state (Zhang et al., 2016a). β-catenin is a regulator of the WNT signaling cascade (Moon et al., 2002), and was shown to safeguard normal DNA methylation levels of mESCs and regulate genome stability (Theka et al., 2019). Esrrb is an m NANOG target that can replace m NANOG in murine NANOG-KO cells via inhibition of the Gsk3 signaling pathway (Festuccia et al., 2012). Krüppel-like factor 2 ( m KLF2) is a protein that is crucial in the naïve ground state of pluripotency. Ectopic expression of Klf2 can replace Mek/Erk inhibition in mESCs (Yeo et al., 2014). h REX1 (short for reduced expression 1) is a protein whose addition to the reprogramming pool improves the reprogramming kinetics (Son et al., 2013). h/m STELLA is a pluripotency marker that is upregulated during the naïve state of pluripotency (Theunissen et al., 2014; Weinberger et al., 2016).
m PRDM14 contributes to the regulation of pluripotency either by antagonizing the signaling of fibroblast growth factor receptor ( m FGFR), or by inhibiting synthesis of DNA methyltransferases (Yamaji et al., 2013).

Co-activators and co-repressors
During the processes of self-renewal of pluripotent cells and differentiation, a set of further transcriptional co-activators and co-repressors regulates the core transcription factors and aids them in their action. Some of these protein complexes do not bind to DNA directly but rather act via chromatin-mediated mechanisms. Examples of this are the protein complexes mediator and cohesin, which are crucial to the 3D genome organization, and facilitate physical as well as functional interactions between the enhancers and core promoters of the activated genes by forming chromatin loops in interphase nuclei of human and mouse (Gorkin et al., 2014; Kagey et al., 2010; Li et al., 2012). For example, binding of OCT4, SOX2, KLF4, mediator, and cohesin to the upstream enhancer of OCT4 facilitates the formation of contacts with the OCT4 promoter to induce expression of OCT4 (Kagey et al., 2010; Wei et al., 2013; Zhang et al., 2013). As shown by circular chromosome conformation capture with high-throughput sequencing (4C-seq), depletion of KLF4 breaks the long-range interactions between enhancer and promoter, unloads cohesin from the enhancer, eliminates the OCT4 enhancer-promoter interaction, inhibits OCT4 expression, and induces differentiation (Wei et al., 2013; Zhang et al., 2013). Furthermore, knockdown experiments of Med12 (component of the mediator complex) or Smc1 (component of the cohesin complex) in ESCs resulted in the repression of genes regulated through cohesin-mediator interactions (Phillips-Cremins et al., 2013).
Several other cofactors engage in this process by establishing protein-protein interactions with the core transcription factors. For example, the RNA polymerase-associated factor 1 ( m PAF1) complex physically interacts with the m OCT4 protein and thereby contributes to the maintenance of self-renewal (Ponnusamy et al., 2009). The binding of the 60 kDa Tat-interactive protein ( m TIP60)-m p400 chromatin-remodeling complex to its target promoters appears to be driven separately by the binding of m NANOG in ESCs and by histone H3 lysine 4 tri-methylation (H3K4me3) signals (Fazzio et al., 2008). Additionally, the co-repressors CCR4-NOT transcription complex subunit 3 ( m CNOT3) and tripartite motif-containing protein 28 ( m TRIM28) co-occupy many gene promoters together with m c-MYC and m ZFX and thus aid in promoting self-renewal of embryonic stem cells. Nonetheless, m CNOT3 and m TRIM28 show no interaction with any component of the core pluripotency (OSN) triad, which suggests that a different module is active in the self-renewal network than in the core pluripotency module (Hu et al., 2009).
Estrogen-related receptor β (Esrrb) is one of the few prominent ΔNanog-responsive genes, i.e., genes whose expression patterns are affected when cells lack Nanog. Normally, m NANOG binds to the Esrrb locus and promotes the binding of RNA Polymerase II and downstream Esrrb transcription. When Esrrb is overexpressed, pluripotency and self-renewal are preserved. However, it is noteworthy that Nanog(-/-) ESCs possess the same activity. Moreover, Esrrb can reprogram cells lacking Nanog (Festuccia et al., 2012). Orthodenticle homeobox 2 (Otx2) was recently shown to function during the transition from the naïve to the primed state of pluripotency, whereas Otx2-null ESCs maintain the naïve pluripotent state. Nanog, Oct4, and Sox2 are direct targets of m OTX2. The strongest OTX2-binding site was found in the Nanog promoter, where m OTX2 enhances the expression of Nanog. When the level of Nanog is low or Nanog is absent altogether, cells are redirected to the naïve state (Acampora et al., 2016).
The orphan nuclear hormone receptor m NR0B1 (Nuclear receptor subfamily 0, group B, member 1), also termed m DAX1, was identified as an OCT4-interacting protein in a yeast two-hybrid screen. m NR0B1 binds to the POU-specific domain of m OCT4. In ESCs, m NR0B1 acts as a repressor of Oct4 and abolishes its DNA binding activity. m NR0B1 also decreases the activities of the Nanog and Rex1 promoters (Sun et al., 2009). Overexpression of Nr0b1 maintains self-renewal of pluripotent stem cells. m NR0B1, alone or in cooperation with m OCT4, inhibits trophectoderm differentiation. Both ESCs and induced pluripotent stem cells are kept in the pluripotent state by the synergistic activities of m NANOG and m NR0B1. Testis-expressed sequence 10 protein ( m TEX10) has been revealed via immunoprecipitation as a novel transcription cofactor in the core pluripotency network that forms complexes with the m SOX2 protein in ESCs, and functions in ESC maintenance and efficient reprogramming. m TEX10 is enriched at ESC-specific distal enhancers and functions as a co-activator, where it modulates histone acetylation and DNA demethylation by recruiting m p300 and m TET1. CBFA2/RUNX1 translocation partner 2 ( h CBFA2T2) is another co-repressor that is important for the regulation of pluripotency. h CBFA2T2 was found to interact and colocalize throughout the genome with the PR domain-containing 14 ( h PRDM14) (Tu et al., 2016), a pluripotency factor that regulates DNA methylation in mouse (Yamaji et al., 2013). Overexpression of CBFA2T2 also enhances iPSC reprogramming efficiency in a similar manner as PRDM14. h CBFA2T2 functions synergistically with core pluripotency factors, where it oligomerizes to form a scaffold that stabilizes the binding of h OCT4 and h PRDM14 (Tu et al., 2016).
How does cytosine methylation affect DNA conformation and DNA-protein interactions?

DNA methylation of cytosine bases
In mammals, DNA methylation at the 5-position of cytosine (5mC) plays a key role in various processes, including maintenance of genomic integrity, regulation of transcription, genomic imprinting, and X chromosome inactivation, as well as during reprogramming and differentiation (Bird, 2007; Bogdanovic and Lister, 2017; Goldberg et al., 2007; Sasaki and Matsui, 2008). DNA methylation of promoter regions is believed to generally inhibit gene expression, possibly through altering chromatin structure (Razin, 1998). Also, DNA methylation of enhancer regions is considered a sign of transcriptional inactivity (Sharifi-Zarchi et al., 2017). On the other hand, so-called gene body methylation, i.e. high levels of DNA methylation in exons, is frequently encountered even for actively transcribed genes but is so far not fully understood. Further below, we will summarize our current understanding of how DNA methylation affects the three-dimensional conformation of DNA and its dynamics.
With respect to the focus of this review article, DNA methylation dynamics is also pivotal during embryonic development and in the process of reprogramming murine stem cells to the naïve state of pluripotency, where global demethylation is a cornerstone. Initially, oocyte and sperm show intermediate to high levels of DNA methylation. After fertilization and before the two nuclei merge, the genomes inherited from both parent mice undergo global demethylation (Messerschmidt et al., 2014). In murine and human PGCs, global DNA demethylation is coupled to the erasure of histone H3K9 di-methylation (H3K9me2) (Eguizabal et al., 2016) and strong gains of histone H3K27 tri-methylation (H3K27me3) (Seki et al., 2005, 2007). Yet, some loci and regulatory elements, such as imprinted genes and transposable elements, maintain their DNA methylation levels (Hirasawa et al., 2008; Sasaki and Matsui, 2008). Indeed, DNA methylation has a pivotal role in reshaping the 3D genome structure and in the involvement of polycomb complexes in 3D genome re-organization in naïve pluripotency (McLaughlin et al., 2019).
Global genome DNA demethylation during reprogramming and its subsequent re-methylation are orchestrated processes resulting from the dynamic interplay between three main routes: (1) de novo methylation by the DNA methyltransferases m DNMT3a/ m DNMT3b, whereby new methyl marks are acquired, (2) maintenance methylation by m DNMT1, whereby methyl marks are maintained during replication, and (3) active demethylation. Active demethylation is a replication-independent mechanism that involves the action of Ten-eleven translocation ( m TET) protein-mediated iterative oxidation (Ito et al., 2010, 2011; Tahiliani et al., 2009), followed by the excision of oxidized bases by thymine DNA glycosylase ( h/m TDG) (Kohli and Zhang, 2013; Maiti and Drohat, 2011). We will now summarize our current structural understanding of the protein-DNA complexes associated with the processes just mentioned.
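The interplay of the three routes listed above can be sketched with a deliberately simple mean-field model of the average CpG methylation level per cell division. This is a toy model under our own assumptions, not a model from the literature cited here: at each division the newly synthesized strand copies existing marks with probability `maintenance`, unmethylated sites then gain a mark with probability `de_novo`, and marks are removed with probability `active_demeth`. All parameter values in the demonstration are hypothetical.

```python
def next_level(m, maintenance, de_novo, active_demeth):
    """Return the mean methylation level after one cell division.

    m             : fraction of methylated CpG sites before division (0..1)
    maintenance   : probability that a parental mark is copied to the new strand
    de_novo       : probability that an unmethylated site gains a mark
    active_demeth : probability that a mark is actively removed
    """
    # Replication: the old strand keeps its marks, the new strand copies them
    # with probability `maintenance`; the average over both strands is taken.
    diluted = m * (1 + maintenance) / 2
    # De novo methylation acts on the remaining unmethylated sites.
    gained = diluted + de_novo * (1 - diluted)
    # Active demethylation removes a fraction of all marks.
    return gained * (1 - active_demeth)


def simulate(m0, divisions, maintenance, de_novo, active_demeth):
    """Return the list of methylation levels, starting with m0."""
    levels = [m0]
    for _ in range(divisions):
        levels.append(next_level(levels[-1], maintenance, de_novo, active_demeth))
    return levels


if __name__ == "__main__":
    # Hypothetical parameter sets chosen only to contrast the regimes.
    scenarios = {
        "faithful maintenance": dict(maintenance=0.98, de_novo=0.02, active_demeth=0.0),
        "impaired maintenance": dict(maintenance=0.50, de_novo=0.0, active_demeth=0.0),
        "active removal only": dict(maintenance=0.98, de_novo=0.0, active_demeth=0.05),
    }
    for name, params in scenarios.items():
        trace = simulate(0.8, 10, **params)
        print(f"{name:>22}: " + " ".join(f"{x:.2f}" for x in trace))
```

In this sketch, setting `maintenance` to zero with no de novo activity halves the methylation level at every division, which is the limiting case of purely passive, replication-dependent dilution.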

Protein-DNA complexes
Proteins recognize specific DNA sequences by two general mechanisms. The 'direct readout' mechanism involves the formation of hydrogen bonds between the side chains of amino acids and hydrogen-bond donor and acceptor atoms of the target nucleotide bases. For example, bases in the major groove were found to have distinctive hydrogen-bond signatures (Garvie and Wolberger, 2001). On the other hand, the 'indirect readout' mechanism encompasses deviations from the canonical B-DNA conformation and subsequent conformational changes that optimize the protein-DNA interface (Otwinowski et al., 1988; Travers, 1989). In this readout, no direct contacts between the DNA bases and the target protein are needed (Rohs et al., 2009).
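The idea of distinctive major-groove signatures can be illustrated with the commonly used textbook encoding, in which each base pair presents four functional groups to the major groove: hydrogen-bond acceptor (A), hydrogen-bond donor (D), methyl group (M) and non-polar hydrogen (H). The sketch below is illustrative only; the letter `m` for 5-methylcytosine and the function names are our own conventions, and the encoding ignores DNA shape, which is the subject of indirect readout.

```python
# Major-groove pattern of the base pair formed by the listed base and its
# Watson-Crick partner. A = acceptor, D = donor, M = methyl, H = hydrogen.
MAJOR_GROOVE = {
    "A": "ADAM",  # A:T pair
    "T": "MADA",  # T:A pair
    "G": "AADH",  # G:C pair
    "C": "HDAA",  # C:G pair
    "m": "MDAA",  # 5-methylcytosine:G pair (methyl replaces the C5 hydrogen)
}


def major_groove_signature(sequence):
    """Return the major-groove pattern of each base pair along `sequence`.

    `sequence` uses A, C, G, T and the (hypothetical) symbol m for 5mC.
    """
    return [MAJOR_GROOVE[base] for base in sequence]


def changed_positions(pattern_a, pattern_b):
    """Return the indices at which two four-letter patterns differ."""
    return [i for i, (a, b) in enumerate(zip(pattern_a, pattern_b)) if a != b]


if __name__ == "__main__":
    print(major_groove_signature("ACGT"))  # unmethylated CpG
    print(major_groove_signature("AmGT"))  # CpG with 5mC on this strand
    print(changed_positions(MAJOR_GROOVE["C"], MAJOR_GROOVE["m"]))
```

In this encoding, cytosine methylation changes a single position of the pattern, replacing a hydrogen by a methyl group, which is one simple way to picture how 5mC can alter direct readout by a DNA-binding protein.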

Structural insights on the DNA maintenance methyltransferase DNMT1
h/m DNMT1 is active during the S phase of the cell cycle, when it copies methylation patterns from the parental strand to the newly synthesized daughter strand (Esteve et al., 2011; Leonhardt et al., 1992). h DNMT1 is recruited to replication foci by the proliferating cell nuclear antigen (PCNA) and other factors including h UHRF1 (Esteve et al., 2006). Von Meyenn et al. found that the major drivers of global demethylation in naïve pluripotent stem cells are of passive rather than active nature. In this respect, impaired maintenance of pre-existing methyl groups takes place when m UHRF1 and m DNMT1 are repressed at the protein level. Additionally, m UHRF1 mutants that are unable to bind H3K9me2/3 cannot recruit m DNMT1 to the replication foci. h DNMT1 interacts directly with h SUV39H1, h SUV39H2 or h G9a, and this interaction may function in recruiting both histone methyltransferases to suitable binding sites during replication (Esteve et al., 2006).
As suggested by the available crystal structures of DNMT1:DNA complexes, h/m DNMT1 appears to establish a balance between autoinhibitory and active mechanisms in DNA recognition, thus ensuring that h/m DNMT1 catalyzes methylation of the hemi-methylated form of DNA, but not of the unmethylated form. When viewed from N-terminus to C-terminus, h DNMT1 is composed of an N-terminal regulatory domain, a conserved (Gly-Lys)n repeat, and a C-terminal methyltransferase domain. The regulatory N-terminal domain consists of a nuclear localization sequence, sequences responsible for the interaction of h DNMT1 with other proteins (Chuang et al., 1997), a domain responsible for allocating DNMT1 to the DNA replication fork (Leonhardt et al., 1992), two bromo-adjacent homology (BAH) domains that play an important role by linking DNA methylation, replication as well as transcriptional regulation (Callebaut et al., 1999), and a zinc finger CXXC (Cys-X-X-Cys) domain that recognizes unmethylated DNA sequences with high specificity (Pradhan et al., 2008). The C-terminal methyltransferase domain contains two subdomains, the catalytic domain and the target recognition domain (TRD).
Upon formation of a complex between DNMT1 and hemimethylated DNA, the TRD "leans" toward the DNA major groove by about 2-3 Å (Song et al., 2012) in comparison to the unbound conformation of DNMT1. This conformational transition induces an opening in the central dinucleotide step of DNA: guanine (G7) is translated by one step along the DNA helix toward the 3′ end, and the downstream residue C8 is flipped out of the DNA helical conformation. The catalytic loop of DNMT1 penetrates into the DNA from the minor groove via the side chain of Met1235, and occupies the space vacated by the extruded flipped out base fC7' carbon atom on the target strand (Song et al., 2012). Song et al. also determined structures of partial mouse and human DNMT1 bound to unmethylated DNA. Specific binding of the CXXC domain to an unmethylated CpG dinucleotide was found to induce structural changes such that the CXXC-BAH1 linker is positioned between the active site of DNMT1 and the DNA sequence. As a result, DNA methylation cannot take place. In addition, the TRD is inhibited when a loop of BAH2 interacts with it, thus preventing it from penetrating into the major groove of DNA.

Structural insights on DNA de novo methyltransferases DNMT3a/3b/3L
m DNMT3a and m DNMT3b have major roles in de novo methylation in mammalian germ cells (La Salle and Trasler, 2006) and during development. h DNMT3L is a paralogue of these two proteins. It is enzymatically inactive because it cannot bind the cofactor S-adenosyl-L-methionine at its methyltransferase catalytic domain (CD). A crystal structure determined by Jia and colleagues shows h DNMT3a and h DNMT3L in a tetrameric configuration (DNMT3L-DNMT3a-DNMT3a-DNMT3L) whereby two h DNMT3a-h DNMT3L interfaces and one h DNMT3a-h DNMT3a interface are formed. h DNMT3L plays a role in stabilizing the active loop of h DNMT3a. The two active sites of the h DNMT3a-h DNMT3a homodimer are separated by a single helical turn of the DNA. h DNMT3L was found to be capable of activating DNA methyltransferase and binding to the unmethylated lysine 4 on the histone 3 tail (Ooi et al., 2007) via the ATRX-DNMT3-DNMT3L (ADD) domain of h DNMT3a, thus recruiting it to the chromatin (Otani et al., 2009). Guo et al. determined crystal structures of DNMT3a-DNMT3L in the active form when being bound to H3, and in the inactive form. When H3 is not bound to the DNMT3a-DNMT3L dimer, the ADD domain interacts with the CD domain of DNMT3a and inhibits it. However, the presence of H3 interferes with this interaction and blocks the autoinhibitory process (Guo et al., 2015).
DNMT3b seems to be more important for embryonic development than DNMT3a. This is suggested by a study conducted by Okano and colleagues, in which DNMT3a-null mice survived until delivery, whereas no viable DNMT3b-null mice were retrieved at birth (Okano et al., 1999). Additionally, DNMT3b was found to assist the differentiation process of human ES cells. The de novo methyltransferases mDNMT3a/b are downregulated in naïve pluripotent cells, and there is a minor role for mTET1-3 mediated demethylation affecting a small portion of the genome (van den Berg et al., 2010). Reprogramming into the induced state of pluripotency was activated in knockdown experiments of hDNMT3b via ectopic expression of the four pluripotency genes OCT4, SOX2, c-MYC, and KLF4 (Wongtrakoongate et al., 2014). The C-terminal CD domain and the N-terminal ADD domain are not exclusive to hDNMT3a, but are also part of the hDNMT3b protein. In addition, the two proteins possess a PWWP domain, a member of the Royal superfamily of domains, which binds simultaneously to histone hH3K36me3 and DNA via a conserved "aromatic cage". Additional layers of DNA methylation as well as crosstalk with other epigenetic marks can take place, as the PWWP domain can cooperate with other DNA reader or modifier proteins (Qin and Min, 2014). Rondelet et al. determined the first crystal structure of the PWWP domain of hDNMT3b in complex with histone hH3K36me3. They revealed a crucial conserved water molecule that mediates the interaction between trimethylated Lys36 and the Ser270 residue of hDNMT3b. The trimethylammonium group is stabilized in the cage by four residues, a phenylalanine, two tryptophans and an aspartate, that are conserved in the PWWP domains of hDNMT3a and hDNMT3b (Rondelet et al., 2016).

Structural insights into Tet-mediated demethylation
Three TET (ten-eleven translocation) proteins, mTET1-TET3, have been identified so far and were shown to possess 5mC to 5hmC oxidizing activity (Ito et al., 2010; Tahiliani et al., 2009). Although replication-dependent passive dilution is the major driving factor for zygotic DNA demethylation, mTET3 was found to actively contribute to the demethylation process as long as DNA replication takes place, especially in maternal pro-nuclei (Shen et al., 2014), and to some degree in paternal pro-nuclei as well (Wossidlo et al., 2011). This finding supports the crucial role of active demethylation (Okamoto et al., 2016) during the erasure of CpG methylation (5mC) in PGCs, which occurs via the conversion to 5-hydroxymethylcytosine (5hmC) by TET1 and TET2. Increased expression of TET1 and TET2 and transiently elevated levels of 5hmC have been reported in both human and mouse PGCs (Tang et al., 2015). Knockout mice for either Tet1 or Tet2 are viable. Additionally, Tet1 and Tet2 double-knockout ESCs remain pluripotent (Dawlaty et al., 2013).
Global DNA demethylation in PGCs occurs passively in a first stage. In a second stage, mTET-mediated locus-specific DNA demethylation affects imprinting control regions (ICRs) and meiotic genes. This two-stage mechanism maintains the ability to transmit DNA from parent to offspring (Hargan-Calvopina et al., 2016). The Tet-mediated oxidation of 5mC is followed by the excision of the oxidized bases by thymine DNA glycosylase (mTDG). In a study by Guo and colleagues, the demethylation process was unaffected by the deletion of mTDG from the zygote, which hints at the involvement of other, TDG-independent demethylation mechanisms (Guo et al., 2014).
Full-length TET1 and TET3 of human and mouse have a CXXC-type zinc finger domain at the amino terminus, whereas the CXXC domain originally belonging to TET2 became a separate gene encoding the IDAX (or CXXC4) protein (Pastor et al., 2013). In mouse, expression of full-length Tet1 is limited to early developmental stages, whereas somatic cells express a truncated isoform of Tet1 lacking the N-terminal CXXC domain (Zhang et al., 2016c). The truncated form of mTET1 has weaker demethylation capacity and a reduced level of chromatin binding compared to the full-length form (Zhang et al., 2016c). In human, the truncated form of TET1 without the CXXC domain is repressed in embryonic stem cells, but is active in embryonic and adult tissues. Two truncated forms of Tet3 exist in mouse. One of them, the Tet3o isoform, is exclusively expressed in oocytes. The full-length isoform containing the CXXC domain (Tet3FL) may play a crucial protective role against neurodegeneration (Jin et al., 2016).
At their C-terminal ends, h/mTET proteins have a catalytic domain, which is composed of a double-strand β-helix (DSBH) and a domain rich in cysteines. The catalytic domain contains a nuclear localization sequence and can oxidize 5mC (Ito et al., 2010; Tahiliani et al., 2009). hTET proteins are Fe2+/α-ketoglutarate (Fe(II)/α-KG)-dependent dioxygenases. In a crystal structure of hTET2 bound to methylated DNA, two of the three zinc fingers bring the cysteine-rich domain and the DSBH domain into close proximity. The catalytic cavity contains Fe(II) and an analog of α-KG. The 5mC inserts into the cavity and is positioned close to the Fe(II) to be ready for oxidation. hTET2 was shown to specifically recognize CpG dinucleotides in different oxidation forms, although the 5mC methyl group apparently does not form direct TET2-DNA contacts. DNA bound to the binding interface of hTET2 is pushed into a narrow groove formed by loops L1 and L2 surrounding the DSBH core. The TET2-DNA binding interface is rich in hydrophobic interactions and hydrogen-bonds. A water-mediated hydrogen-bond is formed between the guanidinium group of Arg1262 of hTET2 and methylated cytosine (mC6) (mediated by the oxygen in the pentose ring of the C6 nucleotide). Further hydrogen-bonds are established between the phosphate groups of mC6 and neighboring nucleotides and lysine, arginine, and serine residues at the TET2 interface. In addition, hydrophobic contacts exist in the minor groove between C5:G5' and G8:C8'. The side chains of Met1293 and Tyr1294 push G6' out of its normal base stacking position between G5' and mC7' and take its place. As a result, flipping of mC6 out of the duplex renders the methylated cytosine "vulnerable" in the catalytic pocket.

Methyl-CpG-binding proteins
Methyl-CpG-binding proteins that recognize methylated base pairs are called 'reader' or 'effector' proteins. Reader proteins can be classified into three families based on the domain type with which they interact with DNA. The first family are the methyl-CpG-binding domain (h/mMBD) proteins MBD1, MBD2, MBD4, and MeCP2 that recognize methylation in fully methylated CpG dinucleotides (Zou et al., 2012). The second family are the SET and RING-finger associated (SRA) domain proteins that recognize hemi-methylated DNA sequences. This family includes mUHRF1 (its human ortholog is called hICBP90) and hUHRF2 (Hashimoto et al., 2009). The third family includes Kaiso and Kaiso-like C2H2 zinc-finger proteins, e.g., hZBTB4, hZBTB38, hZFP57, and hKLF4. Proteins in this family preferentially bind to methylated CpG within a longer specific DNA sequence (Filion et al., 2006; Lopes et al., 2008).
The methyl-CpG-binding domain protein 2 (hMBD2) was found to play an indispensable role in the two opposing processes of pluripotency and differentiation, which raises the question of how this is achieved. Via alternative splicing, the hMBD2 gene codes for two protein isoforms, hMBD2a and hMBD2c, that perform opposing functions. Both hMBD2a and hMBD2c are enriched at the promoter regions of the core pluripotency genes hOCT4 and hSOX2. The OCT4 protein, a splicing regulator termed serine-arginine-rich splicing factor 2 (SRSF2), and the splice products of MBD2 (MBD2a and MBD2c) participate in a positive feedback loop that stabilizes a self-renewing pluripotent ground state of the cell. Remarkably, decreased levels of SRSF2 or of OCT4 in hESCs led to an increase in the MBD2a isoform, but also to a decrease in MBD2c. MBD2a promoted hPSC differentiation via the interaction with repressive NuRD chromatin remodeling factors, whereas the expression of MBD2c boosted reprogramming of fibroblasts to pluripotency.
Both direct and indirect readouts appear to be involved in the recognition at the protein-DNA binding interface when h/mMBD binds to the major groove of methylated DNA. X-ray analysis of the MBD domain of MeCP2 in complex with methylated DNA (Ho et al., 2008) revealed tightly bound crystal water molecules, or 'structural waters', at the binding interface, which contributed to a total of 23 hydrophilic contacts with the two methylated cytosines. It is worth pointing out that two of the structural waters formed CH⋯O interactions (Gu et al., 1999) with the two methyl groups of the mCpG dinucleotide pair (Ho et al., 2008), meaning that the mC5 methyl groups form water-bridged hydrophilic contacts with hMBD. This is unlike an MBD1:DNA complex, where the methylated CpG site makes contacts with five conserved protein residues forming a hydrophobic patch on the MBD1 surface (Ohki et al., 2001).
Furthermore, in the atomistic structures of MBD2-mDNA and MeCP2-mDNA complexes from chicken, arginine residues at the binding interfaces stacked with the methyl-cytosines and formed hydrogen-bonds with the adjacent guanine in the mCpG step (Scarsdale et al., 2011). This arrangement has been termed 'stair motif' and involves two conserved arginines (Zou et al., 2012). A molecular dynamics simulation study aimed at revealing further mechanistic principles through which hMBD recognizes methylated DNA (Zou et al., 2012). In contrast to experiments, such computer simulations are able to compare the physiological scenario where hMBD binds to methylated DNA to the hypothetical case when hMBD is put into contact with non-methylated DNA. In simulations of the complex with methylated DNA, hydrogen-bonds between Arg22 and Arg44 and the guanine bases of the mCpG step showed very small fluctuations and shifted the guanines to the minor groove. Thus, the stacking between guanine and 5-methyl-cytosine was reduced, whereas that of the arginines with either base was increased. Also, in the methylated form, tighter hydrogen-bonds having smaller length fluctuations were established between the arginines and cytosines (root mean square deviation (RMSD) 1.0 Å and 0.8 Å, respectively) than in the hypothetical complex with non-methylated DNA (RMSD 1.8 Å and 1.5 Å) (Zou et al., 2012).
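The comparison of hydrogen-bond length fluctuations can be illustrated with a short sketch. It assumes that the reported RMSD is the root mean square deviation of a bond length from its own mean over the trajectory, which is an interpretation and not stated in the text. The distance series are synthetic placeholders drawn from Gaussians with arbitrary means and spreads; they are not the published trajectories, and the function names are hypothetical.

```python
import math
import random


def length_fluctuation(distances):
    """Root mean square deviation of a distance series from its own mean (Å)."""
    mean = sum(distances) / len(distances)
    return math.sqrt(sum((d - mean) ** 2 for d in distances) / len(distances))


def synthetic_series(mean, spread, n_frames, seed):
    """Placeholder for donor-acceptor distances extracted from an MD trajectory."""
    rng = random.Random(seed)
    return [rng.gauss(mean, spread) for _ in range(n_frames)]


# Arbitrary illustrative values, NOT taken from Zou et al. (2012).
methylated = synthetic_series(mean=2.9, spread=0.2, n_frames=5000, seed=1)
unmethylated = synthetic_series(mean=3.4, spread=0.6, n_frames=5000, seed=2)

print("methylated fluctuation (Å):", round(length_fluctuation(methylated), 2))
print("non-methylated fluctuation (Å):", round(length_fluctuation(unmethylated), 2))
```

In a real analysis the two lists would be replaced by per-frame arginine-base distances from the methylated and non-methylated simulations; a smaller value indicates a tighter, less fluctuating hydrogen-bond.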
Schenkelberger et al. studied the role of the methyl-CpG binding domain of MeCP2 as a potential transcriptional modulator of the BDNF (brain-derived neurotrophic factor) promoter (Schenkelberger et al., 2017). In a cell-free expression extract from E. coli, hMBD2 functioned as a specific methylation- and sequence-dependent inhibitor. In the molecular dynamics simulations that were part of this study, we noticed changes in the fractional occupancy of the B-DNA conformer, as well as in the handedness and twisting of DNA in the methylated form upon MBD2 binding. These are characteristic for cooperative conformational transitions in the promoter region. Also, lower handedness values of DNA were adopted upon binding (untwisting took place), and the B-DNA conformation was partially disrupted. The major groove width was also reduced upon binding (Schenkelberger et al., 2017). In another purely computational study, we showed that methylation entropically favors the binding of the MBD domain of the human MeCP2 protein to C5-cytosine methylated DNA. The contribution of the binding enthalpy (bonded interactions, electrostatic interactions, van der Waals interactions, Poisson-Boltzmann solvation, and surface area contribution) was found to be very small. Fig. 4 displays a representative snapshot from the molecular dynamics simulations illustrating the contact surface formed between MeCP2 and the DNA.
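The thermodynamic statement above can be written in the usual end-point form. The notation is an assumption introduced here for illustration (it does not appear in the original study): the enthalpy is approximated by the sum of the listed energy terms, and the superscripts mC and C denote the methylated and non-methylated complexes.

```latex
\begin{align}
\Delta G_{\mathrm{bind}} &= \Delta H - T\,\Delta S, \\
\Delta H &\approx \Delta E_{\mathrm{bonded}} + \Delta E_{\mathrm{elec}}
          + \Delta E_{\mathrm{vdW}} + \Delta G_{\mathrm{PB}} + \Delta G_{\mathrm{SA}}, \\
\Delta\Delta G &= \Delta G_{\mathrm{bind}}^{\mathrm{mC}} - \Delta G_{\mathrm{bind}}^{\mathrm{C}}
          = \Delta\Delta H - T\,\Delta\Delta S \approx -T\,\Delta\Delta S
          \quad \text{if } |\Delta\Delta H| \ll |T\,\Delta\Delta S| .
\end{align}
```

Read this way, "entropically favored" corresponds to a negative contribution of the term -TΔΔS, while the enthalpic difference between the two complexes is close to zero.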

Mechanistic insights into UHRF1 binding
h/mUHRF1 is a multi-domain protein that binds to methylated DNA with its SRA domain via a base-flipping mechanism (Arita et al., 2008; Avvakumov et al., 2008; Hashimoto et al., 2008). With its tandem Tudor domain, hUHRF1 also binds to methylated Lys9 on histone hH3 (H3K9me2) (Rottach et al., 2010). In mouse, mutations that targeted the two domains resulted in decreased levels of maintenance methylation. mUHRF1 is an essential cofactor that supports DNMT1 in maintenance methylation during DNA replication, which is a crucial event for epigenome inheritance (Kurimoto et al., 2008). mDNMT1 is recruited to the replication fork by the proliferating cell nuclear antigen (mPCNA) and to hemi-methylated sites by mUHRF1 (Bostick et al., 2007; Sharif et al., 2007). Global demethylation in naïve embryonic stem cells appears to be driven by reduced protein levels of mUHRF1 and mDNMT1 (Iurlaro et al., 2017). Upon transition to the 2i stage, there is a global loss of mUHRF1 in concurrence with the loss of the mH3K9me2 methylation that is required for chromatin binding of mUHRF1, as mentioned before (van den Berg et al., 2010). In murine imprinting control regions (ICRs), H3K9 methylation functions as an anchor of local maintenance DNA methylation against global DNA demethylation and is assisted by several DNA binding proteins such as KAP1 and ZFP57 (Quenneville et al., 2011). Moreover, UHRF1 has an E3 ubiquitin-protein ligase activity (Nishiyama et al., 2013).
Recently, two studies explored the molecular details of the interaction between UHRF1, methylated CpGs and histone methylation. Fang et al. proposed that the binding of hUHRF1 to hemi-methylated DNA induces conformational changes that enhance its ability to recognize hH3K9me3. In addition, this resulted in a downstream interaction of the SRA domain with the RFTS sequence on hDNMT1 (Fang et al., 2016). In parallel, Harrison et al. showed that binding of hUHRF1 to hemi-methylated DNA and hH3K9me2/me3 via reciprocal cooperativity of DNA and histone binding domains activates ubiquitylation of multiple lysines on the hH3 tail in the vicinity of the hUHRF1 histone-binding site (Harrison et al., 2016). This binding process was found to be required for DNA methylation but is nonetheless dispensable for chromatin interactions (Harrison et al., 2016).

Fig. 4. MeCP2 bound to the methylated BDNF promoter sequence. Shown is a snapshot after 100 ns of molecular dynamics simulation. MeCP2 is shown in grey surface representation. The two methylated cytosine bases are shown as stick models.

Structural insights into C2H2 zinc finger proteins
The transcription factor Kaiso induces transcriptional repression of its target genes. In particular, Kaiso from Xenopus laevis binds specifically to the promoters of several genes of the Wnt signaling pathway (Kim et al., 2004;Park et al., 2005) that contributes to the maintenance of pluripotency in mouse and human ES cells, and promotes reprogramming of somatic cells to pluripotency (Marson et al., 2008). Interestingly, the Kaiso zinc finger DNA-binding domain uses similar mechanisms to bind either to the non-methylated, sequence-specific DNA target KBS, or to the promoter region of E-cadherin, which is symmetrically methylated. Superposition of the lowest energy structures of the NMR ensemble of both DNA sequences bound to Kaiso showed a high degree of structural alignment (Buck-Koehntop et al., 2012).
The bound Kaiso protein is composed of three domains, namely ZF1, ZF2, and ZF3. ZF3 is highly disordered in the unbound form and becomes ordered upon binding to DNA, when the third β-strand of the domain wraps around the backbone phosphate of DNA, forming a unique ββαβ motif. Additionally, the bound protein-DNA interface includes hydrophobic packing and hydrogen-bonding interactions in the ZF2 domain. Base-specific readouts are enabled by classical and methyl CH⋯O hydrogen-bonding interactions of DNA with the side chains of the N-terminal helices of the ZF1 and ZF2 domains. Gln563 in the ZF3 domain penetrates into the major groove of DNA to form hydrogen-bonds with G32 and C7 in the non-methylated form of DNA. In contrast, the methylated form of DNA has a slightly different geometry that makes it impossible for Gln563 to form hydrogen-bonds with the aforementioned bases in the major groove. Instead, Gln563 forms interactions with the phosphate backbone of C5. The zinc fingers are anchored to methylated as well as unmethylated DNA by a set of direct hydrogen-bonds, as well as water-mediated contacts to the phosphate backbone. Additionally, van der Waals interactions with the sugar rings are established (Buck-Koehntop et al., 2012).
mZFP57 is another C2H2 zinc finger protein that binds to specific stretches of DNA sequences. Together with its cofactor mKAP1, mZFP57 establishes and reinforces the activity of H3K9me, which functions as an anchor for local maintenance of DNA methylation (in imprinting control regions) in the face of simultaneous global demethylation. As a result, mH3K9 methylases such as mSETDB1 are recruited. ZFP57, its cofactor KAP1, and other effectors bind in a selective manner to imprinting control regions (ICRs) in ES cells that are methylated and modified by the aforementioned enzymes. ZFP57 is also involved in imprint establishment and the maintenance of paternal and maternal imprinted loci. Deletions in mZFP57 or mDNMTs lead to ICR DNA demethylation (Quenneville et al., 2011). Liu and colleagues determined the crystal structure of a mouse Zfp57 fragment containing the two zinc fingers ZF2 and ZF3 bound to a 10 bp oligonucleotide stretch. The two zinc fingers each contain two β-strands and one α-helix that coordinate a zinc ion tetrahedrally via two cysteines and two histidines from the β-strands and the α-helix, respectively. Both zinc fingers bind in the major groove of DNA and have 'direct readouts'. Therein, six base-pairs can form either direct or water-mediated hydrogen-bonds with the target amino acids. The carbonyl O4 atom of thymine 4 (T4) forms hydrogen-bonds with the hydroxyl group of Ser153, while the exocyclic amine group of adenine (A4) forms water-mediated bonds with Asp151. The three guanines in the GC stretch each form hydrogen-bonds with one arginine (namely Arg157, Arg178, Arg185). These bonds can either be direct contacts between protein and DNA residues or mediated via water molecules. The two methylated cytosines bind to the protein via completely different mechanisms: whereas 5mC at position 7 forms water contacts and is surrounded by an ordered layer of water molecules, 5mC at position 8 forms van der Waals contacts with the guanidinium group of arginine 178. Additionally, one of the carboxylate oxygen atoms of glutamate 182 interacts with the N4 atom of cytosine 8. The residues involved in base recognition are the same in mouse and human.
mKLF4 (Krüppel-like factor 4), one of the Yamanaka factors, functions to maintain the naïve pluripotency state. It is involved in regulating the expression of core pluripotency genes by binding to their upstream enhancer regions. By recruiting the protein cohesin, mKLF4 is crucial for maintaining the enhancer-promoter contacts of its target genes (Wei et al., 2013). A decline in Klf4 levels induces an unloading of cohesin and a consequent transcriptional repression of Oct4, which causes differentiation (Wei et al., 2013). Being a target of LIF/STAT3 signaling cascades, mKLF4 induces self-renewal of the pluripotency state. Upon overexpression, Klf4 promotes self-renewal of pluripotent cells even without LIF signaling.
Mechanistically, mKLF4 recognizes the CpG dinucleotide in a G/C rich stretch of nucleotides both in the methylated and the non-methylated forms. The consensus binding element for both forms shares a central GG(C/T)G, where the main constituent is either a 'usually' methylated CpG, or a TpG that is methylated on the complementary strand. mKLF4 contains three tandem C2H2 zinc fingers that bind in the major groove of DNA. The N-terminal ZF1 interacts with the 3' side of the DNA sequence, whereas ZF2 and ZF3 bind in the central region and on the 5' side, respectively. The interactions formed by the two methylated cytosines are very similar to the case of the ZFP57-DNA interaction. One 5-methylcytosine displays water-mediated contacts with the protein, while the guanidinium group of Arg443 bridges the contact between the other 5-methylcytosine (via van der Waals interactions) and the adjacent guanine G6 (by forming hydrogen-bonds). Arg449 seems not to be involved in the interactions between protein and DNA in its non-methylated form. The 5mC methyl group forms a weak (3.6 Å long) C-H⋯O hydrogen-bond with the carboxylate group of Glu446 (Gu et al., 1999). KLF4 exhibits similar affinities to DNA in the fully methylated, hemi-methylated, and non-methylated forms, with a slight preference for the fully methylated form.
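As a minimal sketch, candidate positions of the central GG(C/T)G element can be located with a simple pattern scan. The function name and the toy sequence are hypothetical, only the four-base core named above is matched (not a full KLF4 binding motif), only the given strand is scanned, and methylation status cannot be inferred from sequence alone.

```python
import re

# Zero-width lookahead so that overlapping occurrences are all reported.
CORE_ELEMENT = re.compile(r"(?=(GG[CT]G))")


def find_core_sites(sequence):
    """Return (0-based start, matched bases) for every GG(C/T)G on the given strand."""
    seq = sequence.upper()
    return [(match.start(), match.group(1)) for match in CORE_ELEMENT.finditer(seq)]


# Toy sequence for illustration only, not a real enhancer.
for start, core in find_core_sites("ATGGCGTTGGTGA"):
    print(start, core)
```

For the toy input the scan reports GGCG at position 2 and GGTG at position 8. A practical search would also scan the reverse complement and use a position weight matrix rather than a fixed core.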

The impact of cytosine methylation on the interaction of core pluripotency factors with DNA in the pluripotent state
Global DNA hypomethylation is a hallmark of the naïve epigenome in pluripotent cells, whereby low-methylated regions are active distal regulatory regions. Numerous studies explored global hypomethylation as a major feature of the "ground state" induced by 2i, together with the accompanying transcriptional profile (Hackett et al., 2013; Leitch et al., 2013; Sim et al., 2017). So far, the causal effectors of this global demethylation are poorly understood. Notably, when changing the ESC medium from serum to ground-state conditions (2i), the occupancies of core transcription factors on the genome reorganize despite marginal changes in genome-wide DNA methylation in mouse (Galonska et al., 2015). Global DNA demethylation is observed in conventional murine ESCs 1-3 days after inducing the transition to the ground state (Ficz et al., 2013). This, together with evidence on chromatin structure and transcription factor activity, supports the hypothesis that binding of the pluripotency transcription factors is generally not affected by the methylation status of their binding sites. There is indeed evidence from mouse experiments that the binding landscape of the core pluripotency factors reshapes local demethylation; i.e., this binding is a cause and not a consequence of low methylation levels (Stadler et al., 2011). The forward transition from naïve to primed pluripotency in murine ESCs involves global genomic reorganization of the binding states of core pluripotency factors and a subsequent remodeling of the enhancer landscape (Buecker et al., 2014; Factor et al., 2014). However, these dynamic changes seem mostly not to result directly from alterations in the expression of core transcription factors, but rather to be caused by a change in the global binding landscape and the redirection of core transcription factors by their binding partners (for example mOTX2) (Galonska et al., 2015).
Chromatin immunoprecipitation (ChIP) time course experiments showed that mOCT4 initiates the demethylation of H3K9me2 and depletion of H3 via recruiting the histone lysine demethylase mJMJD1C. Consequently, the DNA is prone to DNMT3a-mediated methylation (Shakya et al., 2015). ChIP-seq analysis confirmed the binding of mOCT4, mJMJD1C, and mFACT (facilitates chromatin transcription) to the enhancer region of Oct4, the Nanog promoter, as well as other gene targets of mOCT4. These proteins are functional cofactors that are crucial for reprogramming fibroblasts to pluripotency (Shakya et al., 2015). mOCT4 possesses a linker region (AA76-AA92) that differs from the linker sequence in other members of the Oct family but is nonetheless conserved across species (Esch et al., 2013). Linker residues exposed on the mOCT4 surface were suggested to contribute to the biological activity of OCT4. Mutations in asparagine residues 76-79 led to the formation of significantly fewer iPS cell colonies. Mutation of Leu80 had the strongest effect, as the reprogramming capacity was fully abolished and no colonies were formed. In murine embryonic stem cells, no difference between the wild-type and the mutants was detected with respect to the binding of the OCT4-SOX2 heterodimer. This suggests the presence of a different interaction partner of OCT4 whose interaction interface is disrupted by the mutation (Esch et al., 2013). Other studies then analyzed the full protein interactome involving mOCT4 (Pardo et al., 2010; van den Berg et al., 2010). mSMARCA4, a member of the SWI/SNF chromatin remodeling complex, was found to be one of the two proteins in the interactome network that exhibited dramatically reduced levels in the mutant interactome network of the L80A mOCT4 protein (Esch et al., 2013). Analysis of the pluripotency proteome showed that Smarca4/Brg1 is significantly overexpressed in 2i cells compared to serum (Taleahmad et al., 2015). The other protein that had lower levels in the mutant interactome of the L80A OCT4 protein is Chd4, a helicase belonging to the NuRD complex. As previously mentioned, h/mMBD2 interacts with the NuRD complex and directs it to methylated DNA (Zhang et al., 1999).

DNA methylation levels in regulatory elements of core pluripotency factors
The upstream region of the mouse Oct4 gene contains three regulatory elements, namely distal enhancer (DE), proximal enhancer (PE) and proximal promoter (PP). The two enhancers are differentially active depending on the developmental stage of the mouse embryo (Kellner and Kikyo, 2010). DE activates Oct4 expression in ICM, ES cells and primordial germ cells, whereas PE drives Oct4 expression in epiblast cells. DNA methylation of these three elements mirrors the expression of the Oct4 gene. In ES cells, they are unmethylated. In somatic cells which do not express Oct4, these elements are methylated. DNA methylation at the Sox2 super enhancer was found to have distinct effects on the cellular differentiation state (Song et al., 2019). Elevated methylation levels in a subset of naïve pluripotency gene promoters were found during the transition from pluripotency in mouse ESCs, including the Nanog promoter (Kalkan et al., 2017). However, the promoters of other naïve and general pluripotency factors did not gain methylation, showing that pluripotency-associated genes acquire methylation with different kinetics. Since expression was only poorly correlated to methylation levels, the authors suggested that promoter methylation may not be a major driver of transcriptional changes during exit from the naïve state, although they acknowledged that methylation might contribute to repressing certain genes that are relevant for the transition (Kalkan et al., 2017).
During differentiation of ESCs as well as in the mouse postimplantation embryo, DNMT3a and DNMT3b were found to mutually stimulate each other and interact synergistically to methylate the promoters of the Oct4 and Nanog genes. Double knockdown of Dnmt3a and Dnmt3b is associated with the downstream reactivation and transcription of Oct4 and Nanog. Nonetheless, the two methyltransferases seem to be dispensable for reprogramming to the pluripotent state in mouse (Pawlak and Jaenisch, 2011). Conversely, a ChIP analysis by Wu and colleagues showed that in mouse NIH/3T3 cells and CCE cells, mOCT4 binds directly to the −554 to −294 fragment of the upstream regulatory sequence of Dnmt1. Transfecting Oct4 siRNA into mouse CCE cells resulted in the downregulation of Dnmt1 expression (Wu et al., 2018).
mTET1-mTET3 were found to be involved in active demethylation of the regulatory regions of the core pluripotency genes. mTET2 mediates the conversion to 5-hydroxymethylcytosine (5hmC) in the promoters of Nanog, some of its target genes, as well as of the estrogen-related receptor-β (ESRRB). Esrrb is ubiquitously expressed in the naïve state of pluripotency and in the reprogramming system of OCT4-SOX2-KLF4-cMYC. It was noted in OSKM-transduced mouse embryonic fibroblast cells that hydroxymethylation can be an epigenetic mark distinct from demethylation (Doege et al., 2012). The mTET1 protein is highly expressed in the ground state of pluripotency, but it is downregulated during differentiation (Tahiliani et al., 2009). mTET1 was shown to interact with the mSIN3A co-repressor complex. h/mOCT4 is considered the hub of the core pluripotency network that regulates its own expression and the expression of h/mNANOG, h/mSOX2, as well as of h/mTET1 and h/mTET2 (Boyer et al., 2005; Koh et al., 2011). It was shown that Tet1 is capable of replacing Oct4 during somatic cell reprogramming, together with Sox2, Klf4, and c-Myc in a secondary reprogramming circuitry. Additionally, mTET1 was demonstrated to physically and functionally interact with the mNANOG protein (Costa et al., 2013) and to co-occupy many genomic loci, e.g., Esrrb. Neither mTET1 nor mTET2 was sufficient for the induction of pluripotency, but either enzyme can partner with mNANOG to enhance reprogramming of somatic cells to naïve pluripotency (Costa et al., 2013). mTET1 and mNANOG co-occupy the promoter of Oct4 and induce a rise in its 5hmC levels prior to reprogramming (Costa et al., 2013).
Thus, demethylation and hydroxymethylation both contribute to the process of reactivating regulatory genes that are vital for pluripotency in mouse. Olariu and colleagues proposed a stochastic model for methylation of the promoter of Oct4 (Olariu et al., 2016). This model suggests a positive feedback loop between mOCT4, mTET1, and mNANOG. According to this computational model, not only is Oct4 a main target for regulation by NANOG-TET1, but TET1 also regulates its own expression (Olariu et al., 2016). Such regulatory feedback control loops have been studied intensively in the literature. Commonly, Oct4, Nanog and Tet1 are the cornerstones in these circuits, supporting the direct involvement of DNA methylation in the control of the regulatory regions of core pluripotency factors. This circuit is additionally affected by the changing environment, which affects cell fate (Papatsenko et al., 2018). Recent studies added several protein layers to this network topology, with positive and negative interactions (Dunn et al., 2014; Xu et al., 2014). Ravichandran and colleagues found that the retinoid inducible nuclear factor in mouse, mRinf, forms complexes with mTet1-mNanog-mOct4 and facilitates the proper recruitment of these factors to the regulatory regions of their own genes (Ravichandran et al., 2019).
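The idea of a positive feedback loop acting on promoter methylation can be illustrated with a deliberately simplified toy simulation. This is not the model of Olariu et al. (2016): the update rule, all parameter names and all parameter values are assumptions chosen for illustration. The promoter is represented by a fixed number of CpG sites, expression is approximated by the unmethylated fraction, and higher expression raises the per-site demethylation probability (standing in for OCT4/NANOG-dependent TET1 activity).

```python
import random


def simulate_promoter(n_cpg=20, steps=200, p_meth=0.05, p_demeth_basal=0.01,
                      feedback=0.2, start_methylated=20, seed=0):
    """Discrete-time stochastic toy model; returns methylated CpG counts per step."""
    rng = random.Random(seed)
    methylated = start_methylated
    trajectory = [methylated]
    for _ in range(steps):
        # Expression proxy: fraction of unmethylated CpG sites in the promoter.
        expression = 1.0 - methylated / n_cpg
        # Positive feedback: expression increases the demethylation probability.
        p_demeth = min(1.0, p_demeth_basal + feedback * expression)
        lost = sum(1 for _ in range(methylated) if rng.random() < p_demeth)
        gained = sum(1 for _ in range(n_cpg - methylated) if rng.random() < p_meth)
        methylated = methylated - lost + gained
        trajectory.append(methylated)
    return trajectory


with_feedback = simulate_promoter(feedback=0.2, seed=1)
without_feedback = simulate_promoter(feedback=0.0, seed=1)
print("final methylated CpGs with feedback:", with_feedback[-1])
print("final methylated CpGs without feedback:", without_feedback[-1])
```

Varying the hypothetical `feedback` parameter shows how strongly the steady-state methylation level of such a toy promoter depends on the coupling between expression and demethylation.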

General roles of cytosine methylation in cell differentiation
Excellent reviews have documented the changes of global DNA methylation patterns upon differentiation of ESCs (Kim and Costello, 2017). Here, in continuation of the previous section, we briefly review the existing knowledge on whether and how differentiation affects the methylation levels at the enhancers and promoters of key pluripotency genes, and which writer and eraser proteins are involved.
In preparation for differentiation, DNA hypermethylation is induced at the promoters of many pluripotency and germline-specific genes, e.g., Oct4 and Nanog. During differentiation, DNA methylation co-occurs with nucleosome assembly so that the binding of transcription factors is inhibited. It was shown that both mDNMT3a and mDNMT3b are required for de novo methylation of the promoters of Oct4 and Nanog. On the other hand, nucleosome-depleted regions (NDRs) are enriched in the proximal promoters or the distal enhancers of pluripotency factors, e.g., hNANOG and hOCT4, respectively. Such NDRs are normally unmethylated and hence prone to binding by core pluripotency transcription factors, which again promotes the expression of hOCT4 and hNANOG. Differentiation leads to increased nucleosome occupancy, which precedes de novo DNA methylation. For example, forced expression of hOCT4 during differentiation was found to restore the NDRs but only in cells with unmethylated enhancers (You et al., 2011).
The epigenomic machinery responsible for differentiation is an orchestrated cellular "program" involving different players. First, a repressor binds to the proximal enhancer regions of the target genes. Then the protein mG9a, also known as euchromatic histone-lysine N-methyltransferase 2 (mEHMT2), mediates mH3K9 methylation, followed by recruitment of heterochromatin protein 1 (mHP1) and then de novo DNA methylation (Feldman et al., 2006). Artificial targeting of HP1 to the promoter region of OCT4 induces OCT4 silencing. In cells other than pluripotent cells, this silencing remains heritable after the removal of HP1. Nonetheless, in mouse embryonic fibroblasts derived from ESCs, this removal leads to concurrent OCT4 demethylation and expression (Hathaway et al., 2012). mDNMT3a and mDNMT3b are recruited and can initiate methylation at the proximal enhancer (Athanasiadou et al., 2010). The histone H3 Lys9 methylases G9a and GLP were shown to be essential for the maintenance of DNA methylation at imprinted loci in murine ESCs, as they recruit the de novo DNMTs (Zhang et al., 2016b). Other studies showed that G9a may not be required for DNA methylation at the Oct4 enhancer (Athanasiadou et al., 2010).
Establishing DNA methylation at specific gene promoters requires cooperative action of mLSH, a member of the SNF2 ATPase family (encoded by the Hells gene), and the mG9a/GLP complex of histone methylases (Myant et al., 2011). Differentiating ESCs lacking mLSH showed discrepancies in methylation between neighboring CpG regions. As a result, many genes were deregulated in fibroblasts lacking the LSH protein (Athanasiadou et al., 2010). mLSD1 (also termed Kdm1A) is a histone demethylase that removes methyl groups from mH3K4me and mH3K9me. The mLsd1-Mi2/NuRD complex was found to be involved in a regulatory switch that stimulates the interaction of H3K4-unmethylated histone tails with the ATRX-DNMT3-DNMT3L (ADD) domain of mDNMT3 and subsequent DNA methylation at the enhancers of some pluripotency genes (Petell et al., 2016). The interaction of the ADD domain of h/mDNMT3a with H3K4 is blocked by H3K4 methylation (Guo et al., 2015). Petell and colleagues suggested that for mDNMT3a to be active, mLSD1-Mi2/NuRD-mediated histone deacetylation and demethylation should precede mDNMT3a-induced methylation of the enhancer region of pluripotency genes (Petell et al., 2016). This histone-mediated effect is triggered by the dissociation of the OSN (OCT4-SOX2-NANOG) co-activator complex. Suppression of some genes in DNMT3a-KO cells was only partly affected when compared to wild-type cells. Hence, DNMT3a-mediated DNA methylation is one of several processes that together cause the inactivation of pluripotency genes. On the other hand, inhibition of mLSD1 was not correlated with the level of mDNMT1 protein during induction of differentiation of ESCs, nor was mLSD1 associated with global DNA methylation. In addition, mDNMT3-KO cells almost completely lacked methylation in the differentiated states. This suggests that the presence of mDNMT1 cannot compensate for the absence of mDNMT3a in initiating methylation at the enhancers of pluripotency genes (Petell et al., 2016).
In the process of hematopoietic maturation, mLSD1 ensures that the regulatory regions of hematopoietic stem and progenitor cell (HSPC) genes are inhibited (Kerenyi et al., 2013). After loss of mDNMT3a, multipotency genes are upregulated, whereas genes responsible for differentiation are downregulated (Challen et al., 2012). This suggests that methylation during hematopoiesis may follow a mechanism similar to the one operating during ESC differentiation.
ESCs lacking DNMT1 are not viable. When differentiation is induced, they undergo apoptosis. On the other hand, ESCs deficient for DNMT3a/DNMT3b partially or completely lose their ability to differentiate. These cells remain viable in the pluripotent state, and they have high histone acetylation levels. If DNA methylation is reestablished in these cells, they regain their ability to differentiate (Jackson et al., 2004). Lastly, recruitment of DNMTs to murine imprinting control regions was found to antagonize TET-dependent DNA demethylation (Zhang et al., 2016b).

Effects of epigenetic drugs
Epigenetic drugs, or 'epidrugs', are a new class of drug molecules that induce or inhibit histone-modifying enzymes, DNA methyltransferases, or the readers of the resulting chromatin modifications. As the activity of the drug targets directly affects chromatin state, epidrugs crucially affect a wide spectrum of cellular processes. Naturally, the efficacy of epidrugs also depends on the 3D structure of chromatin, which in turn is context-dependent. Hence, epidrugs can be expected to have different effects in different cell types. It has been argued that epidrugs have potential in a wide variety of conditions and applications, including neurodegenerative diseases (Delgado-Morales et al., 2017), tumors, regenerative medicine and metabolic disorders (Altucci and Rots, 2016).
One obvious starting point for the therapeutic use of epigenetic drugs is the observation that anomalies in DNA methylation often occur during tumorigenesis. In particular, tumor genomes show global hypomethylation coupled with hypermethylation of specific loci. Many of the sequence segments showing hypomethylation in tumors are originally transposable and parasitic elements, which are hypermethylated under normal conditions (Bestor and Tycko, 1996; Lee et al., 2012; Xie et al., 2013). Hypermethylation, on the other hand, was shown to impair the beneficial activity of tumor-suppressor genes and induce inactivation of DNA-repair genes, especially in the context of CpG islands (Baylin and Herman, 2000). In the light of the observed global hypomethylation, it may come as a surprise that the expression levels of hDNMT1 were shown to be higher in tumor than in normal tissues, e.g., in colon cancer (El-Deiry et al., 1991) as well as in acute and chronic myelogenous leukemia (Mizuno et al., 2001) (here also hDNMT3a and hDNMT3b), which can be associated with the hypermethylation of CpG islands (Vertino et al., 1996). The role of DNMT1 in tumorigenesis is not uniform. On the one hand, DNMT1 was shown to possess tumor suppressor activity in some tumor forms, e.g., in early-stage murine prostate cancer (Kinney et al., 2010). On the other hand, it had oncogenic activity in later stages of the same cancer, thus playing opposing roles (Kinney et al., 2010). Other proposed mechanisms attribute tumorigenesis to aberrant expression of h/mDNMTs during different stages of the cell cycle (Jones, 1996; Zhang and Xu, 2017).
One of the proposed mechanisms involves aberrant protein-protein interactions involving h/mDNMTs. DNMT1 interacts, for example, with proteins that affect its nuclear localization (e.g., h/mPCNA) or chromatin targeting (e.g., h/mHDAC classes I and II) (Robertson, 2001). Two DNMT inhibitors, azacitidine and decitabine, have been approved by the FDA as treatments for acute myeloid leukemia and chronic myelomonocytic leukemia, respectively (Mazzone et al., 2017). Azacitidine and decitabine are structurally related cytidine nucleoside analogs that become intracellularly activated by triphosphorylation. Decitabine-TP is exclusively incorporated into DNA, while azacitidine-TP is incorporated primarily into RNA (Oellerich et al., 2019). It was recently shown that decitabine-TP, but not azacitidine-TP, is an activator and substrate of the triphosphohydrolase SAMHD1 (Oellerich et al., 2019). Fig. 5 shows the active site region in the X-ray structure of decitabine-TP bound to SAMHD1. Ongoing clinical trials of next-generation hypomethylating agents (HMAs), such as guadecitabine and cedazuridine, are described by Pan et al. (2020).
DNMT inhibitors (DNMTi) also modulate several layers of the immune system. In this respect, substances inducing DNA hypomethylation offer a feasible approach to tumor therapy. Cancer testis antigens (CTA) are a family of tumor-associated antigens (TAA), which elicit an immune response (by cytotoxic T cells) against tumors. These proteins are normally not expressed in human somatic tissues, except for testis and placenta, but they are expressed in several tumors. In normal tissues, CTA genes have methylated promoters so that they are transcriptionally inactive (Fratta et al., 2011), and they are activated by demethylation (De Smet et al., 1999). Antigen presentation is another immunological mechanism that the body uses in its defense. In mouse or human tumors, DNMTi were found to induce this machinery and propagate the downstream lysis of neoplastic cells by cytotoxic T cells (Khan et al., 2008; Manning et al., 2008; Nie et al., 2001; Sigalotti et al., 2004).

Conclusions
It is now well accepted that there exists a compact group of core pluripotency factors that are responsible for maintenance of pluripotency and/or induction of differentiation via regulating the expression of hundreds to thousands of target genes. Furthermore, the importance of epigenetic alterations (DNA methylation and histone marks) in these processes has also been well documented. As presented here, there exists an intricate interplay between the core pluripotency factors and the enzymes of the DNMT family. For example, as discussed in section 4, the transcriptional activity of mDnmt1 was regulated by direct binding of Oct4 to the upstream regulatory element of Dnmt1 (Wu et al., 2018). On the other hand, DNMT3a and DNMT3b were found to mutually stimulate each other and interact synergistically to methylate the promoters of the Oct4 and Nanog genes. Similar ideas are contained in the computational model of Olariu et al. (2016), who formulated a regulatory network of Oct4, Nanog and Tet1 including positive feedback loops involving DNA demethylation around the promoters of Oct4 and Tet1. We suggest that following up on these three studies would be a worthwhile goal that may reveal further surprising insights into the interplay of transcriptomic and epigenomic regulation of differentiation processes. A challenge here is to place or remove epigenetic marks at specified genomic positions under in vivo conditions and to study their phenotypic effects. We hope that this review will help to stimulate further work along these lines.