The DNA methylation machinery

Cytosine methylation is an important epigenetic signalling mechanism that regulates gene expression. The presence of 5-methylcytosine (5-mC) at promoters is associated with gene silencing. In this chapter, we describe our current understanding of the readers, writers and erasers of methylcytosine.


DNA Methylation / Methylcytosine
In eukaryotic cells, the addition of methyl groups (CH3) to DNA occurs almost exclusively on cytosine bases, more specifically at cytosines that lie directly 5′ to a guanine, forming a CpG dinucleotide (Figure 1A). Methylation occurs at the 5th atom of the 6-atom ring (Figure 1B), resulting in the formation of 5-methylcytosine (5-mC). On double stranded DNA, a CpG dinucleotide on one DNA strand pairs with a complementary CpG dinucleotide on the opposite DNA strand. If a cytosine on one DNA strand is methylated then its opposite counterpart will also be methylated (Figure 1A; see Chapter 3 - Methylcytosine and its oxidised derivatives for further details). 5-mC behaves in the same way as unmethylated cytosine: it pairs with guanine in double stranded DNA (dsDNA) and acts as the complementary base for guanine during transcription (not changing the genetic function of DNA is a defining feature of an epigenetic modification).
In humans, and other mammals, most CpG dinucleotides contain methylated cytosine (5-mC) and these methylated CpGs are distributed sparsely throughout the genome (Ziller et al. 2013). They are found at a low frequency in gene body regions and more densely at endogenous repeats and transposable elements (Bird 2002).
The presence of 5-mC in gene body regions is associated with high levels of gene transcription, while the more densely occurring 5-mC found at transposable elements, such as LINE-1, suppresses transcription and mobility (Schulz et al. 2006). Throughout the mammalian genome, CpG dinucleotides are relatively rare due to the spontaneous deamination of 5-mC to thymine (Lander et al. 2001) (Figure 2). However, unmethylated CpG dinucleotides are more abundant in regions called CpG islands. These range in length from about 300 bp to 3 kb and can be found at the promoter region, often encompassing the first exon, of approximately 70% of all human genes (Figure 3). There are approximately 27,000 CpG islands in the non-repetitive regions of the human genome (Rollins et al. 2005; Saxonov et al. 2006; Deaton and Bird 2011).
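The strand symmetry described above follows from CpG being its own reverse complement: wherever one strand reads 5′-CG-3′, so does the other. The following minimal Python sketch makes this explicit. The function names and the example sequence are illustrative assumptions, not taken from this chapter.

```python
# Illustrative sketch: every CpG on one strand sits opposite a CpG on the other.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def cpg_positions(seq: str) -> list[int]:
    """0-based index of the cytosine in every CpG dinucleotide on this strand."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def opposite_strand_cytosines(seq: str) -> list[int]:
    """Top-strand coordinates of the CpG cytosines found on the opposite strand.

    Index j on the reverse complement lies opposite index len(seq) - 1 - j
    on the top strand.
    """
    bottom = reverse_complement(seq)
    return sorted(len(seq) - 1 - j for j in cpg_positions(bottom))


if __name__ == "__main__":
    top = "ATCGGACGT"  # hypothetical example sequence
    print(cpg_positions(top))              # cytosines of top-strand CpGs
    print(opposite_strand_cytosines(top))  # each lies opposite the G of a top-strand CpG
```

For each top-strand CpG cytosine at index i, the opposite-strand CpG cytosine lies at index i + 1, that is, opposite the guanine of the same dinucleotide. This is why a single CpG site carries two methylatable cytosines, one per strand.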
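The contrast between genome-wide CpG depletion and local CpG enrichment in islands can be expressed as an observed/expected CpG ratio. The sketch below computes that ratio and GC content for a sequence. The thresholds (length of at least 200 bp, GC content above 50%, observed/expected ratio above 0.6) are commonly used sequence-based cut-offs and are assumptions here; they are not stated in this chapter, and the function names are hypothetical.

```python
# Illustrative sketch: sequence-based statistics often used to flag CpG islands.

def cpg_stats(seq: str) -> dict:
    """Return length, GC content and observed/expected CpG ratio for a sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("sequence is empty")
    c, g = seq.count("C"), seq.count("G")
    observed = seq.count("CG")   # "CG" cannot overlap itself, so count() is exact
    expected = c * g / n         # CpGs expected if C and G were placed independently
    return {
        "length": n,
        "gc_content": (c + g) / n,
        "obs_exp": observed / expected if expected else 0.0,
    }


def looks_like_cpg_island(seq: str, min_length: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply the assumed cut-offs; real annotation pipelines use sliding windows."""
    stats = cpg_stats(seq)
    return (stats["length"] >= min_length
            and stats["gc_content"] > min_gc
            and stats["obs_exp"] > min_obs_exp)


if __name__ == "__main__":
    cpg_rich = "CG" * 150        # hypothetical, artificially CpG-dense sequence
    cpg_depleted = "CATG" * 75   # same length, contains C and G but no CpG
    print(cpg_stats(cpg_rich), looks_like_cpg_island(cpg_rich))
    print(cpg_stats(cpg_depleted), looks_like_cpg_island(cpg_depleted))
```

A ratio well below 1 corresponds to the depletion attributed above to deamination of 5-mC, while a ratio near or above 1 is what distinguishes island-like regions.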
The expression pattern of genes determines the phenotype of cells, and which genes are expressed in a cell is principally determined by two factors:
1) Gene expression is regulated by the binding of appropriate (lineage-specific) transcription factors (TFs) at the promoter region. These TFs are required to recruit RNA polymerase to the start of transcribed gene regions. The presence of cell-specific transcription factors is determined during embryogenesis and is also regulated via cell signalling (Spitz and Furlong 2012).
2) For transcription factors and RNA polymerase to bind to gene promoter regions, the chromatin (DNA and associated proteins) at that region must be in an uncondensed state; that is, the number of histones in that region must be low, allowing ready access to the naked DNA.
A complex network of histone modifying proteins regulates the level of chromatin condensation (see Chapter 4 - The role of nucleosomes in epigenetic gene regulation for further details). Two key regulators are histone acetyltransferases (HATs) and histone deacetylases (HDACs). HATs add acetyl groups (CH3CO (Ac)) while HDACs remove these from specific amino acids on histones (Verdin and Ott 2014).
The methylation status of CpGs, and in particular CpG islands, determines the acetylation status of localized histones which, in turn, regulates the condensation of that region of chromatin and the accessibility of associated promoters, and thus regulates gene expression. The mechanism by which this process occurs will be discussed in the next section.

Readers of methylcytosine: DNA methylation and gene expression
The methylation status of CpGs can be read by three types of interacting proteins: 1) the methyl-CpG binding domain (MBD) protein family, 2) the Kaiso and Kaiso-like proteins and 3) the UHRF proteins (ubiquitin-like with PHD and RING finger domains). As the name suggests, the MBD family of proteins binds to regions containing double stranded methylated CpG (5-mCpG) via a conserved binding domain (Nan et al. 1993), while the Kaiso and Kaiso-like proteins bind to methylated DNA via zinc finger motifs (Prokhortchouk et al. 2001; Filion et al. 2006).
Both the MBD and Kaiso/Kaiso-like proteins repress the transcription of the DNA region they bind to (Hendrich and Bird 1998; Filion et al. 2006). However, the manner in which they target DNA is slightly different: Kaiso proteins bind to two consecutive methylated CpG sites, whereas the MBD proteins preferentially bind to single CpG dinucleotides (Daniel et al. 2002; Sasai et al. 2010). UHRF proteins do not repress transcription; instead, they help ensure that both strands of DNA maintain 5-mCpG patterns during DNA replication. UHRF proteins bind to hemimethylated DNA (DNA where a CpG cytosine is methylated on only one strand of double stranded DNA) and recruit the DNA methyltransferase DNMT1 to that location (Bostick et al. 2007; Sharif et al. 2007). See Section 3 - Writers of methylcytosine: DNA methyltransferases (DNMTs) within this chapter for further details regarding the role of DNMTs.

Methyl-CpG binding domain proteins (MBD)
The MBD family comprises 7 proteins: MeCP2 (methyl-CpG-binding protein 2), MBD1, MBD2, MBD3, MBD4, MBD5 and MBD6 (Hendrich and Bird 1998; Baymaz et al. 2014). All of these proteins contain a 60-85 amino acid (aa) MBD domain, through which most of them bind directly to methylated DNA (MBD5 and MBD6 are exceptions, as discussed below). MeCP2, MBD1 and MBD2 also bind to transcriptional repressors via a transcriptional repression domain (TRD) (Wolffe et al. 1999; Boeke et al. 2000), while MBD3 interacts with transcriptional repressor proteins via a C-terminal coiled-coil domain (Hendrich and Bird 1998) (Figure 4). MBD proteins can bind to different yet overlapping partner proteins via their TRD. A key family of transcriptional repressors is the histone deacetylases (HDAC1, 2 and 3). These HDACs can bind to MeCP2 and MBD1-3 (Ng et al. 1999; Kokura et al. 2001; Saito and Ishikawa 2002; Shyh-Chang et al. 2013; Lyst et al. 2013). By bringing HDACs and other histone-modifying molecules to regions of methylated DNA, MBD proteins can direct the formation of heterochromatin. Heterochromatin consists of highly condensed chromatin that is often inaccessible to the transcriptional machinery and is associated with low levels of acetylation (due to the activity of HDACs bound to MBD proteins) (Taddei et al. 2001) at histone H3 on lysine residues 9, 14 and 18, and histone H4 on lysine residue 16 (Verdin and Ott 2014). In contrast, these regions often contain high levels of methyl groups on specific histones, such as histone H3 methylation on lysine 9 (Fuks et al. 2003). MeCP2, MBD1 and MBD2 bind to histone methyltransferase proteins (SUV39H1, SETDB1 and PRMT5) (Lunyak et al. 2002; Fujita et al. 2003; Sarraf and Stancheva 2004; Le Guezennec et al. 2006), which facilitate the methylation of local histones (for further information regarding MBDs and their binding partners see Du et al. 2015).
Heterochromatin is initiated by the juxtaposition of individual nucleosomes into dense nucleosome arrays by nucleosome remodeling complexes (see Chapter 4 - The role of nucleosomes in epigenetic gene regulation for further details). Two of these complexes are NuRD/Mi-2, a nucleosome remodeling complex with histone deacetylase activity (conferred by HDAC1 and HDAC2) (Xue et al. 1998), and the SWI/SNF ATP-dependent chromatin remodeling complex BAF (for a general review of SWI/SNF complexes see Kadoch and Crabtree 2015). MeCP2 binds to the BAF complex via the BRM (also known as SMARCA2) subunit (Harikrishnan et al. 2005) and MBD2 and 3 bind to the NuRD/Mi-2 complex (Ramírez et al. 2012; Baubec et al. 2013). Thus, the recognition of methylated DNA by MeCP2, MBD1 and MBD3 results in modification of local histones (deacetylation and methylation). This facilitates the binding of nucleosome-remodeling protein complexes (also directed to that location by MBD proteins) and the remodeling of nucleosomes to reduce accessibility. This decrease in accessibility closes access to the local DNA for transcription factors and RNA polymerase, resulting in transcriptional silencing.
MBD4 does not play a significant role in transcription repression. Rather, it contains a glycosylase domain that facilitates mismatch repair (mC>T or C>U) at CpG dinucleotides (Hendrich et al. 1999; Petronzelli et al. 2000). Both MBD5 and 6 remain to be fully characterized; despite containing MBD domains, there is no evidence that they bind to methylated DNA (Laget et al. 2010). However, they do bind to the human polycomb deubiquitinase complex (PR-DUB), which removes ubiquitin from histone H2AK119 (Baymaz et al. 2014).

Writers of methylcytosine: DNA methyltransferases (DNMTs)
Humans possess five DNA methyltransferase enzymes (DNMTs): DNMT1, DNMT2, DNMT3A, DNMT3B and DNMT3L. Of these, DNMT1, DNMT3A and DNMT3B add methyl groups to cytosine bases at CpG dinucleotides within DNA (Lyko 2017) using S-adenosylmethionine as the methyl donor (Du et al. 2016). DNMT1, DNMT3A and DNMT3B contain a C-terminal catalytic domain responsible for methylating CpGs and varying N-terminal regulatory domains (Denis et al. 2011) (Figure 5). Although DNMT2 and DNMT3L share high levels of conservation with the other DNMTs, they do not possess catalytic domains capable of methylating CpG dinucleotides and have smaller N-terminal regulatory domains (Aapola et al. 2000; Dong et al. 2001).

The function of DNMTs: maintenance methylation
As described in Section 2 of this chapter, many of the readers of DNA methylation only recognise fully methylated double stranded CpG dinucleotides, in which the cytosine bases on both strands are methylated. To maintain methylation during DNA replication, each newly synthesised daughter DNA strand must be methylated using the original methylated DNA strand as a template. This maintenance methylation is the principal role of DNMT1 (Bestor et al. 1988; Li et al. 1992).
DNMT1 is directed to the replication fork during S phase by binding to HDAC2 and DNA methyltransferase-associated protein 1 (DMAP1) (Rountree et al. 2000). DNMT1 also binds to PCNA (proliferating cell nuclear antigen) (Chuang et al. 1997) and to DNA directly at the replication fork (Suetake et al. 2006). Once at the replication fork, DNMT1 recognises hemimethylated DNA by directly interacting with the UHRF proteins (Bostick et al. 2007; Sharif et al. 2007; Berkyurek et al. 2014). Once DNMT1 is in position, its catalytic domain covers the hemimethylated DNA region, and both the selection of the cytosine for methylation and the process of methylation occur within the catalytic domain (Song et al. 2012; Bashtrykov et al. 2012). In addition to this maintenance methylation, DNMT1 (and DNMT3B) can facilitate de novo methylation of promoter regions via interactions with MBD3-NuRD/Mi-2 complexes (Cai et al. 2014).
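The logic of maintenance methylation can be illustrated with a toy model of a single CpG site. In the sketch below, replication hands each daughter duplex one parental strand and one new, unmethylated strand, and a DNMT1-like step then copies the mark onto the new strand only where the site is hemimethylated. The class and function names are hypothetical, and the model deliberately ignores UHRF recruitment, PCNA binding and enzyme fidelity.

```python
# Toy model (an illustration under simplifying assumptions, not a biochemical
# simulation): one CpG site, one methylation flag per strand.
from dataclasses import dataclass


@dataclass(frozen=True)
class CpGSite:
    top: bool      # is the top-strand cytosine methylated?
    bottom: bool   # is the bottom-strand cytosine methylated?

    @property
    def state(self) -> str:
        if self.top and self.bottom:
            return "fully methylated"
        if self.top or self.bottom:
            return "hemimethylated"
        return "unmethylated"


def replicate(site: CpGSite) -> tuple[CpGSite, CpGSite]:
    """Semi-conservative replication: each daughter keeps one parental strand
    and gains a newly synthesised, unmethylated partner strand."""
    return (CpGSite(top=site.top, bottom=False),
            CpGSite(top=False, bottom=site.bottom))


def maintain(site: CpGSite) -> CpGSite:
    """DNMT1-like step: methylate the new strand only at hemimethylated sites."""
    if site.state == "hemimethylated":
        return CpGSite(top=True, bottom=True)
    return site  # unmethylated sites are not methylated de novo in this model


if __name__ == "__main__":
    parent = CpGSite(top=True, bottom=True)
    daughters = replicate(parent)
    print([d.state for d in daughters])            # before maintenance
    print([maintain(d).state for d in daughters])  # after maintenance
```

Because the maintenance step acts only on hemimethylated sites, the model preserves both methylated and unmethylated states across divisions, which is the property that makes the pattern heritable.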

The function of DNMTs: establishment of DNA methylation
Appropriate DNA methylation in mammalian germ cells is essential for fertility and the viability of the resulting embryo (Messerschmidt et al. 2014). Primordial oocyte cells undergo genome-wide demethylation (Seisenberger et al. 2012); the growing oocytes (not yet fertilised) then proceed to develop complex, organised DNA methylation patterns. During this period CpG islands associated with maternally imprinted genes are also methylated. This de novo methylation is carried out by DNMT3/DNMT3L complexes (Okano et al. 1998; Okano et al. 1999) and appears to be associated with histone modifications, such as histone H3 lysine 36 trimethylation (H3K36me3), that are associated with transcriptional activity, suggesting that transcriptional activity is required in the non-dividing oocyte prior to gene silencing (Smallwood et al. 2011; Stewart et al. 2015). However, the precise timing and manner in which these enzymes are directed to specific regions of the oocyte genome remain to be fully clarified. Mammalian sperm cells also have distinct patterns of DNA methylation. However, unlike oocytes, sperm progenitor cells undergo repeated mitoses at the onset of puberty, so the activity of DNMT1 is required to maintain the pattern of methylation (Marques et al. 2011). Shortly after fertilisation the embryonic genome undergoes a wave of demethylation (Sasaki and Matsui 2008) (see Section 4 in this chapter for mechanistic details of DNA demethylation).
The unmethylated DNA of the embryo is then re-methylated by DNMT3/DNMT3L complexes. DNMT3/DNMT3L methylate DNA in a non-selective manner: they have a preference for CpG dinucleotides but can also methylate CpH (H = A, C or T) dinucleotides (Gowher and Jeltsch 2001). Recent advances in sequencing technology have enabled the identification of frequent CpH methylation in embryonic stem cells and neurones (Lister et al. 2009; Ziller et al. 2011). Differences in cell-specific levels of CpH methylation appear to be a consequence of differential expression of DNMT3A and DNMT3B (Lee et al. 2017). The mechanism that determines which DNA sequences are de novo methylated by DNMT3/DNMT3L is unclear. However, it is known that, in addition to the preference for CpG dinucleotides, the choice of target is influenced by the surrounding sequence; CpGs with purine bases on the 5' flank and pyrimidines on the 3' flank are preferentially methylated (Lin et al. 2002). DNMT3 binds directly to histone H3 tails (Ooi et al. 2007) and this binding allosterically activates DNMT3, resulting in the methylation of DNA local to that histone (Guo et al. 2014b; Baubec et al. 2015). The presence of large modifications on H3K4 (tri-methylation or acetylation) prevents binding of DNMT3 and associated DNA methylation (Ooi et al. 2007). Genome-wide methylation studies have shown a strong correlation between the absence of H3K4me3 and DNA methylation (Weber et al. 2007; Meissner et al. 2008), suggesting that this mechanism is important in setting global DNA methylation patterns. DNMT3 also binds to histone H3 when it is trimethylated at lysine 36 (H3K36me3). This histone mark is found in gene bodies and at exon-intron boundaries and correlates with transcriptionally active regions. The presence of H3K36me3 is also inversely correlated with H3K4me3 (Vakoc et al. 2006; Barski et al. 2007; Guenther et al. 2007).
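The sequence preferences described above (CpG over CpH, and purine 5' flank with pyrimidine 3' flank) can be written as a simple context classifier. The sketch below is illustrative only: the function names are hypothetical, the sequence is assumed to contain only A, C, G and T, and treating each "flank" as the single base immediately adjacent to the CpG is a simplifying assumption rather than something stated in this chapter.

```python
# Illustrative sketch: classify a cytosine's dinucleotide context and check
# whether a CpG has the flanks described as preferred for de novo methylation.
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def cytosine_context(seq: str, i: int) -> str:
    """Return 'CpG', 'CpH' (H = A, C or T) or 'unknown' for the cytosine at index i."""
    seq = seq.upper()
    if seq[i] != "C":
        raise ValueError(f"position {i} is {seq[i]!r}, not a cytosine")
    if i + 1 >= len(seq):
        return "unknown"  # no 3' neighbour, so the context cannot be called
    return "CpG" if seq[i + 1] == "G" else "CpH"


def has_preferred_flanks(seq: str, i: int) -> bool:
    """True if the CpG whose cytosine is at index i has a purine immediately 5'
    and a pyrimidine immediately 3' (single-base flanks are an assumption)."""
    seq = seq.upper()
    if cytosine_context(seq, i) != "CpG":
        return False
    if i == 0 or i + 2 >= len(seq):
        return False  # a flank is missing at the end of the sequence
    return seq[i - 1] in PURINES and seq[i + 2] in PYRIMIDINES


if __name__ == "__main__":
    example = "ACGTTCGA"  # hypothetical sequence with CpGs at indices 1 and 5
    for idx, base in enumerate(example):
        if base == "C":
            print(idx, cytosine_context(example, idx), has_preferred_flanks(example, idx))
```

Such a classifier only captures the sequence component of targeting; the histone-dependent recruitment described in the same paragraph would need chromatin data that a sequence alone does not provide.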

CpG Islands
As described previously, CpG islands are CpG-rich regions that are associated with the promoter region of approximately 70% of human genes. The majority of these CpG islands are not methylated during early embryogenesis, and are associated with histone H3 modifications (H3K27me3 and H3K4me3); H3K27me3-marked regions of DNA are prone to undergo methylation during differentiation and oncogenesis (Ohm et al. 2007). The presence of transcription factors at regions of active transcription is associated with the absence of DNA methylation. This was observed more than 20 years ago (Brandeis et al. 1994; Macleod et al. 1994; Han et al. 2001) and has more recently been further validated by Genome-Wide Association Studies (GWAS) indicating that promoter region variations (and associated changes in transcription factor binding) correlate with changes in localised DNA methylation (Gutierrez-Arcelus et al. 2013). These results suggest that the cell-specific differential presence of transcription factors plays a significant role in determining the methylation status and chromatin pattern of different cell types during embryogenesis. By the blastocyst implantation stage of development the expression of DNMT3A, B and L is reduced and the expression of DNMT1 is increased (Huntriss et al. 2004; Vassena et al. 2005; Uysal et al. 2015). At this point cell type-specific methylation patterns have been set and are now maintained through repeated rounds of mitosis.

Erasers of methylcytosine
There are two main processes that lead to the demethylation of DNA. The first is passive demethylation, where the absence or low expression of the maintenance methyltransferase DNMT1 results in a dilution of DNA methylation marks as the genome replicates. This results in regions of hemimethylated and, ultimately, unmethylated DNA.
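The arithmetic of passive dilution is straightforward to work through. Starting from one fully methylated duplex and assuming no maintenance activity at all (an idealisation; the text above also allows for low DNMT1 expression), the sketch below counts fully methylated, hemimethylated and unmethylated duplexes after each round of replication. The function names are hypothetical.

```python
# Illustrative sketch: passive dilution of methylation at one CpG site when
# no maintenance methylation occurs (an idealised assumption).

def passive_dilution(rounds: int) -> list[tuple[int, int, int]]:
    """Return (fully methylated, hemimethylated, unmethylated) duplex counts
    after 0..rounds replications, starting from one fully methylated duplex."""
    if rounds < 0:
        raise ValueError("rounds must be non-negative")
    full, hemi, unmethylated = 1, 0, 0
    history = [(full, hemi, unmethylated)]
    for _ in range(rounds):
        # full  -> two hemimethylated daughters
        # hemi  -> one hemimethylated and one unmethylated daughter
        # unmethylated -> two unmethylated daughters
        full, hemi, unmethylated = 0, 2 * full + hemi, hemi + 2 * unmethylated
        history.append((full, hemi, unmethylated))
    return history


def methylated_strand_fraction(full: int, hemi: int, unmethylated: int) -> float:
    """Fraction of all strands that still carry the methyl mark."""
    strands = 2 * (full + hemi + unmethylated)
    return (2 * full + hemi) / strands


if __name__ == "__main__":
    for n, counts in enumerate(passive_dilution(4)):
        print(n, counts, methylated_strand_fraction(*counts))
```

Under these assumptions the methylated strand fraction halves with every division (1, 0.5, 0.25, 0.125, ...): the two original methylated strands persist, but they are outnumbered by newly synthesised unmethylated strands.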
As mentioned in Section 3.2, methylation marks are removed in a wave of DNA demethylation after fertilisation. Both the paternal and maternal components of the zygote genome are demethylated via passive dilution through DNA replication and an active process involving the TET (Ten-Eleven Translocation) proteins (Guo et al. 2014a). TET proteins were first identified as components of a fusion protein, found in acute myeloid leukemia, that arises from a translocation between chromosomes ten and eleven (Lorsbach et al. 2003). The TET family of proteins (TET1, 2 and 3) are oxygenases that drive the demethylation of 5mC through successive oxidation steps: first, 5mC is converted to 5-hydroxymethylcytosine (5hmC), which is then converted to 5-formylcytosine (5fC) and finally to 5-carboxylcytosine (5caC). All three TET enzymes are capable of catalysing each of these steps (Kriaucionis and Heintz 2009; Tahiliani et al. 2009; Ito et al. 2010; Ito et al. 2011). The resulting 5fC or 5caC is recognised by thymine DNA glycosylase (TDG) and unmodified cytosine is restored by base excision repair (BER) (He et al. 2011).
TDG excises the 5fC or 5caC, resulting in an abasic site that is converted to a single strand break. DNA polymerase β (Polβ) then inserts a deoxycytidine monophosphate at the break and the double stranded DNA is restored by DNA ligase 3 (Weber et al. 2016). This active modification-active removal (AM-AR) pathway is independent of DNA replication (Kohli and Zhang 2013). The oxidised derivatives of 5mC can also remain incorporated in the genome and be converted to unmethylated cytosines by replicative dilution. This can occur in the presence of DNMT1 as it does not have a high affinity for 5hmC:C, 5fC:C and 5caC:C (Hashimoto et al. 2012; Ji et al. 2014).
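The active demethylation route described above is, in effect, a short chain of state transitions. The sketch below encodes the TET oxidation steps and the TDG/BER replacement as lookup tables. The names are hypothetical, and one simplification is an assumption of the model: excision is applied as soon as the base becomes a TDG substrate (5fC), whereas in the cell 5fC may also be oxidised further to 5caC before removal.

```python
# Illustrative sketch: the TET/TDG/BER pathway as a chain of state transitions.
TET_OXIDATION = {"5mC": "5hmC", "5hmC": "5fC", "5fC": "5caC"}
TDG_SUBSTRATES = frozenset({"5fC", "5caC"})


def active_demethylation(base: str) -> list[str]:
    """Return the sequence of cytosine states from `base` to the end of the pathway.

    TET oxidation is applied until the base is a TDG substrate; TDG excision
    followed by base excision repair is then modelled as replacement with
    unmodified cytosine ("C").
    """
    path = [base]
    while path[-1] in TET_OXIDATION and path[-1] not in TDG_SUBSTRATES:
        path.append(TET_OXIDATION[path[-1]])
    if path[-1] in TDG_SUBSTRATES:
        path.append("C")  # abasic site, strand break, Pol β insertion and ligation
    return path


if __name__ == "__main__":
    for start in ("5mC", "5hmC", "5caC", "C"):
        print(start, "->", " -> ".join(active_demethylation(start)))
```

Unlike the passive route, nothing in this chain depends on replication, which matches the description of AM-AR as replication-independent.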
In addition to demethylating genomes during embryogenesis, the TET proteins play important roles in determining pluripotency and cell differentiation. Recent knockout experiments in mouse embryonic stem cells have shown that loss of TET1 or TET2 reduces the level of 5hmC and alters gene transcription. With unique targets being identified for each protein, the resulting gene expression changes have been suggested to influence the differentiation potential of these stem cells (Koh et al. 2011; Huang et al. 2014).

Regulation of DNA demethylation by TET enzymes
TET protein activity requires α-ketoglutarate (α-KG) and oxygen as substrates and Fe(II) as a cofactor. The availability of these molecules can influence the activity of the TET-catalysed reaction (Lu et al. 2015). α-KG is generated in the citric acid cycle by the activity of isocitrate dehydrogenase (IDH1, 2, 3). Upregulation of IDH results in the production of excess α-KG and an associated increase of 5hmC throughout the genome, whereas loss-of-function mutations in IDH result in reduced α-KG production and are associated with increased levels of genomic methylation (Losman and Kaelin 2013). Similarly, fumarate and succinate can accumulate in tumour cells following disruption of the citric acid cycle and these molecules compete with α-KG, resulting in reduced TET activity (Laukka et al. 2016).
TET proteins localise preferentially to CpG islands and transcriptionally active promoters. This is facilitated in part by DNA binding domains that preferentially bind CpG-rich regions, and in part by interactions with other binding partners such as the pluripotency factor NANOG (Costa et al. 2013) and Polycomb repressive complex 2 (PRC2) (Neri et al. 2013).
In certain tissues the levels of 5mC oxidative derivatives are relatively high, suggesting that they have functional roles beyond being intermediates for demethylation. Moreover, the balance between the activity of DNA methyltransferases and the demethylation machinery is delicate, and when an imbalance occurs, either during development or in somatic cells, disease can result.
Further details on 5mC oxidation, as well as examples of mutations in the methylation machinery that result in disease, are discussed in Chapter 3 - Methylcytosine and its oxidised derivatives.

Conclusion
In the mammalian cell, there is a delicate balance between the DNA modifying enzymes that act as writers and erasers of methylcytosine. During embryogenesis the timing and molecular positioning of demethylation and then remethylation are orchestrated with great precision to facilitate appropriate differentiation. Once tissue differentiation has occurred these epigenetic marks are maintained or, in some tissue types, progressively change. In many ways, these simple markers are relatively easy to understand and molecular techniques allow us to analyse them at ever increasing speed and throughput. However, there is still much to discover, in particular how these molecules are regulated and interact with each other, and how DNA modifying and binding molecules interact with histone modifying and binding molecules to determine chromatin structure and cell-specific gene expression. Discovering the mode of these interactions will be important for future molecular medicine.

Figure 1. 5-methylcytosine occurs primarily within the context of a CpG

Figure 3. CpG and methyl-CpG distribution. CpG dinucleotides are more

Figure 4. The conserved functional domains of the methyl-binding domain

Figure 5. The conserved functional domains of the DNA methyltransferase