Introduction

DNA methylation at the C5 position of cytosine (5mC) in the context of CpG dinucleotides regulates gene expression, retrovirus silencing, X chromosome inactivation and other functions in mammalian cells1. The enzymes responsible for this modification, i.e., DNA methyltransferases (DNMT1–3), are well characterized and are required for normal development in mice2. Although DNA methylation was previously assumed to be a stable epigenetic modification, the recent discovery of the ten-eleven translocation (TET) family of DNA dioxygenases (TET1–3) has shown that the methyl group of 5mC can be oxidized to yield 5-hydroxymethylcytosine (5hmC), adding a layer of complexity to the epigenetic regulation of DNA methylation3,4,5. Several studies have developed methods for genome-wide mapping of 5hmC by using either 5hmC-specific antibodies6,7,8 or chemical labeling to enrich 5hmC-containing DNA9,10. More recently, methods for mapping at single-nucleotide resolution were also reported11,12. These studies suggest a role for 5hmC in transcriptional activation and repression in a genomic context-dependent manner6,7,13. 5hmC is relatively stable and can be found in various mouse tissues and embryonic stem (ES) cells, although levels differ between cell types14,15; therefore, 5hmC is viewed as an epigenetic modification.

During DNA replication, the maintenance methyltransferase DNMT1 restores symmetric CpG methylation by acting with high specificity on the unmethylated strand of a hemi-methylated CpG sequence, but not of a hemi-hydroxymethylated CpG sequence, which could lead to passive DNA demethylation16,17. Alternatively, 5hmC can be converted to 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC) via TET protein-mediated consecutive oxidations (Figure 1A)18,19. Biochemical analyses suggest thymine DNA glycosylase (TDG) excises 5fC and 5caC, thereby generating an apurinic/apyrimidinic site that is in turn processed by the base excision repair (BER) machinery, suggesting an active DNA demethylation pathway20. The importance of TDG in maintaining appropriate DNA methylation has been indicated by targeted Tdg allele disruptions in mice; these knockout mice exhibit aberrant DNA methylation in a subset of gene promoters and enhancers, coincident with dysregulated gene expression21,22. Unlike knockouts of other DNA glycosylases required for the BER pathway, Tdg knockout is embryonic lethal in mice, despite leaving DNA repair largely intact. Thus, TDG is essential for proper embryonic development, in part due to its role in maintaining epigenetic stability during cell-lineage commitment.

Figure 1

DNA synthesis by Klenow fragment exonuclease minus (KF exo-) on a DNA template containing 5fC and 5caC.

(A) Proposed 5mC oxidation pathway involving TET dioxygenases. 5fC and 5caC are removed by TDG in the BER pathway. (B) A 13-mer primer was 5′-labeled with 32P and annealed with a 30-mer oligonucleotide containing the indicated modified cytosines at position X (upper panel). The primer/template complexes were incubated with increasing amounts of KF exo- (0, 0.04 and 1 U in each group of three lanes) at 30°C for 5 min. (C) Relative KF exo- DNA synthesis efficiency on the C, 5mC, 5hmC, 5fC and 5caC templates. The 30–31-nt fragments were quantified by image analysis, and data were normalized to the DNA synthesis efficiency of KF exo- (1 U) on normal cytosine-containing templates (lane 3). Error bars indicate the standard deviation. n.d.: not determined. (D) A 17-mer primer was 5′-labeled with 32P and annealed with a 30-mer oligonucleotide containing the indicated modified cytosines at position X (upper panel). The primer/template complexes were incubated with KF exo- (0.01 U) for 5 min on ice with one of the indicated dNTPs (lanes 1–4, 5–8, 9–12, 13–16 and 17–20).
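The normalization described for panel C can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis pipeline: the band intensities are invented placeholder values, the function name `relative_efficiency` is hypothetical, and each replicate is assumed to be normalized to the cytosine lane of the same experiment.

```python
from statistics import mean, stdev


def relative_efficiency(replicates, reference):
    """Normalize full-length product intensities to a reference template.

    replicates: dict mapping template name -> list of band intensities,
                one value per independent experiment (same order for all).
    reference:  name of the template used for normalization (here "C").
    Returns a dict mapping template name -> (mean, SD) of the ratios.
    """
    ref_values = replicates[reference]
    result = {}
    for template, values in replicates.items():
        # Assumption: each replicate is divided by the reference lane
        # measured in the same experiment.
        ratios = [v / r for v, r in zip(values, ref_values)]
        result[template] = (mean(ratios), stdev(ratios))
    return result


# Placeholder intensities in arbitrary units, NOT data from the paper.
intensities = {
    "C": [1000.0, 1100.0, 900.0],
    "5caC": [500.0, 440.0, 540.0],
}
summary = relative_efficiency(intensities, reference="C")
```

With this convention the reference template has a relative efficiency of 1 by construction, and the error bar for each other template reflects the spread of the per-experiment ratios.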

Although 5fC and 5caC are assumed to be part of the DNA demethylation pathway and should therefore have short half-lives, substantial amounts of 5fC and 5caC are present in various mouse tissues and ES cells19. Recent studies showed that 5fC and 5caC are enriched at gene regulatory elements in Tdg-deficient ES cells, suggesting the involvement of 5fC and 5caC in transcriptional regulation23,24. Other studies have suggested that 5fC induces G·C to A·T transition mutations during DNA replication when DNA polymerase encounters 5fC on template-strand DNA25,26,27,28. If 5fC and 5caC behave as mutagenic bases, TET protein-mediated consecutive oxidations of 5mC and 5hmC would lead to deleterious consequences such as predisposition to cancer or apoptosis due to the accumulation of genomic mutations, unless 5mC oxidation is coupled with efficient elimination of 5fC and 5caC. Here, we studied the activity of DNA polymerases on oligonucleotide templates containing site-specific oxidized derivatives of 5mC and found that 5caC forms G·T mismatch-mimicking base pairs with unmodified guanine. These mismatch-like base pairs induced DNA polymerase exonuclease activity and were recognized by mismatch repair (MMR) proteins, suggesting a novel DNA damage effect of 5caC via unexpected abortive MMR.

Results

DNA polymerases incorporate dGTP opposite modified cytosines

To assess whether these modified cytosines, which are involved in epigenetic processes, behave as DNA lesions that induce genomic mutations or block DNA synthesis, we first investigated whether DNA polymerases catalyze DNA synthesis on templates containing a site-specific C, 5mC, 5hmC, 5fC, or 5caC. Figures 1B and 1C show that Klenow fragment exonuclease minus (KF exo-) synthesized ~30-mer DNA fragments on templates containing C, 5mC, or 5hmC, but synthesized fewer ~30-mer DNA fragments on templates containing 5fC or 5caC. In the case of 5caC, a small fraction of polymerases stalled briefly at the modified cytosine. The results suggest that 5fC and 5caC affected DNA synthesis by KF exo-, but in a manner distinct from the typical stalling induced by DNA damage. Next, we examined the nucleotide preference for incorporation opposite a modified cytosine by KF exo-. The polymerase preferentially incorporated dGTP opposite modified cytosines, suggesting that these cytosine derivatives were not highly mutagenic and that the less-efficient DNA synthesis observed in the case of 5fC and 5caC was not due to miscoding (Figure 1D). Human DNA polymerase η (Polη) permits replication past DNA lesions on templates29,30; Polη exhibited a similar difference in DNA synthesis efficiency (Figure S1A and S1B) and preferentially incorporated dGTP opposite modified cytosines (Figure S1C).

5caC pairing with guanine stimulates the proofreading function of Polδ

Both KF exo- and Polη were incapable of proofreading during DNA synthesis; therefore, we investigated whether human DNA polymerase δ (Polδ), which harbors an intrinsic 3′ to 5′ exonuclease domain, catalyzes DNA synthesis past the modified cytosines during replication. Interestingly, although Polδ synthesized DNA fragments on all templates, proofreading cleavage products were observed with only the 5caC templates (Figure 2A). Next, we examined the nucleotide preference for incorporation opposite a modified cytosine by Polδ exo-, which catalyzes DNA synthesis (Figure S2) and lacks 3′ to 5′ exonuclease activity. As shown in Figure 2B, this polymerase also incorporated dGTP opposite modified cytosines.

Figure 2

DNA synthesis reactions of human Polδ on DNA templates containing 5caC.

(A) A 13-mer primer was 5′-labeled with 32P and annealed with a 30-mer oligonucleotide containing the indicated modified cytosines at position X (upper panel). The primer/template complexes were incubated with increasing amounts of Polδ (0, 0.8 and 4 nM in each group of three lanes) at 30°C for 5 min. (B and C) A 17-mer primer was 5′-labeled with 32P and annealed with a 30-mer oligonucleotide containing the indicated modified cytosines at position X (upper panel). (B) The primer/template complexes were incubated with Polδ exo- (10 nM) for 5 min at 30°C with one of the indicated dNTPs (lanes 1–4, 5–8, 9–12, 13–16 and 17–20). (C) The 5′-labeled 17-mer primer was loaded in the first lane. The primer/template complexes were incubated with Polδ (4 nM) and dGTP (100 μM) at 30°C for 5 min.

When a base is misincorporated during DNA synthesis, the proofreading exonuclease activity of DNA polymerase removes the incorrect base. To test whether this exonuclease activity ensured correct pairing of 5caC, polymerization reactions by Polδ were performed in the presence of dGTP as the only nucleotide. Fragment degradation by the exonuclease activity of Polδ was simultaneously observed on the 5caC templates (Figure 2C), indicating that 5caC pairing with guanine stimulates the proofreading function of Polδ.

We observed similar DNA synthesis and exonuclease effects for 5caC during DNA synthesis by Klenow fragment (KF exo+), which possesses 3′ to 5′ exonuclease activity (Figure S3). KF exo+ possesses intrinsic terminal-deoxynucleotidyl transferase activity; therefore, it could synthesize DNA fragments up to approximately 20 nt on all templates (Figure S3C). In the case of KF exo+, primers annealed with 5caC templates were more degraded than those annealed with 5fC templates, indicating that G·5caC pairings stimulate the proofreading function more than G·5fC pairings do.

MutSα complex recognizes G·5caC pairs in DNA substrates

The proofreading function of DNA polymerases plays an important role in correcting replicative mismatch errors. Our results suggest this proofreading occurs at G·5caC pairings but not at other cytosine pairings. Although 5caC forms appropriate base pairs with guanine, we hypothesized that these pairings behave like mismatches (Figure 3A). If this holds true, the mismatch repair (MMR) protein MutS should recognize these pairings as it does G·T mismatches, which are a canonical MutS substrate31,32,33. To test this possibility, we performed electrophoretic mobility shift assays (EMSAs) with Taq MutS and 30-mer DNA substrates containing G·T, G·C, G·5mC, G·5hmC, G·5fC and G·5caC. We observed a striking difference in MutS binding efficiency between these forms of cytosine. MutS bound G·T and G·5caC pairs (Figure S4A and S4B). The binding preference order was G·T = G·5caC > G·5fC > G·C = G·5mC = G·5hmC. Next, because the exonuclease activity of Polδ was observed only on the 5caC templates, we performed EMSAs with 34-mer G·5caC-containing DNA substrates and the human MMR protein complex MutSα, which consists of MSH2 and MSH6 (Figure S5). The MutSα complex is a human homolog of the MMR protein MutS and is indispensable for the mammalian MMR system34,35. MutSα bound to the positive control G·T pairs and to the G·5caC pairs (Figure 3B); addition of excess cold G·T DNA substrates inhibited binding between MutSα and G·5caC DNA substrates (Figure 3C). To confirm this interaction, biotin-labeled G·5caC DNA substrates were incubated with HeLa whole cell extracts and the DNA-bound proteins were pulled down with streptavidin-coated beads; MSH2 and MSH6 were detected by immunoblotting. The results confirmed that the MutSα complex recognized G·5caC pairs in DNA substrates (Figures 3D and 3E). Thus, the G·5caC pairs behaved similarly to a G·T mismatch when Polδ synthesized new DNA fragments opposite 5caC, although DNA polymerase correctly incorporated dGTP. In addition, G·5caC pairs may be subjected to MMR in mammalian cells.

Figure 3

MutSα complex binds to G·5caC base pairs.

(A) Postulated base-pairing models of 5caC with G. (B and C) A 34-mer oligonucleotide was 5′-labeled with 32P and annealed with a 34-mer oligonucleotide containing 5caC. The mismatch substrates were incubated (B) with MutSα on ice for 20 min or (C) with MutSα and cold G·T mismatch substrates (lanes 3–6: non-labeled substrates/5′-labeled substrates molar ratio; ×1, ×10, ×50 and ×100) at 25°C for 20 min. Free and bound fractions were separated on nondenaturing 6% polyacrylamide gels containing 5 μM MgCl2. (D and E) Biotin-labeled 34-mer mismatch substrates were incubated (D) with whole cell extracts (10 μg) or (E) with whole cell extracts (10 μg) and G·T mismatch substrates (lanes 3–5: non-labeled substrates/5′-labeled substrates molar ratio; ×1, ×10 and ×50). Samples were loaded on 10% SDS-PAGE gels and MSH2 and MSH6 were detected by western blotting.

Accumulation of 5caC affects cell proliferation

In the MMR system, exonuclease I removes the daughter DNA strand but cannot remove template DNA containing the modified cytosine. Thus, the “offending” site persists in the template. The ensuing abortive turnover of new DNA may result in a death response. Earlier studies have shown that TDG binds to these G·5caC pairs36 and excises the modified cytosine20. To investigate the effects of these G·T mismatch-mimicking base pairs in mammalian cells in vivo, we confirmed the expression levels of Tets, TDG and MSH2 in various human cells (Figure S6). We then knocked down TDG expression to induce the accumulation of G·5caC base pairs and counted the viable cells in TDG-knockdown cultures (Figure 4A and Figure S7). As expected, 5caC levels were elevated in TDG-knockdown cells (Figure 4B and Figure S8) relative to control-knockdown or MSH2-knockdown cells. TDG-knockdown HeLa cells exhibited an elevated apoptotic cell population and a decreased number of surviving cells (Figure 4C and 4D). This phenotype was partially rescued by knockdown of MSH2 expression, indicating that the accumulated G·5caC base pairs are recognized by MMR, which induces the effects of DNA damage. Furthermore, because Tet1-overexpressing 293 cells exhibit increased 5hmC and 5caC levels19, we investigated the effects of TDG knockdown in these cells (Figure 4E and Figure S9). When Tet1 expression was induced by treatment with doxycycline (Dox), cell number was reduced in all cases, indicating that Tet1-modified cytosines behave as DNA lesions (Figure 4F). TDG knockdown led to reduced cell numbers, which were rescued by TDG-MSH2 double knockdown (Figure 4F). Thus, MMR was required for these DNA damage effects, which result in cell proliferation defects and decreased cell number.

Figure 4

MSH2 knockdown rescued the cell death response in TDG-knockdown cells.

(A) Western blotting of lysates prepared from siRNA-transfected HeLa cells using antibodies specific for the indicated proteins. Actin served as a loading control. (B) Mass spectrometric quantification of 5caC in siRNA-transfected HeLa cells. Data represent the average of at least three independent experiments and were statistically analyzed by Student's t test (*p < 0.05). Error bars represent standard error of the mean. (C and D) Three days after siRNA transfection, the percentage of apoptotic cells (C) and the number of viable cells (D) were calculated. Experiments were repeated six times and statistically analyzed by Student's t test (N = 6, *p < 0.01). Error bars represent standard deviation (SD). (E and F) 293 cells expressing mouse Tet1 under the control of a doxycycline-inducible promoter were transfected with siRNA in the absence or presence of doxycycline. Three days after siRNA transfection, expression of the indicated proteins was examined with specific antibodies (E) and the number of viable cells was calculated (F). Experiments were repeated three times. Error bars represent SD. (G) Model for DNA damage effects of 5caC in DNA replication via abortive mismatch repair.
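The summary statistics named in this legend (SD, standard error of the mean and Student's t test) can be sketched as follows. This is a minimal illustration, not the authors' analysis: the cell-count values are invented placeholders, the function names are hypothetical, and an unpaired two-sample test with pooled variance is assumed because the legend does not state which variant was used.

```python
from math import sqrt
from statistics import mean, stdev


def sem(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return stdev(values) / sqrt(len(values))


def student_t(a, b):
    """Unpaired Student's t statistic with pooled variance.

    Assumes equal variances in both groups.
    Returns (t, degrees_of_freedom).
    """
    na, nb = len(a), len(b)
    var_a, var_b = stdev(a) ** 2, stdev(b) ** 2
    pooled = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


# Placeholder values for N = 6 replicates per group, NOT data from the paper.
control = [10.0, 12.0, 11.0, 9.0, 10.0, 8.0]
knockdown = [20.0, 22.0, 21.0, 19.0, 20.0, 18.0]

t_stat, df = student_t(knockdown, control)
control_sem = sem(control)
```

The p value is then obtained by comparing the t statistic with the t distribution for the returned degrees of freedom; that lookup is omitted here to keep the sketch dependency-free.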

Discussion

In this study, we investigated the effects of oxidative forms of 5mC on DNA synthesis by replicative or translesion DNA polymerases with DNA templates containing a site-specific C, 5mC, 5hmC, 5fC, or 5caC. Although DNA polymerase correctly incorporated dGTP opposite any modified cytosine, DNA degradation products generated by the exonuclease activity of Polδ were significantly more abundant with 5caC than with other modified cytosines. Base pairing of guanine and the imino tautomer of 5caC (Figure 3A) has the same geometry as a G·T mismatch37 and was suggested in a previous study38; the results of exonucleolytic degradation in our study may be attributed to this type of base-pair formation. Münzel et al. demonstrated intramolecular hydrogen bonding between the amino and formyl groups of 5fC, but suggested that a substantial shift of tautomer equilibrium toward the imino form was unlikely27. Although it is very difficult to experimentally detect the unfavored imino tautomers of cytosine derivatives39,40, electron-withdrawing substituents at the C5 position of cytosine may facilitate base pair formation with the minor tautomer because this type of substitution destabilizes the Watson–Crick G·C base pair41. Formation of the G·5caC base pair in the same geometry as that of a G·T mismatch would stimulate the exonuclease activity of human Polδ.

In order to characterize the base-pair formation of the oxidized 5mC from another viewpoint, we examined binding of Taq MutS and human MutSα to C5-modified duplexes. As shown in Figures 3B and S4A, these proteins bound to the duplex containing 5caC; binding competition with a G·T mismatch-containing duplex was apparent (Figure 3C). MutS wedges a Phe side chain into the mismatch site, where this side chain is stacked onto one of the mismatched bases. This interaction changes the orientation of the stacked base, which originally formed the G·T-type mismatch shown in Figure 3A, so that a hydrogen bond forms between this base and the adjacent Glu. Then, bifurcated hydrogen bonds form between the thymine O4 and the nitrogen atoms of guanine in the MutS–DNA complex31,32. Our results support the formation of the G·5caC base pair shown in Figure 3A.

In living cells, DNA synthesis by DNA polymerase on 5caC, which neither induces a typical mutation nor blocks DNA polymerase activity during replication, may nonetheless lead to adverse effects. Although TDG can remove these modified forms of cytosine from the genome, substantial amounts of 5caC remain in tissues and cells19. Therefore, when the 5caC on a DNA template pairs with the incoming dGTP via replicative DNA polymerase in the S phase, the delay in DNA synthesis may slow replication around the site and delay proper cell cycle progression. We showed that DNA polymerase generated G·T mismatch-mimicking G·5caC pairs that were recognized by the MutSα complex. MMR can eliminate the G·T mismatch-mimicking base pairs by removing the daughter strand, but cannot remove the modified cytosine. This process may induce abortive turnover of DNA synthesis (Figure 4G). As shown in Figure 4, MSH2 knockdown rescued the cell death phenotype induced by TDG knockdown. We suggest that the G·T mismatch-mimicking base pairs formed by 5caC behaved as DNA lesions processed by MMR. This scenario is similar to a model of apoptosis triggered by O6-methylguanine (O6-meG)34,35, which gives rise to O6-meG·T mismatches that are subject to abortive MMR and apoptosis. Therefore, 5caC induced by knockdown of TDG may drive genomic instability in mammalian cells, similar to O6-meG. TDG knockout leads to embryonic lethality in mice, presumably due to aberrant epigenetic modifications, especially in DNA methylation status, and transcriptional defects21,22. However, G·T mismatch-mimicking G·5caC pairs may also contribute to embryonic lethality in TDG-knockout mice. Additionally, because 5caC residues behave as DNA lesions that slow DNA replication via the exonuclease function of DNA polymerase, they may need to be removed before replication to prevent formation of lethal DNA lesions.

MSH2 knockdown only partially rescued the TDG-knockdown phenotype. This suggests that other effects of 5caC persist, independent of replication and MMR. Because 5caC on transcribed strands induces transcriptional pausing42, part of the residual lethality may be attributed to lesions encountered during transcription. Spruijt et al. reported that 5caC recruits a large number of DNA repair proteins in mouse ES cells, including BER (Neil1, Neil3 and Mpg) and MMR (Msh3 and Exo1) proteins43. Thus, these DNA repair proteins may be involved in removing 5caC from genomic DNA to rescue the TDG-knockdown phenotype under certain circumstances. Interestingly, in TDG-knockdown mouse ES cells, no apoptotic effects are observed23. This may be explained by the observation that 5caC decarboxylation activity exists in mouse ES cells but not in HeLa cells44. This activity might determine the extent of cell viability in cells with accumulated 5caC DNA lesions.

Our results indicated that the electrostatic repulsion of the oxidatively modified cytosine 5caC paired with guanine influenced the exonuclease activity of DNA polymerases and the damage recognition step of MMR. This process may lead to abortive turnover of MMR. Thus, 5caC residues that were assumed to be intermediates in an active demethylation pathway may be oxidative DNA lesions that must be removed before replication. These findings provide an important new perspective on the potential functional interplay between cytosine modification status and replication.

Methods

DNA substrates

Thirty-mer DNA substrates containing 5mC or 5hmC were synthesized at Tsukuba Oligo Service and purified by high-performance liquid chromatography (HPLC). DNA substrates containing 5fC or 5caC were synthesized on an Applied Biosystems 3400 DNA synthesizer (Applied Biosystems) by using phosphoramidite building blocks purchased from Glen Research and were purified by HPLC. The oligonucleotides were 5′-phosphorylated using [γ-32P]ATP (PerkinElmer Life Sciences) and T4 polynucleotide kinase (TaKaRa). Unincorporated nucleotides were removed using MicroSpin G-25 columns (GE Healthcare).

In vitro DNA synthesis assays

DNA synthesis assays were performed as described30. Briefly, the 5′-32P-labeled primer-template complex was prepared by mixing the primer with a template containing the indicated sequence context at a molar ratio of 1:1. Ten-microliter reaction mixtures containing 10 mM Tris-HCl (pH 7.9), 50 mM NaCl, 10 mM MgCl2, 40 nM of a labeled primer-template complex and the indicated DNA polymerases were incubated. The reactions were terminated by adding 10 μL of stop solution containing 95% formamide, 10 mM EDTA, 0.025% bromophenol blue and 0.025% xylene cyanol. The fragments were separated by electrophoresis on a denaturing polyacrylamide gel, and the gel was dried and analyzed using a Fuji FLA-7000 phosphorimager (Fujifilm).
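The amounts implied by these concentrations and volumes can be checked with a short calculation. This is a minimal sketch using only the values stated above; the function names are hypothetical and volumes are assumed to be additive on mixing.

```python
def amount_fmol(conc_nM, volume_uL):
    """Amount of solute in femtomoles.

    nM x uL = (1e-9 mol/L) x (1e-6 L) = 1e-15 mol, i.e. 1 fmol.
    """
    return conc_nM * volume_uL


def final_concentration(stock_conc, stock_volume, total_volume):
    """Concentration after dilution, in the same unit as stock_conc."""
    return stock_conc * stock_volume / total_volume


# 40 nM primer-template complex in a 10-uL reaction.
primer_template_fmol = amount_fmol(40, 10)

# Adding 10 uL of stop solution doubles the volume to 20 uL.
edta_final_mM = final_concentration(10, 10, 20)       # EDTA from stop solution
mgcl2_final_mM = final_concentration(10, 10, 20)      # MgCl2 from reaction mix
formamide_final_pct = final_concentration(95, 10, 20)
```

Under these assumptions each reaction contains 400 fmol of primer-template complex, and the stopped sample contains EDTA and MgCl2 at equal final concentrations in 47.5% formamide.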

EMSAs

Standard binding reaction mixtures (10 μL) contained 1 nM 32P-labeled substrate and 2 nM purified MutSα complex in binding buffer A containing 10 mM HEPES-KOH (pH 7.7), 50 mM KCl, 2 mM MgCl2, 1 mM EDTA, 1 mM DTT and 0.1 mM ADP. The reaction mixtures were incubated and loading buffer (2 μL) containing 50% glycerol, 0.5% bromophenol blue and 0.5% xylene cyanol was added45. The samples were separated on nondenaturing 6% polyacrylamide gels.

MutSα complex binding assays

Binding reaction mixtures (20 μL) contained 1 μM biotin-labeled 34-mer mismatch substrate, poly dI-dC (1 μg) and HeLa cell extracts (10 μg) in binding buffer A. The reaction mixtures were incubated at 4°C for 60 min and then with 5 μL Dynabeads M-280 (Invitrogen) for 30 min. Unbound proteins were removed by washing the beads three times with binding buffer A containing 1% Triton X-100. Bound fractions were separated on 10% SDS-PAGE gels and analyzed by western blotting with 0.25 ng MSH2 antibodies (G219-1129, BD Biosciences) or 0.25 ng MSH6 antibodies (44/MSH6, BD Biosciences).

Knockdown experiments

siRNA duplexes specifically targeting TDG (SI02665040) and MSH2 (SI02663563) and nontargeting control siRNAs (1027280) were purchased from Qiagen and transfected into cells using Lipofectamine RNAiMAX (Invitrogen) according to the manufacturer's instructions. Three days after siRNA transfection, the cells were trypsinized and viable cells were counted. Total RNA was isolated using the RNeasy Mini Kit (Qiagen) and cDNA was generated with the SuperScript VILO Master Mix (Invitrogen). Real-time quantitative PCR (qPCR) was performed on an Mx3005P QPCR System (Agilent Technologies) using SYBR Green reagent (Roche Applied Science). cDNA levels of the target genes were analyzed by the comparative Ct method and normalized to ACTB. qPCR primers are listed in the Supplementary Table. To quantify apoptotic cells, a Tali Apoptosis Kit - Annexin V Alexa Fluor 488 and Propidium Iodide (Invitrogen) was used according to the manufacturer's instructions. Mass spectrometry analyses to quantify 5caC were performed as previously described19.
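The comparative Ct calculation with ACTB normalization can be sketched as follows. This is a minimal illustration, not the authors' script: the Ct values are invented placeholders, the function name is hypothetical, and the standard 2^(-ΔΔCt) form is assumed, which presumes roughly 100% amplification efficiency for both primer pairs.

```python
def relative_expression(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl):
    """Fold change of a target gene by the comparative Ct method.

    ct_target, ct_ref:           Ct of target gene and reference gene (ACTB)
                                 in the treated sample.
    ct_target_ctrl, ct_ref_ctrl: the same two values in the control sample.
    Assumes doubling of product per cycle (2^-ddCt).
    """
    delta_sample = ct_target - ct_ref            # dCt in treated sample
    delta_control = ct_target_ctrl - ct_ref_ctrl  # dCt in control sample
    return 2.0 ** -(delta_sample - delta_control)


# Placeholder Ct values, NOT data from the paper: the target crosses the
# threshold two cycles later after knockdown while ACTB is unchanged.
tdg_fold_change = relative_expression(
    ct_target=27.0, ct_ref=17.0, ct_target_ctrl=25.0, ct_ref_ctrl=17.0
)
```

In this placeholder case the target transcript is at one quarter of its control level, which is how a two-cycle delay translates into knockdown efficiency.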