Insights into the substrate discrimination mechanisms of methyl-CpG-binding domain 4

G:T mismatches, the major mispairs generated during DNA metabolism, are repaired in part by mismatch-specific DNA glycosylases such as methyl-CpG-binding domain 4 (MBD4) and thymine DNA glycosylase (TDG). Mismatch-specific DNA glycosylases must discriminate mismatches from a million-fold excess of correct base pairs. MBD4 efficiently removes thymine opposite guanine but not opposite adenine. Previous studies have revealed that the substrate thymine is flipped out and enters the catalytic site of the enzyme, while the estranged guanine is stabilized by Arg468 of MBD4. To gain further insights into the mismatch discrimination mechanism of MBD4, we assessed the glycosylase activity of MBD4 toward various base pairs. In addition, we determined a crystal structure of MBD4 bound to T:O6-methylguanine-containing DNA, which suggests that the O6 and N2 of the purine and the O4 of the pyrimidine are required for a base pair to be a substrate for MBD4. To understand the role of the Arg468 finger in catalysis, we evaluated the glycosylase activity of MBD4 mutants, which revealed that the guanidinium moiety of Arg468 may play an important role in catalysis. The structure of D560N/R468K MBD4 bound to T:G mismatched DNA shows that the side chain amine moiety of the Lys stabilizes the flipped-out thymine by water-mediated phosphate pinching, while the backbone carbonyl oxygen of the Lys engages in hydrogen bonds with N2 of the estranged guanine. Comparison of various DNA glycosylase structures implies that the guanidinium and amine moieties of Arg and Lys, respectively, may be involved in discriminating between substrate mismatches and nonsubstrate base pairs.


INTRODUCTION
Guanine:thymine mispairs are the most abundant mismatches found in genomic DNA (Figure 1) (1-3). These mismatches are produced by various pathways. Replication errors of DNA polymerases via tautomerization preferentially generate G:T and A:C mismatches (2,4-8). The exocyclic amine of 5-methylcytosine found in CpG repeats is susceptible to hydrolysis, and the resulting spontaneous deamination of 5-methylcytosine gives G:T mismatches (9,10). In addition, G:T mismatches at CpG islands can be produced by the deamination of 5-methylcytosine by activation-induced deaminase (AID), which initiates the active DNA demethylation process (Figure 1A) (11). The G:T mismatches, if left undetected prior to replication, can promote C to T transition mutations. The promutagenic G:T mispairs can be repaired by the DNA mismatch repair (MMR) pathway (12,13) as well as by the base excision repair pathway, which involves mismatch-specific DNA glycosylases such as thymine DNA glycosylase (TDG) and methyl-CpG-binding domain 4 (MBD4/MED1) (14-16). Both TDG and MBD4 can cleave thymine paired with guanine to produce abasic sites, which are further processed by downstream base excision DNA repair enzymes including AP endonuclease, DNA polymerase, and DNA ligase.
Biochemical and structural characterization of MBD4 has provided important insights into the catalytic and substrate recognition mechanisms of the enzyme. Crystal structures of the catalytic domain of MBD4 bound to G:T mismatch-containing DNA show that the substrate thymine opposite guanine is flipped out and enters the catalytic site of the glycosylase to engage in thymine-specific contacts (Figure 1B) (20-22). An intercalating Arg residue (Arg468 in hMBD4 and Arg442 in mMBD4) facilitates base flipping and stabilizes the flipped-out thymine by phosphate pinching (Figure 1C). The presence of an Asp residue (D560 in hMBD4 and D534 in mMBD4) within the catalytic pocket of the mismatch-specific DNA glycosylase is critical for glycosylase activity. Our recent studies show that MBD4 uses an Asp-assisted general base catalysis mechanism to generate abasic sites bearing the C1'-(S)-OH (22), indicating a bimolecular displacement mechanism for thymine excision (Figure 1B). The resulting abasic sites are stabilized by a hydrogen bond network, highlighting the role of MBD4 in preventing the aberrant release of the toxic abasic site.
O6-Methylguanine (O6MeG) is a highly mutagenic lesion that preferentially pairs with thymine during translesion DNA synthesis, promoting G to A transitions (23-27). Endogenous (e.g., S-adenosylmethionine) and exogenous (e.g., N-methyl-N-nitrosourea) alkylating agents attack DNA to form O6MeG as a minor adduct. The formation of the deleterious O6MeG adduct is also induced by the anticancer methylating drug temozolomide and contributes to the cytotoxicity of methylating anticancer agents (28). The genotoxic O6MeG is detoxified by methylguanine methyltransferase (MGMT), a sacrificial protein that removes the O6-methyl group by transferring it to its own catalytic cysteine residue. An O6MeG:T mismatch can trigger futile cycling of mismatch repair (MMR) (29), which can cause cell cycle arrest and cell death. In vitro studies show that O6MeG:T is a substrate mismatch for MBD4 and TDG (30). The base excision repair enzyme MBD4 is known to associate with mismatch repair enzymes (e.g., MLH1) and to mediate the damage response to methylating agents and other DNA damaging agents (16,31).
How does MBD4 recognize thymine opposite guanine or O6MeG in the presence of a million-fold excess of non-substrate base pairs? Answering this question would further our understanding of the catalytic and substrate base pair recognition mechanisms of G:T mismatch-specific DNA glycosylases. To this end, we conducted biochemical characterization of wild-type and mutant hMBD4 acting on mismatch-containing DNA. In addition, we report two crystal structures of human MBD4 in complex with G:T- or O6MeG:T-containing DNA, which provide new insight into the substrate recognition and glycosylase mechanisms of MBD4.

MATERIALS AND METHODS
Oligonucleotides used for biochemical and structural studies.

Plasmid construction of human MBD4 glycosylase domain.
The gene encoding the human MBD4 glycosylase domain (residues 425-580) was amplified by PCR and inserted into the pET28a vector. Using site-directed mutagenesis, the single mutants D560N, R468K, R468Q, R468E, R468F, R468Y, and R468L and the double mutant R468K/D560N were generated. All mutant plasmids were verified by sequencing (ICMB Core Facility, University of Texas at Austin).

Protein expression and purification of MBD4 glycosylase domain.
The MBD4 glycosylase domain and its mutants R468K, R468Q, R468E, R468F, R468Y, and R468L were expressed using the pET28a vector (21,22). The proteins were expressed in E. coli BL21 (DE3) cells. Cultures were grown in Luria-Bertani (LB) broth at 37 °C until the OD600 reached 0.5, at which point cells were induced with 0.25 mM isopropyl β-D-1-thiogalactopyranoside (IPTG) for four hours at 37 °C. The MBD4 glycosylase domain single mutant D560N and double mutant R468K/D560N were also expressed in E. coli BL21 (DE3) cells, grown in LB medium at 37 °C until the OD600 reached 0.5, and induced by adding 0.5 mM IPTG at 28 °C for five hours. All MBD4 glycosylase domain constructs were purified by the same procedure. Pelleted cells were resuspended in buffer (50 mM sodium phosphate, pH 7.8, 500 mM NaCl, 10% glycerol, 1 mg/ml lysozyme, 0.25% NP-40, 0.25% Triton X-100, and 0.25 mM phenylmethylsulfonyl fluoride) and sonicated for 30 seconds. The lysate was centrifuged at 15,000 × g at 4 °C for 20 min. The supernatant containing the MBD4 glycosylase domain was purified on a Ni-NTA column (GE Healthcare). The imidazole-eluted fractions from the Ni-NTA column were combined, buffer-exchanged into low-imidazole buffer, treated with thrombin (10 units), and incubated for 16 hours at 4 °C. This removed the 6-His tag, leaving 19 extraneous N-terminal amino acids (GSHMASMTGGQQMGRGSEF). The cleaved proteins were further purified on a Superdex 75 column (GE Healthcare) equilibrated with buffer (50 mM Tris, pH 7.5, 150 mM NaCl, and 10% glycerol). Melting temperatures (Tm) of wild-type MBD4 and the MBD4 mutants were determined by UV-VIS spectroscopy to ensure proper folding of the expressed proteins.

DNA glycosylase activity assay.
Glycosylase activity assays of wild-type MBD4 on various base pairs (G:T, O6MeG:T, HX:T, 2,6-AP:T, 2-AP:T) were performed using FAM-labeled oligonucleotides (see oligonucleotides above; Figure 2). Glycosylase activity assays of the MBD4 Arg468 mutants were performed using a 27-mer, 3'-FAM-labeled oligonucleotide duplex containing a G:T mismatch in a CpG region (5'-TCAGATCGCGCCGGC TGCGATAAGCT-3' and 5'-AGCTTATCGCAGCTGGCGCGAATCTGA-3') (see Figure 5). The standard reaction mixture (20 μl) contained 50 nM labeled double-stranded oligonucleotide and 2.5 μM purified enzyme in 20 mM HEPES pH 7.5, 50 mM KCl, 1 mM DTT, 1 mM EDTA, and 0.1 mg/ml bovine serum albumin. The mixture was incubated at 37 °C for 30 min, and the reaction was stopped by adding 4 μl of 0.5 N NaOH followed by boiling for 10 min to cleave the DNA at the abasic site. After adding loading dye (5 μl; 98% formamide, 1 mM EDTA, 1 mg/ml bromophenol blue, and 1 mg/ml xylene cyanol), the reaction sample was boiled for another 10 min and loaded immediately onto a 10 cm x 10 cm 20% denaturing urea gel. The intensities of the FAM-labeled DNA bands were measured using a Typhoon FLA9500 (GE Healthcare).
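The text does not state how the measured band intensities were converted into a cleavage yield. A minimal sketch in Python, assuming the common convention that the yield is the intensity of the fast-moving product band divided by the summed intensity of the product and substrate bands in the same lane; the function name and the example intensities are hypothetical and are not data from this study:

```python
def percent_cleaved(substrate_intensity: float, product_intensity: float) -> float:
    """Percent of FAM-labeled DNA found in the cleaved (fast-moving) band of one lane.

    Assumption: both intensities are background-subtracted values from the same lane.
    """
    total = substrate_intensity + product_intensity
    if total <= 0:
        raise ValueError("lane has no measurable signal")
    return 100.0 * product_intensity / total


# Hypothetical intensities in arbitrary units (illustration only).
print(percent_cleaved(substrate_intensity=1500.0, product_intensity=4500.0))
```

Normalizing within each lane makes the yield insensitive to lane-to-lane differences in the amount of sample loaded.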

Single turnover kinetics of wild-type and R468K MBD4 catalytic domain.
Glycosylase activity assays of wild-type and R468K MBD4 were conducted using FAM-labeled 27-mer duplexes containing T:G or O6MeG:T mismatches at the central position. The mismatch-containing 27-mer duplexes (0.25 μM) and a 10-fold excess of wild-type or R468K mutant MBD4 catalytic domain (2.5 μM) were mixed in reaction buffer (10 mM Tris-HCl, pH 8.0, 1 mM EDTA, 0.1% BSA) and incubated for 0-30 min at room temperature. The glycosylase reactions were stopped at designated time points by adding 2 μL of 1 N NaOH, and the resulting solutions were boiled for 10 min to cleave the DNA at abasic sites. Loading buffer (20 μL; 98% formamide, 1 mM EDTA, 1 mg/mL bromophenol blue) was added to the mixture and the samples were boiled for 10 min. The resulting solutions were immediately cooled and loaded onto a 10 cm x 10 cm 20% denaturing urea PAGE gel. The intensities of the FAM-labeled DNA bands were measured with a Typhoon FLA9500 (GE Healthcare) and quantified using ImageJ (NIH). GraphPad Prism 9 (GraphPad Software Inc.) was used to fit the data by non-linear regression to $Y = (Y_0 - \mathrm{Plateau})\,e^{-kt} + \mathrm{Plateau}$, where Y is the percent nicked yield, Y_0 is the Y value at time zero, k is the rate constant (min⁻¹), and Plateau is the Y value at infinite time. The kinetic values are reported as the mean ± standard error of the mean from three independent replicates.
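The one-phase exponential model above can be fitted outside of GraphPad Prism as well. The following Python sketch is an illustration only and is not the authors' procedure: it uses a standard-library-only approach in which, for each trial value of k, the linear parameters are solved by ordinary least squares, and k itself is located by a log-spaced grid scan refined with a golden-section search. All function names, search bounds, and the synthetic time course are assumptions made for the example.

```python
import math
import statistics


def one_phase(t, y0, plateau, k):
    """Y = (Y0 - Plateau) * exp(-k t) + Plateau."""
    return (y0 - plateau) * math.exp(-k * t) + plateau


def linear_part(times, ys, k):
    """For a fixed k, least-squares solve y = span * exp(-k t) + plateau."""
    xs = [math.exp(-k * t) for t in times]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    span = sxy / sxx                      # span = Y0 - Plateau
    plateau = my - span * mx
    sse = sum((span * x + plateau - y) ** 2 for x, y in zip(xs, ys))
    return span, plateau, sse


def fit_one_phase(times, ys, k_lo=1e-3, k_hi=10.0, n_grid=200):
    """Estimate Y0, Plateau and k (per unit of time) by minimizing the squared error."""
    def sse_at(log_k):
        return linear_part(times, ys, math.exp(log_k))[2]

    # Coarse scan over log-spaced k values to bracket the minimum.
    lo, hi = math.log(k_lo), math.log(k_hi)
    grid = [lo + (hi - lo) * i / (n_grid - 1) for i in range(n_grid)]
    best = min(range(n_grid), key=lambda i: sse_at(grid[i]))
    a, b = grid[max(best - 1, 0)], grid[min(best + 1, n_grid - 1)]

    # Golden-section refinement inside the bracket.
    g = (math.sqrt(5) - 1) / 2
    for _ in range(80):
        c, d = b - g * (b - a), a + g * (b - a)
        if sse_at(c) < sse_at(d):
            b = d
        else:
            a = c
    k = math.exp((a + b) / 2)
    span, plateau, _ = linear_part(times, ys, k)
    return {"Y0": span + plateau, "Plateau": plateau, "k": k}


def mean_sem(values):
    """Mean and standard error of the mean of replicate estimates."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


# Synthetic, noise-free time course (minutes, percent nicked); not experimental data.
times = [0, 1, 2, 5, 10, 20, 30]
ys = [one_phase(t, 0.0, 90.0, 0.24) for t in times]
print(fit_one_phase(times, ys))
```

In practice each replicate time course would be fitted separately and the resulting k values passed to `mean_sem` to obtain the mean ± SEM.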

DNA binding assay.
DNA binding assays were performed by incubating increasing concentrations of D560N MBD4 (12-750 nM) with 50 nM of 3'-FAM-labeled G:T-containing DNA or O6MeG:T-containing DNA in 20 mM HEPES pH 7.5, 50 mM KCl, 1 mM DTT, 1 mM EDTA, and 0.1 mg/ml bovine serum albumin. The oligonucleotide sequence used in this assay was identical to the 27-mer used in the glycosylase activity assay. After incubating the mixture at room temperature for 30 min, loading buffer (4 μl, 60% TBE 0.25 and 40% glycerol) was added to the reaction mixture. The resulting mixture was loaded onto a 10 cm x 10 cm 10% native acrylamide gel and electrophoresed in 0.5 M TBE for one hour.

Protein-DNA co-crystallization.
A 12-mer crystallization DNA duplex was annealed at a 1:1 strand ratio and mixed with the protein (1.5 mg/mL) at a 1:5 molar ratio of protein to DNA in a buffer containing 5 mM Tris pH 8.0 and 25 mM NaCl. The DNA-protein mixture was incubated on ice for two to three hours. Cocrystals of the D560N MBD4-DNA complex were grown in 30% ethylene glycol. Crystals of the R468K/D560N MBD4-DNA complex were obtained in 15% PEG1500, 0.1 M MES pH 6.5, and 0.2 M sodium acetate. The D560N MBD4-O6MeG:T and wild-type MBD4-nicked DNA complexes were crystallized in 21% ethylene glycol. Crystals were grown by the hanging drop method, mixing 1 μL of protein-DNA complex solution with 1 μL of reservoir solution, at 22 °C over two weeks. The crystals were flash frozen in liquid N2 and used for data collection.

Data collection and structure determination.
Diffraction data sets for crystals of the MBD4 complexes were collected at a wavelength of 0.97948 Å on beamline 5.0.3 at the Advanced Light Source (Berkeley, CA). The data were indexed and scaled using HKL2000. The MBD4-DNA complex structures were solved by molecular replacement using an MBD4-DNA complex (Protein Data Bank accession code 4E9G) as the search model. Manual model building was carried out with COOT (32), and the models were refined using CCP4 (33) and REFMAC (34). The quality and stereochemistry of the models were examined with PROCHECK. All crystallographic figures were prepared using PyMOL. Statistics of data processing, data quality, and the refined models are summarized in Table 1.

Characterization of substrate base pairs for MBD4.
Previous studies have shown that MBD4 efficiently recognizes and excises T:G mismatches (15). In addition, the enzyme has also been shown to cleave thymine opposite the highly mutagenic O6-methylguanine (O6MeG) lesion (16). How does MBD4 cleave thymine opposite guanine but not thymine opposite adenine? To gain further insights into the substrate discrimination mechanism of MBD4, we assessed the glycosylase activity of MBD4 toward various base pairs including G:T, O6MeG:T, hypoxanthine (HX):T, 2,6-diaminopurine (2,6-AP):T, and 2-aminopurine (2-AP):T (Figures 2A and 2B). As observed previously (16), thymine opposite guanine was efficiently cleaved by MBD4, resulting in the formation of a fast-moving band after treatment of the reaction mixture with sodium hydroxide (Figure 2C). The glycosylase activity of wild-type MBD4 on G:T and O6MeG:T under single turnover conditions showed that the rate constants (kobs) for excision of thymine opposite G and O6MeG were 0.24 and 0.09 min⁻¹, respectively (Figure 3) (16), indicating that MBD4 processes G:T mismatches more efficiently than O6MeG:T mismatches. While G:T preferentially forms a wobble base pair, O6MeG:T can form a base pair with pseudo-Watson-Crick geometry (25,27), which could contribute to MBD4's reduced glycosylase activity toward O6MeG:T. Thymine opposite hypoxanthine, which forms a wobble base pair in duplex DNA (Figure 2A), was not processed by the enzyme, indicating that the hydrogen bond between the N2 of guanine and the backbone carbonyl oxygen of Arg468 is critical for substrate recognition and catalysis. Thymine paired with 2,6-diaminopurine or 2-aminopurine was not processed by MBD4 (Figure 2C), suggesting that base pairs with ideal Watson-Crick geometry are not substrates of MBD4. Taken together, these results indicate that MBD4 recognizes and processes non-Watson-Crick purine:thymine base pairs that contain the N2 of the purine.
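The two single-turnover rate constants quoted above (0.24 and 0.09 min⁻¹) can be put on a more intuitive scale as half-times, using t1/2 = ln 2 / kobs for a first-order process. A small Python sketch; the function name is illustrative, and the half-times are derived values rather than measurements reported in the text:

```python
import math


def half_time(k_obs_per_min: float) -> float:
    """Half-time (min) of a first-order reaction with rate constant k_obs (min^-1)."""
    return math.log(2) / k_obs_per_min


for k in (0.24, 0.09):
    print(f"k_obs = {k} min^-1 -> t1/2 = {half_time(k):.1f} min")
print(f"fold difference in k_obs = {0.24 / 0.09:.1f}")
```

The faster rate constant corresponds to a half-time of about 2.9 min and the slower one to about 7.7 min, a roughly 2.7-fold difference in excision rate.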

Structural basis for inefficient excision of O6MeG:T mismatch by MBD4.
How does MBD4 recognize and excise thymine opposite O6MeG? Why is the O6MeG:T base pair processed less efficiently by MBD4 than the G:T base pair? To gain structural insight into these questions, we determined a cocrystal structure of D560N MBD4 bound to O6MeG:T-containing DNA, which was refined to 2.2 Å resolution. The overall structure of the D560N MBD4-O6MeG:T complex is very similar to that of the published D560N MBD4-G:T complex (PDB code: 4OFA, RMSD = 0.244 Å for all heavy atoms of the MBD4-DNA complexes) (22). The estranged O6MeG remains intrahelical, while the substrate thymine is flipped out and fully inserted into the catalytic pocket of the enzyme (Figure 4). The flipped-out state is stabilized by Arg468, which approaches from the minor groove and intercalates between the bases 5' and 3' to the substrate thymine. The backbone carbonyl oxygen of Arg468 forms a hydrogen bond with the N2 of the orphaned O6MeG. In addition, the guanidinium moiety of Arg468 forms polar interactions with the phosphate oxygens near the substrate thymine, stabilizing the extrahelical conformation of the thymine.
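The RMSD quoted above summarizes how closely equivalent atoms in two structures coincide. A minimal Python sketch of the calculation, assuming the two coordinate lists are already superimposed and list the same atoms in the same order; the superposition step and the program used to obtain the reported value are not described in the text, and the coordinates below are hypothetical:

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (same length unit as the input) between paired atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Two hypothetical atom pairs displaced by 0.3 and 0.4 Å (illustration only).
print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 0.3), (1, 0.4, 0)]))
```

Because the deviations are squared before averaging, a few poorly matching atoms raise the RMSD more than many small displacements do.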
The Watson-Crick edge of the substrate thymine forms hydrogen bonds with Gln449, Tyr504, and Val448 (Figure 5), as similarly observed in the G:T mismatch structure (22). Unlike in the D560N MBD4-G:T structure, the carbonyl oxygen of Arg468 in the D560N MBD4-O6MeG:T structure is not hydrogen bonded to the N1 of the estranged purine. To evaluate whether the reduction in glycosylase activity is caused by weaker binding of MBD4 to O6MeG:T-containing DNA, we compared the enzyme's binding to G:T- and O6MeG:T-containing DNA using a non-denaturing polyacrylamide gel electrophoretic mobility shift assay (Figure 5C). The assay shows that the enzyme binds G:T-containing DNA more tightly than O6MeG:T-containing DNA, suggesting that the weaker binding of MBD4 to O6MeG:T-containing DNA could contribute to the reduced glycosylase activity.

The role of Arg468 in MBD4-mediated excision of thymine opposite guanine
Structural studies of MBD4 have shown that, during substrate base flipping, Arg468 penetrates the DNA from the minor groove, inserts into the void created by the flipped-out thymine, and forms hydrogen bonds with the two phosphate oxygens near the flipped thymine (19,22). These interactions can stabilize the flipped-out thymine in a productive state for catalysis. One question is whether Arg468 is essential for the catalytic activity of MBD4. As DNA glycosylases use various amino acid residues (e.g., glutamine (MutY), leucine (AlkA), tyrosine (AAG), arginine (TDG, MBD4, MUG), asparagine (OGG1)) as an intercalating moiety (17,35), we wanted to evaluate whether the guanidinium moiety of Arg468 plays a role in MBD4-mediated excision of the substrate thymine. Such knowledge would provide insights into the role of the arginine finger in the glycosylase reaction.
To assess the role of Arg468 in substrate excision, we evaluated the effect of Arg468 mutation on the glycosylase activity of MBD4. We mutated Arg468 to lysine, glutamine, phenylalanine, tyrosine, leucine, and glutamate and assessed the catalytic activity of the MBD4 mutants (Figure 6). Single turnover kinetic studies showed that the R468K mutant protein was as efficient as wild-type MBD4 in excising G:T mismatches (Figure 6B and Figure 3B). The R468Q MBD4 mutant also excised thymine opposite guanine, albeit to a much lesser extent (Figure 6B). For the R468F, R468Y, R468L, and R468E mutants, thymine cleavage opposite guanine was not observed, revealing that the identity of the intercalating moiety plays an important role in the MBD4-mediated glycosylase reaction. Since the stabilization of a flipped-out substrate can be achieved by various intercalating amino acid residues, the observed loss of excision activity is unlikely to result from the lack of phosphate pinching; both direct (e.g., Gln48 in MutY (36)) and water-mediated phosphate pinching (e.g., Tyr162 in hAAG (37), Leu125 in E. coli AlkA (38), Asn149 in hOGG1 (39)) are used by DNA glycosylases. The weak or absent glycosylase activity of the glutamine, phenylalanine, tyrosine, leucine, and glutamate mutants, together with the strong activity of the lysine mutant, suggests that the positively charged side chain (NH3+ of lysine and the guanidinium moiety of arginine) may play an important role in catalysis by MBD4.

Structure of R468K/D560N MBD4 in complex with G:T-mismatched DNA
To provide a structural basis for the catalytic activity of R468K MBD4, we determined a crystal structure of the R468K/D560N MBD4 mutant in complex with G:T mismatched DNA (Figure 7). The R468K/D560N MBD4 structure (PDB ID 4OFE), which was refined to 2.2 Å resolution, explains how the R468K mutant efficiently cleaves thymine opposite guanine. Lys468 of the R468K mutant is positioned similarly to Arg468 in the D560N MBD4-G:T structure (Figure 5A-B) and the D560N MBD4-O6MeG:T structure (Supplementary Figure 1). Unlike Arg468, which directly hydrogen bonds to the 3' phosphate oxygen of the substrate thymine nucleotide, Lys468 recruits a water molecule to engage in hydrogen bonding with the phosphate oxygen (Figure 7A). Also unlike Arg468, Lys468 does not hydrogen bond with the 5' phosphate oxygen of the nucleotide adjacent to the flipped thymine, indicating that the hydrogen bond interaction with the 5' phosphate oxygen is not required for catalysis (Figure 7B). The lack of a hydrogen bond between Lys468 and the 5' phosphate oxygen leads to a relatively relaxed phosphate backbone conformation. The relaxed conformation, however, does not affect the recognition of the flipped-out thymine by the active site residues. The catalytic water molecule is hydrogen bonded to D560N and proximal to the C1' of the substrate thymine.

Potential role of arginine finger in recognition of G:T mismatches.
Most DNA glycosylases use intercalating amino acid residues to stabilize the flipped substrate bases and facilitate substrate excision. DNA glycosylases employ a wide variety of amino acid residues in phosphate backbone pinching, such as Gln (e.g., Gln28 of Bst. MutY; Supplementary Figure 2), Leu (e.g., Leu125 of E. coli AlkA), Tyr (e.g., Tyr162 of hAAG), Asn (e.g., Asn149 of hOGG1), and Arg (e.g., Arg275 of hTDG), thereby stabilizing the base-flipped conformation (Figure 8). The phosphate pinching involves either direct or water-mediated hydrogen bonds to the phosphates 5' and/or 3' to the flipped base.
As the phosphate pinching-mediated stabilization of a flipped base can be achieved by a wide variety of amino acid residues [e.g., Leu (nonpolar), Tyr (bulky), Gln/Asn (polar)], the requirement for the Arg finger in G:T-mismatch-specific DNA glycosylases suggests additional roles of the Arg finger in the glycosylase mechanism. A fluorescence base-flipping assay indicates that the guanidinium moiety of Arg275 in human TDG promotes base flipping (43,44). In particular, R275A TDG binds G:U-containing DNA with only slightly reduced affinity relative to wild-type hTDG but does not induce base flipping, suggesting that the guanidinium moiety participates in intrahelical sensing and base flipping of G:T mismatches. While other mutant forms of hMBD4 (e.g., Glu, Tyr, Phe, Asn, Leu) are devoid of glycosylase activity, the R468K MBD4 mutant displays an excision efficiency comparable to that of wild-type MBD4 (Figure 3), suggesting that G:T mismatch recognition and promotion of base flipping by Lys468 of the R468K mutant are similar to those by Arg468 of MBD4. The ε-amino group of the Lys finger and the guanidinium moiety of the Arg finger might interact with the O6 of dG and the O4 of dT while dG and dT are intrahelical, thereby facilitating mismatch recognition and base flipping. Our results imply that, unlike lesion-specific glycosylases, G:T mismatch-specific DNA glycosylases rely on an intercalating residue's ability to sense and/or disrupt intrahelical mismatches and induce base flipping.