Structural basis of catalysis and substrate recognition by the NAD(H)-dependent α-D-glucuronidase from the glycoside hydrolase family 4

Members of the glycoside hydrolase family 4 (GH4) employ an unusual glycosidic bond cleavage mechanism utilizing NAD(H) and a divalent metal ion, under reducing conditions. These enzymes act upon a diverse range of glycosides, and unlike most other GH families, homologs here are known to accommodate both α- and β-anomeric specificities within the same active site. Here, we report the catalytic properties and the crystal structures of TmAgu4B, an α-D-glucuronidase from the hyperthermophile Thermotoga maritima. The structures in three different states include the apo form, the NADH-bound holo form, and the ternary complex with NADH and the reaction product D-glucuronic acid, at 2.15, 1.97 and 1.85 Å resolutions, respectively. These structures reveal the step-wise route of conformational changes required in the active site to achieve the catalytically competent state, and illustrate the direct role of residues that determine the reaction mechanism. Furthermore, a structural transition of a helical region in the active site to a turn geometry, resulting in the rearrangement of a unique arginine residue, governs the exclusive glucopyranosiduronic acid recognition in TmAgu4B. Mutational studies show that modifications of the glycone binding site geometry lead to catalytic failure and indicate overlapping roles of specific residues in catalysis and substrate recognition. The data highlight hitherto unreported molecular features and associated active site dynamics that determine the structure-function relationships within the unique GH4 family.


Introduction
The glycoside hydrolases (GHs) constitute a large group of enzymes, currently classified into 167 GH families in the Carbohydrate Active Enzyme database (CAZy) [1,2]. In general, GHs catalyze hydrolysis through one of two distinct processes that rely on an acid/base pair of carboxylic residues. The stereochemical outcome is either inversion or retention of the anomeric center of the glycone product [3]. In both cases, the steps involve a nucleophile stabilized oxocarbenium ion-like transition state [3][4][5]. Members of the GH4 and GH109 families employ an unusual NAD(H)-dependent glycosidic bond cleavage mechanism [6][7][8][9]. Here the oxidizing power of the cofactor is used to generate a transient keto intermediate at the glycone as a means of activating the ring to achieve the intended cleavage [10][11][12][13]. Apart from NAD(H), the GH4 enzymes require a divalent metal ion, usually Mn²⁺, and reducing conditions for catalysis [14][15][16]. The mechanism has been dissected in detail through a combination of studies

Downloaded from http://portlandpress.com/biochemj/article-pdf/doi/10.1042/BCJ20200824/904005/bcj-2020-0824.pdf by guest on 11 February 2021 Biochemical Journal. This is an Accepted Manuscript. You are encouraged to use the Version of Record that, when published, will replace this version. The most up-to-date-version is available at https://doi.org/10.1042/BCJ20200824

maritima α-D-glucosidase (TmAglA) with NAD⁺ and substrate maltose (PDB: 1OBB), the binding geometry of the NAD_N is in a syn orientation and represents a non-productive ternary complex [14]. The structure of the metal-bound apo form of Thermotoga neapolitana TnAgl (PDB: 3U95) was reported with thermostability properties, although its activity is unknown [23].
The other structures, namely KpAglB (PDB: 6DVV), GsLicH (PDB: 1S6Y) and BsLplD (PDB: 3FEF), are of homologs from structural genomics efforts, which are apo/metal-bound holo forms and remain unpublished. In particular, finer details of the structural basis of substrate specificity and stereo-selectivity, and the conformational changes associated with the mechanism, are not well described due to the limited structural data from homologs of this unique family. Because of their unique mechanism and taxonomic occurrence, these enzymes can be useful targets for inhibitor design. It is noteworthy that T. maritima, an anaerobic, hyperthermophilic and heterotrophic bacterium with a relatively small genome, encodes a large number of glycoside hydrolases compared to other bacterial and archaeal genomes. This is related to its capacity to utilize an extensive range of simple and complex carbohydrates and is a likely outcome of evolutionary processes that shaped its ability to grow in diverse global geothermal conditions [24,25].

We have previously reported the structure of the citrate (a competitive inhibitor) and Co²⁺-bound complex of the T. maritima GH4 α-D-glucuronidase (TmAgu4B) (PDB: 6KCX, 1.95 Å resolution) [26]. In the present study, we report three crystal structures of TmAgu4B: the apo form, the holo form with NADH, and the ternary complex with NADH and the product D-glucuronic acid (GlcA). These structures, supported by activity studies of a series of substitution variants, reveal the structural determinants of the mechanism and the specificity for glucuronides.
Also, the structures illustrate the step-wise route of conformational changes in the active site, beginning from the apo through to a holo form and achieving the final ternary complex that is competent for hydrolysis. These studies expand our understanding of the distinct molecular features and dynamics that define the mechanism, and the substrate binding and specificity, in this unusual family of glycoside hydrolases.

Recombinant DNA constructs and site-directed mutagenesis
The TM0752 gene encoding protein TmAgu4B (471 residues) was obtained as a plasmid from the DNASU plasmid repository. The gene in the pMH2T7 vector contains an N-terminal 12-residue His₆ tag. All mutants were generated by PCR using a site-directed mutagenesis method. Briefly, DpnI enzyme was added to the PCR products and the reaction mixture was kept for 4 h at 37

Protein expression and purification
The plasmids were transformed into BL21 CodonPlus (DE3)-RIL cells, which were grown at 37 °C with shaking in LB broth supplemented with 0.1 mg ml⁻¹ ampicillin and 0.034 mg ml⁻¹ chloramphenicol. Gene expression was induced at A₆₀₀ ~0.6 by adding 0.2% (w/v) L-(+)-arabinose and the culture was grown at 30 °C for another 4-5 h. The harvested cells were sonicated in lysis buffer (50 mM Tris-HCl pH 7.5, 20 mM imidazole, 300 mM NaCl). The lysate was centrifuged, the supernatant was subjected to heat shock (60 °C for 35 min), and the precipitate was pelleted by centrifugation. The clear supernatant was applied to a Ni-nitrilotriacetic acid (Ni²⁺-NTA) Sepharose column (GE Healthcare) equilibrated with lysis buffer. The column was washed with wash buffer (50 mM Tris-HCl pH 7.5, 50 mM imidazole, 300 mM NaCl) and the protein was eluted with elution buffer (50 mM Tris-HCl pH 7.5, 250 mM imidazole, 300 mM NaCl). The eluted protein was desalted and stored in buffer (50 mM Tris-HCl pH 7.5, 100 mM NaCl). The oligomeric states of purified samples were estimated by size exclusion chromatography performed at 4 °C using a Superdex™ 200 (10/300 GL) analytical column (GE Healthcare). The protein sample (5 mg ml⁻¹ in 200 μl) was first incubated with 300 mM DTT, 100 mM NAD⁺ and 100 mM MnCl₂ at room temperature. Protein concentration was determined by measuring absorbance at 280 nm, using a calculated extinction coefficient of 101885 M⁻¹ cm⁻¹.
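The A280-based concentration estimate follows from the Beer-Lambert law. A minimal Python sketch is given here, assuming a 1 cm path length and the extinction coefficient quoted above; the function name, the dilution parameter and the example absorbance reading are illustrative assumptions and are not taken from the study.

```python
def protein_molar_concentration(a280, epsilon_m_cm=101885.0, path_cm=1.0, dilution=1.0):
    """Return the protein concentration in mol/L from an A280 reading.

    Beer-Lambert law: A = epsilon * c * l, so c = A / (epsilon * l).
    The dilution factor scales the result back to the undiluted stock.
    """
    if epsilon_m_cm <= 0 or path_cm <= 0:
        raise ValueError("extinction coefficient and path length must be positive")
    return a280 * dilution / (epsilon_m_cm * path_cm)


# Hypothetical reading (not from the study): A280 = 0.50 on a 10-fold dilution.
stock_molar = protein_molar_concentration(0.50, dilution=10.0)
stock_micromolar = stock_molar * 1e6
print(f"{stock_micromolar:.1f} uM")
```

Keeping the coefficient and path length as parameters makes it straightforward to reuse the same helper for variants whose calculated extinction coefficient differs from that of the wild-type protein.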

Enzyme assays
In general, in vitro characterization of the enzymatic properties of GHs is carried out using p-nitrophenyl derivatives of monosaccharides, although the natural substrates of these enzymes may be poly- and oligosaccharides and other sugar moieties. In our study, six synthetic monosaccharide derivatives, namely p-nitrophenyl-α-D-glucuronic acid (pNP-α-GlcA), p-nitrophenyl-β-D-glucuronic acid (pNP-β-GlcA), p-nitrophenyl-α-D-glucose (pNP-α-Glc), p-nitrophenyl-β-D-glucose (pNP-β-Glc), p-nitrophenyl-α-D-galactose (pNP-α-Gal) and p-nitrophenyl-β-D-galactose (pNP-β-Gal), purchased from TCI India Pvt. Ltd, were used to examine the substrate specificity of TmAgu4B. NAD⁺, MnCl₂ and D-glucuronic acid (GlcA) were purchased from Sigma-Aldrich. All chemicals and buffers were of analytical grade. The α-D-glucuronidase activity of TmAgu4B was determined by continuously monitoring the release of the product p-nitrophenol at 405 nm over 2 min in a Perkin Elmer Lambda 25 UV-visible spectrophotometer with a Peltier system. The reaction mixture contained MnCl₂, NAD⁺, DTT and enzyme (pre-incubated at 40 °C for 2 min) at final concentrations of 0.2 mM, 0.5 mM, 30 mM and 1.8 µM, respectively. The reaction was initiated by adding 0.5 mM pNP-α-GlcA (pre-incubated at 40 °C for 2 min) with mixing, to a final volume of 100 µl [24]. The molar absorption coefficient for p-nitrophenol under the assay conditions used is 7200 mM⁻¹ cm⁻¹. One unit of α-D-glucuronidase activity of TmAgu4B is defined as the amount of enzyme that releases 1 µmol of p-nitrophenol per min under standard assay conditions.
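The unit definition above translates into a short calculation from the slope of the A405 trace. The sketch below is one possible way to do this in Python; the function name and all numerical inputs in the example are hypothetical, and the absorption coefficient is passed in explicitly (here in M⁻¹ cm⁻¹) so that the value and unit convention appropriate to the assay can be supplied by the user.

```python
def specific_activity_u_per_mg(delta_a_per_min, epsilon_m_cm, volume_l, enzyme_mg, path_cm=1.0):
    """Specific activity in U/mg, where 1 U releases 1 umol p-nitrophenol per min.

    delta_a_per_min : slope of the A405 trace (absorbance units per min)
    epsilon_m_cm    : absorption coefficient of p-nitrophenol in M^-1 cm^-1
    volume_l        : reaction volume in litres
    enzyme_mg       : mass of enzyme in the reaction in mg
    """
    if min(epsilon_m_cm, volume_l, enzyme_mg, path_cm) <= 0:
        raise ValueError("epsilon, volume, enzyme mass and path length must be positive")
    molar_per_min = delta_a_per_min / (epsilon_m_cm * path_cm)  # mol L^-1 min^-1
    umol_per_min = molar_per_min * volume_l * 1e6               # umol min^-1 = U
    return umol_per_min / enzyme_mg


# Hypothetical inputs (illustration only, not measurements from the study).
example_activity = specific_activity_u_per_mg(
    delta_a_per_min=0.1, epsilon_m_cm=10000.0, volume_l=100e-6, enzyme_mg=0.01
)
print(f"{example_activity:.3f} U/mg")
```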
The effects of pH and temperature on the α-glucuronidase activity of TmAgu4B were determined by measuring the specific activity over the pH range 6.5 to 10 and the temperature range 30 to 100 °C. The effect of different metals on the α-glucuronidase activity of TmAgu4B was determined by measuring the specific activity in the presence of MnCl₂, MgCl₂ and CaCl₂. Activities of mutants were measured following the protocol described above.
Bisubstrate initial velocity studies were performed by kinetic parameter estimation for one of the two substrates, A and B, at different concentrations of the second substrate. The substrate-dependent rates were fitted to Michaelis-Menten kinetics using GraphPad Prism 5.0, following the substrate depletion method. In the first set of experiments, the concentration of pNP-α-GlcA was varied at three fixed concentrations of NAD⁺. In the second set, the substrates A and B were reversed and the above-described protocol was repeated. The two sets of data obtained above were used to test bisubstrate mechanisms using equations (1) and (2). All measurements were performed in triplicate. In order to differentiate between the sequential and ping-pong mechanisms, the corresponding Lineweaver-Burk (LB) double-reciprocal plots were plotted for each experiment using GraphPad Prism 5.0. Furthermore, global non-linear regression fitting of the data to the equations (n=3) was carried out using the ANEMONA.XLT Excel template to generate the kinetic parameters [28].

were mounted and flash-frozen in a nitrogen gas stream at 100 K. All data collected were processed and scaled using the MOSFLM and Aimless programs, as implemented in the CCP4 software package [29][30][31]. Structure solution and refinement were carried out using programs from the PHENIX program suite [32]. The structure was solved by molecular replacement using the PHASER program with one subunit of TmAgu4B (PDB: 6KCX) as the search model [26]. Iterative rounds of restrained maximum-likelihood refinement using phenix.refine were carried out. Model bias was removed by implementing simulated annealing protocols. The structures and restraints of the ligands were generated using the electronic Ligand Builder and Optimization Workbench (eLBOW) program, as implemented in the PHENIX suite [33]. Model building was carried out using the Coot program [34].
The stereochemical quality of the final models was evaluated using the MolProbity program. Structure representations were generated using the PyMOL program [35].
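As an illustration of the bisubstrate initial-velocity analysis described under Enzyme assays, the sketch below uses the textbook rate laws for a sequential (ternary-complex) and a ping-pong mechanism and compares the slopes of their double-reciprocal plots. The notation (Vmax, Ka, Kb, Kia) follows a common convention and is an assumption here; it may differ from the authors' equations (1) and (2), and all parameter values and concentrations are hypothetical.

```python
def v_sequential(a, b, vmax, ka, kb, kia):
    """Textbook sequential (ternary-complex) rate law."""
    return vmax * a * b / (kia * kb + kb * a + ka * b + a * b)


def v_ping_pong(a, b, vmax, ka, kb):
    """Textbook ping-pong rate law (no Kia*Kb term in the denominator)."""
    return vmax * a * b / (kb * a + ka * b + a * b)


def lb_slope(rate_fn, b, a_values, **params):
    """Least-squares slope of 1/v against 1/[A] at one fixed [B]."""
    xs = [1.0 / a for a in a_values]
    ys = [1.0 / rate_fn(a, b, **params) for a in a_values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Hypothetical concentrations and constants (arbitrary units, illustration only).
a_values = [0.05, 0.1, 0.2, 0.5, 1.0]
b_values = [0.1, 0.25, 0.5]

seq_slopes = [lb_slope(v_sequential, b, a_values, vmax=1.0, ka=0.2, kb=0.1, kia=0.3)
              for b in b_values]
pp_slopes = [lb_slope(v_ping_pong, b, a_values, vmax=1.0, ka=0.2, kb=0.1)
             for b in b_values]

# Sequential: the slope changes with [B], so the lines intersect.
# Ping-pong: the slope is independent of [B], so the lines are parallel.
print("sequential slopes:", [round(s, 3) for s in seq_slopes])
print("ping-pong slopes: ", [round(s, 3) for s in pp_slopes])
```

In this simplified picture, a set of double-reciprocal lines whose slopes vary with the fixed substrate points to a sequential mechanism, whereas parallel lines point to ping-pong; real data would additionally need error estimates and a global fit.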

Results and discussion
Substrate specificity and kinetic properties
Recombinant TmAgu4B was expressed in E. coli and purified to homogeneity. The protein, composed of 471 amino acids, exists as a dimer in solution (Fig. S1A and B). Thermal unfolding studies revealed a canonical two-state unfolding process and a melting temperature (Tm) of 90 °C, consistent with a hyperthermophilic enzyme [36]. Our first objective was to establish the substrate specificity of TmAgu4B using six synthetic substrates. Activity assays revealed that TmAgu4B is specific to p-nitrophenyl-α-D-glucuronic acid (pNP-α-GlcA), with a specific activity of 0.31 U/mg under assay conditions. However, p-nitrophenyl-β-D-glucuronic acid and p-nitrophenyl-α-D-galactose (pNP-α-Gal), p-nitrophenyl-β-D-galactose (pNP-β-Gal) did not show detectable activity, confirming that the enzyme is an α-D-glucuronidase. The enzyme exhibits highest activity at the optimum pH and temperature of 8.0 and 90 °C, respectively (Fig. S1 C and D). We performed bisubstrate kinetic characterization by varying NAD⁺ and pNP-α-GlcA concentrations in parallel independent experiments. Hyperbolic kinetics obtained in all cases indicated that the enzyme follows Michaelis-Menten kinetics (Fig. 1 A and C, Fig. S2). Furthermore, the intersecting Lineweaver-Burk plots clearly indicate that the TmAgu4B reaction follows a sequential mechanism and not a ping-pong mechanism (Fig. 1 B and D). The values of the kinetic constants for the sequential mechanism models are listed in Table S1.

accumulation of NADH, monitored by absorbance at 340 nm, is consistent with the course of the reaction where NAD⁺ is converted to NADH (Fig. S4A) [37]. Addition of substrate pNP-α-GlcA to the reaction mixture containing the enzyme, DTT, NAD⁺ and Mn²⁺ increased the fluorescence intensity in the region between 400-550 nm with a peak around 445 nm, corresponding to the characteristic emission spectra of NADH.
The same was absent in the reaction mixture without substrate (Fig. S4B). Together, the data confirm the obligate NAD⁺-dependent activity and the requirement for the divalent transition metal ion and reducing conditions, and are in agreement with data from GH4 homologs [15][16][17][18][19][20][37].

Crystal structures of TmAgu4B
complexes, with NAD (holo-TmAgu4B), and with NAD and the glycone reaction product, D-glucuronic acid (ter-TmAgu4B). The crystal forms in space group C2 contain a single subunit in the asymmetric unit, with the second subunit of the functional homodimer related by a crystallographic two-fold symmetry. The crystallographic statistics are summarized in Table 1.

Crystals of the apo form were obtained only after extensive dialysis of protein samples in a cofactor-free buffer. NAD(H) is a dissociable prosthetic group, and notably, none of the GH4 structures reported so far represent separate snapshots of the apo form and the holo form within the same enzyme. The metal- and cofactor-free apo form was refined to 2.15 Å resolution. Extensive crystallization trials were carried out with added NAD⁺ and Mn²⁺ in co-crystallization and soaking experiments utilizing the apo form to capture the metal- and cofactor-bound forms. The reducing agent DTT was always included in these experiments. Crystals of the NAD-bound holo form were successfully obtained and the structure was determined at 1.97 Å resolution. However, these crystals always lacked the bound metal ion. To determine the cognate ternary complex with a ligand, we carried out multiple co-crystallization trials with substrates and mono- and disaccharides, which were not successful. Therefore, we soaked the crystals of holo-TmAgu4B in mother liquor containing the product and MnCl₂ for different time periods. Optimized crystals obtained here allowed the successful determination of the complex with NADH and GlcA to a resolution of 1.85 Å.
In this context, the three structures represent successive structural snapshots of the enzyme in the apo form, the holo form, and a ternary complex with a cognate sugar acid. Together, these snapshots allow us to capture the structural transitions associated with the entire catalytic cycle and establish the structural basis of the mechanism and substrate specificity and selectivity for a GH4 glucuronidase.

The overall fold of TmAgu4B, as previously described, belongs to the mixed α/β class and contains three separate regions: N-terminal, central catalytic, and C-terminal (Fig. 2 and Fig. S5) [26].

to the GH4 family, and distinguish this family from the evolutionarily related oxyacid dehydrogenases [14][15][16]. The C-terminal region includes the oligomerization domain that forms the conserved homodimeric interface. The pairwise structural alignments of the eight homologs using the DALI program displayed root mean square deviations (rmsd) in the range 1.2-3.2 Å and Z-scores in the range 31.2-61.8, while the sequence identities range from 18 to 89% [38]. The N-terminal and C-terminal regions are the most structurally conserved across the GH4 family, whereas the central catalytic region is the most divergent and is proposed to be responsible for the diverse substrate specificity and selectivity in this family [36,39].

Interactions of the cofactor and comparison of the apo and holo forms
A previous structure of TmAgu4B at 2.5 Å resolution (PDB: 1VJT, Joint Centre for Structural Genomics, unpublished) lacks the critical nicotinamide (NAD_N) moiety of the bound cofactor. However, the better resolution of data in our study allowed for the modelling of the entire cofactor in the expected binding pocket spanning a conserved cleft between the N-terminal and the central domains (Fig. S6A). The cofactor modelled as NADH in the holo form makes multiple interactions in the active site (Fig. 3A).
The adenosyl ring (NAD_A moiety) is well packed in a hydrophobic pocket made of largely aliphatic residues. The O3 and O2 atoms of the NAD_N ribose make hydrogen bonds with the main chain NH and the sidechain ND2 atoms of the conserved Asn160. The Asn160Ala substitution mutant was found to be inactive (Fig. S7)

bond and a salt bridge interaction with the carbonyl oxygen and the OD2 of Asp93, respectively, creates an ionic lock that clamps the cofactor (Fig. 3B and Fig. S8A). While the residue at position 43 is usually a Lys/Arg in the family, residue 93 is part of a hairpin loop lining the top edge of the NAD binding domain and is an insertion element unique to TmAgu4B and TnAgl and absent in the other homologs (Fig. S9). These rearrangements mark an intra-domain cleft closure upon cofactor binding, and it is likely that the resultant ionic lock, specific to these two homologs, is a key structural feature that modulates the divergent kinetics of cofactor binding and reaction rates across paralogous members of the family.

Conformational changes are required for formation of competent active site
The structure of the complex of NADH and the cognate reaction product GlcA allowed us to identify the substrate binding pocket and describe the conformational changes associated with the formation of the ternary complex, hitherto unreported in the GH4 family. Unambiguous electron density is consistent with β-D-glucuronic acid in the chair conformation (⁴C₁) bound at the expected glycone binding site (Fig. S10). The structures of the holo form and the ternary complex are largely identical, with an overall rmsd of 0.37 Å (over 469 Cα atoms) between them. The largest conformational changes occur in the N-terminal residues 11-13 of helix α1 (Fig. 4 and Fig. S8B).
This remarkable secondary structure transition from an α-helical conformation to a β-turn results in Val11, Arg12 and Phe13 moving closer towards the substrate (the Cα atoms of Val11 and Arg12 move by 3.5 and 4.3 Å, respectively). The side chains undergo large "swing" movements accordingly, creating a closed state in the ternary complex. First, the Arg12 sidechain makes direct ionic interactions both with the GlcA carboxylic O6A and with the pyrophosphate O1N atom (~2.9 Å). In addition, a water bridge connects Arg12, the NAD_N ribose O2D and GlcA O6A (Fig. 5). Second, a major reorientation of the NAD_N moiety occurs due to the rearrangement of the sidechain of Phe13 in the ternary complex (χ1 rotation from 154° to -73°). In the holo form, NAD_N maintains an offset parallel π-π stacking interaction with Phe13 (~3.4 Å) (Fig. S6B). In contrast, in the ternary complex, NAD_N reorients (the C4N atom moves by ~4.0 Å) to a perpendicular T-π stacking interaction (Fig. 4, Fig. 5, Fig. S8B). As a result, the pro-R face of NAD_N now stacks parallel with GlcA, bringing the C3 sugar atom to within 3.8 Å of the C4N atom. Moreover, the C3-C4N-N1N and the C4N-C3-C1 angles are 80° and 88°, respectively. The angle between the planes defined by C3N-C4N-C5N-C6N of NAD_N and O3-C3-C4-O4 of GlcA is ~11°, mimicking a syn-like conformation proposed by Wu et al., 1995, for maximum overlap between the C3-H of GlcA and the LUMO p_z orbital of the C4N of NAD⁺ (Fig. S11) [40]. This network of CH-π interactions involving Phe13, the NAD_N and the pyranose ring results in an interaction geometry appropriate for the hydride transfer step in catalysis. Moreover, the NAD_N ribose now makes a new hydrogen bond with Asn160. It appears that the dynamics of residues Arg12 and Phe13 are crucial to positioning the cofactor and the substrate for productive catalysis.
Together, we propose that the structural transition of the region Val11-Arg12-Phe13 is substrate-induced and that the cofactor-protein interactions in the ternary complex are coupled to substrate binding (Fig. 6). The structural and kinetic data indicate that TmAgu4B adopts the sequential bisubstrate kinetic mechanism and not a ping-pong mechanism. The kinetic analyses cannot distinguish between the random sequential and ordered sequential mechanisms. However, the observed structural transitions from the apo form to the holo form and to the ternary complex strongly suggest that TmAgu4B adopts an ordered sequential mechanism where NAD⁺ is the first substrate while the glucuronide is the second substrate.
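Geometric quantities of the kind quoted in this section, such as interatomic distances and side-chain torsion angles (for example the χ1 of Phe13), can be computed directly from atomic coordinates. The following is a minimal, dependency-free Python sketch; the helper names are assumptions, and the coordinates used in the example are arbitrary test points, not atoms from the deposited structures.

```python
import math


def _sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def distance(p, q):
    """Euclidean distance between two points (same units as the input)."""
    d = _sub(p, q)
    return math.sqrt(_dot(d, d))


def dihedral_deg(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3 in degrees, in the range (-180, 180]."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = _dot(b2, _cross(n1, n2)) / math.sqrt(_dot(b2, b2))
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Arbitrary test points (not structural data): a right-angle torsion.
p0, p1, p2, p3 = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)
example_torsion = dihedral_deg(p0, p1, p2, p3)
example_distance = distance((0.0, 0.0, 0.0), (0.0, 3.0, 4.0))
print(round(example_torsion, 1), round(example_distance, 1))
```

For a χ1 angle, the four points would be the N, CA, CB and CG atoms of the residue read from the coordinate file; parsing the file is left out of this sketch.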

Structural insights into the reaction mechanism of TmAgu4B
Inspection of the interactions that GlcA and the cofactor make with the active site residues provides insights into the role of these residues in the mechanism. The GlcA O3 and O4 atoms are hydrogen-bonded to Asn160 ND2, while the OD1 atom makes a hydrogen bond with a structurally conserved water (W1) of the metal coordination shell [24]. Asn160, which is present in a cis conformation, constitutes a conserved and functionally important Asn160-Pro161 (NP) motif of the GH4 family. The loss of activity in the N160A mutant is consistent with the multiple roles that Asn160 plays in maintaining the precise geometry of interactions of the substrate, the cofactor, and the metal coordination via W1 (Fig. 5 and Fig. S7). Catalytic residue Asp267 makes bidentate hydrogen bonds with the GlcA O1 and O2 atoms (2.8-3.0 Å). It is to be noted that the O1 interaction is that for the bound β-conformer of GlcA, whereas this interaction will be absent in the cognate α-linked substrate. The Asp267 OD2 atom, 3.4 Å from the C2 atom of GlcA, is expected to act as the base to deprotonate C2. In support, the substitution of Asp267 by Ala abolished the activity (Fig. S7).

We were unable to locate bound Mn²⁺ in the ternary complex although the crystallization conditions contained excess metal ions. It is likely that the metal-coordinating Cys181 residue, which has been oxidized to cysteine sulfinic acid (Cys-SOOH), prevents metal binding. Nevertheless, the relative positioning and the geometry of interactions between GlcA, the cofactor, and the stringently conserved neighboring residues Asn160, His210 and Asn209 in TmAgu4B are nearly identical to those observed in the G6P and metal-bound ternary complexes of BsGlvA and TmBglT (PDB: 1U8X, 1UP6) [18,20]. Moreover, previous reports indicate that all steps of the mechanism (oxidation-elimination-addition) occur at the glycone binding subsite.
Therefore, it is reasonable to assume that the active site interactions in TmAgu4B reflect a catalytically productive complex of a cognate glucuronide substrate. For instance, the distances between the GlcA O2 atom and the metal-coordinating atoms SG of Cys181 and NE2 of His210 are 3.4 and 3.0 Å, respectively. The equivalent interactions of Cys171 and His202 in BsGlvA are 3.4 and 3.7 Å (Fig. 7A). These direct metal-protein interactions are invariant irrespective of whether the substrate or cofactor binding sites are occupied or not. Hence, we modeled a metal-bound form of the ternary complex using the structure of the Co²⁺-bound TmAgu4B (PDB: 6KCX) to position Mn²⁺ (Fig. 7). The metal binding site and its interactions are structurally equivalent in all metal-bound homologs, including TmBglT, BsGlvA, TnAgl and BsLplD. The Mn²⁺-bound ter-TmAgu4B model did not require repositioning of any atoms except for a minor reorientation of the NAD_N amide group to bring the O7N atom within the expected octahedral coordination sphere of Mn²⁺ (Fig. S12). W1, at a distance of ~2.7 Å from the GlcA O3 atom in TmAgu4B, is also present in the holo form. As expected, the substitution of Cys181 by Ala abolished the activity of TmAgu4B (Fig. S7).

Based on this model, we sought to establish the structural basis of the catalytic cycle of TmAgu4B. The proposed mechanism occurs in two half-reactions. We propose that Mn²⁺ and W1 together constitute the metal-activated hydroxide ion, which acts as the general base to deprotonate the C3-hydroxyl group of the glycone and initiate the reaction (Fig. 8, I) [19]. The deprotonation acts in concert with hydride transfer to NAD⁺ to give a ketone at C3 and has been previously shown to be rate-limiting [17][18][19][20][37,39,41]. The carbonyl intermediate is stabilized by Mn²⁺, leading to a relatively more acidic proton at C2.
In the next step, deprotonation at C2 and concurrent β-elimination at C1 of the glycone carbonyl intermediate occur (Fig. 8, III). The β-elimination leading to the loss of two substituent atoms has been widely documented in the literature for uronic acids, since the glycosidically linked residue or the methoxy substituent (at C4 of glycone GlcA in the case of TmAgu4B) of the uronic acid residue are good leaving groups [42][43]. Careful KIE studies in combination with structural analysis in TmBglT suggest that the structurally equivalent Tyr241 deprotonates the C2 atom and that elimination of the aglycone proceeds through an E1cB or simple β-elimination [16,20,37]. In TmAgu4B, C2-H is deprotonated by the equivalent Asp267 and the β-elimination of the aglycone leaves an α,β-unsaturated Michael acceptor at C1=C2 of GlcA (Fig. 8 II, III). This Michael-like acceptor undergoes base-catalysis by water, generating a keto group at C3, while the C2 is reprotonated by Asp267. Finally, the onboard NADH reduces the C3 ketone, completing the overall cycle (Fig. 8, IV, V).

Interestingly, a water molecule (W7) is present in the ternary complex at a distance of 4.3 Å from the C1 atom (Fig. 5). W7, present only in the ternary complex, is hydrogen-bonded to the sidechain of Arg12, while an equivalent water molecule is also present in BsGlvA [15]. It is tempting to speculate that W7, activated by Arg12, plays a functional role in catalysis [44]; this is an interesting feature that can be explored using quantum mechanics/molecular mechanics (QM/MM) studies. The structural basis for the proposed mechanism is in good agreement with earlier structural and computational studies carried out for homologs [17][18][19][37,39].

A close examination of the interactions of the catalytic base Asp267 suggests that its protonation state is likely perturbed by an ionic interaction with the sidechain of the neighboring Arg270 (3.0 Å).
The orientation of the Arg270 guanidinium is held in place by a stable bidentate salt bridge with Asp294, located on the opposite side (Fig. 5 and Fig. 6C). It is likely that the interaction geometry of the Asp267-Arg270-Asp294 triad perturbs the pKa of Asp267. A perturbed pKa is consistent with the role of Asp267 in the deprotonation and protonation of the C2 atom during each half cycle of the reaction. We next calculated the pKa values of Asp267 using the PROPKA program [45]. As expected, the predicted values of pKa show an increase from 2.4 in the apo form to 6.0 in the ternary complex. This triad (Asp-Arg-Asp) is also present in the α-glucosidase TmAglA, where the structurally equivalent Asp260 is proposed to act as the base [14]. Aspartates with perturbed pKa values as high as 8.2 have been reported earlier in the human aromatase enzyme active site due to interactions with nearby residues and ligands [46]. Interestingly, both R270K and R270A mutants are inactive (Fig. S7). It is likely that Lys at this position is unable to maintain the orientation or the appropriate protonation state of the catalytic Asp267. Structural comparisons indicate that this triad region is divergent in conformation and sequence compared to the phosphoglucosidases (Fig. S9) [18,37]. It thus appears that in the GH4 family, the pKa of the catalytic base is fine-tuned by its interactions with residues in the neighborhood. This hypothesis is consistent with the structural diversity of this region across homologs with different specificities. Lastly, the glycone binding site maintains an exquisite complementarity to the sugar acid. This explains why most mutational modifications to the existing active site geometry lead to catalytic failure (Fig. S7). Given this limitation, examination of the precise role of specific residues in TmAgu4B will necessitate extensive studies using KIE and NMR techniques.
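To give a feel for what the predicted pKa shift of Asp267 means in terms of protonation, the Henderson-Hasselbalch relation can be applied to the two values quoted above. The sketch below assumes a simple, independent single-site titration, which ignores coupling to neighboring ionizable groups, and evaluates the populations at the optimum pH of 8.0 reported for the enzyme; the function name is an assumption.

```python
def fraction_deprotonated(pka, ph):
    """Fraction of an acid in its deprotonated form (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


ASSAY_PH = 8.0  # optimum pH reported for TmAgu4B

# Predicted Asp267 pKa values quoted in the text.
apo_fraction = fraction_deprotonated(2.4, ASSAY_PH)
ternary_fraction = fraction_deprotonated(6.0, ASSAY_PH)

# Protonated population is the complement of the deprotonated one.
apo_protonated = 1.0 - apo_fraction
ternary_protonated = 1.0 - ternary_fraction
print(f"protonated fraction: apo {apo_protonated:.2e}, ternary {ternary_protonated:.2e}")
```

Under these assumptions the protonated population of the carboxylate rises by more than three orders of magnitude between the two states, which illustrates how a raised pKa could make it easier for the same residue to alternate between accepting and donating a proton.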
Structural basis of glucuronide specificity in TmAgu4B
Next, we sought to understand the strict glucuronidase specificity of TmAgu4B using a comparative analysis of the three available ternary complexes. Among the characterized homologs, TmAgu4B is structurally the most similar to the α-glucosidase TmAglA (rmsd of 1.

Arg299), constituting a significantly positive, well-packed pocket that confers specificity towards glucuronide substrates (Fig. 6C). A perfect bidentate interaction is formed between the carboxylic O6A and O6B atoms and Arg299 (~2.6 Å). Arg270 makes another bidentate interaction with the O5 and O6B atoms (~3.0 Å), while Arg12 interacts with O6A and the O1N of NADH (Fig. 5, Fig. S13).

Structural comparisons show that the four phosphoglucosidases, TmBglT, BsGlvA, KpAglB and GsLicH, contain a conserved Arg equivalent to Arg299 (Arg285 in BsGlvA). In contrast, in the α-glucosidase TmAglA, Arg299 is substituted by Trp293 (Fig. S13, A-D). Presumably, a positively charged residue is imperative at this position to recognize a generic negatively charged moiety in the substrate glycone. Given that the TmAgu4B mutants R299A, R299F and R299W displayed no activity, perhaps the ionic interactions of Arg299 (Trp in the glucosidase) are important both for optimal orientation and affinity of the negatively charged sugar (Fig. S7, Fig. S13A and B). In fact, the specificity for the substrate in the phosphoglucosidases appears to be dictated by its interactions with another conserved Arg (Arg95 in BsGlvA, Arg87 in TmBglT). This Arg is replaced by Trp99/Leu93 in TmAgu4B and TmAglA, consistent with their non-phosphorylated substrates (Fig. S13B and C).

Thus, specificity in TmAgu4B is likely determined by interactions of the carboxylic atoms with either or both Arg12 and Arg270, positioned on opposite sides (Fig. 5, Fig. 6C).
Arg270 is located in a divergent loop region that connects helices α9 and α10 in the GH4 family (Fig. S9). This region in the phosphoglucosidases lacks an Arg at the corresponding position, whereas Arg is conserved in the glucosidase (Arg263, TmAglA) and makes a bidentate interaction with the O6H and O5 atoms of bound maltose, identical to the Arg270-GlcA interactions in TmAgu4B (Fig. S13A). This suggests that Arg270 cannot serve to discriminate between the glucuronide and glucoside specificities of TmAgu4B and TmAglA, respectively. Arg12, present in the flexible and dynamic Turn I, is unique to TmAgu4B and is substituted by Val15 in TmAglA (Phe17 in BsGlvA). Moreover, the Turn I region is divergent across the family, and presents an open conformation in TmAglA, oriented away from the substrate (Fig. S13A). These observations strongly suggest that the ionic interactions of Arg12 largely dictate the strict specificity for glucuronides in TmAgu4B. It is noteworthy that the role of Arg12 in this context is directly enforced by a substrate-induced coupled swing movement of residues Phe13 and Arg12. Moreover, the substitution mutants R12A and R12K are both catalytically inactive. Considering the network of direct and water-mediated interactions of Arg12 with the substrate and cofactor, this residue has overlapping roles in substrate binding, specificity, and catalysis. Another residue that may dictate the specificity of TmAgu4B is Thr125. The equivalent residue, Asp119 in TmAglA, interacts with the glycone (Fig. S13A). Docking of GlcA into the TmAglA active site creates an unfavorable electrostatic environment with the glycone carboxylic group, whereas in TmAgu4B, Thr125 makes a favorable water-mediated interaction. Indeed, a T125D mutation in TmAgu4B abolished activity (Fig. S7). Notwithstanding the close structural and sequence similarity of the glycone binding pockets of TmAgu4B and TmAglA, the evolution of exquisite substrate discrimination between the two homologs is governed by multiple adaptations of the pocket in the immediate neighborhood of the C6 position.

Conclusions
The structural basis of the mechanism and specificity of TmAgu4B studied here reveals unique, stepwise conformational rearrangements previously unreported for this family. Conformational changes associated with binding of the cofactor to the apo form are first necessary to create a competent active site that can accommodate the substrate. Next, the binding of the substrate induces a secondary structure transition, resulting in a swing movement of key residues which ensures that both the nicotinamide moiety and the substrate are oriented in the alignment required for hydride transfer. The ternary complex also illustrates the direct roles of crucial catalytic residues, including the base Asp267, the protonation state of which is modulated through interactions within a triad of charged residues. Lastly, crucial insights are gained into the structural basis of substrate specificity in the GH4 family by defining clear roles for specific residues that discriminate between glucoside, glucuronide, and phosphoglucoside substrates within the glycone recognition pockets of three paralogous members. The GH4 family is another example that demonstrates the exquisite fine-tuning of molecular features and dynamics that have driven the evolution of the enormous number of functionally diverse extant carbohydrate-active enzymes built around a few structural folds and catalytic frameworks.
GlcA: D-glucuronic acid
BsGlvA: Bacillus subtilis phospho-α-glucosidase
BsLplD: Bacillus subtilis glucosidase
G6P: Glucose-6-phosphate
GsLicH: Geobacillus stearothermophilus phospho-β-glucosidase
HOMO: highest occupied molecular orbital
LUMO: lowest unoccupied molecular orbital
KpAglB: Klebsiella pneumoniae 6-phospho-α-glucosidase
TmBglT: Thermotoga maritima phospho-β-glucosidase
TnAgl: Thermotoga neapolitana
TmAglA: Thermotoga maritima α-glucosidase
TmAgu4B: Thermotoga maritima α-glucuronidase
moiety of GlcA and the pyrophosphate moiety of NAD+. A second set of changes is the conformational changes in the NADH and Phe13 where a T-π stacking interaction between Phe13 with NAD N moiety in the holo form is transformed to a π-π stacking interaction in the ternary complex.
Figure 5. Interactions of TmAgu4B with NADH and GlcA in the ternary complex. Cartoon and stick representation of active site residues interacting with GlcA and NADH. W1 and W4 are conserved water molecules while other water molecules labeled W5, W6, W7 are unique to the complex, and form water bridge interactions connecting the enzyme, NAD and GlcA. The Turn I (residues 11-13) between α1 helix and β1 sheet is labeled. Hydrogen bond interactions (< 3.2 Å) are shown as dashed lines.
Step-wise sequence of conformational changes in the active site of TmAgu4B. Surface and stick representation of active sites in A) apo form, B) holo form, C) ternary complex. The Turn I region in the complex that undergoes a structural transition to occlude the active site is marked in transparent cyan. In all panels, positively charged residues are colored blue and the negatively charged residues are colored red. The bound GlcA and NADH are shown in green stick representation.
I) Mn2+-OH (W1) deprotonates the C3-OH (green) leading to C3 keto sugar (green) and concurrent C3-H hydride (purple) abstraction by nearby C4N/NAD+. II) O=C3-C2 enolate species (green) is stabilized by Mn2+ leading to slightly acidic C2-H (red). C2 is deprotonated by Asp267 and concerted β-elimination (E1cB) of -ROH (teal) leading to a double bond character across the C2=C1 Michael acceptor (pink). III) Water (blue) addition to unsaturated Michael acceptor at C2=C1 (pink) occurs in a Michael addition fashion. IV) Keto group at position C3 (green) is reduced by NADH (purple) and V) Glycone product bound form.