A structural and kinetic survey of GH5_4 endoglucanases reveals determinants of broad substrate specificity and opportunities for biomass hydrolysis

Broad-specificity glycoside hydrolases (GHs) contribute to plant biomass hydrolysis by degrading a diverse range of polysaccharides, making them useful catalysts for renewable energy and biocommodity production. Discovery of new GHs with improved kinetic parameters or more tolerant substrate-binding sites could increase the efficiency of renewable bioenergy production even further. GH5 has over 50 subfamilies exhibiting selectivities for reaction with β-(1,4)-linked oligo- and polysaccharides. Among these, subfamily 4 (GH5_4) contains numerous broad-selectivity endoglucanases that hydrolyze cellulose, xyloglucan, and mixed-linkage glucans. We previously surveyed the whole subfamily and found over 100 new broad-specificity endoglucanases, although the structural origins of broad specificity remained unclear. A mechanistic understanding of GH5_4 substrate specificity would help inform the best protein design strategies and the most appropriate industrial application of broad-specificity endoglucanases. Here we report structures of 10 new GH5_4 enzymes from cellulolytic microbes and characterize their substrate selectivity using normalized reducing sugar assays and MS. We found that GH5_4 enzymes have the highest catalytic efficiency for hydrolysis of xyloglucan, glucomannan, and soluble β-glucans, with opportunistic secondary reactions on cellulose, mannan, and xylan. The positions of key aromatic residues determine the overall reaction rate and breadth of substrate tolerance, and they contribute to differences in oligosaccharide cleavage patterns. Our new composite model identifies several critical structural features that confer broad specificity and may be readily engineered into existing industrial enzymes. We demonstrate that GH5_4 endoglucanases can have broad specificity without sacrificing high activity, making them a valuable addition to the biomass deconstruction toolset.

Sustainable, biological solutions to the growing climate and energy crises have been the subject of increasing interest in the last two decades. In contrast to biofuel production from edible polysaccharides, such as starch, recent efforts have focused on next-generation bioenergy, with higher-energy fuels derived from inedible lignocellulosic biomass found in a variety of abundant plant materials.
Enzyme hydrolysis of biomass releases sugars, which can be converted into a growing range of fuels and commodities. Enzymes, however, are a major operational expense in the cellulosic bioenergy process (1), owing to the difficulty of hydrolyzing cellulose relative to starch, prompting search and design efforts for more effective enzyme mixtures. Broad-specificity glycoside hydrolases (GHs) may replace many specialized enzymes with fewer, more flexible catalysts, increasing sugar yield and reducing enzyme variability between feedstocks (2). One such family rich in broad-specificity cellulases is GH5.
Because GH5 is one of the largest and most catalytically diverse GH families, a classification scheme was devised that separates the family into over 50 subfamilies based on global analysis of sequences, biochemical data, and structures (7). Subfamily 4 (GH5_4) was noted for being particularly enriched in broad-specificity β-(1,4)-endoglucanases (enzymes that cleave in the middle of long glucan chains, like cellulose).
Despite the numerous structural models available, prediction of selectivity by sequence and structure homology remains a significant challenge (6,21,22). This challenge is exacerbated by inconsistent assay methodologies and end-point activity measurements that offer limited kinetic insight. Understanding subtle differences in structure that lead to variations in substrate selectivity and catalysis requires examination of closely related proteins, consistency in experimental methods, and a combination of experimental approaches.
We recently carried out a normalized catalytic analysis of over 240 members of the GH5_4 subfamily and found that 55% are active on lichenan, mannan, and xylan, and many have high activity on xyloglucan (XG) (6). Other researchers have shown that some also hydrolyze additional mixed-linkage glucans, like barley β-glucan (BBG), as well as glucomannan (GM) (8,9,11,18,23-27) (Table S1). These analyses helped guide the selection of enzymes for additional catalytic studies and efforts to solve crystal structures.
In this work, we report the crystal structures of 10 new enzymes that sample the GH5_4 phylogenetic space from two of its three distinct clades. In combination with normalized catalytic studies, this work provides a fuller picture of the evolutionary changes and structural features contributing to the breadth of substrate selectivity and hydrolysis reactions observed in GH5_4.

Structural features of GH5_4 active-site clefts
We sought to structurally characterize enzymes from each of the three major GH5_4 clades, and crystallization trials were conducted for ~20 enzymes. Of these, 10 structures were successfully solved: two from clade 2 and eight from clade 3. Enzymes from clade 1 tended to be more difficult to purify and crystallize due to poor solubility, which may be related to their intimate dependence on CBM46 modules (6,18), which were removed prior to expression.
All 10 structures adopt the expected (α/β)8, or TIM-barrel, fold (Fig. 1A), with the active-site Glu residues positioned at the C-terminal ends of strand 4 and strand 7. The ~5-Å distance between the side chains of these residues is consistent with the retaining mechanism of glycoside hydrolases in GH5 (28). It should be noted that the full polypeptide sequences of some of these proteins include carbohydrate-binding modules, dockerin domains, and additional enzyme domains (Fig. 1C), but the experimental design for this study included only the GH5_4 enzyme domains.
Four of the enzymes originate from cellulose-degrading environmental bacteria. 4IM4 is derived from Hungateiclostridium thermocellum, a thermophilic anaerobe isolated from decaying cotton bales (29). Other source organisms (Hungateiclostridium cellulolyticum and Clostridium cellulovorans) are found on decaying grass and wood (30,31), whereas another (Clostridium acetobutylicum) is found in soil and water (32). The remaining six source organisms dwell in the digestive tracts of herbivorous mammals (or chickens, for Bacteroides salanitronis), playing critical roles in aiding digestion of plant feedstocks (33-38). Table 1 briefly summarizes data collection and refinement statistics for the 10 structures reported here. A fuller set of data is provided in Table S2. As a set, the structures have similar resolution, completeness, and R-values. The structures are distributed across the GH5_4 phylogenetic tree (Fig. 1D), with new structures added to subclades 2A and 3F and additional structures added to subclades 2B, 3B, and 3G (6). There is ~30-70% sequence identity in this set of structures.
After considerable effort and limited success in obtaining structures with bound oligosaccharides (Table S2, PDB 6WQV with cellotriose bound in the negative subsites), we relied on alignment of structures previously solved by Dos Santos et al. (16) (PDB entry 4W88) and Gloster et al. (17) (PDB entry 2JEQ) to model the binding of XG fragments. Of particular utility, 4W88 provides a model for an XG variant having a backbone of three glucose units with branches of one xylose and one xylose-galactose in the positive subsites, whereas the 2JEQ structure provides a model having a backbone of four glucose units with branches of one xylose and one xylose-galactose in the negative subsites. Alignment of our new structures with these previous exemplars gave an average root mean square deviation for the protein backbone of <1 Å and essentially no steric clashes between aligned enzymes and modeled polysaccharides. This lends confidence to the predictive modeling of XG fragments into the new GH5_4 active sites.
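The backbone deviation quoted above can be illustrated with a minimal sketch of the root mean square deviation calculation. It assumes the two coordinate sets are already superposed and paired atom by atom; the function name `backbone_rmsd` and the toy coordinates are hypothetical and are not taken from any of the deposited structures.

```python
import math


def backbone_rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) tuples.

    Assumes the structures were superposed beforehand, so no fitting is done
    here. The result is in the same units as the input (e.g. angstroms).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical toy coordinates (angstroms), for illustration only.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
aligned = [(0.0, 0.3, 0.0), (1.5, 0.0, 0.4), (3.0, 0.0, 0.0)]

rmsd = backbone_rmsd(reference, aligned)
print(f"RMSD = {rmsd:.2f} A")
```

A value below 1 Å in such a comparison would correspond to the close backbone agreement described in the text; a real analysis would first superpose the structures (for example with a least-squares fit) before computing the deviation.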
Six features of the binding clefts were explored for contributions to substrate selectivity: 1) the dimensions of the clamp surrounding the hydrolysis site, dictated chiefly by loop lengths; the position of residues that interact with 2) negative-subsite and 3) positive-subsite sugars; 4) interactions with branch sugars; and the presence of protein-xylose interactions 5) near the +1 subsite and 6) near the -1 subsite.
Clamp-forming loops. The loops connecting the C termini of the interior β-strands to the N termini of the subsequent helices form the walls of the substrate-binding cleft and provide contact surfaces for substrate interaction. Several of these loops are varied in length, residue composition, and the presence of secondary structure elements (Table 2). Changes in these loops are the primary determinants of the diversity of active-site shapes found within GH5_4.
The cross-sections of the 10 enzymes shown in Fig. 2 highlight the substrate-binding cleft, which is bounded by walls built mainly from loops 4 and 6. These loops form a U-shaped clamp into which the glycan chain settles for hydrolysis. Specifically, loops 4 and 6 contribute to the "left-hand" and "right-hand" walls of the clamp, respectively, and largely determine its width. Aromatic residues in these loops interact with the polysaccharide chain (Trp (green) and Tyr (orange) in Fig. 2). Fig. 2 also shows the location of the Glu residues that serve as the nucleophile (red) and proton donor (blue) for the retaining reaction.
Seven of the eight clade 3 proteins (4IM4, 6MQ4, 6PZ7, 6Q1I, 6UI3, 6XSU, and 6WQP) possess a deep cleft with mostly vertical clamp walls of constant width. Among these seven enzymes, loop 4 and loop 6 show up to 78% identity (Table S3). Loop 4 contains 18 residues (except for 14 residues in 6UI3), loop 6 contains 18-21 residues, and similar short secondary structural elements are found in each (Table 2). Fig. 3 (A and B) shows the surface representations of these loops and the negative and positive subsites for 4IM4, which is representative of the active-site clefts of the seven enzymes mentioned above. As described below, different members from this group have the highest catalytic efficiencies for each of the polysaccharides tested (Table 3).
Although 6XSO is in clade 3, its loop 4 and loop 6 are significantly longer and shorter, respectively, than the other clade 3 proteins. Its clamp is incomplete because it lacks an aromatic residue in loop 6 to interact with the substrate (Figs. 2 and 3 (C and D)). 6XSO has middle to low activity with all substrates (Table 3). The two structures from clade 2 feature greater loop diversity than clade 3. In 6XRK and 6WQY, loop 6 is substantially longer than the average of 21 residues, at 35 and 32 residues, respectively. Both structures contain an insertion in loop 6, but at different relative locations in the loop. The insertion occurs before Trp-254 in 6XRK and after Trp-269 in 6WQY (position of the aligned Trp residues shown by red arrow in Fig. S3A). In 6WQY, the loop folds back onto itself and makes a stabilizing contact with helix 7.
The two clade 2 enzymes appear to lack complete clamp motifs, as they are missing either one or both of the aromatic residues in the walls of the cleft. This contributes to their overall more open active-site shape (Fig. 2). 6XRK lacks aromatic residues on the left wall provided by loop 4 (Fig. 3E), whereas the two Trp residues mentioned above provided by loop 6 extend upward from the cleft and could feasibly provide distal contacts with a bent polysaccharide chain (Fig. 3F).
Furthermore, loop 6 has a unique two-strand antiparallel β-sheet that extends outward an additional 20-25 Å from the globular protein core in 6XRK (Fig. 2). This configuration is supported by crystal packing, as the extended loop makes contacts with its counterpart in the asymmetric unit. However, in the monomer expected in solution, loop 6 in 6XRK has two solvent-exposed Trp residues (Fig. 3F, green) that may contact positive-subsite sugars and so act as a rudimentary CBM and contribute to reactions with XG. Elastic network modeling using the elNémo server (39) revealed several possible low-frequency normal modes, which are springlike motions resulting from protein flexibility. The modeling predicted large dynamic motions of the extended loop 6 in 6XRK that might sample different binding interactions with polysaccharides. Similar motions were not predicted for the other clade 2 enzyme, 6WQY, as its own elongated loop 6 was folded back against the protein and stabilized by numerous interresidue contacts.
Interactions with negative-subsite sugars. Loop 1 varies from 18 to 31 residues in length and determines the overall size of the "loading platform" on each enzyme. This is a mostly flat surface, starting at the edge of the protein containing all of the negative subsites and funneling toward the active site of the substrate-binding cleft (see Fig. 1).
The shortest loop 1 belongs to 4IM4, and correspondingly, so does the smallest platform (Fig. 3A). The four enzymes that have the longest loop 1 (Table 2, 30 or 31 residues) are 6XSO (Fig. 3C), 6XRK (Fig. 3E), 6Q1I, and 6WQP. Their loading platforms extend out ~10 Å farther than that of 4IM4. These and the other enzymes, such as 6XSU (Fig. 3G), with loop 1 greater than 25 residues contain a small α-helical segment (~1.5 turns) that sits just below the platform-extending loop, in a shelf-and-bracket motif, providing a stable, compacted support for the extended surface.
Interactions with positive-subsite sugars. Loops 4 and 6 contribute additional aromatic residues, beyond those that define the clamp, that interact with sugar moieties in different ways, from +1 out to +3, +4, or potentially +5 subsites in different enzymes. 6XSU has an additional Trp after the critical Trp (Trp-193) in the +1 site, which doubles the length of the aromatic surface of the left wall (Fig. 3H). Access to this putative stabilizing protein-sugar interaction would only be available to a polysaccharide chain that maintained a linear configuration as it exited the substrate-binding cleft. Other examples of linear-only positive-subsite interactions include a Tyr on 6MQ4 and a Trp on 6WQP, near the edge of the protein that might conceivably interact with a strictly linear polysaccharide chain at +5. 6WQY also possesses an analogous extra Trp, but a Gly is inserted between the two Trp residues, and so the extended surface is altered; 6WQY has lower catalytic efficiency compared with 6XSU (Table 3).
Branch sugar interactions. Loop 3 is adjacent to the left wall-forming loop 4 and is poised to present side chains that can directly interact with sugars preceding the hydrolysis site. In other GH5_4 structures (e.g. 4V2X (18), 3ZMR (40), 5OYC (10), and 4W8A (16,41)), an aromatic residue in loop 3 stacks against the primary xylose branch at -2; none of the structures presented here has this binding motif.
6XSO possesses a Lys-Asp dipeptide in loop 3 that could form a hydrogen bond network with two galactose branches of XG (Fig. 4). Phe-132 is sheltered beneath the β-turn in loop 3 and is optimally oriented to stack against a secondary galactose branch at the -2 position (Fig. 3C, blue).
Loop 4 provides additional favorable interactions with branch sugar hydroxyl groups. For seven of the structures presented here (all except 6XSO, 6XRK, and 6WQY), the position immediately preceding the clamp-forming Trp in loop 4 (e.g. Trp-203 in 4IM4) is occupied by either Asp or Glu, whose side chain comes within hydrogen-bonding distance of the modeled galactose branch off the +2 position. Loop 8 also provides additional potential interactions with branch sugars. 4IM4, 6MQ4, and 6Q1I possess a Tyr (e.g. Tyr-353 in 4IM4; Fig. 3A) oriented parallel to the modeled xylose branch at the -3 position, suggesting a close planar aromatic-carbohydrate interaction. This interaction appears to be specifically optimized to the geometry of XG. Other structures have small, hydrophobic side chains or Ser or Thr in this position and so offer different possibilities for interaction with a branch sugar. However, 6XSO has a diminished loop 8, and no protein residues appear to be within contact distance of the xylose branch at -3 (Fig. 3C).
Xylose branch interactions at the +1 subsite. A xylose branch at +1 rests at the floor of the cavity and helps orient the +1 glucose, which becomes the leaving group after nucleophilic attack. The glucan chain in the modeled XG is deflected sharply away from the cleft axis after the site of chain scission, with the deeper placement of the xylose branch excluding the glucan backbone from the active-site cleft (see Fig. 3, B, D, F, and H).
In addition to creating intrapolysaccharide hydrogen bonds with the glucan chain, the +1 xylose branch interacts closely with polar residues from loop 5 and the bottom of loop 6, as indicated by the position of the modeled xylose. The only enzyme with no predicted polar contacts with this sugar is 4IM4, although a similar pocket shape and other interactions are present and likely position the +1 xylose branch and glucan chain exiting the active-site cleft.
6MQ4, 6PZ7, 6Q1I, and 6WQP contain a Ser in loop 5 that potentially forms a hydrogen bond with OH-2 or OH-3 of the branch xylose, whereas 6XSU and 6UI3 have a Ser-X-Asp triad with greater potential for polar contacts with these sugar hydroxyl groups. 6WQY has a Thr-Asn motif oriented favorably to form polar contacts, as does 6XRK. In addition, 6WQY possesses an Asp at the bottom of loop 6 that may coordinate with OH-4 on the xylose. 6XSO, despite having a loop 5 approximately twice the length of the other enzymes, offers only one potential polar contact, a suboptimally oriented Asn at position 238.
In summary, no favorable aromatic contacts are available to the +1 xylose, but several polar contacts might occur. In any case, all 10 GH5_4 structures show a distinct concave region that could accommodate a +1 xylose, and all 10 enzymes are able to hydrolyze XG (see below).
Xylose branch interactions at the -1 subsite. Interactions with the α-(1,6)-xylose branch sugar on the glucose in the -1 subsite are hypothesized to play key roles in determining catalytic selectivity (16). Many GH74 exo-XGases feature a pocket where an aromatic residue contacts this xylose, allowing cleavage of the substrate at branched sugars, but this feature is rarer in the GH74 endo-XGases (42). In GH5, such a pocket is also rare, and none of the 10 present GH5_4 structures has either a Trp or Tyr in the correct position to interact with a -1 xylose branch. 6XRK and 6WQY, the structures from clade 2, possess the most open clefts and thus potentially offer the highest tolerance to a branch xylose at the -1 subsite.

Catalytic efficiency with soluble polysaccharides
The high solubility of XG and GM allowed determination of kcat and apparent Km (Table S4), and the semisoluble characteristic of lichenan and xylan allowed these determinations for a subset of enzymes. Experimental errors were ~1-20% for kcat and ~3-30% for Km. The high viscosity of XG and GM solutions and the heterogeneity of lichenan and xylan suspensions were likely sources of experimental error for these substrates.
In summary, kinetic analysis revealed that 1) the enzymes clustered into several groups based on catalytic efficiency for different substrates and 2) despite an apparent convergence in kcat/Km, individual kcat and Km values varied widely.
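The second point can be made concrete with a short sketch of how catalytic efficiency is derived from the two Michaelis-Menten parameters. The enzyme labels and numerical values below are hypothetical and chosen only to show that very different kcat and Km pairs can give the same kcat/Km; they are not values from Table S4. Units follow the text (kcat in min⁻¹, Km in g/liter).

```python
def catalytic_efficiency(kcat_per_min, km_g_per_liter):
    """Return kcat/Km in liters per minute per gram."""
    if km_g_per_liter <= 0:
        raise ValueError("Km must be positive")
    return kcat_per_min / km_g_per_liter


def turnover_rate(kcat_per_min, km_g_per_liter, substrate_g_per_liter):
    """Michaelis-Menten turnover (per minute) at a given substrate concentration."""
    return kcat_per_min * substrate_g_per_liter / (km_g_per_liter + substrate_g_per_liter)


# Hypothetical (kcat, Km) pairs, for illustration only.
enzymes = {
    "enzyme_A": (260.0, 2.0),    # slower turnover, lower Km
    "enzyme_B": (1300.0, 10.0),  # faster turnover, higher Km
}

efficiencies = {
    name: catalytic_efficiency(kcat, km) for name, (kcat, km) in enzymes.items()
}
for name, value in efficiencies.items():
    print(f"{name}: kcat/Km = {value:.0f} liters/min/g")
```

In this toy case both enzymes have the same efficiency even though their kcat values differ fivefold, which is the kind of convergence the kinetic summary describes.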
Three groups of XGase catalytic efficiency emerged (Table 3). 4IM4, 6XRK, 6Q1I, 6WQP, 6UI3, and 6XSO had a group average kcat/Km of ~130 liters·min⁻¹·g⁻¹. None of the enzymes within this group had statistically significantly different catalytic efficiency values, yet the individual kcat and Km values varied widely. 6PZ7 and 6WQY represented another grouping of average catalytic efficiencies (~110 liters·min⁻¹·g⁻¹), whereas 6XSU and 6MQ4 had distinctly lower efficiencies.
All of the lichenan-saturable enzymes had similar catalytic efficiencies, except for 6XSO (Table 3), which was about half as efficient (kcat/Km of 87 compared with the group average of 157 liters·min⁻¹·g⁻¹). No grouping occurred for the seven enzymes saturable on beechwood xylan, as each enzyme had a statistically unique catalytic efficiency, ranging from 6MQ4 (23 liters·min⁻¹·g⁻¹) to 6XSO (2 liters·min⁻¹·g⁻¹).
Strikingly, kcat and Km for XG show a roughly linear correlation (Fig. 5; R² = 0.69), indicating a tendency for the catalytic efficiencies in GH5_4 to converge for this substrate. Lichenan parameters were correlated to a lesser extent than XG (R² = 0.38), and a weak and inverse correlation was observed with GM (R² = 0.15) due to the occurrence of several efficiency clusters. It should be noted that if the single lowest-performing enzyme was removed for XG and lichenan (6MQ4 and 6XSO, respectively), the correlation coefficient rose to 0.88. Kinetic parameters for xylan were not significantly correlated (R² = 0.08), and although Km ranged from 1.3 to 25 g/liter, kcat did not surpass 150 min⁻¹.
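A minimal sketch of this kind of correlation analysis is shown below. It computes R² for a simple linear relationship between Km and kcat and then repeats the calculation after dropping one low-performing point. The data are hypothetical placeholders rather than the measured values in Table S4, and the function name `r_squared` is an assumption made for illustration.

```python
def r_squared(x, y):
    """Coefficient of determination for a simple linear fit of y on x.

    For one predictor, R^2 equals the squared Pearson correlation.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    return sxy ** 2 / (sxx * syy)


# Hypothetical Km (g/liter) and kcat (per minute) values; the last point
# stands in for a single low-performing enzyme.
km = [1.0, 2.0, 3.0, 4.0, 5.0]
kcat = [130.0, 250.0, 400.0, 510.0, 300.0]

r2_all = r_squared(km, kcat)
r2_trimmed = r_squared(km[:-1], kcat[:-1])
print(f"R^2 with all points: {r2_all:.2f}")
print(f"R^2 without the low performer: {r2_trimmed:.2f}")
```

A strong linear relationship between kcat and Km implies a nearly constant ratio kcat/Km across enzymes, which is why the correlation is read as convergence of catalytic efficiency.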

Reaction with insoluble polysaccharides
When the present enzymes were tested on phosphoric acid-swollen cellulose (PASC) and mannan, a linear, saturable relationship between [S] and kcat was not observed, so the turnover number measured at 10 g/liter substrate was used for comparison (Fig. 6). The activities on these insoluble substrates were about an order of magnitude slower than on XG, GM, and lichenan (with xylanase activities typically in between).

End products of reaction with oligosaccharides
Nanostructure-initiator MS, or NIMS (43), was used to determine the products released from hexose and pentose oligosaccharides, which are long enough to span the active-site cleft. The results revealed unique patterns for occupancy of the negative and positive binding sites leading to catalysis (Table 4). Overall, reactivities observed for the oligosaccharide substrates matched those observed for full-length cellulose and mannan, but not for xylan (compare Table 4 and Fig. 6). All 10 enzymes hydrolyze PASC and cellohexaose (C6), and five (4IM4, 6MQ4, 6UI3, 6XSU, and 6WQP) also hydrolyze mannan and mannohexaose (M6). Whereas nine enzymes (all but 6WQY) hydrolyze xylan, two of these nine (6UI3 and 6XRK) do not hydrolyze xylohexaose (X6).
The two enzymes from clade 2, 6WQY and 6XRK, hydrolyzed C6 into a breadth of products that only modestly favored C3. Moreover, 6XSO from clade 3G also favored C3 among several products, suggesting low selectivity for occupation of sugar-binding sites spanning the site of catalysis.
Although M6 and X6 were not completely hydrolyzed in these experiments, there was less variation in their hydrolysis products relative to C6. Both M6 and X6 were preferentially hydrolyzed into the trisaccharide. Except for 4IM4 and 6WQP, lesser but equivalent amounts of the di- and tetrasaccharides were observed, suggesting that asymmetric binding across the catalytic site was less favorable for M6 and X6 than for C6, but still allowed.
The dual cellulase-mannanase 6UI3 offered an opportunity to further explore cleavage specificity with cellulose and mannan oligosaccharides (Fig. 7). For C5, C6, and PASC, C2 was the dominant product, with lesser amounts of C3 and C1 and no C4 or C5 observed (Fig. 7A). The C6 results imply preferred asymmetric binding across the catalytic site and also the processive hydrolysis of C4. In contrast, for M6 and mannan, M3 was the dominant product, and near-equimolar amounts of M2 and M4 were observed. The M6 results imply a contrasting outcome: preferred symmetric binding of M6 across the catalytic site and no processive hydrolysis of M4. For M5, near-equimolar amounts of M3 and M2 were observed along with no processive hydrolysis of M3 (Fig. 7B).
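The reasoning that links product distributions to binding modes can be sketched by enumerating the fragment pairs available from a single internal cleavage of a linear oligosaccharide. This is only a bookkeeping illustration under the assumption of one cut per binding event; the function names are hypothetical and the sketch does not model rates or subsite affinities.

```python
def cleavage_products(chain_length):
    """Unordered fragment-length pairs from one cleavage of a linear chain."""
    return [(i, chain_length - i) for i in range(1, chain_length // 2 + 1)]


def is_symmetric(pair):
    """True when both fragments have the same length (e.g. a 3 + 3 split)."""
    return pair[0] == pair[1]


hexamer_modes = cleavage_products(6)   # [(1, 5), (2, 4), (3, 3)]
pentamer_modes = cleavage_products(5)  # [(1, 4), (2, 3)]

# Asymmetric 2 + 4 binding followed by a second cut of the tetramer into
# 2 + 2 would leave only disaccharide, consistent with a dominant C2 product.
first_cut = (2, 4)
second_cut = (2, 2)
assert first_cut in hexamer_modes and second_cut in cleavage_products(4)
final_fragments = sorted([first_cut[0], *second_cut])  # [2, 2, 2]

for pair in hexamer_modes:
    label = "symmetric" if is_symmetric(pair) else "asymmetric"
    print(pair, label)
print("fragments after two cuts:", final_fragments)
```

Under this simple picture, a dominant trisaccharide product from a hexamer points to the symmetric 3 + 3 mode, whereas a dominant disaccharide with no remaining tetramer points to an asymmetric 2 + 4 mode followed by further hydrolysis of the tetramer.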

Binding affinity to polysaccharides
Affinity gel electrophoresis (Fig. S2) was used to estimate KD for XG, GM, and low-molecular weight xylan. In general, when binding interactions could be observed, the estimated KD value was 2-4 orders of magnitude lower than the corresponding Km. For example, 6PZ7 has a KD of ~0.01 g/liter on XG but a Km of 3.4 ± 0.3 g/liter. The binding of polysaccharides to 6WQY and 6XSO could not be tracked by electrophoresis, likely due to mismatch between the enzyme pI (6XSO, pI = 6.44; 6WQY, pI = 4.60) and the buffer needed for the electrophoresis.
None of the active enzymes showed changes in gel mobility in the presence of low-molecular weight xylan, which may have been caused by in-gel hydrolysis. After creation of Glu-to-Gln mutations of the catalytic Glu residues for five enzymes (4IM4, 6PZ7, 6UI3, 6XSO, and 6WQY), only the mutated 4IM4 showed detectable binding to soluble xylan with KD ~0.03 g/liter. The blue fluorescent protein mKalama1 was fused to 6UI3, and a simple pulldown assay was used to determine the KD on insoluble substrates PASC (0.53 ± 0.03 g/liter) and high-molecular weight beechwood xylan (0.89 ± 0.09 g/liter). Although the KD values suggest tight binding, the catalytic activity increased linearly with increasing substrate concentration up to 10 g/liter, emphasizing the mismatch between apparent binding and effective binding leading to catalysis. No binding was detected on insoluble β-(1,4)-mannan.
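The gap between binding and kinetic constants can be expressed as a base-10 logarithm of their ratio. The short sketch below applies this to the 6PZ7 values quoted above for XG (KD of about 0.01 g/liter, Km of 3.4 g/liter); the function name is hypothetical, and the approximate KD makes the result a rough estimate only.

```python
import math


def orders_of_magnitude_apart(km_g_per_liter, kd_g_per_liter):
    """Base-10 log of Km/KD; both constants must be positive and share units."""
    if km_g_per_liter <= 0 or kd_g_per_liter <= 0:
        raise ValueError("Km and KD must be positive")
    return math.log10(km_g_per_liter / kd_g_per_liter)


# 6PZ7 on XG, using the values quoted in the text.
gap = orders_of_magnitude_apart(3.4, 0.01)
print(f"Km exceeds KD by about {gap:.1f} orders of magnitude")
```

For this example the ratio falls between two and three orders of magnitude, within the 2-4 range reported for the enzymes whose binding could be observed.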

Discussion
Biological occurrence of GH5_4 enzymes
GH5_4 is primarily a bacterial subfamily, although the occasional occurrence of GH5_4 sequences in fungal and protozoan genomes may be attributed to horizontal gene transfer from bacteria. Many organisms using GH5_4 enzymes are rumen dwellers, and microbial communities in animal digestive tracts have been noted for high frequencies of glycoside hydrolase horizontal gene transfer (45).
GH5_4 polypeptides also occur frequently in cellulosomes (6,46), the large, extracellular nanomachines anchored to the outer membranes of certain anaerobic bacteria and fungi. Four of the present enzymes are naturally incorporated into cellulosomes, as evidenced by their fusion to dockerin domains (Fig. 1C). Cellulosomal organisms may be either ruminal or terrestrial. Two of the present cellulosomal GH5_4 proteins (4IM4 and 6MQ4) also possess carbohydrate esterase domains, which help debranch xylan and break linkages to lignin in plant biomass (47).
There appears to be no correlation between the habitat of the host organism and a preferred domain arrangement or GH5_4 clade. This may be a testament to the inherent catalytic flexibility of GH5_4 enzymes, which allows incorporation of the catalytic domains into a wide variety of biomolecular frameworks for optimal nutrient acquisition.

Functional diversity via loop length
GH5 enzymes adopt the (α/β)8-barrel fold, where loops connecting the outer α-helices to the core β-strands contribute to active-site shape. We examined the structural arrangements of these loops in greater detail and identified several features that contribute to substrate selectivity across GH5_4.

Loop contributions to the active-site shape
In all cases but one (6XRK), loop 4 forms the left wall of the active-site cleft and provides a critical Trp residue that stacks against the +1 sugar in the active site, positioning it for catalysis. Loop 6 forms the right wall of the active-site cleft and supplies additional aromatic residues to position the sugar chain perpendicular to the floor of the active site. Loop 6 also has the greatest diversity in length, ranging from the 8-residue loop in 6XSO (offset by a larger-than-average loop 4) to the 35-residue loop containing two β-strands in 6XRK (offset by a smaller-than-average loop 4). The size and conformation of these two loops determine the width of the cleft immediately following the site of catalysis, the pinch point where the +1 sugar is clamped into position for hydrolysis.
Enzymes with a narrower cleft tend to have lower Km for GM than XG, with the exception of 6PZ7 and 4IM4, which have similar Km for both substrates, suggesting the linear GM can more easily settle across the active site when space is limited. Again, with the exception of 4IM4 and 6PZ7, a narrow cleft also appears to correlate generally with a higher kcat on all substrates. This makes sense, considering that accurate placement of the +1 sugar may help force the scissile glycoside bond into proper position. Conversely, a narrower cleft may interfere with the initial chain-loading step for complex substrates like XG, raising Km.
Other loops impact the shape of the binding cleft, especially loops 1, 3, 5, and 8.
Loop 1 determines the length of the "loading platform" and the number of potential interactions with negative-subsite sugars. It is not clear whether these extended platforms provide a defined fourth negative subsite, but 4IM4 and the other proteins with a shortened loop 1 can accommodate no more than three.
Loop 3 provides one interesting case, where in 6XSO, a b-turn harboring a Lys-Asp pair projects in toward the cleft, providing potential close polar contacts with several branch sugars in the modeled substrate (Fig. 4). Loop 3 also supplies a Phe side chain oriented to contact a galactose branch in XG, but the functional relevance of these potentially specific interactions requires further study.
Loop 5, with the shortest average length of all the loops, provides some potential stabilizing interactions at the 11 subsite, where a xylose branch was observed to bind in an oligosaccharide co-crystallization study (16). Although this xylose binding interaction may enable XGase activity in GH5_4, the potential strength of the interaction did not appear to correlate with differences in either K m or k cat .
Finally, loop 8 extends the boundary of the right wall formed by loop 6. 4IM4, 6MQ4, and 6Q1I possess a Tyr that stacks optimally with a xylose branch at the 23 position, but the presence of this interaction did not correlate with a lower K m or higher k cat on XG in this data set.
In summary, the contributions of each of these auxiliary loops to substrate selectivity may be incremental compared with cleft width. However, the combination of multiple interactions likely contributes to the fine-tuning of enzyme selectivity. Additional studies of these four loops may help clarify their contributions to branch sugar interactions.

Aromatic surfaces guide substrate selectivity
Interactions between electronegative π-systems of aromatic side chains and electropositive C-H bonds in sugar rings are frequent and important (48). The number and location of enthalpically driven π-CH interactions, especially involving Trp, are key determinants of protein affinity for carbohydrate ligands (49), and most sugars display anisotropy between their faces; that is, one side of the ring is more electropositive and interacts more often with aromatic side chains. Glucose, however, is a special case because all ring hydroxyls are oriented equatorially (48), meaning both faces have high potential for π-CH interactions. GH5_4-binding clefts appear to take advantage of this property when interacting with glucan substrates by clamping them with aromatic residues on both sides in the vicinity of the active site, but not in every case (Fig. 3).
Aromatic surfaces may help guide a bound substrate chain to the exit, and GH5_4 proteins use two strategies: linear and bent. Several enzymes with high catalytic efficiency for reaction with GM have additional Trp and Tyr residues oriented along the cleft axis, which may keep a linear substrate in a linear exit conformation. 6XSU exemplifies this strategy best, because it has two exposed Trp side chains comprising a continuous surface along the left wall toward the exit, encouraging a linear conformation (Fig. 3D). This makes 6XSU the most efficient GMase, but a poor XGase when compared with the other enzymes.
6WQY and 6XRK are among the most efficient XGases and the poorest GMases, but probably for different reasons. 6WQY possesses two Trp residues in the left wall, which is critical for GMase activity and appears to guide a linear exit; the right wall, however, lacks a properly positioned aromatic side chain (Fig. 2). A linear exit is thus encouraged by the left wall but not enforced by a clamp.
In contrast, 6XRK lacks a left wall but presents aromatic surfaces on the right wall to guide a bent glucan exit conformation. This makes it an excellent XGase, because the +1 xylose branch predisposes the substrate to a bent exit, but the lack of a clamp renders it a poor GMase. This emphasizes an important finding that in GH5_4, high efficiency on complex substrates such as XG and on linear substrates such as GM are not mutually exclusive, so long as both a clamp and an optional bent exit (or at least an unenforced linear exit) are available.
A structure recently published by Venditto et al. (18) of Cel5B, a GH5_4 enzyme from Bacillus halodurans (PDB entry 4V2X), shows a clear case where multiple strategies are simultaneously employed to achieve broad substrate selectivity. The enzyme bears two additional C-terminal domains, an Ig-like linker domain followed by CBM46. The latter is unique to clade 1 GH5_4 proteins and presents an additional Trp to the right wall of the cleft, continuous with loop 6. Using the XG oligosaccharide ligand from 4W88 as a model, the angle of exit of the glucan chain points almost directly to this residue, which could interact favorably with a glucose at +4. In our structure of 6XRK, the position of the loop 6 Trp-238 aligns closely with Trp-501 on the CBM46 of Cel5B (distance between the Trp CG atoms of these two residues in the two structures is 4.7 Å after alignment of the central TIM barrels). This evolved convergence of structures suggests a common purpose for the Trp as a positionally specific supplement to substrate binding.
In addition, Cel5B possesses a well-defined left wall with a Trp contacting the +1 glucose, which our results indicate is a requirement for high catalytic efficiency on linear substrates; however, this Trp is missing in 6XRK. Indeed, Cel5B reacts with both BBG and XG and is ~5-fold more reactive with BBG. However, upon deletion of CBM46, kcat for BBG is detrimentally affected. This implies that in the case of Cel5B, the left wall is necessary but insufficient for high linear substrate efficiency and that the extended right wall provided by CBM46 plays a critical role in keeping the chain in a catalytically competent position. The overall cleft is relatively wide, which is expected to reduce catalytic efficiency with linear substrates (as in the case of 6XSO), but by taking advantage of the bent exit normally used by XG, BBG can form sufficient stabilizing interactions to remain in the cleft long enough for nucleophilic attack to occur.
Bending of glucan chains to fit the particular shape of GH5 active sites has been reported previously (9), and although frequently observed in lichenases that hydrolyze both β-(1,3) and β-(1,4) linkages, a "kinked" β-(1,3) linkage need not be a specific requirement for bending, given the flexibility of β-(1,4) polysaccharides in solution. Additionally, the pyranose rings of sugars need not be centered directly over the aromatic plane of Trp and Tyr residues to benefit from stabilizing forces (50), supporting a model of dynamic interaction between a flexible glucan chain and a protein surface "painted" with π-electron density.
In summary, our data combined with those of others support a mechanistic framework for dual branched-linear substrate selectivity in GH5_4 along five general principles: 1) high catalytic efficiency with linear polysaccharides requires, at minimum, π-CH interactions from specifically positioned aromatic residues on the left side of the +1 sugar; 2) additional interactions with positive-subsite sugars enhance catalysis with XG, but must be paired with a bent exit; 3) linear substrates may use either bent or linear pathways to exit the channel, and at least one must be available for high catalytic efficiency; 4) XG will use the bent pathway to exit the channel, if available; and 5) a narrow, two-sided clamp enhances hydrolysis but can reduce catalytic efficiency with XG (via reduced saturability) in the absence of additional stabilizing interactions.

Catalytic efficiency and substrate solubility
GH5_4 has received attention in recent years as a source of multifunctional "cellulases," but this term is ambiguous and may obscure the true function of individual members of this subfamily. Our present study indicates that GH5_4 is best characterized as a β-glucanase subfamily with some members that exhibit a breadth of reactivity on mixed-linkage glucans, XGs, and hetero-β-(1,4)-glucans, plus some less frequent and lower reactivity on amorphous cellulose and insoluble β-(1,4)-polysaccharides.
We observed convergence of catalytic efficiency (kcat/Km) for XG and GM and to a lesser extent for the mixed-linkage glucan lichenan, but not for xylan, mannan, or PASC. GH5_4 may have reached an efficiency ceiling for XG and GM, so these may be the "true" substrates of the subfamily.
The lack of convergence for the other substrates suggests that either 1) there is less selection pressure for improvement on insoluble substrates in the enzymes' native context, or 2) improvement on both classes of substrate is mutually exclusive because of incompatible protein structural requirements (i.e. flexible and hydrated polymers require a different active-site shape than rigid and crystalline ones).
In every case where kcat could be measured for xylan, it was lower than for XG, GM, and lichenan, although Km was occasionally lower for xylan than other substrates. GH5_4 enzymes may thus be able to accept β-(1,4)-linked pentose cell-wall polysaccharides, like xylan, but are more optimally evolved to hydrolyze hexose substrates, which better fill the space in the active-site cleft.
Activities on PASC were lower still, and mannan hydrolysis occurred above background for only five of the 10 proteins. We attribute the low reactivity on these substrates to their near total insolubility in water (and thus poor enzyme accessibility), although in the case of mannan, we considered the possible contribution of the axial hydroxyl at C2 in mannose (see next section). That GM was one of the most reactive substrates suggests that β-(1,4)-mannosyl linkages may be as reactive as β-(1,4)-glucosyl linkages when hydrated, especially considering that the GM substrate was ~60% mannose by weight.
6UI3 was the most active mannanase in our original screen (6), yet it did not bind mannan in our pulldown assay (but it did bind GM), further suggesting that mannanase activity is limited by low accessibility of the substrate.

Sugar subsite occupancy and selectivity
The dual cellulase-mannanase 6UI3 offered an opportunity to use NIMS to further explore the cleavage selectivity for two distinct oligosaccharides. Fig. 8 provides a schematic of binding in the negative and positive sugar subsites that offers insight into the origin of the different product distributions observed with C6 and M6.
For M6 and M5 hydrolysis, the following reactions are relevant.
Reaction 1: M6 → 2 M3
Reaction 2: M6 → M4 + M2
Reaction 3: M5 → M3 + M2
The results of Table 4 and Fig. 7 indicate that symmetric cleavage of M6 is preferred (Reaction 1, ~65% of hydrolysis products), which must occur by occupancy of the −3 to +3 sites (Fig. 8B). A secondary, asymmetric cleavage of M6 (Reaction 2) could utilize either the −4 to +2 or the −2 to +4 sites, leading to the observed nearly equal proportions of M2 and M4. Furthermore, M4 is not further hydrolyzed, so 6UI3 requires occupation of at least five sugar subsites for its mannan hydrolysis reaction.
For M5, equimolar amounts of M3 and M2 were observed, consistent with Reaction 3. β-(1,4)-Mannosidase activity, removal of a single mannose residue, was not observed because neither M1 nor M4 was detected in the M5 reactions above trace amounts. The product distribution from M5 is therefore consistent with occupation of the −3 to +2 (or, less likely, the −2 to +3) sites, leading to the observed equal proportions of M2 and M3.
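The link between subsite occupancy and product size used in this analysis can be made explicit with a short sketch. It assumes only the standard subsite numbering, in which the nonreducing end occupies the negative subsites, the reducing end occupies the positive subsites, and cleavage occurs between −1 and +1. The function names and the default limits (at most four and at least two subsites on each side) are illustrative choices that match the binding modes discussed here; they are not measured properties of 6UI3.

```python
def cleavage_products(first_subsite, last_subsite):
    """Product lengths for a chain spanning first_subsite..last_subsite.

    Subsites are numbered ..., -2, -1, +1, +2, ... (there is no subsite 0),
    and the glycosidic bond between -1 and +1 is cleaved.
    Returns (length from negative subsites, length from positive subsites).
    """
    if not (first_subsite < 0 < last_subsite):
        raise ValueError("chain must span the -1/+1 cleavage point")
    return (-first_subsite, last_subsite)


def binding_modes(dp, max_neg=4, max_pos=4, min_neg=2, min_pos=2):
    """Enumerate binding registers for an oligosaccharide of length dp.

    The limits are hypothetical defaults, not values from the study.
    Returns a list of ((first_subsite, last_subsite), products) tuples.
    """
    modes = []
    for n_neg in range(min_neg, max_neg + 1):
        n_pos = dp - n_neg
        if min_pos <= n_pos <= max_pos:
            span = (-n_neg, n_pos)
            modes.append((span, cleavage_products(*span)))
    return modes


if __name__ == "__main__":
    for span, products in binding_modes(6):
        print("hexamer bound at", span, "gives products of length", products)
    for span, products in binding_modes(5):
        print("pentamer bound at", span, "gives products of length", products)
```

With these defaults, a hexamer has three registers (−2 to +4, −3 to +3, and −4 to +2) and a pentamer has two (−2 to +3 and −3 to +2), which are the modes considered for M6 and M5.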
For C6 hydrolysis, the following equations are relevant.
The results of Table 4 and Fig. 7 indicate that C6 reacts in two ways fundamentally different from M6. First, the lack of any C4 product indicates that cellulose hydrolysis only requires occupancy of the −2 to +2 sites (Fig. 8), one fewer negative site than mannan. The requirement for occupation of four sugar-binding sites means that C3 is not further processed into C2, and this is supported by the negligible amount of C1 observed.
Second, although symmetric binding leads to the accumulation of C3, 6UI3 seems to preferentially use an asymmetric binding mode, which can be rationalized as follows. Assuming no reaction of C3, if Reaction 4 and the combination of Reactions 5 and 6 were equally favorable, the expected C2:C3 ratio would be ~1.5. However, Table 4 shows that the actual ratio of C2:C3 is ~3.5. Consequently, an asymmetric binding of C6 into either (or both) of the −2 to +4 or the −4 to +2 binding sites is required to account for the observed product distribution. Fig. 8 provides insight into the possible origin of the differences in cellulose and mannan hydrolysis by 6UI3. Overall, the discrimination appears to reside at the −3 site, because occupancy of this site is required for mannan hydrolysis but not for cellulose hydrolysis. This makes sense structurally, because the axial -OH at C2 in mannose points (favorably) away from the protein at the −1 and −3 sites but (unfavorably) toward the protein at the −2 site (Fig. 8A).
Binding of additional sugars before the −1 site is critical for GH5 activity (11); however, the axial OH in mannose likely disrupts binding at the −2 site. Therefore, we propose that binding at the −3 site is required to compensate. The mannose sugars bound in the −1 and the −3 sites may also have higher affinity for the conserved aromatic residues than glucose because of the increased electropositivity of mannose contacts with the aromatic residues, potentially compensating for the disrupted binding at the −2 site (48).
The Trp residue at the −3 site is conserved in GH5_4, and all of the enzymes that were active with M6 yielded a similar product distribution to 6UI3. However, not all GH5_4 enzymes hydrolyze mannan. Thus, the interaction of mannose and Trp at the −3 site helps to explain product distributions, but not the presence or absence of mannan reactivity.
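The ratio argument can be checked with a few lines of arithmetic. The sketch below assumes the stoichiometries implied by the text (Reaction 4: C6 → 2 C3; Reaction 5: C6 → C4 + C2; Reaction 6: C4 → 2 C2), complete conversion of C4, and no further reaction of C3. The function names are illustrative, and exact fractions are used only to keep the arithmetic transparent.

```python
from fractions import Fraction


def c2_to_c3_ratio(f_symmetric):
    """Expected molar C2:C3 ratio when a fraction f_symmetric of C6 is
    cleaved symmetrically (C6 -> 2 C3) and the remainder is cleaved
    asymmetrically and then fully processed (C6 -> C4 + C2 -> 3 C2).
    """
    f = Fraction(f_symmetric)
    if not 0 < f <= 1:
        raise ValueError("f_symmetric must be in (0, 1]")
    c3 = 2 * f          # two C3 per symmetric cleavage
    c2 = 3 * (1 - f)    # three C2 per asymmetric cleavage pathway
    return c2 / c3


def symmetric_fraction_from_ratio(ratio):
    """Invert c2_to_c3_ratio: the symmetric fraction implied by a C2:C3 ratio."""
    r = Fraction(ratio)
    if r < 0:
        raise ValueError("ratio must be non-negative")
    return Fraction(3) / (2 * r + 3)


if __name__ == "__main__":
    print(c2_to_c3_ratio(Fraction(1, 2)))                 # equal split
    print(symmetric_fraction_from_ratio(Fraction(7, 2)))  # ratio of 3.5
```

Under these assumptions an equal split gives a ratio of 3/2, and a ratio of about 3.5 would correspond to roughly 30% of C6 molecules being cleaved symmetrically. That percentage illustrates the arithmetic only; it is not a value reported in the study.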

Pentose versus hexose polysaccharides
The similarity of product distributions for M6 and X6 (Table 4) suggests similar requirements for occupation of the sugar subsites. However, the physical rationale for this requirement is likely not the same, because unlike mannose, xylose has relatively symmetric electropositivity on each of its faces. Instead, the lack of C6-OH on xylose may affect the positioning of the −1 and +1 sugars. For the −1 sugar, a C6-OH may provide leverage to insert the pyranose ring deep in the active site, whereas at the +1 sugar, this functional group could contact the cleft surface directly, angling the leaving sugar upward to assist the positioning of the scissile bond. These putative interactions are absent in xylose, which may explain the need for more stabilizing interactions at the distal −3 site to compensate.

Saturation of binding and catalysis
Whereas KD is commonly expressed as the ratio koff/kon for substrate binding to free enzyme, Km has been conceptualized as the ratio of substrate capture rate to product release rate (51), with [S] = Km being the substrate concentration at which these two rates are equal. At [S] < Km, activity is limited by the capture rate, whereas at [S] > Km, total velocity is often limited by product release. The low values of KD relative to Km in this study suggest a rapid association of substrate with free enzyme, but despite the ability to saturate binding at low concentration, catalysis is not saturated until much higher concentrations.
One possible explanation for this discrepancy is the distinction between binding and productive binding; that is, whereas nearly 100% of the enzyme may be associated with a substrate well before reaching [S] = Km, only a certain proportion of these ES pairs may be adequately poised to initiate the first step of catalysis, which is glycosylation of the enzyme via nucleophilic attack of the catalytic Glu. Tight, nonproductive binding of cellulases to polysaccharides and the aromatic polymer lignin has been widely observed and modeled (52-56), and given the modular nature and promiscuity of GH5 active sites, as well as the structural similarity among polysaccharides, it is straightforward to envision how an enzyme could interact favorably with a glycan chain without achieving the proximity to the catalytic residues required for hydrolysis. An inverse relationship exists between substrate affinity and hydrolysis rate in many cellulases (56), which may stem from such nonproductive binding.
A second possibility may be related to release of products from the active site. Given that the structures of the substrate and products are similar and that the majority of the protein-ligand interactions are unchanged from the beginning to the end of catalysis, it may take additional time, post-deglycosylation, for the two products to dissociate. Short oligosaccharides are known to be competitive inhibitors of some cellulases (57-59), supporting the notion that hydrolysis products retain appreciable affinity for the active site. High substrate availability could help to speed product dissociation, because an uncut substrate could cooperatively anchor into both the negative and positive subsites, whereas the two products only access one anchor each.
The reason KD is up to several orders of magnitude smaller than Km in all cases we tested may be a combination of the factors above. Whereas nonproductive binding may seem like a disadvantage, the ability to interact with multiple polysaccharides sharing broadly similar characteristics (i.e. β-(1,4)-linked main chains) is the fundamental requirement for broad selectivity, a trait that has obvious evolutionary advantages in a complex or changing nutritional environment.
The wide range of Km and kcat on XG, lichenan, and GM (Table 2), along with the colinear clustering of kcat/Km (Fig. 5), also hints at evolutionary adaptation to diverse environments. Individual GH5_4 enzymes may, for example, be adapted to lower substrate concentrations (lower Km), but their increased affinity comes at the cost of dampened reaction velocity (lower kcat). Whether this trade-off is imposed by the physical limits of mass action on the substrate or by a fitness ceiling in the local protein sequence landscape is the subject of ongoing investigation in many laboratories.

Conclusions
In this work, our standardized comparative examination of GH5_4 structure and function revealed complex molecular mechanisms for determining substrate selectivity. Polysaccharide recognition is mediated by aromatic residues at several key points in the binding cleft, some of which sit at considerable distance from the site of catalysis. GH5_4 widely accommodates features of branched XG, whereas individual members may be further optimized for cleavage of more linear glucans by favoring different chain conformations in the positive subsites. Whereas the GH5_4 fold is well-conserved, the shapes of GH5_4-binding clefts may show radical differences, as various structural strategies have evolved to supply critical protein-carbohydrate interactions. Diagnostic glucan and mannan oligosaccharides are cleaved into products of different sizes, suggesting that the negative subsites play an important role in enabling broad spectrum activity. Furthermore, the ability of GH5_4 proteins to tolerate so much sequence diversity (7,15) and to operate on such a wide array of substrates highlights the potential of GH5_4 as a starting point for efforts to create custom glycoside hydrolases for industrial bioenergy applications (60-62).

Experimental procedures
Gene sequence sources and PCR
DNA sequences encoding GH5_4 domains were amplified from a library of pEU cell-free expression plasmids synthesized in previous work (6). Insertion into isopropyl 1-thio-β-D-galactopyranoside-inducible overexpression vector pVP67K or into arabinose-inducible pBAD-mKalama1 was confirmed by colony PCR and Sanger sequencing.

Site-directed mutagenesis
The catalytic nucleophile was identified in each sequence by protein sequence alignment (Clustal Omega), using 4IM4 as a reference. Mutagenic primers, ~40 base pairs in length, were designed to mutate the nucleophilic Glu to a noncatalytic Gln. Site-directed mutagenesis was performed by typical PCR protocols, and mutagenesis was confirmed by Sanger sequencing.

Construction of fluorescent fusion proteins
The plasmid pBAD-mKalama1 was obtained as a gift from Robert Campbell (Addgene 14892). Catalytically inactive GH5_4 gene sequences were cloned between the N-terminal His6 tag and the fluorescent mKalama1 sequence by Gibson assembly. Insertions were confirmed by PCR and Sanger sequencing.

Protein expression and purification
Rosetta (Millipore-Sigma) cells were transformed with sequence-confirmed WT or catalytically inactivated plasmids (pVP67K, N-terminal His8 tobacco etch virus-cleavable tag) and plated on selective medium. Protein expression was performed using autoinduction (63-65). Starter cultures were grown in non-inducing medium overnight and used to inoculate autoinduction medium, and cells were grown at 25°C for 24 h. Cells were harvested by centrifugation and lysed using lysozyme and sonication. Lysate was centrifuged at 20,000 rpm for 45 min, and the supernatant was clarified with a 0.4-µm PES filter. Filtered supernatant was separated by nickel affinity chromatography on an Akta Start FPLC. Purified protein was desalted and concentrated to 10-20 mg/ml. Expression of mKalama1 fusion proteins was performed by arabinose induction in LB medium (66), but purification was performed the same as above.

X-ray crystallography
Crystallization experiments were conducted in the Collaborative Crystallography Core in the Department of Biochemistry at the University of Wisconsin (Madison, WI, USA). Unless otherwise specified, the following procedures were followed for all targets. Crystallization experiments were set with a TTP Labtech Mosquito robot in MRC SD-2 plates at 20°C (297 K). General screens used in this study were Hampton Research IndexHT, Molecular Dimensions JCSG+ (67), Morpheus (68), and PACT premier (69). Diffraction data were reduced with XDS (70), scaled with XSCALE (71), solved by molecular replacement with Phaser in the Phenix suite of programs (72), automatically rebuilt with phenix.autobuild (73), iteratively rebuilt in Coot (74), refined using phenix.refine (75), and validated using MOLPROBITY. Diffraction data were collected using beamlines at the Advanced Photon Source (APS) Argonne National Laboratory at 100 K.
6MQ4 crystals were obtained from 1 mM protein incubated with 5 mM cellohexaose for 4 h at room temperature. The mixture (of protein and cellohexaose) was screened using a TTP Labtech Mosquito, using MRC SD-2 plates. Droplets composed of 200 nl of protein-substrate mixture and 200 nl of reservoir were equilibrated against 50 µl reservoir. An exceptional crystal formed directly from the JCSG+ screen, condition B6, 40% ethanol, 10% PEG 1000, 0.1 M phosphate citrate buffer, pH 4.2. Crystals were exposed to vapor from 45% ethanol for 30 s prior to plunge cooling in liquid nitrogen. Data were collected on August 23, 2015 at LS-CAT 21ID-F, 0.97857 Å (540 frames, 0.5°, 90 mm) using a Rayonix MX300 CCD detector. The structure was phased by MR using 2WAB.
6PZ7 protein was crystallized in SD-2 plates set by a Mosquito crystallization robot. The hit came from the PACT premier screen, condition A7, 200 mM NaCl, 100 mM sodium acetate, pH 5.0, 25% PEG 6000. Crystals were cryopreserved in reservoir solution supplemented with 15% ethylene glycol. Data were collected at LS-CAT 21ID-D on October 10, 2013 at 1.07805 Å (180 frames, 1°, 225 mm) using a Rayonix MX300 CCD detector. The structure was phased by MR using 3NDZ.

Enzyme kinetic analysis
All substrates were obtained from Megazyme, except beechwood xylan (Sigma-Aldrich) and PASC (prepared in-house from Avicel from Sigma-Aldrich). Enzyme activities on polysaccharides were determined using the BCA reducing-sugar assay. Assays were conducted in triplicate with enzyme concentration of 1 µM and substrate concentrations up to 10 g/liter, in a 200-µl total volume of 0.05 M phosphate buffer, pH 6. Reactions were incubated at 30°C, with shaking at 900 rpm. XG, GM, and lichenan reactions were incubated for 10 min, whereas PASC, mannan, and xylan reactions were incubated for 60 min. The 96-well plates were spun at 3000 × g for 30 min at 4°C, pelleting the undigested substrate. The supernatant (5 µl) was added to 100 µl of BCA working reagent, and plates were incubated at 80°C for 15 min. Sample absorbance was measured at 562 nm and converted to µmol/min using a glucose standard curve.
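The standard-curve conversion in the last step can be sketched as follows. This is a minimal illustration that assumes a linear glucose standard curve fitted by ordinary least squares. The standard amounts and absorbances in the example are invented for demonstration, and the function names are hypothetical; units follow whatever units the glucose standards were prepared in.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need at least two paired points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def absorbance_to_rate(a562, slope, intercept, minutes):
    """Convert an A562 reading to glucose equivalents released per minute."""
    glucose_equivalents = (a562 - intercept) / slope
    return glucose_equivalents / minutes


if __name__ == "__main__":
    # Invented standards: amount of glucose vs. absorbance at 562 nm.
    glucose = [0.0, 5.0, 10.0, 20.0, 40.0]
    a562 = [0.05, 0.15, 0.25, 0.45, 0.85]
    slope, intercept = fit_line(glucose, a562)
    print(absorbance_to_rate(0.45, slope, intercept, minutes=10))
```

In practice the buffer blank and any background signal from the substrate or enzyme alone would also need to be subtracted before conversion; that step is omitted here for brevity.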
Data were fit to the Michaelis-Menten equation using the Solver function in Microsoft Excel (GRG nonlinear least-squares error minimization with kcat and Km as parameters).
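For readers who wish to reproduce this kind of fit outside Excel, the sketch below minimizes the same sum of squared errors for v = kcat·[E]·[S]/(Km + [S]) using only the Python standard library. It is not the Solver GRG algorithm: for each trial Km the best kcat is obtained in closed form (the model is linear in kcat), and Km is then located by a golden-section search on log(Km). The search bounds, tolerance, and function names are assumptions, and the search presumes a single minimum within the bounds.

```python
import math


def mm_rate(s, kcat, km, e0=1.0):
    """Michaelis-Menten rate for substrate concentration s."""
    return kcat * e0 * s / (km + s)


def _best_kcat_and_sse(km, s_vals, v_vals, e0):
    """For fixed Km, the least-squares kcat and the resulting error."""
    x = [e0 * s / (km + s) for s in s_vals]
    kcat = sum(xi * vi for xi, vi in zip(x, v_vals)) / sum(xi * xi for xi in x)
    sse = sum((vi - kcat * xi) ** 2 for xi, vi in zip(x, v_vals))
    return kcat, sse


def fit_michaelis_menten(s_vals, v_vals, e0=1.0, km_lo=1e-3, km_hi=1e3, tol=1e-9):
    """Least-squares estimates (kcat, Km, SSE); Km is searched in [km_lo, km_hi]."""
    def sse_at(log_km):
        return _best_kcat_and_sse(math.exp(log_km), s_vals, v_vals, e0)[1]

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = math.log(km_lo), math.log(km_hi)
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = sse_at(c), sse_at(d)
    while b - a > tol:
        if fc < fd:
            # Minimum lies in [a, d]; reuse c as the new upper probe.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = sse_at(c)
        else:
            # Minimum lies in [c, b]; reuse d as the new lower probe.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = sse_at(d)
    km = math.exp((a + b) / 2.0)
    kcat, sse = _best_kcat_and_sse(km, s_vals, v_vals, e0)
    return kcat, km, sse


if __name__ == "__main__":
    # Synthetic, noise-free data generated from known parameters.
    s = [0.5, 1.0, 2.0, 4.0, 8.0, 10.0]
    v = [mm_rate(si, kcat=12.0, km=2.5) for si in s]
    print(fit_michaelis_menten(s, v))
```

Reducing the problem to a one-dimensional search avoids the need for starting guesses for both parameters. With real, noisy data the bounds on Km should bracket the range of substrate concentrations tested.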

End-product analysis by NIMS
To determine the concentrations of oligosaccharide enzyme digest products, reactions were carried out at 30°C, pH 6 (sodium phosphate, 50 mM) for 2 h using 1 µM enzyme. Substrate concentrations were 0.5 g/liter for oligosaccharides, 5 g/liter for mannan, and 2.5 g/liter for PASC. Analysis was performed by NIMS, as described in detail elsewhere (43,79). All oligosaccharides were obtained from Megazyme.

Insoluble polysaccharide binding assay
Enzyme binding to insoluble polysaccharides was determined using mKalama1 fluorescence. Assays were conducted in triplicate using an enzyme concentration of 2 µM, PASC concentrations up to 5 g/liter, and insoluble xylan concentrations up to 50 mg/ml in a 200-µl total volume with 0.05 M phosphate buffer, pH 6. The enzyme-substrate reactions were incubated at 4°C, with shaking at 900 rpm for 60 or 90 min for insoluble xylan or PASC, respectively. Samples were spun at 3000 × g for 30 min (for PASC) or 20,000 × g for 15 min (for xylan) at 4°C to pellet the bound enzymes along with the substrate. The fluorescence of the supernatant (100-µl nominal volume) was measured using excitation at 385 nm and emission at 456 nm.
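One common way to turn such supernatant readings into a bound fraction is a depletion calculation, sketched below. The comparison against a no-substrate control and the optional blank subtraction are assumptions about how the data might be processed, not a description of the analysis used in this study; the function name and example values are hypothetical.

```python
def fraction_bound(f_supernatant, f_no_substrate, f_blank=0.0):
    """Fraction of enzyme pelleted with the substrate.

    f_supernatant  : fluorescence left in solution after centrifugation
    f_no_substrate : fluorescence of the same enzyme sample without substrate
    f_blank        : buffer-only background (assumed 0 if not measured)
    """
    total = f_no_substrate - f_blank
    if total <= 0:
        raise ValueError("control fluorescence must exceed the blank")
    free = (f_supernatant - f_blank) / total
    free = min(max(free, 0.0), 1.0)  # clamp noise outside the 0..1 range
    return 1.0 - free


if __name__ == "__main__":
    print(fraction_bound(250.0, 1000.0))         # most enzyme pelleted
    print(fraction_bound(300.0, 1100.0, 100.0))  # with blank subtraction
```

Plotting the bound fraction against polysaccharide concentration would then give a binding curve from which an apparent affinity could be estimated.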

Affinity gel electrophoresis
To prepare the 7.5% resolving gel, 3.75 ml of 30% acrylamide, 3.75 ml of 1.5 M Tris, pH 8.8, and 7.5 ml of water were mixed in a vacuum flask and degassed. Polysaccharide affinity gels were prepared by substituting a portion of the water with polysaccharide substrate solution. TEMED (7.5 µl) and ammonium persulfate (75 µl) were added, and the solution was pipetted into gel molds, with 1 ml of isopropyl alcohol added to the top. To prepare the 4% stacking gel, 2 ml of 30% acrylamide, 3.8 ml of 0.5 M Tris, pH 6.8, and 9 ml of water were mixed in a vacuum flask and degassed. TEMED (15 µl) and ammonium persulfate (75 µl) were added, and when the resolving gel was solidified, the surface was rinsed of isopropyl alcohol, and the stacking gel solution and combs were added.
To determine KD, native electrophoresis was conducted in a pH 8.3 Tris-glycine running buffer at 200 V for 60 min at 4°C with enzyme concentration of 0.5 g/liter and substrate concentrations up to 0.1 g/liter. Gels were stained in Coomassie dye and imaged on a light box. Rf was determined for each band relative to BSA, a protein standard that did not bind polysaccharides, and KD was estimated using normalized Rf curves with respect to ligand concentration.

Data availability
The structures presented in this paper have all been deposited in the PDB under accession numbers 6MQ4, 6PZ7, 6Q1I, 6UI3, 6XSU, 6XSO, 6WQP, 6XRK, 6WQY, 6WQV, and 4IM4. All remaining data are contained in the paper.