Mapping Molecular Recognition of β1,3-1,4-Glucans by a Surface Glycan-Binding Protein from the Human Gut Symbiont Bacteroides ovatus

ABSTRACT A multigene polysaccharide utilization locus (PUL) encoding enzymes and surface carbohydrate (glycan)-binding proteins (SGBPs) was recently identified in prominent members of Bacteroidetes in the human gut and characterized in Bacteroides ovatus. This PUL-encoded system specifically targets mixed-linkage β1,3-1,4-glucans, a group of diet-derived carbohydrates that promote a healthy microbiota and have potential as prebiotics. The BoSGBPMLG-A protein encoded by the BACOVA_02743 gene is a SusD-like protein that plays a key role in the PUL’s specificity and functionality. Here, we perform a detailed analysis of the molecular determinants underlying carbohydrate binding by BoSGBPMLG-A, combining carbohydrate microarray technology with quantitative affinity studies and a high-resolution X-ray crystal structure of BoSGBPMLG-A in complex with a β1,3-1,4-nonasaccharide. We demonstrate its unique binding specificity toward β1,3-1,4-gluco-oligosaccharides, with binding affinity increasing up to the octasaccharide and depending on the number and position of β1,3 linkages. The interaction is defined by a 41-Å-long extended binding site that accommodates the oligosaccharide in a mode distinct from that of previously described bacterial β1,3-1,4-glucan-binding proteins. In addition to the shape complementarity mediated by CH-π interactions, a complex hydrogen bonding network complemented by a high number of key ordered water molecules establishes additional specific interactions with the oligosaccharide. These interactions support the twisted conformation of the β-glucan backbone imposed by the β1,3 linkages and explain the dependency on oligosaccharide chain length. We propose that the specificity conferred by BoSGBPMLG-A on the PUL for importing long β1,3-1,4-glucan oligosaccharides into the bacterial periplasm allows Bacteroidetes to outcompete bacteria that lack this PUL for the utilization of β1,3-1,4-glucans.
IMPORTANCE Given the growing knowledge of bacterial gene systems that encode proteins targeting dietary carbohydrates as nutrient sources, and of their importance for human health, major efforts are being made to understand carbohydrate recognition by various commensal bacteria. Here, we describe an integrative strategy that combines carbohydrate microarray technology with structural studies to further elucidate the molecular determinants of carbohydrate recognition by BoSGBPMLG-A, a key protein expressed at the surface of Bacteroides ovatus for the utilization of mixed-linkage β1,3-1,4-glucans. We have mapped at high resolution the interactions that occur at the binding site of BoSGBPMLG-A and provide evidence for the role of key water-mediated interactions in fine specificity and affinity. Understanding at the molecular level how commensal bacteria, such as prominent members of Bacteroidetes, can differentially utilize dietary carbohydrates with potential prebiotic activities will shed light on possible ways to modulate the microbiome to promote human health.

Throughout evolution, the human gut microbiota has evolved to efficiently target and degrade complex carbohydrates derived from the human diet, popularly termed "dietary fiber." These carbohydrates escape complete metabolism by the human digestive system, which is intrinsically poor in carbohydrate-active enzymes (CAZymes) that act on complex carbohydrates (1,2). The microbial community thus complements the metabolic capacity of the human organism, producing metabolites that influence nutrition and health (3). Consequently, changes in carbohydrate intake not only shape the composition and homeostasis of the microbiota but also affect human physiology. Clarifying the molecular mechanisms that underlie this cross-communication is key to developing personalized medicine solutions and fine-tuning therapies for diseases associated with microbiota dysbiosis, including obesity and inflammatory bowel disease (4)(5)(6).
The symbiotic bacterium Bacteroides ovatus is a specialist in complex carbohydrate degradation, carrying in its genome several clusters of colocalized genes termed polysaccharide utilization loci (PULs). These encode CAZymes, surface carbohydrate (glycan)-binding proteins (generally designated SGBPs), TonB-dependent transporters (TBDTs), and transcriptional regulators, which together form complete systems to target and degrade major diet-derived and plant cell wall polysaccharides (1). An important group of complex carbohydrates with proven health benefits (7,8) is the mixed-linkage β1,3-1,4-glucans, which are abundant in the endosperm of cereal grains such as barley and oats and in algae and edible lichens (e.g., Icelandic moss) (9) and have more recently also been identified in microalgae (10). The β1,3-1,4-glucans are linear homopolysaccharides of D-glucopyranose composed of blocks of three or four consecutive β1,4-linked residues (cellotriosyl or cellotetraosyl units, respectively) separated by single β1,3 linkages (Fig. 1A). The ratio of cellotriosyl to cellotetraosyl units differs with the source of the polysaccharide, resulting in different physicochemical properties (11). A chain of β1,4-linked glucose, as in cellulose, is rigid and poorly soluble in water, whereas β1,3 glycosidic linkages confer flexibility and water solubility, creating kinks in the main chain and imposing a twisted conformation on the polysaccharide that challenges microbial degradation (9,12).
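The block structure described above can be made concrete with a small sketch in the article's condensed notation (G for Glc, a digit for the linkage position, nonreducing end on the left). This is a hypothetical helper written for illustration, not part of the study: it assumes an idealized chain built from an arbitrary list of cellotriosyl (3) and cellotetraosyl (4) blocks and reports the DP, the linkage counts, and the ratio of cellotriosyl to cellotetraosyl units that distinguishes β-glucans from different sources.

```python
from collections import Counter


def compose_chain(blocks: list[int]) -> str:
    """Join cellotriosyl (3) or cellotetraosyl (4) blocks into one chain.

    Residues inside a block are beta1,4-linked ("4"); consecutive blocks are
    joined by a single beta1,3 linkage ("3"). Nonreducing end is on the left.
    """
    if any(size not in (3, 4) for size in blocks):
        raise ValueError("blocks must be cellotriosyl (3) or cellotetraosyl (4)")
    block_strings = ["4".join("G" * size) for size in blocks]
    return "3".join(block_strings)


def summarize(chain: str) -> dict[str, int]:
    """Count glucose residues (DP) and beta1,3 / beta1,4 linkages."""
    linkages = Counter(ch for ch in chain if ch in "34")
    return {"DP": chain.count("G"), "beta13": linkages["3"], "beta14": linkages["4"]}


def triosyl_to_tetraosyl_ratio(blocks: list[int]) -> float:
    """Ratio of cellotriosyl to cellotetraosyl units (inf if no tetraosyl unit)."""
    counts = Counter(blocks)
    return counts[3] / counts[4] if counts[4] else float("inf")


# Hypothetical idealized fragment: three cellotriosyl units and one cellotetraosyl unit.
blocks = [3, 3, 4, 3]
chain = compose_chain(blocks)
print(chain)                               # G4G4G3G4G4G3G4G4G4G3G4G4G
print(summarize(chain))                    # {'DP': 13, 'beta13': 3, 'beta14': 9}
print(triosyl_to_tetraosyl_ratio(blocks))  # 3.0
```

Real polysaccharides do not follow a fixed block order; the sketch only shows how the block composition determines DP and linkage counts.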
In a pivotal study, Martens and colleagues identified a B. ovatus PUL (PUL 51) that is transcriptionally upregulated during growth on mixed-linkage β-glucan from barley (Fig. 1B) (13). Recently, Tamura and colleagues demonstrated that copies of this locus, which they named the mixed-linkage glucan utilization locus (MLGUL), are present in other Bacteroidetes species that are ubiquitous in the guts of human populations, pointing to the importance of the catabolism of mixed-linkage β-glucans by the microbiome (14). Structural and biochemical studies of two MLGUL CAZymes (a surface GH16 endo-β-glucanase and a periplasmic GH3 exo-β-glucosidase) and two SGBPs (BoSGBPMLG-A and BoSGBPMLG-B) have contributed to characterizing the specificity of the MLGUL for mixed-linkage β1,3-1,4-glucans (14,15). These studies provided evidence for a concerted model in which the polysaccharide is enzymatically degraded and the resulting oligosaccharides are targeted at the cell surface for transport via a SusC-like TBDT, enabling complete saccharification to glucose in the periplasm. The SGBP encoded by the BACOVA_02743 gene is a SusD homolog and was named BoSGBPMLG-A by Tamura and colleagues (15). In that study, the authors showed that the unique specificity of BoSGBPMLG-A toward mixed-linkage glucans is mediated by shape complementarity of its extended binding site with the twisted conformation of the oligosaccharide backbone and that binding depends on chain length up to the heptasaccharide, which comprises two cellotriosyl repeats. That study also demonstrated the direct involvement of BoSGBPMLG-A in the functionality of the PUL, as mutations in the binding site and a gene knockout blocked mixed-linkage β-glucan utilization by B. ovatus (15). Therefore, a detailed understanding of carbohydrate recognition by BoSGBPMLG-A is pivotal to understanding, at the molecular level, the utilization of β1,3-1,4-glucans by this symbiont in the microbiota.
Our understanding of carbohydrate recognition by proteins has been revolutionized by the advent of carbohydrate microarrays (16)(17)(18)(19). This technology meets the need for high-throughput methods to identify carbohydrate ligands of proteins and to assign the specificities of carbohydrate-binding proteins. Recognizing the importance of glucan recognition across all domains of life, we developed a "glucome" microarray as a screening tool for studying glucan binding by proteins; the microarray comprises sequence-defined gluco-oligosaccharides with linear and branched sequences of different chain lengths and linkages (20). We demonstrated its application to a wide range of glucan-binding proteins, including bacterial carbohydrate-binding modules (CBMs), anticarbohydrate antibodies, and immune lectins (20,21).
Here, we report the carbohydrate microarray analysis of BoSGBPMLG-A to evaluate its binding to a wide range of carbohydrate structures and to assign its fine specificity toward β1,3-1,4-gluco-oligosaccharide sequences. We provide further evidence that the binding pattern of BoSGBPMLG-A differs from those of other β1,3-1,4-glucan-binding proteins and that BoSGBPMLG-A is unique in its requirement for, and preferential binding to, longer oligosaccharide chains. By integrating the carbohydrate microarray data with affinity data from microscale thermophoresis, we demonstrate the preferential binding of BoSGBPMLG-A to β1,3-1,4-gluco-oligosaccharide chains longer than the heptasaccharide reported previously by Tamura and colleagues (15). The crystal structure of BoSGBPMLG-A in complex with a β1,3-1,4-glucononasaccharide solved at 1.43 Å reveals unique structural features that, combined with the results of isothermal titration calorimetry (ITC) and site-directed mutagenesis, enable us to assign at high resolution the molecular determinants of carbohydrate binding and of its dependency on chain length. We discuss the potential implications of the preferential binding of BoSGBPMLG-A to long β1,3-1,4-gluco-oligosaccharides for the functionality of the PUL system.

RESULTS
BoSGBPMLG-A targets mixed-linkage β1,3-1,4-glucans from different sources. The carbohydrate-binding properties of recombinant BoSGBPMLG-A (Fig. 1C) were first investigated using a microarray comprising carbohydrate probes (polysaccharides and glycoproteins) derived from fungi, bacteria, microalgae, and plants. These probes covered carbohydrate structural diversity, with different glycosidic linkages in α or β configuration, and were grouped by major backbone type as shown in Fig. 2 and in Table S1 in the supplemental material. Monoclonal antibodies (MAbs), carbohydrate-binding modules (CBMs), and lectins were also analyzed and showed binding profiles in accord with their reported carbohydrate-binding properties (Fig. 2 and Tables S2 and S3), thus validating the constructed microarray.
BoSGBPMLG-A showed strong binding to barley and lichenan β-glucans and to the β1,3-1,4-glucan-enriched fraction isolated from the microalga Nannochloropsis oculata, similar to the binding profiles of CBM11 of Clostridium thermocellum (CtCBM11) and the β1,3-1,4-glucan-specific MAb BS400-3 (Fig. 2 and Table S2). The ability of BoSGBPMLG-A to bind mixed-linkage β1,3-1,4-glucans from different sources reflects the flexibility to accommodate a linear backbone with different ratios of β1,4-linked cellotriosyl to cellotetraosyl units spaced by β1,3 linkages (Fig. 1) (11). In accord with previous affinity data (15), BoSGBPMLG-A also interacted, albeit with lower binding intensity, with branched xyloglucan fractions composed of a β1,4-linked glucose backbone. No interaction was observed with β1,3-glucans, with polysaccharides having a β1,4-linked backbone of a sugar other than glucose, such as xylan and mannan, or with any of the other carbohydrate probes featured on the microarray, highlighting the preference of BoSGBPMLG-A for mixed-linkage β1,3-1,4-glucans.
BoSGBPMLG-A shows carbohydrate-binding specificity restricted to β1,3-1,4-gluco-oligosaccharides. To investigate the binding specificity of BoSGBPMLG-A at the oligosaccharide level and the influence of oligosaccharide chain length, we used a structurally diverse gluco-oligosaccharide microarray representing the major sequences found in glucans (Fig. 3A and Table S4) (20). The microarray comprised 153 sequence-defined gluco-oligosaccharides of different degrees of polymerization (DP2 to DP16; the degree of polymerization [DP] is the number of monomer units in an oligosaccharide), with linear and branched chains of different lengths and homo- and mixed linkages in α or β configuration, prepared as neoglycolipid (NGL) probes (Table S4).
The microarray analysis revealed a restricted binding pattern for BoSGBPMLG-A, with highly specific binding to gluco-oligosaccharide fractions with mixed β1,3-1,4 linkages derived from barley β-glucan (DP15 and DP16; probes 119 and 120 in Fig. 3A and Table S4). Under the conditions of the analysis, no binding to linear β1,4-linked gluco-oligosaccharides up to DP13 was detected. While these results appear to contrast with the previously reported binding of BoSGBPMLG-A to β1,3-1,4-gluco-hexasaccharide and -heptasaccharide in solution by ITC (15), they may reflect restricted access of the protein to the oligosaccharide sequence required for productive binding in the microarray format. Supporting this were the different binding patterns displayed by the characterized glucan-binding modules CtCBM11 and CBM6-2 of Cellvibrio mixtus (CmCBM6-2) (Table S3). CmCBM6-2 showed the expected broad binding profile, recognizing all immobilized β-linked gluco-oligosaccharides (Fig. 3A and Table S4). Although its β-glucan binding profile was nearly as narrow as that of BoSGBPMLG-A, CtCBM11 bound barley-derived mixed β1,3-1,4-linked gluco-oligosaccharides of shorter chain lengths (e.g., DP7).
Fig. 2 legend (fragment): … (Table S1); the major backbone sequences are depicted at the bottom. The heatmap represents the relative binding intensities calculated as the percentage of the fluorescence signal intensity at 150 pg (0.5 mg/ml)/spot given by the saccharide probe most strongly bound by each protein (normalized as 100%). Results are detailed in Table S2. S. cerevisiae, Saccharomyces cerevisiae; N. oculata, Nannochloropsis oculata; P. palmata, Palmaria palmata; C. albicans, Candida albicans; M. tuberculosis, Mycobacterium tuberculosis; hMalectin, human malectin; TmCBM41, CBM41 of Thermotoga maritima; mDectin-1, murine dectin-1; CtCBM11, CBM11 of Clostridium thermocellum; ConA, concanavalin A; AAL, Aleuria aurantia lectin.
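The per-protein normalization described in the Fig. 2 legend (each protein's signals expressed as a percentage of the signal from the probe it binds most strongly) can be sketched as follows. The protein names, probe names, and fluorescence values are hypothetical placeholders, not data from Table S2.

```python
def normalize_to_strongest(signals: dict[str, float]) -> dict[str, float]:
    """Express each probe's signal as % of the strongest signal for one protein."""
    strongest = max(signals.values())
    if strongest <= 0:
        raise ValueError("at least one positive signal is required")
    return {probe: 100.0 * value / strongest for probe, value in signals.items()}


def heatmap_rows(per_protein: dict[str, dict[str, float]]) -> dict[str, dict[str, float]]:
    """Normalize each protein independently, so every row peaks at 100%."""
    return {protein: normalize_to_strongest(s) for protein, s in per_protein.items()}


# Hypothetical fluorescence intensities (arbitrary units) for two proteins.
raw = {
    "protein_X": {"barley": 40000.0, "lichenan": 30000.0, "xyloglucan": 8000.0},
    "protein_Y": {"barley": 500.0, "lichenan": 250.0, "xyloglucan": 1000.0},
}
rows = heatmap_rows(raw)
print(rows["protein_X"])  # {'barley': 100.0, 'lichenan': 75.0, 'xyloglucan': 20.0}
print(rows["protein_Y"])  # {'barley': 50.0, 'lichenan': 25.0, 'xyloglucan': 100.0}
```

Because each row is scaled to its own maximum, such a heatmap compares binding patterns across probes within one protein rather than absolute intensities between proteins.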
As a type B CBM, CtCBM11 recognizes internal regions of the carbohydrate chain, which accounts for its requirement of a minimum chain length for access and recognition. CtCBM11 targets, as a minimum binding motif, the mixed-linkage tetrasaccharide repeat with a β1,3 linkage at the reducing end, Glcβ1,4Glcβ1,4Glcβ1,3Glc (G4G4G3G) (22). In the microarrays, presentation of the glycan probes through derivatization of the reducing-end glucose to the lipid prevented strong recognition of probes with DP4 to DP6 (probes 105, 107, and 109 in Table S4) by CtCBM11, by hindering access to the minimum binding motif. The same phenomenon may occur in the microarray analysis of BoSGBPMLG-A, but with a more pronounced effect: BoSGBPMLG-A requires a chain longer than a tetrasaccharide for binding (15), and because SusD-like proteins are larger than CBMs, it may be more affected by steric hindrance in the microarray setup.
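The steric argument above can be illustrated with a toy model written for this discussion. The sketch finds every (possibly overlapping) occurrence of a motif such as G4G4G3G in a condensed sequence and then, as a simplifying assumption, treats any occurrence that includes the lipid-linked reducing-end glucose (the rightmost residue) as inaccessible. It assumes single-letter residues and single-digit linkages and is not a model of the actual probe geometry.

```python
def motif_occurrences(seq: str, motif: str) -> list[tuple[int, int]]:
    """(first, last) residue indices, counted from the nonreducing end (0),
    of every possibly overlapping occurrence of motif in seq."""
    hits = []
    # Residues ('G') sit at even string positions, linkage digits at odd ones.
    for start in range(0, len(seq) - len(motif) + 1, 2):
        if seq.startswith(motif, start):
            first = start // 2
            hits.append((first, first + motif.count("G") - 1))
    return hits


def accessible_occurrences(seq: str, motif: str) -> list[tuple[int, int]]:
    """Occurrences that do not use the reducing-end residue (rightmost),
    assumed here to be blocked by the lipid linkage of an NGL probe."""
    reducing_end = seq.count("G") - 1
    return [hit for hit in motif_occurrences(seq, motif) if hit[1] != reducing_end]


MOTIF = "G4G4G3G"  # CtCBM11 minimum binding motif (reference 22)
for probe in ["G4G4G3G", "G4G4G3G4G4G3G"]:
    print(probe, motif_occurrences(probe, MOTIF), accessible_occurrences(probe, MOTIF))
# G4G4G3G [(0, 3)] []
# G4G4G3G4G4G3G [(0, 3), (3, 6)] [(0, 3)]
```

In this toy model the DP4 sequence presents its only copy of the motif at the blocked reducing end, whereas the DP7 barley-7 sequence retains an internal copy.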
Binding affinity of BoSGBPMLG-A is dependent on oligosaccharide chain length. To further understand the chain length dependency of BoSGBPMLG-A and the influence of β1,3-Glc linkages on binding, we used microscale thermophoresis (MST) as a complementary technique to determine the affinities of BoSGBPMLG-A for sequence-defined barley oligosaccharides with different DPs and increasing numbers of β1,3-Glc linkages (Fig. 3B).
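As a hedged illustration of how Ka values can be extracted from MST dose-response data (the Fig. 3B legend states that a one-site binding model was used), the sketch below fits a simplified one-site model, fraction bound = Ka[L]/(1 + Ka[L]), to normalized data by golden-section search over log10(Ka). It assumes that the free ligand concentration equals the total concentration (no ligand depletion) and a normalized response; the dilution series and the noise-free synthetic data are hypothetical, and this is not the authors' analysis pipeline. It also computes the standard error of regression and the reduced chi-square named in the legend.

```python
import math


def one_site(ligand_molar: float, ka: float) -> float:
    """Fraction of protein bound for a one-site model (free ligand = total)."""
    return ka * ligand_molar / (1.0 + ka * ligand_molar)


def sse(ka: float, conc: list[float], y: list[float]) -> float:
    """Sum of squared residuals for a given Ka."""
    return sum((yi - one_site(c, ka)) ** 2 for c, yi in zip(conc, y))


def fit_ka(conc, y, log_lo=0.0, log_hi=9.0, tol=1e-6):
    """Golden-section search for the Ka minimizing SSE, over log10(Ka)."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = log_lo, log_hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = sse(10 ** c, conc, y), sse(10 ** d, conc, y)
    while b - a > tol:
        if fc < fd:  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = sse(10 ** c, conc, y)
        else:        # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = sse(10 ** d, conc, y)
    return 10 ** ((a + b) / 2.0)


def fit_quality(ka, conc, y, sd, n_params=1):
    """Standard error of regression and reduced chi-square of the fit."""
    resid = [yi - one_site(c, ka) for c, yi in zip(conc, y)]
    dof = len(y) - n_params
    ser = math.sqrt(sum(r * r for r in resid) / dof)
    red_chi2 = sum((r / s) ** 2 for r, s in zip(resid, sd)) / dof
    return ser, red_chi2


# Hypothetical 16-point, 2-fold dilution series starting at 1 mM.
conc = [1e-3 / 2 ** i for i in range(16)]
y = [one_site(c, 4.62e5) for c in conc]  # noise-free synthetic data
ka_fit = fit_ka(conc, y)
print(f"Ka = {ka_fit:.3g} M^-1")  # Ka = 4.62e+05 M^-1
print(fit_quality(ka_fit, conc, y, sd=[0.02] * len(y)))
```

A golden-section search is used only to keep the example dependency-free; a fuller treatment would account for ligand depletion at the protein concentration used and would fit real, noisy replicates.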
The MST results also showed the preference of BoSGBPMLG-A for mixed β1,3-1,4-linked gluco-oligosaccharides (barley-) over β1,4-gluco-oligosaccharides (cello-) (Fig. 3B). The affinity for barley-9 (association constant [Ka] of [4.62 ± 1.02] × 10⁵ M⁻¹ [mean ± standard deviation]) was more than 1 order of magnitude (about 35-fold) higher than that for cello-9 (Ka of [1.31 ± 0.20] × 10⁴ M⁻¹). For the more weakly binding cello-7, the affinity could not be determined with confidence, because solubility limits prevented reaching higher oligosaccharide concentrations under the conditions of the analysis. Importantly, the MST data demonstrated that the binding affinity depended both on carbohydrate chain length and on the presence of β1,3 linkages along the chain (Fig. 3B and C). The interaction curves obtained with barley oligosaccharides showed a clear effect of DP on affinity, with barley-6/barley-7 and barley-8/barley-9 forming pairs of similar affinity. Considering their sequences, the increase in affinity could be correlated with the increase in the number of β1,3-Glc linkages (Fig. 3C). These results suggest that BoSGBPMLG-A can bind to mixed-linkage β1,3-…
Fig. 3 legend (fragment): … probes. CtCBM11 (CBM11 of Clostridium thermocellum) and CmCBM6-2 (CBM6-2 of Cellvibrio mixtus) were used as control proteins. The degree of polymerization (DP) and glucose linkages are indicated at the top of the panels. Some relevant carbohydrate probe sequences for binding to CtCBM11 are depicted in panel B. G, glucose; AO, aminooxy. NGLs were prepared from reducing oligosaccharides by oxime ligation with an aminooxy (AO)-functionalized lipid (39). The binding signals are depicted as mean values of fluorescence intensities of duplicate spots for each probe arrayed at 5 fmol/spot (with error bars) and are representative of at least two independent experiments (details are in Table S4).
(B) Microscale thermophoresis analysis of the interaction of BoSGBPMLG-A with sequence-defined gluco-oligosaccharides. Dose-response curves were fitted to a one-site binding model to obtain Ka values. Error bars indicate the standard deviations from triplicate experiments (n = 3). Fit quality is given by the standard error of regression and the reduced chi-square (Red χ²) parameters. S/N, signal-to-noise ratio. (C) Sequences of the gluco-oligosaccharides depicted from the nonreducing to the reducing end. Monosaccharide symbols follow the Symbol Nomenclature for Glycans (SNFG) (52). The 3-linkages are underscored.
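The comparison of barley-9 with cello-9 in the MST results is simple arithmetic on the reported association constants. The sketch below converts Ka to the dissociation constant Kd = 1/Ka and computes the fold difference; the uncertainty on the ratio uses first-order propagation of the reported standard deviations, assuming they are independent (an assumption made for this illustration, not a statement from the study).

```python
import math


def kd_from_ka(ka_per_molar: float) -> float:
    """Dissociation constant (M) from an association constant (M^-1)."""
    return 1.0 / ka_per_molar


def ratio_with_sd(a: float, sd_a: float, b: float, sd_b: float) -> tuple[float, float]:
    """a/b with first-order error propagation for independent uncertainties."""
    ratio = a / b
    rel = math.sqrt((sd_a / a) ** 2 + (sd_b / b) ** 2)
    return ratio, ratio * rel


ka_barley9, sd_barley9 = 4.62e5, 1.02e5  # M^-1, mean and SD reported for barley-9
ka_cello9, sd_cello9 = 1.31e4, 0.20e4    # M^-1, mean and SD reported for cello-9

fold, fold_sd = ratio_with_sd(ka_barley9, sd_barley9, ka_cello9, sd_cello9)
print(f"Kd barley-9 = {kd_from_ka(ka_barley9) * 1e6:.2f} uM")  # Kd barley-9 = 2.16 uM
print(f"Kd cello-9  = {kd_from_ka(ka_cello9) * 1e6:.1f} uM")   # Kd cello-9  = 76.3 uM
print(f"fold = {fold:.0f} +/- {fold_sd:.0f}; log10(fold) = {math.log10(fold):.2f}")
# fold = 35 +/- 9; log10(fold) = 1.55
```

A log10 ratio of about 1.5 is why the difference is described as more than 1 order of magnitude.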
The high-resolution structure of BoSGBPMLG-A in complex with a β1,3-1,4-gluco-nonasaccharide provides atomic detail on the long-chain interaction. To fully understand the molecular determinants of the unique specificity and chain length dependency of BoSGBPMLG-A, we solved the three-dimensional (3-D) structure of BoSGBPMLG-A in complex with barley-9 (G4G3G4G4G3G4G4G3G) (Fig. 1B) by X-ray crystallography to a resolution of 1.43 Å (Fig. 4 and Fig. S1). Data collection, processing, and refinement statistics are summarized in Table 1. Superposition with the structure of BoSGBPMLG-A in complex with barley-7 (G4G4G3G4G4G3G) (15) revealed conservation of the overall fold (~0.3-Å root mean square deviation [RMSD] over 516 Cα atom pairs) and an overall match for glucose residues Glc1 to Glc7 (Fig. 4A and B). Most strikingly, in the complex with barley-9, two protein monomers were found in the asymmetric unit, forming a protein-sugar-sugar-protein supramolecular assembly in which the protein interacts with a dimerized carbohydrate consisting of two barley-9 chains related by a 2-fold noncrystallographic axis (Fig. S2). That is, one complete barley-9 chain could be modeled, flanked on one side by one protein molecule (chain A) and on the opposite side by a second, antiparallel barley-9 molecule that is in turn flanked by the second protein monomer (chain B). The antiparallel barley-9A/barley-9B dimer is promoted by 4 direct symmetrical hydrogen bonding contacts between pairs …, and Glc8A(C2-OH)-(OH-C6)Glc3B. Besides these direct contacts, sugar oligomerization is maintained by 8 water-mediated contacts between the glucose pairs Glc2A-Glc9B, Glc3A-Glc7B, Glc4A-Glc7B, Glc4A-Glc6B, Glc5A-Glc5B, Glc6A-Glc4B, Glc7A-Glc4B, and Glc9A-Glc2B. For clarity, only chain A is represented and discussed in Fig. 4, but all observations were systematically cross-checked in both chains A and B.
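The residue numbering used for the complex (Glc1 at the reducing end) can be reconciled with the condensed sequences, which are written from the nonreducing to the reducing end. The small helper below, written for this illustration, lists each glycosidic linkage by donor and acceptor residue number; applied to barley-9 and barley-7, it shows that both share the same linkage pattern over Glc1 to Glc7, with barley-9 adding a β1,3 linkage between Glc7 and Glc8 and a terminal β1,4-linked Glc9.

```python
def linkages_from_reducing_end(seq: str) -> list[tuple[int, int, int]]:
    """(donor, acceptor, position) for each linkage, numbering Glc1 as the
    reducing end. seq is condensed notation written nonreducing -> reducing,
    e.g. 'G4G3G' = Glc3-beta1,4-Glc2-beta1,3-Glc1."""
    n_residues = seq.count("G")
    links = seq[1::2]  # linkage digits sit between the 'G' residues
    result = []
    for k, pos in enumerate(links):
        donor = n_residues - k  # residue on the nonreducing side of the bond
        result.append((donor, donor - 1, int(pos)))
    return result


def beta13_positions(seq: str) -> list[tuple[int, int]]:
    """(donor, acceptor) pairs joined by a beta1,3 linkage."""
    return [(d, a) for d, a, pos in linkages_from_reducing_end(seq) if pos == 3]


barley9 = "G4G3G4G4G3G4G4G3G"
barley7 = "G4G4G3G4G4G3G"
print(beta13_positions(barley9))  # [(8, 7), (5, 4), (2, 1)]
print(beta13_positions(barley7))  # [(5, 4), (2, 1)]

# Linkages of barley-9 whose donor lies within Glc1-Glc7.
shared = [link for link in linkages_from_reducing_end(barley9) if link[0] <= 7]
print(sorted(shared) == sorted(linkages_from_reducing_end(barley7)))  # True
```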
The high-resolution data (1.43 Å) and positive peaks of residual electron density in the mFobs − DFcalc difference map indicate that alternate conformations are possible for the C6-OH groups of Glc3, Glc5, and Glc6 in chain B; overall, however, the same protein-carbohydrate contacts were maintained as in chain A.
Molecular determinants of the carbohydrate-binding specificity are supported by a complex water-mediated hydrogen bonding network. The structure of the BoSGBPMLG-A-barley-9 complex showed an ~41-Å-long binding platform that accommodates and follows the natural bends and kinks of this mixed-linkage oligosaccharide (Fig. 4B and C). All nine glucose residues of the ligand could be modeled in the electron density in the lowest-energy ⁴C₁ chair conformation. BoSGBPMLG-A bound barley-9 from its reducing end (Glc1), and multiple protein contacts up to Glc8 stabilized the interaction. These included CH-π stacking, direct hydrogen bonding, and an extensive water-mediated network (Fig. 4C and D and Table S5). The nonreducing terminal residue, Glc9, showed no direct or indirect interaction with any amino acid.
As in the BoSGBPMLG-A structure with the heptasaccharide (15), the triad of residues Trp77, Trp350, and Trp353 constitutes the center of the platform, interacting with the internal sequence of the gluco-oligosaccharide chain (Glc3 to Glc7) (Fig. 4C). The CH-π stacking between the aromatic side chains of the triad and the saccharide rings is the main force positioning the ligand in the binding site. This is mainly mediated by Trp77 and Trp350, as mutating either of these residues to alanine abolished binding to barley β-glucan and xyloglucan, as determined by ITC analysis (Fig. 5), corroborating the results reported using affinity gel electrophoresis (15). The torsion of the Trp353 side chain relative to Trp77 and Trp350 curves the binding site, which optimizes the CH-π interactions with Glc6 and Glc7 and best accommodates the natural bends imposed by the mixed β1,3-1,4 linkages of the oligosaccharide. The importance of these Trp353-mediated interactions for the affinity of BoSGBPMLG-A was corroborated by the weak, unquantifiable interaction observed with the Trp353Ala mutant (bearing a change from Trp to Ala at position 353) (Fig. 5). At the reducing end, the aromatic ring of Tyr266 establishes CH-π stacking with Glc1, and mutation of this residue caused a 3.5-fold decrease in the Ka value for both barley β-glucan and xyloglucan polysaccharides, which corroborates the importance of this interaction for anchoring and orienting the oligosaccharide. In addition, the direct hydrogen bonding contacts and electrostatic interactions established by eight amino acid residues with the reducing-end tetrasaccharide stretch (Glc1 to Glc4) (Fig. 4C and Table S5) are conserved in the structures of both BoSGBPMLG-A-oligosaccharide complexes.
The high resolution of the BoSGBPMLG-A-barley-9 structure enabled us to identify 21 ordered water molecules constituting an extensive and complex water network that mediates indirect hydrogen bonding contacts with 18 amino acid residues, supporting the entire oligosaccharide from Glc1 to Glc8 (Fig. 4D and Table S5). In particular, four water molecules mediate interactions of Arg378 Nη1 and Nη2 with O2 and O3 of Glc5 and O6 of Glc6. In addition, Arg378 Nη1 hydrogen bonds to Ser349 Oγ, which in turn is water bridged to Asn286, a direct contact of Glc4. Mutation of Arg378 to Ala decreased the affinity approximately 15-fold for barley β-glucan polysaccharide but only 3.9-fold for xyloglucan, underscoring the importance of these water-mediated contacts for the interaction with mixed-linkage glucans (Fig. 5). At the nonreducing end, a water-mediated triad was also identified between the main-chain carbonyl of Met376 and Trp353, mediating the interaction with the Glc8 residue (Fig. 4D). Two additional water molecules mediate the contact between O6 of Glc8, Nε1 of Trp353, and the Met376 carbonyl oxygen. These water-mediated contacts constitute the protein-ligand interactions holding Glc8 and explain the limit of chain length recognition by BoSGBPMLG-A.
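A common way to put mutational fold changes on an energy scale is ΔΔG = RT ln(Ka,WT/Ka,mutant). The sketch below applies it to the fold decreases reported in this section for the Tyr266 mutant (3.5-fold) and Arg378Ala (15-fold for barley β-glucan, 3.9-fold for xyloglucan). The temperature of 25 °C is an assumption, since the ITC temperature is not given in this section, and the values are approximate because the fold changes are themselves rounded.

```python
import math

R = 8.314462618e-3  # gas constant, kJ mol^-1 K^-1
T = 298.15          # K; assumed 25 °C, not stated in this section


def ddg_from_fold(fold_decrease: float, temperature: float = T) -> float:
    """Binding free-energy penalty (kJ/mol) for a fold decrease in Ka."""
    return R * temperature * math.log(fold_decrease)


fold_changes = {
    "Tyr266 mutant, barley beta-glucan or xyloglucan": 3.5,
    "Arg378Ala, barley beta-glucan": 15.0,
    "Arg378Ala, xyloglucan": 3.9,
}
for label, fold in fold_changes.items():
    print(f"{label}: ddG = {ddg_from_fold(fold):.1f} kJ/mol")
# Tyr266 mutant, barley beta-glucan or xyloglucan: ddG = 3.1 kJ/mol
# Arg378Ala, barley beta-glucan: ddG = 6.7 kJ/mol
# Arg378Ala, xyloglucan: ddG = 3.4 kJ/mol
```

On this scale, the Arg378Ala penalty for barley β-glucan is roughly twice that for xyloglucan, which is one way to express the selectivity effect described above.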

DISCUSSION
The recent characterization of a Bacteroides ovatus PUL targeting complex dietary β1,3-1,4-glucans provided molecular insight into the specificity of surface carbohydrate-binding proteins toward these mixed-linkage polysaccharides and demonstrated the crucial role of substrate binding by the SusD homologue BoSGBPMLG-A in the functionality of the PUL (15). In particular, the X-ray crystal structure of BoSGBPMLG-A in complex with a barley-derived heptasaccharide (G4G4G3G4G4G3G) revealed that the first level of specificity is determined by shape complementarity of the binding site with the twisted conformation of the glucan chain with mixed β1,3-1,4 linkages (15). Here, we combine the power of carbohydrate microarray technology with quantitative affinity studies and high-resolution structure determination of BoSGBPMLG-A in complex with a higher-affinity oligosaccharide ligand that we identified. We thereby visualize in detail the molecular determinants underlying the unique specificity of BoSGBPMLG-A toward glucans with mixed β1,3-1,4 linkages.
We demonstrate that the binding specificity of BoSGBPMLG-A toward β1,3-1,4-gluco-oligosaccharide sequences differs from those of other β-glucan-binding proteins (e.g., CtCBM11 and CmCBM6-2), with binding affinity depending on chain length and on the number and position of β1,3 linkages. While corroborating that BoSGBPMLG-A is highly specific toward mixed-linkage β1,3-1,4-gluco-oligosaccharides, with a minimum chain length of DP5/DP6 (15), we demonstrate that the affinity increases with the addition of a β1,3 linkage after every three glucose units. The crystal structure of the BoSGBPMLG-A-barley-9 (G4G3G4G4G3G4G4G3G) complex reveals an ~41-Å-long binding site (compared to the 36-Å-long binding site observed with the heptasaccharide), comprising an extended platform that accommodates the oligosaccharide from its reducing end up to Glc8. As observed in the structure of BoSGBPMLG-A complexed with the heptasaccharide (15), a triad of aromatic amino acids (Trp77, Trp350, and Trp353) forms a CH-π stacking surface that accommodates the internal sequence of the oligosaccharide. These interactions dominate the specificity and set the minimum motif length by creating a curvature in the binding site into which the kink of the β1,3 glycosidic bond between Glc4 and Glc5 fits best. Mutating any of the triad residues to alanine abrogates (Trp77 and Trp350) or significantly diminishes (Trp353) the affinity of the interaction. Thus, the minimum binding motif must contain at least 5 glucose residues and the sequence G3G4G4G3G.
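The proposed minimum motif can be checked against the sequences discussed here with a short, hypothetical helper that reports where G3G4G4G3G occurs, using the reducing-end numbering of the structure (Glc1 = reducing end). It assumes the condensed notation (single-letter residues, single-digit linkages, nonreducing end on the left) and says nothing about affinity; it only shows which residues could host the motif.

```python
def motif_spans(seq: str, motif: str) -> list[tuple[int, int]]:
    """(Glc_high, Glc_low) spans of every possibly overlapping occurrence of
    motif, with Glc1 = reducing end (rightmost residue in seq)."""
    n = seq.count("G")
    size = motif.count("G")
    spans = []
    for start in range(0, len(seq) - len(motif) + 1, 2):  # residues at even positions
        if seq.startswith(motif, start):
            left = start // 2  # 0-based index counted from the nonreducing end
            spans.append((n - left, n - left - size + 1))
    return spans


MOTIF = "G3G4G4G3G"
for name, seq in {
    "barley-9": "G4G3G4G4G3G4G4G3G",
    "barley-7": "G4G4G3G4G4G3G",
    "cello-9": "G4" * 8 + "G",
}.items():
    print(name, motif_spans(seq, MOTIF))
# barley-9 [(8, 4), (5, 1)]
# barley-7 [(5, 1)]
# cello-9 []
```

In barley-9 the motif occurs twice, with the two copies overlapping at Glc5 and Glc4, the residues joined by the β1,3 kink that the Trp triad accommodates.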
The high-resolution data obtained for the BoSGBPMLG-A-barley-9 complex identified an ordered water network that promotes extended intermolecular hydrogen bonding essential to support the long oligosaccharide ligand. In particular, the side chain of Arg378 makes four water-mediated hydrogen bonds that likely contribute to specificity by stabilizing the Glc4-Glc6 stretch that accommodates the minimum binding motif. The importance of this residue for specificity is supported by the effect of the Arg378Ala point mutation, which reduced the affinity of BoSGBPMLG-A for barley β-glucan more than its affinity for xyloglucan and had an effect comparable to that of the mutation of Tyr266, which stabilizes the reducing end through a CH-π interaction. The importance of solvent organization for the affinity of BoSGBPMLG-A is also evidenced by the Trp353-Met376 pair, which sets the preference for longer oligosaccharides by bridging through two water molecules to interact with Glc8. The kink imposed on the oligosaccharide chain by the third β1,3 glycosidic linkage, between Glc7 and Glc8, brings Glc8 closer to the protein surface and likely promotes the interaction through these water-mediated contacts. Thus, the water-mediated interactions tailor the BoSGBPMLG-A binding site to mixed-linkage glucans and contribute to its specificity and to the increase in affinity with oligosaccharide chain length. Ordered water molecules are often observed in carbohydrate-protein crystal structures, allowing additional specific interactions between the protein and the ligand (23). The importance of such water-mediated hydrogen bonding networks in modulating ligand affinity and specificity in protein-ligand interactions is increasingly recognized, with particular attention to the balance of enthalpic and entropic contributions to binding (23)(24)(25)(26).
The direct hydrogen bonding/electrostatic interactions at the reducing end and the additional water network interactions mediating carbohydrate binding by BoSGBPMLG-A likely give the binding site the flexibility to accommodate structural differences among mixed-linkage glucans, i.e., different ratios of cellotriosyl (G4G4G) and cellotetraosyl (G4G4G4G) units spaced by β1,3 linkages, explaining the ability, demonstrated here, to recognize mixed β1,3-1,4-linked glucans from different sources.
The protein-sugar-sugar-protein assembly that constitutes the asymmetric unit of the BoSGBPMLG-A-barley-9 complex (Fig. S2) may result from a high local concentration of the oligosaccharide, compensated by the formation of aggregates that may have aided crystal nucleation. The observed assembly could also reflect a unique means for B. ovatus to entrap long oligosaccharides and prevent the formation of aggregates at high oligosaccharide concentrations at the cell surface, but its biological relevance requires experimental validation.
Our study and those of Tamura and colleagues (14,15) raise the question of how the preference of BoSGBPMLG-A for longer oligosaccharides affects the function of this PUL-encoded system, which occurs widely in Bacteroidetes species. We hypothesize that BoSGBPMLG-A confers specificity and preference for the import of long β1,3-1,4-glucan oligosaccharides, generated by the GH16 endo-glucanase (BACOVA_02741), through the TBDT (BACOVA_02742). BoSGBPMLG-A would then likely play a role in the rapid import and scavenging of long oligosaccharides into the periplasm, rather than in binding the polysaccharide at the cell surface to enhance its hydrolysis by the enzyme, as initially proposed for the function of these proteins (27). This would complement the action of the other SGBP, BoSGBPMLG-B (BACOVA_02744), which is suggested to assist in capturing oligosaccharides from the environment (15).
Recent studies demonstrate that SusD homologues of Bacteroides thetaiotaomicron can be intimately associated with the cognate TBDT SusC, forming a complex that works as a "pedal-bin" mechanism for import, with SusD functioning as a lid that opens and closes for carbohydrate capture and delivery to the transporter (28,29). For MLGULs, the mechanism of import of mixed-linkage β1,3-1,4-gluco-oligosaccharides has yet to be elucidated. Nevertheless, this hypothesis would explain the evolution of BoSGBPMLG-A toward a higher affinity for longer oligosaccharides, as observed for other SusD homologues (29,30). As endo-acting Bacteroides glycanases generate large oligosaccharides (31,32), BoSGBPMLG-A could also be optimized to bind the major products generated locally by the PUL endo-glucanase. The preference for longer oligosaccharides would be energetically more favorable and would enable B. ovatus and other Bacteroidetes to outcompete bacteria that lack this PUL-encoded system.

MATERIALS AND METHODS
Plasmid construction and site-directed mutagenesis. The molecular architecture of each construct and the sequences of the primers designed for it are shown in Fig. 1D and Table S6. All constructs were truncated to exclude the signal peptide and the N-terminal lipidated cysteine residue, predicted with SignalP 4.1 and LipoP 1.0, respectively (33,34). All resulting recombinant proteins carry an N-terminal hexahistidine (His6) tag for purification by immobilized metal ion affinity chromatography (IMAC) and for immunodetection.
For carbohydrate-binding experiments (microarray, ITC, and MST analysis), high-throughput cloning of the putative carbohydrate-binding SusD-like protein BoSGBPMLG-A (Fig. 1C) was performed following the established protocols of NZYTech Ltd. (Lisbon, Portugal). In brief, the BACOVA_02743 gene encoding the SusD homologue (amino acid residues 24 to 558) was amplified by PCR from Bacteroides ovatus strain ATCC 8483 (NCBI:txid411476) genomic DNA using specific primers. The PCR product was subsequently cloned into the pHTP1 expression vector using the NZYEasy cloning kit (NZYTech Ltd., Portugal), which is based on ligation-independent cloning (LIC). The pHTP1 vector contains a kanamycin resistance cassette for selection. In the resulting construct, pHTP1-A57, expression of the recombinant protein is under the control of a T7 promoter.

For crystallization, BoSGBPMLG-A (residues 39 to 558) was recloned into the pNIC-ZB vector (Structural Genomics Consortium [SGC]; GenBank accession number GU452710), also by LIC and with pHTP1-A57 DNA as the template, yielding the BoSGBPMLG-A-ZB construct (Fig. 1C) (35). Sequences complementary to pNIC-ZB were included in the forward and reverse primers for LIC. The vector was linearized with the restriction enzyme BsaI (Thermo Fisher Scientific). Both the linearized vector and the amplified insert were treated with T4 DNA polymerase (Thermo Fisher Scientific) to create complementary overhangs and were annealed at a 1:2 vector/insert molar ratio. For mutagenesis, site-directed mutants of BoSGBPMLG-A were created in the pHTP1-A57 vector using the PCR-based NZYMutagenesis kit (NZYTech Ltd., Portugal) according to the manufacturer's instructions. The fidelity of all plasmid constructs was verified by DNA sequencing (Stab Vida Lda, Portugal).
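The 1:2 vector/insert molar ratio used for annealing can be converted into DNA masses because, for double-stranded DNA, the number of moles is approximately proportional to mass divided by length. The sketch below relies on that approximation. Only the residue boundaries (39 to 558) come from the text; the vector length and vector mass in the usage example are hypothetical placeholders, the helper names are illustrative, and the insert length ignores the stop codon and the LIC primer overhangs.

```python
def orf_length_bp(first_residue: int, last_residue: int) -> int:
    """Coding length (bp) of a residue range: three nucleotides per codon, no stop codon."""
    return (last_residue - first_residue + 1) * 3


def insert_mass_ng(vector_ng: float, vector_bp: int, insert_bp: int,
                   insert_per_vector: float = 2.0) -> float:
    """Insert mass (ng) giving `insert_per_vector` moles of insert per mole of vector.

    Moles of dsDNA scale approximately as mass / length, so the required insert
    mass is vector_ng * (insert_bp / vector_bp) * insert_per_vector.
    """
    return vector_ng * (insert_bp / vector_bp) * insert_per_vector


if __name__ == "__main__":
    insert_bp = orf_length_bp(39, 558)  # residues 39-558 of BoSGBPMLG-A
    vector_bp = 5600                    # hypothetical linearized vector length (bp)
    needed = insert_mass_ng(50.0, vector_bp, insert_bp)  # hypothetical 50 ng vector, 1:2 ratio
    print(f"{insert_bp} bp insert: {needed:.1f} ng per 50 ng vector")
```

Working in masses rather than molarities reflects how DNA is usually quantified in practice (by absorbance, in ng/μl).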
Protein production and purification. For carbohydrate microarray binding assays, high-throughput expression and purification of BoSGBPMLG-A was done using the established protocols of NZYTech Ltd. (Lisbon, Portugal) (36). In brief, Escherichia coli strain BL21(DE3) harboring the plasmid pHTP1-A57 was cultured in NZY autoinduction Luria-Bertani (LB) medium (NZYTech Ltd., Portugal) supplemented with 50 μg/ml kanamycin at 37°C until reaching an optical density at 600 nm (OD600) of ≈1.5, followed by overnight incubation at 25°C. Cells were resuspended in NZY bacterial cell lysis buffer, and the protein was purified by IMAC in a 96-well-plate vacuum manifold format and eluted in 50 mM HEPES buffer, pH 7.5, containing 1 M NaCl, 5 mM CaCl2, and 300 mM imidazole.
Polysaccharides, oligosaccharides, and carbohydrate-binding proteins. The soluble polysaccharides used for ITC analysis (barley β-glucan and tamarind seed xyloglucan) were purchased from Megazyme (Bray, Ireland) and prepared according to the manufacturer's instructions. The barley-4b oligosaccharide used in the MST analysis was purchased from Megazyme (O-BGTETB). The β1,4-linked gluco-oligosaccharides (cello-DP7 and -DP9) and the mixed-linkage β1,3-1,4-gluco-oligosaccharides DP6 to DP9 used in the MST analysis were prepared by controlled acid hydrolysis of cellulose acetate (20) and by controlled enzymatic hydrolysis of barley β-glucan with lichenase (Megazyme), respectively. The resulting mixtures were fractionated by gel filtration chromatography on a Bio-Gel P4 column (Bio-Rad) and analyzed by mass spectrometry (20) and NMR (unpublished data).
The polysaccharides, glycoproteins, and oligosaccharides used in the microarray analysis are described in Tables S1 and S4. Selected carbohydrate-binding proteins with characterized specificity were analyzed as controls and comprised monoclonal antibodies, bacterial carbohydrate-binding modules (CBMs), and lectins. These are detailed in Table S3. All CBMs and human malectin were produced in E. coli as recombinant proteins with an N-terminal His6 tag, except for CtCBM11 from Clostridium thermocellum, which carried a C-terminal His6 tag. CtCBM11 (22) and human malectin (37) were prepared as previously described. CmCBM6-2, from Cellvibrio mixtus, was kindly provided by Harry Gilbert (University of Newcastle, UK). TmCBM41, from the marine hyperthermophile Thermotoga maritima, was kindly provided by Alisdair Boraston (University of Victoria, Canada).
Carbohydrate microarray analysis. Information on the saccharide and oligosaccharide probes, generation of the microarrays, imaging, and data analysis is provided in the supplemental glycan microarray document (Table S7), prepared according to the MIRAGE (minimum information required for a glycomics experiment) guidelines (38).
For microarray screening analysis, two types of microarrays were used: (i) a "fungal, bacterial, microalgae, and plant saccharide microarray" featuring 32 saccharides (polysaccharides and glycoproteins) derived from fungi, bacteria, microalgae, and plants (Table S1), and (ii) a "gluco-oligosaccharide microarray" comprising 153 sequence-defined gluco-oligosaccharides (Table S4). The oligosaccharides (up to 100 μg) were prepared as neoglycolipid (NGL) probes by microscale conjugation via the reducing-end glucose to the aminooxy (AO)-functionalized lipid 1,2-dihexadecyl-sn-glycero-3-phosphoethanolamine (AOPE) by oxime ligation in a solvent system such as CHCl3/MeOH/H2O/AcOH (25:25:8:1), analyzed by high-performance thin-layer chromatography (HPTLC) and mass spectrometry, and accurately quantified in solution as previously described (20,39). To construct the microarrays, the polysaccharides and glycoproteins (0.03 and 0.1 ng per spot) or the NGL probes (2 and 5 fmol per spot, in the form of liposomes) were printed and immobilized noncovalently as duplicate spots on nitrocellulose-coated glass slides (UniSart 3D microarray slide; Sartorius, Goettingen, Germany), following established protocols (20,40). The fluorescent dye cyanine 3 was included in the printing solution as a tracer for quality control of the arraying process and for localization of the printed spots.
The microarrays were probed with BoSGBPMLG-A following described protocols (20,40). In brief, the nitrocellulose surface was blocked with 3% bovine serum albumin (BSA) (product number A8577; Sigma-Aldrich) in 5 mM HEPES buffer, pH 7.4, containing 150 mM NaCl and 5 mM CaCl2 (3% BSA in HBS-Ca), followed by incubation with the protein diluted in binding buffer (1% BSA in HBS-Ca). For the saccharide microarrays, BoSGBPMLG-A was analyzed at 100 μg/ml, and binding was detected using an anti-polyhistidine monoclonal antibody (Ab1) (product number H1029; Sigma-Aldrich) preincubated for 15 min with a biotinylated anti-mouse IgG (Ab2) (product number B7264; Sigma-Aldrich) at 10 μg/ml. For the gluco-oligosaccharide microarray, BoSGBPMLG-A precomplexed with Ab1 and Ab2 at a 1:2:2 ratio (by weight) was analyzed at 20 μg/ml. The protein-antibody complexes were prepared by preincubating Ab1 and Ab2 for 15 min, followed by incubation with BoSGBPMLG-A for 15 min and final dilution in binding buffer for the microarray overlay.
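The by-weight precomplexing ratios translate directly into antibody concentrations. As a minimal sketch, assuming that the stated overlay concentration (for example, 20 μg/ml) refers to BoSGBPMLG-A itself rather than to the total protein in the complex, the amounts of Ab1 and Ab2 follow by simple scaling. The function name is illustrative only.

```python
def precomplex_concentrations(protein_ug_ml: float,
                              ratio: tuple[float, float, float] = (1.0, 2.0, 2.0)) -> dict[str, float]:
    """Scale a protein:Ab1:Ab2 weight ratio to a given protein concentration (ug/ml).

    Assumes (hypothetically) that the overlay concentration refers to the
    His-tagged protein alone, not to the whole protein-antibody complex.
    """
    protein_part, ab1_part, ab2_part = ratio
    scale = protein_ug_ml / protein_part
    return {
        "protein": protein_ug_ml,
        "Ab1": ab1_part * scale,
        "Ab2": ab2_part * scale,
    }


if __name__ == "__main__":
    # BoSGBPMLG-A on the gluco-oligosaccharide microarray, 1:2:2 by weight
    print(precomplex_concentrations(20.0))
    # A CBM control at 5 ug/ml, 1:3:3 by weight, as used for the CBMs
    print(precomplex_concentrations(5.0, (1.0, 3.0, 3.0)))
```

Under this assumption, 20 μg/ml BoSGBPMLG-A corresponds to 40 μg/ml each of Ab1 and Ab2.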
In parallel, the microarrays were analyzed with sequence-specific proteins for quality control and data validation (Table S3). The His-tagged murine dectin-1 and human malectin lectins were tested at 5 μg/ml, and binding was detected with the Ab1-Ab2 antibody precomplex at a final concentration of 10 μg/ml in binding buffer. The His-tagged CBMs TmCBM41, CmCBM6-2, and CtCBM11 were analyzed at final concentrations of 5 to 20 μg/ml, precomplexed with Ab1 and Ab2 at a 1:3:3 ratio (by weight) and diluted in 1% BSA in HBS-Ca, after blocking with 3% BSA in HBS-Ca. The monoclonal antibodies BS400-2, BS400-3, BS400-4, LM5, LM6, LM11, LM21, and LM25 were analyzed at 10 μg/ml, with specific biotinylated secondary antibodies for detection. In brief, after blocking with 0.02% casein (catalog number 37583; Thermo Scientific) and 1% BSA in HBS-Ca, the microarrays were probed with the antibodies prepared in the same buffer, followed by incubation with 3 μg/ml biotinylated anti-mouse IgG (product number B7264; Sigma-Aldrich), anti-rat IgG (product number B7139; Sigma-Aldrich), or anti-rat IgM (catalog number 612-4607; Rockland), as appropriate. The lectins Aleuria aurantia lectin (AAL) and concanavalin A (ConA) were analyzed at 2 and 5 μg/ml, respectively, using a single-step overlay protocol for biotin-tagged samples. In brief, the array was blocked with 3% BSA in HBS-Ca, followed by incubation with the lectin solutions prepared in binding buffer.
For all analyses, Alexa Fluor 647-labeled streptavidin (Molecular Probes; 1 μg/ml) was used for the fluorescence readout. Binding assays were conducted at room temperature. All microarray slides were scanned with a GenePix 4300A fluorescence scanner, and fluorescence was quantified using GenePix Pro software (Molecular Devices). Microarray data were analyzed using software developed by Mark Stoll of the Glycosciences Laboratory (Imperial College London, UK) (41). The parameters for recording the fluorescence images were selected considering the signal-to-noise ratio and signal saturation in the different experiments; these are detailed in the MIRAGE (38) document in Table S7. The binding signals in the microarrays were dose dependent. Results are plotted as the average of two replicates for binding signals at 0.1 ng/spot (saccharides) or 5 fmol/spot (NGL probes).
MST. The affinities of BoSGBPMLG-A for different mixed-linkage β1,3-1,4- and β1,4-gluco-oligosaccharides (barley- and cello-oligosaccharides, respectively) were measured by microscale thermophoresis (MST) in a Monolith NT.115 instrument (NanoTemper Technologies, Germany). BoSGBPMLG-A was labeled with the red fluorescent dye NT-647 using the Monolith NT protein-labeling kit red-NHS (N-hydroxysuccinimide) (catalog number L001; NanoTemper Technologies, Germany) according to the manufacturer's instructions. Labeled protein was eluted in 50 mM HEPES, pH 7.5, 100 mM NaCl, 5 mM CaCl2, 5 mM TCEP with 0.05% (vol/vol) Tween 20. For binding analysis, 16 serial dilutions of each oligosaccharide were prepared in the same buffer (1,000 to 0.03 μM or 100 to 0.003 μM) and mixed 1:1 with labeled BoSGBPMLG-A (final concentration, 50 or 100 nM). The samples were incubated at room temperature for 15 min and loaded into standard treated capillaries (catalog number MO-KO22; NanoTemper Technologies, Germany). MST traces were recorded at 25°C (40% light-emitting diode [LED] power and medium MST power) using MO.Control software version 1.6. Triplicate independent measurements were analyzed with MO.Affinity Analysis software version 2.3 (NanoTemper Technologies, Germany) to calculate the binding affinity, expressed as the association constant (Ka).
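The dilution range above is consistent with a 2-fold series: 16 points starting at 1,000 μM end at 1,000/2^15 ≈ 0.03 μM (and likewise 100 to ≈0.003 μM). The sketch below reproduces that series and, for illustration, evaluates the standard 1:1 law-of-mass-action model with ligand depletion that is commonly used to fit MST dose-response curves. It is not a description of the internal fitting routine of MO.Affinity Analysis; the function names and the Kd used in the example are hypothetical, and whether the quoted series refers to concentrations before or after the 1:1 mixing is not specified here.

```python
import math


def serial_dilution(top_uM: float, n_points: int = 16, factor: float = 2.0) -> list[float]:
    """Concentrations (uM) of an n-point serial dilution starting at top_uM."""
    return [top_uM / factor**i for i in range(n_points)]


def fraction_bound(ligand_uM: float, target_uM: float, kd_uM: float) -> float:
    """Fraction of labeled target in complex for 1:1 binding with ligand depletion.

    Takes the physical (smaller) root of
    [AB]^2 - ([A] + [B] + Kd)[AB] + [A][B] = 0.
    """
    s = ligand_uM + target_uM + kd_uM
    complex_uM = (s - math.sqrt(s * s - 4.0 * ligand_uM * target_uM)) / 2.0
    return complex_uM / target_uM


def association_constant_per_M(kd_uM: float) -> float:
    """Ka (M^-1) as the reciprocal of a Kd given in micromolar."""
    return 1e6 / kd_uM


if __name__ == "__main__":
    series = serial_dilution(1000.0)
    print(f"{series[0]:.0f} to {series[-1]:.3f} uM in {len(series)} points")
    # Hypothetical Kd of 10 uM with 50 nM (0.05 uM) labeled protein
    for conc in series[::5]:
        print(f"{conc:9.3f} uM -> {fraction_bound(conc, 0.05, 10.0):.3f} bound")
    print(f"Ka = {association_constant_per_M(10.0):.1e} M^-1")
```

Because the labeled protein (50 to 100 nM) is far below the ligand concentrations that matter for the fit, ligand depletion has little effect here, and the curve is close to a simple hyperbola centered at Kd.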
ITC. The thermodynamic parameters of binding of BoSGBPMLG-A and its mutants to soluble polysaccharides were quantified by isothermal titration calorimetry (ITC) using a VP-ITC calorimeter (MicroCal, Northampton, MA, USA) at 25°C. All proteins were buffer exchanged into 50 mM HEPES, pH 7.5, 100 mM NaCl, 5 mM CaCl2, 5 mM TCEP, and the polysaccharides were prepared in the same buffer to minimize heats of dilution. During titration, BoSGBPMLG-A proteins (33 to 53 μM) were stirred in the reaction cell (329 rpm) while receiving 28 successive injections (220-s spacing) of 10 μl of polysaccharide (2.5 mg/ml). Integrated heats were corrected by subtracting those of a control titration of carbohydrate into buffer and analyzed by nonlinear regression using a single-site binding model to obtain the association constant (Ka) and binding enthalpy (ΔH) (MicroCal Origin version 7.0; MicroCal Software). The standard Gibbs energy change (ΔG°) and standard entropy change (ΔS°) were calculated using the relationship $\Delta G^\circ = -RT\ln K_a = \Delta H - T\Delta S^\circ$, where R is the gas constant and T is the absolute temperature (K). For polysaccharides, a binding stoichiometry of 1:1 (N = 1) was assumed, because the molar concentration of binding sites on a polysaccharide cannot be derived directly from its mass concentration.
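As a worked illustration of the relationship above, ΔG° follows from Ka and the temperature, and TΔS° is then the difference ΔH − ΔG°. The Ka and ΔH values in the example are hypothetical round numbers chosen for the arithmetic, not results from this study, and the function name is illustrative.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant, kJ mol^-1 K^-1


def binding_thermodynamics(ka_per_M: float, dH_kJ_mol: float,
                           temperature_K: float = 298.15) -> dict[str, float]:
    """Return dG and T*dS (kJ/mol) and dS (kJ/mol/K) from Ka and dH.

    Uses dG = -RT ln Ka and dG = dH - T*dS, so T*dS = dH - dG.
    """
    dG = -R_KJ_PER_MOL_K * temperature_K * math.log(ka_per_M)
    T_dS = dH_kJ_mol - dG
    return {"dG": dG, "T_dS": T_dS, "dS": T_dS / temperature_K}


if __name__ == "__main__":
    # Hypothetical values at 25 degrees C: Ka = 1e5 M^-1, dH = -40 kJ/mol
    result = binding_thermodynamics(1e5, -40.0)
    print({k: round(v, 3) for k, v in result.items()})
```

With these hypothetical inputs, ΔG° ≈ −28.5 kJ/mol and TΔS° ≈ −11.5 kJ/mol.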
DSC. The structural integrity of recombinant BoSGBPMLG-A and mutants was confirmed by differential scanning calorimetry (DSC) using a Nano DSC (TA Instruments, New Castle, DE, USA). Samples were diluted to 0.8 mg/ml in 50 mM HEPES (pH 7.5), 100 mM NaCl, and 5 mM CaCl2. DSC scans were