Multimodularity of a GH10 Xylanase Found in the Termite Gut Metagenome

Xylan is the major hemicellulosic polysaccharide in cereals and contributes to the recalcitrance of the plant cell wall toward degradation. Bacteroidetes, one of the main phyla in rumen and human gut microbiota, have been shown to encode polysaccharide utilization loci dedicated to the degradation of xylan. Here, we present the biochemical characterization of a xylanase encoded by a Bacteroidetes strain isolated from the termite gut metagenome.

IMPORTANCE Xylan is the major hemicellulosic polysaccharide in cereals and contributes to the recalcitrance of the plant cell wall toward degradation. Members of the Bacteroidetes, one of the main phyla in rumen and human gut microbiota, have been shown to encode polysaccharide utilization loci dedicated to the degradation of xylan. Here, we present the biochemical characterization of a xylanase encoded by a Bacteroidetes strain isolated from the termite gut metagenome. This xylanase is a multimodular enzyme, the sequence of which is interrupted by the insertion of two CBMs from family 4. Our results show that this enzyme resembles homologues that were shown to be important for xylan degradation in rumen or human diet and that the CBM insertion in the middle of the sequence seems to be a common feature in xylan utilization systems. This study sheds light on xylan degradation and plant cell wall deconstruction, knowledge that can be applied in food, feed, and the bioeconomy.

KEYWORDS termite gut, lignocellulose, glycoside hydrolase, carbohydrate-binding module, xylanase, PUL, GH10, CBM4, protein domain insertion, functional genomics

Xylan is the most abundant hemicellulose present in cell walls of higher plants, especially cereal grains and hardwoods (1). The xylan main chain is composed of β-1,4-linked D-xylopyranosyl (D-Xylp) residues that can bear substitutions at O-2 and/or O-3 positions. L-Arabinofuranosyl (L-Araf), 4-O-methyl glucuronyl (D-MeGlcAp), and acetyl residues are frequent main-chain substituents, and L-Araf moieties can be esterified by ferulate at their O-5 position. The nature of xylan backbone decorations varies depending on the species, the tissue, and the stage of development of the plant (2).
Generally, graminaceous plants are rich in glucuronoarabinoxylan (GAX), while glucuronoxylan (GX) is found in dicots, the difference between these two categories being the relative amounts of L-Araf and D-MeGlcAp present. Complete xylan degradation requires an extensive arsenal of enzymes that can act synergistically (3). The main chain is depolymerized by β-D-xylanases (EC 3. Xylanases are mainly found in the glycoside hydrolase (GH) families 5, 8, 10, 11, 30, and 43 in the CAZy database (www.cazy.org) (5). The GH10 family constitutes a monospecific family that includes only endo-xylanases. Enzymes from this family perform catalysis via a retaining mechanism (6), and their canonical three-dimensional (3D) structure is a TIM barrel, (β/α)8, which is the most commonly known (2,077 occurrences) protein fold in the Protein Data Bank (PDB) and which forms an active cleft able to accommodate up to seven xylosyl backbone units (7). In addition, according to the Pfam database (http://pfam.xfam.org/), 20 to 30% of β-D-xylanases are multidomain proteins, comprising catalytic domains associated with accessory or helper domains, such as carbohydrate-binding modules (CBMs). The latter have been attributed various roles, including the ability to target specific regions in substrates (8), disrupt polysaccharide structure (9), or anchor enzymes to bacterial surfaces (10). In multidomain proteins, individual domains are defined as the structural, functional, or evolutionary units of proteins (11) and can be regarded as biological equivalents of components in complex devices whose parts can be interchanged. Mostly, domains in proteins are sequentially organized, with one domain following another one. However, around 10% to 20% of domain combinations are discontinuous, with one domain being inserted into another one (12).
Termites are wood-feeding animals that are considered an abundant source of biomass-degrading enzymes (13). Termites produce very few endogenous lignocellulose-degrading enzymes, and their gut microbiome is mainly responsible for their ability to capture nutrients and energy from plant biomass (14, 15). Over the last decade, numerous metagenomics studies revealed enzyme arsenals of termite gut microbiomes and detected promising enzymes for industrial use (16-20). Notably, Gram-negative Bacteroidetes, the dominant phylum in many animal digestive systems (21-25), utilize finely tuned glycan utilization systems. The paradigm for this type of system was provided by the well-studied starch utilization system (Sus) (26). In Sus-like systems, several proteins are encoded by genes found in a cluster (known as polysaccharide utilization loci, or PUL) and act in a coordinated manner to bind and hydrolyze complex sugars and utilize them for their metabolism (27). A xylan utilization system (Xus) that is composed of two outer membrane polysaccharide-binding proteins (XusB and XusD), two transporter proteins (XusA and XusC), and two outer membrane proteins (XusE and Xyn10C) was previously described in rumen and human digestive systems (28). Each of these proteins is expressed from a cluster of tandem genes that are organized as xusA-xusB-xusC-xusD (or sometimes only xusC-xusD), followed by xusE and xyn10C, the latter encoding a CBM-containing GH10 β-D-xylanase (28).
According to previous data, the CBMs in Xyn10C are inserted into the polypeptide sequence of the GH10 catalytic domain between structural elements β3 and α3 of the TIM barrel (29). The expression of Xyn10C was shown to be induced by xylan (30) along with the other xylanases, XynA and XynB. The most effective inducer was shown to be a xylooligosaccharide with a degree of polymerization (DP) around 35, similar to the hydrolysates of Xyn10C (31). Altogether, this is consistent with the hypothesis that Xyn10C serves as a functional homologue of the Bacteroides thetaiotaomicron VPI-5482 SusG protein, initiating xylan metabolism through extracellular hydrolysis of polymeric substrates (28). In this regard, it has been proposed that Xyn10C be used as a functional marker of xylan degradation in the human gut (28, 30, 32). The potential roles and distributions of Xyn10C have recently attracted considerable attention but have not yet been fully described (29, 30, 32, 33).
Previously, a putative xus locus assigned to the genus Bacteroides, Gram-negative anaerobic bacteria, was identified in a metagenomic library from the microbiome of a fungus-growing termite, Pseudacanthotermes militaris (17). This xus is composed of eight different open reading frames (ORFs) encoding putative XusC/D-like proteins, unknown protein (UNK), GH10 containing an insertion of two CBM4s (GH10|CBM4), GH115, GH11, a putative transporter protein, GH10, and GH43 (Fig. 1A). The GH10|CBM4 protein, designated P. militaris 25 (Pm25) here, presents an insertional modular structure homologous to Xyn10C protein (Fig. 1B). Here, we describe the characterization of Pm25 and discuss its activity with respect to its unusual multidomain organization. In addition, the potential function of the UNK was also investigated.
(This research was conducted by H. Wu in partial fulfillment of the requirements for a Ph.D. degree from Toulouse University [34].)

RESULTS
Bioinformatics analysis of the Pm25-encoding gene sequence. Analysis of the primary amino acid sequence of Pm25 revealed a multimodular architecture composed of a signal peptide (residues 1 to 32), a GH10 catalytic domain (amino acid residues 66 to 151 and 514 to 753), and two putative tandem CBM4s (CBM4-1, residues 161 to 321, and CBM4-2, residues 324 to 486) that constitute insertion domains (Fig. 1B). Alignment of the amino acid sequence of Pm25 with those of other GH10 family members for which structural data are available revealed that CBM4-1 and CBM4-2 are inserted between strand β3 and helix α3 of the (β/α)8 structure (Fig. 1C). Moreover, this alignment allowed the identification of E546 and E663 as the putative catalytic acid/base and nucleophile, respectively.
The SSN is centered on the main cluster, with other minor clusters scattered around. Pm25 is located in the bottom right cluster (Pm25_cluster), and all classified sequences (61 nodes) thereof are from Bacteroidetes.
To investigate the sequence differences between the Pm25_cluster and main cluster, sequences in each cluster were aligned before the sequence logo was constructed (Fig. 2B). The −2 subsite motif in Pm25_cluster is glycine instead of glutamate in the main cluster, and both glycine and glutamate at the −2 subsite were found in other enzymes (41-43). Importantly, the distance between the β3 motif and β4 motif is wider in Pm25_cluster (620 amino acids) than in the main cluster (92 amino acids), which indicated interrupted catalytic domains in Pm25_cluster.
CBM4-2 is the second CBM4 in Pm25. PDB 2Y6G, CBM4 from Rhodothermus marinus xylanase; 1GU3, CBM4 from Cellulomonas fimi endoglucanase C; 1GUI, CBM4 from Thermotoga maritima laminarinase 16A; 3K4Z, CBM4 from Clostridium thermocellum cellulase CbhA; 3P6B, CBM4 from Clostridium thermocellum cellulase CelK. Residues highlighted in gray were used for mutagenesis. Numbered residues in PDB sequences indicate that they are responsible for ligand binding according to the literature. The K in blue is potentially labeled with N-hydroxysuccinimide dye, which was described in Materials and Methods for the MST experiment.
To verify the relationship of Pm25_cluster with PUL, each sequence was searched against PULDB (http://www.cazy.org/PULDB/) (44). The majority of sequences (45 out of 61) were found in PUL. Among these, 44 out of 45 are always located downstream of hypothetical susC-susD-unk (see Table S2 in the supplemental material), suggesting that sequences in Pm25_cluster are Xyn10C-like proteins from the xylan utilization system in gut Bacteroidetes. The remaining sequences (16 out of 61) were not found in PUL, perhaps because of poor assignment.
Enzyme optimal pH and temperature. The optimal pH of Pm25 was determined using beechwood GX as the substrate. The purified enzyme retained greater than 80% of its activity in the pH range from 4.5 to 9.0 (Fig. 3A). The activity was also measured at temperatures from 50 to 75°C, with maximum activity being observed at 60°C (Fig. 3B). However, since Pm25 was rather unstable at 60°C, thermostability was measured at both pH 7.5 and pH 9 (Fig. 3C and D, respectively). Pm25 remained fully active for over 24 h at either 50°C (pH 7.5) or 45°C (pH 9). Overall, optimal conditions for routine Pm25 assays were defined as pH 7.5 and 50°C.
Enzyme assays and kinetic analysis. Determining the hydrolytic activity of Pm25 on various substrates revealed that it is 3- to 4-fold more active on arabinoxylans (30 U·mg−1) than on beechwood GX (7.4 U·mg−1), indicating that the latter is the least suitable substrate among those tested (Table 1). When comparing the relative activities of Pm25 on different arabinoxylans, no significant differences were detected, although rye arabinoxylan (RAX), which possesses more O-3 substitutions (45), qualifies as the best substrate tested. Testing the activity of the inactive mutants (M1 and M2) (Fig. 4) on different substrates revealed that they both displayed much lower (2 to 3 orders of magnitude) activity than Pm25 (Table S3).
Hydrolysis product analysis with XOSs. HPAEC-PAD analysis of reaction mixtures containing different XOSs and Pm25 revealed that the hydrolysis of X6 (after 60 min) produced a mixture of X2, X3, and X4 at a ratio of 1:2:1, reaching 74 μM (Fig. 5A). Likewise, the hydrolysis of X5 led to the production of X2 and X3 as major products at 75 and 96 μM, respectively (Fig. 5B), while the hydrolysis of X4 yielded X1 and X3 (Fig. 5C) at 81 and 112 μM, respectively. Moreover, the activity of Pm25 was directly correlated with the DP of the XOS used, with higher-DP XOS leading to higher activity (Table 1). This suggests that the Pm25 substrate binding cleft is quite large and can accommodate at least six xylosyl residues. To further study binding cleft subsite interactions, reactions were performed using Pm25 and different aryl-β-xylosides. Accordingly, the catalytic efficiency (kcat/Km) obtained when using p-nitrophenyl-β-D-xylotetraose (pNPX4) as the substrate was approximately 2.3-fold greater than that measured when using p-nitrophenyl-β-D-xylotriose (pNPX3) (Table 1). These data provided the basis to calculate the binding affinity at the −4 subsite, revealing a value of 0.53 kcal/mol.
Likewise, the −3 subsite binding affinity is 2.76 kcal/mol, which correlates with the fact that the kcat/Km value for the hydrolysis of pNPX3 is over 70-fold higher than that measured for the hydrolysis of p-nitrophenyl-β-D-xylobiose (pNPX2) (Table 1). Furthermore, the use of construct M6 (i.e., Pm25 devoid of CBMs) (Fig. 4) to hydrolyze aryl-β-xylosides revealed similar affinities at subsites −4 and −3 (0.75 and 2.69 kcal/mol, respectively), implying that the two CBM4 domains do not influence catalysis when using these substrates.
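The subsite affinities quoted above are consistent with the usual subsite-mapping relation, in which the affinity contributed by an additional subsite is RT·ln of the ratio of catalytic efficiencies for the longer and shorter substrates. The following is a minimal sketch under that assumption, using the 50°C assay temperature defined earlier; the function names are illustrative, and the inputs are the fold ratios quoted in the text rather than the Table 1 values.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def subsite_affinity(efficiency_ratio, temp_c=50.0):
    """Affinity magnitude (kcal/mol) of the extra subsite, assuming
    dG = R*T*ln[(kcat/Km)_longer / (kcat/Km)_shorter]."""
    if efficiency_ratio <= 0:
        raise ValueError("efficiency ratio must be positive")
    return R_KCAL * (temp_c + 273.15) * math.log(efficiency_ratio)


def ratio_from_affinity(affinity_kcal, temp_c=50.0):
    """Inverse relation: fold change in kcat/Km implied by an affinity."""
    return math.exp(affinity_kcal / (R_KCAL * (temp_c + 273.15)))


# A 2.3-fold gain from pNPX3 to pNPX4 gives about 0.53 kcal/mol.
print(round(subsite_affinity(2.3), 2))
# An affinity of 2.76 kcal/mol implies a ratio a little over 70-fold.
print(round(ratio_from_affinity(2.76), 1))
```

Under these assumptions the two reported values (0.53 and 2.76 kcal/mol) are recovered from, respectively, the 2.3-fold ratio and a ratio slightly above 70-fold.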
CBM and UNK ligand specificity. To probe the function of CBM4-1 and -2, various constructs with CBM deletions or inactivations (M1 Pm25 E546A, M7 Pm25ΔCBMs E546A, M8 CBM4-1, M9 CBM4-2, and M10 CBM4-1-CBM4-2) were designed and expressed (Fig. 4), and affinity gel electrophoresis experiments were performed. Since the results of bioinformatics analysis putatively assign CBM4-1 and -2 to CBM family 4, initial tests were performed on xylan and glucan before performing complementary tests with other polysaccharides (Table 2). All proteins displayed strong affinity for the different xylans tested. However, very low or no affinity was detected for β-glucan and xyloglucan, and none of the proteins bound to arabinan, nanocellulose, or galactomannan (Table 2). Importantly, in the presence of different xylans, the dissociation constant (Kd) of M10 (CBM4-1 and -2 in tandem) was much lower than that of M8 (CBM4-1 alone) and M9 (CBM4-2 alone), ranging from 3.2 to 9, 10.9 to 31.9, and 8.0 to 17.5 × 10−2 mg·ml−1, respectively, suggesting that there is cooperativity between the two CBMs. Comparing the behavior of the inactive mutants M1 and M7 with that of M10 revealed that the affinity of the inactive Pm25 M1 for xylans was similar to that exhibited by M10, the CBM4-1/-2 tandem, whereas the Kd values obtained for the inactive enzyme devoid of CBMs (i.e., M7) were significantly higher. This suggests that xylan binding is largely driven by the two CBM4 domains. Comparing M8 and M9 revealed that M9 systematically exhibited lower (1.4 to 1.8 times) Kd values for tests involving xylans, suggesting that its binding ability is stronger (Table 2). However, the results obtained with other polysaccharides indicate that CBM4-1 weakly binds to xyloglucan, while M9 (CBM4-2 alone) does not (Fig. S1). Overall, the results suggest that binding specificities of CBM4-1 and -2 are not identical.
To further probe the ligand binding ability of the two Pm25-associated CBM4 domains, different residues putatively involved in ligand binding were mutated (Fig. 6) based on sequence alignment (Fig. 1D) and structural comparison of CBM4 (unpublished data). Accordingly, the binding abilities of both CBM4-1|Y213A and CBM4-2|Y378A were lost, confirming that these residues play essential roles in the CBM-ligand interaction (Fig. 6). The binding abilities of CBM4-1|Y257A and CBM4-2|Y422A were also diminished, but ligand interactions were still observable, indicating that these tyrosines play less critical roles than Y213 and Y378, respectively (Fig. 6A and B). Likewise, other mutations, such as CBM4-1|Q216A, CBM4-1|W259A, CBM4-2|Q381A, and CBM4-2|W424A, also diminished ligand binding to some extent, but the mutation of asparagines (N218 and N261 in CBM4-1 and N383 in CBM4-2) had no apparent effect on binding, although these residues are close to the essential ones.
The Kd values of M7, M8, and M9 for X6 were determined using microscale thermophoresis (MST). The sigmoidal titration curves were used to calculate Kd values (Fig. S2). The Kd values obtained when using M7, M8, and M9 were 1.5 ± 0.2 mM, 4.7 ± 0.4 mM, and 1.8 ± 0.1 mM, respectively. Recalling that M7 is devoid of CBM domains, it is noteworthy that this construct displayed the highest binding ability, with M9 (i.e., CBM4-2) exhibiting a similar ligand binding ability.
Substrate depletion experiments performed using M1, M7, M8, and M9 on wheat bran revealed that the latter two constructs (CBM4-1 and CBM4-2, respectively) exhibited binding ability. Similarly, M1 (i.e., Pm25|E546A) was also able to bind to wheat bran despite its lack of catalytic activity (Fig. S3A). However, this was not the case for M7 (i.e., the inactivated catalytic domain alone), clearly demonstrating the role of the CBMs in binding.
The affinity of UNK toward low-viscosity wheat arabinoxylan (LVWAX) was investigated by affinity gel electrophoresis. The retardation of the UNK band suggested that UNK can bind to xylan (Fig. S3B).
Analysis of polysaccharide and wheat bran hydrolysis. HPAEC-PAD of soluble polysaccharide hydrolysis mediated by Pm25 and its variants M5 and M6 failed to reveal significant differences in D-Xyl and XOS release (Fig. 7). This suggests that the CBM4 domains do not enhance degradation of soluble polysaccharides, since M6 is devoid of CBM4 domains. Performing a similar analysis using wheat bran as the substrate revealed that (after 14 h of incubation) D-Xyl and XOS release by Pm25 was approximately twice that of the variants M3, M4, M5, and M6 (Fig. 8). Significantly, in this experiment the consequences of the point mutations CBM4-1|Y213A and CBM4-2|Y378A were approximately equivalent to those produced by the ablation of the two CBM4 domains.

DISCUSSION
Unlike the vast majority of multimodular enzymes that display a sequential arrangement of their modules, the enzyme described here is characterized by a discontinuous organization that involves the insertion of two CBM domains into one GH10 xylanase domain. In this regard, it is significant that the SSN analysis performed using the amino acid sequence of M6 replacing Pm25 located the sequence within the same cluster, even though the CBMs were omitted (data not shown). This suggests that the Pm25 GH10 domain forms part of a distinct group and implies that the intercalated GH10 arrangement is robust from an evolutionary standpoint. Moreover, the biochemical data described here demonstrate that, despite its discontinuous organization, Pm25 is a fully functional xylanase. The first Pm25 analog was identified in a rumen-based member of the Bacteroidetes phylum (29). More have since been found in human gut bacteria (30,32,46), with Pm25 being the first described in termite gut. Several studies have revealed the importance of Pm25-like GH10 in xylan utilization systems (29,30,32,33). Using SSN analysis, we have shown that Pm25-like xylanases are exclusively linked to Bacteroidetes and are mostly (44 out of 61 based on SSN analysis) adjacent to an susC-susD-(unk) cluster. This evidence of strong conservation is consistent with the fact that in their native host, the genes encoding Pm25 homologs are highly induced/expressed during growth on xylan (30,46). In addition, our data show that the UNK protein upstream of Pm25 is a xylanbinding protein that strengthens the xylan utilization function of this core cluster (46), suggesting it is an analogue of SusE, which is also supported by the fact that like SusE, UNK is predicted to have a lipoprotein peptide signal by SignalP (47). Taken together, one can conclude that each component in the core cluster is essential for xylan utilization by members of the Bacteroidetes phylum in the gut ecosystem.
The in vivo function of Pm25 homologs in gut Bacteroidetes has not yet been fully established, although it has been suggested that it is a functional homolog of SusG (28). SusG is a cell surface-bound GH13 α-amylase that catalyzes the initial cleavage of polysaccharides (48). In our study, we also predict that Pm25 bears an N-terminal signal peptide that directs it to the cell surface, consistent with a proposal that was previously made for a Pm25 homolog (32). Moreover, SusG displays negligible activity compared to periplasmic α-amylases (48), an observation that is consistent with our findings. Indeed, compared to other xylanases (41, 49), both Pm25 and similar elements display quite poor catalytic efficiency toward polysaccharides (33, 50) and oligosaccharides (32). This trend is also observed in other polysaccharide-degrading systems, such as mannan utilization loci from members of the Bacteroidetes phylum (51) and the xylan-degrading system in the Proteobacteria (43) phylum. The underlying reason for such low activity most likely reflects its function. SusG-like proteins probably have a carbohydrate surveillance function, while highly active intracellular enzymes are charged with complete oligosaccharide breakdown prior to sugar catabolism. This clever and "selfish" strategy ensures that readily metabolizable sugars are not released into the environment, where they could be used by other bacteria that lack a specific glycan utilization machinery (52).
Remarkably, we found that Pm25 remains active over a broad pH range, maintaining more than 80% of its maximum activity at pH 9.0. This observation correlates well with results obtained for the Pm25 homologs Bacteroides intestinalis Xyn10C (BiXyn10C) and BiXyn10A, which were identified in the human gut microbiome (50). Accounting for the fact that alkaline-stable xylanases are sought after for use in applications such as paper pulp biobleaching, Pm25 might constitute a useful starting point for enzyme engineering aimed at improving its hydrolytic properties.
So far, we have been unable to obtain structural data pertaining to Pm25, and none is available for its closest homologs. Therefore, at this stage it is difficult to speculate on the exact topology and molecular determinants of its active site. Nevertheless, to gain some understanding, we have examined similarities with the family GH10 xylanase Cellvibrio japonicus Xyn10C (CjXyn10C), which displays approximately 30% identity to Pm25 and whose structure is known (PDB entry 1US3). Like Pm25, CjXyn10C exhibits rather poor activity on XOS, ascribed to weak substrate binding in subsite −2 (43). Unlike most other GH10 enzymes, CjXyn10C subsite −2 contains G295 in the place of E, whose side chain can hydrogen bond to the substrate. According to sequence alignment, Pm25 also lacks the vital E residue in subsite −2, an observation that might explain its poor ability to hydrolyze X4 (43, 53). Therefore, the −3 subsite, whose affinity (2.76 kcal/mol) is rather strong compared to that reported for other enzymes (53), probably compensates for the poor, glycine-containing −2 subsite during the degradation of X4. Taken together, these observations allow a hypothetical subsite mapping of the Pm25 active site with XOS to be proposed (Fig. 5D).
The two CBM4s that are inserted into Pm25 clearly contribute to the binding and degradation of complex biomass. Our results reveal that this is especially true when both CBMs are functional and suggest that binding of large ligands involves a cooperativity phenomenon (Fig. 8). However, based on the PULDB database, the number of CBM domains in Pm25 homologs varies from one to three, and the CBMs are from different families, CBM4, CBM22, or unclassified. This suggests that the SusG-assimilated functions can be fulfilled by enzymes that are not configured in an identical way. Moreover, it also confirms that the TIM-barrel fold in the GH10 family is quite accommodating in terms of insertions at the β3/α3 loop.
Apparently, unlike many highly active periplasmic endoglucanases, such as SusA (48) and CjXyn10D (43), extracellular enzymes such as SusG, CjXyn10A, and CjXyn10C are generally appended to CBMs (43). Therefore, it is of interest to discuss the reason for this. CBM58 in SusG (54) and the CBM4s in Pm25 appear to improve the ability of the enzymes to hydrolyze insoluble substrates (Fig. 8), while CBM15 in CjXyn10C does not play an important role in catalysis, irrespective of whether the substrate is soluble or not (43). However, our data suggest that the affinity of Pm25 for soluble substrates was mostly derived from the binding ability of the CBMs (Table 2). In light of this observation, we propose that CBMs in membrane-associated enzymes temporarily withhold soluble oligosaccharides before their importation into the cell. This implies that the function of the CBM4 domains would be relatively independent of that of the GH10 domain. In this regard, it is noteworthy that the first structure of a SusG protein (54), which reveals that a CBM58 domain is inserted into the B domain of the GH13 α-amylase domain, reports that CBM58 does not form hydrogen bonds with the catalytic domain, an observation that argues in favor of an independent function. Regarding Pm25, evidence for an independent function of the CBM4 domains is provided by the fact that the xylan-degrading profile of the Pm25 wild type was almost identical to that of the CBM-deleted version, M6 (Fig. 7), and the fact that the xylan binding affinity of CBMs was relatively unaltered when the CBM domains were separated from the GH10 domain (Table 2). Finally, it is also useful to recall that the affinity values determined for subsites −4 and −3 of Pm25 and M6 were nearly identical. Therefore, we believe that the catalytic center of Pm25 and the binding surfaces of the CBM4 domains are disconnected, an organization that corresponds to independent functions and contributes to low enzyme reaction rates (55).
In conclusion, focusing on a termite gut-derived enzyme, we have provided further insight into the properties and function of Xyn10C-like enzymes that form part of core xylan utilization systems. This system seems to be rather efficient in terms of evolution, since it is conserved in termite gut, rumen, and human gut. Therefore, the role of the CBM insertion is an interesting question. In this respect, we have thoroughly succeeded in characterizing the enzyme and shown that the CBM4 domains can be successfully excised without loss of catalytic function. Regarding the enzyme's substrate specificity, although it is difficult to speculate on the group of polysaccharides that might be preferential substrates in the termite gut environment, we have shown that it is better adapted for the hydrolysis of arabinoxylans than glucuronoxylans, which is consistent with the fact that the host termite feeds on crops such as sugarcane rather than wood.
The GenBank accession number for the clone containing Pm25 is HF548280.1, and the protein ID for Pm25 is CCO21036.1.
Bioinformatics analysis. Putative signal peptide sequence analysis was performed using the SignalP 4.1 server (47). The domain annotation of Pm25 was done using InterPro protein sequence analysis (https://www.ebi.ac.uk/interpro/) with accession number S0DFK9. Multiple-protein sequence alignment of CBM4s was done using Clustal Omega (https://www.ebi.ac.uk/Tools/msa/clustalo/), and the alignment of the secondary structure elements of Pm25 with other structurally characterized GH10 family members was achieved using both Clustal Omega and ESPript 3 at http://espript.ibcp.fr/ESPript/ESPript/ (57).
SSNs. Amino acid sequences of GH10 family members were extracted from the CAZy database (http://www.cazy.org/GH10_all.html), updated on 29 May 2020. To remove redundant sequences, the 4,936 sequences were winnowed down to 2,539 by a sequence identity cutoff of 0.9 (58), length cutoff of 250, and fragment exclusion (59). Sequence similarity networks (SSNs) were constructed using the Enzyme Function Initiative Enzyme Similarity Tool (EFI-EST) (59) and visualized using Cytoscape 3.6 (60). Each node represents one protein sequence, and the alignment score threshold was set so that two nodes are linked by an edge when they share over 35% sequence identity. Multiple-sequence alignment of GH10s in different clusters was done by MAFFT (https://www.ebi.ac.uk/Tools/msa/mafft/), and sequence logos were constructed via WebLogo (https://weblogo.berkeley.edu/logo.cgi).
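To illustrate how clusters such as Pm25_cluster arise from an edge threshold, the sketch below groups sequences into connected components once edges are drawn above 35% identity. It is only a toy model: EFI-EST works from BLAST-derived alignment scores, whereas this example takes precomputed pairwise percent identities directly, and the function name and input values are hypothetical.

```python
def cluster_by_identity(n_nodes, pairwise_identity, threshold=35.0):
    """Group nodes 0..n_nodes-1 into connected components.

    pairwise_identity maps (i, j) index pairs to percent identity;
    an edge is drawn only when identity is strictly above threshold.
    """
    parent = list(range(n_nodes))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), identity in pairwise_identity.items():
        if identity > threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for node in range(n_nodes):
        clusters.setdefault(find(node), []).append(node)
    # Largest cluster first, as in a network centered on its main cluster.
    return sorted(clusters.values(), key=len, reverse=True)


# Hypothetical identities for five sequences.
example = {(0, 1): 60.0, (1, 2): 40.0, (2, 3): 20.0, (3, 4): 80.0}
print(cluster_by_identity(5, example))
```

In this toy input, the 20% pair falls below the threshold, so the five sequences split into one cluster of three and one of two.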
Cloning and site-directed mutagenesis. Cloning of the plasmid pDEST17 containing Pm25 was achieved as described previously (61). All mutants were constructed using the QuikChange site-directed mutagenesis kit (Stratagene, La Jolla, CA, USA) with oligonucleotide primers (Table 3). The M6 construct, which corresponds to Pm25 deprived of its CBMs, was obtained by gene synthesis (NZYTech, Lda, Portugal). The mutation of E546 to A in M6 yielded M7, while M8 and M9 were constructed by cloning the sequences encoding CBM4-1 and CBM4-2, respectively, into the pET28a(+) expression vector. Likewise, the construct M10 is the pET32a(+) expression vector containing the sequence encoding both CBM4-1 and CBM4-2 cloned in frame with the thioredoxin tag using the In-Fusion cloning kit (Clontech, TaKaRa, Shiga, Japan). The DNA sequence of UNK (UniProt ID S0DDM9) deprived of its signal peptide sequence, as identified by the SignalP 4.1 server (47), was synthesized and subcloned into pET28a between the NheI and XhoI restriction sites.
Protein expression and purification. Wild-type Pm25 and the mutants M1, M2, M3, M4, and M5 (Fig. 4A) were expressed in Escherichia coli Rosetta(DE3) pLysS grown in ZYP autoinduction medium (62) at 25°C overnight. Constructs M6 to M10 and UNK were transformed into E. coli Tuner(DE3) and cultured for 2 h at 37°C until the optical density at 600 nm (OD600) reached 0.6. At this point, isopropyl-β-D-thiogalactopyranoside (IPTG; 200 μM final concentration) was added and growth was continued at 16°C overnight. Cell pellets were collected by centrifugation, washed, and lysed using sonication (Fisherbrand Q700; tip diameter, 13 mm; output, 40 W), and the clarified cell lysates were applied to TALON metal affinity resin (Clontech, Mountain View, CA, USA). After elution, protein purity was estimated by SDS-PAGE to be 95%. Protein concentrations were determined by measuring absorbance at 280 nm and applying the Beer-Lambert equation. Theoretical molar extinction coefficients were calculated using ProtParam online software (63).
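As a brief illustration of the concentration calculation, the Beer-Lambert equation (A = ε·c·l) can be rearranged to give the molar protein concentration from the absorbance at 280 nm. The sketch below assumes a 1-cm path length by default; the function name, the dilution argument, and the example numbers are hypothetical and are not values from this study.

```python
def protein_concentration(a280, ext_coeff, path_cm=1.0, dilution=1.0):
    """Molar concentration (mol/liter) from A = epsilon * c * l.

    a280      : absorbance at 280 nm of the (possibly diluted) sample
    ext_coeff : molar extinction coefficient in M^-1 cm^-1
    path_cm   : optical path length in cm
    dilution  : dilution factor applied before the measurement
    """
    if ext_coeff <= 0 or path_cm <= 0:
        raise ValueError("extinction coefficient and path length must be positive")
    return a280 * dilution / (ext_coeff * path_cm)


# Hypothetical example: A280 = 0.5 with epsilon = 100,000 M^-1 cm^-1.
conc_molar = protein_concentration(0.5, 100_000.0)
print(f"{conc_molar * 1e6:.2f} uM")
```

With these illustrative inputs the result is 5 μM; the extinction coefficient for a real construct would come from its own sequence, for example as computed by ProtParam.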
Determination of pH and temperature optima. The apparent optimal pH of Pm25 was determined in the pH range of 3.0 to 11.0, measuring the enzyme activity (0.4 μM final enzyme concentration) on 1% (wt/vol) beechwood GX at 37°C. The buffers used were 50 mM citrate buffer for pH 3.0 to 6.0, 50 mM phosphate buffer for pH 6.0 to 8.0, 20 mM bicine buffer for pH 8.0 to 9.0, and 20 mM glycine-NaOH for pH 9.0 to 11.0. Xylanase activity was determined by measuring the release of reducing sugars using the 3,5-dinitrosalicylic acid (DNS) assay (64, 65). Reactions were performed in triplicate at 37°C in the different buffers from pH 3.0 to 11.0, containing bovine serum albumin (BSA; 1 mg/ml). At regular intervals (0, 3, 6, 9, 12, 15, 18, 21, and 24 min), 100 μl of the reaction mixture was removed and added to 100 μl of DNS and kept on ice until all samples were ready. All samples then were heated at 95°C for 10 min and cooled on ice before adding 1 ml of deionized water and recording the absorbance at 540 nm using a spectrophotometer. A D-xylose series (0 to 1 mg/ml) was used to prepare a standard curve. The apparent optimal temperature was determined over the range of 21 to 90°C in 50 mM phosphate buffer (pH 7.5). Thermostability was monitored by preincubating the enzyme in the absence of substrate in 50 mM phosphate buffer (pH 7.5) at 45, 50, 55, and 60°C from 0 to 24 h. Residual enzyme activity in each case was then assayed as described above.
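The conversion of DNS absorbance readings into D-xylose equivalents can be sketched as an ordinary least-squares line fitted to the standard series, which is then inverted for unknown samples. This is an illustrative outline only: it assumes a linear response over 0 to 1 mg/ml, and the absorbance values below are invented for the example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def xylose_equivalents(a540, slope, intercept):
    """Invert the standard curve: A540 -> D-xylose equivalents (mg/ml)."""
    return (a540 - intercept) / slope


# Hypothetical standard series (mg/ml D-xylose vs. absorbance at 540 nm).
standards_mg_ml = [0.0, 0.25, 0.5, 0.75, 1.0]
absorbances = [0.05, 0.25, 0.45, 0.65, 0.85]

slope, intercept = fit_line(standards_mg_ml, absorbances)
print(round(xylose_equivalents(0.45, slope, intercept), 3))
```

Applying the same conversion to each time point of a reaction and fitting the early, linear part of the resulting curve would give the initial rate in milligrams per milliliter per minute.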
Enzyme specificity and kinetics. Enzyme kinetics were measured using Pm25 at a concentration of 0.4 µM for beechwood GX and 0.08 µM for other soluble polysaccharides. Initial rates (the concentration of D-Xyl equivalent released, in milligrams per milliliter per minute) were determined using a range of substrate concentrations (0.5 to 40 g/liter for beechwood GX and 0.25 to 10 g/liter for other soluble substrates) under optimal conditions. The DNS assay was used to monitor reducing sugar release as described above. The kinetic parameters (kcat and apparent Km) were calculated using nonlinear regression in SigmaPlot 11.0 (Systat Software, San Jose, CA, USA). One unit of xylanase activity was defined as the amount of enzyme that catalyzes the release of 1 µmol of D-Xyl equivalents per min.
To study the hydrolysis of xylooligosaccharides (XOS) with a degree of polymerization of 4 to 6 (X4 to X6), reactions were performed using various substrate concentrations (0.05 to 0.8 mM) under the optimal reaction conditions. Assays began upon the addition of enzyme, the final concentration of which was adjusted according to the substrate: 2.60, 0.26, and 0.026 µM enzyme were used for X4, X5, and X6, respectively. At regular intervals (0, 5, 10, 15, 20, 30, 40, 50, and 60 min), aliquots were removed and immediately heated at 95°C for 10 min to stop the reaction. The hydrolysis products were then analyzed by high-performance anion-exchange chromatography with pulsed amperometric detection (HPAEC-PAD) using an ICS 3000 dual device (Dionex, France) equipped with Carbo-Pac PA-100 guard and analytical columns (2 by 50 mm and 2 by 250 mm, respectively) as described before (65). Ten microliters of sample was injected, and separation was achieved by applying a gradient of 0 to 85 mM sodium acetate in 150 mM NaOH from 0 to 30 min, isocratic elution with 500 mM sodium acetate in 150 mM NaOH from 30 to 33 min, and reequilibration of the column with 50 mM sodium acetate in 150 mM NaOH for another 10 min, at a flow rate of 0.25 ml/min. Calibration was achieved using D-Xyl and XOS (X2, X3, X4, X5, and X6) at concentrations from 5 to 50 µM. Plotting the hydrolysis rate (micromolar per minute) versus oligosaccharide substrate concentration (micromolar) yielded a linear relationship, meaning that the catalytic constant kcat/Km could be calculated from the slope k using equation 1, where [E] is the final concentration of enzyme.
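The slope-based estimate described above can be sketched as follows. The sketch assumes that equation 1 has the usual low-substrate form kcat/Km = k/[E] (consistent with the definition k = (kcat/Km)[Enzyme] used for the pNP substrates) and that the line is constrained through the origin; both are assumptions made here, and the rate data are hypothetical.

```python
def slope_through_origin(substrate_uM, rate_uM_per_min):
    """Least-squares slope of rate versus [S] for a line forced through zero."""
    num = sum(s * v for s, v in zip(substrate_uM, rate_uM_per_min))
    den = sum(s * s for s in substrate_uM)
    return num / den  # units: min^-1


def kcat_over_km(slope_per_min, enzyme_uM):
    """Specificity constant (uM^-1 min^-1), assuming kcat/Km = k / [E]."""
    return slope_per_min / enzyme_uM


# Hypothetical initial rates for X6 (substrate in uM, rates in uM/min).
substrate = [50.0, 100.0, 200.0, 400.0, 800.0]
rates = [0.65, 1.3, 2.6, 5.2, 10.4]

k = slope_through_origin(substrate, rates)
print(f"kcat/Km = {kcat_over_km(k, enzyme_uM=0.026):.3f} uM^-1 min^-1")
```

The relation only holds while the rate remains proportional to substrate concentration, that is, for [S] well below Km.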
All experiments were performed in triplicate, and reported values are the means from three experiments.
where A−i is the subsite affinity at the −i subsite, kcat/Km of pNPXi is the performance constant for pNP-labeled XOS with a DP of i (where i is a whole number), and R is the universal gas constant (8.314 J mol⁻¹ K⁻¹).
To determine the catalytic parameters of reaction mixtures containing pNP-XOS, the final concentration of Pm25 used was 6, 27, and 136 nM for pNPX4, pNPX3, and pNPX2, respectively. Similarly, the final concentration of M6 was 10 nM for pNPX4 and pNPX3 and 54 nM for pNPX2. The substrate concentrations were 0.025, 0.05, and 0.1 mM for pNPX3 and pNPX4 and 0.5, 2, and 5 mM for pNPX2. All experiments were performed in duplicate. The plot of hydrolysis rate against pNP substrate concentration was linear, which indicated that substrate concentration was far below the Km. Therefore, the kcat/Km of reaction mixtures containing aryl β-xylosides was determined under optimum conditions using equation 3 (53,66). Briefly, the substrate concentrations at the beginning of the reaction ([S0]) and at specific times ([St]) were fitted to equation 3, where k = (kcat/Km)[Enzyme] and [Enzyme] is the final concentration of enzyme.
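A possible way to carry out this fit is sketched below. It assumes that equation 3 is the integrated first-order rate law ln([S0]/[St]) = k·t, which is the standard form when [S] is far below Km but is not spelled out in the text above; the function names and the time course are hypothetical.

```python
import math


def first_order_rate_constant(times_min, s_t, s0):
    """Fit ln(S0/St) = k * t through the origin and return k (min^-1)."""
    num = sum(t * math.log(s0 / s) for t, s in zip(times_min, s_t))
    den = sum(t * t for t in times_min)
    return num / den


def specificity_constant(k_per_min, enzyme_molar):
    """Return kcat/Km (M^-1 min^-1) from k = (kcat/Km) * [Enzyme]."""
    return k_per_min / enzyme_molar


# Hypothetical depletion of 0.1 mM pNPX4 by 6 nM enzyme (synthetic data).
s0 = 0.1
times = [5.0, 10.0, 20.0, 30.0]
remaining = [s0 * math.exp(-0.02 * t) for t in times]

k = first_order_rate_constant(times, remaining, s0)
print(f"k = {k:.4f} min^-1")
print(f"kcat/Km = {specificity_constant(k, 6e-9):.3e} M^-1 min^-1")
```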
The molar extinction coefficient of pNP (15,570 M⁻¹·cm⁻¹) was determined experimentally by measuring the absorbance at 404 nm for a standard curve ranging from 0 to 0.12 mM pNP at pH 7.5 and 50°C.
AGE. The binding of CBM4-1 and CBM4-2 to soluble polysaccharides was evaluated by affinity gel electrophoresis (AGE), using 7.5% (wt/vol) acrylamide gels containing various amounts of polysaccharide (for the concentration range of RAX, refer to Table S1 in the supplemental material). ADWAX, EDWAX, and LVWAX samples and beechwood GX were used at 0.006 to 0.06% (wt/vol), while other polysaccharides were used at 0.5% (wt/vol). Pure protein (6 µg) was migrated (10 mA/gel for about 1 h at room temperature) on gels in 25 mM Tris, 250 mM glycine buffer, pH 8.3. BSA (15 µg) was also included in the experiment as a negative, noninteracting control. Proteins were visualized by Coomassie blue staining. The dissociation constant Kd was calculated as previously described (68). In equation 4, R0 is the relative protein migration distance compared to that of BSA in the control gel (without ligand), r is the relative protein migration distance compared to that of BSA in ligand-containing gels, Rc is the relative migration distance of the protein-ligand complex, and c is the concentration of ligand. When equation 4 was plotted, taking 1/(R0 − r) as the ordinate and 1/c as the abscissa, a straight line was obtained. The intercept of the line on the abscissa provided the negative reciprocal of the dissociation constant (−1/K). All experiments were performed in triplicate, and reported values are the means from three experiments.
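The double-reciprocal analysis described above can be illustrated with a short sketch. For a straight line y = a + b·x with y = 1/(R0 − r) and x = 1/c, the abscissa intercept is −a/b, so setting it equal to −1/K gives K = b/a. The helper names and the synthetic migration data below are assumptions for illustration only.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def dissociation_constant(ligand_concs, r_values, r0):
    """Estimate K from a plot of 1/(R0 - r) versus 1/c.

    The x-intercept of the fitted line is -intercept/slope = -1/K,
    hence K = slope / intercept (same units as the ligand concentration).
    """
    xs = [1.0 / c for c in ligand_concs]
    ys = [1.0 / (r0 - r) for r in r_values]
    slope, intercept = fit_line(xs, ys)
    return slope / intercept


# Synthetic example: ligand in % (wt/vol), relative migrations unitless.
concs = [0.006, 0.012, 0.03, 0.06]
r0, rc, true_k = 0.9, 0.3, 0.02
r_obs = [r0 - (r0 - rc) * c / (c + true_k) for c in concs]

print(f"K = {dissociation_constant(concs, r_obs, r0):.4f} % (wt/vol)")
```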
To fluorescently label the amine groups of exposed lysines (Fig. 1D) lying in the vicinity of the ligand binding clefts in M7, M8, and M9, 100 µl of pure protein (20 µM) was treated with the reagents in the protein labeling kit (RED-NHS) by following the manufacturer's instructions. The labeled proteins were recovered and purified using TALON metal affinity resin, and their concentration was estimated using SDS-PAGE and serial dilutions of a protein solution of known concentration. A total of 16 dilutions (350 to 11 µM) of a solution of X6 containing either 0.075 µM M7, 0.13 µM M8, or 0.07 µM M9 in 50 mM phosphate buffer, pH 7, 0.05% Pluronic F-127 were loaded onto 16 standard capillaries. The initial fluorescence of all 16 samples was obtained by performing a capillary scan with an LED power of 25% for M8 and 20% for M7 and M9. The dissociation constant Kd was calculated by selecting the tab "Initial Fluorescence Analysis Set" in the Affinity Analysis software (70). To perform the SDS denaturation (SD) test, samples 1 to 3 and 14 to 16 were centrifuged for 10 min at 15,000 × g, and 10 µl of each was mixed with 10 µl of 4% SDS, 40 mM dithiothreitol (DTT), followed by a 5-min incubation of the mixture at 95°C to denature the protein. The samples were then loaded into the capillaries to measure their fluorescence intensities.
Solid depletion assay. The ability of inactivated M1, M7, M8, and M9 to bind wheat bran was investigated by incubating 100 µg of protein with 4 mg of wheat bran in 200 µl of reaction buffer (50 mM sodium phosphate, pH 7). Reactions were performed in 0.2-ml PCR tubes and incubated at 10°C for 2 h with agitation in an Eppendorf Thermomixer R at 1,400 rpm. For each reaction, the supernatant containing the unbound enzyme fraction was recovered after centrifugation using a benchtop microcentrifuge. The pelleted substrate was washed 3 times with reaction buffer. Finally, 20 µl of Laemmli sample buffer was added to the pellet, which was heated at 95°C for 10 min to denature the protein (bound fraction). All the fractions were analyzed by SDS-PAGE. BSA was used as a negative control.
Hydrolysis of wheat arabinoxylan and wheat bran. Product profiles were generated with Pm25, M3, M4, M5, and M6 on either LVWAX (0.5% [wt/vol]) or wheat bran (20 mg/ml of wheat bran prehydrated for 12 h at 37°C, 1,400 rpm using the Eppendorf Thermomixer R). The enzymes (final concentration, 0.5 µM) were incubated with the respective substrate in 50 mM phosphate buffer (pH 7.5) and 1 mg/ml BSA. Enzymatic reaction mixtures were incubated at 37°C for either 24 h for LVWAX or 14 h for wheat bran, and aliquots were removed at regular time intervals and heated at 95°C for 10 min to terminate the reaction. Each sample was centrifuged at 20,000 × g for 5 min, and the products were quantified by HPAEC-PAD on a Dionex system equipped with Carbo-Pac PA-1 guard and analytical columns (4 by 50 mm and 4 by 250 mm, respectively). Separation of oligosaccharides was achieved by isocratic elution with 100 mM NaOH at a flow rate of 1 ml/min from 0 to 5 min, a gradient of 0 to 120 mM sodium acetate in 100 mM NaOH from 5 min to 25 min, and isocratic elution with 500 mM sodium acetate in 100 mM NaOH from 25 min to 35 min. The column was then reequilibrated with 100 mM NaOH for another 10 min. Calibration was achieved using D-Xyl and XOS (X2, X3, X4, X5, and X6) at concentrations from 5 to 100 µM.

SUPPLEMENTAL MATERIAL
Supplemental material is available online only. SUPPLEMENTAL FILE 1, PDF file, 0.6 MB.

ACKNOWLEDGMENTS
The research was supported by CSC (China Scholarship Council) (H.W.) and the Climate-KIC ADMIT BIOSUCCINOVATE project (E.I.). The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
We thank the ICEO facility dedicated to enzyme screening and discovery, part of the Integrated Screening Platform of Toulouse (PICT, IBiSA), for providing access to high-performance liquid chromatography and protein purification systems. G. Arnal is gratefully acknowledged for insightful reading of the manuscript.