Crystal Structure of a GH3 β-Glucosidase from the Thermophilic Fungus Chaetomium thermophilum

Beta-glucosidases (β-glucosidases) have attracted considerable attention in recent years for use in various biotechnological applications. They are also essential enzymes for lignocellulose degradation in biofuel production. However, cost-effective biomass conversion requires the use of highly efficient enzymes. Thus, the search for new enzymes as better alternatives to the currently available enzyme preparations is highly important. Thermophilic fungi are now considered a promising source of enzymes with improved stability. Here, the crystal structure of a family GH3 β-glucosidase from the thermophilic fungus Chaetomium thermophilum (CtBGL) was determined at a resolution of 2.99 Å. The structure showed the three-domain architecture found in other β-glucosidases with variations in loops and linker regions. The active site catalytic residues in CtBGL were identified as Asp287 (nucleophile) and Glu517 (acid/base). Structural comparison of CtBGL with Protein Data Bank (PDB)-deposited structures revealed variations among glycosylated Asn residues. The enzyme displayed moderate glycosylation compared to other GH3 family β-glucosidases with similar structure. A new glycosylation site at position Asn504 was identified in CtBGL. Moreover, comparison with respect to several thermostability parameters suggested that glycosylation and charged residues involved in electrostatic interactions may contribute to the stability of the enzyme at elevated temperatures. The reported CtBGL structure provides additional insights into the family GH3 enzymes and could offer new ideas for further improvements in β-glucosidases for more efficient use in biotechnological applications regarding cellulose degradation.


Introduction
Beta-glucosidases (β-glucosidases) are key cellulolytic enzymes that catalyze the hydrolysis of cellobiose to glucose. They are terminal enzymes of the cellulase system as they act during the final step of the cellulose degradation. Although beta-glucosidases do not act directly on cellulose, their activity is important to circumvent the inhibitory effect of cellobiose on endoglucanases and cellobiohydrolases. Consequently, inclusion of β-glucosidases in cellulase preparations can synergistically increase the hydrolytic efficiency of other cellulolytic enzymes [1].
Despite their extensive use in biotechnological applications, structural data on fungal β-glucosidases are still scarce. Many fungal BGLs are classified as part of the GH3 family, one of the largest CAZy

Quality of the Structure
The crystal structure of CtBGL was refined using data up to 2.99 Å resolution. The final Rcryst and Rfree (5% of the reflections excluded from refinement) were 0.201 and 0.252, respectively (Table 1). The structure exhibits good stereochemistry despite the limited resolution, with root mean square deviation (rmsd) in bond lengths and bond angles of 0.007 Å and 0.94°, respectively. The Ramachandran plot shows 92.8% of the residues in the most favorable regions and 1.0% in disallowed regions. Residues that fall into the disallowed regions belong to flexible parts of the structure with weak electron density. Structure quality statistics for CtBGL fall within the distribution found in other crystal structures of similar resolution as analyzed and displayed by POLYGON [15]. The refined structure contains 836 residues of the mature protein, starting at residue Trp32. The full-length cDNA encodes an 867-residue enzyme and the first 16 residues have been suggested to act as a secretion signal peptide [16]. The residues between the signal peptide and Trp32 at the N-terminus were not modeled, owing to lack of adequate electron density, possibly due to high flexibility. The enzyme crystallized with one molecule in the crystallographic asymmetric unit, in contrast to other β-glucosidases that crystallize either with two molecules, such as the β-glucosidases from Aspergillus oryzae (AoβG) and A. fumigatus (AfβG) [8], A. aculeatus (AaβG) [8] and NcCel3A [5], or with four molecules in the asymmetric unit, such as ReCel3A [4]. The solvent content is unusually high (~77%) for a protein crystal, which could explain the low resolution of the data and the disintegration of the crystals during manipulation prior to mounting. The NCS dimer found in other β-glucosidases is seen in CtBGL as a crystallographic dimer that involves the −Y, −X, −Z+1/2 symmetry-related molecule.
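The symmetry operator quoted above can be checked with a few lines of arithmetic. The sketch below applies (−Y, −X, −Z+1/2) to fractional coordinates and shows that applying it twice returns the starting position, which is what one expects of a twofold operation relating the two halves of a dimer. The atom position used here is hypothetical and serves only as an illustration.

```python
from fractions import Fraction


def apply_symop(xyz):
    """Apply the operator (-Y, -X, -Z + 1/2) to fractional coordinates."""
    x, y, z = xyz
    return (-y, -x, Fraction(1, 2) - z)


def wrap(xyz):
    """Bring fractional coordinates back into the unit cell range [0, 1)."""
    return tuple(c % 1 for c in xyz)


# Hypothetical fractional position, chosen only for illustration.
point = (Fraction(1, 10), Fraction(3, 10), Fraction(1, 5))

mate = apply_symop(point)   # position in the symmetry-related molecule
back = apply_symop(mate)    # applying the operator a second time

print(wrap(mate))
print(back == point)
```

Exact fractions are used instead of floats so that the comparison after two applications is not affected by rounding.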

Overall Structure Comparisons
Structure-based sequence alignment (Figure 1) showed that CtBGL has the highest structural similarity with NcCel3A (PDB id 5nbs), as indicated by the root mean square deviation (rmsd) of 0.60 Å for 831 aligned residues (sequence identity 72%), followed by Aspergillus fumigatus AfβG (PDB id 5fji; rmsd 0.81 Å; seq. identity 62%), ReCel3A (PDB id 5ju6; rmsd 0.84 Å; 62.0%) and Aspergillus aculeatus AaBgl1 (4iig; 0.83 Å; 62.0%). Lower similarities were found with Hypocrea jecorina HjCel3A (3zyz; 1.21 Å; 47%).

Figure 1. Structure-based sequence alignment of CtBGL with members of the GH3 family. Secondary structure elements are shown on the top of the alignment, while the red triangles below the alignment indicate the conserved catalytic residues (Asp287 and Glu517 in CtBGL). Disulfide bonds are depicted with green numbers. The three domains are indicated. A column is framed if more than 70% of its residues are similar according to physicochemical properties. Frames in red background with white letters depict strict identity. The figure was constructed with ESPript [16].
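For readers unfamiliar with the rmsd values quoted above, the following minimal sketch shows how the quantity is defined for two sets of equivalent atoms that have already been superposed. It does not perform the superposition or the residue matching that a structure-alignment program carries out, and the coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equal-length lists of (x, y, z)
    coordinates in Å. The two sets are assumed to be already superposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical pair of two-atom structures, every atom displaced by 1 Å along z.
structure_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(structure_a, structure_b))
```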

Description of the Structure
The structure of CtBGL (Figure 2) consists of three distinct domains, similar to other GH3 family members: a catalytic triose phosphate isomerase (TIM) barrel-like domain (Leu51-Ser354), an α/β sandwich domain (His396-Gly596) and a FnIII (fibronectin type III) domain (Thr663-Gln867) with a prominent insertion region (677-766). A linker region (residues 355-394) connects domain 1 to domain 2 and a second linker region (600-662) connects domain 2 to domain 3. The 39-residue linker between domain 1 and domain 2 is comparable to that in NcCel3A and significantly longer than the linker in other GH3 β-glucosidases such as HjCel3A and HvExoI, with only 18 and 16 residues in length, respectively. In NcCel3A, the linker comprises 42 residues (residues 341-383) with a 25-residue (residues 351-376) insertion previously described as a hydrophobic linker responsible for the activation of ReCel3A and AaBgl in organic solvents [5]. This insertion has been termed loop II and contains several aromatic residues, of which CtBGL Phe364 (Phe352 in NcCel3A) and CtBGL Trp367 (Trp355 in NcCel3A) are the most conserved. Moreover, loop II contains residues Phe366, Trp367, and Trp377 that line one side of the substrate-binding site, and they are also found in NcCel3A (Phe354, Trp355, and Trp365, respectively) with variations in other β-glucosidases, such as ReCel3A and AaBgl1. It has been suggested that this loop is stabilized by interactions with the N-glycans from a neighboring glycosylated Asn residue (Asn57 in NcCel3A) [5]. In CtBGL, the equivalent residue is Asn72, which is also glycosylated (see below).
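The domain boundaries listed above lend themselves to a small lookup table. The sketch below encodes the residue ranges exactly as stated in this section and returns the region to which a given CtBGL residue belongs; the function name and region labels are illustrative choices, and residues that fall outside the stated ranges are reported as unassigned.

```python
# Residue ranges as quoted in the text; labels are illustrative.
CTBGL_REGIONS = [
    ("domain 1 (TIM barrel-like)", 51, 354),
    ("linker 1", 355, 394),
    ("domain 2 (alpha/beta sandwich)", 396, 596),
    ("linker 2", 600, 662),
    ("domain 3 (FnIII)", 663, 867),
]


def region_of(residue_number):
    """Return the label of the region containing the residue, or None if the
    residue lies outside every range given in the text."""
    for label, start, end in CTBGL_REGIONS:
        if start <= residue_number <= end:
            return label
    return None


# Catalytic residues and one glycosylated Asn mentioned in the paper.
for name, number in [("Asp287", 287), ("Glu517", 517), ("Asn720", 720)]:
    print(name, "->", region_of(number))
```

With these ranges the nucleophile Asp287 maps to domain 1 and the acid/base Glu517 to domain 2, in line with the description of the active site at the interface of the first two domains.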

The second domain of CtBGL, an (α/β)6 sandwich fold, is structurally well conserved among all the GH3 enzymes. It consists of residues His396 to Gly596 and includes loops III and IV that encompass residues Gly433-Val467 and Ala513-Asn536, respectively (Figure 3). Loop III houses the conserved cysteine residues Cys442 and Cys447, which form a disulfide bridge that stabilizes the folded loop III architecture. Ser458 and Asp444 are also highly conserved in all compared structures. This domain hosts the catalytic acid Glu517 in loop IV, which is conserved in all of the compared β-glucosidases: CtBGL, NcCel3A, ReCel3A, HjCel3A, AaBgl1, and Thermotoga neapolitana β-glucosidase 3B (TnBgl3B). Moreover, the architecture and location of loops III and IV, which constitute one side of the active site cavity in between loops I and II, are also conserved. Ser458, Asp444, and Glu517 point downwards from the loop regions and towards the active site.
Domain 2 of the CtBGL structure is followed by a second linker region (Linker 2; residues Lys597-Ser660). This linker region exhibits an extended structure, which almost resembles a boundary that separates domain I and domain II from domain III. The extended linker region probably plays a role in stabilizing loops I and IV.
The FnIII-like domain 3 was first observed in the GH3 family in the TnBgl3B structure [18]. It consists of residues Tyr661-Asn867 that form a beta sandwich composed of a total of nine β-strands arranged in two layers of β-sheets with three and four β-strands, respectively. Loop V takes an extended structure and encompasses domain I. Importantly, loop V is present in NcCel3A, ReCel3A, and AaBgl1, but it is absent in TnBgl3B and HjCel3A (Figure 4). Several conserved aromatic residues, namely, Tyr716, Tyr718, Tyr733, and Phe740, are found in loop V. Tyr716 and Tyr733 are conserved in all structures except HjCel3A and TnBgl3B, where loop V is absent, while Tyr718 is replaced by Trp in ReCel3A and AaBgl1. Moreover, Phe740 is replaced by a Tyr residue in ReCel3A and a His residue in AaBgl1. These conserved aromatic residues form π-stack interactions with N-acetyl-β-d-glucosamine (GlcNAc) residues. Also conserved in loop V is Asn720, which was found to be N-glycosylated.

Glycosylation Sites
Potential glycosylation sites were observed in the CtBGL structure in 10 positions, all in Asn residues (Table 2). The glycosylation sites found in CtBGL showed some variations when compared with other structures. A unique glycosylation site was observed in CtBGL at Asn504 on the surface of the molecule, where the corresponding aligned structures had either a different residue or a gap in the structure-based alignment (Table 3). Asn259 was found highly conserved and glycosylated in all structures. Moreover, ReCel3A possesses 16 glycosylation sites [4], similarly to Aspergillus β-glucosidases [8]. On the other hand, HjCel3A showed the lowest number of glycosylation sites, with only two glycosylated sites (Asn208 and Asn310 in chains A and B, respectively). As observed before in this class of β-glucosidases, all glycosylation sites in CtBGL were also located on one face of the molecule (Figure 5).
A total of 27 glycan moieties were found in CtBGL. The glycans ranged in length from a single GlcNAc to longer chains. The longest glycosylation chain was composed of eight residues. The overall degree of glycosylation in CtBGL can be considered moderate when compared against other glycosylation chains, such as those found in ReCel3A and AaBGL1, with the longest chain composed of 10 residues and 45-50 glycosylation residues in total per chain in the crystallographic asymmetric unit. Interestingly, the glycosylation pattern in CtBGL showed high GlcNAc-type N-glycans, while ReCel3A and AaBGL1 displayed high mannose-type N-glycans. A total of 15 GlcNAc residues were identified in CtBGL. Out of the total 10 glycosylation sites in CtBGL, three were found to consist of a single GlcNAc monosaccharide. Single GlcNAc monosaccharides in other enzymes, such as AaBGL1, were obtained possibly as a result of a treatment with endoglycosidase H prior to crystallization. In the case of CtBGL, no endoglycosidase treatment was employed. Similarly, single GlcNAc molecules were found in AfβG (in Asn543 and Asn715) without any attempt at enzymatic cleavage. It is possible that, in these cases, the enzymes were subjected to endoglycosidase activity during the expression stage. The other seven N-glycosylation sites in CtBGL ranged from two to eight monosaccharides. The N-glycan at CtBGL Asn329 was located in domain I of the enzyme and consisted of eight monosaccharides (two GlcNAc, five α-d-Man, and one β-d-Man). Extra densities that could accommodate another two monosaccharide molecules were present in the electron-density map, but they were not modeled owing to lack of clarity. It was, nevertheless, the largest N-glycan structure in CtBGL and was involved in forming multiple H-bonds with the enzyme residues Thr293, Asp730, Val295, Gly734, and Tyr733 and a pi-sigma interaction with the Tyr733 residue.
The structurally equivalent Asn residue in AaBgl1 (Asn322) had a long glycan with eight Man and two GlcNAc that were also involved in extensive stabilizing interactions with domain I. N-glycans at positions Asn72 and Asn720 were located in domains I and III, respectively, with loop V residues in between them. They both participated in interactions with protein residues. The N-glycan at Asn72 was found to form conventional H-bonding interactions with Ala90, Tyr716, and Tyr718, and a pi-sigma interaction with Tyr715 from domain III. The N-glycan at Asn720 interacted with Arg710 via H-bonding and Tyr704 via weak van der Waals interactions. A long chain glycan moiety at Asn259 at the outer surface of the enzyme was found to interact via H-bonding with Ser35, Glu36, and Asp229.
Notably, protein glycans have also been shown to exhibit a potential binding affinity for polysaccharides such as cellulose [19] and aromatic compounds [20,21]. This suggests a possible role of N-glycans in promoting cellulose binding and also in protein-glycan interactions. Glycans in β-glucosidases can stabilize the crystal packing contacts and also participate in hydrogen-bonding interactions at the dimer interface [8]. There are two glycosylated Asn residues at the dimer interface of β-glucosidases that provide protein-glycan interactions between the two chains. One of those Asn residues corresponds to Asn531 in CtBGL. This Asn bears only two monosaccharide molecules, whereas, for comparison, the equivalent Asn residue in AaBGL1 has seven monosaccharides with the terminal ones able to reach the adjacent subunit. The second Asn residue that participates in dimer interface contacts is not present in CtBGL. The lack of these contacts could therefore contribute to reduced strength of intermolecular interactions in the crystal lattice, resulting in further instability of the CtBGL crystals.

Figure 5. Distribution of glycosylation sites in the CtBGL structure. GlcNAc is shown in blue, β-d-mannose in green, α-d-mannose in orange, and BGC in magenta.

Active Site
The active site is located in a shallow pocket near the interface of the first and second domains. A molecule of β-d-glucose (BGC) was fitted at the active site based on residual electron density observed in Fo-Fc difference maps. The source of the β-d-glucose is most likely the growth medium, as no β-d-glucose was used during crystallization or soaking. Two catalytic residues were identified, Asp287 (nucleophile) at the N-terminal TIM-barrel domain and Glu517 (acid/base) at the sandwich α/β domain II. Both catalytic residues were found conserved in the GH3 family members (Figure 6). The corresponding catalytic residues were Asp276/Glu505 in NcCel3A, Asp277/Glu505 in ReCel3A, Asp280/Glu509 in AaBgl1, Asp236/Glu441 in HjCel3A, and Asp242/Glu458 in TnBgl3B. The collapsed TIM-barrel fold of domain I is vital for proper access to the active site. It was found that near the active site, the second barrel β strand (Gly87 to Thr89) was much shorter and antiparallel, which makes the active site much wider and more accessible when compared with GH3 enzyme structures with a complete TIM-barrel fold [6]. An additional electron density present at the active site next to BGC was not interpretable and may suggest a bound buffer molecule, MPD from the crystallization mother liquor, or a partially bound glucose molecule.

CtBGL Thermostability
CtBGL has been found to be thermostable at 50 °C and to retain half of its activity after incubation at 65 °C for 55 min [22]. The enzyme also retains 29.7% of its activity after incubation at 70 °C for 10 min. Compared with most other β-glucosidases from thermophilic fungi, CtBGL exhibits comparable thermostability, as previously reported [22]. In contrast, A. fumigatus β-glucosidase has been found highly thermostable and able to retain most of its activity for at least 19 h at 65 °C [23].
Protein thermostability is usually hard to predict and no common mechanism has yet been identified [24,25]. Several factors of protein thermostability have been proposed that could provide some clues (Table 4). Solvent-accessible surface (SAS), charged residues, and glycosylation patterns are some of the key indicators. The structure of HjCel3A from the mesophilic fungus Hypocrea jecorina had the lowest SAS (22812 Å²), owing to the smaller number of residues and the lack of loop V. Similarly, loop V was absent in TnBgl3B and the SAS value was reduced. The SAS values for the other β-glucosidases, including CtBGL, were quite similar, i.e., around 28100 Å².
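To put the two stability figures on a common scale, one can convert them to apparent inactivation rate constants. The sketch below assumes simple first-order thermal inactivation, which is an assumption made here for illustration and is not stated in the source; under that assumption a half-life of 55 min at 65 °C and a residual activity of 29.7% after 10 min at 70 °C each define a rate constant k.

```python
import math


def k_from_half_life(t_half_min):
    """First-order rate constant (1/min) from a half-life in minutes."""
    return math.log(2) / t_half_min


def k_from_residual(residual_fraction, minutes):
    """First-order rate constant (1/min) from the fraction of activity
    remaining after a given incubation time."""
    return -math.log(residual_fraction) / minutes


def residual_activity(k, minutes):
    """Fraction of activity remaining after `minutes` for rate constant k."""
    return math.exp(-k * minutes)


k_65 = k_from_half_life(55.0)          # 65 °C, half-life of 55 min [22]
k_70 = k_from_residual(0.297, 10.0)    # 70 °C, 29.7% left after 10 min

print(f"k(65 °C) = {k_65:.4f} 1/min")
print(f"k(70 °C) = {k_70:.4f} 1/min, half-life = {math.log(2) / k_70:.1f} min")
```

If the first-order assumption holds, the apparent half-life drops from 55 min at 65 °C to roughly 6 min at 70 °C, which illustrates how steeply the stability falls off over this 5 °C interval.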

Charged residues in the structures can contribute to structural integrity and, in turn, to thermostability. The negatively (Asp and Glu) and positively (Arg and Lys) charged residues may provide a stability profile of the structure [26]. The numbers showed significant variations when the proteins were compared. In particular, HjCelA showed reduced numbers of positively and negatively charged residues, despite its high thermostability with an optimum temperature at 90 °C and an unfolding temperature of ~88 °C depending on the enzyme concentration [27]. The numbers were higher for the other thermophilic and mesophilic β-glucosidases. In addition, AfβG, despite its high thermostability, was also characterized by a similar content of charged residues, suggesting that the strength of individual ion-pair interactions may play a key role.
Finally, the glycosylation pattern found may also contribute to the thermostability of the enzyme by promoting interactions with amino acid residues. It has been shown that glycosylation enhances the solubility, reduces the aggregation, and increases the thermal stability of proteins [28,29]. The exact mechanism by which the glycosylation pattern affects the overall function and structure of proteins is not yet well understood. Analysis of protein structures deposited in the Protein Data Bank has suggested that N-glycosylation causes no significant local or global structural changes; however, it decreases the protein dynamics, thus leading to increased stability [26]. Aglycosylated proteins are, in general, found to be less stable, and therefore aggregate more easily than their glycosylated counterparts at certain temperatures. Thus, glycosylation has been suggested as a potential factor to enhance enzyme solubility, stability, and function [30]. Thermostability measurements in HjCel3A samples with different degrees of N-glycosylation revealed the same melting temperature (74.0 ± 0.2 °C), thus suggesting that the effect of glycosylation may be case-specific and other factors could play a role. Interestingly, TnBgl3B, which lacks glycosylation, exhibited thermostability, most likely as a result of the high numbers of charged residues. Nevertheless, the moderate glycosylation in CtBGL could not be ruled out as a contributing factor to the limited stability of the enzyme compared to other more thermostable β-glucosidases characterized by extensive glycosylation. Further studies are, however, required to better understand the role of glycosylation in β-glucosidases and in protein stability.
Table footnote: $ The enzyme was produced in Escherichia coli.

Protein Expression, Purification, and Crystallization
Protein expression and purification of the β-glucosidase from Chaetomium thermophilum CT2 was carried out as previously described [22]. Briefly, the enzyme (Uniprot id A6YRT4) was produced in Pichia pastoris GS115 cells and purified to homogeneity, as judged by SDS-PAGE, by ion-exchange chromatography on DEAE-Sepharose (Pharmacia, Uppsala, Sweden). β-Glucosidase activity was assayed with salicin using Miller's method and also detected in native polyacrylamide gel using 4-methylumbelliferyl-β-d-glucopyranoside [22]. The activity of the enzyme was 1.62 ± 0.20 U/mg (1 U corresponds to the release of 1 µM of glucose per min) from three independent measurements at optimum conditions of pH and temperature.

Protein Crystallization
Prior to crystallization, the enzyme solution was concentrated to ~10 mg/mL with Amicon® Ultra Centrifugal Filters (10,000 MW cut-off) (Millipore, MA, USA) in 10 mM HEPES-NaOH, pH 7.0 buffer. Crystals were obtained by the hanging-drop vapor diffusion method at 16 °C using a well solution of 35-45% v/v MPD (Sigma-Aldrich, St. Louis, MO, USA). The drops were prepared by mixing 2 µL of protein solution with an equal volume of well solution. The crystals grew as octahedra to a maximum size of approximately 0.06 × 0.06 × 0.08 mm³ within a period of 1 month.

Data Collection and Processing
Data were collected on the X13 beamline at EMBL-Hamburg (c/o DESY) from a single crystal under cryogenic (100 K) temperature using a MARCCD detector. The presence of MPD in the crystallization solution was sufficient for cryoprotection, thus no additional cryoprotectant was needed. One hundred and fifty diffraction images were collected from a single crystal with a rotation range of 0.45° per image. Data processing was carried out with XDS [32]. The crystal was found to belong to the tetragonal space group P4₁2₁2/P4₃2₁2. Assuming one molecule in the asymmetric unit, the Matthews coefficient VM [33] is 5.4 Å³/Da, corresponding to a solvent content of ~77%.
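The relation between the Matthews coefficient and the solvent content quoted above can be reproduced with the standard approximation in which the protein volume fraction is 1.23/VM. The constant 1.23 corresponds to a typical protein partial specific volume of about 0.74 cm³/g; it is a general approximation, not a value taken from this study.

```python
def solvent_fraction(matthews_coefficient):
    """Estimate the crystal solvent fraction from the Matthews coefficient
    (in Å^3/Da) using the approximation V_protein = 1.23 / V_M."""
    if matthews_coefficient <= 0:
        raise ValueError("Matthews coefficient must be positive")
    return 1.0 - 1.23 / matthews_coefficient


v_m = 5.4  # Å^3/Da, one molecule per asymmetric unit (value from the text)
print(f"solvent content = {100 * solvent_fraction(v_m):.0f}%")
```

With VM = 5.4 Å³/Da the estimate comes to about 77%, consistent with the value reported for one molecule in the asymmetric unit.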

Structure Determination and Refinement
Initial phases were obtained with molecular replacement using Phaser [34], as implemented in Phenix 1.15.2_3472 [35]. The crystal structure of Aspergillus aculeatus β-glucosidase in complex with castanospermine (PDB id 4iif; sequence identity 61.5%) was used as a search model after pruning side chains with Sculptor [36] based on sequence alignment considerations. Initially, the search was carried out assuming two molecules in the asymmetric unit, but no solution was produced. Based, however, on the statistics for one molecule (TFZ = 30.9), the search was limited to one molecule and a single solution was obtained in space group P4₁2₁2. Refinement was carried out using simulated annealing (1000 K) in Phenix with maximum likelihood as target function. The refinement was alternated with model visualization and rebuilding using Coot 0.8.9 [37]. Tight restraints were used to avoid overfitting of the structure to the data, owing to the low resolution. The progress of the refinement was monitored by the Rfree with 5% of the reflections used for the calculations [38]. High-resolution structures were used to assist the rebuilding in places with poor electron density and ambiguities in atom positions. Structure validation was performed with tools implemented in Phenix and Coot. The stereochemistry and conformation of the sugars were tested with Privateer [39]. Data collection and final refinement statistics are shown in Table 1. The atomic coordinates and the structure factors have been deposited in the Protein Data Bank under the accession code 6SZ6.
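As a reminder of what the Rcryst and Rfree statistics measure, the sketch below evaluates the conventional crystallographic R factor, R = Σ| |Fobs| − |Fcalc| | / Σ|Fobs|, separately for working reflections and for reflections flagged as a test set. The amplitudes and flags are hypothetical, no scale factor between observed and calculated amplitudes is applied, and the code is not meant to represent how Phenix computes these values internally.

```python
def r_factor(f_obs, f_calc):
    """Conventional R factor for paired observed and calculated amplitudes."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


def r_work_and_free(f_obs, f_calc, free_flags):
    """Return (R_work, R_free), where free_flags marks test-set reflections
    that were excluded from refinement."""
    work = [(o, c) for o, c, free in zip(f_obs, f_calc, free_flags) if not free]
    free = [(o, c) for o, c, free in zip(f_obs, f_calc, free_flags) if free]
    r_work = r_factor([o for o, _ in work], [c for _, c in work])
    r_free = r_factor([o for o, _ in free], [c for _, c in free])
    return r_work, r_free


# Hypothetical amplitudes; the last reflection plays the role of the test set.
f_obs = [10.0, 20.0, 30.0, 40.0]
f_calc = [9.0, 22.0, 30.0, 50.0]
flags = [False, False, False, True]
print(r_work_and_free(f_obs, f_calc, flags))
```

Because the test-set reflections never enter refinement, a gap between the two values that widens during refinement is the usual warning sign of overfitting, which is why Rfree was monitored here.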

Conclusions
The CtBGL structure was determined at 2.99 Å resolution and refined to good stereochemical and refinement statistics, despite the limited resolution. The structure exhibited a three-domain architecture with two linker regions, like other β-glucosidase structures. Variations in the length of the first linker region and the third domain were identified. CtBGL showed the highest structural similarity with NcCel3A. The catalytic residues at the active site were identified as Asp287 (nucleophile) and Glu517 (acid/base). CtBGL showed a low number of glycans at glycosylation sites compared to other heavily glycosylated GH3 β-glucosidases. Charged residues and the glycosylation pattern are suggested as potential contributing factors for the thermostability properties of CtBGL. The analysis presented in this study could offer new ideas towards further improvements in β-glucosidases for better use in biotechnological applications.