Evolutionary history and activity towards oligosaccharides and polysaccharides of GH3 glycosidases from an Antarctic marine bacterium

Glycoside hydrolases (GHs) are pivotal in the hydrolysis of the glycosidic bonds of sugars, which are the main carbon and energy sources. The genome of Marinomonas sp. ef1, an Antarctic bacterium, contains three GHs belonging to family 3. These enzymes have distinct architectures and low sequence identity, suggesting that they originated from separate horizontal gene transfer events. M-GH3_A and M-GH3_B were found to differ in cold adaptation and substrate specificity. M-GH3_A is a bona fide cold-active enzyme since it retains 20 % activity at 10 °C and exhibits poor long-term thermal stability. On the other hand, M-GH3_B shows mesophilic traits with very low activity at 10 °C (<5 %) and higher long-term thermal stability. Substrate specificity assays highlight that M-GH3_A is a promiscuous β-glucosidase mainly active on cellobiose and cellotetraose, whereas M-GH3_B is a β-xylosidase active on xylan and arabinoxylan. Structural analysis suggests that such functional differences are due to their differently shaped active sites. The active site of M-GH3_A is wider but has a narrower entrance compared to that of M-GH3_B. Genome-based prediction of metabolic pathways suggests that Marinomonas sp. ef1 can use monosaccharides derived from the GH3-catalyzed hydrolysis of oligosaccharides either as a carbon source or for producing osmolytes.


Introduction
Polar marine bacteria must face various stressful conditions [1]. Indeed, they are constantly exposed to low temperatures, which reduce membrane fluidity and hinder macromolecular interactions and enzyme kinetics. Moreover, they have to cope with additional stresses, such as high oxidative stress, high osmotic pressure and low nutrient availability [2]. Therefore, psychrophiles have evolved a multitude of adaptive strategies to counteract all these stressors [3]. Among these, a wide range of cold-active enzymes allow psychrophilic organisms to maintain high metabolic activity at low temperatures [4][5][6], and to survive in nutrient-poor environments such as Arctic and Antarctic marine sediments [7][8][9]. The organic matter in marine sediments of the polar regions is largely composed of high molecular weight polymers such as proteins and glycans, and its composition depends on several factors including the activity of bacterial communities or river runoff induced by the warming of soil's permafrost [10]. Glycans are polysaccharides produced by photosynthetic organisms, i.e. terrestrial plants and marine algae, thus representing the largest carbon reservoirs for marine environments. Recently, xylans from terrestrial plants have been found in Baltic Sea sediments, suggesting a transport of plant matter from land to sea [11].
Among hydrolytic enzymes, glycoside hydrolases (GHs) play a key role in the degradation of glycans and in their metabolism [12,13]. GHs catalyze the hydrolysis of glycosidic bonds and are classified by the CAZy database, based on their sequence identity, into 183 different families [14]. Among GHs, family 3 (GH3) groups promiscuous enzymes with β-glucosidase, β-xylanase, β-glucuronidase, β-N-acetylhexosaminidase and α-L-arabinofuranosidase activities. These enzymes are widespread in plants, fungi and bacteria, where they perform diverse functions including carbohydrate degradation, cell wall remodeling and defense against pathogens [15,16]. Most GH3s have a catalytic core formed by two domains with an (α/β)8 (TIM) barrel structure and an (α/β)6 sandwich structure, respectively. The (α/β)8 (TIM) barrel domain contains the Asp residue acting as the catalytic nucleophile, while the (α/β)6 sandwich domain contains a Glu acting as the catalytic acid/base residue in a double displacement mechanism. In addition to the catalytic core, the GH3 architecture sometimes displays accessory domains such as the fibronectin-like and the PA14 domains, whose functions are still unknown [17][18][19][20][21].
In this work, we investigated the structural and functional features of two GH3s identified in the genome of Marinomonas sp. ef1 (M-GH3s), an Antarctic bacterium [22,23]. Although these enzymes belong to the same glycoside hydrolase family, they show different evolutionary origins and activity towards natural oligosaccharides and polysaccharides.

Search for GH3s in the genome of Marinomonas sp. ef1
Genes coding for GHs were searched in the genome of Marinomonas sp. ef1 (NCBI: GCA_002806845) with hmmscan from HMMER v3.3.2 [24], using the family/subfamily profile hidden Markov models from dbCAN2 [25] and a restrictive e-value of e−30. The predicted molecular weight was determined with Expasy ProtParam [26].
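The filtering step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the search was written with HMMER's `--domtblout` option, that the profile names follow the `GH3.hmm` pattern, and that the threshold is read as 1e-30. The example rows and protein identifiers are hypothetical.

```python
# Sketch: keep proteins with a GH3 profile hit below an E-value cutoff.
# Assumptions: hmmscan --domtblout column layout; cutoff read as 1e-30;
# profile names of the form "GH3.hmm"; example rows are invented.

E_VALUE_CUTOFF = 1e-30

EXAMPLE_ROWS = [
    "# target name  accession  tlen  query name  accession  qlen  E-value ...",
    "GH3.hmm - 225 prot_0001 - 801 2.1e-65 218.3 0.1 1 1 3.0e-68 2.1e-65 218.0 0.1 2 224 60 285 59 286 0.97 -",
    "GH3.hmm - 225 prot_0002 - 400 4.0e-12 40.2 0.0 1 1 5.0e-15 4.0e-12 40.0 0.0 5 120 10 130 8 135 0.80 -",
    "GH1.hmm - 300 prot_0003 - 450 1.0e-90 300.1 0.0 1 1 2.0e-93 1.0e-90 299.9 0.0 1 300 5 440 4 445 0.99 -",
]


def parse_domtblout(lines):
    """Yield (profile_name, protein_id, full_sequence_evalue) per data row."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        # Column 0: target (profile) name; 3: query (protein) name;
        # 6: full-sequence E-value.
        yield fields[0], fields[3], float(fields[6])


def select_family(lines, family="GH3", cutoff=E_VALUE_CUTOFF):
    """Return sorted protein IDs with a hit to `family` at or below `cutoff`."""
    hits = set()
    for profile, protein, evalue in parse_domtblout(lines):
        if profile.split(".hmm")[0] == family and evalue <= cutoff:
            hits.add(protein)
    return sorted(hits)


if __name__ == "__main__":
    print(select_family(EXAMPLE_ROWS))
```

With the invented rows, only the protein whose GH3 hit passes the cutoff is kept; the weaker GH3 hit and the GH1 hit are discarded.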

Distribution of GH3s in Marinomonas species
The genomes of Marinomonas species available in the NCBI BioSample database (https://www.ncbi.nlm.nih.gov/genbank/, accessed on 17/05/2023) were annotated with dbCAN2 to identify the sequences of putative GH3s. The sequence identity between M-GH3s and GH3s from other Marinomonas species was determined using an identity matrix calculated with the pseqsid script (https://github.com/amaurypm/pseqsid).
The evolutionary lineage of Marinomonas species was inferred through phylogenetic analysis of the nucleotide sequences of 16S rDNA and of the gene encoding the β-subunit of bacterial RNA polymerase (rpoB). For each genome, multiple copies of the 16S rDNA genes were clustered at the 98 % sequence identity threshold and the resulting centroid was used for phylogenetic analysis performed by a Bayesian Monte Carlo Markov Chain method using BEAST v.1.10.4 [27]. The 16S rDNA and the rpoB genes of Oceanispirillum sanctuary and Pseudospirillum japonicum served as outgroups.
Divergence times of the two outgroup species were calibrated based on the data obtained from https://timetree.org/. According to these data, O. sanctuary and P. japonicum diverged 338 Mya and separated from the Marinomonas lineage 425 Mya. A single joint tree was built based on the 16S rDNA and the rpoB genes. A general time reversible matrix, specific to each gene, was used to model nucleotide substitution patterns. This matrix incorporated a (per-gene) proportion of invariant sites and a (per-gene) gamma-distributed rate variation with four categories. A strict molecular clock was employed, considering distinct evolution rates for each gene. The tree prior was set up as a calibrated Yule model, with time calibrations established based on the average of a normal distribution with standard deviation of 10 %. Default values were used for all other parameters. A Monte Carlo Markov Chain analysis was conducted, implementing three chains of 20 million steps each.
The initial 50 % was discarded as "burn-in", and sampling was performed every 2000 steps. Tracer v.1.7.2 [28] was used for verifying the convergence of parameters within each chain.
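As a quick check of how many posterior samples these settings leave, the arithmetic can be sketched as below. It assumes that state 0 is logged and that the state falling exactly on the burn-in boundary is kept; different tools may differ by one sample.

```python
# Sketch: number of logged and retained MCMC states per chain.
# Assumptions: state 0 is logged; the state at the burn-in boundary is kept.

def logged_and_retained(chain_length, sample_every, burnin_percent):
    """Return (logged, retained) sample counts for one chain."""
    logged = chain_length // sample_every + 1          # states 0, k, 2k, ...
    burnin_steps = chain_length * burnin_percent // 100
    first_kept_index = -(-burnin_steps // sample_every)  # ceiling division
    return logged, logged - first_kept_index


if __name__ == "__main__":
    logged, retained = logged_and_retained(20_000_000, 2000, 50)
    print(logged, retained, 3 * retained)  # per chain, per chain, three chains
```

Under these assumptions each chain contributes roughly 5000 post-burn-in samples, about 15,000 over the three chains.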

Evolutionary history of GH3 enzymes
A phylogenetic analysis of the GH3 family was performed to predict the functional properties of the M-GH3s and to explore their evolution within the family. The sequences of characterized GH3s were retrieved from the CAZy database (http://www.cazy.org/GH3.html, accessed 15/06/2023), while information regarding substrate specificity was collected from the literature. The 3D structure of every GH3 was acquired from either the PDB or AlphaFold (AF) databases (https://alphafold.ebi.ac.uk). For each GH3, the catalytic and accessory domains were manually trimmed and annotated (if average pLDDT >0.75 for AF models). To exclude GH3 enzymes with high levels of similarity, the catalytic domains were clustered at a 90 % sequence identity threshold with cd-hit 4.8.1 (https://github.com/weizhongli/cdhit). The resulting sequences, listed in Table S1, were aligned with mafft v.7.471 [29]. The DASH option was employed to use the structural information to guide multiple sequence alignment [30]. The alignment obtained by trimming insertions shared by fewer than 25 % of sequences was used to estimate a rooted maximum likelihood tree with IQ-Tree v.2.2.2.7 software [31]. The phylogenetic analysis used the non-time reversible protein substitution matrix NQ.pfam (estimated from Pfam version 31 database [32]) and included a gamma distributed rate variation with four categories. Branch supports were obtained by using 1000 ultrafast bootstrap replicates [33] and transfer bootstrap expectation [34].
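The column-trimming step can be illustrated with a short sketch. It assumes that "insertions shared by fewer than 25 % of sequences" means alignment columns in which fewer than 25 % of the sequences carry a residue; the tool the authors used for this step is not stated, and the toy alignment is hypothetical.

```python
# Sketch: drop alignment columns occupied by fewer than 25 % of sequences.
# Assumption: a column is an "insertion" when most sequences have a gap there.

def trim_sparse_columns(alignment, min_occupancy=0.25, gap_chars="-."):
    """alignment: dict of name -> aligned sequence (all of equal length)."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[n]) != length for n in names):
        raise ValueError("all aligned sequences must have the same length")
    keep = []
    for col in range(length):
        occupied = sum(1 for n in names if alignment[n][col] not in gap_chars)
        if occupied / len(names) >= min_occupancy:
            keep.append(col)
    return {n: "".join(alignment[n][c] for c in keep) for n in names}


if __name__ == "__main__":
    toy = {"a": "MK-AL", "b": "MKQAL", "c": "MK-AL", "d": "MK-A-", "e": "MK-AL"}
    # Column 3 holds a residue in 1 of 5 sequences (20 %) and is removed.
    print(trim_sparse_columns(toy))
```

Columns that are gapped in only a minority of sequences, such as the last one in the toy example, are kept.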

Identification of hypothetical metabolic pathways
The hypothetical metabolic pathways (https://biocyc.org/) were inferred from the Marinomonas sp. ef1 genome using GapSeq v1.2 [35]. The default parameters were retained and only "Good Blast" annotations of protein coding genes were considered. Only pathways 100 % complete are reported. OperonMapper [36] was used to identify operons of multiple genes involved in the same metabolic pathway of interest.

3D structure prediction
The 3D molecular models of M-GH3_A and M-GH3_B were predicted using AlphaFold v.2.3.2 [37], v3 model and ColabFold v.1.5.2 (https://github.com/sokrypton/ColabFold) [38], in the oligomeric state determined by size exclusion chromatography (SEC, see below). The default ColabFold parameters were retained, allowing the use of structural templates from PDB and a more thorough sampling by activating the dropout option and using 3 different seeds. Only monomeric models with pLDDT ≥0.90 or multimeric models with iPTM score ≥0.75 were retained and the best according to both metrics was selected. The sidechain torsion angles were refined using DiffPack [39], followed by an energy minimization and a short NVT classical molecular dynamics simulation (8000 steps) in TIP3P water molecules with 10 Å padding. This step was performed using OpenMM 7.7.0 [40] under the Amber ff14SB force field [41], with a time step of 2.0 fs and at 30 °C.

Per-residue substrate binding affinity estimate
In silico models of D-cellobiose and D-xylobiose were prepared using the Avogadro 1.2.0 software [42] and minimized by a steepest-descent algorithm under the general Amber force field [43]. The AM1-BCC charges were assigned to the ligands by using Antechamber with the semi-empirical quantum mechanics method within the AmberTools21 package. Cellobiose and xylobiose were docked with the Gnina v.1.0.3 software [44] to the refined 3D models of M-GH3_A and M-GH3_B. The docking box (a cube with 20 Å per side), representing the active site, was centered at the catalytic Asp (D232 in M-GH3_A and D288 in M-GH3_B). The exhaustiveness was set to 32 to sample 10 docking poses with default root-mean-square deviation (RMSD) for clustering. The CNN score was used to rank docking poses. According to the retaining catalytic mechanism [45], a docking pose was considered catalytically competent when the distances between: i) the Oγ of catalytic Asp and the C1 of the non-reducing end sugar monomer and ii) the Oδ of catalytic Glu (E419 in M-GH3_A and E529 in M-GH3_B) and the glycosidic oxygen (facing the catalytic Glu) are <4.5 Å. The best catalytically competent docking pose was selected for subsequent steps of refinement through AdaptivePELE v1.7.2 [46]. Five independent replicas were performed and the results averaged. The system was prepared for each replica by hydrogenating residues with pdb2pqr v.3.2.0 [47] at the optimum pH of each enzyme. An MD engine (OpenMM v.7.7.0 [40]) was employed for propagation, with a "production length" of 4 ns, reporting every 200 ps, and performing 5 iterations. The ff14SB force field and explicit water solvent were added with a 10 Å padding. The "minimization iterations" parameter was set to 8000. All other parameters were at default.
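The two-distance competence criterion and the choice of the best competent pose can be sketched as follows. The pose dictionary layout, key names and coordinates are hypothetical; in practice the coordinates would be read from the docking output. The sketch assumes that a higher CNN score ranks a pose better.

```python
# Sketch: filter docking poses by the two catalytic distances (< 4.5 Å)
# and pick the competent pose with the highest CNN score.
# Assumptions: pose dict keys and coordinates are illustrative only.
import math

DISTANCE_CUTOFF = 4.5  # Å, from the text


def is_competent(pose, cutoff=DISTANCE_CUTOFF):
    """True when both the nucleophile and acid/base distances are < cutoff."""
    nucleophile = math.dist(pose["asp_o"], pose["sugar_c1"])
    acid_base = math.dist(pose["glu_o"], pose["glycosidic_o"])
    return nucleophile < cutoff and acid_base < cutoff


def best_competent_pose(poses):
    """Return the competent pose with the highest CNN score, or None."""
    competent = [p for p in poses if is_competent(p)]
    return max(competent, key=lambda p: p["cnn_score"]) if competent else None


if __name__ == "__main__":
    poses = [
        {"name": "pose_1", "cnn_score": 0.90, "asp_o": (0, 0, 0),
         "sugar_c1": (6, 0, 0), "glu_o": (0, 0, 0), "glycosidic_o": (3, 0, 0)},
        {"name": "pose_2", "cnn_score": 0.70, "asp_o": (0, 0, 0),
         "sugar_c1": (3, 0, 0), "glu_o": (10, 0, 0), "glycosidic_o": (10, 4, 0)},
    ]
    print(best_competent_pose(poses)["name"])
```

In the toy input the top-scoring pose fails the nucleophile distance, so the lower-scoring but competent pose is returned.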
The binding free energy (ΔG_bind) of each GH3-substrate complex was estimated from the AdaptivePELE samples by applying molecular mechanics energies combined with the generalized Born and surface area continuum solvation (MM/GBSA), using MMPBSA.py [48] from the AmberTools21 package. The prEFED protocol was used to decompose the binding free energy at residue level as described in [49]. The ΔG_bind values were averaged over the replica means. Residues were considered hot spots of interaction if their average energy contribution was ≤ −1.0 kcal·mol⁻¹.
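The hot-spot rule (average of the replica means at or below −1.0 kcal·mol⁻¹) can be written as a small sketch. The residue labels and energy values below are invented for illustration and are not results from the study.

```python
# Sketch: average per-residue energy contributions over replica means and
# flag hot spots at <= -1.0 kcal/mol. All numbers below are made up.
from statistics import mean

HOT_SPOT_THRESHOLD = -1.0  # kcal/mol, from the text


def hot_spots(replica_means, threshold=HOT_SPOT_THRESHOLD):
    """replica_means: list of dicts (one per replica), residue -> mean energy.

    Returns residue -> average over replicas for residues at or below the
    threshold; only residues present in every replica are considered.
    """
    shared = set.intersection(*(set(r) for r in replica_means))
    averaged = {res: mean(r[res] for r in replica_means) for res in shared}
    return {res: val for res, val in sorted(averaged.items()) if val <= threshold}


if __name__ == "__main__":
    replicas = [
        {"RES_A": -2.0, "RES_B": -0.5},
        {"RES_A": -1.5, "RES_B": -0.4},
        {"RES_A": -1.9, "RES_B": -0.6},
    ]
    print(hot_spots(replicas))
```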

M-GH3s expression and purification
Sequences coding for M-GH3s were optimized for expression in Escherichia coli cells, chemically synthesized (Genscript, Piscataway, NJ, USA) and cloned in frame with a C-terminal 6× His-Tag into the pET21 plasmid (EMD, Millipore, Billerica, MA, USA) between the NdeI and XhoI sites. These plasmids were used to transform E. coli BL21(DE3) cells (EMD, Millipore, Billerica, MA, USA). Recombinant M-GH3s were produced in Zym 5052 medium [50] with 100 μg/L of ampicillin (Merck, Darmstadt, Germany), for 24 h at 25 °C. Cells from 1 L of culture were harvested by centrifugation at 4000 g for 10 min at 4 °C and the cell pellet was suspended in 15 mL of lysis buffer (50 mM sodium phosphate pH 8.0, 300 mM NaCl and 10 mM imidazole). Crude extracts were prepared by lysing the cells with a cell disruptor (Constant Systems Ltd., Daventry, UK) at 3.67·10⁵ atm (25 kpsi) and clarified by centrifugation at 6000 g for 10 min at 4 °C. M-GH3_A and M-GH3_B were purified from the soluble fraction of the cell lysate by metal ion affinity chromatography on Nickel-nitrilotriacetic acid agarose resin (Thermo Fisher Scientific, Waltham, MA, USA). M-GH3_C was extracted from the insoluble fraction of E. coli cells with 10 mL of extraction buffer (50 mM sodium phosphate pH 11.0, 300 mM NaCl and 8 M urea). Then, the pH of the solution was adjusted to pH 8.0 with HCl (0.1 M) and the solution was clarified by centrifugation at 5000 g for 10 min at room temperature. The clarified solution was loaded in a column containing 1 mL of Nickel-nitrilotriacetic acid agarose resin and a series of washes at decreasing concentrations of urea (from 8 M to 0 M) was carried out. Samples were eluted with 2 mL of elution buffer (50 mM sodium phosphate pH 8.0, 300 mM NaCl and 250 mM imidazole).
Elution fractions containing the highest protein concentration were pooled and buffer-exchanged with 100 mM sodium phosphate buffer (PB), pH 7, by pre-packed PD10 columns (GE Healthcare, Little Chalfont, UK). Samples were concentrated with Amicon Ultra centrifugal filters (Merck Millipore, Burlington, US) to a final concentration of 2 mg/mL of protein. Protein concentration was determined by Bradford protein assay (Bio-Rad, California, USA), using bovine serum albumin as a standard.

Activity assay
Enzymatic assays were carried out using a panel of substrates: para- para-nitrophenyl β-D-mannopyranoside (pNPMan) and para-nitrophenyl β-D-glucopyranosiduronic acid (pNPGlcA). Reactions containing 0.01 IU of enzyme were performed in PB and stopped, after 3 min, by adding an equal volume of 1 M sodium carbonate pH 11. The absorbance was measured at 405 nm (molar extinction coefficient: 18.6 mM⁻¹·cm⁻¹) using a Jasco V-770 UV/NIR spectrophotometer (JASCO Europe, Lecco, Italy). One unit of enzyme activity was defined as the amount of the enzyme catalyzing the formation of 1 μmol of para-nitrophenol per minute under saturating substrate conditions (10 mM) at 25 °C.
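The conversion from absorbance at 405 nm to enzyme units follows the Beer-Lambert law and can be sketched as below. The 1 cm path length, the reaction volume and the example absorbance are assumptions for illustration; only the extinction coefficient, the 3 min reaction time and the equal-volume stop come from the text.

```python
# Sketch: convert A405 of released para-nitrophenol into enzyme units (U).
# Assumptions: 1 cm path length; reaction volume and absorbance are examples.

EPSILON_MM_CM = 18.6  # mM^-1 cm^-1, from the text


def pnp_units(absorbance, reaction_volume_ml, minutes=3.0, path_cm=1.0, blank=0.0):
    """Return activity in U (µmol para-nitrophenol released per minute)."""
    # Concentration in the stopped mixture (mM), via Beer-Lambert.
    conc_mM = (absorbance - blank) / (EPSILON_MM_CM * path_cm)
    # The stop solution is added as an equal volume, doubling the volume.
    final_volume_ml = 2.0 * reaction_volume_ml
    micromol = conc_mM * final_volume_ml  # mM x mL = µmol
    return micromol / minutes


if __name__ == "__main__":
    # Hypothetical reading: A405 = 0.558 from a 0.5 mL reaction.
    print(round(pnp_units(0.558, 0.5), 4))
```

With these illustrative numbers the result is about 0.01 U, the amount of enzyme loaded per reaction in the assay described above.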
The optimal catalysis conditions were determined in PB using 10 mM of pNPGlc and pNPXyl as substrates, for M-GH3_A and M-GH3_B, respectively. The optimal pH of catalysis was measured in the pH range 3.0-10.0 in Britton-Robinson buffer, at 50 °C for M-GH3_A and at 60 °C for M-GH3_B. The optimal temperature of catalysis (T_opt) was recorded in the temperature range 10-90 °C, at pH 7.5 for M-GH3_A and at pH 6.5 for M-GH3_B.
The kinetic parameters were determined at optimal catalysis conditions on pNPGlc and pNPXyl from 0.1 mM to 20 mM for M-GH3_A and M-GH3_B, and on pNPGal in the range from 2 mM to 30 mM for M-GH3_A only. For each substrate concentration, four time points were obtained by stopping the reactions at 30-s intervals. The slope of the resulting linear regression was used to calculate V_0. To calculate the kinetic parameters, the V_0 values from three independent measurements were plotted against the substrate concentration and fitted with the ORIGINLAB software (OriginLab Corporation, Northampton, MA, USA), using the Michaelis-Menten equation. The resulting kinetic parameters were reported with their fitting errors.
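The two fitting steps (initial rate from a linear regression over the time points, then a Michaelis-Menten fit of V_0 against substrate concentration) can be sketched in plain Python. This is not the OriginLab procedure used by the authors: here V_max is solved in closed form for each trial K_M and K_M is found by a golden-section search on a log scale, which assumes a single minimum within the search bounds. The data in the usage example are synthetic.

```python
# Sketch: V0 from a time course, then a Michaelis-Menten least-squares fit.
# Assumptions: swapped-in fitting method (not OriginLab); synthetic data;
# the residual profile has one minimum within km_bounds.
import math


def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def fit_michaelis_menten(s, v, km_bounds=(1e-3, 1e3), iterations=200):
    """Return (Vmax, Km) minimizing the squared error of v = Vmax*s/(Km+s)."""

    def best_vmax(km):
        f = [x / (km + x) for x in s]
        return sum(vi * fi for vi, fi in zip(v, f)) / sum(fi * fi for fi in f)

    def sse(km):
        vmax = best_vmax(km)
        return sum((vi - vmax * x / (km + x)) ** 2 for x, vi in zip(s, v))

    lo, hi = math.log10(km_bounds[0]), math.log10(km_bounds[1])
    phi = (math.sqrt(5) - 1) / 2
    for _ in range(iterations):
        a = hi - phi * (hi - lo)  # left interior point
        b = lo + phi * (hi - lo)  # right interior point
        if sse(10 ** a) < sse(10 ** b):
            hi = b
        else:
            lo = a
    km = 10 ** ((lo + hi) / 2)
    return best_vmax(km), km


if __name__ == "__main__":
    # Four time points at 30-s intervals (synthetic product amounts).
    print(slope([30, 60, 90, 120], [0.1, 0.2, 0.3, 0.4]))
    conc = [0.1, 0.5, 1, 2, 5, 10, 20]
    rates = [10 * c / (2 + c) for c in conc]  # synthetic: Vmax = 10, Km = 2
    print(fit_michaelis_menten(conc, rates))
```

A dedicated nonlinear least-squares routine would also return the fitting errors that the authors report; this sketch only recovers the point estimates.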

M-GH3s thermal stability assays
Thermal denaturation experiments were carried out by monitoring the circular dichroism (CD) signal at 200 nm in the temperature range from 10 °C to 90 °C with a Jasco J815 spectropolarimeter (JASCO Europe, Lecco, Italy). Measurements were performed in a 0.1 cm pathlength quartz cuvette with a temperature slope of 1 °C/min.
Long-term thermal stability was assessed by measuring the residual activity at T_opt, after incubating M-GH3s (protein concentration: 0.5 mg/mL) in PB, pH 7.5, at 5 °C, 25 °C and 35 °C. Experiments were performed in triplicate and reported as mean ± standard deviation.

Polysaccharides degradation
Hydrolysis of polysaccharides and oligosaccharides was evaluated on carbohydrates purchased from Megazyme (Megazyme International, Bray, Ireland), Merck (Merck, Darmstadt, Germany) and VWR Chemicals (VWR International, Radnor, USA), including carboxymethylcellulose (VWR code: 22525.296), xylan (Megazyme code: P-XYLNBE), arabinoxylan (Megazyme code: P-WAXYL), xyloglucan (Megazyme code: P-XYGLN), mannan (Megazyme code: P-MANIV), κ-carrageenan (Sigma-Aldrich code: 22048), galactomannan (Megazyme code: P-GGMMV), cellobiose (Sigma-Aldrich code: 1.02352) and cellotetraose (Megazyme code: O-CTE). Reactions containing 1 % of polysaccharide or oligosaccharide and 1 mg/mL of each purified M-GH3 were performed in PB at 25 °C, at 800 rpm mixing speed in a thermal shaker (Eppendorf, Hamburg, Germany). After 2 h of incubation, hydrolysis products were analyzed by high-performance anion-exchange chromatography (HPAEC) on a Dionex ICS-6000 Ion Chromatography System coupled with pulsed amperometric detection (PAD) and equipped with a CarboPac PA210 column (Dionex Corporation, CA, USA). Elution was performed in a KOH gradient. An initial phase of 6.5 min of isocratic elution at 12 mM was followed by a linear gradient from 12 mM to 100 mM in 5 min. 100 mM KOH was held for 6 min, then the initial condition of 12 mM KOH was reached in 0.5 min. Elution was performed at a flow rate of 0.6 mL/min. Calibration curves were prepared using pure monosaccharides (glucose, xylose, arabinose), dissolved in Milli-Q water. Chromeleon® (6.8) software was utilized for data processing. The HPAEC chromatograms of standard glucose, arabinose and xylose are shown in Fig. S2.

Fig. 2. Molecular phylogeny of GH3s. The rooted maximum likelihood phylogenetic tree was obtained by aligning the catalytic domains of 300 characterized GH3s extracted from the CAZy database (http://www.cazy.org/GH3.html). To compute the branch support values, 1000 ultra-fast bootstrap replicates were performed, and support values are reported if <100. The GH3s from Marinomonas sp. ef1 are underlined. Insertion, deletion, and translocation of the domains are reported with respect to the inferred evolutionary history of GH3 catalytic domains. The phylogenetic tree was generated with FigTree v1.4.4 (https://github.com/rambaut/figtree/releases) and customized. The table contains the architectures of GH3s manually annotated from the 3D structures available in PDB or from AF predicted 3D structures (average domain-wise pLDDT >0.75) available in AlphaFold-DB (https://alphafold.ebi.ac.uk). Numbers in bold indicate the cluster number in the phylogenetic tree, while those in brackets indicate the number of sequences with that specific architecture, in case more than one is present per cluster.
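The KOH elution program used for HPAEC (12 mM isocratic for 6.5 min, a 5 min linear ramp to 100 mM, a 6 min hold, then a return to 12 mM in 0.5 min) can be expressed as a piecewise function. The sketch assumes that the final return to 12 mM is linear and that the program ends at 18 min; neither is stated explicitly in the text.

```python
# Sketch: KOH concentration (mM) as a function of run time (min).
# Assumptions: linear return to 12 mM; program ends at 18.0 min.

def koh_mM(t_min):
    """Return the programmed KOH concentration at time t_min."""
    if t_min < 0 or t_min > 18.0:
        raise ValueError("time outside the 0-18 min program")
    if t_min <= 6.5:                       # isocratic phase
        return 12.0
    if t_min <= 11.5:                      # linear gradient, 12 -> 100 mM
        return 12.0 + (100.0 - 12.0) * (t_min - 6.5) / 5.0
    if t_min <= 17.5:                      # hold at 100 mM
        return 100.0
    return 100.0 - (100.0 - 12.0) * (t_min - 17.5) / 0.5  # re-equilibration


if __name__ == "__main__":
    for t in (0.0, 6.5, 9.0, 11.5, 15.0, 18.0):
        print(t, koh_mM(t))
```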
To study the evolutionary history of M-GH3s and infer their substrate specificity, a phylogenetic analysis was performed using a dataset containing all the characterized GH3s available in the CAZy database. The resulting rooted tree shows two main lineages (Fig. 2), one containing exclusively enzymes with β-glucosidase activity (clusters 1-7), and the other, more heterogeneous, grouping mainly β-glucosidases (clusters 9, 10, 12, 13 and 15-20), β-xylosidases (clusters 8 and 11) and β-N-acetylhexosaminidases (cluster 14). M-GH3_A, M-GH3_B and M-GH3_C are nested in clusters 1, 10 and 13, suggesting they have β-glucosidase, β-xylosidase and β-glucosidase activity, respectively. The sequence identity between the M-GH3s and the members of each cluster ranges from 31 % to 52 % (Table S2).

Fig. 3. Biochemical features of M-GH3s. Effects of pH on the activity of M-GH3_A (A) and M-GH3_B (B). Temperature profile of M-GH3_A (C) and M-GH3_B (D). The activity of M-GH3_A and M-GH3_B was monitored using pNPGlc and pNPXyl as substrates, respectively. All the experiments were performed in quadruplicate and the shadowed area refers to the standard deviation of the data (n = 4). SEC analysis of M-GH3_A (E) and M-GH3_B (F). SEC was performed in PB; one of three independent measurements is shown. The red dots represent the MW estimated from three independent measurements using the calibration curve shown in Fig. S1.
The architecture of known GH3s (Fig. 2) includes a two-domain catalytic core consisting of an N-terminal catalytic domain (in magenta in Fig. 2), a C-terminal catalytic domain (in orange in Fig. 2), and a fibronectin-like domain (in cyan in Fig. 2).Notably, the fibronectin-like domain is lacking in clusters 13, 14 and 15, whereas it is duplicated in cluster 9. Additional domains observed in the GH3s architectures include the C-terminal domain (inserted in cluster 1), the PA14 domain (clusters 4, 6 and 8), and the glutathione S-transferase-like domain (in cluster 10).The architecture of each M-GH3 reflects that of the cluster to which it belongs.

M-GH3s have different biochemical features
M-GH3s were recombinantly produced in Escherichia coli cells. M-GH3_A and M-GH3_B were obtained as soluble proteins and purified by affinity chromatography with a yield of 5 mg and 15 mg per liter of culture, respectively. Recombinant M-GH3_C was insoluble, and the refolding of solubilized aggregates resulted in a partially folded and inactive protein (data not shown). For this reason, we will describe the structural and functional characterization of M-GH3_A and M-GH3_B.
Activity assays point out that M-GH3_A and M-GH3_B exhibit the highest activity at pH 7.5 and 6.5, respectively (Fig. 3A and B). M-GH3_A shows the highest activity at 50 °C and retains 20 % activity at 10 °C, whereas M-GH3_B has a T_opt of 60 °C and maintains only 5 % activity at 10 °C (Fig. 3C and D).
The thermal stability of M-GH3s was investigated by combining thermal denaturation experiments with long-term thermal stability assays. Thermal denaturation experiments, performed by CD spectroscopy, show that M-GH3_A (T_m: 66.8 ± 0.9 °C, Fig. 4A) has a higher unfolding transition midpoint temperature than M-GH3_B (T_m: 59.5 ± 1.3 °C, Fig. 4B). Long-term thermal stability assays, carried out at 5 °C, 25 °C and 35 °C, suggest that M-GH3_A is more thermolabile than M-GH3_B. Indeed, M-GH3_A completely loses its activity after 6 days of incubation at 5 °C and after 8 and 4 h at 25 °C and 35 °C, respectively (Fig. 4B-D). On the other hand, M-GH3_B maintains its activity for 7 days at all tested temperatures (Fig. 4B-D). Overall, our results indicate that M-GH3_A is a bona fide cold-active enzyme, while M-GH3_B is endowed with some mesophilic traits such as thermostability and low activity in the cold.

M-GH3s display different substrate specificity
The hydrolytic activity of M-GH3s was tested on para-nitrophenyl glycosides, cellobiose, cellotetraose and polysaccharides (cellulose, laminarin, xylan, arabinoxylan, mannan, galactomannan and κ-carrageenan). M-GH3_A exhibits the highest specific activity on pNPGlc, pNPGal, pNPXyl, pNPFuc and pNPClb, while displaying lower activity on pNPMan, pNPAra and pNPGlcA, and no activity on pNPαGlc (Table 1). On the other hand, M-GH3_B exhibits a narrower substrate specificity, displaying high specific activity on pNPXyl and pNPAra, poor activity on pNPGlc and negligible or no activity on other substrates (Table 1). The analysis of the kinetic parameters indicates that M-GH3_A has a higher catalytic efficiency (kcat/K_M) towards pNPGlc than pNPXyl, suggesting that this enzyme is a β-glucosidase rather than a β-xylosidase (Table 2). In contrast, the catalytic efficiency of M-GH3_B towards pNPXyl is 2 and 14 times higher than those determined with pNPAra and pNPGlc as substrates, respectively (Table 2), indicating that this enzyme is likely a β-xylosidase.
Both M-GH3s degrade xylan and arabinoxylan (Fig. 5), whereas they are not active on mannan, galactomannan, laminarin and κ-carrageenan (Table 1). The analysis of xylan and arabinoxylan degradation products indicates that both enzymes release arabinose and/or xylose, although with different yields (Fig. 5, Table 1). This suggests that both M-GH3s act as exoglycosidases and are also active on α1-3 or α1-2 L-arabinofuranosidic bonds (Fig. 5E and F).
The hydrolysis of cellulose and its derivatives, namely cellobiose and cellotetraose, is more complex (Fig. 6 and Table 1). While M-GH3_A is catalytically active towards both cellobiose (glucose yield: 3.2 ± 0.6 g/L) and cellotetraose (glucose yield: 251.9 ± 6.2 mg/L), M-GH3_B does not show significant activity towards these compounds. Both enzymes are inactive towards cellulose.
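To put the reported glucose yields in perspective, they can be compared with the glucose that complete hydrolysis would release. The sketch assumes that "1 %" of substrate means 1 % w/v (10 g/L) and uses standard molar masses; the resulting fractions are a back-of-the-envelope estimate, not values reported by the authors.

```python
# Sketch: reported glucose yield as a fraction of the theoretical maximum.
# Assumptions: 1 % substrate = 10 g/L (w/v); molar masses in g/mol.

GLUCOSE_G_PER_MOL = 180.16
# substrate -> (molar mass, glucose units released on complete hydrolysis)
SUBSTRATES = {"cellobiose": (342.30, 2), "cellotetraose": (666.58, 4)}


def fraction_of_theoretical_glucose(substrate, glucose_g_per_l, substrate_g_per_l=10.0):
    """Return measured glucose divided by the glucose from full hydrolysis."""
    molar_mass, n_glucose = SUBSTRATES[substrate]
    max_glucose = substrate_g_per_l / molar_mass * n_glucose * GLUCOSE_G_PER_MOL
    return glucose_g_per_l / max_glucose


if __name__ == "__main__":
    print(round(fraction_of_theoretical_glucose("cellobiose", 3.2), 3))
    print(round(fraction_of_theoretical_glucose("cellotetraose", 0.2519), 3))
```

Under these assumptions, roughly 30 % of the glucose available in cellobiose and about 2 % of that in cellotetraose is released within the 2 h incubation.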
The hypothetical metabolic pathways of monosaccharides derived from GH3-catalyzed hydrolysis (i.e. glucose, xylose and arabinose) were predicted by genome analysis using GapSeq v1.2 [35]. While glucose can enter the glycolytic pathway (Fig. S3), two main pathways can be hypothesized for xylose and arabinose metabolism. The D-xylose metabolic pathway (BioCyc ID: XYLCAT-PWY, Fig. S3) includes the xylA gene coding for a xylose isomerase and an operon that contains genes responsible for xylose metabolism and transport (Fig. S3). The xylulose 5-P produced at the end of this pathway can then enter other metabolic pathways, such as the pentose phosphate pathway. The L-arabinose metabolic pathway (BioCyc ID: PWY-5515, Fig. S3) is more elusive since the genes putatively involved are distributed across three different operons. It was hypothesized that the enzymes involved in arabinose metabolism collectively convert L-arabinose to xylitol, an osmolyte typically associated with cold stress resistance [51,52].
Overall, the results demonstrate that both M-GH3s have exo-activity with distinct specificities towards colorimetric and natural substrates. M-GH3_A is a β-glucosidase with a broad substrate specificity, while M-GH3_B is a β-xylosidase that is also active on α1-3 or α1-2 L-arabinofuranosidic bonds. It is worth noting that the experimental results support the activities predicted by the phylogenetic analysis. In addition, the monosaccharides resulting from GH3-catalyzed hydrolysis can be used by Marinomonas sp. ef1 either as a carbon source or for producing osmolytes.

Table 2
Kinetics parameters of M-GH3s. The kinetics parameters were determined at optimal catalysis conditions.

Different substrates-active site interactions determine the specificity of M-GH3s
To investigate the structural reason for the different substrate specificity of M-GH3_A and M-GH3_B, we modeled their 3D structures with AlphaFold [37] and performed molecular docking simulations. The enzyme-substrate complex was refined with AdaptivePELE coupled with a per-residue end-state binding free energy estimate. The per-residue accuracy value indicates good quality of both models (pLDDT of 0.949 for M-GH3_A and of 0.944 for M-GH3_B; iPTM score of 0.946 for M-GH3_B), making them suitable for further analyses. M-GH3_A is monomeric and consists of four domains (Figs. 3E and 7A): a (β/α)8 TIM barrel domain (residues G61 to L286, in magenta), an (α/β)6 sandwich domain (residues V319 to Y549, in orange), a fibronectin-like domain (residues F552 to A669, in cyan), and a long linker (ca. 40 amino acids) followed by an additional C-terminal domain (residues L670 to I800, in gray). The fibronectin-like and the C-terminal domains are additional domains with unknown structural and functional roles. The GH3s belonging to cluster 1 whose 3D structures are known, namely BglB from Acetivibrio thermocellus ATCC 27405 (PDB: 7MSE, sequence identity: 48.4 %) and PstG from Paenibacillus relictisesami (PDB: 8J9F, sequence identity: 49.8 %), are dimers in which the quaternary structure is stabilized by the interaction of the C-terminal domain of one protomer with the (β/α)8 TIM barrel domain of the other protomer [53]. The monomeric state of M-GH3_A is probably due to the length of the linker connecting the fibronectin-like and C-terminal domains, which is 30 amino acid residues longer than those of BglB and PstG.
M-GH3_B is a dimer, and each monomer consists of three domains (Figs. 3F and 7B): a (β/α)8 TIM barrel domain (residues T23 to V354, in magenta), an (α/β)6 sandwich domain (residues V390 to Y648, in orange) and a fibronectin-like domain (residues E686 to A755, in cyan). The (β/α)8 TIM barrel domain and the (α/β)6 sandwich domain form the catalytic core of both enzymes. The quaternary structure of M-GH3_B is similar to that of the GH3 from Thermotoga maritima (PDB: 7ZB3), which belongs to cluster 11 and shares 44.2 % of sequence identity.
The putative catalytic residues are an Asp residue (D232 in M-GH3_A and D288 in M-GH3_B), which acts as a nucleophile, and a Glu residue (E419 in M-GH3_A and E529 in M-GH3_B), which acts as an acid/base residue. In both M-GH3 enzymes, a lid loop, spanning residues Q53 to A64 in M-GH3_A and R47 to I60 in M-GH3_B (in yellow in Fig. 7A and B), probably controls the access to the active site. In M-GH3_A, this lid loop, along with the additional C-terminal domain, completely covers the entrance of the active site. Conversely, the absence of the additional C-terminal domain in M-GH3_B results in a small opening that might help the access of polysaccharides into the active site.
Molecular docking simulations were employed to investigate the interactions between M-GH3s and cellobiose/xylobiose, the substrates that support the highest specific activities. Through this analysis, we identified the amino acid residues that are predicted to interact with both cellobiose and xylobiose. Molecular docking simulations revealed differences in the substrate binding residues of the two enzymes, which are F27, D45, R51, R164, M165, W233, S352, M765 and F777 in M-GH3_A (Fig. 8A), and W35, R73, M117, H222, D288, Y289, Y424, H429, L432, L521, F522 in M-GH3_B (Fig. 8B). The same residues were identified using a complementary approach, namely AF2BIND [54], which makes us confident about our docking analysis. In particular, M-GH3_A residues D45 and W233 have a more negative binding free energy (ΔG_bind) when interacting with cellobiose than with xylobiose and could play a key role in the interaction between M-GH3_A and this sugar, making it favorable compared to that with xylobiose (Fig. 8C). The conservation analysis indicates that these two residues are highly conserved in cluster 1 enzymes (Fig. 8E) and play a key role in the coordination of co-crystallized glycerol and glucose molecules contained in the active site of PstG and of GlyA1, respectively (Fig. S4A) [53]. On the other hand, in M-GH3_B both substrates show similar interaction energies, with the conserved residues R73, E111, Y424 and H429 contributing the most to the positioning of the sugar moieties in the active site (Fig. 8D and E). Slight differences in the binding modes of the two substrates were observed; the distance between the Oγ of the catalytic Asp and the C1 of the disaccharide suggested that for M-GH3_B the xylobiose is in a catalytically more favorable position than cellobiose (Table 3). These residues are conserved and are involved in the coordination of a xylobiose molecule co-crystallized with the GH3 from T. maritima (PDB: 7ZB3) (Fig. S4B).

Discussion
Marinomonas sp. ef1 is a psychrotolerant bacterium isolated from the microbial consortium of Euplotes focardii, an Antarctic marine ciliate [22,55]. In addition to the strict temperature requirements, it is known that the cold environment imposes adaptation to low nutrient availability [2]. Therefore, among hydrolytic enzymes, GHs play an important role in the degradation of environmental poly- and oligosaccharides [12]. Marinomonas sp. ef1 has 34 genes coding for putative GHs classified in 19 different families. An insight into their heterogeneity is provided by the characterization of M-GH1 and M-GH42 and the observation of their different thermal properties and their substrate specificity towards β-galactosidic (M-GH1 and M-GH42) and β-glucosidic bonds (M-GH1) [56,57].
This study focused on GH3 enzymes identified in the genome of Marinomonas sp. ef1 and conserved in phylogenetically related Marinomonas spp. isolated from cold environments. GH3 is one of the largest families in the CAZy database and includes enzymes with β-glucosidase, β-xylosidase and N-acetylhexosaminidase activities. Usually, these enzymes show exo-activity and act in synergy with endo-glucosidases (e.g. GH5, GH6 and GH7) and endo-xylanases (e.g. GH8, GH10, GH11 and GH30) in the degradation of polysaccharides [58]. Since GH3s have been frequently found in genomes and metagenomes isolated from hot environments [59][60][61], it can be supposed that this enzyme family plays a key role in polysaccharide degradation and adaptation to extreme environments.
Typically, cold-active enzymes are characterized by activity at low temperatures and low thermal stability, and they undergo thermal inactivation before any significant change in their secondary structure; this behavior is evidenced by the so-called temperature gap (T_GAP), namely the difference between T_M and T_opt [5,62,63]. Our results indicate that M-GH3_A and M-GH3_B display contrasting thermal and catalytic properties. M-GH3_A is a bona fide cold-active enzyme with 20 % of activity at 10 °C and a T_GAP of 16.8 °C. M-GH3_B, while retaining 5 % of activity at 10 °C, exhibits mesophilic properties, i.e. high long-term thermal stability and a temperature of inactivation coincident with that triggering the loss of secondary structure (T_GAP: -0.5 °C). Overall, the biochemical features of these two enzymes combined with their evolutionary history suggest that M-GH3s have different phylogenetic origins and were probably acquired during the evolution of Marinomonas species by separate events of horizontal gene transfer and subsequently lost in some lineages.
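The temperature gap used in this paragraph is a simple difference, and the way it separates the two enzymes can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the 1 °C tolerance are hypothetical and are not taken from the study; only the two gap values (16.8 °C and -0.5 °C) come from the text.

```python
def temperature_gap(t_m: float, t_opt: float) -> float:
    """Return the temperature gap, T_M minus T_opt, in °C."""
    return t_m - t_opt


def inactivates_before_unfolding(t_gap: float, tolerance: float = 1.0) -> bool:
    """True when T_opt lies clearly below T_M (gap larger than the tolerance).

    The 1 °C tolerance is an arbitrary assumption, not a value from the paper.
    """
    return t_gap > tolerance


# Temperature gaps reported in the text (°C).
reported_gaps = {"M-GH3_A": 16.8, "M-GH3_B": -0.5}
classification = {
    name: inactivates_before_unfolding(gap) for name, gap in reported_gaps.items()
}
```

Under this assumed tolerance, M-GH3_A is flagged as losing activity well before unfolding, the behavior the text associates with cold-active enzymes, while M-GH3_B is not.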
In terms of substrate specificity, both M-GH3s are exo-acting enzymes, with M-GH3_A being a promiscuous β-glucosidase and M-GH3_B a β-xylosidase with a narrow substrate specificity. The divergence in the substrate specificity of these two enzymes is probably due to the different shape of the catalytic chamber and its entrance, as suggested by our 3D models. In more detail, the catalytic chamber of M-GH3_A is predicted to be wider but to have a narrower entrance than that of M-GH3_B, resulting in negligible activity towards polysaccharides and a broad substrate specificity towards relatively small molecules such as cellobiose and cellotetraose. The narrower entrance of M-GH3_A is likely the result of the interaction between the C-terminal domain and the (β/α)8 TIM barrel domain, which is enabled by the length and flexibility of the linker connecting the fibronectin-like and C-terminal domains. Structural and sequence analysis suggests that the size of the catalytic chamber and the C-terminal domain may serve as distinctive traits shaping the evolutionary trajectory of M-GH3. Intriguingly, the phylogenetic analysis of characterized GH3s reveals the existence of many distinct subfamilies, also grouped in at least four classes, based on substrate specificity towards β-glucans (clusters 1-7), xylans (clusters 8 and 11), and N-acetyl-β-D-hexosaminides (cluster 14). Overall, our results indicate that substrate specificity within the GH3 family can be predicted by phylogenetic analysis. It should be noted that substrate specificity data are not available for all GH3s, and atypical activities, such as β-glucuronidase [64], may be underestimated. Although the correlation between the phylogenetically conserved residues that form the catalytic chamber and the interaction model between the enzyme and the substrate appears to be a promising tool for assessing the specificity of new GH3s and their classification, further structural studies are necessary to strengthen this approach.
M-GH3s lack a signal peptide for secretion, suggesting intracellular activity. They likely play a crucial role in the intracellular hydrolysis of oligosaccharides resulting from the degradation or breakdown of cellulose, xylan, and arabinoxylan by extracellular enzymes secreted by Marinomonas sp. ef1 or other bacteria belonging to the microbial consortium of Euplotes focardii [65,66]. Genome analysis revealed various operons that probably govern the catabolism of xylose and arabinose. Notably, canonical polysaccharide utilization loci [10,67] were found to be absent. Overall, our research indicates that Marinomonas sp. ef1 possesses a variety of GHs involved in the hydrolysis of glycosidic bonds, such as β-galactosidic, β-glucosidic and β-xylosidic sugar bonds [56,57].
In conclusion, this study presents a new method for annotating genes that may encode hydrolytic enzymes of the GH3 family. Additionally, it clarifies the physiological function of these enzymes in the adaptation of Antarctic bacteria, as demonstrated by Marinomonas sp. ef1.

Declaration of competing interest
The authors declare the following financial interests/personal relationships which may be considered as potential competing interests. Marco Mangiagalli reports financial support was provided by University of Milano-Bicocca. Marina Lotti reports financial support was provided by University of Milano-Bicocca. Salvatore Fusco reports financial support was provided by University of Verona. If there are other authors, they declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Fig. 1. Distribution of GH3s in Marinomonas spp. Sequences encoding putative GH3s were extracted from the genomes of Marinomonas spp. available in the NCBI BioSample database. The heatmap on the right displays the sequence identity of Marinomonas spp. GH3s in comparison to M-GH3s. White cells represent enzymes with <40 % identity with M-GH3s or those missing in the genome. The phylogeny of Marinomonas spp. was inferred from the 16S rDNA and rpoB genes using BEAST v1.10.4 software [33] and two outgroup time calibrations from Timetree of Life. The branch lengths represent the median of the posterior distribution. The posterior distribution of nodes is reported if <1.0. Psychrophilic and mesophilic species are highlighted with blue and gray dots, respectively; empty dots indicate Marinomonas spp. isolated from sites whose environmental conditions are unknown. The phylogenetic tree was generated with FigTree v1.4.4 (https://github.com/rambaut/figtree/releases). Maa: millions of years ago.

Fig. 2. Molecular phylogeny of GH3s. The rooted maximum likelihood phylogenetic tree was obtained by aligning the catalytic domains of 300 characterized M-GH3s extracted from the CAZy database (http://www.cazy.org/GH3.html). To compute the branch support values, 1000 ultra-fast bootstrap replicates were performed, and support values are reported if <100. The GH3s from Marinomonas sp. ef1 are underlined. Insertions, deletions, and translocations of the domains are reported with respect to the inferred evolutionary history of GH3 catalytic domains. The phylogenetic tree was generated with FigTree v1.4.4 (https://github.com/rambaut/figtree/releases) and customized. The table contains the architectures of M-GH3s manually annotated from the 3D structures available in PDB or from AF-predicted 3D structures (average domain-wise pLDDT >0.75) available in AlphaFold-DB (https://alphafold.ebi.ac.uk). Numbers in bold indicate the cluster number in the phylogenetic tree, while those in brackets indicate the number of sequences with that specific architecture, in case more than one is present per cluster.

Fig. 4. Thermal stability of M-GH3s. A) Thermal stability of M-GH3s determined by CD spectroscopy. Ellipticity values were recorded at 205 nm during heating from 10 to 90 °C. The initial CD signal was taken as 100 % for normalization. Long-term thermal stability was measured by incubating enzymes at 5 °C (B), 25 °C (C) and 35 °C (D). M-GH3_A (black line), M-GH3_B (red line). All the experiments were performed in quadruplicate and the shadowed area refers to the standard deviation of the data (n = 4).

Fig. 5. Polysaccharide degradation. Degradation of xylan in the presence of M-GH3_A (A) and M-GH3_B (B). Hydrolysis of arabinoxylan in the presence of M-GH3_A (C) and M-GH3_B (D). Reactions were carried out in triplicate at 25 °C under shaking for 2 h and analyzed with HPAEC. The patterns of xylan and arabinoxylan degradation are reported in panels E and F, respectively. Ara = arabinose; Xyl = xylose. The chromatograms of the standards are reported in Fig. S2.

Fig. 7. 3D models of M-GH3s. 3D models of M-GH3_A (A) and M-GH3_B (B) predicted with AF2 (see main text) represented in surface style. The domains are colored according to the architecture reported in Fig. 2. The active site containing xylobiose and cellobiose is represented in ribbon style. The oligomerization state of M-GH3s was determined by SEC analysis (visualized on the left). Models were rendered using Pymol v.2.5.0 (Schrödinger, LLC, New York, NY).

Fig. 8. Interactions of cellobiose and xylobiose substrates with M-GH3s. In silico docking analysis of M-GH3_A with cellobiose (A) and of M-GH3_B with xylobiose (B). Estimation of end-state MM-GBSA average binding free energy (ΔG_bind) of M-GH3_A (C) and M-GH3_B (D) in complexes with cellobiose and xylobiose. Only residues for which the ΔG_bind was < −0.5 kcal/mol for at least one enzyme-substrate combination are reported. A dashed line indicates the threshold used to identify hotspots of interaction. The error bars report SD from three independent MD AdaptivePELE simulations. (E) The evolutionary conservation of interacting residues in the M-GH3 clusters 1 and 11 is visualized as sequence logos. The most relevant sites for comparison are indicated by black arrows.

Table 1
Substrate specificity of M-GH3s. Degradation yields were determined by applying a calibration curve with diverse concentrations of standards.

Table 3
AdaptivePELE MD simulation statistics. The reported values (Å) are averaged over simulation replicates. The RMSF is averaged over backbone atoms of residues. The "lid" structure is highlighted in Fig. 3. RMSD: root mean square deviation; RMSF: root mean square fluctuation.

A. Marchetti et al.