Evolution and separation of actinobacterial pyranose and C-glycoside-3-oxidases

ABSTRACT FAD-dependent pyranose oxidase (POx) and C-glycoside-3-oxidase (CGOx) are both members of the glucose-methanol-choline (GMC) superfamily of oxidoreductases and belong to the same sequence space. Pyranose oxidases have been studied mainly for their oxidation of monosaccharides such as D-glucose, but recently, a bacterial C-glycoside-3-oxidase that is phylogenetically related to POx and reacts with C-glycosides such as carminic acid, mangiferin, or puerarin was described. Since these actinobacterial CGOx enzymes belong to the same sequence space as bacterial POx, they must have evolved from a common ancestor. Here, we performed a phylogenetic analysis of actinobacterial sequences and resurrected seven ancestral enzymes of the POx/CGOx sequence space to study the evolutionary trajectory of substrate preference for monosaccharides and C-glycosides. Clade I, with its dimeric member POx from Kitasatospora aureofaciens, shows a strict preference for monosaccharides (D-glucose and D-xylose) and does not react with any of the glycosides tested. No extant member of clade II has been studied to date. The two extant members of clades III and IV, monomeric POx/CGOx from Streptomyces canus and Pseudoarthrobacter siccitolerans, respectively, oxidized both monosaccharides and various C-glycosides (homoorientin, isovitexin, mangiferin, and puerarin). Steady-state kinetic parameters of several clade III and IV ancestral enzymes indicate that, starting from the generalist ancestor N35, present-day enzymes gradually evolved a much higher preference for C-glycosides than for monosaccharides. Based on structural predictions of ancestors, we hypothesize that the strict specificity of bacterial clade I POx (and also fungal POx) is the result of oligomerization, which in turn results from the evolution of protein segments shown to be important for oligomerization, namely the arm and head domains.
IMPORTANCE C-Glycosides are often the active compounds in various plants. Breaking the C-C bond in these glycosides to release the aglycone is challenging and proceeds via a two-step reaction: oxidation of the sugar and subsequent cleavage of the C-C bond. Recently, an enzyme from a soil bacterium, FAD-dependent C-glycoside-3-oxidase (CGOx), was shown to catalyze the initial oxidation reaction. Here, we show that CGOx belongs to the same sequence space as pyranose oxidase (POx) and that an actinobacterial ancestor of the POx/CGOx family evolved into four clades, two of which show a high preference for C-glycosides.

nidulans, Irpex lacteus, Lyophyllum shimeji, Peniophora gigantea, Peniophora sp., Phanerochaete chrysosporium, Phlebiopsis gigantea, Trametes multicolor (TmPOx), and Tricholoma matsutake] (3). Fungal POx is thought to be involved in lignocellulose degradation. The distribution of POx in the fungal kingdom is somewhat peculiar compared to that of other members of the GMC superfamily: it is found in relatively few fungal species, yet throughout most of the fungal kingdom, and it rarely occurs in two closely related species (2). It was hypothesized that the pox gene had been introduced into fungi from bacteria via horizontal gene transfer; POx might then have become functionally redundant in a number of fungal organisms, leading to its subsequent loss, or its rare occurrence across the fungal kingdom could stem from several independent, late gene transfer events from bacteria. Recently, POx was also studied from bacterial sources [Pseudoarthrobacter siccitolerans, formerly Arthrobacter siccitolerans (PsPOx), Streptomyces canus (ScPOx), and Kitasatospora aureofaciens (KaPOx)] (4-6). Bacterial POx, or more precisely ScPOx and PsPOx, show characteristics distinct from fungal POx. Both are monomeric enzymes with a molecular mass of approx. 55-60 kDa, whereas fungal POx is typically homotetrameric (four subunits of ~65 kDa). KaPOx is positioned between the bacterial and the canonical fungal POx as a homodimer (two subunits of 61 kDa each). In both ScPOx and PsPOx, FAD is bound non-covalently, whereas it is covalently tethered to a His residue in KaPOx as well as in fungal POx.
In 2021, Kumano et al. reported the characterization of a new type of enzyme, an FAD-dependent C-glycoside-3-oxidase (CGOx, EC 1.1.3.50), from Microbacterium sp. 5-2b (CarA), Microbacterium trichothecenolyticum (MtCarA), and Arthrobacter globiformis (AgCarA), and also solved its structure (7). These enzymes were reportedly inactive on glucose but oxidized the glucose moiety of C-glycosides and, to a lesser extent, that of O-glycosides at the C-3 position. CarA, MtCarA, and AgCarA are active on C-glycosides such as carminic acid, mangiferin, homoorientin, and isovitexin, among other glycosides (7). These glycosides are natural compounds commonly found in various plant materials. C-glycosides are metabolized by enzyme systems that first oxidize the carbohydrate part of the substrate and then cleave the C-C bond between the oxidized sugar and the aglycone (8). Two types of C-glycoside-metabolizing enzymes catalyzing the first step have been described: NAD(H)-dependent oxidoreductases (from intestinal organisms) (9) and oxygen-dependent enzymes such as CGOx (from soil microorganisms) (8). Our groups recently showed that both ScPOx and PsPOx can efficiently oxidize C-glycosides such as mangiferin and puerarin while also showing (low) activity on glucose and xylose (6). In fact, PsPOx has been termed a C-glycoside-3-oxidase based on these recent studies (10). The analysis of PsPOx X-ray crystal structures in complex with glucose and mangiferin, combined with mutagenesis and molecular dynamics simulations, revealed distinctive active-site features that favor catalytically competent conformational states suitable for recognition, stabilization, and oxidation of the glucose moiety of the C-glycoside mangiferin (10). Since many members of this sequence space have been described as pyranose oxidases, and since the C-glycoside-oxidizing enzymes oxidize a sugar in its pyranose form, we use the term pyranose oxidase for these enzymes here for consistency.
Ancestral sequence reconstruction is a probabilistic approach to infer protein sequences that might have existed in the past, using the sequence space of present-day proteins and a phylogenetic tree (11,12). These resurrected proteins often possess distinctive properties such as higher thermostability, improved solubility, or substrate promiscuity compared to their descendants (13). A number of bacterial and mammalian proteins and enzymes have been studied by ancestral reconstruction, not only for protein engineering but also to elucidate structure-function relationships or their evolution (14-17). Given the common sequence space of POx and CGOx (6,7), we aimed to elucidate its evolutionary significance and to study how activity with glycosides and monosaccharides evolved over time. Based on a multiple sequence alignment and a phylogenetic tree, we explored the diversity and functional variety of this bacterial sequence space by comprehensively characterizing seven ancestral enzymes and comparing their properties to those of extant bacterial and fungal POx. We focused only on sequences of actinobacterial (Actinomycetota) origin, since all bacterial enzymes of the POx family characterized so far (ScPOx, KaPOx, CarA, and PsPOx) are from species of this phylum.

Common ancestor reconstruction
Data sets for ancestral analysis were collected using several amino acid sequences belonging to the POx sequence space (5) as seeds. After curation and selection of the data, as described in Materials and Methods, the data set consisted of 469 sequences. The final reconstructed phylogenetic tree showed a pronounced topology with four distinct clades (Fig. 1). A maximum-likelihood phylogenetic tree calculated by RAxML with bootstrap values is shown in Fig. S1. Based on their position in the phylogeny, we then selected seven ancestors for expression and further characterization: node N35 as the common ancestor of all ancestral nodes within the tree, node N67 to better understand clade II, nodes N167 and N202 for clade III, and nodes N284, N327, and N383 for a closer characterization of clade IV. The posterior probabilities for each position in the primary sequence of the selected nodes confirmed a trend well known from ancestral sequence reconstruction: the oldest node, in our case N35, had the largest number of positions with ambiguous probability (110 out of 551) and thus the lowest total posterior probability (83%). Conversely, the closer a calculated ancestor was to present-day sequences, the more unambiguous positions it had, and thus the higher its total posterior probability (Table S1). The posterior probability distribution of all ancestors along their primary sequence is shown in Fig. S2.
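The per-site summary described above (ambiguous positions and total posterior probability) can be illustrated with a short Python sketch. It assumes that the reconstruction output provides, for each alignment position of a node, a posterior probability for every amino acid. The data layout, the function name `summarize_posteriors`, the reading of "total posterior probability" as the mean of the per-site maximum posteriors, and the rule that a site is ambiguous when the second most probable residue exceeds 0.2 are all illustrative assumptions, not necessarily the criteria behind Table S1.

```python
from statistics import mean

# Hypothetical input: one dict per alignment position, mapping each
# amino acid to its marginal posterior probability at an ancestral node.
Posteriors = list[dict[str, float]]


def summarize_posteriors(sites: Posteriors, ambiguity_cutoff: float = 0.2):
    """Return (mean of per-site maximum posterior, number of ambiguous sites).

    A site counts as ambiguous when the second most probable residue has a
    posterior above `ambiguity_cutoff` (an assumed criterion).
    """
    best_per_site = []
    ambiguous = 0
    for probs in sites:
        ranked = sorted(probs.values(), reverse=True)
        best_per_site.append(ranked[0])
        runner_up = ranked[1] if len(ranked) > 1 else 0.0
        if runner_up > ambiguity_cutoff:
            ambiguous += 1
    return mean(best_per_site), ambiguous


if __name__ == "__main__":
    toy_node = [
        {"A": 0.9, "S": 0.1},            # well supported
        {"L": 0.5, "I": 0.4, "V": 0.1},  # ambiguous: runner-up 0.4 > 0.2
        {"K": 1.0},                      # fully supported
    ]
    avg_pp, n_ambiguous = summarize_posteriors(toy_node)
    print(f"mean posterior = {avg_pp:.2f}, ambiguous sites = {n_ambiguous}")
```

Applied to a full node such as N35, the same two numbers would correspond to the reported 110 ambiguous positions out of 551 and the 83% total posterior probability, provided the assumed definitions match those used in the reconstruction.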

Expression of ancestral pox sequences and general characterization
The seven ancestral pox sequences were expressed on a 5-mL scale using two different Escherichia coli expression hosts [BL21(DE3) and T7(pGro7)] and two induction strategies (IPTG at 30°C or lactose at 18°C); the best-performing combination of host and induction, i.e., the one giving the highest activity with D-glucose and DCIP as electron acceptor, was chosen for further studies without further optimization (Table S2). After purification of the ancestors by metal affinity chromatography, yields varied between 0.1 and 13 mg POx protein per liter of culture. SDS-PAGE of the purified ancestral enzyme preparations is shown in Fig. S3. Spectrophotometric analysis showed that significant fractions of the purified proteins were not loaded with FAD; the proteins were therefore reconstituted by incubation with free FAD after purification. UV/Vis spectra of the enzymes after reconstitution with FAD are shown in Fig. S4, and absorption maxima and the associated extinction coefficients are summarized in Table S3. Measured activities were corrected for the active fraction, i.e., the fraction of enzyme containing the FAD cofactor. The thermostability of the ancestral enzymes was measured by the ThermoFAD assay, and the resulting thermal transition temperatures (Tm), indicating unfolding and release of FAD, ranged from 48 to 62°C (Table S2). Ancestral proteins have been noted to show increased thermostability (13-16); however, we did not observe this for most ancestors when compared to the monomeric extant enzymes ScPOx and PsPOx, except for N167 and N327, which showed considerably higher Tm values. Expressibility is often improved for ancestral genes as well (13-16), which we also did not observe for our ancestors (Table S2). All ancestral enzymes except N35 and N67 were monomeric at pH 7.5. Based on size exclusion chromatography, N67 was present as a dimer, whereas the results for N35 were ambiguous, indicating both trimeric and pentameric states, which could result from nonspecific protein-protein interactions. Size exclusion chromatograms of selected ancestors are shown in Fig. S5.
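Two of the data-processing steps mentioned above, the correction of activities for the FAD-loaded (active) fraction and the read-out of Tm from a ThermoFAD melt curve, can be sketched in Python. This is a minimal illustration under stated assumptions: the FAD concentration is estimated from flavin absorbance via the Beer-Lambert law, the extinction coefficient passed in is a placeholder (the enzyme-specific values are those in Table S3), and Tm is taken as the temperature of the steepest increase in FAD fluorescence, which rises as the cofactor is released on unfolding. The function names and all numbers are hypothetical.

```python
import math


def active_fraction(a_flavin: float, eps_mM: float, protein_uM: float,
                    path_cm: float = 1.0) -> float:
    """Fraction of enzyme carrying FAD, from flavin absorbance (Beer-Lambert).

    eps_mM is the flavin extinction coefficient in mM^-1 cm^-1 (placeholder;
    enzyme-specific values would be taken from Table S3).
    """
    fad_uM = a_flavin / (eps_mM * path_cm) * 1000.0  # mM -> uM
    return fad_uM / protein_uM


def corrected_activity(apparent: float, fraction: float) -> float:
    """Rescale an activity measured per total protein to per FAD-loaded enzyme."""
    return apparent / fraction


def thermofad_tm(temps_c: list[float], fluorescence: list[float]) -> float:
    """Tm as the temperature of the maximum of dF/dT (central differences).

    Assumes FAD fluorescence increases as the cofactor is released on unfolding.
    """
    best_t, best_slope = temps_c[1], -math.inf
    for i in range(1, len(temps_c) - 1):
        slope = ((fluorescence[i + 1] - fluorescence[i - 1])
                 / (temps_c[i + 1] - temps_c[i - 1]))
        if slope > best_slope:
            best_t, best_slope = temps_c[i], slope
    return best_t


if __name__ == "__main__":
    # Hypothetical sample: A = 0.113, eps = 11.3 mM^-1 cm^-1, 20 uM total protein
    frac = active_fraction(0.113, 11.3, 20.0)
    print(round(frac, 3), round(corrected_activity(0.10, frac), 3))
    # Synthetic sigmoidal melt curve with its midpoint at 55 degrees C
    temps = list(range(30, 81))
    signal = [1.0 / (1.0 + math.exp(-(t - 55) / 2.0)) for t in temps]
    print(thermofad_tm(temps, signal))
```

Real melt curves are noisy, so in practice the derivative would usually be smoothed or a sigmoidal model fitted before reading off Tm; the central-difference maximum is only the simplest version of this idea.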

Reactivity with glycosides varies in different POx clades
The purified ancestral enzymes were initially screened for activity with monosaccharides (D-glucose and D-xylose), 6-C-glycosides (aspalathin, carminic acid, homoorientin, isovitexin, and mangiferin), one 8-C-glycoside (puerarin), O-glycosides (fraxin, naringin, salicin, and rutin), and one S-glycoside (sinigrin). In addition, the extant enzymes PsPOx and ScPOx were tested with the same substrates, whereas KaPOx and TmPOx were tested only with C- and O-glycosides, since their reactivity with monosaccharides had been studied in detail before. This screening was performed with the DCIP assay (i.e., measuring dehydrogenase activity) for the ancestors N35, N167, N202, N284, N327, and N383 as well as for PsPOx and ScPOx, since this assay gave less background noise. For N67, KaPOx, and TmPOx, we followed the oxidase activity using the Amplex Red- or ABTS-coupled assay, as N67 showed only negligible dehydrogenase activity and previously reported data for the extant enzymes KaPOx and TmPOx had been obtained with these assays. At least some of the POx proteins were active with the monosaccharides D-glucose and D-xylose, the C-glycosides aspalathin, homoorientin, isovitexin, mangiferin, and puerarin, and the O-glycoside fraxin (Fig. 2a). In contrast, the other substrates tested (carminic acid, naringin, salicin, rutin, and sinigrin) were not oxidized. Figure 2b summarizes and compares the specific activities of ancestors and present-day enzymes under screening conditions. All enzymes were active with D-glucose and D-xylose, albeit to widely varying extents, with significantly higher activities for KaPOx and TmPOx. These two enzymes showed no activity with any glycoside, confirming that members of bacterial POx clade I and fungal enzymes evolved toward pronounced activity exclusively with monosaccharides. All other enzymes were active with homoorientin and isovitexin. In addition, N35 was active with fraxin, mangiferin, and puerarin; N67 with aspalathin; N202 and ScPOx with mangiferin and puerarin; N167 and N284 with fraxin, mangiferin, and puerarin; and N327, N383, and PsPOx with fraxin and mangiferin. Notably, activity with puerarin increased considerably along the clade III line N35-N167-N202-ScPOx, as did activity with mangiferin and isovitexin along the clade IV line N35-N284-N327-N383. The oldest ancestors, N35 and N284, also showed the broadest reactivity with different electron donor substrates. Some activities, e.g., with the O-glycoside fraxin in clade III or with puerarin in clade IV, appear to have been lost during evolution, possibly indicating that these enzymes evolved toward narrower substrate reactivity and hence increased specialization.
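The gains and losses of glycoside reactivity along the two lineages can be made explicit with simple set operations on the screening results listed above. In this Python sketch the dictionary `REACTIVITY` transcribes the presence/absence data from the text (monosaccharides are omitted because all enzymes oxidized them); the function name `changes_along` is illustrative. It captures only whether a substrate was oxidized, not how strongly, so the increase in puerarin activity along clade III, for example, is not visible here.

```python
# Glycoside reactivity (presence/absence) transcribed from the screening results;
# monosaccharides are omitted because all enzymes oxidized D-glucose and D-xylose.
REACTIVITY: dict[str, set[str]] = {
    "N35":   {"homoorientin", "isovitexin", "fraxin", "mangiferin", "puerarin"},
    "N67":   {"homoorientin", "isovitexin", "aspalathin"},
    "N167":  {"homoorientin", "isovitexin", "fraxin", "mangiferin", "puerarin"},
    "N202":  {"homoorientin", "isovitexin", "mangiferin", "puerarin"},
    "ScPOx": {"homoorientin", "isovitexin", "mangiferin", "puerarin"},
    "N284":  {"homoorientin", "isovitexin", "fraxin", "mangiferin", "puerarin"},
    "N327":  {"homoorientin", "isovitexin", "fraxin", "mangiferin"},
    "N383":  {"homoorientin", "isovitexin", "fraxin", "mangiferin"},
    "PsPOx": {"homoorientin", "isovitexin", "fraxin", "mangiferin"},
    "KaPOx": set(),  # no glycoside oxidized
    "TmPOx": set(),  # no glycoside oxidized
}

CLADE_III = ["N35", "N167", "N202", "ScPOx"]
CLADE_IV = ["N35", "N284", "N327", "N383", "PsPOx"]


def changes_along(lineage: list[str]):
    """For each parent -> child step, return (parent, child, gained, lost)."""
    steps = []
    for parent, child in zip(lineage, lineage[1:]):
        gained = REACTIVITY[child] - REACTIVITY[parent]
        lost = REACTIVITY[parent] - REACTIVITY[child]
        steps.append((parent, child, gained, lost))
    return steps


if __name__ == "__main__":
    for name, lineage in (("clade III", CLADE_III), ("clade IV", CLADE_IV)):
        for parent, child, gained, lost in changes_along(lineage):
            if gained or lost:
                print(f"{name}: {parent} -> {child} "
                      f"gained {sorted(gained)} lost {sorted(lost)}")
```

Run on these data, the only changes reported are the loss of fraxin between N167 and N202 in clade III and the loss of puerarin between N284 and N327 in clade IV, matching the narrowing of substrate reactivity described above.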

Catalytic properties indicate phylogenetically distinct substrate preference
Based on the initial screening of substrate preferences and the fact that all bacterial enzymes reacted with the monosaccharides D-glucose and D-xylose, and all except KaPOx also with the glycosides homoorientin and isovitexin (Fig. 2b), we selected D-glucose as the reference monosaccharide and homoorientin as the reference C-glycoside for determining the apparent steady-state constants. Additionally, N67 was characterized with aspalathin; N35, N202, and ScPOx with puerarin; and N35, N284, N327, N383, and PsPOx with fraxin. Because of its very low yield after purification, N167 was not characterized further.
The steady-state kinetic parameters kcat, Km, and kcat/Km are summarized in Table 1. The fungal enzyme TmPOx and the bacterial clade I enzyme KaPOx clearly show the highest catalytic efficiencies (kcat/Km) with D-glucose. Apart from these two, the oldest ancestor N35 showed the highest catalytic efficiency, owing to its low Michaelis constant (Km). Furthermore, the kinetic constants for D-glucose showed a very clear trend along the evolutionary lines of ancestors. The catalytic efficiency decreased stepwise along the phylogenetic line N35-N284-N327-N383-PsPOx (from 29 to 0.23 M−1 s−1) as well as along the line N35-N202-ScPOx (from 29 to 0.17 M−1 s−1). This decrease can mainly be attributed to a gradual shift toward very unfavorable Km values (0.27, 2,100, and 260 mM for N35, ScPOx, and PsPOx, respectively). In contrast, the catalytic efficiency for homoorientin increased from N35 along these two lines, at least to some extent. This increase is more pronounced for the ancestral nodes than for the extant enzyme PsPOx (5,500, 100,000, and 23,000 M−1 s−1 for N35, N383, and PsPOx, respectively) and is absent for ScPOx, whose efficiency falls below that of N35 (5,500, 10,000, and 410 M−1 s−1 for N35, N202, and ScPOx, respectively). This shift in substrate preference along the evolutionary line can be seen more clearly from the selectivity ratio (18), i.e., the ratio of the catalytic efficiencies for the two substrates homoorientin (homo) and glucose (Glc), $(k_{\mathrm{cat,homo}}/K_{\mathrm{m,homo}}) \cdot (k_{\mathrm{cat,Glc}}/K_{\mathrm{m,Glc}})^{-1}$. This ratio increases from 147 for N35 to 2,080 for ScPOx and 62,100 for PsPOx. Figure 3 gives an overview of the changes in this ratio for homoorientin and glucose and illustrates that during evolution the substrate preference shifted significantly toward the C-glycoside. A comparable shift can be seen for the O-glycoside fraxin (frax). Here, kinetic constants were determined only for the oldest ancestor N35 and the clade IV line N284-N327-N383-PsPOx, as none of the other kinetically characterized enzymes oxidized this glycoside. The selectivity ratio $(k_{\mathrm{cat,frax}}/K_{\mathrm{m,frax}}) \cdot (k_{\mathrm{cat,Glc}}/K_{\mathrm{m,Glc}})^{-1}$ increased stepwise from 26.9 for N35 to a maximum of 2,330 for N383 and then decreased to 270 for the extant member of this line, PsPOx. Again, these data show a gradual shift in substrate selectivity from glucose to a glycoside. Activity with puerarin was characterized kinetically only for N35 and the clade III line, where the selectivity ratio of puerarin to D-glucose shifted from 31 for N35 to 3,300 for ScPOx.
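The relationship between initial rates, kcat, Km, catalytic efficiency, and the selectivity ratio defined above can be sketched in Python. The fitting procedure used for Table 1 is not described in this section; here the Hanes-Woolf linearization (S/v = S/kcat + Km/kcat, with v normalized to active enzyme concentration) is swapped in purely for illustration, although nonlinear Michaelis-Menten regression is generally preferred for real, noisy data. All function names and numbers are hypothetical and are not values from Table 1. Because both efficiencies carry units of M−1 s−1, the selectivity ratio is dimensionless.

```python
# Minimal sketch: Hanes-Woolf linearization swapped in for illustration;
# nonlinear Michaelis-Menten regression is generally preferred for noisy data.


def hanes_woolf_fit(substrate_mM: list[float],
                    rate_per_s: list[float]) -> tuple[float, float]:
    """Estimate (kcat in s^-1, Km in mM) from initial rates normalized to the
    active enzyme concentration (v/[E], in s^-1), via least squares on S/v vs S."""
    xs = substrate_mM
    ys = [s / v for s, v in zip(substrate_mM, rate_per_s)]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # = 1 / kcat
    intercept = mean_y - slope * mean_x  # = Km / kcat
    kcat = 1.0 / slope
    return kcat, intercept * kcat


def catalytic_efficiency(kcat_per_s: float, km_mM: float) -> float:
    """kcat/Km in M^-1 s^-1 (Km converted from mM to M)."""
    return kcat_per_s / (km_mM * 1e-3)


def selectivity_ratio(eff_glycoside: float, eff_glucose: float) -> float:
    """(kcat/Km) for the glycoside divided by (kcat/Km) for D-glucose."""
    return eff_glycoside / eff_glucose


if __name__ == "__main__":
    # Synthetic, noise-free data for a hypothetical enzyme (kcat 1.6 s^-1, Km 0.05 mM)
    s = [0.01, 0.02, 0.05, 0.1, 0.2, 0.5]
    v = [1.6 * x / (0.05 + x) for x in s]
    kcat, km = hanes_woolf_fit(s, v)
    eff_glycoside = catalytic_efficiency(kcat, km)
    eff_glucose = catalytic_efficiency(0.05, 200.0)  # hypothetical glucose constants
    print(f"kcat={kcat:.2f} s^-1, Km={km:.3f} mM, kcat/Km={eff_glycoside:.0f} M^-1 s^-1")
    print(f"selectivity ratio = {selectivity_ratio(eff_glycoside, eff_glucose):.0f}")
```

The example illustrates why a rising Km for D-glucose alone can drive the selectivity ratio up by orders of magnitude even when the glycoside constants change little.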

Comparison of structural predictions
Only recently, the crystal structure of PsPOx was reported at 2.01 Å resolution, together with the structures of PsPOx in complex with glucose and mangiferin (10). These structures revealed features important for substrate binding and reactivity, namely the mobile substrate loop projecting into the active-site cavity and the insertion-1 segment interacting with this substrate loop. We used the PsPOx structure as the template for structural comparison and template-based structural modeling of the ancestors studied here. The PsPOx structure was also compared with the extant enzymes of known structure TmPOx (19) and MtCarA (7) (Fig. 4a).
PsPOx, MtCarA, and TmPOx display a highly conserved overall fold, with differences mainly in the head domain and the oligomerization loop (the arm domain), both of which are important for tetramerization of TmPOx, and in the insertion-1 segment, which occupies the same region as the oligomerization loop but controls the entrance of substrates into the active site of PsPOx (Fig. 4a) (10). Although the sequence identities of some ancestors and of the two extant bacterial enzymes to PsPOx were low (Table S4), the structural predictions of the ancestral enzymes, ScPOx, and KaPOx showed a well-conserved overall fold with an FAD-binding and a substrate-binding domain, including the characteristic Rossmann fold-like structure in the FAD-binding domain and a combination of α-helices and β-sheets in the substrate-binding domain (Fig. 4b). Despite this well-conserved overall fold, per-residue (local) confidence estimates indicated that some parts of the ancestral models, mostly flexible loops, had only low confidence (Fig. S6). As loop regions such as the flavinylation motif, the substrate loop, and the insertion-1 segment are known to play important roles in substrate specificity and binding, we investigated them in more detail based on the sequence alignment (a full-length sequence alignment is shown in Fig. S7). The numbers of mutations of the target ancestors relative to their extant forms and the numbers of consensus mutations are summarized in Table S5.
In PsPOx, substrates access the isoalloxazine ring within the active site through a solvent-accessible cavity lined by the flavinylation motif (residues 125-128, AAHW), which binds the cofactor non-covalently, the substrate loop (residues 344-355, LDASPVPLADDD), and the insertion-1 segment (residues 60-93, PDSRSLAQRASEGPGAGAATVNSPGAVKSGERRA) (Fig. 4a) (10). The sequences of both the flavinylation motif and the substrate loop differ distinctly between the POx clades (Fig. 4b). The (consensus) sequences of the flavinylation motif and the substrate loop are GTWH and DAFHYGDVP in clade I, GAWH and SETTPFPMDP in clade II, GVHW and R(P/T)(F/Y)VDEDG(E/R) in clade III, and (G/A)AHW and LDASPVPL(A/G)(D/E)DD in clade IV, respectively. The Thr residue immediately following the flavinylation motif, which was shown to form an H-bond to N5 of the isoalloxazine ring in MtCarA (7), is conserved in all sequences (Fig. S7). The sequence of the insertion-1 segment is not well conserved among ancestors and extant POx enzymes; however, enzymes belonging to the same clade show a similar fold of this segment in the models. The structural prediction of the dimeric KaPOx does not indicate a comparable insertion-1 segment but rather oligomerization domains (arm and head domains), comparable to those of fungal TmPOx. The models of N35 and N67 do not show oligomerization domains but rather a partial or full insertion-1 segment, even though these enzymes formed higher oligomers in solution (a dimer for N67 and a mixture of trimer and pentamer for N35) (Table S2).
Table 1 footnote: The steady-state parameters were measured using the dehydrogenase assay with DCIP as electron acceptor unless labeled with *, which indicates the use of the oxidase assay.
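The clade consensus sequences for the flavinylation motif and the substrate loop use the notation (X/Y) for positions where either residue occurs. A minimal Python sketch of how such consensus strings could be turned into regular expressions and used to check whether a given pair of motif segments is compatible with a clade is shown below. The helper names, including `assign_clades`, are hypothetical; a consensus match is only a consistency check and is not how clade membership was determined, which rests on the phylogeny.

```python
import re

# Degenerate positions in the clade consensus sequences are written as "(X/Y)".
_DEGENERATE = re.compile(r"\(([A-Z](?:/[A-Z])+)\)")


def consensus_to_regex(consensus: str) -> str:
    """Translate, e.g., '(G/A)AHW' into the regular expression '[GA]AHW'."""
    return _DEGENERATE.sub(lambda m: "[" + m.group(1).replace("/", "") + "]",
                           consensus)


def matches_consensus(segment: str, consensus: str) -> bool:
    """True if an ungapped motif segment is compatible with a consensus."""
    return re.fullmatch(consensus_to_regex(consensus), segment) is not None


# (flavinylation motif, substrate loop) consensus per clade, as given in the text
CLADE_CONSENSUS = {
    "I":   ("GTWH", "DAFHYGDVP"),
    "II":  ("GAWH", "SETTPFPMDP"),
    "III": ("GVHW", "R(P/T)(F/Y)VDEDG(E/R)"),
    "IV":  ("(G/A)AHW", "LDASPVPL(A/G)(D/E)DD"),
}


def assign_clades(flavinylation: str, substrate_loop: str) -> list[str]:
    """Hypothetical helper: all clades whose consensus both segments match."""
    return [clade for clade, (flav, loop) in CLADE_CONSENSUS.items()
            if matches_consensus(flavinylation, flav)
            and matches_consensus(substrate_loop, loop)]


if __name__ == "__main__":
    # PsPOx segments quoted in the text (residues 125-128 and 344-355)
    print(assign_clades("AAHW", "LDASPVPLADDD"))
```

For the PsPOx segments quoted above, only the clade IV consensus matches, in line with its position in the tree.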
Taborda et al. (10) provided structural evidence that substrate binding and catalysis in the active site of PsPOx are orchestrated by residues K55, R94, T129, Q297, Q340, H440, and N484 (Fig. 4a). These residues form hydrogen bonds with either the D-glucose moiety or the aglycone of mangiferin. H440 and N484 play key roles in catalysis and are strictly conserved in the ancestors and extant POx enzymes (Fig. S7). Residue K55 in PsPOx was shown to be positioned close to the aglycone of mangiferin, and residues R94 and Q297 were suggested by both structural analysis and MD simulations to anchor the bulky glycoside substrate near the active site (10). K55 is conserved in the clade IV ancestors (or conservatively substituted by Arg in N284 and N35), whereas ScPOx and the clade III ancestors have a His at this position. Similarly, R94 is conserved in clade IV and N35 but replaced by Glu or Thr in clade III, and Q297 is mostly conserved in clade IV but replaced by His in N284 and all other bacterial enzymes. These differences in residues that directly interact with the electron donor substrate could account for the differences in reactivity with various glycosides observed between clades. Figure 4b gives an overview of the residues near the active site at positions equivalent to those described for PsPOx as important for substrate binding and catalysis.
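Comparisons such as "K55 is replaced by His in clade III" rely on mapping residue numbers of the reference enzyme (here PsPOx) onto columns of the multiple sequence alignment and reading off the residues of the other sequences in those columns. The Python sketch below shows this mapping on a toy alignment; the sequences and function names are hypothetical, and in practice the same lookup would be applied to the full alignment (Fig. S7) with PsPOx positions such as 55, 94, and 297.

```python
def column_for_residue(aligned_ref: str, residue_number: int) -> int:
    """Map a 1-based residue number of the ungapped reference sequence
    to the 0-based column of the multiple sequence alignment."""
    count = 0
    for column, char in enumerate(aligned_ref):
        if char != "-":
            count += 1
            if count == residue_number:
                return column
    raise ValueError(f"reference has fewer than {residue_number} residues")


def residues_at(alignment: dict[str, str], reference: str,
                positions: list[int]) -> dict[str, dict[int, str]]:
    """For every aligned sequence, report the residue found at each
    reference position (gaps are reported as '-')."""
    columns = {pos: column_for_residue(alignment[reference], pos)
               for pos in positions}
    return {name: {pos: seq[col] for pos, col in columns.items()}
            for name, seq in alignment.items()}


if __name__ == "__main__":
    # Toy alignment (hypothetical sequences, not real POx fragments)
    toy = {"ref": "MK-LR", "seqA": "MKALH", "seqB": "MR-LE"}
    print(residues_at(toy, "ref", [2, 4]))
```

Counting only non-gap characters of the reference is what keeps the reference numbering stable when other sequences introduce insertions, such as the extra residue in `seqA`.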
Finally, we were interested in how the split between two distinct classes of POx (monomeric bacterial enzymes active mainly on glycosides, and oligomeric bacterial or fungal enzymes acting on monosaccharides) took place during evolution. To this end, we modeled the structures of the ancestors at the nodes linking the oldest ancestor N35 to fungal POx, namely N1, N6, N12, N22, N29, and N34, as well as present-day KaPOx (Fig. 1). The major differences identified between the structural predictions of bacterial and fungal POx concern the head and arm domains, the substrate loop, and the insertion-1 segment; we therefore focused on these regions in the structural comparison (Fig. 4c; Fig. S8). When comparing the structural predictions of N1, N6, N12, N22, KaPOx, N29, and N34 with those of N35 (the last common ancestor of all bacterial members of this sequence space), PsPOx, and TmPOx, we observed that the former all possess pronounced arm and head domains, which are important for oligomerization as seen in fungal POx and KaPOx, and a substrate loop that differs from that of other bacterial POx or CGOx (Fig. 4c). While the size of the arm domain does not change along the line from N34 to TmPOx, the head domain increases in size (from 31 residues in N35 to 55 in N1 and 61 in TmPOx) and also changes its conformation along the phylogenetic line N34, N29, KaPOx, N22, N12, N6, N1, and TmPOx. The structural prediction of N35 does not show pronounced arm and head domains but rather an insertion-1 segment resembling a rotated arm domain and a "barrel"-shaped bottom, which seems to have gradually evolved into the head domain of fungal POx. The solvent-accessible surface areas of the modeled monomers and dimers of N35, N34, N29, KaPOx, N22, N12, N6, and N1, compared with those of the crystal structures of PsPOx and TmPOx, showed a generally increasing trend along the phylogenetic line PsPOx, N35, N34, N29, KaPOx, N22, N12, N6, N1, and TmPOx (data not shown).

DISCUSSION
The first studies on pyranose oxidase were performed on enzymes of exclusively fungal origin, which were shown to efficiently oxidize D-glucose as well as other monosaccharides typically found in lignocellulose (D-xylose, L-arabinose, and D-galactose), preferentially at the C2 position but sometimes also at C3 or at both positions (20,21). When the first bacterial POx, an enzyme from Pseudoarthrobacter siccitolerans (PsPOx) (4), was reported in 2016, some of its catalytic properties were puzzling, above all its very unfavorable kinetic constants for D-glucose, with a Michaelis constant of 460 mM and a low catalytic efficiency of 0.45 M−1 s−1 (10), as glucose had been considered the natural substrate of pyranose oxidases. This could, however, be explained after Kumano et al. isolated from soil a C-glycoside-catabolizing microorganism, Microbacterium sp., that can break the C-C bond in carminic acid by a two-step mechanism (7). After initial oxidation of the sugar moiety, C-glycoside deglycosidases cleave the C-C bond between the sugar and the aglycone by acid/base catalysis (8). The enzyme initiating this deglycosylation in Microbacterium sp. was shown to oxidize the glucose moiety of carminic acid at the C3 position (and to some extent also at C2) and was termed "C-glycoside oxidase" or CarA (7). Subsequently, we showed that CarA belongs to the sequence space of POx, a member of the well-studied GMC superfamily (6), and that PsPOx reacts favorably with another C-glycoside, mangiferin (Km of 0.49 mM, kcat/Km of 19,200 M−1 s−1). Since the fungal, monosaccharide-oxidizing POx (as well as the monosaccharide-oxidizing enzyme from the actinomycete K. aureofaciens (5)) and the bacterial, glycoside-oxidizing pyranose oxidases belong to the same sequence space, they must have evolved from a common ancestral enzyme. The objective of this study was hence to investigate in detail the phylogeny of actinobacterial POx and the historical trajectory of substrate preference across the different clades.
The sequence space of actinobacterial POx is clearly divided into four clades, as confirmed by high bootstrap values (Fig. 1). The initial screening for substrate reactivity (Fig. 2a and b) clearly distinguished fungal TmPOx and clade I KaPOx from the other present-day bacterial enzymes, PsPOx and ScPOx, and from the ancestors positioned in clades II-IV. The former reacted only with monosaccharides [specific activities with D-glucose of 12 U/mg for TmPOx and 7.6 U/mg for KaPOx (1,5)] and showed no reactivity with any of the glycosides tested, whereas the latter had only very low activities with monosaccharides (0.001-0.35 U/mg for D-glucose), and most of them oxidized various glycosides with higher efficiency. This screening also revealed significant differences among clades II-IV with respect to the glycosides accepted. The C-glycosides homoorientin and isovitexin were oxidized by all ancestors and extant members of these clades; the C-glycoside puerarin mainly by the extant member ScPOx and the ancestors belonging to clade III (as well as by the oldest common ancestor N35 and the oldest ancestor of the clade IV line, N284); and the O-glycoside fraxin only by the extant member PsPOx and the ancestors of clade IV (as well as by the oldest common ancestor N35 and the oldest clade III ancestor, N167). The oldest common ancestor N35 showed the most diverse substrate reactivity, which corroborates findings for other ancestral enzymes, namely that they can be multifunctional and promiscuous (22) before eventually evolving into more specialized and efficient enzymes. N35 was also the generalist among the ancestors with respect to activity with glycosides and monosaccharides, judging from the selectivity ratio, i.e., the ratio of the catalytic efficiencies for the two substrates homoorientin (homo) and glucose (Glc), $(k_{\mathrm{cat,homo}}/K_{\mathrm{m,homo}}) \cdot (k_{\mathrm{cat,Glc}}/K_{\mathrm{m,Glc}})^{-1}$. This selectivity ratio increased along both the ancestral lines and the extant members of POx clades III and IV, owing both to an increase in the Michaelis constant for D-glucose and to an increase in the catalytic constant for the glycoside, which is especially notable in clade IV (kcat,homo values of 0.03, 4.89, and 1.6 s−1 for N35, N383, and PsPOx, respectively). POx members of clades III and IV show very low Km values for homoorientin, in the micromolar range, which could reflect the low glycoside concentrations encountered by bacteria in their natural habitats. N67, the only clade II ancestor studied in this work, shows a lower selectivity ratio for homoorientin over D-glucose than N35 and hence comparable but rather low catalytic efficiencies for these two reference substrates. Unfortunately, no present-day POx member of clade II has been studied, so we cannot say whether these properties are also reflected in extant enzymes of this clade. The extant member of POx clade I, KaPOx, completely lost reactivity with all glycosides tested in this study and became specialized for the oxidation of monosaccharides, as did the fungal enzyme TmPOx.
We recently speculated that the catalytic mechanisms and binding specificities within the POx family are modulated by different homooligomerization states (10); i.e., the loss of reactivity with the bulkier glycosides is caused by oligomerization (KaPOx is a dimer and TmPOx a homotetramer). The structures of PsPOx and MtCarA show an open active site, with access controlled only by the substrate loop. In contrast, the crystal structure of TmPOx indicates very restricted access to the FAD through an internal void and a narrow substrate channel (23,24). Here, the oligomerization of clade I POx seems to be initiated by a gradual extension of the head domain, which is important for subunit interaction in TmPOx (19), as well as by the evolution of the insertion-1 segment into an arm domain, which is consistent with the hypothesis that different functional oligomeric states contribute to enzyme specificity. Changes in both domains are supported by the structural predictions along the evolutionary line of clade I POx (Fig. 4c; Fig. S8).
In this study, we focused only on sequences of actinobacterial (Actinomycetota) origin, since all bacterial enzymes of the POx family characterized so far were obtained from species of this phylum. Genes coding for pyranose oxidases were also identified in other bacterial phyla, such as the Pseudomonadota (classes Alpha- and Gammaproteobacteria) (6). These enzymes have not yet been studied biochemically in detail; however, various Rhizobium, Agrobacterium, and Stenotrophomonas species (phylum Pseudomonadota) and Deinococcus aerius (phylum Deinococcota) were shown to contain genes that are phylogenetically related to bacterial pox genes, and the corresponding gene products were able to oxidize either various glycosides (25) or glucose (6). FAD-dependent oxidoreductases from Rhizobium, Agrobacterium, and Stenotrophomonas species oxidized the sugar moieties of a range of ginsenosides, resulting in their deglycosylation. Given this large number of unexplored POx sequences, bacterial POx/CGOx can be expected to show a much wider reactivity with glycosides than currently known.

Conclusion
Actinobacteria (Actinomycetota) are found in soil or water, where they contribute to the degradation of organic matter, including plant material that may contain lignocellulosic material or various glycosides. We conclude from our results that an ancestor of pyranose oxidase, a generalist oxidizing both monosaccharides and various glycosides, evolved into several clades of specialized enzymes that primarily oxidize either monosaccharides or glycosides. Fungi sharing this habitat with Actinobacteria may have acquired pox genes by horizontal transfer from organisms producing monosaccharide-oxidizing POx.

Generation of enzyme clusters and ancestral sequence calculation
PSI-BLAST (26) searches of the NCBI database were conducted in April and May 2020. The following sequences (accession numbers and annotations) from published data sets (5) were used as seed sequences for the initial search: A0A2G7ETB5 (choline dehydrogenase-like flavoprotein from Streptomyces sp.), A0A101RTT1 (choline dehydrogenase from S. canus), A0A0S9PHX4 (choline dehydrogenase from Agreia sp.), A0A1M5HNJ5 (choline dehydrogenase from Geodermatophilus nigrescens), F3P4S3 (conserved domain protein from Actinomyces sp.), A0A0Q8AVH9 (choline dehydrogenase from Microbacterium sp.), A0A260UCT4 (choline dehydrogenase from Rhodococcus fascians), A0A164DUC4 (6‴-hydroxyparomomycin C oxidase from Agromyces sp.), A0A1H5VE67 (pyranose oxidase from Saccharopolyspora jiangxiensis), and K4QWW1 (GMC_oxred_C domain-containing protein from Streptomyces davaonensis). The search was restricted to Actinobacteria (taxon identifier 201174). Sequences with 35-99% identity, a threshold value of 0.005, and an E-value between 0 and 1e−50 were used as templates for the next iteration (PSI-BLAST iteration 2). Sequences from the second iteration with 35-99% identity, a threshold value of 0.005, 60-100% query coverage, and an E-value between 0 and 1e−50 were selected as the data set for ancestral sequence reconstruction. Redundant sequences were removed with USEARCH (v.11.0.667) (27), discarding sequences that shared more than 97% identity. Sequences shorter than 420 amino acids were also removed manually from the data set. The tool SeqScrub (28) (http://seqscrub.gabefoley.com/, accessed in September 2021) was used to rename all sequences uniformly and to discard sequences containing illegal amino acid characters. After a multiple sequence alignment was inferred with MAFFT (v.7.0.26) using the G-INS-i method (29), sequences lacking the GMC structural motifs (2) were discarded from the data set. The alignment was trimmed using Gblocks (v.0.91.1) (30). A substitution model was assessed using ModelTest (v.3.7) (31), and the best model predicted under both the Bayesian and Akaike information criteria was . The phylogenetic tree was calculated using PhyML (v.3.3_1) (33) with the Le-Gascuel substitution model. Both Gblocks and PhyML were accessed through the NG Phylogeny web interface (34) (https://ngphylogeny.fr/, accessed in September 2021). The tree was midpoint-rooted and edited using FigTree (v.1.4.4) (The University of Edinburgh, UK). Bootstrapping of the tree was performed using RAxML (v.8) (35), and the bootstrap analysis converged after 570 iterations.
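The hit-selection criteria above can be expressed as a small filter. This is a minimal sketch, not the pipeline used here: the `Hit` record and all helper names are hypothetical, the ungapped identity is a crude stand-in for the alignment-based identities that USEARCH computes, and the 20 standard amino acids are assumed to be the only legal characters.

```python
from dataclasses import dataclass
from typing import List

# Assumption: only the 20 standard amino acids count as legal characters.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


@dataclass
class Hit:
    """Hypothetical record for one PSI-BLAST hit."""
    accession: str
    identity: float   # percent identity to the query
    coverage: float   # percent query coverage
    evalue: float
    sequence: str


def passes_filters(hit: Hit, min_length: int = 420) -> bool:
    """Apply the identity, coverage, E-value, length and character filters."""
    return (
        35.0 <= hit.identity <= 99.0
        and 60.0 <= hit.coverage <= 100.0
        and hit.evalue <= 1e-50
        and len(hit.sequence) >= min_length
        and set(hit.sequence) <= STANDARD_AA
    )


def ungapped_identity(a: str, b: str) -> float:
    """Fraction of identical positions over the shorter sequence (no alignment)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def remove_redundant(sequences: List[str], cutoff: float = 0.97) -> List[str]:
    """Greedy, length-sorted redundancy removal in the spirit of USEARCH clustering:
    a sequence is dropped if it is more than `cutoff` identical to one already kept."""
    kept: List[str] = []
    for seq in sorted(sequences, key=len, reverse=True):
        if all(ungapped_identity(seq, ref) <= cutoff for ref in kept):
            kept.append(seq)
    return kept
```

Sorting by decreasing length mirrors the usual greedy centroid strategy, so the longest representative of each near-identical group is retained.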
The tool Graphical Representation of Ancestral Sequence Predictions (GRASP) (36) (http://grasp.scmb.uq.edu.au/, accessed in September 2021) was used to infer ancestral sequences by marginal reconstruction, with Le-Gascuel as the evolutionary model and the PhyML-constructed tree as input. Based on their position in the phylogenetic tree, seven ancestors were chosen for further analysis. Their sequences were downloaded from GRASP together with the posterior probabilities for each position of the primary sequence. The mean posterior probability of each ancestor was calculated by matching each position of its primary sequence to the corresponding probability calculated by GRASP. The probability of each position was plotted using SigmaPlot (v.14.0) (Systat Software, Düsseldorf, Germany). The amino acid sequences of all target ancestors are listed in Table S6.
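The averaging step can be sketched as follows. The input layout (an ancestral sequence plus one probability per residue, with gap characters unscored) is an assumption about how the GRASP export would be parsed, not its documented file format.

```python
from statistics import fmean
from typing import Sequence


def mean_posterior(sequence: str, posteriors: Sequence[float]) -> float:
    """Mean posterior probability of an ancestral sequence.

    Assumes gap characters ('-') carry no probability, so every remaining
    residue is paired with exactly one per-position value from GRASP.
    """
    residues = sequence.replace("-", "")
    if len(residues) != len(posteriors):
        raise ValueError(
            f"{len(residues)} residues but {len(posteriors)} probabilities"
        )
    if any(not 0.0 <= p <= 1.0 for p in posteriors):
        raise ValueError("posterior probabilities must lie between 0 and 1")
    return fmean(posteriors)
```

Checking the lengths first guards against silently pairing residues with the wrong probabilities.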

Synthesis and cloning of ancestral sequences
Structures of the ancestors were predicted using RoseTTAFold (https://robetta.bakerlab.org/submit.php, accessed in January 2022) (37). The most probable model, annotated as "Model 1," was used to determine the C-terminal flanking region. Predicted ancestral structures were explored in the PyMOL Molecular Graphics System (v.2.5.2, educational license) (Schrödinger, New York, NY, USA). The C-terminal flanking sequence of each ancestor was determined by aligning its structure to the predicted structure of ScPOx (6). The coding sequence of each ancestor (excluding the C-terminal flanking sequence), followed by the TEV cleavage site (ENLYFQS), was cloned into the pET21+ vector between the BamHI and XhoI restriction sites. Genes and the TEV cleavage site were codon-optimized for E. coli. The ribosome binding sequence (nucleotide sequence TTAAGAAGGAGATATACC) was added after the BamHI restriction site in the pET21+ vector backbone. All constructs were ordered from Twist Biosciences (South San Francisco, CA, USA) and contained an ampicillin-resistance marker cassette and a C-terminal His6 tag.
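The construct design can be illustrated with a short sketch that assembles the expected expressed polypeptide and scans a coding sequence for internal BamHI or XhoI sites, a routine check for restriction cloning that is not itself described in the text. The Leu-Glu linker before the His6 tag is an assumption about the pET-21-type backbone, and all function names are hypothetical.

```python
from typing import Dict, List

TEV_SITE = "ENLYFQS"
# Assumption: in pET-21-type vectors the in-frame XhoI site (CTCGAG) encodes
# Leu-Glu directly upstream of the C-terminal His6 tag.
XHOI_LINKER = "LE"
HIS6 = "HHHHHH"
# Both recognition sites are palindromic, so scanning one strand is sufficient.
CLONING_SITES = {"BamHI": "GGATCC", "XhoI": "CTCGAG"}


def expressed_protein(ancestor: str, c_flank_length: int) -> str:
    """Ancestor without its C-terminal flank, then TEV site, linker and His6."""
    if not 0 <= c_flank_length < len(ancestor):
        raise ValueError("flank must be shorter than the ancestral sequence")
    core = ancestor[: len(ancestor) - c_flank_length]
    return core + TEV_SITE + XHOI_LINKER + HIS6


def internal_sites(cds: str) -> Dict[str, List[int]]:
    """0-based positions of BamHI/XhoI recognition sites inside a coding sequence."""
    cds = cds.upper()
    found: Dict[str, List[int]] = {}
    for enzyme, site in CLONING_SITES.items():
        positions = [
            i for i in range(len(cds) - len(site) + 1)
            if cds[i : i + len(site)] == site
        ]
        if positions:
            found[enzyme] = positions
    return found
```

Slicing with `len(ancestor) - c_flank_length` rather than `[:-c_flank_length]` keeps the full sequence when no flank is removed.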

Gene expression, initial screening for the optimal induction and expression strain, and purification of recombinant proteins
Calcium-competent E. coli BL21(DE3) and T7 Express strains (the latter already transformed with the vector pGro7, which overexpresses groES-groEL to increase soluble expression; New England Biolabs, Ipswich, MA, USA) were transformed with the constructs by heat shock. Cells were grown in 250 or 500 mL of LB medium containing ampicillin (100 µg/mL), inoculated with overnight cultures diluted 1:90. For expression of the recombinant genes, bacterial cultures were grown at 37°C with agitation (140 rpm) until the OD600 reached 0.6-1. Gene expression was then induced by adding 10 mM lactose or 1 mM IPTG (Thermo Scientific, Waltham, MA, USA), and the temperature was decreased to 18°C or 30°C, respectively. Expression continued for ~20 or 3 h, respectively. Cell pellets were collected by centrifugation (20 min, 5,000 rpm, 8°C, centrifuge Avanti J-26 XP, rotor JA-10; Beckman Coulter, Brea, CA, USA). Properties of the recombinantly expressed proteins were determined with ProtParam (38) (https://web.expasy.org/protparam/, accessed in February 2022). All chemicals were from Carl Roth (Karlsruhe, Germany) unless indicated otherwise. Initial screening was done on a 5-mL scale. Overproduction of the seven ancestral proteins was induced with lactose and with IPTG in two different strains, BL21(DE3) and T7(pGro7). Cell pellets were disrupted by sonication with an ultrasonic homogenizer (Sonoplus; Bandelin, Berlin, Germany) at 120 V and 30% cycle for 5 min, repeated three times with 5-min breaks. The soluble fraction was separated from cell debris by centrifugation (1 h, 20,000 rpm, 4°C, centrifuge Avanti J-26 XP, rotor JA-25.50). Initial activities were tested with D-glucose (final concentration, 500 mM), mangiferin, or puerarin (each at a final concentration of 0.1 mM) using the DCIP (dichlorophenol indophenol; Sigma-Aldrich, St. Louis, MO, USA) and Amplex Red (10-acetyl-3,7-dihydroxyphenoxazine; Chemodex, St. Gallen, Switzerland) assays. The best-performing expression strain and induction strategy were selected based on the highest activity with 500 mM D-glucose in the DCIP assay. Assays for initial screening were performed as described below.
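Among the properties ProtParam reports is the molar extinction coefficient at 280 nm, which it estimates from the Trp, Tyr, and cystine content. The sketch below reproduces only that part, using the per-residue coefficients listed in the ProtParam documentation, together with the Beer-Lambert conversion from A280 to molar concentration; the function names are illustrative.

```python
# Molar absorption coefficients at 280 nm (M^-1 cm^-1) used by ProtParam:
# Trp 5500, Tyr 1490, cystine (disulfide-bonded Cys pair) 125.
EPS_TRP, EPS_TYR, EPS_CYSTINE = 5500, 1490, 125


def extinction_280(sequence: str, all_cys_as_cystines: bool = True) -> int:
    """Estimated molar extinction coefficient at 280 nm of a protein in water."""
    seq = sequence.upper()
    eps = EPS_TRP * seq.count("W") + EPS_TYR * seq.count("Y")
    if all_cys_as_cystines:
        # Every two cysteines are counted as one cystine.
        eps += EPS_CYSTINE * (seq.count("C") // 2)
    return eps


def molar_concentration(a280: float, eps280: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert law: c = A / (epsilon * l)."""
    return a280 / (eps280 * path_cm)
```

ProtParam reports both variants (all Cys as cystines and all Cys reduced), hence the boolean switch.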
Batch cultivations with 5 L of medium were used to produce protein for subsequent purification by affinity chromatography (ÄKTA Go chromatography system; Cytiva, Marlborough, MA, USA). Cell disruption and separation of the soluble fraction from cell debris were done as previously described (6). HisTrap columns (5 mL; Cytiva) were equilibrated with purification buffer (150 mM NaCl, 5% glycerol, and 50 mM Tris-HCl, pH 7.5) containing 30 mM imidazole. After the sample was loaded and unbound proteins were washed off, the proteins of interest were eluted with a linear gradient (0-100%, 10 min) of purification buffer containing 500 mM imidazole. The selected fractions were pooled, desalted, and concentrated using ultraconcentrators (MWCO 30 kDa; Merck Millipore, Billerica, MA, USA). All other enzymes used in this study, including the extant bacterial enzymes PsPOx (UniProt accession number A0A024H8G7), ScPOx (A0A117Q443), and KaPOx (A0A1E7NAU4) as well as the fungal enzyme TmPOx (Q7ZA32), were purified as previously reported (4-6, 39). SDS-PAGE was performed as described (6) to determine the purity of fractions and concentrated proteins.
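The elution gradient amounts to linear mixing of the 30 mM and 500 mM imidazole buffers. The sketch below computes the nominal imidazole concentration at a given time; it assumes mixing is linear in time and ignores the system dwell volume and flow rate, neither of which is stated in the text.

```python
def imidazole_mM(t_min: float, start_mM: float = 30.0, end_mM: float = 500.0,
                 gradient_min: float = 10.0) -> float:
    """Nominal imidazole concentration at time t of a linear 0-100% B gradient.

    Buffer A: purification buffer with 30 mM imidazole.
    Buffer B: purification buffer with 500 mM imidazole.
    Times before the gradient start or after its end are clamped.
    """
    if gradient_min <= 0:
        raise ValueError("gradient length must be positive")
    fraction_b = min(max(t_min / gradient_min, 0.0), 1.0)
    return (1.0 - fraction_b) * start_mM + fraction_b * end_mM
```

For example, halfway through the 10-min gradient the nominal concentration is 265 mM imidazole.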

Determination of protein concentration (FAD loading) and oligomeric state; thermostability measurements
Protein concentrations were determined by measuring the absorbance at 280 nm with a diode array spectrophotometer (Agilent Technologies, Santa Clara, CA, USA). FAD reconstitution was performed by incubating the enzyme with FAD overnight at 4°C and removing residual, unbound cofactor using ultraconcentrators (MWCO 30 kDa, Merck Millipore). The proportion of FAD-loaded protein was calculated by comparing the FAD concentration, obtained from the absorbance at 450 nm and the extinction coefficient of FAD (11,300 M−1 cm−1), with the total protein concentration obtained from the absorbance at 280 nm.
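The loading calculation can be sketched as a ratio of two Beer-Lambert concentrations. This assumes one FAD per polypeptide and, as a simplification, ignores any contribution of bound FAD to the absorbance at 280 nm; the protein extinction coefficient is passed in (for example, a ProtParam estimate).

```python
EPS_FAD_450 = 11_300.0  # M^-1 cm^-1, value given in the text


def fad_loading(a450: float, a280: float, eps280_protein: float,
                path_cm: float = 1.0) -> float:
    """Fraction of protein carrying FAD, assuming one FAD per polypeptide.

    Simplification: FAD absorbance at 280 nm is not subtracted, so the
    A280-derived concentration is taken as total protein.
    """
    fad_M = a450 / (EPS_FAD_450 * path_cm)
    protein_M = a280 / (eps280_protein * path_cm)
    return fad_M / protein_M
```

A value near 1 would indicate a fully loaded preparation under these assumptions.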
To determine the oligomeric state of the proteins, a Superose 12 10/300 gel filtration column (Cytiva) was operated with 50 mM Tris-HCl buffer, pH 7.5, containing 150 mM NaCl at room temperature. Blue dextran (Mr ≈ 2,000 kDa) was injected to determine the void volume. A standard curve was obtained by running the Gel Filtration Standard protein mix (Bio-Rad, Hercules, CA, USA), and the molecular masses of the proteins were calculated from this standard curve.
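A standard curve of this kind is usually a linear fit of log10(Mr) against an elution parameter. Because only the void volume is mentioned, the sketch below assumes the ratio Ve/V0 as that parameter (the alternative, Kav, would also need the total column volume). The listed masses are the nominal values of the Bio-Rad standard components; elution volumes must come from the actual chromatograms, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Nominal masses (Da) of the Bio-Rad Gel Filtration Standard components.
BIORAD_STANDARD_MW = {
    "thyroglobulin": 670_000,
    "gamma-globulin": 158_000,
    "ovalbumin": 44_000,
    "myoglobin": 17_000,
    "vitamin B12": 1_350,
}


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def calibrate(standards: List[Tuple[float, float]], v0_ml: float) -> Tuple[float, float]:
    """Fit log10(Mr) against Ve/V0 for (elution volume in mL, Mr) pairs."""
    xs = [ve / v0_ml for ve, _ in standards]
    ys = [math.log10(mr) for _, mr in standards]
    return fit_line(xs, ys)


def estimate_mr(ve_ml: float, v0_ml: float, slope: float, intercept: float) -> float:
    """Apparent Mr of a sample from its elution volume."""
    return 10 ** (slope * ve_ml / v0_ml + intercept)


def oligomeric_state(mr_sec: float, mr_monomer: float) -> int:
    """Nearest whole number of subunits (monomer Mr e.g. from ProtParam)."""
    return round(mr_sec / mr_monomer)
```

Rounding the ratio of apparent to monomeric mass is only a first estimate, since non-globular shapes shift elution volumes.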
The thermal transition temperature (Tm) of the proteins was determined using the ThermoFAD assay (21, 40). Protein samples were diluted to 1 mg/mL in 25 µL of buffer. Samples were heated from 25 to 80°C in increments of 0.5°C per minute. Measurements were performed in triplicate in a real-time PCR cycler (iCycler; Bio-Rad) by following the increase in the fluorescence signal of the FAD cofactor upon denaturation. The fluorescence signal was detected using the SYBR Green filters (MyiQ detection system; Bio-Rad).
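In ThermoFAD-type data, Tm is commonly read off as the temperature of the steepest fluorescence increase. The sketch below takes the maximum of a central-difference first derivative on the raw curve and averages replicate Tm values; it applies no smoothing or baseline correction, which real instrument data may need, and the function names are illustrative.

```python
import math
from statistics import fmean
from typing import List, Sequence, Tuple


def melting_temperature(temps: Sequence[float], fluorescence: Sequence[float]) -> float:
    """Temperature of the maximum of dF/dT, using central differences."""
    if len(temps) != len(fluorescence) or len(temps) < 3:
        raise ValueError("need at least three paired points")
    best_t, best_slope = temps[1], -math.inf
    for i in range(1, len(temps) - 1):
        slope = (fluorescence[i + 1] - fluorescence[i - 1]) / (temps[i + 1] - temps[i - 1])
        if slope > best_slope:
            best_t, best_slope = temps[i], slope
    return best_t


def mean_tm(curves: List[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Average Tm over replicate (temperatures, fluorescence) curves."""
    return fmean(melting_temperature(t, f) for t, f in curves)
```

With a 0.5°C step, the resolution of this estimate is limited to the temperature grid.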

FIG 1 Reconstructed phylogeny of the POx and CGOx sequence space restricted to actinobacterial sequences, reconstructed by GRASP. Clades are colored as follows: Amycolatopsis and Streptomyces (clade I) in black, Microbacterium (clade II) in cyan, Streptomyces (clade III) in purple, and Arthrobacter and Microbacterium (clade IV) in pink. N35 is the common ancestor of all ancestral nodes included in this tree. The additional nodes N34, N29, N22, N12, N6, and N1 were also used in this study to examine the structural evolution of clade I. The present-day enzymes PsPOx, MtCarA, ScPOx, and KaPOx, which were previously characterized in detail, are also labeled (4-6). No fungal POx sequences are included in the sequence space. The scale bar represents the phylogenetic distance in amino acid substitutions per site.

FIG 2 (a) Structures of the monosaccharide as well as C- and O-glycoside substrates that were oxidized by at least one enzyme during the initial screening. (b) Heatmap showing specific activities (in mU/mg) of the seven POx ancestors and of the characterized fungal TmPOx and bacterial KaPOx, ScPOx, and PsPOx enzymes. Specific activities were determined with 200 mM of the monosaccharide substrates and 0.2 mM of the glycoside substrates (final concentrations in the assay), except for TmPOx and KaPOx, whose data were taken from previous publications (labeled with *) (1, 5). (c) Principal component analysis. Upper plot (score plot): scores of the second principal component plotted against scores of the first principal component. The groups of enzymes are numbered as follows: (i) N383-N327-PsPOx-N284, (ii) ScPOx-N202-N167-N35-N67-KaPOx, and (iii) TmPOx. Lower plot (loading plot): coefficients of each variable for the first component plotted against those for the second component.
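The score and loading plots can be illustrated with a minimal PCA of an enzymes × substrates activity matrix via singular value decomposition. Whether the activities were scaled before the analysis is not stated; autoscaling each substrate column is an assumption here, and the function name is hypothetical.

```python
import numpy as np


def pca(activities, n_components: int = 2):
    """PCA of an enzymes x substrates activity matrix.

    Assumption: each substrate column is autoscaled (mean 0, unit variance);
    constant columns are only centred. Needs at least two rows (enzymes).
    Returns (scores, loadings, explained_variance_ratio).
    """
    X = np.asarray(activities, dtype=float)
    centred = X - X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0
    Z = centred / std
    # Rows of vt are the principal axes; squared singular values split the variance.
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    loadings = vt[:n_components].T          # substrates x components (loading plot)
    scores = Z @ loadings                   # enzymes x components (score plot)
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, loadings, explained[:n_components]
```

Plotting `scores[:, 1]` against `scores[:, 0]` gives a score plot in which enzymes with similar substrate profiles cluster together, while the rows of `loadings` show which substrates drive each component.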

a The catalytic constant of the PsPOx dehydrogenase activity for homoorientin oxidation determined with the colorimetric assay was significantly lower than that determined by following oxygen consumption. A kcat of 31 s−1 was obtained when the activity was measured by following oxygen consumption in an oxygraph, resulting in a kcat/Km of (3.4 ± 0.3)
b Activity measured by following the oxygen consumption in an oxygraph.
c Values were not determined.
d

FIG 3 Relative change in the ratio of the steady-state kinetic constants for the C-glycoside homoorientin and D-glucose for N35, N67, N202, ScPOx, N284, N327, N383, and PsPOx. Increases and decreases in this ratio are depicted by arrows (↑ indicates a value higher than 1 and ↓ a value lower than 1).
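The ratio behind the arrows can be sketched as follows. The caption does not specify which steady-state constants enter the ratio; catalytic efficiencies (kcat/Km, both in M−1 s−1) are assumed here, the "=" symbol for a ratio of exactly 1 is a convention of this sketch, and the function names are hypothetical.

```python
def specificity_ratio(kcat_km_glycoside: float, kcat_km_glucose: float) -> float:
    """Assumed ratio of catalytic efficiencies: homoorientin over D-glucose."""
    if kcat_km_glucose <= 0:
        raise ValueError("catalytic efficiency for D-glucose must be positive")
    return kcat_km_glycoside / kcat_km_glucose


def arrow(ratio: float) -> str:
    """Up arrow if the ratio exceeds 1 (glycoside preferred), down arrow if below 1."""
    if ratio > 1.0:
        return "↑"
    if ratio < 1.0:
        return "↓"
    return "="
```

Because both efficiencies share the same units, the ratio is dimensionless and can be compared directly across ancestors and extant enzymes.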

FIG 4 (a) Comparison of the structures of bacterial POx from Pseudoarthrobacter siccitolerans (PsPOx) (10) (PDB code 7QF8) (gray), bacterial FAD-dependent C-glycoside 3-oxidase from Microbacterium trichothecenolyticum (MtCarA) (7) (PDB code 7DVE) (turquoise), and one subunit of fungal POx from Trametes multicolor (TmPOx) (19) (PDB code 1TT0) (green). The oligomerization domains of the TmPOx structure, annotated as the head and arm domains, are highlighted, as is the insertion-1 segment, a structural motif of the PsPOx crystal structure. The magnified view of the PsPOx active site shows residues important for forming the enzyme-substrate complex and for catalysis. The FAD molecule is colored yellow, the substrate loop blue, and the flavinylation motif pink. Catalytically important residues are colored according to atom type. (b) Comparison of the structural predictions of the ancestors N35, N67, N167, N202, N284, N327, and N383 as well as present-day POx from Kitasatospora aureofaciens (KaPOx) (5) and Streptomyces canus (ScPOx) (6) with P. siccitolerans POx (PsPOx). The FAD molecule (from the structure of PsPOx) is shown in all structures. The insertion-1 segment is colored red in proteins where it is present; the segment was identified based on the sequence alignment in Fig. S7. The predicted sequences of the flavinylation motif and the substrate loop are given next to the models; as these regions are very flexible, only their primary sequences are shown. Catalytically important residues and FAD (from the PsPOx crystal structure) are colored according to atom type. Clade labels are also included next to the structural predictions. (c) Comparison of the structural predictions of monomeric N35, N34, N29, KaPOx, N22, N12, N6, and N1 with the crystal structures of PsPOx (PDB code 7QF8) and the TmPOx monomer (PDB code 1TT0). The arm and head domains as well as the insertion-1 segment are indicated. The number of amino acids forming the head domain is given next to each structural prediction and is based on Fig. S8. FAD (from the structure of TmPOx) is colored according to atom type.
Escherichia coli expression hosts and type of induction), S3 (Lambda maxima and associated extinction coefficients of UV/Vis absorption spectra), S4 (Average pairwise sequence identity between extant and ancestral primary sequences belonging to the POx and CGOx sequence space), S5 (Difference in number of mutations of target ancestors compared to their extant forms and number of consensus mutations), and S6 (Amino acid sequences of the target ancestors) and Fig. S1 (Phylogeny of the pyranose oxidase [POx] and C-glycoside oxidase [CGOx] sequence space), S2 (Barcode graphs displaying the posterior probability distribution along the primary sequences), S3 (SDS-PAGE analysis of protein preparations purified by affinity chromatography), S4 (UV/Vis absorption spectra after reconstitution with FAD), S5 (Size exclusion chromatograms of selected target ancestors), S6 (Predicted local similarity statistical data of ancestral structural predictions to the target crystal structures of TmPOx and PsPOx), S7 (Multiple sequence alignment of the fungal enzymes TmPOx (1) and Phanerochaete chrysosporium POx, the present-day bacterial enzymes KaPOx, PsPOx, and ScPOx, and the bacterial FAD-dependent C-glycoside 3-oxidase from Microbacterium trichothecenolyticum (MtCarA), together with ancestral sequences of N35, N67, N167, N202, N284, N327, and N383), and S8 (Multiple sequence alignment of the sequences of fungal TmPOx and bacterial KaPOx, PsPOx, ScPOx, and MtCarA, together with ancestral sequences of N1, N6, N12, N22, N29, N34, and N35).