Whole genome structural predictions reveal hidden diversity in putative oxidative enzymes of the lignocellulose-degrading ascomycete Parascedosporium putredinis NO1

ABSTRACT Economic valorization of lignocellulose is paramount to realizing a true circular bioeconomy; however, this requires the development of systems and processes to expand the repertoire of bioproducts beyond current renewable fuels, chemicals, and sustainable materials. Parascedosporium putredinis NO1 is an ascomycete that thrived at the later stages of a wheat-straw composting community culture, indicating a propensity to degrade recalcitrant lignin-enriched biomass, but exists within an underrepresented and underexplored fungal lineage. This strain has been proven to be an exciting candidate for the identification of new enzymes targeting recalcitrant components of lignocellulose following the recent discovery of a new lignin β-ether linkage cleaving enzyme. The first genome for the genus Parascedosporium for P. putredinis NO1 genome was sequenced, assembled, and annotated. The genome is 39 Mb in size, consisting of 21 contigs annotated to contain 9.998 protein-coding sequences. The carbohydrate-active enzyme (CAZyme) repertoire was compared to 2570 ascomycete genomes and in detail with Trichoderma reesei, Fusarium oxysporum, and sister taxa Scedosporium boydii. Significant expansion in the oxidative auxiliary activity class of CAZymes was observed in the P. putredinis NO1 genome, resulting from increased sequences encoding putative lytic polysaccharide monooxygenases (LPMOs), oxidative enzymes acting within LPMO redox systems, and lignin-degrading laccases. P. putredinis NO1 scored above the 95th percentile for AA gene density across the ascomycete phylum, suggesting a primarily oxidative strategy for lignocellulose breakdown. Novel structure-based searching approaches were employed, revealing 17 new sequences with structural similarity to LPMO, laccase, and peroxidase sequences and which are potentially new lignocellulose-degrading enzymes. IMPORTANCE An annotated reference genome has revealed P. putredinis NO1 as a useful resource for the identification of new lignocellulose-degrading enzymes for biorefining of woody plant biomass. Utilizing a “structure-omics”-based searching strategy, we identified new potentially lignocellulose-active sequences that would have been missed by traditional sequence searching methods. These new identifications, alongside the discovery of novel enzymatic functions from this underexplored lineage with the recent discovery of a new phenol oxidase that cleaves the main structural β-O-4 linkage in lignin from P. putredinis NO1, highlight the underexplored and poorly represented family Microascaceae as a particularly interesting candidate worthy of further exploration toward the valorization of high value biorenewable products.

E nergy consumption continues to grow rapidly alongside improvements in living standards, and fossil fuels continue to play a major role in industrial and agricultural sectors.With their widely accepted environmentally damaging effects, the need to move away from the use of fossil fuels and toward a net zero carbon fuel source is ever more pressing.Lignocellulosic residues consisting of cellulose, hemicellulose, and lignin with minor amounts of pectins and nitrogen compounds offer the largest source of biomass for liquid fuel, chemicals, and energy (1).However, biorefining of lignocellulose has so far been limited by the recalcitrant nature of the intricate and insoluble lignin network (2,3).
Fungi are exceptional wood degraders and are predominantly used to produce an array of bioproducts, including commercial enzyme cocktails used in biological processing of lignocellulosic biomass.Ascomycetes, known as soft-rot fungi, degrade lignocellulose by penetration of plant secondary cell walls with hyphae that secrete complex enzyme cocktails in abundance at the site of attack (4).Parascedosporium putredinis NO1 is a soft-rot ascomycete identified previously as dominant in the later stages of a mixed microbial compost community grown on wheat straw (5).This behavior suggests that the fungus can efficiently deconstruct and potentially metabolize the more recalcitrant carbon sources in the substrate.Indeed, the recent discovery of a new oxidase enzyme that cleaves the major β-ether units in lignin in the P. putredinis NO1 secretome, which releases the pharmaceutically valuable compound tricin from wheat straw while simultaneously enhancing digestibility of the biomass (5), promotes a requirement for further exploration of this taxa.
Here, an annotated reference genome for P. putredinis NO1 reveals a repertoire of carbohydrate-active enzymes (CAZymes) and oxidative enzymes focused on degrading the most recalcitrant components of lignocellulose.Comparisons across the asco mycete tree of life suggest an increased proportion of oxidative enzymes within the CAZyme repertoire of P. putredinis NO1.Further investigation through CAZyme repertoire comparison with two other industrially relevant wood-degrading ascomy cetes, Trichoderma reesei and Fusarium oxysporum, as well as sister taxa Scedosporium boydii, reveals expansion in families of enzymes with roles in the oxidative dissolution of lignocellulose and demonstrates this fungus to be an exciting candidate for the identification of new lignocellulose-degrading activities.
Novel approaches were used to search the P. putredinis NO1 genome for poten tially unannotated enzyme sequences with relation to three types of classic oxida tive lignocellulose degraders: lytic polysaccharide monooxygenases (LPMOs), laccases, and peroxidases.Predicted structures were obtained for >96% of the protein-coding sequences in the genome.Structural searches were found to be effective at identifying multiple sequences for potentially novel proteins involved in lignocellulose breakdown, which had low levels of structural similarity to the classic oxidative lignin and crystal line cellulose-degrading enzymes.These sequences were also missed by sequence and domain-oriented searches.Further investigation and comparison of structures revealed varying levels of structural overlap despite the lack of sequence similarity.This strat egy of combining search approaches can be adopted to identify divergent enzyme sequences which may have alternate lignocellulose-degrading activity, variation in substrate specificity, and different temperature and pH optima.
Further investigation and characterization of such lignocellulose-degrading enzymes add to the wealth of enzymes which can be incorporated into commercial enzyme cocktails to improve their effectiveness and boost the efficiency at which biomass is converted to renewable liquid fuel and value-added chemicals.

The genome of P. putredinis NO1 suggests a strategy to degrade the most recalcitrant components of lignocellulose
The P. putredinis NO1 genome was sequenced using nanopore sequencing with the Oxford Nanopore Technologies' (ONT) MinION system to avoid errors in the assembly and annotation of coding regions resulting from long regions of repetitive DNA in eukaryotic genomes (6).The genome is 39 Mb in size, and the assembly consists of 21 contigs, containing 9998 protein-coding sequences.To investigate the lignocellu lose-degrading enzyme repertoire of the P. putredinis NO1 genome, all protein-coding sequences were annotated for CAZyme domains using the dbCAN server (7).In total, 795 CAZyme domains were predicted in the P. putredinis NO1 genome, and the distribution of these domains across the CAZyme classes can be seen in Fig. 1A.Glycoside hydro lases (GHs) are the most abundant CAZyme class with 290 identified.While auxiliary activities (AAs) also make a large contribution with 162 domains.Glycosyl transferases contribute 113 domains, and carbohydrate esterases contribute 51.Polysaccharide lyases contribute the fewest, with only 18 domains.In addition to these catalytic classes, 161 carbohydrate-binding modules (CBMs) were also identified.
An interesting observation was the proportionally high number of AA class CAZymes observed in the P. putredinis NO1 genome.To investigate this further and more broadly within the scope of the ascomycete tree of life, CAZyme profiles of all available ascomy cete genomes were elucidated (Fig. 2).To filter poorly represented lineages, genera and families with less than three and eight species-level representatives, respectively, were filtered.It was clear that P. putredinis NO1 has one of the highest proportions of AA class CAZymes within its repertoire relative to total CAZymes among ascomycete fungi.P. putredinis NO1 was a substantial outlier within the order Microascales (14.19 ± 3.15) and belonged above the 95th percentile for AA gene density among the highest AA populated genomes (20.38%), behind the genera Diaporthe (21.67 ± 1.3%) of the order Diaporthales, Cladonia (20.9 ± 1.74%) of the order Lecanorales, Xanthoria (20.55 ± 1.33%) of the order Teloschistales, and members belonging to the highly AA-enriched order Xylariales (19.36 ± 1.98%) containing contributions from densely populated genera Hypoxylon (19.78 ± 1.79%), Annulohypoxylon (20.09 ± 1.1%), Nemania (20.25 ± 1.82%), Neopestalotiopsis (21.56 ± 0.66%), Pestalotiopsis (21.49± 0.97%), Arthrinium (20.67 ± 2.13%) and Apiospora (20.87 ± 1.82%).The Parascedosporium sister taxa Scedospo rium (19.73 ± 1.07%) exhibited slightly lower AA density and belonged above the 90th percentile.Interestingly, Saccharomycetales (8.95 ± 4.63%), often associated with lignocellulose deconstruction, displayed significantly reduced AA abundance in stark contrast to neighboring phylogenies such as Teloschistales (19.37 ± 1.6%) and Pleospor ales (17.97 ± 1.98%).Members of the orders Helotiales (17.32 ± 2.36%), Botryosphaeriales (17.86 ± 1.3%), Magnaporthales (17.47 ± 0.98), and Diaporthales (18.78 ± 2.86) displayed a degree of enrichment of AAs, while members belonging to Hypocreales (13.93 ± 4.55%) and Mycosphaerellales (15.21 ± 2.89%) displayed lower abundances.Considering how the AA class of enzymes is predominantly associated with the degradation of lignin and crystalline cellulose, it highlights a potential strategy of the fungus to target these components.Indeed, in a mixed microbial community grown on wheat straw, the fungus was observed to become more dominant in the later stages of the culture, potentially due to its capacity to modify the more difficult to degrade components of lignocellulose for growth (5).The gene density distribution here highlights promising candidate lineages for further exploration for additional insights into lignin and cellulose turnover.
Within white-and brown-(basidiomycete), and soft-rot (ascomycete) fungi, it has been demonstrated that the CAZyme repertoire can vary greatly from species to species (8).To investigate the repertoire of P. putredinis NO1 in more detail, CAZyme domains were compared to that of three other wood-degrading ascomycetes.Scedosporium boydii is located within the sister taxon of Parascedosporium and has a genome of 43 Mb containing 1029 CAZyme domains.The genome and CAZyme complement of the soft-rot P. putredinis NO1 are larger than that of Trichoderma reesei, which contains 786 domains in 34 Mb of DNA.T. reesei is a mesophilic soft-rot fungus known for its ability to produce high titres of polysaccharide-degrading enzymes that are used in biomass-degrading enzyme cocktails (9).The genome of P. putredinis NO1 is slightly smaller than that of Fusarium oxysporum at 47 Mb, a phytopathogenic fungus containing an expanded CAZyme repertoire of 1430 domains (10).The lignocellulose-degrading activities of F. oxysporum have been well investigated in part due to its pathogenicity and ability to ferment sugars from lignocellulose breakdown directly into ethanol (11,12).
Examining the distribution of predicted CAZyme domains revealed that despite the similar overall number of CAZyme domains for P. putredinis NO1 and T. reesei, the proportion of AA class CAZyme domains is much higher in the genome of P. putredi nis NO1 (Fig. 1A).Proportionally, AA class CAZymes make the largest contribution to the CAZyme repertoire in the genome of P. putredinis NO1 compared to the other ascomycetes (Fig. 1).This again could suggest an oxidative strategy to target to lignin and crystalline cellulose.Although analysis of fungal secretomes would be required to confirm an improved ability of P. putredinis NO1 to deconstruct lignocellulosic compo nents, the high potential capacity for degradation of lignin and crystalline cellulose within the genome suggests that this is an important fungus to explore for new lignocellulose-degrading enzymes.This is especially relevant considering that this is the first genome assembly of the genus Parascedosporium.
The increased contribution of AA class CAZymes is mirrored by a reduced proportion of GH class CAZymes in the P. putredinis NO1 genome compared to T. reesei and F. oxysporum.This reduced GH contribution is also visible in the genome of S. boydii, a close relative of P. putredinis NO1.Despite the reduced number of the hydrolytic GH class CAZymes, the repertoires of P. putredinis NO1 and S. boydii contain the highest proportions of CBMs, domains typically associated with hydrolytic CAZymes such as GHs (13), but which have also been observed in oxidative LPMOs (14,15).The increased proportion of CBMs in the genome of P. putredinis NO1 could aid the catalytic CAZymes in accessing and binding to these substrates.Indeed, examining the CBM domains at the family level shows a high number of crystalline cellulose-binding domains in the genome of both P. putredinis NO1 and S. boydii, much higher than the number of domains assigned to any of the other CBM families (Fig. S1).

Closer investigation of the AA CAZyme repertoire reveals more about the lignocellulose-degrading strategy of P. putredinis NO1
The high number of AA domains, a functional class that notably contains LPMOs, peroxidases, and laccases (2), in the genome of P. putredinis NO1 is likely to endow this fungus with the ability to degrade recalcitrant components of the plant cell wall through a primarily oxidative mechanism.LPMOs are copper-containing enzymes that enhance polysaccharide degradation by generating new sites for attack by hydrolytic CAZymes (16).LPMOs have been shown to act on all major polysaccharide components of lignocellulose.Their oxidative action relies on exogenous electron donors provided by other AA family CAZymes, small molecule reductants and even lignin (2,16).It has recently been demonstrated that LPMOs readily utilize hydrogen peroxide (H 2 O 2 ) as a co-substrate also (17,18).
Investigating the distribution of AA domains across the AA families revealed AA9 family members to be the most abundant in the P. putredinis NO1 genome with 35 domains, the highest in the four ascomycetes investigated here (Fig. S2).This family contains the cellulose, xylan, and glucan-active LPMOs described above (19).AA3 and AA3_2 domains are the second and third most abundant families in the P. putredinis NO1 genome with 29 and 27 domains, respectively.These are flavoproteins of the glucose-methanol-choline oxidoreductase family, which includes activities such as cellobiose dehydrogenase, glucose-1-oxidase, aryl alcohol oxidase, alcohol oxidase, and pyranose oxidase (20).It is proposed that flavin-binding oxidative enzymes of this family play a central role in spatially and temporally supplying H 2 O 2 to LPMOs and peroxidases or to produce radicals that degrade lignocellulose through Fenton chemistry (17).The P. putredinis NO1 genome also contains 12 AA7 family domains, the family of glucooligosaccharide oxidase enzymes.These have recently been demonstrated to transfer electrons to AA9 LPMOs, which boosts cellulose degradation (21).Altogether, the apparent expansion of these LPMO system families suggests a potentially increased capacity for P. putredinis NO1 to oxidatively target crystalline cellulose.
The genome of P. putredinis NO1 also contains 12 AA1 family CAZyme domains.This family includes laccase and multi-copper oxidase enzymes which catalyze the oxidation of various aromatic substrates while simultaneously reducing oxygen to water (22).It has also been demonstrated that laccases can boost LPMO activity through the release of low molecular weight lignin polymers from biomass, which can in turn donate electrons to LPMOs (23).Additionally, seven domains belonging to the AA8 family were identi fied, a family of iron reductase domains initially identified as the N-terminal domain in cellobiose dehydrogenase enzymes but also found independently and appended to CBMs (2,24,25).These domains are believed to be involved in the generation of reactive hydroxyl radicals that can indirectly depolymerize lignin.There are six AA4 domains in the genome of P. putredinis NO1, the highest number of the four ascomycetes investigated here.These are vanillyl-alcohol oxidase enzymes with the ability to catalyze the conversion of a wide range of phenolic oligomeric compounds (26).These may act downstream of the lignin depolymerization catalyzed by other members of the AA class.There is a clear capacity in the P. putredinis NO1 genome for lignin depolymerization and metabolism through the multiple domains identified belonging to these families.The P. putredinis NO1 genome also contains two AA16 domains, a recently identified family of LPMO proteins with an atypical product profile compared to the traditional AA9 family of LPMOs and a potentially different mode of activation (27).
Gene expression of CAZymes in the P. putredinis NO1 genome has been explored previously during growth on glucose, compared to growth on wheat straw with samples taken at days 2, 4, and 10 (5).This transcriptomic data give a view of the potential strategy by which P. putredinis NO1 utilizes its expanded repertoire of AA class CAZymes.Upregulation of AA class CAZymes during growth on wheat straw compared to growth on glucose was observed predominantly at day 4 and then gave way to upregulation instead of mainly GH class hydrolytic CAZymes at day 10.This could represent a strategy where the recalcitrant lignin and crystalline cellulose are targeted first by LPMOs and lignin degraders such as laccases, making the polysaccharide substrates of hydrolytic GH enzymes more accessible.

Searching the P. putredinis NO1 genome for new oxidative lignocellulosedegrading enzymes with sequence-, domain-, and structural-based strat egies
Due to the evidence of a strategy for P. putredinis NO1 to target the most recalcitrant components of lignocellulose and the recent discovery of a new oxidase with the ability to cleave the major linkage in lignin from this strain (5), it was hypothesized that the genome of this fungus contains additional new enzymes for the breakdown of plant biomass.Particularly, this fungus could contain new enzymes with roles in degrading the lignin and crystalline cellulose components and which have not been annotated as CAZymes in this analysis.
Traditionally, homolog searching has been performed using a sequence-based approach (28).This approach uses either the primary amino acid sequence of an example protein to search an unknown database for similar sequences, or uses hidden Markov models (HMMs) to search for domains of interest (29).However, both techniques rely on primary amino acid sequence homology and neglect that proteins with distantly related sequences may have similar three-dimensional structures and therefore activity.The recent emergence of AlphaFold provides a resource for the fast and accurate prediction of unknown protein structures (30).Using this tool, structures were predicted for >96% of the protein-coding regions of the P. putredinis NO1 genome.These structures were used to create a database of protein structures into which structures of interesting enzymes such as those for LPMOs, laccases, and peroxidases could be searched.These structural searches for new enzymes were performed alongside sequence-and domain-based searches for comparison of the ability to identify interesting new candidates.
LPMO-related sequences were searched for in the P. putredinis NO1 genome using the sequence of an AA9 family LPMO from Aspergillus niger with the default E-value cut off of 1 × 10 −5 , with the AA9 HMM from Pfam and considering domain hits that fell within the default significance inclusion threshold of 0.01 (31), and the structure of the same A. niger LPMO with a tailored 'lowest percentage match' parameter.In total, 49 sequences were identified across the three searching strategies, and 33 of these sequences were also annotated by dbCAN as AA9 family of LPMOs (Table S1).With the objective of identifying new enzymes, the remaining 16 sequences were investigated further, and the distribution of the identification of these sequences across the three search strategies can be seen in Table 1.Two of the sequences, PutMoI and PutMoM, were identified by all three search approaches.These sequences both had conserved signal peptides with a conserved N-terminal histidine after the cleavage site, a characteristic feature of LPMOs (32).
When creating the structure database, it was tempting to filter predicted structures by pLDDT score, the AlphaFold metric for prediction confidence, to create a database solely of "high confidence" structures (30).However, pLDDT scores reflect local confidence and should instead be used for assessment of individual domains (33).The majority of the structures generated here had pLDDT scores of over 60%; however, pLDDT scores lower than 70% are considered low confidence (Fig. S3).Extracellular enzymes are of particular interest here, but these often have disordered N-terminal signal peptides which can reduce the overall pLDDT scores.Therefore, for secreted enzyme identification from AlphaFold structures, it is inappropriate to filter by pLDDT score.Indeed, the PutMoI structure mentioned above had a pLDDT score of 62%, considered to be low confidence (30), but which had characteristic features of LPMOs and which demonstrated structural similarity to the A. niger AA9 LPMO used for structural searches (Fig. 3A and B).The central beta-sheet structures align well to the A. niger AA9 LPMO for both PutMoI and PutMoM, but both also have additional loops of disordered protein which likely explains the relatively low PDBefold alignment confidence scores (Q-scores) of 0.23 and 0.34 for PutMotI and PutMoM, respectively.This again highlights the unreliability of structural confidence scores alone and demonstrates how manual inspection of structural alignments may prove more useful.Despite not being annotated as AA9 LPMOs by the dbCAN server for CAZyme annotation (7), both sequences were identified using the Pfam AA9 HMM and appear to be conserved AA9 LPMOs and, therefore, are not of interest in the discovery of new enzymes.
By utilizing multiple searching approaches, potentially new sequences with LPMOrelated activities can be identified.When searching for LPMO-related sequences, domain-based approaches identified all coding regions also identified by sequencebased searching as well as additional coding regions (Table S1).This pattern of domainbased searching identifying more coding regions than sequence-based searching was also observed for the other activities investigated (Tables S2 and S3).For structure-based searching, parameters of the searches could be tailored to identify additional coding regions with lower overall structural similarity but which may still be interesting.For example, searching the against the P. putredinis NO1 genome structure database with the structure of the A. niger AA9 LPMO and with the "lowest acceptable match" parame ter, which is the cutoff at which secondary structures must overlap between a query and a target set at 50%, yielded 30 coding regions (Table S1).Of these sequences, nine were not identified by the sequence or domain-based searching approaches and were investigated in more detail (Table 1).To investigate these further, sequences were searched against the National Center for Biotechnology Information (NCBI) non-redun dant protein database to identify related sequences (34); conserved domains were a Coding regions of proteins related to LPMOs identified through genome searching approaches with the sequence of an A. niger AA9 LPMO (E-value cutoff = 1 × 10 −5 ), the Pfam AA9 HMM (significance threshold = 0.01), and the structure of the A. niger AA9 LPMO (lowest percentage match = 50%) and which were not annotated as AA9 CAZymes by dbCAN.InterPro annotations were retrieved where possible predicted with InterPro; any CAZyme domains were annotated with dbCAN (7); the predicted structures were compared with structures in the PDB database (35); and secretion signal peptides were predicted with SignalP (36) in an attempt to elucidate the potential functions.Two of the sequences, PutMoA and PutMoD, are the two predicted AA16 LPMOs identified in the P. putredinis NO1 CAZyme repertoire earlier (Fig. S2).
Another two sequences, PutMoH and PutMoK, were not annotated as CAZymes but had conserved BIM1-like domains.BIM1-like proteins are LPMO_auxilliary-like proteins, function in fungal copper homeostasis, and share a similar copper coordination method to the LPMOs which they are related to (37).Although not likely to be involved in lignocellulose breakdown, this highlights how structurally related proteins in terms of active site or co-factor coordination structures can be identified with structural approaches where sequence-and domain-based approaches fail.Three of the nine sequences were also identified as being upregulated when P. putredinis NO1 was previously grown on wheat straw and compared to growth on glucose (Supplementary File 1) (5).Although this does not confirm the role of these proteins in lignocellulose breakdown, it provided another layer of information for the selection of interesting candidate sequences to investigate further.PutMoP was the most interesting sequence identified solely by the structural searching and showed upregulation during growth on wheat straw compared to glucose.It was not annotated as a CAZyme; no conserved domains were identified; and sequence homology was only observed in hypothetical proteins in the NCBI non-redundant protein database (34).Comparing the AlphaFold predicted structure of PutMoP to the A. niger AA9 LPMO revealed similarity at the central beta-sheet structure despite a very low Q-score of 0.05 (Fig. 3C and D).A secretion signal peptide was also predicted for this protein, suggesting an extracellular role.This immunoglobulin-like distorted β-sandwich fold is a characteristic structural feature of LPMOs and is shared across the LPMO CAZyme families (38).The similarity of this central structure is likely the reason for identification of this sequence by structural comparison.This structural similarity at the protein center, the lack of amino acid sequence similarity, and the conserved secretion signal make this protein an interesting candidate for further investigation.Searching the PutMoP structure against the whole PDB structure database returned many diverse proteins not linked to lignocellulose breakdown; however, the Q-score was very low for all the structures and did not help to discern the potential activity of this protein.The sequence lacks the N-terminal histidine after the signal peptide cleavage site, which is conserved in LPMOs, so this protein is unlikely to be an LPMO.However, a secreted unknown protein with some central structural similarity to an important class of oxidative proteins that degrade crystalline cellulose is of definite interest.
In addition to searching for LPMO-related sequences, classes of enzymes involved in the breakdown of lignin are important targets for the biorefining of plant biomass.The recalcitrance of lignin is a limiting factor hindering the industrial use of lignocellulose as a feedstock to produce biofuels.Lignin itself is also a historically underutilized feedstock for valuable chemicals (39).Laccases are multicopper oxidase family of enzymes that catalyze oxidation of phenolic compounds through an electron transfer reaction that simultaneously reduces molecular oxygen to water (23).They modify lignin by depoly merization and repolymerization, Cα oxidation, and demethylation and are particularly efficient due to their use of readily available molecular oxygen as the final electron acceptor (40,41).
Laccase-related sequences were searched for in the P. putredinis NO1 genome using the sequence of an AA1 family laccase from A. niger, a bespoke HMM constructed from ascomycete laccase and basidiomycete multi-copper oxidase sequences downloaded from the laccase engineering database (42) and with the structure of the A. niger AA1 laccase.In total, 32 sequences were identified across the three searching strategies, and only 9 of these were annotated by dbCAN as AA1 family of CAZymes (Table S2).The bespoke HMM allowed for more divergent sequences for these enzymes to be incor porated into the model's construction.The result was the identification of sequences that when explored further looked like laccase enzymes but were missed by traditional CAZyme annotation, highlighting how searching for CAZymes alone is a limited method for identifying lignocellulose-degrading enzymes.However, for the identification of new lignocellulose-degrading enzymes, more divergent sequences are of interest.A single coding sequence, PutLacJ, was identified by the structural searching approach with a 30% lowest acceptable match parameter that was not identified by sequence or domain-based searching (Table 2).
PutLacJ was not annotated as a CAZyme by dbCAN but does have a predicted cupredoxin domain, a feature of laccase enzymes (43).Structural comparisons against the PDB structure database revealed alignments with moderate confidence scores to copper-containing nitrite reductases from Neisseria gonorrhoeae which are suggested to play a role in pathogenesis (44).In fungi, it is more likely that these are playing a role in denitrification (45).The lack of a signal peptide makes it unlikely that this protein is involved in lignin depolymerization, despite the structural similarity to the beta-sheet regions of the A. niger laccase (Fig. 4).
Peroxidases (PODs) also play a major role in lignin deconstruction by white-rot fungi.PODs are lacking in brown-rot species, presumably due to their non-ligninolytic specialization of substrate degradation (46).The identification of new putative peroxi dases in P. putredinis NO1 is of interest.Fungal class II peroxidases are divided into three lignolytic forms: lignin peroxidase (LiP), manganese peroxidase (MnP), and versatile peroxidase (VP) (47).
Sequence searches into the P. putredinis NO1 genome using sequences of MnP from Aureobasidium subglaciale, LiP from F. oxysporum, and VP from Pyronema confluens yielded only two sequences (Table S3).Both peroxidase-related sequences were also identified by domain searching using a bespoke HMM constructed from sequences of MnPs, LiPs, and VPs downloaded from the fPoxDB database of peroxidase sequences (48).This domain-based approach identified only three sequences in total, all of which were annotated as AA2 family of CAZymes also (Table S3).However, structural-based searching using the structures of the same three peroxidases and with the lowest acceptable match parameter of 30% used in sequence-based searches identified nine coding regions in total (Table S3), seven of which were not identified by sequence-or domain-based searching approaches and were not annotated as AA2 CAZymes (Table 3) but were all found to be upregulated previously when P. putredinis NO1 was grown on wheat straw compared to growth on glucose (Supplementary File 1) (5).
Investigating these sequences further revealed two sequences to be the most interesting, PutPoxA and PutPoxG, both with low Q-scores of 0.01 and 0.04, respec tively.PutPoxA was not annotated as a CAZyme but does have a predicted domain of unknown function family 3632 (DUF3632).Genes encoding DUF3632 domains were previously found to be upregulated in the filamentous ascomycete Neurospora crassa when the CLR-2 transcription factor, important for growth on cellulose, was consti tutively expressed (49).The protein does, however, lack a signal peptide, and struc tural comparison to the A. subglaciale MnP shows similar helical structures, but these secondary structures do not appear to overlap very well (Fig. 5A).PutPoxG was not annotated as a CAZyme, and no conserved domains were identified, although the helical a Coding regions of proteins related to laccases identified through genome searching approaches with the sequence of an A. niger AA1 laccase (E-value cutoff = 1 × 10 −5 ), the bespoke laccase and multicopper oxidase HMM constructed from sequences from the laccase engineering database (significance threshold = 0.01), and the structure of the A. niger AA1 laccase (lowest percentage match = 30%) and which were not annotated as AA1 CAZymes by dbCAN.InterPro annotations were retrieved where possible structures do seem to align better with the A. subglaciale MnP than PutPoxA (Fig. 5B).Furthermore, searching of both structures against the PDB database was performed, but all alignments had very low Q-scores of less than 0.1.
As with the candidates identified by LPMO and laccase searching approaches, it is hard to be confident on sequence and structural investigation alone that these proteins are involved in lignocellulose breakdown.However, by utilizing multiple searching approaches, more divergent and varied sequences with potential relation to industrially important enzymes have been identified here.This strategy of searching for new enzymes involved in the breakdown of the most recalcitrant components of lignocellu lose would work well when combined with additional layers of biological data, e.g., transcriptomic or proteomic data.Many of the coding regions investigated here show structural similarity to the interesting classes of enzymes with which they were identified but lack the sequence similarity and therefore the functional annotation.Transcriptomic data showing upregulation of these genes or proteomic data showing increased abundances of these proteins when the organism in question is grown on lignocellulosic substrates would inspire more confidence in the role of these proteins in the degradation of plant biomass.Therefore, we used sequence similarity to identify the corresponding transcripts for these coding regions in the transcriptomic time course data set of P. putredinis NO1 grown for 10 days in cultures containing wheat straw published previ ously (5).The transcriptomic data were explored for all sequences which were identified solely by structural searches and therefore considered interesting (Supplementary File 1).For the four sequences explored in more detail, we found that three of the four, PutMoP, PutPoxA, and PutPoxG, were found to be significantly upregulated on at least one timepoint when grown on wheat straw compared to growth on glucose (Fig. 6).Expres sion of PutLacJ was instead found to be significantly higher during growth on glucose compared to growth on wheat straw.However, structural investigation revealed that PutLacJ had similarity to copper-containing nitrite reductase proteins, and it was concluded that it is unlikely to be involved in lignocellulose breakdown.Characterization would be required to confirm the role of these candidates in lignocellulose breakdown and to understand whether these activities are new.However, the implication in lignocellulose-degrading processes through the analysis of transcriptomic data provides another source of information by which candidates identified through the described strategy can be investigated.It is hoped that adoption of a similar strategy for analysis of the wealth of sequence data now publicly available will allow identification of novel enzyme sequences for many important processes to be made simpler.

Conclusions
P. putredinis NO1 was revealed here to contain a diverse repertoire of lignocellulosedegrading enzymes in its genome.The newly annotated reference genome is a poten tially useful resource, considering the potential of P. putredinis NO1 for the identification of industrially valuable enzymes (5).Among ascomycetes, P. putredinis NO1 exists within the 95th percentile for abundant auxiliary activity gene density, implying potential specialism regarding mechanisms of lignocellulose degradation, and belongs to a substantially underrepresented and underexplored lineage.Investigating CAZyme families in more detail revealed an increased capacity to target the most recalcitrant components of lignocellulose when compared to three other biomass-degrading ascomycetes.For crystalline cellulose degradation, expansions were observed in families of LPMOs and in families associated with LPMO systems.Multiple domains encoding lignin-degrading laccase proteins were also identified.Considering the context in which P. putredinis NO1 was identified, thriving at the late stages of a mixed microbial commun ity grown on wheat straw, we found it is feasible that the genome of this fungus contains new ligninolytic activities.By utilizing a strategy of searching genomic data for new enzymes with simultaneous sequence-, domain-, and structural-based approaches, multiple interesting sequences were identified.

Strain isolation
P. putredinis NO1 was isolated from a wheat straw enrichment culture and maintained as reported previously (5).

Genome assembly and annotation
Oxford Nanopore Technologies reads were filtered to those of length over 5 kb with SeqKit v.0.

Ascomycete genome annotation and CAZyme prediction
All available genome assemblies (n = 2,635) of ascomycota origin were retrieved from the NCBI genome assembly database.Genome assemblies with N50 values of >1,000 were retained and gene prediction was performed with FunAnnotate v.1.8.1 (58), BUSCO (59), and AUGUSTUS (60), generating a final data set of 2,570 genomes.Predicted genes for each genome were annotated with the CAZyme database (v.09242921), and mean gene densities were then calculated for each taxonomic level for comparative analysis.Unique taxonomy identifiers for each genome were retrieved from the NCBI taxonomy database using the Entrez NCBI API (61).No filtering was undertaken and a phylogenetic tree was reconstructed using ETE3 to retrieve the tree topology (get_topology) without intermediate nodes at a rank limit of genus (62) (Fig. 2).Gene densities from annota tions were mapped to the corresponding genomes on the tree.Genome metadata and annotations are available in Supplementary File 2.
These sequences were searched against the P. putredinis NO1 genome protein sequences through command line BLAST with an E-value cut off of 1 × 10 −5 (60).Results were compiled for the three classes of peroxidase.
supported by a CASE studentship from the BBSRC Doctoral Training Programme (BB/ M011151/1) with Prozomix Ltd.
This project was undertaken on the Viking Cluster, which is a high-performance compute facility provided by the University of York.We are grateful for computational support from the University of York High Performance Computing service, Viking and the Research Computing team.
Special thanks to Sally James for performing the nanopore sequencing in her kitchen in the first weeks of the COVID-19 pandemic, and to Katherine Newling for her immense help with all bioinformatic work and my endless questions.
C.J.R.S. conceptualized the investigation carried out in this paper; extracted the genomic DNA from P. putredinis NO1; performed CAZyme repertoire comparison analysis; structurally modeled the P. putredinis NO1 genome; performed sequence-, domain-, and structure-based searches of the genome; analyzed the search strategy results and was the major contributor in writing the manuscript.D.R.L. carried out the annotation of ascomycete genomes and CAZyme repertoire comparison analysis and was a major contributor to the writing of the manuscript.N.C.O. was involved in maintaining P. putredinis NO1 and extracting genomic DNA.S.R.J. library prepped and sequenced the P. putredinis NO1 genomic DNA.K.N. assembled the P. putredinis NO1 genome, performed initial annotation, and aided deposition of sequence data.Y.L. assembled the transcriptome that was used for annotation of the P. putredinis NO1 genome.N.G.S.M. was a contributor to the writing of the paper.S.B. carried out the Illumina sequencing, which was used to polish the P. putredinis NO1 genome assembly.N.C.B. was a major contributor to the conceptualization and supervision of the study in addition to making a major contribution to the writing of the manuscript.

FIG 2
FIG 2 Auxiliary activity distribution and density across the ascomycete tree of life.Genes predicted for ascomycete genome assemblies were annotated for CAZymes to explore patterns in the distribution and density of auxiliary activities (n = 2,570) within the ascomycete phylogenetic tree.Branch colors indicate the mean proportion of auxiliary activities within only the CAZyme annotations accounted for by all descendant taxa.Numerical clade annotations represent the number of sequenced genomes available.Key taxa, including lineages above the 95th percentile for AA proportion, have been highlighted.Genera and families with less than three and eight species-level representatives, respectively, have been pruned for clarity (n = 462 taxa).Nodes of taxonomic ranks below genus have been pruned (n = 93).

FIG 4
FIG 4 Structural comparison of PutLacJ laccase-related protein.The AlphaFold predicted structures of the sequence PutLacJ from the P. putredinis NO1 genome structurally aligned to the A. niger laccase used in sequence and structure-based searching (UniProt ID: A2QB28). A. niger laccase, beige; PutLacJ, blue.Q-score is a quality function of Cα alignment from PDBefold.

FIG 5
FIG 5 Structural comparison of peroxidase-related proteins.The AlphaFold predicted structures of two sequences, PutPoxA (A) and PutPoxG (B), from the P. putredinis NO1 genome structurally aligned to the A. subglaciale MnP used in sequence and structure-based searching.A predicted structure was unavailable, and so a predicted structure was generated with AlphaFold.A. subglaciale MnP, beige; PutPoxA, blue; PutPoxG, hot pink.Q-score is a quality function of Cα alignment from PDBefold.

TABLE 1
Identifying LPMO-related proteins encoded in the P. putredinis NO1 genome a

TABLE 2
Identifying laccase-related proteins encoded in the P. putredinis NO1 genome a

TABLE 3
Identifying peroxidase-related proteins encoded in the P. putredinis NO1 genome a Coding regions of proteins related to peroxidases identified through genome searching approaches with the sequences of an MnP from A. subglaciale, LiP from F. oxysporum, and VP from P. confluens (E-value cut-off = 1 × 10 −5 ), the bespoke peroxidase HMM contricted from MnP, LiP, and VP sequences in the fPoxDB database (significance threshold = 0.01), and the structure of the same three peroxidases used for sequence searches (lowest percentage match = 30%) and which were not annotated as AA2 CAZymes by dbCAN.InterPro annotations were retrieved where possible a temperature for 1 hour; and elution steps were performed at 37°C for 15 minutes.The resulting DNA libraries were sequenced on MinION R9.4.1 flow cells with a 48-hour run time.Basecalling was performed using Guppy v.3.5.2 software.