Comprehensive Genome Analysis of Cellulose and Xylan-Active CAZymes from the Genus Paenibacillus: Special Emphasis on the Novel Xylanolytic Paenibacillus sp. LS1

ABSTRACT Xylan is the most abundant hemicellulose in hardwood and graminaceous plants. It is a heteropolysaccharide comprising different moieties appended to the xylose units. Complete degradation of xylan requires an arsenal of xylanolytic enzymes that can remove the substitutions and mediate internal hydrolysis of the xylan backbone. Here, we describe the xylan degradation potential and underlying enzyme machinery of the strain Paenibacillus sp. LS1. Strain LS1 was able to utilize both beechwood and corncob xylan as the sole source of carbon, with the former being the preferred substrate. Genome analysis revealed an extensive xylan-active CAZyme repertoire capable of mediating efficient degradation of the complex polymer. In addition, a putative xylooligosaccharide ABC transporter and homologues of the enzymes involved in the xylose isomerase pathway were identified. Further, we validated the expression of selected xylan-active CAZymes, transporters, and metabolic enzymes during growth of strain LS1 on xylan substrates using qRT-PCR. The genome comparison and genomic index (average nucleotide identity [ANI] and digital DNA-DNA hybridization) values revealed that strain LS1 is a novel species of the genus Paenibacillus. Lastly, comparative genome analysis of 238 genomes revealed the prevalence of xylan-active CAZymes over cellulose-active CAZymes across the Paenibacillus genus. Taken together, our results indicate that Paenibacillus sp. LS1 is an efficient degrader of xylan polymers, with potential implications in the production of biofuels and other beneficial by-products from lignocellulosic biomass.
IMPORTANCE Xylan is the most abundant hemicellulose in the lignocellulosic (plant) biomass that requires cooperative deconstruction by an arsenal of different xylanolytic enzymes to produce xylose and xylooligosaccharides. Microbial (particularly, bacterial) candidates that encode such enzymes are an asset to the biorefineries to mediate efficient and eco-friendly deconstruction of xylan to generate products of value. Although xylan degradation by a few Paenibacillus spp. has been reported, a complete genus-wide understanding of this trait has been unavailable to date. Through comparative genome analysis, we showed the prevalence of xylan-active CAZymes across Paenibacillus spp., which makes them an attractive option for efficient xylan degradation. Additionally, we deciphered the xylan degradation potential of the strain Paenibacillus sp. LS1 through genome analysis, expression profiling, and biochemical studies. The ability of Paenibacillus sp. LS1 to degrade xylan types obtained from different plant species emphasizes its potential application in lignocellulosic biorefineries.

(beechwood and corncob xylan) was monitored. Samples were collected at regular time intervals to analyze the total cell protein (an indication of growth) and cellulase/xylanase activity. The growth of Paenibacillus sp. LS1 on the substrates Avicel and CM-cellulose was compromised, as indicated by total cellular protein contents of 6.7 ± 1.2 mg/mL and 1.6 ± 0.1 mg/mL, respectively, by 32 h (Fig. 1A). Also, strain LS1 displayed very low cellulase activity on the substrates Avicel and CM-cellulose (0.2 ± 0.06 U/mL) even after prolonged incubations (24 to 56 h) (Fig. 1B). However, the bacterium displayed similar growth patterns on both beechwood and corncob xylan, reaching the highest cellular protein content of up to 19 ± 1.8 mg/mL (by 24 h) and 18.2 ± 0.8 mg/mL (by 36 h), respectively (Fig. 1C). In agreement with the growth studies, Paenibacillus sp. LS1 exhibited markedly higher xylanase activity than cellulase activity. Among the xylan substrates, 2-fold higher activity was recorded on beechwood xylan (14 ± 1.6 U/mL) than on corncob xylan (6.7 ± 0.2 U/mL) (Fig. 1D).
General features of the Paenibacillus sp. LS1 genome. The assembled genome of Paenibacillus sp. LS1 comprised a total of 40 contigs, with an N50 contig size of 0.43 Mb. The genome size was estimated to be 7.23 Mb, and the GC content was 45.7%. Two copies of 23S, a single copy each of the 16S and 5S rRNA genes, and 74 tRNA genes coding all 20 amino acids were identified. Genome annotation revealed a total of 6,642 protein-coding genes, 3,990 (60%) of which were functionally assigned. The function of the remaining 2,652 genes could not be predicted, and hence they were designated hypothetical. Rapid Annotations using Subsystems Technology (RAST) annotation classified these protein-coding genes into 26 categories of 342 subsystems (Fig. S2). The genome of Paenibacillus sp. LS1 included all 107 known housekeeping (marker) genes (15), therefore indicating the completeness of the genome (Table S2).
The genome consists of a repertoire of xylan-active CAZymes. The CAZyme repertoire of Paenibacillus sp. LS1 consisted of 270 unique proteins comprising 299 CAZyme domains. The repertoire was dominated by 176 glycoside hydrolases (GH), followed by 43 carbohydrate-binding modules (CBM), 42 glycosyl transferases (GT), 21 carbohydrate esterases (CE), 16 polysaccharide lyases (PL) (Fig. 2), and 1 auxiliary activity (AA) family 7 protein. No putative AA10 lytic polysaccharide monooxygenases (LPMOs) were identified. The presence of representative CAZymes from families GH1, 2, 3, 5, 6, 8, 9, 10, 11, 30, 31, 39, 43, 48, 51, 67, and 74 among the glycoside hydrolases and CE1, 2, 3, 4, and 7 among the carbohydrate esterases suggests that Paenibacillus sp. LS1 primarily targets lignocellulosic polysaccharides, in particular, hemicellulose (Fig. 2). Among the GHs, GH43 was the dominant family, with 18 representative members, which indicates the potential of the isolate to target arabinose moieties appended to xylans and pectins. Apart from this, the genome of Paenibacillus sp. LS1 also encoded cellulolytic CAZymes, which consisted of four putative cellulases (GH5, PATRIC sequence ID peg.5091; GH6, peg.5652; GH9, peg.3676; GH48, peg.3677) and 10 β-1,4-glucosidases belonging to the GH1 and GH3 families (Fig. S3 and Table S4). In addition, genome analysis confirmed the absence of CAZy families encoding lignin-degrading enzymes such as AA1 and AA2 in Paenibacillus sp. LS1. This was in line with the results obtained while screening the substrate preference, wherein the strain Paenibacillus sp. LS1 did not grow on the minimal medium supplemented with lignin (data not shown).
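The class-level counts reported above can be cross-checked against the stated total of 299 CAZyme domains with a short tally. The following Python sketch uses only the counts given in the text; the function name `class_shares` is illustrative and not part of any tool used in the study.

```python
# Domain counts per CAZyme class, as reported in the text for Paenibacillus sp. LS1.
cazyme_domains = {
    "GH": 176,   # glycoside hydrolases
    "CBM": 43,   # carbohydrate-binding modules
    "GT": 42,    # glycosyl transferases
    "CE": 21,    # carbohydrate esterases
    "PL": 16,    # polysaccharide lyases
    "AA": 1,     # auxiliary activity (family 7)
}


def class_shares(counts):
    """Return (total, {class: percent of total}) for a dict of domain counts."""
    total = sum(counts.values())
    shares = {name: round(100.0 * n / total, 1) for name, n in counts.items()}
    return total, shares


total, shares = class_shares(cazyme_domains)
print(total)         # 299, matching the number of domains stated in the text
print(shares["GH"])  # 58.9 (percent of all domains that are glycoside hydrolases)
```

The tally confirms that the six class counts sum to the reported 299 domains, with glycoside hydrolases accounting for roughly 59% of them.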
Removal of the D-MeGlcAp moieties can be mediated by the GH67 xylan α-1,2-glucuronosidase, peg.549. It is a tri-modular enzyme (Fig. 3) devoid of a signal peptide (Table S3), which indicates that the enzyme is not secreted outside the cell. It shared 63% identity with α-D-glucuronidase from Geobacillus stearothermophilus T-6 (PDB: 1MQP) (19).
Expression profiling of xylan-active genes during growth of Paenibacillus sp. LS1 on different xylan substrates. qRT-PCR analysis confirmed the expression of selected genes during growth of strain LS1 on the xylan substrates compared to the control (glucose). Notably, the gene expression was substantially higher in the presence of beechwood xylan than corncob xylan (Fig. 5). Most of the genes, including the GH10 xylanases (peg.181 and peg.315), GH39 β-1,4-xylosidase (peg.306), α-L-arabinofuranosidases from the GH30 (peg.1758) and GH43 (peg.1759) families, and xylan esterases (peg.1599, peg.3462, and peg.3263), showed ≥5-fold higher expression during growth on beechwood xylan than on corncob xylan. The α-L-arabinofuranosidase from the GH51 family (peg.5152) displayed a 3-fold increase in gene expression when grown on beechwood xylan compared to corncob xylan, while the gene expression of the only GH11 xylanase (peg.184) was 1.3-fold higher in beechwood xylan than corncob xylan. Notably, expression of the GH67 xylan α-1,2-glucuronosidase (peg.549) was observed only in beechwood xylan. Similar to the xylan-active CAZymes, the xylooligosaccharide transporter components and the respective metabolic genes also showed higher expression in the presence of beechwood xylan. The transporter component genes peg.3234, peg.3235, and peg.3236 showed 8-, 16-, and 4-fold increases, respectively, in relative expression in the presence of beechwood xylan, while the metabolic genes xylose isomerase (peg.4644) and xylulose kinase (peg.4642) showed 2- and 3-fold higher expression, respectively.
FIG 4 ... showing the position of genes involved in the xynEFG ABC transport system (A) and genes involved in xylooligosaccharide metabolism and regulation (B). Yellow arrows represent the different genes that constitute the xynEFG ABC transporter. Orange arrows represent the xynDC two-component system, which senses and regulates xylooligosaccharide uptake. The blue arrow represents the xylR gene, which regulates the uptake and metabolism of xylooligosaccharides. Green arrows represent the genes involved in xylose metabolism, and black arrows represent unrelated genes. The PATRIC sequence IDs of the individual genes are indicated beneath the arrows.
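Relative fold changes of this kind are commonly derived from qRT-PCR threshold cycle (Ct) values with the 2^-ΔΔCt method. The quantification method is not stated in this section, so the Python sketch below assumes that method, a single reference gene, and roughly 100% primer efficiency. The function name and the Ct values are hypothetical and are not data from the study.

```python
def fold_change(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression by the 2^-ddCt method.

    Assumes ~100% amplification efficiency for both primer pairs.
    'test' is the xylan-grown culture; 'ctrl' is the glucose-grown control.
    """
    d_ct_test = ct_target_test - ct_ref_test   # normalize target to reference gene
    d_ct_ctrl = ct_target_ctrl - ct_ref_ctrl
    dd_ct = d_ct_test - d_ct_ctrl              # compare test with control
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values for one target gene and one reference gene.
fc = fold_change(ct_target_test=22.0, ct_ref_test=15.0,
                 ct_target_ctrl=25.0, ct_ref_ctrl=15.0)
print(fc)  # 8.0: the target crosses threshold 3 cycles earlier on the test substrate
```

A difference of three cycles after normalization corresponds to an 8-fold change, which is the scale of the increases reported for the transporter component genes.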
The Paenibacillus sp. LS1 secretomes displayed higher specificity to beechwood xylan. The Paenibacillus sp. LS1 secretomes collected over beechwood and corncob xylan were tested on different xylan substrates to understand the substrate specificity. Both secretomes displayed their highest overall xylanase activity (10.7 U/mL and 8 U/mL, respectively) on the beechwood xylan substrate (Fig. 6). Of note, the secretome collected over beechwood substrate showed the highest xylanase activity on corncob xylan. This indicates that the secretome collected over the beechwood xylan substrate is rich in enzymes that could target both hardwood xylan and xylan from graminaceous plants. The α-glucuronidase activity against 4-O-methyl-D-glucurono-D-xylan was slightly higher for the secretome obtained from beechwood xylan (5.3 U/mL) than corncob xylan (4.1 U/mL). Conversely, the arabinoxylanase activity tested over wheat arabinoxylan was slightly higher for the secretome obtained from corncob xylan substrate (1.2 U/mL) than beechwood xylan (0.9 U/mL) (Fig. 6). Notably, feeble activity on 4-nitrophenyl acetate was observed for the secretome collected over beechwood xylan (0.07 U/mL), suggesting weak acetyl xylan esterase activity, while it was negligible for the secretome collected over corncob xylan substrate (Fig. 6).
Comparative genome analysis. Genome-based relatedness was estimated in terms of OrthoANIu and in silico DDH. Among the phylogenetic neighbors, genomes for the Paenibacillus strains P. tundrae A10bᵀ, P. tylopili MK2ᵀ, P. cucumis AP-115ᵀ, and P. oceanisediminis L10ᵀ were not available in the NCBI genome database, and hence these were not considered for calculating OrthoANIu and DDH values. On the other hand, in a comparison against the closest phylogenetic neighbors, P. amylolyticus NBRC 15957ᵀ and P. xylanexedens DSM 21292ᵀ, OrthoANIu values of 92.17% and 92.28% and DDH values of 46.9% and 47.1%, respectively, were obtained (Table S5). Furthermore, the OrthoANIu and DDH values obtained for the other phylogenetic neighbors, P. polysaccharolyticus BL9ᵀ, P. pabuli NBRC 13638ᵀ, P. xylanilyticus LMG 21957ᵀ, and P. taichungensis DSM 19942ᵀ, were in the range of 78 to 82% and 22 to 25.5%, respectively (Table S5).
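The species-delineation logic behind these values can be written as a simple threshold check. The Python sketch below uses the OrthoANIu and DDH values reported for the two closest type strains and the 95% ANI and 70% DDH cutoffs cited in the Discussion. The function names are illustrative, and requiring both indices to reach their cutoffs before calling two genomes conspecific is an assumption of this sketch rather than a stated rule of the study.

```python
ANI_CUTOFF = 95.0  # percent; species cutoff cited in the Discussion
DDH_CUTOFF = 70.0  # percent; species cutoff cited in the Discussion

# (OrthoANIu %, in silico DDH %) of strain LS1 against its closest type strains,
# as reported in the text.
neighbors = {
    "P. amylolyticus NBRC 15957": (92.17, 46.9),
    "P. xylanexedens DSM 21292": (92.28, 47.1),
}


def same_species(ani, ddh):
    """True if both indices reach their species-level cutoffs (sketch assumption)."""
    return ani >= ANI_CUTOFF and ddh >= DDH_CUTOFF


def is_novel_species(pairs):
    """True if the query matches none of the compared type strains."""
    return not any(same_species(ani, ddh) for ani, ddh in pairs.values())


print(is_novel_species(neighbors))  # True: both neighbors fall below both cutoffs
```

Because even the closest neighbors fall well below both cutoffs, the more distant neighbors (78 to 82% OrthoANIu) cannot change the outcome.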
Comparative genome analysis of 238 distinct Paenibacillus species was performed to understand the distribution of cellulose- and xylan-active CAZymes across the genus (Fig. 7). A clustered heat map was generated based on the predicted cellulose/xylan-active CAZymes of the Paenibacillus species. Out of 238 genomes, 91 genomes lack cellulases belonging to the families GH6, GH9, GH44, and GH48. The genome of Paenibacillus sp. LS1 encoded only three cellulases, one each from the families GH6, GH9, and GH48. A similar occurrence was observed for 39 other Paenibacillus genomes. Only 28 genomes encoded ≥4 cellulases, with the highest being encoded by P. athensensis MEC069 (12 cellulases), P. kobensis NBRC 15729 (10 cellulases), P. sambharensis SMB1, and P. woosongensis J15TS10 (7 cellulases). In addition, 106 Paenibacillus genomes lacked β-1,4-glucosidases (GH1), and 56 of these genomes also did not encode cellulases, making them strictly noncellulolytic representatives of the Paenibacillus genus. Only 41 out of the 238 Paenibacillus genomes lacked xylanases, i.e., representatives from GH10, GH11, and GH30. In total, 19 species out of 238 were devoid of β-1,4-xylosidases (representatives from GH39 and GH43), while 18 species completely lacked xylanases as well as β-1,4-xylosidases, making them the strict nonxylanolytic representatives among the Paenibacillus species. It is intriguing to note that among the CAZyme classes that target specific substitution moieties in xylan, the enzymes xylan α-1,2-glucuronosidase (GH67) and α-L-arabinofuranosidase (GH30 and GH43) were present in 131 and 212 Paenibacillus genomes, respectively. This showed the prevalence of α-L-arabinofuranosidases (which target arabinose moieties in xylan, abundant in graminaceous plants) over α-1,2-glucuronosidases (which target methyl-glucuronic acid residues, abundant in hardwood species). 
While true acetyl-xylan esterases (CE2 and CE3) were limited to only 108 of the total Paenibacillus genomes under study, the multisubstrate-specific classes such as CE1 and CE4 were abundant in all Paenibacillus species, and CE7 was present in 191 genomes.
Among the CAZyme families with mixed specificity toward cellulose and/or xylan, i.e., GH3, GH5, GH8, GH51, and AA7, GH3 was the most abundant and was absent only in the genome of P. farraposensis UY79 out of the 238 Paenibacillus genomes. However, in the case of the auxiliary activity family of proteins, 117 Paenibacillus genomes encoded the AA7 class of enzymes, which has been previously reported to be of fungal origin and to be active on cello- and xylo-oligosaccharides (31,32). Of note, most of the Paenibacillus genomes (135 genomes) lack the AA10 family of LPMOs, and only 17 genomes encoded ≥4 AA10 LPMOs.
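The presence/absence reasoning used above to label genomes as strictly nonxylanolytic can be sketched as a small classifier over per-genome family counts. In the Python sketch below, the family groupings follow the text (GH10, GH11, and GH30 as xylanases; GH39 and GH43 as β-1,4-xylosidases), while the function names, category labels, and example profiles are hypothetical and do not correspond to any genome in the study.

```python
XYLANASE_FAMILIES = ("GH10", "GH11", "GH30")   # grouping used in the text
XYLOSIDASE_FAMILIES = ("GH39", "GH43")         # grouping used in the text


def has_any(profile, families):
    """True if the genome encodes at least one member of any listed family."""
    return any(profile.get(family, 0) > 0 for family in families)


def classify_xylanolytic(profile):
    """Label a genome from its {family: gene count} CAZyme profile."""
    xylanase = has_any(profile, XYLANASE_FAMILIES)
    xylosidase = has_any(profile, XYLOSIDASE_FAMILIES)
    if not xylanase and not xylosidase:
        return "strictly nonxylanolytic"
    if xylanase and xylosidase:
        return "xylanase and xylosidase encoded"
    return "partial xylanolytic complement"


# Hypothetical profiles for illustration only.
genome_a = {"GH10": 3, "GH11": 1, "GH43": 18, "GH67": 1}
genome_b = {"GH3": 4, "CE4": 6}
genome_c = {"GH43": 2}

for name, profile in [("A", genome_a), ("B", genome_b), ("C", genome_c)]:
    print(name, classify_xylanolytic(profile))
```

Counting the genomes that receive each label across a full CAZyme table would yield tallies of the kind reported in the text.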

DISCUSSION
Paenibacillus species are ubiquitous bacteria, thriving over disparate environments (33). They are known to produce hydrolytic enzymes for the deconstruction of eukaryotic cell walls and are major degraders of plant cell wall polysaccharides, particularly cellulose and hemicellulose (34). However, a genome-level comparison of the overall enzyme complement involved in cellulose and hemicellulose (xylan in particular) utilization by the genus Paenibacillus has been lacking and is essential in view of the growing demand for biobased fuels. In this study, we attempted to address these gaps using biochemical, molecular, and genomic approaches. The organism under study, Paenibacillus sp. LS1, is a novel species, as supported by the ANI and in silico DDH analysis performed against its phylogenetic neighbors. The genome relatedness values were below the assigned cutoffs for ANI (≤95%) and DDH (≤70%), therefore supporting our claim of Paenibacillus sp. LS1 being a novel species in the Paenibacillus genus.
Xylan degradation is a predominant feature across the genus Paenibacillus. To date, there are 341 distinct species in the genus Paenibacillus that have been taxonomically characterized and validly published (LPSN, https://www.bacterio.net/). Among these, the genome sequence was available for 237 distinct species (Table S6), and these were considered for comparative genome analysis with a focus on understanding the prevalence and distribution of cellulose/xylan-active CAZymes.
Comparative genome analysis based on the CAZyme profiling clearly indicated a limited cellulose degradation potential across the genus Paenibacillus. This is evident from the fact that 91 Paenibacillus spp. do not encode any cellulases, while 56 species completely lack any cellulose-active CAZymes in their genomes. Only 28 out of 238 genomes harbored ≥4 cellulases. In contrast, 136 and 181 Paenibacillus genomes encoded ≥4 xylanases and β-1,4-xylosidases, respectively, strongly suggesting their preference for xylan substrates. In line with this, the isolate Paenibacillus sp. LS1 displayed maximum activity on xylan substrates (14 ± 1.6 U/mL was the highest, over beechwood xylan). Of note, the activity displayed by Paenibacillus sp. LS1 was higher than the activity achieved over the same substrate (3.66 U/mL at 72 h) by P. xylanivorans A59 (10,35). In addition, in the current study it was noticed that a majority of Paenibacillus genomes do not encode LPMOs (only 103 out of 238 Paenibacillus genomes encode AA10 LPMO). In general, LPMOs are known for boosting the activity of GHs (3). Several families of these proteins have been described, with different substrate specificities and origins (36). Although the xylan-active LPMOs were initially reported to be produced from the fungus Pycnoporus coccineus (37), later studies indicated the occurrence of these families of LPMOs in the actinobacterium Kitasatospora papulosa (38). Interestingly, the comparative genome analysis also revealed 31 Paenibacillus spp. that possessed at least one xylanase and one AA10 LPMO but completely lacked cellulases in their genomes. Such a CAZyme repertoire suggests that the encoded AA10 LPMOs may play a role in xylan degradation by these 31 Paenibacillus spp., which requires further investigation. Considering the fact that a majority of Paenibacillus spp. have a preference for xylan substrates, it would be interesting to validate the substrate preference of LPMOs from the strict xylan-utilizing Paenibacillus isolates, which might help expand the xylan-active LPMO repertoire.
Paenibacillus sp. LS1 is notable in having at least one representative member from each xylan-active CAZyme family. This feature was shared by only five other Paenibacillus species, namely, P. faecis DSM 23593, P. montaniterrae J40TS1, P. paridis py1325, P. phocaensis mt24, and P. phytohabitans LMG 31459. This supports the genomic potential of these bacteria for complete deconstruction of xylan. However, in the hierarchical clustering, Paenibacillus sp. LS1 did not share the same clade with any of these species (Fig. 7). Instead, Paenibacillus sp. LS1 was clustered together with its nearest phylogenetic neighbors within the same clade. The reason for this could be that it shares a similar CAZyme profile with its nearest phylogenetic neighbors, P. amylolyticus NBRC 15957 and P. xylanexedens DSM 21292. The results from the comparative genome analysis collectively indicate that the members of the Paenibacillus genus are predominantly xylan degraders, irrespective of the habitat in which they thrive.
Paenibacillus sp. LS1 prefers complex hardwood xylan. Hardwood xylan (predominantly seen in dicots) such as beechwood xylan contains 4-O-methyl glucuronyl residues and acetyl esters conjugated with the xylose residues. Such an arrangement of the polymeric chain makes hardwood xylan more complex than monocot/grass xylan such as corncob xylan, wherein the L-arabinofuranosyl moieties are prevalent (some of which are modified to ferulate esters) (39). The genome of Paenibacillus sp. LS1 encodes a complete CAZyme arsenal for efficient deconstruction of xylan (Fig. 3). Interestingly, the results from both growth and activity studies, along with qRT-PCR-based expression analysis of selected xylan-active genes, indicate a strong preference for beechwood xylan over corncob xylan. The presence of 4-O-methyl glucuronyl residues and acetyl esters in hardwood xylan (such as beechwood xylan) is consistent with the high expression of the GH67 xylan α-1,2-glucuronosidase gene (peg.549) and the acetyl xylan esterase genes (CE2, peg.3462; CE7, peg.3463) on the substrate. Interestingly, peg.1599, a CE1 family CAZyme, also showed higher expression on beechwood xylan than on corncob xylan, and BLAST analysis identified peg.1599 as a putative feruloyl esterase (targets ferulic acid esters linked to arabinoxylans in monocots). However, it is notable that the CE1 family consists of enzymes with multiple activities such as acetyl xylan esterase and feruloyl esterase, and therefore, the actual role of peg.1599 needs further investigation. Also, it should be noted that the gene expression levels of the encoded acetyl xylan esterases (Fig. 5 and Table S3) and their activity (at least from the complex secretomes in this study) (Fig. 6) could not be correlated. This could be due to the absence of a signal peptide in the majority of the acetyl xylan esterases produced by the strain Paenibacillus sp. LS1 (Fig. 3 and Table S3), suggesting that they are not secreted extracellularly and hence explaining the feeble activity.
Even the predicted arabinofuranosidases, peg.1758 (GH30), peg.1759 (GH43), and peg.5152 (GH51), displayed high expression in the presence of beechwood xylan, which may seem unexpected since these CAZymes generally act upon arabinoxylans in monocots. However, GH30, GH43, and GH51 are also multisubstrate-specific families that may exhibit endo-β-1,4-xylanase (GH30 and GH51) and β-1,4-xylosidase (GH43 and GH51) activities. Therefore, the observed high expression of peg.1758, peg.1759, and peg.5152 may reflect activities other than those predicted. This is supported by the fact that the secretomes displayed relatively low arabinoxylanase or arabinofuranosidase activity on the wheat arabinoxylan (Fig. 6). Overall, only two of the predicted arabinofuranosidases (the GH30 enzyme, peg.1758, and the GH43 enzyme, peg.1759) had an N-terminal signal peptide (Fig. 3 and Table S3), suggesting their possible presence in the secretome. However, considering the multisubstrate specificity of GH30 and GH43 enzymes, it could be inferred that the secreted enzymes might have low arabinofuranosidase activity. Furthermore, the GH10 xylanases, peg.181 and peg.315, and the GH39 β-1,4-xylosidase, peg.306, also showed a preference toward beechwood xylan, displaying around 5 to 10 times higher expression in beechwood xylan than corncob xylan. The higher gene expression could be due to the presence of specific substitutions in the hardwood xylan. In line with this, expression of the gene peg.549 (α-1,2-glucuronosidase) was observed only in the presence of beechwood xylan (Fig. 5).
In a different experiment, we tried to probe the α-glucuronosidase activity for the secretomes collected over both beechwood and corncob xylan substrates. Interestingly, the secretome collected over the corncob xylan also displayed a significant α-glucuronosidase activity (Fig. 6), suggesting the possible occurrence of enzymes with similar activity other than peg.549, which was expressed only in the presence of beechwood xylan. Therefore, it would be interesting to identify the candidate enzyme(s) responsible for the α-glucuronosidase activity of the secretome collected over corncob xylan. The only gene to show relatively similar expression on both substrates was the GH11 xylanase (peg.184). The results collectively indicate that strain LS1 prefers complex substrates, such as beechwood xylan.
Paenibacillus sp. LS1 encodes the machinery to transport and metabolize xylooligosaccharides. The transport of xylose and xylooligosaccharides in Firmicutes has been previously reported for bacteria belonging to the Bacillaceae and Clostridiaceae families (30). Also, transcriptomic analysis of Paenibacillus sp. JDR-2 identified several transport system components that were upregulated during growth over different xylan substrates (39). The genome analysis revealed the absence of the D-xylose transport system in Paenibacillus sp. LS1. However, homologues of the xylooligosaccharide ABC transporter XynEFG, which was previously reported for Geobacillus stearothermophilus T-6 (29), were identified in Paenibacillus sp. LS1. Further, the genes encoding the transporter components were identified as part of the xynDCEFG operon (Fig. 4A), where xynD and xynC were the two-component system, acting as a sensor histidine kinase (senses the presence of xylooligosaccharides in the external environment through signal transduction) and a response regulator (controlling the expression of the transporter genes), respectively (29). The presence of the complete operon confirms the plausible functional role of this transport system in the uptake of xylooligosaccharides by Paenibacillus sp. LS1.
The genes encoding the transporter components and metabolic enzymes were analyzed for expression using qRT-PCR. Similar to the xylan-active CAZymes, the transporter components and the metabolic genes also displayed a higher fold change in the presence of beechwood xylan than corncob xylan. Transporter components are specific to the molecule that they bind and ultimately transport into the cell. The strong upregulation of the transporter components (peg.3234, peg.3235, and peg.3236) indicates a greater preference for the product(s) generated from the hydrolysis of beechwood xylan than corncob xylan. It would be interesting to undertake a real-time analysis of the transporter components in the presence of different xylooligosaccharide substrates, both linear and decorated, in order to understand the finer details of the substrate promiscuity of the transporter system. Additionally, the higher expression of the metabolic genes suggests that the internalized product is efficiently hydrolyzed to xylose by cytoplasmic enzymes, which in turn leads to upregulation of the metabolic enzymes. Collectively, genomic insights supported by expression analysis by qRT-PCR confirm the involvement of the predicted ABC transport system and metabolic enzymes in the transport and metabolism of xylooligosaccharides by the isolate Paenibacillus sp. LS1.
FIG 8 Proposed pathway for xylan degradation and metabolism in Paenibacillus sp. LS1. Solid linkers/arrows represent confirmed pathways, while the dotted lines indicate that the pathway is hypothetical. The CAZymes involved in xylan degradation/modification are represented as cartoons with different colors as per their activities: β-1,4-endoxylanases (red), reducing end oligo-xylanase (light blue), β-1,4-xylosidases (dark blue), α-1,2-glucuronosidase (yellow), acetyl xylan esterase (pink), α-L-arabinofuranosidases (light green), and feruloyl esterase (orange). NR, nonreducing end; R, reducing end; GH, glycoside hydrolase; CE, carbohydrate esterase; D-MeGlcAp, 4-O-methyl glucuronyl residues; L-Araf, L-arabinofuranosyl; FA, ferulic acid; XOS, xylooligosaccharides; XylA, xylose isomerase; XylB, xylulose kinase; G-3-P, glyceraldehyde 3-phosphate; F-6-P, fructose 6-phosphate. This figure was prepared using Biorender (app.biorender.com).
Xylan Degradation by Paenibacillus sp. LS1 Microbiology Spectrum
Conclusion. Paenibacillus sp. LS1 is a novel bacterium capable of efficient xylan degradation with the assistance of a complete arsenal of xylan-active CAZymes encoded in its genome. However, the strain was inefficient at utilizing different cellulosic substrates, as is evident from the growth and degradation studies and supported by genome analysis. Growth and degradation studies on xylan, along with qRT-PCR analysis of the selected genes, affirm the preference of strain LS1 for complex xylan substrates such as hardwood xylan. It also encodes a functional transport system for the internalization of xylooligosaccharides into the cell. In addition, comparative genome analysis revealed the prevalence of xylan-active CAZymes across the genus Paenibacillus. These results taken together indicate the possible application of the xylanolytic enzymes from Paenibacillus sp. LS1 or the other Paenibacillus species in lignocellulosic biorefineries, not only as pretreatment agents but also for generation of value-added products.
Utilization of different cellulose and xylan substrates. Growth studies of Paenibacillus sp. LS1 were performed in M9 minimal medium supplemented with different cellulose (0.5% each of Avicel and CM-cellulose) and xylan substrates (0.5% each of beechwood and corncob xylan). M9 medium alone and M9 supplemented with 0.5% glucose were used as negative and positive controls, respectively. The primary inoculum was prepared by growing the isolate in 10 mL Luria Bertani (LB) broth for 24 h at 28°C. The culture was harvested, and the cell pellet was washed thoroughly with M9 medium. Further, the pellet was resuspended in 10 mL M9 medium, from which 0.5% of the culture was inoculated into the different experimental flasks. All experiments were performed in biological triplicates. Culture (1 mL) from each flask was collected at different time intervals and centrifuged at 10,000 rpm for 15 min. Pellet and supernatant were collected separately and stored at -20°C until further analysis.
(i) Estimation of total cell protein. Growth of the isolate was analyzed by estimating the total cell protein in the culture pellet collected at different time points as described previously (7). The pellets were treated with 0.2 N NaOH and boiled at 120°C for 10 min for cell lysis. This was followed by centrifugation at 12,000 rpm for 15 min at 4°C. Total cell protein was estimated using the standard Bradford method, essentially as described by the manufacturer's protocol.
(ii) Estimation of extracellular cellulase and xylanase activity. A reducing end assay to estimate cellulase and xylanase activity was performed using the DNS method (40) with slight modifications. A 200-μL reaction mixture consisting of 50 mM sodium phosphate (pH 7.0), 0.5% beechwood xylan or 0.5% CM-cellulose, and 50 μL of the enzyme cocktail was incubated at 37°C for 1 h with shaking at 800 rpm. The samples were centrifuged at 10,000 rpm for 15 min at 4°C. Then 40 μL of the clear supernatant was analyzed by adding 300 μL of the DNS reagent (1% 3,5-dinitrosalicylic acid, 2% NaOH, and 30% Na-K tartrate) and was boiled at 100°C for 15 min. The samples were cooled to room temperature for 5 min, and the absorbance was measured at 540 nm. The amount of reducing sugar generated was calculated using a glucose or xylose standard curve. One unit was defined as the amount of enzyme that liberated 1 μmol of reducing sugar per min.
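The conversion from an absorbance reading to an activity in U/mL can be sketched as follows. This Python example assumes a linear standard curve, takes one unit as 1 µmol of reducing sugar released per minute (the conventional definition), and uses the reaction volume, aliquot volume, enzyme volume, and incubation time described above as defaults. The function names, the slope, and the absorbance value are hypothetical and are not data from the study.

```python
def sugar_umol_from_a540(a540, slope, intercept=0.0):
    """Invert a linear standard curve (A540 = slope * umol + intercept)."""
    return (a540 - intercept) / slope


def activity_u_per_ml(a540, slope, intercept=0.0,
                      reaction_ul=200.0, aliquot_ul=40.0,
                      enzyme_ul=50.0, minutes=60.0):
    """Enzyme activity in U/mL, with 1 U = 1 umol reducing sugar per min."""
    umol_in_aliquot = sugar_umol_from_a540(a540, slope, intercept)
    # Scale from the assayed aliquot to the whole reaction mixture.
    umol_in_reaction = umol_in_aliquot * (reaction_ul / aliquot_ul)
    enzyme_ml = enzyme_ul / 1000.0
    return umol_in_reaction / (minutes * enzyme_ml)


# Hypothetical reading and standard-curve slope (absorbance units per umol).
activity = activity_u_per_ml(a540=0.6, slope=2.0)
print(round(activity, 3))  # 0.5
```

If the enzyme sample is diluted before the assay, the result would additionally be multiplied by the dilution factor, which this sketch leaves out.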
Genome sequencing, assembly, and annotation. Paenibacillus sp. LS1 was grown in LB broth for 24 h, and the harvested culture pellet was used for DNA isolation using the standard procedure, essentially as per the manufacturer's protocol for the Qubit dsDNA HS assay kit. The quality and quantity of the isolated DNA were measured on a 0.8% agarose gel and with the Qubit dsDNA HS assay kit, respectively. The DNA fragmentation and library construction were done using a Nextera DNA Flex library preparation kit (Illumina) following the manufacturer's protocol. After library construction, dual index adapters were ligated at the blunt end of the DNA fragments. The quantity and quality of the fragment library were checked using the Qubit dsDNA HS assay kit and Agilent 2200 TapeStation, respectively. The good-quality library was normalized, pooled, and subsequently sequenced using 2 × 250-bp chemistry on a MiSeq platform (Illumina Inc., San Diego, CA, USA). The quality of the raw sequence was checked using FastQC (41). Adapter removal and trimming were done using Cutadapt (42).
Genome assembly and annotation were performed as previously reported (43). Assembly of trimmed good-quality reads was performed using the Unicycler assembler v.0.4.8 (44) in PATRIC (45). The quality of genome assembly was checked as per the QUAST (v.5.0.2) report (46). The assembled contig file was annotated using both the PATRIC and RAST servers (47). The annotation in PATRIC was performed using the default parameters, while in RAST, the RASTtk pipeline (48) was used with a few customizations.
The quality of the annotated genome was assessed using the generated genome report file. Based on this report, further analyses were performed using PATRIC, RAST, or other bioinformatic tools as required.
Genome mining for CAZymes involved in polysaccharide degradation. The CAZymes of Paenibacillus sp. LS1 were annotated using the dbCAN2 meta-server (49). An integrated approach using all the designated tools of the dbCAN2 meta-server, i.e., HMMER, Hotpep, Diamond, and CGC Finder was used for CAZyme prediction and annotation. Only those proteins or domains annotated by at least two tools were considered for further analysis (49).
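The "at least two tools" rule can be sketched as a simple consensus filter. The snippet below is illustrative only: the row layout, the key names (`gene_id`, `HMMER`, `Hotpep`, `DIAMOND`), and the use of "-" for "no annotation" are assumptions made for this example rather than a description of the files the authors processed. CGC Finder is left out here on the assumption that it reports gene clusters rather than per-protein family calls.

```python
def consensus_cazymes(rows, tools=("HMMER", "Hotpep", "DIAMOND"), min_tools=2):
    """Keep proteins annotated by at least `min_tools` of the listed tools.

    `rows` is an iterable of dicts, one per protein; a tool with no hit is
    represented by "-", an empty string, or a missing key (assumed layout).
    Returns {gene_id: {tool: annotation}} for the proteins that pass.
    """
    kept = {}
    for row in rows:
        hits = {t: row[t] for t in tools if row.get(t, "-") not in ("-", "")}
        if len(hits) >= min_tools:
            kept[row["gene_id"]] = hits
    return kept


# Hypothetical predictions for three proteins.
example = [
    {"gene_id": "p1", "HMMER": "GH10", "Hotpep": "GH10", "DIAMOND": "GH10"},
    {"gene_id": "p2", "HMMER": "GH43", "Hotpep": "-", "DIAMOND": "GH43"},
    {"gene_id": "p3", "HMMER": "-", "Hotpep": "CE4", "DIAMOND": "-"},
]
print(sorted(consensus_cazymes(example)))  # p3 is dropped: only one tool agrees
```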
CAZymes involved in the degradation of cellulose and xylan were identified with reference to the CAZy database (http://www.cazy.org/), and the protein sequences were retrieved from the annotated genome. The domain architecture of the CAZymes was predicted using Pfam (https://pfam.xfam.org/) and the NCBI Conserved Domain Database (https://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml). SignalP (v.5.0) (https://services.healthtech.dtu.dk/services/SignalP-5.0/) was used to predict the presence of signal peptides. The retrieved CAZymes were screened against the PDB database using blastp to assess how distinct they are from their well-characterized counterparts. Furthermore, proteins involved in the transport and metabolism of these polysaccharides and their degradation products were identified by performing a blastp search in PATRIC against the genome of Paenibacillus sp. LS1, using the amino acid sequence of a target protein as the query.
Comparative genome analysis. Genome-based relatedness between Paenibacillus sp. LS1 and its phylogenetic neighbors (7) was estimated by comparing the OrthoANIu and in silico DDH values. The ANI calculator of the EzBioCloud database (50) and Genome-to-Genome Distance Calculator (GGDC) of DSMZ (51) were used for calculating OrthoANIu and DDH values, respectively. The genomes of the phylogenetic neighbors used for estimating OrthoANIu and DDH values were restricted to the type strains only.
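A species-level comparison of this kind is usually read against cutoffs. The sketch below assumes the commonly cited thresholds of roughly 95 to 96% ANI and 70% digital DDH; these cutoffs are not stated in this section, and the neighbor names and values are hypothetical placeholders, not results from the study.

```python
ANI_CUTOFF = 95.0    # percent; assumed species boundary (often cited as 95-96%)
DDDH_CUTOFF = 70.0   # percent; assumed species boundary for digital DDH


def conspecific_candidates(neighbors, ani_cutoff=ANI_CUTOFF,
                           dddh_cutoff=DDDH_CUTOFF):
    """Return type strains that reach either species-level cutoff.

    `neighbors` maps a type-strain name to (OrthoANIu %, dDDH %).
    Using "either" is a deliberately conservative choice for this sketch.
    """
    return [name for name, (ani, dddh) in neighbors.items()
            if ani >= ani_cutoff or dddh >= dddh_cutoff]


def is_putative_novel_species(neighbors, **cutoffs):
    """True when no type strain reaches a species-level cutoff."""
    return not conspecific_candidates(neighbors, **cutoffs)


# Hypothetical values for two type strains.
demo = {"type strain A": (84.2, 27.5), "type strain B": (78.9, 22.1)}
print(is_putative_novel_species(demo))
```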
CAZymes involved in cellulose and xylan degradation were identified from the CAZy database, and CAZyme profiles were generated and compiled using the dbCAN2 meta-server for 238 genomes of distinct Paenibacillus species (including Paenibacillus sp. LS1). The Paenibacillus species were identified as per the List of Prokaryotic names with Standing in Nomenclature (LPSN) (52) and were selected based on genome availability in the NCBI genome database. A clustered heatmap of the predicted cellulose- and xylan-active CAZyme profiles of 238 Paenibacillus species was constructed in R (v.4.1.2) using the package pheatmap (53). Log10-transformed values were plotted for easier visualization.
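The transformation applied before plotting can be sketched as below. The heatmap itself was built in R with pheatmap; this Python snippet only illustrates the log10 step. The pseudocount of 1 is an assumption introduced here so that families with zero copies remain defined, since the text does not say how zeros were handled. The species labels and counts are hypothetical.

```python
import math


def log10_transform(profiles, pseudocount=1.0):
    """Apply log10(count + pseudocount) to every CAZyme family count.

    `profiles` maps species -> {family: gene count}. The pseudocount is an
    assumption of this sketch, not a detail taken from the source.
    """
    return {
        species: {family: math.log10(count + pseudocount)
                  for family, count in counts.items()}
        for species, counts in profiles.items()
    }


# Hypothetical family counts for two genomes.
counts = {
    "genome 1": {"GH10": 9, "GH11": 0},
    "genome 2": {"GH10": 99, "GH11": 1},
}
for species, row in log10_transform(counts).items():
    print(species, {fam: round(val, 3) for fam, val in row.items()})
```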
RNA isolation, cDNA synthesis, and quantitative real-time PCR. Total RNA was isolated from the harvested cells of Paenibacillus sp. LS1 grown up to the mid-log phase on different substrates (glucose, beechwood xylan, and corncob xylan). Total RNA isolation was performed using the TRIzol method as per the manufacturer's protocol (Sigma-Aldrich, USA). RNA integrity was analyzed using agarose gel electrophoresis and a NanoDrop 2000 UV-Vis spectrophotometer (Thermo Scientific, USA). Primers targeting the selected genes involved in xylan degradation, xylooligosaccharide transport, and xylose metabolism were designed using the PrimerQuest tool of Integrated DNA Technologies (https://sg.idtdna.com/pages) and synthesized by Eurofins (Bengaluru, India) (Table S1). cDNA was synthesized from the total RNA using a PrimeScript 1st strand cDNA synthesis kit (TaKaRa Bio, Inc., Japan). Quantitative real-time PCR was performed on a Mastercycler Realplex system (Eppendorf, Germany) in a final reaction volume of 10 µL containing 50 ng of cDNA, 0.4 µM primers, 1× SYBR TB green premix Ex Taq II (Tli RNase H Plus), and 1× ROX reference dye I (6-carboxy-X-rhodamine) (TaKaRa Bio, Inc.). The PCR conditions consisted of an initial denaturation at 95°C for 2 min, 40 cycles of amplification (95°C for 15 s, 54°C for 20 s, and 72°C for 30 s), and a final elongation stage at 72°C for 5 min. The gene recA (recombinase A) from Paenibacillus sp. LS1 was used as an internal control, and the relative fold change of RNA expression was estimated using the ΔΔCT method (54).
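The relative-quantification step can be sketched as follows. This is a minimal version of the comparative threshold-cycle calculation, assuming near-100% amplification efficiency for both the target gene and recA, and assuming the glucose-grown culture serves as the calibrator (the text does not state the calibrator explicitly). The CT values are hypothetical.

```python
def fold_change(ct_target_test, ct_ref_test, ct_target_cal, ct_ref_cal):
    """Relative expression by the comparative CT (delta-delta CT) calculation.

    ct_*_test: CT values from the test condition (e.g. growth on xylan).
    ct_*_cal:  CT values from the calibrator (assumed here: growth on glucose).
    The reference gene is the internal control (recA in the text).
    """
    delta_test = ct_target_test - ct_ref_test   # normalise to reference gene
    delta_cal = ct_target_cal - ct_ref_cal
    delta_delta = delta_test - delta_cal        # compare test with calibrator
    return 2.0 ** (-delta_delta)


# Hypothetical CT values: the target amplifies 3 cycles earlier on xylan.
print(fold_change(22.0, 18.0, 25.0, 18.0))  # 8.0
```

With replicate measurements, the CT values would typically be averaged per condition before this calculation.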
Substrate specificity of Paenibacillus sp. LS1 xylan-active secretomes. Secretomes of Paenibacillus sp. LS1 grown in M9 medium supplemented with 0.5% beechwood or corncob xylan were collected to determine specificity toward different xylan substrates, while the secretome collected from glucose-grown cultures was used as a negative control. The bacteria were grown in the medium until the mid-log phase and were harvested by centrifugation at 5,000 rpm for 20 min at 4°C. Activity on beechwood and corncob xylans, wheat arabinoxylan, and 4-O-methyl-D-glucurono-D-xylan was determined using the DNS method as described in the section "Estimation of Extracellular Cellulase and Xylanase Activity." To determine activity on 4-nitrophenyl acetate, 100 µL of 2 mM 4-nitrophenyl acetate in 50 mM sodium phosphate, pH 7.0, was incubated with 100 µL of the secretome at 37°C for 10 min. The reaction was stopped by adding 200 µL of 1 M Na2CO3, and the absorbance was measured at 405 nm. The amount of 4-nitrophenol released was determined using a 4-nitrophenol standard curve. One unit of activity was defined as 1 µmol of 4-nitrophenol released per min.
Data availability. The genome of Paenibacillus sp. LS1 has been deposited at DDBJ/ENA/GenBank under accession no. JAPDOE000000000. The version described in this paper is version JAPDOE010000000.

SUPPLEMENTAL MATERIAL
Supplemental material is available online only. SUPPLEMENTAL FILE 1, PDF file, 1.5 MB.