Genome mining reveals polysaccharide-degrading potential and new antimicrobial gene clusters of novel intestinal bacterium Paenibacillus jilinensis sp. nov.

Drug-resistant bacteria pose a great threat to animal breeding and human health, and new antibiotics that can effectively combat them are urgently needed. Commensal bacteria inhabiting the intestine are potential candidates because they produce a wide range of antimicrobial substances. In addition, host genomes do not encode most of the enzymes needed to degrade dietary structural polysaccharides, so the decomposition of these polysaccharides mainly depends on gut commensal-derived carbohydrate-active enzymes (CAZymes). We report a novel species isolated from the chicken intestine, designated Paenibacillus jilinensis sp. nov., with YPG26T (= CCTCC M2020899T) as the type strain. The complete genome of P. jilinensis YPG26T consists of a single circular chromosome of 3.97 Mb with a G + C content of 49.34 mol%. It carries 33 rRNA genes, 89 tRNA genes, and 3871 protein-coding genes, among which abundant CAZymes are encoded. Moreover, this strain antagonizes multiple pathogens in vitro. Genome mining identified 6 putative biosynthetic gene clusters (BGCs) encoding bacteriocin, NRPs, PKs, terpenes, and proteusin. In addition, antibiotic susceptibility testing showed sensitivity to all antibiotics tested. This study highlights the variety of CAZyme genes and BGCs in the genome of Paenibacillus jilinensis. These findings support the beneficial function of the gut microbiota and provide a promising candidate for the development of new carbohydrate-degrading enzymes and antibacterial agents.

The decomposition of these polysaccharides mainly depends on gut commensal-derived CAZymes [5]. In addition to degrading structural carbohydrates, gut commensals can also antagonize enteric pathogens by producing a wide range of antimicrobial compounds [6]. Based on their biosynthetic pathways, these compounds are classified into three main groups: bacteriocins, nonribosomal peptides (NRPs), and polyketides (PKs) [7]. These antimicrobial substances are promising candidates for the development of new anti-infective agents.
Here, we report the isolation of a previously uncultured Paenibacillus species, P. jilinensis YPG26T, from the chicken intestine. We describe its genome characteristics and identify novel carbohydrate-degrading enzyme genes and biosynthetic gene clusters (BGCs) potentially involved in pathogen antagonism.

Isolation, identification, and phylogenetic analysis
Bacteria of the genus Paenibacillus have been isolated from a variety of environments, such as humans, animals, plants, and soil [8]. The bacterial strain YPG26T was isolated from the chicken intestine. Phylogenetic trees based on the 16S rRNA gene sequence were constructed using the neighbor-joining method and the maximum likelihood method with 1000 bootstrap replications. The neighbor-joining tree showed that strain YPG26T belonged to the genus Paenibacillus and was most closely related to Paenibacillus telluris PS38T (GenBank accession no. HQ257247), with 97.61% similarity, but formed a distinct phylogenetic branch within the genus (Fig. 1A). The same relationship was supported by the tree reconstructed using the maximum likelihood method (Fig. 1B). In general, the recommended 16S rRNA sequence similarity thresholds for bacterial genus and species identification are 95% and 98.65%, respectively [9]. Because the observed similarity of 97.61% lies above the genus threshold but below the species threshold, strain YPG26T should be assigned as a novel species of Paenibacillus. The 16S rRNA gene sequence of strain YPG26T has been deposited in the GenBank database under accession number OK324374.
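The threshold reasoning above can be sketched in a few lines of Python. This is only an illustration: the function name and the returned labels are assumptions made here, while the cut-offs of 95% (genus) and 98.65% (species) and the 97.61% similarity value are taken from the text.

```python
GENUS_THRESHOLD = 95.0     # % 16S rRNA similarity, genus level [9]
SPECIES_THRESHOLD = 98.65  # % 16S rRNA similarity, species level [9]


def classify_by_16s(similarity_pct: float) -> str:
    """Coarse taxonomic call from 16S rRNA similarity to the closest type strain.

    The labels are illustrative (hypothetical), not an official nomenclature.
    """
    if not 0.0 <= similarity_pct <= 100.0:
        raise ValueError("similarity must be a percentage between 0 and 100")
    if similarity_pct >= SPECIES_THRESHOLD:
        return "same species (candidate)"
    if similarity_pct >= GENUS_THRESHOLD:
        return "same genus, candidate novel species"
    return "candidate novel genus"


# Similarity of YPG26T to P. telluris PS38T as reported in the text.
call = classify_by_16s(97.61)
print(call)
```

A 16S-based call of this kind is only a first screen; genome-level comparisons are still required to support a species description.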

Phenotypic characteristics
Paenibacillus species can be gram-negative, gram-positive, or gram-variable, and can differ in their growth on the same medium. For example, like P. glycanilyticus CCI5 [10], strain YPG26T cannot grow on LB medium, whereas Paenibacillus alvei MP1 can [11]. On TSB agar plates, strain YPG26T formed single, separated colonies that were 1-1.5 mm in diameter, circular with slightly irregular edges, grayish-white, low convex, translucent, and glossy after aerobic cultivation at 37 °C for 20 h (Fig. S1A). Colonies were slightly different under anaerobic culture (Fig. S1B). Transmission electron microscopy showed rod-shaped cells approximately 0.9-1.2 μm wide and 4.0-5.0 μm long (Fig. S1C). Gram staining showed that the strain was gram-positive (Fig. S1D). Growth was observed at pH 5-8, with an optimum at pH 8.0, and at 15-50 °C, with an optimum at 37 °C; the strain tolerated NaCl concentrations of up to 2.0% (Fig. S1E-G). Other physiological and biochemical characteristics are provided in Table 1, together with the attributes of reference species.

Genomic properties, ANI and DDH analyses
The complete genome of strain YPG26T was composed of a 3,966,665 bp circular chromosome (Fig. 2) with a G + C content of 49.34 mol%. Both values fall within the ranges reported for Paenibacillus: genome sizes from 3.02 Mbp (P. darwinianus Br) to 8.82 Mbp (P. mucilaginosus K02) and G + C contents from 39 to 59 mol% [8]. The genome carried 33 rRNA genes, 89 tRNA genes, and 3871 protein-coding genes (CDSs) (Table 2). Functional gene annotation was performed by BLAST searches of the predicted genes against the COG and GO databases (Fig. S2).
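For readers unfamiliar with how a G + C content is derived, the following minimal Python sketch shows the calculation on a toy sequence. The function name and the toy sequence are assumptions made for illustration; the sketch is not the annotation pipeline used in this study.

```python
def gc_content(sequence: str) -> float:
    """Return G + C content as a percentage of unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at  # ambiguous symbols such as N are ignored
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / total


# Hypothetical toy sequence (not from the YPG26T genome): 9 G/C out of 15 bases.
toy = "ATGCGCGTTAACCGGN"
toy_gc = gc_content(toy)
print(f"{toy_gc:.2f} mol% G + C")
```

Applied to the full chromosome sequence, the same ratio would yield the genome-wide value reported in the text.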
Taxonomic and functional research on microorganisms has increasingly relied upon genome-based data and methods [15]. DNA-DNA hybridization (DDH) and average nucleotide identity (ANI) have become two gold standards for prokaryotic species circumscription at the genomic level [16]. The DDH and ANI values between strain YPG26T and reference species of Paenibacillus ranged from 13.2% to 14.0% and from 67.56% to 71.07%, respectively (Table S1), which were well below the proposed thresholds of 70% and 95% for prokaryotic species delineation [17,18]. The results of the genome analysis were consistent with the outcome of the 16S rRNA sequence-based phylogenetic analysis and confirmed at the genome level that strain YPG26T represents a novel Paenibacillus species, for which the name Paenibacillus jilinensis sp. nov. is proposed (jilinensis, pertaining to Jilin, a province in northeast China). The type strain is YPG26T (= CCTCC M2020899T). The whole-genome sequence of P. jilinensis YPG26T has been deposited in the GenBank database under accession number CP084530.
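The species delineation rule described above can be written as a short check. The sketch below is illustrative only: the function name is an assumption, and only the extreme DDH and ANI values reported in the text are used, not the full contents of Table S1.

```python
DDH_THRESHOLD = 70.0  # % DDH, species boundary [17,18]
ANI_THRESHOLD = 95.0  # % ANI, species boundary [17,18]


def is_distinct_species(ddh_values, ani_values) -> bool:
    """True when every comparison falls below both species thresholds."""
    if not ddh_values or not ani_values:
        raise ValueError("at least one DDH and one ANI value are required")
    return max(ddh_values) < DDH_THRESHOLD and max(ani_values) < ANI_THRESHOLD


# Extremes of the ranges reported for YPG26T versus reference species.
novel = is_distinct_species(ddh_values=[13.2, 14.0], ani_values=[67.56, 71.07])
print("distinct from all reference species:", novel)
```

Checking the maximum of each list is sufficient because a strain is only considered distinct when its closest relative is also below the thresholds.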

Identification of carbohydrate-active enzymes
Carbohydrate-active enzymes (CAZymes) are responsible for the biosynthesis, modification, and degradation of carbohydrates and glycoconjugates. They are involved in many metabolic pathways and are essential for the survival of microorganisms [19]. By analyzing 41 Paenibacillus genomes comprising 25 species, Huang et al. found that Paenibacillus genomes encode a wide repertoire of CAZymes [20]. Three CAZyme classes were predicted in the genome of P. jilinensis YPG26T using the carbohydrate-active enzymes database (CAZy). Glycoside hydrolases (GHs) were the most abundant class, with 58 predicted domains, followed by 50 glycosyl transferases (GTs) and 17 carbohydrate esterases (CEs) (Fig. 3A). Moreover, a total of 52 putative carbohydrate-binding modules (CBMs) were present in the genomic sequence; these modules are appended to CAZymes, assist in substrate binding, and enhance the catalytic efficiency of the enzymes [21]. Paenibacillus species differ markedly in the numbers and families of CAZymes they encode, and these differences help explain the good adaptability of Paenibacillus species to different environments [22]. Based on functional categorization, thirty-eight GHs were assigned to starch, chitin, and cellulose degradation (Table 3). Starch-degrading activity was also observed experimentally (Fig. 3B; chitin and cellulose degradation activities were not tested). Among the starch-degrading enzymes, 13 α-amylases from the GH13 family and 2 β-amylases from the GH14 family were identified. The phylogenetic tree of the deduced amylases showed that they differed from one another (Fig. 3C), and only five amylase amino acid sequences had more than 70% similarity with sequences available in the GenBank database (Table S2).
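As an illustration of how per-class counts such as those in Fig. 3A can be obtained, the sketch below tallies CAZy family identifiers by their class prefix. The function name and the example identifiers are hypothetical and do not reproduce the annotation of strain YPG26T; only the four classes named in the text are recognized, and any other prefix is counted as "other".

```python
import re
from collections import Counter

# Class prefixes named in the text; CBM must be tested before CE.
FAMILY_PATTERN = re.compile(r"^(CBM|GH|GT|CE)(\d+)")


def tally_classes(family_ids):
    """Count CAZy family identifiers (e.g. 'GH13') per class prefix."""
    counts = Counter()
    for family in family_ids:
        match = FAMILY_PATTERN.match(family.strip().upper())
        counts[match.group(1) if match else "other"] += 1
    return counts


# Hypothetical identifiers for illustration only.
example = ["GH13", "GH13", "GH14", "GT2", "CE4", "CBM50", "PL1"]
counts = tally_classes(example)
for cazy_class, n in sorted(counts.items()):
    print(cazy_class, n)
```

In practice the input would be the family assignments exported from the CAZy-based annotation, one identifier per predicted domain.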

Antimicrobial activity
Paenibacillus species produce a large variety of antimicrobial substances, which can target a range of human pathogens and plant pathogenic fungi [23]. Similarly, an in vitro antibacterial activity assay using the agar diffusion method showed that some Enterococcus species were inhibited by the cell-free supernatant (CFS) of P. jilinensis YPG26T, whereas other pathogens were not (Table 4). We suspected that the concentration of antimicrobial substances in the CFS was too low to be detected by this method. We therefore prepared a crude extract by saturated ammonium sulfate precipitation and measured antimicrobial activity by monitoring pathogen growth after co-culture with the CFS or the crude extract. The growth of the majority of the tested pathogenic bacteria was significantly inhibited after co-culture with the CFS compared with control cultures incubated without CFS, and the crude extract showed strong growth inhibition (Fig. 4), indicating that P. jilinensis YPG26T can produce broad-spectrum antimicrobial substances.

Genome mining for BGCs of antimicrobial compounds
Genomic screening for biosynthetic gene clusters (BGCs) of antibacterial substances has been increasingly used in natural product discovery owing to the large amount of bacterial whole-genome sequencing data available [24]. Genome mining indicated the presence of 6 types of BGCs for antimicrobial substances in the genome of P. jilinensis YPG26T: NRPS, PKS, lanthipeptide, proteusin, siderophore, and terpene (Fig. 5A), all of which have low similarity to known BGCs. The total size of all BGCs was approximately 238 kb, which accounted for 6.0% of the strain YPG26T genome (Fig. 5B). Similarly, a previous study found a great variety of BGCs in Paenibacillus and Bacillus strains by genome mining [25]. Proteinase treatment experiments revealed the proteinaceous nature of the antimicrobial compounds of P. jilinensis YPG26T, indicating that they could be bacteriocin-like substances. These most likely correspond to the class II lanthipeptide BGC, which contains 3 core biosynthetic genes (named YPG26-lan A1, A2, and A3, respectively; Table S3) in the genome of P. jilinensis YPG26T (Fig. 6A). The amino acid sequences of the precursor peptides encoded by the three core biosynthetic genes were compared with those of known class II lanthipeptides: bacteriocin J46, bovicin HJ50, butyrivibriocin OR79A, lacticin 481, mutacin II, nukacin ISK-1, streptococcin A-FF22, and variacin. The results showed that YPG26-lan A2 and YPG26-lan A3 have structural features similar to those of known lanthipeptides (Fig. 6B). The antibacterial activity of P. jilinensis YPG26T was therefore likely conferred by YPG26-lan A2, A3, or both. We will conduct further experiments to confirm the putative core biosynthetic genes of the lanthipeptide.
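The reported share of the genome occupied by BGCs follows directly from the two figures given in the text. The short Python sketch below reproduces that arithmetic; the function name is an assumption, and the total BGC length is rounded to 238 kb as in the text.

```python
GENOME_LENGTH_BP = 3_966_665   # circular chromosome of strain YPG26T
TOTAL_BGC_LENGTH_BP = 238_000  # "approximately 238 kb" (rounded value)


def genome_fraction_pct(region_bp: int, genome_bp: int) -> float:
    """Percentage of the genome covered by a region of the given length."""
    if genome_bp <= 0:
        raise ValueError("genome length must be positive")
    return 100.0 * region_bp / genome_bp


bgc_pct = genome_fraction_pct(TOTAL_BGC_LENGTH_BP, GENOME_LENGTH_BP)
print(f"BGCs cover {bgc_pct:.1f}% of the genome")
```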

Antibiotic resistance gene analysis
Inappropriate use of antibiotics may alter the intestinal flora. As a result, some commensal strains acquire antibiotic resistance genes (ARGs) in order to survive in the intestinal tract, which may increase public health risks in the future. Therefore, the evaluation and monitoring of antibiotic resistance genes is an important measure to prevent resistance transfer [26]. Because some bacteria grow slowly or are hard to culture, whole-genome sequencing has gradually become a powerful alternative for antibiotic susceptibility testing [27]. Genome analysis revealed 10 putative antibiotic resistance genes in the genome of P. jilinensis YPG26T, including 8 efflux pump genes, 1 LlmA 23S ribosomal RNA methyltransferase gene, and 1 nucleoside resistance protein (tmrB) gene (Table S4). However, culture-based antimicrobial susceptibility testing is still the primary method employed by clinical laboratories [27]. Thus, the sensitivity of P. jilinensis YPG26T to different antibiotics was also verified experimentally using the commercial antibiotic disc diffusion method. Notably, despite the predicted resistance genes, the strain was sensitive to all antibiotics tested (Table 5). This suggests that strain YPG26T has a favorable safety profile.

Conclusion
In this study, we identified a novel Paenibacillus species isolated from the chicken intestine. Analysis of the assembled genome revealed that it encodes large numbers of carbohydrate-degrading enzyme (CAZyme) genes and biosynthetic gene clusters (BGCs) for antimicrobial compounds. These genomic characteristics provide a better opportunity to understand the intestinal niche adaptation and biosynthetic potential of Paenibacillus, which will be indispensable for guiding applications in pharmaceutics, agriculture, or industry. This study also contributes to the understanding of the genome features of previously uncultured intestinal bacteria.

Phylogenetic analysis
The 16S rRNA gene of strain YPG26T was amplified and sequenced as previously described [29,30]. Briefly, bacterial cells were lysed in PCR lysis buffer (Takara, Japan). The genomic DNA was extracted using a commercial genomic DNA extraction kit, and the 16S rRNA gene was amplified using the universal bacterial primers 27F and 1492R [31]. The PCR product was sequenced, and the 16S rRNA gene sequence was compared with sequences available in GenBank using nucleotide BLAST to determine an approximate phylogenetic affiliation. Bacillus subtilis DSM 10 (GenBank accession no. AJ276351) was used as an outgroup. Phylogenetic trees were constructed using the neighbor-joining method and the maximum likelihood method in MEGA 11.0 software [32], and the topologies were evaluated using the bootstrap resampling method with 1000 replications.

Phenotypic characterization
Cell growth of strain YPG26T was monitored by measuring the optical density at 600 nm as previously described [33,34]. NaCl tolerance was measured in TSB medium supplemented with NaCl (0.5-5%, w/v), and growth at different pH values (2.0-10.0) and temperatures (4-50 °C) was also tested. Cell morphology and the flagellum type were observed by transmission electron microscopy. The gram reaction was determined using a gram stain kit (Hangzhou Microbial Reagent Co., Ltd, China). Catalase activity was assessed by the generation of bubbles in a 3% (v/v) H2O2 solution. Carbon source tests and biochemical tests were performed using bacterial biochemical identification tubes (Hangzhou Microbial Reagent Co., Ltd, China). Growth under anaerobic conditions was measured in an anaerobic incubator (AL-B, LABIOPHY, Dalian, China) in an atmosphere of 85% N2, 10% CO2, and 5% H2 at 37 °C.

Whole-genome sequencing, assembly, and annotation
Total genomic DNA of strain YPG26T was extracted using a bacterial DNA kit (TianGen Biotech (Beijing) Co. Ltd, China). The quality and quantity of the genomic DNA were evaluated using agarose (Invitrogen, USA) gel electrophoresis and a Qubit fluorometer (Thermo Fisher, USA), respectively. Sequencing libraries were prepared using a DNA library prep kit (NEB, USA). Whole-genome sequencing of strain YPG26T was performed using the PacBio Sequel platform and Illumina NovaSeq PE150 at Beijing Novogene Bioinformatics Technology Co., Ltd. Assembly was completed using SMRT Link v5.0.1. The corrected assembly was then filtered with a minimum base quality value of 20. Finally, the overlap between the head and the tail of the sequence was used to confirm whether the genome was circular, and the start site was corrected by BLAST searching against the DNA database. The assembled genome was annotated using Rapid Annotation using Subsystem Technology (RAST) [35]. tRNA and rRNA genes were predicted using tRNAscan-SE [36] and RNAmmer [37], respectively. For functional gene annotation, the GO (gene ontology) [38] and COG (clusters of orthologous groups) [39] databases were used.
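The idea behind confirming circularity from the overlap between the head and the tail of a contig can be illustrated with a toy example. The following Python sketch is a simplified stand-in, not the procedure implemented in the assembly software used in this study: it requires an exact match, uses a hypothetical minimum overlap of 3 bases, and runs in quadratic time, so it is only suitable for short demonstration sequences.

```python
def terminal_overlap(sequence: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix that exactly equals a prefix.

    Returns 0 when no overlap of at least min_overlap bases exists.
    """
    seq = sequence.upper()
    # A proper overlap must be shorter than the whole sequence.
    for length in range(len(seq) - 1, min_overlap - 1, -1):
        if seq[:length] == seq[-length:]:
            return length
    return 0


def trim_circular(sequence: str, min_overlap: int = 3) -> str:
    """Remove the duplicated tail of a contig whose ends overlap."""
    overlap = terminal_overlap(sequence, min_overlap)
    return sequence[: len(sequence) - overlap] if overlap else sequence


# Hypothetical contig whose last 5 bases repeat its first 5 bases.
contig = "ACGTTGCAAGGACGTT"
overlap = terminal_overlap(contig)
circular = trim_circular(contig)
print(overlap, circular)
```

Real long-read assemblies contain sequencing errors, so production tools detect the terminal overlap by alignment that tolerates mismatches rather than by exact string comparison.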

CAZymes identification and mining of starch degrading genes
The carbohydrate-active enzymes (CAZymes) of strain YPG26T were identified and classified using the carbohydrate-active enzymes database [42]. The starch-degrading genes were further identified by cross-checking with the annotations available in the database.
Amylase activity was determined in vitro as previously described [43]. Briefly, starch agar medium (g per 1000 mL: tryptone 10.0, yeast extract 10.0, KH2PO4 5.0, soluble starch 3.0, agar 15) was used; 10 μL of a 10^8 CFU/mL suspension of strain YPG26T was placed at the center of the plate and incubated at 37 °C for 48 h.
For visualization of the zone of clearance, the plate was flooded with 2 mL of Gram's iodine solution.

Antimicrobial activity
To prepare cell-free supernatants (CFS), strain YPG26T was cultured in TSB at 37 °C for 10 h with shaking at 200 rpm. After incubation, the bacterial suspension was centrifuged at 8000 × g for 10 min. Supernatants were collected and filter-sterilized with a 0.22 μm filter (Millipore, USA). The antimicrobial activity of the CFS was initially evaluated according to its effect on the viability of the pathogenic bacteria [44], with appropriate modifications. Briefly, overnight pathogenic cultures were sub-cultured in TSB at 37 °C to logarithmic phase. After the OD600 was adjusted to 0.02, 50 µL of each pathogenic bacterium was added per well in a 96-well plate, followed by 50 µL of CFS of strain YPG26T or TSB medium alone (negative control). The cultures were incubated at 37 °C for 6 h, and the absorbance values at 600 nm were then measured using a microplate reader (Eppendorf, Germany). The absorbance value obtained with CFS, relative to that of the negative control, reflects the antimicrobial activity of strain YPG26T. Two biological replicates were performed. To prepare crude extracts, saturated ammonium sulfate was slowly added to the supernatant to reach 70% saturation. The precipitate was collected after standing at 4 °C overnight and centrifugation at 10,000 × g for 20 min. It was then redissolved in sterile distilled water and dialyzed extensively against sterile distilled water to remove ammonium sulfate. Finally, the crude extract was freeze-dried and redissolved in sterile distilled water, and its antibacterial activity was determined by the same method as for the CFS.
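The relative-absorbance readout described above can be expressed as a simple calculation. The Python sketch below is a minimal illustration under stated assumptions: the function names, the optional blank subtraction, and the OD600 readings are hypothetical and are not data from this study.

```python
from statistics import mean


def relative_growth_pct(od_treated, od_control, od_blank: float = 0.0) -> float:
    """Growth with CFS as a percentage of growth in the negative control.

    od_blank is an optional, hypothetical background correction.
    """
    control = mean(od_control) - od_blank
    if control <= 0:
        raise ValueError("control absorbance must exceed the blank")
    return 100.0 * (mean(od_treated) - od_blank) / control


def inhibition_pct(od_treated, od_control, od_blank: float = 0.0) -> float:
    """Growth inhibition as the complement of relative growth."""
    return 100.0 - relative_growth_pct(od_treated, od_control, od_blank)


# Hypothetical OD600 readings from two biological replicates.
treated = [0.20, 0.22]   # pathogen + CFS
control = [0.80, 0.88]   # pathogen + TSB only
inhibition = inhibition_pct(treated, control)
print(f"growth inhibition: {inhibition:.1f}%")
```

With these illustrative readings, the treated wells reach a quarter of the control absorbance, which corresponds to 75% inhibition.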

Genome mining for BGCs of antimicrobial compounds
Genome mining for biosynthetic gene clusters (BGCs) of antimicrobial compounds was carried out using antiSMASH version 6.0.1 [45]. BGCs sharing less than 70% similarity with previously reported ones were considered novel [46]. The putative core biosynthetic genes of the bacteriocin were further confirmed with NCBI protein BLAST. Multiple alignments of the amino acid sequences of the deduced precursor peptides with those of other known lanthipeptides were performed using MEGA 7.0 software.