Pathogenicity and Genomic Characterization of a Novel Genospecies, Bacillus shihchuchen, of the Bacillus cereus Group Isolated from Chinese Softshell Turtle (Pelodiscus sinensis)

The Chinese softshell turtle (CST; Pelodiscus sinensis) is a freshwater aquaculture species of substantial economic importance that is commercially farmed across Asia, particularly in Taiwan. Although diseases caused by the Bacillus cereus group (Bcg) pose a major threat to commercial CST farming systems, information regarding its pathogenicity and genome remains limited. Here, we investigated the pathogenicity of Bcg strains isolated in a previous study and performed whole-genome sequencing. Pathogenicity analysis indicated that QF108-045 isolated from CSTs caused the highest mortality rate, and whole-genome sequencing revealed that it was an independent group distinct from other known Bcg genospecies. The average nucleotide identity compared to other known Bcg genospecies was below 95%, suggesting that QF108-045 belongs to a new genospecies, which we named Bacillus shihchuchen. Furthermore, genes annotation revealed the presence of anthrax toxins, such as edema factor and protective antigen, in QF108-045. Therefore, the biovar anthracis was assigned, and the full name of QF108-045 was Bacillus shihchuchen biovar anthracis. In addition to possessing multiple drug-resistant genes, QF108-045 demonstrated resistance to various types of antibiotics, including penicillins (amoxicillin and ampicillin), cephalosporins (ceftifour, cephalexin, and cephazolin), and polypeptides, such as vancomycin.


Introduction
Pelodiscus sinensis, also known as the Chinese softshell turtle (CST), is an important freshwater aquaculture species with high nutritional value [1][2][3]. CST farming provides substantial economic benefits to the aquaculture industry in Taiwan. Bacillosis caused by Bacillus cereus group (Bcg) strains poses a serious threat to commercial CST farming, owing to the high mortality rate [3]. In a previous study, we reported that infected turtles exhibited clinical signs such as epistaxis, paralysis, and petechial hemorrhages on the skin. Gross lesion findings revealed severe oedema in the body cavity, and a histological examination revealed sepsis, hemolysis, and heterophils infiltration [3].
pBS01's similar plasmids are listed in Table 2. It is apparent that pBS01 shows the greatest similarity to the plasmid found in Bacillus sp. SYJ. This strain was isolated from infected CST in China. Furthermore, it is noteworthy that pBS01 also exhibits significant similarity to plasmids found in other strains responsible for causing food poisoning and anthrax in humans, such as the pXO1 plasmid, which has been isolated from various B. anthracis strains.   -genes for drug targets, 10-GC content, and 11-GC skew, respectively.

Ribosomal Multilocus Sequence Typing (rMLST) and Multilocus Sequence Typing (MLST) for QF108-045
After typing QF108-045 for ribosome protein subunits (rps) genes, we recorded 45 of the 65 rps genes. Among the 45 rps genes analyzed, a significant number of them exhibited close matches with various Bcg strains, including B. cereus, B. thuringiensis, B. anthracis, and B. tropicus (Table S1). The MLST results showed that two loci in the QF108-045 genome had two novel alleles, which were Glp and Pta ( Figure S2); therefore, the sequence type (ST) it matched may represent the nearest one. Nevertheless, considering all housekeeping genes' alleles, the QF 108-045 strain was still found to be most closely related to B. cereus ST 234 (Table 3). The graphical circular view of the QF 108-045 genome (left) and its plasmid (right). The tracks in the figure are displayed as concentric rings, from outermost to innermost tracks representing the 1-reference position in the genome, 2-position, and order of the assembled contigs, 3-CDSforward strands, 4-CDS-reverse strand, 5-non-Coding features, 6-AMR (anti-microbial resistance) genes, 7-genes for virulence factors, 8-genes for transporters, 9-genes for drug targets, 10-GC content, and 11-GC skew, respectively.

Ribosomal Multilocus Sequence Typing (rMLST) and Multilocus Sequence Typing (MLST) for QF108-045
After typing QF108-045 for ribosome protein subunits (rps) genes, we recorded 45 of the 65 rps genes. Among the 45 rps genes analyzed, a significant number of them exhibited close matches with various Bcg strains, including B. cereus, B. thuringiensis, B. anthracis, and B. tropicus (Table S1). The MLST results showed that two loci in the QF108-045 genome had two novel alleles, which were Glp and Pta ( Figure S2); therefore, the sequence type (ST) it matched may represent the nearest one. Nevertheless, considering all housekeeping genes' alleles, the QF 108-045 strain was still found to be most closely related to B. cereus ST 234 (Table 3).

Whole-Genome ANI, dDDH Comparison, and Phylogenetic Analysis
Based on Table S3, we conducted phylogenetic, dDDH, and ANI analyses for a comparison of QF108-045 with other known Bcg genospecies. The phylogenetic tree showed that the QF108-045 genome belonged to an independent group distinct from the other Bcg genospecies. In addition, the outermost bar chart in the phylogenetic tree listed the Bcg Figure 3. Summary of the subsystem categories referred to the genes predicted in QF108-045 genome. RAST (https://rast.nmpdr.org/rast.cgi, accessed on 29 March 2023) was used to annotate the whole genome sequence.

Whole-Genome ANI, dDDH Comparison, and Phylogenetic Analysis
Based on Table S3, we conducted phylogenetic, dDDH, and ANI analyses for a comparison of QF108-045 with other known Bcg genospecies. The phylogenetic tree showed that the QF108-045 genome belonged to an independent group distinct from the other Bcg genospecies. In addition, the outermost bar chart in the phylogenetic tree listed the Bcg strains' GC ratio, which was approximately around 35% ( Figure 4). The ANI and dDDH results are listed in Figures 5 and 6, respectively. Additionally, QF108-045, Bacillus sp. SYJ, B. thuringiensis serovar chanpaisis BGSC 4BH1, B. thuringiensis 261-1, and B. cereus AFS012518 had ANI and dDDH values higher than 95% and 70%, respectively, compared to other known genospecies, indicating that they belong to the same genospecies. Furthermore, when comparing their ANIs and dDDHs with other known genospecies, the results all showed values lower than 95% and 70%, respectively. These results suggest that they are novel and independent genospecies; therefore, we proposed a new genospecies, Bacillus shihchuchen.

Genomic Islands in QF108-045
We discovered several genomic islands (GIs) after annotating the whole genome of QF108-045 ( Figure 7). Seventeen GIs comprising two hundred and thirty genes were recorded in chromosomal DNA. Among the 17, the 7th GI, starting from 2,971,997 to 2,994,351 with a size of 29,553 bp, was found to be the largest, and encoded the TetR family regulatory protein of the multidrug resistance (MDR) cluster, genes involved in heme utilization or adhesion, membrane components of the MDR system, CAAX amino terminal protease family protein, and ribonucleotide reductase transcriptional regulator. The second-largest GI was the 6th out of the 17 GI, starting from 2,816,153 to 2,835,404 with a size of 36,082 bp, which mainly encoded the hydrolase, HAD superfamily, nutrient germinant receptor hydrophilic subunit C, succinyl-CoA ligase, and phage tail fibre proteins (Table S6).
Five GIs comprising two hundred and forty genes were detected in the plasmid DNA. Among the 5, the 1st GI starting from 5,281,216 to 5,359,300 with a size of 78,084 bp was found to be the largest, which encoded the S-layer protein, reticulocyte binding protein, calmodulin sensitive adenylate cyclase (edema factor, EF), N-Acetylneuraminate cytidylyltransferase, threonine dehydratase, and L-serine dehydratase. The second-largest GI was the 5th, starting from 5,366,176 to 5,390,220 with a size of 24,044 bp, which encoded the hemolysins and related proteins containing CBS domains, flagellar P-ring protein (FlgI), mobile element protein, and magnesium and cobalt efflux protein (CorC) ( Table S6). strains' GC ratio, which was approximately around 35% ( Figure 4). The ANI and dDDH results are listed in Figures 5 and 6, respectively. Additionally, QF108-045, Bacillus sp. SYJ, B. thuringiensis serovar chanpaisis BGSC 4BH1, B. thuringiensis 261-1, and B. cereus AFS012518 had ANI and dDDH values higher than 95% and 70%, respectively, compared to other known genospecies, indicating that they belong to the same genospecies. Furthermore, when comparing their ANIs and dDDHs with other known genospecies, the results all showed values lower than 95% and 70%, respectively. These results suggest that they are novel and independent genospecies; therefore, we proposed a new genospecies, Bacillus shihchuchen.  Table S3. The statistical details of phylogenetic tree are listed in Tables S4 and S5.

Distribution of Likely Virulence Genes in QF108-045 Genomes
Based on the BLAST results of the protein sequences in the VFDB, we identified several virulence genes located on the QF 108-045 chromosome and plasmid (Table S7). For example, hemolysin III, hemolytic enterotoxin HBL, non-hemolytic enterotoxin, and O-antigen exist in the chromosome, whereas anthrax toxin (EF encoded by cya gene and protective antigen, PA, encoded by pagA gene), exopolysaccharide (BPS), and hyaluronic acid capsule-related genes exist in the plasmid. Using the information from the VFDB and RAST annotations, we identified various genes for evaluation of their localization, secretion, solubility, and antigenicity. Our screening process involved genes associated with invasion, immune evasion, motility, and outer membrane proteins, and proteins with antigenicity scores greater than 0.5 were reported. Several proteins from QF108-045 have been identified as promising subunit proteins for vaccine development. Table S8 includes information on these proteins, including their annotated genomes, antigenicity scores, and solubilities. The open reading frames of these proteins are listed in Table S9. Because QF108-045 contains the cya and pagA virulence genes, the biovar anthracis was assigned. Therefore, the bacterium was designated as B. shihchuchen biovar anthracis.

Genomic Islands in QF108-045
We discovered several genomic islands (GIs) after annotating the whole genome of QF108-045 (Figure 7). Seventeen GIs comprising two hundred and thirty genes were recorded in chromosomal DNA. Among the 17, the 7th GI, starting from 2,971,997 to was found to be the largest, which encoded the S-layer protein, reticulocyte binding protein, calmodulin sensitive adenylate cyclase (edema factor, EF), N-Acetylneuraminate cytidylyltransferase, threonine dehydratase, and L-serine dehydratase. The second-largest GI was the 5th, starting from 5,366,176 to 5,390,220 with a size of 24,044 bp, which encoded the hemolysins and related proteins containing CBS domains, flagellar P-ring protein (FlgI), mobile element protein, and magnesium and cobalt efflux protein (CorC) ( Table  S6).

Distribution of Likely Virulence Genes in QF108-045 Genomes
Based on the BLAST results of the protein sequences in the VFDB, we identified several virulence genes located on the QF 108-045 chromosome and plasmid (Table S7). For example, hemolysin III, hemolytic enterotoxin HBL, non-hemolytic enterotoxin, and Oantigen exist in the chromosome, whereas anthrax toxin (EF encoded by cya gene and protective antigen, PA, encoded by pagA gene), exopolysaccharide (BPS), and hyaluronic acid capsule-related genes exist in the plasmid. Using the information from the VFDB and RAST annotations, we identified various genes for evaluation of their localization, secretion, solubility, and antigenicity. Our screening process involved genes associated with 2.8. Comparative Proteomics for Identification of Virulence Genes from QF108-045 B. shihchuchen biovar anthracis QF108-045 isolated from CST is closely related to B. thuringiensis serovar chanpaisis strain BGSC 4BH1 and B. thuringiensis strain 261-1, both of which were isolated from rice paddies and soil and are non-pathogenic. A comparison of the proteomic differences between non-pathogenic and pathogenic (QF108-045) bacteria revealed that certain unique proteins, such as N-acetylneuraminate synthase, S-layer homology domain protein, reticulocyte-binding protein, cya, and the transcriptional activator AtxA, appeared only in QF108-045. Furthermore, these protein-encoding genes were identified in plasmids. Additionally, some proteins, such as collagen adhesion, fibronectin type III domain protein, spore coat protein CotX, and CotY, only appeared in the chromosome of QF108-045. All the proteins that only existed in QF108-045 are shown in Figure 8.

Antibiotic-Resistant Genes
Antibiotic-resistant genes were identified in the present study. Several antibioticresistant genes were detected, including B. cereus beta-lactamase I, class A (BcI), B. cereus beta-lactamase II, class B (BcII family), fosfomycin-resistant enzyme (FosB), small multidrug resistance (SMR) efflux pump (qacJ), Streptomyces rimosus oxytetracycline resistance ribosomal protection protein otr (A), the serine racemases vanW, vanT, and vanY, and ATP-binding cassette, antibiotic efflux pump (RanA) (Figure 9). of the proteomic differences between non-pathogenic and pathogenic (QF108-045) bacteria revealed that certain unique proteins, such as N-acetylneuraminate synthase, S-layer homology domain protein, reticulocyte-binding protein, cya, and the transcriptional activator AtxA, appeared only in QF108-045. Furthermore, these protein-encoding genes were identified in plasmids. Additionally, some proteins, such as collagen adhesion, fibronectin type III domain protein, spore coat protein CotX, and CotY, only appeared in the chromosome of QF108-045. All the proteins that only existed in QF108-045 are shown in Figure 8. Comparative proteomics for identification of virulence genes from QF108-045. In the heat map, 0 (black) denotes the gene is absent in genome, 1 (light yellow) denotes the gene appeared one time, 2 (yellow) denotes the gene appeared two times, and 3 (dark yellow) denotes the gene appeared three times or more. In bottom rows, pink background denotes the protein-encoded genes existed in plasmid, grey background denotes the protein-encoded genes existed in chromosome, and green background denotes the protein-encoded genes existed in both plasmid and chromosome. BT1: B. thuringiensis serovar chanpaisis strain BGSC 4BH1, BT2: B. thuringiensis strain 261-1. Comparative proteomics for identification of virulence genes from QF108-045. In the heat map, 0 (black) denotes the gene is absent in genome, 1 (light yellow) denotes the gene appeared one time, 2 (yellow) denotes the gene appeared two times, and 3 (dark yellow) denotes the gene three times or more. In bottom rows, pink background denotes the protein-encoded genes existed in plasmid, grey background denotes the protein-encoded genes existed in chromosome, and green background denotes the protein-encoded genes existed in both plasmid and chromosome. BT1: B. thuringiensis serovar chanpaisis strain BGSC 4BH1, BT2: B. thuringiensis strain 261-1.

023, 24, x FOR PEER REVIEW
14 of 24 Figure 9. Pie chart that displays the antibiotic-resistant genes identified in QF108-045. From the outermost to innermost circle, the chart denotes the antibiotic-resistant genes, antibiotic-resistant genes' family, and antibiotic family that the resistant genes targeted.

Antibiotic Susceptibility Test
The antibiotic susceptibility of QF108-045 to 25 antimicrobial agents was screened to evaluate the sensitivity of antimicrobial drugs against QF108-045. Our data showed that QF108-045 was resistant to three penicillins (penicillin, amoxicillin, and ampicillin), three Figure 9. Pie chart that displays the antibiotic-resistant genes identified in QF108-045. From the outermost to innermost circle, the chart denotes the antibiotic-resistant genes, antibiotic-resistant genes' family, and antibiotic family that the resistant genes targeted.

Discussion
The purpose of this study was to use whole-genome sequencing to determine the pathogenicity of Bcg strains isolated from diseased CSTs and determine their genospecies, virulence genes, and antibiotic-resistant genes. Additionally, we performed an antibiotic susceptibility test to examine the susceptibility of the isolated strain to various antibiotic types and generations, and the results provide insights into the clinical treatment of this pathogen.
Several approaches have been used to define bacterial species (e.g., 16S rRNA gene sequencing and phenotypic characterization) [9,40]. Nevertheless, modern species delineation approaches utilize high-throughput ANI and dDDH-based methodologies [25,27,28], meaning that two genomes belong to different genera if they share an ANI value lower than a predefined threshold (usually 95% ANI and 70% dDDH) [29]. In addition, the taxonomic approach commonly incorporates the comparison of whole genome nucleotides' GC ratio. However, in the case of Bcg strains, the GC ratio was approximately around 35% (Figure 6), which renders the use of the GC ratio for the taxonomic classification of Bcg impossible.
A previous study has shown that Bcg genomes can be grouped Into a taxonomic framework that is flexible enough to consider phenotypes, making it easier for the public to understand [34].  [33][34][35]. At an ANI threshold of 95%, all genomes assigned to the aforementioned species based on their reference genomes or respective type strains belonged to that genospecies [34]. Interestingly, phylogenetic analysis revealed B. shihchuchen that was independent of all other known Bcg genospecies. Additionally, although B. mosaicus is the most closely related genospecies, all B. mosaicus strains exhibited ANI values of less than 95% in comparison to the newly identified genospecies, B. shihchuchen. This provides evidence for classifying B. shihchuchen as a novel genospecies (reference genome: B. shihchuchen strain QF108-045).
Based on the RAST annotating, bacterial GI, and virulence factor detection, the chromosome contained the TetR family regulatory protein of the MDR cluster, genes involved in heme utilisation or adhesion, and other virulence genes, including hemolysin III, hemolytic enterotoxin HBL, non-hemolytic enterotoxin, and O-antigen. In the plasmid, we identified virulence genes, including anthrax toxin (EF and PA) and exopolysaccharides. These results indicate the importance of both chromosomes and plasmids in contributing to pathogenesis (Figure 7, Tables S2, S6 and S7).
The anthrax tripartite toxin is composed of EF, PA, and lethal factor [41]. The QF108-045 plasmid was found to contain cya and pagA genes. Therefore, the biovar anthracis was assigned, and the full name of QF108-045 was B. shihchuchen biovar anthracis. By converting ATP to cAMP, EF operates as a calmodulin-dependent adenylyl cyclase, substantially increasing intracellular cAMP levels and causing cardiovascular dysfunction and tissue damage [42,43]. PA can transport EF inside the cell through its binding affinity for the host cell's TEM8/ATR receptors. Subsequently, anthrax toxin-receptor complexes are internalized and transported to early endosomes through clathrin-coated pits [36]. These factors may contribute to the clinical signs and gross lesions of QF108-045-challenged CSTs, such as eyelid oedema and serosanguineous fluid accumulation in the body cavity ( Figure S2).
Upon examining Table 2, it becomes evident that the B. shihchuchen biovar anthracis pBS01 exhibits the closest relationship to Bacillus sp. SYJ's plasmid [44]. This strain was isolated from Chinese soft-shelled turtles in China and is associated with septicaemia infection. It is worth noting that pBS01 also exhibited close relatedness to other strains causing food poisoning and anthrax in humans. Therefore, further experimental studies are required to investigate the zoonotic potential of B. shihchuchen biovar anthracis QF108-045.
Prophylactic vaccines are the most cost-effective treatment approach for the effective treatment and prevention of bacillosis. In bacillus-related vaccine development, a wellstudied vaccine against B. anthrax is used in human medicine [45,46]. PA is considered an important antigen in B. anthrax development. Immunity against an anthrax infection requires a sufficient humoral response to PA [47,48]. Anti-PA antibodies have been demonstrated to block the early stages of spore infection and neutralize anthrax toxin activity [49]. PA, which is encoded by pagA, was also observed in QF108-045. According to the ANTI-GENpro prediction, PA also has a very high point of antigenicity (0.9), indicating that PA might be a promising antigen for future vaccine development against QF108-045. In addition to PA, other virulence genes whose products share high antigenicity are listed in Table S8, such as hemolytic enterotoxin, non-hemolytic enterotoxin, and reticulocyte binding protein. The antigen candidate screening conducted in this study will aid in future vaccine development.
We observed that most strains of B. shihchuchen genospecies were non-pathogenic, and were isolated from soil or agricultural products, such as weeds and rice paddies; nevertheless, some were pathogens of CSTs. Among B. shihchuchen genospecies, B. shihchuchen biovar anthracis QF108-045 was most closely related to Bacillus thuringiensis serovar chanpaisis BGSC 4BH1 (now B. shihchuchen serovar chanpaisis BGSC 4BH1) and Bacillus thuringiensis 261-1 (now B. shihchuchen 261-1), with ANIs of 99.47 and 99.45, respectively. One was a pathogenic bacterium isolated from CSTs, and the other was a non-pathogenic bacterium isolated from rice paddies and weeds. N-acetylneuraminate synthase, N-acetylneuraminate cytidylyltransferase, EF, and PA were found only in QF108-045 ( Figure 8). The synthesis of N-acetylneuraminic acid (Neu5Ac or sialic acid) is catalyzed by N-acylneuraminate cytidylyltransferase, which converts Neu5Ac into diphosphate and CMP-N-acylneuraminate [50]. Activated CMP-N-acylneuraminate is then incorporated into capsular polysaccharides [51]. In Streptococcus pneumoniae, the polysaccharide capsule is the primary surface structure of the organism and plays an important role in virulence, mostly by interfering with the host opsonophagocytic clearance systems [52]. Furthermore, bacterial cell-surface sialic acids are thought to mimic host sialoglycoconjugates, allowing microorganisms to avoid detection by the host's innate immune system [53,54]. The capsular polysaccharides of QF108-045 require further investigation in future pathogenic studies.
Antibiotic-resistant genes were located in the 7th GI starting from 2,971,997 to 2,994,351 on the chromosome (Figure 7). We observed BcI and BcII family genes, which render bacteria resistant to beta-lactam antibiotics, such as penicillins and cephalosporins [55]. Although QF108-045 was resistant to first-(penicillin) and second-generation penicillins (amoxicillin and ampicillin), it was susceptible to fourth-generation penicillins, such as piperacillin [56]. Other antibiotic-resistant genes included Streptomyces rimosus otr (A) and vanW, vanT, and vanY, which may confer resistance to tetracycline [57] and vancomycin [58]. Notably, qacJ is an SMR efflux pump that confers resistance to the quaternary ammonium compound [59]. Alkyldimethylbenzylammonium chloride (BKC), a quaternary ammonium compound, is a disinfectant typically used by CST farmers. Further research on the resistance of QF108-045 to BKC is required.

Animals
This study was performed using CSTs (200 ± 10 g). To confirm that the experimental turtles were bacillosis-free, bacterial isolation and polymerase chain reaction (PCR) examinations of the livers, spleens, and kidneys were performed on five sacrificed CSTs prior to importation [3]. The CSTs were kept in a freshwater outdoor facility at 28 ± 2 • C with adequate sunlight; 30% of the water was changed daily, and 3% body weight of commercial dry pellet was fed to the CSTs twice daily. Before performing the experiment, the CSTs were acclimatized for 1 week.

Experimental Challenge and Median Lethal Dose (LD 50 )
One representative strain from each group (N1, S1, and E1-QF108-010), (N2, S2, and E2-QF108-011), (N3, S3, and E3-QF108-043), and (N4, S4, and E4-QF108-045) was used to confirm pathogenicity [3]. The strains were grown on brain-heart infusion agar (BHI, Becton Dickinson, Auvergne-Rhône-Alpes, France) at 25 • C for 24 h, and were subsequently suspended in sterile phosphate-buffered saline (PBS; Na 2 HPO 4 ·12H 2 O, NaH 2 PO 4 ·2H 2 O, NaCl) to achieve optical density 1 at an absorbance of 600 nm. The bacteria were counted using the serial dilution method [60], and 20 µL of 10-fold diluted samples were dropped on a BHI agar plate. After 24 h of incubation at 25 • C, the colony forming units per ml (CFU/mL) were determined. Fifty CSTs were randomly placed in five tanks and anesthetized by administering 100 mg/L of tricaine methane sulphonate (MS-222). Subsequently, 0.1 mL of bacterial suspension was intraperitoneally (IP) injected into each turtle (10 7 CFU/turtle). The same treatments were administered to the control CST, except that they received 0.1 mL of sterile PBS. Behavioral and other clinical indicators were consistently observed in the CSTs, and the number of deaths was recorded over 14 days. Subsequently, bacteriological and gross examinations were performed on the deceased turtles. PCR was used to identify the isolated bacteria. The LD 50 in the CST was evaluated by the IP injection of the following bacterial concentrations (10 3 , 10 4 , 10 5 , and 10 6 CFU/turtle), plus one PBS group. Ten CSTs were assigned to each group, and each group was injected with 0.1 mL of bacterial suspension or 0.1 mL of sterile PBS. The LD 50 was calculated using Beheren's method [61].

Genomic DNA Extraction and DNA Quality Control
A Quick-DNA Bacterial Miniprep Kit (Zymo Research, Irvine, CA, USA) was used to extract genomic DNA from actively growing isolates. The quality of the extracted DNA was evaluated using 1% agarose gel (Bio-Helix) electrophoresis (100 V, 20 min), and images were acquired using a Gel Doc TM XR + Imager System (Bio-Rad Laboratories). A NanoPhotometer was used to measure absorbance at 260 and 280 nm (Implen, Westlake Village, CA, USA). For a good-quality DNA sample, A260/A280 and A260/A230 ratios of 1.8-2.0 and 2.0-2.2, respectively, are acceptable. Only samples that met the inclusion criteria were used in the next stage of the experiment.

DNA Library Preparation and Sequencing Using MinIon Nanopore MK1C
A DNA library was created using the Rapid Sequencing Kit SQK-RAD004 (Oxford Nanopore Technologies, Oxfordshire, UK). A fragmentation mix containing 400 ng of genomic DNA was used for random fragmentation. After incubation at 30 • C for 1 min and then at 80 • C for 1 min in a thermal cycler (Takara Bio, San Jose, CA, USA), the mixture was placed on ice and chilled at 4 • C. The fast adapter was incubated with the fragmented genomic DNA for 5 min at room temperature, and the prepared DNA library was stored on ice until being placed in a flow cell. Prior to each sequencing read, flow cell (FLO-MIN106D; Oxford Nanopore Technologies, UK) quality check (QC) was performed on a MinION Mk1C sequencer (Oxford Nanopore Technologies). After passing the QC, the flow cell was then primed with a priming kit (ONT EXP-FLP002; Oxford Nanopore Technologies). The flush buffer, flush tether, loading beads, and sequencing buffer were slowly thawed on ice for loading in the flow cell. The DNA library was then added to a flow cell placed on a MinION Mk1C sequencer for 12 h, according to the manufacturer's protocol. The flow cell was cleaned using a Flow Cell Wash Kit (ONT EXP-WSH004; Oxford Nanopore Technologies, Oxfordshire, UK) after each run and maintained at 4 • C for future use.

Raw Data Pre-Processing and Genome Assembly
All fast5 files were base-called using Guppy (version 4.3.4, Oxford Nanopore Technologies) to create fastq files when adequate reads were obtained from the sequencing runs. The ONT Gussy base-calling tool was used to transform raw signals into a DNA sequence (version 4.2.3, Oxford Nanopore Technologies). Canu [62] and Flye [63] were used for de novo genome assembly. All sequences were uploaded to the National Cen-ter for Biotechnology Information (NCBI, https://www.ncbi.nlm.nih.gov, accessed on 20 April 2023) with the BioProject and BioSample accession number PRJNA948896 and SAMN33923097, respectively.

Whole-Genome Distance, ANI, and dDDH Comparsion
To identify QF108-045 s similar genomes and plasmids, we utilized Mash [68], a Min-Hash dimensionality-reduction technique. This method enabled us to condense large sequences or sets of sequences into compressed sketch representations [69]. The computation of Mash distance can be rapidly performed using the size-reduced sketches, while still yielding highly correlated results with alignment-based metrics, such as ANI [70]. The whole-genome ANI comparison was conducted using the average nucleotide identity calculator online server (http://enve-omics.ce.gatech.edu/ani/, accessed on 11 April 2023) created by Rodriguez-R et al. [71]. This server utilizes both best hits (one-way ANI) and reciprocal best hits (two-way ANI) approaches when comparing two sets of genomic data.
Digital DNA-DNA hybridization was analyzed using the genome-to-genome distance calculator (GDDC) [72] with the Genome Blast Distance Phylogeny (GBDP) approach [73]. The GBDP used a basic local alignment search tool to locally align two genomes [74], yielding a list of high-scoring segment pairs (HSPs), before these HSPs were translated into a single genome-to-genome distance value. The resulting distances were then converted into values similar to DNA-DNA hybridization (DDH) using a generalized linear model, producing dDDH values. The GBDP formula d4, the sum of all the identities found in HSP divided by the overall HSP length, was applied for the dDDH value [73].

Whole-Genome Phlogenetic Tree
A whole-genome phylogenetic tree was generated using the PATRIC phylogenetic tree building service, which utilizes nucleotide sequences and amino acids from a collection of Bacterial and Viral Bioinformatics Resource Center (BV-BRC) [75] (https://www.bv-brc. org/, accessed on 10 April 2023) global Protein Families (PGFams) [76]. Both the nucleotide and amino acid sequences were used for mapping each of the PGFam-selected genes. MUSCLE was used to align protein sequences [77] and the codon alignment function of BioPython was used to align nucleotide-coding gene sequences [78]. All nucleotides and proteins were concatenated into an alignment and formatted in PHYLIP before being converted into a partition file for RaxML [79]. RaxML support values were generated using 100 rounds of the "Rapid" bootstrapping option [80]. Finally, the tree was graphically visualized using Interactive Tree Of Life software (version 6.7.5) [81].

Genomic Islands in QF108-045
To identify genomic islands in QF108-045 genomes, we utilized IslandViewer (version 4), a software developed by Bertelli et al. [82]. The GBK files were obtained from the RAST annotation server and compared to Bacillus sp. SYJ as a reference. The genomic island prediction tools, such as IslandPick and SIGI-HMM, were used to identify genomic islands, and associated mobility genes were determined using IslandPath-DIMOB and Islander tools [83].

Comparative Proteomics
Comparative proteomics was conducted using the Comparative Systems Service of BV-BRC [75]. This tool uses the protein family PATtyFams [76], annotated by RASTtk [66]. Finally, we used the Protein Family Sorter tool in the BV-BRC to study the distribution of certain protein families across various genomes, and generated a heat map for comparative proteomics.

Identification of Virulence Genes
Genomic sequences were matched against the virulence factor database (VFDB) via VFanalyser [84]. PSORTb (version 3.0.3), a protein subcellular localization prediction tool, was used to predict the protein localization and secretion [85]. All virulence genes were identified by inputting their protein sequences into the SCRATCH protein predictor [86]. The localized genes identified in the outer membrane or periplasmic regions with an antigenicity score greater than 0.5 (as identified by ANTIGENpro) were evaluated for their secretion score (identified by SOLpro), as previously described [38].

Conclusions
The present study confirmed the pathogenicity of Bcg strains isolated from CSTs and established the etiology of Bcg infection in CSTs. The whole-genome sequencing of the most virulent strain, QF108-045, confirmed the presence of a novel genospecies within Bcg, which we named B. shihchuchen. Furthermore, gene annotation revealed anthrax toxins, such as EF and PA, in QF108-045. Finally, as QF108-045 is multidrug-resistant, particularly to penicillins, aminoglycosides, and cephalosporins, antibiotic susceptibility testing prior to the treatment of CSTs with bacillosis is important.