Mosquitocidal toxin-like islands in Bacillus thuringiensis S2160-1 revealed by complete-genome sequence and MS proteomic analysis

Here, we present the whole-genome sequence of Bt S2160-1, a potential alternative to the mosquitocidal model strain Bti. Bt S2160-1 contains one chromosome and four mega-plasmids. Thirteen predicted genes encoding insecticidal crystal proteins were identified, clustered on one plasmid, pS2160-1p2, which contains two pathogenicity islands (PAIs) designated PAI-1 (Cry54Ba, Cry30Ea4, Cry69Aa-like, Cry50Ba2-like, Cry4Ca1-like, Cry30Ga2, Cry71Aa-like, Cry72Aa-like, Cry70Aa-like, Cyt1Da2-like and Vpb4C1-like) and PAI-2 (Cyt1Aa-like and Tpp80Aa1-like). The clusters appear to represent mosquitocidal toxin islands similar to pathogenicity islands. Transcription/translation of 10 of the 13 predicted genes was confirmed by whole-proteome analysis using LTQ-Orbitrap LC–MS/MS. In summary, the present study identified a mosquitocidal toxin island in Bacillus thuringiensis and provides important genomic information for understanding the insecticidal mechanism of B. thuringiensis.


Strain and medium
Bacillus thuringiensis S2160-1 was isolated by Zhang et al. and cultured as previously described 26. A culture of the isolate has been maintained by the Hainan Institute of Tropical Agricultural Resources (HITAR). The strain is registered and deposited in the China General Microbiological Culture Collection Center (collection code: CGMCC No. 13274).
A local Bt toxin database was built for BLAST using known Bt toxin protein sequences, and the new Bt toxin nomenclature database of the Bacterial Pesticidal Protein Resource Center (https://www.bpprc-db.org/home/) was used to confirm the predicted toxins. A total of 1140 amino acid sequences of Bt toxin proteins (up to January 2024) were downloaded from public data resources to construct the local database for BLAST alignment analysis.

Preparation of Bt S2160-1 total protein
Bt S2160-1 was streaked and cultured on LB plates, and a single colony was then picked with a needle tip and cultured in 5 mL LB medium at 28 °C and 200 rpm for 12 h. The bacterial culture was then transferred at a ratio of 1:100 to a conical flask containing 300 mL Nutrient Broth liquid medium and cultured at 28 °C and 200 rpm. The culture was grown continuously for 72 h, until spores and crystals were completely formed, and 15 mL of culture was collected every 4 h. A total of 18 samples were collected over the 72 h culture period and stored at −20 °C for later use.
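The sampling scheme can be checked with a few lines of Python. This is only an illustrative sketch: the function name `sampling_times` is hypothetical, and it assumes the first sample is taken at 4 h and the last at 72 h, which is consistent with the 18 samples reported.

```python
def sampling_times(interval_h=4, total_h=72):
    """Return collection times (h), one every interval_h up to and including total_h."""
    return list(range(interval_h, total_h + 1, interval_h))


times = sampling_times()
# Total volume withdrawn, given 15 mL per sample (value from the text).
withdrawn_ml = 15 * len(times)

print(len(times), times[0], times[-1], withdrawn_ml)
```

Under these assumptions the schedule yields 18 time points and a total withdrawn volume of 270 mL, which fits within the 300 mL culture.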
Each collected sample was centrifuged at 4000 × g and 4 °C for 20 min to harvest bacterial cells. Then, 15 mL of 1 × TE buffer was added to re-suspend the cells, the suspension was again centrifuged at 4000 × g and 4 °C for 20 min, and the resulting supernatant was discarded. The bacterial cells from all of the sampled time points were combined, washed 3 times with 1 M NaCl, and finally re-suspended in 15 mL protein lysis buffer (50 mM Tris-HCl, 500 mM NaCl, pH 8.0).
Total proteins of Bt S2160-1 were extracted using an ultrasonic cell crusher (SONICS VCX750, Sonics & Materials, Inc., 53 Church Hill Rd., Newtown, CT, USA) under the following working conditions: 25% amplitude, pulse on 3 s, pulse off 15 s, working time 20 min. A 5 μL aliquot of the extracted protein solution was taken to check the quality of the extracted proteins by SDS-PAGE, and 1% PMSF (100 mM) was added to each of the remaining protein solutions, which were then stored at −80 °C for later use.

The whole protein expression profile of Bt S2160-1 identified by LTQ-Orbitrap nano-LC-MS/MS
Bt S2160-1 was cultured at 28 °C, and culture samples were collected once every 4 h, for a total of 18 time points over three consecutive days. Proteins from each sample were extracted by sonication (SONICS VCX750, Sonics & Materials, Inc., 53 Church Hill Rd., Newtown, CT, USA). A total protein mixture was obtained by pooling the extracted proteins from the 18 sampled time points, and this mixture was used to analyze the protein expression profile of Bt S2160-1. The time points covered the complete growth cycle from the vegetative phase to the end of the sporulation phase. The protein samples were digested overnight in an incubator at 37 °C by adding trypsin (trypsin:protein = 1:40), and complete digestion was checked by SDS-PAGE.
The enzymatically digested protein samples were dried using a low-temperature vacuum concentrator and then re-dissolved in SCX solvent A (10 mmol/L potassium phosphate buffer in 20% ACN). The samples were separated by strong cation exchange (SCX) chromatography on a PolySulfoethyl A column (PolyLC, Columbia, MD; 200 Å, 5 μm, 200 × 2.1 mm). The flow rate was set at 0.3 mL/min, and the gradient of solvent B (10 mM KH2PO4, 500 mM KCl, 20% acetonitrile) was set as follows: 0% solvent B for 0-5 min, 0-40% solvent B for 5-50 min, 40-100% solvent B for 50-55 min, 100% solvent B for 55-63 min, and 100-0% solvent B for 63-70 min. Separate fractions were collected and individually lyophilized. Subsequently, 10 μL ddH2O was added to dissolve each fraction. The dissolved fractions were then combined into 8 fractions, which were fully mixed and prepared for desalting.
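The SCX gradient program can be written as a piecewise function of time. The sketch below is illustrative only: the names `GRADIENT` and `percent_b` are hypothetical, and it assumes the ramps between the stated breakpoints are linear, which the text does not specify.

```python
# Breakpoints (time in min, % solvent B) taken from the gradient in the text.
GRADIENT = [(0, 0.0), (5, 0.0), (50, 40.0), (55, 100.0), (63, 100.0), (70, 0.0)]


def percent_b(t_min):
    """Return % solvent B at t_min, assuming linear ramps between breakpoints."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time outside the 0-70 min gradient program")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


for t in (0, 27.5, 52.5, 60, 66.5):
    print(t, percent_b(t))
```

With linear ramps, the midpoint of the 5-50 min segment (27.5 min) corresponds to 20% solvent B, and the midpoint of the re-equilibration ramp (66.5 min) to 50%.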
Nano LC-MS/MS analysis of all 8 peptide samples was performed using an LTQ-Orbitrap nano-LC-MS/MS system (LTQ-Orbitrap Elite Mass Spectrometer, Thermo Fisher Scientific, Bremen, Germany) according to the manufacturer's instructions. The operating parameters of the mass spectrometer were as follows: the electrospray voltage was set to 1.5 kV; the primary MS spectrum was acquired by Orbitrap scanning over the m/z range 350-2000 with an MS resolution of 60,000; ions from the primary MS scans were fragmented to generate secondary MS/MS spectra by collision-induced dissociation (CID) at a normalized collision energy of 35%, with a mass resolution of 15,000. The MS data were collected automatically.
The acquired MS data were searched using Proteome Discoverer version 1.3 (Thermo Scientific) with the SEQUEST search engine against the predicted protein database of the Bt S2160-1 genome, as determined from the whole-genome sequence, and the downloaded Bt toxin nomenclature database (http://www.btnomenclature.info/).
The database search parameters were as follows: digestion by trypsin, with a maximum of two missed cleavage sites allowed; the fixed modification was carbamidomethylation of cysteine, while the variable modifications were methylation (+57.021 Da) and oxidation of methionine (+15.995 Da); the mass tolerances for primary mass spectrometry (MS) and secondary mass spectrometry (MS/MS) were 0.1 Da and 0.3 Da, respectively; the secondary MS/MS scans were matched manually in order to reduce the mismatching rate; and a strict false discovery rate (FDR) threshold of ≤ 1% was set as the cut-off for reporting PSMs.
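To illustrate what "a maximum of two missed cleavage sites" means for the search space, the following sketch performs a simplified in-silico tryptic digest. It is not the SEQUEST implementation: the function name `tryptic_peptides` and the example sequence are hypothetical, and the rule used here (cleave after every K or R, with no proline exception) is a simplifying assumption.

```python
def tryptic_peptides(sequence, max_missed=2):
    """List peptides from cleaving after K/R, allowing up to max_missed missed sites."""
    # Cut positions: start, every position following K or R, and the end.
    cuts = [0]
    cuts += [i + 1 for i, aa in enumerate(sequence)
             if aa in "KR" and i + 1 < len(sequence)]
    cuts.append(len(sequence))

    peptides = []
    last = len(cuts) - 1
    for i in range(last):
        # Joining j - i consecutive fragments skips (j - i - 1) cleavage sites.
        for j in range(i + 1, min(i + 1 + max_missed, last) + 1):
            peptides.append(sequence[cuts[i]:cuts[j]])
    return peptides


# Hypothetical toy sequence, used only to show the behaviour.
print(tryptic_peptides("AGKCDRLMKEF", max_missed=0))
print(tryptic_peptides("AGKCDRLMKEF", max_missed=2))
```

For the toy sequence, allowing two missed cleavages increases the candidate list from four fully cleaved peptides to nine, which is why the missed-cleavage limit directly affects the size of the search space.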

Complete-genome sequence analysis of Bt S2160-1
Complete-genome sequencing of the Bt S2160-1 strain (GenBank accession numbers CP149952-CP149956) was performed on the MGISEQ-2000 platform and an Oxford Nanopore Technologies (ONT) PromethION P24 device at BGI (Shenzhen, China) in 2021. A total of 8,764,210 reads were obtained, representing 2,767,081,761 bp and 431× coverage of the genome. The whole genome comprises five replicons: one chromosome (5,425,419 bp, 35.36% GC content) and four plasmids, pS2160-1p1 (476,629 bp, 32.80% GC content), pS2160-1p2 (350,765 bp, 33.24% GC content), pS2160-1p3 (107,375 bp, 33.16% GC content), and pS2160-1p4 (56,062 bp, 34.47% GC content). A complete genome map of Bt S2160-1 was also constructed using the GC viewer server tool with default parameters (Fig. 1). MetaGeneMark predicted a total of 6553 coding genes in the Bt S2160-1 genome, comprising 5541 CDS in the chromosome and 512, 416, 144 and 80 coding genes in the four plasmids pS2160-1p1, pS2160-1p2, pS2160-1p3 and pS2160-1p4, respectively (Table 1). The chromosome also contains 42 rRNAs (14 23S, 14 16S, and 14 5S), 106 tRNAs, and 39 sRNAs. DNA regions related to genetic mobility were found along the chromosome and plasmids, including 14 large prophage regions and 124 insertion sequences encoding transposases (Tnp). Only one CRISPR was found in the chromosome and none in the four plasmids. This agrees with the observation that most Bt strains lack a functional CRISPR system, which may allow a higher frequency of horizontal gene transfer (HGT) and thus the acquisition of selective genetic traits for better adaptability to diverse environments 31.
Subsequently, the toxin-encoding genes were identified by performing a local BLASTP alignment query, combined with a search of the PFAM database and the established rules of Bt toxin classification 32. In total, 13 predicted toxins were identified and are shown in Table 2.
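The reported fold coverage follows from the replicon lengths and the total sequenced bases given in this section. The short calculation below is a sketch using only those reported values; the variable names are illustrative.

```python
# Replicon lengths (bp) as reported in the text.
REPLICON_BP = {
    "chromosome": 5_425_419,
    "pS2160-1p1": 476_629,
    "pS2160-1p2": 350_765,
    "pS2160-1p3": 107_375,
    "pS2160-1p4": 56_062,
}
TOTAL_READ_BP = 2_767_081_761  # total sequenced bases reported in the text

genome_bp = sum(REPLICON_BP.values())
coverage = TOTAL_READ_BP / genome_bp

print(genome_bp, round(coverage))
```

The five replicons sum to 6,416,250 bp, and dividing the sequenced bases by this length gives approximately 431×, in agreement with the stated coverage.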

The mosquitocidal toxin island on the plasmid pS2160-1p2 of Bt S2160-1
According to the sequence analysis, the mega-plasmid pS2160-1p2 of Bt S2160-1 harbors all the genes that encode potential pesticidal proteins (PPs), which are grouped in PAIs; their annotation followed the new database of PPs. The plasmid pS2160-1p2 contains two PAIs, designated PAI-1 and PAI-2 (Fig. 2). PAI-1 is 101.6 kb long and harbors eleven PP genes: nine cry genes encoding 3-domain, Dipteran-specific Cry proteins (Cry54Ba, Cry30Ea4, Cry69Aa-like, Cry50Ba2-like, Cry4Ca1-like, Cry30Ga2, Cry71Aa-like, Cry72Aa-like, Cry70Aa-like) and two genes encoding other insecticidal toxin classes (Cyt1Da2-like and Vpb4C1-like) (Fig. 2). PAI-2 is 30.6 kb long and contains two genes encoding PPs, namely Cyt1Aa and Tpp80Aa1. The toxin genes on this plasmid form mosquitocidal toxin islands that are similar to pathogenicity islands 33.

The virulence factors in Bt S2160-1 genome
Bt encodes a large number of virulence-associated genes. We searched for these genes and found that most of them were located in the chromosome of the Bt S2160-1 strain, while some virulence factors, such as chitin-binding protein (cbp-2) and collagenases (colA-1 and colA-2), were found in the mega-plasmid pS2160-1p2 (Table 3).

Nano LC-MS/MS analysis of the whole proteome of Bt S2160-1
The protein expression profile of strain Bt S2160-1, as determined by SDS-PAGE, indicated that the toxin proteins exhibited a range of molecular masses, producing bands at 130-150 kDa, 70-80 kDa, 55-60 kDa, 45-50 kDa, and 30 kDa (Fig. 3A-D). A single band on an SDS-PAGE gel can be composed of several proteins of different molecular mass, and protein expression can vary at different points in the life cycle. Therefore, to identify as many expressed proteins as possible, all of the proteins expressed by the Bt S2160-1 strain from the vegetative phase of growth to late sporulation were analyzed by nano LC-MS/MS.
As a result, 159,250 raw mass spectra were obtained by LTQ-Orbitrap Elite MS analysis. Using the SEQUEST search engine, 39,031 PMFs (peptide mass fingerprints) and 9282 PSMs (peptide spectrum matches) were retrieved based on the predicted ORF protein data of the Bt S2160-1 genome and used as the target library. The target library was composed of 9017 unique peptides. Based on the obtained spectra, 1959 proteins with high scores were identified (Fig. 3E).
A total of 1959 proteins were identified in the MS analysis, representing a coverage of 29.89% of the 6553 proteins encoded in the genome sequence of Bt S2160-1. The remaining 70.11% of the predicted proteins were not identified in our analysis. The number of proteins identified in our study is greater than the number obtained by Huang et al. 34, who identified a total of 1480 proteins, of which 918, 703, and 778 were identified by LC-MS/MS from the middle vegetative, early sporulation, and late sporulation stages of growth, respectively.
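The coverage figures quoted above can be reproduced directly from the two reported counts. This is a minimal arithmetic sketch; the variable names are illustrative.

```python
identified = 1959   # proteins identified by MS (from the text)
predicted = 6553    # predicted protein-coding genes (from the text)

pct_identified = 100 * identified / predicted
pct_missing = 100 - pct_identified

print(f"{pct_identified:.2f}% identified, {pct_missing:.2f}% not identified")
```

Rounded to two decimals, this gives 29.89% identified and 70.11% not identified, matching the values in the text.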
Our higher number of identified proteins could be attributed to the inclusion of a greater number of time points (18 sampling points) in the protein mixture that was analyzed. Apart from putative proteins predicted by whole-genome sequencing that may not be expressed during growth of the strain, other factors may be responsible for the incomplete identification. First, it may have been due to the discontinuity in sampling, which may have missed many proteins. Second, the methods used for sample preparation may have resulted in the loss of many proteins and/or the inability to obtain reliable spectra, so that the proteins could not be identified. Lastly, the expression of some proteins may be so low that they did not produce spectra that could be used for identification. Ten of the 13 predicted toxin proteins were found to be expressed in our MS analysis of protein expression (Table 4); the remaining three (pS2160-1p2_id_6113, pS2160-1p2_id_6159 and pS2160-1p2_id_6259) were not identified. These results indicate that the majority, if not all, of the predicted toxin proteins are likely to be expressed. Failure to identify three of the encoded toxin proteins may be due to their absence from the collected samples, to levels too low to detect, or to their loss during sample preparation. Inability to detect these three proteins does not mean they are not expressed in vivo. Therefore, subsequent in vivo and in vitro studies are needed to determine whether these proteins are actually expressed and to determine their potential function.

Discussion
At present, whole-genome sequencing is widely used to mine insecticidal protein genes. Pacheco et al. used whole-genome sequencing analysis of B. thuringiensis GR007 to reveal multiple pesticidal protein genes active against M. sexta and S. frugiperda larvae 35. Naveenarani et al., also using whole-genome sequencing, first reported that a B. thuringiensis isolate (Bt 62) harboring two novel cry8 genes was toxic to Holotrichia serrata 36. In the current study, the complete genome of the mosquitocidal Bt S2160-1 strain was sequenced on the MGISEQ-2000 platform and an Oxford Nanopore Technologies (ONT) PromethION P24 device at BGI (Shenzhen, China) in 2021. Bt S2160-1 contains four mega-plasmids, pS2160-1p1, pS2160-1p2, pS2160-1p3 and pS2160-1p4, whose sequences were assembled with the help of bioinformatic software. The large plasmid pS2160-1p2 of Bt S2160-1 harbors all the genes that encode potential pesticidal proteins (PPs), which are grouped in PAIs, namely PAI-1 and PAI-2 (Fig. 2). PAI-1, containing eleven PP genes, is 101.6 kb, while PAI-2, containing two PP genes, is 30.6 kb (Fig. 2). Thus, it was apparent that all of these toxin-like genes on plasmid pS2160-1p2 were clustered together. Since Hacker et al. 33 first proposed the concept of pathogenicity islands in 1990, they have been the subject of much research. A pathogenicity island (PAI), also known as a virulence island, is a cluster of genes in pathogenic bacteria that has specific structural characteristics and functions, primarily coding for products related to pathogenic virulence and bacterial metabolism. Pathogenicity islands can be up to 190 kb in size, and are often located in or near a tRNA gene or flanked by insertion sequences 37. Researchers have now identified numerous functionally related clusters of genes, including those that encode antibiotic resistance (antibiotic resistance islands) and clusters of genes associated with adaptive metabolic properties (metabolic islands) 38.
Pathogenicity islands are sets of virulence-related DNA sequences in pathogenic bacteria, whose presence may play a key role in the evolution of pathogens 34. All potential pesticidal proteins are encoded on large megaplasmids as PAIs, with genes either individual or grouped, accompanied by repeat sequences, insertion elements, and transposases, which may allow a higher recombination rate among diverse Bt strains 39,40. Different Bt strains harbor several pesticidal proteins (PPs), forming multiple PAIs. PAI-1.1 is the longest PAI region in the Bt GR007 strain and contains seven different cry genes 35, which encode lepidopteran-specific Cry proteins (Cry1Da, Cry1Id, Cry1Ja, Cry1Nb, Cry1Ab, Cry1Bb, and Cry1Hb) 41. This PAI-1.1 also encodes a cluster of five insecticidal toxin components (Tcs). The Tcs proteins were originally identified in the enterobacteria P. luminescens and Xenorhabdus nematophila, which are symbionts of nematodes 42. Here, the eleven predicted toxin genes (Cry54Ba, Cry30Ea4, Cry69Aa-like, Cry50Ba2-like, Cry4Ca1-like, Cry30Ga2, Cry71Aa-like, Cry72Aa-like, Cry70Aa-like, Cyt1Da2-like and Vpb4C1-like) are located on PAI-1, where we speculate that Tcs proteins like those described above may also exist. PAI-2 contains two potential toxin genes (Cyt1Aa and Tpp80Aa1) that are adjacent to each other, and Tpp80Aa1 (Cry80Aa1) is also recognized for its toxicity against the dipteran insect C. pipiens pallens 23. In addition, each predicted toxin gene is arranged on the plasmid as part of a gene cluster, and each is preceded or followed by one or more genes encoding transposases, such as IS4, IS6, IS1182, and Tn3 family transposases. There is also a sigma-family transcriptional regulator and an insertion sequence, IS231S (a transposable element), downstream of all the toxin genes. Collectively, these elements form a mosquitocidal island similar to the virulence islands of pathogenic bacteria.
As a next step, we need to express all thirteen of the predicted toxin-encoding genes in Escherichia coli and verify the expressed proteins by MALDI-TOF mass spectrometry. Bioassays will then be performed to test the mosquitocidal activity of all thirteen expressed proteins, including their synergistic activity against third-instar larvae of C. pipiens. These experiments are currently under way. However, determining the exact contribution of each toxin (alone or in combination) to the overall toxicity of the crystal inclusions is difficult, even in the Bti strain, in which synergistic activity between toxins has been well studied. Comparing the levels of toxicity reported by different studies can also be problematic due to differences in experimental conditions. Factors that affect the measured toxicity include host-dependent differences among the expression systems used; differences in the size, quality, and solubility of the crystals formed 43; differences in protein solubility or in the form of the reprecipitated protein presented to the larvae; and differences in bioassay conditions (including larval age, dietary habits, larval batches, and natural variation in insect populations) 44. Therefore, determining all of these mosquitocidal activities will take more time.
Previous studies have demonstrated that the IS231 insertion sequence, belonging to the IS4 family of transposases, may provide the mobility to cry genes that is needed to form typical composite transposons 33. Notably, the flanking repeat sequence IS231W found in B. thuringiensis subsp. israelensis (Bti) is adjacent to the cry11Aa gene 45. Similarly, the flanking repeat sequences IS231S and IS132C are found in both the pS2160-1p1 and pS2160-1p2 plasmids, and all 13 toxin genes located on the pS2160-1p2 plasmid cluster together (Fig. 2). Therefore, we speculate that the cluster of toxin-encoding genes in the Bt S2160-1 strain represents a mosquitocidal toxin island. Importantly, several IS231-family insertion sequences associated with the cry gene structure are also observed in the pS2160-1p2 plasmid. These elements may provide the mobility needed for cry genes to form a typical composite transposon 46,47.
Whole-proteome analysis provided evidence that ten of the thirteen predicted toxin proteins were expressed during the growth phases of the Bt S2160-1 strain; however, three of the predicted toxin proteins (pS2160-1p2_id_6113, pS2160-1p2_id_6159 and pS2160-1p2_id_6259) could not be confirmed by nano LC-MS/MS analysis. Because sample preparation for proteomic analysis is complicated and requires multiple recovery steps, the levels of these three proteins after processing may have been too low to detect. In future studies, we can increase the amount of initial protein and reduce the number of steps at which protein is lost, or run several parallel protein preparations and pool them evenly, so that all proteins reach detectable levels. When RT-PCR was conducted with Bt S2160-1 cDNA as the template, however, transcription products for these three genes were obtained (data not shown), indicating that these three genes are transcribed. Therefore, we will use alternative detection methods for these three predicted toxins.
Although the mechanisms of sporulation and ICP formation in B. thuringiensis have been investigated for many years 48-51, little has been reported on the signaling pathways involved. Zheng et al. 52 found that CdaS promotes sporulation in B. thuringiensis, and some c-di-AMP targets, which are effector proteins that affect sporulation and the formation of parasporal crystals, were obtained by an affinity method. When exposed to susceptible G. mellonella, B. thuringiensis rapidly modulated gene expression to affect sporulation. Therefore, it will be of great interest to determine how these 13 ICP-encoding genes arose and how these toxins relate to sporulation in the Bt S2160-1 strain.
Overall, the present study provided a complete-genome assembly of Bt S2160-1 as an alternative to the use of Bti and identified the presence of a dominant pathogenicity island of Bt toxins. As a next step, we need to clone and express all 13 toxin proteins, determine their mosquitocidal activity, and check whether synergistic effects among them contribute to the high mosquitocidal activity of the Bt S2160-1 strain. The use of Bt S2160-1 to control mosquitoes, as well as other dipteran pests, could therefore provide a great benefit to human health and agriculture.

Conclusion
In summary, the present study identified two mosquitocidal toxin islands in the Bt S2160-1 strain containing thirteen potential toxin genes, and expression of ten of the thirteen predicted genes was confirmed by whole-proteome analysis using LTQ-Orbitrap LC-MS/MS. These findings provide important genomic information for understanding the insecticidal mechanism of B. thuringiensis. https://doi.org/10.1038/s41598-024-66048-3

Figure 1. Visual map of the complete genome of Bt S2160-1.

Figure 3. The expression profile of the proteins of Bt S2160-1 by SDS-PAGE analysis and whole-proteome analysis of Bt S2160-1 by LTQ-Orbitrap nano LC-MS/MS. Note: (A-C) SDS-PAGE of Bt S2160-1 proteins expressed over three consecutive days, in which samples were collected once every 4 h at 18 time points covering 4-72 h; (D) the protein profile of Bt S2160-1 identified by SDS-PAGE based on mixing the expressed proteins harvested over three consecutive days; (E) analysis of nano LC-MS/MS data based on a query of the SEQUEST database. M: Fermentas PageRuler™ Unstained Protein Ladder. PMF, peptide mass fingerprinting; PSM, peptide spectrum match.

Table 1. Features of the genome of Bt S2160-1.

Table 4. The predicted toxin proteins detected by LTQ-Orbitrap MS in total protein samples of Bt S2160-1.