Complete genome sequence of the nematicidal Bacillus thuringiensis MYBT18246

Bacillus thuringiensis is a rod-shaped, facultative anaerobic, spore-forming bacterium of the genus Bacillus. The defining feature of the species is the ability to produce parasporal crystal inclusion bodies consisting of δ-endotoxins, which are encoded by cry genes. Here we present the complete annotated genome sequence of the nematicidal B. thuringiensis strain MYBT18246. The genome comprises one 5,867,749 bp chromosome and 11 plasmids, which vary in size from 6330 bp to 150,790 bp. The chromosome contains 6092 protein-coding and 150 RNA genes, including 36 rRNA genes. The plasmids encode 997 proteins and 4 tRNAs. Analysis of the genome revealed a large number of mobile elements involved in genome plasticity, including 11 plasmids and 16 chromosomal prophages. Three different nematicidal toxin genes were identified and classified according to the Cry toxin naming committee as cry13Aa2, cry13Ba1, and cry13Ab1. Strikingly, these genes are located on the chromosome in close proximity to three separate prophages. Moreover, four putative toxin genes of different toxin classes were identified on the plasmids p120510 (Vip-like toxin), p120416 (Cry-like toxin) and p109822 (two Bin-like toxins). A comparative genome analysis of B. thuringiensis MYBT18246 with three closely related B. thuringiensis strains enabled determination of the pan-genome of B. thuringiensis MYBT18246, revealing a large number of singletons, mostly represented by phage genes, morons and cryptic genes.


Introduction
Bacillus thuringiensis is a ubiquitously distributed, rod-shaped, Gram-positive, spore-forming, facultative anaerobic bacterium [1,2]. It has been isolated from various ecological niches, including soil, aquatic habitats, the phylloplane and insects [3][4][5][6][7]. The defining property of the species is the ability to produce parasporal protein crystals consisting of δ-endotoxins, which are predominantly encoded on plasmids [1,8,9]. These proteins are toxic towards a wide spectrum of invertebrates of the orders Lepidoptera, Diptera, Coleoptera, Hymenoptera, Homoptera, Orthoptera and Mallophaga, and towards other organisms such as Gastropoda, mites, protozoa and especially nematodes [7,10,11,12]. In addition, B. thuringiensis produces further toxins such as Cyt, Vip, and Sip toxins [13]. Cry toxins represent the largest group and can be subdivided into three different homology groups. In total, more than 787 different Cry toxins have been identified, each exhibiting toxicity against a specific host organism [14]. B. thuringiensis strains can produce more than one Cry toxin, resulting in a broad host range. As such, B. thuringiensis has been used widely as a biopesticide in agriculture for several decades [1,2,8,13,15,16].
Bacillus thuringiensis is a member of the genus Bacillus, which comprises low GC-content, Gram-positive bacteria with a respiratory metabolism and the ability to form heat- and desiccation-resistant endospores [11,17,18]. Within this genus, B. thuringiensis is a member of the Bacillus cereus sensu lato (Bcsl) species group, which originally contained seven different species (B. cereus, B. anthracis, B. thuringiensis, B. mycoides, B. pseudomycoides, B. weihenstephanensis and B. cytotoxicus) [17][18][19][20][21][22][23][24][25]. Historically, strain classification relied mostly on pathogenic and phenotypic properties. However, recent publications utilizing genomic criteria suggest that the species group should be extended by the species B. toyonensis [26,27]. Moreover, the three proposed species "Bacillus gaemokensis" [28], "Bacillus manliponensis" [29] and "Bacillus bingmayongensis" [30] have been isolated and effectively published. However, these names had not yet appeared on a Validation List at the time of publication [31]. Due to the very close phylogenetic relationships, it has also been proposed to assign the eleven species to a single extended Bcsl species [32,33]. The genomes of Bcsl members contain a highly conserved chromosome with regard to gene content, sequence similarity and genome synteny, while variation can be observed within mobile genomic elements such as prophages, insertion elements, transposons, and plasmids [34]. Due to the significance of Bcsl group members in human health, the food industry and agriculture, resolving the phylogeny is of great importance. Because the 16S rRNA genes are highly conserved, the classical 16S phylogeny of Bcsl strains is inconclusive. Thus, a combination of 16S and a seven-gene multilocus sequence typing (MLST) scheme has been used to establish taxonomic relationships within species of the Bcsl group [35,36].
Comparative genomics of the cry gene loci has revealed remarkable proximity to elements of genome plasticity such as plasmids, transposons, insertion elements and prophages [2,37,38,39]. The activity of these mobile elements has resulted in a multitude of highly diverse plasmid sizes through rearrangements such as deletions and insertions, as well as migration of cry genes into the bacterial chromosome [40]. The worldwide distribution of B. thuringiensis and its capacity to adapt to a diverse spectrum of invertebrate hosts is explained by the formation of spores and a remarkable variability in crystal protein families [13].
This toxin arsenal, especially the copy number of individual toxin genes, can be shaped by reciprocal co-adaptation with a nematode host, as previously demonstrated using controlled evolution experiments in the laboratory [41,42]. The B. thuringiensis strain MYBT18246 described herein and its host Caenorhabditis elegans have been selected as a model system for such co-evolution experiments [41]. One aim of this sequencing project was to provide a high-quality reference genome sequence for the original B. thuringiensis MYBT18246 in order to obtain a detailed phylogeny and shed light on the evolution of this microparasite, with a particular focus on the presence of virulence factors, elements of genome plasticity and host adaptation factors. Here we present the genome of the nematicidal B. thuringiensis MYBT18246 and a comparative analysis with its three closest relatives as identified by MLST phylogeny.

Classification and features
Bacillus thuringiensis belongs to the genus Bacillus; it was first isolated at the end of the nineteenth century [17,20] and has been used as a biocontrol agent for several decades [7,18,21]. Strain B. thuringiensis MYBT18246 is a Gram-positive, rod-shaped and spore-forming bacterium (Fig. 1a), like most B. thuringiensis strains [7]. Bacillus thuringiensis MYBT18246 was isolated in the Schulenburg lab by AS from a mixture of genotypes present in the strain NRRL B-18246, originally provided by the Agricultural Research Service Patent Culture Collection (United States Department of Agriculture, Peoria, IL, USA) [43][44][45]. As a member of the species B. thuringiensis, strain MYBT18246 is a motile facultative anaerobe and is able to produce parasporal crystal toxins, the characteristic feature of this species [2]. Growth occurred at temperatures ranging from 10 to 48 °C, and optimal growth was observed at mesophilic temperatures ranging from 28 to 37 °C [46]. B. thuringiensis strains grow at pH 4.9 to 8.0, with a documented optimum of pH 7 [47,48]. Strain B. thuringiensis MYBT18246 forms flat, opaque colonies with undulate, curled margins and produces crystals during the stationary phase (Fig. 1a-b). Characteristic features of B. thuringiensis MYBT18246 are listed in Table 1.

Extended feature descriptions
The cell size of Bacillus thuringiensis can vary from 0.5 × 1.2 μm to 2.5 × 10 μm [11]. Assignment to the Gram-positive organisms was confirmed by Gram staining, as shown in Fig. 1a. The production of Cry toxins can be observed in Fig. 1b. These toxins accumulate next to the endospore during the sporulation phase and form phase-bright inclusions [7]. Bacillus thuringiensis MYBT18246 exhibited 99% 16S rRNA sequence identity to other published Bcsl members [49]. Because of this high sequence similarity, B. thuringiensis MYBT18246 cannot be differentiated phylogenetically from other Bcsl group members on the basis of 16S rRNA sequences (Fig. 2a). As an alternative, 23 B. thuringiensis strains and a representative of each of the Bcsl group species were chosen for phylogenetic analysis using multilocus sequence typing (MLST), as previously developed by Priest [36] (Fig. 2b). Bacillus subtilis subsp. subtilis str. 168 was selected as an outgroup to root the tree [17,18]. The phylogenies were generated using the Neighbor-Joining method [50], and evolutionary distances were computed by the Maximum Composite Likelihood method [51]. In total, 217 MLST gene sequences were compared with 1000 bootstrap replicates. Phylogenetic analysis was conducted in MEGA7 [52]. All reference sequences used were retrieved from GenBank, hosted at NCBI.
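The bootstrap idea behind the node support values can be illustrated with a minimal sketch. It is not the MEGA7 procedure used here: it swaps in a simple p-distance for the Maximum Composite Likelihood distance and, instead of building Neighbor-Joining trees, only asks how often a given pair of taxa remains the closest pair when alignment columns are resampled. The sequences and the names `strain_A`, `strain_B`, `strain_C` and `outgroup` are hypothetical.

```python
import random


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement (one pseudo-alignment)."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}


def closest_pair(alignment):
    """Return the two taxa with the smallest p-distance as a frozenset."""
    names = sorted(alignment)
    candidates = [
        (p_distance(alignment[a], alignment[b]), a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
    ]
    _, a, b = min(candidates)  # ties are broken alphabetically by name
    return frozenset((a, b))


def pair_support(alignment, pair, replicates=1000, seed=0):
    """Fraction of bootstrap replicates in which `pair` is the closest pair."""
    rng = random.Random(seed)
    target = frozenset(pair)
    hits = sum(
        closest_pair(bootstrap_replicate(alignment, rng)) == target
        for _ in range(replicates)
    )
    return hits / replicates


# Hypothetical toy alignment; real input would be the concatenated MLST loci.
toy_alignment = {
    "strain_A": "ACGTACGTAC",
    "strain_B": "ACGTACGTAT",
    "strain_C": "ACGAACCTGT",
    "outgroup": "TCGAAGCTGA",
}

support = pair_support(toy_alignment, ("strain_A", "strain_B"))
print(f"support for (strain_A, strain_B): {support:.2f}")
```

A real analysis scores every bipartition of the inferred tree in each replicate; the closest-pair proxy is used only to keep the sketch short.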

Genome sequencing information
Genome project history
Bacillus thuringiensis MYBT18246 was used in a coevolution study with a Caenorhabditis elegans host. The original strain MYBT18246 was selected for sequencing in order to generate a reliable reference sequence for subsequent experiments [41,42]. The genome sequence was analyzed to identify virulence factors and fitness factors contributing to the efficient infection of C. elegans. Additionally, the phylogenetic position of B. thuringiensis MYBT18246 in the Bcsl group was determined [41]. The complete genome sequence has been deposited in GenBank under the accession numbers CP015350-CP015361 and in the Integrated Microbial Genomes database under the Taxon ID 2671180122 [53]. A summary of the project information and its association with MIGS version 2.0 compliance [54] is shown in Table 2.

Growth conditions and genomic DNA preparation
Genomic DNA was isolated from B. thuringiensis MYBT18246 using the DNeasy Blood and Tissue Kit (Qiagen, Hilden, Germany) for 454 pyrosequencing [55] and the Genomic-tip 100/G Kit (Qiagen, Hilden, Germany) for single-molecule real-time (SMRT) sequencing [56], according to the manufacturer's instructions. For SMRT sequencing, the protocol "Procedure and Checklist: Greater than 10 kb Template Preparation Using AMPure PB Beads" was used, and blunt-end ligation was applied overnight. Whole-genome sequencing was performed using a 454 GS-FLX system (Titanium GS70 chemistry; Roche Life Science, Mannheim, Germany) and on one SMRT Cell on the PacBio RSII system using P6 chemistry (Pacific Biosciences, Menlo Park, CA, USA).

Genome sequencing and assembly
A summary of the project information can be found in Table 2. 454 pyrosequencing was carried out at the Institute of Clinical Molecular Biology in Kiel, Germany, and SMRT sequencing at the DSMZ in Braunschweig. First, approximately 331,000 454 reads with an average length of 600 bp were assembled using the Newbler 2.8 de novo assembler (Roche Diagnostics), resulting in 729 [...]
Fig. 2 (caption, partially recovered): The comparison includes strains of the Bacilli clade or Bcsl group members (blue). Paenibacillus larvae subsp. larvae DSM 25430 or Bacillus subtilis subsp. subtilis str. 168 was used as an outgroup to root the tree. Sequences were aligned using ClustalW 1.6 [91,92]. The phylogenetic tree was constructed using the Neighbor-Joining method [50], and evolutionary distances were computed by the Maximum Composite Likelihood method [51] within MEGA7.0 [52]. Numbers at the nodes are bootstrap values calculated from 1000 replicates.

Genome annotation
Annotation was performed with Prokka v1.9 [63] using the manually curated Bacillus thuringiensis strain Bt407 [64] as a species reference and a comprehensive toxin protein database (including Cry, Cyt, Vip, and Sip toxins) as feature references. The Prokka pipeline was applied using Prodigal for gene calling [65]. RNAmmer 1.2 [66] and Aragorn [67] were used for rRNA gene and tRNA identification, respectively. Additionally, signal leader peptides were identified with SignalP 4.0 [68] and non-coding RNAs with an Infernal 1.1 search against the Rfam database [69]. The annotation of cry toxin genes was manually corrected, and the genes were named according to the standards of the Cry toxin nomenclature by Crickmore [70]. Identified toxins were deposited at the Bacillus thuringiensis Toxin nomenclature database [14].
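As a rough illustration of how such an annotation run might be set up, the sketch below assembles a Prokka command line in Python without executing it. The exact invocation used for MYBT18246 is not given in the text, so the file names `assembly.fasta` and `reference_proteins.faa`, the output directory and the CPU count are hypothetical. The options shown are those documented for current Prokka releases, and their availability in v1.9 is an assumption.

```python
import shlex


def build_prokka_command(contigs_fasta, trusted_proteins_fasta,
                         outdir="prokka_MYBT18246", prefix="MYBT18246",
                         cpus=4):
    """Assemble (but do not run) a Prokka command line as an argument list.

    All paths and defaults are placeholders, not the authors' settings.
    """
    return [
        "prokka",
        "--outdir", outdir,
        "--prefix", prefix,
        "--kingdom", "Bacteria",
        "--genus", "Bacillus",
        "--species", "thuringiensis",
        "--strain", "MYBT18246",
        # Trusted proteins searched first, e.g. a curated reference genome
        # merged with a toxin protein collection (assumed workflow).
        "--proteins", trusted_proteins_fasta,
        "--rnammer",  # prefer RNAmmer for rRNA prediction
        "--rfam",     # enable the Infernal/Rfam ncRNA search
        "--cpus", str(cpus),
        contigs_fasta,
    ]


command = build_prokka_command("assembly.fasta", "reference_proteins.faa")
print(shlex.join(command))
```

The list form can be passed directly to `subprocess.run(command, check=True)` once Prokka and its dependencies are installed.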

Genome properties
The genome of B. thuringiensis MYBT18246 consists of 12 replicons, including a circular chromosome of 5,867,749 bp (Table 3). The GC content of the chromosome is 35%, and the GC content of the plasmids ranges from 32 to 37%. The total number of protein-coding genes is 7089, with 6092 genes on the chromosome and 997 genes on the plasmids. The genome harbors 12 rRNA clusters, 111 tRNA genes, 5274 predicted protein-coding genes with assigned function and 1815 genes encoding proteins with unknown function (Table 4). All gene products have been assigned to COGs (Table 5). The genome sequence of B. thuringiensis MYBT18246 is available in GenBank (CP015350 for the chromosome and CP015351-CP015361 for the plasmids).
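The gene counts reported above can be cross-checked with a few lines of arithmetic. The sketch below does this and also shows how a GC percentage is computed from a sequence. The constants are the figures quoted in the text (the 36 rRNA genes are taken from the abstract); the function name `gc_content` and the test sequence are illustrative assumptions.

```python
def gc_content(sequence):
    """GC content in percent, counting only unambiguous A/C/G/T bases."""
    seq = sequence.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


# Figures quoted in the text.
CHROMOSOMAL_CDS = 6092
PLASMID_CDS = 997
TOTAL_CDS = 7089
CDS_WITH_FUNCTION = 5274
CDS_UNKNOWN_FUNCTION = 1815
RRNA_CLUSTERS = 12
RRNA_GENES = 36

# Chromosomal and plasmid genes add up to the reported total.
assert CHROMOSOMAL_CDS + PLASMID_CDS == TOTAL_CDS
# Genes with and without assigned function also add up to the total.
assert CDS_WITH_FUNCTION + CDS_UNKNOWN_FUNCTION == TOTAL_CDS

genes_per_cluster = RRNA_GENES / RRNA_CLUSTERS
unknown_share = 100.0 * CDS_UNKNOWN_FUNCTION / TOTAL_CDS

print(f"rRNA genes per cluster: {genes_per_cluster:.0f}")
print(f"share of genes with unknown function: {unknown_share:.1f}%")
print(f"GC content of a toy sequence: {gc_content('ATGCATTA'):.1f}%")
```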

Insights from the genome sequence
To investigate the phylogeny of B. thuringiensis MYBT18246, two approaches were used. First, nineteen Bacillus strains were chosen for 16S rRNA analysis within the Bacillus clade (Fig. 2a). The 16S rRNA phylogeny shows that B. thuringiensis MYBT18246 clusters with other Bcsl group members within the Bacillus clade. However, the low bootstrap values confirm the limitations of 16S rRNA as a discriminatory marker within the Bcsl species group. Second, we applied an MLST approach based on the scheme by Priest et al. [36]. This revealed that MYBT18246 clusters with the toxin-cured B. thuringiensis Bt407, the insecticidal B. thuringiensis serovar chinensis CT-43, and the nematicidal B. thuringiensis YBT-1518 within the Bcsl phylogeny (Fig. 2b). Based on this phylogeny and the defining phenotypic feature of the B. thuringiensis species group (the ability to produce crystal toxins against invertebrates, including nematodes), strain MYBT18246 can be safely classified as a nematicidal B. thuringiensis.
For a detailed analysis of the toxins encoded by B. thuringiensis MYBT18246, we generated a local database consisting of all available Cry, Cyt, Vip and Sip protein sequences from UniProtKB [71] and GenBank [72]. The database was curated to generate a set of non-redundant reference toxins. In total, we identified three different cry toxin genes in the B. thuringiensis MYBT18246 genome and classified them as cry13Aa2 (>95%), cry13Ba1 (<78%) and cry13Ab1 (<95%), based on the sequence similarity scheme of the Cry toxin naming committee by Crickmore [13,70]. Notably, these cry toxin genes are encoded on the chromosome and not on extrachromosomal elements, as has been previously reported for the vast majority of cry toxin genes [7,73,74]. The toxin gene analysis revealed four additional putative toxin-like genes on plasmids with sequence similarity to cry genes and vip genes. A Pfam domain analysis using InterPro [75] revealed a putative Vip-like toxin encoded on p120510, a putative Cry-like toxin encoded on p120416 and two putative Bin-like toxins encoded on p109822, which are candidates for future studies.
Additionally, the B. thuringiensis MYBT18246 chromosome was screened for prophage regions using the Phage Search Tool (PHAST) with default parameters. PHAST identifies prophage regions based on key genes from a reference database and defines the boundaries using a genomic composition-based algorithm. For a more detailed description see [76]. A total of 16 putative prophage loci were identified in the chromosome, including three that were associated with the previously identified chromosomally encoded cry toxin genes. As shown in Fig. 3, the cry toxin genes (displayed in red, track 4) are located in close proximity to identified prophage regions (displayed in blue, track 3). Furthermore, all B. thuringiensis MYBT18246 extrachromosomal elements were also screened for prophages to check whether we could identify phages that reside in a linear or circular state in the host, as was reported in 2013 by Fortier et al. [77]. According to the PHAST scoring system, apparently intact phage regions were identified on p150790, p120416, p109822, p101287 and p46701.
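The percentages given with the three cry13 gene names correspond to rank boundaries in the Cry toxin nomenclature, which assigns the four ranks of a name (for example 13, A, a, 2) from amino acid identity to already named toxins, with boundaries at approximately 45%, 78% and 95%. A minimal sketch of that mapping follows. The function name, the handling of values exactly on a boundary and the example identities are assumptions, and the real assignment is made by the nomenclature committee from a clustering of all known toxins rather than from a single pairwise value.

```python
def first_differing_rank(identity_percent):
    """Return the highest name rank at which a new toxin differs from its
    closest named relative, given their amino acid identity in percent.

    Boundaries (45 / 78 / 95 %) follow the Cry toxin nomenclature; the
    treatment of values exactly on a boundary is an assumption.
    """
    if not 0.0 <= identity_percent <= 100.0:
        raise ValueError("identity must be between 0 and 100")
    if identity_percent < 45.0:
        return "primary"      # new number, e.g. Cry13 -> Cry14
    if identity_percent < 78.0:
        return "secondary"    # new uppercase letter, e.g. Cry13A -> Cry13B
    if identity_percent < 95.0:
        return "tertiary"     # new lowercase letter, e.g. Cry13Aa -> Cry13Ab
    return "quaternary"       # new final number, e.g. Cry13Aa1 -> Cry13Aa2


# Hypothetical identities chosen only to fall into the ranges quoted above.
examples = {"cry13Aa2": 97.0, "cry13Ab1": 90.0, "cry13Ba1": 70.0}
for gene, identity in examples.items():
    print(gene, identity, first_differing_rank(identity))
```

Under this reading, cry13Aa2 differs from an existing entry only at the fourth rank, cry13Ab1 opens a new tertiary rank and cry13Ba1 opens a new secondary rank, which matches the names assigned in the text.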
The finding of prophage associated cry genes in strain MYBT18246 indicates that phages may serve as vectors for the transmission of virulence factors within the species B. thuringiensis. This resembles the previously described lysogenic conversion of pathogens by phages [78], supporting the idea that phages may represent a driving force for the distribution of fitness factors as well as virulence factors [78][79][80]. The finding that toxins, which are generally specific for a certain type of host organism, are located within a mobile genomic element in the chromosome of this bacterium, suggests that phages of strain MYBT18246 may contribute to adaptation to different hosts [81][82][83].
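The notion of a toxin gene lying "in close proximity" to a prophage can be made concrete by measuring the gap in base pairs between a gene and each predicted prophage interval. The sketch below does this for linear coordinates. All coordinates, names and the 10 kb cutoff are hypothetical placeholders rather than values from the MYBT18246 annotation, and wrap-around on the circular chromosome is ignored.

```python
def gap_to_region(gene_start, gene_end, region_start, region_end):
    """Gap in bp between a gene and a region; 0 if they overlap."""
    if gene_end < region_start:
        return region_start - gene_end
    if gene_start > region_end:
        return gene_start - region_end
    return 0


def nearest_prophage(gene, prophages):
    """Return (name, gap) of the prophage closest to `gene`.

    gene: (start, end); prophages: dict mapping name -> (start, end).
    """
    if not prophages:
        raise ValueError("no prophage regions given")
    name = min(prophages, key=lambda n: gap_to_region(*gene, *prophages[n]))
    return name, gap_to_region(*gene, *prophages[name])


# Hypothetical coordinates for illustration only.
prophages = {
    "prophage_1": (100_000, 140_000),
    "prophage_2": (900_000, 955_000),
}
genes = {
    "toxin_gene_x": (142_500, 146_000),
    "unrelated_gene": (500_000, 501_200),
}

MAX_GAP = 10_000  # assumed cutoff for "close proximity"
for gene_name, coords in genes.items():
    phage, gap = nearest_prophage(coords, prophages)
    label = "adjacent" if gap <= MAX_GAP else "distant"
    print(f"{gene_name}: nearest {phage}, gap {gap} bp ({label})")
```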

Extended insights
Based on the proximity within the tree (Fig. 2b), the genomes of B. thuringiensis Bt407, B. thuringiensis serovar chinensis CT-43 and B. thuringiensis YBT-1518 were identified as closest relatives and selected for an in-depth comparative analysis. Shared gene contents were determined, visualized and compared, with a focus on known virulence factors such as cry toxins and pathogenic driving forces such as phages. The analysis revealed unique as well as shared gene contents for each strain (Fig. 3). In Fig. 3, the outer rings represent the genes on the leading and lagging strand with COG classification. The inner rings (tracks 5-7) illustrate the orthologous genes of B. thuringiensis YBT-1518, B. thuringiensis CT-43 and B. thuringiensis Bt407, colored from red (high similarity) to light yellow (low similarity) and white (no similarity). The circular representation of the chromosome comparison revealed that prophages are a major source of regional differences between the strains (Fig. 3). Additionally, the pan-genome of B. thuringiensis MYBT18246 compared to the three closest relatives was determined (Fig. 4).
Orthologous genes between all four organisms were identified by comparing the whole genomes using Proteinortho [84] with a similarity cutoff of 50% and an E-value of 1e-10. GenBank (gbk) files were downloaded from NCBI and the protein sequences were extracted using cds_extractor v0.7.1 [85]. Detected paralogous genes are displayed in the Venn diagram in Fig. 4. [...] (Fig. 2a and b) and it also refines the phylogenetic relationship of the strains to each other based on non-orthologous regions. Singletons are located on the chromosome as well as on extrachromosomal elements. The density of singletons is 2.5-fold higher on the plasmids. Notably, all major chromosomal differences can be attributed to prophage regions. All gene products were assigned to COG categories and investigated for Pfam domains and signal peptides (Table 6). In detail, the singleton genes code for (i) phage proteins, (ii) morons (virulence factors) and (iii), in the vast majority, proteins of cryptic function. This is supported by Fig. 3, which clearly shows that the regions of difference (tracks 5-7) directly correspond to the regions of identified phages (track 3). Moreover, the identified cry toxin genes (track 4) are adjacent to identified prophage regions and may therefore be morons. Additionally, the singletons were screened for further virulence factors, and genes encoding a type-IV secretion system, a C5-methyltransferase and type-restriction enzymes, as well as genes involved in sporulation, resistance and genetic competence, were identified. In particular, the finding of restriction-modification systems indicates a protection mechanism against other phages and plasmids and thus forms a putative barrier against further genomic modification.
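Once ortholog groups are available, the pan-genome bookkeeping behind a Venn diagram like Fig. 4 reduces to counting which strains are represented in each group. The sketch below assumes the groups have already been parsed into dictionaries mapping strain name to gene identifiers. The parsing of the Proteinortho output itself is not shown, the gene identifiers are hypothetical, and a "singleton" is taken here to mean a group found in exactly one strain, which may differ from the definition used for the reported numbers.

```python
from collections import Counter

STRAINS = ["MYBT18246", "Bt407", "CT-43", "YBT-1518"]


def summarize_pan_genome(groups, strains):
    """Count core, accessory and strain-specific (singleton) ortholog groups.

    groups: list of dicts mapping strain name -> list of gene identifiers.
    """
    core = 0
    accessory = 0
    singletons = Counter({strain: 0 for strain in strains})
    for group in groups:
        present = [s for s in strains if group.get(s)]
        if len(present) == len(strains):
            core += 1                    # found in every strain
        elif len(present) == 1:
            singletons[present[0]] += 1  # found in one strain only
        elif present:
            accessory += 1               # shared by some, not all
    return {"core": core, "accessory": accessory,
            "singletons": dict(singletons)}


# Hypothetical ortholog groups for illustration only.
toy_groups = [
    {"MYBT18246": ["g1"], "Bt407": ["b1"], "CT-43": ["c1"], "YBT-1518": ["y1"]},
    {"MYBT18246": ["g2", "g3"], "Bt407": ["b2"], "CT-43": ["c2"], "YBT-1518": ["y2"]},
    {"MYBT18246": ["g4"], "YBT-1518": ["y3"]},
    {"MYBT18246": ["g5"]},
    {"MYBT18246": ["g6"]},
    {"Bt407": ["b3"]},
]

summary = summarize_pan_genome(toy_groups, STRAINS)
print(summary)
```

In the second toy group, MYBT18246 contributes two genes; such within-strain duplicates correspond to the paralogs that are counted separately in Fig. 4.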

Conclusion
In this work we present the whole-genome sequence of B. thuringiensis MYBT18246 and its specific genome features. The genome includes three nematicidal cry13 gene variants located on the chromosome, which were named according to sequence similarity, as stated by the Cry Toxin Nomenclature Committee, as cry13Aa2, cry13Ba1, and cry13Ab1. Four additional putative toxin genes with low sequence similarity to other known toxins were identified on plasmids: p120510 (Vip-like toxin), p120416 (Cry-like toxin) and p109822 (two Bin-like toxins). These toxins contain complete toxin domains, yet their activity against potential hosts should be elucidated in future studies. The genome comprises a large number of mobile elements involved in genome plasticity, including eleven plasmids and sixteen chromosomal prophages. Both plasmids and prophages are important elements of horizontal gene transfer (HGT), which suggests that they are an important driving force for the evolution of pathogens. The most striking finding is the close proximity of the chromosomal nematicidal cry toxin genes to three distinct prophages, indicating a contribution of phages to defining the host range of this strain. B. thuringiensis MYBT18246 may have potential as a biocontrol agent against nematodes, which should be addressed in future experiments.
Fig. 4 (caption, partially recovered): Ortholog detection was performed with Proteinortho [84], including protein BLAST with a similarity cutoff of 50% and an E-value of 1e-10. The total numbers of genes and paralogs are depicted under the corresponding species name. Open reading frames that were classified as pseudogenes were not included in this analysis.