Draft genome sequences of Pantoea agglomerans and Pantoea vagans isolates associated with termites

The genus Pantoea incorporates many economically and clinically important species. The plant-associated species, Pantoea agglomerans and Pantoea vagans, are closely related and are often isolated from similar environments. Plasmids conferring certain metabolic capabilities are also shared between these two species. The genomes of two isolates obtained from fungus-growing termites in South Africa were sequenced, assembled and annotated. A high number of orthologous genes are conserved within and between these species. The difference in genome size between P. agglomerans MP2 (4,733,829 bp) and P. vagans MP7 (4,598,703 bp) can largely be attributed to differences in plasmid content. The genome sequences of these isolates may shed light on the common traits that enable P. agglomerans and P. vagans to co-occur in plant- and insect-associated niches.


Introduction
The bacterial genus Pantoea contains several economically important plant pathogens, as well as strains of clinical importance [10]. Amongst the plant pathogens, Pantoea ananatis, with its broad host range (e.g. onion, eucalyptus and pineapple) and P. stewartii subsp. stewartii, the causal agent of Stewart's wilt on maize, are the best known. The human pathogens include species such as P. septica and P. brenneri [9], although some plant-associated species have also been isolated from immunocompromised patients [12,17]. P. agglomerans and P. vagans are most commonly isolated from similar ecological niches, including both plant and insect hosts [41].
Three plasmids (pPag1, pPag2 and pPag3) were identified in the genome of the biocontrol strain P. vagans C9-1 [45], and it is thought that these plasmids may play a role in the physiological and ecological functioning of this strain. Plasmid pPag1 codes for sucrose metabolism, while plasmid pPag2 harbours genes for an antimicrobial peptide and sorbitol utilization [33,46]. The megaplasmid pPag3 belongs to the LPP-1 plasmids conserved among all Pantoea species sequenced to date and carries genes involved in pigment production, thiamine biosynthesis and maltose metabolism [19,46]. In contrast to P. vagans, some strains of P. agglomerans are also known to induce galls on Gypsophila spp., beet (Beta vulgaris), Douglas fir (Pseudotsuga menziesii) and Wisteria spp. [6,37]. This ability has been linked to a genomic island that encodes a Type III secretion system and pPath plasmid genes involved in the biosynthesis of the plant hormones indole-3-acetic acid and cytokinins [6]. P. agglomerans strains have also been shown to cause opportunistic infections in humans [15,18].
In this study we summarize the features of a P. agglomerans isolate (MP2) and a P. vagans isolate (MP7) that were obtained from two different colonies of the fungus-growing termite Macrotermes natalensis in South Africa, and provide an overview of the draft genome sequences and annotations for these two strains. The genome sequences provide some understanding of the shared genomic features that could be linked to their survival in similar environments and of the unique features that characterise each species.
The 16S rRNA gene sequences of the enteric bacteria tend to provide insufficient resolution, and the phylogenetic relationships of P. agglomerans MP2 and P. vagans MP7 were therefore inferred with multi-locus sequence analysis. This analysis included closely related members of the genus Pantoea with available genome sequences, and was based on partial nucleotide sequences of six protein-coding genes (i.e., atpD, carA, gyrB, infB, recA and rpoB) [57]. Our results showed that P. agglomerans and P. vagans group as sister species (Fig. 2).
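As a rough illustration of how a multi-locus data set of this kind is usually assembled, the sketch below joins per-gene alignments into one concatenated sequence per strain. It is not the procedure used in this study: the function name, the input layout (a dictionary of aligned sequences per gene) and the choice to keep only strains present at every locus are all assumptions made for the example.

```python
# Hypothetical helper: build a concatenated multi-locus alignment.
# Only the gene names come from the text; everything else is illustrative.
GENES = ["atpD", "carA", "gyrB", "infB", "recA", "rpoB"]


def concatenate_alignments(alignments, genes=GENES):
    """Join per-gene alignments into one sequence per strain.

    alignments: {gene: {strain: aligned_sequence}}
    Returns {strain: concatenated_sequence} for strains found at every locus.
    """
    # Keep only strains that have a sequence for every gene (assumption).
    shared = set.intersection(*(set(alignments[g]) for g in genes))
    for gene in genes:
        lengths = {len(alignments[gene][s]) for s in shared}
        if len(lengths) > 1:
            # Sequences within one locus must already be aligned.
            raise ValueError(f"{gene}: sequences differ in length")
    return {
        strain: "".join(alignments[gene][strain] for gene in genes)
        for strain in sorted(shared)
    }
```

The concatenated sequences would then be passed to a tree-building program of choice; that step is outside the scope of this sketch.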
The two isolates (strain codes: MP2 = Mn109-1w1C and MP7 = Mn107-old1M) were isolated from Macrotermes natalensis termite mounds in 2010. The surface of a worker termite was rinsed with phosphate-buffered saline (PBS), and MP2 was isolated from the rinsate, which was inoculated directly onto chitin medium (4 g chitin, 0.7 g K2HPO4, 0.3 g KH2PO4, 0.5 g MgSO4·5H2O, 0.01 g FeSO4·7H2O, 0.001 g ZnSO4, 0.001 g MnCl2 and 20 g agar per litre). MP7 was isolated from fungus comb ground in PBS and inoculated onto carboxymethyl cellulose medium (10 g carboxymethyl cellulose and 20 g agar per litre). Isolates were streaked onto Yeast Malt Extract Agar (YMEA) medium (4 g yeast extract, 10 g malt extract, 4 g D-glucose and 20 g bacteriological agar per litre) and, once in pure culture, were stored in 10 % glycerol at −20°C. The specificity and possible role of associations between fungus-growing termites and the two Pantoea isolates have not been determined, but the abundance of members of the Enterobacteriaceae in both fungus-growing termite guts [40] and fungus combs [4] suggests the possibility of a specific association.

Genome project history
The genomes of both isolates were sequenced using the Illumina platform. Velvet [56] and Mauve [16] were employed for the assembly of the genomes and annotations were done using the Rapid Annotation using Subsystem Technology [5] and WebMGA. The genomes will remain as high quality drafts and are available from the National Center for Biotechnology Information (Tables 2 and 3). The Whole Genome Shotgun projects have been deposited at DDBJ/EMBL/GenBank under the accessions JPKQ00000000 and JPKP00000000, respectively. The versions described in this paper are version JPKQ00000000.1 and JPKP00000000.1.

Growth conditions and genomic DNA preparation
Pure cultures of the MP2 and MP7 isolates that were initially grown at 28°C on YMEA plates were then cultured in Luria-Bertani broth (10 g tryptone, 5 g yeast extract, and 5 g NaCl per litre). DNA was subsequently extracted from the cultures using the Qiagen DNeasy blood and tissue kit (Qiagen, CA). DNA quality was assessed using a NanoDrop™ spectrophotometer.

Fig. 1 Photomicrographs of source organisms. The source organisms for a P. agglomerans MP2 and of b P. vagans MP7, stained with safranin

Genome sequencing and assembly
The genomes of the two isolates were sequenced from mate-paired Illumina libraries on the HiSeq platform at the Beijing Genomics Institute. Libraries with an insert size of 500 bp were generated, and reads of 90 bp were obtained in both directions. Reads with >10 % Ns and/or 25-35 low-quality bases (≤Q20) were filtered out, adapter contamination and duplicate reads were removed, and read ends were trimmed, leaving approximately 850 Mb of sequence data per isolate. The sequence reads were assembled using Velvet [56]; the sequencing and assembly metrics are given in Table 2. Contigs generated in this way were further assembled into contiguous scaffolds by alignment against the closest complete genomes, based on BLAST, of P. vagans C9-1 [45]
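The read-filtering criteria described in this section can be sketched in a few lines. This is an illustration only, not the sequencing provider's actual pipeline: the function names are hypothetical, the low-quality limit is set to the lower bound of the stated 25-35 range, and the quality strings are assumed to be Phred+33 encoded (older Illumina data may use an offset of 64).

```python
# Hypothetical read filter based on the thresholds quoted in the text.
def low_quality_count(qual, q_cutoff=20, offset=33):
    """Number of bases whose Phred score is at or below q_cutoff."""
    return sum(1 for ch in qual if ord(ch) - offset <= q_cutoff)


def keep_read(seq, qual, max_n_frac=0.10, max_low_q=25,
              q_cutoff=20, offset=33):
    """Return True if a read passes both filters.

    max_low_q=25 and offset=33 are assumptions, not values from the study.
    """
    if not seq or len(seq) != len(qual):
        return False  # malformed record
    if seq.upper().count("N") / len(seq) > max_n_frac:
        return False  # more than 10 % ambiguous bases
    # Discard reads that reach the assumed limit of low-quality bases.
    return low_quality_count(qual, q_cutoff, offset) < max_low_q
```

For paired reads, a pair would normally be discarded when either mate fails; adapter removal, duplicate removal and end trimming are separate steps that are not shown here.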

Genome annotation
The genomes were annotated using the RAST pipeline [5]. RAST initiated the annotation by predicting RNA molecules, followed by an initial gene prediction and placing of the genome into phylogenetic context. The most closely related genomes were used to assess protein families using FIGfams (i.e., sets of protein sequences that are similar along their full length and that likely represent isofunctional homologs). The remaining genes were then assessed against the FIGfam database [5], followed by metabolic reconstruction. The number of protein-coding genes with functional predictions was thus based on the subsystem technology of RAST. Both genomes were also subjected to analysis on WebMGA, where comparisons to the Clusters of Orthologous Genes (COG) [50] and Protein family (Pfam) databases [7] were performed with rpsblast [2]. Signal peptide prediction and transmembrane helix prediction for the protein-coding genes in the genomes were performed using Phobius [32]. CRISPR repeats were detected using the CRISPRs database [29] (Table 4).

Genome properties
The genomes of P. agglomerans MP2 and P. vagans MP7 were 4,733,829 bp and 4,598,703 bp in size, respectively (Table 4; Figs. 3 and 4). The P. agglomerans MP2 genome includes three closed plasmids which show high sequence similarity and synteny to pPag1, pPag2 and pPag3 of P. vagans C9-1. The genome of P. vagans MP7, on the other hand, incorporates only copies of pPag1 and pPag3. The pPag2-harbored herbicolin biosynthetic locus of P. vagans C9-1 [33] is absent from the genomes of both MP2 and MP7, and the pPATH pathogenicity island [37] is likewise absent from both strains. For P. agglomerans MP2, 85.4 % (4,043,819 bp) of the genome coded for 4,449 genes. Of these, 4,355 genes were protein-coding. For P. vagans MP7, 85.9 % (3,948,783 bp) of the genome coded for 4,181 protein-coding genes. The majority of protein-coding genes had functional predictions using both RAST annotations and the COG database (Table 4). A high number of genes code for proteins that are involved in metabolism (COG codes C, G, E, F, H, I, P and Q), with fewer genes involved in all other classes (Table 5).
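The coding percentages and the size difference quoted above follow directly from the reported base-pair counts. The short check below reproduces that arithmetic; the dictionary layout and function name are only illustrative.

```python
# Figures taken from the text; the structure and names are illustrative.
GENOMES = {
    "P. agglomerans MP2": {"genome_bp": 4_733_829, "coding_bp": 4_043_819},
    "P. vagans MP7": {"genome_bp": 4_598_703, "coding_bp": 3_948_783},
}


def coding_density(coding_bp, genome_bp):
    """Percentage of the genome that falls within coding regions."""
    return 100.0 * coding_bp / genome_bp


for name, g in GENOMES.items():
    density = coding_density(g["coding_bp"], g["genome_bp"])
    print(f"{name}: {density:.1f} % coding")

# Difference in total genome size between the two isolates.
size_difference = (GENOMES["P. agglomerans MP2"]["genome_bp"]
                   - GENOMES["P. vagans MP7"]["genome_bp"])
print(f"Size difference: {size_difference:,} bp")
```

The two densities round to 85.4 % and 85.9 %, matching the values in the text, and the genomes differ by 135,126 bp.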

Insights from the genome sequences
Fig. 4 The genome structure of P. vagans MP7. The genome consists of 1 chromosome and 2 plasmids. The order of the contigs was based on the complete genome sequence of P. vagans C9-1 which is publicly available [45]. The contigs varied in size with the largest (contig 2) being approximately 1,010 kbp and the smallest (contig 6) being just below 50 kbp. The predicted ORFs are indicated in the inner tracks and are flanked with the COG classes associated with each of the ORFs. The GC content of the various regions within the genome is indicated in black, with the GC skew indicated in green and purple [48]

a The percentage of total is based on either the size of the genome in base pairs or the total number of protein coding genes in the annotated genome
b Also includes pseudogenes and other genes

The genomes of the sequenced isolates were compared to the publicly available genomes of P. agglomerans 190 and P. vagans C9-1 [45] to determine the average nucleotide identity (ANI) [28,43] values between the isolates (Table 6). The ANI calculations were done with JSpecies [43] using the BLAST function, which is based on fragmenting the genomic sequence into pieces 1,020 nucleotides long and performing similarity searches to determine homology between the genomic fragments. The number of shared genes within and between species ranged from 3,400 to 3,500. Based on the ANI values, the isolates grouped with representatives of the designated species, as species cut-off values are suggested at 95 % for ANI [28].
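The fragment-based ANI logic described above can be outlined as follows. This is a simplified sketch, not JSpecies itself: the BLAST search is left out, the best hit for each fragment is assumed to be supplied as a (percent identity, aligned length) pair, and the hit filters (at least 30 % identity over at least 70 % of the fragment) are assumptions based on commonly used ANI settings rather than values stated in this paper. All function names are hypothetical.

```python
# Simplified, hypothetical outline of a fragment-based ANI calculation.
def fragment(sequence, size=1020):
    """Cut a genome into consecutive, non-overlapping pieces of `size` nt.

    A trailing piece shorter than `size` is dropped (assumption).
    """
    return [sequence[i:i + size]
            for i in range(0, len(sequence) - size + 1, size)]


def mean_identity(hits, fragment_size=1020,
                  min_identity=30.0, min_aligned_fraction=0.70):
    """Average percent identity over the fragments that pass both filters.

    hits: one (percent_identity, aligned_length) pair per query fragment.
    The two thresholds are assumed defaults, not taken from the text.
    """
    kept = [pid for pid, aligned in hits
            if pid >= min_identity
            and aligned / fragment_size >= min_aligned_fraction]
    return sum(kept) / len(kept) if kept else None


def same_species(ani, cutoff=95.0):
    """Apply the 95 % ANI species cut-off mentioned in the text."""
    return ani is not None and ani >= cutoff
```

In practice the calculation is run in both directions (each genome as query and as reference), and the two values are reported separately or averaged.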

Conclusion
The two bacteria described in this report were phylogenetically and genomically very closely related, but clearly belonged to different species. The ANI values supported the identification of isolates MP2 and MP7 as P. agglomerans and P. vagans, respectively.
Their similarity in genomic content may allow P. agglomerans and P. vagans to occupy the same or overlapping niches and perform the same or similar functional roles. This is consistent with what has been observed before where isolates of P. agglomerans and P. vagans occur in similar environmental niches and may even co-occur in the same environment [40]. Although recombination among micro-organisms occupying the same niche is common [3,27], our data indicated that P. agglomerans and P. vagans have remained sufficiently distinct to identify them as separate species. This suggests that their ability to occupy the same niche is likely a function of their shared genes [13,30,35], but that the integrity of their individual genomic complements is protected by barriers that limit genetic exchange or gene flow between these species [14,47].
Members of the genus Pantoea are often considered generalists that are isolated from a wide variety of environments [10,19,26]. Large metabolic repertoires (unpublished data, Marike Palmer) may allow species of this genus to form opportunistic associations with many potential hosts including insects [8,53]. These associations, as with the biocontrol isolates [41], may be based on the Pantoea isolates outcompeting potentially harmful bacteria in the respective environments as microbial antagonists. This is likely also true for P. agglomerans and P. vagans and their association with termites; however, recent evidence (unpublished data, Michael Poulsen) suggests that the bacterial species may provide nitrogen fixation capabilities to the termites. It is possible that the antimicrobial [21,22,41] and metabolic capabilities (especially pectinolytic and other carbohydrate degrading enzymes) [8] of these bacteria allow them to outcompete other, potentially harmful micro-organisms, while also providing carbohydrates and other compounds for the termites to utilize [20].

The total is based on the total number of predicted protein coding genes in the annotated genomes