Genome sequence of Phaeobacter daeponensis type strain (DSM 23529T), a facultatively anaerobic bacterium isolated from marine sediment, and emendation of Phaeobacter daeponensis

TF-218T is the type strain of the species Phaeobacter daeponensis Yoon et al. 2007, a facultatively anaerobic Phaeobacter species isolated from tidal flats. Here we describe the draft genome sequence and annotation of this bacterium together with previously unreported aspects of its phenotype. We analyzed the genome for genes involved in secondary metabolite production and in its anaerobic lifestyle, both of which have also been described for its closest relative, Phaeobacter caeruleus. The 4,642,596 bp long genome of strain TF-218T contains 4,310 protein-coding genes and 78 RNA genes, including four rRNA operons, and consists of five replicons: one chromosome and four extrachromosomal elements with sizes of 276 kb, 174 kb, 117 kb and 90 kb. Genome analysis showed that TF-218T possesses all of the genes for indigoidine biosynthesis, and on specific media the strain showed a blue pigmentation. We also found genes for dissimilatory nitrate reduction, gene-transfer agents, NRPS/PKS genes and signaling systems homologous to the LuxR/I system.

Strain TF-218T was isolated from tidal flats at Daepo Beach (Yellow Sea), Korea, which gave rise to the species name P. daeponensis [1]. Secondary metabolite production is a well-known feature within the Roseobacter clade [6], especially within the Phaeobacter cluster, which shows high efficiency for secondary metabolite production [7]. Examples include biosynthesis of the antibiotics tropodithietic acid (TDA) or indigoidine, quorum sensing by N-acyl homoserine lactones (AHLs), and the presence of genes coding for nonribosomal peptide synthetases (NRPS) and polyketide synthases (PKS) [6][7][8][9][10][11]. Furthermore, P. daeponensis was the first described facultatively anaerobic Phaeobacter species; it is capable of nitrate reduction [1]. Here we present the draft genome sequence and annotation of P. daeponensis TF-218T. We analyzed the genome for special features with a focus on secondary metabolite production. Novel aspects of the strain phenotype are also reported.

Classification and features
16S rRNA gene sequence analysis
Figure 1 shows the phylogenetic neighborhood of P. daeponensis in a 16S rRNA gene sequence-based tree. The sequences of the four 16S rRNA gene copies in the genome of strain DSM 23529T differ from each other by up to two nucleotides, and differ by up to two nucleotides from the previously published 16S rRNA gene sequence (DQ81486) [Table 1]. A representative genomic 16S rRNA gene sequence of P. daeponensis TF-218T was compared with the Greengenes database for determining the weighted relative frequencies of taxa and (truncated) keywords as previously described [21]. The most frequently occurring genera were Ruegeria (31.6%), Phaeobacter (28.8%), Silicibacter (13.6%), Roseobacter (13.3%) and Nautella (3.6%) (713 hits in total). Regarding the five hits to sequences from the species, the average identity within HSPs was 99.9%, whereas the average coverage by HSPs was 19.0%. Regarding the 45 hits to sequences from other species of the genus, the average identity within HSPs was 97.8%, whereas the average coverage by HSPs was 18.9%. Among all other species, the one yielding the highest score was Roseobacter gallaeciensis (AY881240), which corresponded to an identity of 98.6% and an HSP coverage of 18.8%. (Note that the Greengenes database uses the INSDC (= EMBL/NCBI/DDBJ) annotation, which is not an authoritative source for nomenclature or classification.) The highest-scoring environmental sequence was AF253467 (Greengenes short name 'Key aromatic-ring-cleaving enzyme protocatechuate 34-dioxygenase ecologically important marine Roseobacter lineage d on Indulin seawater'), which showed an identity of 99.8% and an HSP coverage of 18.8%. The most frequently occurring keywords within the labels of all environmental samples which yielded hits were 'microbi' (2.8%), 'marin' (2.7%), 'coral' (2.4%), 'diseas' (1.8%) and 'water' (1.8%) (492 hits in total).
The most frequently occurring keywords within the labels of those environmental samples which yielded hits of a higher score than the highest scoring species were 'marin' (17.4%), 'sediment' (8.5%), 'aromatic-ring-cleav, ecolog, enzym, import, indulin, kei, lineag, protocatechu, roseobact, seawat' (4.4%), 'coco, island, near, site' (4.3%) and 'redox-stratifi, reef, sandi' (4.3%) (4 hits in total).

Chemotaxonomy
The principal fatty-acid profile of strain TF-218T consisted of major amounts of the unsaturated fatty acid C18:1ω7c (57.7%) and 11-methyl C18:1ω7c (16.6%), in addition to straight-chain fatty acids (12.8%) and hydroxy fatty acids (9.9%). Apart from differences in the proportions, the fatty acid profile is similar to those of the type strains of P. gallaeciensis, P. inhibens and P. caeruleus. The major polar lipids of strain TF-218T are phosphatidylcholine, phosphatidylglycerol, phosphatidylethanolamine, two unidentified lipids and an aminolipid [1].

Genome sequencing and annotation

Genome project history
This organism was selected for sequencing on the basis of the DOE Joint Genome Institute Community Sequencing Program (CSP) 2010, CSP 441 "Whole genome type strain sequences of the genera Phaeobacter and Leisingera - a monophyletic group of physiologically highly diverse organisms". The genome project is deposited in the Genomes OnLine Database [22] and the complete genome sequence is deposited in GenBank. Sequencing and annotation were performed by the DOE Joint Genome Institute (JGI) using state-of-the-art sequencing technology [40]. A summary of the project information is shown in Table 2.

Growth conditions and DNA isolation
A culture of DSM 23529T was grown aerobically in DSMZ medium 514 [41] at 37°C. Genomic DNA was isolated using a Jetflex Genomic DNA Purification Kit (GENOMED 600100) following the standard protocol provided by the manufacturer, with the following modifications: an incubation time of 40 min, incubation on ice overnight on a shaker, the use of an additional 25 µl proteinase K, and the addition of 200 µl protein precipitation buffer. DNA is available from DSMZ through the DNA Bank Network [42].

Table 1 notes. Evidence codes - TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). Evidence codes are from the Gene Ontology project [39].

Genome sequencing and assembly
The draft genome sequence was generated using Illumina sequencing technology. For this genome, we constructed and sequenced an Illumina short-insert paired-end library with an average insert size of 221 bp, which generated 21,978,034 reads, and an Illumina long-insert paired-end library with an average insert size of 9,327 +/- 1,586 bp, which generated 19,261,756 reads, totaling 6,186 Mbp of Illumina data. All general aspects of library construction and sequencing can be found at the JGI web site [43]. The initial draft assembly contained 15 contigs in 10 scaffolds. The initial draft data were assembled with Allpaths [44], and the consensus was computationally shredded into 10 kbp overlapping fake reads (shreds). The Illumina draft data were also assembled with Velvet [45], and the consensus sequences were computationally shredded into 1.5 kbp overlapping fake reads (shreds). The Illumina draft data were assembled again with Velvet, using the shreds from the first Velvet assembly to guide the next assembly. The consensus from the second Velvet assembly was shredded into 1.5 kbp overlapping fake reads.

Genes were identified using Prodigal [47] as part of the DOE-JGI genome annotation pipeline [48], followed by a round of manual curation using the JGI GenePRIMP pipeline [49]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database and the UniProt, TIGR-Fam, Pfam, PRIAM, KEGG, COG, and InterPro databases. Additional gene prediction analysis and functional annotation were performed within the Integrated Microbial Genomes - Expert Review (IMG-ER) platform [50].
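The computational shredding step mentioned above can be illustrated with a minimal Python sketch. It is not the JGI pipeline code: the function name `shred` is hypothetical, and the overlap between neighboring shreds is an assumed parameter, since the text gives only the shred lengths (10 kbp and 1.5 kbp).

```python
def shred(consensus: str, read_len: int, overlap: int) -> list[str]:
    """Cut a consensus sequence into overlapping fake reads ("shreds").

    read_len is the shred length (e.g. 10_000 or 1_500 in the text);
    overlap is an ASSUMED value, the source does not state it.
    """
    if read_len <= 0 or not 0 <= overlap < read_len:
        raise ValueError("need read_len > 0 and 0 <= overlap < read_len")
    if not consensus:
        return []

    step = read_len - overlap  # distance between consecutive shred starts
    shreds = []
    start = 0
    while True:
        shreds.append(consensus[start:start + read_len])
        # Stop once the current shred reaches the end of the consensus;
        # the final shred may be shorter than read_len.
        if start + read_len >= len(consensus):
            break
        start += step
    return shreds


if __name__ == "__main__":
    # Toy example: 4 bp shreds with a 2 bp overlap.
    for fragment in shred("ACGTACGTAC", read_len=4, overlap=2):
        print(fragment)
```

Each shred shares its last `overlap` bases with the start of the next one, so an assembler that receives the shreds as additional reads can reconstruct the contiguity of the earlier consensus.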

Genome properties
The genome statistics are provided in Table 3. Of the genes predicted, 4,310 were protein-coding genes and 78 were RNA genes, including four rRNA operons. The majority of the protein-coding genes (80.7%) were assigned a putative function, while the remaining ones were annotated as hypothetical proteins. The distribution of genes into COGs functional categories is presented in Table 4.

The genome consists of one chromosome and four extrachromosomal elements with sizes of 276 kb, 174 kb, 117 kb and 90 kb (Table 5). The circular conformation of the two largest extrachromosomal elements was experimentally validated using PCR. The plasmids contain characteristic replication modules of the RepABC-, RepA- and RepB-type, comprising a replicase as well as the parAB partitioning operon [51]. The respective replicases that mediate the initiation of replication are designated according to the established plasmid classification scheme [52]. The different numbering of the replicases (e.g., RepC-8, RepC-9a and RepC-9b) from RepABC-type [53,54] plasmids corresponds to specific plasmid compatibility groups that are required for a stable coexistence of the replicons within the same cell [56; unpublished results].

The 276 kb RepC-8 type replicon pDaep_A276 contains an additional DnaA-like I replicase gene (Daep_04147), but the parAB partitioning operon is lacking (Table 6). This distribution may be the result of a plasmid fusion and a functional inactivation of one replication module. This explanation is in agreement with the presence of two postsegregational killing (PSK) systems, each consisting of a typical operon with two small genes encoding a stable toxin and an unstable antitoxin [55]. Moreover, this RepC-8 type plasmid contains a large type-VI secretion system (T6SS) with a size of about 30 kb. The role of this export system was first described in the context of bacterial pathogenesis, but recent findings indicate a more general physiological role in defense against eukaryotic cells and other bacteria in the environment [56][57][58]. We also found T6SSs on DnaA-like I type plasmids of P. caeruleus DSM 24564T (pCaer_C109), L. methylohalidivorans DSM 14336T (pMeth_A285) and L. aquimarina DSM 24565T (pAqui_F126).

The 174 kb plasmid pDaep_B174 contains two RepABC-9 type replication modules (Figure 3a). Both of them harbor a specific perfect palindrome sequence (5'-ATCCGCG' [RepABC-9a]; 5'-TTGCACG' [RepABC-9b]) that may represent the functional cis-acting anchor for plasmid partitioning [59]. This composite replicon may have originated either from a plasmid fusion or from a horizontal recombination. The latter explanation is supported by two site-specific XerC recombinase genes (Daep_04383, Daep_04398) that are located head-to-head adjacent to the two replicases repC9-a and repC9-b. This plasmid contains many transposases and putative phage-derived components, including a DNA primase (Daep_04238) and an RNA-directed DNA polymerase (Daep_04390). The general operon structure of this plasmid seems to be scrambled by transposition or recombination events, as illustrated by the type-IV secretion system. pDaep_B174 contains two copies of the characteristic virD operon comprising the relaxase VirD2 and the coupling protein VirD4 (Table 6). Moreover, the plasmid contains a complete as well as a partial virB gene cluster for the transmembrane channel [57]. The first four genes in the partial cluster are missing, and the truncated virB4 pseudogene (Daep_04339) is flanked by a transposase. Nevertheless, plasmid stability is probably ensured by a PSK system (Table 6).

[...] [60,61], which also belong to the Roseobacter clade. The 117 kb RepA-I type replicon pDaep_C117 contains a LuxR-type two-component transcriptional regulator (Daep_03918) and a complete rhamnose operon [62], and is dominated by genes that are required for polysaccharide biosynthesis.

Table 6 (fragment). pDaep_C117, RepB-I, Daep_03883. † Genes for the initiation of replication, toxin/antitoxin modules and type IV secretion systems (T4SS) that are required for conjugation. The locus tags are accentuated in blue. 1, solitary replicase without partitioning module; 2, presence of adjacent DNA relaxase VirD2; Ψ, partial pseudogene.

P. daeponensis was described as a facultatively anaerobic bacterium that uses nitrate as electron acceptor [1]. We found genes of nitrogen metabolism scattered over the chromosome; they belong to the pathways of assimilatory and dissimilatory nitrate reduction to ammonia (Daep_03263, _03264 and _03265; Daep_03099, _03100, _03263 and _03264) [63][64][65]. Furthermore, we detected all genes necessary for the dissimilatory nitrate reduction to nitrogen, including a cluster for the nitrate reductase (Daep_03099, _03100), the nitrite reductase (Daep_02798), the nitric oxide reductase (Daep_00020, _00021) and the nitrous oxide reductase (Daep_03697) [64].
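The completeness argument for dissimilatory nitrate reduction to nitrogen can be written down as a small lookup, shown here as a hedged Python sketch. The locus tags are those listed in the text; the names `DENITRIFICATION_STEPS` and `missing_steps` are hypothetical, and the reaction labels are the textbook steps of denitrification, not annotations taken from the genome record.

```python
# Steps of dissimilatory nitrate reduction to N2, with the locus tags
# reported in the text for each enzyme.
DENITRIFICATION_STEPS = [
    ("nitrate reductase", "NO3- -> NO2-", ["Daep_03099", "Daep_03100"]),
    ("nitrite reductase", "NO2- -> NO", ["Daep_02798"]),
    ("nitric oxide reductase", "NO -> N2O", ["Daep_00020", "Daep_00021"]),
    ("nitrous oxide reductase", "N2O -> N2", ["Daep_03697"]),
]


def missing_steps(steps, annotated_tags):
    """Return the enzyme names for which at least one locus tag is absent."""
    annotated = set(annotated_tags)
    return [
        enzyme
        for enzyme, _reaction, tags in steps
        if not all(tag in annotated for tag in tags)
    ]


if __name__ == "__main__":
    # Hypothetical annotation set containing every tag named in the text.
    all_tags = [tag for _, _, tags in DENITRIFICATION_STEPS for tag in tags]
    print(missing_steps(DENITRIFICATION_STEPS, all_tags))       # complete pathway
    print(missing_steps(DENITRIFICATION_STEPS, all_tags[:-1]))  # last step removed
```

A pathway is called complete here only if every step has all of its listed genes, which mirrors the statement that all genes necessary for the reduction of nitrate to nitrogen were detected.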
P. daeponensis encodes a gene transfer agent (GTA), a virus-like particle that mediates the transfer of genomic DNA between prokaryotes [66]. The GTA cluster has a length of ~17 kb (Daep_01107 - Daep_01126) and is highly similar to the GTAs of other Phaeobacter species, e.g. the P. inhibens strains DSM 17395, 2.10 and T5T [28,67]. Screening for genes coding for phage-related proteins gave hits for phage integrases (Daep_00002, _00008 and _01212) and a phage-related gene (Daep_02906), but no complete prophage genomes were detected.
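The abbreviated locus-tag notation used throughout this section (e.g. "Daep_00002, _00008 and _01212", or a range such as Daep_01107 - Daep_01126) can be expanded programmatically before looking the genes up in an annotation table. The following Python sketch is illustrative only: the helper names are hypothetical, and the range expansion assumes that locus tags within a cluster are numbered consecutively in steps of one, which the text does not state.

```python
import re


def expand_locus_tags(text: str) -> list[str]:
    """Expand a list such as 'Daep_00002, _00008 and _01212' to full tags."""
    tags = []
    prefix = None
    # A tag is an optional alphanumeric prefix, an underscore and digits.
    for match in re.finditer(r"([A-Za-z][A-Za-z0-9]*)?_(\d+)", text):
        if match.group(1):
            prefix = match.group(1)  # remember the most recent full prefix
        if prefix is None:
            raise ValueError("abbreviated tag appears before any full tag")
        tags.append(f"{prefix}_{match.group(2)}")
    return tags


def expand_range(first: str, last: str) -> list[str]:
    """Expand 'Daep_01107' .. 'Daep_01126', ASSUMING consecutive numbering."""
    prefix_a, start = first.rsplit("_", 1)
    prefix_b, end = last.rsplit("_", 1)
    if prefix_a != prefix_b or int(end) < int(start):
        raise ValueError("tags must share a prefix and be in ascending order")
    width = len(start)  # keep the zero padding of the original tags
    return [f"{prefix_a}_{n:0{width}d}" for n in range(int(start), int(end) + 1)]


if __name__ == "__main__":
    print(expand_locus_tags("Daep_00002, _00008 and _01212"))
    print(len(expand_range("Daep_01107", "Daep_01126")))
```

Under the consecutive-numbering assumption, the GTA cluster range corresponds to 20 locus tags.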
Further genome analysis of P. daeponensis also revealed genes related to secondary metabolism. We found genes coding for nonribosomal peptide synthetases (Daep_00048, _01832, _01834, _01837, _02357 and _03495) and a polyketide synthase (Daep_00050). Two homologs of the luxRI quorum-sensing system [68] were also identified (Daep_01951 and _01952; Daep_03917 and _03918). Genes coding for the biosynthesis of tropodithietic acid and siderophores, as described for the P. inhibens strains DSM 17395, 2.10 and T5T [66,67], were not detected.

P. daeponensis was described as forming yellowish-white colonies on Marine Agar (MA; Difco) [1]. Here we show that P. daeponensis forms blue-framed colonies when grown on YTSS medium [11]. In the genome we found genes that probably encode indigoidine biosynthesis [11]. The respective operon (Daep_03493, _03494, _03495, _03496, _03497 and _03498) is similar to the operon recently described for the closely related strain Phaeobacter sp. Y4I [11]. The luxRI genes and the gene Daep_01773 show homology to the quorum-sensing systems and the clpA gene of Phaeobacter sp. strain Y4I, respectively. Strain Y4I lost its pigmentation after transposon insertions in each of the two luxRI quorum-sensing systems, revealing that pigment production in strain Y4I is regulated via quorum sensing [11]. A transposon insertion in the gene clpA of strain Y4I, which codes for the universal regulatory chaperone protein ClpA that degrades abnormal and regulatory proteins, led to higher pigment production. The presence of the biosynthesis operon and the regulatory systems indicates that P. daeponensis is also able to produce indigoidine in a similar way as strain Y4I.

Phylogenetic analysis shows that P. daeponensis and P. caeruleus form a cluster together with the Leisingera species L. methylohalidivorans and L. aquimarina (Figure 1). The cluster is set apart from the clade comprising P. gallaeciensis, P. inhibens and P. arcticus, but the backbone of the 16S rRNA gene tree shown in Figure 1 is rather unresolved. Using the Genome-to-Genome Distance Calculator (GGDC) [69][70][71], we performed a preliminary phylogenomic analysis of the draft genomes of the type strains of the genera Leisingera and Phaeobacter and the finished genomes of the P. inhibens strains DSM 17395 and 2.10. Table 7 shows the results of the in silico calculated DNA-DNA hybridization (DDH) similarities of P. daeponensis to other Phaeobacter and Leisingera species. The highest values were obtained for P. caeruleus, L. aquimarina and L. methylohalidivorans, thus confirming the 16S rRNA gene analysis. A reclassification of P. daeponensis and P. caeruleus as species of the genus Leisingera is one possible way to better represent the genomic data taxonomically.

Table 7 notes. DDH similarities were calculated in silico with the GGDC [69]. The standard deviations indicate the inherent uncertainty in estimating DDH values from inter-genomic distances based on models derived from empirical test data sets (which are always limited in size); see [69] for details. The distance formulas are explained in [70]. The numbers in parentheses are GenBank accession numbers identifying the underlying genome sequences.
Even though discrepancies apparently exist between the current classification of the group and the genomic data, it is also clear that P. caeruleus, which forms blue colonies [5], is the closest known relative of P. daeponensis (Table 7). For this reason, the formation of blue colonies by P. daeponensis DSM 23529T on YTSS medium [11] observed in this study, which is consistent with the presence of genes for indigoidine biosynthesis in the genome, is probably of taxonomic relevance. This warrants an update of the taxonomic description of P. daeponensis.

Emended description of the species Phaeobacter daeponensis Yoon et al. 2007
The description of the species Phaeobacter daeponensis is the one given by Yoon et al. 2007 [1], with the following modification. Forms blue colonies when cultivated on YTSS medium.