Genome sequence of the Wenxinia marina type strain (DSM 24838T), a representative of the Roseobacter group isolated from oilfield sediments

Wenxinia marina Ying et al. 2007 is the type species of the genus Wenxinia, a representative of the Roseobacter group within the alphaproteobacterial family Rhodobacteraceae, isolated from oilfield sediments of the South China Sea. This family has been shown to harbor the most abundant bacteria, especially in coastal and polar waters, and its members have also been found in microbial mats, in sediments and attached to different kinds of surfaces. Here we describe the features of W. marina strain HY34T together with the genome sequence and annotation of strain DSM 24838T and novel aspects of its phenotype. The 4,181,754 bp genome sequence encodes 4,047 protein-coding genes and 59 RNA genes. The genome of W. marina DSM 24838T was sequenced as part of the activities of the Genomic Encyclopedia of Type Strains, Phase I: the one thousand microbial genomes (KMG) project funded by the DoE and the Transregional Collaborative Research Centre 51 (TRR51) funded by the German Research Foundation (DFG).


Introduction
Strain HY34T (= DSM 24838T = CGMCC 1.6105T = JCM 14017T) is the type strain of Wenxinia marina in the monospecific genus Wenxinia [1,2], which belongs to the widely distributed marine Roseobacter group [3]. The strain was isolated from sediments of the Xijiang oilfield located in the South China Sea (China) [1]. The genus Wenxinia was named after Professor Wen-Xin Chen, a Chinese pioneer in soil microbiology. The species epithet marina refers to the Latin adjective marina ('of or belonging to the sea') [1,2]. Current PubMed records do not indicate any follow-up research with strain HY34T after the initial description of W. marina [1].
In this study we analyzed the genome sequence of W. marina DSM 24838T. We describe the genome sequencing and annotation and present a summary classification together with a set of features for strain HY34T, including novel aspects of its phenotype.

Classifications and features
16S rRNA gene analysis
Figure 1 shows the phylogenetic neighborhood of W. marina in a 16S rRNA gene based tree. The sequences of the two identical 16S rRNA gene copies in the genome differ by three nucleotides from the previously published 16S rRNA gene sequence (DQ640643). A representative genomic 16S rRNA gene sequence of W. marina DSM 24838T was compared with the Greengenes database to determine the weighted relative frequencies of taxa and (truncated) keywords as previously described [4]. The most frequently occurring genera were Ruegeria (41.6%), Paracoccus (31.0%), Oceanicola (14.0%), Silicibacter (5.0%) and Loktanella (3.3%) (60 hits in total). Among all other species, the one yielding the highest score was Oceanicola granulosus (AAOT01000021), which corresponded to an identity of 94.7% and an HSP coverage of 99.6%. (Note that the Greengenes database uses the INSDC (= EMBL/NCBI/DDBJ) annotation, which is not an authoritative source for nomenclature or classification.) The highest-scoring environmental sequence was DQ640643 (Greengenes short name 'Rhodobacteraceae South China Sea oil field sediment isolate HY34 Rhodobacteraceae str. HY34'), which showed an identity of 99.8% and an HSP coverage of 100.0%. The most frequently occurring keywords within the labels of all environmental samples that yielded hits were 'microbi' (4.3%), 'coral' (3.6%), 'sea' (2.6%), 'diseas' (2.5%) and 'china' (2.4%) (190 hits in total). The most frequently occurring keywords within the labels of those environmental samples that yielded hits with a higher score than the highest-scoring species were 'antecubit, fossa, skin' (13.9%) and 'china, field, oil, rhodobacteracea, sea, sediment, south' (8.3%) (3 hits in total). Some of these keywords fit well with the isolation site of strain HY34T [1].
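The identity and HSP coverage values quoted above can be illustrated with a minimal sketch. The exact definitions used in [4] are not given in this section, so the code assumes one common convention: identity is the share of matching columns in the pairwise alignment, and coverage is the share of query bases included in the alignment. The function name and the toy alignment are hypothetical.

```python
def identity_and_coverage(aligned_query: str, aligned_subject: str, query_length: int):
    """Return (identity %, query coverage %) for one gapped pairwise alignment.

    Assumed definitions (not taken from the source):
      identity = matching columns / all alignment columns
      coverage = aligned query bases / full query length
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    columns = len(aligned_query)
    # A column matches when both rows carry the same base (gaps never match).
    matches = sum(
        1 for q, s in zip(aligned_query, aligned_subject) if q == s and q != "-"
    )
    # Query bases inside the alignment, i.e. non-gap characters of the query row.
    query_bases = sum(1 for q in aligned_query if q != "-")
    identity = 100.0 * matches / columns
    coverage = 100.0 * query_bases / query_length
    return identity, coverage


# Toy example: a 10 nt query of which 8 nt are aligned, with one gap and one mismatch.
toy_identity, toy_coverage = identity_and_coverage("ACGT-ACGT", "ACGTTACGA", 10)
```

Other tools normalize identity differently (for example by excluding gap columns), so values computed this way are only comparable within one convention.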

Figure 1.
Phylogenetic tree highlighting the position of W. marina relative to the type strains of the neighboring genera Citreicella and Rubellimicrobium. The tree was inferred from 1,381 aligned characters of the 16S rRNA gene sequence under the maximum likelihood (ML) criterion as previously described [4]. The branches are scaled in terms of the expected number of substitutions per site. Numbers adjacent to the branches are support values from 1,000 ML bootstrap replicates (left) and from 1,000 maximum-parsimony bootstrap replicates (right) if larger than 60% [4]. Lineages with type strain genome sequencing projects registered in GOLD [5] are labeled with one asterisk, those also listed as 'Complete and Published' with two asterisks [6].

Morphology and physiology
Cells of strain HY34T are Gram-negative, ovoid or short rods (0.7-0.8 µm in width and 1.3 µm in length) [Figure 2]. Motility and sporulation were not observed. Cells are strictly aerobic and display a heterotrophic lifestyle. When cultured on Marine Agar 2216, colonies with a weak pink color became visible, but bacteriochlorophyll a was not detected. The strain grows in a temperature range of 15-42°C with an optimum at 34-38°C. NaCl is required for growth (0.5-9%) with an optimum salt concentration of 1-4%. Further, the strain grows in a range of pH 6.5-8.5 with an optimum pH of 7.5-8.0. The strain is oxidase- and catalase-positive. Nitrate is reduced to nitrite. Indole and H2S are not produced. Cells hydrolyze urea and Tween 20, and a weak hydrolysis of Tween 40 and Tween 80 was also detected. The strain does not hydrolyze agar, casein, starch, DNA or CM-cellulose.

Genome sequencing and annotation
Genome project history
This strain was selected twice for genome sequencing on the basis of its phylogenetic position [14]: first as part of the DFG-funded project "Ecology, Physiology and Molecular Biology of the Roseobacter clade: Towards a Systems Biology Understanding of a Globally Important Clade of Marine Bacteria" and later as part of the "Genomic Encyclopedia of Type Strains, Phase I: the one thousand microbial genomes (KMG) project" [15], a follow-up of the GEBA project [16], which aims at increasing the sequencing coverage of key reference microbial genomes. The strain was sequenced independently in both projects from the same source of DNA, and the resulting draft sequences were finally joined. The project information can be found in the Genomes OnLine Database [5] and the Integrated Microbial Genomes database (IMG) [17]. A summary of the project information is shown in Table 2.

Growth conditions and DNA isolation
A culture of W. marina DSM 24838T was grown aerobically in DSMZ medium 514 [18] at 30°C. Genomic DNA was isolated using the Jetflex Genomic DNA Purification Kit (GENOMED 600100) following the standard protocol provided by the manufacturer, modified by an incubation time of 60 min, incubation on ice overnight on a shaker, the use of an additional 50 µl proteinase K, and the addition of 100 µl protein precipitation buffer. The DNA is available from the Leibniz-Institute DSMZ through the DNA Bank Network [19].
Evidence codes - TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). Evidence codes are from the Gene Ontology project [50].

Genome sequencing and assembly
The genome sequencing under the DFG-funded part of the project was performed as previously described for Rubellimicrobium thermophilum [6], with 3.3 million reads delivered by the first run on an Illumina GAII platform. To increase the sequencing depth, a second Illumina run was performed, providing another 8.1 million reads. The first draft assembly from 9,139,639 filtered reads (median read length 122 nt) resulted in more than 300 contigs. To gain information on the contig arrangement, an additional 454 run was performed. The paired-end pyrosequencing jumping library resulted in 158,608 reads with an average read length of 450 bp. Both draft assemblies (Illumina and 454 sequences) were fractionated into artificial Sanger reads of 1,000 nt in length plus 75 bp overlap on each side. These artificial reads served as input for the phred/phrap/consed package [20]. In combination, the assembly resulted in 265 contigs in 26 scaffolds.
The genome sequencing under the DoE-funded part of the project was performed as previously described for Halomonas zhanjiangensis [21], also using Illumina technology [22]. An Illumina Standard shotgun library was constructed and sequenced using the Illumina HiSeq 2000 platform. All general aspects of library construction and sequencing performed at the JGI can be found at [23]. The final assembly for this part of the project resulted in 41 scaffolds covering 4,175,892 bp (ARAY00000000). The draft sequence from the first (DFG-funded) part was mapped to the permanent draft version ARAY00000000 using minimus2 [24]. By manual editing, the number of contigs was reduced to 22 in 8 scaffolds (AONG00000000). The combined sequences provided a 356× coverage of the genome.
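The fractionation of draft assemblies into artificial Sanger reads can be sketched as follows. This is a minimal illustration and not the script used in the project: it assumes that each contig is cut into consecutive 1,000 nt windows and that every window is extended by 75 nt on both sides, so that neighboring reads share up to 150 nt. The function name and the toy contig are hypothetical.

```python
def fragment_contig(seq: str, core: int = 1000, flank: int = 75):
    """Cut a contig into overlapping artificial reads.

    Each read covers a `core`-sized window extended by `flank` bases on both
    sides, clipped at the contig ends. Returns (start, end, sequence) tuples
    with 0-based, end-exclusive coordinates.
    """
    reads = []
    for window_start in range(0, len(seq), core):
        start = max(0, window_start - flank)
        end = min(len(seq), window_start + core + flank)
        reads.append((start, end, seq[start:end]))
    return reads


# Toy contig of 2,500 nt (repeating pattern, for illustration only).
toy_contig = "ACGT" * 625
toy_reads = fragment_contig(toy_contig)
toy_lengths = [end - start for start, end, _ in toy_reads]
```

For the toy contig this yields three reads; interior reads are 1,150 nt long, while reads at the contig ends are shorter because the flank is clipped.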

Genome annotation
Genes were identified using Prodigal [25] as part of the JGI genome annotation pipeline. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. RNA genes were identified using HMMER 3.0rc1 [26] (rRNAs) and tRNAscan-SE 1.23 (tRNAs) [27]. Other non-coding genes were predicted using INFERNAL 1.0.2 [28]. Additional gene prediction analysis and functional annotation were performed within the Integrated Microbial Genomes - Expert Review (IMG-ER) platform [29]. CRISPR elements were detected using CRT [30] and PILER-CR [51].

Genome properties
The genome statistics are provided in Table 3 and Figure 3. The genome of DSM 24838T has a total length of 4,181,754 bp and a G+C content of 70.5%. Of the 4,106 genes predicted, 4,047 were protein-coding genes and 59 were RNA genes. The majority of the protein-coding genes (80.4%) were assigned a putative function, while the remaining ones were annotated as hypothetical proteins. The distribution of genes into COG functional categories is presented in Table 4.
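As a small illustration of how such summary statistics are derived, the sketch below checks that the reported gene counts add up and shows one way to compute G+C content from a nucleotide sequence. The helper name is hypothetical and the example sequence is a toy string, not genome data; ambiguous bases such as N are assumed to be excluded from the denominator.

```python
def gc_content(seq: str) -> float:
    """Return the G+C content in percent, counting only A, C, G and T."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


# Gene counts as reported in the text.
TOTAL_GENES = 4106
PROTEIN_CODING = 4047
RNA_GENES = 59

# The two gene classes should sum to the reported total.
counts_consistent = PROTEIN_CODING + RNA_GENES == TOTAL_GENES

# Share of protein-coding genes among all predicted genes, in percent.
protein_coding_share = 100.0 * PROTEIN_CODING / TOTAL_GENES

# Toy sequence: 4 of 6 unambiguous bases are G or C.
toy_gc = gc_content("GGCCAT")
```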

Phages
Many bacteria carry genome-inserted gene sequences associated with prophages, which are one of the major reasons for horizontal gene transfer and bacterial diversity [34,35]. The genome sequence of W. marina DSM 24838T was found to encode several prophage-associated gene sequences (e.g., wenxma_00641 to wenxma_00646, wenxma_00930 to wenxma_00936, wenxma_01496 to wenxma_01510).
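Locus tag ranges such as those listed above can be expanded into individual tags when looking up the genes in an annotation table. The following sketch assumes that the tags within a range are numbered consecutively without gaps, which the text does not state explicitly; the function name is hypothetical.

```python
import re


def expand_locus_range(first: str, last: str):
    """Expand 'prefix_00641' .. 'prefix_00646' into the full list of locus tags.

    Assumes consecutive numbering and a shared prefix; the zero-padding width
    is taken from the first tag.
    """
    pattern = r"(.+_)(\d+)"
    m_first = re.fullmatch(pattern, first)
    m_last = re.fullmatch(pattern, last)
    if not m_first or not m_last or m_first.group(1) != m_last.group(1):
        raise ValueError("locus tags must share the same prefix")
    lo, hi = int(m_first.group(2)), int(m_last.group(2))
    if lo > hi:
        raise ValueError("range start must not exceed range end")
    width = len(m_first.group(2))
    return [f"{m_first.group(1)}{i:0{width}d}" for i in range(lo, hi + 1)]


# First prophage-associated range mentioned in the text.
prophage_tags = expand_locus_range("wenxma_00641", "wenxma_00646")
```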

Quorum Sensing
Analysis of the DSM 24838T genome sequence revealed the presence of gene sequences associated with quorum sensing (QS) [36-38]. QS is a bacterial communication system based on chemical signal molecules called autoinducers, which are produced and released by QS bacteria to coordinate behaviors with respect to their population density [38]. Interestingly, QS also induces individual morphologies and cell division modes, as was recently shown for Dinoroseobacter shibae DFL-12, another representative of the Roseobacter group [39,40]. With regard to QS, the genome of DSM 24838T codes for, e.g., two N-acyl-L-homoserine lactone synthetases (LuxI homologues, wenxma_01086 and wenxma_03269) and two genes possibly encoding QS-involved response and transcriptional regulators (LuxR homologues, wenxma_01085 and wenxma_03267).

Morphological traits
With regard to morphological traits, several genes associated with the putative biosynthesis and export of exopolysaccharides (wenxma_00281, wenxma_02363 and wenxma_02364, wenxma_03720 and wenxma_03721) and capsule polysaccharides (wenxma_00822, wenxma_02023 to wenxma_02025, wenxma_02704 and wenxma_02705, wenxma_04069) were detected. Interestingly, the genome of DSM 24838T was found to encode several gene sequences putatively involved in pili formation (e.g., wenxma_01776 to wenxma_01787, wenxma_03426 to wenxma_03435) and chemotaxis (e.g., wenxma_03823 to wenxma_03830), although the strain was described as non-motile [1]. Hence, the pili formed could play a role in adhesion or in twitching-type motility on solid surfaces. Further, the genome indicates that strain DSM 24838T accumulates polyhydroxyalkanoates as storage compounds (wenxma_02601 to wenxma_02604), which is in accordance with the findings of Ying and colleagues for strain HY34T [1].