Genome sequence of the pink-pigmented marine bacterium Loktanella hongkongensis type strain (UST950701-009PT), a representative of the Roseobacter group

Loktanella hongkongensis UST950701-009PT is a Gram-negative, non-motile, rod-shaped bacterium isolated from a marine biofilm in the subtropical seawater of Hong Kong. When growing as a monospecies biofilm on polystyrene surfaces, this bacterium is able to induce larval settlement and metamorphosis of the ubiquitous polychaete tubeworm Hydroides elegans. The inductive cues are low-molecular-weight compounds bound to the exopolymeric matrix of the bacterial cells. In the present study we describe the features of L. hongkongensis strain DSM 17492T, together with its genome sequence and annotation, and novel aspects of its phenotype. The 3,198,444 bp genome sequence encodes 3104 protein-coding genes and 57 RNA genes. The two unambiguously identified extrachromosomal replicons contain replication modules of the RepB type and of the Rhodobacteraceae-specific DnaA-like type, respectively.


Introduction
Loktanella hongkongensis UST950701-009PT (= DSM 17492T = NRRL B-41039T = JCM 12479T) was isolated from a biofilm grown naturally on a glass coupon that had been submerged in the coastal seawater of Hong Kong for 7 days in July 1995 [1]. In the marine environment, bacteria in biofilms mediate the settlement and metamorphosis of the planktonic larvae of many benthic invertebrates. When attached as a biofilm, cells of UST950701-009PT were able to induce settlement and metamorphosis of the polychaete Hydroides elegans [2]. The chemical cues mediating the larval response were found to be low-molecular-weight compounds associated with the exopolymeric matrix of the bacterial cells [3][4][5].
In this study we analyzed the genome sequence of L. hongkongensis DSM 17492T. We present a description of the genome sequencing and annotation, together with a summary classification and a set of features of the strain, including novel aspects of its phenotype.

Organism information
Classification and features
Figure 1 shows the phylogenetic neighborhood of L. hongkongensis DSM 17492T in a 16S rRNA gene-based tree. The sequence of the single 16S rRNA gene copy in the genome does not differ from the previously published 16S rRNA gene sequence (AY600300).
The single genomic 16S rRNA gene sequence of L. hongkongensis DSM 17492T was compared with the Greengenes database to determine the weighted relative frequencies of taxa and (truncated) keywords as previously described [6]. The most frequently occurring genera were Loktanella (46.2 %), Ketogulonicigenium (14.9 %), Methylarcula (10.3 %), Silicibacter (10.0 %) and Ruegeria (8.5 %) (65 hits in total). Regarding the five hits to sequences from representatives of the species, the average identity within high-scoring segment pairs (HSPs) was 99.6 %, whereas the average coverage by HSPs was 98.0 %. Regarding the 13 hits to sequences from other representatives of the genus, the average identity within HSPs was 95.6 %, whereas the average coverage by HSPs was 97.6 %. Among all other species, the one yielding the highest score was Loktanella vestfoldensis (NR_029021), which corresponded to an identity of 95.8 % and an HSP coverage of 99.4 %. (Note that the Greengenes database uses the INSDC (= EMBL/NCBI/DDBJ) annotation, which is not an authoritative source for nomenclature or classification.) The highest-scoring environmental sequence was FJ869048 (Greengenes short name 'Roseobacter isolates Chesapeake Bay water 2 m depth isolate CB1079Rhodobacterales str. CB1079'), which showed an identity of 99.2 % and an HSP coverage of 99.9 %. The most frequently occurring keywords within the labels of all environmental samples that yielded hits were 'lake' (8.6 %), 'tin' (7.1 %), 'qinghai' (6.4 %), 'microbi' (3.2 %) and 'sea' (3.1 %) (185 hits in total). The most frequently occurring keywords within the labels of those environmental samples that yielded hits with a higher score than the highest-scoring species were 'sea' (15.4 %), 'water' (7.7 %), 'bloom, chl, concentr, contrast, diatom, dure, filter, non-bloom, spring, station, success, surfac, yel' (5.1 %) and 'bai, chesapeak, depth, roseobact' (2.6 %) (3 hits in total).
These keywords fit well with the isolation site of strain UST950701-009PT.
L. hongkongensis UST950701-009PT is Gram-negative and non-spore-forming (Table 1). Cells are short, non-motile rods (Fig. 2). When grown on Marine Agar 2216 (Difco) at 30 °C in the absence of light, colonies are pink, convex with an entire margin, and have a smooth and shiny surface; a brown diffusible pigment is produced. However, whitish colonies emerge in every culture upon aging (3 days or longer). Colonies of this white morphovar, whose morphological properties are otherwise identical, can be maintained as separate cultures (UST950701-009W) without turning pink. L. hongkongensis UST950701-009PT cannot grow on nutrient agar or trypticase soy agar (both from Oxoid).
The growth of L. hongkongensis UST950701-009PT is strictly aerobic and requires at least 2 % NaCl (growth occurs at up to 14 % NaCl). Growth occurs at temperatures of 8-44 °C and at pH 5.0-10.0.
Fig. 1. 16S rRNA gene-based phylogenetic tree showing the neighborhood of L. hongkongensis DSM 17492T [6,13]. The tree was inferred from 1353 aligned characters of the 16S rRNA gene sequence under the maximum likelihood (ML) criterion as previously described [14]. Rooting was done initially using the midpoint method and then checked for its agreement with the current classification (Table 1). The branches are scaled in terms of the expected number of substitutions per site. Numbers adjacent to the branches are support values from 350 ML bootstrap replicates (left) and from 1000 maximum-parsimony bootstrap replicates (right) if larger than 60 % [6]. Lineages with type strain genome sequencing projects registered in GOLD [7] are labeled with one asterisk, those also listed as 'Complete and Published' with two asterisks.
L. hongkongensis UST950701-009PT can utilize a wide range of mono-, di-, tri- and polysaccharides, as well as sugar alcohols. Citrate is not utilized. Catalase, oxidase and beta-galactosidase activities are positive, whereas arginine dihydrolase, lysine decarboxylase, ornithine decarboxylase, urease, tryptophan deaminase and gelatinase activities are negative. L. hongkongensis UST950701-009PT does not produce bacteriochlorophyll a, indole, acetoin or H2S. It cannot hydrolyze casein or Tween 80. Streptomycin, penicillin, chloramphenicol, ampicillin and tetracycline inhibit the growth of L. hongkongensis UST950701-009PT, but kanamycin does not (all data from [1]).
Lau et al. [1] described the phenotype of the strain and tested the assimilation of a wide range of sugars with the API50CH system, which is based on the detection of biochemical reactions. With the API50CH system, positive reactions were found for more than 20 carbon sources. None of these results could be confirmed by the OmniLog measurements, in which L. hongkongensis was positive for only five sugars, as well as for a number of carboxylic acids (e.g. malate and citrate) and amino acids. This observation agrees with the finding of Van Trappen et al. [6], who determined the phenotype of three Loktanella strains using API20NE, except that no positive reaction was found for the carbon sources listed in [6]. Positive reactions found in the OmniLog measurements but not in growth experiments might be due to the higher sensitivity of the former [17].

Genome sequencing and annotation
Genome project history
The genome was sequenced within the project "Ecology, Physiology and Molecular Biology of the Roseobacter clade: Towards a Systems Biology Understanding of a Globally Important Clade of Marine Bacteria". The strain was chosen for genome sequencing according to the Genomic Encyclopedia of Bacteria and Archaea (GEBA) criteria [29]. For the same reason, it had previously also been chosen as part of the "Genomic Encyclopedia of Type Strains, Phase I: the one thousand microbial genomes project" [51,52], a follow-up of the GEBA project [30] that aims at increasing the sequencing coverage of key reference microbial genomes. Two draft sequences were produced independently from the same source of DNA and finally joined. The corresponding project information can be found in the Genomes OnLine Database [31]. The Whole Genome Shotgun sequence has been deposited in GenBank and in the Integrated Microbial Genomes database (IMG) [32]. A summary of the project information is shown in Table 2.

Growth conditions and genomic DNA preparation
A culture of strain DSM 17492T was grown aerobically in DSMZ medium 514 [33] at 28 °C. Genomic DNA was isolated using the Jetflex Genomic DNA Purification Kit (GENOMED 600100) following the standard protocol provided by the manufacturer, modified by an incubation time of 60 min, overnight incubation on ice on a shaker, the use of an additional 50 μl of proteinase K, and the addition of 100 μl of protein precipitation buffer. DNA is available from the DSMZ through the DNA Network [34].

Genome sequencing and assembly
The genome was sequenced using a combination of two libraries (Table 2). Illumina sequencing was performed on a GA IIx platform with 150 cycles. The paired-end library contained inserts with an average length of 500 bp. The first run, on the Illumina GA IIx platform, delivered 1.0 million reads. A second Illumina run was performed on a MiSeq platform to gain a higher sequencing depth. To achieve longer reads, the library was sequenced in one direction for 300 cycles, providing another 2.1 million reads. After error correction and clipping by fastq-mcf [35] and Quake [36], the data were assembled using Velvet [37]. A total of 2,403,257 reads with a mean length of 126 bp passed the filter step and were assembled into 54 contigs. To gain information on the contig arrangement, an additional 454 run was performed. The paired-end jumping library of 3 kb insert size was sequenced on a 1/8 lane. Pyrosequencing resulted in 158,608 reads with an average length of 337 bp. A total of 41 scaffolds was obtained with the Newbler assembler (Roche Diagnostics). Both draft assemblies (Illumina and 454 sequences) were fractionated into artificial Sanger reads of 1000 nt in length plus a 75 bp overlap on each side. These artificial reads served as input for the phred/phrap/consed package [38]. By manual editing, the number of contigs was reduced to 13. Using minimus2 [39], the resulting sequence was mapped to an existing permanent draft version of the genome published on IMG-ER by the DOE Joint Genome Institute, which had been sequenced as described earlier [53]. The source DNA of both samples was obtained from the same strain, DSM 17492T. The combined sequences provided 132× coverage of the genome.
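The fragmentation of the draft assemblies into artificial Sanger reads can be illustrated with a minimal Python sketch. It assumes one reading of the description above: each artificial read covers a 1000 nt window of a draft contig extended by 75 bp on each side (clipped at the contig ends), so that neighbouring reads overlap by 150 bp. The function name fragment_contig is hypothetical; it is not part of the phred/phrap/consed package or of any tool named in this study.

```python
def fragment_contig(seq: str, core: int = 1000, flank: int = 75) -> list[str]:
    """Cut a contig into overlapping artificial reads (hypothetical helper).

    Each read spans a `core`-length window extended by `flank` bases on
    either side, clipped at the contig ends, so consecutive reads share
    2 * flank bases. This is an assumed interpretation of "1000 nt plus
    75 bp overlap on each side", not the published procedure.
    """
    if core <= 0 or flank < 0:
        raise ValueError("core must be positive and flank non-negative")
    reads = []
    # Step through the contig one core window at a time.
    for start in range(0, len(seq), core):
        lo = max(0, start - flank)               # extend left, clip at 0
        hi = min(len(seq), start + core + flank)  # extend right, clip at end
        reads.append(seq[lo:hi])
    return reads
```

For a 2400 bp contig this yields three reads of 1075, 1150 and 475 bp, with consecutive reads sharing 150 bp. Generating the quality files that a phred/phrap workflow would normally use is outside the scope of this sketch.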

Genome annotation
Genes were identified using Prodigal [40] as part of the JGI genome annotation pipeline. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database and the UniProt, TIGRFAM, Pfam, PRIAM, KEGG, COG and InterPro databases. RNA genes were identified using HMMER 3.0rc1 [41] (rRNAs) and tRNAscan-SE 1.23 [42] (tRNAs). Other noncoding genes were predicted using INFERNAL 1.0.2 [43]. Additional gene prediction analysis and functional annotation were performed within the Integrated Microbial Genomes-Expert Review (IMG-ER) platform [44]. CRISPR elements were detected using CRT [45] and PILER-CR [46].

Genome properties
The genome statistics are provided in Table 3 and Fig. 3. The genome of strain DSM 17492T has a total length of 3,198,444 bp and a G + C content of 68.3 %. Of the 3161 genes predicted, 3104 were identified as protein-coding genes and 57 as RNA genes. The majority of the protein-coding genes were assigned a putative function (83.9 %) while the   Table 4.
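The G + C content reported above is the proportion of guanine and cytosine among the bases of the assembled sequence. A minimal Python sketch of that calculation follows, together with a check that the gene totals stated in the text add up. The function name gc_content is hypothetical, and excluding ambiguity codes such as N from the denominator is an assumption; the convention used by the annotation pipeline is not stated here.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Return the G+C content of a nucleotide sequence as a percentage.

    Assumption: only unambiguous A, C, G and T (case-insensitive) are
    counted; ambiguity codes such as N are left out of both the
    numerator and the denominator.
    """
    counts = Counter(seq.upper())
    gc = counts["G"] + counts["C"]
    acgt = gc + counts["A"] + counts["T"]
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / acgt


# Gene totals as given in the text: protein-coding + RNA genes = all genes.
PROTEIN_CODING = 3104
RNA_GENES = 57
TOTAL_GENES = 3161
assert PROTEIN_CODING + RNA_GENES == TOTAL_GENES
```

Applied to the full assembly, gc_content would be expected to return a value close to the 68.3 % reported above, although small differences can arise from how gaps and ambiguous bases are treated.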

Insights from the genome sequence
Genome sequencing of L. hongkongensis DSM 17492T revealed the presence of two plasmids with sizes of about 85 kb and 103 kb (Table 5). These plasmids contain characteristic replication modules of the RepB and DnaA-like types, comprising a replicase as well as the parAB partitioning operon. The respective replicases that mediate the initiation of replication are designated according to the established plasmid classification scheme [47]. The different numbering of the replicases (RepB-I, DnaA-like I) corresponds to specific plasmid compatibility groups, which are required for the stable coexistence of the replicons within the same cell. Type IV secretion systems for conjugative plasmid transfer [48,49] and postsegregational killing systems, which consist of a typical operon of two small genes encoding a stable toxin and an unstable antitoxin [50], are missing from both plasmids. The presence of a RepA-I plasmid replicase (lokhon_02202) in close proximity to a complete rRNA operon on the chromosomal 1.0 Mb contig 684.8 is conspicuous. The parAB partitioning operon is located 15 genes downstream of repA-I, indicating that the replication module has been subjected to several recombination events with the chromosome and is probably no longer functional. However, genome finishing would be required to confirm the presence of a single chromosomal replicon in L. hongkongensis DSM 17492T.

Conclusion
The Roseobacter group is widely distributed in the marine environment. In this study we analyzed the genome sequence of L. hongkongensis UST950701-009PT, which was isolated from a marine biofilm, and summarized known and newly revealed aspects of its phenotype. Genome analysis of this type strain revealed at least two extrachromosomal elements with replication systems that are specific to, or at least characteristic of, the family Rhodobacteraceae.