Complete genome sequence of Coraliomargarita akajimensis type strain (04OKA010-24T)

Coraliomargarita akajimensis Yoon et al. 2007 is the type species of the genus Coraliomargarita. C. akajimensis is an obligately aerobic, Gram-negative, non-spore-forming, non-motile, spherical bacterium that was isolated from seawater surrounding the hard coral Galaxea fascicularis. C. akajimensis is of special interest because of its phylogenetic position in a genomically understudied region of bacterial diversity. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first complete genome sequence of a member of the family Puniceicoccaceae. The 3,750,771 bp long genome with its 3,137 protein-coding and 55 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.


Introduction
Strain 04OKA010-24T (DSM 45221 = JCM 23193 = KCTC 12865) is the type strain of the species Coraliomargarita akajimensis and was first described in 2007 by Yoon et al. [1]. Strain 04OKA010-24T was isolated from seawater surrounding the hard coral Galaxea fascicularis L., collected at Majanohama, Akajima, Okinawa, Japan. Yoon et al. considered C. akajimensis 04OKA010-24T to represent a novel species in a new genus belonging to subdivision 4 of the phylum Verrucomicrobia. Based on 16S rRNA sequences, the phylum Verrucomicrobia has been divided into five subdivisions [2]. In the second edition of Bergey's Manual of Systematic Bacteriology, three subdivisions were included at the rank of family: 'Verrucomicrobiaceae' (subdivision 1), 'Xiphinematobacteriaceae' (subdivision 2) and 'Opitutaceae' (subdivision 4) [3]. Three species had been identified in subdivision 4: Opitutus terrae [4][5][6], isolated from soil, and the marine bacteria 'Fucophilus fucoidanolyticus' [7], isolated from a sea cucumber, and Alterococcus agarolyticus [8], isolated from a hot spring and originally misclassified as a member of the Gammaproteobacteria.
In 2007, coincident with the description of C. akajimensis, the class Opitutae was proposed for the classification of species belonging to subdivision 4 of the phylum 'Verrucomicrobia' [9]. The class comprises two orders: Puniceicoccales, containing the family Puniceicoccaceae, and Opitutales, containing the family Opitutaceae. Besides the genus Coraliomargarita [1], the genera Cerasicoccus [10], Pelagicoccus [11] and Puniceicoccus [9] belong to the family Puniceicoccaceae. Here we present a summary classification and a set of features for C. akajimensis 04OKA010-24T, together with a description of the complete genome sequencing and annotation.

Classification and features
Within the class Opitutae, C. akajimensis 04OKA010-24T shares the highest degree of 16S rRNA gene sequence similarity with Puniceicoccus vermicola (88.3%), isolated from the digestive tract of a marine clamworm [5], and Pelagicoccus croceus (87.6%) [12], whereas the other members of the class share 84.1 to 87.2% sequence similarity [13]. 'Lentimonas marisflavi' and 'Fucophilus fucoidanolyticus', whose names are not yet validly published, are the most closely related cultivable strains (94.0% sequence similarity). 'Fucophilus fucoidanolyticus' was isolated from sea cucumbers (Stichopus japonicus) and is able to degrade fucoidan [14]. GenBank also contains a large number of 16S rRNA sequences with reasonably high sequence similarity from phylotypes (uncultured bacteria), reflecting the difficulty of efficiently culturing bacteria from the class Opitutae. However, only a few sequences from genomic and marine metagenomic surveys surpass 90% sequence similarity, indicating that members of the genus Coraliomargarita are not widely distributed globally in the habitats screened thus far (status April 2010). Figure 1 shows the phylogenetic neighborhood of C. akajimensis 04OKA010-24T in a 16S rRNA based tree. The two copies of the 16S rRNA gene in the genome are identical to the previously published sequence generated from DSM 45221 (AB266750).
Figure 1. Phylogenetic tree highlighting the position of C. akajimensis 04OKA010-24T relative to the other type strains within the phylum Verrucomicrobia. The tree was inferred from 1,373 aligned characters [15,16] of the 16S rRNA gene sequence under the maximum likelihood criterion [17] and rooted in accordance with the current taxonomy [18]. The branches are scaled in terms of the expected number of substitutions per site. Numbers above branches are support values from 300 bootstrap replicates [19] if larger than 60%. Lineages with type strain genome sequencing projects registered in GOLD [20] are shown in blue (Akkermansia muciniphila CP001071, Opitutus terrae CP001032), published genomes in bold.
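The similarity percentages quoted above are pairwise identities between aligned 16S rRNA gene sequences. As a minimal sketch of how such a value can be computed, the Python function below counts matching positions over the alignment columns in which neither sequence has a gap. The function name, the gap-handling convention and the toy sequences are assumptions made for illustration; they are not the procedure or the data used in this study.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither aligned sequence has a gap.

    Assumption: gaps are written as '-' and gap-containing columns are skipped.
    Other conventions (e.g. counting gaps as mismatches) give different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns containing a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments, made up for illustration (not real 16S rRNA data).
toy_a = "ACGTACGT-ACG"
toy_b = "ACGTTCGTAACG"
identity = percent_identity(toy_a, toy_b)
print(f"{identity:.1f}% identity")
```

In this toy case 11 columns are gap-free and 10 of them match, so the reported identity is 10/11 of 100%.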

Genome sequencing and annotation
Genome project history
This organism was selected for sequencing on the basis of its phylogenetic position [27], and is part of the Genomic Encyclopedia of Bacteria and Archaea project [28]. The genome project is deposited in the Genome OnLine Database [20] and the complete genome sequence is deposited in GenBank. Sequencing, finishing and annotation were performed by the DOE Joint Genome Institute (JGI). A summary of the project information is shown in Table 2.

Growth conditions and DNA isolation
C. akajimensis 04OKA010-24T, DSM 45221, was grown in DSMZ medium 514 (bacto marine growth medium) [29] at 25°C. DNA was isolated from 0.5-1 g of cell paste using a MasterPure Gram Positive DNA purification kit (Epicentre MGP04100), adding 5 µl mutanolysin to the standard lysis solution for 40 min at 37°C and a final 35 min incubation on ice after the MPC step.
Altitude not reported. Evidence codes - IDA: Inferred from Direct Assay (first time in publication); TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [26]. If the evidence code is IDA, then the property was directly observed by one of the authors or an expert mentioned in the acknowledgements.
consed -a 50 -l 350 -g -m -ml 20. The initial Newbler assembly was converted into a phrap assembly by making fake reads from the consensus and collecting the read pairs in the 454 paired end library. Illumina sequencing data was assembled with Velvet [30], and the consensus sequences were shredded into 1.5 kb overlapped fake reads and assembled together with the 454 data. The Phred/Phrap/Consed software package was used for sequence assembly and quality assessment in the following finishing process. After the shotgun stage, reads were assembled with parallel phrap (High Performance Software, LLC). Possible misassemblies were corrected with gapResolution, Dupfinisher, or sequencing cloned bridging PCR fragments with subcloning or transposon bombing [31]. Gaps between contigs were closed by editing in Consed, by PCR and by Bubble PCR primer walks (J-F. Cheng, unpublished). A total of 297 additional Sanger reactions were necessary to close gaps and to raise the quality of the finished sequence.
Illumina reads were also used to improve the final consensus quality using Polisher [32]. The error rate of the completed genome sequence is less than 1 in 100,000.
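Two quantities in the assembly description can be illustrated with a short Python sketch. The first function shreds a consensus sequence into overlapping 1.5 kb pseudo-reads; the text states the fragment length but not the overlap, so the 500 bp overlap used here is a hypothetical value, as are the function names. The second function converts the stated error-rate bound into the equivalent Phred-scaled quality using the standard definition Q = -10 log10(P). This is an illustration under those assumptions, not the JGI pipeline code.

```python
import math


def shred(consensus: str, fragment_len: int = 1500, overlap: int = 500) -> list:
    """Cut a consensus sequence into overlapping fragments ("fake reads").

    fragment_len = 1500 follows the 1.5 kb stated in the text; the overlap
    of 500 bp is an assumed value chosen only for this illustration.
    """
    if not 0 <= overlap < fragment_len:
        raise ValueError("overlap must be non-negative and smaller than fragment_len")
    step = fragment_len - overlap
    fragments = []
    start = 0
    while start < len(consensus):
        fragments.append(consensus[start:start + fragment_len])
        if start + fragment_len >= len(consensus):
            break  # the last fragment reached the end of the consensus
        start += step
    return fragments


def phred_quality(error_rate: float) -> float:
    """Phred-scaled quality for a given per-base error probability."""
    return -10.0 * math.log10(error_rate)


# Toy consensus of 4,000 bp (made up; not real sequence data).
toy_consensus = "ACGT" * 1000
fake_reads = shred(toy_consensus)

# Error-rate bound stated in the text: fewer than 1 error in 100,000 bases.
finished_quality = phred_quality(1 / 100_000)
max_expected_errors = 3_750_771 / 100_000  # upper bound over the whole genome
```

Under the stated bound, the finished sequence corresponds to a Phred-scaled quality of about 50, or fewer than roughly 38 expected base errors across the 3,750,771 bp genome.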

Genome annotation
Genes were identified using Prodigal [33] as part of the Oak Ridge National Laboratory genome annotation pipeline, followed by a round of manual curation using the JGI GenePRIMP pipeline [34]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database and the UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. Additional gene prediction analysis and functional annotation were performed within the Integrated Microbial Genomes - Expert Review (IMG-ER) platform [35].

Genome properties
The genome is 3,750,771 bp long and comprises one main circular chromosome with a 53.6% GC content (Table 3 and Figure 3). Of the 3,192 genes predicted, 3,137 were protein-coding genes and 55 were RNA genes. Seventeen pseudogenes were also identified. The majority of the protein-coding genes (63.6%) were assigned a putative function, while the remaining ones were annotated as hypothetical proteins. The distribution of genes into COG functional categories is presented in Table 4.
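The figures in this paragraph can be cross-checked with a few lines of arithmetic. The Python sketch below uses only the numbers stated above; the variable names are illustrative, and the count of genes with a putative function is back-calculated from the rounded 63.6%, so it is approximate rather than the exact tabulated value.

```python
# Values stated in the text.
genome_bp = 3_750_771
protein_coding = 3_137
rna_genes = 55

# Total predicted genes: protein-coding plus RNA genes.
total_genes = protein_coding + rna_genes

# Approximate split of protein-coding genes, derived from the rounded 63.6%.
with_putative_function = round(protein_coding * 0.636)
hypothetical = protein_coding - with_putative_function

# Average spacing of genes along the chromosome.
bp_per_gene = genome_bp / total_genes

print(total_genes, with_putative_function, hypothetical, round(bp_per_gene))
```

The sum reproduces the 3,192 predicted genes, and the genome carries on average about one gene per 1.2 kb.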

Insights from genome sequence
With 94% identity based on 16S rRNA analysis, 'F. fucoidanolyticus' is one of the most closely related cultivated organisms to C. akajimensis. Sakai and colleagues reported the existence of intracellular α-L-fucosidases and sulfatases, which enable 'F. fucoidanolyticus' to degrade fucoidan [14]. This fucoidan-degrading ability could be shared by C. akajimensis, as the annotation of the genome sequence revealed 49 sulfatases and 12 α-L-fucosidases belonging to glycoside hydrolase family 29. Furthermore, 12 β-agarases are encoded in the genome of C. akajimensis, which is not in accordance with Yoon et al., who reported that agar was not hydrolyzed by C. akajimensis [1]. Forty-two genes coding for transcriptional regulators belonging to the AraC family were found in C. akajimensis. It is noteworthy that the genes coding for the AraC-family regulators, agarases, sulfatases and α-L-fucosidases are unevenly distributed over the genome, with most of them located in approximately the first third of the genome (bp 33,731-1,412,308). The genes for several fucosidases and sulfatases are clustered, and their expression might be under the control of an AraC-family regulator. In addition to C. akajimensis, the genomes of only two other members of the Opitutae have been sequenced (but not yet published): Opitutus terrae, an obligately anaerobic, motile bacterium isolated from rice paddy soil microcosms [6], and Opitutaceae bacterium TAV2, isolated from the gut of a wood-feeding termite. Because these three sequenced organisms are only distantly related, a comparison of their genomes seems to be of limited use. The reported characteristic differences between the Opitutae [1] are partly reflected in the now known genome sequence. In the case of the motile bacterium O. terrae, 36 proteins belonging to the COG pathway 'flagellum structure and biogenesis' are predicted, whereas in the genome of the non-motile C. akajimensis no proteins belonging to this category are encoded.
Another characteristic feature is the ability to reduce nitrate. [14]. In the case of starch hydrolysis, the genome data match the experimental data previously reported. O. terrae, reported to be starch-hydrolyzing, encodes one α-amylase and three proteins containing α-amylase domains. For C. akajimensis, starch hydrolysis is not reported, and only one gene that could encode an α-amylase was identified in the genome.
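The statement that the regulator and hydrolase genes are concentrated in the first third of the genome can be checked against the stated coordinates. The Python sketch below computes what fraction of the 3,750,771 bp chromosome the region bp 33,731-1,412,308 covers. Treating the coordinates as 1-based and inclusive is an assumption, and the variable names are illustrative.

```python
# Values stated in the text.
genome_bp = 3_750_771
region_start = 33_731
region_end = 1_412_308

# Length of the region, assuming 1-based inclusive coordinates.
region_len = region_end - region_start + 1

# Share of the chromosome covered by the region, and the position of its end.
region_fraction = region_len / genome_bp
end_fraction = region_end / genome_bp

print(f"region length: {region_len:,} bp")
print(f"region covers {region_fraction:.1%} of the genome")
print(f"region ends at {end_fraction:.1%} of the genome length")
```

Under these assumptions the region spans about 36.8% of the chromosome and ends at about 37.7% of its length, which is slightly more than one third.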