Complete Genome Sequence of Phreatobacter sp. Strain NMCR1094, a Formate-Utilizing Bacterium Isolated from a Freshwater Stream

Phreatobacter sp. strain NMCR1094 was isolated from a freshwater stream. In this study, we report the complete genome sequence of strain NMCR1094, which contains 4,974,952 bp with 65.8% G+C content and 4,701 predicted coding sequences. In particular, the Phreatobacter sp. NMCR1094 genome contains a formate dehydrogenase region.

The type species Phreatobacter oligotrophus, belonging to the genus Phreatobacter of the class Alphaproteobacteria, was originally isolated from ultrapure water in a water storage tank (1). At present, three species with validly published names belong to this genus (http://www.bacterio.net/). Members of the genus Phreatobacter are strictly aerobic, motile, Gram-negative rods (1–3).
Phreatobacter sp. strain NMCR1094 (=FBCC-B2502 =KACC 19706 =NBRC 113394) was isolated from the surface water of a freshwater stream in Yeongdeok, Republic of Korea (36°24′41.3″N, 129°21′51.0″E), using a standard dilution plating method on R2A agar (BD Difco) medium. Strain NMCR1094 is a novel Gram-negative, aerobic, rod-shaped bacterium that is motile by means of a polar flagellum. The 16S rRNA gene sequence of strain NMCR1094 was obtained from the complete genome sequence and compared with sequences in the EzBioCloud database (4). This comparison showed that strain NMCR1094 is most closely related to Phreatobacter cathodiphilus (98.7% similarity), followed by Phreatobacter stygius (98.5%) and Phreatobacter oligotrophus (98.4%). In the phylogenetic tree based on 16S rRNA gene sequences, strain NMCR1094, P. cathodiphilus S-12ᵀ, P. stygius YC6-17ᵀ, and P. oligotrophus PI_21ᵀ formed a robust clade with high bootstrap values, indicating that strain NMCR1094 is a member of the genus Phreatobacter. The aim of the present study was to sequence the genome of strain NMCR1094 in order to elucidate its metabolic potential and taxonomic position.
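As an illustration of how a pairwise 16S rRNA gene similarity value of the kind quoted above can be derived, the following minimal Python sketch computes percent identity between two sequences that have already been aligned. It is not the EzBioCloud procedure, whose exact alignment and scoring steps are not described here. The function name `percent_identity`, the choice to ignore gapped columns, and the toy sequences are assumptions made for illustration only.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences.

    Assumption: columns in which either sequence has a gap ('-') are
    skipped, so identity is computed over ungapped columns only.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment, not real 16S rRNA gene data.
query = "ACGTTGCA-ACGT"
reference = "ACGTTGCATACGA"
print(f"{percent_identity(query, reference):.1f}% identity")
```

With full-length 16S rRNA gene sequences, the same calculation would be applied to an alignment produced by a dedicated aligner.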
Strain NMCR1094 was grown aerobically at 25°C on the R2A agar medium used for isolation of the pure culture. For sequencing of the complete genome, genomic DNA from strain NMCR1094 was extracted and further purified using the DNeasy blood and tissue kit (Qiagen) and the Wizard genomic DNA purification kit (Promega), respectively. Sequencing was performed on the PacBio RS II platform (Pacific Biosciences, USA) using one single-molecule real-time (SMRT) cell at DNA Link (Seoul, South Korea), producing 217,315 bp of long reads and 1,851,789,035 bp after subread filtering. The whole-genome de novo assembly was carried out with Hierarchical Genome Assembly Process 3.0 (HGAP 3.0) (5). With an estimated genome size of 4,974,952 bp and an average coverage of 183×, preassembly generated 6,226 error-corrected long subreads (seed bases, 150,014,664 bp), which were assembled de novo into the whole-genome sequence. The HGAP process, including a polishing step, yielded a contig N50 of 4,974,952 bp and a total contig length of 4,974,952 bp. The finalized genome was circularized manually using CLC Genomics Workbench v8.0 (CLC bio, USA), and putatively ambiguous areas were visually inspected.
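The contig N50 reported above equals the total contig length because the assembly consists of a single contig. The following Python sketch shows how an N50 value is commonly computed from a list of contig lengths. The function name `n50` is an assumption for illustration, and this is not the code used by HGAP.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    The N50 is the length L such that contigs of length >= L together
    contain at least half of the total assembled bases.
    """
    if not contig_lengths:
        raise ValueError("no contig lengths given")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig downward until half the bases are covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Single-contig assembly with the length reported in the text.
print(n50([4_974_952]))
```

For an assembly with one contig, the N50, the longest contig, and the total contig length necessarily coincide, which is consistent with a closed chromosome.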
The complete genome sequence of strain NMCR1094 is composed of a single circular chromosome. Putative coding sequences (CDSs) in the assembled contigs were identified using Glimmer v3.02 (6), and open reading frames (ORFs) were obtained. These ORFs were searched against the NCBI nonredundant protein database (nr) for all species using blastall. The data were submitted to the Rapid Annotations using Subsystems Technology (RAST) server (7) and the National Center for Biotechnology Information (NCBI) genome sequence database. Potential coding sequences were identified using the Basic Local Alignment Search Tool (BLAST) against the UniProt (8), Pfam (9), and Clusters of Orthologous Groups (COGs) (10) databases. Signal peptides and transmembrane helices were predicted using SignalP 4.1 (11) and TMHMM v2.0 (12), respectively. Genes for rRNA, tRNA, and other miscellaneous features were predicted using RNAmmer v1.2 (13), tRNAscan-SE v1.21 (14), and Rfam v12.0 (15), respectively. Clustered regularly interspaced short palindromic repeats (CRISPRs) were detected automatically using MinCED v0.2.0 (16). Default parameters were used for all software programs unless otherwise noted. The carbohydrate-active enzymes and associated carbohydrate-binding modules in strain NMCR1094 were determined using the Carbohydrate-Active enZyme (CAZy) database (http://www.cazy.org/) (17).
The complete genome is 4,974,952 bp in size, with a G+C content of 65.8%. Gene prediction revealed that the genome comprises 4,701 CDSs, 48 tRNA genes, and 6 rRNA genes. The genes were classified into 21 COG functional categories. According to the annotations assigned using the CAZy database, the genome of strain NMCR1094 contains 82 carbohydrate-active enzyme genes, including 17 genes encoding glycoside hydrolases (GHs), 58 genes encoding glycosyltransferases (GTs), 5 genes encoding carbohydrate esterases (CEs), and 2 genes encoding carbohydrate-binding modules (CBMs). These enzymes may contribute to the utilization of carbohydrates. The Phreatobacter sp. NMCR1094 genome contains a formate dehydrogenase gene cluster (Fig. 1). The genes fdhF (NMCR1094_02996), fdsB (NMCR1094_02997), fdsG (NMCR1094_02998), and fdhD (NMCR1094_03003) are predicted to encode subunits of formate dehydrogenase (FDH), which catalyzes the final step in the pathway involved in the reversible conversion of formate to CO2 (18). The genes mobB (NMCR1094_02999), moeA (NMCR1094_03000), and mobA (NMCR1094_03002) are predicted to encode proteins for the synthesis of a molybdenum cofactor that is essential for the activity of most bacterial molybdoenzymes (19). This genomic information therefore provides insight into formate dehydrogenase in oligotrophic freshwater environments.
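Two of the summary figures above can be checked with simple arithmetic. The Python sketch below shows how a G+C percentage is typically computed from a nucleotide sequence and confirms that the four CAZy class counts given in the text sum to the reported total of 82. The function name `gc_percent`, the decision to ignore ambiguous bases, and the toy sequence are assumptions for illustration; the class counts are taken from the text.

```python
def gc_percent(sequence):
    """G+C content as a percentage of unambiguous bases (A, C, G, T).

    Assumption: ambiguous symbols such as N are excluded from the
    denominator.
    """
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no unambiguous bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted


# Hypothetical toy sequence, not part of the NMCR1094 genome.
print(f"{gc_percent('GCGCGATTGCCG'):.1f}% G+C")

# CAZy class counts as reported in the text.
cazy_counts = {"GH": 17, "GT": 58, "CE": 5, "CBM": 2}
cazy_total = sum(cazy_counts.values())
print(cazy_total)  # 17 + 58 + 5 + 2 = 82
```

Applied to the deposited chromosome sequence, `gc_percent` would be expected to return a value close to the reported 65.8%, although this was not run here.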
Data availability. The genome sequence of Phreatobacter sp. NMCR1094 has been deposited in GenBank under accession number CP039865. The associated BioProject and BioSample accession numbers are PRJNA533000 and SAMN11431406, respectively. The version described in this paper is the first version.