High-quality draft genome sequence of Flavobacterium suncheonense GH29-5T (DSM 17707T) isolated from greenhouse soil in South Korea, and emended description of Flavobacterium suncheonense GH29-5T

Flavobacterium suncheonense is a member of the family Flavobacteriaceae in the phylum Bacteroidetes. Strain GH29-5T (DSM 17707T) was isolated from greenhouse soil in Suncheon, South Korea. F. suncheonense GH29-5T is part of the Genomic Encyclopedia of Bacteria and Archaea project. The 2,880,663 bp long draft genome consists of 54 scaffolds with 2739 protein-coding genes and 82 RNA genes. The genome of strain GH29-5T has 117 genes encoding peptidases but only a small number of genes encoding carbohydrate active enzymes (51 CAZymes). Metallo and serine peptidases were found most frequently. Among CAZymes, eight glycoside hydrolase families, nine glycosyl transferase families, two carbohydrate binding module families and four carbohydrate esterase families were identified. Surprisingly, polysaccharide utilization loci (PULs) were not found in strain GH29-5T. Based on the coherent physiological and genomic characteristics we suggest that F. suncheonense GH29-5T feeds on proteins rather than on saccharides and lipids.


Introduction
Flavobacteria/Cytophagia have been frequently observed in aquatic and soil habitats [1][2][3] and play a major role in polysaccharide decomposition [2,4,5]. Type strains of the genus Flavobacterium have been isolated from many different habitats such as fresh water, sea ice and soil, and some Flavobacterium strains are pathogenic to humans and animals [2,6]. Strain GH29-5 T (= DSM 17707 T = CIP 109901 T = KACC 11423 T ) is the type strain of Flavobacterium suncheonense [2,7], which belongs to the family Flavobacteriaceae [8]. F. suncheonense GH29-5 T was isolated from greenhouse soil in Korea [10]. Flavobacterium johnsoniae UW101 T , a well-studied model organism, was also isolated from soil [11,12] and harbors a considerable number of CAZymes and PULs [13]. Thus, an investigation of the genome of strain GH29-5 T will give further insights into the variety of CAZymes and the polysaccharide decomposition potential of this microorganism.
Here we present the set of carbohydrate active enzymes, polysaccharide utilization loci and peptidases of F. suncheonense GH29-5 T , together with a set of phenotypic features and the description and annotation of the high-quality draft genome sequence from a culture of DSM 17707 T .

Classification and features
The sequence of the single 16S rRNA gene copy in the genome is identical to the previously published 16S rRNA gene sequence (DQ222428). Figure 1 shows the phylogenetic neighborhood of F. suncheonense GH29-5 T inferred from a tree of 16S rRNA gene sequences, as previously described [14]. The most closely related type strains are F. cauense R2A-7 T (EU521691), F. enshiense DK69 T (JN790956), F. limnosediminis JC2902 T (JQ928688) and F. saliperosum S13 T (DQ021903) with less than 95.9 % 16S rRNA gene identity. The 16S rRNA gene sequence of strain GH29-5 T has an identity of only 93.9 % with F. aquatile DSM 1132 T (AM230485).
The 16S rRNA gene sequence of F. suncheonense GH29-5 T was compared with the Greengenes database [15]. Considering the best 100 hits, 99 sequences belonged to Flavobacterium and one sequence to Cytophaga sp. (X85210). Among the most frequent keywords within the labels of environmental samples were 40.4 % marine habitats (such as marine sediment, deep sea, seawater, whale fall, diatom/phytoplankton bloom, Sargasso Sea, sponge, sea urchin, bacterioplankton), 12.3 % soil habitats (such as rhizosphere, grassland, compost), 11.6 % freshwater habitats (such as lake, riverine sediment, groundwater), 8.9 % cold environments (such as Antarctic/Arctic seawater, lake ice or sediment), but also 2.7 % wastewater habitats. Interestingly, environmental 16S rRNA gene sequences with 99 % sequence identity with F. suncheonense GH29-5 T were clones from a wetland in France (KC432449) [16] and an enrichment culture of heterotrophic soil bacteria from the Netherlands (JQ855723), and a sequence with 98 % identity came from a soil isolate from Taiwan (DQ239767).
As described for Flavobacterium [17], F. suncheonense GH29-5 T stains Gram-negative (Table 1). The colonies are convex, round and yellow, but flexirubin-type pigments are absent and gliding motility was not observed [10]. The strain is positive for the catalase and oxidase tests [10], as are most members of the genus Flavobacterium [6]. Cells divide by binary fission, possess appendages and occur either as single rod-shaped cells, 0.3 μm in width and 1.5-2.5 μm in length, or as filaments (Fig. 2).

Genome project history
This strain was selected for sequencing on the basis of its phylogenetic position [20,21], and is part of the Genomic Encyclopedia of Type Strains, Phase I: the one thousand microbial genomes (KMG) project [22], a follow-up of the Genomic Encyclopedia of Bacteria and Archaea (GEBA) pilot project [23], which aims at sequencing key reference microbial genomes and generating a large genomic basis for the discovery of genes encoding novel enzymes [24]. KMG-I is part of the "Genomic Encyclopedia of Bacteria and Archaea: sequencing a myriad of type strains" initiative [25] and a Genomic Standards Consortium project [26]. The genome project is deposited in the Genomes OnLine Database [27] and the permanent draft genome sequence is deposited in GenBank. Sequencing, finishing and annotation were performed by the DOE-JGI using state-of-the-art sequencing technology [28]. A summary of the project information is shown in Table 2.

Growth conditions and genomic DNA preparation
A culture of GH29-5 T (DSM 17707) was grown aerobically in DSMZ medium 830 (R2A Medium) [29] at 28°C. Genomic DNA was isolated using a Jetflex Genomic DNA Purification Kit (GENOMED 600100) following the standard protocol provided by the manufacturer. DNA is available from the DSMZ through the DNA Bank Network [30].

Genome sequencing and assembly
The draft genome of strain GH29-5 T was generated using the Illumina technology [31]. An Illumina Std. shotgun library was constructed and sequenced using the Illumina HiSeq 2000 platform which generated 9,392,462 reads totaling 1408.9 Mbp (Table 3). All general aspects of library construction and sequencing performed at the DOE-JGI can be found at [32]. All raw sequence data were passed through DUK, a filtering program developed at DOE-JGI, which removes known Illumina sequencing and library preparation artifacts (Mingkun L, Copeland A, Han J: DUK. unpublished 2011). The following steps were performed for assembly: (1) filtered reads were assembled using Velvet [33], (2) 1-3 Kbp simulated paired-end reads were created from Velvet contigs using wgsim [34], (3) sequence reads were assembled with simulated read pairs using Allpaths-LG [35]. Parameters for assembly steps were: (1) Velvet ("velveth 63 -shortPaired" and "velvetg -very_clean yes -exportFiltered yes -min_contig_lgth 500 -scaffolding no -cov_cutoff 10"), (2)
Fig. 1 Phylogenetic tree of the genus Flavobacterium and its most closely related genus Capnocytophaga. Modified from Hahnke et al. [68]. In short: the tree was inferred from 1254 aligned characters of the 16S rRNA gene sequence under the maximum likelihood (ML) criterion. The branches are scaled in terms of the expected number of substitutions per site. Numbers adjacent to the branches are support values from 1000 ML bootstrap replicates (left) and from 1000 maximum-parsimony bootstrap replicates (right) if larger than 60 %.
Evidence codes: IDA, inferred from direct assay (first time in publication); TAS, traceable author statement (i.e., a direct report exists in the literature); NAS, non-traceable author statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). Evidence codes are from the Gene Ontology project [18].
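As a quick plausibility check on the read statistics reported for the Illumina library, the following minimal Python sketch derives the mean read length and the nominal sequencing depth from the read count, the total yield and the draft assembly size. The function and variable names are illustrative assumptions and are not part of the DOE-JGI pipeline; the depth is a raw figure before DUK filtering, not the coverage of the final assembly.

```python
# Values reported in the text; names are illustrative, not from the JGI pipeline.
READS = 9_392_462            # Illumina reads generated
YIELD_BP = 1408.9e6          # total yield, 1408.9 Mbp
ASSEMBLY_BP = 2_880_663      # draft assembly size in bp


def mean_read_length(total_bp: float, n_reads: int) -> float:
    """Average read length in bp."""
    return total_bp / n_reads


def nominal_depth(total_bp: float, genome_bp: int) -> float:
    """Raw sequencing depth (fold coverage) before any read filtering."""
    return total_bp / genome_bp


read_len = mean_read_length(YIELD_BP, READS)
depth = nominal_depth(YIELD_BP, ASSEMBLY_BP)
print(f"mean read length: {read_len:.1f} bp")
print(f"nominal depth: {depth:.0f}x")
```

The numbers correspond to reads of about 150 bp and a raw depth of several hundred fold, which is consistent with a coverage cutoff of 10 in the Velvet step being a conservative filter.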

Genome annotation
Genes were identified using Prodigal [36] as part of the DOE-JGI genome annotation pipeline [37], followed by manual curation using the DOE-JGI GenePRIMP pipeline [38]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information non-redundant database and the UniProt, TIGR-Fam, Pfam, PRIAM, KEGG, COG, and InterPro databases. These data sources were combined to assert a product description for each predicted protein. Additional gene prediction analysis and functional annotation were performed within the IMG-ER platform [39].

Genome properties
The assembly of the draft genome sequence consists of 54 scaffolds amounting to 2,880,663 bp. The G + C content is 40.5 % (Table 3), which is 1.5 % higher than previously reported by Kim et al. [10] and thus exceeds the maximal range observed among strains belonging to the same species [40]. Of the 2821 genes predicted, 2739 were protein-coding genes and 82 were RNA genes. The majority of the protein-coding genes (69.2 %) were assigned a putative function while the remaining ones were annotated as hypothetical proteins. The distribution of genes into COG functional categories is presented in Table 4.
An estimate of the overall similarity between F. suncheonense and the five reference strains was obtained using the Genome-to-Genome Distance Calculator (GGDC 2.0) [41,42]. It reports model-based DDH estimates (digital DDH or dDDH) along with their confidence intervals [42], which allow for genome-based species and subspecies delineation. The recommended distance formula 2 is robust against the use of incomplete genome sequences and is thus especially suited for this dataset.
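The G + C content and gene tallies reported in this section are simple to recompute from an assembly. The sketch below is a minimal illustration, assuming the scaffold sequences are already loaded as plain strings; the toy sequences and the function name `gc_percent` are hypothetical and do not reproduce the DOE-JGI procedure.

```python
from typing import Iterable


def gc_percent(sequences: Iterable[str]) -> float:
    """G + C content in percent over all unambiguous bases (A, C, G, T)."""
    gc = at = 0
    for seq in sequences:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        at += s.count("A") + s.count("T")
    if gc + at == 0:
        raise ValueError("no unambiguous bases found")
    return 100.0 * gc / (gc + at)


# Toy scaffolds (hypothetical); N and other ambiguity codes are ignored.
toy_scaffolds = ["ATGCGC", "ATATNN", "GGCCAT"]
print(f"G+C: {gc_percent(toy_scaffolds):.1f} %")

# Gene tallies reported in the text: protein-coding plus RNA genes give the total.
protein_coding, rna_genes, total_genes = 2739, 82, 2821
print("tallies consistent:", protein_coding + rna_genes == total_genes)
```

Excluding ambiguous bases from the denominator matters for draft assemblies, where scaffolds may contain runs of N that would otherwise lower the apparent G + C content.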
The result of this comparison is shown in Table 5 and yields dDDH values below 22 % throughout, which confirms the expected status of distinct species. Furthermore, the G + C content was calculated from the genome sequences of the above strains and their pairwise differences were assessed with respect to F. suncheonense. Differences were 2.4 % (F. cauense), 2.8 % (F. enshiense), 1 % (F. saliperosum), 9.1 % (F. columnare) and 8.3 % (F. aquatile). These differences support the status of distinct species because, if computed from genome sequences, the G + C content varies by no more than 1 % within species [40].
Digital DDH values (dDDH) and the respective confidence intervals (C.I.) are specified for GGDC's recommended formula 2. The columns "HSP length / total length [%]", "identities / HSP length [%]" and "identities / total length [%]" list similarities as calculated from the intergenomic distances, which were also reported by the GGDC (Formulae 1-3).
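The species-delineation argument in this section reduces to two threshold checks. The following sketch applies them to the values quoted in the text. The 1 % within-species G + C limit is taken from [40], whereas the 70 % dDDH species boundary is the conventional DDH cutoff and is an assumption here, since the text does not state it. All names are illustrative.

```python
GC_LIMIT = 1.0        # max within-species G+C difference (%), per [40]
DDDH_SPECIES = 70.0   # conventional DDH species boundary (%); assumed, not from the text

# G+C differences (%) relative to F. suncheonense, as listed in the text.
gc_diff = {
    "F. cauense": 2.4,
    "F. enshiense": 2.8,
    "F. saliperosum": 1.0,
    "F. columnare": 9.1,
    "F. aquatile": 8.3,
}


def exceeds_gc_limit(diff: float, limit: float = GC_LIMIT) -> bool:
    """True if the G+C difference is strictly above the within-species limit."""
    return diff > limit


above = sorted(name for name, d in gc_diff.items() if exceeds_gc_limit(d))
at_limit = sorted(name for name, d in gc_diff.items() if not exceeds_gc_limit(d))
print("above limit:", above)
print("at or below limit:", at_limit)

# All dDDH estimates are reported as below 22 %, far under the species boundary.
DDDH_UPPER_BOUND = 22.0
print("distinct species by dDDH:", DDDH_UPPER_BOUND < DDDH_SPECIES)
```

With the rounded values from the text, F. saliperosum sits at the 1 % limit rather than above it, so for that pair the dDDH estimate carries most of the argument.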

Gliding motility
McBride and Zhu [43] described the diversity of genes involved in gliding motility among members of the phylum Bacteroidetes. The machinery for gliding motility is composed of adhesin-like proteins, the type IX secretion system, and additional proteins [43]. Even though strain GH29-5 T was never observed to glide [10], all genes necessary for gliding motility were identified in its genome (Table 6).

Carbohydrate active enzymes and peptidases
Cottrell and Kirchman [44] showed that members of the Cytophaga-Flavobacteria group preferentially consume polysaccharides and proteins rather than amino acids.
This phenotypic feature was attributed by Fernández-Gómez et al. [4] to higher numbers of peptidases and additionally higher numbers of glycoside hydrolases and carbohydrate-binding modules in the genomes of Bacteroidetes compared to other bacteria. F. suncheonense GH29-5 T was isolated from greenhouse soil and hydrolyzes casein and gelatin, but did not utilize any of the tested saccharides [10,19]. Therefore, we compared the predicted CDSs against the CAZyme [45,46] and dbCAN [47] databases. The CAZyme annotation (Additional file 1, Table S1) was a combination of RAPSearch2 search [48,49] and HMMER scanning [50] as described in Hahnke et al. [14]. The genome of strain GH29-5 T comprised a small number of carbohydrate active enzymes (49) including 36 glycosyl transferases, nine glycoside hydrolases, four carbohydrate binding modules and six carbohydrate esterases (Table 7). Furthermore, sulfatases were suggested as important enzymes for the metabolic potential of Bacteroidetes to degrade sulfated algal polysaccharides such as carrageenan, agarans and fucans. Only three sulfatases were identified in the genome of strain GH29-5 T (Additional file 1, Table S2).

Polysaccharide utilization loci
CAZymes of Flavobacteria that are suggested to be involved in polysaccharide decomposition are frequently observed to be organized in gene clusters. Such polysaccharide utilization loci (PULs) consist of a TonB-dependent receptor, a SusD-like protein and carbohydrate active enzymes [51,52]. In strain GH29-5 T five TonB-dependent transporters were identified, of which G498_00119, G498_01595 and G498_02575 were associated with siderophores and G498_00706 and G498_00915 were associated with a SusD-like protein. The gene cluster upstream of the TonB-dependent transporter G498_00706 comprised five hypothetical proteins.
[14]. The genome of strain GH29-5 T comprised 117 identified peptidase genes (or homologues), mostly serine peptidases (S, 50), metallo peptidases (M, 50) and cysteine peptidases (C, 14) (Table 8, Additional file 1: Tables S3 and S4). Hence, the low number of carbohydrate active enzymes and the high number of peptidases in the genome of strain GH29-5 T reflect its currently known substrate range being proteins rather than saccharides.
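The PUL definition used in this section (a TonB-dependent receptor next to a SusD-like protein, with carbohydrate active enzymes nearby) can be expressed as a simple scan over an ordered gene list. The sketch below is only an illustration of that idea: the function name, the coarse annotation labels, the window size and the toy locus tags are all hypothetical, and this is not the method used for the annotation reported here.

```python
from typing import List, Tuple

Gene = Tuple[str, str]  # (locus tag, coarse annotation label)


def find_candidate_puls(
    genes: List[Gene], window: int = 5
) -> List[Tuple[str, str, List[str]]]:
    """Return (tonb_tag, susd_tag, cazyme_tags) for each adjacent TonB/SusD
    pair that has at least one CAZyme within `window` genes on either side."""
    candidates = []
    for i in range(len(genes) - 1):
        (tag_a, ann_a), (tag_b, ann_b) = genes[i], genes[i + 1]
        if {ann_a, ann_b} != {"TonB", "SusD"}:
            continue
        lo = max(0, i - window)
        hi = min(len(genes), i + 2 + window)
        cazymes = [tag for tag, ann in genes[lo:hi] if ann == "CAZyme"]
        if cazymes:
            tonb, susd = (tag_a, tag_b) if ann_a == "TonB" else (tag_b, tag_a)
            candidates.append((tonb, susd, cazymes))
    return candidates


# Toy gene orders with hypothetical locus tags.
with_cazyme = [("g1", "hypothetical"), ("g2", "TonB"), ("g3", "SusD"),
               ("g4", "CAZyme"), ("g5", "hypothetical")]
only_hypothetical = [("h1", "hypothetical"), ("h2", "hypothetical"),
                     ("h3", "TonB"), ("h4", "SusD"), ("h5", "hypothetical")]

print(find_candidate_puls(with_cazyme))
print(find_candidate_puls(only_hypothetical))
```

The second toy case mirrors the situation described for strain GH29-5 T: a TonB-dependent transporter paired with a SusD-like protein but flanked only by hypothetical proteins does not qualify as a PUL under this definition.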

Conclusions
The genome of F. suncheonense GH29-5 T contains a relatively low number of carbohydrate active enzymes in contrast to genomes of other Flavobacteriaceae such as Flavobacterium branchiophilum [54], Flavobacterium rivuli [14], Formosa agariphila [55], Polaribacter [4,56], 'Gramella forsetii' [57] and Zobellia galactanivorans [17]. This is surprising, since greenhouse soil might be a rich source of plant litter. McBride et al. [13] described the genome features of Flavobacterium johnsoniae UW101 T , a bacterium that was also isolated from soil [11,58]. The genomes of F. johnsoniae UW101 T and F. suncheonense GH29-5 T have similar numbers of peptidases, 31 and 39 per Mbp, respectively. The genomes, however, differ remarkably in the number of CAZymes, with 47 genes per Mbp in the genome of F. johnsoniae UW101 T and only 18 genes per Mbp in the genome of F. suncheonense GH29-5 T . Thus, this small set of CAZymes contributes little to the pool of enzymes that might be essential for a Flavobacterium to feed on soil components. A systematic collection of genome sequences, such as GEBA [23] and KMG-I [22], will provide the scientific community with the possibility for a systematic discovery of genes encoding novel enzymes [24] and support microbial taxonomy. In addition, genome sequences also provide further taxonomically useful information such as the G + C content [40], which, as seen in this report, might differ significantly from the values determined with traditional methods.
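The per-Mbp figures used in this comparison are gene counts normalised by genome size. As a minimal sketch, the CAZyme density of strain GH29-5 T can be recomputed from the CAZyme count given in the abstract and the draft genome size; the function name is an illustrative assumption, and the fold difference simply uses the two densities quoted in the text.

```python
def genes_per_mbp(gene_count: int, genome_bp: int) -> float:
    """Gene density normalised to one megabase pair."""
    return gene_count / (genome_bp / 1_000_000)


GENOME_BP = 2_880_663   # draft genome size of strain GH29-5T
CAZYMES = 51            # CAZyme count given in the abstract

density = genes_per_mbp(CAZYMES, GENOME_BP)
print(f"CAZymes per Mbp: {density:.1f}")

# Fold difference between the densities quoted for
# F. johnsoniae UW101T (47 per Mbp) and strain GH29-5T (18 per Mbp).
fold = 47 / 18
print(f"fold difference: {fold:.1f}")
```

Normalising by genome size keeps the comparison fair between genomes of different lengths, since a larger genome would be expected to carry more genes of any class.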
Based on the observed large difference in the DNA G + C content and the additional information on cell morphology obtained in this study, an emended description of F. suncheonense is proposed. The description of Flavobacterium suncheonense is as given by Kim et al. [10] and Dong et al. [7], with the following modification: the DNA G + C content is 40.5 mol%; and the following amendment: cells possess appendages of 50-80 nm in diameter and 0.5-8 μm in length.