High quality draft genomic sequence of Flavobacterium enshiense DK69T and comparison among Flavobacterium genomes

Flavobacterium enshiense DK69T is a Gram-negative, aerobic, rod-shaped, non-motile and non-flagellated bacterium that belongs to the family Flavobacteriaceae in the phylum Bacteroidetes. The high-quality draft genome of strain DK69T has a size of 3,375,260 bp, a G + C content of 37.7 mol % and 2,848 protein-coding genes. In addition, we sequenced five more genomes of Flavobacterium type strains and performed a comparative genomic analysis among 12 Flavobacterium genomes. The results reveal some genes that are specific to the fish-pathogenic Flavobacterium strains, which provides information for further analysis of pathogenicity.


Organism information
Classification and features
F. enshiense DK69T is a Gram-negative, strictly aerobic, yellow-pigmented, rod-shaped bacterium isolated from soil collected at a pharmaceutical company in Enshi, Hubei province, China. The total soil C, N, P, S and Fe concentrations were 39.83, 3.34, 0.68, 0.36 and 33.80 g kg−1, respectively, and the pH was 6.97 [1]. A neighbor-joining phylogenetic tree based on the 16S rRNA gene sequences was built using MEGA 6 [19] and showed that strain DK69T clustered within a branch containing other species of the genus Flavobacterium (Fig. 1). In addition, the sequence of F. enshiense DK69T was compared with those of other sequenced strains of the family Flavobacteriaceae using BioLinux [20], and a total of 24 core protein sequences were obtained with 50 % identity and an E-value of e−10 as thresholds. A phylogenetic tree based on these 24 core protein sequences (Fig. 2) is similar to the 16S rRNA gene based tree.

Fig. 1 A NJ phylogenetic tree of the strains within the family Flavobacteriaceae based on 16S rRNA gene sequence comparisons. GenBank accession numbers are shown in parentheses. The sequences were aligned using CLUSTALX, and the phylogenetic tree was constructed with the neighbor-joining method [39] in MEGA 6 [19], with bootstrap values from 500 replicates. * represents the strains sequenced by us

Fig. 2 A NJ phylogenetic tree of the strains within the family Flavobacteriaceae based on core-protein sequence comparisons. GenBank accession numbers are shown in parentheses. * represents the strains sequenced by us
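The identity and E-value thresholds used for the core-protein comparison can be illustrated with a minimal sketch. It assumes the hits are available in the standard 12-column BLAST tabular layout, that "e−10" means 1e-10, and that both thresholds are inclusive; the function name `filter_hits` and the toy protein names are hypothetical and are not taken from the original analysis.

```python
import csv
import io

# Standard 12-column layout of BLAST tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(handle, min_identity=50.0, max_evalue=1e-10):
    """Return (query, subject) pairs that pass both thresholds.

    Assumption: a hit is kept when identity >= min_identity (percent)
    and E-value <= max_evalue.
    """
    kept = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        if (float(hit["pident"]) >= min_identity
                and float(hit["evalue"]) <= max_evalue):
            kept.append((hit["qseqid"], hit["sseqid"]))
    return kept


# Toy input: only the first hit satisfies both thresholds.
example = io.StringIO(
    "protA\tprotX\t62.5\t200\t75\t0\t1\t200\t1\t200\t1e-40\t250\n"
    "protB\tprotY\t35.0\t180\t117\t0\t1\t180\t1\t180\t1e-12\t90\n"
    "protC\tprotZ\t71.0\t50\t14\t0\t1\t50\t1\t50\t1e-05\t40\n"
)
print(filter_hits(example))
```

Only pairs that pass this kind of filter in every compared genome would be candidates for a core protein set; that further step is not shown here.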
The colonies of F. enshiense DK69T are smooth with regular edges, circular, yellowish and about 1 mm in diameter after growth on R2A agar at 28 °C for 48 h. Growth occurs at 4-32 °C and pH 6.0-8.0 on R2A and TSA, but not on NA or LB media, and NaCl is not required [1]. Cells are non-flagellated, non-spore-forming, non-motile and rod-shaped (Fig. 3). The strain is oxidase- and catalase-positive. The DNA G + C content is 34.4 mol% [1]. The general description of this strain is shown in Table 1.

Table 1 footnote: Evidence codes-IDA: Inferred from Direct Assay; TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [27]

Genome sequencing information
Genome project history
The genome of F. enshiense DK69T was sequenced by Majorbio Bio-pharm Technology Co., Ltd, Shanghai, China. The high-quality draft genome sequence was deposited at the National Center for Biotechnology Information. Contigs shorter than 200 bp were not included. The GenBank accession number is JRLZ00000000. A summary of the genome sequencing project information is shown in Table 2.
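The exclusion of contigs shorter than 200 bp can be sketched as a simple length filter over FASTA records. This is an illustration only: the helper names `read_fasta` and `drop_short_contigs` and the toy contigs are hypothetical, and the assembler and scripts actually used are not described in the text.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (name, sequence) tuples."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def drop_short_contigs(records, min_length=200):
    """Keep contigs whose length is at least min_length bp."""
    return [(name, seq) for name, seq in records if len(seq) >= min_length]


# Toy assembly: contig_2 (199 bp) is dropped, the other two are kept.
toy_fasta = (
    ">contig_1\n" + "A" * 150 + "\n" + "C" * 100 + "\n"
    ">contig_2\n" + "G" * 199 + "\n"
    ">contig_3\n" + "T" * 200 + "\n"
)
kept = drop_short_contigs(read_fasta(toy_fasta))
print([(name, len(seq)) for name, seq in kept])
```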
Growth conditions and genomic DNA preparation
F. enshiense DK69T was grown in R2A medium at 28 °C for 2 d with shaking at 160 rpm. Cells in late-log-phase growth were harvested and lysed by EDTA, lysozyme and detergent treatment, followed by proteinase K and RNase digestion. The DNA was extracted and purified using the QIAamp kit according to the manufacturer's instructions (Qiagen, Germany). The quantity of DNA was measured with a NanoDrop spectrophotometer to ensure that the DNA concentration was greater than 20 ng/μl, and 5 μg of DNA was then sent to Majorbio (Shanghai, China) for sequencing.

Genome sequencing and assembly
The Illumina HiSeq 2000 platform with a paired-end library strategy was used to determine the whole-genome sequence of F. enshiense DK69T. TruSeq DNA Sample Preparation Kits were used to prepare DNA libraries with insert sizes of 300-500 bp for single, paired-end and multiplexed sequencing.

Genome annotation
The annotation of the genomic sequences was completed using the NCBI Prokaryotic Genome Annotation Pipeline, which combines the Best-placed reference protein set and the gene caller GeneMarkS+. SignalP [30] and SOSUI [31] were used to predict signal peptides and transmembrane helices. The predicted CDSs were also searched against the Pfam protein family database [32]. BLASTP searches against the GenBank database [33] and the COG databases [34] were used to annotate the predicted protein sequences.

Table 3 footnote: The total is based on either the size of the genome in base pairs or the total number of protein coding genes in the annotated genome

Genome properties
The genome statistics are provided in Table 3 and Fig. 4. After genome annotation, the genome of F. enshiense DK69T was found to have a total length of 3,375,260 bp in 74 contigs, of which 1,273,385 bp are G + C (37.7 mol %). Of a total of 3,054 predicted genes, 2,848 are protein-coding genes and 50 are RNA genes; 57.9 % are assigned putative functions and the remainder are annotated as hypothetical proteins or proteins of unknown function. The distribution of genes into COG functional categories is shown in Table 4.
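The reported G + C percentage follows directly from the two base counts given above. The short sketch below reproduces that arithmetic and also shows the same calculation on a raw sequence; the function names are hypothetical and this is not the pipeline used to produce Table 3.

```python
def gc_percent_from_counts(gc_bases: int, total_bases: int) -> float:
    """G + C content as a percentage of the total number of bases."""
    if total_bases <= 0:
        raise ValueError("total_bases must be positive")
    return 100.0 * gc_bases / total_bases


def gc_percent(sequence: str) -> float:
    """G + C content (percent) of a nucleotide sequence, case-insensitive."""
    seq = sequence.upper()
    gc_bases = seq.count("G") + seq.count("C")
    return gc_percent_from_counts(gc_bases, len(seq))


# Counts reported for the DK69T draft genome: 1,273,385 G + C bases
# out of 3,375,260 bp in total.
print(round(gc_percent_from_counts(1_273_385, 3_375_260), 1))  # 37.7
```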

Profiles of metabolic network and pathway
The metabolic network and pathways of F. enshiense DK69T (Fig. 5) were predicted using the Kyoto Encyclopedia of Genes and Genomes (KEGG) [35]. The metabolic network showed that F. enshiense DK69T possesses the glycolysis, TCA cycle and pentose phosphate pathways and could utilize casein, tyrosine, sucrose and D-mannitol. The genome analysis results are in agreement with the phenotypes [1].

Table 4 footnote: The total is based on the total number of protein coding genes in the annotated genome

Fig. 5 Metabolic network and pathways of Flavobacterium enshiense DK69T as predicted using KEGG [35]. Green lines indicate pathways that are possessed by this strain

Comparison of the 12 Flavobacterium genomes
The genomic information of the 12 Flavobacterium genomes is summarized in Table 5. OrthoMCL [36] analysis was performed to identify the set of orthologs among the 12 Flavobacterium genomes. F. enshiense DK69T shared 1,190 genes with the other 11 Flavobacterium strains and had 437 strain-specific genes, which may contribute to its species-specific features (Fig. 6). Three of the 12 Flavobacterium strains are fish-pathogenic bacteria [6][7][8]. Using OrthoMCL [36] analysis, a total of ten proteins were found to be unique to the three fish-pathogenic species. Three of these putative proteins have been reported to be related to the pathogenicity of pathogenic bacteria: polysaccharide deacetylase [37], ABC transporter ATPase and ABC transporter permease [38] (Table 6).
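The way shared and group-specific proteins are read off an ortholog clustering can be sketched as set operations. The sketch assumes a groups file in which each line has the form `group_id: taxon|protein taxon|protein ...`; the function names, the taxon codes A to D and the toy groups are all hypothetical. Proteins that the clustering leaves outside any group are not handled here and would have to be counted separately.

```python
from collections import defaultdict


def parse_groups(lines):
    """Map each group id to {taxon: [protein ids]}."""
    groups = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        group_id, members = line.split(":", 1)
        by_taxon = defaultdict(list)
        for member in members.split():
            taxon, protein = member.split("|", 1)
            by_taxon[taxon].append(protein)
        groups[group_id.strip()] = dict(by_taxon)
    return groups


def core_groups(groups, all_taxa):
    """Groups with at least one member in every genome."""
    wanted = set(all_taxa)
    return sorted(g for g, taxa in groups.items() if wanted <= set(taxa))


def exclusive_groups(groups, target_taxa):
    """Groups present in every target genome and in no other genome."""
    target = set(target_taxa)
    return sorted(g for g, taxa in groups.items() if set(taxa) == target)


# Toy clustering of four genomes (A, B, C, D).
toy_lines = [
    "OG1: A|a1 B|b1 C|c1 D|d1",
    "OG2: A|a2 A|a3",
    "OG3: B|b2 C|c2",
    "OG4: A|a4 B|b3 C|c3 D|d2",
    "OG5: D|d3 D|d4",
]
toy_groups = parse_groups(toy_lines)
print(core_groups(toy_groups, ["A", "B", "C", "D"]))  # ['OG1', 'OG4']
print(exclusive_groups(toy_groups, ["B", "C"]))       # ['OG3']
print(exclusive_groups(toy_groups, ["A"]))            # ['OG2']
```

With the target set to the three fish-pathogenic genomes, `exclusive_groups` corresponds to the kind of query that yields proteins unique to the pathogens; with a single genome as the target, it yields that strain's specific groups.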

Conclusions
The genomic analysis of F. enshiense DK69T and related strains revealed useful information: (1) the genome-based phylogenetic analysis is in agreement with the 16S rRNA gene based one; (2) the genomic data are consistent with some phenotypes of strain DK69T; (3) in contrast to the three fish-pathogenic Flavobacterium strains, no pathogenicity-related genes were detected in the environmental strain DK69T, which suggests that it is non-pathogenic; and (4) some specific genes were found within the three fish-pathogenic Flavobacterium strains, which provides information for further analysis of pathogenicity.

Fig. 6 A Venn diagram of the twelve Flavobacterium genomes analyzed by OrthoMCL [36], illustrating the numbers of unique and common proteins among them