High-quality draft genomic sequence of Arenimonas donghaensis DSM 18148T

Arenimonas donghaensis is the type species of the genus Arenimonas, which belongs to the family Xanthomonadaceae within the Gammaproteobacteria. In this study, a total of five type strains of Arenimonas were sequenced. The draft genomic information of A. donghaensis DSM 18148T is described and compared with the other four Arenimonas genomes. The genome of A. donghaensis DSM 18148T is 2,977,056 bp in size, distributed in 51 contigs, and contains 2685 protein-coding genes and 49 RNA genes.

To provide genome information and determine genomic differences among Arenimonas species, we sequenced the genomes of strains A. donghaensis DSM 18148T, A. composti KCTC 12666T, A. malthae CCUG 53596T, A. metalli CF5-1T and A. oryziterrae KCTC 22247T. In this study, we report the genomic features of A. donghaensis DSM 18148T and compare them with those of its close relatives.
Cells of A. donghaensis DSM 18148T are Gram-negative, aerobic, non-spore-forming, straight or slightly curved rods that are motile by means of a single polar flagellum. Colonies are yellowish white, translucent and convex on R2A agar after 3 d of cultivation (Fig. 2). The API ID 32 GN and Biolog GN2 MicroPlate systems (bioMérieux) were used to investigate sole carbon source utilization; strain DSM 18148T could utilize β-hydroxybutyric acid, L-alaninamide, L-glutamic acid and glycyl-L-glutamic acid (Table 1).

Genome sequencing information
Genome project history
The genome sequencing project of A. donghaensis DSM 18148T was started in April 2013 and completed in two months. The resulting high-quality draft genome of A. donghaensis DSM 18148T has been deposited at DDBJ/EMBL/GenBank under accession number AVCJ00000000. The version described in this study is the first version, AVCJ01000000. The genome sequencing project information is summarized in Table 2.

Growth conditions and genomic DNA preparation
A. donghaensis DSM 18148T was cultivated aerobically in LB medium at 28 °C for 3 d. The DNA was extracted, concentrated and purified using the QIAamp kit (Qiagen, Germany) according to the manufacturer's instructions.

Fig. 1 (caption, partial) Sequences were aligned using CLUSTALW, and phylogenetic inferences were obtained using the neighbor-joining method within the MEGA 5.05 software [8]. Numbers at the nodes represent percentages of bootstrap values obtained by repeating the analysis 1000 times to generate a majority consensus tree. The scale bar indicates 0.02 nucleotide changes per nucleotide position.

Genome sequencing and assembly
The whole-genome sequence of A. donghaensis DSM 18148T was determined using the Illumina HiSeq 2000 [9] with a paired-end library strategy (300 bp insert size) at Shanghai Majorbio Bio-pharm Technology Co., Ltd. [10] (Shanghai, China). A total of 9,571,421 reads with an average read length of 93 bp (885.9 Mb of data) were obtained. The detailed methods of library construction and sequencing can be found at Illumina's official website [9]. Using SOAPdenovo v1.05 [11], these reads were assembled into 51 contigs (>200 bp) with a genome size of 2,977,056 bp and an average coverage of 332.4×.
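The contig count and genome size quoted above are the kind of summary that can be read directly off an assembly FASTA file. The following is a minimal sketch, assuming the assembly is available as FASTA text; the function names and the toy contigs are hypothetical and are not part of the original pipeline.

```python
from typing import Dict, Iterator, List, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted text."""
    name = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the contig name.
            name = (line[1:].split() or [""])[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def assembly_summary(fasta_text: str, min_len: int = 200) -> Dict[str, int]:
    """Count contigs longer than min_len bp and sum their lengths."""
    lengths = [len(seq) for _, seq in parse_fasta(fasta_text) if len(seq) > min_len]
    return {"contigs": len(lengths), "total_bp": sum(lengths)}


# Hypothetical toy assembly: two contigs pass the >200 bp filter, one does not.
toy = ">c1\n" + "A" * 250 + "\n>c2\n" + "G" * 300 + "\n>c3\n" + "T" * 150 + "\n"
summary = assembly_summary(toy)
print(summary)
```

The 200 bp threshold mirrors the ">200 bp" contig filter stated in the text; everything else here is illustrative.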

Genome annotation
The draft sequence of strain A. donghaensis DSM 18148T was submitted to the NCBI Prokaryotic Genome Annotation Pipeline [12] for annotation according to

Genome properties
The whole genome of A. donghaensis DSM 18148T is 2,977,056 bp in length, with a G + C content of 68.7 % (Fig. 3 and Table 3). Of the predicted protein-coding genes, some were assigned putative functions, while the remaining ones were annotated as hypothetical proteins. Protein function classification, shown in Table 4, was performed by searching all predicted coding sequences of strain DSM 18148T against the Clusters of Orthologous Groups (COG) protein database [13] using the BlastP algorithm with an E-value cutoff of 1e-10. A more detailed summary of the genome properties of this strain is provided in Table 3.
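The E-value cutoff used for the COG search can be illustrated with a short filtering step. This is a sketch under stated assumptions: the BlastP results are taken to be in the 12-column tabular format of BLAST+ (`-outfmt 6`, where the 11th column is the E-value), and keeping only the best hit per query is an assumption rather than something the text specifies. The function name and the subject labels (`COG_a`, etc.) are hypothetical.

```python
from typing import Dict, Tuple


def best_hits(blast_tab: str, max_evalue: float = 1e-10) -> Dict[str, Tuple[str, float]]:
    """Return {query: (subject, evalue)}, keeping the lowest-E-value hit per query.

    Hits with an E-value above max_evalue are discarded.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for line in blast_tab.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 11:
            continue  # skip malformed rows
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue > max_evalue:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


def row(query: str, subject: str, evalue: str) -> str:
    """Build one tabular row; columns other than ids and E-value are placeholders."""
    return "\t".join([query, subject, "90.0", "100", "10", "0",
                      "1", "100", "1", "100", evalue, "200"])


# Hypothetical toy results: gene2 has no hit at or below the 1e-10 cutoff.
toy = "\n".join([
    row("gene1", "COG_a", "1e-50"),
    row("gene1", "COG_b", "1e-12"),
    row("gene2", "COG_c", "1e-5"),
    row("gene3", "COG_d", "2e-11"),
])
hits = best_hits(toy)
print(hits)
```

In this toy input, `gene2` is left without a COG assignment because its only hit exceeds the cutoff.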

Insights from the genome sequences
Strain A. donghaensis DSM 18148T can use only a few sole carbon sources and cannot assimilate glucose or other sugars [1]. Orthology and pathway assignment analysis using the Kyoto Encyclopedia of Genes and Genomes (KEGG) [14] revealed that this strain has a complete TCA cycle but lacks hexokinase, which catalyzes the first step of glycolysis, as well as glucose-6-phosphate dehydrogenase, gluconolactonase and 6-phosphogluconate dehydrogenase, which are responsible for the oxidative phase of the pentose phosphate pathway. This is in agreement with the experimental result that this bacterium can use only a few sole carbon sources. The general features of the five sequenced Arenimonas genomes are summarized in Table 5. Ortholog clustering analysis of the five Arenimonas genomes was performed using OrthoMCL [15] with a Match cutoff of 50 % and an E-value Exponent cutoff of 1e-5. The five Arenimonas strains share 1014 genes, which are classified into 21 COG functional categories. The major categories are energy production and conversion (8.7 %), amino acid transport and metabolism (8.7 %), coenzyme transport and metabolism (5.8 %), lipid transport and metabolism (5.1 %), translation, ribosomal structure and biogenesis (12.4 %), replication, recombination and repair (5.2 %), cell wall/membrane/envelope biogenesis (5.9 %), posttranslational modification, protein turnover, chaperones (6.3 %), general function prediction only (8.4 %), function unknown (7.3 %) and signal transduction mechanisms (5.3 %) (Fig. 4 and Table 6).
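The notion of genes shared by all five genomes can be made concrete by looking at which genomes contribute members to each ortholog cluster. The sketch below assumes cluster membership is given in a `group: taxon|gene ...` text layout (the layout commonly produced by OrthoMCL, treated here as an assumption); the genome labels, gene names and function names are hypothetical. Genes that fall into no cluster at all are not represented in such a file and would have to be counted separately.

```python
from typing import Dict, List, Set, Tuple


def parse_groups(text: str) -> Dict[str, List[str]]:
    """Parse lines of the form 'group_id: taxon|gene taxon|gene ...'."""
    groups: Dict[str, List[str]] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        group_id, members = line.split(":", 1)
        groups[group_id.strip()] = members.split()
    return groups


def classify(groups: Dict[str, List[str]],
             genomes: Set[str]) -> Tuple[List[str], Dict[str, List[str]]]:
    """Split clusters into core (all genomes present) and genome-specific ones."""
    core: List[str] = []
    specific: Dict[str, List[str]] = {g: [] for g in sorted(genomes)}
    for group_id, members in groups.items():
        taxa = {m.split("|", 1)[0] for m in members}
        if taxa == genomes:
            core.append(group_id)
        elif len(taxa) == 1:
            (only,) = taxa
            if only in specific:
                specific[only].append(group_id)
        # Clusters present in 2..n-1 genomes are accessory and ignored here.
    return core, specific


# Hypothetical labels for five genomes and a toy cluster file.
genomes = {"don", "com", "mal", "met", "ory"}
toy = """g1: don|a1 com|b1 mal|c1 met|d1 ory|e1
g2: don|a2 don|a3
g3: don|a4 com|b2
g4: com|b3 com|b4
"""
core, specific = classify(parse_groups(toy), genomes)
print(core, specific)
```

Here `g1` is a core cluster, `g2` and `g4` are specific to one genome each, and `g3` is neither.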
There are 601 strain-specific genes in A. donghaensis DSM 18148T, which may contribute to species-specific features of this bacterium. Among them, 359 are classified into 20 COG functional categories, mainly transcription (6.3 %), general function prediction only (8.5 %), function unknown (7.3 %) and signal transduction mechanisms (9.0 %). The remaining 242 unique genes (40.3 %) are not classified into any COG category (Fig. 4 and Table 7). In addition, the five Arenimonas strains have a pan-genome [16] size of 7501 genes. The nucleotide diversity (π) was calculated using MAUVE v2.3 [17] and DnaSP v5 [18]. The five Arenimonas genomes have a nucleotide diversity (π) value of 0.18, corresponding to an approximate genus-wide nucleotide sequence similarity of 82 %. Clustered regularly interspaced short palindromic repeats (CRISPRs) mediate resistance to foreign genetic material and thus inhibit horizontal gene transfer [19]. Screening the five Arenimonas genomes for CRISPR systems using the online CRISPRfinder program [20] found only one CRISPR system (on contig 41), in the genome of A. composti KCTC 12666T. This CRISPR array is 5331 bp long and consists of 29-bp direct repeat (DR) sequences separated by 87 spacers.
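The relation between the nucleotide diversity value and the approximate sequence similarity quoted above (1 − π) can be sketched with the textbook definition of π as the average proportion of differing sites over all pairs of aligned sequences. This is an illustrative implementation only; the reported value was obtained with MAUVE and DnaSP, whose handling of gaps and alignment blocks may differ. The function names and the toy alignment are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def pairwise_diff(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, skipping gaps."""
    compared = 0
    diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            diffs += 1
    return diffs / compared if compared else 0.0


def nucleotide_diversity(alignment: Sequence[str]) -> float:
    """Mean pairwise difference per site over all sequence pairs."""
    if len(alignment) < 2:
        raise ValueError("need at least two aligned sequences")
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("aligned sequences must have equal length")
    pairs = list(combinations(alignment, 2))
    return sum(pairwise_diff(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy alignment of three sequences, 10 sites each.
toy = ["ACGTACGTAC",
       "ACGTACGTAA",
       "ACGAACGTAA"]
pi = nucleotide_diversity(toy)
similarity = 1.0 - pi
print(round(pi, 4), round(similarity, 4))
```

Applied to the value reported in the text, π = 0.18 gives 1 − 0.18 = 0.82, i.e. the stated 82 %.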
Fifteen available genome sequences of the family Xanthomonadaceae were chosen for genome-based phylogenetic analysis, including the five Arenimonas genomes sequenced in this study. In total, 1014 core protein sequences were extracted using the cluster algorithm tool OrthoMCL with default parameters [15]. The neighbor-joining (NJ) phylogenetic tree showed that the five Arenimonas species clustered into the same branch (Fig. 5), which is in accordance with the 16S rRNA gene-based phylogeny (Fig. 1). As in A. donghaensis DSM 18148T, the TCA cycle is complete and hexokinase is absent in all five Arenimonas strains. The set of enzymes responsible for the oxidative phase of the pentose phosphate pathway is also incomplete in all five strains, which may partly explain why they can use only a few sole carbon sources.

Fig. 5 A phylogenetic tree highlighting the phylogenetic position of A. donghaensis DSM 18148T. Conserved proteins were identified using OrthoMCL with a Match cutoff of 50 % and an E-value Exponent cutoff of 1e-5 [15]. The phylogenetic tree was constructed based on the 1014 single-copy conserved proteins shared among the fifteen genomes. The phylogenies were inferred using MEGA 5.05 with the NJ algorithm [8], and 1000 bootstrap repetitions were computed to estimate the reliability of the tree. The genome accession numbers of the strains are shown in parentheses.

Conclusions
To the best of our knowledge, this report provides the first genomic information for the genus Arenimonas. The genome-based phylogeny is in agreement with the 16S rRNA gene-based one, indicating the usefulness of genomic information for bacterial taxonomic classification. Analysis of the genomes shows a correlation between genotype and phenotype, especially in the utilization of sole carbon sources.