Draft genome sequence of Cellulomonas carbonis T26T and comparative analysis of six Cellulomonas genomes

Most Cellulomonas strains are cellulolytic, a feature that may be applied in straw degradation and bioremediation. In this study, Cellulomonas carbonis T26T, Cellulomonas bogoriensis DSM 16987T and Cellulomonas cellasea DSM 20118T were sequenced. Here we describe the draft genome of C. carbonis T26T and compare it with the related Cellulomonas genomes. Strain T26T has a genome of 3,990,666 bp with a G + C content of 73.4 %, containing 3418 protein-coding genes and 59 RNA genes. The results show good correlation between the genotypes and the physiological phenotypes. This information is useful for better application of Cellulomonas strains.

Electronic supplementary material: The online version of this article (doi:10.1186/s40793-015-0096-8) contains supplementary material, which is available to authorized users.

So far, three Cellulomonas genomes have been published: Cellulomonas flavigena DSM 20109 T [13], Cellulomonas fimi ATCC 484 T [14] and "Cellulomonas gilvus" ATCC 13127 T [14] (see Endnote 1). These genomes encode a wide variety of cellulases and hemicellulases [13,14]. To provide more genomic information about Cellulomonas strains for potential industrial application, we sequenced the genomes of Cellulomonas carbonis T26 T [1], Cellulomonas cellasea DSM 20118 T [2] and Cellulomonas bogoriensis DSM 16987 T [15]. Here we present a summary of the genomic features of C. carbonis T26 T together with a comparison of the six available Cellulomonas genomes.

Classification and features
The taxonomic classification and general features of C. carbonis T26 T are presented in Table 1. A total of 105 single-copy conserved proteins shared among the 13 genomes were obtained using OrthoMCL with a match cutoff of 50 % and an E-value exponent cutoff of 1e-5 [16,17]. Figure 1 shows the phylogenetic tree of C. carbonis T26 T and 12 related strains based on these conserved sequences. The tree was constructed with MEGA 5.05 using the Maximum-Likelihood method to determine the phylogenetic position [18]. The genome-based phylogenetic tree (Fig. 1) is similar to the 16S rRNA gene-based phylogenetic tree [1].

Strain C. carbonis T26 T is Gram-positive, aerobic, motile and rod-shaped (0.5-0.8 × 2.0-2.4 μm) (Fig. 2). The colonies are yellow-white, convex, circular, smooth, nontransparent and about 1 mm in diameter after 3 days of incubation on R2A agar at 28°C [1]. Optimal growth occurs at 28°C (Table 1). The strain hydrolyses CM-cellulose, starch, gelatin and aesculin, and is positive for catalase and nitrate reduction [1]. C. carbonis T26 T is capable of utilizing a wide range of sole carbon sources, including D-glucose, L-arabinose, mannose, N-acetyl glucosamine, maltose, gluconate, sucrose, glycogen, salicin, D-melibiose, D-sorbitol, xylose, D-lactose, D-galactose, D-fructose and raffinose [1, Table 1].

a Evidence codes - IDA: Inferred from Direct Assay; TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [23].

Fig. 1 Phylogenetic tree showing the position of C. carbonis T26 T (shown in bold) based on aligned sequences of 105 single-copy conserved proteins shared among the 13 genomes. The conserved proteins were acquired by OrthoMCL with a match cutoff of 50 % and an E-value exponent cutoff of 1e-5 [15,16]. Phylogenetic analysis was performed using MEGA version 5.05 and the tree was built using the Maximum-Likelihood method [17]; 1000 bootstrap repetitions were computed to estimate the reliability of the tree. The corresponding GenBank accession numbers are displayed in parentheses.
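The selection of single-copy conserved proteins used for the phylogeny in this section can be illustrated with a minimal Python sketch. It assumes an OrthoMCL-style groups listing in which each line reads `group_id: genome|protein genome|protein ...`; the function name `single_copy_groups`, the genome labels and the toy groups are hypothetical and are not taken from the study.

```python
from collections import Counter


def single_copy_groups(lines, genomes):
    """Return IDs of ortholog groups with exactly one protein from every genome.

    `lines` is an iterable of OrthoMCL-style group lines (assumed format:
    "group_id: genome|protein genome|protein ...").
    """
    wanted = set(genomes)
    selected = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        group_id, members = line.split(":", 1)
        # Count how many proteins each genome contributes to this group.
        counts = Counter(m.split("|", 1)[0] for m in members.split())
        # Keep the group only if every genome is present exactly once.
        if set(counts) == wanted and all(n == 1 for n in counts.values()):
            selected.append(group_id.strip())
    return selected


# Toy input with hypothetical genome labels and protein IDs.
example = [
    "grp1: T26|p1 fimi|p7 flav|p3",
    "grp2: T26|p2 T26|p9 fimi|p8 flav|p4",  # paralog in T26, rejected
    "grp3: T26|p5 fimi|p6",                 # absent from flav, rejected
]
print(single_copy_groups(example, ["T26", "fimi", "flav"]))  # ['grp1']
```

Groups that pass this filter would then be aligned and concatenated before tree building; those steps are outside the scope of this sketch.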

Genome project history
This organism was selected for sequencing mainly because of its cellulolytic activity and other potential applications. Genome sequencing was performed by Majorbio Bio-pharm Technology in April-June 2013. The raw reads were assembled with SOAPdenovo v1.05. The genome was annotated with the RAST server version 2.0 [19] and the NCBI Prokaryotic Genome Annotation Pipeline, and has been deposited at DDBJ/EMBL/GenBank under accession number AXCY00000000. The version described in this study is the first version, AXCY01000000. The project information is summarized in Table 2.

Growth conditions and genomic DNA preparation
Strain C. carbonis T26 T was grown aerobically in 50 ml LB medium at 28°C for 36 h with shaking at 160 rpm. Cells were collected by centrifugation, yielding a pellet of about 20 mg. Genomic DNA was extracted, concentrated and purified using the QIAamp kit (Qiagen, Germany). The quality of the DNA was assessed by 1 % agarose gel electrophoresis and the quantity of DNA was measured using   [22].

Genome annotation
The draft genome sequence of C. carbonis T26 T was annotated through the RAST server version 2.0 and the National Center for Biotechnology Information Prokaryotic Genome Annotation Pipeline. Genes were identified using the gene caller GeneMarkS+ with the similarity-based gene detection approach [23]. The predicted CDSs were translated and used to search the NCBI Nonredundant Database, Pfam [24], KEGG [25], and the NCBI Conserved Domain Database through the Batch web CD-Search tool [26]. The miscellaneous features were predicted by WebMGA [27], TMHMM [28] and SignalP [29]. The putative cellulose-degrading enzymes were identified through the Carbohydrate-Active enZYmes (CAZymes) Database [30].

Fig. 3 A graphical circular map of the C. carbonis T26 T genome generated with the CGView Comparison Tool [39]. From outside to center: rings 1 and 4 show protein-coding genes colored by COG categories on the forward/reverse strand; rings 2 and 3 denote genes on the forward/reverse strand; ring 5 shows the G + C% content plot; and the innermost ring shows the GC skew.

Genome properties
The draft genome of C. carbonis T26 T is 3,990,666 bp in length, has an average GC content of 73.4 % and comprises 547 contigs. The genome properties and statistics are summarized in Table 3 and Fig. 3. From a total of 3513 genes, 3418 protein-coding genes were identified; 71 % of them were assigned putative functions, while the remainder were annotated as hypothetical proteins. In addition, 36 pseudogenes, 11 rRNAs, 46 tRNAs and 1 ncRNA were identified. The distribution of genes among the COG functional categories is shown in Table 4.
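Summary statistics such as the assembly length, contig count and GC content reported above can be derived directly from a multi-FASTA file of contigs. The following is a minimal Python sketch of that calculation, not the procedure used in this study; the function names `read_fasta` and `assembly_stats` and the toy sequences are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of contig sequences."""
    contigs, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if current:
                contigs.append("".join(current))
                current = []
        elif line:
            current.append(line)
    if current:
        contigs.append("".join(current))
    return contigs


def assembly_stats(contigs):
    """Return contig count, total length (bp) and GC content (%)."""
    total = sum(len(seq) for seq in contigs)
    gc = sum(seq.upper().count("G") + seq.upper().count("C") for seq in contigs)
    return {
        "contigs": len(contigs),
        "length_bp": total,
        "gc_percent": round(100 * gc / total, 1),
    }


# Toy assembly with two short contigs (illustrative only).
toy = ">c1\nGGCC\nAT\n>c2\nGCGC\n"
print(assembly_stats(read_fasta(toy)))
# {'contigs': 2, 'length_bp': 10, 'gc_percent': 80.0}
```

Applied to the deposited contigs, the same calculation would be expected to give values close to those in Table 3, although ambiguous bases (N) are counted in the total length here and may be handled differently by other tools.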

Insights from the genome sequence
To reveal more genomic information for better application of the Cellulomonas strains, the genomic features of C. carbonis T26 T were analyzed together with a comparison of the six Cellulomonas genomes (Table 5). OrthoMCL analysis with a match cutoff of 50 % and an E-value exponent cutoff of 1e-5 identified 1189 single-copy conserved proteins among the six Cellulomonas genomes (Fig. 4). Several carbohydrate-active enzymes were identified and classified into different families of glycoside hydrolases, carbohydrate-binding modules, carbohydrate esterases, auxiliary activities and polysaccharide lyases [31] (Fig. 5, Additional file 1: Table S1). Some putative glycoside hydrolases may be responsible for the ability of Cellulomonas spp. to utilize various sole carbon sources. Some potential cellulose-degrading enzymes were found and analyzed (Fig. 6, Additional file 1: Table S2).

C. fimi ATCC 484 T possesses the highest number of putative cellulases, including ten β-glucosidases (GH1 and GH3), six endoglucanases (GH6 and GH9), four endo-β-1,4-glucanases (GH48 and GH5) and one cellobiose phosphorylase (GH94). C. carbonis T26 T has the fewest putative cellulases, including one cellobiose phosphorylase (GH94), one endoglucanase (GH6) and five β-glucosidases (GH1 and GH3). Cellulase activity assays were performed on Congo-Red agar media [32], and all six Cellulomonas strains yielded a cellulose clearing zone on the media (data not shown). The Kyoto Encyclopedia of Genes and Genomes was used to construct metabolic pathways, and all six Cellulomonas strains have complete cellulose degradation pathways (data not shown).

The percentage is based on the total number of protein-coding genes in the annotated genome.
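The cellulase counts quoted above for the two extreme cases can be tallied as follows. This short Python sketch uses only the numbers stated in the text; the dictionary layout and variable names are illustrative assumptions, and the share of protein-coding genes is computed for C. carbonis T26 only, because its gene total (3418) is the only one given here.

```python
# Putative cellulase counts as stated in the text (by enzyme class).
cellulases = {
    "C. fimi ATCC 484": {
        "beta-glucosidase (GH1, GH3)": 10,
        "endoglucanase (GH6, GH9)": 6,
        "endo-beta-1,4-glucanase (GH48, GH5)": 4,
        "cellobiose phosphorylase (GH94)": 1,
    },
    "C. carbonis T26": {
        "beta-glucosidase (GH1, GH3)": 5,
        "endoglucanase (GH6)": 1,
        "cellobiose phosphorylase (GH94)": 1,
    },
}

# Total putative cellulases per strain.
totals = {strain: sum(counts.values()) for strain, counts in cellulases.items()}
print(totals)  # {'C. fimi ATCC 484': 21, 'C. carbonis T26': 7}

# Share of the 3418 protein-coding genes of C. carbonis T26, in percent.
share = 100 * totals["C. carbonis T26"] / 3418
print(f"{share:.2f} %")  # 0.20 %
```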
In addition to utilizing cellulose, Cellulomonas strains are also known to degrade hemicelluloses. A large number of putative intracellular and extracellular xylan-degrading enzymes were identified in the Cellulomonas genomes, such as endo-1,4-β-xylanase, β-xylosidase, α-L-arabinofuranosidase, acetylxylan esterase and α-glucuronidase (Additional file 1: Table S3), which suggests the capacity to degrade hemicelluloses. We also found a large number of α-amylases, which are responsible for the degradation of starch, in the six Cellulomonas genomes (Additional file 1: Table S4), suggesting potential application in the bioremediation of food industry wastewater.

Conclusions
The genomic information of C. carbonis T26 T and the comparison of the six Cellulomonas genomes revealed a large number of putative cellulases and hemicellulases. In addition, the genomes contain α-amylases and many putative carbohydrate-active enzymes, which agrees with the physiological ability of the strains to utilize various sole carbon sources. This information provides a genomic basis for better application of Cellulomonas spp. in industry and environmental bioremediation.

Endnote
1 Editorial note: although designated as a type strain of Cellulomonas gilvus by Christopherson et al., this strain continues to be listed as a non-type strain of Cellvibrio gilvus in the ATCC catalogue. At present, neither name has standing in the taxonomic literature.

Fig. 4 Ortholog analysis of the six Cellulomonas genomes conducted using OrthoMCL. The total numbers of shared proteins among the six genomes and unique proteins from each species were tabulated and presented as a Venn diagram.

Fig. 6 The distribution of cellulases in six Cellulomonas genomes. The cellulases are β-glucosidase, endoglucanase, endo-β-1,4-glucanase and cellobiose phosphorylase.