Genomic analysis of Agrobacterium radiobacter DSM 30147T and emended description of A. radiobacter (Beijerinck and van Delden 1902) Conn 1942 (Approved Lists 1980) emend. Sawada et al. 1993

Agrobacterium radiobacter is the only known non-phytopathogenic species in the genus Agrobacterium. In this study, the whole-genome sequence of the A. radiobacter type strain DSM 30147T is described and compared with the other available Agrobacterium genomes. The genome is 7,122,065 bp in size, is distributed in 612 contigs, and contains 6,834 protein-coding genes and 41 RNA genes. It harbors a circular chromosome and a linear chromosome but no tumor-inducing (Ti) plasmid. To the best of our knowledge, this is the first genome reported for the species A. radiobacter. In addition, an emended description of A. radiobacter is proposed. These data improve the current understanding of the non-phytopathogenicity of A. radiobacter and of its phylogenetic position within the genus Agrobacterium.

A taxonomic classification based on phytopathogenic phenotypes may not accurately reflect the actual phylogenetic relationships among Agrobacterium strains [10]. Accordingly, an alternative classification was applied that divides most Agrobacterium strains into three biovars (Biovars I, II and III) [10]. Of these, Biovar I is the most complex group and comprises several genomovars, designated G1 through G9 and G13 [8,11]. At present, two strains in Biovar I have been completely sequenced: Agrobacterium sp. H13-3 (G1) and A. tumefaciens C58 (G8). Their genome sequences revealed that each contains two chromosomes and a different number of plasmids. A. radiobacter DSM 30147T also belongs to Biovar I (it is classified in genomovar G4), which indicates a close relationship to A. tumefaciens C58 and Agrobacterium sp. H13-3 [12].
Most strains in the genus Agrobacterium are phytopathogens that induce crown gall tumors or hairy root disease in their host plants [2]. A. radiobacter is an exception because it lacks the tumor-inducing (Ti) plasmid that contributes to pathogenicity [13-16]. Members of A. radiobacter have been found widely in soil, in the plant rhizosphere and in clinical specimens [17]. One A. radiobacter strain was reported to enhance the phytoremediation of arsenic-contaminated soil, indicating a potential application in bioremediation [18]. However, some members have been identified as opportunistic human pathogens [19]. So far, 11 Agrobacterium genomes (3 finished and 8 draft genomes, listed in Table 1) have been sequenced, but no A. radiobacter genome has been reported. Considering its distinctive biological features and important phylogenetic position in the genus Agrobacterium, we present the genome sequence of A. radiobacter DSM 30147T, the first sequenced strain of this species.

Classification and features
Genome sequences and 16S rRNA genes were used for phylogenetic analysis. Given the close evolutionary relationship and the inconsistent phylogeny between Agrobacterium and Rhizobium [12], we first analyzed all sequenced strains in these two genera and found that two "Rhizobium" strains were very closely related to the 12 Agrobacterium strains (including strain DSM 30147T). Therefore, all 12 Agrobacterium strains with sequenced genomes, the two Rhizobium strains [R. lupini HPC(L) and Rhizobium sp. PDO1-076] (Table 1) and the out-group strain R. rhizogenes K84 [7,8] were included in the phylogenetic analysis. A comparison of these 15 genomes identified 370 proteins shared by all of them. A rooted neighbor-joining (NJ) phylogenetic tree was constructed from the amino acid sequences of these shared proteins. As shown in Figure 1a, A. radiobacter DSM 30147T clustered with the Biovar I members Agrobacterium sp. H13-3 (G1) and A. tumefaciens C58 (G8) and was most closely related to A. tumefaciens str. Cherry 2E-2-2. An NJ tree was also constructed from the 16S rRNA genes (Figure 1b). The trees based on the core protein sequences and on the 16S rRNA gene sequences showed small topological differences. Compared with the tree based on the 370 conserved proteins, the 16S rRNA gene tree could not clearly resolve some strains. Therefore, phylogenomic analysis was considered a more robust approach than 16S rRNA gene analysis for inferring phylogeny, especially among closely related strains [21,25,26].
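To illustrate the tree-building step, the sketch below implements the standard neighbor-joining algorithm of Saitou and Nei on a pairwise distance matrix, together with a simple p-distance for aligned protein sequences. It is a minimal illustration under stated assumptions, not the authors' pipeline: the alignment method, distance model and software used for Figure 1 are not given in this section, and the function names `p_distance` and `neighbor_joining` are hypothetical. The tree it returns is unrooted; rooting on the out-group (R. rhizogenes K84 in the analysis above) would be a separate step.

```python
from itertools import combinations


def p_distance(s1, s2):
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped. This gap
    treatment is an assumption; other models (e.g. Poisson, JTT) are common.
    """
    sites = [(x, y) for x, y in zip(s1, s2) if x != "-" and y != "-"]
    return sum(x != y for x, y in sites) / len(sites)


def neighbor_joining(names, pairs):
    """Build an unrooted neighbor-joining tree and return it as Newick.

    names: at least three taxon labels.
    pairs: {(a, b): distance} for every unordered pair of taxa.
    """
    d = {frozenset(p): float(v) for p, v in pairs.items()}

    def dist(a, b):
        return d[frozenset((a, b))]

    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        # Net divergence of each node: sum of its distances to all others.
        r = {i: sum(dist(i, k) for k in nodes if k != i) for i in nodes}
        # Join the pair minimizing Q(i, j) = (n - 2) * d(i, j) - r_i - r_j.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * dist(*p) - r[p[0]] - r[p[1]])
        dij = dist(i, j)
        # Branch lengths from the new internal node u to i and to j.
        li = 0.5 * dij + (r[i] - r[j]) / (2 * (n - 2))
        lj = dij - li
        u = f"({i}:{li:.4g},{j}:{lj:.4g})"
        # Distance from u to every remaining node k.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = 0.5 * (dist(i, k) + dist(j, k) - dij)
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Three nodes left: connect them through one central node.
    a, b, c = nodes
    la = 0.5 * (dist(a, b) + dist(a, c) - dist(b, c))
    lb = 0.5 * (dist(a, b) + dist(b, c) - dist(a, c))
    lc = 0.5 * (dist(a, c) + dist(b, c) - dist(a, b))
    return f"({a}:{la:.4g},{b}:{lb:.4g},{c}:{lc:.4g});"


if __name__ == "__main__":
    # Textbook five-taxon additive example (not data from this study).
    pairs = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
             ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
             ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
    print(neighbor_joining(list("abcde"), pairs))
    # (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

Because the example matrix is additive, NJ recovers its tree exactly. For a real core-protein analysis, the matrix would instead be filled with `p_distance` (or a model-corrected distance) between concatenated, aligned sequences of each pair of genomes.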
Cells of strain DSM 30147T are rod-shaped (0.6-0.8 × 1.5-1.8 μm) (Figure 2). Enzyme activities and carbon source utilization of strain DSM 30147T were tested with the API ZYM, API 20 NE and API ID 32 GN systems; the results are shown in Table 2 and in the emended description of A. radiobacter.

Genome sequencing and annotation
Genome project history
To enable a comprehensive comparison of Agrobacterium genomes, the whole-genome sequence of A. radiobacter DSM 30147T was determined. This draft genome sequence has been deposited at DDBJ/EMBL/GenBank under accession number ASXY00000000. The version described in this study is the first version, ASXY01000000. The project information is summarized in Table 3.

Growth conditions and DNA isolation
A. radiobacter DSM 30147T was grown aerobically in LB medium [38] at 28 °C for 24 h. Genomic DNA was extracted, concentrated and purified using the QIAamp kit (Qiagen, Germany) according to the manufacturer's instructions.

Genome sequencing and assembly
The whole-genome sequence of A. radiobacter DSM 30147T was determined on an Illumina HiSeq 2000 using a paired-end library (300 bp insert size), which yielded a total of 15 …
… [37]. If the evidence is IDA, then the property was directly observed for a live isolate by one of the authors or an expert mentioned in the acknowledgements.

Genome annotation
The draft genome of A. radiobacter DSM 30147T was annotated with the National Center for Biotechnology Information (NCBI) Prokaryotic Genome Annotation Pipeline (PGAP) [41], which combines the gene caller GeneMarkS+ [42] with similarity-based gene detection. Protein functions were classified by searching all predicted coding sequences of strain DSM 30147T against the Clusters of Orthologous Groups (COG) protein database [43] using BLASTp with an E-value cutoff of 1e-10.
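A minimal sketch of the COG assignment step, assuming BLAST+ `blastp` was run with `-outfmt 6` (its default 12 tab-separated columns) and that each query is given the COG of its best-scoring hit that passes the 1e-10 E-value cutoff. The best-hit rule, the `protein_to_cog` lookup (COG database protein ID to COG ID) and the helper name `best_cog_hits` are assumptions for illustration; this section does not state how BLASTp hits were converted into COG assignments.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_cog_hits(blast_tsv, protein_to_cog, max_evalue=1e-10):
    """Map each query CDS to the COG of its best hit with E-value <= max_evalue.

    Hits to subject proteins without a COG mapping are ignored.
    """
    best = {}  # query id -> (COG id, bitscore of best hit so far)
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        hit = dict(zip(FIELDS, row))
        evalue, bits = float(hit["evalue"]), float(hit["bitscore"])
        if evalue > max_evalue or hit["sseqid"] not in protein_to_cog:
            continue
        query = hit["qseqid"]
        if query not in best or bits > best[query][1]:
            best[query] = (protein_to_cog[hit["sseqid"]], bits)
    return {query: cog for query, (cog, _) in best.items()}


if __name__ == "__main__":
    # Toy BLAST rows and placeholder COG IDs (not data from this study).
    tsv = ("cds1\tp1\t50.0\t100\t50\t0\t1\t100\t1\t100\t1e-30\t200\n"
           "cds1\tp2\t40.0\t100\t60\t0\t1\t100\t1\t100\t1e-20\t150\n"
           "cds2\tp2\t30.0\t80\t56\t0\t1\t80\t1\t80\t1e-5\t40\n")
    print(best_cog_hits(tsv, {"p1": "COG0001", "p2": "COG0002"}))
    # {'cds1': 'COG0001'}  (cds2's only hit fails the 1e-10 cutoff)
```

Counting the assigned COG IDs per functional category (which requires the COG-to-category table distributed with the database) would give a breakdown of the kind summarized in Table 5.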

Genome properties
The genome of A. radiobacter DSM 30147T is 7,122,065 bp long, has an average GC content of 59.9% and is distributed in 612 contigs. By comparison with the complete reference genome of A. tumefaciens C58 [44] (also a Biovar I member, Figure 1), the draft genome of strain DSM 30147T could be divided into two replicons, a circular chromosome and a linear chromosome (Figure 3). Consistent with its non-phytopathogenic phenotype, strain DSM 30147T does not contain a Ti plasmid. Of the 6,894 genes predicted, 6,834 are protein-coding genes (CDSs) and 41 are RNA genes. A total of 5,320 CDSs (77.85%) were assigned putative functions, and the remaining CDSs were annotated as hypothetical proteins. The genome properties and statistics are summarized in Table 4 and Figure 3. The distribution of genes among COG functional categories is shown in Table 5.
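The percentages in this paragraph follow from simple counts. The sketch below recomputes them using the 6,834 CDSs given in the abstract, which reproduces the reported 77.85%, and shows how an average GC content is computed over all contigs. The `gc_content` helper is hypothetical, and counting only A, C, G and T in the denominator (ignoring ambiguous bases such as N) is an assumption. The 19 genes outside the CDS and RNA counts are not broken down in this section; PGAP reports pseudogenes as a separate category, but whether they account for these 19 is not stated here.

```python
def gc_content(contigs):
    """Percent G+C over all contigs, counting only unambiguous A/C/G/T bases."""
    gc = acgt = 0
    for seq in contigs:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        acgt += sum(s.count(base) for base in "ACGT")
    return 100 * gc / acgt


# Gene counts reported for A. radiobacter DSM 30147T.
total_genes = 6894
cds = 6834               # protein-coding genes, as given in the abstract
rna_genes = 41
cds_with_function = 5320

pct_function = 100 * cds_with_function / cds
hypothetical = cds - cds_with_function
unclassified = total_genes - cds - rna_genes  # genes outside the CDS/RNA counts

print(f"{pct_function:.2f}% of CDSs have a putative function")
# 77.85% of CDSs have a putative function
print(f"{hypothetical} hypothetical proteins; {unclassified} other genes")
# 1514 hypothetical proteins; 19 other genes
```

Applied to the 612 contig sequences, `gc_content` would give the genome-wide average; computing it per contig and averaging would instead weight short contigs too heavily, which is why the bases are pooled first.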