High-quality permanent draft genome sequence of Bradyrhizobium sp. Ai1a-2; a microsymbiont of Andira inermis discovered in Costa Rica

Bradyrhizobium sp. Ai1a-2 is an aerobic, motile, Gram-negative, non-spore-forming rod that was isolated from an effective nitrogen-fixing root nodule of Andira inermis collected from Tres Piedras in Costa Rica. In this report we describe, for the first time, the genome sequence information and annotation of this legume microsymbiont. The 9,029,266 bp genome has a GC content of 62.56% with 247 contigs arranged into 246 scaffolds. The assembled genome contains 8,482 protein-coding genes and 102 RNA-only encoding genes. This rhizobial genome was sequenced as part of the DOE Joint Genome Institute 2010 Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria (GEBA-RNB) project proposal. Electronic supplementary material: The online version of this article (doi:10.1186/s40793-015-0007-z) contains supplementary material, which is available to authorized users.


Introduction
Bradyrhizobium sp. strain Ai1a-2 is a representative of a distinctive lineage affiliated with the Bradyrhizobium elkanii superclade [1]. The B. elkanii superclade is one of the three main branches of the genus, together with the B. japonicum/B. diazoefficiens superclade [2,3] and the group encompassing photosynthetic Aeschynomene symbionts [4].
Members of the lineage represented by strain Ai1a-2 are readily diagnosed because they share a distinctive length variant in helix 9 within the 5′ intervening sequence region of the 23S rRNA gene [5]. Compared with B. elkanii USDA76, strain Ai1a-2 and its relatives have an insertion of 16 nucleotides in this region, which can be identified by a straightforward PCR assay [6]. In a survey of 420 Bradyrhizobium strains from 25 countries [1], only 2% of the strains had this 23S rRNA length variant. These strains all clustered together into a strongly supported clade based on concatenated data for 23S rRNA and five protein-coding genes [1].
This clade was placed as the most basally diverging lineage within the B. elkanii superclade, and it included strains from three locations: Central America, the Caribbean, and South Africa. Strain Ai1a-2 was sampled in Costa Rica from the tree Andira inermis [6], and highly similar strains are also known to occur as symbionts of the same host legume in Panama [7]. Parker and Rousteau [8] also detected strains from this group in nodule samples from the beach legume Canavalia rosea in two Caribbean locations (Guadeloupe and Puerto Rico). Two Bradyrhizobium strains from distantly related legume hosts (Leobordea spp.) in South Africa (WSM2632, WSM2783) also belong to this clade [9].
Andira inermis, the host of strain Ai1a-2, is a large tree (up to 35 m in height) commonly found in riparian habitats from southern Mexico through northern South America [10]. Andira was traditionally considered to be an early-diverging lineage within the Tribe Dalbergieae [11], but more recent phylogenetic analyses have suggested that it forms a separate lineage with an unclear relationship to dalbergioid legumes [12]. Here we provide an analysis of the high-quality permanent draft genome sequence of Bradyrhizobium strain Ai1a-2. The fact that the genome of its close relative WSM2783 has also been sequenced as part of the Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria (GEBA-RNB) project [13] will enable detailed comparative analysis of this group.

Classification and features
Bradyrhizobium sp. Ai1a-2 is a motile, non-sporulating, non-encapsulated, Gram-negative strain in the order Rhizobiales of the class Alphaproteobacteria. The rod-shaped form has dimensions of approximately 0.5 μm in width and 1.5-2.0 μm in length (Figure 1 Left and Center). It is relatively slow growing, forming colonies after 6-7 days when grown on half strength Lupin Agar (½LA) [14], tryptone-yeast extract agar (TY) [15] or a modified yeast-mannitol agar (YMA) [16] at 28°C. Colonies on ½LA are opaque, slightly domed and moderately mucoid with smooth margins (Figure 1 Right). Figure 2 shows the phylogenetic relationship of Bradyrhizobium sp. Ai1a-2 in a 16S rRNA gene sequence based tree. The 16S rRNA gene sequence of Ai1a-2 (using a 1,370 bp intragenic sequence) is identical to that of Bradyrhizobium sp. WSM2783. Bradyrhizobium sp. Ai1a-2 is also closely related to Bradyrhizobium sp. Cp5.3 and Bradyrhizobium sp. Th.b2, with 16S rRNA gene sequence identities of 99.77% and 99.23%, respectively, as determined using NCBI BLAST analysis [17]. The highest identity (99.16%) of the 16S rRNA gene sequence of strain Ai1a-2 to type strain sequences occurs with Bradyrhizobium icense LMTR 13 T and Bradyrhizobium paxllaeri LMTR 21 T, based on alignment using the EzTaxon-e server [18,19].
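The sequence identities quoted above are percentages of matching positions between aligned 16S rRNA gene sequences. The following is a minimal sketch of that calculation, assuming two sequences that have already been aligned to equal length and that columns containing a gap are skipped. BLAST and EzTaxon-e apply their own alignment and gap conventions, so this illustrates the idea rather than reproducing either tool. The function name `percent_identity` and the toy fragments are hypothetical and are not real 16S sequences.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over aligned columns, skipping any column with a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gapped columns are excluded from the count
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy fragments that differ at one of ten aligned positions.
toy_a = "ACGTACGTAC"
toy_b = "ACGTACGTAT"
print(f"{percent_identity(toy_a, toy_b):.2f}%")
```

Whether gapped columns count as mismatches or are excluded changes the reported figure slightly, which is one reason identities from different tools are not always directly comparable.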
Minimum Information about the Genome Sequence (MIGS) is provided in Table 1 and Additional file 1: Table S1.

Symbiotaxonomy
Strain Ai1a-2 was isolated from the tree Andira inermis in Costa Rica [6]. Its symbiotic ability could not be authenticated on this host because seeds could not be obtained. The symbiotic capability of strain Ai1a-2 was therefore tested on Macroptilium atropurpureum, which the strain was able to nodulate. Acetylene reduction assays showed that established nodules contained active nitrogenase, indicating an effective symbiosis with this host [6].

Genome project history
This organism was selected for sequencing on the basis of its environmental and agricultural relevance to issues in global carbon cycling, alternative energy production, and biogeochemical importance, and is part of the Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria (GEBA-RNB) project at the U.S. Department of Energy, Joint Genome Institute (JGI). The genome project is deposited in the Genomes OnLine Database [20] and a high-quality permanent draft genome sequence in IMG [21]. Sequencing, finishing and annotation were performed by the JGI using state-of-the-art sequencing technology [22]. A summary of the project information is shown in Table 2.

Growth conditions and genomic DNA preparation
Bradyrhizobium sp. Ai1a-2 was cultured to mid-logarithmic phase in 60 ml of TY rich medium on a gyratory shaker at 28°C [23]. DNA was isolated from the cells using a CTAB (cetyl trimethyl ammonium bromide) bacterial genomic DNA isolation method [24].

Genome sequencing and assembly
The draft genome of Bradyrhizobium sp. Ai1a-2 was generated at the DOE Joint Genome Institute (JGI) using Illumina technology [25]. An Illumina standard shotgun library was constructed and sequenced on the Illumina HiSeq 2000 platform, which generated 21,669,974 reads totaling 3,250.5 Mbp. All general aspects of library construction and sequencing were performed at the JGI, and details can be found on the JGI website [26]. All raw Illumina sequence data were passed through DUK, a filtering program developed at JGI that removes known Illumina sequencing and library preparation artifacts (Mingkun L, Copeland A, Han J, Unpublished). The following steps were then performed for assembly: (1) filtered Illumina reads were assembled using Velvet (version 1.1.04) [27], (2) 1-3 Kbp simulated paired end reads were created from Velvet contigs using wgsim [28], (3) Illumina reads were …
Figure 2 Phylogenetic tree showing the relationship of Bradyrhizobium sp. Ai1a-2 (shown in blue print) relative to other type and non-type strains in the Bradyrhizobium genus using a 1,310 bp intragenic sequence of the 16S rRNA gene. The Azorhizobium caulinodans ORS 571 T sequence was used as an outgroup. All sites were informative and there were no gap-containing sites. Phylogenetic analyses were performed using MEGA, version 5.05 [37]. The tree was built using the maximum likelihood method with the General Time Reversible model. Bootstrap analysis with 500 replicates was performed to assess the support of the clusters. Type strains are indicated with a superscript T. Strains with a genome sequencing project registered in GOLD [20] have the GOLD ID mentioned after the strain number and are represented in bold; otherwise the NCBI accession number is provided.
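The read statistics reported for the Illumina shotgun library can be cross-checked with simple arithmetic. The sketch below derives the mean read length and the raw sequencing depth from the read count, the total yield, and the assembled genome size stated in this report. The variable names are illustrative, and the depth is a raw estimate that assumes every sequenced base maps to the genome; filtering with DUK and the assembly itself will lower the effective depth.

```python
# Values taken from the text of this report.
TOTAL_READS = 21_669_974      # Illumina HiSeq 2000 reads
TOTAL_BASES = 3_250.5e6       # 3,250.5 Mbp of raw sequence
GENOME_SIZE = 9_029_266       # assembled genome size in bp

# Mean read length: total yield divided by the number of reads.
mean_read_length = TOTAL_BASES / TOTAL_READS

# Raw depth of coverage: total yield divided by genome size
# (assumption: all bases are usable, which overestimates effective depth).
raw_coverage = TOTAL_BASES / GENOME_SIZE

print(f"mean read length: {mean_read_length:.1f} bp")
print(f"raw coverage:     {raw_coverage:.0f}x")
```

The result is consistent with 150 bp reads and a raw depth of roughly 360-fold over the 9.03 Mbp assembly.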

Genome annotation
Genes were identified using Prodigal [30], as part of the DOE-JGI genome annotation pipeline [31,32]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database, UniProt, TIGRFam, Pfam, KEGG, COG, and InterPro databases. The tRNAscan-SE tool [33] was used to find tRNA genes, whereas ribosomal RNA genes were found by searches against models of the ribosomal RNA genes built from SILVA [34]. Other non-coding RNAs, such as the RNA components of the protein secretion complex and RNase P, were identified by searching the genome for the corresponding Rfam profiles using INFERNAL [35]. Additional gene prediction analysis and manual functional annotation were performed within the Integrated Microbial Genomes-Expert Review (IMG-ER) system [36] developed by the Joint Genome Institute, Walnut Creek, CA, USA.

Genome properties
The genome is 9,029,266 nucleotides long with a GC content of 62.56% (Table 3) and comprises 246 scaffolds. Of a total of 8,584 genes, 8,482 were protein-encoding genes and 102 were RNA-only encoding genes. The majority of genes (75.10%) were assigned a putative function, whilst the remaining genes were annotated as hypothetical. The distribution of genes into COG functional categories is presented in Table 4.
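The gene counts above can be tallied to confirm that they are internally consistent and to express them as proportions. This is a minimal sketch using only the figures stated in this section; the variable names are illustrative, and the gene density is a derived value that is not reported in the source.

```python
# Values taken from the Genome properties section.
GENOME_SIZE = 9_029_266   # bp
TOTAL_GENES = 8_584
PROTEIN_CODING = 8_482
RNA_ONLY = 102

# The two gene classes should account for every predicted gene.
assert PROTEIN_CODING + RNA_ONLY == TOTAL_GENES

protein_pct = 100 * PROTEIN_CODING / TOTAL_GENES
rna_pct = 100 * RNA_ONLY / TOTAL_GENES

# Derived value (not stated in the source): predicted genes per kbp of genome.
genes_per_kb = TOTAL_GENES / (GENOME_SIZE / 1_000)

print(f"protein-coding genes: {protein_pct:.2f}% of all genes")
print(f"RNA-only genes:       {rna_pct:.2f}% of all genes")
print(f"gene density:         {genes_per_kb:.2f} genes per kbp")
```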

Conclusions
Bradyrhizobium sp. Ai1a-2 is a member of a widely distributed Bradyrhizobium lineage, isolated from diverse legume hosts in North, Central and South America and South Africa. Little is currently known of the symbiotic associations of its host Andira inermis, apart from the discovery that the Puerto Rican isolate Bradyrhizobium sp. EC3.3 can also establish a symbiosis with this host [8].
The 16S rRNA gene sequence of the Costa Rican isolate Ai1a-2 is distinct from that of EC3.3 but identical to the 16S rRNA sequence of South African isolate Bradyrhizobium sp.