High-quality permanent draft genome sequence of the Mimosa asperata-nodulating Cupriavidus sp. strain AMP6

Cupriavidus sp. strain AMP6 is an aerobic, motile, Gram-negative, non-spore-forming rod that was isolated from a root nodule of Mimosa asperata collected in Santa Ana National Wildlife Refuge, Texas, in 2005. Mimosa asperata is the only legume described so far that associates exclusively with Cupriavidus symbionts. Moreover, strain AMP6 represents an early-diverging lineage within the symbiotic Cupriavidus group and has the capacity to develop an effective nitrogen-fixing symbiosis with three other species of Mimosa. The genome of Cupriavidus sp. strain AMP6 therefore enables comparative analyses of symbiotic trait evolution in this genus. Here we describe the general features of the strain, together with its genome sequence and annotation. The 7,579,563 bp high-quality permanent draft genome is arranged in 260 scaffolds of 262 contigs, contains 7,033 protein-coding genes and 97 RNA-only encoding genes, and is part of the GEBA-RNB project proposal.

However, it has been isolated from two cultivated legumes in Minas Gerais, Brazil [18]. This suggests that further surveys in South America may discover additional wild legume hosts that utilize Cupriavidus symbionts.
The only legume studied to date that is exclusively associated with Cupriavidus nodule symbionts is Mimosa asperata, from which Cupriavidus strain AMP6 was isolated in 2005 [12]. The range of M. asperata is centered in Mexico and extends slightly into south Texas, Cuba, and northern Central America [19]. Based on both housekeeping loci and symbiotic loci, strain AMP6 represents an early-diverging lineage of nodule-symbiotic Cupriavidus [10,12], whose genome may provide insights about how legume nodule symbiosis became established in this group.
Strain AMP6 was collected at the Santa Ana National Wildlife Refuge in Hidalgo County, Texas. Cupriavidus nodule bacteria resembling strain AMP6 are currently known only from M. asperata populations in the lower Rio Grande valley of Texas, and have not been detected in surveys of Mimosa species in other geographic locations [2,9-11]. Nevertheless, inoculation tests have indicated that Cupriavidus strain AMP6 has the capacity to develop an effective nitrogen-fixing symbiosis with three other species of Mimosa [12]. M. asperata occurs mainly along the margins of seasonally flooded wetlands [20], a habitat characterized by heavy silt/clay soils with neutral to moderately alkaline pH (pH 7.0-8.4; [21]).
The first completed genome for a betaproteobacterial legume symbiont was that of Cupriavidus taiwanensis LMG 19424 T [22]. Here we provide an analysis of the high-quality permanent draft genome sequence of Cupriavidus strain AMP6, enabling comparative analyses of symbiotic trait evolution in this genus.

Organism information
Classification and features
Cupriavidus sp. strain AMP6 is a motile, Gram-negative, non-spore-forming rod (Fig. 1 Left, Center) in the order Burkholderiales of the class Betaproteobacteria. The rod-shaped cells vary in size, with dimensions of 0.4-0.6 μm in width and 1.2-1.7 μm in length (Fig. 1 Left). The strain is fast growing, forming colonies 1.2-1.6 mm in diameter after 24 h when grown on YMA [23] at 28°C. Colonies on YMA are white-opaque, slightly domed and moderately mucoid with smooth margins (Fig. 1 Right). Figure 2 shows the phylogenetic relationship of Cupriavidus sp. strain AMP6 in a 16S rRNA gene sequence-based tree. This strain is phylogenetically most closely related to Cupriavidus taiwanensis LMG 19424 T, Cupriavidus alkaliphilus ASC-732 T and Cupriavidus necator N-1 T (deposited as ATCC43291 T), with sequence identities to the AMP6 16S rRNA gene sequence of 99.11 %, 99.04 % and 98.69 %, respectively, as determined using the EzTaxon-e server [24]. Cupriavidus taiwanensis LMG 19424 T is a plant symbiont and was isolated from root nodules of Mimosa pudica collected from three fields at Ping-Tung County in the southern part of Taiwan [25]. Both ASC-732 T and N-1 T are soil bacteria that are not able to nodulate or fix nitrogen with legumes [26,27]. Minimum Information about the Genome Sequence (MIGS) [28] of AMP6 is provided in Table 1.

Symbiotaxonomy
Cupriavidus sp. strain AMP6 was isolated from Mimosa asperata nodules collected at the Santa Ana National Wildlife Refuge in Hidalgo County, Texas [12]. The strain was assessed for nodulation and nitrogen fixation on five Mimosa species: M. pigra, M. pudica, M. invisia, M. strigillosa and M. quadrivalvis [12]. Strain AMP6 could nodulate all hosts apart from M. quadrivalvis [12]. Additional acetylene reduction assays provided information on the nitrogenase activity of strain AMP6 on those hosts. These tests showed substantial nitrogenase activity with M. pudica and M. invisia but only a small amount with M. pigra [12]. No nodule nitrogenase activity was observed for M. strigillosa or M. quadrivalvis [12].

Genome sequencing information
Genome project history
This organism was selected for sequencing on the basis of its environmental and agricultural relevance to issues in global carbon cycling, alternative energy production, and biogeochemical importance, and is part of the Genomic Encyclopedia of Bacteria and Archaea, Root Nodulating Bacteria (GEBA-RNB) chapter project at the U.S. Department of Energy, Joint Genome Institute [29]. The genome project is deposited in the Genomes OnLine Database [30] and the high-quality permanent draft genome sequence in IMG [31]. Sequencing, finishing and annotation were performed by the JGI using state-of-the-art sequencing technology [32]. A summary of the project information is shown in Table 2.

Growth conditions and genomic DNA preparation
Cupriavidus sp. strain AMP6 was grown on YMA solid medium [23] for 3 days, and a single colony was selected and used to inoculate 5 mL TY broth medium. The culture was grown for 48 h on a gyratory shaker (200 rpm) at 28°C. Subsequently, 1 mL was used to inoculate 60 mL TY broth medium, and the culture was grown on a gyratory shaker (200 rpm) at 28°C until an OD of 0.6 was reached. DNA was isolated from 60 mL of cells using a CTAB bacterial genomic DNA isolation method [33]. The final concentration of the DNA was 0.6 mg/mL.

Genome sequencing and assembly
The genome of Cupriavidus sp. AMP6 was generated at the DOE Joint Genome Institute [32]. An Illumina Std shotgun library was constructed and sequenced using the Illumina HiSeq 2000 platform, which generated 15,823,344 reads totaling 2,373.5 Mbp. All general aspects of library construction and sequencing performed at the JGI can be found at the JGI web site [34]. All raw Illumina sequence data was passed through DUK, a filtering program developed at JGI, which removes known Illumina sequencing and library preparation artifacts (Mingkun L, Copeland A, Han J. unpublished). The following steps were then performed for assembly: (1) filtered Illumina reads were assembled using Velvet (version 1.1.04) [35]; (2) 1-3 Kbp simulated paired end reads were created from Velvet contigs using wgsim [36]; (3) Illumina reads were assembled with simulated read pairs using Allpaths-LG (version r42328) [37]. Parameters for assembly steps were

Fig. 2 Phylogenetic tree highlighting the position of Cupriavidus sp. strain AMP6 (shown in blue print) relative to other type and non-type strains in the Cupriavidus genus using a 1,024 bp internal region of the 16S rRNA gene. Several Alpha-rhizobia sequences were used as an outgroup. All sites were informative and there were no gap-containing sites. Phylogenetic analyses were performed using MEGA, version 5.05 [46]. The tree was built using the maximum likelihood method with the General Time Reversible model. Bootstrap analysis with 500 replicates was performed to assess the support of the clusters. Type strains are indicated with a superscript T. Strains with a genome sequencing project registered in GOLD [30] have the GOLD ID mentioned after the strain number, otherwise the NCBI accession number is provided. Finished genomes are designated with an asterisk.

Genome annotation
Genes were identified using Prodigal [38] as part of the DOE-JGI genome annotation pipeline [39,40], followed by a round of manual curation using GenePRIMP [41] for finished genomes and draft genomes in fewer than 10 scaffolds. The predicted CDSs were translated and used to search the NCBI non-redundant database, UniProt, TIGRFam, Pfam, KEGG, COG, and InterPro databases. The tRNAScanSE tool [42] was used to find tRNA genes, whereas ribosomal RNA genes were found by searches against models of the ribosomal RNA genes built from SILVA [43]. Other non-coding RNAs, such as the RNA components of the protein secretion complex and the RNase P, were identified by searching the genome for the corresponding Rfam profiles using INFERNAL [44]. Additional gene prediction analysis and manual functional annotation were performed within the Integrated Microbial Genomes-Expert Review (IMG-ER) system [45] developed by the Joint Genome Institute, Walnut Creek, CA, USA.

Genome properties
The genome is 7,579,563 nucleotides long with 65.46 % GC content (Table 3) and comprises 260 scaffolds and 262 contigs. From a total of 7,130 genes, 7,033 were protein encoding and 97 were RNA-only encoding genes. The majority of genes (80.24 %) were assigned a putative function, whilst the remaining genes were annotated as hypothetical. The distribution of genes into COG functional categories is presented in Table 4.
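As a quick consistency check on the figures quoted in this report, the following Python sketch recomputes a few derived quantities from the stated numbers. It is only an illustrative calculation, not part of the JGI pipeline. Two assumptions are ours and are not stated in the text: that the 80.24 % figure is a fraction of all 7,130 genes, and that dividing the raw Illumina yield by the assembled genome size gives a reasonable upper bound on sequencing depth (it ignores reads removed by filtering).

```python
# Figures quoted in the text (sequencing yield, genome size, gene counts).
GENOME_SIZE_BP = 7_579_563
TOTAL_GENES = 7_130
PROTEIN_CODING_GENES = 7_033
RNA_ONLY_GENES = 97
FRACTION_WITH_FUNCTION = 0.8024   # 80.24 %; assumed to be a share of all genes
ILLUMINA_READS = 15_823_344
ILLUMINA_YIELD_BP = 2_373.5e6     # 2,373.5 Mbp of raw reads


def summarize_genome_stats():
    """Recompute simple derived quantities from the reported figures."""
    with_function = round(FRACTION_WITH_FUNCTION * TOTAL_GENES)
    return {
        # Protein-coding plus RNA-only genes should equal the reported total.
        "gene_total_consistent":
            PROTEIN_CODING_GENES + RNA_ONLY_GENES == TOTAL_GENES,
        # Approximate gene counts implied by the 80.24 % figure (assumption above).
        "genes_with_putative_function": with_function,
        "genes_hypothetical": TOTAL_GENES - with_function,
        # Mean raw read length implied by the read count and total yield.
        "mean_read_length_bp": ILLUMINA_YIELD_BP / ILLUMINA_READS,
        # Raw (pre-filtering) depth over the assembled genome; an upper bound.
        "raw_read_depth": ILLUMINA_YIELD_BP / GENOME_SIZE_BP,
        # Gene density per kilobase pair of assembled sequence.
        "genes_per_kbp": TOTAL_GENES / (GENOME_SIZE_BP / 1_000),
    }


if __name__ == "__main__":
    for name, value in summarize_genome_stats().items():
        print(f"{name}: {value}")
```

The gene counts are internally consistent (7,033 + 97 = 7,130), and the read count and yield imply reads of roughly 150 bp. The depth value should be read as a rough bound only, since the amount of data remaining after DUK filtering is not given here.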

Conclusion
Cupriavidus sp. AMP6 belongs to a group of Beta-rhizobia isolated from Mimosa asperata. Phylogenetic analysis revealed that AMP6 is most closely related to Cupriavidus taiwanensis LMG 19424 T, which was isolated from Mimosa pudica, and is able to nodulate and fix nitrogen in association with several Mimosa species [13].

Competing interests
The authors declare that they have no competing interests.
Authors' contribution
MP supplied the strain and background information for this project; PVB supplied DNA to JGI; TR performed all imaging; SDM and WR drafted the paper; JH provided financial support; and all other authors were involved in sequencing the genome and editing the final manuscript. All authors read and approved the final manuscript.