High-quality permanent draft genome sequence of Ensifer medicae strain WSM244, a microsymbiont isolated from Medicago polymorpha growing in alkaline soil

Ensifer medicae WSM244 is an aerobic, motile, Gram-negative, non-spore-forming rod that can exist as a soil saprophyte or as a legume microsymbiont of Medicago species. WSM244 was isolated in 1979 from a nodule recovered from the roots of the annual Medicago polymorpha L. growing in alkaline soil (pH 8.0) in Tel Afer, Iraq. WSM244 is the only acid-sensitive E. medicae strain that has been sequenced to date. It is effective at fixing nitrogen with M. polymorpha L., as well as with more alkaline-adapted Medicago spp. such as M. littoralis Loisel., M. scutellata (L.) Mill., M. tornata (L.) Mill. and M. truncatula Gaertn. This strain is also effective with the perennial M. sativa L. Here we describe the features of E. medicae WSM244, together with genome sequence information and its annotation. The 6,650,282 bp high-quality permanent draft genome is arranged into 91 scaffolds of 91 contigs containing 6,427 protein-coding genes and 68 RNA-only encoding genes, and is one of the rhizobial genomes sequenced as part of the DOE Joint Genome Institute 2010 Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria (GEBA-RNB) project proposal.
Electronic supplementary material: The online version of this article (doi:10.1186/s40793-015-0119-5) contains supplementary material, which is available to authorized users.


Introduction
Root nodule bacteria that fix atmospheric nitrogen in association with annual and perennial pasture legumes have important roles in agriculture. Some of the most important associations in temperate and Mediterranean regions are the Ensifer (Sinorhizobium, see endnote 1)-Medicago symbioses that produce nutritious feed for animals. Medicago is a genus within tribe Trifolieae, which is included in the "temperate herbaceous papilionoid" Inverted Repeat Lacking Clade (IRLC) legumes [1,2]. Species of Medicago are amongst the most extensively grown forage and pasture plants and have been cultivated ever since Medicago sativa L. [3]. Medicago spp. are symbiotically specific: nearly all studied species are nodulated by strains of rhizobia belonging to either Ensifer medicae or the closely related species E. meliloti [4,5]. E. medicae can be distinguished from E. meliloti by its ability to nodulate and fix nitrogen with M. polymorpha L. [5].
Ensifer medicae WSM244 was isolated in 1979 from a root nodule of M. polymorpha L. growing on alkaline soil (pH 8.0) near Tel Afer, Iraq [6]. This strain was superior in N2-fixation on a range of medics (M. sativa L., M. truncatula Gaertn., M. tornata (L.) Mill., M. polymorpha L., M. littoralis Loisel., M. scutellata (L.) Mill.) in glasshouse tests in Australia and field trials in Iraq in 1980, and was recommended for development as an inoculant in Iraq (D. Chatel, pers. comm.). WSM244 has also been used in trials aimed at developing acid-tolerant inoculant strains for pasture medics, as the acid-sensitive nature of the microsymbiont is a constraint to the growth and persistence of Medicago spp. in agricultural regions with moderately acidic soils [7]. When field tested in an acidic soil (pH 5.0 in CaCl2) in Western Australia, WSM244 survived at the site of inoculation for two years, but unlike several more acid-tolerant strains it did not demonstrate saprophytic competence and was unable to colonize the soil [8]. This acid-soil sensitivity correlates with the strain's acid-sensitive growth profile in laboratory media and its inability to maintain a neutral intracellular pH when exposed to pH 6.0 or less [9]. Other E. medicae strains, by comparison, are typically the dominant microsymbiont partners of annual medics growing on acid soils, whereas the more acid-sensitive E. meliloti preferentially associates with alkaline-soil-adapted Medicago spp. [10]. The pH response phenotype of WSM244 is in marked contrast to that of the sequenced acid-tolerant E. medicae strain WSM419 [11]. Sequencing the genome of WSM244 and comparing its attributes with those of an acid-tolerant strain such as WSM419 would provide a means of establishing the molecular determinants required for adaptation to acid soils. This strain was therefore selected as part of the DOE Joint Genome Institute 2010 Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria (GEBA-RNB) sequencing project [12]. Here we present a summary classification and a set of general features for E. medicae strain WSM244, together with a description of its genome sequence and annotation.

Organism information
Classification and features
E. medicae WSM244 is a motile, Gram-negative rod (Fig. 1 Left and Center) in the order Rhizobiales of the class Alphaproteobacteria. It is fast growing, forming colonies within 3-4 days when grown on half strength Lupin Agar (½LA) [13], tryptone-yeast extract agar [14] or a modified yeast-mannitol agar [15] at 28°C. Colonies on ½LA are white-opaque, slightly domed and moderately mucoid with smooth margins (Fig. 1 Right). Figure 2 shows the phylogenetic relationship of E. medicae WSM244 in a 16S rRNA sequence based tree. Based on the 16S rRNA gene alignment, this strain is most closely related to Ensifer medicae WSM419 and Ensifer meliloti LMG 6133 T, with sequence identities of 100 % and 99.71 %, respectively, as determined using the EzTaxon-e database, which contains the sequences of validly published type strains [16]. Minimum Information about the Genome Sequence for WSM244 is provided in Table 1 and Additional file 1: Table S1. Table S2.
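The sequence identities quoted above are pairwise comparisons over an alignment. As a rough illustration only, the sketch below computes percent identity over the ungapped columns of two pre-aligned sequences. The function name `percent_identity`, the gap-handling rule and the toy sequences are assumptions made for this example; they are not the EzTaxon-e procedure and are not real 16S rRNA data.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Assumption for this sketch: gapped columns are ignored entirely.
    Other tools may count gaps as mismatches.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Made-up aligned fragments for illustration (NOT real 16S rRNA sequences).
toy_a = "ACGTACGT-ACGTTGCA"
toy_b = "ACGTACGTTACGTAGCA"
identity = percent_identity(toy_a, toy_b)
print(f"{identity:.2f} % identity over ungapped columns")
```

Whether gaps are skipped or counted as differences changes the result, so identities from different tools are only comparable when the same convention is used.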

Genome project history
This organism was selected for sequencing on the basis of its environmental and agricultural relevance to issues in global carbon cycling, alternative energy production, and biogeochemical importance, and is part of the Genomic Encyclopedia of Bacteria and Archaea, The Root Nodulating Bacteria chapter project at the U.S. Department of Energy, Joint Genome Institute. The genome project is deposited in the Genomes OnLine Database [17] and a high-quality permanent draft genome sequence is deposited in IMG [18]. Sequencing, finishing and annotation were performed by the JGI [19]. A summary of the project information is shown in Table 2.
Growth conditions and genomic DNA preparation
E. medicae WSM244 was grown on TY solid medium [14] for three days, then a single colony was selected and used to inoculate 5 ml TY broth medium. The culture was grown for 48 h on a gyratory shaker (200 rpm) at 28°C. Subsequently 1 ml was used to inoculate 60 ml TY broth medium and grown on a gyratory shaker (200 rpm) at 28°C until OD 0.6 was reached. DNA was isolated from 60 ml of cells using a CTAB bacterial genomic DNA isolation method (http://jgi.doe.gov/collaborate-with-jgi/pmo-overview/protocols-sample-preparation-information/). Final concentration of the DNA was 0.5 mg ml⁻¹.

Genome sequencing and assembly
The draft genome of E. medicae WSM244 was generated at the DOE Joint Genome Institute (JGI) using Illumina technology [20]. An Illumina Std shotgun library was constructed and sequenced using the Illumina HiSeq 2000 platform, which generated 22,576,268 reads totaling 3,386.4 Mbp. All general aspects of library construction and sequencing performed at the JGI can be found at the JGI website. All raw Illumina sequence data were passed through DUK, a filtering program developed at JGI, which removes known Illumina sequencing and library preparation artifacts (Mingkun L, Copeland A, Han J, unpublished). The following steps were then performed for assembly: (1) filtered Illumina reads were assembled using Velvet (version 1.1.04) [21], (2) 1-3 Kbp simulated paired end reads were created from Velvet contigs using wgsim (https://github.com/lh3/wgsim), (3) Illumina reads were assembled with simulated read pairs using Allpaths-LG (version r41043) [22]. Parameters for assembly steps were
Fig. 2 legend (partial): (The species name "Sinorhizobium chiapanecum" has not been validly published.) Azorhizobium caulinodans ORS 571 T was used as an outgroup. All sites were informative and there were no gap-containing sites. Phylogenetic analyses were performed using MEGA, version 6 [37]. The tree was built using the Maximum-Likelihood method with the General Time Reversible model [38]. Bootstrap analysis [39] with 500 replicates was performed to assess the support of the clusters. Type strains are indicated with a superscript T. Strains with a genome sequencing project registered in GOLD [17] are in bold font and the GOLD ID is provided after the GenBank accession number. Finished genomes are indicated with an asterisk.

Genome annotation
Genes were identified using Prodigal [23], as part of the DOE-JGI genome annotation pipeline [24,25]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information nonredundant database, UniProt, TIGRFam, Pfam, KEGG, COG, and InterPro databases. The tRNAScanSE tool [26] was used to find tRNA genes, whereas ribosomal RNA genes were found by searches against models of the ribosomal RNA genes built from SILVA [27]. Other non-coding RNAs such as the RNA components of the protein secretion complex and the RNase P were identified by searching the genome for the corresponding Rfam profiles using INFERNAL [28]. Additional gene prediction analysis and manual functional annotation were performed within the Integrated Microbial Genomes (IMG) platform [29] developed by the Joint Genome Institute, Walnut Creek, CA, USA [30].
Table footnote a. Evidence codes: IDA, Inferred from Direct Assay; TAS, Traceable Author Statement (i.e., a direct report exists in the literature); NAS, Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [53] (http://geneontology.org/page/guide-go-evidence-codes).

Genome properties
The genome is 6,650,282 nucleotides long with 61.21 % GC content (Table 3) and comprises 91 scaffolds of 91 contigs. Of a total of 6,495 genes, 6,427 were protein-encoding genes and 68 were RNA-only encoding genes. The majority of protein-coding genes (79.34 %) were assigned a putative function whilst the remaining genes were annotated as hypothetical. The distribution of genes into COGs functional categories is presented in Table 4.
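The figures reported for sequencing and for the genome can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the text (read count, read yield, genome size, gene counts). The variable names are illustrative, and the coverage value is a nominal input depth derived from raw yield, assuming all reads map to a single-copy genome. It is not a statement about the depth retained after filtering and assembly.

```python
# Values taken from the text of this report.
GENOME_SIZE_BP = 6_650_282
READ_COUNT = 22_576_268
READ_YIELD_BP = 3_386_400_000  # 3,386.4 Mbp
PROTEIN_CODING_GENES = 6_427
RNA_GENES = 68

# Mean read length implied by the total yield and the read count.
mean_read_length = READ_YIELD_BP / READ_COUNT

# Nominal fold coverage: raw sequenced bases per genome base (an upper bound,
# since filtered or unassembled reads are not subtracted here).
raw_coverage = READ_YIELD_BP / GENOME_SIZE_BP

# Gene tallies and simple derived statistics.
total_genes = PROTEIN_CODING_GENES + RNA_GENES
protein_coding_pct = 100 * PROTEIN_CODING_GENES / total_genes
genes_per_mbp = total_genes / (GENOME_SIZE_BP / 1e6)

print(f"mean read length: {mean_read_length:.1f} bp")
print(f"nominal coverage: {raw_coverage:.1f}x")
print(f"total genes:      {total_genes}")
print(f"protein-coding:   {protein_coding_pct:.2f} %")
print(f"gene density:     {genes_per_mbp:.1f} genes per Mbp")
```

The gene total recomputed this way agrees with the 6,495 genes stated above, and the implied read length is consistent with 150 bp reads.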

Insights from the genome sequence
WSM244 is one of six strains of E. medicae and one of 30 E. medicae or E. meliloti Medicago-nodulating strains that have been sequenced and whose genomes have been deposited in the IMG database. The genome of WSM244 falls within the expected size range of 6.4-7.2 Mbp for E. medicae. As observed in other E. medicae genomes, WSM244 possesses a large number of genes assigned to COG functional categories for: transport and metabolism of amino acids (12.15 %), carbohydrates (11.17 %), inorganic ions (5.3 %), lipids (3.91 %) and coenzymes (3.32 %), transcription (8.63 %) and signal transduction (3.66 %). The WSM244 genome contains only four pseudogenes; pseudogene numbers are highly variable in sequenced E. medicae strains and can be as high as 485 (E. medicae WSM4191). All six E. medicae strains share high ANI values of 99.18-99.67 %, which is consistent with the low levels of genetic diversity found in E. medicae populations [31]. The six E. medicae strains share 5,425 core orthologous genes. WSM244 contains 202 unique genes, including those found in clusters encoding a putative polyketide synthase, phage proteins and a sulfonate transport system. Around 72 % of these unique genes encode hypothetical proteins. Strain WSM244 is particularly interesting, as it lacks the acid tolerance of other E. medicae strains. The genome of this strain nevertheless contains orthologs of acid response or acid tolerance genes that were initially discovered in E. medicae WSM419. These genes include actA (lnt), actP, actR, actS, phrR, lpiA and acvB [32][33][34][35]. WSM244 also contains the tcsA-tcrA-fsrR regulatory gene cluster, which is required for the low-pH activation of lpiA and acvB in E. medicae WSM419 [36]. This finding is in direct contrast to the absence of fsrR, tcsA and tcrA in the acid-sensitive strain E. meliloti 1021. This suggests either that there may be differences in pH-responsive gene expression in the WSM244 background, or that acid-tolerant E. medicae strains possess other candidate genes that are required for low pH adaptation and have not yet been identified.
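The terms "core" and "unique" genes used above can be read as set operations over ortholog assignments: a core gene family is present in every strain compared, and a unique family is present in one strain and absent from all others. The sketch below illustrates that logic with made-up ortholog identifiers. The helper name `core_and_unique`, the strain labels and the data are hypothetical, and the ortholog-calling procedure actually used in IMG is not described here and is not reproduced.

```python
def core_and_unique(families_by_strain: dict[str, set[str]], strain: str):
    """Return (core, unique) ortholog families.

    core   = families present in every strain
    unique = families present in `strain` and in no other strain
    """
    core = set.intersection(*families_by_strain.values())
    others = set().union(
        *(fams for name, fams in families_by_strain.items() if name != strain)
    )
    unique = families_by_strain[strain] - others
    return core, unique


# Hypothetical ortholog-family assignments (illustration only).
toy_families = {
    "strainA": {"og1", "og2", "og3", "og7"},
    "strainB": {"og1", "og2", "og4"},
    "strainC": {"og1", "og2", "og3", "og5"},
}
core, unique_a = core_and_unique(toy_families, "strainA")
print(sorted(core), sorted(unique_a))

# Rough count implied by "around 72 %" of the 202 unique genes in the text.
approx_hypothetical_unique = round(0.72 * 202)
print(approx_hypothetical_unique)
```

Because the source gives the hypothetical-protein share only as "around 72 %", the derived count is an approximation rather than a reported value.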

Conclusions
WSM244 is of particular interest as it was isolated from M. polymorpha growing in alkaline soil and it lacks the acid tolerance of E. medicae strains isolated from medics growing in acid Sardinian and Greek soils [9]. WSM244 is the only acid-sensitive E. medicae strain that has been sequenced to date. Analysis of its sequenced genome and comparison with other sequenced E. medicae and E. meliloti genomes will yield new insights into the molecular basis of acid tolerance in rhizobia and into the ecology and biogeography of the Ensifer-Medicago symbiosis.
Endnotes
1 Editorial note: Readers are advised that in Opinion 84 the Judicial Commission of the International Committee on Systematics of Prokaryotes ruled that the genus name Ensifer Casida 1982 has priority over Sinorhizobium Chen et al. 1988 and the names are synonyms [1]. It was further concluded that the transfer of members of the genus Sinorhizobium to the genus Ensifer, as proposed by Young [2], would not cause confusion.