Novel European free-living, non-diazotrophic Bradyrhizobium isolates from contrasting soils that lack nodulation and nitrogen fixation genes – a genome comparison

The slow-growing genus Bradyrhizobium is biologically important in soils, with different representatives found to perform a range of biochemical functions including photosynthesis, induction of root nodules and symbiotic nitrogen fixation and denitrification. Consequently, the role of the genus in soil ecology and biogeochemical transformations is of agricultural and environmental significance. Some isolates of Bradyrhizobium have been shown to be non-symbiotic and do not possess the ability to form nodules. Here we present the genome and gene annotations of two such free-living Bradyrhizobium isolates, named G22 and BF49, from soils with differing long-term management regimes (grassland and bare fallow respectively) in addition to carbon metabolism analysis. These Bradyrhizobium isolates are the first to be isolated and sequenced from European soil and are the first free-living Bradyrhizobium isolates, lacking both nodulation and nitrogen fixation genes, to have their genomes sequenced and assembled from cultured samples. The G22 and BF49 genomes are distinctly different with respect to size and number of genes; the grassland isolate also contains a plasmid. There are also a number of functional differences between these isolates and other published genomes, suggesting that this ubiquitous genus is extremely heterogeneous and has roles within the community not including symbiotic nitrogen fixation.

and aromatic compound degradation 8 . Nitrogen removal through heterotrophic denitrification is an important step in the global nitrogen cycle carried out by many groups including Bradyrhizobium 10 . The multiple roles of Bradyrhizobium in the nitrogen cycle make the ecology of this group important for agriculture.
Bradyrhizobium is studied extensively due to its symbiotic relationship with soybean and consequently several genomes have been published. Currently, there are seven complete Bradyrhizobium genomes in the NCBI database. Six of these are symbiotic and are able to fix nitrogen and form root nodules on legumes (B. diazoefficiens USDA 110, B. japonicum USDA 6, B. japonicum E109, Bradyrhizobium sp. ORS278, Bradyrhizobium sp. BTAi1 and B. oligotrophicum S58) with ORS278, BTAi1 and S58 able to form both stem and root nodules on the aquatic legume Aeschynomene 6,11 . The other bradyrhizobial genome (Bradyrhizobium sp. S23321) is free-living because it is unable to form nodules; however, it still contains the genes required for nitrogen fixation. Four genomes sequenced from North American forest soils were also missing nodulation and nitrogen fixation genes (Bradyrhizobium sp. LTSP849, Bradyrhizobium sp. LTSP857, Bradyrhizobium sp. LTSP885 and Bradyrhizobium sp. LTSPM299). These genomes were sequenced using shotgun sequencing of the soil community and were assembled to near completion 2 . Due to the availability of a diverse array of genome reference sequences, Bradyrhizobium is an appropriate model to study other soil bacteria: understanding the mechanisms of Bradyrhizobium adaptation to independent living in agricultural soils under contrasting management may reveal the genetic potential of this globally important genus.
Here we present the genome and gene annotations and carbon metabolism profiles of two free-living Bradyrhizobium isolates from the Highfield experiment at Rothamsted Research that has three long-term treatment regimes: grassland, arable (wheat) and bare fallow tilled regularly to maintain a plant-free soil. Maintenance of these treatments for 60 years has led to distinct differences in soil properties and the soil microbiome 12 . Bradyrhizobium sp. G22 and Bradyrhizobium sp. BF49 were isolated from soil taken from the permanent grassland and permanent bare fallow plots of the Highfield experiment respectively. These Bradyrhizobium strains are the first to be isolated and genome sequenced from European soil and the first free-living and non-diazotrophic isolates, without the presence of either nodulation or nitrogen fixation genes, to have their genomes sequenced and assembled from cultured samples. The isolates were interrogated for differences to determine the level of genetic heterogeneity in carbon metabolism between these isolates.

Results and Discussion
General genome description and comparisons. The genome of the grassland isolate G22 is 9,022,917 bp in size while the bare fallow isolate BF49 genome is 7,547,693 bp, constituting a 1.5 Mbp size difference in addition to a 364,482 bp plasmid in G22. The genome size for G22 is similar to nodulating strains B. diazoefficiens USDA 110, B. japonicum USDA 6, B. japonicum E109, B. oligotrophicum S58 and Bradyrhizobium sp. BTAi1, whereas for BF49 it is closer in size to the free-living strain Bradyrhizobium sp. S23321 and the photosynthetic, nodulating strain Bradyrhizobium sp. ORS278. Table 1 summarises the genome information of the two novel strains G22 and BF49 along with the other seven completed Bradyrhizobium genomes in the database. G22 had 19 contigs which could not be placed in the chromosome or plasmid. These were annotated and used in the subsequent analysis.
G22 shows 1356 more genes (8787) than BF49 (7431) and this rises to 1902 when including the genes contained on the plasmid (541). GC content is similar between the strains at 63.7% and 63.8% for G22 and BF49 respectively, consistent with other Bradyrhizobium strains listed in Table 1. The G22 plasmid has a GC content of 60.7%, identical to the plasmid of BTAi1. The G22 and BF49 genomes show high pairwise sequence identity in comparison to both USDA 110 (G22: 84.4%, BF49: 83.4%) and S23321 (G22: 83.4%, BF49: 83.2%) using LASTZ (Large-Scale Genome Alignment Tool).
Orthologous gene clusters and core genome phylogeny. G22 and BF49 were compared with the free-living isolate, S23321, and the symbiotic isolate, USDA 110 ( Fig. 1). This suggests that there is a core genome of 4562 genes which are present in all four genomes assessed. The 103 genes present only in the USDA 110 genome include those involved in nodulation and uptake hydrogenase. The 171 genes which are only present in   Genes involved in nitrogen fixation and nodulation. Nitrogen fixation and nodulation genes including nifDKH, nodD and nodABC; are all absent from both G22 and BF49 (Table 2) and so we suggest that these isolates are not able to either form nodules or fix atmospheric nitrogen. The absence of both nif and nod genes is in contrast to other published, complete Bradyrhizobium genomes and indicates similarity with forest soil bacteria (LTSP849, LTSP857, LTSP885 and LTSPM299) 2  ORS278 and the free-living strain Bradyrhizobium sp. S23321) (Fig. 4). These strains use a nod-independent route for stem and root nodulation of Aeschynomene 11 . The FixLJ two component system is present in both the grassland and bare fallow isolate ( Table 2). FixLJ acts in response to low oxygen conditions in soil and in the nodule and controls the expression of genes for both nitrogen fixation and denitrification 18,19 . This two-component system has also been shown to regulate the response to nitric oxide in Sinorhizobium meliloti being shown to regulate a high proportion of genes induced by the presence of nitric oxide 20 . The presence of fixLJ in G22 and BF49 is consistent with other Bradyrhizobium isolates including all seven completed genomes.
Genes involved in denitrification. Both genomes encode a nitrate reductase, NapA/B, which catalyses the reduction of nitrate to nitrite; the first stage in denitrification [21][22][23] . This is common among Bradyrhizobium including all seven complete genomes ( Table 2 and Fig. 4) and the four genomes from North American forest soils (Table 2). In addition, all fully-sequenced isolates including both G22 and BF49 contain nirK, encoding a respiratory copper-containing nitrite reductase which is involved in the second stage of denitrification reducing nitrite to nitric oxide. The third stage of denitrification is the conversion of nitric oxide into nitrous oxide, a potent greenhouse gas, and is catalysed by a nitric oxide reductase encoded by norB/C 22,23 . The presence of a nitric oxide reductase gene has been noted in all previously published and complete Bradyrhizobium genomes in addition to G22 and BF49. The denitrification pathway is not present in the four genomes of strains from North American forest soils. The ability to convert the greenhouse gas, nitrous oxide into environmentally benign nitrogen gas in the final stage of denitrification is an attribute with global importance 24 . Nitrous oxide reductase encoded by nosZ catalyses this process but it is not ubiquitous across Bradyrhizobium 22,23,25,26 . Of the seven published complete genomes, only B. diazoefficiens USDA 110, Bradyrhizobium sp. BTAi1 and B. oligotrophicum S58 contain the nosZ gene. It is absent from the grassland isolate, G22 but present in the bare fallow isolate, BF49. The presence of the gene shows a potential for BF49 to perform this function and increases the agricultural and environmental importance of this isolate. Uptake hydrogenase. The uptake of hydrogen is catalysed by uptake hydrogenase which is encoded by the hup genes [27][28][29] . This process produces ATP which is used by nitrogen-fixing bacteria to mediate for energy lost through the nitrogen fixation process 27,29 . The nickel-iron hydrogenase, encoded by hupSL 29 , is absent from both G22 and BF49 but is present in all of the symbiotic strains of Bradyrhizobium with complete genomes. These genes are also absent from LTSP849, LTSP857, LTSP885 and LTSPM299 (Table 2).
Photosynthesis and carbon fixation. Genes for photosynthesis are present in four of the published complete Bradyrhizobium genomes; S23321 (free-living), S58, BTAi1 and ORS278 (Aeschynomene host) (Fig. 4b). These include genes for bacteriochlorophyll (bchCXYZ/FNBHLM), carotenoids (crtEF), light harvesting polypeptides (pucBAC/pufBA) and reaction centre subunits (puhA/pufLM) 30,31 . They are not present in the soybean-nodulating isolates (USDA 110, USDA 6, and E109), G22 or BF49 genomes or the four isolates from forest soils (LTSP849, LTSP857, LTSP885 and LTSPM299) ( Table 2). Many heterotrophic bacteria including Bradyrhizobium can fix carbon using the Calvin-Benson-Bassham cycle (CBB cycle). The significance of this is unclear in most cases although the photosynthetic Aeschynomene-nodulating strain ORS278 has been shown to require an active CBB cycle for symbiotic nitrogen fixation 32 . The first stage of the CBB cycle is catalysed by Ribulose-1,5-bisphosphate carboxylase oxygenase (RuBisCo) which is present in both G22 and BF49 genomes, consistent with other Bradyrhizobium isolates ( Table 2). Transketolase is an important enzyme in both the CBB cycle and pentose phosphate pathway, catalysing the interconversion of sugars 33,34 . Genes for the transketolase enzyme are present in both G22 and BF49 consistent with all other complete Bradyrhizobium isolates which have been published.
Carbon metabolism. The principal components analysis (PCA) biplot (Fig. 5A) shows that there is a separation of the time points and USDA 6 from G22 and BF49 across PC1 which accounts for 44.86% of the variation. Across PC2, G22 and BF49 separate and PC2 accounts for 30.52% of the total variation. The first two principal components were visualised as together they accounted for 75.38% of the variation. The third PC accounted for 7.5%. Figure 5B shows the 95 substrates colour coded according to category. More carboxylic acids and amino acid substrates are associated with the separation across PC1; USDA 6 from G22 and BF49. Carbohydrates tend to have a negative direction across PC2 being more closely associated with BF49. One carboxylic acid, malonic acid, was the only substrate which has a positive direction for PC2 and negative for PC1 associating more closely with G22.
Malonic acid can be found in plant tissues, including legumes being first characterised from alfalfa leaves in 1925 and has been found to be in very high concentrations in soybeans [35][36][37] . This pathway was found to be closely associated with the symbiotic nitrogen fixation pathway in Rhizobium leguminosarium bv trifolii 38 . Malonic acid is activated before being broken down into acetate and carbon dioxide through decarboxylation 39 . It is often converted to malonyl-CoA by a CoA transferase, which is present in G22 (Phosphoribosyl-dephospho-CoA    transferase) and the malonyl-CoA is then decarboxylated by malonate decarboxylase 35,39 . Malonate decarboxylase has four subunits; alpha, beta, gamma and delta; all of which are present in G22. The grassland isolate also contains two mad genes; madL and madM. These genes have previously been reported to be part of the malonate decarboxylase operon as transport proteins thought to be involved in malonate uptake 35 . The grassland isolate G22 also contains a malonyl CoA acyl carrier protein transacylase and triphosphoribosyl-dephospho-CoA synthetase. All genes involved in malonate decarboxylation are absent in the bare fallow isolate BF49. Malonate transport and utilisation genes are also present in other Bradyrhizobium isolates including BTAi1, ORS278, USDA 110, S23321 and all four of the forest strains (LTSP849, LTSP857, LTSP885 and LTSPM299). When grown in malonic acid, only G22 was able to metabolise it whereas BF49 and USDA 6 were not able to utilise this carbon source to the same extent. The highest loadings and therefore the substrates which make the largest contribution to PC1 were L-pyroglutamic acid, L-leucine and D-galacturonic acid. L-pyroglutamic acid is an amino acid which is also known as 5-oxoproline. It is involved in the glutathione pathway and is converted to L-glutamic acid by the enzyme 5-oxoprolinase (EC 3.5.2.9) 40,41 . This gene is present in USDA 6 but is absent in both G22 and BF49. USDA 6 recorded a larger OD across all time points for this substrate than G22 and BF49. L-leucine is also an amino acid and is involved in numerous pathways including valine, leucine and isoleucine biosynthesis and degradation 40,41 . USDA 6 was able to utilise this substrate to a much greater extent than G22 and BF49 and the main difference in genes involved in L-leucine metabolism was the presence of leucine transaminase (EC 2.6.1.6) in USDA 6 and absence in G22 or BF49. D-galacturonic acid is a carbohydrate involved in numerous pathways including pentose and glucoronate interconversions, starch and sucrose metabolism and amino sugar and nucleotide sugar metabolism 40,41 . It is metabolised by USDA 6 more readily than G22 and BF49. This is likely due to the presence of pectinase (EC 3.2.1.15) and urinate dehydrogenase (EC 1.1.1.203) in USDA 6 but not in G22 or BF49. OD curves for the substrates discussed can be found in Supplementary Information S1.
Plasmid. Plasmid replication genes repABC and parAB are only found on the G22 plasmid and not on either the G22 or BF49 chromosomes. The trb operon for conjugal transfer consists of 12 genes; traI, trbBCDEJKLF-GHI 42 . From this operon, 10 out of the 12 genes are present on the G22 chromosome; only trbH and traI are missing. Conjugal transfer genes trbBCDEFGIJL are absent from BF49. Conjugative transfer DNA nicking endonuclease genes traR/traO are present only in the G22 plasmid. The other genes which were unique to the plasmid were two genes involved in purine utilisation (yagS and a putative xanthine transporter) and one for osmotic stress (opgC). The purine utilisation genes, yagS, encodes a periplasmic aromatic aldehyde oxidoreductase. YagS is usually part of an operon yagTSRQ which encodes a molybdenum-containing iron sulphur flavoprotein, where YagS is the FAD-containing subunit. The role of this protein has been suggested to be detoxification of aromatic aldehydes 43,44 . The flavoprotein produced from yagTSRQ shows homology with xanthine dehydrogenase 43 . The osmotic stress gene, opgC, is involved in the synthesis of osmoregulated periplasmic glucans (OPGs). The exact role of OPGs is not understood however they have been shown to potentially play a role in the interaction between bacteria and their eukaryotic host 45,46 . The OpgC protein has been examined in Rhodobacter sphaeroides and was shown to encode a succinyltransferase homolog involved in the succinyl modification of OPGs 45,46 . The other genes contained on the plasmid are also on the chromosome of both G22 and BF49 including genes for DNA ligases, DNA repair, cAMP signalling and RNA processing and modification.

Conclusions
The strains described here are the first to be isolated and genome-sequenced from European soil and are unique compared to other completed Bradyrhizobium genomes due to the absence of previously characterised genes and gene clusters for symbiosis, nitrogen fixation and photosynthesis. They are also distinct from the North American forest isolates as G22 and BF49 contain genes for denitrification. They represent a major group, likely to play a key role in denitrification. The presence or absence of the terminal denitrification gene, nosZ, may determine whether the end product of denitrification is the potent greenhouse gas, nitrous oxide or the less problematic nitrogen gas. The carbon metabolism analysis shows that G22 and BF49 show different metabolic profiles over time and this are also distinct from a nodulating strain, USDA 6. The genomes and carbon metabolism analysis indicate that the free-living soil Bradyrhizobium have the potential to carry out many degradative and transformative functions in soil; the marked differences between two isolates from comparable soils that have undergone different management indicates that they form part of an extremely heterogeneous group.

Methods
Isolation and identification. Bradyrhizobium were isolated from the permanent grassland and bare fallow plots in the Highfield experiment at Rothamsted Research 12 . Serially diluted soil samples were plated onto modified arabinose gluconate (MAG) agar plates incubated at 28 °C 47 . Colonies forming after 7 days were picked, DNA extracted using MicroLYSIS-Plus using manufacturer's instructions (Microzone, UK) and identified by PCR by the production of a 1360 bp 16S rRNA fragment using custom-designed Bradyrhizobium specific primers: Bradj16S70F (5′-GCGGGCGTAGCAATACGTCAGC-3′) and Bradj16S1430R (5′-GCCGGCTGCCTCCCTTGCGGGTTA-3′). Each PCR mixture (20 μ l) consisted of 1x NH4 reaction buffer, 0.5 mM of each dNTP, 2.5 mM MgCl 2 , 0.1 μ M of each primer, 1 U Biotaq polymerase (Bioline, UK) and 10 ng of genomic DNA. The PCR conditions were as follows: 95 °C 2 min, 30 cycles of 95 °C 15 sec, 68 °C 15 sec and 72 °C 1 min and extension at 72 °C for 10 min. The PCR products were examined on a 1.5% agarose gel stained with ethidium bromide (0.2 μ g ml −1 ) at 100 V for 60 min. PCR products were purified using the Wizard SV gel and PCR clean up system (Promega, USA), sequenced as described in Mauchline et al. (2014) 48 and identification confirmed using BLAST and searching the NCBI and RDP databases. Two isolates, one from grassland soil and one from bare fallow soil were chosen at random from the generated Bradyrhizobium culture collection to have their genomes sequenced de novo.
De novo genome sequencing. DNA was extracted from isolates grown in MAG broth incubated at 28 °C and shaken at 100 rpm using the GenElute Bacterial Genomic DNA kit (Sigma Aldrich, USA De novo genome assembly and annotation. The sequence data was assembled using SOAPdenovo2 assembler and gap closer 49 using a maximum read length of 615 bp and the average insert size of 6 kb. A range of assemblies were produced using kmer values; 61-83, 81, 83, 85, 87, 89, 91. The genomes were manually curated using Geneious (Biomatters Ltd v8.1.5). Gap closing was carried out by PCR. Any remaining contigs from the 61-83 kmer reference assembly over 500 bp were investigated while those under were discarded. The G22 assembly had 19 contigs over 500 bp unplaced within the chromosome, with some containing annotated genes. No contigs over 500 bp remained after the completion of the final BF49 reference and so we can be confident that all genes are present in the assembly. The genomes were uploaded into RAST (Rapid Annotation using Subsystem Technology) for annotation 50,51 . The default parameters were used which were as follows: Classic RAST annotation scheme, RAST  Genome comparisons. Genome annotations were downloaded from RAST and examined manually using KEGG 40,41 . OrthoVenn was used to assess gene orthology using an e-value of 1e −10 52 . BRIG (BLAST Ring Image Generator) was used to compare the genomes with other Bradyrhizobium isolates (NCBI blast version 2.2.31) 53 . Large-Scale Genome Alignment Tool (LASTZ) was used to assess genome-wide sequence similarity in Geneious 54 .
16S rRNA phylogeny analysis. The 16S rRNA gene sequences for G22 and BF49 were compared with other Bradyrhizobium sequences in the NCBI database. All Bradyrhizobium 16S sequences from the NCBI RefSeq database and the four Bradyrhizobium isolates from North American forest soils 2 were downloaded. These sequences were aligned using MUSCLE using 8 iterations 55 . The aligned region was extracted (1220 bp) and a phylogenetic tree was created using the neighbour joining clustering method with 1000 bootstraps. A 75% support threshold was used for drawing the phylogenetic tree. Accession numbers for the sequences used in this analysis can be found in Supplementary Information S2.
Core genome phylogenetic analysis. OrthoVenn was used to identify genes which were present in all 9 genomes and were considered the "core genome". The inflation value was set to 1.5 and an e-value of 1e −10 . These genes were uploaded into the call SNPs and infer phylogeny (CSI) tool 56 hosted by the Center for Genomic Epidemiology 57 using the default options using B. diazoefficiens USDA 110 as the reference sequence. Default options were as follows: minimum of 10x depth at SNP positions, 10% relative minimum depth at SNP positions, a minimum of 10 bp distance between SNPs, minimum SNP quality score of 30, minimum read mapping quality score of 25 and minimum z-score of 1.96. Altered FastTree was selected. The core genome phylogenetic tree (Newick file) was visualised in Geneious.
Biolog carbon assays. Isolates G22, BF49 and B. japonicum USDA 6 were grown in MAG broth, incubated at 28 °C and shaken at 100 rpm. Cell density was estimated from 1 ml of culture stained with 0.05% methylene blue using a haemocytometer. The cultures were diluted to a cell density of 10 6 μ l −1 . The diluted cultures were centrifuged at 14000 × g for 1 minute, the supernatant was removed and the cells re-suspended in sterile deionised Scientific RepoRts | 6:25858 | DOI: 10.1038/srep25858 water. Each well of a Biolog GN2 MicroPlate ™ was inoculated with 140 μ l (10 8 cells/ 140 μ l) of bacterial culture.
Three replicate plates per isolate were used. The optical density (OD) was read using a Varioscan SkanIt plate reader (Thermo Fisher Scientific Inc.) at 590 nm and 25 °C every 24 hours for a total of 98 hours (0, 24, 48, 72 and 98 hours). The plates were incubated at 25 °C and shaken at 100 rpm. Full list of substrates can be seen in Supplementary Information S3. Statistical analyses. The Biolog data was analysed using principal components analysis (PCA) using the inbuilt function, prcomp, in R (version 3.2.2). For visualisation of the PCA, biplots were drawn using PC1 and PC2 and the loadings matrix was extracted (see Supplementary Information S3). This identified specific substrates which were associated with the isolates and also substrates making the greatest contribution to the principal component. These pathways were then examined in the genomes. The substrates were grouped into carbohydrates, carboxylic acids, amines and amides, amino acids, polymers and miscellaneous according to categories identified in previously published work 58 .