Comparative genomic analysis of Bradyrhizobium strains with natural variability in the efficiency of nitrogen fixation, competitiveness, and adaptation to stressful edaphoclimatic conditions

ABSTRACT Bradyrhizobium is known for fixing atmospheric nitrogen in symbiosis with agronomically important crops. This study focused on two groups of strains, each containing eight natural variants of the parental strains, Bradyrhizobium japonicum SEMIA 586 (=CNPSo 17) or Bradyrhizobium diazoefficiens SEMIA 566 (=CNPSo 10). CNPSo 17 and CNPSo 10 were used as commercial inoculants for soybean crops in Brazil at the beginning of the crop expansion in the southern region in the 1960s–1970s. Variants derived from these parental strains were obtained in the late 1980s through a strain selection program aimed at identifying elite strains adapted to a new cropping frontier in the central-western Cerrado region, with a higher capacity of biological nitrogen fixation (BNF) and competitiveness. Here, we aimed to detect genetic variations possibly related to BNF, competitiveness for nodule occupancy, and adaptation to the stressful conditions of the Brazilian Cerrado soils. High-quality genome assemblies were produced for all strains. The core genome phylogeny revealed that strains of each group are closely related, as confirmed by high average nucleotide identity values. However, variants accumulated divergences resulting from horizontal gene transfer, genomic rearrangements, and nucleotide polymorphisms. The B. japonicum group presented a larger pangenome and a higher number of nucleotide polymorphisms than the B. diazoefficiens group, possibly due to its longer adaptation time to the Cerrado soil. Interestingly, five strains of the B. japonicum group carry two plasmids. The genetic variability found in both groups is discussed considering the observed differences in their BNF capacity, competitiveness for nodule occupancy, and environmental adaptation. IMPORTANCE Today, Brazil is a global leader in the study and use of biological nitrogen fixation with soybean crops. As Brazilian soils are naturally void of soybean-compatible bradyrhizobia, strain selection programs were established, starting with foreign isolates. Selection searched for adaptation to the local edaphoclimatic conditions, higher efficiency of nitrogen fixation, and strong competitiveness for nodule occupancy. We analyzed the genomes of two parental strains of Bradyrhizobium japonicum and Bradyrhizobium diazoefficiens and eight variant strains derived from each parental strain. We detected two plasmids in five strains and several genetic differences that might be related to adaptation to the stressful conditions of the soils of the Brazilian Cerrado biome. We also detected genetic variations in specific regions that may impact symbiotic nitrogen fixation. Our analysis contributes to new insights into the evolution of Bradyrhizobium, and some of the identified differences may be applied as genetic markers to assist strain selection programs.

IMPORTANCE Today, Brazil is a global leader in the study and use of biological nitrogen fixation with soybean crops.As Brazilian soils are naturally void of soybean-compati ble bradyrhizobia, strain selection programs were established, starting with foreign isolates.Selection searched for adaptation to the local edaphoclimatic conditions, higher efficiency of nitrogen fixation, and strong competitiveness for nodule occupancy.We analyzed the genomes of two parental strains of Bradyrhizobium japonicum and Bradyrhizobium diazoefficiens and eight variant strains derived from each parental strain.We detected two plasmids in five strains and several genetic differences that might be related to adaptation to the stressful conditions of the soils of the Brazilian Cer rado biome.We also detected genetic variations in specific regions that may impact symbiotic nitrogen fixation.Our analysis contributes to new insights into the evolution of Bradyrhizobium, and some of the identified differences may be applied as genetic markers to assist strain selection programs.with the differences in symbiotic abilities based solely on the genomes or proteomes of CPAC 15, CPAC 7, and the corresponding type strains of B. japonicum and B. diazoefficiens (14,15).More recently, differences in the symbiotic island were examined using draft genome assemblies of the parental strains and a subset of the variants (16).However, the genetic variations responsible for the phenotypic differences in N-fixation capabili ties, competitiveness for legume nodule occupancy, and adaptation properties of those variants are not yet known.
Species of the genus Bradyrhizobium display broad variation in lifestyles and legume host range, presumably influencing the group's genetic organization (17).Conceptually, pangenomes represent the entire set of genes of a phylogenetically related group of strains fractioned into core and dispensable genomes (18).The core genome consists of genes shared by all strains of the group and usually codes essential functions.In contrast, the dispensable genome refers to genes present in a subset of strains (the accessory genome) or only in individual strains (the unique genome), and it is composed of genes that may provide additional functions for those strains and often relate to environmental adaptation (19).Analyses of bacterial pangenomes may elucidate genomic variability even between closely related strains within a species and reveal information about horizontal gene transfer (HGT) and evolution (20).
Events of HGT, recombination, and mutations are some of the main drivers of bacterial evolution (21).HGT refers to the transfer of DNA segments from one organ ism to another.Mobile genetic elements such as plasmids and integrative conjugative elements may contribute to the fitness of recipient bacteria by conferring selective advantages, including legume symbiosis, antibiotic resistance, and pathogenicity (22,23).Concerning the genomes of rhizobia, symbiotic plasmids (i.e., plasmids encoding the determinants of legume symbiosis) are prevalent in the genera Sinorhizobium and Rhizobium, whereas the genera Bradyrhizobium and Mesorhizobium usually carry their symbiotic genes in genomic islands integrated into the chromosomes, known as symbiosis islands (SIs) (23,24).Genomic islands are commonly inserted at tRNA genes and may present a lower GC mol% in comparison to the rest of the chromosome, in addition to a high number of genes encoding hypothetical proteins, plasmid conjuga tion, integrases, insertion sequences (ISs), and transposases (22).This movement of DNA segments may prompt genome recombination, including inversions, translocations, TABLE 1 Competitiveness (percentage of nodule occupancy when co-inoculated with another competitive strain) and BNF capacity (mg of N accumulated per plant) comparison between the parental strain (P) and the variant strains compared to the reference variant (R) a a The values are presented as percentages relative to the reference strain of each group.Data for the CPAC 15 group were retrieved from Hungria et al. (10), while data for the CPAC 7 group were retrieved from Santos et al. (12).
In this study, we provide a comprehensive analysis of complete or high-quality draft genomes for the natural variants of B. japonicum and B. diazoefficiens adapted to Brazilian Cerrado soils for longer or shorter periods, respectively.Each group encompassed one parental strain previously used as a commercial inoculant in Brazil, SEMIA 566 (=CNSPo 17) or SEMIA 586 (=CNPSo 10), one natural variant strain currently used in commercial inoculants and treated as the reference genome, CPAC 15 or CPAC 7, and seven other putative natural variants (Table 1).Our findings shed light on the genetic basis of phenotypic variation and evolutionary dynamics within these economically important strains.

Whole-genome sequencing statistics
Hybrid genome assembly using Oxford Nanopore Technologies (ONT) and Illumina reads from the B. japonicum strains (excluding CPAC 15, for which the published genome was used) resulted in seven finished genomes and one strain with the chromosome split into two contigs (Table 2).The genomes ranged from 9,584,431 to 10,367,856 bp, which is comparable to the published genome of the reference strain CPAC 15, estimated at 9,583,027 bp (14), and of the type strain USDA 6 T with 9,207,384 bp (26).The total number of coding genes ranged from 8,758 to 9,472.The GC content (mol%) ranged from 63.34% to 63.55%.
The genome assemblies from the B. diazoefficiens group resulted in finished genomes for all nine strains (Table 2).The genome sizes were smaller than those of the B. japonicum group, ranging from 9,120,098 to 9,138,003 bp, which is very close to the genome size of the type strain USDA 110 T (9,105,828 bp) (27), and of the original draft genome of reference strain CPAC 7, estimated at 9,138,870 bp (14).The total number of coding genes ranged from 8,220 to 8,244 and the GC content (mol%) ranged from 63.96% to 63.97%.
We computed the average nucleotide identity (ANI) of each parental and variant against the reference strain of each group (CPAC 15 and CPAC 7).The strains of both groups shared high ANI values, confirming that they are closely related.The parenthood of the strains within each group was further confirmed by their BOX-PCR profiles (Fig. S1).The B. japonicum strains shared ANI values equal to or higher than 99.82% with CPAC 15, and all strains of the B. diazoefficiens group shared ~99.99% with CPAC 7. ANI comparisons against the species type strains, B. japonicum USDA 6 T and B. diazoefficiens USDA 110 T , confirmed the species designations of all strains were accurate (Table 2).

Core genome phylogeny and genome synteny
As expected, the B. japonicum and B. diazoefficiens groups were separated into two clades with high bootstrap support in a core-genome phylogeny (Fig. 1).The type strains B. japonicum USDA 6 T and B. diazoefficiens USDA 110 T were included in the phylogeny and presented a basal position in the respective clade of each species.The core genome phylogeny showed low differentiation between the strains of both groups as expected since we are working with closely related strains in each group.Interestingly, the B. japonicum group could be subdivided into two well-supported monophyletic groups.The five variants of the B. japonicum group with a genome size >10.2Mb (compared to ~9.6 Mb for the other B. japonicum strains) formed one monophyletic group, suggesting an ancestral genome enlargement at the base of this clade, while the other four strains (including CPAC 15) formed a second monophyletic group (Fig. 1).
The genomes of the B. japonicum group are highly syntenic; however, it is possible to observe rearrangements, including inversions, transpositions, and additions/deletions (Fig. 2a).By comparing the genome of the B. japonicum reference strain CPAC 15 and its parental CNPSo 17, we observed a possible inversion of a large segment (around 3,500,000 bp).The strains CNPSo 23, CNPSo 24, CNPSo 29, CNPSo 31, CNPSo 34, and CNPSo 38 are highly syntenic with CNPSo 17, indicating conservation of the gene order.On the other hand, strain CNPSo 22, which is polyphyletic with CPAC 15, appears to have the same inversion present in CPAC 15.However, we note that CPAC 15 and CNPSo 22 are the only genomes not assembled using Flye, and thus we cannot rule out that the putative inversion instead reflects assembly errors.Other small inversions were detected, as well as translocations and deletions; however, they are unlikely to impact the symbiosis islands (see Fig. S2 for alignments of the symbiosis islands).We found a higher number of IS and transposases in the five larger genomes.The genomes of the B. diazoefficiens group are highly syntenic (Fig. 2b), and unlike the B. japonicum group, they displayed no relevant rearrangements.The strains of the B. diazoefficiens group contained 69 ISs and 9 transposase genes.

Plasmid content
We identified two sets of orthologous and highly syntenic plasmids in the five B. japonicum strains with larger genomes (strains CNPSo 22, CNPSo 29, CNPSo 31, CNPSo 34, and CNPSo 38), which were confirmed as plasmids by the presence of repABC operons encoding proteins responsible for plasmid replication and partitioning (28).Plasmid "a" presented high similarity among the strains, with 98%-99.9%identities and sizes ranging from 160,017 to 170,161 bp.Plasmid "b" shared 96.6%-99.9%identities among the strains and sizes ranging from 283,210 and 291,690 bp.Interestingly, the plasmids did not account for the full difference in the genome sizes of these five B. japonicum strains compared to the four with smaller genomes (CPAC 15, CNPSo 17, CNPSo 23, and CNPSo 24), indicating that this lineage also experienced an ancestral A BLASTn search showed that a segment (~49,500 bp) of plasmid "a" of all variants shares >99.95% identity to the 210 kb plasmid pNK6b (GenBank accession AP014686.1) of B. diazoefficiens NK6 (29).The number of genes on the "a" plasmid ranges from 123 to 132 and encodes a type III secretion system (T3SS), which is known to participate in host cell infections, including symbiotic interactions (30).The T3SS cluster of plasmid "a" is composed of the secretion and cellular translocation (sct) genes (sctNVJQRSTU) in all strains except for CNPSo 29, which containes a putative deletion of eight genes caused by an IS4 family transposase, including part of the T3SS apparatus.Interestingly, plasmid "a" also harbors genes related to stress responses, including genes coding for toxinantitoxin systems (TA systems) (MazE/MazF, Phd/YefM), a secretion protein (HlyD family efflux transporter periplasmic adaptor subunit, lost in CNPSo 38), a heat-shock protein (molecular chaperone HtpG), and a trehalose-6-phosphate synthase.A LysR substratebinding domain-containing protein related to nitrogen metabolism was also identified.Moreover, plasmid "a" contained the gene clusters traADGFHM and trbBCDEFGHIJKL related to conjugation.
The entire sequence of plasmid "b" shares around 99.99% identity with the 290 kb plasmid pN03G-2 of B. japonicum pN03G-2 (GenBank accession CP126012), suggesting common ancestry.The number of genes on the "b" plasmids ranges from 212 to 219, and the annotation revealed genes encoding proteins related to stress responses, such as TA systems (VapB/VapC, Phd/YefM, AbiEii/AbiGii), efflux transporters (HlyD and HlyB families), a cold-shock protein, heat-shock proteins (chaperonin GroEL, co-chaperone GroES), and an O-antigen gene cluster.Plasmid "b" also carries several genes encoding proteins involved in the synthesis, transport, metabolism, and regulation of amino acids, nucleic acids, carbohydrates, and lipids.Even though no T3SS genes were detected, two T3SS effector proteins (C48 family peptidase and E3 ubiquitin-protein ligase) were identified.The conjugation apparatus of plasmid "b" is encoded by traACDFGM and trbBCDEJLFGIH.More information about the gene contents of the plasmids is presented in Table S1.

Symbiosis islands
The symbiosis islands A and B of both groups were detected according to Weisberg et al. (23), whereas SI C was identified as the region proposed by Kaneko et al. (26).In addition, the automatic annotation of SI A was manually curated according to previous studies (23,27,31).SI A of the B. japonicum and B. diazoefficiens groups is bordered by tRNA-valine and a recombinase gene.Within each group of strains, the SIs are highly syntenic (Fig. S2  and S3), although some differences in gene content are evident.
The average size of SI A in the B. japonicum genomes is ~690,000 bp and can be found between the chromosomal positions 7,581,129 and 8,615,546 depending on the strain.SI A contains between 584 and 615 genes.SI A of all B. japonicum strains carries the genes classically important for nodulation and nitrogen fixation, including nodD2D1ABCSUIJZ, noeEIL, nolAIKNOY, nolUV, the regulatory system nodV/nodW, the nitrogen fixation genes fixABCKRWX and nifABDEHKNQSTWXZ, T3SS genes (sct, also referred to as "Rhizobiumconserved" genes, rhc) rhcC1C2DJNQRSTUV, and genes encoding T3SS effector proteins known as nodulation outer proteins (nop), nopAALARBWE1HLMP1P2P3.In addition to these symbiosis genes, between 155 and 222 hypothetical genes, 75 ISs from diverse families of transposases, and 100 pseudogenes were annotated.Interestingly, the monophyletic group of strains carrying plasmids have a smaller SI A (28 fewer genes on average) than those without plasmids (Fig. S2a); none of the known symbiosis genes are absent from SI A in these strains.SI B of the B. japonicum group, located between chromosomal positions 1,465,741 and 1,809,591 depending on the strain, is entirely syntenic among the strains (data not shown) and contains 4,160 bp with seven genes; an integrase gene and ybgC serve as the borders of this SI, with five intervening hypothetical genes.With an average size of 203,000 bp, SI C is also highly syntenic and smaller in the strains carrying plasmids (18 fewer genes on average) (Fig. S2b).SI C can be found between chromosomal positions 8,807,943 and 9,339,632 depending on the strain.The region is bordered by genes coding for a tyrosine-type recombinase/integrase and a 5′-methylthioadenosine/S-adenosylhomocysteine, and includes ~225 genes, with 59-65 hypothetical genes, 36 pseudogenes, and 24 IS of various transposase families.The average GC mol% content of SI A, B, and C of the B. japonicum group is 60.58%, compared to 63.34%-63.55% for the genome-wide average.SI A of the B. diazoefficiens group is slightly smaller than in the B. japonicum group, with an average size of 671,500 bp.It is located between the chromosomal positions 7,426,338 and 8,082,430.The average number of genes within SI A is 570, including 155 hypothetical genes, 85 IS from diverse transposase families, and 97 pseudogenes.This region in B. diazoefficiens is highly syntenic, except for a ~2 kb deletion in the genome of CNPSo 106 (Fig. S3a).The nodulation, nitrogen fixation, and T3SS genes are preserved as in the B. japonicum group.SI B, which is conserved across strains (data not shown), is 15,546 bp in length and is located between chromosomal positions 1,661,452 and 1,677,009.This region contains 15 genes, including nine hypothetical genes, one pseudogene, and one IS5 family transposase.SI C of the B. diazoefficiens group is found between the chromosomal positions 620,318 and 776,578 in the genomes and is ~156,300 bp in length.The average number of genes is 156, with 49 hypothetical genes, 31 pseudogenes, and 16 IS from different transposase families.This region also displays high synteny across the strains (Fig. S3b).The average GC mol% content of SI A, B, and C of the B. diazoefficiens group is 59.96%, compared to 63.96%-63.97%for the genome-wide average.

Pangenome analysis
A pangenome analysis was performed to understand the genomic variability of the strains.The pangenome of the B. japonicum group comprises 10,550 genes, 8,787 of which belong to the core genome, 1,559 to the accessory genome, and 204 unique genes.Interestingly, 748 genes were shared only between the monophyletic group of strains carrying the plasmids; in contrast, only 187 genes were shared only between the strains without a plasmid (Fig. 3a).In addition, strain CNPSo 22 carries the largest number of unique genes (32 unique genes), followed by CNPSo 17 (31 unique genes), and then CNPSo 29 and CNPSo 38 (each with 27 unique genes).Of the dispensable genome fraction, 148 genes (125 accessory genes and 23 unique genes) belong to SI A. As SI A is specialized for rhizobium-legume symbioses, we focused our analyses on these genes, as we consider genes in this region to have a higher likelihood of contributing to the differences in BNF capacity and competitiveness of the strains (Table S2).
The B. diazoefficiens group has a smaller pangenome than the B. japonicum group, consistent with the B. diazoefficiens strains being more closely related than the B. japonicum strains (Table 2).This pangenome contains a total of 8,665 genes, composed of 8,428 core, 107 accessory, and 130 unique genes.The strain CNPSo 104 presented the largest number of unique genes (22 unique genes), followed by CNPSo 106 (18 unique genes), and then CNPSo 107 and CNPSo 108 (each with 17 unique genes) (Fig. 3b).Interestingly, 21% of the dispensable genome fraction falls within SI A, and it includes 37 accessory genes and 13 unique genes (Table S3).

Nucleotide polymorphisms
We next used Snippy to search for nucleotide variations using CPAC 15 and CPAC 7 as the reference genomes for the B. japonicum and B. diazoefficiens groups, respectively.Snippy detects five types of nucleotide variants: single nucleotide polymorphisms (SNPs), multiple nucleotide polymorphisms (MNPs), insertions (ins), deletions (del), and complex variations defined by a combination of SNPs and MNPs.We were particularly interested in nucleotide polymorphisms potentially associated with variation in competitiveness or BNF capacity of the strains (Table 1).
When comparing the sequencing data of the B. japonicum strains to the reference strain CPAC 15, a total of 1,150 unique variations were detected.These include 828 SNPs, 21 MNPs, 59 insertions, 89 deletions, and 153 complex variations.Of these, 924 are in protein-coding sequences and 226 in intergenic regions.Excluding synonymous mutations as these are less likely to have biological effects, 71 variations were detected within genes or intergenic regions of SI A and 66 variations in SI C; no variations were observed in SI B. The variations within the SIs that we predict might be related to competitiveness or BNF are presented in Table 3.
The strains of the B. diazoefficiens group are very closely related, and this conservation is also reflected in the number of nucleotide polymorphisms.Within this group, we detected only 57 variations: 24 SNPs, 32 deletions, and 1 insertion.Of these, 48 are in protein-coding sequences and 9 are in intergenic regions.No MNP or complex variations were identified.Six of the nucleotide variations were detected within SI A, while no variations were observed in SI B or SI C. The variations that we predict might influence the competitiveness or BNF capacity of these strains are shown in Table 4.

DISCUSSION
In this study, we compared the genomes of two Bradyrhizobium "parental" strains previously used as inoculants in Brazil [B.japonicum CNPSo 17 (=SEMIA 566), and B. diazoefficiens CNPSo 10 (=SEMIA 586)] to the genomes of eight variants derived from the parental strains.The variants include two current commercial strains, B. japonicum CPAC 15 and B. diazoefficiens CPAC 7, respectively.More information about the variant strains is provided in the Introduction and in Table 1.

Symbiotic island organization
The Cerrado soils are likely a challenging environment due to the high temperatures, long dry-season periods, low pH and nutrient availability, and high aluminum content (4).Consequently, these soils may select for broad genetic and metabolic diversity.HGT is essential for the dissemination of selective advantages and evolution of symbiotic BNF ability among rhizobia, and it contributes to variation in rhizobium-legume symbioses.The symbiotic genes of Bradyrhizobium species are usually clustered in SIs located on the chromosome (17).Previous studies with the type strains suggested that the SIs of B. diazoefficiens USDA 110 T and B. japonicum USDA 6 T are split into three segments (SIs A, B, and C) with different sizes and locations (26,27).It was suggested that this might result from a larger ancient SI being integrated into the chromosome and then rearranged into separate segments.These same three regions were also identified in the chromosome of B. japonicum CPAC 15 and B. diazoefficiens CPAC 7 (14), as well as in all variant and parental strains analyzed in our study.a For each polymorphism, "X" indicates that it is present in a given strain, while "-" indicates it is absent.
Consistent with past studies, we observed that SI A contains most of the classical genes required for symbiotic nitrogen fixation.SI A is integrated into a tRNA-valine gene, which is the most common insertion site in bradyrhizobia (23), and carries the nif genes encoding proteins responsible for synthesizing and regulating nitrogenase, as well as the fix genes involved in oxygen metabolism.Additionally, SI A also carries the classical genes required for nodulation, including nod, noe, and nol, as well as the rhc and nop clusters that encode T3SS and T3SS effectors (T3Es), respectively, which are involved in other steps of nodulation or alternative nodulation processes (30).The presence of T3SS and T3E genes on the SIs of bradyrhizobia has been frequently described (23,32), with Teulet et al. (31) noting that 90% of Bradyrhizobium genomes with nod genes also encode a T3SS, inferring a common evolutionary origin for both gene groups in the genus.In both the B. japonicum and B. diazoefficiens groups, the rhc and nop clusters of all strains correspond to α-RhcI, commonly found in the family Nitrobacteraceae (31).

B. japonicum plasmids
Although plasmids are unusual in the genus Bradyrhizobium, there are some reports of their occurrence.Cytryn et al. (33) sequenced the plasmid of the photosynthetic Bradyrhizobium sp.BTAi1, a strain able to nodulate aquatic legumes of the genus Aeschynomene with no requirement of nod genes.Also, up to three plasmids were detected in five B. japonicum strains and six Bradyrhizobium elkanii strains from China, Thailand, and the USA (33).Whole-genome sequencing carried out by Iida et al. (29) showed that B. diazoefficiens NK6, isolated from the root nodules of soybean grown in paddy fields in Niigata (Japan), contains four plasmids (pNK6a, pNK6b, pNK6c, and pNK6d), and five other Bradyrhizobium strains had plasmids detected by pulsed-field gel electrophoresis.Ormeño-Orrillo and Martínez-Romero (17), in a study including hundreds of Bradyrhizobium genomes, revealed that 35 contained at least one copy of the repB gene, indicative of a plasmid.Finally, Bradyrhizobium sp.DOA9 is unique in that its symbiotic genes are found on a symbiotic plasmid rather than on a chromosomal SI; the DOA9 symbiotic plasmid includes nodulation, nitrogen fixation, T3SS, and type IV (T4SS) secretion system genes (32).
We detected a monophyletic group of five B. japonicum variants that each carried two plasmids.Both plasmids were annotated as encoding a number of toxin-antitoxin systems.These include a putative MazE/MazF TA system, a VapC toxin, and a Phd/YefM antitoxin on plasmid "a." Similarly, plasmid "b" putatively encodes two VapC family toxins, one VapB family antitoxin, one Phd/YefM family antitoxin, and two nucleotidyl transfer ase AbiEii/AbiGii toxin family proteins.In addition to functioning as plasmid addition systems, TA systems may also contribute to stress responses, having been linked to cell dormancy, drug-tolerant persister cells, survival during infection, adaptation to hostile environments, and biofilm formation (34,35).Interestingly, in Sinorhizobium meliloti, a VapB/VapC TA system was implicated in cell growth during symbiotic infection (36).It is a For each polymorphism, the strains carrying that polymorphism are indicated with an X.
tempting to speculate the TA systems of plasmid "a" may contribute to stress tolerance or impact legume symbiosis.Plasmid "a" carries a T3SS gene cluster that shows only 59% similarity with the T3SS genes of SI A. Other studies also reported the presence of multiple T3SS gene clusters in rhizobia, suggesting that they might be related to host specificity and competitiveness, although further research is required (30,31).While plasmid "b" did not carry genes for a T3SS apparatus, we did detect genes encoding possible T3SS effectors, including a C48 family peptidase, an E3 ubiquitin-protein ligase, and a hypothetical protein with an E3 ubiquitin transferase SlrP conserved domain.
Plasmid "a" was also annotated as encoding a HlyD family efflux protein.HlyD belongs to the resistance, nodulation, and cell division family of efflux transporters.Efflux transporters function as pumps to expel antimicrobial compounds, heavy metals, lipooligosaccharides, proteins, small molecules, and divalent metal cations and help bacteria to survive in hostile environments.In Escherichia coli, HlyD is responsible for secreting hemolysin in pathogenic infections (37).Likewise, we detected genes putatively encoding efflux transporter proteins on plasmid "b, " including two HlyD family secretion periplasmic adaptor subunit and a HlyB protein family.It is possible that these efflux proteins improve the competitiveness of bradyrhizobia by increasing resistance to antimicrobial compounds produced by other soil microbes or during the symbiotic association with legumes (38).
Several other proteins related to stress responses were identified on the plasmids, all of which may help bradyrhizobia better survive the challenging conditions faced in Cerrado soils.Plasmid "a" carries a gene putatively encoding a molecular chaperone (HtpG), while plasmid "b" carries genes putatively encoding a cold-shock protein of the CspA family, a GroEL chaperonin, and the co-chaperone GroES.The chaperone HtpG is a heat shock protein involved in maintaining protein-folding homeostasis in E. coli under high-temperature conditions (39) and was upregulated during salt stress in Rhizobium tropici (38).The CspA family includes RNA chaperones responsible for regulating the expression of target genes during temperature downshifts (40).GroEL and its cofactor GroES represent a chaperone system constitutively expressed under normal conditions and is important for bacterial protein folding by creating a hydrophilic environment.These proteins are upregulated in heat stress conditions, preventing protein denatura tion (40).Bradyrhizobium strains often carry groEL-groES operons; da Silva Batista et al. (15) detected two spots of GroEL proteins in the B. japonicum CPAC 15 proteome, while Gomes et al. (41) found GroEL spots in the B. diazoefficiens CPAC 7 proteome.
In addition to the chaperones, plasmid "a" putatively encodes a trehalose-6-phos phate synthase (OtsA), while plasmid "b" putatively encodes a choline dehydrogenase (BetA).Trehalose-6-phosphate synthases are involved in the biosynthesis of trehalose, a nonreducing disaccharide involved in bacterial tolerance against desiccation, heat, cold, oxidation, and osmotic stresses (42).OtsA orthologs have been linked to enhanced nodule occupancy competitiveness of B. diazoefficiens via improved osmotic tolerance in the early stages of soybean nodulation (42,43).Similarly, OtsA was linked to both free-living osmoadaptation in S. meliloti and competitiveness for alfalfa nodulation (44).Likewise, BetA is involved in osmoadaptation through the production of the osmopro tectant glycine betaine, and its disruption in S. meliloti prevents this organism from using environmental choline as an osmoprotectant (45).The S. meliloti bet operon is also highly induced in alfalfa nodules (46).Considering the edaphoclimatic conditions of the Cerrado region, which has year-round high temperatures and long dry-season periods that could result in osmotic stresses, the plasmid-encoded chaperons, OtsA, and BetA may improve the fitness of bradyrhizobia in Cerrado soils.
We also detected a putative O-antigen biosynthesis gene cluster on plasmid "b, " which could impact the O-antigen structure of the variants carrying this plasmid.The O-antigen comprises repeating oligosaccharide units, and together with lipid A and the core oligosaccharide, it composes the lipopolysaccharides of Gram-negative bacterial cell walls.O-antigens are structurally diverse and are essential for bacteria-host interactions by suppressing host defenses.We identified a full pathway for the produc tion of dTDP-L-rhamnose, a common component of bacterial O-antigens.In addition, we identified two glycosyltransferase family-2 proteins, two glycosyltransferase family-4 proteins, an ABC transporter permease, an ABC transporter ATP-binding protein, two glycosyltransferases, and an acetyltransferase that may also be related to O-antigen biosynthesis (47).
Overall, we hypothesize that plasmids "a" and "b" may provide adaptive advantages to the strains carrying these mobile elements under the hostile conditions faced in Cerrado soils.By improving their saprophytic ability in the soil, the strains would also be more competitive for nodule occupancy.

Variation in the gene content of the symbiotic islands
Whole genome alignments indicated high conservation of gene order across the chromosomes within the B. japonicum and B. diazoefficiens genomes.Although we detected the same possible inversion in B. japonicum CPAC 15 and CNPSo 22 (which are not sister taxa), this may instead reflect assembly errors as these were the only two genomes not assembled using Flye.Similarly, the SIs A, B, and C of B. japonicum and B. diazoefficiens were also highly syntenic within each group, despite these regions being enriched for IS and transposase genes (48).
To further evaluate how genomic variation may be correlated with the phenotypic differences of the variants, pangenome analyses were performed to identify variations in gene content across strains.We focused on variations in SI A, as this region con tains the primary genes required for symbiotic nitrogen fixation.The B. japonicum strains carrying plasmids have an SI A smaller than those that lack plasmids.This difference is driven by a contiguous region of ~50 genes present in the plasmidless strains, which is replaced by a set of 27 genes in the plasmid-containing strains.In the strains lacking plasmids, about half of the 50 genes are annotated as hypothetical genes, while the others included genes encoding type II and type IV secretion systems proteins, oxidoreductases, transcriptional regulators, IS, and transposases (Table S2).In the plasmid-containing strains, the 27 genes include eight hypothetical genes, transpo sases, and genes encoding proteins that may influence their saprophytic ability, such as a cold shock protein, carbohydrate and peptide transport systems, and LuxR-like transcriptional regulators (Table S2).
A few other interesting differences were observed within the B. japonicum group (Table S2).The parental strain CNPSo 17 appears to have lost a NoeE-like protein, which is a sulfotransferase related to modifications in the Nod factors and host specificity (24).Interestingly, CNPSo 38, which is less competitive than most variants of the B. japonicum group, gained a gene putatively encoding a second copy of NopM, an effector protein secreted by the T3SS that is related to negative effects in the interaction with legumes (49).On the other hand, no gain or loss of a specific gene that might explain the higher competitiveness of CNPSo 22, CNPSo 23, CNPSo 24, CNPSo 29, CNPSo 31, and CNPSo 34 from the B. japonicum group was identified.
Likewise, several interesting differences were observed within the B. diazoefficiens group within SI A (Table S3).CNPSo 106, a variant with equal BNF capacity and com petitiveness to CPAC 7, lost a gene coding for an acetyltransferase containing a GNAT domain.Acetyltransferases are involved in Nod factor biosynthesis (24) and, recently, a GNAT acetyltransferase was identified as related to competitiveness for Pisum sativum in Rhizobium leguminosarum bv.viciae (50).In addition, CNPSo 106 may have lost putative nopM and bacA-like genes.The negative impact of nopM was described above, while bacA and bclA (bacA-like) encode peptide transporters essential for symbiosis with legumes that produce nodule-specific cysteine-rich (NCR) peptide but not for symbiosis with legumes such as soybean that do not produce NCR peptides (51)(52)(53).Interestingly, CNPSo 104, the most competitive and efficient nitrogen fixer of this group, appears to have lost nine genes encoding five transposases, three hypothetical proteins, a sulfite exporter TauE/SafE family protein, and a sulfurtransferase important to sulfur and carbon cycles, while gaining two hypothetical genes and two putative transposases.We did not observe any obvious gene gains or losses in SI A related to the higher BNF capacity of CNPSo 104 and CNPSo 108 compared to the rest of the B. diazoefficiens group.

Nucleotide variations within SI A of the B. japonicum group
In addition to examining variation in gene presence/absence, we compared the genomes of the variant and parental strains to identify nucleotide sequence variations potentially associated with differences in BNF efficiency or competitiveness for nodule occupancy.Recently, Bender et al. (16) analyzed SNPs in the SI A of some of the same strains used in our study; while some SNPs were detected in both studies, others were not, likely as we included more strains, used complete genomes, and used different annotation and analysis tools.
Several interesting nucleotide polymorphisms were detected within SI A of the B. japonicum group of strains.We detected an SNP in a gene encoding an AbiEi family antitoxin found only in the parental strain CNPSo 17, which has equal competitiveness but lower BNF capacity than CPAC 15 (Table 1).A study by Chen et al. (35) demonstrated that mutation of the abiEi antitoxin gene of Mesorhizobium huakuii did not alter the number of nodules but strongly affected bacteroid occupancy and BNF efficiency.In addition, 21 genes directly related to symbiotic BNF were up-or downregulated in the transcriptome of the M. huakuii abiEi mutant.We therefore hypothesize that the nucleotide variation detected in abiEi might negatively affect the BNF capacity of the parental strain.
Strain CNPSo 22 contained a unique SNP in a hypothetical gene containing a conserved domain of the extra-cytoplasmic function (ECF) σ factor of the rpoE gene.The σ factors are ubiquitous in bacterial genomes and are involved in the control of gene expression by binding to RNA polymerase.A putative ECF σ factor of S. meliloti is associated with several stress conditions including heat and salt stress, as well as carbon and nitrogen starvation (54).In addition, Martínez-Salazar et al. (55) suggested that rpoE4 of Rhizobium etli is a general regulator involved with saline and osmotic responses, oxidative stress, and cell envelope biogenesis.Gourion et al. (56) showed that B. diazoefficiens USDA 110 T ECF σ factor mutants are more sensitive to heat and desiccation upon carbon starvation than the wild type.In addition, mutants formed nodules with reduced number, size, and BNF capacity in association with G. max and Vigna radiata, suggesting that ECF σ factors are important for Bradyrhizobium symbiosis (56).Considering that CNPSo 22 is a highly competitive variant, we hypothesize that the nucleotide variation in the ECF σ factor gene of SI A may be a contributing factor.
Strain CNPSo 24 has higher competitiveness and efficiency of BNF than CPAC 15, and it contains a unique SNP in a gene encoding a DUF1521 domain containing a protein homolog to the T3E NopE1.Zenher et al. (57) detected NopE1 in mature Macroptilium atropurpureum nodules hosting B. japonicum, indicating a putative function of this effector in rhizobium-legume symbioses.Wenzel et al. ( 58) also identified NopE1 in nodules and showed that mutation of B. japonicum nopE1 and its homolog, nopE2, results in a reduced number of nodules on M. atropurpureum and G. max.However, the same double mutant significantly increased the number of nodules in V. radiata, suggesting the impact is host specific (58).The DUF1521 domain-containing protein of CNPSo 24 is located three genes downstream to several other rhc genes of SI A and thus may play a role in symbiotic BNF; however, whether this variation positively or negatively influences the symbiotic capacity of CNPSo 24 remains to be evaluated.
Strain CNPSo 34 presents higher competitiveness but lower BNF capacity than CPAC 15.CNPSo 34 contains an in-frame deletion in a gene encoding a PAS domain S-box protein.Prokaryotic PAS domains usually are part of two-component regulatory systems composed of a histidine kinase sensor and a response regulator.Several BNF and nodulation proteins have a PAS domain that serves as an oxygen and/or redox sensor, which are important for nitrogenase activity and energy metabolism, respec tively (59).Examples include FixL of the FixL/FixJ two-component system that detects environmental oxygen levels and regulates the expression of BNF genes (60).Besides FixL/FixJ, Azorhizobium caulinodans also has the NtrY/NtrX two-component system; NtrY is a membrane-associated sensor with a PAS domain, which may be involved in sensing extracellular nitrogen levels (59).NifU has a PAS domain at the N-terminus, possibly related to iron and sulfur mobilization for the iron-sulfur cluster of nitrogenase (59).In addition, the nodV/nodW genes are involved in regulating the nodulation genes through flavonoid signals; NodV has four PAS domains (59).Given that the focal PAS domain S-box protein is found within SI A, we hypothesize that it is also related to symbiosis and that the in-frame deletion in CNPSo 34 contributes to its symbiotic phenotypes.
Strain CNPSo 38, with equal competitiveness and lower BNF capacity than CPAC 15, carried a unique SNP in dctA.The dctA gene is essential for symbiotic nitrogen fixation as it encodes a membrane protein responsible for transporting the C 4 -dicarboxylates malate, succinate, and fumarate, which are the primary carbon sources received by rhizobia in nodules (61,62).Therefore, as dctA is essential for symbiotic nitrogen fixation, the nucleotide variation within this gene may negatively impact the BNF capacity of CNPSo 38 and potentially also its saprophytic ability.
The monophyletic group of plasmid-carrying strains CNPSo 22, CNPSo 29, CNPSo 31, CNPSo 34, and CNPSo 38 carries an SNP in a gene encoding a YopT-type cysteine protease, an effector protein usually found in the pathogenic bacteria Pseudomonas syringae and Yersina (49) and homologous to the T3E NopT.Sinorhizobium fredii NGR234 nopT mutants show improved nodulation with Phaseolus vulgaris and Tephrosia vogelii and are negatively impacted in their association with Crotalaria juncea (49).Conversely, Bradyrhizobium vignae ORS3257 nopT mutants form fewer nodules on Vigna unguiculata and Vigna mungo (63).We, therefore, hypothesize that the SNP in the gene encoding a YopT-type cysteine protease may impact nodulation and deserves further studies.Interestingly, this group of strains also contains an SNP located in nolY encoding an isoflavone nodD-dependent protein related to infection events.B. diazoefficiens USDA 110 T nolY mutants show a slight nodulation defect on G. max, M. atropurpureum, and V. unguiculata and a severe nodulation defect on V. mungo (64).These five variant strains also carry SNPs in two genes related to nitrogenase biosynthesis, nifS and nifE.NifS is a cysteine desulfurase involved in donating sulfur for the FeS metallocluster of the Fe protein of nitrogenase (65).NifE participates in the Fe-Mo-co metallocluster synthesis of the MoFe protein of nitrogenase, along with the nifB, nifH, nifN, nifQ, and nifV genes (66).These two SNPs were also detected in CNPSo 22 and CNPSo 38 by Bender et al. (16) and could impact the BNF efficiency of these variant strains.
All variants with plasmids and the parental strain CNPSo 17 carry an SNP in an intergenic region 65 bp upstream of a gene encoding the nodule efficiency protein C (nfeC) of SI A. The nodule efficiency proteins were first identified in S. meliloti GR4 and are associated with improved nodulation efficiency and competitiveness with Medicago (67).Similarly, the deletion of nfeC in B. diazoefficiens USDA 110 T resulted in delayed nodula tion on soybeans and reduced competitiveness for nodule occupancy (68).This group of strains also has an SNP in an intergenic region 132 bp upstream of a gene encoding an electron transfer flavoprotein alpha subunit FixB family protein in SI A. In addition to nod and nif genes, the fix genes are important to BNF as they encode electron transfer proteins that function under microaerobic conditions (69).The nucleotide variations upstream of nfeC and fixB may alter their expression and consequently impact nodula tion and nitrogen fixation, respectively.
The reference strain CPAC 15 has a unique SNP in a gene putatively encoding a C48-family peptidase.Young et al. (70) suggested that the Bll8244 protein of B. diazoeffi ciens USDA 110 T , a homolog of C48-family peptidases, functions as a genistein secreted T3E protein.The C48-family protein of our focal strains contains a conserved domain of small ubiquitin-like modifier (SUMO) proteases, the main effector family found in the genus Bradyrhizobium (31).Moreover, a SUMO domain was identified in a putative effector protein of B. japonicum Is-34 that is responsible for the inability of this strain to nodulate the soybean Rj4 genotype (71).Consequently, we hypothesize that the SNP in the focal gene encoding a C48-family peptidase in CPAC 15 may impact competitiveness and nodulation, and, therefore, it should be carefully investigated in further studies.

Nucleotide variations within SI A of the B. diazoefficiens group
The reference strain CPAC 7 carries three interesting nucleotide polymorphisms in SI A compared to the rest of the B. diazoefficiens strains.CPAC 7 contains a one nucleotide insertion 108 bp upstream to nfeC.As discussed above, nfeC is related to nodulation and competitiveness of rhizobia.We hypothesize that the nucleotide insertion may impact nfeC expression and thus the competitiveness of CPAC 7. CPAC 7 also contains a one nucleotide frameshift insertion in a gene putatively encoding a pentapeptide repeatcontaining protein homologous to YjbI, a truncated hemoglobin of Bacillus subtilis.Rogstam et al. (72) demonstrated that a B. subtilis yjbI mutant is hypersensitive to sodium nitroprusside, a source of nitric oxide.Therefore, the CPAC 7 variation within yjbI of the B. diazoefficiens group may impact the saprophytic ability of the strains.Finally, CPAC 7 contains a one nucleotide frameshift insertion in a gene putatively encoding an N-acetyltransferase GNAT family.As N-acetyltransferases may promote modifications of Nod factors (24), this mutation could impact competitiveness for nodulation and host specificity.

Concluding remarks
Using whole genome sequencing and comparative genomic methodologies, we identified numerous genomic differences across the B. japonicum and B. diazoefficiens variants.Overall, a higher amount of variability was found within the B. japonicum strains compared to the B. diazoefficiens group.Interestingly, we detected a remarkable diversity of mobile elements in the B. japonicum group, with a high number of insertion sequen ces that may have contributed to genome rearrangements and HGT.The B. japonicum group has a comparatively larger pangenome, and a total of 1,150 nucleotide polymor phisms were detected across the strains, including 71 non-synonymous variations within SI A. In addition, a monophyletic group of five B. japonicum variants unexpectedly carries two plasmids, with several of the plasmid-encoded genes putatively associated with tolerance to environmental stresses, including (i) toxin and antitoxin systems, (ii) a type III secretion system, (iii) efflux systems, (iv) chaperon system and cold-shock proteins, (v) osmoprotectant (Ost), (vi) choline dehydrogenase (BetA), and (vii) O-antigen biosynthesis.Moreover, the comparison within SI A of the B. japonicum group uncovered nucleotide polymorphisms in genes encoding proteins that may impact the phenotypes of the following strains: (i) an ECF sigma factor of CNPSo 22, (ii) an antitoxin of CNPSo 17, (iii) a DUF1521 protein (NopE1) of CNPSo 24, (iv) a S box PAS domain of CNPSo 34, (v) dctA of CNPSo 38, (vi) nopT, nolY, nifS, and nifE of plasmid-containing variants, and (vii) nfeC gene of CNPSo 17.
Less variation was observed in the B. diazoefficiens group.The chromosomes and SIs were highly syntenic, and a smaller pangenome was detected in comparison to the B. japonicum group.In total, only 57 nucleotide polymorphisms were detected, of which only six are located in SI A. No plasmids were detected in the B. diazoefficiens strains.The large difference in the level of genetic variability of the two groups likely results from how the variants were originally isolated.The B. japonicum variants were isolated as highly competitive variants following more than 10 years of growth of CPAC 15 in Cerrado soils.In contrast, the B. diazoefficiens variants were selected as strains with high BNF capacity and were selected based on colony morphological differences followed by only a few years of adaptation to Cerrado soils (4).
In conclusion, we identified numerous genetic variations-including gene gains/ losses, plasmid acquisition, and nucleotide polymorphisms-across natural variants of the soybean inoculants B. japonicum CNPSo 17 and B. diazoefficiens CNPSo 10, high lighting the high plasticity of Bradyrhizobium genomes.The level of genetic variability correlated with the length of time the parental strains were allowed to adapt to their new environment.We hypothesize that many of the genetic variations reflect early adaptation to the stressful conditions of Cerrado soils that might improve saprophytic ability or that alter competitiveness or BNF capacity with local soybean genotypes.In general, single genomic differences able to explain the phenotypic differences of the variants were not obvious, suggesting that the observed alterations in competitiveness for nodule occupancy and BNF capacity may instead reflect the cumulative impact of multiple genomic variations.

Bradyrhizobium strains
This study examined 18 Bradyrhizobium strains, nine belonging to the species B. japonicum and nine to the species B. diazoefficiens.For each species, the strains included one parental genotype previously used as a commercial inoculant in Brazil, one natural variant (the reference genome) used in commercial inoculants from 1992 until the present, and seven other natural variants.The strains used in this study, as well as their BNF and competitiveness capacities, are shown in Table 1.BOX-PCR was performed as previously described (73).The background of the strains is detailed in the Introduction.All strains are deposited in the "Diazotrophic and Plant Growth Promoting Bacteria Culture Collection of Embrapa Soybean" (WFCC Collection # 1213, WDCM Collection # 1054), Londrina, Paraná, Brazil.

DNA extraction and genome sequencing
A high-quality draft genome of CPAC 15 was previously sequenced by Siqueira et al. (14) and was retrieved from the National Center of Biotechnology Information (NCBI; GenBank accession CP007569).Draft genomes of strains CNPSo 17, CNPSo 22, CNPSo 23, CNPSo 38, CNPSo 10, CNPSo 104, CNPSo 105, and CNPSo 107 were previously reported by Bender et al. (16) and their raw Illumina data were used in the process of completing their genomes in this study.All other strains were sequenced as part of this study.Strains were grown on a modified yeast mannitol medium (74) at 28°C for 5 days.The total DNA of each strain was extracted using the DNeasy Blood and Tissue kit (Qiagen), according to the manufacturer's instructions.
Libraries for Illumina sequencing were prepared following the instructions of the Nextera XT kit (# 15031942 v01) and sequenced on a MiSeq instrument to generate 300 bp paired-end reads.ONT sequencing was performed using a Rapid Barcoding Kit (SQK-RBK004) and an R9.4.1 flow cell on a minION device.ONT basecalling and demultiplexing were performed using Guppy version 5.011+ 2b6dbffa5 and the high accuracy model (ONT).Sequencing statistics are provided in Table 2.
De novo genome assembly was performed using Flye version 2.9-b1768 (78) with the ONT reads.Assemblies were checked for overlaps between the two ends of a contig using NUCmer version 4.0.0rc1(79), and if identified, one copy of each overlap was removed.The assemblies were first polished with the ONT reads using Racon version 1.4.13(80) followed by Medaka version 1.4.1 (github.com/nanoporetech/medaka);read mapping was performed with Minimap2 version 2.20-r1061 (81).The assemblies were then polished with the Illumina reads using Pilon version 1.24 (82) and Racon; read mapping was performed using bwa version 0.7.17-r1198-dirty (83).NUCmer was used again to check for overlaps between two ends of a contig, with one copy removed if found.
As Flye did not produce a fully circular chromosome for strains CNPSo 22 and CNPSo 38, de novo assembly was repeated using Unicycler version 0.5.0 (84) with both the ONT and Illumina reads.The Flye and Unicycler assemblies were then merged using the patch function of RagTag version 2.1.0(85).The procedure resulted in a circular chromosome for strain CNPSo 22, and the resulting assembly was used for downstream analyses.As this process did not improve the quality of the CNPSo 38 genome, we continued with the Flye-based assembly for subsequent steps.
Finally, the assemblies were reoriented such that the chromosomes began at the putative origin of replication (26), using circlator version 1.5.5 with the fixstart option (86).The genome of B. japonicum CPAC 15 was similarly reoriented.

Genome annotation
Genome assemblies were annotated using the NCBI Prokaryotic Genome Annotation Pipeline (PGAP) program version 2022-02-10.build5872(87).The symbiotic island regions A and B were detected as described by Weisberg et al. (23), whereas SI region C was detected according to Kaneko et al. (26).

Genome assembly statistics and average nucleotide identity
The quality of the genome assemblies was evaluated using the web-based tool QUAST: Quality Assessment Tool for Genome Assemblies (88).Pairwise average nucleotide identity was calculated using FastANI version 1.33 with default parameters (89).For ANI calculations, the genomes of B. japonicum USDA 6 T (GenBank accession NC_017249.1)and B. diazoefficiens USDA 110 T (GenBank accession NC_004463.1)were downloaded from NCBI.

Pangenome calculation
Prior to pangenome analysis, the genome sequences were reannotated using Prokka version 1.14.6 (90) to produce GFF3 files compatible with Roary.The GFF3 files produced by Prokka were used as input for Roary version 3.11.2(91) using the default identity threshold of 95%.Pangenomes were calculated separately for each group of strains (B.japonicum and B. diazoefficiens).The pangenomes were visualized as UpSet plots using the R package UpSetR version 1.4.0 (92).

Strain phylogenetic analysis
A pangenome of all 18 strains, as well as of B. japonicum USDA 6 T and B. diazoefficiens USDA 110 T , was constructed using Roary with the -e and -n options to produce a concatenated alignment of the 2,689 core genes; the alignment was produced using MAFFT version 7.310 (93).The alignment was trimmed with trimAl version 1.4rev22 (94) with the automated1 option and used to construct a maximum likelihood phylogeny with RAxML version 8.2.12 (95), under the GTRCAT model with 1,000 bootstrap inferen ces.The phylogeny was visualized using iTOL (96).

Nucleotide variant identification
Single nucleotide polymorphisms and INDELs were identified using Snippy version 4.6.0(github.com/tseemann/snippy)and the PGAP output, using CPAC 7 and CPAC 15 as the reference genomes.Polymorphisms located in intergenic regions were visualized with the Integrative Genomics Viewer program version 2.11.9 (97) to identify polymor phisms in the promoter regions of genes related to competitiveness, BNF capacity, and saprophytic ability.The protein sequences of genes upstream or downstream of the variation, as well as the protein sequences of genes located in the SIs containing variations, were annotated using the NCBI conserved domain database version 3.17 (98) with the CD-search option (99) to identify functional units in the protein sequences.

Genome synteny analyses
The genome sequences of the B. japonicum and B. diazoefficiens groups were compared using BLASTn from BLAST+ version 2.14.0 (100) with the following parameters: identity threshold of 98%, maximum number of high scoring pairs of 200, and a minimum raw gapped score of 10,000.The output files were used as input for the Artemis Comparison Tool (ACT) version 18.2.0(101) to visualize genome synteny.The SI A, B, and C of each group were selected using the program faidx version 0.7.2.1 according to the delimita tions established by Weisberg et al. (23) and Kaneko et al. (26), and ACT comparisons were performed as described above.

FIG 1
FIG 1 Unrooted phylogeny of reference, parental, and variant strains of the B. japonicum and B. diazoefficiens groups.(a) A maximum likelihood phylogeny of all strains was prepared from a concatenated alignment of 2,689 core genes.The scale represents the mean number of nucleotide substitutions per site.Sub-trees of the (b) B. japonicum and (c) B. diazoefficiens groups are shown with different branch length scales to better display the within-group relationships.Numbers at the nodes indicate the bootstrap values based on 1,000 bootstrap replicates.

FIG 2
FIG 2 Genome-wide synteny analysis of the (a) B. japonicum and (b) B. diazoefficiens parental strains (CNPSo 17 and CNPSo 10), reference strains (CPAC 15 and CPAC 7), and other variants.Pairwise genome alignments were performed, and homologous regions between pairs of genomes are connected by red lines (if in the same orientation) or blue lines (if in the inverse orientation).Figures specifically presenting the synteny of the symbiotic islands are provided as Fig. S2 and S3.

FIG 3
FIG 3 UpSet plot summarizing the pangenome of the reference, parental, and variant strains of the (a) B. japonicum group or (b) B. diazoefficiens group.The set size shows the total number of gene families in a given genome, while the intersection size shows the number of gene families conserved across the indicated proteomes.

TABLE 2
Genome assembly statistics of the parental strain (P), the reference variant for the genome comparison (R), and other variant strains of the B. japonicum and B.

diazoefficiens groups Strain GenBank accession number Size (bp) Contigs Number of plasmids Coverage (-fold) CDS with protein Pseudogenes ANI against CPAC 15
chromosome enlargement.No plasmids were found in the strains of the B. diazoefficiens group.

TABLE 3
Nucleotide polymorphisms within symbiosis island A potentially related to competitiveness for nodule occupancy or biological nitrogen fixation, which were detected in comparisons between the parental strain (P) or variant strains and the corresponding reference strain, B. japonicumCPAC 15

TABLE 4
Nucleotide polymorphisms within symbiosis island A potentially related to competitiveness for nodule occupancy or biological nitrogen fixation, which were detected in comparisons between the parental strain (P) or variant strains and the corresponding reference strain, B. diazoefficiensCPAC 7 a