Natural Chromosome-Chromid Fusion across rRNA Operons in a Burkholderiaceae Bacterium

ABSTRACT Chromids (secondary chromosomes) in bacterial genomes that are present in addition to the main chromosome appear to be evolutionarily conserved in some specific bacterial groups. In rare cases among these groups, a small number of strains from Rhizobiales and Vibrionales were shown to possess a naturally fused single chromosome that was reported to have been generated through intragenomic homologous recombination between repeated sequences on the chromosome and chromid. Similar examples have never been reported in the family Burkholderiaceae, a well-documented group that conserves chromids. Here, an in-depth genomic characterization was performed on a Burkholderiaceae bacterium that was isolated from a soil bacterial consortium maintained on diesel fuel and mutagenic benzo[a]pyrene. This organism, Cupriavidus necator strain KK10, was revealed to carry a single chromosome with unexpectedly large size (>6.6 Mb), and results of comparative genomics with the genome of C. necator N-1T indicated that the single chromosome of KK10 was generated through fusion of the prototypical chromosome and chromid at the rRNA operons. This fusion hypothetically occurred through homologous recombination with a crossover between repeated rRNA operons on the chromosome and chromid. Some metabolic functions that were likely expressed from genes on the prototypical chromid region were indicated to be retained. If this phenomenon—the bacterial chromosome-chromid fusion across the rRNA operons through homologous recombination—occurs universally in prokaryotes, the multiple rRNA operons in bacterial genomes may not only contribute to the robustness of ribosome function, but also provide more opportunities for genomic rearrangements through frequent recombination. 
IMPORTANCE A bacterial chromosome that was naturally fused with the secondary chromosome, or “chromid,” and presented as an unexpectedly large single replicon was discovered in the genome of Cupriavidus necator strain KK10, a biotechnologically useful member of the family Burkholderiaceae. Although Burkholderiaceae is a well-documented group that conserves chromids in their genomes, this chromosomal fusion event has not been previously reported for this family. This fusion has hypothetically occurred through intragenomic homologous recombination between repeated rRNA operons and, if so, provides novel insight into the potential of multiple rRNA operons in bacterial genomes to lead to chromosome-chromid fusion. The harsh conditions under which strain KK10 was maintained—a genotoxic hydrocarbon-enriched milieu—may have provided this genotype with a niche in which to survive.

chromosome with an unexpectedly large size was discovered in strain KK10 (23), and the results of analyses conducted here may provide new insights into the occurrence of natural genomic rearrangements and evolution in the family Burkholderiaceae.

RESULTS
A massive chromosome in strain KK10. The hybrid assembly technique using the DNBSEQ short-read and GridION long-read sequencing data successfully provided a high-quality, circularized genome of strain KK10 (see Table S1 and Fig. S1 in the supplemental material). The genome of strain KK10, with a total size of 8,350,386 bp, consisted of two replicons, a chromosome (6,679,877 bp) and a (mega)plasmid (1,670,509 bp) (Fig. 1, Table 1), that carried a total of 7,324 coding genes according to the Prokaryotic Genome Annotation Pipeline (PGAP) annotation. Among the complete genomes of Cupriavidus strains available in databases, most strains had a chromid (<4 Mbp) in addition to the main chromosome (<5 Mbp), while only strain KK10 and one other strain (Cupriavidus metallidurans strain Ni-2) (24) had a relatively large single chromosome (>6 Mbp) and lacked a chromid (Table 1). The size of the KK10 chromosome is comparable to those of the largest single chromosomes known in Burkholderiaceae in the NCBI database (Burkholderia cenocepacia VC7848, 7.50 Mbp, and B. cenocepacia 895, 7.46 Mbp, which were isolated from clinical patients in unpublished studies). The KK10 chromosome possessed 5,957 coding genes and five rRNA gene (16S-5S-23S) operons (Fig. 1). Additionally, the plasmid of KK10, which possessed 1,367 coding genes, is 1.67 Mbp long; the sizes of the known plasmids of other Cupriavidus strains range from 30 kbp to 1.5 Mbp (Table 1).
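The replicon figures reported above can be cross-checked with a few lines of arithmetic. The sketch below uses only the sizes and gene counts stated in this paragraph; the dictionary layout and variable names are illustrative assumptions, not part of the original analysis.

```python
# Replicon sizes and coding-gene counts as reported in the text.
# The data structure and names are illustrative, not from the study.
replicons = {
    "chromosome": {"size_bp": 6_679_877, "coding_genes": 5_957},
    "plasmid": {"size_bp": 1_670_509, "coding_genes": 1_367},
}

# Sum over both replicons to recover the genome-wide totals.
total_size_bp = sum(r["size_bp"] for r in replicons.values())
total_coding_genes = sum(r["coding_genes"] for r in replicons.values())

# Fraction of the genome carried by the (fused) chromosome.
chromosome_fraction = replicons["chromosome"]["size_bp"] / total_size_bp

print(f"total size: {total_size_bp:,} bp")
print(f"total coding genes: {total_coding_genes:,}")
print(f"chromosome share of genome: {chromosome_fraction:.1%}")
```

The two sums agree with the reported genome size (8,350,386 bp) and total coding gene count (7,324).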
Comparative genomics between strains KK10 and N-1T. Among the complete genomes of Cupriavidus strains available in databases, C. necator N-1T showed the highest average nucleotide identity to strain KK10 (ANI; 98.84%; Fig. 2), followed by C. necator H16 (94.99%) and C. necator NH9 (91.71%). Of the 5,957 coding genes in the KK10 chromosome, 5,480 genes (92.0%) were assigned as orthologous genes shared with N-1, while the plasmid carried fewer orthologous genes (442 of 1,367 coding genes, 32.3%). The graphical genome comparisons between KK10 and N-1 shown in Fig. 3A clearly indicate synteny between their chromosomes. A genome region that showed synteny with the chromid of N-1 was located in the middle of the KK10 chromosome (position 1.77 Mb to 4.57 Mb; Fig. 3A), suggesting that the massive chromosome of KK10 was created through fusion of the prototypical chromosome and chromid; the reference genome of N-1 appeared to conserve the structures of the prototypical replicons in the ancestral genome of KK10. The GC skew profile of the KK10 chromosome (Fig. 1 and 3B) clearly indicated two peaks that originated from the prototypical chromosome and chromid. Interestingly, among the five rRNA operons in the KK10 genome (rrn1 to -5, shown as pink arrowheads in Fig. 3A), rrn1 and rrn3 were located exactly at the two putative junction sites of the chromosome and chromid, providing evidence that the repetitive sequences of the rRNA operons induced fusion of the prototypical chromosome and chromid and resulted in the creation of a massive chromosome in KK10.
Evidence of chromosome-chromid fusion in KK10 across the rRNA operons. The presence of repetitive multiple rRNA operons conserved in bacterial genomes often causes fragmentation and/or misassembly of genome sequencing reads, especially in cases where only a short-read sequencing method was applied and repeat sequences such as rRNA operons were longer than the read lengths (25,26).
To rule out misassembly across the rRNA operons and provide evidence that chromosome-chromid fusion occurred in the KK10 genome, raw data obtained from the GridION long-read sequencing were analyzed in detail. Multiple raw reads with lengths of 9,000 to 55,000 bp were found that spanned the rRNA operons rrn1 and rrn3 (approximately 5,000 bp each) located at the potential junction points (Fig. S2). Sequence comparisons between these raw reads and the chromosome and chromid of N-1 clearly indicated that the regions upstream and downstream from rrn1 and rrn3 were swapped (Fig. 4A) without any exceptions (Fig. S2). Functional genes located upstream and downstream from rrn1 and rrn3 were perfectly conserved between KK10 and N-1 and included genes potentially responsible for DNA replication/repair (DNA polymerase III subunit epsilon), nitrogen assimilation (nitrate/nitrite reductase/transporter, nitronate monooxygenase), fatty acid degradation (acyl-CoA synthetase/dehydrogenase, acetyl-CoA acetyltransferase), oxidative stress response (organic hydroperoxide reductase, glutathione S-transferase), and citrate synthesis (Fig. 4B).
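The logic of the spanning-read check can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the `Alignment` class, the `spanning_reads` function, the 1,000-bp flank requirement, and all coordinates are hypothetical. A read counts as supporting a junction only if its alignment covers the whole operon plus unique sequence on both sides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """One long read aligned to the assembly (hypothetical record type)."""
    read_id: str
    ref_start: int  # 0-based start of the aligned block on the assembly
    ref_end: int    # end of the aligned block (exclusive)


def spanning_reads(alignments, operon_start, operon_end, min_flank=1000):
    """Return IDs of reads covering the operon plus min_flank bp on each side."""
    keep = []
    for aln in alignments:
        left_ok = aln.ref_start <= operon_start - min_flank
        right_ok = aln.ref_end >= operon_end + min_flank
        if left_ok and right_ok:
            keep.append(aln.read_id)
    return keep


# Hypothetical coordinates for a ~5-kb operon and four aligned reads.
OPERON_START, OPERON_END = 1_770_000, 1_775_000
alignments = [
    Alignment("read_a", 1_760_000, 1_790_000),  # spans operon with long flanks
    Alignment("read_b", 1_771_000, 1_800_000),  # starts inside the operon
    Alignment("read_c", 1_700_000, 1_772_000),  # ends inside the operon
    Alignment("read_d", 1_769_500, 1_776_000),  # flanks shorter than 1,000 bp
]

supporting = spanning_reads(alignments, OPERON_START, OPERON_END)
print(supporting)  # ['read_a']
```

Requiring flanking sequence on both sides is what allows a read to link the unique regions upstream and downstream of a repeat; reads that start or end inside the operon cannot distinguish a fused from an unfused arrangement.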
Characterization of functional genes located on the KK10 chromosome. Two gene clusters that were widely conserved in Cupriavidus strains and potentially responsible for aromatic hydrocarbon degradation were found in the KK10 chromosome (Fig. 5; Table S2). One of these gene clusters, which contained poxABCDEF genes encoding benzene/phenol/toluene monooxygenase and xylEGHIJKQ genes for catechol degradation via an extradiol ring-cleavage pathway, has been characterized as responsible for benzene/phenol/toluene degradation (27) and was previously confirmed to enable Cupriavidus strains to grow on these hydrocarbon substrates as the sole carbon and energy sources (28). Another gene cluster, which consisted of benABCD genes encoding benzoate dioxygenase and catABCD genes for catechol degradation via an intradiol ring-cleavage pathway, was considered to be responsible for growth on benzoic acid (29). The pox-xyl gene cluster was located in a region of the KK10 chromosome that originated from the prototypical chromid, while the ben-cat gene cluster was located in a region that originated from the prototypical main chromosome (Fig. 5; Table S2). Consistent with the findings of these functional genes in the KK10 genome, KK10 cells were confirmed to grow on benzene and benzoic acid as the sole carbon and energy sources through growth assays (Fig. S3).

DISCUSSION
Hypothetical mechanisms for the chromosome-chromid fusion across the rRNA operons. Cupriavidus genomes in the databases appeared to possess 4 to 7 rRNA operons that were located on both chromosomes and chromids. These repeated operons provide more opportunities for intragenomic recombination resulting in rearrangement of bacterial genomes (30), and indeed, repeated rRNA operons have been known to mediate chromosomal rearrangements in Salmonella (31–34). Previous studies proposed that the repeated sequences of multiple rRNA operons were conserved through concerted evolution among these operons, thereby avoiding divergence caused by frequent recombination, eliminating mutations and repairing possible double-strand breaks (35). Therefore, the chromosome-chromid fusion in the KK10 genome may be described by an intragenomic (i.e., between the main chromosome and chromid) homologous recombination mechanism with a typical crossover event through double-strand break repair in an rRNA operon via resolution of a double Holliday junction in opposite orientations (Fig. 6). Functional gene homologs encoding enzymes known to be responsible for DNA double-strand break repair, such as the helicase-nuclease complex AddAB (36), recombination protein RecA, Holliday junction helicase, and resolvase RuvABC and RecG, were all conserved in the KK10 chromosome. Another potential mechanism that could explain the observed phenomenon is template switching during replication, resulting in a single circular chromosome (37).
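The topological outcome of a single crossover between repeats on two circular replicons can be illustrated with a toy model. The sketch below is a simplification under stated assumptions: each replicon is represented as a list of segment labels read around the circle, each carries exactly one copy of the repeat in the same orientation, and the function name `fuse_at_repeat` and all labels are hypothetical. It models only the resulting gene order, not the molecular steps of recombination.

```python
def fuse_at_repeat(chromosome, chromid, repeat):
    """Fuse two circular replicons by one crossover inside a shared repeat.

    Each replicon is a list of segment labels read around the circle.
    Both must contain `repeat` exactly once, in the same orientation.
    The returned list is the segment order of the single fused circle.
    """
    if chromosome.count(repeat) != 1 or chromid.count(repeat) != 1:
        raise ValueError("each replicon must carry exactly one repeat copy")
    i = chromosome.index(repeat)
    j = chromid.index(repeat)
    # Rotate the chromid so it starts just after its repeat and ends with it.
    chromid_loop = chromid[j + 1:] + chromid[:j + 1]
    # Leave the chromosome at its repeat, traverse the whole chromid,
    # and re-enter the chromosome through the second repeat copy.
    return chromosome[:i + 1] + chromid_loop + chromosome[i + 1:]


# Hypothetical segment labels for illustration only.
chromosome = ["chr_left", "rrn", "chr_right"]
chromid = ["cmd_left", "rrn", "cmd_right"]

fused = fuse_at_repeat(chromosome, chromid, "rrn")
print(fused)
# ['chr_left', 'rrn', 'cmd_right', 'cmd_left', 'rrn', 'chr_right']
```

In this model the chromid-derived segments end up as one contiguous block bounded by a repeat copy on each side, which is consistent with the observation that rrn1 and rrn3 sit at the two junctions of the chromid-derived region in KK10.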
Notably, rrn1 in the KK10 chromosome, which served as a junction point, was located close to the putative replication terminus of the prototypical chromosome, where there is a higher likelihood of mutations, chromosomal fusion, and overall spontaneous fragility compared to regions nearer to the replication origin (11,38,39). From this perspective, it is worth considering that strain KK10 had been maintained as a member of a bacterial consortium for many years on a diesel fuel carbon source and the mutagenic polycyclic aromatic hydrocarbon (PAH) benzo[a]pyrene (40). These harsh conditions may have contributed to increasing the possibilities of DNA damage and mutations in the members of the community (41,42). It remains unclear whether replication from the origin sites of the prototypical chromid is still functional or is suppressed (6) in the fused single chromosome, and further investigations, such as marker frequency analysis (43), will be required for clarification.
Figure legend fragment: The pox-xyl gene cluster encoding proteins for benzene/phenol/toluene degradation was located in a region that originated from the prototypical chromid (blue), while the ben-cat gene cluster for benzoic acid degradation was located in a region on the prototypical main chromosome (red).
Potential effects of the chromosome-chromid fusion on the gene functions. Previous studies observed that the functional genes on the chromid appeared to be expressed niche-specifically and differentially from the genes on the main chromosome (6); therefore, genome rearrangements through the chromosomal (chromosome-chromid) fusion event may influence, or potentially even silence, the expression of functional genes on the prototypical chromid. In contrast, a previous study of Burkholderia cenocepacia proposed that translocation of gene segments from a small replicon to the main chromosome increased their expression (44). Thus, a remaining question is how the expression behavior of functional genes in the KK10 chromosome was affected by the chromosome-chromid fusion.
The pox-xyl gene cluster that was located in the region that originated from the prototypical chromid of KK10 and the ben-cat gene cluster that was found in the region from the prototypical main chromosome may each serve as potential indicators for expression of functional genes derived from each of the chromosomal regions. KK10 cells were confirmed to grow on benzene and benzoic acid here (Fig. S3). In a previous study, KK10 grew on benzoic acid and salicylic acid, and the downstream metabolite of these compounds, catechol, was detected (20). A gene considered to be responsible for salicylic acid utilization, which encodes salicylate 1-hydroxylase, the enzyme that transforms salicylic acid to catechol (28), was also found in the prototypical chromid region (position 4.55 Mb). Thus, the functional gene sets that originated from the prototypical chromid and from the prototypical main chromosome both appeared to be expressed and functioning, at least at levels sufficient for growth under the conditions tested, unless other entirely unknown enzymes were functioning. Another potential indicator for the expression of functional genes is the gene sets responsible for cell motility. Gene operons encoding proteins related to flagellar biosynthesis and chemotaxis were widely conserved among Cupriavidus strains, and these were all located on the chromid (2). In the KK10 genome, these conserved gene sets were also found in the region that originated from the prototypical chromid (Table S3), and, as with other strains previously reported (45–47), cell motility of KK10 was confirmed by microscopic observation. These results suggested that the expression of at least some of the functional genes conserved in the region that originated from the prototypical chromid of KK10 was not silenced.
These targeted functional genes that originated from the prototypical chromid of KK10 appeared to retain their functions; however, expression of other functions such as nitrogen assimilation or fatty acid metabolism, for which the responsible genes were located close to the fusion sites (Fig. 4B), could be more strongly affected by this genomic rearrangement. Therefore, further transcriptome-based analyses and comparisons with other Cupriavidus strains will be required to evaluate in detail how the expression levels of these functional genes and the cell survival rates were influenced by the chromosome-chromid fusion event.
Frequency and ecological relevance of the chromosome-chromid fusion event. The GC skew profiles of chromosomes with a size of >6.0 Mbp from other Burkholderiaceae strains, Cupriavidus metallidurans Ni-2 and Burkholderia cepacia LO6 (both sequenced using the PacBio RS II system) (24,48), indicated that their chromosomes had multiple GC skew peaks, similar to the KK10 chromosome, and these results suggested that they contained regions that originated from multiple prototypical replicons, i.e., the main chromosome, chromid, and possibly, plasmids (Fig. S4). These profiles contrast with that of the chromosome from a member of the genus Pandoraea (P. norimbergensis DSM11628; Fig. S4), a genus that is known to lack a chromid and to conserve relatively larger single chromosomes than those of other genera that widely conserve a chromid (2). Based on currently available database information, it is not certain whether these chromosomal structures from C. metallidurans Ni-2 and B. cepacia LO6 resulted from a naturally occurring chromosome-chromid fusion or from sequencing misassemblies. Further genome comparison between strain Ni-2 and C. metallidurans strain CH34T (as a reference) indicated that the locations of potential chromosome-chromid fusion did not match the positions of their rRNA operons (Fig. S5), indicating that another unknown mechanism may have mediated the chromosome-chromid fusion unless it was caused by sequencing misassemblies.
Naturally occurring chromosome-chromid fusion in bacterial genomes has rarely been found and thus was referred to as an "exceptional" case in the bacterial groups that conserve chromids in their genomes (6,9). Therefore, even though the bacterial chromosomal fusion mediated by homologous recombination or replication template switching may occur frequently in nature, the variants generated through such fusion (and possible excision) events may rarely outcompete other wild-type cells in ecosystems (6). However, interestingly, these bacterial genotypes may have successfully occupied niches in unique, specialized environments; the V. cholerae strains and B. cepacia LO6 discussed above were isolated from clinical samples, and C. metallidurans Ni-2 and strain KK10 were isolated from laboratory-maintained bacterial consortia. Fusion of the chromid into the main chromosome may have limited the chances of foreign gene acquisition into the genome due to the lower rates of horizontal gene transfers into the main chromosome compared to those into chromids (2). In contrast, it may have provided benefit to cells by stabilizing and conserving efficient functional genes that were carried on the chromid. From this perspective, the relatively large size (1.67 Mbp) of the KK10 plasmid may have resulted from the higher necessity for acquiring foreign genes into the plasmid instead of the chromid.
In summary, this study reported a new occurrence of bacterial chromosomal fusion with another replicon that was potentially mediated by intragenomic homologous recombination between repeated rRNA operons. Naturally occurring chromosome-chromid fusion by the same mechanism has not been previously studied, to the best of our knowledge, and it is still uncertain how common this phenomenon may be among prokaryotes. Intriguingly, the inverse phenomenon was previously reported in Salmonella; homologous recombination between rRNA operons generated a new plasmid from the main chromosome (49). Nonetheless, if these processes occur more often than has been understood, the presence of multiple rRNA operons conserved in bacterial genomes may not only contribute to the robustness of the function of the ribosome, but also provide more opportunities for genomic rearrangements and evolution through frequent homologous recombination. Additionally, the results of this investigation emphasize that long-read sequencing analyses should be rigorously applied to fully resolve a complete bacterial genome and to avoid misassemblies that may occur at rRNA operons. Although short-read sequencing has been widely applied for bacterial genome sequencing, this work serves as a cautionary tale that short reads alone are insufficient to differentiate between fused and unfused genomic structures. Future studies of the fused chromosome of strain KK10 may provide further information about the ecological relevance and significance of the bacterial chromosomal fusion, including how the fusion event positively or negatively affects bacterial survival in ecosystems.

MATERIALS AND METHODS
Bacterial strain and culture conditions. Cupriavidus necator strain KK10 was isolated from a diesel fuel-degrading bacterial consortium that originated from cattle pasture soil from the Gulf region of Texas by streaking on solid medium with a heavy fraction of diesel fuel as the carbon and energy source (21,22,40,50,51). Strain KK10 was routinely cultured on 20 mM glycerol in Stanier's basal medium (SBM) at 30°C with rotary shaking at 150 rpm in the dark. The growth capability of strain KK10 on the aromatic hydrocarbon substrates as sole carbon and energy sources was tested by incubating bacterial cells under the same conditions with 50 mg L⁻¹ of selected substrates in 20 mL of SBM in 100-mL-volume conical flasks that were stoppered with silicone plugs. Cell growth was evaluated by measuring the optical density of the cultures at 600 nm, and light microscopy was utilized to observe motility and cell multiplication using an Eclipse E800 system (Nikon, Tokyo, Japan).
Complete genome sequencing of strain KK10. The complete genome sequence of strain KK10 was previously announced with detailed information on the sequencing method (23). In brief, genomic DNA of strain KK10 extracted from cells grown on 20 mM glycerol for 3 days was sequenced using a combination of short-read (DNBSEQ-G400; MGI Tech, Shenzhen, China) and long-read (GridION X5; Oxford Nanopore Technologies, Oxford, UK) sequencing technologies for hybrid assembly. For the GridION sequencing, a library was created using a ligation sequencing kit (SQK-LSK109; after adapter ligation, >3-kb fragments were enriched). The long-read sequences that were obtained using R9.4.1 flow cells were base-called using Guppy v. 4.0.11 and then trimmed and quality-filtered using Porechop v. 0.2.3 and Filtlong v. 0.2.0 (>1 kb). After trimming and quality filtering of the raw reads obtained from each sequencing platform, the complete genome sequence was determined through de novo assembly using Unicycler v. 0.4.7 (52) and was validated using Bandage v. 0.8.1 (Fig. S1) (53). GenSkew v. 1.0 was used to calculate the GC skew and to approximate the origin and terminus of replication.
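The principle behind the GC skew calculation can be sketched in a few lines. This is not GenSkew's implementation; the function names, the window size, and the toy sequence are assumptions for illustration. The sketch computes the skew (G - C)/(G + C) in non-overlapping windows and locates the extremes of the cumulative skew, which are commonly used to approximate the origin (minimum) and terminus (maximum) of replication.

```python
def gc_skew(seq, window=1000):
    """Return the GC skew (G - C) / (G + C) for non-overlapping windows."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        # Windows without any G or C get a neutral skew of 0.
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def cumulative_extremes(skews):
    """Return (window index of minimum, window index of maximum) of the
    cumulative skew; these approximate origin and terminus, respectively."""
    total = 0.0
    cumulative = []
    for s in skews:
        total += s
        cumulative.append(total)
    i_min = min(range(len(cumulative)), key=cumulative.__getitem__)
    i_max = max(range(len(cumulative)), key=cumulative.__getitem__)
    return i_min, i_max


# Toy sequence (not real data): a C-rich stretch followed by a G-rich stretch.
toy = "CCCG" * 10 + "GGGC" * 10
skews = gc_skew(toy, window=8)
print(skews)                       # five windows of -0.5, then five of 0.5
print(cumulative_extremes(skews))  # (4, 9)
```

A replicon with a single origin and terminus gives one minimum and one maximum in the cumulative profile. A chromosome assembled from two prototypical replicons would be expected to show additional local extremes, as described for KK10 in the Results.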
Functional gene profiling and comparative genomics. Genome annotation was performed through the annotation pipelines provided by NCBI (PGAP v. 5.2) and JGI (Integrated Microbial Genomes [IMG] annotation pipeline v. 5.0.20). The reference Cupriavidus genome sequences deposited in the NCBI and IMG databases were used for comparative genomics. ANI between strain KK10 and reference Cupriavidus strains was determined using FastANI v. 1.32 (54), and a heatmap with cluster dendrograms was created with the heatmap function in R. Sequence similarity between the strain KK10 genome and reference genomes was visualized using Circoletto (http://tools.bat.infspire.org/circoletto/) (55), an online tool based on Circos (56), and Mauve v. 2.4.0 (57). SonicParanoid v. 1.3.5 (58) was used to identify the orthologous genes between the KK10 genome and reference genomes.
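As an illustration of how pairwise ANI results can be ranked, the sketch below parses tab-delimited FastANI output, whose columns are query, reference, ANI, mapped fragments, and total query fragments. The ANI values are those reported in the Results; the file names and fragment counts are placeholders, and the function names are hypothetical. This is not the script used in the study.

```python
def parse_fastani(lines):
    """Parse FastANI output lines into (query, reference, ani, mapped, total)."""
    hits = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        query, ref, ani, mapped, total = line.rstrip("\n").split("\t")
        hits.append((query, ref, float(ani), int(mapped), int(total)))
    return hits


def rank_references(hits, query):
    """Return (reference, ANI) pairs for one query, highest ANI first."""
    pairs = [(ref, ani) for q, ref, ani, _, _ in hits if q == query]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# ANI values from the Results; file names and fragment counts are placeholders.
example_output = [
    "KK10.fna\tNH9.fna\t91.71\t1000\t2000",
    "KK10.fna\tN-1.fna\t98.84\t1000\t2000",
    "KK10.fna\tH16.fna\t94.99\t1000\t2000",
]

ranking = rank_references(parse_fastani(example_output), "KK10.fna")
for ref, ani in ranking:
    print(f"{ref}\t{ani:.2f}")
```

The ranking places N-1 first, followed by H16 and NH9, matching the order reported in the Results.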
Data availability. The sequencing assembly and raw data for the KK10 genome are all available in public databases, under the accession numbers CP073677 and CP073678 in NCBI GenBank and under accession number 2913661577 in the IMG/MER. The raw SRA sequences are available under accession numbers SRR14308055 and SRR14308056, which are under BioProject number PRJNA722091 and BioSample number SAMN18744514.

SUPPLEMENTAL MATERIAL
Supplemental material is available online only. SUPPLEMENTAL FILE 1, PDF file, 4.9 MB.

ACKNOWLEDGMENTS
This research work was funded by the Japan Society for the Promotion of Science (JSPS) KAKENHI grant 19K15738 to J.F.M. and grant 17K00555 to R.A.K.