The extent and characteristics of DNA transfer between plasmids and chromosomes

SUMMARY Plasmids are extrachromosomal genetic elements that reside in prokaryotes. The acquisition of plasmids encoding beneficial traits can facilitate short-term survival in harsh environmental conditions or long-term adaptation to new ecological niches. Due to their ability to transfer between cells, plasmids are considered agents of gene transfer. Nonetheless, the frequency of DNA transfer between plasmids and chromosomes remains understudied. Using a novel approach for detection of homologous loci between genome pairs, we uncover gene sharing with the chromosome in 1,974 (66%) plasmids residing in 1,016 (78%) taxonomically diverse isolates. The majority of homologous loci correspond to mobile elements, which may be duplicated in the host chromosomes in tens of copies. Neighboring shared genes often encode similar functional categories, indicating the transfer of multigene functional units. Rare transfer events of antibiotic resistance genes are observed mainly with mobile elements. The frequent erosion of sequence similarity in homologous regions indicates that the transferred DNA is often devoid of function. DNA transfer between plasmids and chromosomes thus generates genetic variation that is akin to the workings of endosymbiotic gene transfer in eukaryotic evolution. Our findings imply that plasmid contribution to gene transfer most often corresponds to transfer of the plasmid entity rather than transfer of protein-coding genes between plasmids and chromosomes.


INTRODUCTION
Horizontal gene transfer is an important driver of genetic diversity and evolution of novelty in prokaryotes. The commonly known mechanisms of DNA transfer in prokaryotes include natural transformation, 1,2 conjugation, 3,4 and transduction. 5 Additional taxon-specific mechanisms include gene transfer agents (GTAs) that have been described in alpha-proteobacteria, 6-8 plasmid vesicles in the archaea Halorubrum spp., 9 and cell fusions that have been described in halophilic archaea. 10,11 The known transfer mechanisms vary in the type of conductive mechanisms, the configuration of transferred DNA, and the prospect of integration of the transferred DNA into the recipient chromosomes. The integration of acquired (or invasive) DNA into the recipient chromosomes can be facilitated by a plethora of DNA editing and repair mechanisms, including homologous or illegitimate recombination, 12,13 as well as integrases and transposases (reviewed in Arnold et al. 14 ). Plasmids transferred via conjugation (or rarely also via natural transformation) typically persist in the host cell as self-replicating elements. Plasmid integration was reported for F-plasmids that may be integrated into the host chromosomes via homologous recombination 15 or at insertion sequence (IS) sites. 16 DNA transfer between plasmids and chromosomes may furthermore occur via diverse mobile genetic elements (MGEs) that can transfer between DNA molecules (with or without replication) via homologous recombination or site-specific recombination (reviewed in Partridge 17 ). Studies of gene transfer in prokaryote evolution are typically based on shared sequence similarity among distantly related organisms, such as findings of homologous genes encoded on chromosomes and plasmids, which are inferred to stem from lateral gene transfer events. 9][20] Notably, this approach assumes that gene transfer between plasmids and chromosome occurred either in the donor or the recipient at some point in time.
Direct evidence for gene transfer between plasmids and chromosomes comes from diverse reports of transposon-mediated transfer. An example of the transfer of catabolic genes is the report of Tn4652-mediated transfer of toluene and xylene degradation genes in Pseudomonas putida. 21 Similarly, transposon-mediated transfer of naphthalene dioxygenase between plasmid and chromosome was reported in P. putida isolated from coal tar waste. 22 An experimental evolution study of P. putida adaptation to heavy metal stress further revealed that Tn4652 may transfer frequently between the chromosome and the resident plasmid in that organism. 23 Transposons may furthermore mobilize antibiotic resistance genes, as reported in Enterococcus faecalis, where a Tn925-mediated transfer of a tetracycline resistance gene via a conjugative plasmid was observed, 24 as well as a Tn1549-mediated transfer of a vancomycin resistance gene. 25,26
Whether gene transfer from plasmids to chromosomes is a frequent phenomenon remains questionable. According to the "plasmid paradox" hypothesis, plasmids carrying antibiotic resistance genes are at risk of extinction following a transfer of the resistance gene into the chromosome. 27 However, the distinct repertoires of antibiotic resistance genes in plasmids and chromosomes indicate that antibiotic resistance gene transfer between plasmids and chromosomes is rather rare. 28,29 The extent of DNA transfer between plasmids and chromosomes thus remains understudied.
Here, we reconstruct events of DNA transfer between plasmids and the chromosome of their host. For that purpose, we developed a computational approach to detect homologous genomic regions between plasmid and chromosome replicon pairs. Our approach is inspired by studies of endosymbiotic gene transfer in eukaryotic cells (e.g., as previously shown by Hazkani-Covo et al., 30 Richly et al., 31 and Hazkani-Covo and Martin 32 ). DNA transfer from the eukaryotic organelles (mitochondria and plastids) is a recognized mechanism that generates genetic diversity within animal and plant genomes. 33,34
Endosymbiotic gene transfer had a paramount role in eukaryogenesis 35 and is still ongoing today, e.g., within the human population. 36 The analysis of replicon pairs within the same cell enables us to infer transfer events while avoiding the inherent challenges of gene transfer inference between distantly related organisms (e.g., due to high sequence divergence 37 ). The repertoire of homologous regions uncovered by our approach supplies a comprehensive view of the extent and type of gene transfer between chromosomes and their resident plasmids.

Detection of homologous loci between plasmids and their host chromosome
To examine the extent of DNA transfer between plasmids and chromosomes, we analyzed the genomes of 1,304 completely sequenced prokaryotic isolates that include a single chromosome and at least one plasmid. The majority of isolates in this set harbor a single plasmid (49%) or two plasmids (24%). The dataset comprises 2,966 pairs of chromosome and plasmid replicons that co-occur in the same isolate, that is, both replicons were documented in the same prokaryotic host. Shared sequences within plasmid-chromosome pairs were detected by sequence similarity search of the plasmid sequence against the chromosome sequence. Our search detected shared sequences in 2,912 (98%) plasmid-chromosome pairs, including plasmids from almost all genera, with the prominent taxa being Bacillus, Escherichia, and Klebsiella. The proportion of plasmid sequence shared with the chromosome had a median of 5.6% and reached 100% of the plasmid genome (Figure 1A; Table S1). The proportion of shared plasmid sequence cannot be well explained by the plasmid size (r_s = 0.2, p < 0.001, using Spearman correlation).
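The shared plasmid fraction described above can be illustrated with a minimal sketch. It assumes that the similarity hits on the plasmid are available as half-open (start, end) coordinate pairs; the function name and the input format are illustrative assumptions and not the pipeline used in this study.

```python
def shared_fraction(hits, plasmid_length):
    """Fraction of plasmid positions covered by at least one similarity hit.

    hits: iterable of half-open (start, end) plasmid coordinates (assumed format).
    Overlapping hits are merged first so that no position is counted twice.
    """
    covered = 0
    current_start = current_end = None
    for start, end in sorted(hits):
        if current_end is None or start > current_end:
            # Close the previous merged block and open a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or touching hit: extend the current block.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / plasmid_length
```

For example, hits at (0, 100), (50, 150), and (300, 400) on a hypothetical 1,000 bp plasmid cover 250 positions, i.e., a shared fraction of 0.25.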
A close examination of shared sequences between plasmids and chromosomes revealed multiple hits of varying sequence similarity (Figure 1B). At first, we suspected that this pattern is the result of assembly artifacts. However, because the genomes we analyzed here are complete, we realized that the sequence similarity pattern is a result of genuine biological processes. Example evolutionary scenarios are either multiple syntenic DNA transfer events between the plasmid and chromosome or, alternatively, a single transfer event followed by gradual (and heterogeneous) degradation of sequence similarity (e.g., due to different selection regimes). The single transfer is the more parsimonious scenario. Thus, although the results from the sequence similarity search are useful in identifying the proportion of plasmid genome shared with the chromosomes, a parsimonious inference of transfer events could be achieved by joining neighboring hits into segments that correspond to transfer events.
To infer transfer events between plasmids and chromosomes, we developed a novel hit-agglomeration approach that joins neighboring local sequence similarity hits into continuous segments of sequence similarity. The hit-agglomeration procedure is applied independently to each replicon in plasmid-chromosome pairs (Figure 1C). Applying this approach, e.g., to the plasmid-chromosome pair in Borrelia burgdorferi JD1 uncovered a transfer of a plasmid-encoded type I restriction endonuclease (Figure 1B). Note that the recovered segment corresponds conceptually to a pairwise sequence alignment of the plasmid and chromosome genomes. However, such a pairwise alignment could not result from a standard alignment algorithm due to the heterogeneity of plasmid and chromosome sequence similarity along the segment. The application of the hit-agglomeration approach to all plasmid-chromosome pairs in our dataset recovered a total of 45,400 plasmid segments and 99,290 chromosomal segments in 1,974 (66.5%) of the plasmid-chromosome pairs. Plasmids where no segments could be recovered are generally smaller, contain fewer coding sequences (CDSs), and are characterized by a higher nucleotide sequence complexity in comparison with plasmids where segments were recovered (Figures S1A-S1C).
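The idea of joining neighboring hits into segments can be sketched as follows. This is a simplified illustration under stated assumptions: hits on one replicon are given as half-open (start, end) pairs, and two hits are joined whenever the gap between them does not exceed `max_gap`. Both the function name and the `max_gap` threshold are hypothetical; the criteria actually used by the hit-agglomeration approach (e.g., co-linearity of hits) are not reproduced here.

```python
def agglomerate_hits(hits, max_gap=1000):
    """Join neighboring hits on a single replicon into segments.

    hits: iterable of half-open (start, end) coordinates (assumed format).
    max_gap: hypothetical distance threshold; hits separated by at most this
             many positions are joined into the same segment.
    """
    segments = []
    for start, end in sorted(hits):
        if segments and start - segments[-1][1] <= max_gap:
            # Close enough to the previous segment: extend it.
            segments[-1][1] = max(segments[-1][1], end)
        else:
            # Too far from the previous segment: start a new one.
            segments.append([start, end])
    return [tuple(segment) for segment in segments]
```

Because the procedure is applied independently to each replicon, the same function would be called once with the plasmid coordinates of the hits and once with their chromosome coordinates.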
Multiple copies of the same plasmid segment in the chromosome may originate either from multiple transfer events between the plasmid and chromosome or duplications of the transferred regions in one of the replicons or a combination of both processes (Figure 1C). To extend our evaluation to include multiple copies of shared sequences, we opted for a two-dimensional representation of the shared DNA that enables us to pinpoint the homologous segments between plasmids and chromosomes and quantify the copy number of shared loci. For that purpose, we integrated the information from segments found in each plasmid and chromosome pair into intersections. These are areas in the 2D space of DNA sequence similarity between the two replicons that are intersections of their segments and contain local sequence similarity hits (Figure 1B). Intersections of plasmid and chromosome segments corresponding to homologous sequences were detected in 1,974 plasmid-chromosome pairs (97% of the pairs having plasmid segments; Table S2).
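The notion of an intersection can be made concrete with a short sketch. It assumes that each hit is a pair of half-open intervals, one on the plasmid and one on the chromosome, and it reuses a simple gap-based merge as a stand-in for the agglomeration step. All names and the `max_gap` parameter are illustrative assumptions.

```python
from bisect import bisect_right
from collections import defaultdict


def merge(intervals, max_gap):
    """Gap-based stand-in for hit agglomeration on one replicon."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(segment) for segment in merged]


def locate(segments, position):
    """Return the segment that contains the given position."""
    starts = [start for start, _ in segments]
    return segments[bisect_right(starts, position) - 1]


def find_intersections(hits, max_gap=100):
    """Group hits by the (plasmid segment, chromosome segment) pair they fall in.

    hits: list of ((p_start, p_end), (c_start, c_end)) pairs (assumed format).
    Each key of the result is one intersection in the 2D similarity space.
    """
    plasmid_segments = merge([p for p, _ in hits], max_gap)
    chromosome_segments = merge([c for _, c in hits], max_gap)
    cells = defaultdict(list)
    for p_hit, c_hit in hits:
        key = (locate(plasmid_segments, p_hit[0]),
               locate(chromosome_segments, c_hit[0]))
        cells[key].append((p_hit, c_hit))
    return dict(cells)
```

In this representation, a plasmid segment that appears in several keys is shared with the chromosome in several copies.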

Detected plasmid segments correspond to DNA transfer between paired plasmids and chromosomes
To evaluate the performance of the approach, we inspected the resulting intersections in two species where gene transfer between a resident plasmid and the host chromosome was previously described. The first example is an E. faecalis isolate, where a transfer of a pathogenicity island including a vancomycin resistance gene was previously hypothesized. 25 Sequencing of that isolate revealed plasmid genes that were integrated in the chromosome in the neighborhood of IS clusters, especially IS1216. 26 Our approach identified 17 segments in plasmid pTEF1 that are shared with the chromosome sequence and correspond to 43% of the plasmid sequence (Figure 2A). The segment size ranged between 209 and 9,452 bp. The largest segment overlaps with 12 genes in the plasmid genome and includes IS1216 and IS256. The pattern of segment similarity suggests a degradation of sequence similarity between the plasmid and chromosome, as previously described. 26
As the second example, we examined the resulting segments in isolates of P. putida, in accordance with earlier reports on transposon-mediated transfer between the plasmid and chromosome in that species. One isolate, P. putida S12, stood out with a high number of similar loci between the plasmid and chromosome. The recovered intersections correspond to ISS12, which was found in 23 copies in the resident plasmid (pTTS12; Figure 2B). Transposition activity of ISS12 was previously described in P. putida S12, where it can alter the expression level of a solvent efflux pump and thus modify the P. putida sensitivity to toluene. 38 The presence of ISS12 in the P. putida S12 genome sequence was previously reported. 39,40 Notably, because ISS12 is an active transposon, the number of ISS12 copies in the plasmid and chromosome can be diverse within the P. putida S12 population. 41 Taken together, our approach is useful for identifying shared genomic loci between paired plasmids and chromosomes that correspond to putative DNA transfer events.

DNA transfer between plasmids and chromosomes corresponds commonly to transposable elements
To further examine the characteristics of homologous regions in plasmids and chromosomes, we examined the properties of the plasmid segments found in intersections with chromosomal segments. The median number of segments per plasmid (i.e., plasmid-chromosome pair) was three. The majority of short segments were identified in large plasmids (Figure 3A). Segments corresponding to a high proportion of the plasmid genome are characterized by a high sequence similarity to the chromosome (Figure 3B). Accordingly, segments found in small plasmid types (here, plasmid size < 100 kb) have a high sequence similarity to the chromosome compared with segments found in medium and large plasmids (α = 0.05, using Kruskal-Wallis test and Tukey post hoc test). The segment sequence similarity with the chromosome was not associated with the segment length (r_s = 0.077, p < 0.001, using Spearman correlation). The differences we observed in segment length and segment sequence similarity depending on the plasmid size suggest frequent erosion of sequence similarity in segments found in large plasmids.
Figure 1. (A) [...] S1). The inlay shows a higher resolution of the data for 0-0.4 shared plasmid fraction for the most frequent genera. (B) 2D dotplot presentation of sequence similarity between plasmid lp28-1 (y axis) and the chromosome (x axis) in Borrelia burgdorferi JD1 (GenBank: GCA_000166655.2). Colored strips next to the axes show loci of sequence similarity to the paired replicon as identified with BLAST or MUMMER, segments resulting from the agglomeration algorithm, and annotated genes. Sequence similarity between plasmid and chromosome is shown with red and yellow lines that correspond to the BLAST and MUMMER hits. The inlay shows a zoom into an intersection of sequence similarity in both plasmid and chromosome. The plasmid segment overlaps with a plasmid-encoded restriction enzyme and three pseudogenes (one of which is of a methyltransferase). (C) An illustration of hypothetical scenarios for DNA transfer between plasmid (green) and chromosome (blue) followed by duplications of the transferred DNA. Similarity detection: a schematic representation of detected local sequence similarity between the plasmid and chromosome. Red, gray, orange, and brown fragments depict local similarity hits in co-linear blocks. Hit agglomeration: co-linear sequence similarity hits are joined to form segments of shared sequence similarity between plasmid and chromosome. A pairwise presentation of the segments is possible in 2D space (as in B).
What is the content of DNA transferred between plasmids and chromosomes? Plasmid segments including multiple neighboring genes that likely correspond to co-transfer of multiple CDSs are rather rare (12%). About a fifth (19.3%) of the plasmid segments correspond to partial or complete CDSs. The majority of plasmid segments correspond to non-coding (or intergenic) regions (Figure 3C). Comparing the content of plasmid segments with the total plasmid genome shows that non-coding regions are overrepresented in the plasmid segments (Figure 3C). In contrast, the proportion of chromosomal segments corresponding to CDSs is larger than the proportion of segments corresponding to non-coding regions (Figure 3C). This difference between plasmids and chromosomes suggests that homologous loci tend to be non-functional in the plasmid genomes (i.e., likely pseudogenes).
To gain insight into the type of functions that are transferred between plasmids and chromosomes, we classified all CDSs in plasmid segments into functional categories. An examination of the CDSs that could be classified into a known functional category reveals a majority of transposable elements and ISs that correspond to 33% of the classified CDSs (Figure 3D). The next most frequent categories are "replication, recombination, and repair," "amino acid transport and metabolism," and "transcription," which correspond mostly to genes coding for MGEs, e.g., integrases, recombinases, and relaxases (Table S3). Genes encoding transport and metabolism functions comprise 21% of the total transferred genes, with a majority of genes classified into transport and metabolism of amino acids or inorganic ions.
Figure 2. Two demonstrative examples of detected intersections of plasmid and chromosome segments for (A) plasmid pTEF1 in E. faecalis V583 (GenBank: GCA_000007785.1) and (B) a mega-plasmid in P. putida S12 (GenBank: GCA_000495455.2). Axes in the 2D presentation correspond to chromosome (x axis) and plasmid (y axis) genome sequences. Colored strips next to the axes show loci of sequence similarity to the paired replicon as identified with BLAST or MUMMER, segments resulting from the agglomeration algorithm, and annotated genes. Intersections of plasmid and chromosome segments are shown with red and yellow lines that correspond to the BLAST and MUMMER hits. Illustrations of plasmid maps show annotated genes in the plasmid genome and segments identified in the plasmid genome with our approach.
In addition to protein-coding genes, we examined evidence for the transfer of RNA genes between plasmids and chromosomes, which is considered rare (but exceptions have been reported, e.g., for rRNA genes in Aureimonas plasmids 42,43 ). Our analysis uncovered multiple segments that putatively correspond to transfer of RNA genes. The dataset analyzed here includes overall 197 rRNA genes in 42 plasmids, all of which are shared with the chromosome (Table S4). Out of the total 88 segments comprising rRNA genes, 61 segments correspond to a complete gene set of the rRNA operon. An example is the sharing of an rRNA operon between plasmids and chromosomes in several isolates of the sugarcane endophyte Azospirillum brasilense, in agreement with previous reports. 44
Additionally, the dataset analyzed here includes 832 tRNA genes in 171 plasmids, of which 239 (29%) genes are shared with the chromosome in 63 plasmid-chromosome pairs (Table S4). An extreme example is a tRNA island comprising 13 tRNA genes in plasmid pBM400 in the soil bacterium Priestia megaterium strain QM B1551. 45,46 Using our approach, we uncovered four copies of that region in the host chromosome, in variable levels of sequence conservation. Notably, comparing the pBM400 plasmid genome with similar-sized plasmids from two recently sequenced P. megaterium isolates 47 reveals a high conservation of the plasmid-encoded tRNA island (Figure S2). Considering their low frequency, sharing of rRNA and tRNA genes between plasmids and their host chromosomes remains a rather rare phenomenon in the analyzed dataset. The difference in the proportion of shared genes between rRNA genes (all genes) and tRNA genes (ca. 30%) indicates that plasmid tRNAs may supplement the host tRNA repertoire (e.g., as reported for Methanoperedenaceae 48 ).
Figure 3. Properties of plasmid segments that correspond to intersections with chromosomal segments are presented. The plasmid segment length had a median of 381 bp and a maximum of 116,763 bp. Here, we classify plasmids into three arbitrary size categories: small plasmids < 100 kb, medium plasmids between 100 and 500 kb, and large plasmids > 500 kb. (A) The distribution of plasmid segment length shown in three categories of plasmid size (see Figure S1 for additional details; segment data are available in Table S2). (B) Distribution of the proportion of plasmid genome that corresponds to plasmid segments according to three categories of sequence similarity between plasmid and chromosome. The inlay figure shows the plasmid fraction range between 0.1 and 1. The segment sequence similarity ranged between 80% and 100% identical nucleotides with a median of 91.8% (but we note the high sequence similarity threshold used in the initial screening). (C) A comparison of segment sequence and annotation of protein-coding sequences (CDSs). The fraction of segments and their proportion from the plasmid and chromosome genome are shown according to four categories. (D) Frequency of CDSs in plasmid segments according to their functional annotation (see example in Figure S2; data are available in Tables S3 and S4).
Multi-CDS plasmid segments reveal putative co-transfers of functionally related genes
One of the strengths of our approach is the ability to uncover shared sequence similarity across multiple neighboring genes. The distribution of co-transferred protein-coding genes according to their annotated function shows an apparent high frequency on the diagonal, which indicates that genes classified into the same category tend to be transferred together (Figure 4A). This scenario would fit, for example, the transfer of genes that are included in the same functional unit, operon, or regulon. The highest frequency of co-transfer appears in combination with transposition and ISs, as well as replication, recombination, and repair genes (including recombinases and integrases). Those gene functions are common constituents of MGEs (i.e., transposons and integrons). Genes coding for transcription functions, such as transcriptional regulators, are the fourth most frequently transferred category, often in combination with genes in other functional categories that are classified into metabolic functions (Figure 4A).
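A co-transfer matrix of the kind described above can be tallied with a few lines of code. The sketch below assumes that each multi-CDS segment is represented as a list of functional category labels, one per CDS; the labels and the function name are hypothetical placeholders and do not reproduce the category scheme of Figure 4A.

```python
from collections import Counter
from itertools import combinations


def cotransfer_counts(segments):
    """Count pairs of functional categories that co-occur in the same segment.

    segments: iterable of lists of category labels, one label per CDS
              (assumed format). Pairs are unordered, so the key
              ("repair", "transposition") covers both orders; a key with two
              identical labels corresponds to the diagonal of the matrix.
    """
    counts = Counter()
    for categories in segments:
        for first, second in combinations(categories, 2):
            counts[tuple(sorted((first, second)))] += 1
    return counts
```

A high count for keys with two identical labels, relative to mixed keys, would correspond to the enrichment on the diagonal noted in the text.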
Combinations of genes classified into amino acid transport and metabolism as well as inorganic ion transport and metabolism stand out with a high frequency of co-transferred genes. Here, we show two examples for such plasmid segments. In the genome of the probiotic bacterium Lactobacillus plantarum strain P-8, our approach uncovered a putative transfer event of four neighboring genes, including α-galactosidase and β-galactosidase, which play a role in galactose metabolism (Figure 4B). The segment was identified in plasmid LBPp2, which encodes diverse metabolic functions and is stably maintained in L. plantarum P-8. 49 In the genome of Acidiphilium cryptum strain JF-5, which was isolated from a coal mine lake sediment, our approach uncovered a putative transfer of three neighboring genes between one of the eight resident plasmids (pACRY02) and the chromosome. The three genes function in branched-chain amino acid transport and metabolism (Figure 4C).
Antibiotic resistance gene transfer is rare, and genes are often co-transferred
A prominent aspect of plasmid research is focused on their contribution to the dissemination of antibiotic resistance genes (ARGs). Antibiotic resistance genes constitute 0.42% of the shared loci between plasmids and chromosomes. A search for ARGs in our dataset uncovered 1,655 ARGs belonging to 86 ARG families and 54 drug classes. These genes are encoded in 862 plasmids that harbor ≥1 ARGs and reside in 305 isolates. Comparing the plasmid-encoded ARG loci with the plasmid segments shows that only 90 (5.4%) of those genes are shared between the plasmid and paired host chromosome (Table S3). The ARGs correspond to 71 plasmid segments in 56 plasmid-chromosome pairs (that is, 19% of ARG-carrying plasmids) in 48 isolates. The most common genes that were detected with ARGs in the same segment are transposases. Notably, 26 ARGs were found within segments that include multiple ARGs. Additionally, several of the ARGs were shared with the chromosome in multiple copies. An example is the sulfonamide resistance gene (sul1), which is encoded in the plasmid pKEC-a3c in the Citrobacter freundii strain CFNIH1 that was isolated from a hospital sink 50 and has four copies in the host chromosome (Table S3). This pattern of gene transfer between plasmids and the chromosomes is expected when the ARG transfer is mediated by MGEs. Indeed, the shared plasmid segment including sul1 is reminiscent of a class 1 integron In34 51 (as identified using TnCentral 52 ).
Figure 4. [...] (B and C) Examples of plasmid segments that reveal putative transfer of multiple genes of similar functional category. The average sequence similarity of hits comprising these segments was 92% identical nucleotides in (B) and 94% identical nucleotides in (C). No IS elements or transposons could be identified in the flanking regions of these putative transfer events. See Figure 2 for elements in the 2D presentations of plasmid-chromosome shared sequence similarity.
Many of the organisms where we observed ARG transfer between plasmids and chromosomes are commonly reported as players in ARG dissemination within the hospital environment. Examples are Klebsiella species as well as Acinetobacter and Salmonella. Nonetheless, one organism stands out as an exception, Ralstonia solanacearum, which is a potato pathogen. Indeed, previous studies reported the presence of antibiotic resistance plasmids in agricultural habitats, likely as a result of antibiotic usage in animal husbandry and the application of manure for fertilization. 53

The repertoire of intersection shapes reveals frequent erosion of transferred DNA
To further investigate the fate of transferred DNA, we examined the pattern of local sequence similarity within intersections, which can supply hints as to rearrangements and duplications of transferred DNA (Figure 5A). The simplest pattern corresponds to intersections that include a single sequence similarity hit covering the full extent of the plasmid and chromosome segments. The interpretation of such an intersection is a single transfer between the plasmid and chromosome. This simple transfer pattern is very common, comprising 69% of the plasmid segments. A slightly more complex pattern involves multiple colinear sequence similarity hits that span the full extent of the plasmid and chromosome segments. The most parsimonious interpretation here is also of a transfer between the plasmid and chromosome, where subsequent sequence evolution, e.g., by nucleotide substitutions and short indels, led to degradation of sequence similarity. Heterogeneous sequence similarity of homologous loci hinders the detection of the entire region as a single BLAST hit; the region is instead detected as multiple partial hits. This type of intersection is common (30%) and in some cases comprises multiple partial hits (e.g., Figure 1B). Most of the transfer events identified from direct intersection of one-to-one plasmid and chromosome segments corresponded to non-coding (i.e., intergenic) DNA and partial CDSs in the plasmid genome (Figure 5B), suggesting that these plasmid loci are non-functional. Indeed, a recent study revealed a significantly higher density of pseudogenes in plasmids compared with chromosomes. 54
More complex intersection patterns arise when the transferred sequence is also subject to duplications and rearrangements on either replicon. When a plasmid segment consists of multiple hits, but their targets are distributed among several chromosome segments, the most parsimonious scenario is of a single plasmid to chromosome transfer followed by chromosome rearrangements or duplications. Here, we assume that within-replicon rearrangements are the more parsimonious scenario compared with inter-replicon transfer, such that a scenario involving multiple transfers is less likely. A one-to-many pattern was observed in 12% of the plasmid segments. The opposite pattern, where many plasmid segments have hits in just one chromosome segment, accounted for 3% of the plasmid segments. We note that in the two one-to-many patterns, duplications are slightly more common than other rearrangements and that the number of duplications can reach dozens, a situation especially common in repetitive sequences and transposable elements (e.g., Figure 2B). Complex intersection patterns, comprising many-to-many plasmid and chromosome segment pairs, are observed in the remaining 80% of the plasmid segments. About half of these segments correspond to non-coding DNA and partial CDSs in the plasmid genome, with the second half corresponding to plasmid-encoded CDSs (Figure 5B).
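The one-to-one, one-to-many, many-to-one, and many-to-many patterns discussed above can be viewed as connected components of a bipartite graph that links plasmid segments to chromosome segments. The following sketch illustrates that view under stated assumptions: intersections are given as (plasmid segment, chromosome segment) identifier pairs, and the function name and labels are hypothetical. It does not distinguish single-hit from multi-hit one-to-one intersections, which would require the hit counts as well.

```python
from collections import defaultdict


def classify_patterns(links):
    """Label each plasmid segment by the shape of its intersection pattern.

    links: iterable of (plasmid_segment_id, chromosome_segment_id) pairs,
           one per intersection (assumed format).
    """
    neighbours = defaultdict(set)
    for plasmid_seg, chromosome_seg in links:
        neighbours[("p", plasmid_seg)].add(("c", chromosome_seg))
        neighbours[("c", chromosome_seg)].add(("p", plasmid_seg))

    labels, seen = {}, set()
    for node in neighbours:
        if node in seen:
            continue
        # Depth-first search to collect one connected component.
        stack, component = [node], set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(neighbours[current] - component)
        seen |= component

        n_plasmid = sum(1 for kind, _ in component if kind == "p")
        n_chromosome = len(component) - n_plasmid
        if n_plasmid == 1 and n_chromosome == 1:
            label = "one-to-one"
        elif n_plasmid == 1:
            label = "one-to-many"
        elif n_chromosome == 1:
            label = "many-to-one"
        else:
            label = "many-to-many"
        for kind, name in component:
            if kind == "p":
                labels[name] = label
    return labels
```

Counting the labels over all plasmid segments would give pattern frequencies comparable in spirit to those reported in the text, although the exact classification rules of the study may differ.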
Of the total plasmid segments, 22,673 (50%) are shared with the chromosome in a single copy, whereas the remaining segments are shared with the chromosome in multiple copies ranging between 2 and 457 with a median of 2 copies. Plasmid segments appearing with few copies in the chromosome (i.e., 1-3) include a large proportion of non-coding regions, whereas segments appearing in multiple copies include similar shares of non-coding regions and CDSs (Figure 5C). To gain insight into the evolution of such repetitive patterns, we examined one extreme example of a plasmid where we identified a many-to-many pattern, in the genome of Piscirickettsia salmonis strain B1, an intracellular fish pathogen isolated from Coho salmon that harbors four plasmids. 55 Our approach identified a highly repetitive pattern of shared regions between plasmid pPSB1-2 and the chromosome that correspond to either transposases, integrases, or ISs (Figure 5D). The plasmid segments include an IS30-like element that has seven copies in the plasmid and tens of copies in the chromosome. This IS element was first described in E. coli 56 and is classified as a copy-and-paste IS (reviewed in Siguier et al. 57 ). A recent analysis of P. salmonis genomes suggested that transposase activity in those organisms was rampant, with many transposase copies in that genome corresponding to pseudogenes. 58 Indeed, the number of chromosomal copies of segments that correspond to ISs and transposons (category 01 in Figure 4A) had a median of 8; segments in that category had a significantly higher number of copies in the host chromosome compared with the remaining segments (p < 0.001, using Wilcoxon test).
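The copy-number summary above (single-copy share and median copy number) can be illustrated with a short sketch. It assumes the same (plasmid segment, chromosome segment) identifier pairs as input, with the copy number of a plasmid segment defined as the number of distinct chromosome segments it intersects; this definition and the function names are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median


def copy_numbers(links):
    """Number of distinct chromosome segments linked to each plasmid segment."""
    copies = defaultdict(set)
    for plasmid_seg, chromosome_seg in links:
        copies[plasmid_seg].add(chromosome_seg)
    return {segment: len(targets) for segment, targets in copies.items()}


def summarize(counts):
    """Return (fraction of single-copy segments, median copy number)."""
    values = list(counts.values())
    single_copy = sum(1 for value in values if value == 1)
    return single_copy / len(values), median(values)
```

Restricting the input to segments annotated as ISs or transposons before calling `summarize` would give the per-category median discussed in the text.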
Gene non-functionalization and the accumulation of neutral mutations in transferred genes are expected to generate a complex intersection pattern over time. An example of this process was uncovered by our approach in two isolates where a whole plasmid transfer was detected. The genome of Corynebacterium atypicum DSM 44849 (initially named R2070) includes a circular corynephage ΦCATYP2070I (also termed a phage plasmid). 59 Our approach identified one almost intact copy of the phage plasmid in the chromosome, as well as a highly eroded copy of that phage plasmid, with a pattern that we term "a ghost of past transfer" (Figure 5E). The second example was observed in Staphylococcus aureus strain HOU1444-VR, which was isolated from a human patient and harbors three plasmids. 60 Our approach uncovered an almost intact copy of plasmid pVR-MSSA_02 in the chromosome as well as evidence for another, highly eroded copy in the chromosome (Figure 5F). We note that this plasmid was recently annotated as a phage plasmid. 61
A full plasmid sharing with the chromosome was observed for only 17 (0.7%) plasmids in 12 diverse genera (Table S1). Taken together, in addition to protein-coding genes, our approach uncovers frequent evidence for past transfers between plasmids and chromosomes that evolved into pseudogenes and intergenic regions.

DISCUSSION
Plasmids are often described in the literature as vehicles of horizontal gene transfer in bacterial evolution. Plasmid-mediated gene transfer may be divided into two possible non-exclusive routes: the first route is transfer of the plasmid genetic element, whose encoded functions are available for the host as long as the plasmid persists in the host. The second route is the transfer of plasmid genes to the chromosome, such that plasmid functions may become available to the host also following plasmid loss. Gene transfer may also occur from chromosomes to the resident plasmid, leading to gene gain by the plasmid and the evolution of novel plasmid functions. Here, we evaluated the frequency of gene transfer between plasmids and chromosomes by investigating the extent of gene sharing between plasmids and their host chromosomes.
Our analysis revealed two main properties of DNA sharing patterns between plasmid and chromosome pairs: duplication and amelioration. Plasmid loci shared with the chromosome may be fragmented, such that they cannot be found using a mere sequence similarity search. To gain evidence for a common evolutionary history of such fragmented events, we developed the hit-agglomeration approach. Fragmented evidence for homologous sequences in plasmids and chromosomes is likely due to amelioration of sequence similarity following the gene transfer. Our analysis furthermore revealed that plasmid loci shared with the chromosome may be duplicated in the chromosome (and plasmid) in multiple copies. Such patterns of sequence similarity cannot be discovered by collinear alignment methods (i.e., global pairwise alignment); hence, we opted to visualize the data and examine patterns of sequence similarity in 2D space, similarly to dot-matrix approaches. 62,63 Our extension of the dot-matrix approach to whole-genome comparisons enabled us to uncover the full extent of shared sequences between plasmids and chromosomes.
The functional annotation of highly repetitive shared sequences showed that these mostly correspond to transposable genetic elements, which we found duplicated in several plasmid-chromosome pairs in hundreds of copies. [...] 64-66 Contemporary evidence from experimental evolution furthermore showed that transposon-mediated gene transfer between plasmids and chromosomes can be frequent and rapid. 23 Indeed, our analysis uncovered a large number of transfer events comprising multiple genes, one of which was commonly annotated as a typical MGE component, such as a transposase or an IS. The discovery of transfer events comprising multiple genes is another strength of our approach. Although our approach enabled the discovery of several instances of transfer of functionally related genes (i.e., operons, Figure 4), most of the multiple-gene transfers we reconstructed here correspond to MGEs.
The role of plasmids in the dissemination of ARGs has been extensively studied (reviewed in Castañeda-Barba et al. 67 ). Furthermore, the potential transfer of ARGs between plasmids and chromosomes has been suggested as a major determinant of plasmid survival (known as the plasmid paradox 27 ). Our results indicate that if ARGs are transferred from the plasmid to the chromosome, such transfers are a rare phenomenon: only ca. 5% of the plasmid-encoded ARGs in our data were identified as shared with the host chromosome. Indeed, this proportion supplies a glimpse only into those isolates where ARG transfer may render the plasmid non-essential, whereas past transfer events leading to plasmid loss remain unknown. Notwithstanding, the ARG repertoire in plasmids and chromosomes is distinct, 28 suggesting the presence of multiple barriers for ARG transfer between plasmids and chromosomes. 29 Our study supports the notion that ARG transfer between plasmids and chromosomes is a rare event in the evolution of antibiotic resistance plasmids.
The transfer events detected by our method reveal frequent DNA transfer between plasmids and chromosomes that is mediated by MGEs, which harbor the required mechanisms for integration into DNA molecules. The loci where we observed eroded sequence similarity are likely non-functional and correspond to intergenic regions. Indeed, a recent study of plasmid evolution revealed that non-functional copies of diverse MGEs are highly frequent in plasmids. 54 Putative transfers that correspond to functional genes not accompanied by MGEs constitute the minority of segments in our data, indicating that transfer mediated by homologous recombination is extremely rare in nature. The integration of whole plasmids in the host chromosome is likewise rare in our data. Plasmid integration in the host chromosome may lead to the emergence of a new replication origin, e.g., as previously observed in the cyanobacterium Synechococcus elongatus. 68 Because eubacterial chromosomes are characterized by a single replication origin, 69 such integration events likely have negative consequences for the host's replication.
Our findings reveal several parallels between the evolution of gene transfer between plasmids and their host chromosome and endosymbiotic gene transfer in eukaryotes. In both evolutionary processes, there is a continuous transfer of genetic material that typically generates genetic diversity, yet the transferred material is typically non-functional, better described as molecular fossils (Figure 5). 34,36 Furthermore, in line with our suggestions here, the contribution of homologous recombination to endosymbiotic gene transfer is considered limited, with most organelle DNA likely integrated into the nuclear genome via illegitimate recombination (e.g., non-homologous end joining). 70,71 [...] 72-74 Taken together, we conclude that plasmid-mediated gene transfer most often corresponds to transfer of the plasmid entity rather than the transfer of protein-coding genes from plasmids to chromosomes.

STAR+METHODS
Detailed methods are provided in the online version of this paper and include the following: [...] or having more than a single chromosome were excluded. Notably, sequence similarity between plasmids and their resident chromosomes may stem from assembly artifacts. This includes the assembly of resident plasmids as part of the host chromosome (e.g., as reported in Weber et al. 84 ), or alternatively, reports of chromosomal fragments as plasmids. Consequently, the data were subjected to consecutive filtration steps in order to exclude genome assemblies suspected to be erroneous. In order to exclude erroneously annotated plasmids, contigs that were not included in PLSDB 77 (version 2020_06_23) were excluded. Similarly, genome assemblies that had a 'suppressed' status in RefSeq 76 (version October 2023) were excluded. The presented results correspond to homologous regions identified in 1,304 isolates (27.9% of the original dataset), including 2,966 plasmid-chromosome pairs.

Detection of local sequence similarity
To find shared sequences within plasmid-chromosome pairs, we conducted a comprehensive sequence similarity search of the plasmid sequence against the chromosome sequence using two tools: BLAST (blastn) 80 and MUMmer. 81 BLAST is powerful in detecting sequence similarity while allowing for a certain frequency of mismatches and gaps, whereas MUMmer is useful for the detection of identical sequences only and for finding small repeats such as CRISPR arrays and microsatellites. By combining hits from the BLAST and MUMmer analyses, we could cover different types of shared sequences that would otherwise be overlooked. Aiming for a conservative estimate of sequence similarity, we further filtered the resulting MUMmer hits using a threshold of ≥20 bp hit length and the BLAST hits using a threshold of ≥80% identical nucleotides in the hit alignment. In the 2,966 plasmid-chromosome pairs, we detected 1,036,360 BLAST hits and 1,109,447 MUMmer hits. The number of BLAST hits per pair ranged between 0 and 40,250 hits with a median of 57 hits. The number of MUMmer hits per pair ranged between 0 and 35,623 hits with a median of 29 hits.
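The hit filtering described above can be illustrated with a minimal Python sketch. The `Hit` record, its field names, and the helper functions are hypothetical and chosen for illustration only; they do not reflect the output format of BLAST or MUMmer, nor the code used in this study. Only the two thresholds (≥20 bp for MUMmer hits, ≥80% identical nucleotides for BLAST hits) are taken from the text.

```python
from dataclasses import dataclass

# Thresholds stated in the text.
MIN_MUMMER_LENGTH = 20      # bp
MIN_BLAST_IDENTITY = 80.0   # percent identical nucleotides


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one local similarity hit (1-based, inclusive)."""
    tool: str         # "blast" or "mummer"
    p_start: int      # start on the plasmid
    p_end: int        # end on the plasmid
    c_start: int      # start on the chromosome
    c_end: int        # end on the chromosome
    identity: float   # percent identity (100.0 for exact MUMmer matches)


def hit_length(hit):
    """Length of the hit measured on the plasmid (assumption: plasmid side)."""
    return abs(hit.p_end - hit.p_start) + 1


def keep_hit(hit):
    """Apply the tool-specific threshold to a single hit."""
    if hit.tool == "mummer":
        return hit_length(hit) >= MIN_MUMMER_LENGTH
    if hit.tool == "blast":
        return hit.identity >= MIN_BLAST_IDENTITY
    raise ValueError(f"unknown tool: {hit.tool}")


def combine_hits(blast_hits, mummer_hits):
    """Pool hits from both tools and keep those passing their threshold."""
    return [h for h in list(blast_hits) + list(mummer_hits) if keep_hit(h)]
```

Pooling the two filtered hit sets in one list mirrors the idea that both tools contribute to the same downstream agglomeration step.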

Agglomeration of local sequence similarity data into segments
In order to join neighboring local sequence similarity hits into segments, we designed and implemented a hit agglomeration algorithm. The hit agglomeration algorithm was applied to both replicons in each chromosome-plasmid pair. The input for the hit agglomeration algorithm is a vector of binary data that was generated as follows. Data of sequence similarity hits resulting from BLAST and MUMmer along replicons were converted into a string of binary characters (i.e., 0 and 1) having the same length as the replicon. The nucleotide positions included in a sequence similarity hit were labeled as 1, and those nucleotides not corresponding to a hit were labeled as 0.
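The conversion of hit coordinates into a binary vector can be sketched as follows. This is a minimal illustration, assuming that hits are given as 1-based inclusive (start, end) coordinates on one replicon and that reverse-strand hits may be listed with start > end; the function name `hits_to_binary` is hypothetical.

```python
def hits_to_binary(replicon_length, hit_intervals):
    """Return a list of 0/1 values, one per nucleotide of the replicon.

    hit_intervals: iterable of (start, end) pairs, 1-based and inclusive.
    Positions covered by at least one hit are labeled 1, all others 0.
    """
    mask = [0] * replicon_length
    for start, end in hit_intervals:
        lo, hi = min(start, end), max(start, end)  # tolerate reverse-strand hits
        if lo < 1 or hi > replicon_length:
            raise ValueError(f"hit ({start}, {end}) lies outside the replicon")
        # Convert 1-based inclusive coordinates to a 0-based slice.
        mask[lo - 1:hi] = [1] * (hi - lo + 1)
    return mask
```

For example, `hits_to_binary(10, [(2, 4), (8, 6)])` labels positions 2 to 4 and 6 to 8 as 1. Overlapping hits simply label the same positions again, so the vector records coverage rather than hit counts.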
Segments are defined by the coordinates of the first and last nucleotide positions. The agglomeration algorithm runs on each position of the binary sequence and compares the density of the similarity hits over a predefined window size to the left with the density of the similarity hits over the same window size to the right:

$$dr_p = \frac{r_p}{w}, \qquad dl_p = \frac{l_p}{w}$$

where $r_p$ is the number of hits over $w$ nucleotides to the right of position $p$, $l_p$ is the number of hits over $w$ nucleotides to the left of position $p$, $dr_p$ is the density over $w$ to the right, and $dl_p$ is the density over $w$ to the left. Thereafter, the method assigns the score of the difference in the density to each position of the sequence. We call the scoring vector the differential density (dd):

$$dd_p = dr_p - dl_p$$

where $dd_p$ is the differential density for position $p$; it ranges between -1 and 1. A high absolute value of dd implies that the similarity hits are clustered on one side of the position $p$, and a low absolute differential density (dd) means that this position is in the middle of a homogeneous distribution of the character (Figures S3A and S3B). The differential density (dd) vector has the same length as the original sequence for circular sequences. We initially used three window sizes of 200 bp, 500 bp, and 1,000 bp. For our calculations, we chose the window size of 1,000 bp as it had the highest sensitivity with no large difference in its specificity.
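The differential density can be computed for a circular replicon as in the following sketch. It makes two assumptions that are not specified in the text: the windows exclude position p itself, and the window sums are obtained with prefix sums rather than a convolution (both yield the same sliding-window counts). The function name is hypothetical.

```python
def differential_density(mask, w):
    """Differential density dd_p = r_p / w - l_p / w for a circular 0/1 sequence.

    mask: list of 0/1 values (1 = position covered by a similarity hit).
    w:    window size in nucleotides.
    Assumption: the left window covers the w positions before p and the right
    window covers the w positions after p, wrapping around the sequence ends.
    """
    n = len(mask)
    if w < 1 or 2 * w + 1 > n:
        raise ValueError("window must satisfy 1 <= w and 2 * w + 1 <= len(mask)")
    # Pad with w positions from each end so that windows can wrap around.
    extended = mask[-w:] + mask + mask[:w]
    prefix = [0]
    for value in extended:
        prefix.append(prefix[-1] + value)
    dd = []
    for p in range(n):
        q = p + w                                    # index of p in `extended`
        left = prefix[q] - prefix[q - w]             # hits in the left window
        right = prefix[q + w + 1] - prefix[q + 1]    # hits in the right window
        dd.append((right - left) / w)
    return dd
```

With `mask = [0] * 5 + [1] * 5` and `w = 2`, the position just before the block of ones receives dd = 1, and the last position of the block (whose right window wraps to the zeros at the start) receives dd = -1.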
The agglomeration algorithm was implemented as a recursive function that splits the sequence at the positions with local maximum absolute values of dd. Positions of local maximum dd or local minimum dd should satisfy two criteria before being selected as segmenting points: (1) they should exceed a certain threshold (in absolute value), and (2) they should lie at a transition from one character to the other (0 to the left and 1 to the right, or 1 to the left and 0 to the right). Segmenting at a selected position produces two sub-sequences that have the same structure as the original sequence; the two daughter sequences are fed to the segmentation recursively until the termination condition is met. The segmentation terminates when the maximum dd is below the threshold.
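A minimal sketch of such a recursive segmentation is given below. It is an illustration under several assumptions that go beyond the text: dd is evaluated at the boundary between two adjacent positions, windows are truncated at the ends of each sub-sequence, ties are resolved in favor of the leftmost boundary, and a single fixed threshold is used. The function names are hypothetical, and the sketch returns only the breakpoints, not the final segment labels.

```python
def boundary_dd(mask, b, w):
    """dd at the boundary between mask[b - 1] and mask[b] (truncated windows)."""
    left = sum(mask[max(0, b - w):b])
    right = sum(mask[b:b + w])
    return (right - left) / w


def split_points(mask, w, threshold, offset=0):
    """Recursively split a 0/1 sequence; return sorted breakpoint indices.

    A boundary qualifies when (1) |dd| >= threshold and (2) it separates
    a 0 from a 1. The sequence is split at the qualifying boundary with the
    largest |dd|, and both daughter sequences are processed the same way.
    """
    best_b, best_score = None, 0.0
    for b in range(1, len(mask)):
        if mask[b - 1] == mask[b]:
            continue                      # criterion 2: must be a 0/1 transition
        score = abs(boundary_dd(mask, b, w))
        if score >= threshold and score > best_score:
            best_b, best_score = b, score
    if best_b is None:
        return []                         # termination: nothing above threshold
    left_points = split_points(mask[:best_b], w, threshold, offset)
    right_points = split_points(mask[best_b:], w, threshold, offset + best_b)
    return left_points + [offset + best_b] + right_points
```

For `mask = [0] * 5 + [1] * 5 + [0] * 5 + [1]`, `w = 3`, and a threshold of 0.9, the sketch returns the breakpoints 5 and 10, which delimit the dense block of ones, while the isolated final 1 does not pass the threshold. Because every split yields strictly shorter daughter sequences, the recursion always terminates; for very long replicons an iterative formulation may be preferable to avoid deep recursion.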
In order to estimate a dd threshold, we simulated multiple binary sequences with similar characteristics (length and density) as the segment resulting from empirical data. We calculated the random maximum dd for each simulation; thereafter, we defined a threshold using a significance level of α = 0.01. For calculating the dd vectors, we have used a convolution function. We defined the convolution [...] segments in the paired plasmid (P < 0.001, using Wilcoxon test), with an average ratio of 1.66 chromosome to plasmid segments in all pairs. Furthermore, the total segment length was significantly larger for chromosomes in comparison to their paired plasmid (P < 0.001, using Wilcoxon test). Sequence similarity hits may be found in close proximity on one replicon but scattered far apart on the other replicon; this pattern corresponds to random evidence for sequence similarity that falls closely on one of the replicons by chance alone. Co-linear hits found in close proximity in the plasmid will be joined into one plasmid segment by our agglomeration approach. However, if the corresponding sequence similarity hits in the chromosome are scattered (i.e., do not form a chromosomal segment), no intersection will be formed. Indeed, sequence similarity hits included in intersection-less segments were shorter and found with a higher E-value compared with hits in segments having a matched intersection in the chromosome (Figures S1D and S1E). Consequently, we considered intersection-less segments as spurious segments and excluded those from the data; similarly, intersections corresponding to short segments (<100 bp) were excluded from further analysis.
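The simulation-based estimate of the dd threshold described at the beginning of this section can be sketched as follows. The sketch assumes that random sequences are drawn as independent Bernoulli trials with the given density, that the maximum |dd| is taken over positions with complete windows on both sides, and that the threshold is the empirical (1 - α) quantile of the simulated maxima. The function names, the number of simulations, and the seed are hypothetical.

```python
import math
import random


def max_abs_dd(mask, w):
    """Largest |dd| over positions that have w neighbors on each side."""
    if len(mask) < 2 * w + 1:
        raise ValueError("sequence must contain at least 2 * w + 1 positions")
    left = sum(mask[:w])                  # window left of position p = w
    right = sum(mask[w + 1:2 * w + 1])    # window right of position p = w
    best = abs(right - left) / w
    for p in range(w + 1, len(mask) - w):
        # Slide both windows one position to the right.
        left += mask[p - 1] - mask[p - 1 - w]
        right += mask[p + w] - mask[p]
        best = max(best, abs(right - left) / w)
    return best


def simulate_dd_threshold(length, density, w, n_sim=1000, alpha=0.01, seed=0):
    """Empirical (1 - alpha) quantile of max |dd| in random 0/1 sequences."""
    rng = random.Random(seed)
    maxima = []
    for _ in range(n_sim):
        mask = [1 if rng.random() < density else 0 for _ in range(length)]
        maxima.append(max_abs_dd(mask, w))
    maxima.sort()
    index = min(n_sim - 1, max(0, math.ceil((1 - alpha) * n_sim) - 1))
    return maxima[index]
```

A call such as `simulate_dd_threshold(200, 0.3, 10)` returns a value in (0, 1]; dd values above it would be unlikely to arise from a homogeneous random distribution of hits with the same length and density.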
By gridding the 2D dot plot of hits according to the alternating regions of segments and non-segments for each replicon, where segments of the x axis replicon determine columns and segments of the y axis replicon determine rows, we acquire areas from the dot plot that represent intersections for every pair of segments belonging to different replicons. This will yield $i$ intersections between the two replicons, where theoretically

$$i = S_c \times S_p$$

$S_c$ is the number of chromosomal segments, and $S_p$ is the number of plasmid segments.
A set of hits that belongs to the same segment on the chromosome and the same segment on the plasmid falls into the intersection of those two segments. Those hits potentially belong to one event of transfer, or alternatively, a duplication or a rearrangement of the transferred region. Each intersection is defined by four coordinates (a beginning and an end on each replicon) and has five characteristics: two dimensions coming from the size of the chromosomal segment on the x axis and the plasmid segment on the y axis, a hit density on the chromosome, a hit density on the plasmid, and the number of hits that it contains. Theoretically, the number of all possible intersections between chromosomal and plasmid segments in our dataset is 11,572,139. However, when accounting for intersections, we eliminate spurious intersections by considering only intersections that contain at least one hit. Thus, the actual number of non-empty intersections is 304,003.
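The assignment of hits to intersections, and the restriction to non-empty intersections, can be illustrated with the following sketch. It assumes that segments on each replicon are sorted, non-overlapping (start, end) pairs and that a hit is assigned to the segment containing its midpoint; the midpoint rule and all names are hypothetical simplifications.

```python
from bisect import bisect_right
from collections import defaultdict


def find_segment(segments, position):
    """Index of the segment containing `position`, or None if there is none.

    segments: sorted, non-overlapping (start, end) pairs, inclusive.
    """
    starts = [start for start, _ in segments]
    i = bisect_right(starts, position) - 1
    if i >= 0 and segments[i][0] <= position <= segments[i][1]:
        return i
    return None


def non_empty_intersections(chrom_segments, plasmid_segments, hits):
    """Group hits by (chromosomal segment, plasmid segment) grid cell.

    hits: iterable of (c_start, c_end, p_start, p_end) coordinates.
    Only cells that receive at least one hit appear in the result.
    """
    cells = defaultdict(list)
    for hit in hits:
        c_mid = (hit[0] + hit[1]) // 2
        p_mid = (hit[2] + hit[3]) // 2
        ci = find_segment(chrom_segments, c_mid)
        pi = find_segment(plasmid_segments, p_mid)
        if ci is not None and pi is not None:
            cells[(ci, pi)].append(hit)
    return dict(cells)
```

The theoretical number of intersections for one pair is `len(chrom_segments) * len(plasmid_segments)`, whereas `len(non_empty_intersections(...))` counts only those grid cells that contain at least one hit.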
Segments on one replicon might not form any full intersection with any segment on the other replicon. This means that while hits within such a segment cluster together on its own replicon, they are sparse on the other replicon and fail to create any segment on it. Excluding all segments that did not form an intersection reduced the number of plasmid segments to 45,399 (71% of the total plasmid segments) and the number of chromosomal segments to 99,289 (91.9%) in 2,116 plasmid-chromosome pairs (93% of pairs with segments). Some intersections that contain hits might still be partially or mostly empty on either one or both of the replicons. This observation can be explained by genomic rearrangements on the replicon where the intersection is not full, as the segments are formed by hits falling together on one of the replicons that could be far from each other on the other replicon. An intersection has two fullness measures, one on the plasmid segment and one on the chromosome segment. In Figure 5A, the three purple boxes illustrate three intersections that are full on the chromosome (fullness = 1) and only partially full on the plasmid (0 < fullness < 1). This pattern reflects a biological scenario of a DNA transfer from the plasmid to the chromosome followed by rearrangements on the chromosome, as it is more parsimonious than the alternative scenario in which multiple regions from the chromosome were transferred separately to the plasmid and ended up in a continuous DNA segment. The green pattern in Figure 1D shows multiple intersections with fullness = 1 on both replicons; those intersections belong to the same plasmid segment but to different chromosomal segments. Here there is no obvious parsimonious scenario, as the transfer could have happened in one event in either direction or in multiple events from the plasmid to the chromosome, and duplication events might have preceded or followed the transfer if it happened from the chromosome to the plasmid.
Analyzing the pattern of sequence similarity hits within the intersections, we realized that the intersection of segments on both replicons may generate false intersections created by small hits that fall in the shadow of other, well-supported intersections (see Figure S4). Consequently, we opted to perform a downstream filtration of short segments. One option was to filter the intersections by their fullness (that is, how much of their space is covered by sequence similarity hits); however, there was no clear threshold that we could identify for that criterion. Alternatively, we found that filtering by the plasmid segment length was useful in eliminating most of the putatively false intersections, with a small effect on other general patterns of shared sequence similarity between plasmids and chromosomes (Figure S5). Consequently, we opted for applying a threshold on the plasmid segment length where all segments <100 bp are excluded prior to the intersection procedure.
Visualization of 2D plots was performed using an in-house MATLAB script. A set of hits that belongs to the same segment on the chromosome and the same segment on the plasmid falls into the intersect of those two segments. Those hits either belong to one event of transfer, or alternatively, a duplication or a rearrangement of the transferred region. Each intersect is defined by four coordinates (a beginning and an end on each replicon) and has five characteristics: two dimensions coming from the size of the chromosomal segment on the x axis and the plasmid segment on the y axis, a hit density on the chromosome, a hit density on the plasmid, and the number of hits that it contains. The density by hits for intersects is calculated by dividing the cumulative size of all hits' projections on one replicon by the size of the segment. Our analysis revealed 304,003 intersects in 1,974 chromosome-plasmid pairs.
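The hit density of an intersect on one replicon can be sketched as follows. The sketch assumes that the "cumulative size" of the projections counts overlapping projections only once (i.e., the union of the projected intervals, clipped to the segment), so that the density stays between 0 and 1; this interpretation and the function names are assumptions.

```python
def covered_length(intervals):
    """Number of positions covered by the union of inclusive intervals."""
    total, cur_start, cur_end = 0, None, None
    for start, end in sorted((min(a, b), max(a, b)) for a, b in intervals):
        if cur_end is None or start > cur_end + 1:
            # Close the previous run and open a new one.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)   # extend the current run
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def hit_density(segment, projections):
    """Fraction of a segment covered by the hits' projections on one replicon.

    segment:     (start, end), inclusive.
    projections: (start, end) of each hit on the same replicon.
    """
    seg_start, seg_end = segment
    clipped = []
    for a, b in projections:
        start, end = max(seg_start, min(a, b)), min(seg_end, max(a, b))
        if start <= end:                  # keep only the part inside the segment
            clipped.append((start, end))
    return covered_length(clipped) / (seg_end - seg_start + 1)
```

For a segment spanning positions 101 to 200 and projections (101, 150), (141, 160), and (191, 250), the covered positions are 101 to 160 and 191 to 200, giving a density of 0.7. Computing the value once per replicon yields the two density (fullness) measures of an intersect.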

Classification into functional categories and detection of ARGs
Classification of genes into functional categories was performed with eggNOG-mapper v2 (web version), 78 which is based on the eggNOG 5.0 database. 85 Antibiotic resistance genes were identified with the Comprehensive Antibiotic Resistance Database (CARD) and the Resistance Gene Identifier (RGI) tool. 79

Statistical tests
Data visualization and statistical tests were performed with MATLAB and R. P-values below the function threshold (eps = $2.2 \times 10^{-16}$) are reported as P < 0.001.

Figure 1. Detection of homologous genomic regions between plasmids and chromosomes. For a Figure360 author presentation of this figure, see https://doi.org/10.1016/j.cub.2024.06.030. (A) Distribution of the proportion of plasmid genome sequence shared with the chromosome (Table S1). The inlay shows a higher resolution of the data for 0-0.4 shared plasmid fraction for the most frequent genera. (B) 2D dot plot presentation of sequence similarity between plasmid lp28-1 (y axis) and the chromosome (x axis) in Borrelia burgdorferi JD1 (GenBank: GCA_000166655.2). Colored strips next to the axes show loci of sequence similarity to the paired replicon as identified with BLAST or MUMmer, segments resulting from the agglomeration algorithm, and annotated genes. Sequence similarity between plasmid and chromosome is shown with red and yellow lines that correspond to the BLAST and MUMmer hits. The inlay shows a zoom into an intersection of sequence similarity in both plasmid and chromosome. The plasmid segment overlaps with a plasmid-encoded restriction enzyme and three pseudogenes (one of which is of a methyltransferase). (C) An illustration of hypothetical scenarios for DNA transfer between plasmid (green) and chromosome (blue) followed by duplications of the transferred DNA. Similarity detection: a schematic representation of detected local sequence similarity between the plasmid and chromosome. Red, gray, orange, and brown fragments depict local similarity hits in co-linear blocks. Hit agglomeration: co-linear sequence similarity hits are joined to form segments of shared sequence similarity between plasmid and chromosome. A pairwise presentation of the segments is possible in 2D space (as in B).

Figure 2. Intersections of plasmid and chromosome segments reveal putative gene transfer between plasmids and chromosomes. Two demonstrative examples of detected intersections of plasmid and chromosome segments for (A) plasmid pTEF1 in E. faecalis V583 (GenBank: GCA_000007785.1) and (B) a mega-plasmid in P. putida S12 (GenBank: GCA_000495455.2). Axes in the 2D presentation correspond to chromosome (x axis) and plasmid (y axis) genome sequences. Colored strips next to the axes show loci of sequence similarity to the paired replicon as identified with BLAST or MUMmer, segments resulting from the agglomeration algorithm, and annotated genes. Intersections of plasmid and chromosomes are shown with red and yellow lines that correspond to the BLAST and MUMmer hits. Illustrations of plasmid maps show annotated genes in the plasmid genome and segments identified in the plasmid genome with our approach.

Figure 3. Sequence characteristics and protein-coding genes in plasmid segments. Properties of plasmid segments that correspond to intersects with chromosomal segments are presented. The plasmid segment length had a median of 381 bp and a maximum of 116,763 bp. Here, we classify plasmids into three arbitrary size categories: small plasmids < 100 kb, medium plasmids between 100 and 500 kb, and large plasmids > 500 kb. (A) The distribution of plasmid segment length shown in three categories of plasmid size (see Figure S1 for additional details; segment data are available in Table S2). (B) Distribution of the proportion of plasmid genome that corresponds to plasmid segments according to three categories of sequence similarity between plasmid and chromosome. The inlay figure shows the plasmid fraction range between 0.1 and 1. The segment sequence similarity ranged between 80% and 100% identical nucleotides with a median of 91.8% (but we note the high sequence similarity threshold used in the initial screening). (C) A comparison of segment sequence and annotation of protein-coding sequences (CDSs). The fraction of segments and their proportion from the plasmid and chromosome genome are shown according to four categories. (D) Frequency of CDSs in plasmid segments according to their functional annotation (see example in Figure S2; data are available in Tables S3 and S4).

Figure 4. Co-transferred genes in plasmid segments are functionally related. (A) A heatmap showing the frequency of paired combinations of gene functions that appear in plasmid segments that comprise multiple CDSs. The matrix axes correspond to gene function. Each cell in the matrix shows the frequency of paired gene functions found in plasmid segments (see color legend on top). (B and C) Examples of plasmid segments that reveal putative transfer of multiple genes of similar functional category. The average sequence similarity of hits comprising these segments was 92% identical nucleotides in (B) and 94% identical nucleotides in (C). No IS elements or transposons could be identified in the flanking regions of these putative transfer events. See Figure 2 for elements in the 2D presentations of plasmid-chromosome shared sequence similarity.

Figure 5. Intersection shape supplies clues for the transfer evolutionary history. (A) A two-dimensional representation of DNA similarity between paired plasmid and chromosome (as in a dot plot). Intersections of plasmid and chromosome segments correspond to shared sequence similarity and their pattern reflects different biological scenarios. Those are detailed in the legend where P stands for plasmid and C for chromosome. (B) Intersection shape of plasmid segments. (C) Copy number of plasmid segments in the host chromosome. (D) A 2D view of intersects in P. salmonis strain PM32597B1 plasmid pPSB1-2 and the chromosome. (E) 2D view of sequence similarity of phage-plasmid FCATYP2070I in C. atypicum. (F) 2D sequence similarity between plasmid pVR-MSSA_02 and the chromosome in S. aureus.
