Incompatibility groups of Pseudomonas plasmids revisited: comprehensive analysis of R-factors and their replicons

Bacterial plasmid incompatibility (Inc) groups, IncP-1 to IncP-14, have long been recognised as R-factors. These factors harbour antimicrobial resistance genes (ARGs) in Pseudomonas species (PIncs). Despite their importance, some PIncs plasmid remain underexplored at the sequence level. This study attempts to address this gap by determining the complete nucleotide sequences of several key plasmids. Using BLAST search and AlphaFold3-based protein structure prediction, we identified replication initiation proteins (RIPs) and origins of vegetative replication (oriV) for these plasmids. Notably, mini-plasmids containing these RIP and oriV sequences successfully replicated in Pseudomonas aeruginosa. Through pangenome analyses using the PLSDB database with the experimentally identified RIP gene reference, 2,351/59,895 plasmids were classified, and their core and accessory genes, including ARGs, were detected. Furthermore, phylogenetic analyses of the RIPs with publicly available protein sequences enabled the grouping of existing PIncs into distinct evolutionary lineages, providing a robust framework for identifying RIPs across a broader range of previously uncharacterised plasmids. The RIPs of 8,860 plasmids in the PLSDB were newly assigned. These findings offer crucial insights into the complex landscape of plasmid-mediated ARG dissemination, marking a significant advancement in the understanding of Pseudomonas plasmids across clinical and environmental contexts.


INTRODUCTION
Plasmids are self-replicating mobile genetic elements (MGEs) that play a key role in facilitating the exchange of genetic information among bacterial cells through conjugation or transformation.Plasmids, with their diverse functions, have emerged as the primary MGEs responsible for disseminating antimicrobial resistance genes (ARGs) among bacterial populations, which are classically known as R-factors and represent a threat to global public health and antimicrobial development (1).The accurate classification of plasmids is therefore indispensable for gaining valuable insights into the epidemiology of plasmid-mediated antimicrobial resistance, which would allow us to identify specific plasmids carrying ARGs.This precision of classification is paramount for conducting genomic epidemiological studies (2,3).Historically, a primary method for classifying plasmids has been based on the concept of incompatibility (Inc) grouping, which relies on the phenotypic interactions between host cells harbouring two different plasmids (4).Plasmids that cannot co-reside within the same host cell line belong to the same Inc group, indicating that their replication or maintenance systems are similar.
ESKAPE pathogens, which include Enterococcus faecium, Staphylococcus aureus, Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa, and Enterobacter spp., have acquired clinically significant ARGs via plasmids and are recognised as key multidrug-resistant bacterial species, and novel antimicrobial agents must be developed to combat these species (5).The genus Pseudomonas, including P. aeruginosa, has been extensively studied because of its diverse nature, widespread presence, and clinical significance as a causative agent of various infections (6).Since the 1970s, a series of Inc groups (from IncP-1 to IncP-14) have been proposed based on incompatibility tests conducted on the genus Pseudomonas (Table 1) (7).In parallel, within the order Enterobacterales, which includes Escherichia coli, Klebsiella species, Enterobacter spp., and related bacteria, Inc groups ranging from IncA to IncZ were defined based on their host organisms (8).Notably, the same plasmid can be classified into different Inc groups depending on the bacterial host, with examples such as IncP-1 being equivalent to IncP, IncP-3 corresponding to either IncA or IncC, IncP-4 being the same as IncQ, and IncP-6 aligning with IncG.Based on these classifications, extensive studies have been conducted on the fundamental characteristics of these plasmids, including their replication, maintenance systems, and transmissibility (Table 1) (9)(10)(11)(12).
However, the terminology surrounding these classifications can be confusing.The IncP-1 to IncP-14 groups, commonly referred to as "Inc groups in Pseudomonas," are often mistakenly labelled as "IncP" or "IncP groups".Notably, "IncP" specifically refers to a single group within the classification of plasmids in Enterobacterales, corresponding only to the IncP-1 group in Pseudomonas.To avoid confusion, we refer to the Inc groups in Pseudomonas as "PIncs."Recent advancements in sequence technologies have enabled the complete sequencing of several PIncs plasmids, providing detailed insights into their genetic makeup (13)(14)(15)(16).Nevertheless, significant gaps exist in our understanding of certain PIncs groups, including IncP-5, IncP-11, IncP-12, IncP-13, or IncP-14, and only a partial DNA region is deposited for IncP-10 (Table 1).This is likely due to the narrow host range of some of these plasmids, which cannot be transmitted to E. coli from their original host P. aeruginosa via conjugation (17).
In this study, we present the complete nucleotide sequences of five R plasmids, i.e., Rms139 (IncP-2) (18), Rms163 (IncP-5) (19,20), Rsu2 (IncP-9), RP1-1 (IncP-11) (21), R716 (IncP-12) (22), and pMG26 (IncP-13) (23,24).Additionally, through in silico analyses, we identified a putative representative plasmid for IncP-10, similar to the archetype IncP-10 plasmid, R91-5 (25,26).After identifying their replication-initiation protein (RIP) genes, we reclassified these plasmids in a publicly accessible database (27) and proposed the introduction of two new plasmid groups in Pseudomonas: IncP-15 and IncP-16.Furthermore, several other unclassified plasmids in the genus Pseudomonas, which harbour clinically important ARGs, such as genes conferring resistance to carbapenem and tigecycline, were examined.Two additional groups of plasmids, pQBR103-like and pSTY-like plasmids, were further discussed as unassigned PIncs groups.The elucidation of plasmid diversity and classification within PIncs is vital for deepening our understanding of genetic exchange mechanisms and the dissemination of ARGs.We aim to address the existing gaps in plasmid classification by proposing updated identification schemes of RIPs.Using RIP data elucidated in this study, we could accurately classify 2,351 plasmids within PLSDB, one of the plasmid databases containing 59,985 plasmids with their whole nucleotide sequences (27,28), of which 27,126 have not yet been classified.The identified RIP information with previously known information was used to infer phylogenetic trees of putative RIPs from publicly available protein sequences (UniParc).Our analysis revealed the following: (i) except IncP-6, all other plasmids possess a winged-helix (WH) domain (named WH RIPs), exhibiting homology with each other, indicating a shared evolutionary origin.Based on reconstructed phylogeny, we propose a bifurcated evolutionary scenario of WH RIPs; one lineage that includes IncP-5, -9, -10, and -11, and another distinct lineage.(ii) IncP-6, in contrast, belongs to the superfamily of archaeo-eukaryotic primases (AEPs) and represents a distinct evolutionary lineage.

DNA manipulation
Total DNA from the bacterial strains was extracted using a NucleoSpin Tissue Kit (TAKARA BIO, Japan).Polymerase chain reaction (PCR) was performed on a T100 thermal cycler (Bio-Rad Laboratories, USA) using the KOD One PCR Master Mix (TOYOBO, Japan).The primers used in this study are listed in Table S1.The following amplification conditions were used: 30 cycles at 98°C for 10 s, 67°C for 5 s, and 68°C for 1 s.The pUC-GW plasmid with DNA fragments including putative RIP genes and oriVs of each plasmid, except those for RP1-1 and pMG26, was synthesised by the custom service of Azenta Life Science, USA.Restriction enzymes (New England Biolabs, USA or TAKARA BIO), the HiYieldTM Gel/PCR DNA fragments Extraction kit (RBC Bioscience, Taiwan), NEBuilder Hifi DNA Assembly system (New England Biolabs), and competent E. coli JM109 and DH5α cells (RBC Bioscience) were used for cloning the DNA fragments.Plasmid DNA from the bacterial strains was extracted using the NucleoSpin Plasmid EasyPure Kit (TAKARA BIO).Electroporation to introduce the plasmids into different bacterial strains was performed using a MicroPulser electroporator (Bio-Rad Laboratories).All other procedures were performed according to standard methods (29).

Conjugation assays
Filter mating assays were performed as described previously (30).In brief, donor and recipient strains, P. aeruginosa PAO1808 with a plasmid and P. aeruginosa PAO1R or P. putida KT2440, were cultured in LB with the appropriate antibiotic(s) (preculture).One millilitre of the above culture in LB medium was harvested and suspended in fresh LB medium.Approximately 10 8 colony forming units/mL (CFU/mL) of the donor and recipient suspended in 150 µL LB were mixed and then dropped on 0.2 µm pore-size filters (cellulose acetate, Advantec) on an LB agar plate for 24 h at 37°C.The mixture on the filter was resuspended in PBS and then spread onto LB plates with appropriate antimicrobials to select transconjugants (i.e., recipient cells with the plasmid).Liquid mating assays were performed as follows.The precultured donor cells were resuspended in 500 µL LB, mixed with 1 mL recipient cells for 2 h (200 rpm), and statically incubated for 3 h at 37°C.The cells were then harvested, resuspended in PBS, and spread onto LB plates with the appropriate antibiotics to select the transconjugants.

Complete sequencing of the plasmids by next-generation sequencing
The nucleotide sequences of the plasmid DNAs were determined as follows: whole genomic DNA extracted from their host (P.aeruginosa PAO1808, a derivative strain of PAO1 (31)) was sequenced using the HiSeq2500 platform (Illumina, USA).Trimmed high-quality short reads (read length >140 bp and quality score >15) were assembled using SPAdes v3.14.0 using the plasmid option (32).If circular contig(s) could not be found, the host chromosomal DNA was removed by mapping the resultant contigs on the host genome sequences [their deposited sequences, that is, DDBJ/GenBank accession number NC_002516 (P.aeruginosa PAO1)] using Geneious Prime software v2020.01(33).Subsequently, the plasmid reads were extracted from the remaining contigs using SAMtools v1.7 (34) and SeqKit v0.8.0 (35) and reassembled using SPAdes (36).For RP1-1 and pMG26, gaps in the plasmids were closed by PCR with the designed primer sets based on the ends of contigs and Sanger sequencing of the PCR products.

Prediction of RIP genes and oriVs of each plasmid
To identify the genes encoding the RIP and origin of vegetative replication (oriV) regions of Rms139, Rms163, RP1-1, and R716, two prediction approaches were used; after coding sequence (CDS) annotations using DFAST-core, protein.faafiles generated by DFAST-core were used for the following: (i) nucleotide sequence-based searches were performed using BLASTn; (ii) amino acid sequence-based protein structure prediction and the structure-based searches were performed as described later.The GC-skew value, which has been commonly used to measure the asymmetry in the composition of two nucleotides, guanine (G) and cytosine (C), along a DNA strand and is defined as (G-C)/(G+C), of the plasmids was used by Webskew (https://genskew.csb.univie.ac.at/webskew) to detect the changes in the GC skew, which indicates the DNA replication origins or termini (42,43).Subsequently, the nucleotide or amino acid sequences of putative CDSs around the above GC-skew shift point(s) were subjected to methods (i) or (ii).The putative promoter region of the RIP gene was predicted using BPROM (http://www.softberry.com/berry.phtml?topic=bprom&group=help&subgroup=gfindb) (44).Subsequently, the predicted RIP and oriV sequences of each PIncs were cloned into the pUC-GW-Amp vector (Azenta Life Science).Transformation of P. aeruginosa PAO1808 was performed with each of the resultant plasmids (at most 472 ng/μL), and the cultures were spread onto LB+Pip plates.After incubation at 37ºC overnight, colonies on the plates were isolated.Next, the presence of the mini-plasmid was confirmed by PCR with specific primers after extraction of total DNA of transformants.

Protein structure prediction of RIPs
AlphaFold3-mediated protein structure prediction was performed using the amino acid sequences of the RIP gene products in representative Pseudomonas plasmids (https://figshare.com/s/9554d4fc982d7929fb77) on the AlphaFold server (45).A structurebased protein homology search using the predicted structural models was performed for the PDB100 database on the FoldSeek server in 3Di/AA mode (46).The predicted and experimental structures were visualised and compared using PyMOL v3.0.3.The predicted structures were coloured based on the root mean square deviation (RMSD) from the reference experimental structures, i.e., the Pi or Rep19A proteins, which are RIPs of the R6K and pCA-347 plasmids, respectively (47,48), using the ColorByRMSD script (https://pymolwiki.org/index.php/ColorByRMSD).The distances between aligned C-alpha atom pairs were stored as B-factors of these residues, which are coloured using a colour spectrum, with blue specifying the minimum pairwise RMSD and red indicating the maximum.Unaligned residues were coloured grey.
For the final dataset, the sequence length (bp) and GC content (%) were calculated using SeqKit v2.3.0 (https://bioinf.shenwei.me/seqkit/)(35).Plasmid sequences belonging to each PIncs group were re-annotated using DFAST-core, with a reference genome from each group to ensure consistency in the functional annotations.GFF3 annotation files produced by DFASTcore were subsequently used as input files for PIRATE, a toolbox for pangenome analysis (51).

Pangenome analysis of plasmids
A pangenome, i.e., the entire set of protein genes from all members of a defined group (all the plasmids of the PIncs group), was analysed using PIRATE (https://github.com/SionBayliss/PIRATE).Here, we used DIAMOND v0.9.14 (52) rather than BLAST (53) to identify homologous genes (protein families) in PIRATE.PIRATE was run on default parameters with a high-scoring segment pair (HSP) query length threshold of 0.5, a sequence similarity step range of 10,20,30,40,50,60,70,80,90,95, and 98%, and an MCL v14.137 inflation value of 2. A locally optimal alignment with no gaps is called a HSP in BLAST (54).MCL is a Markov cluster algorithm (https://micans.org/mcl/).The core genes in each PIncs group were those that were present in more than 95% of the plasmids included in the pangenome analysis.Statistical computing and result visualisation were performed using R v4.1.3(https://www.r-project.org/).The output of SeqKit and PIRATE are available in Figshare (https://figshare.com/s/0f2aa75d1eb4c5848a32).

Phylogenetic analysis of the RIPs
Detailed methods are provided in Supplementary Text S1 and illustrated in Figure S1.In brief, to analyse the evolutionary relationships of the RIPs of PIncs, we identified 2,374 RIPs that originated from 2,332 PInc plasmids.Subsequently, 251 non-redundant proteins at 100% identity were extracted using MMseqs cluster v15-6f452 (55).Sequence homology between the non-redundant proteins was assessed using BLASTp (53), and the resultant hits were visualised via CPAP (https://github.com/yosuken/CPAP).To detect remote homology between PIncs, we extended the sequence diversity by collecting homologous sequences through a twostep homology search process.In the first step, PLSDB proteins predicted with orfM (n = 20,331,985) (56) were searched using hmmsearch of HMMER v3.3.2 (57).In the second step, UniParc proteins (downloaded on Feb 2024; n = 607,912,929) were searched using hmmsearch.We obtained 6,948 non-redundant sequences (at 60% identity) that were classified into 11 homologous groups.To investigate whether these groups exhibit homology with each other, a highly sensitive hidden Markov model (HMM)-HMM comparison was performed using HHblits v3.3.0 (58).We extracted the homologous regions of the alignments of the 10 groups (containing 6,306 sequences in total).The secondary structures of the homologous regions were predicted and visualised based on the crystal structures (PDB: 2NRA, 4PT7) and alphafold3 predictions using iCn3D (https://www.ncbi.nlm.nih.gov/Structure/icn3d/)(59).After removing the gappy sequences, we realigned the remaining 6,265 sequences using mafft v7.455 (60).

Overview of plasmid sequences
The genomes of P. aeruginosa PAO1808, each harbouring one of the six plasmids (Rms139, Rms163, Rsu2, RP1-1, R716, and pMG26), were sequenced.Thus, the complete nucleotide sequences for each R factor were successfully obtained (Table 1).Notably, although previous studies have mostly focussed on IncP-9 plasmids, we also determined the complete nucleotide sequence of an ancestral R plasmid from the IncP-9 group, known as Rsu2 or R2 in certain literature, which was originally isolated at Shinshu University in Japan (67,68).Regarding pMG26, no circular contigs were identified following the removal of reads mapped to the reference genome, the chromosome of P. aeruginosa PAO1.Instead, a unique 99-kb DNA region was identified within the chromosome of its host, PAO1808, which distinguished it from other plasmid hosts.

Identification of the RIP gene in each R plasmid
RIPs of the representative plasmids in each PInc group were identified by nucleotide sequencebased search and/or structure-based search (see Materials and Methods).
Initially, no RIP gene was detected in the IncP-5 plasmid Rms163 through nucleotide sequencebased searches.Subsequently, a GC-skew plot of Rms163 was generated (Figure S3), and three nearby CDSs were selected for three-dimensional structure prediction.One CDS showed structural similarity to the replication initiator A family proteins, specifically to Rep19A of pCA-347 with RMSD = 4.2 (Figure 2B).Downstream of this gene, a putative oriV region was identified with characteristic repeats, palindromic sequences, a putative DnaA box, and an A+T-rich region (Figure 1).Transformation of P. aeruginosa PAO1 with a mini-Rms163A construct containing the RIP gene candidate and oriV resulted in a transformation efficiency of 4.4 x 10¹ CFU/μg-DNA.This candidate gene was subsequently named repP-5A.
Although the complete sequence of the plasmid previously identified as IncP-10 is not available in public databases, a 2.7-kb DNA sequence from the archetype IncP-10 plasmid R91-5 (25) was used as the query.This sequence was previously identified as the replication region of R91-5 through incompatibility testing (26).We identified 45 plasmids in the PLSDB with regions similar to this 2.7-kb fragment, including an unnamed plasmid from the P. aeruginosa strain PAB546 (GenBank accession number NZ_MN433456.1),which we tentatively named pPAB546.While no RIP gene was initially annotated in this plasmid, a manual search similar to that used for IncP-5 led to the identification of a putative RIP similar to Rep19A (RMSD = 5.4) (Figure 2B), encoded by a gene near the conserved 2.7-kb region (Figure 1).The transformation of mini-pPAB546 into P. aeruginosa yielded a frequency of 1.7 × 10² CFU/μg-DNA.
The IncP-11 plasmid RP1-1 possessed a putative CDS classified as rep_cluster_2025 by the MOB-suite package (49), located at nucleotides 54,711-54,953 of the RP1-1 sequence (GenBank accession number LC700336).A nearby oriV region featured four 12-bp repeats and an A+T-rich region, although no DnaA box was identified (Figure 1).Initial transformation attempts with a pUC-GW-AMP construct harbouring these regions failed to yield transformants (<8.0 × 10¹ CFU/μg-DNA).However, a downstream CDS, showing 61% identity with the replication initiator protein RepA from a P. stutzeri plasmid-like contig, was cloned along with the oriV region.The constructed mini-RP1-1A successfully transformed P. aeruginosa at a frequency of 1.3 × 10² CFU/μg-DNA (Figure 1), leading to the designation of the CDS as repP-11A.

pMG26 (IncP-13) is not a plasmid but an integrative and conjugative element (ICE)
The plasmid pMG26, previously identified as a multidrug resistance plasmid and classified as IncP-13 (17), did not yield a circular contig during assembly, even after removing reads mapping to the PAO1808 chromosome.Instead, a 99-kb DNA region was detected within the chromosome of its host, PAO1808, which was not conserved in the hosts of other plasmids.This observation led to the hypothesis that pMG26 might function as an ICE, similar to the reclassification of FP8 as ICEPae1161 (69).
To investigate this hypothesis, PCR analyses were performed to determine whether an intermediate circular DNA form of the 99-kb DNA region could be generated in the host PAO1808.As shown in Figure S4, PCR products were obtained using three different primer sets, suggesting that the pMG26 could indeed form a circular intermediate within its host.Notably, 49-bp direct repeats (5'-ATGGTGGGTCGTGTAGGATTCGAACCTACGACCAATTGGTTAAAAGCCA-3') were identified at both ends of the integrated DNA region.The predicted circular form of pMG26 was 99,171 bp in length and was integrated into the tRNA Lys locus situated between the lepA and clpB region in the P. aeruginosa PAO1 chromosome (5,086,973 nt, NC_002516).
The 99-kb DNA region of pMG26 encoded a MOB H type relaxase and MPF G family type IV secretion system (T4SS) (Table 1).To test its conjugative transferability, both filter mating and liquid mating assays were performed between P. aeruginosa PAO1808 (donor) carrying pMG26 and P. putida KT2440 (recipient).The element was successfully transferred from the donor to the recipient strain, with transfer frequencies of 1.4-1.9× 10 -5 per donor (filter mating) and 0.28-3.4× 10 -5 (liquid mating) per donor (Table 3).These results strongly suggest that pMG26 functions as an ICE.
Additionally, PCR analysis revealed that pMG26 integrated at three different tRNA Lys sites on the KT2440 chromosome (PP_t09, PP_t11 and PP_t13).Notably, the entire DNA region of pMG26 exhibited a high degree of similarity with that of pKLC102 (70) (Figure S5), a member of the PAPI-1 family of ICEs, which are abundant in P. aeruginosa (70).Further analyses using MOB-suite indicated that rep_cluster_322 was present in both pMG26 and pKLC102.The putative oriV region of pKLC102 (70) was also identified in pMG26.
Despite these similarities, no transformants were detected when a mini-pMG26A containing the rep_cluster_322 CDS and putative oriV region of pMG26 was introduced into P. aeruginosa (below the detection limit, <2.0 x 10 1 CFU/μg-DNA).This result was not unexpected because the autonomous replication of ICE is typically mediated by rolling-circle replication involving their relaxases (71).
These findings suggest that pMG26, originally classified as an IncP-13 plasmid, functions as an ICE, with a potential role in the horizontal transfer of multidrug resistance genes in Pseudomonas spp.The high similarity with pKLC102 further supports this reclassification and underscores the significance of ICEs in the evolution and dissemination of resistance determinants.
However, no transconjugants of Rms163 were detected in the liquid mating assay (below the detection limit, <2.0 × 10 -7 per donor), suggesting that the conjugation system of Rms163 is specifically adjusted for transfer under solid conditions.This may reflect the differences in the environmental conditions or surface requirements for effective mating formation (72,73).
MOB-suite analysis revealed that the IncP-2 plasmid Rms139 carried the MPF I family T4SS and MOB P type relaxase gene (Table 1).In contrast, neither the previously known MOB type relaxase nor the MPF family T4SS components were identified in the IncP-5 plasmid Rms163 or IncP-12 plasmid R716.Instead, both Rms163 and R716 were found to harbour defect in organelle trafficking (Dot) or intracellular multiplication (Icm) systems, which are known as type IVB secretion systems (74).Similarly, the IncP-11 plasmid RP1-1 (GenBank accession number LC700336) lacked the previously known MOB type relaxase and MPF families but possessed putative genes for relaxase (34,618-35,376 nt), type IV coupling protein (32,822-30,702 nt), and proteins involved in mating pair formation.
These findings confirm that all the R-factors sequenced in this study are self-transmissible, although they utilise diverse mechanisms for conjugation.This diversity in conjugation systems highlights the varied strategies employed by plasmids to spread across bacterial populations, which has important implications for the dissemination of antibiotic resistance genes.

Reclassifications of plasmids in PLSDB based on RIPs of PIncs
Based on the nucleotide sequences of the identified RIP and oriV regions of the plasmids of IncP-1 to IncP-13, new Inc groups within the Pseudomonas genus could be proposed.In our previous studies, novel plasmid groups hosted by Pseudomonas, known as PromA group and pSN1216-29-like plasmids, were identified (75)(76)(77)(78).As the nucleotide sequences of their RIP gene and oriV region differed from those of IncP-1 to IncP-13, these new plasmid groups were designated as IncP-15 and IncP-16, respectively.
Several other important Pseudomonas-derived plasmids carrying ARGs have not yet been assigned to any Inc group.For instance, pQBR103 is a large 425-kb plasmid identified in Pseudomonas strains colonising the leaf and root surfaces of sugar beet plants (79,80).This plasmid contains two putative RIP genes located at 339,909-341,066 nt and 388,322-389,491 nt.Notably, the latter RIP gene exhibited low identity (46.2% at the nucleotide level) with repP-2A.Recently, similar large-size plasmids (>400 kb) have been identified, which carry the tmexCD-toprJ gene cluster, encoding a resistance-modulation-division (RND) family efflux pump (81), conferring resistance to multiple classes of antimicrobials including tigecycline.
Another plasmid of interest is pSTY, which was identified in the styrene degrading Pseudomonas strain VLB120 (82).Recently, plasmids similar to pSTY have been reported to carry carbapenem resistance genes, bla VIM (83).Their RIP genes and oriV regions were predicted, leading to their provisional classification as Inc pSTY , despite the absence of incompatibility testing (83).The identification and classification of such plasmids are crucial because they expand our understanding of the diversity and evolutionary trajectories of ARGbearing plasmids in Pseudomonas.
To identify and accurately classify similar plasmids within the publicly available PLSDB (2023_11_03_v2, containing 59,895 plasmids) into specific PIncs, we prepared custom-made RIP reference sequences (Table S2).These reference sequences included the newly identified RIP gene sequences of PIncs, excluding IncP-8 and IncP-13 (ICEs) and IncP-14 (not available at the time).Using these custom sequences, 2,357 plasmids were identified that could be classified into PIncs (Table 1, S4).This approach allowed for the classification of 530 plasmids that were not previously classified by the PlasmidFinder database and 1,514 plasmids that were not previously classified by the MOB-suite package database, resulting in a more comprehensive plasmid classification.Notably, our custom-made RIP reference corrected 109 misclassifications identified by the MOB-suite package database, underscoring the accuracy of our approach (https://figshare.com/s/915a336484ee084ffbea).

IncP-1 (=IncP) plasmids
In our previous study, 14 subgroups within IncP-1, denoted as IncP-1αβγδεζηθικλμορ, were proposed based on the incompatibility tests (78).In this study, we tentatively propose an additional subgroup, IncP/P-1π, which includes 41 plasmids (Table S4), including pEP5289.Although pEP5289 was previously identified as a member of the IncP-1 group found in the genus Neisseria (84), it has not been experimentally demonstrated whether this plasmid is incompatible with other IncP-1 plasmids.Furthermore, the host range of these plasmids is limited to the genus Neisseria, a feature that may distinguish them from other IncP-1 plasmids.A total of 25 core genes were identified across 321 plasmids in this group, the average numbers of which were 65.7 (Table 1, S4, S5-1).

IncP-2 plasmids
Plasmids belonging to the IncP-2 group exhibit significantly larger average DNA sizes of approximately 440 kb.Notably, 293 core genes were identified within this group (Table 1, S4, S5-2), a number significantly higher than that in the other PIncs group.This was around 53% of the average gene numbers (293/549.8) of 87 plasmids in IncP-2 (Table S4).As previously reported (85), the megaplasmid family known as pBT2436 and pRBL16 (14,15), associated with ARGs, has been recognised as part of the IncP-2 plasmids.Given their large size, specific gene sets in the plasmids are likely crucial for maintaining these plasmids within their hosts.Interestingly, one core gene product exhibited identity with the Tus protein (Table S5-2), known for binding to DNA replication terminus (ter) sites of replicons (86).This protein, encoded by rep_cluster_1115 in the MOB-suite database, was found not only in all IncP-2 plasmids but also in nine other large-sized plasmids (113-527 kb) and, therefore, was removed from our custom-made RIP reference sequences (https://figshare.com/s/3e3ced8633136259f678).

IncP-4 (=IncQ) plasmids
IncP-4 plasmids, also known as the IncQ group, exhibit a wide range of DNA sizes, from 1.7 to 412 kb, with an average of 121 kb (Table 1, Figure 3A).These plasmids are characterised by the presence of repBAC genes, which are essential for their single-strand displacement replication (12).However, only two core genes repA and repC (repP-4A) were consistently observed across this group, which was a rather small ratio for their average gene numbers, i.e., 1.5% of 134.1 (Table 1, S4, S5-4).This was possibly due to the presence of multi-replicons in the IncP-4 group, with only 29.1% plasmids containing a single replicon (Table S4).

IncP-5 plasmids
Only one plasmid closely related to Rms163, an IncP-5 plasmid, was identified in the PLSDB (Table 1).This limited representation in the database may not reflect the true prevalence of these plasmids in natural environments.ARGs in Rms163 were located within a Tn3-like transposon and a remnant of class 1 integron (Figure S6), with nucleotide sequences showing 100% identity to segments of Tn5393 and class 1 integron previously observed in an IncP-9 plasmid, pSN0726-36, isolated from a soil sample in Japan (78).Interestingly, the repP-5A gene in Rms163 exhibited 100% identity with the corresponding gene in pPALA50 (GenBank accession number CP111035.1), a gene recently discovered in P. aeruginosa isolated from the sputum of a patient with cystic fibrosis (88).Despite the absence of ARGS in pPALA50, the high conservation of other genetic elements between Rms163 and pPALA50 is notable (Figure S6).

IncP-10 plasmids
All the identified IncP-10 plasmids were hosted by P. aeruginosa.More than 50% of them contained ARGs for carbapenem (Table 1).Among the 49 plasmids in the IncP-10 group, the sequence lengths ranged from 29,402 to 106,679 base pairs (bp).Pangenome analysis using PIRATE identified 209 gene families across these plasmids, with 19 core genes present in 95-100% of the plasmids (29.0% per average gene numbers, 19/65.6)(Table S4), including putative genes for plasmid replication and maintenance (Table S5-8).

IncP-12 plasmids
Nine plasmids hosted by P. aeruginosa were categorised as IncP-12 plasmids (Table 1), with 165 core genes, which was a rather large ratio, 74.2% per their average gene numbers (165/222.4,Table S4), although several core genes encoded putative hypothetical proteins (Table S5 -10).This indicated that nine plasmids in IncP-12 had high genetic conservation, although none of them other than R716 carried any known ARGs.

IncP-15 (PromA)
A total of 42 plasmids were identified as IncP-15/PromA plasmids, which share 30 core genes, which was 50.3% of their average gene numbers (30/59.6,Table S4).These core genes included genes involved in replication and conjugative transfer (Table S5 -11).Interestingly, several of these plasmids do not carry any known accessory genes and exhibit very conserved genetic structures in different subgroups (75,77).

pSTY-like plasmids
pSTY-like plasmids exhibited significant diversity in their genetic content.Pangenome analysis using PIRATE identified 1,073 gene families across 15 plasmids in this group, with 66 core genes (Table S5-14).Some of these plasmids have been linked to clinically relevant ARGs, such as carbapenem and tigecycline resistance (83,103), making them important targets for further research.
The low sequence diversity of the RIPs is not suitable for the detection of remote homology between the RIPs of 15 PIncs.Thus, we extensively collected homologous sequences of RIPs through a two-step homology search process (Figure S2A).In the first step, we collected homologous proteins of the RIPs from the entire PLSDB plasmid dataset.We obtained 1,885 non-redundant RIPs with 90% identity, which were reclassified into 12 groups of homologous sequences (i.e., some PIncs were merged including RIPs of similar sequences; see Figure S2).In the second step, we collected homologous sequences of RIPs of the first step, from the comprehensive protein database UniParc, and obtained 6,948 non-redundant RIPs with 60% identity (87,818 sequences in total), consisting of 11 groups of homologous sequences (Figure S2).
We then analysed the homologous relationships between the 11 groups.Surprisingly, 10 groups exhibited homology with each other, with the exception of PG6, which includes IncP-6 plasmids.Based on the highly sensitive HMM-HMM comparisons using HHblits, we identified two supergroups (SG1 and SG2).The groups within each SG shared homology (>99% probability, evalue ≤1.5e-22) (Figure 5A).In addition, PG9, which includes IncP-9 plasmids, shared homology (90.1 -98.4% probability, evalue ≤ 8.8e-05) with all other groups except PG6.The solved and predicted crystal structures of SG1 RIPs, such as RepA from plasmid pPS10 (PG7), RepE from plasmid F (PG7), π from plasmid R6K (PG7), and TrfA protein from plasmid RK2 (PG1), contain two winged-helix domains (WH1 and WH2) (104).Although the crystal structures of PG9, PG10, and PG5 RIPs have not been reported, based on the alphafold3 predictions, these are predicted to have only one WH domain (Figure 2).We observed that the C-terminal WH domain (WH2) of SG1 RIPs shared homology with the WH domain of PG9 RIPs.WH2 of the RepA protein of pPS10 binds precisely to the iteron 3' end and is the major determinant of RepA sequence specificity (105).We therefore speculated that the conserved WH domain likely plays an important role in replication origin recognition.Here, we named these WH domain-containing RIPs "WH RIPs", which constitute a "WH RIP superfamily".
The remaining PG6 exhibited no homology with the other groups (<1% probability by HHblits).To characterise the evolutionary origin of PG6, we performed a HHblits search against Pfam v35.0.The RIPs of PG6 show significant homology to PF03090 (Replicase family; with 98.4% probability by HHblits), including the ColE2 Rep protein.The ColE2 Rep protein belongs to the superfamily of AEPs (106) and is the only plasmid-specific trans-acting factor required for initiation of plasmid replication (107).The RIPs of IncP-6 plasmids (9, 89) have a PriCT domain, which may form a compact module and play a central role in DNA unwinding.The C termini of some AEPs suggest that it probably plays a similar role in these proteins.The RIPs of PG6 plasmids have domain compositions different from those of the other groups.We therefore concluded that RIPs of PG6 have different evolutionary origins with the other groups.
We reconstructed the phylogeny of the WH RIPs using the identified homologous region of the WH domain.The homologous region showed similar secondary structure compositions including multiple alpha helixes and beta turns among the 10 groups of the WH RIPs (Figure 5B).A maximum-likelihood phylogenetic tree of the WH RIPs was reconstructed, which comprised 6,194 clustering representatives (60% identity) associated with 28,424 PLSDB plasmids and 66,527 Uniparc proteins (Figure 5C).The tree is subdivided largely according to the 10 groups with two special notes.First, PG7 (including IncP-7 plasmids) and PGsty (including pSTY plasmids) were merged and formed one large group because these are being composed of similar sequences.Second, PG5 (including IncP-5 plasmids) formed a subclade within the clade of PG10.We defined clades considering the group affiliations and branch support values.Most clades (i.e., PG5, PG9, PG16, PG3, PG15, PG4, and PG7/PGsty) were identified with two types of high support values (UFBoot2: 72-100%, TBE: 99-100%).The remaining (PG1 and PG10) clades were not highly supported by UFBoot2 but were supported by TBE (>99%).Considering the larger structure, we defined a superclade PG1/PG4/PG15, containing the PG1, PG4, and PG15 clades, and a superclade PG5/PG9/PG10, containing the PG5, PG9, and PG10 clades, both of which were supported with high support values (UFBoot2: 77%, TBE: 99.9-100%).Notably, the branch dividing the PG5/PG9/PG10 superclade and the remaining tree constitutes the well-supported division of SG1, which is composed of two WH domains, and SG2/PG9, which is composed of one WH domain.We thus speculated that this division is possibly the most ancestral split in the tree.
The tree includes diverse sequences not derived from PLSDB plasmids, indicating that numerous uncharacterised plasmids and incompatibility groups exist.In total, 75.4% (n = 4,669) of tree leaves were not associated with PLSDB plasmids (Figure 5C).Conversely, a substantial proportion of PLSDB plasmids (47.4%; n = 28,424) was associated with the remaining leaves of the tree (n = 1,525), indicating that these plasmids are likely replicated by WH RIPs.Considering plasmids with alternative replication mechanisms that employ disparate types of RIPs, such as the superfamily of AEPs (e.g., IncP-6) and rolling-circle replication (e.g., Pfam PF01719; Rep_OBD), this fraction is notably high, almost reaching 29,662 PLSDB plasmids identified with plasmid types using PlasmidFinder v2.1 (https://figshare.com/s/915a336484ee084ffbea;59,895 plasmid list).Notably, 9,666 plasmids lacking PlasmidFinder types were associated with the tree.In other words, besides the 2,357 PIncs plasmids, 8,860 additional plasmids were associated with the tree.This indicates that the tree can serve as a reference for the characterisation of plasmids, facilitating the identification and classification of RIP sequences.Furthermore, among the 1,525 leaves associated with PLSDB plasmids, 51 leaves are notably associated with a large number of plasmids, each linked to more than 100 plasmids, with a range from 102 to 6,872 plasmids, and collectively accounting for 32,713 RIPs, which represents 81.0% of the total 40,371 RIPs.This suggests a biased knowledge of plasmids containing similar RIPs.Subsequently, 9,141 plasmids (32.2%) were identified with multiple RIPs, suggesting that multi-replicons are not a special case of plasmids.
Each clade of the tree contains diverse taxonomic lineages, suggesting potential host transitions within the clade (Figure 5C).In contrast, some taxa are enriched in specific clades.For instance, the phylum Actinomycetota is enriched in PG15 (84.2%), whereas the phyla Bacillota, Bacteroidota, Campylobacterota, and Fusobacteriota are enriched in PG7/PGsty (90.0, 93.0, 95.0, 100%, respectively).Although this enrichment might be associated with some specific environment (e.g., gut microbe for the latter), further study is required to describe the detailed association between clades, lineages, and environments.
In summary, remote homology detection and phylogenetic reconstruction of the RIPs of PIncs and their homologs lead to the following findings: (i) except IncP-6, the RIPs of PIncs exhibit homology and are classified as members of the WH RIP superfamily, (ii) a well-supported branch in the phylogenetic tree divided WH RIPs into the PG5/PG9/PG10 superclade, which comprises one WH domain, and the remaining WH RIPs, which are composed of two WH domains, and (iii) the tree greatly expanded the plasmid universe through the identification and classification of the WH RIP superfamily.

CONCLUSION
This study provides valuable insights into the classification and genetic characterisation of Pseudomonas plasmids.We successfully determined the complete whole nucleotide sequences of several previously uncharacterised plasmids, including those in the Pseudomonas, namely the IncP-5, IncP-11, and IncP-12 groups, and reclassified plasmids that were previously thought to belong to 'IncP-13' plasmids as an ICE.Furthermore, we identified RIP genes associated with these plasmids, along with the IncP-2 plasmid, contributing to a more refined understanding of their replication mechanisms.
Our study also introduced four newly proposed plasmid groups, IncP-15, lncP-16, and pQBR103-like and pSTY-like groups, thereby significantly expanding the plasmid groups and contributing to the expansion and refinement of the Pseudomonas plasmid database.Among the extensive collection of 59,895 plasmids in the PLSDB, a subset (2,351 plasmids) was classified into these specific PIncs, which underlines the importance of updating and refining plasmid databases to reflect new discoveries.
Our findings underscore the strengths of our classification system, which is grounded in experimental data and aligns closely with established knowledge.In contrast, the recently launched IMG/Plasmid database, despite its comprehensiveness in terms of including a vast number of plasmid sequences (>700,000 sequences) (https://img.jgi.doe.gov/cgibin/plasmid/main.cgi) (108), employs plasmid taxonomic units (PTUs) based solely on average nucleotide sequence identity (109).Although this method is thorough in scope, it fails to capture the distinct characteristics of individual plasmid groups that have been documented through years of research as they are affected by long-sized accessory genes in plasmids.The experimental identification of genes and DNA regions involved in plasmid replication remains crucial for a deeper understanding of plasmid biology, including their host range, coevolution with bacterial hosts, and potential applications in biotechnology.These findings are particularly relevant for understanding the spread of ARG-associated plasmids in clinical and environmental Pseudomonas isolates, especially in the context of multidrug-resistant P. aeruginosa strains.
The reconstructed phylogenetic tree of WH RIPs will provide a fundamental basis for plasmid detection and classification based on RIPs, which are essential proteins for understanding plasmid replication mechanisms.Further research should focus on integrating experimental data with large-scale sequence analyses to reconcile discrepancies between traditional and modern classification systems.Additionally, experimental validation and phylogenetic analysis of RIPs and oriV in plasmids in plasmids with alternative replication mechanisms, including the superfamily of AEPs and rolling-circle replication, are essential.These efforts will enhance our ability to track and manage antimicrobial resistance in various environments, ultimately contributing to better public health outcomes.%-identities of all-against-all BLASTp among the 251 non-redundant RIPs were visualised.The heatmap and the dendrograms were generated using R v3.6.1 and R packages dendextend and gplots via CPAP v0.1.0(https://github.com/yosuken/CPAP)with a option ("--clust-method complete") to perform clustering with complete-linkage method.

Figure 3 .
Figure 3. Features of plasmids of PIncs in the PLSDB.Distribution of (A) plasmid sizes (kb) and (B) G+C contents (%), and (C) ratio of core and accessory genes of each PIncs plasmids in the PLSDB.(D) Relationship of each PIncs plasmids to other plasmid replicons and ARGs in the PLSDB.A sankey diagram was generated using the SankeyWidget library in Python through in-house programming.

Figure 5 .
Figure 5. Evolutionary relationships and phylogeny of WH RIPs.(A)Remote homology among 10 WH RIP groups, identified using HHblits.Two supergroups (SG1 and SG2) were defined according to the high probability (>99%) predicted within each supergroup.Relationships with more than 90% probability were indicated by lines.The WH2 domain of SG1 is homologous to the WH domain of SG2 and PG9.The domain lengths shown are arbitrary.(B) The secondary structures of representatives of the 10 WH RIP groups.The representatives were selected from the 3D structure-resolved sequences shown in Figure2.Namely, PG1 (RepP-1A (=TrfA1) of RK2), PG3 (RepP-3A of pRA1), PG4 (RepP-4A (=RepC) of RSF1010), PG7 (RepP-7A of pCAR1), PG15 (RepP-15A of pSN1104-11), PG16 (RepP-16A of pSB1216-29), PGsty (ReppSTY), PG9 (RepP-9A of pWW0), PG5 (RepP-5A of Rms163), and PG10 (RepP-10A of pPAB546) were selected.The secondary structures were predicted and visualised using iCn3D based on the 3D structures.(C) The ML phylogenetic tree of WH RIPs (6,194 leaves) calculated based on the conserved region (98 positions in the alignment) of the WH domain.The tree was rooted by the superclade of PG5/PG9/PG10.The clade and superclade names, bootstrap values, and number of leaves are shown in the legend.The consensus taxonomic affiliation of each leaf is shown in the outer ring.The number of leaves for each taxon is shown in the legend.The number of sequences (PLSDB plasmids and UniParc proteins) associated with each leaf is indicated by lines in the outmost rings.The line length represents the number of sequences but the upper limit in the visualisation is set to 100.

TABLES 1106 Table 1 . Lists of representative plasmids in the PIncs. 1107
(110)ification of MOB classes and MPF types was identified by the MOB-suite(110)with criteria of >=90% identity and >=60% coverage."-" indicates that the genes involved in conjugation have not been detected, whereas "NA" indicates that the classes and types of MOB or MPF were not identified although the plasmid was self-transmissible.b Core genes of each Inc group plasmid were identified by PIRATE analyses.c Partial DNA sequences are available.d pMG26 was shown to be not a plasmid but an integrative and conjugative element in this study. a