Genome analysis of Spiroplasma citri strains from different host plants and its leafhopper vectors

Spiroplasma citri comprises a bacterial complex that cause diseases in citrus, horseradish, carrot, sesame, and also infects a wide array of ornamental and weed species. S. citri is transmitted in a persistent propagative manner by the beet leafhopper, Neoaliturus tenellus in North America and Circulifer haematoceps in the Mediterranean region. Leafhopper transmission and the pathogen’s wide host range serve as drivers of genetic diversity. This diversity was examined in silico by comparing the genome sequences of seven S. citri strains from the United States (BR12, CC-2, C5, C189, LB 319, BLH-13, and BLH-MB) collected from different hosts and times with other publicly available spiroplasmas. Phylogenetic analysis using 16S rRNA sequences from 39 spiroplasmas obtained from NCBI database showed that S. citri strains, along with S. kunkelii and S. phoeniceum, two other plant pathogenic spiroplasmas, formed a monophyletic group. To refine genetic relationships among S. citri strains, phylogenetic analyses with 863 core orthologous sequences were performed. Strains that clustered together were: CC-2 and C5; C189 and R8-A2; BR12, BLH-MB, BLH-13 and LB 319. Strain GII3–3X remained in a separate branch. Sequence rearrangements were observed among S. citri strains, predominantly in the center of the chromosome. One to nine plasmids were identified in the seven S. citri strains analyzed in this study. Plasmids were most abundant in strains isolated from the beet leafhopper, followed by strains from carrot, Chinese cabbage, horseradish, and citrus, respectively. All these S. citri strains contained one plasmid with high similarity to plasmid pSci6 from S. citri strain GII3–3X which is known to confer insect transmissibility. Additionally, 17 to 25 prophage-like elements were identified in these genomes, which may promote rearrangements and contribute to repetitive regions. The genome of seven S. citri strains were found to contain a single circularized chromosome, ranging from 1.58 Mbp to 1.74 Mbp and 1597–2232 protein-coding genes. These strains possessed a plasmid similar to pSci6 from the GII3–3X strain associated with leafhopper transmission. Prophage sequences found in the S. citri genomes may contribute to the extension of its host range. These findings increase our understanding of S. citri genetic diversity.


(Continued from previous page)
Conclusions: The genome of seven S. citri strains were found to contain a single circularized chromosome, ranging from 1.58 Mbp to 1.74 Mbp and 1597-2232 protein-coding genes. These strains possessed a plasmid similar to pSci6 from the GII3-3X strain associated with leafhopper transmission. Prophage sequences found in the S. citri genomes may contribute to the extension of its host range. These findings increase our understanding of S. citri genetic diversity.
Characterization of spiroplasmas have been based on morphological and biological properties. However, because growth, metabolism, and DNA-DNA relatedness studies are time consuming, serological deformation tests and enzyme-linked immunosorbent assays have been used for identification of new spiroplasma groups in accordance to the International Subcommittee on the Taxonomy of Mycoplasmatales [2,11]. Recently, longread high-throughput sequencing technology and whole genome sequencing of bacteria have become costeffective and offers a precise method to differentiate bacterial species and strains that have highly repetitive regions in its genome [12,13].
The pathogen's wide host range and vector transmission serve as bottlenecks and drivers of genetic diversity of S. citri populations. Although S. citri exist in free living form in insect hemolymph and appropriate culture media, the pathogen in the vector must enter and move through the salivary gland and exit into the salivary duct and be expelled by the vector during probing and/or feeding in plants where S. citri infects host phloem tissue and exists intracellularly and is phloem-limited. Therefore, the objective of this study is to examine the genomes of S. citri collected from diverse hosts from different locations and times; and analyze the relationship between the genotype and phenotype of S. citri from citrus and horseradish (perennial crops); carrot and Chinese cabbage (annual crops); and from the BLH vector. The analysis was performed on whole-genome sequences of seven newly sequenced strains of S. citri and compared amongst each other, other S. citri strains, and spiroplasmas from other hosts. New insights in the evolution and diversity of S. citri is presented herein.

Genome assembly and annotation
Cultures of six strains of Spiroplasma citri were established and sequences reported previously (Table 1) [14,15] and a new strain, C5, is reported here. Briefly, S. citri strains C189 and LB 319 were isolated from the woody crop, citrus. BR12, CC-2, and C5 strains were isolated from the seasonal crops such as horseradish, Chinese cabbage, and carrot, respectively. BLH-13 and BLH-MB strains were isolated from the BLH. The complete genomes of the six strains were acquired using the longread technology, PacBio [14,15] and C5 was obtained using Nanopore sequencing technology. Sequences from each strain were assembled into single chromosomal contigs. Contigs that did not associate with the chromosome were designated as putative plasmids ( Table 2). The chromosome and plasmid status of each contig were further confirmed by BLASTn analyses against the GenBank database Release 236 (Supplementary Table  S1). The circular chromosome for all seven strains ranged from 1,576,550 to 1,742,208 bp, with an average G + C content of 25.4%. Total genome size ranged from 1,611,714 to 1,832,173 bp in strains isolated from plants and 1,968,976 to 2,155,613 bp in strains isolated from the BLH. Annotation of each contig was performed by the NCBI Prokaryotic Genome Annotation Pipeline (PGAP), which predicted 32 tRNA genes, three rRNA genes and protein-coding genes which ranged between 1597 and 2232. Extrachromosomal DNAs, characterized as putative plasmids varied in all the strains viz., one or two plasmids from citrus, two plasmids from horseradish, three plasmids from Chinese cabbage, seven plasmids from carrot, and eight or nine plasmids from the BLH. Putative plasmid sizes ranged from 2047 bp to 135, 023 bp (Supplementary Table S1). Seven of the 32 plasmids identified in these seven strains could not be circularized and further research is needed to determine if they are linear or products of sequencing error or culturing conditions.

Phylogenomics
Molecular phylogenetic inference of 39 spiroplasmas was performed using 16S rRNA genes in the NCBI database. Analysis of this gene sequence indicated that S. citri strains are closely related, but not identical. The phylogeny inferred from the 16S rRNA gene shows that S. citri strains formed a monophyletic group with plant pathogenic S. kunkelii, S. phoeniceum, and a honeybee pathogen, S. melliferum (Fig. 1, Supplementary Table  S2). To facilitate a high-resolution comparison of S. citri strains, core genomes were analyzed for nine S. citri genomes available in NCBI. Using the orthoMCL pipeline, a total of 863 orthologous genes were identified as conserved among the S. citri strains. The 863 orthologous genes were concatenated, and a maximum-likelihood approach was employed to generate a S. citri phylogeny (Fig. 2). Phylogenetic analyses with the core orthologous sequences among the S. citri strains showed citrus strains C189 from southern California and R8-A2 from Morocco clustered together. CC-2, isolated from Chinese cabbage, and C5, isolated from carrot, clustered together. Strains LB 319, BLH-13, BLH-MB, and BR12 clustered together in a separate clade. There was clear separation of S. citri from S. kunkelii ( Supplementary  Fig. S1).

Comparative genomics
The circular chromosome of the seven S. citri strains from the U.S. was compared via BLASTn to the R8-A2 strain from citrus in Morocco as the reference  CP047437.1 -CP047446.1 [15] sequence. Visualization of these results was performed using the BLAST Ring Image Generator (BRIG). This genome level comparison among S. citri strains isolated from different sources showed a high level of homogeneity among each other and the reference genome, R8-A2 (Fig. 3). S. citri strains C189, LB 319, and BR-12, which were isolated from citrus and horseradish, appear most similar to R8-A2. A large region of dissimilarity near the middle of the chromosome is notable in the BLH-13 strain, ranging from 600 kbp to~800 kbp. These differences were not found in the BLH-MB strain, but some variability in this region can be seen in CC-2 and C5.
Dot-matrix pairwise sequence comparisons revealed a highly repetitive region at~1200 kbp (Supplementary Fig. S2). The repetitive region, marked in Fig. 3, is an area of higher dissimilarity in the chromosome among the S. citri strains analyzed in this study.
Pairwise whole genome comparisons were performed with S. citri strains BLH-13, LB 319, and CC-2 which were selected to represent the biological diversity in this study. This comparison revealed high genome similarity, higher numbers of shared genes, and limited genome re-arrangements as observed in the center region (Fig. 4a). In contrast,  genome comparison among different species of plant pathogenic spiroplasmas were examined using LB 319 from citrus, S. phoeniceum P40 from periwinkle, and S. kunkelii CR2-3X from corn. Here, S. citri, S. kunkelii, and S. phoeniceum showed significant differences in gene content, low level of genome similarity, and extensive genome rearrangements. S. kunkelii and S. phoeniceum also exhibited fewer regions of genome similarity and extensive genomic rearrangements (Fig.  4b).
Functional assignment of S. citri LB 319 protein-coding sequences Due to similarity of the chromosome of the seven S. citri strains studied, LB 319 was selected for further characterization. LB 319 had 1750 annotated protein coding sequences (CDS) and the functional classification of these protein coding sequences assigned only 553 CDS (32%) in different clusters of orthologous groups (COGs). The most abundant functional category was DNA replication, recombination and repair, followed by translation. These categories mainly consist of gDNA polymerases (dnaE, dnaN, dnaX, holA, holB, polC), nucleotide excision repair (uvrA, uvrB, uvrC), DNA topoisomerases (gyrA, gyrB, parC, parE), ribosomal proteins, and tRNA synthetases genes. Other important functional categories include translation (COG category K), nucleotide metabolism and transport (COG category F), and transcription (COG category O) (Fig. 6). Descriptive functional information of the genes is included in Supplemental Table S3. The low number of assigned COGs  [17]. S. citri strain R8-A2 was used as a reference. Each of the illustrated rings refers to one S. citri strain's chromosome, according to the listed coloration. White regions represent dissimilarity from the reference genome. A highly repetitive region, marked in this image, was revealed during dot-matrix pairwise sequence comparisons ( Supplementary Fig. S2). This area is marked based on the location of this region identified in the R8-A2 strain suggests that a large proportion of them may be fragments of unrecognized pseudogenes. Genes involved in mismatch repair like mutS, mutI, mutH, exoI, exoX, recI and genes involved in homologous recombination like recA, recB, recC etc. are missing or truncated.
Plasmid variability between S. citri strains Plasmids are genetic elements which may encode virulence factors and also play important roles in establishing host range [20][21][22][23]. Although a high level of similarity was found in the chromosome of S. citri strains, more diversity was found in the number of putative plasmids associated with these strains (Fig. 7a). For the sake of this study these putative plasmids are referred to as plasmids since the DNA in 25 of 32 plasmid-like contigs were circularized. Eight and nine plasmids were found in S. citri strains isolated from the BLH. Strains isolated from carrot and Chinese cabbage contained seven and three plasmids, respectively. S. citri strains isolated from citrus and horseradish possessed one to two plasmids. S. citri adhesionrelated proteins (ScARPs), which are expected to be involved in S. citri adhesion to insect cells [25,26], were  [18], based on whole-genome alignments with Lastz v. 1.0.4 aligner [19]. Both panels depict pairwise comparisons, expressed as percentage of nucleotide similarity, that connect different homologous genomic regions. Chromosomes are completely finished and pictured in blue. a S. citri strains BLH-13 isolated from the beet leafhopper (BLH), LB 319 isolated from citrus and CC-2 isolated from Chinese cabbage. b S. citri LB 319, S. phoeniceum P40 and S. kunkelii CR2-3X predicted in several plasmids by NCBI's Prokaryotic Genome Annotation Pipeline (PGAP). These ScARPs were present in one plasmid in BR12 and CC-2, two plasmids in C5 and BLH-MB, and three plasmids in BLH-13. No full-length ScARPs were predicted in C189 or LB 319 plasmids. S. citri strain C189, isolated from citrus in 1960, retained only one plasmid, pScp-C189-1. BLAST results revealed that this plasmid was highly similar to plasmid pSci6, identified in S. citri strain GII3-3X [24]. All strains analyzed in this study contained at least one plasmid with  6 Functional classification of protein-coding genes in S. citri LB 319. The functional categorization of each protein-coding gene was classified according to the COG assignments. The pie graph indicates the COG distribution statistics. Each color represents a COG functional description. a All annotated protein-coding genes in the S. citri LB 319 genome. b Protein-coding genes that could be assigned functional category very high similarity to pSci6. (Fig. 7b). BLH-MB possessed two plasmids which resembled pSci6.
Prophage prevalence in S. citri strains The PHASTER (PHAge Search Tool -Enhanced Release) web server was used to identify and annotate putative prophage regions within the S. citri genomes. Seventeen to 25 prophage-like elements were identified in the S. citri genomes ( Supplementary Fig. S3). Plasmid pScpLB319-2 in LB 319 and pScpC5-3 in C5 contained prophage sequences ( Supplementary Fig.  S3 B,E). Two plasmids in BLH-13 (pSciBLH13-1 and pSciBLH13-6) and three plasmids in BLH-MB (pSciBLHMB-1, pSciBLHMB-7, and pSciBLHMB-8) possessed prophage sequences ( Supplementary Fig. S3 F,G). No prophage sequences were predicted in plasmids from C189, BR12, or CC-2. A large proportion of these prophage sequences were homologous to plectrovirus SpV1 [27] and SVTS2 [28] (Supplementary Table S4). While multiple plectrovirus SpV1 sequences were found to be distributed throughout the chromosome, a concentrated area of SVTS2 sequences was found at approximately 1.2 Mbp. This region of the chromosome was found to have highly repetitive sequences in all S. citri genomes in this study as shown by dot-matrix pairwise sequence comparisons (Fig. 3, Supplementary Fig. S2). Further characterization of the prophage region in the chromosome and plasmids will be examined in the future.
Putative virulence components in S. citri S. citri does not have specialized protein secretion apparatus, such as the type II and type III secretion systems. S. citri utilizes a sec-dependent protein export pathway. In S. citri LB 319, secY (GMI18_RS01645), secA (GMI18_RS01115), ftsY (GMI18_RS00765), ffh (GMI18_ RS02240), secE (GMI18_RS01245), and yidC (GMI18_ RS10485) are the genes involved in the sec-dependent pathway and are conserved across S. citri strains. Several sequences in plasmid pScp-C189-1 were homologous to genes associated with type IV secretion systems, including Mob-like transmembrane proteins, TraG proteins, and proteins containing TraM recognition domains and type IV secretion system DNA-binding domains. This plasmid is conserved across S. citri strains. Fructose operon genes are major pathogenicity determinants in S. citri [29,30]. fruR (GMI18_RS08785), fruA (GMI18_ RS08780) and fruK (GMI18_RS08775) are genes present in the fructose operon and were present in all the S. citri strains analyzed in this study. fruR codes for bacterial transcriptional regulators of carbohydrate catabolic operons. fruA codes for the permease of the phosphoenolpyruvate:fructose phosphotransferase system. This fructose permease allows uptake and concomitant phosphorylation of fructose into fructose-1-phosphate. fruK codes for a 1- Fig. 7 Analysis of Spiroplasma citri plasmids. a Number of plasmids from each of the S. citri strains analyzed in this study. Each bar represents one strain and is labeled with the host that strain was collected from. b BLASTn results of conserved S. citri plasmid. Whole genome comparison of conserved S. citri plasmid visualized by BLAST Ring Image Generator (BRIG) [17]. Plasmid pSci6, identified in S. citri strain GII3-3X [24], was used as a reference. Each of the illustrated rings refers to S. citri plasmids identified in the seven other strains analyzed in this study that were closely related to pSci6. Lighter regions represent dissimilarity from the reference plasmid. Outermost ring denotes coding sequences and is labeled with gene names. pA-pN represent hypothetical proteins, as named in GenBank accession AJ969074 [24] phosphofructokinase, which uses ATP to phosphorylate fructose-1-phosphate into fructose-1,6-bisphosphate [30].

Discussion
Prior to advances in long-read sequencing, the best assembly of S. citri was strain GII3-3X, which contained 39 chromosomal contigs [24]. The first sequence of S. citri containing a single chromosomal contig was reported in 2017 for Moroccan S. citri strain R8-A2 T [31]. Complete genomes, with single chromosomal contigs, of six more strains of S. citri isolated in the United States were reported in 2020 [14,15] and a seventh strain, C5, isolated from carrot, is reported in this study. Chromosome sizes of these genomes were similar in size to strain R8-A2 T (~1.6 Mbp), although most of the U.S. strains were slightly larger in size. Additionally, the number of predicted protein coding regions was higher in U.S. strains compared to the R8-A2 T genome. One set of rRNA genes and 32 tRNA genes were predicted in U.S. strains, which is consistent with the R8-A2 T strain [31].
Phylogenetic analyses between S. citri strains showed citrus strains C189 from southern California and R8-A2 from Morocco clustered together. C189 was originally obtained in 1957 from Washington Navel in Riverside, California by graft transmission to sweet orange seedling [4] and has been continuously maintained in the greenhouse. R8-A2 was originally obtained from Washington Navel in Morocco by graft transmission to sour orange [32]. Therefore, both strains were isolated and cultured from greenhouse citrus hosts many years after initially acquired and sequenced. Strains LB 319, BLH-13, BLH-MB, and BR12 clustered together. In these cases, S. citri was isolated and cultured directly from these field sources and sequenced. CC-2 and C5, which originated from annual hosts with similar row crop ecologies, clustered together in a separate clade. Moreover, the analysis of core orthologous genes suggested that strains from citrus were very closely related and BLH strains were more closely related to citrus strains than those from Chinese cabbage or carrot. S. citri strain GII3-3X, isolated from C. haematopceps, did not group with strains isolated from BLHs from California, but this may be due to the incomplete genome sequence of GII3-3X.
A high level of homology and synteny was observed between the S. citri strains in this study, with some dissimilar regions and genomic rearrangements appearing in the center region of the chromosome. However, comparison with the most closely related species, S. phoeniceum and S. kunkelii, showed that the chromosomal organization is largely rearranged and exhibits much lower levels of sequence similarity. S. phoeniceum is associated with lethal yellows in periwinkle [33], while S. kunkelii causes corn stunt disease of Zea mays L. [34].
Large rearrangements were also observed when comparing S. citri to S. melliferum, a honeybee pathogen [35]. When comparing gene content, S. citri strains shared approximately 80% of the homologous gene clusters observed, with about 3% of gene clusters being unique to each strain. However, S. citri strain LB 319 shared approximately 64% of gene clusters with S. phoeniceum and S. kunkelii, while about 23% of gene clusters were unique to S. citri. These differences may be caused by differential gene loss, phage-mediated horizontal gene acquisition, and by ecological and biological diversification [35,36].
Moreover, some of this variation may be due to variation in prophage sequences [37], which are viral or phage genomic DNA sequences integrated into a bacterial genome. S. citri is highly susceptible to viral invasion, due to its lack of a cell wall [38]. Between seventeen and twenty-five areas of the genome were predicted to contain prophage insertions in the S. citri genomes studied which contributed to the variations in chromosome size. Most of these sequences observed in S. citri are homologous to SpV1, a plectrovirus, and were dispersed throughout the genome [37]. PHASTER analysis demonstrated that SpV1 was prevalent throughout the chromosome of all seven S. citri strains, but the number and positions were inconsistent (Supplemental Fig. S3). SpV1 viral sequences have been associated with major variations of the S. citri genome [39]. These viral sequences, integrated into the Spiroplasma chromosome, can have a large effect on genome stability. A model for the evolution of the Spiroplasma genome has been linked to viral invasion, which could account for intraspecific genome size variation, low conservation in chromosomal organization, and a gain of lineage-specific genes [36]. The rearrangements and genome instability are apparent in pairwise comparisons of S. citri strains, and even more so when compared to their closest relatives, S. phoeniceum and S. kunkelii. Viral invasions likely promoted these rearrangements in plant pathogenic bacteria and could attribute to their adaptation to different hosts. In contrast to SpV1, multiple copies of phage sequences homologous to SVTS2 were present in one specific, repetitive region in all the S. citri strains studied (Supplementary Fig. S3). Integration of SVTS2 viral sequences into the chromosome of S. citri has been associated with the resistance of S. citri to spiroplasma virus SVTS2 [40]. This may be why SVTS2 sequences are not prevalent throughout the genome and repeating elements containing SVTS2 sequences are conserved across strains.
S. citri does not have specialized protein secretion apparatus like the Type II and Type III secretion systems, but instead utilizes the sec-dependent protein export pathways S. citri does not have T4SS. Components of the T4SS are present and highly conserved in S. citri strains analyzed in this study. However, their importance in virulence has yet to be verified. T4SSs are related to bacterial conjugation systems and secrete substrates into a wide range of target cells, including other bacteria and eukaryotic cells. T4SSs are known to play a role in pathogenesis in a wide range of bacteria by genetic exchange and delivery of effector molecules to target cells [41].
Fructose operon genes are major virulence components in S. citri, which utilizes fructose for pathogenicity and growth in plants [29]. The fructose operon consists of three genes: fruR, fruA, and fruK. S. citri mutants of fruR, which likely encodes for the putative regulator protein of the fructose operon, abolished expression of all three genes of the operon. This mutant exhibited severely impaired pathogenicity, which could be restored when the mutant was complemented with functional fructose operon genes [30]. This work proposed an explanation of the role of fructose utilization in the pathogenicity of S. citri, suggesting that companion cells in the plant host compete for fructose. S. citri uses fructose as a carbon and energy source, resulting in a reduced fructose concentration in the plant companion cells, modifying the distribution of photoassimilates, leading to disease symptoms [30].
Every S. citri strain studied had a unique pattern of extrachromosomal DNA and the presence or absence, position (free or integrated), and number of these sequences is a significant source of variation among strains [42]. After several years of maintenance in plants, S. citri strain BR3-G showed chromosomal rearrangements compared to strain BR3-T, which was transmitted from plant to plant by the BLH [39]. Strain BR3-G was found to be non-transmissible by the BLH which was correlated to a large deletion of a SpV1-related transposase gene [39]. Prolonged cultivation of bacteria has been reported to cause free plasmid DNA to be integrated into the chromosome through recombination events [43][44][45]. Strains isolated from BLH examined in this study contained two or three plasmids with predicted prophage sequences. The strains isolated from plants exhibited one or no plasmids with predicted prophage sequences. The plasmids that contain viral sequences were homologous to those in the chromosome of S. citri strain R8-A2. It is plausible that after transmission by the BLH to plants, plasmids containing viral sequences could be incorporated by recombination with chromosomal plectroviral sequences. This would result in fewer plasmids in perennial plants compared to annual plants, which was observed in this study.
Plasmids of phytoplasmas and spiroplasmas are known to be involved in insect transmissibility [46][47][48][49]. All the newly sequenced S. citri strains contained at least one plasmid with high homology to plasmid pSci6, identified in S. citri strain GII3-3X [24]. pSci6 plasmid encodes P32 protein, associated with insect transmissibility, and this plasmid confers insect transmissibility into nontransmissible strains of S. citri [50]. P32 has been suggested to interact with surface membrane proteins and may be necessary, but not sufficient for spiroplasma adhesion and invasion of insect cells [51]. In strain GII-3, eight proteins belonging to the ScARP protein family, which are expected to be involved in S. citri adhesion to insect cells, were encoded by five plasmids [24][25][26]. In previous studies, a loss of the high-molecular-mass plasmids carrying ScARP genes was correlated with a nontransmissible phenotype [51]. ScARP genes were found to be present in plasmids isolated from perennial crops and the leafhopper vector in this study. These plasmids are highly similar to pSci2 and pSci5 from S. citri strain GII3-3X and pBJS-O from S. citri strain BR3-3X [24,52] (Supplementary Table S1). Interestingly, no fulllength ScARPs were predicted in C189 or LB 319 plasmids. The lack of additional plasmids containing ScARP genes, seen in strains obtained from perennial crops, suggest these strains could have lost their ability to be transmitted by the BLH.
Several other genes have been assigned putative functions in plasmid pSci6. This included soj-parA, which is involved in DNA partitioning, and traG and mob, which are associated with DNA transfer and are suggested to be involved in a Type IV secretory pathway [24]. A coding region was also identified in plasmid pScp-C189-1, which has homology to a plasmid replication-relaxation (relaxase) family protein. Plasmids in S. citri have been suggested to be horizontally transferred between cells by conjugation [24]. Several important genes are at the core of plasmid conjugation, including Type IV coupling proteins and relaxases [53]. The traG and mob genes found in pSci6 correspond to VirB4/D4 components of the type IV secretion pathway, which allows for the translocation of DNA through cytoplasmic membranes [54]. Walled bacteria require many components in their conjugation system; however, these components may not be necessary in S. citri, a wall-less bacterium. In addition to traG and mob, Saillard et al. suggested that pSci6 should contain a relaxase, but this family of proteins was not reported [24]. The replicationrelaxation (relaxase) family protein is essential for plasmid replication and plasmid DNA relaxation, part of conjugative DNA transfer in bacteria [55][56][57]. A BLAST search of the coding sequences in pScp-C189-1 revealed homology to a replication-relaxation family protein. This further supports previous studies that reported genetic exchanges by a conjugation-like process in S. citri [58]. The genes identified in this plasmid encode for virulence-associated proteins involved in adhesion and conjugal DNA transfer. The occurrence of this persistent, conserved plasmid suggests it plays an important role in this pathogen.

Conclusion
Six Spiroplasma citri genomes were published recently, but were not fully analyzed [14,15]. Those sequences, along with C5, a strain de novo assembled in this study, greatly expanded the availability of S. citri genomes and allowed performance of extensive in silico comparative genomic studies that provide insights into this organism's genetic diversity and evolution. An extremely high level of homogeneity was observed in the chromosomal contigs across S. citri strains. Variation in plasmid number may play an important role in insect transmission and virulence. Moreover, differences in genome size and stability appear to result from variations in number and site of plectroviral sequences inserted into the genome. These features likely contribute to S. citri adaptation to different hosts and transmissibility by leafhopper vectors. Further studies will be necessary to validate the roles of plasmids and viral insertion sequences in S. citri strains isolated from various hosts.

Strain isolation and DNA preparation
Cultures of S. citri strains CC-2, C189, BR-12, LB 319, BLH-13, and BLH-MB were isolated, and DNA extracted in a previous study [14,15]. S. citri strain C5 was collected in 2005 from carrot growing in NW Bakersfield, California. Briefly, S. citri was isolated and grown in LD8 medium [59]. Later the cultures were triple cloned and stored at − 80°C until further use. Cultures were re-established and total genomic DNA was extracted by CTAB [60] or by DNeasy Blood and Tissue Extraction kit (Qiagen). Collection details of S. citri in this study are listed in Table 1.
Whole-genome shotgun sequencing S. citri strain C5 was sequenced on the Oxford Nanopore (Oxford, United Kingdom) MinION platform [61]. The library was prepared with Oxford Nanopore (Oxford, United Kingdom) Rapid Barcoding Kit-SQK-RBK004 according to the manufacturer's specifications. Data was collected using MinION Release 19.10.1. Bases were called using Guppy v.3.4.5 and the adapter screening and quality filtering of raw sequencing data were performed using Fastp v 0.20.0 [62]. Remaining S. citri strains' genomes were sequenced previously on PacBio (Menlo Park, CA, USA) RS II platform [14,15].

De novo sequencing assembly
Sequences from S. citri strains CC-2, C189, BR-12, LB 319, BLH-13, and BLH-MB were reported previously [14,15]. Briefly, raw reads were filtered, subreads were established by PacBio, and assembled into contigs using Canu 1.8 [63]. For S. citri strain C5, contigs were assembled using Canu 1.8. An additional polishing step was performed by medaka v 1.0.3 (Oxford Nanopore Technologies) and frame-shiftcorrected by DIAMOND v 0.9.28 [64] and MEGAN v 6.18.4 [65], following the pipeline described by Arumugam et al. [66]. Approximately 500 bp segments from each end of a contig were used for BLASTn search to check the contig singularity. Appropriate reads connecting both ends were used for enclosure. The chromosome and plasmid status of each contig were further confirmed by BLASTn analyses against the GenBank database. The genome sequence data was deposited in the NCBI database (accession numbers shown in Table 2). Annotation of each contig was performed by the NCBI Prokaryotic Genome Annotation Pipeline (PGAP) [67].

Phylogenetic analyses
The 16S rRNA sequences of thirty-nine Spiroplasma species were obtained from the NCBI database. Sequence alignments were carried out with the PRANK alignment tool [68]. Maximum-likelihood approach was used to reconstruct the phylogenetic tree using RAxML software [69]. Bootstrapping was performed with 1000 replicates. The resulting phylogeny was visualized with FigTree [16].
Orthologous genes of S. citri isolates were predicted using the OrthoMCL v. 2.0 pipeline [70]. All-versus-all BLASTN (E value < 10 − 5 , alignment coverage > 50%) comparison of all gene sequences for each species was performed and orthologous genes were clustered by OrthoMCL v. 2.0. Multiple sequence alignment was done with PRANK v. 170,427 [71]. The sequence alignments were concatenated by FASconCAT v. 1.1, yielding a gene super-matrix [72]. Maximum-likelihood approach was used to reconstruct the phylogenetic tree using RAxML v. 8.2 software with 1000 bootstrap replicates [69]. The resulting phylogeny was visualized using FigTree v. 1.4.3 [16].

Bioinformatics analysis
Large genome comparison of eight S. citri sequences was computed and visualized with the use of BLAST Ring Image Generator (BRIG) v 0.95 [17]. Pairwise genome alignment was achieved by the Lastz v. 1.04 program [19]. The results were visualized using AliTV v. 1.0 [18]. Shared and genome-specific genes were identified between the S. citri strains isolated from different sources and among S. citri LB 319, S. kunkelii CR2-3X, and S. phoeniceum P40. The sequence similarity search step in the OrthoMCL analysis was conducted at the nucleotide level [70]. Functional annotation of COG was done using eggNOG-mapper [73]. Prophage sequences were predicted using PHASTER online server [74]. 1957 [4]. Mention of trade names or commercial products in this publication is solely for providing specific information and does not imply recommendation or endorsement by the USDA. USDA is an equal opportunity provider and employer.
Authors' contributions RKY, GV, FO and DP conceptualized the study and acquired the S. citri cultures. Funds for this research were from grants to RKY and GV. DP, YM, and VS cultured the pathogens and purified the DNA. FO supervised the PacBio submissions. AE and TD cultured, purified DNA, and provided raw sequence data for S. citri strain C5, JC provided technical assistance for genome assembly and bioinformatics, SH provided some technical assistance. RR and ST conducted data curation and analysis. RR, ST and RY wrote the manuscript and DP, SH, JC, GC, and GV edited the manuscript. All author(s) read and approved the final manuscript.

Funding
This work was supported by USDA 2034-22000-013-10R and two grants from the Citrus Research Board: Project 5300-191 to R. Yokomi; and Project 5100-153 to G. Vidalakis. G. Coaker and S. Thapa were supported by grants from the USDA (2019-70016-29796, 2016-70016-24833). Additional support for this work was provided in part by the Citrus Research Board (Project 6100 and 5300-179), the USDA National Institute of Food and Agriculture, Hatch Project 1020106 and the National Clean Plant Network (AP17PPQS&T00C118 and AP18PPQS&T00C107). The sequencing was carried out at the DNA Technologies and Expression Analysis Cores at the UC Davis Genome Center, supported by NIH Shared Instrumentation Grant 1S10OD010786-01. The funding bodies played no role in the design of the study and collection, analysis or interpretation of data and writing of the manuscript.

Availability of data and materials
The data described in this study can be freely and openly accessed from the NCBI database. All sequence data has been deposited in GenBank under the accession numbers CP042472-CP042475, CP0426368-CP046373, CP047426-CP047446, and CP053304-CP053311. The versions described in this paper are the first versions. PacBio sequencing reads for BioProject PRJNA558054 have been deposited in the NCBI Sequence Read Archive (SRA) under accession number SRR9903453 for CC-2. PacBio sequencing reads for BioProject PRJNA591027 have been deposited in the NCBI SRA under accession numbers SRR10843927 for C189; SRR10507068 for LB 319; SRR10507067 for BR12; SRR10507065 for BLH-13; and SRR10507066 for BLH-MB. Nanopore sequencing reads for BioProject PRJNA625113 have been deposited in the NCBI SRA under accession number SRR11536473 for C5. S. citri cultures for BLH-13, BLH-MB, BR12, LB 319, and CC-2 have been deposited in ATCC no. SD-7532-7535 and SD-7278, respectively. However, ATCC Biorepository Service Agreement 2019-BRS-00049 maintains confidentiality of information regarding these cultures. C189 and C5 cultures are available from the corresponding author.