Genome Analysis of Two Pseudomonas syringae pv. aptata Strains with Different Virulence Capacity Isolated from Sugar Beet: Features of Successful Pathogenicity in the Phyllosphere Microbiome

ABSTRACT Members of the Pseudomonas syringae species complex are heterogeneous bacteria that are the most abundant bacterial plant pathogens in the plant phyllosphere, with strong abilities to exist on and infect different plant hosts and survive in/outside agroecosystems. In this study, the draft genome sequences of two pathogenic P. syringae pv. aptata strains with different in planta virulence capacities isolated from the phyllosphere of infected sugar beet were analyzed to evaluate putative features of survival strategies and to determine the pathogenic potential of the strains. The draft genomes of P. syringae pv. aptata strains P16 and P21 are 5,974,057 bp and 6,353,752 bp in size, have GC contents of 59.03% and 58.77%, respectively, and contain 3,439 and 3,536 protein-coding sequences, respectively. For both average nucleotide identity and pangenome analysis, P16 and P21 largely clustered with other pv. aptata strains from the same isolation source. We found differences in the repertoire of effectors of the type III secretion system among all 102 selected strains, suggesting that the type III secretion system is a critical factor in the different virulent phenotypes of P. syringae pv. aptata. During genome analysis of the highly virulent strain P21, we discovered genes for T3SS effectors (AvrRpm1, HopAW1, and HopAU1) that were not previously found in genomes of P. syringae pv. aptata. We also identified coding sequences for pantothenate kinase, VapC endonuclease, phospholipase, and pectate lyase in both genomes, which may represent novel effectors of the type III secretion system. IMPORTANCE Genome analysis has an enormous effect on understanding the life strategies of plant pathogens. Comparing similarities with pathogens involved in other epidemics could elucidate the pathogen life cycle when a new outbreak happens. This study represents the first in-depth genome analysis of Pseudomonas syringae pv. aptata, the causative agent of leaf spot disease of sugar beet. Despite the increasing number of disease reports in recent years worldwide, there is still a lack of information about the genomic features, epidemiology, and pathogenic life strategies of this particular pathogen. Our findings provide advances in disease etiology (especially T3SS effector repertoire) and elucidate the role of environmental adaptations required for prevalence in the pathobiome of the sugar beet. From the perspective of the very heterogeneous P. syringae species complex, this type of analysis has specific importance in reporting the characteristics of individual strains.

(e.g., beneficial microbes), plant-pathogenic bacteria have to mitigate biotic and abiotic stressors by developing a set of mechanisms involved in motility, nutrient uptake, intermicrobial competition, and host plant manipulation (2). Phytopathogenic bacteria harbor a diverse weaponry of virulence factors, including the type III secretion system (T3SS) and its effectors, phytotoxins, phytohormones, ice nucleation activity (INA), plant cell wall-degrading enzymes, and exopolysaccharides (3).
Pseudomonas syringae inhabits diverse ecological niches and copes with different environmental conditions (4). Its prevalence in the leaf microbiota and the pathogenic potential of P. syringae are achieved chiefly via features such as (i) ubiquity, (ii) a broad host range, (iii) a diversity of virulence/competition strategies, and (iv) versatile genomic/ metabolic features involved in host colonization (5). In the vocabulary used for pathogen diagnosis, P. syringae is characterized as a species complex comprised of several closely related Pseudomonas phytopathogenic species, which are similar at the genetic level (6). Based on pathogenic characteristics, more than 60 different pathovars have been described thus far (7). However, some experts emphasize the need to reconsider this terminology (8). Focusing on P. syringae pv. aptata (Ptt), the causative agent of leaf spot disease, frequent disease reports and the occurrence of new emergences recorded in recent years indicate that this bacterium is a potential candidate for future severe beet and chard field epidemics (9)(10)(11)(12).
The P. syringae pv. aptata strains P16 and P21 were reported as causative agents of leaf spot disease on sugar beet in Serbia (13). In our previous studies, greenhouse experiments showed a difference in the two strains' pathogenic potential (virulence and host range) (8,14). The P16 strain was characterized as a mildly low virulent strain with a narrow host range, while the P21 strain was depicted as one of a group of highly virulent strains with a very broad host range. The present study aimed to perform genomic profiling of these two plant pathogens by evaluating the genes that encode features involved in strategies necessary for survival and virulence capacity in the phyllosphere microbiome. This allows us to test the hypothesis that the different virulence associated gene repertoire of these two strains is responsible for the evident difference in pathogenicity detected in planta. Numerous research focused on the molecular biology and pathogenesis of plant-pathogenic bacteria, and the determinant factors in host-pathogen interactions have opened the door to a new era in bacterial disease management (15). Despite the increasing number of disease reports in recent years worldwide, there is still a lack of information in the literature about the genomic basis of pathogenic traits of P. syringae pv. aptata. To the best of our knowledge, this is the first in-depth analysis of P. syringae pv. aptata genomes for the purpose of elucidating overall features responsible for putative prevalence in the phyllosphere microbiome.

RESULTS AND DISCUSSION
General features of Pseudomonas syringae pv. aptata P16 and P21 draft genomes and their phylogenomic status within the Pseudomonas syringae complex. To understand the differences between P. syringae pv. aptata isolates P16 and P21 and their phylogenetic status within the P. syringae species complex, we compared their genomes with the genomes of 100 strains from the NCBI database. The genomes selected for analysis were closely related strains from phylogenetic group (PG) 02, which consists of ubiquitous, heterogeneous, and highly pathogenic P. syringae strains. We also included in the overall comparison publicly available genomes of representative strains from PGs 01, 03, 04, 07, 09, 10, and 13 isolated from various environmental sources. According to NCBI data, a general comparison of genome sizes shows that the analyzed P. syringae strains have genome sizes in the range of 5.77 to 6.87 Mbp. Comparison of genome sizes within PG02 showed that the isolate P. syringae pv. aptata P21 is in the group of strains with the largest genome size (6.35 Mbp), similar to another strain of the same P. syringae pv. aptata pathovar, namely, DSM50252 (6.36 Mbp). Strains P16 and P21 had far more tRNA (56) and rRNA genes (9) than other strains (tRNA 29 to 44; rRNA 3 to 4) from the same pathovar (Table 1). The general features and assembly statistics of the draft genomes of P. syringae pv. aptata strains P16  and P21 and other 100 P. syringae genomes included in the phylogenomic comparison are listed in Table 1. To validate annotation results and assess annotated gene homology, we searched for the blast hits between two genomes from our strains and reference strain P. syringae pv. syringae B728a, as a widely used model system and member of PG02. We revealed 5402 CDSs hits between P21 and B728a, and 4,181 CDSs hits between P16 and B728a with a percentage identity above 80%. The CDS were classified into different groups (COG categories) based on their roles in the cell (Table 2). Among CDSs with recognized function, 597 CDSs for the P16 strain and 653 CDSs for the P21 strain did not have any COG category. Phylogenetic studies based on multilocus sequence analysis (MLSA) of several housekeeping genes have made a significant contribution by determining the status of particular pathovars and strains and by establishing the existence of 13 phylogenetic groups within the P. syringae complex (16)(17)(18). Previous phylogenetic analysis revealed that P. syringae pv. aptata belongs to the 02 phylogenetic group (and subgroup 02b), consisting of ubiquitous strains with strong epiphytic phases (16). Phylogenomic studies within the P. syringae complex provided a thorough analysis of the P. syringae complex using publicly available genome data (7,19,20). Six phylogenomic branches were distinguished based on 139 draft and complete genomes of P. syringae species complex, grouping the P. syringae pv. aptata DSM 50252 strain in the phylogenomic branch I, mainly consisting of members belonging to the PG02 phylogenetic group (7).
Average nucleotide identity analysis. To estimate the average nucleotide identity between P. syringae pv. aptata P16 and P21 and other members of the P. syringae complex, pairwise alignment of genome sequences and similarity analysis were performed. Regarding the threshold, three different clades with a similarity level of at least 95% were observed within the strains of PG02 (Fig. 1). Isolates P16 and P21 were grouped in clade 3 with other P. syringae pv. aptata strains isolated from sugar beet and strains designated as  (21,22). Still, we observed certain inconsistencies in clear boundaries between previously obtained phylogenetic/phylogenomic groups and subgroups and proposed distinctive pathovars. Although taxonomic and phylogenomic analyzes of the P. syringae species complex were the focus of numerous studies, the divergence of approaches that use deposited genomic data could lead, to a certain extent, to noncompliance with previous similar studies. Many isolates from the databases have been named without full description, or their identifications haven't been updated and revised upon the novel and modern techniques development (7). This suggests the need for comprehensive harmonization of P. syringae genomic databases to provide solid genomic-based identification and comparison. Furthermore, our results show the great necessity to reconsider the genome-driven revision of P. syringae pathovars classification. Future development of more powerful genomic methods and analyzes will allow us to elucidate the epidemiology and disease etiology of P. syringae. Pangenome analysis. The pangenome of two isolates from this study and 100 P. syringae strains from the NCBI database contained 55,220 genes, of which 596 core, 1,083 soft-core, 4,968 shell, and 48,573 cloud genes ( Fig. 2A). Core genes include highly conserved genes with phylogenetic information, while accessory genes (shell and cloud) are a flexible part of the genome (7). The obtained core genome comprised only 3.04% of the total pangenome, while a high percentage of accessory genes (97%) was observed. With more sampled genomes of P. syringae, the number of shell and cloud genes is expected to increase (6). The rarefaction curve of core/total genes showed that the number of core genes rapidly declined until 30 genomes were added, and subsequently curve remains fairly constant, while the number of total genes in the pan-genome continues to increase almost linearly (Fig. 2B). Additionally, the number of unique genes constantly increased with more sampled genomes (Fig. 2C). The frequency of genes within a whole-genome set showed that 35,000 genes were strainspecific (Fig. 2D). P. syringae strains have formed four separate clusters on the phylogenetic tree, and the distribution of accessory genes varied among the strains (Fig. 2E). According to core genome sequence analysis, the P16 and P21 were closely related and belonged to the same cluster with four other P. syringae pv. aptata strains, all originated from sugar beet. In contrast, P. syringae pv. aptata G733 was clustered in a separate clade as other P. syringae pv. aptata strains, along with P. syringae strains similar to ANI-clade 2 (Fig. 2E), suggesting that strain G733 was misidentified as P. syringae pv. aptata. The observed extensive accessory genome revealed a high degree of variability among different P. syringae strains (Fig. 2E), which could be associated with their ability to survive in diverse ecological niches and the extensive horizontal gene transfer throughout the P. syringae complex (6). On the other hand, core genome components are less likely to undergo horizontal gene transfer, which makes them more reliable for tracking the evolutionary history (23). A high percentage of the flexible genome (93.71%) was also previously observed in the pangenome analysis of 127 P. syringae genomes (7).
Pathogenic features of P. syringae pv. aptata P16 and P21-plant colonization traits. Colonization of the phyllosphere and population fitness of the foliar plant pathogens represents critical factors for the successful development of epiphytic and endophytic life stages. To colonize (survive or infect) host plants, P. syringae has evolved numerous adaptations and features. Flagellar-dependent or -independent motility plays a crucial role in coping with variable environmental conditions and diverse interactions with plants (24). Swarming represents flagellar-mediated motility, which is required for movement on solid surfaces. It is an important trait for successfully colonizing the habitat by spreading a biofilm in different pathogenic varieties of the P. syringae complex (25). Based on the report of the presence/absence of genes from the pangenome analysis, we found that all P. syringae strains tested, including the P. syringae pv. aptata strains, contained genes putatively encoding flagella maintenance (the flh, flg, and fli genes) and the che chemotaxis genes (Fig. 3). In addition, P16 and P21 strains possess genes involved in swarming motility and twitching (flagellar-independent) motility. In particular, P16 contains genes that encode the swarming motility protein and the swarming regulation sensor protein (swrC and rssA), while only the rssA gene for swarming regulation was detected in P21 and other Ptt strains tested (Fig. 3). Interestingly, swarming motility can be regulated by different environmental factors. Swarming in P. syringae pv. syringae B728a is light-mediated via histidine kinase (LOV-HK) as a positive regulator in response to blue light and a bacteriophytochrome (Bphp1) as a negative regulator in response to red/far-red light (26). Also, a study of thermal-mediated regulation of swarming motility in PssB728a showed that swarming is suppressed at 28 to 30°C, thereby affecting optimal growth temperature (27). All tested Ptt strains have a myriad of histidine kinases, while P16 and P21 also possess Bphp1, suggesting the possibility of light regulation of swarming in P. syringae pv. aptata.
Twitching motility represents type 4 pili (T4P)-mediated (flagellar-independent) movement, primarily used to move across wet and solid leaf surfaces (28). The assembly and functionality of T4P require around 40 genes that encode the major structural proteins involved in elongation and retraction (29). In the P. syringae complex, the best-studied organism with active twitching motility is P. syringae pv. tabaci 6605 (30,31), although many other pathovars have T4P. The draft genomes of all analyzed P. syringae strains, including P. syringae pv. aptata, revealed a set of pil genes involved in the assembly, extension, attachment, and retraction of T4P and the pilT gene coding for the twitching motility protein (Fig. 3). That strongly suggests that P. syringae pv. aptata could use twitching motility to succeed in habitat colonization. Once reaching a comfortable area, bacterial pili allow adhesion and the initiation of biofilm formation, while the bdlA gene plays a key role in biofilm dispersion in P. syringae pv. aptata. It should be noted that swarming motility and T4P components are co-related with biofilm formation. As a colonization factor in plant-pathogenic bacteria, the ability to form a biofilm requires the production of different biofilm matrix exopolysaccharides (EPSs) (32). In the draft genomes of neither of the strains from this study did we find putative genes for EPSs, except in the case of several P. syringae strains, including P16, which possesses only genes (acsAC) for cellulose synthases (Fig. 3). This putative trait of the P16 strain could be linked with low virulence activity in previous greenhouse testing, because it has been shown that cellulose reduces virulence in the P. syringae pv. syringae UMAF0158 strain (33). Presumably, cellulose could play a role in slowing down the transition from the epiphytic to the endophytic phase for P. syringae strains (34).
Pathogenic features of P. syringae pv. aptata P16 and P21-plant virulence traits. Plant pathogens use various mechanisms to regulate and express a large facet of genes required for virulence, manipulating the plant immune system and producing secondary metabolites. Virulence itself requires the engagement of genes encoding proteins that have a direct impact on host cell death (through T3SS and toxin production, ice nucleation activity) or mechanisms that create a virulence-like environment (uptake of g -aminobutyric acid [GABA], alginate production). We performed a predictive machine-learningbased analysis to identify T3SS effectors in 102 P. syringae strains, including P16 and P21 strains. We were able to reveal the repertoire of 78 effectors in all strains, with 15 to 25 effectors detected in most strains (Fig. 4). The effectors in 90% of the analyzed strains studied were HopJ, HrpZ, HopAK1, AvrE1, HopAH2, HopAA1, HopI1, HrpK, HopAG1, and HopM1. In general, the P. syringae effectorome contains more than 70 effector families, of which only three, AvrE, HopM, and HopAA, are encoded in the conserved region of the canonical pathogenicity island and appear to be present in multiple strains of the P. syringae complex (35). The highest number of T3SS effectors (39 effectors) in our analysis was detected in the genome of the DC3000 strain. Only five other strains, including two Ptt strains (ICMP11935 and P21), had more than 25 effectors. On the other hand, strains isolated from environmental sources (nonplant associated) have a small effector repertoire (e.g., only three known effectors were detected in strains UB246 and CC1582).
Pseudomonas syringae pv. aptata was previously established as a pathovar with a small repertoire of T3S effector genes (36). In the comparative genomic study of 19 P. syringae pathovars, P. syringae pv. aptata strain DSM50252 contained only 12 T3S effector biosynthetic genes (36). Our analysis revealed 16 previously known effectors in DSM50252, whereas 29 effectors were detected in ICMP11935 (Fig. 4). Of all the strains compared, the effectors AvrPphF and HopAT1 were detected only in ICMP11935. Overall, analysis of all six P. syringae pv. aptata strains isolated from sugar beet revealed the presence of 32 different T3SS effectors, indicating a significantly larger repertoire of effectors than previously known. The genomes of Ptt strains, including P16 and P21 strains, contain T3SS effectors from conserved effector locus (AvrE1, HopM1, HopAA1) and 13 other effectors (HopAK, HrpZ, HopJ, HrpA, HopAH2, HopC1,  HopH1, HopZ3, HopAC1, HopI1, HopAZ1, HopBA1, and HopBF1). We detected nine coding sequences for T3SS effectors in the genome of strain P21 (HopAI1, HopAH1,  HopAG1, AvrRpm1, HrpK, HopAW1, HopAU1, HopW, and HopAV1), which were absent in the genome of strain P16 (Fig. 4). Furthermore, we identified genes for three effectors in the genome of strain P21 (AvrRpm1, HopAW1, and HopAU1), which were not detected in any other P. syringae pv. aptata strain or known to be part of P. syringae pv. aptata effectorome thus far. The AvrRpm1 detected in strain P21 matches coding sequences of effectors secreted by P. syringae pv. syringae B728a strain with identity 99.54%. This effector was present only in genomes of strains from PG02, mainly in pv. syringae. On the other hand, the effector HopAU1 was matched with the same coding sequences of P. syringae pv. theae with an identity of 99.84% and was also detected in strain CFBP2116 from PG03. The HopAW1 was found only in seven other strains of PG02. The effector HopAW1 matches with multiple different P. syringae pathovars with more than 99% identity. The AvrRpm1 has a role in the phosphorylation of RIN4 protein and suppression of PAMP-triggered immunity and effector-triggered immunity (37). HopAW1 still has an unknown biochemical function. However, it belongs to the HopAS1 family (http://www.pseudomonas-syringae.org/hop_used_names.htm), which strongly impacts the virulence of P. syringae pv. actinidae and effector-triggered immunity in Arabidopsis (38). Interestingly, the HopAS1 effector was not present in any of the strains from PG02 (Fig. 4). HopAU1 in P. syringae pv. phaseolicola is characterized as an effector involved in the late stage of infection with the main role in modifying the apoplast into a replication-permissive niche, while DhopAU1 strains couldn't reach wild-type growth level in plant apoplast (39). Additionally, the DhopAU1 showed a constant level of expression during the early stages of infection and also altered expression levels corresponding to transcriptional regulators involved in response to environmental changes and induction of virulence factors such as phytotoxins and T3SEs (21). Future investigations of the genomes or secretomes of P. syringae pv.
FIG 4 T3SS effectors repertoire detection by Effectidor analyzes. P. syringae strains were ordered as in the ANI analysis. Strains marked in yellow, gray, and orange colors belong to ANI clade 1, 2, and 3 of PG02, respectively. Other strains are representatives of PGs 13, 03, 09, 07, 04, 01, and 10. Certain effectors presence is indicated by blue color (dark blue for pv. aptata strains) and white blocks indicate its absence.
Pseudomonas syringae pv. aptata Genome Microbiology Spectrum aptata strains should complement available data about the T3S effector repertoire obtained in draft genomes and their actual secretion in apoplast-like growth conditions. We revealed 29 putative T3SS effectors for strain P21, among which 26 are already known and confirmed by BLASTp analyzes of their protein sequences listed in Fig. 4. Three putative proteins, which did not have BLAST hits with already known T3SS effectors for P. syringae complex, are pantothenate kinase, VapC endonuclease, and RAQPRD integrative conjugative protein. They could represent candidates for novel T3SS effector identification. For strain P16, predictive T3SS effector analysis showed 20 putative effectors, among which 16 effectors were confirmed by BLASTp analysis (Fig. 4). The other four proteins represent putative effectors, such as VapC endonuclease, phospholipase, pantothenate kinase, and pectate lyase. All coding sequences for putative novel effectors had high AUPRC scores in the range of 0.646 (VapC endonuclease) to 0.957 (Phospholipase). In addition, Effectidor analysis revealed approximately 30 putative novel effector-like proteins (AUPRC score above 0.5), suggesting that there is a much broader repertoire of effectors within the P. syringae complex than previously known.
All tested P. syringae strains, including P16 and P21 and other P. syringae pv. aptata strains have the inaZ gene for ice nucleation activity, the virulence sensor histidine kinase (phoQ), and virulence master regulator complexes (algU/rpoE, csrA1A2, gacA, and bvgAS) (Fig. 3). In addition, it should be noted that siderophores represent virulence factors important for the pathogen's survival in an iron-limited environment. Genes encoding the receptors for siderophores such as ferripyoverdin, ferripyochelin, ferrichrome, ferrienterobactin, ferric-anguinibactin, and ferric-pseudobactin BN8/BN7 were found in the genomes of all tested strains.
We did not detect any putative phytotoxins with the Prokka annotation. However, BLAST atlas analysis of P16 and P21 strains and reference strain B728a, a known producer of phytotoxins and closely related strains from the same phylogroup of the P. syringae complex, revealed the presence of the syrD gene encoding syringomycin in P16 and P21, with 98.24% and 97.06% identity with syrD in B728a, respectively. Genes for syringopeptin (sypA, sypB and sypC) were also detected in both Ptt strains and validated with BLAST hits against B728a with high identity (85% to 97%).
All analyzed 102 P. syringe genomes, including P. syringae pv. aptata strains possess sequences for the production of the plant hormone auxin (indole-3-acetic acid; IAA) (iaaM), and an enormous majority possess the gene for utilization of GABA (gabP) (Fig. 3). Auxin plays an important role in plants, affecting processes such as cell division, elongation, and fruit development, but overproduction and a high level of IAA can also promote disease susceptibility (40). Auxin-mediated sensitivity to disease caused by P. syringae pv. tomato DC 3000 was demonstrated to arise due to inhibition of host defenses (e.g., suppression of salicylic acid signaling network) (41). Draft genomes of the examined Ptt strains revealed coding sequences for GABA utilization (gabP) by GABA permease (Fig. 3). The plant defense mechanism of producing GABA, the most abundant amino acid in the apoplast, acts as a response to inhibiting T3SS expression (42). Utilization of GABA by GabP permease could directly attenuate the induction of plant defenses mechanisms in hosts where delivery of T3SSEs leads to host immunity responses triggered by effectors (43). Another important factor for plant disease development is motility-mediated virulence in P. syringae (44). Deletion of master regulators for swarming, swimming, or twitching motility of various P. syringae pathovars resulted in a reduction or a low level of the virulence potential (31,45). We demonstrated that draft genomes of the examined Ptt strains contain various motility coding sequences, suggesting that strong motility-mediated colonization could be an important factor affecting the virulence of P. syringae pv. aptata. This is in accordance with results obtained from the swarming motility assay, where both strains (P16 and P21) performed positive swarming motility (8).
Conclusions. Analysis of draft genomes of two P. syringae pv. aptata strains in comparison with 100 other related Ptt genomes reveals key features of this pathovar required for dissemination in the phyllosphere of sugar beet. In our previous studies performed in greenhouse conditions, the P16 and P21 strains were established as distinct regarding their pathogenic potential. Evaluation of their draft genomes clearly demonstrates the high similarity of the repertoire of putative features required for disease occurrence. However, the T3SS effector analysis revealed notable differences in the repertoire of T3SS effectors between P16 and P21 strains, which can be responsible for such huge divergences obtained in greenhouse experiments and virulent phenotypes among strains. The T3SS effector analysis provides valuable data about confirmed effector repertoire as well as putative and novel T3SS effectors, suggesting the greater role of T3SS in P. syringae pv. aptata virulence and targeting the specific proteins which could be further investigated as candidates for novel effectors.

MATERIALS AND METHODS
Library preparation and genome sequencing. Two strains of P. syringae pv. aptata, P16 and P21, were cultured on King's B medium (Titan Biotech Ltd.) and incubated at 27°C overnight, while DNA extraction for genome sequencing was performed using the Zymo BIOMICS Miniprep kit (Irvine, CA 92614, USA). Extracted DNA was sent to the CosmosID company (Rockville, MD, USA) for genome sequencing and annotation. Libraries were prepared using the Ion Xpress Plus Fragment Kit (Thermo Fisher Scientific) and quantified with Qubit (Thermo Fisher Scientific). Clonal amplification, purification, and library loading on an Ion 540 chip (Thermo Fisher Scientific) were performed using the Ion Chef System together with the Ion 540 Kit-Chef (Thermo Fisher Scientific). Sequencing with read lengths of 200 bp was carried on the Ion S5 XL System (Thermo Fisher Scientific). The genome sequences of P. syringae pv. aptata P16 and P21 were made publicly available at the NCBI GenBank (Acc. No. JAHCZG000000000.1 and JAHDTA000000000.1).
Genome annotations. Raw single-end reads were trimmed and processed using a BBDuk from BBtools v. 36.49 with a reading quality trimming parameter of 20 for isolates (46). The trimmed fastQ files were assembled using SPAdes v. 3.9.0 with the -careful parameter, while all other parameters were default (47). The resulting contigs were filtered for length using reformat from BBtools v. 36.49 and only contigs that were at least 500 bp were retained. The genome size of P16 and P21 was calculated by counting k-mer frequency of the raw read data and compared with the raw read data of other P. syringae strains available from NCBI. The Jellyfish v. 2.2.6 tool was used for counting of k-mers (48). CheckM's v. 1.0.13 was used with default parameters to generate a genome bin plot (49). The genome-wide annotation was performed with Prokka, which uses the Prodigal tool (50) to identify the coordinates of candidate genes and for the prediction of coding sequences, compares them with large databases with known sequences: NCBI1 blastp, UniProt, RefSeq and a series of hidden Markov model (HMM) profile databases, including Pfam and TIGRFAMs (51). The HMM is performed using hmmscan from the HMMER 3.1 package (52). The CDS were classified into different groups based on their roles in the cell, with reference to orthologous groups (COGs; http://www.ncbi.nlm.nih.gov/COG/). Lastly, for the identification of antimicrobial resistance (AMR) genes and virulence factors (VFs), the assembled genomes were screened against the Resfinder (53), AMR and VFDB (54). AMR and VF genes were considered present if their sequences matched with the assembled genome at .90% nucleotide identity and .60% alignment coverage of the gene's sequence length.
Comparative genomics. The genome sequences were compared to other genomes of strains/pathovars from the P. syringae species complex ( Table 1). The used strains belong mostly to the PG02, closely related to P. syringae pv. aptata and consisting of ubiquitous and virulent strains with a great potential to cause plant disease epidemics. Besides strains from PG02, the representative strains of PGs 01, 03, 04, 07, 09, 10, and 13 were also included in the comparative analysis. For the ANI analysis, statistics were first calculated for all 102 genomes using the assembly-stats (https://github.com/sanger-pathogens/assembly -stats) package, and three genomes (strain code: NZIPFR-VIR1, WSPS007, and DSM50252) were excluded that had a value of N50 , 10 kbp, which was the necessary minimum for FastANI analysis. The genome comparisons were made between all pairwise combinations of strains using FastANI (22), and all strain pairs were tested using the "many to many" method in FastANI and by using the "-matrix" option. Earlier studies have shown that an ANI value of 94% to 96% corresponds to the recommended DNA-DNA hybridization species cutoff of 70% (55). ANIclustermap v1.2.0 (https://github.com/moshi4/ANIclustermap) was implemented to visualize the results.
Pangenome analysis was performed using Roary v3.13.0 (56) with previously annotated selected 102 genomes with Prokka annotation as input files in GFF format. We used all default parameters (56), except the parameters group limit which is increased to 60,000 clusters. The results were visualized using the script roary_plots.py. and create_pan_genome_plots.R software of the Roary package. The genes were classified as the core if found in .99% of genomes, while genes present in 95% to 99%, 15% to 95%, and less than 15% were considered soft-core, shell, and cloud genes, respectively. The approximately maximum likelihood phylogenetic tree was built using the core_genome_alignment.aln file in FastTree 2.1 software (57). The pan-genome was constructed from inputted genome assemblies and then determined the gene set present in each genome assembly. That gene set (output file: gene_ presence_absence) was used for reporting results about gene differences involved in pathogenic features among P16 and P21 strains and all other P. syringae strains. Genes encoding T3SSEs were excluded from this analysis and were subjected to independent analysis. Phyton (v. 3.8.10) was used for image generation in the Jupyter Notebook tool. BLAST atlases were generated by Gview server (https://server .gview.ca, [58]) by carrying out genome-wide blastn searches in order to compare P16 and P21 to the genome of reference strain B728a (GenBank accession no. NC_007005). Homologous genes on each genome reporting a BLAST hit above the threshold cut-off (80% identity, minimum HSP length of 100 bp, and expected value of 1e 210 ) were considered a valid match.
Type III secretion effectors identification. The type III secretion system effectors for all 102 genomes are predicted and identified using the Effectidor web server (https://effectidor.tau.ac.il/) prediction pipeline, which represents a new machine-learning-based prediction tool (59). The input file represented genome annotations, and the effector input file consisted of an annotated T3SS effector from genome annotations, which trained the pipeline for T3SS effector detection. A total of 51 obligatory features were extracted from a mandatory input file, including all the bacterium DNA ORF sequences in a FASTA format. These features include the GC content, protein length, relative frequencies of amino acids in the full protein and the N-terminal region, homology to known T3Es in other bacteria and the analyzed strain, etc. The area under the precision-recall curve (AUPRC) was used as a scoring method, while results were obtained using 0.5 as a threshold for identifying effectors. AUPRC value is sensitivity and specificity value that machine learning software utilizes to accurately classify components of a set into two precisely divided categories and indicates if the component represents T3S effector or not. The AUPRC score reflects inference precision averaging over all possible cutoffs. The AUPRC of Effectidor on these data were 1.0. Predicted effectors were subsequently blasted using NCBI BLASTp tool (https://blast.ncbi.nlm.nih.gov/Blast.cgi) to validate their detection.