Core-Genome Multilocus Sequence Typing for Epidemiological and Evolutionary Analyses of Phytopathogenic Xanthomonas citri

ABSTRACT Xanthomonas citri subsp. citri is the cause of bacterial citrus canker, responsible for major economic losses to the citrus industry. X. citri subspecies and pathovars are responsible for diseases in soybean, common bean, mango, pomegranate, and cashew. X. citri disease has been tracked using several typing methods, but recent studies using genomic sequencing have been key to understanding the evolutionary relationships within the species, including fundamental differences among X. citri subsp. citri pathotypes. Here, we describe a core-genome multilocus sequence typing (cgMLST) scheme for X. citri based on 250 genomes comprising multiple examples of X. citri subsp. citri pathotypes A, A*, and Aw; X. citri subsp. malvacearum; X. citri pv. aurantifolii, pv. fuscans, pv. glycines, pv. mangiferaeindicae, pv. viticola, and pv. vignicola; and single isolates of X. citri pv. dieffenbachiae and pv. punicae. This data set included genomic sequencing of 100 novel X. citri subsp. citri isolates. cgMLST, based on 1,618 core genes across 250 genomes, is implemented at PubMLST (https://pubmlst.org/organisms/xanthomonas-citri/). GrapeTree minimum-spanning tree and Interactive Tree of Life (iTOL) neighbor-joining phylogenies generated from the cgMLST data resolved almost identical groupings of isolates to a core-genome single nucleotide polymorphism (SNP)-based neighbor-joining phylogeny. These resolved identical groupings of X. citri subsp. citri pathotypes and X. citri subspecies and pathovars. X. citri cgMLST should prove to be an increasingly valuable resource for the study of this key species of plant-pathogenic bacteria. Users can submit genomic data and associated metadata for comparison with previously characterized isolates at PubMLST to allow the rapid characterization of the local, national, and global epidemiology of these pathogens and examine evolutionary relationships. IMPORTANCE Xanthomonas citri is a plant pathogen that causes major economic losses to the citrus industry and sweet orange production in particular. Several subspecies and pathogens are recognized, with host ranges including soybean, common bean, mango, pomegranate, and cashew, among others. Recent genomic studies have shown that host-adapted X. citri subspecies and pathovars and X. citri subsp. citri pathotypes form distinct clades. In this study, we describe a core-genome multilocus sequence typing (cgMLST) scheme for this species that can rapidly and robustly discriminate among these ecologically distinct, host-adapted clades. We have established this scheme and associated databases containing genomic sequences and metadata at PubMLST, which users can interrogate with their own genome sequences to determine X. citri subspecies, pathovars, and pathotypes. X. citri cgMLST should prove to be an invaluable tool for the study of the epidemiology and evolution of this major plant pathogen.

B acterial citrus canker has a major economic impact on the production of all commercial citrus crops, including oranges, limes, tangerines, lemons, and grapefruit. Three pathotypes of canker are recognized: A, B, and C. Type A, caused by Xanthomonas citri subsp. citri, is the most widespread and economically damaging, whereas types B and C, caused by X. citri pv. aurantifolii, have much-reduced virulence on sweet orange and have very limited geographical spread (1). The type A (2, 3) pathotype has the broadest host range and infects most economically important citrus plants worldwide, particularly causing a major economic burden on the South American and Californian orange industries (2,4). Two variants of pathotype A have evolved: A*, which can cause canker on all citrus but with some isolates that can infect only key lime (Citrus aurantifolia), and A w , which infects only key lime and alemow (Citrus macrophylla) (1).
Whole-genome sequencing has greatly advanced the study of the epidemiology and evolution of pathogenic bacteria, greatly improving the discriminatory power and portability of other approaches such as ribotyping or pulsed-field gel electrophoresis (7). Genomic sequencing and analysis tools, developed primarily for the study of human bacterial pathogens to track and investigate outbreaks of disease caused by particularly virulent or antimicrobial-resistant clones, can also be usefully employed for the study of bacterial plant disease epidemiology and evolution.
Whole-genome sequences from isolates of pathogenic bacteria are usually compared using SNP (single nucleotide polymorphism)-based approaches that involve whole-genome alignments. Such SNP-based approaches have been used in recent studies of Xanthomonas citri biology (8,9); however, they involve identifying genomes of isolates from the literature and downloading their sequences, followed by the computationally intensive alignment of multiple genomes to generate SNP profiles, which are then used to produce phylogenetic trees using methods such as neighbor joining (NJ), maximum parsimony, or maximum likelihood. Core-genome multilocus sequence typing (cgMLST) uses whole-genome sequence data to examine genetic similarities between isolates. It is based on allelic variations at a large number of core-genome loci that are present in all, or nearly all, members of a species (10). It differs from other whole-genome sequencing approaches in that it does not include noncore, accessory genes in comparisons of genomes, and it examines variation in allelic profiles rather than core-genome SNPs. In addition, cgMLST is computationally efficient, scalable, and suited for the representation of very large numbers of genomic comparisons. cgMLST schemes have been established for a diverse range of human pathogens, and some schemes contain many thousands of genomes. For example, the curated, open-source database PubMLST (https://pubmlst.org/) contains genomic data and metadata for 655,340 genomes of .100 bacterial species, and the EnteroBase database (https://enterobase.warwick .ac.uk) contains 379,370 Salmonella and 237,066 Escherichia coli/Shigella genomes and corresponding metadata alone (as of 23 November 2022).
In this study, we describe a cgMLST scheme and website resource that can be used to rapidly and easily identify X. citri subsp. citri variants from genome sequences without the need for computationally intensive and time-consuming core-genome SNP extraction, genome alignment, and phylogenetic comparisons. The X. citri cgMLST database at https:// pubmlst.org/organisms/xanthomonas-citri represents an invaluable resource for tracking the spread of pathovars of this devastating pathogen, which should also prove to be a useful, scalable tool in future national and international efforts to control citrus canker and other crop diseases.

RESULTS
Genome sequencing. Assemblies from each X. citri isolate consisted of between 61 and 161 contigs, with N 50 values of between 96,324 and 1,044,915 nucleotides (nt) and an average depth of coverage of 102Â (range = 31Â to 900Â). For all isolates, more than 99% of reads were mapped to the family Xanthomonadaceae using Kraken (11).
rMLST. Ribosomal MLST (rMLST) confirmed the species designations of the 250 X. citri isolates listed in Table 1 as well as 20 other Xanthomonas spp. and the 4 other species examined and listed in Table S1 in the supplemental material. Figure 1 shows a neighbor-joining tree of all 274 genomes in this study based on the 53 concatenated rRNA gene loci used in the rMLST scheme. It can be clearly seen that all X. citri isolates form a separate and distinct clade whose closest neighbors are genomes of other X. citri pathotypes and subspecies. From this analysis, the Xanthomonas species X. vasicola and X. perforans appear to be the most closely related to X. citri, with the genomes of other xanthomonads such as X. euvesicatoria being separated by greater genetic distances. Example genomes of E. coli, Pseudomonas aeruginosa, Xylella fastidiosa, and Stenotrophomonas maltophilia are separated by even larger genetic distances from the genomes of Xanthomonas spp., including X. citri.
cgMLST. A total of 1,618 core genes (present in .99% of isolates) were found among 250 X. citri isolate genomes. These genes were numbered XCIT00001 to XCIT01618. Allele calling of the subset of the initial 100 records (from study isolates) resulted in isolates having between 99.4% and 100% of their loci with alleles designated. Core-genome MLST (cgMLST) groupings of the 250 genomes uploaded to the PubMLST website were made based on the number of allelic mismatches. This resulted in 171 groups of genomes with 5 or fewer mismatches (isolates tagged as Xc_cgc_5 on the PubMLST website), 113 with 10 or fewer mismatches (Xc_cgc_10), 53 with 50 or fewer mismatches (Xc_cgc_50), 39 with 100 or fewer mismatches (Xc_cgc_100), and 25 with 200 or fewer mismatches (Xc_cgc_200).
X. citri phylogeny. A neighbor-joining tree of concatenated MLST allelic sequences of the 250 X. citri isolates is shown in Fig. 2. This phylogeny was generated using the Interactive Tree of Life (iTOL) plug-in on the PubMLST website. It clearly distinguishes individual X. citri pathovars (colored), with the genomes of isolates belonging to the same pathovar being grouped, although some subgroupings are evident. This is most marked for X. citri pv. fuscans isolate genomes, which are represented by three clades that include isolates from previous studies by Alavi et al. (12) and Aritua et al. (13). These correspond to isolates from three lineages originally named X. citri pv. phaseoli and X. citri pv. phaseoli GL 1 and X. citri pv. phaseoli GL fuscans GL2 and GL3. Genomes from isolates of X. citri pv. aurantifolii resolve as two main clades, with the smaller group showing more genetic similarity     to members of the X. citri pv. anacardii group than to other members of X. citri pv. aurantifolii. Overall, this phylogeny displays a high degree of congruence with a neighbor-joining phylogeny based on core-genome SNPs, with the same groupings of genomes and only superficial differences in the tree structure (Fig. S1). A minimum-spanning tree generated using GrapeTree based on cgMLST allele data shows groupings identical to those made using both methods (Fig. 3).
The times taken to generate phylogenies based on core-genome SNPs using kSNP3 and MEGA 11 (36 h), concatenated allelic sequences using iTOL (15 min), and MLST allele data using GrapeTree (4 min) varied considerably. All tests were run on a 2020 3.6-GHz 10-Core Intel Core i9 iMac with 16 GB RAM.

FIG 3
Minimum-spanning tree based on 250 core-genome allelic profiles of Xanthomonas citri. Isolates/groups are colored according to pathovar/ subspecies. Pathotype A*, pathotype A w , and the divergent pathotype A clade containing five genomes, including that of isolate LH37-1, are shown. The phylogeny was generated using the GrapeTree (23) plug-in on the PubMLST website (https://pubmlst.org/). associated metadata, including 100 novel X. citri subsp. citri isolates sequenced in this study.
A neighbor-joining phylogeny based on concatenated allele sequences generated using the iTOL plug-in on the PubMLST website had a structure very similar to that of a core-genome SNP-based NJ tree but was generated within several minutes, compared to the .36 h required to generate an SNP-based phylogeny that depended on multiple-whole-genome alignment. This is important because as the number of sequenced X. citri genomes deposited in public databases increases, the computational resources required to generate SNP phylogenies de novo will become greater.
We used GrapeTree, implemented at PubMLST, to generate and display minimumspanning trees of study data, and this plug-in can quickly and clearly display phylogenies colored according to metadata such as country of origin, date, host species, pathovar, and X. citri subsp. citri pathotype. The phylogenies generated using each of the methods used here, SNP-based NJ and iTOL and GrapeTree phylogenies based on concatenated cgMLST locus sequences, were largely congruent, with very similar groupings (see Fig. S1 in the supplemental material). However, GrapeTree, with its ability to easily and quickly display very large genomic data sets such as those present in EnteroBase, is eminently scalable as data sets grow, unlike core-SNP-based methods, which are more computationally intensive and time-consuming.
Neighbor-joining phylogenies based on concatenated rRNA gene sequences were generated in this study from genomes uploaded to the PubMLST website. This tree delineated the 21 different Xanthomonas species and 4 other more distantly related ones, including E. coli and P. aeruginosa. The rRNA gene is automatically applied to PubMLST genome data and serves as a further check of species designations for uploaded genome data.
The cgMLST scheme implemented on the PubMLST website for X. citri will, we hope, be an increasingly useful tool for the study of the epidemiology and evolution of the major cause of citrus canker, X. citri subsp. citri, but should also be of benefit for the study of other plant-pathogenic X. citri subspecies and pathovars included in this study as well as those not yet included in the database.

MATERIALS AND METHODS
Bacterial isolates. A total of 101 X. citri subsp. citri isolates were obtained from Fundecitrus, Araraquara, São Paulo, Brazil, an association maintained by citrus growers and juice manufacturers from the State of São Paulo to conduct research, education, and implementation of citrus crop protection. Isolate 306, corresponding to the previously sequenced genome of strain 306 (14), was resequenced as part of this study, resulting in the sequencing of 100 novel isolates. These were sampled from citrus plants from 15 different countries and included 75 isolates from Brazil; 4 from South Korea; 3 each from Argentina and the United States; 2 each from China, New Zealand, and Paraguay; and 1 each from Australia, Fiji, France, India, Iran, Mauritius, Taiwan, Thailand, and Uruguay. One isolate's country of origin is unknown. Details of the isolates are shown in Table 1. These isolates were all pathotype A isolates from sweet orange, with the exception of two pathotype A* isolates from key lime. Study bacteria were isolated between 1979 and 2015. Data on the year of isolation were not available for 31 of the 100 isolates.
Genomic DNA sequencing. Genomic sequencing was performed by MicrobesNG (University of Birmingham) from pure culture material stabilized in DNA/RNA Shield buffer (Zymo Research, CA, USA). Genomic DNA libraries were prepared using Nextera XT library prep kits (Illumina, San Diego, CA, USA). Libraries were sequenced using Illumina sequencers (HiSeq), using a 250-bp paired-end protocol. Reads were adapter trimmed using Trimmomatic 0.30 with a sliding window quality cutoff of Q 15 (15) and scanned using Kraken (11) to confirm species identity. De novo assembly was performed on samples using SPAdes version 3.7 (16).
A further 150 X. citri genome sequences, downloaded from the European Nucleotide Archive (ENA), were included for analysis, including 65 X. citri subsp. citri isolates (comprising 12 pathotype A, 14 pathotype A*, and 10 pathotype A w isolates according to their cited literature sources), 9 X. citri pv. aurantifolii isolates, 37 X. citri pv. fuscans isolates, 12 X. citri pv. glycines isolates, 12 X. citri subsp. malvacearum isolates, 3 X. citri pv. mangiferaeindicae isolates, 3 X. citri pv. viticola isolates, 2 X. citri pv. vignicola isolates, and 1 isolate each of X. citri pv. dieffenbachiae and pv. punicae. Details of all X. citri genomes included in this study are shown in Table 1. In addition, 24 genomes representing single examples of 20 different Xanthomonas spp. and single examples of Stenotrophomonas maltophilia, Escherichia coli, Xylella fastidiosa, and Pseudomonas aeruginosa were downloaded from GenBank. Details of these isolates are shown in Table S1 in the supplemental material.
Core-genome MLST. Complete coding sequences were identified in the finished genome assembly of strain 306 (14) using Prokka (17) with default settings. These were used in Roary (18) to identify 1,618 genes found in all 250 genomes. A BIGSdb database for X. citri was set up on the PubMLST website (19), with loci being defined for each of the identified core genes and named using an XCIT prefix and a fivedigit identifier, ranging from XCIT00001 to XCIT01618. The database was seeded with the coding sequence found in strain 306 for each of these loci defined as allele 1. Allelic variants found in the 100isolate locally sequenced data set were then identified using the BIGSdb allele caller, with thresholds of 98% identity over 98% of the alignment length compared to reference alleles. A further round of allele calling using the same parameters and all previously identified alleles as references was performed, followed by manual scanning to identify more variable alleles containing small indels. The database was then expanded to include all 250 isolates and alleles identified as described above. Start codon positions were adjusted in nine loci as the codon identified in the reference genome was not found consistently across the data set, whereas an alternate consensus start codon was identified nearby. Core-genome sequence types (cgSTs) were defined automatically by BIGSdb for profiles with fewer than 50 missing loci. Single-linkage cluster schemes were set up within the database to identify related isolates using a range of locus mismatch thresholds (200, 100, 50, 25, 10, and 5 locus mismatches).
Ribosomal MLST (rMLST) (20), implemented on the PubMLST website, confirmed the species identity of all 274 study isolates. It was also used to generate concatenated rRNA gene sequences for phylogenetic analysis. As rMLST examines allelic variation at 53 universal rRNA genes, it is ideally suited for the rapid phenotypic analysis of genomes of different species.
Phylogenetic trees. In common with previous studies of X. citri evolution and epidemiology, we generated phylogenies based on core-genome SNPs using a reference genome. We used the finished genome of X. citri pv. citri strain 306 (14) as a reference and kSNP3 v3.12 (21) to generate fasta nucleotide files of SNPs, which were used in MEGA 11 (22) to generate NJ trees. Genomic DNA sequence data and their associated metadata can be analyzed using a variety of methods implemented on the PubMLST website. Here, we generated minimum-spanning trees from allelic profiles using GrapeTree (23). Neighbor-joining trees based on concatenated nucleotides of cgMLST loci were generated using Interactive Tree of Life (iTOL) (24). Both the GrapeTree and iTOL plug-ins are implemented on the PubMLST website (https://pubmlst.org/). A neighborjoining tree was constructed for the 250 X. citri study isolates and 24 other species listed in Table S1. This was generated from the 53 concatenated rRNA gene sequences used in the rMLST scheme using the iTOL plug-in as described above.

SUPPLEMENTAL MATERIAL
Supplemental material is available online only. SUPPLEMENTAL FILE 1, PDF file, 0.2 MB.