A new Multi Locus Variable Number of Tandem Repeat Analysis Scheme for epidemiological surveillance of Xanthomonas vasicola pv. musacearum, the plant pathogen causing bacterial wilt on banana and enset

Xanthomonas vasicola pv. musacearum (Xvm), which causes Xanthomonas wilt (XW) on banana (Musa acuminata x balbisiana) and enset (Ensete ventricosum), is closely related to the species Xanthomonas vasicola that contains the pathovars vasculorum (Xvv) and holcicola (Xvh), respectively pathogenic to sugarcane and sorghum. Xvm is considered a monomorphic bacterium whose intra-pathovar diversity remains poorly understood. With the sudden emergence of Xvm within East and Central Africa, coupled with the unknown origin of one of the two sublineages suggested for Xvm, attention has shifted to adapting technologies that focus on identifying the origin and distribution of the genetic diversity within this pathogen. Although microbiological and conventional molecular diagnostics have been useful in pathogen identification, recent advances have ushered in an era of genomic epidemiology that aids in characterizing monomorphic pathogens. To unravel the origin and pathways of the recent emergence of XW in Eastern and Central Africa, there was a need for a genotyping tool adapted for molecular epidemiology. Multi-Locus Variable Number of Tandem Repeat Analysis (MLVA) is able to resolve the evolutionary patterns and invasion routes of a pathogen. In this study, we identified microsatellite loci from nine published Xvm genome sequences. Of the 36 detected microsatellite loci, 21 were selected for primer design and 19 were determined to be highly typeable, specific, reproducible and polymorphic, with two to four alleles per locus on a sub-collection. The 19 markers were multiplexed and applied to genotype 335 Xvm strains isolated from seven countries over several years. The microsatellite markers grouped the Xvm collection into three clusters: two similar to the SNP-based sublineages 1 and 2, and a new cluster 3, revealing a previously unknown diversity in Ethiopia.
Five of the 19 markers had alleles present in both Xvm and Xanthomonas vasicola pathovars holcicola and vasculorum, supporting the phylogenetic closeness of these three pathovars. Thanks to the public availability of the haplotypes on the MLVABank database, this highly reliable and polymorphic genotyping tool can be further used in a transnational surveillance network to monitor the spread and evolution of XW throughout Africa. It will inform and guide management of Xvm both in banana-based and enset-based cropping systems. Because the MLVA-19 markers are suitable for population genetic analyses, this genotyping tool will also be used in future microevolution studies.

Introduction
Investissements d'avenir » program (Labex Agro: ANR-10-LABX-0001-01), under the frame of I-SITE MUSE (ANR-16-IDEX-0006). GVN, JLFR, and EW were funded by the CGIAR Research Program on Roots, Tubers, and Banana (CRP-RTB). GVN and EW were also supported by the COST Action CA16107 EuroXanth, supported by COST (European Cooperation in Science and Technology). GVN was supported by a 2018 grant of the French Embassy in Uganda. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
MLVA is based on the detection of tandem repeat (TR) polymorphisms within the genome of an organism [9,10]. TRs with short repetitive DNA sequences, called microsatellites when smaller than a 9 bp unit, are highly variable within bacterial genomes [11,12] and may resolve the genetic diversity of monomorphic pathogens [7,13]. Variability in the number of TRs is mainly generated through slipped strand mispairing during DNA replication [14]. Short TR loci mainly evolve following either a stepwise mutation model (SMM), where new alleles are created by the addition or deletion of a single repeat unit [15], or a generalized two-phase model [16], where the distribution of the number of repeats involved in multiple-repeat mutations approximates a geometric distribution [17]. Mutation rates vary widely between TR loci as a result of locus-specific properties [18]. A high mutation rate increases the likelihood of size homoplasy of TR alleles, i.e. alleles that are identical by state (the same allele size) but not identical by descent. Size homoplasy may distort the information provided by highly variable loci, but this distortion can be minimized by increasing the number of loci and combining markers with different genetic diversity values [16,19].
In this study, we describe the development of a new genotyping method for Xvm, based on MLVA. The MLVA scheme was evaluated on a collection of 335 Xvm strains from five countries, and its discriminatory power was compared to the SNP-derived typing method. This work demonstrates the usefulness and power of the MLVA scheme targeting 19 TR loci for monitoring Xvm populations and epidemics at different temporal and geographical scales.

Bacterial strains
A total of 335 strains of Xvm from known hosts and potential alternative hosts were collected. They were sampled from Ethiopia (n = 122), Uganda (n = 150) and other parts of Eastern and Central Africa (ECA; namely DR Congo, Kenya, Rwanda and Tanzania, n = 63) (Table 1). Within this collection, 20 strains were obtained from the National Collection of Plant Pathogenic Bacteria (NCPPB) (Table 1). All 335 strains were confirmed to be Xvm using GspDm primers [38] and five new Xvm-specific primers [8]. All bacterial strains were grown on Wilbrink or YPGA medium for 48 h at 28˚C and stored in W broth/glycerol or YPGA slants at -80˚C.

DNA extraction
Strains were first grown on a carbon source-free agar medium (Yeast extract-Peptone-Agar) at 28˚C for 24 h, and DNA was extracted using the Wizard Genomic DNA Purification kit (Promega, Charbonnières-les-Bains, France). DNA quantification and quality control were performed using a Tecan Infinite 200 NanoQuant microplate reader (Tecan Trading AG, Switzerland); DNA was diluted to a final concentration of 10 ng/μL using milliQ water, and stored at −20˚C until use.

In silico identification of MLVA loci
Nine Xvm genomes (including NCPPB 2005, NCPPB 2251, and NCPPB 4434) from the National Center for Biotechnology Information (NCBI) public database were screened for VNTR loci using the online tool 'Polloc-V' (http://bioinfo-web.mpl.ird.fr/xantho/utils/) developed by Luis-Miguel Rodriguez-R and Ralf Koebnik. We used Tandem Repeats Finder (TRF) [39] as the TR detection algorithm, with the following selection criteria: full locus size from 50 to 400 bp; pattern size from 5 to 9 bp; number of repetitions set to 6 or above; stringency set to maximal values 2 (match) -7 (mismatch) -7 (indel); minimum percentage of

Definition and selection of the PCR primers
PCR primers (20 to 27 nucleotides long) were designed in the flanking regions of the tandem repeat sequence using Geneious v. 9.2. Melting temperatures were set around 68˚C to allow downstream primer multiplexing with the QIAGEN Multiplex kit. Primer design parameters were set to be stringent, to avoid the formation of primer dimers and hairpins, and the annealing temperature parameter was set at the melting temperature minus 4˚C (Tm-4˚C).
Using BLASTN [42] under Geneious, primers and their corresponding TRs were searched for within the nine Xvm reference genomes to determine the location of each locus (inter- or intragenic), and to verify that (i) each motif corresponded to a single locus per genome, and (ii) the locus-corresponding primers exactly targeted the genomic region containing the locus. Only primers fulfilling these criteria were selected for subsequent analyses. Hence, 21 loci and their corresponding primers were selected for development of the MLVA scheme.
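The single-locus criterion can be illustrated with a minimal sketch. It is not the BLASTN search used in this study: it replaces BLASTN with exact string matching on both strands, and the toy genome and primer sequences are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def count_sites(genome: str, primer: str) -> int:
    """Count exact primer matches on both strands (overlapping matches included)."""
    genome = genome.upper()
    total = 0
    # A set avoids double counting when a primer equals its reverse complement.
    for query in {primer.upper(), reverse_complement(primer)}:
        start = genome.find(query)
        while start != -1:
            total += 1
            start = genome.find(query, start + 1)
    return total


def locus_is_unique(genome: str, forward: str, reverse: str) -> bool:
    """True when each primer of the pair has exactly one binding site."""
    return count_sites(genome, forward) == 1 and count_sites(genome, reverse) == 1


# Hypothetical toy locus: left flank + six 7-bp repeats + right flank.
toy_genome = "ACGTTGCA" + "GATTACA" * 6 + "CCGGTTAA"
toy_forward = "ACGTTGCA"
toy_reverse = reverse_complement("CCGGTTAA")

unique_in_toy = locus_is_unique(toy_genome, toy_forward, toy_reverse)
# Duplicating the locus makes both primers match twice, so the check fails.
unique_in_duplicated = locus_is_unique(toy_genome * 2, toy_forward, toy_reverse)
```

Exact matching is stricter than a BLASTN search, which also reports near matches; a real check would need to account for mismatches at the primer binding sites.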

Preliminary PCR screening
This first screening assessed typeability (ability to amplify all strains of a given lineage or species), reproducibility, and polymorphism of the identified loci. These criteria were assessed on a core collection comprising 15 Xvm strains from Burundi, DR Congo, Ethiopia, Kenya, Rwanda, Tanzania and Uganda, representing the geographical distribution of the pathogen.
PCR amplifications were done using the Multiplex PCR kit (Qiagen) in a total volume of 15 μL. PCR cycles consisted of one initial denaturation (95˚C for 15 minutes) to activate the hot-start polymerase, followed by 25 cycles of denaturation at 94˚C for 30 s, annealing (55˚C to 62˚C) for 90 s, and elongation at 72˚C for 90 s; and a final elongation step at 60˚C for 30 minutes. Electrophoresis of PCR products was done on 3% agarose gel at 100 V for 45 minutes. From this screening, 19 polymorphic loci and primer pairs were retained for the following steps.

Primer multiplexing
The multiplexing consisted of three to four loci mixes per PCR reaction. Each of the multiplex PCR reactions was optimized by testing three hybridization temperatures (57, 60 and 63˚C) per mix, according to QIAGEN indications, on seven bacterial strains of the core Xvm collection. The optimal hybridization temperature was determined by visualization of amplicon intensity on 3% agarose gel electrophoresis. The combinations of the different loci in each reaction mix were chosen according to the size ranges of the PCR products, in order to avoid overlapping fragment sizes.
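As an illustration of how loci can be combined so that amplicon size ranges do not overlap within a mix, the following sketch uses a simple greedy assignment. It is an assumption-laden example and not the procedure used here: locus names and size ranges are hypothetical, and dye labels (which allow overlapping sizes within a mix) are ignored.

```python
def ranges_overlap(a, b):
    """True when two closed size intervals (min_bp, max_bp) share any value."""
    return a[0] <= b[1] and b[0] <= a[1]


def assign_pools(size_ranges, max_per_pool=4):
    """Greedily place each locus in the first pool where its range fits."""
    pools = []
    # Process loci from the smallest to the largest expected amplicon.
    for locus, rng in sorted(size_ranges.items(), key=lambda kv: kv[1]):
        for pool in pools:
            fits = all(not ranges_overlap(rng, size_ranges[other]) for other in pool)
            if len(pool) < max_per_pool and fits:
                pool.append(locus)
                break
        else:
            # No existing pool can host this locus: open a new one.
            pools.append([locus])
    return pools


# Hypothetical expected amplicon size ranges in bp.
toy_ranges = {
    "locus_A": (100, 130),
    "locus_B": (120, 150),
    "locus_C": (160, 190),
    "locus_D": (200, 240),
    "locus_E": (180, 210),
}
toy_pools = assign_pools(toy_ranges)
```

A greedy assignment does not guarantee the smallest number of mixes; it only guarantees that no two loci in the same mix have overlapping size ranges.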

Genotyping on ABI3500 capillary sequencer
The forward primer of each pair was labeled with a fluorophore: 6-FAM, blue (Eurogentec, Angers, France); VIC, green; NED, black; or PET, red (Applied Biosystems, Life Technologies, Saint Aubin, France). The labeling of the different primers was chosen according to the size and intensity of each PCR product: the NED and PET fluorophores were assigned to the smaller fragments, 6-FAM to the primers giving fragments of weak intensity, and VIC to primers giving larger fragments. Pools of four pairs of labeled primers corresponding to each locus were established (Table 3) and tested in multiplex PCR, using the Multiplex PCR kit (QIAGEN, Courtaboeuf, France) according to the manufacturer's recommendations. Reaction mixtures (15 μL) consisted of 0.2 μM of each primer (forward primer labeled with one of the fluorescent dyes 6-carboxyfluorescein (6-FAM), NED, PET, or VIC), 2X QIAGEN Multiplex MasterMix, 5X Q-solution and 2 μL of bacterial genomic DNA (10 ng/μL). PCR reactions consisted of an initial denaturation step of 15 min at 95˚C; 25 cycles of 30 s at 94˚C, annealing at either 60 or 63˚C, and 90 s at 72˚C; and a final 30 min step at 60˚C. Each PCR product was diluted 100-fold, and 1.5 μL of diluted PCR product was added to 1.5 μL Hi-Di Formamide (for GeneScan-500 LIZ) and 12 μL GeneScan-500 LIZ internal size standard (Applied Biosystems). The 100-fold dilution was chosen following preliminary tests of different PCR loading volumes, because it gave peaks of good intensity (3000-10000 fluorescence units) with no stutter peaks and only rare fluorescence saturation, which enabled accurate analysis of the electropherograms. Capillary electrophoresis was conducted on the ABI3500XL DNA Analyzer 24-channel sequencer (Applied Biosystems).
Analyses were conducted at the GenSeq technical facilities of the « Institut des Sciences de l'Evolution de Montpellier »-Labex CEMEB "Centre Méditerranéen de l'Environnement et de la Biodiversité".

Analysis of a core-collection using SNP-derived RFLP markers
A collection of 63 strains was analyzed using the MLVA-19 scheme and SNP-derived RFLP typing tools. Two sets of SNP-derived markers were used. The first set, named WAS-SNPs and targeting 500-bp loci, was adopted from Wasukira et al. [6], while the second set, named VN-SNPs and targeting loci of 200 to 600 bp, was newly designed to further characterize the Xvm population [8].

Data analysis
Data scoring. Electropherograms were analyzed with GeneMapper 4.0 (Applied Biosystems). Peaks were first automatically detected using the analysis panels we defined for each mix. Each peak was then carefully checked by eye, and false peaks (rare artefacts due to fluorescence saturation) were discarded. The reproducibility of the allelic patterns was checked by running several DNA extractions of the strains NCPPB2005 and NCPPB2251, as well as duplicate analyses of eight Ugandan Xvm DNAs. Fragment sizes obtained for each TR locus were transformed into tandem repeat numbers using a home-made script in R version 3.4.0 [43]. The tandem repeat numbers obtained were rounded up to the next integer [12]. All alleles scored by size, and their corresponding "raw" and rounded repeat numbers, are summarized in the S2 Table in Supplemental Information.
Analysis of genetic data. The typeability and specificity of each MLVA locus to Xvm, pathovars of X. vasicola, and other Xanthomonas species were evaluated by comparing the percentage of strains amplified.
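The conversion of fragment sizes into repeat numbers described under Data scoring can be sketched as follows. This is not the home-made R script itself, which is not reproduced here: it assumes that the repeat number is the fragment size minus the total flanking length, divided by the motif size, and then rounded up. The flank length, motif size and fragment sizes below are hypothetical.

```python
import math


def repeat_number(fragment_bp: float, flank_bp: float, motif_bp: int):
    """Return the (raw, rounded-up) number of tandem repeats for one allele.

    fragment_bp: fragment size measured by capillary electrophoresis
    flank_bp:    combined length of both flanking regions, primers included
    motif_bp:    length of the repeat unit
    """
    if motif_bp <= 0:
        raise ValueError("motif size must be positive")
    if fragment_bp < flank_bp:
        raise ValueError("fragment is shorter than its flanking regions")
    raw = (fragment_bp - flank_bp) / motif_bp
    # Raw values are rounded up to the next integer, as stated in the text.
    return raw, math.ceil(raw)


# Hypothetical locus with 124 bp of flanking sequence and a 7-bp motif.
exact_allele = repeat_number(187, 124, 7)      # 63 bp of repeats
partial_allele = repeat_number(190, 124, 7)    # 66 bp: ends with a partial repeat
```

Keeping both the raw and the rounded value mirrors the layout of the allele summary, where the two are reported side by side.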
Principal component analysis (PCA) was performed using the FactoMineR package [44] in R to estimate the contribution of each locus and how they account for the genetic variability described in the current Xvm collection.
We estimated the genotypic resolution of the MLVA scheme [45] and described the genotypic diversity in relation to different combinations of TR loci by a genotype accumulation curve, using the genotype_curve function of the R package poppr [46]. The curve is generated by sampling x loci randomly and counting the number of multilocus genotypes (MLG) observed. This sampling is repeated r times for each x from 1 to n-1 loci, creating n-1 distributions of observed MLGs [47].
Reconstructing evolutionary relationships across Xvm African haplotypes. Haplotype networks were constructed using the algorithm combining global optimal eBURST (goeBURST) and Euclidean distances in the Phyloviz 2 software [48]. This allowed the visualization of the different clonal complexes, i.e. groups of haplotypes linked by single-locus variants (SLVs, haplotypes differing at a single locus).
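The principle of the genotype accumulation curve can be sketched in a few lines. This is only an illustration of the sampling scheme described above, not the poppr implementation; the function names and the toy repeat-number profiles are hypothetical.

```python
import random


def count_mlgs(profiles, loci):
    """Number of distinct multilocus genotypes seen on a subset of loci."""
    return len({tuple(profile[i] for i in loci) for profile in profiles})


def genotype_curve(profiles, n_samples=100, seed=0):
    """For each x in 1..n-1, draw x loci at random n_samples times and
    record how many multilocus genotypes are distinguished each time."""
    rng = random.Random(seed)
    n_loci = len(profiles[0])
    curve = {}
    for x in range(1, n_loci):
        curve[x] = [
            count_mlgs(profiles, rng.sample(range(n_loci), x))
            for _ in range(n_samples)
        ]
    return curve


# Hypothetical repeat-number profiles: 5 strains typed at 4 loci.
toy_profiles = [
    (3, 5, 7, 2),
    (3, 5, 8, 2),
    (4, 5, 7, 2),
    (4, 6, 7, 2),
    (3, 5, 7, 2),
]
toy_curve = genotype_curve(toy_profiles, n_samples=50, seed=1)
total_mlgs = count_mlgs(toy_profiles, range(4))
```

A plateau is reached when the distributions for the largest values of x concentrate on the total number of genotypes observed with all loci.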
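The grouping of haplotypes into clonal complexes can be illustrated with a simplified sketch. It only groups haplotypes that are connected through chains of SLV links (using a union-find structure) and does not reproduce the tie-break rules that goeBURST applies when choosing which links to draw. The haplotypes are hypothetical.

```python
from itertools import combinations


def n_differences(a, b):
    """Number of loci at which two haplotypes carry different alleles."""
    return sum(x != y for x, y in zip(a, b))


def clonal_complexes(haplotypes):
    """Return (complexes, singletons) as lists of haplotype indices."""
    parent = list(range(len(haplotypes)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the groups of every pair of single-locus variants.
    for i, j in combinations(range(len(haplotypes)), 2):
        if n_differences(haplotypes[i], haplotypes[j]) == 1:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(haplotypes)):
        groups.setdefault(find(i), []).append(i)
    complexes = [g for g in groups.values() if len(g) > 1]
    singletons = [g[0] for g in groups.values() if len(g) == 1]
    return complexes, singletons


# Hypothetical haplotypes (repeat numbers at three loci).
toy_haplotypes = [(3, 5, 7), (3, 5, 8), (3, 6, 8), (9, 9, 9), (9, 9, 10), (1, 1, 1)]
toy_complexes, toy_singletons = clonal_complexes(toy_haplotypes)
```

In this toy example, haplotypes 0 and 2 differ at two loci but fall in the same complex because both are SLVs of haplotype 1.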
The mutation model followed by the MLVA molecular markers was estimated by looking at the locus variation of recently diverging haplotypes, i.e. single-locus variants (SLV) and double-locus variants (DLV), along the haplotype network of the minimum spanning tree. Furthermore, the number of TR repeats involved in the mutation event was examined to determine whether the stepwise mutation model (SMM), i.e. addition or deletion of a single repeat, was supported for these TR loci.
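The examination of repeat-number changes among SLVs can be sketched as below. This is a simplified assumption: it tallies every pair of haplotypes that differ at exactly one locus, whereas the analysis above follows the links of the haplotype network, and it leaves DLVs aside. The haplotypes are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def slv_step_sizes(haplotypes):
    """Map each locus index to the repeat-number differences seen in SLV pairs."""
    steps = defaultdict(list)
    for a, b in combinations(haplotypes, 2):
        diffs = [(i, abs(x - y)) for i, (x, y) in enumerate(zip(a, b)) if x != y]
        if len(diffs) == 1:  # the pair is a single-locus variant
            locus, step = diffs[0]
            steps[locus].append(step)
    return dict(steps)


def single_step_fraction(steps):
    """Fraction of SLV events per locus that involve exactly one repeat."""
    return {locus: sum(s == 1 for s in sizes) / len(sizes)
            for locus, sizes in steps.items()}


# Hypothetical haplotypes (repeat numbers at three loci).
toy_haps = [(3, 5, 7), (3, 5, 8), (3, 5, 10), (4, 5, 7)]
toy_steps = slv_step_sizes(toy_haps)
toy_fractions = single_step_fraction(toy_steps)
```

A locus whose fraction is close to 1 behaves as expected under the SMM, whereas a locus dominated by multi-repeat changes does not.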
Comparison of the discriminatory power and congruence of the typing techniques. MLVA-19 and SNP-derived RFLP typing techniques were compared using the Hunter and Gaston discriminatory Index (HGDI) [49].
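For reference, the HGDI is computed as D = 1 - [sum over types of n_j(n_j - 1)] / [N(N - 1)], where N is the number of strains and n_j the number of strains of type j. A minimal sketch, with hypothetical type labels:

```python
from collections import Counter


def hgdi(types):
    """Hunter-Gaston discriminatory index for a list of type assignments.

    It is the probability that two strains drawn at random without
    replacement belong to different types.
    """
    n = len(types)
    if n < 2:
        raise ValueError("at least two strains are required")
    counts = Counter(types).values()
    return 1 - sum(c * (c - 1) for c in counts) / (n * (n - 1))


# Hypothetical typing result: four strains, three types.
toy_index = hgdi(["A", "A", "B", "C"])
```

The index equals 1 when every strain has its own type and 0 when all strains share a single type.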
Distance matrices were calculated from the MLVA-19 and SNP datasets and compared using the Mantel test performed by the CADM.post function of the R package ape 5.0, with 9,999 permutations [50]. The Mantel correlation coefficients were computed on rank-transformed distance matrices.
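The logic of a Mantel test on rank-transformed distances can be sketched as follows. This is a generic permutation test written for illustration and is not the CADM.post implementation, whose details may differ; the toy matrices are hypothetical, and matrices with constant distances (zero variance) are not handled.

```python
import random
from itertools import combinations


def upper_triangle(matrix, order):
    """Upper-triangle distances after relabeling objects according to order."""
    return [matrix[order[i]][order[j]]
            for i, j in combinations(range(len(order)), 2)]


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel(d1, d2, n_perm=999, seed=0):
    """Return (rank correlation, one-sided permutation P value)."""
    rng = random.Random(seed)
    ids = list(range(len(d1)))
    x = average_ranks(upper_triangle(d1, ids))
    r_obs = pearson(x, average_ranks(upper_triangle(d2, ids)))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(n_perm):
        perm = ids[:]
        rng.shuffle(perm)  # permute rows and columns of d2 together
        if pearson(x, average_ranks(upper_triangle(d2, perm))) >= r_obs:
            hits += 1
    return r_obs, hits / (n_perm + 1)


# Hypothetical example: the second matrix is the first one scaled by two.
points = [0, 1, 3, 6, 10]
toy_d1 = [[abs(a - b) for b in points] for a in points]
toy_d2 = [[2 * v for v in row] for row in toy_d1]
toy_r, toy_p = mantel(toy_d1, toy_d2)
```

Permuting the object labels, rather than the individual distances, preserves the dependence between distances that involve the same strain.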
Genetic structure. The genetic structure of the Xvm population was assessed by Discriminant Analysis of Principal Components (DAPC) using the adegenet package for the R software [51][52][53] since DAPC is free of any assumption linked to a population genetic model (such as Hardy-Weinberg equilibrium or absence of linkage disequilibrium). The number of clusters was assessed using the function find.clusters, which runs successive k-means clustering with increasing number of clusters (k) and the optimal number of clusters selected based on lowest Bayesian information criterion (BIC) [52]. Eleven independent k-means and DAPC runs were performed to assess the stability of clustering.

Deposition to MLVABank website
The MLVA-19 allelic profiles were deposited to the MLVA website dedicated to plant bacterial pathogens http://bioinfo-web.mpl.ird.fr/MLVA_bank/Genotyping/, corresponding to the MLVAbank at http://microbesgenotyping.i2bc.paris-saclay.fr/, to make MLVA-19 data accessible in an interactive way [54]. The website allows viewing databases with sorting and clustering options, submitting queries, and sharing databases which are maintained and managed by different owners once a common agreement is achieved among partners [26].

The MLVA scheme MLVA-19 is based on 19 highly polymorphic loci that are evenly distributed on the genome
From the in silico screening of VNTR loci and corresponding primers (detailed in Materials and methods), 21 loci were selected that were unique in each genome, and whose corresponding primers were specific to the locus flanking regions.
From the initial screening of a representative core collection of Xvm strains from different countries (n = 15), 19 loci out of the 21 tested were polymorphic, with two to four alleles per locus. Loci XVM013 and XVM023 did not amplify in any strain and were therefore excluded from the downstream analyses. The primers targeting the 19 loci were multiplexed in mixes of either four or three loci (Table 3). All but one (XVM038) were considered as microsatellite loci, with motif sizes ranging from 6 to 9 nucleotides. The majority (12 of 19) consisted of 7-nucleotide repeats, and most (12 of 19) had an intergenic location (Table 4). Most loci were evenly distributed on the Xvm genome (S1 Fig). The distance between two adjacent loci ranged from 26.9 kb to 1293.7 kb, except between XVM006 and XVM036 (652 bp), and between XVM030 and XVM002 (1835 bp).
Of the 19 loci, 12 contained perfect repeat motifs (no variation of the TR sequence), among which six were interrupted at the 3' end (S1 Table). Seven loci contained imperfect repeat motifs (XVM006, XVM014, XVM016, XVM024, XVM029, XVM036, XVM038), with alternative TR sequences displaying only one mismatch from the reference sequence (mostly transitions between thymine and cytosine). Loci XVM027, XVM002, and XVM030 were the most polymorphic, whereas XVM018 and XVM038 were the least polymorphic (Table 4). There was no clear relationship between polymorphism and motif size (S2 Fig), nor with location on the genome, although the least polymorphic locus, XVM038, had the greatest motif size (12 bp) and was considered a minisatellite.

The MLVA-19 scheme allowed for good genotypic resolution within Xvm
The contribution of each MLVA marker to the scheme was determined from the Principal Component Analysis (PCA). The first three dimensions explained 66.5% of the variance, with axes 1 and 2 being the most informative (41.9% and 16.88%, respectively). Loci XVM030, XVM016, XVM024, XVM022 and XVM020 contributed most to axis 1, while XVM035, XVM018, XVM012, XVM006 and XVM036 contributed most to axis 2 (S3 Fig); loci XVM002 and XVM028 contributed most to axis 3 (S3 Table).
The genotypic resolution of the MLVA-19 scheme is represented by the genotype accumulation curve (Fig 1). Our set of loci was sufficient to accurately resolve the different haplotypes in our sample, as the curve reached a plateau with 19 loci. The curve also revealed that more than 90% of the genotypes could be detected with 16 markers; hence, the MLVA-19 scheme accurately estimates the clonal diversity of our sample (Fig 1).

Most of the mutations of our MLVA-19 loci involve single repeat events
To estimate the mutation model of the MLVA microsatellite markers, we looked at the variation of repeat numbers between recently diverging haplotypes, i.e. SLVs within the clonal complexes and also double-locus variants (DLVs); most loci were analyzed on more than five evolutionary steps. For thirteen loci (XVM002, XVM005, XVM006, XVM012, XVM014, XVM020, XVM021, XVM022, XVM024, XVM027, XVM028, XVM035, and XVM038), more than 60% of the SLVs resulted from single-repeat variations, with eight loci above 80%. XVM012 had a plurality (44.44%) of single steps, and XVM014 had 50% each of single and double repeat variations. The locus XVM015 displayed a majority of multiple-repeat variations, ranging from 2 to 13 repeats. Such events involving large numbers of repeats could result from recombination mechanisms [56]. The loci XVM016, XVM018, XVM029, XVM030 and XVM036 were not involved in SLVs or DLVs. However, for those loci, almost all repeat numbers within the allelic range were observed in our collection (S2 Table).

No locus or very few loci were amplified with other Xanthomonas species used in this study.

MLVA-19 is congruent with the SNP-RFLP based typing method but more discriminative
We estimated the congruence between the MLVA-19 scheme and the SNP-derived typing method from Wasukira et al. [6] by three complementary approaches. We first assessed the distribution of SNP sublineages and SNP-based haplotypes on the goeBURST minimum spanning tree drawn from the 53 MLVA-19 haplotypes. Two MLVA clusters were consistent with the SNP sublineages SLI and SLII described by Wasukira et al. (Fig 2). Moreover, the MLVA-19 haplotypes corresponding to SNP haplotypes 3 and 4 also clustered together.
We also performed Mantel tests between distance matrices obtained on a strain subset (n = 32) genotyped by both methods. Distance matrices (Euclidean and Manhattan distances) were significantly correlated, with respective Mantel correlation coefficients of 0.412 and 0.492 (P < 0.001 for both Euclidean and Manhattan matrices, 9999 permutations), indicating that genetic distances from MLVA-19 and SNP-based RFLPs were significantly congruent.
A collection of 63 strains was typed with both SNP-derived markers [6,57] and the MLVA-19 scheme. The discriminatory index was 0.981 for MLVA (n = 63 haplotypes) and 0.564 for the SNPs (n = 6 haplotypes).

MLVA-19 reveals epidemiological relationships between countries and is discriminative at the field scale
The MLVA-19 scheme distinguished 208 haplotypes among the Xvm collection (n = 335). Using the goeBURST algorithm, twenty-nine clonal complexes (CC), defined as groups of single-locus variants (SLVs), grouped 51.34% of the strains. The number of haplotypes per CC ranged from 2 to 11; 13 CCs grouped three or more haplotypes. Fifty-five per cent of the haplotypes (n = 114) remained as singletons, differing from each other by four to thirteen loci.
We also analyzed strains that were isolated from the same field. At this scale, the MLVA-19 scheme discriminated from 8 to 13 haplotypes per field (Ethiopia, n = 2 and Uganda, n = 1) (Table 6). Hence, the MLVA-19 scheme was discriminative enough to distinguish different haplotypes from the country scale down to the field scale.

Discussion
Understanding invasion routes, biology and epidemiology of pathogens is important to elucidate the main factors involved in the invasion process, to develop epidemiological surveillance strategies aimed at preventing new introductions, and to build pathogen-informed breeding strategies. At small scales, it is important to determine the source of outbreaks and the means of transmission to limit pathogen dispersal. In this study, we developed a highly discriminative typing tool that allowed us to elucidate the population structure and diversity of Xanthomonas vasicola pv. musacearum. The MLVA approach has become a standard in evaluating the population structure and dynamics of bacterial pathogens affecting human, animal and plant health, especially due to its ability to detect small yet significant genetic differences.

MLVA-19 scheme is well suited for molecular epidemiological analysis of Xanthomonas vasicola pv. musacearum
To our knowledge, MLVA-19 is the first scheme of this type to be developed for Xvm. This scheme consists of 19 loci evenly distributed on the Xvm genome. When choosing markers for analysis, it is important to ensure that the combination of markers selected allows for accurate discrimination of the haplotypes [58]. Principal component analysis and the genotype accumulation curve indicated that these 19 loci are sufficient to accurately discriminate the Xvm population, and thus adding more markers would not identify many new genotypes. The genotype accumulation curve assesses the power to discriminate among unique haplotypes given a random combination of loci [47]; it resolved almost all the haplotypes with 16 to 17 microsatellite markers, indicating that such a number of loci would be sufficient to detect most of the genetic diversity within Xvm. MLVA-19 is also well suited for molecular epidemiology analyses. We determined that most loci of the MLVA-19 scheme evolved by gaining or losing a single repeat at a time, supporting a stepwise mutation model. This may facilitate relatedness analyses and gene genealogies, and makes the MLVA-19 scheme useful in molecular epidemiology studies. Several authors have noted that although the SMM is considered to be the predominant mutational model for microsatellites within bacteria, precise data remain scarce on the actual mutation model and possible variations around this model within the Proteobacteria phylum [26,33,59,60].
MLVA-19 was also discriminative enough to identify and monitor different haplotypes of Xvm at the field scale, as shown in the banana and enset fields in Uganda and Ethiopia, respectively. This paves the way for future studies addressing the importance of multi-infection events and the spatial dynamics of bacterial infection within and across farm fields, comparing the spatial structures of aerial infestations and soilborne or tool-mediated infestations, among others.

MLVA-19 markers are partially transferable to other pathovars of X. vasicola
All MLVA-19 loci were amplified in Xvm, and several loci also amplified within the other pathovars of X. vasicola. Similarly, an MLVA scheme initially developed for Xanthomonas citri pv. citri has been partly and successfully adapted to a closely related pathogenic bacterium from the same species [61,62]. On the other hand, the phylogenetically closest species (X. oryzae, X. cannabis [63]) were not amplified by the MLVA-19 markers. Collectively, these findings provide further evidence for a close phylogenetic relatedness between Xvm and other pathovars of Xanthomonas vasicola.

MLVA-19 is congruent with SNP markers, while revealing an unexpected diversity
From our data, the MLVA typing scheme was much more discriminatory than the SNP typing method described by Wasukira et al. [6]. Indeed, SNP and MLVA markers differ in mutation rates and mechanisms, and follow independent evolutionary processes. SNP markers are robust phylogenetic markers, less prone than repetitive sequences to distortion via selective pressure [64], but their mutation rate is much lower than that of TRs, and as such they do not offer enough polymorphism to reveal recent evolutionary events. By contrast, MLVA loci mutate faster through the addition or subtraction of tandem repeats, producing greater levels of variation and often providing more discriminatory power per marker. Due to their different evolutionary dynamics, MLVA and SNP markers offer complementary information [65]. MLVA is considered suitable for short-term epidemiological analysis, while SNPs are suited to long-term or global epidemiological analyses [66].
MLVA-19 typing confirmed the differentiation of Xvm into two main sublineages as defined by Wasukira et al. [6]. Furthermore, MLVA also identified several homogeneous clusters within each of these sublineages, with three to four DAPC clusters per sublineage. Interestingly, some genetic groups (DAPC clusters 6, 7, 9 and 12) were not discriminated by SNP-derived markers. The phylogenetic relationship of these clusters to sublineages I and II remains to be determined, and should be clarified by additional genomic sequence analyses.

MLVA-19 as part of a hierarchical Xvm typing system
While SNPs reveal little polymorphism, their phylogenetic signal is informative as it is not disturbed by homoplastic events. The limited diversity obtained with SNP markers defined sublineages within Xvm but is not sufficient to track the strains of this genetically monomorphic pathogen during outbreaks [7]. We developed highly discriminatory TR markers that are suitable for separating Xvm isolates within populations and are congruent with SNP typing. In the future, both genotyping systems could be used together within a hierarchical typing procedure [16,67], with the SNP markers being used to define the higher evolutionary groups at the lineage level, and our MLVA-19 scheme being used for outbreak investigations, regional surveillance, and estimation of the amount and direction of gene flow.

Conclusion
We have established that the MLVA-19 scheme developed in this study is highly discriminatory from the regional scale to the field scale. This genotyping tool is thus well suited for exploring the genetic diversity of the recently emerging Xvm populations in East and Central Africa and could in future help address evolutionary and ecological questions that are important for deciphering the epidemiology of Xanthomonas wilt on banana, including the reconstruction of Xvm invasion routes throughout Africa. With the MLVA profiles deposited in MLVA Bank (http://bioinfo-web.mpl.ird.fr/MLVA_bank/Genotyping/), it will be possible to share data from new outbreaks or new emerging situations and compare them to the known genetic diversity of Xvm for epidemiological investigations. This portable and simple genotyping tool can also be used in the future to assist the regional deployment of new Xvm-resistant banana and enset parental lines.

Supporting information
S1 Table. MLVA-19 alleles in X. vasicola pv. musacearum: Summary of allele sizes and correspondence with raw number of repeats and rounded number of repeats. The alleles shared with other pathovars of X. vasicola are indicated in the column "shared with". Xvv: X. vasicola pv. vasculorum; Xvh: X. vasicola pv. holcicola. a: Raw repeat numbers were rounded to the next integer, following Pourcel [12].