Tracking the dissemination of Erwinia amylovora in the Eurasian continent using a PCR targeted on the duplication of a single CRISPR spacer

Fire blight is the most devastating disease affecting pome fruit production globally. The pathogen is native to North America and was imported to western Europe in the 1950s, progressively spreading over the continent in the ensuing decades. Previous phylogenetic studies have revealed the extreme genetic homogeneity of the pathogen outside its center of origin, which makes epidemiological studies difficult. These are generally only possible using hypervariable regions of the genome such as those represented by CRISPRs (Clustered Regularly Interspaced Short Palindromic Repeats), which are, however, not practical to sequence due to their size and variability. Here, we present a simple PCR assay targeting the duplication of a single CRISPR spacer in Erwinia amylovora that was found to be an important marker to discriminate between two main European populations of the pathogen. We implemented the assay on a total of 582 isolates to follow the spread of fire blight across the continent over several decades and, wherever possible, within single countries. The results obtained point to the occurrence of two major separate introduction events for E. amylovora in Europe that occurred approximately 20 years apart, and confirmed the existence of two principal distribution areas located in Northeastern Europe and the Eastern Mediterranean Basin from which the pathogen moved on to colonize the Eurasian continent.


Background
Fire blight, caused by the enterobacterium Erwinia amylovora, was first described on apple trees in the Hudson Valley (State of New York) in the 1790's. This severe disease, which affects plants belonging to the family Rosaceae, is thought to have originated in North America, moved to New Zealand, where it was first detected in 1910, and arrived in the United Kingdom in 1958, before sequentially spreading to virtually every country of continental Europe and in the Mediterranean area during the second part of the twentieth century (Bonn and van der Zwet 2000). In the first two decades of the new millennium, fire blight has progressively moved eastward reaching Russia (Jock et al. 2013), the Caucasus region (Gaganidze et al. 2018(Gaganidze et al. , 2021, Central Asia (Djaimurzina et al. 2014;Doolotkeldieva et al. 2021) and South Korea (Park et al. 2017). Phylogeographic studies based on the diversity of CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) regions have revealed that the extreme genetic homogeneity of the pathogen outside its center of origin is largely due to a bottleneck effect, resulting from a restricted number of dissemination events of so-called CRISPR group I strains radiating from the East Coast of the United States McGhee and Sundin 2012) or possibly even introduced from New Zealand (Bühlmann 2015). Isolates from this "Widely Prevalent" clade (Zeng et al. 2018;Parcey et al. 2020) infect plants of the subfamily Amygdaloideae (formerly Spiraeoideae) and show sequence identities at genomic level exceeding 99.99% (Smits et al. 2010;Mann et al. 2013). Under these circumstances, it is clear that distinguishing different strains to make further epidemiological statements is only possible on the basis of hypervariable regions of the genome such as those represented by CRISPRs, which are however cumbersome to sequence by traditional methods due to their size often exceeding 2 kb .
Together with Cas (CRISPR associated) proteins, CRISPRs build a mechanism that provides a form of acquired immunity against foreign nucleic acids such as phages or plasmids via the integration of part of their sequence as a spacer into the array. RNA transcribed from the spacer sequence helps Cas proteins to recognize and cut foreign pathogenic DNA (Barrangou et al. 2007;Sorek et al. 2008). E. amylovora contains two seemingly poorly active CRISPR repeat regions (CRR1 and CRR2) with their associated cas gene cluster, alongside with a third region (CRR4) that is apparently a relic of a second lost CRISPR/Cas system, which is present in other pathoadapted Erwinia species like Erwinia pyrifoliae, Erwinia tasmaniensis and Erwinia piriflorinigrans Smits et al. 2011Smits et al. , 2013. In E. amylovora, CRR1 and CRR2 consist of variable spacer sequences, which are mostly 32 bp long and separated by multiple 29-bp palindromic direct repeat sequences (Additional file 1: Figure S1a). An appealing feature of CRRs for population genetics is that their comparative analysis simplifies the determination of the chronological succession of a set of strains, as new spacers are exclusively inserted in a polar manner at the 5′ end of the cluster next to the leader sequence (Barrangou et al. 2007). Consequently, strains that display recent incorporation of new spacers at the 5'end or show deletions/ duplications in the internal region of the array cannot be ancestral (Additional file 1: Figure S1b).
After our first publication on the topic ) we subsequently added an increasing number of E. amylovora genomes to our database, which were obtained from strains of cosmopolitan origin Bühlmann 2015;Zeng et al. 2018;Parcey et al. 2020). Initial analysis of the extracted CRR1 sequences revealed some intriguing temporal dynamics within its spacer pattern (Fig. 1), which corroborated previous minisatellite and single nucleotide polymorphism data suggesting that two E. amylovora populations arrived in Europe via discrete introduction events (Bühlmann 2015;Bühlmann et al. 2014). Subsequent diversification of these two founder populations then took place over the following decades during the spread of the disease within the continent, to the Mediterranean area and to Asia. In this work, we developed a simple PCR assay targeting the spacer duplication in CRR1 of E. amylovora that was found to be a discriminating element of these two archetypal European genotypes, and used the assay to follow the spread of the disease across the Eurasian continent over several decades and, wherever possible, within single countries.

CRR1 genotypes in E. amylovora genomes
Analysis of the genomes of E. amylovora strains from Europe, the Mediterranean area, Asia, North America and Oceania revealed the presence of 16 distinct genotypes in CRR1 (Fig. 1). Based on the aforementioned process of spacer acquisition and deletion in CRRs (Additional file 1: Figure S1a), these genotypes were divided into two main discrete groups, dependent on the presence or absence of the duplication of spacer 1029. The first group contains archetypal genotype A in which spacer 1029 is duplicated, together with eight derived genotypes resulting from the deletion of single or multiple internal spacers or, in the case of genotype F, from the polar addition of spacer 1036 at the 5′ end of the array. The second group contains archetypal genotype D, holding a single copy of spacer 1029, and four derived genotypes originating from the deletion of single or multiple internal spacers within the array. These two groups were named "A-derived" and "D-derived", respectively, and can easily be distinguished using the PCR assay designed around spacer 1029 duplication site. It is not possible to define which of the two archetypal genotypes is ancestral without additional evidence, as the loss and duplication of a single spacer are events that are equally plausible. Nevertheless, the presence of both genotypes at the center of origin of the disease (McGhee and Sundin 2012; Tancos and Cox 2016) as well as in México ) and in Zealand McGhee and Sundin 2012) reinforces previous observations based on molecular clock data that the split occurred prior to their arrival in Europe (Bühlmann 2015;Bühlmann et al. 2014). Two further genotypes (3 and Γ) could not be assigned to either of the above groups, because they displayed deletion events involving either the whole region surrounding spacer 1029 or part thereof and could thus potentially have originated from any of the two aforementioned archetypal genotypes. This third group, herein defined as the "null PCR genotype", is expected to result in a negative PCR amplification using primers C1f04/C1r09 as one or both primer annealing sites are missing.

Temporal distribution of the CRR1 genotypes
Analysis of the available genomes revealed that CFBP 1252, the oldest available European strain isolated from the first outbreak in the United Kingdom in 1958, belonged to CRR1 genotype A (Additional file 2: Table  S1). This CRR1 genotype remained the only one to be detected in Europe until 1979, when genotype D first appeared in the strain Ea 1/79 from Germany. In the following years, fire blight spread to an increasing number of countries in Europe and in the Mediterranean area. Nonetheless, A and D persisted as the only two CRR1 genotypes until 1997, when the first A-derived genotype (F) was detected in strain Ea263 in Israel (Fig. 1). The latter strain was also the first in which a genotype variation could also be observed in CRR2 (a ➔d) , confirming the overall stability of both CRR1 and CRR2 up to that timepoint. The first Dderived genotype (W) was detected in strain FAW612, isolated in 2001 from Malus sp. in Switzerland. Thereafter, the number and type of A-and D-derived CRR1 genotypes increased noticeably, with some strains displaying novel genotypes generated by internal spacer deletions also in CRR2 Bühlmann 2015). Based on these data, we hypothesize that two founder populations of E. amylovora, corresponding to the CRR genotypes Aaα and Daα, sequentially colonized Europe in the timeframe 1958 to 1979 and remained dominant until just before the turn of the century, when they started to differentiate into several derived genotypes as the disease moved to an increasing number of countries in Western Europe and beyond. Since the populations derived from the two primal genotypes differ only by the above described duplication of spacer 1029, we propose herein a PCR assay that can be implemented as a useful approach to quickly track the spatio-temporal Fig. 1 Spacer composition of the CRR1 of sequenced cosmopolitan strains of E. amylovora. Numbering of the spacers is coherent with the one proposed earlier , with the first digit corresponding to the number of the CRR followed by a three-digit number that increases from the most ancient to the more recent spacers (3′ to 5′ direction in the DNA). Sixteen different genotypes were recorded, which could be divided into three different groups (A-derived, D-derived, other) depending on the archetypal genotype from which they are probably originating. Spacer 1029 is highlighted in red, while the position of the C1f04/C1r09 primers developed to verify its number of copies via PCR is indicated by the arrows. It is worth to note that even if the Y genotype clearly evolutionarily belongs to the A-derived group, it would result in no amplification in the PCR assay due to the absence of spacer 1030 that is the annealing site of primer C1f04. For the purpose of our study, this genotype and related strain ACW65131 were nevertheless assigned to the A-derived genotypes to which they clearly belong. One typical representative for each genotype, including geographical origin and year of isolation, is listed hereafter. propagation of A-and D-derived genotypes outside its center of origin of E. amylovora in North America.

PCR detection of A-and D-derived genotypes
The functionality of the PCR targeting the duplication of spacer 1029 was validated in a number of strains with known CRR1 sequence whereby amplicons of the expected size were consistently obtained (Fig. 2). The test was thus applied to a larger number of isolates covering 31 countries across Europe, Asia and the Mediterranean area (Table 1), and the results were evaluated both on a continent-wide scale as well as, if the number of isolates and their diversity were sufficient, within the single countries. In the latter case, the timeline of the appearance of the two ancestral CRR1 genotypes and, if available, data about their geographical distribution were analyzed in detail and correlated with the first appearance of fire blight in the country according to the EPPO database (https://gd.eppo.int/taxon/ERW IAM/ distribution). Whenever possible, PCR results were integrated with available genome or literature data to extend geographical and temporal coverage of the study.

Distribution of archetypal A-and D-genotypes in Europe
PCR and genome analyses confirmed the prevalence of archetypal genotype A in northwestern Europe (United Kingdom, France, The Netherlands, Ireland) from the first appearance of fire blight in the UK in 1958 until 1979, when strain Ea 1/79 carrying archetypal genotype D was isolated from Germany (Additional file 2: Table  S2). Nonetheless, archetypal genotype A remained predominant in the investigated strains until 1986 when, notwithstanding the geographical variations among more or less genetically homogenous countries, both genotypes started to be detected with similar overall frequency in Europe. However, our PCR approach does not allow to directly distinguish between the two archetypal genotypes and their corresponding derivatives as it can be done with DNA sequencing. As the appearance of the null PCR genotype in strain IVIA 1626 from Spain predated the earliest genome-based internal deletion in CRR1 from 2001 (strain FAW612 carrying CRR1 genotype W) (Bühlmann 2015), we can now assign the emergence in Europe of a derived genotype to at least 1996. Geographical distribution of European isolates spanning from 1958 to 2019 ( Fig. 3 and Additional file 2: Table S3) showed the pervasiveness of archetypal genotype D and its derivatives in countries of northeastern Europe such as the Czech Republic, Poland, the Russian exclave of Kaliningrad, Latvia, Lithuania and Germany, where it virtually represented (apart from solitary exceptions in few countries) all analyzed isolates. In all other countries of Europe, both archetypal genotypes were present, whereby the dominant genotype and its incidence varied from country to country. Archetypal genotype D was slightly prevalent in central Europe, where this genotype was detected in 61 and 78% of the Austrian and Swiss isolates, respectively. On the other hand, archetypal genotype A was more widespread in the Netherlands (65%), in southern European countries like Spain (75%) and Turkey (63%), or in South-East Europe, where it was cumulatively found in 57% of the isolates from Serbia and Montenegro. Serbia was the only European country where a significant fraction of the isolates (34%) displayed the null PCR genotype. Fig. 2 PCR amplification of the genomic region containing spacer 1029 in different strains of E. amylovora using primers C1f04 and C1r09. Strains bearing two copies of spacer 1029 (A-derived genotype, blue) yielded a 276 bp amplicon, while those bearing a single copy (D-derived genotype, green) resulted in a shorter band at 215 bp. Strain Tk119, which displays an extensive deletion at the 5′ end of the array that includes both primer annealing sites, did not yield (as expected) any amplification. One of the four additional Turkish isolates with unknown genotypes tested (Tk3) revealed a D-derived genotype, while two others (Tk11 and Tk86) could be assigned to the A-derived genotype. A fourth isolate (Tk2) displayed a larger band on gel that indicates the generation of a longer PCR product. Sanger sequencing of the purified 337-bp amplicon revealed the presence of a third copy of spacer 1029, a result that is compatible with the estimated size of the band on gel. CFBP 3098 and CFBP 1430 were used as positive controls for D-and A-derived genotypes, respectively. PCR-grade water was used as the negative non-template control (NTC) Table 1 Origin, number and year of isolation of E. amylovora isolates analyzed in this project. The list consists of both isolates received as inactivated cells and tested by PCR alone, including some literature strains (da Costa 2020), as well as fully sequenced strains whose genome data were already available (Bühlmann 2015)

Distribution of archetypal A-and D-genotypes in the Mediterranean area, Russia, and Asia
A large fraction of isolates with the null PCR genotype was found as well in Israel (46%), which revealed the largest diversity in the 1029 marker among all countries analyzed so far. However, roughly identical percentages of the archetypal genotypes A and D (26 and 27%, respectively) were found (Fig. 4). The latter two genotypes were also found in nearby Lebanon, where the two isolates from 1998 hint to a similar situation concerning the genetic diversity in the area. The situation depicted by the Russian isolates showed a discrete distribution of the two founder genotypes in the country, with those displaying archetypal genotype D being prevalent in the central part of European Russia, in the region included between the basins of the rivers Don and Volga, while the Caucasus region in the southern part of the country was dominated by archetypal genotype A (Additional file 3: Figure S2). On the other hand, archetypal genotype A was virtually universal in all other Asian countries analyzed over the respective examined timeframes: Iran (2004)(2005)(2006)(2007)(2008)(2009)(2010), Georgia (2018), Kazakhstan and Kyrgyzstan (2011-2019). The only exception was a single genome-sequenced isolate from the year 2010 in Iran (Ea2) displaying the CRR1 З genotype, which would result in a null genotype in the PCR due to the lack of the C1f04 primer annealing site on spacer 1030 (Fig. 1).

Spatio-temporal analysis of archetypal genotype distribution in selected countries
Five countries with different fire blight history and a sufficient number of isolates were selected for a detailed chronological and geographical analysis of the distribution of E. amylovora.

Austria
A total of 118 isolates were analyzed for Austria, making it the country with the most comprehensive spatiotemporal dataset. E. amylovora was first detected in Austria in 1993 in the western federal state of Vorarlberg. Although the original foci were eradicated by 1998, new ones appeared in the same region (Keck 2004). During the first 4 years only the A-derived genotype was present in Vorarlberg, with sporadic isolates belonging  Figure  S3). This confirms that the initial eradication effort in the country was generally successful but was probably negated by a second introduction event (Keck et al. 2002;Ruppitsch et al. 2004) possibly from nearby Switzerland and Germany, countries where the Dderived genotype was prevalent. Indeed, after 1998, the latter genotype became widespread in most of Austria with the A-derived genotype playing a significant role only in the southeastern part of the country, in the federal states of Carinthia, Styria, and Burgenland. The distribution of the two primal genotypes of E. amylovora within Austria hints to the existence of two distinct geographical impact zones, with the northwestern federal states being mainly affected by the archetypal genotype D that is typical of northern Europe, while the southeastern part of the country was rather influenced by archetypal genotype A that is prevalent in southern Europe and in the Mediterranean area.

Serbia and Montenegro
Fire blight was first recorded in Serbia and in Montenegro in 1989and 1996, respectively (Balaz et al. 2013. Forty-two isolates from both countries spanning from 1998 to 2017 were analyzed in this work. From 1998 to 2013, exclusively the A-derived and null PCR genotypes were detected in our sample set, suggesting that until then these were the predominant variants. Geographical distribution of the two genotypes pointed to a more uniform population both in Montenegro as well as in central and southern Serbia, where only the A-derived genotype was present, compared to the Fig. 4 Prevalence of archetypal genotypes A (blue) and D (green) in the Mediterranean Area, Russia, and Asia in the timeframe 1982-2019. Pie charts denote the number of isolates of each genotype recovered in each country. If only one genotype was detected, the number of corresponding isolates is indicated between parentheses after the country code. Countries are colored according to the predominant archetypal genotype if this was retrieved in a fraction exceeding 90%, else they are shown in petrol blue. The null PCR genotype is indicated in red northern region around Belgrade, where also the null PCR genotype could be detected (Fig. 6). These results confirmed earlier results obtained by restriction analysis of genomic DNA and pulsed-field gel electrophoresis (PFGE), in which strains from central and southern Serbia were found to be more genetically homogeneous compared to those isolated in the northern part of the country, and support earlier assumptions that E. amylovora was initially introduced to Serbia from two separate directions (Ivanović et al. 2012). The widespread presence of the A-derived genotype in Serbia and in Montenegro is consistent with the presence of the same genotype in other Mediterranean countries such as Greece and Turkey, supporting earlier hypothesis of fire blight pathogen spread from south to north of the Balkan peninsula (Bonn and van der Zwet 2000). Only starting from 2015 the D-derived genotype was detected in a total of five samples isolated in the northwestern part of Serbia. In the same year, the null PCR genotype appeared in Montenegro along to the already present Aderived genotype. Both these findings point to successive introductions events in two countries. On the other side, the absence of the null PCR genotype in the nearby countries suggests that international plant trade is the major source of its introduction. Large number of fruit producers and nurseries in Serbia import pomaceous planting material from countries where null PCR genotype was detected.

Spain
The introduction of fire blight in Spain occurred in the north of the country (Basque region) in 1995 (López et al. 1999) and was characterized by the A-derived genotype (Fig. 7). In the following year, with the first detection of the null PCR genotype in strain IVIA 1626 retrieved in the surroundings of Pamplona (Navarra), the currently earliest known occurrence in Europe of a genotype not belonging directly to the CRR1 A-or Dgenotype was established. Up to this discovery, this distinction belonged to strain Ea263 carrying the A-derived CRR1 genotype F that was isolated in 1997 in Israel   (Fig. 1). The first occurrence of the D-genotype in Spain can be dated to 1997 with plasmid-less strain UPN527 (Llop et al. 2006;Mann et al. 2013) from the Navarra province. In the following years, the A-derived genotype remained prevalent throughout the country, although the D-derived genotype was regularly detected in the northeast of the country in the provinces of Navarra, Saragossa, and La Rioja. Despite a relative short timeframe of 11 years and the relatively small subset of isolates considered, the analysis of the Spanish isolates is of special interest because, for most of the isolates, data from previous PFGE analyses (Jock et al. 2002;Donat et al. 2007) were available, allowing a direct comparison of the two approaches. The results obtained (Additional file 2: Table S2) confirmed that CRISPR-based genotyping does not correlate with PFGE-derived Pt grouping used to outline geographic distribution patterns for E. amylovora (Bühlmann 2015). Nonetheless, we can confirm previous hypotheses suggesting multiple waves of the disease, with archetypal genotype A being introduced first, probably by natural spread or trade from nearby France, United Kingdom or Belgium (Jock et al. 2002), and followed then by archetypal genotype D through import of latently infected plant material from northeastern or central Europe (Donat et al. 2007).

Israel
Our dataset included 95 isolates of E. amylovora, which was first detected in Israel in 1985 (Shabi et al. 1990), mostly from the northern part of the country where the apple and pear production is concentrated (Dafny-Yelin et al. 2021) (Fig. 8). However, the exiguous number of datapoints available in the first 25 years of its fire blight history proved to be an important limitation for drawing definite conclusions about the succession of genotypes entering the country. The earliest strain analyzed, CFBP 3098, dates back to 1987 and was determined to have a D genotype, but the isolates collected in the 1990's already displayed an A or an A-derived genotype. Furthermore, in the same period, two 1997 isolates from nearby Lebanon (LebA3 and LebB66) already showed diverging primal genotypes . This suggests that, by the time we were able to include a sufficient number of isolates to the investigation, the genetic landscape in Israel was already determined by a substantial diversity. This is reflected by the analysis of the isolates retrieved between 2007 and 2018 that resulted in no clear spatio-temporal prevalence of any of the three genotypes in northern Israel.  (Jock et al. 2002;Donat et al. 2007) Discussion Different molecular approaches have been evaluated throughout the years to analyze the diversity of E. amylovora in Europe, including PFGE (Jock et al. 2002;Jock et al. 2013), rep-PCR (Rico et al. 2008), Multiple-Locus VNTR (variable number of tandem repeats) Analysis (MLVA) , full-genome single nucleotide polymorphism (SNP) (Bühlmann 2015) or complete sequencing of CRISPR regions , with the latter two that have been proven to be efficient genotyping methods in genetically homogenous species. SNP analysis previously indicated the occurrence of two major separate introduction events for E. amylovora in Europe that occurred approximately 20 years apart (Bühlmann 2015). The first lineage could be traced back to founding strains CFBP 1232 and CFBP 1252 and was first detected in 1958 in the United Kingdom before colonizing mainland Europe and the Middle East (Lebanon, Israel, and Egypt). SNP analysis suggested that this lineage then moved on from there to Iran, Turkey, Moldavia and southern Russia. The second lineage made its first appearance in 1979 in Germany with archetypal strain Ea 1/79 and encompassed isolates from central-eastern Europe (Germany, Switzerland, Poland), and also isolates from Israel and Portugal. These results are in accordance with the data based on sequencing analysis of the CRISPR regions obtained previously ) and confirmed herein. These two founder genotypes denominated Aaα and Daα can be distinguished by the presence of either two or one copies of spacer 1029 in CRR1, respectively. Both genotypes remained stable and predominant until the mid-1990s, when the first differentiation into the corresponding A-and D-derived genotypes was observed ( Fig.  1 and Additional file 2: Table S2). The existence of this specific marker in CRR1 allows to assign European E. amylovora isolates to one of the founder genotypes or its derivatives, hence enabling to understand how these two original lineages have moved across Europe only by performing a simple PCR assay and without having to resort to sequencing the entire CRRs or enumerating all SNPs in the genome. PCR data from isolates included in previous studies from Spain (Donat et al. 2007) or the rest of Europe (Jock et al. 2002;Jock et al. 2013) confirmed the lack of correlation that was previously observed between PFGE patterns and CRISPR genotyping , as three out of four of the investigated Pt groups (Pt1, Pt2 and Pt4) were found to contain both archetypal CRR1 genotypes. The only exception was group Pt3 that contained exclusively isolates belonging to the A-derived genotype, but this observation could be biased by their geographical relatedness that is compatible with a single population that moved from France to Spain between the 1970's and the 1990's (Additional file 2: Table S2). Despite the fact that no parallels could be drawn between the two genotyping methods, the dispersal routes hypothesized using the two approaches are comparable and suggest the presence of two main distribution areas located in Northeastern Europe and the Eastern Mediterranean Basin from which the pathogen moved to colonize Europe (Jock et al. 2002;Jock et al. 2013).
Provided a sufficient spatio-temporal distribution of the isolates to be analyzed, our simple PCR approach was able to discriminate distinct invasion patterns also within single countries such as in the case of Spain, Serbia/Montenegro or Austria, where it was possible to detect the chronological succession of the different disease waves and their likely geographical origin. The Austrian dataset had 10 isolates overlapping with a previous study (Ruppitsch et al. 2004) that targeted VNTR-A ), a short sequence repeat (SSR) region within the PstI fragment of the pEa29 plasmid, and was suggestive of a second discrete introduction of E. amylovora in the country after the apparently successful eradication effort of the first outbreak in the western federal state of Vorarlberg. Our approach indicates that these two waves correspond to the lineages represented by archetypal genotypes A and D, as supported by the correlation between our PCR results and the number of VNTR-A repeats, which was lower in Austrian isolates displaying archetypal genotype D (repeat range: 3-7) with respect to those belonging to archetypal genotype A (repeat range: 12-14) (Additional file 2: Table S2). This statement could not, however, be generalized to isolates sampled all over Europe , for which the ranges of repeats in VNTR-A between the two archetypal genotypes were largely overlapping (Additional file 2: Table S4). This suggests that only one subpopulation belonging to archetypal genotype A, characterized by a high number of VNTR-A repeats, was responsible for the initial second introduction wave in Austria. Among the six VNTR regions analyzed by Bühlmann et al. (2014), only VNTR-B showed a significant correlation with the two archetypal European CRR1 populations, with A-and D-derived genotypes exhibiting three and two repeats, respectively, in almost the totality of the isolates considered , the only notable exception being strain LebA3 from Lebanon. Despite displaying largely overlapping repeat ranges for most of the VNTRs, there was a trend for a smaller variance in number of alleles in archetypal genotype D that points to higher homogeneity in the population probably due to its later introduction and consequently less time to diverge. Even if there was only a limited correlation between the different genotyping approaches, the epidemiological conclusions that could be drawn were similar, suggesting that the geographical distribution patterns across Europe have been remained constant over the years.
Overall, the PCR analysis of the region around spacer 1029 provides a single but potent genetic marker, which allows to follow the two major E. amylovora populations that are present on the Eurasian continent. Previous data indicating the occurrence of two major introduction events in Europe could herewith be confirmed (Bühlmann 2015;Bühlmann et al. 2014) and discrete invasions patterns within some single countries could also be recognized. Due to the simple nature of the approach, which is limited to the detection of a solitary CRR marker, the method has also certain limitations. A fundamental requirement is the availability of an adequate number of isolates providing a sufficient spatio-temporal coverage of an epidemic. In this perspective, early isolates from shortly after the first appearance of the disease are especially critical, since excessive mixing of the two populations prior to the first measurable data point was shown to prevent the possibility to make any reasonable hypothesis about the spread of fire blight in a certain region. The latter case is best illustrated by the example of the situation in Israel, where the relatively high data coverage in the last 10 years was not able to compensate for the paucity of data in the first 25 years of fire blight history.
Even though in this work we focused on the duplication of spacer 1029 because of its special relevance for tracking the spread of fire blight founder populations over the European and Asian continent, a similar PCR approach can be adapted to any particular CRR marker (i.e., spacer deletion or duplication) that is found to be specific for the bacterial populations of E. amylovora in a certain geographical area. Archetypal genotype A is virtually ubiquitous in the region spanning from the Caspian Sea to Central Asia (Fig. 4), a fact that makes the primers presented here of limited practical use to study the current diversity and distribution of the disease within the corresponding single countries but which still can be used as monitoring tool for detecting the possible future emergence of archetypal genotype D in the area. Moreover, the diversity present in each country within the CRRs can be assessed based on DNA sequencing of few representative isolates and this information can then be employed to design new specific primer sets flanking the corresponding variable regions. In this way, it was possible to design country-tailored PCR approaches to explore the diversity and the geographical distribution of E. amylovora in Georgia (Gaganidze et al. 2021) and in the region of Central Asia that includes Kyrgyzstan and Kazakhstan (Doolotkeldieva et al. 2021), where different subpopulations of the archetypal genotype A of the pathogen are present.

Conclusions
The analysis of the CRRs provides a useful tool to explore the diversity and distribution of E. amylovora at different geographical levels even with a relatively simple and accessible approach like PCR, which can be implemented in any basic molecular biology laboratory. Limited DNA sequence analysis may be required only in the very first phases of the investigation to assess the overall genetic diversity that might be present in the subset of isolates to be analyzed, but this task can also be replaced by an initial PCR screening using the set of primers designed for CRR1 and CRR2 that are already present in the literature ).

Methods
Bioinformatic analysis of CRISPR repeat regions in E. amylovora genomes CRISPR spacers and repeats were extracted from 82 available genomes of E. amylovora of worldwide origin (Bühlmann 2015) (Additional file 2: Table S1) using the CRISPR-Cas-Finder software (https://crisprcas.i2bc. paris-saclay.fr/) and assigned to the correct CRR based on the known sequence of the repeat (Grissa et al. 2007). Spacers were aligned manually in MS Excel and compared to identify conserved patterns among strains. Highly homologous spacers diverging from each other by no more than three bases were considered equivalent if they were found in the same order on the matching CRR. Progressive numeration of the spacers in 3′-to-5′ orientation with a four-digit code followed the same system adopted previously ).

Design of PCR assay targeting spacer 1029
A PCR assay targeting the duplication of spacer 1029 within CRR1 was developed by designing primers C1f04 (5′-CGATCAACCTGTTTTTCAGTAGGT-3′) and C1r09 (5′-CCGCCGAGACAACCGGCTATCC-3′) on adjacent spacers 1030 and 1027, respectively. This PCR is expected to produce a 276-bp or a 215-bp amplicon depending whether or not spacer 1029 is duplicated in a particular strain. Strains CFBP 1430 and CFBP 3098 were used as corresponding positive controls, respectively. The PCR reaction was optimized using a T100 Thermal Cycler (BIO RAD) in a volume of 10 μL. The reaction contained 5 μL KAPA2G Robust HotStart ReadyMix with dye (Sigma-Aldrich), 3.5 μL PCR-grade water, 0.5 μL of each primer (10 μM) and 0.5 μL of cell lysate from a 1:20-diluted overnight culture in Luria-Bertani medium. The PCR program consisted in an initial denaturation for 3 min at 95°C followed by 35 cycles of denaturation for 15 s at 95°C, annealing for 15 s at 60°C and elongation for 15 s at 72°C. A final elongation step of 1 min at 72°C was performed at the end of cycle.
A second amplification targeting the whole CRR1 with primer pair C1f01 (5′-TGAGTAGCAAATCCGTGCGT GCT-3′) and C1r01 (5′-AATCAGTCCCCCCATGCT GTGAC-3′)  was performed with all samples as control. The goal was to confirm that strains that did not show any amplification in the first PCR due to deletions in CRR1 affecting one or both annealing sites of primers C1f04 and C1r09 did indeed belong to E. amylovora. Reaction conditions were similar to the PCR above, except that the annealing temperature and the extension time were raised to 68°C and 2 min to adapt them to the primer sequences and expected amplicon length, respectively. If no amplification was obtained in both PCR assays, the sample was excluded from the analysis. For both PCRs the presence and size of amplicons were evaluated by electrophoresis on a 1.5% agarose gel.

Determination of the archetypal genotype by PCR
A total of 486 heat-inactivated isolates of E. amylovora extracted from different host plants were received from 31 European, Mediterranean and Asian countries and tested in the PCR with primers C1f04/C1r09 along with one sample each from the USA and Canada that were already present in the lab. The full list of heatinactivated isolates, including their PCR results, is available in Additional file 2: Table S2. The combined recapitulative catalog of 582 isolates, including both those analyzed by PCR as well as the 82 utilized on the basis of their genome data (Bühlmann 2015) and 12 PCR results from the literature (da Costa 2020), is presented in Table 1.
Additional file 1: Figure S1. a Schematic structure of CRR1 and CRR2 in E. amylovora. The CRR is constituted of an AT-rich leader sequence followed by identical direct repeats (DR) that are separated by unique spacers (S1-S8). The sequence of the DR is highly similar between CRR1 and CRR2 and differs only in two adjacent base pair in the central region of the 29-bp sequence . Newest spacers are incorporated immediately downstream of the AT-rich leader sequence. b Spacer arrangement and evolution of a CRISPR array in six hypothetical strains (A-F) sharing a common ancestor. Older spacers are shared among the different strains and are located at the 3' end of the array. New spacers are added in a polar fashion at the 5' end next to the leader (L) sequence. Spacers of the same color sharing the same vertical position are identical between strains. Divergence points, i.e. the addition of different set of spacers in two lineages, are represented by the change in their color. In the depicted example, B is probably the most ancient strain, since it is the one which is closest to the last common ancestor (situated on the branching point between A-C and D-F). Strain A is more recent than B as suggested by the presence of seven additional spacers at the 5' end of the array. It is also clear that strain C diverged from the lineage containing A and B before their separation into two distinct strains. On the right side of the tree the loss/duplication of two internal spacers in strain F, although close to the 3' end of the array, suggests that the latter strain has separated from strain E only after the branching point between D and E (otherwise the same deletion would have been present also in strain D).
Additional file 2: Table S1. List of 82 genomes used for initial genotyping of the CRISPR repeat regions (Bühlmann 2015). The CRR1 genotype was marked in column A and color-coded according to the affiliation to the corresponding archetypal genotype. Table S2. List of isolates analyzed in this work and result of the PCR targeting the duplication of spacer 1029 (column A). Whenever the complete genome was available (*) the CRR1 genotype ) was reported and color-coded according to the affiliation to the corresponding archetypal genotype. Table S3. Distribution of isolates belonging to the Aderived, D-derived and PCR null genotypes in the countries analyzed in this work. Table S4. Comparison of the VNTR profiles between genomesequenced isolates displaying the A-derived and the D-derived genotype . Empty cells indicate missing microsatellite.
Additional file 3: Figure S2. Approximate location and CRR1 genotype of E. amylovora isolates in the region spanning from Eastern Europe to Central Asia. Two main areas of influence are recognizable, with the Aderived genotype radiating from the region surrounding Black and Caspian Seas and the D-derived genotype moving toward Russia from Eastern Europe. It can be noted that in Russia, the region between the rivers Don and Volga apparently lies at the crossroad between the movements of the two above-mentioned primal populations.
Additional file 4: Figure S3 (a-k). Prevalence of A-derived (blue), Dderived (green) and PCR null genotype in Austria in the period 1993-2017. The number of isolates retrieved in each federal state in a certain timeframe is indicated in parentheses using the pattern (A-D-null). Federal states are colored accordingly. The interval for each slide is highlighted on the timeline, which shows the cumulative data for the whole country.