Assessment of Multilocus Sequence Analysis (MLSA) for Identification of Candidatus Liberibacter Solanacearum from Different Host Plants in Spain

Liberibacter is a bacterial group causing different diseases and disorders in plants. Among liberibacters, Candidatus Liberibacter solanaceraum (CLso) produces disorders in several species mainly within Apiaceae and Solanaceae families. CLso isolates are usually grouped in defined haplotypes according to single nucleotide polymorphisms in genes associated with ribosomal elements. In order to characterize more precisely isolates of CLso identified in potato in Spain, a Multilocus Sequence Analysis (MLSA) was applied. This methodology was validated by a complete analysis of ten housekeeping genes that showed an absence of positive selection and a nearly neutral mechanism for their evolution. Most of the analysis performed with single housekeeping genes, as well as MLSA, grouped together isolates of CLso detected in potato crops in Spain within the haplotype E, undistinguishable from those infecting carrots, parsnips or celery. Moreover, the information from these housekeeping genes was used to estimate the evolutionary divergence among the different CLso by using the concatenated sequences of the genes assayed. Data obtained on the divergence among CLso haplotypes support the hypothesis of evolutionary events connected with different hosts, in different geographic areas, and possibly associated with different vectors. Our results demonstrate the absence in Spain of CLso isolates molecularly classified as haplotypes A and B, traditionally considered causal agents of zebra chip in potato, as well as the uncertain possibility of the present haplotype to produce major disease outbreaks in potato that may depend on many factors that should be further evaluated in future works.


Introduction
The genus Liberibacter is composed by obligate parasites of multiple bacterial species identified in plants worldwide [1]. The species of the genus include the following: Ca. L. asiaticus (CLas), Ca. L. africanus (CLaf) and Ca. L. americanus (CLam), which cause Huanglongbing or HLB in citrus; Ca. L. solanacearum (CLso) which affects mainly potato, celery or carrot; Ca. L. europeaeus (CLeu) found in Rosaceae plants but considered by some authors as an endophyte rather than a plant pathogen [2] and Liberibacter crescens (Lcr) which has been identified on tropical babaco and papaya hybrid and it is the only one able to grow in axenic culture [1,3]. Besides, a new Candidatus Liberibacter species, named as Candidatus Liberibacter brunswickensis, was detected in a native Australian eggplant psyllid and was not associated to any plant disease [4].
CLso has been detected in America, New Zealand, Europe and the Mediterranean basin associated with damages in economically important crops of the Solanaceae and Apiaceae families [5]. So far, nine haplotypes have been identified for CLso based on the analysis of single nucleotide polymorphisms (SNPs) across partial sequences of 16S rRNA, intergenic spacer region (ISR) and rplJ and rplL genes. Haplotypes A, B and F were associated with zebra chip disease in potato, meanwhile, C, D and E have been described to cause habitually vegetative disorders in apiaceous plants such as carrot, celery and parsnip, with E also being recently detected in potato in Spain [6]; finally, there was haplotype U, which was identified in nettle plants [7]. In 2019, two more haplotypes were identified, haplotype G in Solanum umbelliferum, and haplotype H in Apiaceae and Polygonaceae family plants [8,9]. Each haplotype appears to be found in particular geographical areas and is transmitted by different vectors, with Bactericera cockerelli being the vector psyllid for A, B and F haplotypes, Bactericera trigonica for haplotypes D and E, Trioza apicalis and Trioza anthrisci for haplotype C and Trioza urticae for haplotype U [5,8]. So far, there is no available information regarding transmission of haplotypes G and H by psyllids.
Characterization and identification of Liberibacter types have been based on the gene of the 16S rRNA and the β operon which codes for different proteins of the large ribosomal subunit and subunits B and C of the RNA polymerase [10,11]. However, analysis of single genes may not be totally able to reproduce precise phylogenetic relationships among liberibacters which is why multiple gene-based phylogenetic approaches, such as Multilocus Sequencing Analysis (MLSA) or Multilocus Sequencing Types (MLST), have been applied in the last few years [10,12]. MLSA is based in nucleotide sequencing of fragments from protein-coding genes that evolve at a slow and constant rate and usually shows a higher resolution as compared with ribosomal gene sequence [13]. Nevertheless, MLSA may differ significantly according to the type and number of genes selected [11]. Therefore, this approach needs to be carefully designed and validated to determine its phylogenetic accuracy and its degree of congruence compared with other methodologies or biological characteristics of the bacteria.
Herein, a comprehensive study was undertaken to evaluate the phylogenetic relationship of all CLso haplotypes previously identified by ribosomal gene sequences in Spain and situate them within the Liberibacter context, including information from other species of the genus that produce plant diseases. For that purpose, we have used an MLSA approach based on genes previously utilized in MLST analysis [12] that were selected and concatenated in different combinations. In summary, the aim of the present study was to refine the understanding of the phylogenetic relationships of Liberibacter species and help to clarify those found in Spain in different host plants according to a precise methodology.

DNA Extraction and Q-PCR Analysis for Detection of CLso
Detection of CLso was conducted by quantitative real-time PCR (q-PCR) on total DNA extracted from plant leaves (carrot), tubers (potato) or true seeds samples (carrot, parsnip, tomato and pepper). For plant leaves, roots and tuber samples, approximately 0.5 g of material were used for DNA extraction. When seed samples were analysed, a pool of 100 seeds was used. The samples were ground with a steel bead for 1 min at 30 rpm using the homogenizer Retsch Mixer Mill MM-400 in 10 mL of phosphate-buffer saline (pH 7.4) supplemented with Tween 20 (0.02%). Then, 1 mL from each homogenized sample was centrifuged at 4 • C for 1 min at a low speed (1000× g) to remove the coarse particles. The supernatant was then centrifuged at 4 • C for 7 min at 10,000× g to concentrate the bacteria and the pellet used for total DNA extraction by using the DNeasy Plant Mini Kit (Qiagen, Valencia, CA, USA) following the manufacturer's instructions.
For qPCR analysis, two different combinations of specific primer and probes ( Table 2) were used for the detection of CLso [22,23]. PCRs were conducted in a 7500 Fast Real-Time PCR System (Applied Biosystems, Foster City, CA, USA) under the following conditions: an initial cycle of 10 min at 95 • C for initial denaturation followed by 40 cycles of 15s at 95 • C and 1 min at 60 • C. The reaction mix (15 µL) consisted of 7.5 µL of 2 x SensiFAST™ Probe Lo-ROX mix (Bioline Reagents Ltd., London, UK), 0.5 µM of each forward and reverse primers, 0.1 µM for the probe, and 3 µL of extracted total DNA as template. Samples of seeds, potato tubers or carrot plant leaves free of CLso or in the presence of the bacteria were used in each extraction as negative or positive controls, respectively. The efficiency of the DNA extractions was determined by qPCR on total DNA extracts by amplification of the endogenous cytochrome oxidase gene (COX) [24]. TET-ATCCAGATGCTTACGCTGG-BHQ-2 1 q-PCR probes.

Haplotype Identification and Characterization
Haplotypes from samples shown in Table 1 were identified firstly by analysis of 50S ribosomal subunit proteins L10/L12 genes (rplJ/rplL) using primers CL514F/R (Table 3) and a protocol previously described [25], and secondly by MLSA.
For MLSA, different primer pairs, determined as useful in MLST studies conducted on Liberibacter [12], were used for PCR amplification of partial sequences of the housekeeping genes adK, dnaG, fumC, grpE, icdA, metG, mutS, purA, recA, and gyrB (Table 3). DNAs from samples described in Table 1 were used as targets in PCRs performed as described before for all genes [12] except for gyrB. For gyrB, PCR amplification was performed in 50 µL volume containing 0.5 µM primers JG-gyrB1 and JG-gyrB2, 0.2 mM dNTPs, and 2 U of Taq polymerase (Promega). The amplification reaction conditions consisted of 94 • C for 1 min, 55 • C for 1 min, and 72 • C for 2 min for 40 cycles, plus an initial step of 94 • C for 5 min and a final step of 72 • C for 10 min. PCR products were visualized in 2% agarose gel containing Midori Green nucleic acid gel staining solution (Nippon GeneticsEurope, Dueren, Germany) and purified with the Wizard SV Gel and PCR Clean-up System Kit (Promega Corporation, Madison, WI, USA). PCR products were sequenced at STABVIDA (Lisbon, Portugal).
Nucleotide sequences were deposited in GenBank. Accession numbers for the partial sequences of the genes used in this study are MT847239 to MT847243 and MT864604 to MT864672.

Phylogenetic Analysis and Molecular Dating of Candidatus Liberibacter Isolates
Sequences were edited and aligned using BioEdit Sequence Alignment Editor [26]. Additionally, sequences from 50S or the housekeeping genes from different Liberibacter isolates obtained from the National Center for Biotechnology Information database (NCBI) (http://www.ncbi.nlm.nih.gov) were included in the analysis. Phylogenetic analyses from individual or concatenated genes were conducted in MEGA X [27] using maximum likelihood analysis (ML) and the Tamura-Nei model [28]. A total of 1000 bootstrap re-samplings, were generated in each analysis.
Molecular dating of Liberibacter species was estimated based on liberibacters divergence calculated with MEGA X [27] as above and calibrating the molecular clock using the time of divergence between CLas and CLaf, which was previously estimated as 147 million years (Myr) [10].

Descriptive Analysis of the Sequences
The number of polymorphic sites, mutations, nucleotide diversity, average nucleotide differences and Tajima's test were calculated separately from individual genes or concatenated sequences using DnaSP version 6.12.03 [29].

Detection of CLso from Different Samples
CLso was detected in samples of potato tubers or seeds from different plant species using two different primer-probe specific assays for qPCR. To rule out the presence of inhibitors in qPCR reactions, often present in difficult samples such as tubers and seeds, detection of the COX gene was used in each sample as an internal amplification control. Only samples that tested positive using the two specific qPCR assays for CLso were used for the subsequent sequence analysis.

50S Ribosomal Subunit Proteins L10/L12 (rplJ/rplL) Genes Sequence Analysis
Sequence analysis of the partial 50S ribosomal proteins L10/L12 genes (rplJ/rplL) confirmed that all 39 isolates from either the database or those obtained in this work belonged to Liberibacter species. Some of these sequences included those found in potato samples in Spain. The analysis showed similarities in rplJ/rplL sequences ranging from 62.69 to 99.76 (p-distance in %). The mean similarity level (±SE) on this gene among liberibacters was 91.09 ± 0.52 with a maximum of 150 nucleotide differences over a sequence of 453 nt positions. Within CLso isolates, a mean similarity of 98.48 ± 0.33 and an average nucleotide difference of 6.54 ± 1.42 were elucidated. A mean similarity of 76.52 ± 1.40 and an average nucleotide difference of 97.33 ± 5.93 were found among HLB liberibacters and no differences were shown between the two Lcr isolates studied. Moreover, the mean similarity between CLso and HLB or Lcr isolates was 76.15 ± 1.53 (100.61 ± 6.38 nt) and 66.09 ± 2.21 (145.36 ± 9.23 nt), respectively; finally, that between HLB and Lcr isolates was 65.77 ± 1.79 (144.75 ± 7.28 nt).
The phylogenetic tree constructed using the Maximum Likelihood method and the Tamura-Nei model, inferred different phylogroups corresponding to the different types of Liberibacter species or haplotypes. CLso samples were clustered together and separated from HLB species or Lcr. Besides, clear different clusters were associated within the CLso haplotypes, but with close connections between D and E or F and G haplotypes. All haplotypes detected in Spain were included in D and E groups, even those isolates detected in potato tubers that differentiated from A, B, and F haplotypes causing zebra chip or infecting potato, tomato, and pepper in other areas of the word ( Figure 1). To study in greater depth the relationship among the different CLso haplotypes, the mean similarity among these different groups was evaluated ( Table 4). The analysis showed similarities among different isolates ranging from 96.71 ± 0.85 to 99.77 ± 0.23. The lowest differences were shown between D and E, and F and G haplotypes, and the highest between H and the rest of the haplotypes, Figure 1. Phylogenetic tree of Liberibacter strains based on partial sequences of the 50S ribosomal proteins L10/L12 genes (rplJ/rplL). The tree was based on nucleotide sequences from 39 isolates and 453 nt positions. Analysis was performed using the Maximum Likelihood method and the Tamura-Nei model [28]. Bootstrap values (1000 replications) are shown at the branch points. Grey circles indicate those samples sequenced in this work. Phylogenetic analyses were conducted in MEGA X [27].
To study in greater depth the relationship among the different CLso haplotypes, the mean similarity among these different groups was evaluated ( Table 4). The analysis showed similarities among different isolates ranging from 96.71 ± 0.85 to 99.77 ± 0.23. The lowest differences were shown between D and E, and F and G haplotypes, and the highest between H and the rest of the haplotypes, as shown also in the phylogenetic tree ( Figure 1). Similarities between haplotype from potato in Spain and haplotypes A and B causing zebra chip in America were 98.84 ± 0.52 and 97.68 ± 0.01, respectively.
Evolutionary information was calculated for each housekeeping gene either in CLso alone or including Liberibacter species causing disease in citrus, and L. crescens, which is known to be taxonomically far apart from CLso or HLB isolates. No significant sequence variations were found among isolates belonging to the same Liberibacter species or haplotype as shown in an example in Figure 2; thus, a representative sequence of each group was selected for further analysis. Example of phylogenetic tree of representative Liberibacter strains based on partial sequence of adenosine kinase gene (adk). The tree was based on nucleotide sequences from 30 isolates and 208 nt positions. Analysis was performed using the Maximum Likelihood method and the Tamura-Nei model [28]. Bootstrap values (1000 replications) are shown at the branch points. Grey circles indicate those sequenced in this work. Phylogenetic analyses were conducted in MEGA X [27].
To obtain evolutionary information of each gene, different parameters were considered. Data regarding Liberibacter and CLso evolution are shown in Table 5. Negative Tajima's D values resulted as a consequence of fewer haplotypes than segregating sites. This indicates low-frequency polymorphisms and suggests scenarios of a population size expansion after a bottleneck or a selective sweep and/or purifying selection. Higher differences were found between haplotypes and segregation sites when all liberibacters were included in the analysis compared with that shown when just CLso was studied, reinforcing the idea of an absence of positive selection but a nearly neutral mechanism for the analysed housekeeping gene evolution [30]. Example of phylogenetic tree of representative Liberibacter strains based on partial sequence of adenosine kinase gene (adk). The tree was based on nucleotide sequences from 30 isolates and 208 nt positions. Analysis was performed using the Maximum Likelihood method and the Tamura-Nei model [28]. Bootstrap values (1000 replications) are shown at the branch points. Grey circles indicate those sequenced in this work. Phylogenetic analyses were conducted in MEGA X [27].
To obtain evolutionary information of each gene, different parameters were considered. Data regarding Liberibacter and CLso evolution are shown in Table 5. Negative Tajima's D values resulted as a consequence of fewer haplotypes than segregating sites. This indicates low-frequency polymorphisms and suggests scenarios of a population size expansion after a bottleneck or a selective sweep and/or purifying selection. Higher differences were found between haplotypes and segregation sites when all liberibacters were included in the analysis compared with that shown when just CLso was studied, reinforcing the idea of an absence of positive selection but a nearly neutral mechanism for the analysed housekeeping gene evolution [30].
In order to select those genes to be used in the MLSA approach, a precise study about inter-Liberibacter and intra-CLso similarity was performed.  (Figure 3a,b).   (Figure 3a,b).   -recA (H). The evolutionary history was inferred using the Maximum Likelihood method and the Tamura-Nei model [28]. Bootstrap values (1000 replications). Analysis was conducted in MEGA X [27].
In addition, a separated phylogenetic analysis was performed for each gene using sequences for CLso haplotypes A, B, C, D, and E, obtained in this work, but also including information for other liberibacters available in databases. The phylogenetic trees revealed, most of the time, similar topologies, although some variations were found among the different genes analysed. Every CLso isolate was always included in the same phylogroup and separated from isolates that cause HLB in citrus species or Lcr (Figure 4). Moreover, different species of Liberibacter, causing HLB in citrus were clearly separated; meanwhile, cluster division within CLso isolates was not as noticeable. , metG (f), mutS (g), purA (h), recA (i) and gyrB (j). Analysis was performed by using the Maximum Likelihood method and the Tamura-Nei model [28]. Bootstrap values (1000 replications) are shown at the branch points.
Overall, in comparison with the rplJ/rplL genes tree, phylogenetic trees predicted from individual housekeeping gene sequences presented congruent phylogroups. However, in some cases, the clustering was poorly resolved and weakly supported by bootstrap values and within CLso, some kind of disagreement among the trees was shown. In the case of gyrB gene, a strong discrepancy was shown in the phylogenic separation between HLB isolates and Lcr as compared with that obtained by the other genes ( Figure 4).
To overcome the possible evolutionary differences of the genes and the putative poor resolution of the single-gene analysis, an MLSA approach based on concatenated sequence analysis was performed. Sequences from the ten housekeeping genes previously studied were used in the analysis, and those showing more suitable features based on single-gene analysis were selected and combined for MLSA analysis.
Similarl to previous analysis, the phylogenetic tree inferred different phylogroups corresponding to Liberibacter species or haplotypes. CLso isolates were clustered together and separated from HLB species or Lcr. Different clusters were associated with different CLso haplotypes with higher bootstrap values that supported those obtained by rplJ/rplL.
Haplotypes D and E were apart from haplotypes C and B and less separated from A ( Figure 5). As a whole, the analysis of housekeeping genes provides a robust haplotype delineation equivalent to the ribosomal analysis but supported by different genetic features. , metG (f), mutS (g), purA (h), recA (i) and gyrB (j). Analysis was performed by using the Maximum Likelihood method and the Tamura-Nei model [28]. Bootstrap values (1000 replications) are shown at the branch points.
To overcome the possible evolutionary differences of the genes and the putative poor resolution of the single-gene analysis, an MLSA approach based on concatenated sequence analysis was performed. Sequences from the ten housekeeping genes previously studied were used in the analysis, and those showing more suitable features based on single-gene analysis were selected and combined for MLSA analysis.
Similarl to previous analysis, the phylogenetic tree inferred different phylogroups corresponding to Liberibacter species or haplotypes. CLso isolates were clustered together and separated from HLB species or Lcr. Different clusters were associated with different CLso haplotypes with higher bootstrap values that supported those obtained by rplJ/rplL.
Haplotypes D and E were apart from haplotypes C and B and less separated from A ( Figure 5). As a whole, the analysis of housekeeping genes provides a robust haplotype delineation equivalent to the ribosomal analysis but supported by different genetic features. The evolutionary history was inferred using the Maximum Likelihood method and the Tamura-Nei model [28]. Bootstrap values (1000 replications) are shown at the branch points. A total of 3967 nt positions in the final dataset were analysed in MEGA X [27].

Concatenated Sequence Analysis
In order to determine which and how many genes were needed to be used in the MLSA analysis to achieve a robust delineation of the different Liberibacter species and haplotypes, different gene combinations were evaluated. Gene selection was performed according to the similarity range previously determined for single genes (Figures 3 and 4). adk, mutS, icdA, and recA were selected for MLSA being those that showed the widest and narrowest ranges of similarity within liberibacters or CLso isolates.
The inter-Liberibacter phylogroup sequence interval similarities were 99. 13  The evolutionary history was inferred using the Maximum Likelihood method and the Tamura-Nei model [28]. Bootstrap values (1000 replications) are shown at the branch points. A total of 3967 nt positions in the final dataset were analysed in MEGA X [27].

Concatenated Sequence Analysis
In order to determine which and how many genes were needed to be used in the MLSA analysis to achieve a robust delineation of the different Liberibacter species and haplotypes, different gene combinations were evaluated. Gene selection was performed according to the similarity range previously determined for single genes (Figures 3 and 4). adk, mutS, icdA, and recA were selected for MLSA being those that showed the widest and narrowest ranges of similarity within liberibacters or CLso isolates.
The topology of the eight trees inferred from the different concatenated gene combinations was found to be quite similar among them and comparable to the one based on the ribosomal rplJ/rplL genes ( Figure 6). CLso haplotypes defined a single cluster clearly separated from those liberibacters causing HLB or Lcr. However, a more detailed analysis allowed to detect some clustering differences between CLso haplotypes D and E, detected in Spain, and the other haplotypes analysed ( Figure 6). As for individual genes, the evolutionary information from each gene combination was studied. Table 6 shows the average pairwise nucleotide diversity and the difference per site and sequence, the number of mutations and segregation sites, as well as Tajima's D values.  (Figure 3c,d).

Molecular Dating on Liberibacter Species
To infer the evolutionary dynamic of Liberibacter species a time tree was constructed based on concatenated sequence analysis of the all house-keeping genes evaluated above. Equal evolutionary rates in all lineages were assumed and the clock was calibrated to convert distance to time using the time of divergence between CLas and CLaf estimated previously as 147 Myr [10]. Using this parameter, the time of divergence between CLso E and CLso D was 6.50 Myr, between CLso D or CLso E and CLso A was 7.97 Myr, and between these three and CLso C was 8.39 Myr. Finally, the longest time was found between CLso B and the rest and was estimated at 12.57 Myr. The distance obtained between CLam and CLas was 252.20 Myr, in similar range to that obtained previously [10] (Figure 7).

Figure 7.
Timetree inferred by applying the RelTime method [31,32] to the user-supplied phylogenetic tree whose branch lengths were calculated using the Maximum Likelihood (ML) method and the Tamura-Nei substitution model [28]. The time tree was computed using one calibration constraint (147 Myr between CLas/CLaf). This analysis involved nine nucleotide sequences and BT1 of L. crescens was used as outgroup. There were a total of 3933 positions in the final dataset. Evolutionary analyses were conducted in MEGA X [27].

Discussion
Many Liberibacter species have been described as pathogens and associated as the causal agents of serious diseases such as HLB or zebra chip. However, in the last few years, other bacteria belonging to this group were described as non-pathogenic or just causing minor, unclear, or indirect damages in plants [1][2][3][4]. Among the Liberibacter species, different CLso haplotypes, generally linked to different geographic regions, have been described to cause disease in a variety of economically important crops [1]. This situation may resemble that described for the different HLB liberibacter types, considered as the result of an evolutionary divergence resulting in species diversity according to the origin [3].
Haplotype determination in CLso has been conventionally based on single nucleotide polymorphisms in genes associated with ribosomal elements. However, analysis of single genes may not reflect general phylogenetic relationships in prokaryotes. To overcome this limitation, multiple gene-based phylogeny approaches, such as MLSA or MLST, have been introduced for genomic analysis [11][12][13]. MLSA is based on nucleotide sequencing of internal fragments of protein-coding genes and subsequent phylogenetic analysis [13]. This technique generally shows higher resolution compared with ribosomal RNA sequencing, as protein-coding genes evolve at a slow and constant rate, providing better resolution power [33]. In our study, the MLSA phylogeny roughly confirmed the ribosomal gene-based grouping previously suggested [7][8][9][10]12,14,34], but revealed more precise and robust information about the genetic relationships of different Liberibacter haplotypes and species. Phylogenetic trees predicted from individual housekeeping genes in CLso showed some differences within the species. The different location of the haplotypes in the single-gene tree analysis could indicate a different evolutionary process, such as recombinant events, horizontal gene transfer, or intragenomic rearrangements [33]. These slight differences among the housekeeping genes in CLso haplotypes reveal a short genetic distance among them, similar to that shown when ribosomal sequences were analysed. The discrepancies observed in tree topologies from individual  [31,32] to the user-supplied phylogenetic tree whose branch lengths were calculated using the Maximum Likelihood (ML) method and the Tamura-Nei substitution model [28]. The time tree was computed using one calibration constraint (147 Myr between CLas/CLaf). This analysis involved nine nucleotide sequences and BT1 of L. crescens was used as outgroup. There were a total of 3933 positions in the final dataset. Evolutionary analyses were conducted in MEGA X [27].

Discussion
Many Liberibacter species have been described as pathogens and associated as the causal agents of serious diseases such as HLB or zebra chip. However, in the last few years, other bacteria belonging to this group were described as non-pathogenic or just causing minor, unclear, or indirect damages in plants [1][2][3][4]. Among the Liberibacter species, different CLso haplotypes, generally linked to different geographic regions, have been described to cause disease in a variety of economically important crops [1]. This situation may resemble that described for the different HLB liberibacter types, considered as the result of an evolutionary divergence resulting in species diversity according to the origin [3].
Haplotype determination in CLso has been conventionally based on single nucleotide polymorphisms in genes associated with ribosomal elements. However, analysis of single genes may not reflect general phylogenetic relationships in prokaryotes. To overcome this limitation, multiple gene-based phylogeny approaches, such as MLSA or MLST, have been introduced for genomic analysis [11][12][13]. MLSA is based on nucleotide sequencing of internal fragments of protein-coding genes and subsequent phylogenetic analysis [13]. This technique generally shows higher resolution compared with ribosomal RNA sequencing, as protein-coding genes evolve at a slow and constant rate, providing better resolution power [33]. In our study, the MLSA phylogeny roughly confirmed the ribosomal gene-based grouping previously suggested [7][8][9][10]12,14,34], but revealed more precise and robust information about the genetic relationships of different Liberibacter haplotypes and species. Phylogenetic trees predicted from individual housekeeping genes in CLso showed some differences within the species. The different location of the haplotypes in the single-gene tree analysis could indicate a different evolutionary process, such as recombinant events, horizontal gene transfer, or intragenomic rearrangements [33]. These slight differences among the housekeeping genes in CLso haplotypes reveal a short genetic distance among them, similar to that shown when ribosomal sequences were analysed. The discrepancies observed in tree topologies from individual genes reinforced the need to approach this study using multilocus analysis including multiple independent genes to achieve more accurate group phylogeny.
Our MLSA analysis was based on the sequence analysis of housekeeping genes used in a previous MLST study [12]. A total of ten genes were analysed and the selected combinations were approached based on two main criteria: (i) the ease of amplifying the target sequence from all the isolates, and (ii) the suitability of each housekeeping-gene chosen from an evolutionary perspective. A non-cultivable microorganism such as CLso requires PCR amplification of the target sequence from DNA extracted from plant material. Herein, after appropriate DNA extraction, all the genes selected were relatively easily amplified from infected plant material. In addition, the analysis of short sequences from PCR products was acceptable for the phylogenetic studies, and no cross reactions with other bacterial DNA were obtained after sequencing. Moreover, the low frequency of polymorphisms and the absence of positive selection of the studied genes suggest a nearly neutral mechanism for their evolution. Thus, all the selected genes showed similar features and, therefore, could be applied in an MLSA approach, as occurred in other bacterial models [35].
A gene concatenation has been defined to provide better results than a single gene analysis because it predicts intraspecific relationships more accurately [36]. Here, the concatenated genes were also chosen according to their individual taxonomic resolution inter-Liberibacter and intra-CLso. Genes presenting widest and narrowest similarities among the isolates were selected to avoid over and underestimation of the genetic distance among the liberibacters studied, and then assembled and evaluated as a single sequence. Our results demonstrated that the combinations of two or three key genes may be sufficient to reveal the precise phylogenetic position of the different liberibacters. In many cases, the information provided by these few key genes was similar to that obtained after analysis of the ten concatenated genes. The consistency in the classification of Liberibacter isolates by MLSA was given by the bootstrap analysis of the phylogenetic trees as well as by the evolutionary features of the concatenated genes.
Classification of the Liberibacter isolates is not just an academic or taxonomic issue. Many of the Liberibacter species are considered important pathogens in agriculture, and their classification can seriously affect many crops, the international trade, and the associated sectoral economy. Currently, new species of Liberibacter are being identified and it is likely that other new ones, associated or not with plant diseases, will be detected in plants or insect vectors in the near future [37]. By then, it will be essential to determine if these new isolates are plant pathogens as well as their relationship with the isolates or variants already characterized. This can be achieved with appropriate phylogenetic studies and through the use of different characters that average the evolution of different genetic features. Indeed, our work aimed to determine the phylogenetic position of those haplotypes of liberibacters detected in Spain commonly associated with apiaceous crops but able to infect potato tubers. An initial ribosomal sequence analysis grouped the CLso isolates detected in potato crops within the haplotype E, undistinguishable from those infecting carrots, parsnips, or celery. Most of the analysis performed with single genes grouped E and D haplotypes at a certain distance from A, B, and C.
It should be noted that the characterization of the haplotype is important from a molecular tool perspective, but it may not be related to bacterial pathogenicity and it is not addressed to directly identify characters involved in virulence or host range. Virulence or host range markers should be determined after genomic analysis from the available genomes or after whole genomic sequencing of isolates as they emerge [38]. So far, an MLSA approach provides valuable information regarding CLso relationship between their suggested pathogenicity in certain plant hosts and their evolution.
Taking advantage of the previously published data for HLB in citrus [10] and using the information obtained from the analysed housekeeping genes, a molecular dating for CLso was inferred. To estimate the time of splitting among the CLso isolates, the evolutionary divergence among the different isolated was calculated using the concatenated sequence of the ten genes evaluated. The clock was calibrated using the time of divergence between African and Asian liberibacters which was estimated at 147 Myr ago [10]. Shorter distances were obtained within the CLso haplotypes, with the shortest being between haplotypes D and E, and the longest between B and the rest of the haplotypes with a distance of more than double that estimated between D and E. These data on the divergence among CLso haplotypes support the hypothesis that the evolution of the different isolates maybe associated with different hosts, in different geographic areas and, probably connected to different vectors [39]. Why are some CLso haplotypes more prevalent in a particular host for example A and B for potato in America versus D and E for carrots, parsnip and celery in Europe? Are the genetic differences found in CLso haplotypes sufficient to play a key role in differentiating the host range or are other factors involved? In Spain, only haplotypes D and E were identified, both efficiently transmitted by the psyllid B. trigonica and very closely related to apiaceous plants. However, haplotype E has also been detected sporadically in potato growing areas, mainly in tubers and a few records in plants. Although the detection of haplotype E in potato in Spain is considered a type of incidental event [39], the potato plants displayed the typical symptoms attributed to the presence of haplotypes A and B in regions where they are present. Thus, the minor affectation in potato in this region could be more related to the prevalence of the vector B. trigonica, a psyllid that has shown its preferences into settle, feed, and oviposit on apiaceous plants, than with genomic variations observed in different haplotypes of CLso [39,40]. Still, another possibility is the existence of a different host range based on genomic variations between haplotypes A or B and D or E, but, so far, such association has not been identified between haplotypes [15,16,41,42]. New genomic and proteomic approaches are desirable to understand the infection processes and elucidate the factors that rule the host range and virulence of these bacteria. In Spain, these studies are addressed to evaluate the possible evolution from haplotype D to E and their transmission by vectors, which may be determining factors that can play a key role in the infection of different hosts [43].
What is clear is the absence in Spain of CLso haplotypes molecularly classified as producing the typical zebra chip disease. However, a definitive answer to the possibility that current haplotypes cause a major outbreak in potato crops will depend on ongoing studies on the transmissibility of haplotype E by an efficient vector, or by a more in-depth genomic analysis to determine differential factors that may govern the host range or virulence of the pathogen.