Genome-wide discovery and development of polymorphic microsatellites from Leishmania panamensis parasites circulating in central Panama

The parasite Leishmania panamensis is the main cause of leishmaniasis in Panama. The disease is largely uncontrolled, with rising incidence and no appropriate control measures. Although microsatellites are considered among the best genetic markers for studying population genetics and molecular epidemiology in this and other parasites, none had been developed for L. panamensis. Here we developed and tested a new panel of microsatellites for this species, based on high-throughput genome-wide screening. The new set consists of seventeen loci, mostly with trinucleotide or longer motifs. We evaluated the sensitivity and specificity of the panel using 27 isolates obtained from cutaneous leishmaniasis patients in central Panama, as well as several reference species from both the L. (Leishmania) and L. (Viannia) subgenera. Genetic equilibrium was assessed both within and between loci, and the reproductive mode was evaluated with several tests. The new simple sequence repeat (SSR) panel shows high polymorphism and sensitivity, as well as good specificity. The preliminary data described here for L. panamensis suggest extensive departure from Hardy-Weinberg proportions, significant linkage disequilibrium and a strong deficit of heterozygotes. Several recombination tests, based on multilocus linkage disequilibrium and a phylogenetic approach, allowed rejection of frequent recombination in our dataset. The genome-wide strategy described here proved useful for identifying and testing new polymorphic SSR loci in Leishmania. The new panel of polymorphic microsatellites is a valuable addition to the existing molecular markers for studying the genetic structure and other aspects of this important species.


Findings
In Panama, Leishmania (Viannia) panamensis is responsible for most reported clinical cases [1], causing cutaneous and mucocutaneous leishmaniasis. Previous studies have explored the genetic composition of local populations of this important parasite using restriction fragment length polymorphism (RFLP) analysis of kinetoplast DNA [2] and amplified fragment length polymorphism (AFLP) [3]. Because these systems have known limitations, more powerful genetic markers are needed for this species.
Full genome sequencing is increasingly used to uncover large numbers of DNA polymorphisms in many organisms, most of which are single nucleotide polymorphisms (SNPs). Although genome-wide SNP genotyping is becoming more feasible and attractive for population genetics and molecular epidemiology, microsatellites may still have advantages over SNPs owing to their high polymorphism information content and fast mutation rates [4]. Microsatellites (also called simple sequence repeats, SSRs) are still considered very useful markers for applications involving short temporal or spatial scales, or slowly evolving or clonal organisms [4][5][6].
There is growing evidence that some SSRs have low transferability among species, which limits their usefulness for interspecies studies [7,8]. Additionally, the high intraspecific genetic variability reported for species of the L. (Viannia) subgenus further underscores the need for species-specific microsatellite panels. In this study we describe, develop and characterize a novel panel of microsatellite markers specific for L. panamensis, based on genome-wide high-throughput screening (Additional file 1: Extended Methods). For this purpose we used the sequenced L. panamensis chromosomes (GenBank accession numbers CP009370 to CP009404) [9]. Ethical approval for the experiments described here was obtained from the INDICASAT AIP Institutional Review Board (Additional file 1: Extended Methods). To our knowledge, this is the first microsatellite panel developed for Leishmania panamensis, and it was identified with a genome-wide bioinformatic pipeline. We show that this new microsatellite panel is highly polymorphic and has good sensitivity and specificity, making it very promising for studying the genetic diversity of this parasite.

In silico microsatellite screening and polymorphism assessment
The SSR mining procedure we applied to the reference L. panamensis genome detected a large number of potential loci, with AC, AG, AGC, AT, ACC, AGG and GC being the most frequent motifs (Additional file 1: Extended Methods; Additional file 2: Table S1). This distribution of motif abundance differs from that found in L. braziliensis [10], although the results are not directly comparable because different search strategies were used. Of 19,297 perfect microsatellite loci, 18,921 were considered suitable for primer design and 12,871 were flagged as potentially unique sequences. Four hundred and six microsatellite loci, including di-, tri-, tetra- and pentanucleotide repeats, were evaluated for consistent amplification and polymorphism using two DNA pools, each containing DNA from 10 different L. panamensis isolates. After several selection steps, 104 potentially polymorphic loci were tested for polymorphism and consistent amplification patterns with DNA from individual isolates. All SSR loci were detected by fluorescent capillary electrophoresis using a three-primer method (Additional file 1: Extended Methods). Finally, 17 microsatellite loci were selected that showed coherent segregation of all peaks observed in the pools and low levels of stutter (Table 1 and Table 2). The new set of L. panamensis microsatellites was validated by genotyping 27 isolates obtained from cutaneous leishmaniasis patients from the central region of Panama (Additional file 1: Extended Methods).
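The in silico screening step can be illustrated with a small Python sketch that scans a sequence for perfect (uninterrupted) tandem repeats of 2-5 bp motifs and tallies the motifs found. This is not the pipeline used in the study: the minimum copy numbers in the hypothetical `MIN_REPEATS` table and the function names are assumptions, the actual search parameters and software are described in Additional file 1, and the motif-reporting convention (for example, whether AC and GT are merged into one class) may differ from the one used for Table S1.

```python
import re
from collections import Counter

# Hypothetical minimum numbers of tandem copies for each motif length k;
# the thresholds actually used in the study are given in Additional file 1.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4}


def is_compound(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    k = len(motif)
    return any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_perfect_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, copies) tuples for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for k, n in min_repeats.items():
        # A k-mer followed by at least n - 1 identical, uninterrupted copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_compound(motif):
                continue  # compound motifs are left to the scan at their true unit length
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


def motif_counts(seq, min_repeats=MIN_REPEATS):
    """Tally how often each motif (in the phase it was found) occurs."""
    return Counter(motif for _, motif, _ in find_perfect_ssrs(seq, min_repeats))


example = "GG" + "AC" * 8 + "T" + "AGC" * 6 + "TT"
print(find_perfect_ssrs(example))  # [(2, 'AC', 8), (19, 'AGC', 6)]
```

Scanning each motif length separately keeps the regular expressions simple. The trade-off is that a long run matched and discarded as compound (for example a poly-A tract at k = 2) can, in this simplified version, hide a short overlapping repeat.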
We excluded dinucleotide SSRs from the final panel to improve allele assignment and reduce stutter. We also chose perfect repeat loci to avoid errors due to ambiguous allele assignment and undefined marker evolution models [4].
None of the new microsatellite loci showed more than two clear amplification peaks, which suggests average diploidy of the corresponding chromosomes in the L. panamensis genome. This pattern of only one or two alleles per microsatellite locus has been consistently observed in every Leishmania species tested so far, including L. panamensis from Ecuador and Peru [11].
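The constraint of at most two alleles per locus, together with the stutter problem that motivated excluding dinucleotides, can be expressed as a simple peak-calling sketch. The stutter rule (a peak one repeat unit shorter than an accepted allele and below 30% of its height is treated as stutter), the size tolerance and the function name `call_genotype` are illustrative assumptions, not the scoring criteria used in this study.

```python
def call_genotype(peaks, motif_len, stutter_ratio=0.3, tol=0.5):
    """Call a diploid genotype from capillary peaks given as (size_bp, height).

    Returns a sorted 2-tuple of allele sizes (a homozygote is (a, a)),
    or None if there are no peaks (e.g. a failed or null amplification).
    """
    accepted = []
    # Visit peaks from tallest to shortest so that true alleles are accepted first.
    for size, height in sorted(peaks, key=lambda p: p[1], reverse=True):
        # Hypothetical stutter rule: one repeat unit shorter than an accepted
        # allele and lower than stutter_ratio times its height.
        is_stutter = any(
            abs(size - (a_size - motif_len)) <= tol and height < stutter_ratio * a_height
            for a_size, a_height in accepted
        )
        if not is_stutter:
            accepted.append((size, height))
    if not accepted:
        return None
    if len(accepted) > 2:
        raise ValueError(f"{len(accepted)} non-stutter peaks: more than two alleles")
    sizes = sorted(size for size, _ in accepted)
    return (sizes[0], sizes[-1])
```

For a trinucleotide locus, `call_genotype([(150, 1000), (147, 200), (156, 800), (153, 150)], 3)` returns `(150, 156)`: the two minor peaks sit one repeat unit below each allele and are discarded as stutter. Longer motifs separate stutter from true alleles by more base pairs, which is one reason such loci are easier to score.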

Sensitivity, specificity and cross-species amplification of the microsatellite panel
Eleven loci produced robust amplification signals even with very low amounts of template Leishmania DNA (down to 0.01 ng) (Additional file 2: Table S3). These results suggest that it may be possible to genotype parasites directly from infected tissues or insects with this new SSR panel. The SSR set is highly specific for Leishmania DNA, showing no cross-reactivity with human DNA or with DNA from the insect vector, Lutzomyia sp. (Additional file 2: Table S2).
In general, the new microsatellite panel showed much better cross-species amplification with other species of the L. (Viannia) subgenus than with species of L. (Leishmania). Almost all markers produced amplification products of the expected size when tested with isolates of L. guyanensis, L. braziliensis, L. peruviana and L. lainsoni (Additional file 2: Table S5). Several markers did not amplify in some of the species evaluated, indicating potential null alleles.

Genetic polymorphism and preliminary population data
Even though all L. panamensis isolates were collected from patients in a small geographic area, no repeated multilocus genotypes were detected (Additional file 2: Table S4). This result is congruent with AFLP data obtained from the same group of isolates and rules out the hypothesis of full clonality in this population [3]. The polymorphism information content ranged from 0.071 (M75) to 0.877 (M233), with a mean of 0.420 (Table 2). Markers M202 and M207 did not produce amplification products in isolates P7 and P14, respectively, indicating possible null alleles (Additional file 2: Table S4). Plotting genotypic diversity as a function of the number of loci showed that about 8-10 markers were enough to capture most of the genotypic diversity in our sample of L. panamensis isolates (Fig. 1).
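A curve like the one in Fig. 1 can be approximated by repeatedly sampling random subsets of k loci and counting how many distinct multilocus genotypes (MLGs) the isolates still form. This sketch uses the number of distinct MLGs as the diversity measure and 200 random subsets per k; both choices, the data layout and the function name `genotype_curve` are assumptions and may differ from the statistic actually plotted in Fig. 1.

```python
import random
from statistics import mean


def genotype_curve(isolates, n_reps=200, seed=1):
    """Mean number of distinct multilocus genotypes for random k-locus subsets.

    isolates: list of isolates, each a list of per-locus genotypes (e.g. allele tuples).
    Returns {k: mean number of distinct MLGs over n_reps random subsets of k loci}.
    """
    rng = random.Random(seed)
    n_loci = len(isolates[0])
    curve = {}
    for k in range(1, n_loci + 1):
        counts = []
        for _ in range(n_reps):
            loci = rng.sample(range(n_loci), k)
            # Every isolate is projected onto the same sampled loci, in the same order.
            mlgs = {tuple(ind[i] for i in loci) for ind in isolates}
            counts.append(len(mlgs))
        curve[k] = mean(counts)
    return curve
```

The number of loci at which the curve flattens (here, about 8-10 of the 17) indicates how many markers are needed before adding more no longer distinguishes additional isolates.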
Observed heterozygosity (Ho) ranged from 0 to 0.556, with a mean of 0.181, and was notably lower than the expected heterozygosity (He) at almost all loci (Table 2).
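The per-locus summaries in Table 2 follow standard formulas. The sketch below computes observed heterozygosity, one common unbiased form of expected heterozygosity, He = 2n/(2n - 1) × (1 - Σ p_i²), and the polymorphism information content of Botstein et al., PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j², where p_i are allele frequencies and n the number of genotyped isolates. It assumes diploid genotypes with failed amplifications removed; the software used for Table 2 may apply slightly different corrections, and `locus_stats` is a hypothetical name.

```python
from collections import Counter
from itertools import combinations


def locus_stats(genotypes):
    """Ho, unbiased He and PIC for one locus.

    genotypes: list of (allele1, allele2) tuples; None marks a failed
    amplification and is skipped.
    """
    genotypes = [g for g in genotypes if g is not None]
    n = len(genotypes)
    ho = sum(a != b for a, b in genotypes) / n
    counts = Counter(allele for g in genotypes for allele in g)
    two_n = 2 * n
    freqs = [c / two_n for c in counts.values()]
    sum_p2 = sum(p * p for p in freqs)
    he = two_n / (two_n - 1) * (1 - sum_p2)  # Nei's small-sample correction
    # Botstein et al. PIC: 1 - sum p_i^2 - sum_{i<j} 2 p_i^2 p_j^2
    pic = 1 - sum_p2 - sum(2 * (p * q) ** 2 for p, q in combinations(freqs, 2))
    return ho, he, pic
```

A heterozygote deficit shows up directly as Ho well below He, as observed at most loci here.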
The within-population inbreeding coefficient (FIS) was first estimated by the method of moments: all loci except M78 and M149 had positive FIS (range 0.371-1; overall FIS = 0.620, 95 % CI = 0.418-0.717). An important step in validating new microsatellite loci for population analyses is to evaluate the possible presence of null alleles, particularly when an excess of homozygotes is observed. Several methods and algorithms for detecting null alleles at microsatellite loci have been reported [12]. However, most of the commonly used methods assume Hardy-Weinberg equilibrium, an assumption that Leishmania data do not meet. Therefore, to verify whether the observed positive FIS was influenced by null alleles, we used Bayesian estimation of this coefficient and compared models with and without inbreeding in addition to null alleles. This analysis also yielded a large, significantly positive value (FIS = 0.613, 95 % CI: 0.549-0.667). The model including inbreeding (nfb) had a lower deviance information criterion (DIC = 1403) than the model considering only null alleles (nb, DIC = 1478), suggesting that inbreeding, rather than null alleles, is the main contributor to the positive FIS.
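As a rough illustration of how a multilocus FIS and its confidence interval relate to the per-locus heterozygosities, FIS can be estimated as 1 - ΣHo/ΣHe across loci, with uncertainty assessed by a percentile bootstrap over loci. This ratio-of-sums estimator is a simplification chosen for brevity: it is neither the method-of-moments estimator nor the Bayesian model described above, and the resampling scheme and function names are assumptions.

```python
import random


def multilocus_fis(ho, he):
    """Ratio-of-sums estimator across loci: FIS = 1 - sum(Ho) / sum(He)."""
    return 1 - sum(ho) / sum(he)


def bootstrap_fis_ci(ho, he, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap CI for multilocus FIS, resampling loci with replacement.

    Assumes every locus is polymorphic (He > 0), so sum(He) is never zero.
    """
    rng = random.Random(seed)
    n_loci = len(ho)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n_loci) for _ in range(n_loci)]
        stats.append(multilocus_fis([ho[i] for i in idx], [he[i] for i in idx]))
    stats.sort()
    lower = stats[int(n_boot * alpha / 2)]
    upper = stats[min(n_boot - 1, int(n_boot * (1 - alpha / 2)))]
    return lower, upper
```

Resampling loci rather than isolates treats each locus as an independent replicate of the inbreeding signal, which is the usual rationale for locus-level bootstrap intervals.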
The exact test for Hardy-Weinberg equilibrium revealed a systematic departure from these theoretical proportions, with 13 of 17 loci showing significant P values (Table 2). Pairwise linkage disequilibrium (LD) was also tested for all possible pairs of loci, and many pairs showed significant values: of 136 comparisons, where only about seven significant values would be expected by chance at α = 0.05, 40 pairs of loci had P values below the strict Bonferroni-corrected threshold (0.00036) (Additional file 2: Table S6).
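The arithmetic behind the pairwise LD correction is short: 17 loci give C(17, 2) = 136 pairs, about 0.05 × 136 ≈ 6.8 of which would be significant by chance at α = 0.05, and the Bonferroni threshold is 0.05/136 ≈ 0.000368 (given truncated as 0.00036 in the text). The helpers below, whose names are hypothetical, compute these quantities and filter a dictionary of pairwise P values against the threshold.

```python
from math import comb


def bonferroni_pairs(n_loci, alpha=0.05):
    """Number of locus pairs, expected false positives and Bonferroni threshold."""
    n_tests = comb(n_loci, 2)  # all distinct pairs of loci
    return n_tests, alpha * n_tests, alpha / n_tests


def significant_pairs(p_values, threshold):
    """Keep the locus pairs whose LD test P value is below the threshold."""
    return {pair: p for pair, p in p_values.items() if p < threshold}


n_tests, expected_by_chance, threshold = bonferroni_pairs(17)
print(n_tests, round(expected_by_chance, 1), f"{threshold:.6f}")  # 136 6.8 0.000368
```

Observing 40 pairs below a threshold this strict, against roughly seven expected at the uncorrected level, is what makes the LD signal hard to attribute to chance.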
We also assessed multilocus LD in our sample using the new microsatellites. Whereas the expected values of multilocus LD measures under panmixia are close to zero, the observed values were significantly larger (IA = 2.19 and r̄d = 0.145; both P < 0.001) (Fig. 2a and b). The phylogenetic test for recombination gave an observed parsimony tree length of 163 steps, significantly shorter (P < 0.001) than the trees generated from shuffled datasets (mean tree length: 229 steps, Fig. 3). Both approaches allow rejection of the null hypothesis of panmixia. Using a different marker system on the same group of isolates, we previously found strong signatures of LD, as independent AFLP markers generated highly correlated distance matrices between isolates [3]. Varying levels of LD have been reported in various Leishmania species using several marker systems [13][14][15]. Because of the limited sample size, we cannot rule out that additional factors influence the LD estimates reported here, such as the Wahlund effect, hidden population subdivision, cryptic species within the sampling units, recombination rate, genetic drift, mutation rates, epistasis or selection.
Fig. 1 Plot of mean genotypic diversity as a function of the number of loci
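The multilocus statistics can be sketched from their usual definitions. For every pair of isolates, let D be the sum over loci of the per-locus distances; then IA = V_D/V_E - 1, where V_D is the variance of D across all pairs and V_E = Σ var_j is the sum of the per-locus distance variances, and r̄d = (V_D - V_E) / (2 Σ_{j<k} √(var_j var_k)) standardizes this for the number of loci. The per-locus distance (number of non-shared alleles), the permutation scheme (shuffling each locus independently across isolates) and the function names are assumptions; the software used in this study may compute distances differently.

```python
import random
from collections import Counter
from itertools import combinations
from math import sqrt
from statistics import pvariance


def locus_distance(g1, g2):
    """Number of non-shared alleles between two diploid genotypes (0, 1 or 2)."""
    return 2 - sum((Counter(g1) & Counter(g2)).values())


def ia_rbard(isolates):
    """Index of association (IA) and standardized r-bar-d for multilocus genotypes.

    isolates: list of isolates, each a list of per-locus (allele1, allele2) tuples.
    Assumes complete data and at least two polymorphic loci.
    """
    n_loci = len(isolates[0])
    pairs = list(combinations(isolates, 2))
    # d[j][p]: distance at locus j for the p-th pair of isolates
    d = [[locus_distance(x[j], y[j]) for x, y in pairs] for j in range(n_loci)]
    var_j = [pvariance(col) for col in d]
    totals = [sum(col[p] for col in d) for p in range(len(pairs))]
    v_d = pvariance(totals)  # observed variance of summed distances (V_D)
    v_e = sum(var_j)  # variance expected with no association among loci (V_E)
    i_a = v_d / v_e - 1
    r_d = (v_d - v_e) / (2 * sum(sqrt(a * b) for a, b in combinations(var_j, 2)))
    return i_a, r_d


def permutation_p(isolates, n_perm=999, seed=1):
    """One-sided P value for r-bar-d, shuffling each locus independently across isolates."""
    rng = random.Random(seed)
    observed = ia_rbard(isolates)[1]
    n_loci = len(isolates[0])
    hits = 0
    for _ in range(n_perm):
        cols = [[ind[j] for ind in isolates] for j in range(n_loci)]
        for col in cols:
            rng.shuffle(col)  # breaks associations between loci
        if ia_rbard([list(row) for row in zip(*cols)])[1] >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Shuffling each locus separately preserves every locus's genotype frequencies (and therefore each var_j) while destroying associations between loci, so the permuted values approximate the expectation under free recombination. With 27 isolates there are 351 pairs, so each permutation is cheap.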
Although several tests allow rejection of the hypothesis of panmixia/frequent recombination in our data, the values of the association indexes and of the inbreeding coefficient are not consistent with strict clonality either. Predominant clonality would produce strongly negative FIS values and higher indexes of association (IA and r̄d). Taken together, these results suggest a mixed mode of reproduction involving both clonality and selfing, with sporadic recombination. The same mode of reproduction has been suggested for other species of the L. (Viannia) subgenus, including L. braziliensis and L. guyanensis [13,16], as well as for other parasites such as Trypanosoma brucei and Plasmodium falciparum [17,18]. In the closely related L. guyanensis, a simulation approach has shown that a mating scheme involving frequent sexual events may account for the observed levels of linkage disequilibrium and heterozygote deficit [16].
We have developed and tested a new panel of microsatellites using L. panamensis genomic sequence data. The bioinformatic pipeline employed allowed us to identify