Intraspecific Genomic Divergence and Minor Structural Variations in Leishmania (Viannia) panamensis

Leishmania (Viannia) panamensis is one of the most important Leishmania species associated with cutaneous leishmaniasis (CL) in Latin America. Despite its wide geographic distribution and pathogenic potential in humans and animals, the genomic variability of this species is low compared with other Leishmania species circulating in the same geographical area. No studies have reported a detailed analysis of the whole genome of L. panamensis from clinical isolates using DNA high-throughput sequencing to clarify its intraspecific genomic variability or plausible divergence. Therefore, this study aimed to evaluate the intraspecific genomic variability of L. panamensis from Colombia and Panama. A total of 22 genomes were analyzed, 19 from Colombian patients with CL and three genomes from Panama obtained from public databases. The phylogenomic analysis revealed the potential existence of three well-supported clades as evidence of intraspecific divergence. Additionally, the whole-genome analysis showed low structural variations in terms of ploidy, copy number variations, and single-nucleotide polymorphisms (SNPs). SNPs shared among all clades were identified, revealing their importance in different biological processes of L. panamensis. The findings not only expand our knowledge of intraspecific genomic variability of one of the most important Leishmania species in South America but also highlights the possible existence of different clades/lineages/subpopulations across a geographic scale.


Introduction
Cutaneous leishmaniasis (CL) is an endemic parasitic disease observed in several areas of the world. This clinical manifestation is widely distributed in regions such as sub-Saharan Africa, the Mediterranean, the Middle East, Central and South Asia, and Central and South America [1][2][3]. It can be caused by a wide variety of Leishmania species such as L. braziliensis, L. amazonensis, L. aethiopica, L. mexicana, L. guyanensis, L. panamensis, L. peruviana, L. tropica, and L. major [4]. Despite a large number of Leishmania species associated with this disease in South America, the main species responsible for CL are L. panamensis and L. braziliensis [5,6]. These species are characterized not only by their wide geographical distributions across the region and their associations with the intra-and peridomiciliary transmission cycles [5] but also by their abilities to infect different mammalian hosts and adapt to new

DNA Mapping
The paired-end Illumina reads of 19 Colombian clinical isolates and of the three Panamanian genomes (data downloaded from DDBJ/ENA/GenBank (http://www.ebi.ac.uk/ena) and TriTrypDB (http: //tritrypdb.org) databases) were mapped to the reference UA946 L. panamensis genome [27] (this genome was used as reference due to its high quality of assembly and annotation as reported elsewhere [27]) and assembled with the SMALT program (version 0.7.4) (www.sanger.ac.uk/resources/software/smalt/). The mapping involved the following parameters: exhaustive search option (−x and −y of 0.8), a reference hash index of 13 bases, and a sliding step of 3. An identity threshold of y = 0.8 prevented the mapping of non-Leishmania reads to the reference sequences because SMALT can trim reads before mapping them to the reference sequence. The read file merging, sorting, and elimination of PCR duplicates was implemented with SAMtools (version 0.1.18) and Picard (version 1.85) [30].

Nuclear and Mitochondrial Phylogenomic Inferences
The SNPs from whole nuclear and mitochondrial genomes were extracted and then used to build alignments. The preliminary clustering among L. panamensis isolates was explored using an approximately-maximum-likelihood phylogenetic tree built in FastTree double-precision version 2.1.10 [31]. The robustness of the nodes was evaluated using the bootstrap method (BT, with 1000 replicates). Phylogeographic relationships were comprehensively analyzed from each alignment using a Bayesian evolutionary approach based on Markov Chain Monte Carlo (MCMC) implemented in BEAST-2 [32], considering a node dating step using the geographic origin as reference metadata. For that, the best substitution model was initially chosen in jModelTest v0.1.1 [33]. The MCMC was then carried out considering a strict clock model and the Bayesian skyline population model, with a chain length of 2,000,000 states and resampling every 10% of the states. The effective sample size (ESS) was determined when ESS values were >200 for all parameters, indicating sufficient sampling, monitoring this, Bayesian Skylide analysis conducted in Tracer v1.7.1 [34]. Tree files were summarized with LogCombiner v1. 10.4 (with burn-in of 300,000) and then were annotated in Tree Annotator v2.4.8 (with a burn-in of 100,000) [32], with maximum clade credibility and mean node heights. The obtained tree was visualized using the interactive tool Interactive Tree Of Life V4 (http://itol.embl.de) [35]. Additionally, phylogenetic networks were conducted with the aim to detect recombination signatures in the analyzed population. These analyses were carried out in SplitsTree5 [36], using the neighbor-net method. We included as reference sequence the UA946 (LpanUA) sequence from Urrea et al. [27]. Additionally, we used as outgroup the L. guyanensis genome assembly from DDBJ/ENA/GenBank database (http://www.ebi.ac.uk/ena) under Run accession SRR8179913 (Lguy_SRR8179913).

Evaluation of Chromosome and Gene CNVs
For the chromosomal somy estimation, the median read depth of each chromosome was initially calculated (di). All positions with a read depth of >1 standard deviation away from this initial median were then removed, and the di was recalculated from high-quality reads. Subsequently, the median depth of the 35 chromosomes (dm) for L. panamensis was calculated, and the somy (S-value) of each chromosome was obtained with the following formula: S = 2 × di/dm [37]. The somy values calculated from sequencing data are averages across the potentially variable somy of these cells. For this reason, somy values may be non-integers, representing the mean value of a mixed population. The ranges of monosomy, disomy, trisomy, tetrasomy, and pentasomy were then used to define the full cell-normalized chromosome depth or somy (S) as S < 1.5, 1.5 ≤ S < 2.5, 2.5 ≤ S < 3.5, 3.5 ≤ S < 4.5, and 4.5 ≤ S < 5.5, respectively, as previously described [25]. As the depth of the genomes was sufficient, we did not investigate other normalization factors, such as various percentile depths or a statistically weighted normalization factor. To evaluate the CNVs at the gene level, we defined average haploid depth per gene without somy effect as d HG and the full cell depth with somy effect as d FG . Their relationship was defined as d FG = S × d HG . We evaluated the gene or chromosome copy number by considering their biological and statistical significance. Significance was set at a z-score cutoff of >2 and adjusted p-value (Student's t-test) of <0.05. Tandem gene arrays were defined as groups of homologous genes that were contiguously located on a chromosome. The heatmaps were created using the Heatmap3 package in R [38]. Finally, the genes that presented CNVs were subjected to Gene Ontology enrichment analyses using TriTrypDB tools (http://tritrypdb.org) with Fisher's exact test used to maintain the FDR below 0.05. The GO terms were submitted to REVIGO [39].

SNP Estimations
To detect the SNPs, the reads were aligned to the reference UA946 L. panamensis genome sequence assembly using the SMALT program (version 0.7.4) (http://www.sanger.ac.uk/science/tools/smalt-0). Smalt options for exhaustive searching for optimal alignments and random mapping of multiple hit reads were used. The Picard program (version 1.85) (http://broadinstitute.github.io/picard/) was used for merging and sorting bam files and marking duplicated reads, as described previously [25]. The SNPs of <15 bp were identified using the population-based Unified Genotyper method in the Genome Analysis Toolkit (GATK) (version 3.4; https://software.broadinstitute.org/gatk/). Low-quality SNPs were filtered by GATK Variant Filtration criteria considered were QD < 2.0 || MQ < 40 || FS > 60.0 || ReadPosRankSum < −8.0. To avoid false negatives, the SNP quality cutoff was set at 300. All candidate SNPs were visually inspected in the Integrative Genomic Viewer (IGV_2_3_47) and SAMtools to avoid false positives. Once the SNPs were identified along the genomes, the allele frequency per chromosome was determined in each sample. Homozygous and heterozygous variant SNPs were determined from allele frequency estimation data, considering as homozygous variants to allele shift of < 0.2 or > 0.80 and as heterozygous variants to allele shifts between 0.2 and 0.8 [40]. Additionally, the SNPs shared between and within the three clades were included in a matrix Excel, which was used to perform the Venn diagram and to represent graphically the SNPs shared in a circular bar plot using an R script.
The SnpEff program (version v4.1) [25] was used to classify all SNPs based on their functional impact. The SNPs were considered significantly different when the allele shift difference was at least 0.25 [40], with a Mann-Whitney U test p-value of <0.05.

Data Availability
The dataset generated during the present study was deposited at DDBJ/ENA/GenBank under the accession number PRJEB35090.

Identification of Leishmania Species
The clinical characteristics and the geographical distribution of the Colombian isolates included in the study are presented in Table 1 and Figure S1, respectively. Sanger sequencing analysis performed using as target the Cytb and HSP70 genes allowed us to identify that the samples analyzed corresponded to L. panamensis (Table S1). We also observed that all the samples were collected from the western part of Colombia: 63% of them obtained from Antioquia (12/19), 21% from Choco (4/19), and the remaining 16% from other areas (Santander, Bolivar, and Cauca).

Nuclear and Mitochondrial Phylogenomic Inferences
The preliminary phylogenomic analyses based on nuclear ( Figure S2A) and mitochondrial: Maxicircle ( Figure S2B) genomes using the UA936 L. panamensis (LpanUA) genome sequence as reference and L. guyanensis (Lguy_SRR8179913) genome sequence as outgroup, allowed to confirm that all the genomes evaluated were closely related to LpanUA becoming additional evidence that these genomes belong to L. panamensis. The tree topology revealed the potential existence of three subpopulations clustered in well-supported nodes (with bootstrap ≥ 90.0) ( Figure S2). The subsequent Bayesian phylogeographic analyses from nuclear and mitochondrial genomes ( Figure 1A,B, respectively) showed consistently the existence of three main clusters, which were later confirmed by the topology of the phylogenetic networks using the network algorithm method ( Figure 1C,D, respectively), supporting the hypothesis that L. panamensis is diversifying into three major clades having relationship with their geographical origin: Clade-1 (highlighted in purple) where all Panamanian sequences fell in (LPPSC-1, SRR10246848, SRR10246848), the Clade-2 (highlighted in sky blue), which included five Colombian genomes (LpS8036, LpS8046, LpS8172, LpS8192, and LpS8131) and the Clade-3 (highlighted in green), where the other Colombian genomes were clustered. Interestingly, the Clade-2, which included Colombian isolates collected from Chocó and Antioquia (two west Colombian regions located close to Panama), was closer to Clade-1 (which includes all isolates from Panama), than to the Clade-3, which included the other Colombian isolates collected inside the country (Antioquia, Cauca, and Bolivar).
Another interesting result is associated with the LpS8061 sequence, the only genome that could not be assigned to any of the three major clades identified ( Figure 1A-D), that was collected in Santander, a geographical region located in eastern Colombia. Finally, during the analysis of the mitochondrial genome ( Figure 1B-D), two remarkable swapping events were identified in the phylogenetic tree ( Figure 1B), that were supported by the phylogenetic network topology ( Figure 1D). One of them involved the change of LpS8046 genome from Clade-1 to Clade-2 and the second event involved the identification of two possible sub-clades relatively distant in the Clade-3 (sub-clade-1 formed for LpS7762, LpS8124, LpBON94, LpS7842, LpBON83, LpS8117, LpS8136, Lp8049, and Lp8144 genomes, and sub-clade-2 for LpS7694, LpS8109, LpS8056, and LpS807 genomes); these possible clustering patterns were analyzed and compared with the clinical and demographic characteristics of each genome, however, in this study we did not observe any relationships among them.

Evaluation of Chromosome and Gene CNVs
We used the reads obtained in the sequencing to estimate the copy numbers per chromosome in all genomes analyzed. When we compared the somy values among them, we observed that in most of the genomes analyzed 34 of 35 chromosomes exhibited disomic behavior, except chromosome 31, which presented more than three copies in all genomes. Additionally, we observed that in some genomes of Clade-1 (SRR10246848) and of Clade-3 (LpS8109 and LpS8136), the chromosome 1 presented a trisomic behavior compared with the other genomes ( Figure 2). The S-values were consistent with the somy values predicted based on the alternative allele frequency profiling. The allele frequency counts for each predicted heterozygous SNP did not exhibit discordance between read depths and allele frequencies, confirming the accuracy of the previously described somy profiles.

SNP Estimations
We analyzed and determined the number of SNPs in each of the genomes analyzed. The obtained results showed 61,388 SNPs in total. When we evaluated the potential effects of those SNPs on gene function, we identified SNPs with a moderate or high functional impact, approximately 2700 SNPs per genome, with the lowest (1831) and highest (3732) numbers of SNPs with functional impact identified in LpS8124 and SRR10246848 genomes, respectively ( Figure 4A); considering the total number of variants, our results showed no relationship between the number of SNPs and the ploidy (e.g., chromosome 31: more than three copies); however, the number of SNPs was associated with the chromosome size because larger chromosomes presented a greater number of SNPs (chromosomes 20, 31, 34, and 35) than the shorter chromosomes ( Figure 4A,B).

Genome ID Number of Stop Lost Number of Stop Gained Synonymous Variant Other Variants Total
Cellular zinc ion homeostasis Glycerol transport Detection of stimulus Prolyl-tRNA aminoacylation Cellular amide metabolism

SNP Estimations
We analyzed and determined the number of SNPs in each of the genomes analyzed. The obtained results showed 61,388 SNPs in total. When we evaluated the potential effects of those SNPs on gene function, we identified SNPs with a moderate or high functional impact, approximately 2700 SNPs per genome, with the lowest (1831) and highest (3732) numbers of SNPs with functional impact identified in LpS8124 and SRR10246848 genomes, respectively ( Figure 4A); considering the total number of variants, our results showed no relationship between the number of SNPs and the ploidy (e.g., chromosome 31: more than three copies); however, the number of SNPs was associated with the chromosome size because larger chromosomes presented a greater number of SNPs (chromosomes 20, 31, 34, and 35) than the shorter chromosomes ( Figure 4A,B).
Of the SNPs with functional impact, we identified that 64.8% of them were homozygous SNPs and 35.2% heterozygous SNPs (Table S3), which describe the high genomic stability of L. panamensis (2700 SNPs per genome analyzed) compared with the stability of other Leishmania species (e.g., L. braziliensis, which presents an average of 17,000 SNPs per isolate). Most variants observed were synonymous, whereas the others affected the stop positions ( Table 2). Several of SNPs identified are located in genes encoding hypothetical proteins, whereas the others are mainly located in genes encoding transporter proteins (pteridine transporter), surface proteins (amastin surface protein), and cytoskeletal proteins (β-tubulin). Later, we evaluated the SNPs shared between and within the clades identified. When we compared the genomes included on each clade, we observed that the Clade-3 had the lowest number of SNPs shared within the genomes (653 SNPs), compared with Clade-1 and Clade-2 where the number of SNPs shared was 2097 and 2429, respectively ( Figure 4C); in general most of SNPs shared in each clade were located in genes encoding proteins associated with hypothetical function (56-67%) and the remaining located in genes with known function (33-46%). When we compared the three clades (Clade-1 + Clade-2 + Clade-3), we identified 489 SNPs shared between them ( Figure 4C) of which 225 (46%) were mainly located in genes encoding transporter proteins (pteridine transporter, ABC transporter, folate/bioperin transporter, and iron/zinc transporter), intracellular degradation-associated proteins (ubiquitin-conjugating enzyme putative), host-pathogen interaction-associated proteins (amastin surface protein), or LPG biosynthesis-associated protein (galactofuranosyl glycosyltransferase) (Table  S4). Additionally, the comparison across the three clades determined not only the presence of unique SNPs in each one of them, possibly associated with the geographical origin of the samples, but also the elevated number of SNPs shared between Clade-1 and -2 (266 SNPs) ( Figure 4D).

Discussion
Although the extensive literature has revealed intra-and interspecific genetic variability of some Leishmania species (mainly L. donovani, L. major, L. tropica) [23,30,41,42], little is known about the genetic variability of Leishmania (Viannia) species. The reported studies so far have focused on L. braziliensis, where the majority report consistent findings in describing its high intraspecific genetic diversity [16,26,43] even in cell populations isolated from a single L. braziliensis strain ("sub-strain") [44], while in L. panamensis the results are not yet concordant, probably for the molecular approaches used. While studies using MLST reveal extreme genetic homogeneity within L. panamensis [10], others using MLEE [5], sequencing of specific genes (Cytb/HSP70) [6], AFLP [13], or microsatellites [14], have reported a high intraspecific genetic diversity in this species. Considering that DNA highthroughput sequencing has the potential to identify changes in the whole genome structure, we used this approach to reveal the intraspecific genomic variability of L. panamensis from Colombian clinical isolates and Panamanian genomes retrieved from databases.
Initially, and with the purpose of determining the relationship among the 22 genomes analyzed, we performed a Bayesian phylogenetic analysis on nuclear and mitochondrial genomes ( Figure 1A-D). The results obtained both in nuclear and mitochondrial analysis, allowed us to identify the potential existence of three clades (Clade-1, Clade-2, and Clade-3). Clade-1 was formed by Panamanian sequences, Clade-2 was formed by genomic sequences from northwest Colombia (Chocó and Antioquia), and Clade-3 was formed by genomic sequences from other regions of Colombia (Bolivar, Cauca, and Antioquia). Interestingly the genomes that fell into the Clade-2 are closer to Clade-1 (Panamanian genomes) than Clade-3; those results could indicate that in Colombia two different phylogroups of L. panamensis can be found. Comparing the transmission cycles of L. panamensis in Colombia and Panama; the epidemiological landscape in Panama is mainly enzootic and involves different sand fly vectors (Lutzomyia trapidoi and Lutzomyia panamensis) [45,46] and vertebrate reservoirs (Choloepus hoffmani and Bradypus griseus) [46,47], while in Colombia, the transmission cycles are predominantly intra and peridomiciliary [5,48,49]. In addition, considering that some studies describe that the high transmission rate of L. panamensis appears to be mainly related to high population densities of animal reservoirs (Choloepus hoffmani) in areas of the oldgrowth forest [46,50]. We believe that the presence of the new clade (Clade-2) in the Colombian territory could be due to (i) the migration of these reservoirs from Panamá to Colombia as result of

Discussion
Although the extensive literature has revealed intra-and interspecific genetic variability of some Leishmania species (mainly L. donovani, L. major, L. tropica) [23,30,41,42], little is known about the genetic variability of Leishmania (Viannia) species. The reported studies so far have focused on L. braziliensis, where the majority report consistent findings in describing its high intraspecific genetic diversity [16,26,43] even in cell populations isolated from a single L. braziliensis strain ("sub-strain") [44], while in L. panamensis the results are not yet concordant, probably for the molecular approaches used. While studies using MLST reveal extreme genetic homogeneity within L. panamensis [10], others using MLEE [5], sequencing of specific genes (Cytb/HSP70) [6], AFLP [13], or microsatellites [14], have reported a high intraspecific genetic diversity in this species. Considering that DNA high-throughput sequencing has the potential to identify changes in the whole genome structure, we used this approach to reveal the intraspecific genomic variability of L. panamensis from Colombian clinical isolates and Panamanian genomes retrieved from databases.
Initially, and with the purpose of determining the relationship among the 22 genomes analyzed, we performed a Bayesian phylogenetic analysis on nuclear and mitochondrial genomes ( Figure 1A-D). The results obtained both in nuclear and mitochondrial analysis, allowed us to identify the potential existence of three clades (Clade-1, Clade-2, and Clade-3). Clade-1 was formed by Panamanian sequences, Clade-2 was formed by genomic sequences from northwest Colombia (Chocó and Antioquia), and Clade-3 was formed by genomic sequences from other regions of Colombia (Bolivar, Cauca, and Antioquia). Interestingly the genomes that fell into the Clade-2 are closer to Clade-1 (Panamanian genomes) than Clade-3; those results could indicate that in Colombia two different phylogroups of L. panamensis can be found. Comparing the transmission cycles of L. panamensis in Colombia and Panama; the epidemiological landscape in Panama is mainly enzootic and involves different sand fly vectors (Lutzomyia trapidoi and Lutzomyia panamensis) [45,46] and vertebrate reservoirs (Choloepus hoffmani and Bradypus griseus) [46,47], while in Colombia, the transmission cycles are predominantly intra and peridomiciliary [5,48,49]. In addition, considering that some studies describe that the high transmission rate of L. panamensis appears to be mainly related to high population densities of animal reservoirs (Choloepus hoffmani) in areas of the old-growth forest [46,50]. We believe that the presence of the new clade (Clade-2) in the Colombian territory could be due to (i) the migration of these reservoirs from Panamá to Colombia as result of landscape alteration which triggers demographic changes that lead to occasional migrations out of the territory [46,51], (ii) occupational exposure to infectious sandfly bites when humans enter the forest [52,53], or (iii) human migration from or to Colombia [54,55].
Another possible explanation for the presence of two clades in Colombia, can be associated with possible disruptive segregation of ancestral populations of L. panamensis, similar to that proposed for Triatoma dimidiata in the transmission of Trypanosoma cruzi [54]. This hypothesis is supported by the wide variety of sandflies species that can transmit Leishmania in Colombia; and also, when we analyze the number of SNPs of Clade-3, this has the lowest number compared to other clades (653 SNPs) which could suggest it is an ancestral clade that has suffered a possible bottleneck. However, we believe that additional studies and a larger pool of Colombian and Panamanian genomes are necessary to confirm this hypothesis including Bayesian dating analysis that helps us to depict the emergence time scale of these clades. An additional observation about the genomic divergence found is associated with the geographical localization of the clades identified. While the genomes belonging to Clade-3 occupy Colombian mountain ecosystems (western cordillera), most of the genomes of the Clade-2 (specifically Chocó) and all genomes of Clade-1 are located outside of these mountainous regions, in dry and warm zones at lower altitudes ( Figure S1), which suggest not only that the western cordillera is acting as geographic barrier to avoid the dispersion of native vectors species from the eastern to western region, but also, could be separating two divergent genetic clades producing a strong inter-population structure, as has been proposed for different Triatominae [53,54] and Lutzomyia [49,55,56] species.
We identified a swapping event when we compared tree topologies of nuclear and mitochondrial genomes, where there is a marked change in clustering of a genome belonging to Clade-2 in the first case, to Clade-1 in the second, respectively (Figure 1). This finding may represent evidence of the existence of introgression events, the transfer of genetic material, usually via hybridization and backcrossing, from one entity (clade in this case) into the gene pool of a second divergent entity, which has been described as an alternative for the rapid evolution on ecological time scales playing a key role in response to environmental changes [57]. In addition, the evidence of two potential sub-clades detected in the mitochondrial genome phylogenies ( Figure 1C,D) may suggest that genes carried in the mitochondrial genome are important for local adaptation and dispersion of L. panamensis, a validated theory for genes exposed to introgression in other species including Trypanosoma cruzi [57,58]. These phylogenetic signatures represent initial findings of the possible dispersal events of this new world Leishmania species; however, studies are required to clarify the population structure and the evolutionary forces that have shaped this divergence, through the development of studies with a larger number of samples, ideally involving other countries.
Subsequently, we analyzed the structural changes to chromosome level using the copy number variation as an indicator. Various studies have described that some Leishmania species (L. infantum, L. major, L. donovani, L. amazonensis, L. braziliensis, and L. panamensis) generate changes in the number of chromosomes as an environment-dependent mechanism, as well as an adaptation process in response to stress [24,25,28,37,[59][60][61]. However, although this mechanism is shared between Old and New World Leishmania species, the changes in the number of chromosomes vary considerably across species. The results obtained in this study describe the homogeneity in somy across the L. panamensis genomes herein analyzed ( Figure 2) and additionally confirm the high somy value in chromosome 31, which is characteristic of all the Leishmania species evaluated to date [20]. These results are consistent with the dynamics of ploidy observed in genomes SRR10246848 and SRR10246849 analyzed by Restrepo et al. [28]. The lack of genomic plasticity at the chromosomal level observed for L. panamensis has also been reported in other pathogens such as T. brucei; recently, a study has shown that none of the T. brucei subspecies present aneuploidy [62]. The low aneuploidy in these pathogens could be explained as a possible feature of adaptation of these parasites to human hosts or could be related to the loss of recombination events in this species, similar to that proposed for T. brucei [62]. This is also supported by the low number of SNPs found in the L. panamensis genomes analyzed which would explain a plausible mechanism of adaptation to human infection. This adaptation process has been clearly demonstrated in Colombian clinical isolates from an urban population survey conducted by our group where L. panamensis is the predominant species [5] compared with the enzootic population where L. braziliensis is the more prevalent species [6].
Another parameter evaluated in this study corresponded to local CNVs. The obtained results allowed us to identify a low number of genes with CNVs in the genomes analyzed ( Figure 3A,B); these results reinforce the concept of a low level of structural variation in L. panamensis. Despite these findings, we analyzed the genes with CNVs and observed that 44 of them were shared among all the genomes. Interestingly, some of them have been previously identified by other authors and include genes involved in parasite survival and virulence [27,28], which indicate their importance during the adaptation to human hosts. Additionally, the ontology enrichment analysis revealed that most genes with CNVs were involved in cellular ion zinc homeostasis in many clinical isolates; some of these genes were previously identified in L. infantum [63], as reported by a previous study, wherein the authors have described the need of these genes in the parasite to maintain the intracellular concentrations of zinc and avoid the toxic effects of this metal. Considering the aforementioned findings, we believe that L. panamensis, similar to other Leishmania species, needs to maintain intracellular zinc homeostasis because of the importance of this metal as a cofactor of several hundred proteins involved in replication, infectivity, and virulence [64,65]. Another group of genes with CNVs observed in some genomes was associated with glycerol transport. Previous studies have described that glycerol can serve as a precursor for the synthesis of complex carbohydrates (such as β-mannan) or glycoconjugates (such as LPG) [66], suggesting that the transport of this component in L. panamensis is essential for its survival and infection establishment. In this context, natural selection could be occurring favoring that CNVs are located in pivotal genes for the survival of the parasites. Nevertheless, future studies should consider the effect of natural selection of microorganisms with low structural variation as L. panamensis.
The last genomic analysis used to evaluate the genetic variability among the genomes included in this study was associated with the identification of nucleotide-level variations (SNPs). Initially, we observed that the number of SNPs per genome in all the clades analyzed was low (≈1800 to ≈3700 SNPs) ( Figure 4A), compared with the results obtained for other Leishmania (Viannia) species, such as L. braziliensis, where the number of variants observed in clinical isolates from Brazil ranged from ≈96,000 to ≈132,000 SNPs [26], in clinical isolates from Colombia was approximately 18,000 (data in preparation), and in L. peruviana isolates was approximately 27,000 [21]. These results together with the relationship between the number of homozygous/heterozygous SNP sites (39.785/21.603), observed in this study (Table S3), indicate not only a considerable genetic differentiation among species belonging to the same subgenus (L. braziliensis, L. peruviana, and L. panamensis) but also substantially lower variation in terms of SNPs in L. panamensis. We considered that the low genetic diversity observed in L. panamensis, might be shaped probably due to the process of adaptation to human infection. This could generate a bottleneck in this species that favored certain genotypes (founder effects) and promoted the stochastic loss of others, allowing the drift on its genomic diversity, as has previously been pointed out for T. brucei [67].
When we evaluated the SNPs shared within the genomes of each clade, we observed that despite geographic proximity into the genomes of Clade-2 and -3 ( Figure S1), they presented a great difference regarding their genomic diversity. We observed a greater genetic homogeneity in the genomes belonging to the Clade-3 than in the genomes belonging to Clade-2 ( Figure 4C). In addition to these findings, we observed that the genomes of Clade-2 present a similar number of SNPs ( Figure 4A) and SNPs shared ( Figure 4C), with the genomes belonging to the Clade-1 (Panamanian genomes). Interestingly, when we analyzed the relationship between the three clades, we observed that Clade-2 shared more SNPs with Clade-1 (266 SNPs) than with Clade-3 (24 SNPs) ( Figure 4D). The genetic and geographic proximity of the genomes belonging to Clade-1 (2097 SNPs) and Clade-2 (2429 SNPs) and the difference regarding the genomes of Clade-3 (653 SNPs) suggest not only the presence of a new clade of L. panamensis in Colombia, but also the possible incorporation of it from Panama, probably due to migration of humans, vectors, or reservoirs towards our country. These results are very important since the incorporation of a new genotype could impact not only at the epidemiological level of CL but also probably affect the clinical and therapeutic response to antimonials. However, we believe that is necessary to analyze a larger number of Panamanian and Colombian genomes to confirm these hypotheses. Another interesting finding was associated with the 225 SNPs annotated, shared between the three clades analyzed ( Figure 5A,B. Table S4), most of them previously described in the literature and involved in the different biological processes of the parasite [27,68].

Conclusions
In conclusion, this is the first study to report the intraspecific genomic variability of L. panamensis. The results demonstrate the existence of three well-supported clades as evidence of intraspecific divergence as well as the low structural variability in terms of somy, CNVs, and SNPs, which agree with a previous study [28]. Additionally, a particular phenomenon of adaptation to human infection as is the case of the Colombian clades due to their significant decrease in SNPs number was found. These findings suggest that this parasite does not require major structural changes to adapt to the vertebrate host and vectors responsible for the maintenance and transmission of CL but indeed drifts enough to maintain a stable and successful transmission cycle. However, we consider that additional analyses with a large number of genomes from different South American regions are necessary to confirm these findings.
Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4425/11/3/252/s1, Figure S1: Geographical distribution of genomes included in the study. Georeferenced genomes discriminated by geographical localization. The maps were built on QGIS 3.6.0 based on the GPS coordinates of each isolation (QGIS Geographic Information System, Open Source Geospatial Foundation Project, http://qgis.osgeo.org). The circles represent each clade identified in this study. Figure S2: Preliminary Leishmania panamensis phylogenetic analysis of nuclear and mitochondrial genomes. The phylogenetic reconstructions represent the preliminary results that show the potential existence of three populations in the genome set of L. panamensis analyzed based on (A) nuclear and (B) mitochondrial (Maxicircle) SNPs alignments. LpanUA as reference genome of L. panamensis and Lguy_SRR8179913 (L. guyanensis) was used as an outgroup. The trees were built in FastTree double precision version 2.1.10 [31] and visualized in the interactive Tree Of Life V4 tool (http://itol.embl.de) [35], Table S1: Percent of similarity between the sequences of each Colombian clinical isolate with the sequences deposited on the Genebank database, in the two genes evaluated (Cytb and HSP70), Table S2: List of genes with copy number variations (z score cut off >2 and adjusted p-value) in all the genomes analyzed, Table S3: Total number of homozygous and heterozygous SNPs found in the genomes analyzed. Table S4: SNPs shared among the three clades identified.