Diversity and Within-Host Evolution of Leishmania donovani from Visceral Leishmaniasis Patients with and without HIV Coinfection in Northern Ethiopia

ABSTRACT Visceral leishmaniasis (VL) is a fatal disease and a growing public health problem in East Africa, where Ethiopia has one of the highest VL burdens. The largest focus of VL in Ethiopia is driven by high prevalence in migrant agricultural workers and associated with a high rate of coinfection with HIV. This coinfection makes VL more difficult to treat successfully and is associated with a high rate of relapse, with VL/HIV patients frequently experiencing many relapses of VL before succumbing to this infection. We present genome-wide data on Leishmania donovani isolates from a longitudinal study of cohorts of VL and VL/HIV patients reporting to a single clinic in Ethiopia. Extensive clinical data allow us to investigate the influence of coinfection and relapse on the populations of parasites infecting these patients. We find that the same parasite population is responsible for both VL and VL/HIV infections and that, in most cases, disease relapse is caused by recrudescence of the population of parasites that caused primary VL. Complex, multiclonal infections are present in both primary and relapse cases, but the infrapopulation of parasites within a patient loses genetic diversity between primary disease presentation and subsequent relapses, presumably due to a population bottleneck induced by treatment. These data suggest that VL/HIV relapses are not caused by genetically distinct parasite infections or by reinfection. Treatment of VL does not lead to sterile cure, and in VL/HIV, the infecting parasites are able to reestablish after clinically successful treatment, leading to repeated relapse of VL.


disability-adjusted life years [DALYs]
). VL is caused by infections with protozoan parasites from the Leishmania donovani species complex. While clinical VL is frequently fatal without adequate treatment, most L. donovani infections remain asymptomatic (see reviews in references 3 and 4), or at least subclinical, and much remains unknown about what leads to the development of severe disease (5) or to the success or failure of treatment. Most of the global burden of VL is in the Indian subcontinent, but East Africa represents the second largest burden of VL caused by L. donovani. While the number of cases has been falling in the Indian subcontinent (6), they continue to increase in Africa. Ethiopia, Sudan, and South Sudan have the highest prevalence in this region, and VL is one of the most significant vector-borne diseases in Ethiopia, with .3 million people at risk of infection in a number of distinct foci in lowland areas of the country (7,8).
The most important focus of VL in Ethiopia is around Metema and Humera in the Northwest of the country, where since the 1970s, there has been an important focus of disease particularly associated with seasonal migration of nonimmune laborers from the surrounding non-endemic highland regions for agricultural work. Several outbreaks have occurred in this focus, and there is evidence that this focus is spreading and has seeded outbreaks in other previously non-endemic areas (7)(8)(9)(10). One factor in the growth of VL in Ethiopia is coinfection and, particularly, coinfection of L. donovani with HIV. While VL/HIV coinfection is a widespread concern (11), Ethiopia has the highest rate of VL/HIV coinfections in Africa, and possibly globally; while estimates vary between studies, HIV may be present in .20% of VL cases (7,12). In Ethiopia and elsewhere, HIV coinfection is known to increase the probability that L. donovani infections progress to symptomatic visceral leishmaniasis (13,14). These coinfections are difficult to treat, with poor clinical outcomes for patients with very high mortality and high rates of relapse following treatment (14)(15)(16)(17)(18). It is generally thought that cure of VL is not sterile, so that parasites continue to be present, but there is little direct evidence of this. It is unknown whether relapses of VL are due to reinfection by parasites, which we would expect to occur more commonly when patient immunity is compromised, or due to recrudescence of the original infection from parasites that persist through original treatment.
There has been significant work on the genetics of L. donovani in East Africa, which has revealed Ethiopia as a hot spot for diversity of visceral leishmaniasis, with two major genetic groups of L. donovani occurring (19). These groups were shown to be transmitted by distinct sandfly vectors and are separated by the rift valley (20,21). A single Ethiopian isolate has been identified from a third group of parasites, related to those found on the Arabian Peninsula and adjacent areas of the Middle East (22). To add to this complex picture, a 2004 outbreak of VL in northern Ethiopia has been associated with a diverse population of parasites that persisted after the outbreak (9), some of which are hybrids between two of these groups (19,23,24). However, the studies to date represent relatively small numbers of isolates often collected over an extended period of time and from a wide geographical area. While this has established an important baseline in understanding the diversity and basic structure of parasite populations, these studies give little or no insight into the origins of the observed genetic variation or its phenotypic consequences. For example, any genetic variation driving variation in disease presentation or clinical outcome is confounded by variation over time and space.
Here, we report whole-genome sequence data from parasites isolated from two cohorts, VL patients and patients coinfected with HIV and Leishmania. All patients were recruited at a single clinic in Northern Ethiopia over a period of 18 months. This makes our work one of the largest studies of genome-wide genetic variation in leishmaniasis at a single place and time. Extensive clinical and immunological data were collected from these patients over 12 months of follow-up, and long-term outcomes were recorded up to 3 years post-VL diagnosis; thus, the parasite genome data we report is from a well-understood patient population. Full details of patient recruitment and clinical data from these patients were previously reported (25). Parasite isolation is generally only possible when either bone marrow or splenic aspirates are being collected for parasitological investigation, which is routinely performed at the time of diagnosis of each VL episode. This includes additional isolates from individual patients that were collected where possible during subsequent episodes of clinical relapse during the follow-up of this cohort. We thus present the first-to our knowledge-longitudinal data on within-host evolution of Leishmania parasites. These data demonstrate that most relapse cases are caused by the originally infecting parasites and that the parasite infrapopulations (the population of parasites within a host at a particular time) undergo strong bottlenecks during treatment. VL relapse is not generally associated with known markers for drug resistance, and despite some diversity, we show that parasites from VL infections without HIV and VL/HIV coinfections represent the same parasite population. We conclude that relapse in VL/HIV patients is due to the persistence of parasites after treatment that, in the absence of fully functional host immunity, can re-expand to cause the recurrent episodes of disease.

RESULTS
No distinct phylogenetic origin of parasites from VL and VL/HIV patients in northern Ethiopia. We sequenced the genomes of 108 L. donovani isolates from a total of 98 VL patients in Ethiopia. For five isolates, two aliquots of the same initial culture were processed and sequenced independently. The genome-wide haploid coverage of the sequenced samples ranged between 7 and 30 (median, 13). Our study included 68 patients who had VL only and 29 patients who were additionally coinfected with HIV and were receiving antiretroviral treatment (25). The complete metadata of samples taken, technical replicates, and patients are summarized in Table S1 in the supplemental material.
These parasites were isolated from a cohort of VL patients in which 78.1% of the successfully treated VL/HIV patients relapsed within a 3-year time period, while none of the patients without HIV infection relapsed (25). Therefore, we tested for the possibility that there is a genetic difference between parasites isolated from HIV-negative versus HIV-positive VL patients that could explain this difference in relapse rate. Phylogenetic reconstruction using the first parasite isolate taken from each patient, which is not necessarily the first episode of VL in the HIV patients, showed the genetic relatedness between isolates based on whole-genome single nucleotide polymorphism (SNP) data (Fig. 1A). It suggested that there was no clear phylogenetic structure between parasites from the two groups. To investigate more subtle genetic structure, we used two measures to test for a possible association of HIV status with genomewide parasite genomics, based on those isolates from 67 HIV-negative and 29-HIV positive patients, respectively. Analysis of similarities (ANOSIM) using pairwise genomewide Nei's D as distance measure (R = 0.09321, P value = 0.0290, false-discovery rate [FDR] = 0.048) as well as Blomberg's K (K = 0.0348, P value = 0.048, FDR = 0.048) both showed an association of HIV status with the phylogeny at an a of 0.05 but not at an a of 0.01. Isolates from HIV-positive patients were more frequently samples from second or subsequent relapses of VL disease, which could introduce a bias due to a longer period of within-host evolution in these infections. Therefore, we also tested for the association of HIV status based on isolates taken during the first episode of VL disease for a particular patient (see Fig. S1) (67 samples from HIV-negative and 5 samples from HIVpositive patients). In this case, no significant association was present (ANOSIM, R = 20.0602, P value = 0.6; Blomberg's K, K = 0.0197, P value = 0.4).
Recurrent VL in HIV patients is caused by persistent parasites. Seven of the patients in our study had multiple parasite isolates taken as part of clinical investigation of VL relapses during the first 12-month follow-up period (Table 1) For the remaining 92 patients, parasite isolates could only be taken at a single time point during our study either due to the absence of relapses or because taking additional bone marrow or splenic aspirates was not clinically justified. To identify if recurrent VL had been caused by re-expansion of persistent parasites within a patient after initial cure, we compared Nei's genetic distances based on whole-genome SNP data between all pairwise sample combinations.  Longitudinal samples of parasites from a patient were typically most closely related to parasite samples from the same patient at different times ( Fig. 2A). For two time series samples (1023_EoT_1_primary_pos_0 and 1045_ToD_1_recurrent_pos_3), the genetically most similar samples came from another patient ( Fig. 2A and S2A and B). For patient 1023, the genetic distance between samples gradually increased with time, while the first sample from this patient was very closely related to isolates from two other patients. However, for patient 1045, parasites isolated from the third VL episode were very different from parasites isolated 7 months later during the fourth recurrent VL episode, being more closely related to several isolates from other patients. This trend was confirmed when comparing sequential isolates from a patient to the initial isolate from the same patient (Fig. 2B). While within-host parasite changes gradually increased with time, those genetic distances were still very small compared to the median pairwise genetic distance in our entire data set. In this context, two isolates taken just before and after treatment of the fourth VL relapse for patient 1045 were clearly outliers, with a genetic distance similar to that of pairs of samples from different patients, indicating possible reinfection only in this case (Fig. 2B). Together, this suggests that the majority (6 of 7) of HIV-coinfected VL patients relapsed due to the original persistent parasite strain rather than reinfection with L. donovani.
Reduction in parasite heterozygosity during within-host evolution. As persistent Leishmania populations seem to play a dominant role in recurrent VL in HIV-positive patients, we investigated whether particular patterns of genomic change might be associated with this prolonged within-host evolution. Ideally, we would compare longitudinal samples within the same patient over multiple relapses of VL, but the ethical constraints on taking appropriate isolates imply that relatively few such sample sets could be taken (see Table 1). The largest proportions of our parasite samples were isolated from different patients with primary VL and at different numbers of recurrent VL episodes. The identification of convergent genetic changes that could indicate functionally relevant adaptations to prolonged immune challenge is more difficult from these cross-sectional data, as changes will arise in different genomic backgrounds, and  Diversity and Within-Host Evolution of Leishmania ® within-host adaptation will be conflated with genetic variation that was present in the initially infecting strain. We thus first inspected more general patterns of genomic change with recurrent VL. The isolates represent samples of the diversity of parasites present in the infected tissue. From a previous analysis, it is clear that diploid SNP variants called by GATK can show allele frequencies far from the 0.5 expected for a diploid genotype (22). We therefore reasoned that the proportion of variable sites called as heterozygotes would be a good approximation of the genetic diversity of the infecting infrapopulation and give some indication of the effective population size. While we refer to this measure as "heterozygosity" it represents a combination of heterozygous differences within individual parasite clones and genetic differences within a diverse and potentially polyclonal infection. In our data, heterozygosity was much lower in isolates from recurrent VL in HIV-positive patients than in isolates from primary VL in HIV-negative patients (   (Table 1). (A) Heat map of pairwise Nei's distances between isolates from the seven patients with longitudinal sampling along with any other isolates from our cohort most closely related to those. Row and column side colors on the outside indicate patient origin, with colors as used described for panel B, except that gray indicates isolates from patients represented by only a single isolate. Samples appear in the same order in row and column labels, but sample row names are abbreviated. (B) Increase of genetic distance between the first parasite isolate from a patient and subsequent isolates. (Left) Distribution of pairwise genetic distances across all 106 parasite isolates, excluding technical replicates and isolates with incomplete metadata. (Right) Pairwise Nei's distances between the first isolate for each patient with longitudinal samples and subsequent samples from that patient (y axis), plotted against the time in months between sampling dates (x axis). Samples are colored by patient origin. primary_neg, FDR = 0.0000004; recurrent_pos versus primary_pos, FDR = 0.0893283; pri-mary_pos versus primary_neg, FDR = 0.8705747). Isolates from primary episodes of VL in HIV-infected patients had intermediate levels of heterozygosity, with a greater difference of the "primary_pos" cohort to the "recurrent_pos" than to the "primary_neg" cohort. This analysis left it unclear whether reduction in heterozygosity was associated with multiple episodes of VL disease only or if differences between the primary VL in HIV-positive and HIV-negative patients were also present. Therefore, we tested whether the fraction of heterozygous sites in primary VL isolates varied with HIV infection status when the reported length of illness before parasite isolation or the estimated parasite load at the time of isolation were taken into account. However, neither of these covariates had a significant effect on heterozygosity, and no difference between HIV categories was observed (see Fig. S3A and B). Diversity and Within-Host Evolution of Leishmania ® Next, we investigated the timing of the reduction in heterozygosity over the course of primary and relapse VL episodes. In the cross-sectional comparison, there was a strong difference in heterozygosity between isolates from patients with primary infections and those from patients experiencing the first relapse of VL (Fig. 3A), suggesting that treatment of primary VL itself already reduced heterozygosity levels. To test this directly, we looked at how heterozygosity of parasite isolates varied with the number of previous VL episodes and compared samples taken before and after treatment and in both HIV-positive and -negative patients (Fig. 3B). For the primary VL data, all three posttreatment samples suggested a noticeable reduction in heterozygosity levels compared to that in before-treatment isolates, but the low sample size impeded assessing statistical significance (Fig. 3B). Moreover, a small treatment effect was also still present for recurrent VL in our time series data (Fig. 3C, patients 1023 and 1045). While the reduction in heterozygosity between primary VL and the first relapse of VL was strongest, there was a trend for a linear decrease in heterozygosity of small effect size from the first VL relapse and subsequent recurrence of VL, though only significant at a lenient threshold of a = 0.1 (linear model fraction of heterozygous genotypes [fHL] ; number of relapses, P value = 0.0798) (Fig. 3D). This relationship was also reflected in our within-patient longitudinal data, where parasite heterozygosity levels decreased through time with additional relapse episodes. The only exception was a clear increase for patient 1045, for whom we had previously suggested reinfection (Fig. 3C). Taken together, these data suggest a strong effect of initial drug treatment on heterozygosity levels, followed by a weaker continuous effect, presumably due to repeated population size decreases during drug treatment.
Infections in recurrent VL are as complex as primary infections. The observed reduction of heterozygosity with recurrent VL could have been partly due to clonal diversity compared to a heterozygous allele shared across all cells in a population. Therefore, we investigated the amount of possible clonal diversity (the complexity of infection) in our isolates. By inspecting the distributions of minor allele frequencies at diploid chromosomes, we categorized them into isolates likely to come from single clonal infections (see Fig. S4A), isolates potentially representing complex infections with multiple clones (Fig. S4C), and those with slightly noisy allele frequency distributions that either represented very few variants or had a few variants present at unexpected frequencies ( Fig. S4B and S5). Despite this somewhat subjective assessment, complex infections were clear outliers in terms of mean distance from the expected allele frequency and the variance of observed frequencies (Fig. S4D). Complex infections were found in 9% (6/68) of our primary isolates and 14% (4/29, one additional technical replicate) of the isolates from recurrent VL. None of our within-patient longitudinal data gained complexity with time, but one lost complexity between recurrent VL episode 1 versus 2 (patient 1040). The four complex isolates from recurrent VL were from episodes 1, 2, and twice from episode 10. Noisy profiles were more common in isolates from recurrent VL likely due to the small number of heterozygous sites in these isolates that reduced the signal-to-noise ratio. If the reduced heterozygosity we observe in recurrent VL was due to loss of clones in complex infections, we would expect complex infections to be common in primary isolates and less common in isolates from recurrent VL, which was not the case for our samples.
Unclear role of variation in chromosome dosage in within-host heterozygosity reduction. Leishmania species are known to vary widely in chromosome copy number, with even different cells within a single clonal population varying in chromosome dosage (termed mosaic aneuploidy). This characteristic is generated through rapid change in chromosome copy number between mother and daughter cells for a subset of chromosomes, leading to cells having different aneuploidy profiles of chromosome dosage. Reductions in copy number are expected to reduce heterozygosity of individual parasites (26). Moreover, frequent turnover of chromosome copies should lead to a reduction in observed strain allelic diversity if a particular haplotype is favored by selection or drift is high due to small population sizes. While previous work suggests that aneuploidy mosaicism and turnover are rare in the mammalian host compared to that in promastigote culture (27), it could still contribute to the observed reduction of heterozygosity during within-host evolution. As we only sequenced DNA extracted from pools of cells for each parasite isolate instead of from individual cells, we cannot observe cell-to-cell variation. Therefore, we compared aneuploidy profiles between isolates from our time series data as well as for independently sequenced aliquots from some isolates as a conservative estimate of the degree of aneuploidy variation in our study (see Table S2). The aneuploidy level was generally low in our sample cohort, with 73.5% (83/113) of all samples having a population aneuploidy profile of diploid chromosomes apart from a tetraploid chromosome 31 (see Fig. S6). Time series samples and replicate primary isolates were from a total of 11 patients, and 72.7% (8/11) of those patients with multiple samples had the same population-wide aneuploidy profiles in all samples. In contrast, isolates from patients 1023, 1037, and 1045 showed some aneuploidy variation either between sampling events or between aliquots from the same biopsy sample ( Fig. 4; Tables 2 and S2). This suggests that a low level of aneuploidy variation is present within patients, though we cannot definitively associate this variation with the reduction in heterozygosity we observed in isolates from patients with repeated relapses.
No signal of drug resistance evolution during repeated treatment of recurrent VL. A possible explanation for the survival of parasites through treatment is the evolution of drug resistance within the parasite infrapopulation, which would also contribute to the high rate of treatment failure during subsequent VL relapses. However, so far, there is little direct evidence that treatment failure in VL/HIV patients is largely driven by drug resistance. We investigated whether known markers for drug resistance in Leishmania were present in our study cohorts and compared their frequencies between primary and recurrent VL. Gene copy numbers were estimated for genes within four loci previously suggested to be involved in drug resistance, including the H locus (28), the M locus (MAPK1 gene [29]), both associated with resistance to antimonial drugs, the miltefosine transporter (30), and the miltefosine sensitivity locus (MSL) (31). We tested for change in copy number at each of these four loci in HIV-negative primary VL cases ("primary_neg"), HIV-positive primary cases ("primary_pos"), and from relapse cases in HIV positive patients ("recurrent_pos"). Several genes showed increases with respect to the baseline chromosome copy number for the two larger  Table 2. The largest circle (profile 1) represents the diploid condition but with a tetrasomic chromosome 31.
Diversity and Within-Host Evolution of Leishmania ® groups (primary_neg and recurrent_pos), including YIP, MRPA, and ASS at the H locus and at a gene associated with the miltefosine transporter (annotated as a hypothetical protein in the LV9 reference genome; one sample t tests, FDR , 0.001) (Fig. 5); only in two of these above cases the primary_pos cohort had a significant increase in copy number compared to the chromosome background, but this cohort also had far fewer samples, making it hard to detect small effect size changes (one sample t tests, FDR , 0.05) (Fig. 5). Comparison between all three groups for each gene did not show any differences between the three disease categories except for the second MAPK1 ortholog (ANOVA, FDR , 0.05) (Fig. 5). The increased dosage at the H locus in our cohort is in line with previous findings from Ethiopian isolates (22), but as both this increase and the dosage increase of the miltefosine transporter-associated gene are similarly present in both HIV-positive and -negative patients, they cannot explain VL relapse. Only for the second ortholog of the MAPK1 gene does our data support a copy number increase with relapse compared to primary VL, which could indicate evolution of drug resistance during repeated treatment (Fig. 5). This comparison was, however, only significant at an a of 0.05 and was not observed in longitudinal comparisons within a patient (see Fig. S7). A larger sample size would be needed to support the role of this variant as an important factor in driving worse treatment outcomes in VL/HIV patients.
It has previously been suggested that different aneuploidy patterns themselves can be adaptive in new environments (27,32). We noticed that the dominant aneuploidy profile described above (all chromosomes are diploid except for a tetraploid chromosome 31) was more frequent in primary VL isolates before treatment (88.55% [62/70]) than in isolates from recurrent VL (44.7% [17/38]) (see Fig. S8A). This poses the question of whether these different aneuploidy patterns might be adaptive in HIV patients, allowing parasites with these patterns to become more frequent in these patients during repeated VL episodes. We therefore tested if divergence from the dominant aneuploidy profile increased with the number of episodes of VL relapse (Fig. S8C and D), but there was only support at an a of 0.1 for this trend (linear model, sum of the difference profile % number of relapses, P value = 0.0561). In total, 13 of 25 isolates from different patients with recurrent VL showed six different divergent aneuploidy profiles (Fig. S8B). In those profiles, a total of six chromosomes showed changes in somy in a

FIG 5
Gene copy numbers at known drug resistance loci across disease cohorts. Gene copy numbers at four drug resistance loci are shown for the three disease cohorts: "primary_neg," "primary_pos," and "recurrent_pos." Gene copy numbers are estimated as differences in somy (Continued on next page) Diversity and Within-Host Evolution of Leishmania ® few combinations. Four of the deviations were present in isolates from different patients, with the most frequent changes occurring in 28% and 20% of all isolates from recurrent VL showing a reduction from a somy of 4 to 3 for chromosome 31 and a somy increase from 2 to 3 for chromosome 7, respectively (Fig. S8B). While withinpatient environments are expected to be heterogeneous due to both patient-specific differences and different drug treatment histories, this convergence might suggest an adaptive benefit of reduced chromosome 31 dosage or other convergent somy increases during prolonged parasite evolution in HIV patients with recurrent VL.

DISCUSSION
We have shown that, in northern Ethiopia, the same diverse population of parasites is responsible for VL and VL/HIV cases. Infection of HIV-positive patients and the associated clinical manifestations (25) are thus not due to phylogenetically distinct strains. However, we observe a small degree of genetic structure between isolates from VL and VL/HIV patients with borderline statistical significance. While we cannot exclude the possibility that specific groups of strains may have a higher propensity to infect HIV-positive patients, we propose that this is likely to be due to geographic structure in the parasite population that is correlated with geographic differences in HIV prevalence. Many of the VL patients in this region of Ethiopia (8), including 84.8% of the patients in this cohort (25), are migrant workers who move from regions of nonendemicity for seasonal agricultural work. Possibly due to their origin from a region of nonendemicity and the working and living conditions on the farms, they are frequently infected with Leishmania and develop clinical VL. HIV infection is also more frequent in the migrant worker population (33), introducing an association between the location of infection and HIV status. Identifying the actual location these patients acquired VL infections would require a detailed study based on these farms. These are extremely challenging environments to work in, because of both the lack of basic infrastructure and the currently poor security situation in the farming area close to the border with Sudan.
Our within-patient longitudinal data suggest that recurrent VL in HIV-coinfected patients is predominantly caused by the re-expansion of persistent parasites. In six of seven patients for which we obtained longitudinal samples from subsequent relapses of VL, parasites from the same patient were clearly most closely related to each other. For one patient (1023), parasite isolates were extracted at four different time points and showed a gradual divergence from the initial isolate, as expected if parasites from the originally infecting population are responsible for relapses. In only a single case (patient 1045), isolates taken at the third and fourth episodes of VL for this patient (the time of initial diagnosis in this study and 7 months later) were as divergent as isolates from different patients. We interpret this as likely due to reinfection of this patient following cure. We cannot rule out the possibility that this patient was initially infected with multiple parasite strains and, by chance, different strains from an initial infection were observed in subsequent isolates. However, we think this scenario is unlikely, as we cannot see any strong indication that any isolates from this patient are complex infections (see Fig. S5 in the supplemental material). We conclude that, in this single case, reinfection was contributing to the relapse of VL. The observation in the other six cases that parasite isolates from a single patient gradually diverge with the number of VL relapses provides the first genetic evidence that successful treatment of VL-at least in HIV-coinfected patients-is not sterile and is responsible for relapse.
It is clear from animal models of cutaneous leishmaniasis that Leishmania can enter a distinct, quiescent nonreplicating state (34) that has been associated with persistence in tissues. Parasite persistence certainly also occurs clinically. Parasite DNA and live parasites   have been isolated from healed cutaneous leishmaniasis (CL) scars (35) and from the skin of cured post-kala-azar dermal leishmaniasis (PKDL) patients (36). Reappearance of active CL has been recorded up to 50 years after patients leave an area of endemicity (37). This phenomenon also occurs with VL (38), sometimes with fatal consequences (39). The presence of persistent parasites may be vital in establishing effective long-term immunity to leishmania reinfection (40). The role of persistent parasites is also supported by our finding of much lower heterozygosity in parasite isolates from relapse episodes of VL, with a gradual reduction in heterozygosity with multiple relapses. Together, it suggests that heterozygosity in the infrapopulation of parasites within a host is lost due to strong population size reduction during initial VL treatment. In many HIV-positive patients, the population re-expands, causing relapse of disease symptoms: treatment of recurrent episodes of VL then leads to repeated population bottlenecks and continuing small reductions in heterozygosity. While this scenario is consistent with our data, a number of questions remain. We find a number of isolates that show unusual allele frequency distributions suggestive of complex infections caused by multiple clones; these were no more frequent in primary isolates than those taken during relapse episodes of VL, suggesting that loss of clonal complexity in the infections is not responsible for the observed loss of heterozygosity. We note that this is based on only a single isolate from spleen or bone marrow per patient at any timepoint, and it is possible that parasite populations are present in other host tissues such as the skin (41).
We find complex infections present as late as the tenth episode of VL relapse, which is very puzzling; although for these isolates, the total heterozygosity is low enough that it might have been generated during within-host evolution. The origins of the heterozygosity present in the initial infection is not completely clear. One scenario could be the frequent presence of multiple infecting clones in the initial sandfly bite, most of which are lost on initial treatment. This would imply either that flies frequently pick up multiclonal infections from single mammal hosts or that they frequently bite multiple infected hosts before transmitting an infection. A few studies have either investigated multiple isolates collected from individual hosts (42,43) or directly investigated the diversity of parasites in tissues (44) or vectors (45,46) and find some evidence for polyclonal infections, but such reports seem to be rare. Similarly, individual sand flies biting more than two infected hosts seems unlikely to be very frequent when prevalence of Leishmania infections is not very high. However, it might be promoted if prevalence is much higher in very focal areas and because multiple blood meals appear to greatly promote Leishmania infections in the sandfly gut (47). This could tend to increase the number of flies transmitting infections picked up from multiple hosts. In general, we know too little about the life span or biting rates of sandflies to evaluate this possibility, even in the best studied foci of VL (48). Prevalence in this region is not well understood, as quantitative data are either old (e.g., see reference 49) or restricted to subpopulations such as HIV patients (e.g., see reference 50) or patients reporting with suspected VL (e.g., see reference 51), although detailed information about one small neighboring focus is available (52).
While, to our knowledge, we present the first longitudinal data from cases of Leishmania relapse, a number of studies have used genomic approaches to investigate recurrent malaria infections, which have demonstrated that relapse is due to parasite recrudescence (53)(54)(55). While these studies do not identify any genetic variation associated with relapse, they confirm that this is largely due to regrowth of one, or few, clonal lineages from initially complex infections (54,55). One key limitation of our analysis of longitudinal changes in parasite populations is the ethical constraint on the availability of parasite isolates, which we could only obtain when invasive spleen or bone marrow aspirates were taken by clinicians for diagnostic purposes. Isolating parasites from blood samples would be less invasive (56,57), but these approaches are little used and require conditions not generally available in resource-limited clinical settings. Culture-free approaches could be feasible (44), although parasitemia in many VL patients is likely to be too low for current approaches (e.g., see reference 58).
The heterozygosity we report in primary VL isolates appears to be due to both complex infections and genuine heterozygous variants within a parasite clone. These multiple causes of heterozygosity are one reason we have not attempted to use our diversity estimates to directly quantify the size of any population bottleneck in these patients. Another barrier to quantification is the rather complex genetics of Leishmania. Leishmania can reproduce clonally but can also have normal meiotic sex (59), which is probably rare (60), and might also be able to reproduce parasexually (61). Leishmania populations also show extensive variation in aneuploidy (22,42,(61)(62)(63)(64)(65), but it is unclear to what extent this occurs in the field (27,44). Indeed, this aneuploidy variation seems to occur particularly rapidly in culture (26,61). Very rapid turnover of chromosome copies should decrease heterozygosity within individual cells, but it is not clear that it should impact heterozygosity measured at the level of an entire population of cells (61). Robust approaches to estimate parameters such as the change in population size upon drug treatment from patterns of genetic variation between isolates or through time in a longitudinal sample would need to take these unusual features of Leishmania genetics into account.
We report a relatively small amount of aneuploidy variation between isolates, which could contribute to the loss of heterozygosity we observed with treatment of VL relapse. An important caveat to these results is that we estimate ploidy from DNA extracted from the entire population of cells present in the primary sample taken to isolate parasites. Using these initial cultures minimizes the possibility that aneuploidy changes occur as the parasites adapt to in vitro growth, but there could still be some difference to the aneuploidy profile of parasites in the patients themselves (27,44). In addition, the measure will be conservative in identifying variation in somy, as it ignores variation between parasite cells within a single isolate. Directly demonstrating that aneuploidy turnover occurs and quantifying its impact on heterozygosity differences between isolates would require either multiple samples from a much larger set of patients or investigating the genomes of individual cells directly, which is now possible but remains costly and technically challenging and gives much less clear results than sequencing larger amounts of material (66).
The parasites we investigate from recurrent VL cases have not been phenotypically investigated for drug sensitivity or resistance, and we do not generally identify changes at drug resistance loci in the genomic data from these parasites. We thus presume that drug resistance is not responsible for the increased difficulty in treating patients with repeated relapses of VL (18,25) and the failure to successfully treat these patients long term. Indeed, it is unclear in general to what extent treatment failure in Leishmania is due to drug resistance (67)(68)(69). However, we can only screen for known genetic variants associated with resistance, and our knowledge of the genetic basis of drug resistance in Leishmania is very incomplete (68) and largely based on studying parasite clones selected for resistance in the laboratory (70). In particular, the markers that are available for VL have largely been identified in Indian L. donovani or in L. infantum. African L. donovani populations are genetically highly differentiated from populations elsewhere in the world (22), and it is possible that the genetic basis of drug resistance in Africa, if any, could be very different. In particular, many known markers are for resistance to miltefosine, which is not widely available in Africa (71). Little work has been done to investigate drug resistance markers in African visceral leishmaniasis, where drug resistance is not thought to be a significant clinical problem (72), and no genetic markers have-to our knowledge-been identified in the region. Identifying such resistance in recurrent VL/HIV would be complicated by the complex history of drug exposure for many of these patients, as clinicians frequently resort to second-line drugs or combinations in an attempt to improve treatment outcomes (16,18,73).
Conclusion. We show that a diverse population of L. donovani present in northern Ethiopia is responsible for visceral leishmaniasis in both single VL infections and in VL/HIV coinfections, with no clear genome-wide genetic differentiation between parasites isolated from the two patient groups. Our data suggest that most relapses are caused by recrudescence (regrowth) of the initial infecting parasite population rather than reinfection from a subsequent infected sandfly bite. Parasite populations present in primary VL episodes are more diverse than those isolated during VL relapses and that the infrapopulation of parasites within a host patient loses genetic diversity with increasing numbers of subsequent relapses. This loss of diversity is presumably due to population bottlenecks induced by treatment of each episode of disease. We observed complex multiclonal infections, but these were not more common in primary infections. We propose that infecting parasites are able to reestablish despite clinical cure after antileishmanial treatment, due to the dysfunction of the immune system of patients coinfected with VL and HIV (25) that is unable to prevent reexpansion of the parasite infrapopulation leading to VL relapse. This implies that better approaches for identifying patients at highest risk of relapse (25) and alternative treatment that allows these patients to restore functioning immunity might help prevent subsequent relapses and so improve clinical outcomes for VL/HIV patients. Parasite isolation and culturing. Isolates were obtained from splenic or bone marrow aspirates from VL and VL/HIV patients. Splenic aspiration can only be performed if the spleen is palpable at least 3 cm below the costal margin, there are no signs of bleeding, jaundice, or severe anemia and platelet count of ,40,000/ml. Otherwise, bone marrow aspiration was performed. Immediately after collection, tissue aspirations were cultured as previously described (74) and kept in primary medium until DNA was extracted using DNeasy Blood & Tissue extraction kits (Qiagen, Hilden, Germany) according to the manufacturer's instructions. Extracted DNA samples were stored at 220°C before library preparation and sequencing.

MATERIALS AND METHODS
Sequencing. Genomic DNA was sheared into 400-to 600-bp fragments by focused ultrasonication (Covaris Inc.), and Illumina libraries were prepared using an NEB Ultra II custom kit (75) and then cleaned up using Agencourt AMPure XP SPRI beads (Beckman Coulter). The resulting libraries were sequenced as multiplexed pools of 96 samples and 17 samples (combined with samples from unrelated studies) as 151-bp paired-end reads on the Illumina HiSeq X10 platform.
Phylogenetic reconstruction and phenotype association. Phylogenetic trees were reconstructed using a distance-based method: pairwise distances between samples were calculated as Nei's genetic distances using the R package StAMPP version 1.6.1 (80), and trees were reconstructed with neighbor joining using the R package ape version 5.4 (81). To obtain bootstrap values, the total number of SNPs was drawn from all SNPs with replacement for each of 100 bootstrap replicates. For each, Nei's distance matrices were calculated and individual replicate trees reconstructed with neighbor joining. Reported values at the nodes of the original tree are percentages of the respective node across the replicate trees. Associations of HIV and VL recurrence status with the phylogeny were tested based on the pairwise genetic distance matrix (Nei's D) using ANOSIM analysis (R package vegan, version 2.5-6) and Blomberg's K (R package phytools, version 0.7-47) (82).
Population genomics analysis. Population genomic analysis was based on genotype calls for each sample for the previously described 370,714 SNP sites. All analysis and plotting were performed in R (83). Alleles at all SNP sites were polarized based on their frequencies across all primary isolates and encoded as either H or L for "high" or "low" frequency, respectively, across all samples isolated from patients with primary VL. During time series analysis, it was noticed that the two samples isolated from patient 1040 had unusually high fractions of LL genotypes during recurrent VL that were much higher than in isolates from other patients with recurrent disease. We therefore checked if this could be a biological signal or rather a technical artifact. Generally, high fractions of LL genotypes in a sample were associated with higher missingness of genotype calls, suggesting that an extreme fraction of LL genotypes might rather be a genotype calling artifact in samples with less power for accurate SNP calling. We therefore excluded 8 samples with a fraction of $0.002 unknown genotype calls from this analysis (1002_ToD_1_primary_neg, 1025_ToD_1_primary_neg, 1040_EoT_2_recurrent_pos, 1040_ToD_1_recurrent_pos, 1041_ToD_1_primary_neg, 1061_ToD_1_primary_neg, 1077_ToD_1_recurrent_pos, and 1079_ToD_1_recurrent_pos).
Diversity and Within-Host Evolution of Leishmania ® Evaluation of complexity of infection. In a strictly clonal population, allele frequency distributions have a clear expectation given the ploidy of the respective chromosome; e.g., for a diploid chromosome, the allele frequency distribution should peak at 0.5, and for a triploid chromosome, it should peak at one-third and/or two-thirds. We used this expectation to evaluate whether each isolate was most likely to represent (i) a single clonal infection where the allele frequencies were as expected, (ii) noisy profiles with very few variants or without very clear peaks, or (iii) profiles with peaks at unexpected allele frequencies, likely to represent complex infections with a mixture of clones (see Fig. S4A to C in the supplemental material). For each isolate, the allele frequency distribution for all variants called in that isolate on each diploid chromosome and the absolute difference between each allele frequency and the expected frequency of 0.5 were calculated. The distribution of these differences was summarized by its mean and standard deviation (Fig. S4D and E), confirming that complex infections were distinct from either single infection profiles or those with little or noisy signal.
Aneuploidy and copy number variant analysis. For copy number estimation, previously described bam files were further processed to estimate somies of individual samples and chromosomes as well as the relative copy number of genes previously suggested to be involved in drug resistance. GATK version 3.6 was used to perform indel realignment using "RealignerTargetCreator" to identify and "IndelRealigner" for realignment in the identified regions (78). Then bam files were filtered for proper pair mapping, and duplicates were removed using SAMtools version 1.9 with the parameters "-F 1024 -f 0 Â 0002 -F 0 Â 0004 -F 0 Â 0008." Genome coverages were estimated with bedtools genomecov version 2.29 and parameters "-d -split." Data were subsequently processed in R. Aneuploidy profiles for each sample were estimated using median chromosome coverage. The median chromosome coverage for each sample was used to estimate the chromosome-specific coverage, and across all chromosomes of a sample, the median value was assumed to present the diploid stage. So for each sample, chromosome-specific somy equals the chromosome-specific coverage divided by the median chromosome-specific coverage across all chromosomes times 2. Visualizations of aneuploidy profiles were performed in R with the package heatmap.2, and minimum spanning networks were visualized with the mst function of the pegas package (version 0.13).
To assess the coverage at genes of putative interest for drug resistance, coverage was estimated through the median coverage across bases in the respective gene. This coverage was translated into a somy equivalent by dividing the base coverage at each genomic position by the median chromosome coverage and multiplying with the chromosome-specific rounded somy. The somy equivalent change in coverage was estimated for genes from four loci previously associated with drug resistance, including the H locus, the M locus (MAPK1) associated with resistance to antimonial drugs, the miltefosine transporter and associated genes (84,85), and the miltefosine sensitivity locus. Genes in the LV9 reference genome (version 43) were identified via ortholog annotation in TriTryp (https://tritrypdb.org/) of the respective gene names of L. infantum JPCM5 (version 41) as previously described (22); locus identifiers (IDs) for the LV9 genome annotation are included in Fig. 5.
Data availability. Sequencing reads from this project are all available from the European Nucleotide Archive and BioProject under accession number PRJEB30077.

SUPPLEMENTAL MATERIAL
Supplemental material is available online only.