Microevolution of Trypanosoma cruzi reveals hybridization and clonal mechanisms driving rapid genome diversification

Protozoa and fungi are known to have extraordinarily diverse mechanisms of genetic exchange. However, the presence and epidemiological relevance of genetic exchange in Trypanosoma cruzi, the agent of Chagas disease, has been controversial and debated for many years. Field studies have identified both predominantly clonal and sexually recombining natural populations. Two of six natural T. cruzi lineages (TcV and TcVI) show hybrid mosaicism, using analysis of single-gene locus markers. The formation of hybrid strains in vitro has been achieved and this provides a framework to study the mechanisms and adaptive significance of genetic exchange. Using whole genome sequencing of a set of experimental hybrids strains, we have confirmed that hybrid formation initially results in tetraploid parasites. The hybrid progeny showed novel mutations that were not attributable to either (diploid) parent showing an increase in amino acid changes. In long-term culture, up to 800 generations, there was a variable but gradual erosion of progeny genomes towards triploidy, yet retention of elevated copy number was observed at several core housekeeping loci. Our findings indicate hybrid formation by fusion of diploid T. cruzi, followed by sporadic genome erosion, but with substantial potential for adaptive evolution, as has been described as a genetic feature of other organisms, such as some fungi.


Introduction
Trypanosoma cruzi is a kinetoplastid protozoan and the etiologic agent of Chagas disease, one of the neglected and highest impact parasitic infections in the Americas 1 .
Chagas disease is estimated to cause great loss in both health-care costs and disabilityadjusted life years 2 . Human migration and specific modes of transmission than the canonical vector-based have led to a spreading beyond its natural geographical boundaries, becoming a global issue 1,3 . T. cruzi transmission is a zoonosis maintained by numerous species of triatomine insects and different species of mammals 4 . This parasite has a complex life cycle, where transmission to humans occurs most frequently by contamination with infected feces from triatomine vectors. To evade the immune responses the parasite displays a vast, complex repertoire of surface molecule genes involved in cell invasion and pathogenicity [5][6][7] . The regions of the genome coding for these molecules are composed of highly repetitive sequences, with hundreds to thousands of members of each family, as well as substantial numbers of transposable elements, microsatellite and tandem repeats [7][8][9] .
The reproductive mode of these parasitic protozoans has been widely debated, and both preponderate clonal evolution (PCE) 10,11 and sexual reproduction [12][13][14] have been proposed. Many eukaryotic pathogens have essentially clonal population structures while also having non-obligate sexual cycles, which may enable them to adapt to environmental changes [15][16][17] . In trypanosomatids, the most well studied mating system is that of Trypanosoma brucei, for which a meiotic process is well supported based on the identification of a haploid parasite stage in the vector 18 and on the patterns of allele inheritance and recombination observed in experimental hybrids 19 . In T. cruzi, great genetic and phenotypic diversity is observed and six distinct genetic clades have been recognized, named TcI to TcVI (Discrete Typing Units or DTU-I to VI) 20 . Genotyping analysis indicates that TcV and TcVI are recent, natural inter-lineage hybrids of TcII and TcIII, showing that recombination events have shaped the evolution of T. cruzi lineages 13,16,21 . In addition to natural evidence of hybridization, the formation of T. cruzi hybrids in vitro was also described 22 . Genetic marker analyses of hybrid strains showed multiallelic (non-Mendelian) inheritance of microsatellite alleles and an elevated DNA content 22,23 . The hybridization events in T. cruzi were proposed to occur via a parasexual mechanism, similar to thise observed in certain fungi 16,17,22,24 . While certain aspects of the molecular mechanisms involved in this process have been studied 25 , the contribution of each parental strain in hybrid genomes, as well as the mechanisms and adaptive significance of the hybridization phenomenon, are not well understood.
Experimental evolution approaches in defined environments have been widely used to study the evolution of model organisms (such as bacteria, yeast and Drosophila sp.) [26][27][28][29] although this is not the case for parasitic protozoa. To investigate the effects of clonal reproduction and genetic exchange on the T. cruzi genome at the microevolutionary scale we selected two closely related TcI strains (P1, P2) and three hybrid clones that were generated from a P1 x P2 cross (1C2, 1D12, 2C1) 22 . In addition, to dissect the underlying mechanisms of hybridization in T. cruzi we applied a comparative genomics approach based on genome sequence data generated for parental strains and hybrid clones at the beginning and at the end of the in vitro microevolution experiment.

Stability of DNA content during experimental in vitro microevolution
To investigate the effects of clonal reproduction and genetic exchange on the T. cruzi genome at the microevolutionary scale we selected two TcI strains (P1, P2) and three hybrid clones that were generated from a P1 x P2 cross (1C2, 1D12, 2C1) 22 . The two parents and three hybrids were included in an in vitro evolution experiment in which they were continuously cultured as epimastigotes forms for five years, equivalent to approximately 800 generations of replication by binary fission (Fig. 1a). At the end of the experiment, we re-cloned the parasite lines and selected three clones per line for analysis.
During this microevolution experiment, we monitored the DNA content of each parasite line at approximately 200-generation intervals to check for evidence of large-scale structural genomic changes. We observed that the DNA content of the two parental lines remained approximately constant ( Fig. 1b-1d). At the start of the experiment, the total DNA content of the hybrid clones was approximately 70% greater than the parentals ( Fig.   1c-1d, 23 ), but the mechanism underlying this difference has remained unknown. Over the course of the microevolution experiment, in contrast to P1 and P2, we found that the DNA content of the hybrids became progressively lower over time, up to a maximum decrease of 22.5% by generation 800. By extrapolation from the estimated genome sizes of P1 (94.5Mb) and P2 (92.5 Mb) 23 , these data indicate genome erosion in the hybrids occurred at an average rate of 23 kb per generation. Next, we looked for evidence of instability at specific loci in the hybrid genomes using PCR-based multilocus microsatellite genotyping, which had previously provided some evidence for non-Mendelian inheritance in the hybrids from the original P1 x P2 cross 22 . Specifically, at many loci the hybrids inherited more than one allele per parent. We found that three of the nine evolved hybrid clones had lost at least one microsatellite allele (Supplementary Table 1). These data indicated that the enlarged genomes of T. cruzi hybrids were subject to gradual erosion over an extended period of clonal replication. To dissect the underlying mechanisms, we applied a comparative genomics approach based on sequence data for parental and hybrid samples from the beginning and end of the microevolution experiment.

Parental strains display distinct genomic signatures
The genomes of the two parental strains were characterized using de novo genome assemblies from short Illumina reads. Despite high sequence coverage and the linking information provided by additional long insert size libraries, the final assemblies only reconstructed 77.1 % and 74.8 % of the P1 and P2 genomes, respectively (Supplementary Table 2), due to the high repeat content of the genomes as also observed to other previously sequenced T. cruzi genomes. The haploid estimated genome size of P1 differed from P2 by approximately 2 Mbp, and both parental strains genomes were found to be slightly larger than the 44 Mbp reference TcI-Sylvio X10/1 strain 7 , in line with flow cytometry-based DNA content measurements 23 .
While the parental strains both belong to the TcI clade, with highly conserved synteny in the core regions, they still show significant genetic diversity. Reads from the parental strains were mapped to the reference TcI-Sylvio X10/1 genome, followed by SNV calling and a comparative analysis. The resultant genome-wide average SNP density was 9.28 SNP/kb and 3.71 SNP/kb for P1 and P2, respectively ( Table 1). The majority of SNPs were located within repetitive regions, particularly in areas containing surface molecule gene family members, which can be related to gene expansion in these regions.
Indeed, CNV analysis showed that P1 displays these tandem-repeated and surface molecule coding regions expanded in comparison to P2. To avoid the influence of low mapping quality in repeated regions, we applied a strict mapping quality filter, removing any SNP in regions with mapping quality below 50. SNP density in the retained highquality regions was 2.84 and 2.11 SNPs/kb for P1 and P2, respectively (Table 1), and a total of 68,616 and 34,843 SNPs exclusive for each parental strain, were catalogued for use in the analysis of the hybrid strains (Fig. 1e). This level of differentiation between P1 and P2 provided ample scope to explore the genetic composition and inheritance patterns in the hybrid clones.

An increase in genetic diversity is observed after hybridization
To evaluate the impact of hybridization on genetic diversity we compared variant densities in the high-quality mapping regions of the genomes of all hybrid and parental strains. The average SNP density in these regions was higher in all hybrids than in parental strains: 3.48 SNP/kb for 1C2-0, 3.29 SNP/kb for 1D12-0 and 3.49 SNP/kb for 2C1-0 (Table 1), partially due to the presence of alleles from both parental strains. In addition to the SNPs inherited from each parental strain, additional SNPs common to all hybrids, and SNPs exclusive to each hybrid clone were identified in both coding and noncoding regions in patterns specific to each sample (Fig. 1e). The SNPs that were common to all hybrid strains are to a smaller extent SNPs that were not detected in the parental genomes, but mainly mutations that occurred in culture during the approximately 50 generations of growth before hybrid formation. Interestingly, de novo mutations common to all hybrids and exclusive to each hybrid clone were predicted to generate a higher percentage of non-synonymous changes in comparison to the mutations inherited from the parental strains (Supplementary Table 3). Indeed, the ratio of non-synonymous changes per synonymous changes was higher in the de novo mutations common to all hybrids (2.36 for 1C2-0, 2.25 for 1D12-0 and 2.19 for 2C1-0), and exclusive to each hybrid clone (2.40 for 1C2-0, 2.23 for 1D12-0 and 2.37 for 2C1-0) than in those inherited from the parental strains (1.41 for 1C2-0, 1.40 for 1D12-0 and 1.42 for 2C1-0) (Supplementary Table 3).
To investigate the impact of these mutations on the variable regions of the genome we compared the ratio of SNPs in surface molecule genes per non-surface molecule genes. This ratio is approximately 4-fold higher in SNPs common to all hybrids and in SNPs exclusive to each hybrid clone than in those inherited from the parental strains (Supplementary Table 4). In addition, we compared the ratio of non-synonymous mutations in surface molecule genes per non-synonymous mutations in other genes.
Interestingly, this ratio was approximately 3.3-fold higher in the SNPs that appeared after the hybridization event in comparison to those inherited from the parental strains (Supplementary Table 5).

Chromosome copy number variation analysis reveals tetraploid hybrids
Chromosome copy number variation (CCNV) was determined using the combination of Read Depth Coverage (RDC) and Allele Balance (AB) methods 30 . An increase or decrease in the mean RDC of a chromosome when compared to the overall genome coverage is an indicator of a gain or loss of chromosomal sequences, respectively 30 . If the ratio between the median chromosome coverage and the median genome coverage is approximately one, the chromosome has the same copy number as the genome, while fluctuations in this ratio indicates aneuploidies. Here, the RDC analysis was based on the ratio between the mean coverage of predicted single-copy genes in each chromosome and the mean coverage of all single-copy genes in the genome as described by 31 (Fig. 2a). This methodology was shown to eliminate bias caused by chromosomal repetitive content in T. cruzi 31 , which would otherwise obscure the correct copy numbers.
We were unable to estimate CCNV in chromosomes 17, 22, 30 and 47 by this methodology due to the lack of single-copy genes in these chromosomes, and therefore the chromosomal somy prediction presented for them was estimated based on the ratio between the mean RDC of each chromosome position and the mean coverage of all genome positions (Fig. 2a). Aneuploidy was observed in the P2 strain, involving trisomy of chromosomes 2 and 21 ( Fig. 2a-2b). The RDC analysis indicated aneuploidy in chromosome 44 in the P1 strain, but due to the lack of regions with high-quality SNPs, it was not possible to confirm this in the AB analysis ( Fig. 2a-2b). The RDC analysis also indicated aneuploidies in chromosome 6 in both parental strains and in chromosome 31 of P1 strain (Fig. 2a). However, the AB analysis was not consistent across the entire chromosome ( Fig. 2b), which may be related to copy number variation in specific regions of these chromosomes rather than aneuploidy, as observed in other T. cruzi strains 13,31 .

RDC analysis does not allow differentiation between different levels of euploidy
(or near-euploidy), therefore whole genome ploidy was estimated by analysing the proportion of the alleles in heterozygous SNP positions of all single-copy genes (Fig. 2c).
In both parental strains a peak at 50% was observed, while in the hybrids, peaks at 25 and 75% or 50% were observed (Fig. 2c), which is expected for diploid and tetraploid genomes, respectively 30 . This ploidy estimation is in accordance with the previous DNA content estimation based on flow cytometry data, in which the hybrids displayed ~70% greater DNA content than the parental strains (Fig. 1c, 23 ), given that this method could not determine the relative amounts of mitochondrial kDNA and nuclear DNA. Inheritance of SNPs exclusive to both parental strains was observed in the hybrid genomes (Fig. 1e), which shows that the tetrasomic profile observed resulted from a contribution from both parental chromosomes.
In order to address chromosome specificities and confirm the somy estimates, the Chromosomes 2, 7 and 13 have longer tandem repeated regions and coding sequences for surface molecules (percentage of surface molecule genes: chr2 7.26%, chr7 9.55%, chr13 8.38%), while chromosomes 18, 21 and 37 have longer core regions (percentage of surface molecule genes: chr18 1.25%, chr21 1.98%, chr37 3.74%) (Fig. 3a). The mapping quality in surface molecules' coding sequences was often observed to drop significantly, which affected the inferred proportions of alleles in those regions, and this was found to lead to erroneous somy estimates (Fig. 3b). To avoid this, a strict mapping quality filter was applied, and all the allele proportions were plotted by chromosome position. Using this strategy, it was possible to identify specific regions in chromosomes that could be used to infer CCNV with high confidence. Chromosomes 2, 7 and 13 showed longer regions with diverse allele proportions, nonetheless it was still possible to identify the trisomy of chromosome 2 of P2 and tetrasomies in hybrids based on using the smaller core regions (Fig. 3c). Chromosomes 18, 21 and 37 showed somy patterns that were clearly consistent with the whole genome CCNV, confirming the trisomy in chromosome 21 of P2 and tetrasomy in the hybrid clones (Fig. 3d).  1c). While parental strains remained essentially diploid (apart from the few aforementioned trisomies), a mixture between trisomic and tetrasomic profiles was observed in the hybrids after growth in culture (Fig. 4c). Indeed, hybrid 2C1 displayed a clear transition from a tetraploid to a triploid pattern after growth in culture (Fig. 4c), suggesting that genome erosion eventually leads to a return towards diploidy after the hybridization event, as observed in the naturally occurring TcV and TcVI hybrids 23

Patterns of genome erosion after in vitro microevolution
To investigate the patterns of genome erosion and CCNV in the hybrids, we performed a copy number variation (CNV) analysis throughout the whole genome of parental and hybrid clones after extended culture growth. Despite the stability of DNA content in parental clones during our experiment (Fig. 1c-1d), similar gene categories showed copy number variation in both parental and hybrid clones after culture growth (Supplemental_Tables_S6-S9). Gene losses occurred mainly in host-parasite interaction genes such as surface molecule genes and genes related to exocytosis and cell secretion, while genes related to macromolecule metabolism, ion transport and cell division showed an increase in copy numbers ( Fig. 6; Supplementary Fig. 11). Despite the events of  Fig. 7a), likely caused by the loss of genetic material. Multiple new genomic variants were found in the in vitro evolved hybrid progeny, displaying patterns specific to each clonal isolate (Fig. 7a). However, these variants were distinct from the novel mutations that emerged in the period between hybrid formation and the start of the microevolution experiment (t=0) as described above.
To better understand how these genomes evolved during culture growth, the SNP density was also evaluated in each chromosome separately. The SNP density pattern per chromosome remained similar for all samples in first generation (t=0) and after culture growth (t=800). Some chromosomes (e.g. chrs 5, 11, 24 and 37) display higher SNP density than the others across all samples (Fig. 7b). After culture growth, the P2 evolved clone replicates had an increase in SNP density in nearly all chromosomes (Fig. 7b) (Fig. 7c). To investigate the impact of new mutations in these regions, we compared the ratio of non-synonymous mutations in surface molecule genes to non-synonymous mutations in other genes. This ratio was higher for the mutations that appeared in each parental clone after culture growth in comparison to those present in the first generation (approx. 2.5-fold higher for P1 clones and 3.5-fold higher for P2 clones; Supplementary Table 13). Regarding the hybrid strains, a similar ratio was observed in the SNPs that appeared in each hybrid clone after culture growth and those exclusive to each hybrid clone present in the first generation (Supplementary Table 13).
These results indicate that surface molecule coding regions may mutate faster than the rest of the genome either by clonal or parasexual mechanisms, representing an important region of genomic diversification.

Discussion
The results from whole genome sequencing analysis of this T. cruzi genetic cross have clearly demonstrated that, while the parental strains were diploid, all initial hybrid clones were essentially tetraploid. We showed that genomic variants exclusive to both parental strains were present in all hybrid clones, indicating that the tetraploid profile observed is obtained by the fusion of the two disomic parental genomes, although more complex pathways cannot be ruled out in the absence of direct observational evidence. In addition, we showed that genome erosion was gradual, with sequential losses of chromosome copies rather than coordinated jumps between ploidy levels. As previously proposed 16,22 , our results confirmed that hybridization in T. cruzi happened via a parasexual mechanism rather than a canonical meiotic process. This phenomenon resembles the parasexual cycle of the pathogenic fungus Candida albicans, which also involves diploid fusion followed by non-meiotic genome reduction back to aneuploids and diploids (reviewed by 24 ). In contrast, hybridization in T. brucei and possibly in Leishmania spp. has been associated with parental strains having undergone a meiosislike process followed by a fusion of haploid cells 14,18,[32][33][34][35] . In this process, mainly diploid hybrids are formed, but polyploid hybrids may also be produced by the fusion between parental cells that failed to undergo meiosis 14,18,[32][33][34][35] . Despite the clear evidence of parasexual hybridization in in vitro T. cruzi hybrids, it is still not clear if a meiosis-like mechanism could also contribute to the generation of natural T. cruzi hybrids, as observed in other trypanosomatids. Further work will be required to establish whether there is a single common mechanism or several alternative modes of hybridization.
Our microevolution experiment has shown that T. cruzi genomes are highly responsive to the environmental conditions. The CNV analysis showed that while surface molecule gene numbers were being eroded after culture growth, genes involved in cell metabolism, cell division and ion transport were being expanded. This genome plasticity was not only observed at the level of gene copy number variation, but also at the chromosome level. The parental strains displayed distinct aneuploidies before and after culture, and despite the overall pattern of genome erosion, some chromosomes remained tetrasomic in the hybrid strains. Indeed, aneuploidies have been widely reported in natural T. cruzi populations, and although CCNV varies among and within DTUs, it seems constant within a given population 13,31,36 . While in many multicellular organisms, aneuploidy is known to have severe consequences, in trypanosomatids it represents an important mechanism for rapidly overcoming environmental diversity [36][37][38][39] . Aneuploidy is reversed quickly once the benefits of an extra chromosome expire, as observed in P2 aneuploidies in chromosomes 2 and 21. In addition, trypanosomatids may control overexpression caused by additional gene copies, at the protein level 40 . We found aneuploidy in chromosome 37 in all parental and hybrid clones after culture growth, suggesting a benefit to parasite fitness in culture. As observed in Leishmania spp., the fixation of genetic alterations is the result of positive selection processes that adapt parasite fitness to a given ecology or transmission cycle 38 . We can speculate that in natural T. cruzi hybrids, distinct CCNV and genome erosion patterns would be observed due to specific environmental selective pressure with the expansion of genes involved in hostparasite interactions.
Genome variant analyses showed that an increase in genetic sequence diversity was present after hybridization. Hybridization events are associated with a burst of novel mutations and a myriad of genomic rearrangements (e.g. transpositions, genome size changes, chromosomal rearrangements) in distinct eukaryotes 17,41 . Our results indicate that the process of T. cruzi hybrid formation included both new mutations and most likely recombination between parental chromosomes. Here, we were able to show an increase in the number of SNPs in the hybrid strains with the accumulation of amino acid substitutions. Interestingly, the numbers of new SNPs in culture showed a clear difference between the two parental strains after 800 generations, in which P2 showed a higher mutation rate. A previous study has reported that TcI strains have a mismatch repair machinery, which repairs base misincorporation and erroneous insertions and deletions during DNA recombination and replication, that is more efficient than in other DTUs 31,42,43 . As described in Leishmania major, HUS1 was shown to be required for genome stability under non-stressed conditions and its suppression had a genome-wide mutagenic effect 44 . Since many of the genes coding for the HUS1-like proteins are encoded on chromosome 2, the loss of trisomy in this chromosome after culture growth could possibly be related to the increase in mutations observed in the P2 clones. In general, culture growth led to a reduction in genetic variants in nearly all evolved clones in comparison to the first generation. This reduction may be related to the loss of heterozygosity, but also to the loss of gene copies and genome erosion of surface molecules regions after culture growth. In addition, we could not identify any bias in the loss of a specific parental chromosome even in triploid hybrid clones, suggesting recombination between the parental chromosomes. As material from both parental strains has been lost during over 800 generations, it is likely that the hybrid clones would stabilize at diploidy and contain chromosomes from both parental strains, with a higher polymorphism than the parental strains. This is similar to strains from the known natural hybrid clades, TcV and TcVI. Strains from these clades have diploid genomes with high repeat content, and high levels of polymorphism in many regions, resulting from alleles from two divergent parental strains (from clades TcII and TcIII) 16,21,23 . From our analyses, it thus appears that the in vitro hybrid formation presented here is highly similar to the process that generated the natural hybrids.
Our microevolution experiment has shown that surface molecule genes represent the regions of the genome with higher genetic diversity and more rapid evolution.
Accumulations of distinct genomic signatures and amino acid substitutions were observed in both parental and hybrid clones after relatively few generations. We have previously observed that there is a higher frequency of recombination in repeat-rich regions of the T. cruzi genome 7 , and it is likely that this is also the case for the hybrids. We can speculate that this is a likely explanation for variability in repeat and gene copy numbers between hybrid and non-hybrid clades in response to the distinct environmental conditions. Since these parasites were cultivated solely in culture, it is still not clear if this mechanism is related to the lack of pressure caused by the progression within triatomine bugs or by the interaction with the host immune system, or if these genomes display an intrinsic mechanism of high diversification of these regions. Recent comparative genomics of natural T. cruzi strains have shown a similar pattern, suggesting that these regions indeed evolve more rapidly than other regions of the genome 9 .
In summary, we confirmed that in vitro hybrid formation in T. cruzi happened in a parasexual mechanism, representing an important mechanism of genetic diversification.
Tetraploid hybrids showed patterns of progressive genome erosion shaped by the environment's selective pressure, which may explain distinct phenotypes in isolated natural hybrid populations. In addition, an increase in coding sequence diversity is observed in hybrids in comparison to the parental strains, related to both new mutations and most likely recombination events. Finally, our microevolution experiment has shown that repetitive gene families related to immune evasion evolved more rapidly than other regions of the genome, indicating an intrinsic mechanism of genetic variation of these regions.

Parasites and microevolution experiment
For routine culture, T. cruzi epimastigotes were grown axenically in supplemented RPMI as previously described 45 . Parasites were cloned by limiting dilution. The microevolution experiment comprised continuous in vitro cultures of each parasite line for five years. Three experimentally-generated hybrid clones (1C2, 1D12 and 2C1) and their two parents (P1 and P2) were seeded in the primary culture at 2 x 10 5 /ml and then allowed to grow for seven generations to stationary phase. Parasites were passaged into fresh media every two weeks at a seeding density of 2 x 10 5 /ml, equating typically to 1% of the stationary phase cultures.

Flow cytometry
DNA content was determined as previously described 23 . In brief, mid-log phase epimastigotes were washed in PBS, then fixed overnight in ice-cold 70% methanol / 30% PBS. Fixed cells were washed in PBS, adjusted to 1 x 10 6 cells/ml and incubated for 45 minutes with 10 μg/ml propidium iodide (PI) and 10 μg/ml RNAse A at 37°C.
Fluorescence was detected using a FACSCalibur flow cytometer for a minimum of 10,000 events. FlowJo software (Tree Star Inc., Oregon, USA) was used to plot histograms and identify G1-0 and G2-M peaks. Mean G1-0 values were taken to infer relative DNA content. An internal control T. cruzi strain, Esm cl3, was included in each run. Relative DNA content values were calculated as ratios compared to the internal standard. For experimental hybrids, the ratios relative to each parent (P1 or P2) were also determined using the mean standard:parent ratios derived from 12 independent experiments.

de novo genome assembly
The genomes of the parent strains were assembled using the 180-bp Illumina paired end library and the 8 Kb Illumina mate pair library using the ALLPATHS-LG v52488 (https://software.broadinstitute.org/allpaths-lg/blog/) assembler with a K=96, using the TcI Sylvio X10/1 reference for evaluations. The resulting contigs were scaffolded using both Illumina libraries and the BESST scaffolder 48 . Later, gaps in the scaffolded assemblies were filled using the 180-bp with GapFiller 49 . Each genome was submitted for annotation, prior removal of repetitive elements using RepeatMasker (https://www.repeatmasker.org/), to the Companion annotation pipeline 50 .

Data processing and read mapping
Paired end libraries were quality filtered using Nesoni Clip program (https://github.com/Victorian-Bioinformatics-Consortium/nesoni) removing bases with a Phred quality score below 30 and reads shorter than 64 nucleotides; later, sequencing adaptors were removed. Mate paired libraries were processed as above, plus an additional step to reverse complement the filtered reads. Filtered reads, from all the sequencing libraries produced, were mapped against the TcI Sylvio X10/1 reference genome (https://tritrypdb.org/tritrypdb/app/record/dataset/TMPTX_tcruSylvioX10-1) using Burrows-Wheeler Aligner 51 . The mapping files were sorted, PCR and optical duplicates were removed and read groups were added using Picard Tools v1.134 (https://broadinstitute.github.io/picard/). These mapping files were used for downstream analyses.

Identification of genomic variation
Short variants, such as InDels and SNPs, were called using a mapping-based approach. Mapping files were used as input for the GATK variant caller (https://gatk.broadinstitute.org/hc/en-us) using a minimum quality value (QUAL) of 10 and a minimum depth of coverage (DP) of 10. To avoid the influence of low mapping quality in repeated regions, a strict mapping quality filter was applied and any SNP in regions with mapping quality below 50 was removed using BCFtools 52 .
VCFtools package 52 was used to infer SNPdensity per 1kB (option --SNPdensity) and SNPs exclusive to each strain were identified using both VCFtools and BEDtools 53,54 .
The functional effect of these variants and the ratio of synonymous per non-synonymous mutations was predicted using SNPEff v4.4 55 .

Chromosome copy number variation and ploidy estimation
Chromosome copy number variation (CCNV) and ploidy estimation were performed using a combination of Read Depth Coverage (RDC) and Allele Balance (AB) methods as previously described 30,31 . Briefly, the estimation of CCNV was based on the ratio between the mean coverage of predicted single-copy genes in a given chromosome and the mean coverage of all single-copy genes in the genome. This approach was based on the RDC of 2602 1:1 orthologs between the haplotypes of the reference SylvioX10/1 genome and the parental strains assemblies, identified using OrthoVenn2 and surface molecules genes were assigned from the available TcI Sylvio X10/1 annotation 7 .

Data Access
The raw data generated in this study have been submitted to the NCBI BioProject   It is expected that diploid genomes display a proportion of each allele around 50%, triploid genomes around 33.3% and 66.6%, while tetraploid genomes around 25% and 75% or 50% 29 . Darker blue represents the frequency of the first allele while lighter blue represents the frequency of the second allele.