Chromosome-Scale Assembly of the Complete Genome Sequence of Leishmania (Mundinia) procaviensis Isolate 253, Strain LV425

ABSTRACT Leishmania (Mundinia) procaviensis is a parasitic kinetoplastid that was first isolated from a rock hyrax in Namibia in 1975. We present the complete genome sequence of Leishmania (Mundinia) procaviensis isolate 253, strain LV425, sequenced using combined short- and long-read technologies. This genome will contribute to our understanding of hyraxes as a Leishmania reservoir.

Parasites were grown in Schneider's insect medium at 26°C as promastigotes and then in M199 medium (Sigma-Aldrich) supplemented with 10% fetal calf serum (FCS), 2% stable human urine, 1% basal medium Eagle vitamins, and 25 µg/mL gentamicin sulfate, with subpassage to fresh medium every 4 days to sustain parasite growth and viability (10). DNA was extracted and purified using a Qiagen DNeasy blood and tissue kit with the spin column protocol, according to the manufacturer's instructions. The extracted DNA concentration was assessed using a Qubit fluorometer, microplate reader, and agarose gel electrophoresis. All sequencing libraries were based on the same extracted DNA sample to avoid any inconsistency.
We assembled the long reads with Flye (12), using default parameters, to generate chromosome-scale scaffolds. Then, using Minimap2 (13) and SAMtools (14), we mapped the short reads onto the assembled scaffolds to correct erroneous bases within the long reads and to create consensus sequences. After polishing the assembly with Pilon (15), we performed another round of consensus short-read mapping. Then, we removed duplicate contigs and sorted the remaining contigs by length using Funannotate (16). Finally, we split the chimeric sequences and performed scaffolding using RaGOO (17), with the Leishmania major Friedlin strain genome (GenBank assembly accession number GCA_000002725.2) (18) as a reference guide. This placed all 36 chromosomes in our assembly, leaving 31 unplaced contigs totaling 248,213 bp.
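As an illustration of the contig cleanup step only, the following is a minimal Python sketch of removing duplicate contigs and sorting the remainder by length. It is not the Funannotate implementation: it assumes that "duplicate" means an exact sequence match on either strand, and the function names (`reverse_complement`, `dedupe_and_sort`) and example contigs are hypothetical.

```python
from typing import Dict, List, Tuple

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def dedupe_and_sort(contigs: Dict[str, str]) -> List[Tuple[str, str]]:
    """Drop exact duplicate contigs (either strand), then sort longest first.

    Assumption: only exact matches count as duplicates; real tools may also
    remove contigs that are contained within longer ones.
    """
    seen = set()
    kept = []
    for name, seq in contigs.items():
        upper = seq.upper()
        # Strand-independent key: a contig and its reverse complement collide.
        key = min(upper, reverse_complement(upper))
        if key in seen:
            continue
        seen.add(key)
        kept.append((name, seq))
    # sorted() is stable, so contigs of equal length keep their input order.
    return sorted(kept, key=lambda item: len(item[1]), reverse=True)


# Hypothetical toy input, not data from the assembly.
example = {
    "contig_a": "ACGTAC",
    "contig_b": "GTACGT",  # reverse complement of contig_a
    "contig_c": "ACGTACGTTT",
}
result = dedupe_and_sort(example)
print([name for name, _ in result])
```

In this toy input, contig_b is dropped because it is the reverse complement of contig_a, and the two remaining contigs are returned longest first.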
The analysis workflow for assembly and annotation was implemented in Snakemake (19) and is available online for reproducibility purposes (20), including the software versions and parameters used. Figure 1 compares our assembly with other complete genomes.
We assessed assembly completeness with BUSCO (21), using the lineage data set for phylum Euglenozoa, which contains 130 single-copy orthologues from 31 species; we found that 123 of the orthologues were present (94.6% completeness). We carried out functional annotation and prediction using the MAKER2 annotation pipeline (22) in combination with AUGUSTUS gene prediction software (23). Table 1 shows further summary metrics for sequencing, assembly, and annotation.
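The completeness percentage above follows directly from the orthologue counts. A minimal sketch of the arithmetic, assuming completeness is simply the number of orthologues found divided by the number in the lineage data set (the function name `busco_completeness` is hypothetical and is not part of BUSCO):

```python
def busco_completeness(found: int, total: int) -> float:
    """Return the percentage of lineage orthologues found in an assembly."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= found <= total:
        raise ValueError("found must be between 0 and total")
    return 100.0 * found / total


# Counts reported in the text: 123 of 130 Euglenozoa orthologues present.
print(f"{busco_completeness(123, 130):.1f}%")  # prints 94.6%
```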
Data availability. The assembly and annotations are available under GenBank assembly accession number GCA_017918225.1. The master record for the whole-genome