Chromosome-Scale Assembly of the Complete Genome Sequence of Leishmania (Mundinia) orientalis, Isolate LSCM4, Strain LV768

ABSTRACT Leishmania (Mundinia) orientalis is a kinetoplastid parasite first isolated in 2014 in Thailand. We report the complete genome sequence of L. (M.) orientalis, sequenced using combined short-read and long-read technologies. This will facilitate greater understanding of this novel pathogen and its relationship to other members of the subgenus Mundinia.

Leishmaniasis is caused by kinetoplastid parasites of the genus Leishmania and is transmitted through the bites of infected sand flies (1). It is present in over 90 countries, where it infects approximately 12 million people and puts 350 million more at risk of visceral, cutaneous, or mucocutaneous leishmaniasis (2,3). The genus Leishmania is divided into four subgenera: Leishmania, Sauroleishmania, Viannia, and, most recently, Mundinia (4,5), the least studied of the four. Mundinia includes species with a wide range of hosts and geographic distributions (6), including Thailand, where leishmaniasis is an emerging disease (7–9). Leishmania orientalis was formally described as a member of Mundinia in 2018 (10). We report here the complete genome sequence of Leishmania (Mundinia) orientalis isolate LSCM4, strain LV768 (WHO code MHOM/TH/2014/LSCM4), originally obtained from a cutaneous biopsy specimen from a 57-year-old woman from northern Thailand (10).
Parasites were grown using an in vitro culture system previously developed for L. (M.) orientalis axenic amastigotes (11). They were first cultured as promastigotes at 26°C in Schneider's insect medium and then in M199 medium supplemented with 10% fetal calf serum (FCS), 2% stable human urine, 1% basal medium Eagle vitamins, and 25 μg/ml gentamicin sulfate. Cultures were subpassaged into fresh medium every 4 days to maintain parasite growth and viability. DNA was extracted and purified with a Qiagen DNeasy blood and tissue kit following the manufacturer's spin-column protocol. The concentration and quality of the extracted DNA were assessed using a Qubit fluorometer, a microplate reader, and agarose gel electrophoresis. All sequencing libraries were prepared from the same DNA extract to avoid sample-to-sample variation.
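The supplements in the culture medium are given as percentages, so the amount of each scales linearly with batch size. The sketch below assumes (the text does not say so explicitly) that all three percentages are volume per volume of the final medium and that M199 makes up the remainder. Gentamicin sulfate is left out because its volume depends on the stock concentration, which the text does not give; the function name and batch size are hypothetical.

```python
# Minimal sketch, assuming v/v percentages of the final medium volume.
# Gentamicin is omitted: its volume depends on an unstated stock concentration.

SUPPLEMENTS_PERCENT_V_V = {
    "fetal calf serum (FCS)": 10.0,
    "stable human urine": 2.0,
    "basal medium Eagle vitamins": 1.0,
}


def supplement_volumes(final_volume_ml: float) -> dict[str, float]:
    """Return the volume (ml) of each supplement and of the M199 base medium."""
    if final_volume_ml <= 0:
        raise ValueError("final volume must be positive")
    volumes = {
        name: final_volume_ml * percent / 100.0
        for name, percent in SUPPLEMENTS_PERCENT_V_V.items()
    }
    # M199 fills whatever volume the supplements do not take up.
    volumes["M199 base medium"] = final_volume_ml - sum(volumes.values())
    return volumes


if __name__ == "__main__":
    for component, ml in supplement_volumes(100.0).items():
        print(f"{component}: {ml:.1f} ml")
```

For a hypothetical 100-ml batch this gives 10 ml FCS, 2 ml urine, 1 ml vitamins, and 87 ml M199.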
Short-read library construction and sequencing were contracted to (i) BGI (Shenzhen, China) for DNBSEQ libraries, producing paired-end reads (170 bp, 270 bp, and 500 bp) on the Illumina HiSeq platform, and (ii) Aberystwyth University (Aberystwyth, UK) for TruSeq Nano DNA libraries, producing 300-bp paired-end reads on the Illumina MiSeq platform. We prepared and sequenced long-read libraries following the Oxford Nanopore ligation sequencing protocol (SQK-LSK109) on R9 flow cells (FLO-MIN106). Read quality was assessed with MultiQC (12), which aggregated FastQC reports for the Illumina short reads and pycoQC reports for the Nanopore long reads.
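Read-level QC tools such as FastQC and pycoQC summarize, among other things, read counts, yield, read-length N50, and quality scores. The sketch below computes a few of these from a FASTQ file so the metrics are concrete. It is not how MultiQC, FastQC, or pycoQC are implemented. It assumes four-line FASTQ records with Phred+33 quality encoding, and its function names are hypothetical.

```python
"""Minimal sketch of basic read statistics, assuming 4-line FASTQ records
with Phred+33 quality strings. Not the FastQC or pycoQC implementation."""

import gzip
from typing import Iterable, Iterator


def read_fastq(path: str) -> Iterator[tuple[str, str]]:
    """Yield (sequence, quality) pairs from a plain or gzipped FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline()
            if not header:
                break  # end of file
            seq = handle.readline().strip()
            handle.readline()  # '+' separator line
            qual = handle.readline().strip()
            yield seq, qual


def n50(lengths: list[int]) -> int:
    """Largest length L such that reads of length >= L hold half the bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def summarize(records: Iterable[tuple[str, str]]) -> dict:
    lengths: list[int] = []
    qual_sum = 0
    n_quals = 0
    for seq, qual in records:
        lengths.append(len(seq))
        # Phred+33: quality = ASCII code - 33. This is a plain arithmetic
        # mean of per-base scores; dedicated QC tools may average differently.
        qual_sum += sum(ord(c) - 33 for c in qual)
        n_quals += len(qual)
    return {
        "reads": len(lengths),
        "bases": sum(lengths),
        "n50": n50(lengths),
        "mean_phred": qual_sum / n_quals if n_quals else 0.0,
    }
```

Usage, with a hypothetical file name: `summarize(read_fastq("nanopore_reads.fastq.gz"))`.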
We assembled the long reads with Flye (13), using default parameters, to generate the initial long-read assembly. We then mapped the short reads onto the assembled scaffolds with Minimap2 (14) and SAMtools (15) to correct base errors carried over from the long reads and to generate consensus sequences. After the assembly was polished with Pilon (16), a second round of short-read mapping and consensus calling was performed. Next, we removed duplicated contigs and sorted the remaining contigs by length using Funannotate (17,18). Finally, we split chimeric sequences and scaffolded the assembly with RaGOO (19), using the Leishmania major Friedlin strain genome (GenBank accession number GCA_000002725.2) (20) as the reference guide. This reference-guided scaffolding recovered all 36 chromosomes and indicated that the chromosome ends were complete; the only sequences outside the chromosomes were 62 small contigs totaling 257,579 bp.
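The deduplicate-then-sort step can be illustrated with a toy version. This is only an illustration: funannotate clean identifies redundant contigs from alignments using identity and coverage thresholds. The simplified stand-in below drops only contigs that occur exactly, on either strand, inside a longer contig already kept. It then renames the survivors by descending length. The function names and the "scaffold" prefix are hypothetical.

```python
"""Toy stand-in for 'remove duplicated contigs, then sort by length'.
Only exact containment (either strand) counts as a duplicate here, unlike
the alignment-based thresholds used by funannotate clean."""

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def clean_and_sort(contigs: dict[str, str], prefix: str = "scaffold") -> dict[str, str]:
    # Process longest first, so a contig can only be contained in one already kept.
    ordered = sorted(contigs.values(), key=len, reverse=True)
    kept: list[str] = []
    for seq in ordered:
        rc = revcomp(seq)
        if any(seq in longer or rc in longer for longer in kept):
            continue  # exact duplicate or contained contig: drop it
        kept.append(seq)
    # Rename by descending length: <prefix>_1 is the longest contig.
    return {f"{prefix}_{i}": seq for i, seq in enumerate(kept, start=1)}
```

A contig that is the reverse complement of a longer one, or a substring of it, is removed. Unrelated contigs are kept and numbered in order of length.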
The workflow for assembly, repeat masking, and annotation was implemented in Snakemake (21). For reproducibility, it is available at https://github.com/hatimalmutairi/LGAAP, together with the software versions and parameters used (22). Figure 1 compares our assembly with other complete genomes.
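A workflow manager such as Snakemake runs each step only after the steps whose outputs it needs have finished. The toy sketch below shows that idea with a topological sort over an explicit dependency table. Snakemake itself infers these dependencies by matching rule input and output file names. The step names are hypothetical labels for the stages described in the text, not rule names from the LGAAP repository.

```python
"""Toy illustration of dependency-ordered execution, as a Snakemake-style
workflow manager performs it. Step names are hypothetical labels, not
LGAAP rules; Snakemake derives edges from input/output file names."""

from graphlib import TopologicalSorter

# step -> set of upstream steps whose outputs it consumes
PIPELINE: dict[str, set[str]] = {
    "flye_assembly": set(),
    "short_read_consensus_1": {"flye_assembly"},
    "pilon_polish": {"short_read_consensus_1"},
    "short_read_consensus_2": {"pilon_polish"},
    "funannotate_clean_sort": {"short_read_consensus_2"},
    "ragoo_scaffold": {"funannotate_clean_sort"},
    "repeat_masking": {"ragoo_scaffold"},
    "maker2_annotation": {"repeat_masking"},
}


def execution_order(graph: dict[str, set[str]]) -> list[str]:
    """Return one valid run order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for step in execution_order(PIPELINE):
        print(step)
```

Because this example chain is linear, only one order is possible. With branching steps, any order the sort returns respects every dependency.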
We assessed assembly completeness with BUSCO (23) using the lineage data set for the phylum Euglenozoa, which contains 130 single-copy orthologs from 31 species; 128 of these orthologs were present (98.5% completeness). Gene prediction and functional annotation were carried out with the MAKER2 (24) annotation pipeline in combination with the AUGUSTUS (25) gene predictor, trained on Leishmania tarentolae. Table 1 shows additional summary metrics for the sequencing, assembly, and annotation.
Data availability. The assembly and annotations are available under GenBank assembly accession number GCA_017916335.1. The master record for the whole-genome sequencing project is available under accession number JAFHLR000000000.1. The raw sequence reads are available under BioProject accession number PRJNA691532.
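The BUSCO completeness value reported for this assembly is the fraction of the 130 Euglenozoa single-copy orthologs recovered. The short check below treats the 128 orthologs as BUSCO's combined "complete" count and reproduces the 98.5% figure. The function name is hypothetical.

```python
def busco_completeness(complete: int, total: int) -> float:
    """Percentage of lineage BUSCOs found complete, rounded to one decimal."""
    if total <= 0 or not 0 <= complete <= total:
        raise ValueError("need total > 0 and 0 <= complete <= total")
    return round(100.0 * complete / total, 1)


print(busco_completeness(128, 130))  # → 98.5
```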