Genome sequence of Leishmania mexicana MNYC/BZ/62/M379 expressing Cas9 and T7 RNA polymerase

We present the genome sequence of Leishmania mexicana MNYC/BZ/62/M379 modified to express Cas9 and T7 RNA polymerase, revealing high similarity to the reference genome (MHOM/GT/2001/U1103). Using RNAseq-based annotation of coding sequences and untranslated regions, we provide primer sequences for construct and sgRNA template generation for CRISPR-assisted gene deletion and endogenous tagging.


Introduction
Leishmania mexicana is a human-infective unicellular eukaryote and one of the species that cause leishmaniasis. It is commonly used as a model Leishmania species for molecular cell biology due to its lower virulence (causing cutaneous rather than visceral leishmaniasis) and its ability to readily differentiate into the amastigote form in appropriate axenic culture. We have previously described the generation of a genetically modified L. mexicana MNYC/BZ/62/M379 expressing Cas9 and T7 RNA polymerase, a strain enabling rapid reverse genetic modification 1. As this is not the reference genome strain (which is MHOM/GT/2001/U1103) 2 and may have accumulated mutations during laboratory culture or under the selection pressure of Cas9 or T7 expression, we sequenced the genome of this widely used strain to provide a high-quality reference for the design of reverse genetic strategies.

Methods
We have previously confirmed that these promastigotes are infectious to the sandfly vector 3. To ensure that the line was also infectious to mammals, we infected the footpad of an eight-week-old female BALB/c mouse with stationary phase promastigotes (2.0 × 10⁶); after four weeks we purified amastigotes from the excised lesion, which were then back-transformed to promastigotes in axenic culture in M199 supplemented with 20% FCS and 50 µg/ml gentamycin (Roche) and grown for seven passages. This gave rise to the cell line L. mex Cas9 T7 M. Genomic DNA from before and after mouse passage was extracted using phenol-chloroform DNA extraction as previously described 4.
To simplify genome annotation, we therefore opted to polish the MHOM/GT/2001/U1103 genome (NCBI Genome Assembly GCA_000234665.4) with the Illumina reads to generate a MNYC/BZ/62/M379 Cas9/T7 genome instead of using the de novo assembly. Following adapter trimming using TrimGalore! v0.6.0 (default settings) and removal of unfixable reads, the genome was polished by mapping the Illumina reads to the genome using BWA-MEM v0.7.17 9, followed by one round of polishing using Pilon v1.23, fixing SNPs and indels 10. This identified 21,500 SNPs, 3,828 small insertions and 4,878 small deletions (as defined by Pilon 10; mean insertion length 2.25 bp, mean deletion length 3.01 bp). The resulting updated and annotated genome sequence is available as a supplement 12. Pilon, run in changes mode, identified only 193 SNPs and no changes in coding sequences following mouse passage. Note that neither T7 nor Cas9 is present in this polished genome, as Cas9 is not chromosomally integrated (it is instead expressed from an episome 1) and T7 is integrated into the highly repetitive 18S rRNA array, which is collapsed in the reference genome.
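The counts and mean lengths quoted above can be derived from a list of polishing changes. The following is a minimal sketch, not the authors' code (which is in their GitHub repository). It assumes a Pilon `.changes` file with four whitespace-separated columns (original coordinate, new coordinate, original bases, new bases) and "." for an empty sequence. The function name and the simple classification rule (single-base substitution, pure insertion, pure deletion, anything else) are assumptions made for illustration and may not match how Pilon itself defines "small" insertions and deletions.

```python
from statistics import mean


def summarise_pilon_changes(lines):
    """Tally SNPs, insertions and deletions from Pilon-style change records.

    Assumed layout per line (whitespace separated):
        original_coord  new_coord  original_bases  new_bases
    where "." denotes an empty sequence.
    """
    snps = 0
    other = 0
    insertion_lengths = []
    deletion_lengths = []
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        old_bases, new_bases = fields[2], fields[3]
        old_len = 0 if old_bases == "." else len(old_bases)
        new_len = 0 if new_bases == "." else len(new_bases)
        if old_len == 1 and new_len == 1:
            snps += 1  # single-base substitution
        elif old_len == 0 and new_len > 0:
            insertion_lengths.append(new_len)  # bases added to the reference
        elif new_len == 0 and old_len > 0:
            deletion_lengths.append(old_len)  # bases removed from the reference
        else:
            other += 1  # multi-base substitutions and mixed changes
    return {
        "snps": snps,
        "insertions": len(insertion_lengths),
        "deletions": len(deletion_lengths),
        "other": other,
        "mean_insertion_length": mean(insertion_lengths) if insertion_lengths else None,
        "mean_deletion_length": mean(deletion_lengths) if deletion_lengths else None,
    }
```

In practice the function could be called on an open file handle, for example `summarise_pilon_changes(open("polished.changes"))`, where the file name is hypothetical.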
Aneuploidy is known to be common among Leishmania 2 . Indeed, sequencing read coverage (how many times each nucleotide was sequenced) was not uniform across all chromosomes.  13 were taken as the start

Amendments from Version 1
Changes made in this version in response to referee comments:
• We corrected a typo in the number of read base pairs we obtained during sequencing of the before-passage sample (Methods section, first paragraph).
• We explained what we mean by "small insertions" and "small deletions" detected by the Pilon polishing run (Methods section, third paragraph).
• We clarified that we used Pilon to polish the reference genome with our new reads from the Cas9/T7 strain (Methods section, third paragraph).
• We made clearer what we mean by "sequencing read coverage" (Methods section, fourth paragraph).
Any further responses from the reviewers can be found at the end of the article.
Previous RNAseq analysis 14 (BioProject accession number PRJEB8829) mapped spliced leader acceptor sites (SLASs, the sites of trans splicing of a leader sequence common to all processed mRNAs) and polyadenylation sites (PASs), which define the bounds of the mRNA, and listed suggested gene extensions and truncations based on them. We included these changes when a valid ORF (with a start and stop codon and no internal stops) was retained, and mapped the 5' and 3' UTRs based on the most commonly observed SLAS and PAS for each gene, respectively. Previously identified novel genes with a valid ORF and evidence for expression as a polyadenylated transcript 13 were also included. To distinguish these gene models from the reference genome annotation, we prefixed the gene names with "LmxM379c", indicating the strain name and its expression of Cas9 and T7.
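The two rules described above (keep a change only if a valid ORF is retained, and take the most commonly observed SLAS or PAS as the UTR boundary) can be illustrated with a short sketch. This is not the authors' implementation. The function names are hypothetical, the start codon is assumed to be ATG, the stop codons are assumed to be those of the standard genetic code, and sites are assumed to be given as plain integer coordinates, one per supporting read.

```python
from collections import Counter

# Assumption: standard genetic code stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_valid_orf(sequence):
    """Return True if the sequence starts with ATG, ends with a stop codon,
    has a length divisible by three and contains no internal stop codon."""
    seq = sequence.upper()
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] != "ATG" or codons[-1] not in STOP_CODONS:
        return False
    # Any stop codon before the final one makes the ORF invalid.
    return not any(codon in STOP_CODONS for codon in codons[1:-1])


def most_common_site(positions):
    """Return the most frequently observed site coordinate, or None if empty.

    `positions` holds one coordinate per supporting read, so repeated values
    indicate how often each site was used.
    """
    if not positions:
        return None
    return Counter(positions).most_common(1)[0][0]
```

A proposed gene extension or truncation would then be accepted only if `is_valid_orf` returns True for the modified coding sequence, and the 5' UTR start would be taken from `most_common_site` applied to the SLAS coordinates for that gene.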
Cas9 enables CRISPR-assisted genome editing, while the T7 RNA polymerase allows the use of sgDNA templates, encoding a T7 promoter and the sgRNA, to program Cas9 activity. Using our previously published 'LeishGEdit' pipeline 1 and our updated primer design software 15, which is based on the CCTop CRISPR/Cas9 Target Prediction Tool 16, we designed primers for:
1) PCR-based generation of constructs for endogenous protein tagging (uf/ur primers for N terminal tagging or df/dr primers for C terminal tagging, using the pPOT, pLPOT and pPLOT series of plasmids) and gene deletion (uf/dr primers, using the pT series).
2) PCR-based generation of sgRNA templates for tagging and deletion (5g/3g primers for deletion, 5g primer for N terminal tagging and 3g primer for C terminal tagging).
3) Validation of gene deletion by diagnostic PCR, using primers within each protein-coding gene ORF (vf/vr primers, designed with the Primer3 primer design software 17).
We also designed uf primers carrying a unique 17-nt DNA barcode 15 for generating barcoded pools of deletion mutants.
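Because pooled deletion screens rely on each barcode identifying exactly one mutant, a simple consistency check on a barcode table can be useful before ordering primers. The sketch below is illustrative only: the function name, the gene identifiers and the barcode sequences in the usage note are made up, and the only properties checked are the 17-nt length stated above, an A/C/G/T alphabet and uniqueness.

```python
def check_barcodes(barcodes, length=17):
    """Return a list of human-readable problems found in a barcode table.

    `barcodes` maps a gene identifier to its barcode sequence. A barcode is
    flagged if it has the wrong length, contains characters other than
    A, C, G or T, or duplicates a barcode already assigned to another gene.
    """
    problems = []
    first_owner = {}  # barcode sequence -> first gene that used it
    for gene_id, barcode in barcodes.items():
        bc = barcode.upper()
        if len(bc) != length:
            problems.append(f"{gene_id}: length {len(bc)}, expected {length}")
        elif set(bc) - set("ACGT"):
            problems.append(f"{gene_id}: non-ACGT characters")
        elif bc in first_owner:
            problems.append(f"{gene_id}: duplicates barcode of {first_owner[bc]}")
        else:
            first_owner[bc] = gene_id
    return problems
```

For example, `check_barcodes({"geneA": "ACGTACGTACGTACGTA", "geneB": "TTGCATTGCATTGCATT"})` returns an empty list because both hypothetical barcodes are 17 nt long, use only A, C, G and T, and differ from each other.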
As this set of primers accounts for strain-specific SNPs and indels, we recommend them as a standardised 'first attempt' for tagging and deleting genes in L. mex Cas9 T7 M, and we will be using them for future high-throughput reverse genetic analyses. This project contains the primer sequences, barcodes and the GFF file containing the sequence and the annotations of L. mexicana MNYC/BZ/62/M379 T7/Cas9.
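As a starting point for working with the annotation file, the sketch below lists gene identifiers that carry the "LmxM379c" prefix. It is a minimal example under stated assumptions: the file is taken to follow the GFF3 layout of nine tab-separated columns with `key=value` attributes in the ninth, gene features are assumed to have the type `gene` and an `ID` attribute, and the function names are hypothetical. The exact identifier format in the published file is not assumed.

```python
def parse_gff3_attributes(column9):
    """Split a GFF3 attribute column (key=value pairs separated by ';') into a dict."""
    attributes = {}
    for item in column9.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attributes[key] = value
    return attributes


def gene_ids_with_prefix(gff_lines, prefix="LmxM379c"):
    """Return the IDs of 'gene' features whose ID starts with the given prefix."""
    gene_ids = []
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        columns = line.rstrip("\n").split("\t")
        # Embedded FASTA lines do not have nine columns and are skipped here.
        if len(columns) != 9 or columns[2] != "gene":
            continue
        gene_id = parse_gff3_attributes(columns[8]).get("ID", "")
        if gene_id.startswith(prefix):
            gene_ids.append(gene_id)
    return gene_ids
```

The resulting list of identifiers could then be used to look up the matching rows in the primer and barcode tables.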

Analysis code
All code for genome assembly, polishing, annotation updates and annotation transfer is available from GitHub: https://github.com/Wheeler-Lab/genome-lmexcas9t7/tree/v1.

The methodological procedures used for generating the raw sequencing data are described in detail. The bioinformatics procedures used for genome assembly and sequence polishing are also clearly described. Apart from generating the genome sequence, the authors carried out genome annotation and transcript delineation (including UTRs). As another valuable resource, the authors have designed oligonucleotide sequences for tagging or deleting every gene in L. mexicana MNYC/BZ/62/M379.
In sum, this article contains valuable datasets. This reviewer acknowledges the effort made by the authors to generate these resources and encourages users to report possible inconsistencies in the information provided, as an effective way to collectively improve and curate genome annotations.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes