Plasmodium vinckei genomes provide insights into the pan-genome and evolution of rodent malaria parasites

Background Rodent malaria parasites (RMPs) serve as tractable tools to study malaria parasite biology and host-parasite-vector interactions. Among the four RMPs originally collected from wild thicket rats in sub-Saharan Central Africa and adapted to laboratory mice, Plasmodium vinckei is the most geographically widespread with isolates collected from five separate locations. However, there is a lack of extensive phenotype and genotype data associated with this species, thus hindering its use in experimental studies. Results We have generated a comprehensive genetic resource for P. vinckei comprising of five reference-quality genomes, one for each of its subspecies, blood-stage RNA sequencing data for five P. vinckei isolates, and genotypes and growth phenotypes for ten isolates. Additionally, we sequenced seven isolates of the RMP species Plasmodium chabaudi and Plasmodium yoelii, thus extending genotypic information for four additional subspecies enabling a re-evaluation of the genotypic diversity and evolutionary history of RMPs. The five subspecies of P. vinckei have diverged widely from their common ancestor and have undergone large-scale genome rearrangements. Comparing P. vinckei genotypes reveals region-specific selection pressures particularly on genes involved in mosquito transmission. Using phylogenetic analyses, we show that RMP multigene families have evolved differently across the vinckei and berghei groups of RMPs and that family-specific expansions in P. chabaudi and P. vinckei occurred in the common vinckei group ancestor prior to speciation. The erythrocyte membrane antigen 1 and fam-c families in particular show considerable expansions among the lowland forest-dwelling P. vinckei parasites. The subspecies from the highland forests of Katanga, P. v. vinckei, has a uniquely smaller genome, a reduced multigene family repertoire and is also amenable to transfection making it an ideal parasite for reverse genetics. We also show that P. vinckei parasites are amenable to genetic crosses. Conclusions Plasmodium vinckei isolates display a large degree of phenotypic and genotypic diversity and could serve as a resource to study parasite virulence and immunogenicity. Inclusion of P. vinckei genomes provide new insights into the evolution of RMPs and their multigene families. Amenability to genetic crossing and transfection make them also suitable for classical and functional genetics to study Plasmodium biology. Supplementary Information The online version contains supplementary material available at 10.1186/s12915-021-00995-5.

Very few studies have employed P. vinckei compared to the other RMP species despite the public availability of several P. vinckei isolates [37]. Plasmodium v. vinckei v52 and P. v. petteri CR have been used to study parasite recrudescence [38], chronobiology [39] and artemisinin resistance [40]. They are also the only isolates for which draft genome assemblies with annotation are available as part of the Broad Institute Plasmodium 100 Genomes initiative (Genbank accessions GCA_000709005.1 [41] and GCA_000524515.1 [42]).
A high-quality reference genome for P. vinckei and detailed phenotypic and genotypic data are lacking for the majority of P. vinckei isolates hindering wide-scale adoption of this RMP species in experimental malaria studies.
We now present a comprehensive genome resource for P. vinckei comprising of high-quality reference genomes for five P. vinckei isolates (one from each subspecies) and describe the genotypic diversity within the P. vinckei clade through the sequencing of five additional P. vinckei isolates (see Fig. 1a inset). With the aid of high-quality annotated genome assemblies and gene expression data, we evaluate the evolutionary patterns of multigene families across all RMPs and within the subspecies of P. vinckei.
We also describe the growth and virulence phenotypes of these isolates and show that P. vinckei is amenable to genetic manipulation and can be used to generate experimental genetic crosses.
Furthermore, we sequenced the whole genomes of seven isolates of the subspecies of P. chabaudi (P. c. esekanensis) and P. yoelii (P. y. yoelii, P. y. nigeriensis, P. y. killicki and P. y. cameronensis) in order to resolve evolutionary relationships among RMP isolates.
The data presented here enable the use of the P. vinckei clade of parasites for laboratory-based experiments driven by high-throughput genomics technologies and will significantly expand the number of RMPs available as experimental models to understand the biology of malaria parasites.

Results
A total of seventeen RMP isolates (Table 1) were revived from frozen parasitized blood stabilates and propagated in female ICR and CBA mice. These included ten P. vinckei isolates consisting of one isolate of P. v. vinckei (PvvCY), two isolates each of P. v. brucechwatti (PvbDA and PvbDB), P. v. lentum (PvlDE and PvlDS) and P. v. petteri (PvpCR and PvpBS) and three isolates of P. v.
subsp. (PvsEH, PvsEE and PvsEL). In addition to this, six P. yoelii isolates representing its four subspecies; three P. y. yoelii isolates (Pyy33X, PyyAR and PyyCN), one isolate each of P. y. nigeriensis (PynD), P. y. killicki (PykDG) and P. yoelii subsp. (PysEL); and one P. chabaudi subsp. isolate (PcsEF) were revived. The rodent malaria parasites isolated from Cameroon in 1974 by J. M. Bafort are currently without subspecies names, being designated as P. Fig. 1 Plasmodium vinckei parasites and their phenotypic characteristics. a Rodent malaria parasite species and subspecies and the geographical sites in sub-Saharan Africa where from which they were isolated (modified from [1]). Plasmodium vinckei is the only RMP species to have been isolated from five different locations. Inset: To date, several RMP isolates have been sequenced (black) to aid research with rodent malaria models. Additional RMP isolates have been sequenced in this study (red) to cover all subspecies of P. vinckei and further subspecies of Plasmodium chabaudi and Plasmodium yoelii. b Morphology of different life stages of P. vinckei baforti EL. R: Ring, ET: early trophozoite, LT: Late trophozoite, S: Schizont, MG: Male gametocyte, FG: Female gametocyte, O: oocyst and Sp: Sporozoite. Plasmodium vinckei trophozoites and gametocytes are morphologically distinct from other RMPs due to their rich haemozoin content (brown pigment). c Parasitaemia of ten P. vinckei isolates (split into two graphs for clarity) during infections in mice (n = 5) for a 20-day duration. † denotes host mortality. Plasmodium vinckei isolates show significant diversity in their virulence phenotypes yoelii subsp., P. vinckei subsp. and P. chabaudi subsp.. We now present the full genome sequence data for these isolates and show they form distinct clades within their parent species. Therefore, we propose the following subspecies names; Plasmodium yoelii cameronensis, from the country of origin; Plasmodium vinckei baforti, after J. M. Bafort, the original collector of this subspecies; and Plasmodium chabaudi esekanensis, from Eséka, Cameroon, the town from the outskirts of which it was originally collected. Eight uncloned stabilates (PvvCY, PvbDA, PvbDB, PvlDE, PvsEE, PyyCN, PynD and PykDG) were cloned by limiting dilution [43] and one clone each were chosen for subsequent experiments.

Plasmodium vinckei isolates display extensive diversity in virulence
We followed the infection profiles of ten P. vinckei isolates in female CBA/J mice (five biological replicates per group) to study their virulence traits. As reported previously [44], P. vinckei parasites are morphologically indistinguishable from each other, prefer to invade mature erythrocytes, are largely synchronous during blood-stage growth and display a characteristically rich abundance of haemozoin crystals in their trophozoites and gametocytes (Fig. 1b).
Parasitaemia was determined daily to measure the growth rate of each isolate and host RBC density and weight were measured as indications of "virulence" (harm to the host) (Fig. 1c, Additional files 2 and 3).
The P. v. vinckei isolate PvvCY was highly virulent and reached a parasitaemia of 89.4% ± 1.4 (standard error of mean; SEM) on day 6 post-inoculation of 1 × 10 6 bloodstage parasites intravenously, causing host mortality on that day. Both strains of P. v. brucechwatti, PvbDA and PvbDB were virulent and killed the host on day 7 or 8 post infection (peak parasitaemia of around 70%). The P. v. lentum parasites PvlDS and PvlDE were not lethal and were eventually cleared by the host immune system, with PvlDS's clearance more prolonged than that of PvlDE (parasitaemia clearance rates; PvlDS = 10%day − 1 ; SE = 1.1; p value of linear fit = 0.0025; PvlDE = 16.5%day − 1 ; SE = 3.9; p value = 0.023). The P. v. petteri isolates PvpCR and PvpBS reached peak parasitaemia along similar timelines (6-7 dpi), but PvpCR was virulent (peak parasitaemia = 60.4% ± 2.4 on day 6) and could sometimes kill the host while PvpBS maintained a mild infection.
Of the three isolates of P. vinckei baforti, PvsEL and PvsEE were similar in their growth profiles and their perceived effect on the host, while in contrast, PvsEH was highly virulent, causing host mortality at day 5, the earliest among all P. vinckei parasites.
RBC densities reduced during the course of infection proportionally to the rise in parasitaemia in all the P. and PvsEH (0.5 mg ± 0.1) did not cause any significant weight loss during their infection before host death occurred.

Plasmodium vinckei reference genome assembly and annotation
High-quality reference genomes (nuclear, apicoplast and mitochondrial genomes) for five P. vinckei isolates, one from each subspecies; P. v. vinckei CY, P. v. brucechwatti DA, P. v. lentum DE, P. v. petteri CR and P. v. baforti EL were assembled from single-molecule real-time (SMRT) sequencing using FALCON [45]. PacBio long reads of 10-20 kilobasepairs (kb) and with a high median coverage of > 155X across the genome (Additional File 1) enabled de novo assembly of each of the 14 chromosomes as single unitigs (high confidence contig) ( Table 2). Pac-Bio assembly base call errors were corrected (with ICORN2 [46]) using high-quality 350 bp and 550 bp insert PCR-free Illumina reads. A small number of gaps remain in the assemblies, but these are mainly confined to the apicoplast genomes and to the PvsEL and PvlDE genomes that were assembled from 10kb-long PacBio reads instead of 20 kb. The PvpCR and PvvCY assemblies, with each chromosome in one piece, are a significant improvement over their existing fragmented genome assemblies [41,42]. Plasmodium vinckei genome sizes range from 19.2 to 19.5 megabasepairs (Mb) except for PvvCY which has a smaller genome size of 18.3 Mb, similar to that of P. berghei (both isolates are from the same Katanga region). While we were not able to resolve the telomeric repeats at the ends of some of the chromosomes, all the resolved telomeric repeats had the RMP-specific sub-telomeric repeat sequences CCCTA(G)AA. The mitochondrial and apicoplast genomes were ∼ 6 kb and ∼30 kb long respectively, except for the apicoplast genomes of PvpCR and PvsEL for which we were able to resolve only partial assemblies due to low read coverage (Additional File 4).
Gene models were predicted by combining multiple lines of evidence using MAKER [47] to improve the quality of those predictions. These include publicly available P. chabaudi gene models (GeneDB version 3 [48]), de novo-predicted gene models (using AUGUSTUS [49]) and transcript models from strand-specific RNAseq data of different blood life cycle stages generated as part of this study. Consensus gene models were then Table 2 Genome assembly characteristics of five Plasmodium vinckei reference genomes. AT-rich P. vinckei genomes are 19.2 to 19.5 megabasepairs (Mb) long except for PvvCY which has a smaller genome size of 18.3 Mb, similar to Plasmodium berghei. PacBio long reads allowed for chromosomes to be assembled as gapless unitigs with a few exceptions. Number of genes include partial genes and pseudogenes. Copy numbers of the ten multigene families differ between the P. vinckei subspecies (ema1, erythrocyte membrane antigen 1; etramp, early transcribed membrane protein; hdh, haloacid dehalogenase-like hydrolase; lpl, lysophospholipases; p235, reticulocyte-binding protein; pir, Plasmodium interspersed repeat protein)  [50,51]. As a result, we annotated 5073 to 5319 protein-coding genes, 57-67 tRNA genes and 40-48 rRNA genes in each P. vinckei genome. Functional and orthology analyses (using InterProScan v5.17 [52] and OrthoMCL v2.0.9 [53]) with the predicted P. vinckei proteins showed that the core genome content in P. vinckei parasites is highly conserved among the species and are comparable to other rodent and primate malaria species.
Plasmodium vinckei genome assemblies reveal novel structural variations Comparative analysis of P. vinckei and other RMP genomes shows that P. vinckei genomes exhibit the same high level of synteny seen within RMP genomes, with the roughly 90% of genes conserved. However, upon aligning and comparing genome sequences, breaks in synteny (synteny breakpoints-SBPs) can be observed indicating that a number of chromosomal rearrangements have occurred. We aligned P. vinckei and other RMP genomes to identify synteny blocks between their chromosomes (Fig. 2a, Additional File 5 and 12A). Similar to previous findings in RMP genomes [26,54], we observed largescale exchange of material between non-homologous chromosomes, namely three reciprocal translocation events and one inversion. A pan-vinckei reciprocal translocation of ∼0.6 Mb (with 134 genes) and ∼0.4 Mb (with 99 genes) long regions between chromosomes VIII and X was observed between P. vinckei and P. berghei (whose genome closely resembles that of the putative RMP ancestor [54]). Within the P. vinckei subspecies, two reciprocal translocations separate P. v. petteri and P. v. baforti from the other three subspecies. One pair of exchanges (∼1 Mb and ∼0.6 Mb) was observed between chromosomes V and XIII, and another smaller pair (∼150 kb and ∼70 kb) between chromosomes V and VI. The SBPs in chromosomes V and VI were near rRNA units, loci previously described as hotspots for such rearrangement events [55,56]. These events have left the chromosome V of PvvCY-PvbDA-PvlDE and PvpCR-PvsEL groups with only a ∼ 0.2 Mb region of synteny between them, consisting of 48 genes while the remaining 304 genes have been rearranged with chromosomes VI and XII. There also exists a small, PvvCY-specific inversion of a ∼ 100-kb region in chromosome XIV.
A pan-RMP phylogeny reveals high genotypic diversity within the P. vinckei clade In order to re-evaluate the evolutionary relationships among RMPs, we first inferred a well-resolved specieslevel phylogeny that takes advantage of the manually curated gene models in eight available high-quality RMP genomes representing all RMPs. A maximum-likelihood phylogeny tree was inferred through partitioned analysis using RAxML [57], of a concatenated protein alignment (2,281,420 amino acids long) from 3920 single-copy, conserved core genes in eleven taxa (eight RMPs, P. falciparum, P. knowlesi and P. vivax; see Fig. 2b and Additional File 6).
Both protein alignment-based and SNP-based phylogenies show significant divergence among the P. vinckei subspecies compared to the other RMPs. All P. vinckei subspecies have begun to diverge from their common ancestor well before subspeciation events within P. yoelii and P. chabaudi.
A total of 521,934 polymorphic positions were found within the P. vinckei core coding regions consisting of 4644 non-sub-telomeric genes across P. vinckei isolates. The Katangan isolate, P. v. vinckei CY, has undergone significant divergence from the common vinckei ancestor and is the most diverged of any RMP subspecies sequenced to date. Number of SNPs ranged from 292,240 to 318,344 in pairwise comparisons of isolates with PvvCY, with 4237 to 4263 genes (out of the 4644 core genes) having at least 5 SNPs. Plasmodium v. brucechwatti has also diverged significantly, while the divergence of P. v. lentum is comparable to that of P. y. nigeriensis, P. y. killicki and P. c. subsp. from their respective putative ancestors. Genetic diversity within P. v. petteri and P. v. baforti isolates are similar to that observed within P. yoelii and P. chabaudi isolates while P. v. lentum and P. v. brucechwatti isolates have exceptionally high and low divergences respectively. Number of SNPs in pairwise comparisons between the closest subspecies, P. v. petteri and P. v. baforti, ranged from 53,554 (P. v. baforti EE vs P. v. petteri CR) to 69,454 (P. v. baforti EL vs P. v. petteri CR) with 2358 genes having at least 5 SNPs. Our robust phylogeny based on a comprehensive set of genome-wide sequence variations confirms previous estimates of RMP evolution based on isoenzyme variation [5] and gene sequences of multiple housekeeping loci [6,7], except for the placement of P. y. nigeriensis D which we show to be diverged earlier than P. y. killicki DG (supported by a bootstrap value of 100).

Molecular evolution within P. vinckei isolates
Using SNP data (Additional file 7), we then assessed the differences in selection pressure on the geographically diverse P. vinckei isolates by calculating the gene-wise Ka/Ks ratio as a measure of enrichment of nonsynonymous mutations in a gene (signifying positive selection). We first compared the Katangan isolate (PvvCY) from the highland forests in the DRC with the non-Katangan isolates from the lowland forests elsewhere.
We made pairwise comparisons of the four non-Katangan P. vinckei subspecies with P. v. vinckei which revealed several genes under significant positive selection (Fig. 3c). Notably, we identified three genes involved in mosquito transmission, namely, a gamete-release protein (GAMER) essential for gamete egress [59], a secreted ookinete protein (PIMMS43, previously known as PSOP25) essential for ookinete evasion from mosquito immune system [60] and a thrombospondin-related anonymous protein (TRAP) required for sporozoite infectivity of mosquito salivary glands and host hepatocytes [61], featuring in all Katangan/non-Katangan subspecies comparisons.
Several exported proteins and surface antigens were also identified to have undergone positive selection. PVVCY_0100120 (PCHAS_0100651 being the gene ortholog in P. chabaudi) has a circumsporozoite-related antigen PFAM domain (PF06589) and is a conserved protein found in all RMPs except P. berghei. PVVCY_ 1200100 (PBANKA_1002600) is a merozoite surface antigen, p41 [62] that is secreted following invasion [63].
We then searched for the presence of any geographic location-specific selection pressures among the lowland forest isolates. Plasmodium vinckei isolates from Nigeria, Congo and Cameroon (P. v. brucechwatti, P. v. lentum and P. v. baforti respectively) were compared with P. v. petteri CR from the CAR. To see if similar selection pressures have acted on other RMP species too, we also analysed the P. yoelii (Nigerian PynD, PykDG from Congo and the Cameroonian PysEL versus Pyy17X from CAR) and P. chabaudi isolates (Cameroonian PcsEF versus PccAS from CAR) from the same regions.
Several exported and rhoptry-associated proteins were identified as having been subject to the influence of positive selection in each comparison but there was no overlap of positively selected genes among the lowland forest isolates. We identified a conserved rodent malaria protein of unknown function (PVVCY_0501990; PBANKA_ 051950) that seems to be under significant positive selection with high Ka/Ks values (ranging from 2.14 to 4.39) in all lowland P. vinckei comparisons except P. v. petteri-P. v. baforti. The P. yoelii ortholog of this protein was also positively selected among P. y. yoelii, P. y. nigeriensis and P. y. killicki but was not under selection within the P. y. yoelii isolates, signifying region-specific selection pressures.
A 28 kDa ookinete surface protein (P28; PVVCY_ 0501540; PBANKA_0514900) seems to be under positive selection in the Nigerian P. v. brucechwatti as it features in both P. v. vinckei-P. v. brucechwatti and P. v. brucechwatti-P. v. petteri comparisons. The protein is also seen positively selected among corresponding Nigerian and Central African Republic P. yoelii isolates (P. y. nigeriensis-P. y. yoelii). A protein phosphatase (PPM8; PVVCY_ 0903370; PBANKA_0913400) has also undergone positive selection in all three RMP species between CAR and (See figure on previous page.) Fig. 2 Structural variations and genotypic diversity among Plasmodium vinckei parasites. a Chromosomal rearrangements in P. vinckei parasites. Pairwise synteny was assessed between the five P. vinckei subspecies and Plasmodium berghei (to represent the earliest common RMP ancestor). The 14 chromosomes of different RMP genomes are arranged as a Circos plot and the ribbons (grey) between them denote regions of synteny. Three reciprocal translocation events (red) and one inversion (blue) accompany the separation of the different P. vinckei subspecies. A pan-vinckei reciprocal translocation between chromosomes VIII and X was observed between P. vinckei and other RMP genomes. Within the P. vinckei subspecies, two reciprocal translocations, between chromosomes V and XIII, and between chromosomes V and VI, separate Plasmodium vinckei petteri and P. v. baforti from the other three subspecies. A small inversion of ∼100 kb region in chromosome 14 has occurred in PvvCY alone. b Maximum likelihood phylogeny of different RMP species with high-quality reference genomes based on protein alignment of 3920 one-to-one orthologs (bootstrap values of each node are shown). Genomes of three human malaria species-Plasmodium falciparum, Plasmodium vivax and Plasmodium knowlesi-were included in the analysis as outgroups. c Maximum likelihood phylogenetic tree of all sequenced RMP isolates based on 1,010,956 high-quality SNPs (bootstrap values of each node are shown). There exists significant genotypic diversity among the P. vinckei isolates compared to the other RMPs. All P. vinckei subspecies have begun to diverge from their common ancestor well before subspeciation events within Plasmodium yoelii and Plasmodium chabaudi. Genetic diversity within P. v. petteri and P. v. baforti isolates are similar to those observed within P. yoelii and P. chabaudi isolates while P. v. lentum and P. v. brucechwatti isolates have exceptionally high and low divergences respectively. Genes with significantly high Ka/Ks ratios in different subspecies-wise comparisons (as indicated by connector lines), the gene's Ka/Ks ratio averaged across all indicated P. vinckei comparisons and geographical origin of the isolates are shown Congo isolates (P. c. chabaudi-P. c. adami comparison from [26]).

Evolutionary patterns within the RMP multigene families
We were able to accurately annotate members of the ten RMP multigene families (Plasmodium interspersed repeat genes; pir, fam-a, fam-b, fam-c, fam-d, early transcribed membrane protein; etramp, erythrocyte membrane antigen 1; ema1, lysophospholipases, haloacid dehalogenases and reticulocyte-binding protein; p235) in the P. vinckei genomes owing to the well-resolved subtelomeric regions in the Pacbio assemblies and manually curated gene models (see Table 2 and Additional file 8). P. v. vinckei, similar to its sympatric species, P. berghei, had a lower multigene family repertoire with copy numbers strikingly less than other P. vinckei subspecies (exceptions were the pir, etramp and lysophospholipase families). Multigene family sizes in the four non-Katangan P. vinckei subspecies were similar to P. chabaudi except for expansion in the ema1 and fam-c multigene families ( Fig. 3a and Additional File 8).
Next, we inferred maximum likelihood-based phylogenies for the ten multigene families, in order to identify structural differences among their members and to determine family evolutionary patterns across RMP species and P. vinckei subspecies (Fig. 3b, Additional files 8 and 9). Overall, we identified robust clades (with bootstrap value > 70) that fell into the following categories, (i) pan-RMP, with orthologous genes from the four RMP species, (ii) berghei group, with genes from P. berghei and P. yoelii alone, (iii) vinckei group, with genes from P. chabaudi and any or all P. vinckei subspecies, (iv) P. vinckei, with genes only from P. vinckei subspecies and (v) non-Katangan, with genes from all P. vinckei subspecies except P. v. vinckei.
In general, a high level of orthology was observed between P. chabaudi and P. vinckei genes forming several vinckei group clades in contrast to more species-specific clades of paralogous genes being formed in P. berghei and P. yoelii (Fig. 3b, Additional File 9). Thus, family expansions in P. chabaudi and P. vinckei seem to have occurred in the common vinckei group ancestor prior to speciation.
We rebuilt the phylogenetic trees for pir, fam-a and fam-b families in order to see if the previously defined clades [25,26] were also maintained in P. vinckei. Overall, we were able to reproduce the tree structures for the three families using the ML method with P. vinckei gene family members now added to them (Additional File 9).
In order to assess the general level of expression of multigene family members in blood-stage parasites, we superimposed blood-stage RNAseq data onto the phylogenetic trees. Life stage-specific expression of multigene family members in five RMPs-P. berghei (rings, trophozoites, schizonts and gametocytes) [26], P. chabaudi and P. v. vinckei (rings, trophozoites and schizonts) [64,65], P. v. petteri and P. v. lentum (rings, trophozoites and gametocytes) and mixed blood-stage expression levels in P. yoelii [58], P. v. brucechwatti and P. v. baforti were assessed (Additional File 10).
The level of gene transcription was designated low for genes with normalised FPKM (Fragments Per Kilobase of transcript per Million mapped reads) less than 36, medium if between 36 and 256 and high for genes with FPKM above 256. Both the levels and life stage specificity of gene expression within the (See figure on previous page.) Fig. 3 Sub-telomeric multigene family expansions in Plasmodium vinckei parasites. a Violin plots show sub-telomeric multigene family size variations among RMPs and Plasmodium falciparum. The erythrocyte membrane antigen 1 and fam-c multigene families are expanded in the non-Katangan P. vinckei parasites (red). Apart from these families, multigene families have expanded in P. vinckei similar to that in Plasmodium chabaudi. The Katangan isolate PvvCY (purple) has a smaller number of family members compared to non-Katangan isolates (orange) except for lysophospholipases, p235 and pir gene families. b Maximum Likelihood phylogeny of 99 ema1 (top) and 328 fam-c (bottom) genes in RMPs. Branch nodes with good bootstrap support (> 70) are marked in red. The first coloured band denotes the RMP species to which the particular gene taxon belongs to. The heatmap denotes the relative gene expression among rings, trophozoite, schizont and gametocyte stages in the RMPs for which data are publicly available. Orange denotes high relative gene expression and white denotes low relative gene expression, while grey denotes lack of information. Gene expression was classified into three categories based on FPKM level distribution-High (black) denotes the top 25% of ranked FPKM of all expressed genes (FPKM > 256), Low (light grey) is the lower 25% of all expressed genes (FPKM < 32) and Medium level expression (grey; 32 < FPKM< 256). "P" denotes pseudogenes. Four vinckei-group (P. chabaudi and P. vinckei subspecies)-specific clades (clades I-IV; orange), two vinckei-specific clade (clades IV and V; purple) and one non-Katangan-specific clade (clade VI; blue) can be identified within ema1 family with strong gene expression, maximal during ring stages. Rest of the family's expansion within non-Katangan P. vinckei isolates are mainly pseudogenes with weak transcriptional evidence. The fam-c gene phylogeny shows the presence of four distinctly distal clades (A) with robust basal support (96)(97)(98)(99)(100). Of the four clades, two are pan-RMP (grey) and two are vinckei-group-specific (orange), each consisting of fam-c genes positionally conserved across the member subspecies. Most members of these clades show medium to high gene expression during asexual blood stages. Other well-supported clades can be classified as either berghei-group-specific (two; green), vinckei-groupspecific (two; orange), P. vinckei-specific (two; purple) or non-Katangan P. vinckei-specific clades (three; blue). There is evidence of significant species-specific expansion with striking examples in Plasmodium yoelii (i), P. chabaudi (ii) and in P. v. brucechwatti (iii) various clades were generally conserved across the RMPs signifying that orthologs in structurally distinct clades might have conserved functions across the different RMPs.
The erythrocyte membrane antigen 1 and fam-c subtelomeric multigene families are expanded in non-Katangan P. vinckei parasites The erythrocyte membrane antigen 1 (EMA1) was first identified and described in P. chabaudi and is associated with the host RBC membrane [66]. These genes encode for a ∼ 800 aa-long protein and consist of two exons, a first short exon carrying a signal peptide followed by a longer exon carrying a PcEMA1 protein family domain (Pfam ID-PF07418). The gene encoding EMA1 is present only as a single copy in P. yoelii or as two copies in P. berghei but has expanded to 14 genes in P. chabaudi. We see similar gene expansions of 15 to 21 members in the four non-Katangan P. vinckei parasites (PvbDA, PvlDE, PvpCR and PvsEL; Fig. 3a and Additional file 8). However, almost half of these genes are pseudogenes with a conserved SNP (C>A) at base position 14 that introduces a TAA stop codon (S5X) within the signal peptide region, followed by a few more stop codons in the rest of the gene (see Additional File 12B). Apart from one or two cases, the S5X mutation is found in all pseudogenes belonging to the ema1 family and is vinckei-specific (it is not present in the single pseudogene in P. chabaudi).
A ML-based phylogeny inferred for the 99 ema1 genes was in general well-resolved with robust branch support for most nodes (see Fig. 3b). Three distinct vinckeigroup-specific clades (clades I to III), two vinckei-specific clades (clades IV and V) and a non-Katangan P. vinckeispecific clade (VI) with good basal support (bootstrap value of 75-100) were identified. Clades I-IV each consist of ema1 genes positionally conserved across P. chabaudi and all five P. vinckei subspecies and are actively transcribed during blood stages.
Of the two P. berghei ema1 genes, one forms a distal clade with the single P. yoelii gene while the other is paraphyletic within Clade V, pointing to presence of two ema1 loci in the common RMP ancestor, one of which was possibly lost during speciation of P. yoelii.
Family expansion in P. chabaudi is mainly driven by gene duplication giving rise to P. chabaudi-specific clades. In contrast, family expansion within non-Katangan P. vinckei parasites is mainly driven by expansion of pseudogenized ema1 genes (41 genes). The pseudogenes, in general, do not form subspecies-specific clades suggesting that the expansion must have occurred in their non-Katangan P. vinckei common ancestor.
The members of the P. vinckei-specific clade IV are heavily transcribed during blood stages but most ema1 pseudogenes that share ancestral lineage with clade IV are very weakly transcribed. Taken together, a core repertoire of conserved ema1 genes arising from 4 to 5 independent ancestral lineages are actively transcribed during blood stages of P. vinckei and P. chabaudi. The ema1 multigene family expansion in P. vinckei is largely due to duplications of ema1 pseudogenes, all carrying a S5X mutation and lacking transcription.
The fam-c proteins are exported proteins characterised by pyst-c1 and pyst-c2 domains, first identified in P. yoelii [55]. The fam-c genes are exclusively found in the sub-telomeric regions and are composed of two exons and an intron, of which the first exon is uniformly 80 bps long (with a few exceptions). fam-c proteins are approximately 100-200 amino acids long, and more than one third of the proteins in P. vinckei contain a transmembrane domain (75.5%) and a signal peptide (88.9%) but most of them lack a PEXEL-motif (motif was detected in only 4% of the genes compared to 24% in other RMPs).
An ML-based tree of all fam-c genes shows the presence of four distinctly distal clades (marked as A in Fig. 3b) with robust basal support (96)(97)(98)(99)(100). Two of them are pan-RMP and two are vinckei-group-specific, each consisting of fam-c genes positionally conserved across the member subspecies (taking into account the genome rearrangement between chromosome V and VI within the vinckei clade). Most members of these clades show medium to high gene expression during asexual blood stages.
There is a considerable expansion of this family in the non-Katangan P. vinckei strains resulting in 59 to 65 members, twice that of PvvCY and other RMPs. Subspecies-wise distinctions among the non-Katangan P. vinckei fam-c genes are less resolved as they form both paralogous and orthologous groups between the four subspecies with several ortholog pairs strongly supported by bootstrap values. Thus, fam-c gene family expansion in the non-Katangan P. vinckei subspecies seems to have been driven by both gene duplications in their common ancestor and subspecies-specific gene family expansions subsequent to subspeciation.
There is also evidence of significant species-and subspecies-specific expansions with striking examples in P. yoelii, P. chabaudi and in P. v. brucechwatti (marked i, ii and iii in Fig. 3b respectively), though they do not form well-supported clades.

Genetic crossing can be performed between P. vinckei isolates
The availability of several isolates within each P. vinckei subspecies with varying growth rates and wide genetic diversity makes them well-suited for genetic studies. Therefore, we attempted genetic crossing of the two P.
vinckei baforti isolates, PvsEH and PvsEL, that displayed differences in their growth rates. Optimal transmission temperature and vector stages were initially characterised for P. v. baforti EE, EH and EL. Each isolate was inoculated into three CBA mice and on day 3 post infection, around 100 female A. stephensi mosquitoes were allowed to engorge on each mouse at different temperatures-21°C, 23°C and 26°C. All three P. v. baforti isolates were able to establish infections in mosquitoes at 23°C and 26°C, producing at least 50 mature oocysts on day 15 post-feed but failed to transmit at 21°C (Additional file 11). Four to five oocysts of 12.5-17.5 μm diameter were observed at day 8 post-feed in the mosquito midgut and around a hundred mature oocysts of 50 μm diameter could be observed at day 15 post-feed. Some of these mature oocysts had progressed into sporozoites but only a very few (less than 10) appeared upon disruption of the salivary glands.
To perform a genetic cross between PvsEH and PvsEL, a mixed inoculum containing equal proportions of PvsEH and PvsEL parasites was injected into CBA mice and a mosquito feed was performed on both day 3 and day 4 post infection to increase the chances of a successful transmission (Additional file 11). For each feed, around 160 female A. stephensi mosquitoes were allowed to take a blood meal from two anaesthetized mice at 24°C for 40 min without interruption.
Upon inspection of mosquito midguts for the presence of oocysts on day 9 post-feed, 100% infection was observed (all midguts inspected contained oocysts) for both day 3 and day 4 feeds. Around 25-100 oocysts were found per midgut in day 3 fed mosquitoes and 5-40 oocysts per midgut in day 4 fed mosquitoes. On day 12 post-feed, mature oocysts and also a high number of sporozoites (40 to 100) were found in the midguts, but upon disrupting the salivary glands on day 20 post-feed, only a few sporozoites (2 to 7) were found in the suspension (Additional file 11).
Sporozoites were isolated by disrupting salivary glands from day 3 and 4 fed mosquitoes and were injected into ICR mice (D3 and D4 respectively). Five days later, both mice became positive for blood-stage parasites. In order to investigate if a genetic cross has taken place, four clones were obtained from D4 by limiting dilution to screen for presence of both PvsEH and PvsEL alleles within the chromosomes. Based on the SNPs identified between PvsEH and PvsEL, we amplified 600 to 1000 bp regions from polymorphic genes on both ends of the 14 chromosomes that contained isolate-specific SNPs and performed Sanger sequencing of the amplicons (primer sequences in Additional file 11).
Both PvsEL-specific (11) and PvsEH-specific (17) markers were found in the 28 markers sequenced (one marker, PVSEL_0600390, could not be amplified). Also, five chromosomes clearly showed evidence of chromosomal cross-over since they contained markers from both isolates (see Fig. 4a), thus confirming a successful P. vinckei genetic cross. However, all four clones had the same pattern of recombination which suggests that the diversity of recombinants in the cross-progeny was low and a single recombinant parasite might have undergone significant clonal expansion.

P. vinckei parasites are amenable to genetic manipulation
We asked if P. vinckei parasites can be genetically modified by applying existing transfection and genetic modification techniques routinely used in other RMPs. Plasmodium v. vinckei CY was chosen to test this because the isolate naturally established a synchronous infection in mice and reaches a high parasitaemia, which results in an abundance of schizonts for transfection. We aimed to produce a PvvCY line that constitutively expresses GFP-Luc (green fluorescent protein-firefly luciferase) fusion protein, similar to those produced in P. berghei and P. yoelii [67,68]. A recombination plasmid, pPvvCY-Δp230p-gfpLuc, was constructed to target and replace the dispensable wildtype P230p locus in P. v. vinckei CY (PVVCY_0300700) with a gene cassette encoding for GFPLuc and a hdhfr selectable marker cassette (Fig. 4b).
Transfection of purified PvvCY schizonts with 20 μg of linearized pPvvCY-Δp230p-gfpLuc plasmid by electroporation, followed by marker selection using pyrimethamine yielded pyrimethamine-resistant transfectant parasites (PvGFP-Luc con ) on day 6 after drug treatment. Stable transfectants were cloned by limiting dilution and plasmid integration in these clones was confirmed by PCR. Constitutive expression of GFPLuc in PvGFP-Luccon asexual and sexual blood-stage parasites was confirmed by fluorescence live cell imaging (Fig. 4c). GFPLuc expression in PvGFP-Luc con oocysts was confirmed by fluorescence imaging of mosquito midguts 7 days after blood meal.

Discussion
Of the four RMP species that have been adapted to laboratory mice, P. berghei, P. yoelii and P. chabaudi have been extensively used to investigate malaria parasite biology. Adopting these RMPs as tractable experimental models has been facilitated by continuous efforts in characterising their phenotypes, sequencing their genomes and establishing protocols for parasite maintenance, genetic crossing and genetic modification. Here, we extend these efforts to Plasmodium vinckei.
We have systematically studied ten P. vinckei isolates and produced a comprehensive resource of their reference genomes, transcriptomes, genotypes and phenotypes to help establish P. vinckei as a useful additional experimental model for malaria.
Enzyme variation and molecular phylogeny studies indicate that the five subspecies of P. vinckei have diverged significantly from each probably due to the geographical isolation of these parasites in different locations around the African Congo basin. This diversity calls for a reference genome for each subspecies in order to capture large-scale changes in their genomes such as chromosomal structural variations and gene copy number variations that might have played a role in their subspeciation. To accurately capture these events, we used a combination of Pacbio and Illumina sequencing that allowed us to produce an end-to-end assembly of P. vinckei chromosomes. This, coupled with manual curation of the predicted gene models, led to the creation of five high-quality reference genomes for P. vinckei that are a significant improvement to the existing fragmented genomes available for P. v. vinckei and P. v. petteri.
Comparative synteny analysis between P. vinckei and other RMP genomes reveals structural variations at both the species and the subspecies levels. Assuming that the observed variations have occurred only once, a putative pathway of genome rearrangements during RMP evolution can be inferred. No rearrangements have occurred during P. berghei and P. yoelii speciation and their genomes are likely to be identical to the RMP ancestor [54]. A reciprocal translocation between chromosomes VIII and X has accompanied the speciation of P. vinckei, and this is mutually exclusive from the reciprocal translocation between chromosome VII and IX that has occurred during P. chabaudi speciation. Following this, there has been a small inversion in chromosome X during the subspeciation of P. v. vinckei and translocations Fig. 4 Phenotypic variation and genetics in Plasmodium vinckei parasites. a Schematic of isolate-specific genetic markers detected in clonal line of PvsEL X PvsEH cross-progeny by Sanger sequencing. Genetic markers from both EH (red) and EL (blue) isolates were detected in the crossed progeny proving successful genetic crossing. b Schematic of homologous recombination-mediated insertion of a gfp-luciferase cassette into the p230p locus in P. vinckei CY. c GFP expression in different blood stages of PvvCY and luciferase expression of PvvCY oocysts in mosquito midgut between chromosomes V, VI and XIII during the subspeciation of P. v. petteri (which are then carried over to P. v. baforti). The biological or phenotypic significance, if any, of such alterations are poorly understood, but it seems likely that they may drive speciation through the promotion of reproductive isolation of species or subspecies.
In addition to the five P. vinckei reference genomes, we generated sequencing data for five more P. vinckei isolates to make available at least two genotypes per P. vinckei subspecies (except for P. v. vinckei for which only one isolate is available). This will facilitate future studies that might employ P. vinckei parasites to study phenotypegenotype relationships. Similarly, we also supplemented the existing genotype information for other RMPs by sequencing seven isolates that cover additional subspecies of P. chabaudi (P. c. esekanensis) and P. yoelii (P. y. nigeriensis, P. y. killicki and P. y. cameronensis). Our data thus comprises of genotypes from sympatric species from each region of isolation allowing us to re-evaluate the genotypic diversity and evolution among RMP isolates.
A genome-wide SNP-based phylogeny shows that the divergences between different subspecies are proportional to the level of isolation of the habitat for all RMP species. Plasmodium vinckei, P. yoelii and P. chabaudi isolates from sites in Cameroon have very similar genotypes to their counterparts in the Central African Republic denoting similar evolutionary pressures and perhaps the presence of gene flow across these regions, while isolates from Brazzaville (Congo) are more diverged probably due to the different environmental conditions in these locations [44].
Subspecies from West Nigeria and the DRC are highly diverged compared to subspecies from the rest of Africa. The distinctiveness of P. berghei and P. v. vinckei, both from the DRC is most likely due to climactic and hostvector differences in the highland forests of Katanga. Highland forests are an altitude of 1000-7000 m with mean temperature of 21°C whereas the lowland forests lie at an altitude less than 800 m with a mean temperature of 25°C. Different host-vector systems are prevalent in the lowland forests (Grammomys poensis (previously known as Thamnomys rutilans)-specific mosquito species unknown) and the highland Katangan forests (Grammomys surdaster-Anopheles dureni millecampsi [69]). The associated selection pressures seem to have mainly influenced their transmission, as reflected by their lower optimal transmission temperatures and the high Ka/Ks ratios observed in GAMER, PIMMS43 and TRAP, proteins that play critical functions in this process. Recently, several more rodent host and mosquito vector species have been identified in the forests of Gabon [70] implying that a diverse set of host-vector systems could have existed for RMPs. Thus, diversification of RMP species into several subspecies within these isolated ecological niches might have been driven by evolutionary forces resulting from the diverse host, vector and environmental conditions experienced at each locale.
Malaria parasite genomes contain several highly polymorphic multigene families located in the sub-telomeric chromosomal regions that encode a variety of exported proteins involved in processes such as immune evasion, cytoadherence, nutrient uptake and membrane synthesis. Multigene families are thought to have evolved rapidly under the influence of immune and other evolutionary pressures resulting in copy number variations and rampant sequence reshuffling that ultimately leads to phenotypic plasticity in Plasmodium.
Previously, phylogenetic analyses of pir, fam-a and fam-b genes from three RMP species have shown that structurally distinct genes exist within these families forming robust clades with varied levels of orthology/ paralogy. Identifying sub-families that have structurally diversified within the multigene families can help to better understand their functions and to this end, we constructed phylogenetic trees for the ten multigene families with genes from all four RMP species. Due to the scale of the analysis, we applied automated trimming to our alignments and limited our tree inference method to maximum likelihood. While this resulted in poor bootstrap values for some clades in the pir, fam-a and fam-b trees compared to previous phylogenetic analyses [25,26], our method was able to retrieve similar tree topologies to those previously inferred and in general produced trees with good nodal support for the rest of the multigene families.
By incorporating family members from P. vinckei, we were able to resolve the ancestral lineages that have expanded in the common vinckei group ancestor and their evolution within P. vinckei's geographically distinct subspecies in response to selective pressures. While our comprehensive phylogenies depict distinct evolutionary histories for each family, they also show presence of robust pan-RMP clades that represent ancestral lineages consisting of structural orthologs that perform conserved functions across all RMPs and will be useful for future work with these families.
Inclusion of P. vinckei pirs in the RMP pir family phylogeny show that P. vinckei pirs do not form independent clades of their own and instead populate three P. chabaudi-dominant clades. This suggests that some of the pir clades were established earlier on when the classical vinckei and berghei groups of parasites split from their common RMP ancestor resulting in vinckei-group-specific clades like L1, S7 and S1g.
Similarly, the addition of P. vinckei genes resolved the ancestral clade of internal fam-a genes into several wellsupported vinckei group clades and a berghei group clade. We observe similarly high level of orthology between P. chabaudi and P. vinckei genes in other multigene families forming several vinckei group clades in contrast to more species-specific clades of paralogous genes in P. berghei and P. yoelii. For example, within the fam-d family, five ancestral lineages can be identified in the vinckei group as opposed to only one paralogous P. yoelii-specific expansion within the berghei group. Taken together, it seems that family expansions in P. chabaudi and P. vinckei have occurred in the common vinckei group ancestor prior to speciation and that multigene families have evolved quite differently across the vinckei and berghei groups of RMPs. These might be related to the striking differences in the basic phenotypes of these two groups of parasites.
We also observed size expansions in the ema1 and fam-c families within the non-Katangan P. vinckei parasites, all being isolates from the lowlands around the Congo Basin. The ema1 family expansions seem to be specific to lowlands dwelling vinckei group parasites as they are expanded in both non-Katangan P. vinckei and P. chabaudi. However, unlike P. chabaudi, the duplicated gene members in non-Katangan P. vinckei are all pseudogenized by a S5X mutation effectively rendering the functional repertoire to be just 6-8 genes, similar to highlands dwelling Katangan P. v. vinckei. Thus, it could be speculated that even under similar selective pressures, ema1 family expansions contribute to parasite fitness in P. chabaudi but may not be required for the survival of sympatric P. vinckei parasites. The P. vinckei ema1 pseudogenes could still serve as silent donor genes that recombine into functional variants to bring about antigenic variation [71]. In the case of the fam-c gene family, the expansion is specific to non-Katangan P. vinckei subspecies since P. chabaudi, P. yoelii and P. v. vinckei all have similar repertoire sizes. The expansions seem to be driven by gene duplications initially in their non-Katangan common ancestor and again after subspeciation.
The effect of the difference in habitats is even more pronounced in the Katangan parasite, P. v. vinckei. It has a smaller genome and a compact multigene family repertoire reminiscent of the only other Katangan isolate, P. berghei and its genetic distance from other members of the P. vinckei clade is in the same order of magnitude as that between separate species within the RMPs. The reduced multigene family repertoire mainly consists of members belonging to pan-RMP or vinckei-group-specific ancestral lineages making it an ideal vinckei group parasite to study the localization and function of variant proteins.
We tested whether PvvCY was amenable to genetic manipulation using standard transfection protocols already established for other RMPs. We were able to successfully knock-in a GFP-luciferase fusion cassette to PvvCY to produce a GFP-Luc reporter line for P. v. vinckei, following a transfection protocol routinely used for modifying P. yoelii in our lab [58,65]. We were able to visualise GFP-positive parasites during different blood stages and in oocysts thus confirming stable GFPLuc expression. We were unable to visualise other life stages (sporozoites and liver stages) due to our failure to produce viable salivary gland sporozoites in this parasite.
The transfection of P. chabaudi has been challenging due to its slow proliferation rate and schizont sequestration resulting in low merozoite yield, thus necessitating optimised transfection protocols. In contrast, Plasmodium v. vinckei reaches high parasitaemia without being immediately lethal to the host (90% parasitaemia on day 6) and is highly synchronous yielding a large number of schizonts. A predominant population of schizonts appear near midnight in P. v. vinckei infections, at which point, they can be Percoll-purified from exsanguinated blood and transfected with DNA.
P. vinckei and P. chabaudi, while being distinct species, share several characteristics that are common among vinckei group RMPs, such as a predilection for mature erythrocytes, synchronous infections and the sequestration of schizonts from peripheral circulation [33,38,[72][73][74]. Thus, P. v. vinckei can serve as an ideal experimental model for functional studies targeting these aspects of parasite biology.
The availability of several RMP isolates with phenotypic differences aids their use in study of parasite fitness and transmission success in mixed infections [75,76] and for the identification of genes involved in parasite virulence, strain-specific immunity, drug resistance and host-cell preference using genetic crosses [58,[77][78][79]. With this in mind, we studied the virulence of ten P. vinckei isolates to identify differences in their growth rate and their effect on the host.
Some of these isolates have been previously characterised [44], but we systematically profiled additional representative isolates for each subspecies (where available) under comparative conditions in the same host strain. We identified pairs of isolates with contrasting virulence phenotypes within two P. vinckei subspecies-P. v. petteri (PvpCR and PvpBS) and P. v. baforti (PvsEH and PvsEL or PvsEE). These isolate pairs would be ideal candidates for studies utilising genetic crossing to identify genetic loci linked to virulence using Linkage Group Selection [58].
Since P. vinckei subspecies have significantly diverged from each other, isolates within the same subspecies are more likely to recombine than isolates from different subspecies. However, intra-specific hybrids between P. v. petteri and P. v. baforti may also be possible (as demonstrated earlier in P. yoelii [80]) since these two subspecies are closely related (see Fig. 2c). However, difficulties in transmitting P. vinckei parasites have been reported previously with either the gametocytes failing to produce midgut infections or sporozoites failing to invade the salivary glands or infections resulting in non-infective sporozoites [27,30,35]. Repeated attempts to create a cross between two P. vinckei baforti isolates failed to produce any detectable recombinants due to low frequency of mosquito transmission [35]. Here, we renewed these efforts with different P. vinckei isolates to see if we could establish a P. vinckei genetic cross. Two attempts were made to create a PvpCR X PvpBS cross and further two attempts were made to create a PvsEL X PvsEH cross. However, in all attempts the sporozoites failed to optimally invade the salivary glands and we managed to isolate only a few in the P. v. subsp cross, subsequently obtaining a cross-progeny in mice. While we were able to demonstrate a successful genetic cross by showing the presence of alleles from both isolates in the cross-progeny, the recombinant diversity was quite low probably due to the transmission bottleneck. We are currently further investigating the optimal conditions for transmitting P. vinckei.

Conclusions
In this study, we have created a comprehensive resource for the rodent malaria parasite Plasmodium vinckei, comprising of five high-quality reference genomes, and blood-stage-specific transcriptomes, genotypes and phenotypes for ten isolates. We have employed state-of-theart sequencing technologies to produce largely complete genome assemblies and highly accurate gene models that were manually polished based on strand-specific RNA sequencing data. The unfragmented nature of our genome assemblies allowed us to characterise structural variations within P. vinckei subspecies, which, to the best of our knowledge, is the first time that large-scale genome rearrangements have been found among subspecies of a Plasmodium species.
Through our extensive sequencing efforts, we have made at least one genotype available for all subspecies of the RMP that previously lacked any sequencing data. While our pan-RMP phylogenies broadly agree with previous biochemical and molecular data-based studies, our reconstruction based on sequence variations on a genome scale provides higher resolution to the divergence estimates. Our comprehensive phylogenetic analysis of multigene families across all RMP species will enable future studies on the critical role of multigene families in parasite adaptation, and to aid this, we have made searchable and interactive versions of the phylogenies publicly available through the iTOL online tool [81]. The gene expression data from our study, covering specific blood stages for some P. vinckei subspecies, show conserved expression of multigene family members across RMPs. While not comprehensive, it complements existing RMP transcriptomes and will aid functional studies in the P. vinckei model.
We have described here in detail the phenotypic, structural, copy number and nucleotide-level variations among several P. vinckei isolates. We hope that these efforts would greatly aid genetic linkage studies in the future to resolve genotype-phenotype relationships. This resource could aid the determination of genes involved in growth rate differences, virulence, drug resistance, host-cell tropism, transmissibility to vectors, etc.
A major limitation to this is the difficulty we faced in producing large numbers of recombinant parasites through genetic crossing of P. vinckei parasites. Attempts to transmit isolates from three different P. vinckei subspecies in A. stephensi mosquitoes failed here as sporozoites repeatedly failed to infect the salivary glands. Careful optimisation of transmission parameters and serial mosquito passages of the P. vinckei parasites will help in improving their transmission efficiency.
We have successfully demonstrated genetic manipulation in P. v. vinckei for functional studies in using this species. The synchronicity of P. v. vinckei infection and its unique ability to sustain high parasitaemia without killing its host culminating in good schizont yields make this parasite an attractive model for reverse genetics studies, especially those on multigene families owing to its reduced repertoire.

Parasite lines and experiments using mice and mosquitos
The parasite lines used in this study and their original isolate information are detailed in Table 1. Frozen parasite stabilates of cloned or uncloned lines were revived and inoculated intravenously into ICR mice. Five P. vinckei isolates (PvvCY, PvbDA, PvbDB, PvlDE and PvsEE) and three P. yoelii isolates (PyyCN, PynD and PykDG) were uncloned stabilates and were cloned by limiting dilution [43] to obtain clonal parasite lines.
Laboratory animal experimentation was performed in strict accordance with the Japanese Humane Treatment and Management of Animals Law (Law No. 105 dated 19 October 1973 modified on 2 June 2006), and the Regulation on Animal Experimentation at Nagasaki University, Japan. The protocol was approved by the Institutional Animal Research Committee of Nagasaki University (permit: 12072610052).
Six-to 8-week-old female ICR or CBA mice were used in all the experiments. The mice were housed at 23°C and maintained on a diet of mouse feed and water. Mice infected with malaria parasites were given 0.05% paraaminobenzoic acid (PABA)-supplemented water to assist parasite growth.
All mosquito transmission experiments were performed using Anopheles stephensi mosquitoes housed in a temperature and humidity-controlled insectary at 24°C and 70% humidity. Mosquito larvae were fed with mouse feed and yeast mixture and adult mosquitoes were maintained on 10% glucose solution supplemented with 0.05% PABA.

Parasite growth profiling
For each isolate, an inoculum containing 1 × 10 6 parasitized RBCs was injected intravenously to five CBA mice. Blood smears, haematocrit readings (Beckman Coulter Counter) and body weight readings were taken daily for 20 days or until host mortality to monitor parasitaemia, anaemia and weight loss. Blood smears were fixed with 100% methanol and stained with Giemsa's solution. The average parasitaemia was calculated from parasite and total RBC counts taken at three independent microscopic fields.

Genomic DNA isolation and whole genome sequencing
Parasitized whole blood was collected from the brachial arteries of infected mice and blood sera was removed by centrifugation. RBC pellets were washed once with PBS and leukocyte-depleted using CF11 (Sigma Cat# C6288) cellulose columns. Parasite pellets were obtained by gentle lysis of RBCs with 0.15% saponin solution. Genomic DNA extraction from the parasite pellet was performed using DNAzol reagent (Invitrogen CAT # 10503027) as per the manufacturer's instructions.
Single-molecule sequencing was performed for five P. vinckei isolates. Five to ten micrograms of gDNA was sheared using a Covaris g-TUBE shearing device to obtain target sizes of 20 kb (for PvvCY, PvbDA and PvpCR) and 10 kb (for samples PvlDE and PvsEL). Sheared DNA was concentrated using AMPure magnetic beads and SMRTbell template libraries were generated as per Pacific Biosciences instructions. Libraries were sequenced using P6 polymerase and chemistry version 4 (P6C4) on 3-6 SMRT cells and sequenced on a PacBio RS II. Reads were filtered using SMRT portal v2.2 with default parameters. Read yields were 352,693, 356,960, 765,596, 386,746 and 675,879 reads for PvvCY, PvbDA, PvlDE, PvpCR and PvsEL respectively totalling around 2.7 to 4.7 Gb per sample. Mean subread lengths ranged from 6.15 to 9.1 kb. N50 of 11.7 kb and 19.2 kb were obtained for 10 and 20 kb libraries respectively.
PCR-free Illumina sequencing was performed for most RMP isolates (Additional File 1). One to two micrograms of DNA was sheared using Covaris E series to obtain fragment sizes of 350 and 550 bp. In total, 350 bp and 550 bp PCR-free libraries were prepared using TruSeq PCR-free DNA library preparation kits according to the manufacturer's instructions. Libraries were sequenced on the Illumina HiSeq2000 platform with 2 × 100 bp paired-end read chemistry. Read yields ranged from 8 to 22 million reads for each library.
For five P. yoelii isolates (Pyy33X, PyyAR, PyyCN, PynD and PykDG), DNA libraries were made using the NEBNext Ultra II DNA Library Prep Kit (NEB) (Nebnext) according to the manufacturers' instructions with target insert size of 350-400 bp. The pooled libraries were sequenced on an Illumina HiSeq4000 instrument generating approximately 13-70 million reads per sample.

Genome assembly and annotation
Genome assembly from long single-molecule sequencing reads was performed using FALCON (v0.2.1) [45] with length cutoff for seed reads used for initial mapping set as 2000 bp and for pre-assembly set as 12,000 bp. The falcon sense options were set as "-min idt 0.70 -min cov 4 -local match count threshold 2 -max n read 200" and overlap filtering settings were set as "-max diff 240 -max cov 360 -min cov 5 -bestn 10". A total of 28-40 unitigs were obtained and smaller unitigs were discarded as they were exact copies of the regions already present in the larger unitigs.
PCR-free reads were used to correct base call errors in the unitigs using ICORN2 [46], run with default settings and for 15 iterations. The unitigs were classified as chromosomes based on their homology with P. chabaudi chromosomes (GeneDB version 3). In PvlDE and PvsEL samples, some of the chromosomes were made of two to three unitigs with overlapping ends which were then fused and the gaps were removed manually. Apicoplast and mitochondrial genomes were assembled from PCR-free reads alone using Velvet assembler [82].
Syntenic regions between genome sequences were identified using MUMmer v3.2 [83]. Synteny breakpoints were identified manually and were confirmed not to be misassemblies by verifying that they had continuous read coverage from PacBio and Illumina reads. Artemis Comparison tool [50,51] and Integrative Genomics Viewer [84] were used for this purpose. The structural variations were illustrated using CIRCOS [85].
De novo gene predictions were made using AUGUST US [49] trained on P. chabaudi gene models. RNA sequencing reads were mapped onto the reference genome using TopHat [86] to infer splice junctions. AUGUSTUS predicted gene models, junctions.bed file from TopHat and P. chabaudi gene models were fed into MAKER [47] to create consensus gene models that were then manually curated based on RNAseq evidence in Artemis Viewer and Artemis Comparison tool [50,51]. Ribosomal RNA (rRNA) and transfer RNA (tRNA) were annotated using RNAmmer v1.2 [87]. Gene product calls were assigned to P. vinckei gene models based on above identified orthologous groups using custom scripts. Functional domain annotations were inferred from InterPro database using InterProScan v5.17 [52]. Transmembrane domains were predicted by TMHMMv2.0 [88], signal peptide cleavage sites by SignalP v4.0 [89] and presence of PEXEL/VTS motif detected using ExportPredv4.0 [90] (with PEXEL score cutoff of 4.3).

Transcriptomics
Total RNA was isolated for four P. vinckei isolates (PvbDA, PvpCR, PvlDS and PvsEL) from mixed blood stages using TRIzol (Invitrogen) following the manufacturer's protocol. For PvpCR and PvlDS, additionally, total RNA was isolated from ring, trophozoite and gametocyte enriched fractions obtained using a Nycodenz gradient.
Strand-specific mRNA sequencing was performed from total RNA using TruSeq Stranded mRNA Sample Prep Kit LT (Illumina) according to the manufacturer's instructions. Briefly, polyA+ mRNA was purified from total RNA using oligo-dT dynabead selection. Firststrand cDNA was synthesised using randomly primed oligos followed by second-strand synthesis where dUTPs were incorporated to achieve strand specificity. The cDNA was adapter-ligated and the libraries amplified by PCR. Libraries were sequenced in Illumina Hiseq2000 with paired-end 100 bp read chemistry.
Stage-specific RNAseq data for PvvCY's intraerythrocytic growth stages were obtained from an earlier study [64]. Gene expression was captured every 6 h during PvvCY's 24 h IDC with three replicates, of which 6 h, 12 h and 24 h timepoints were used in this study to denote gene expression at ring, trophozoite and schizont stages respectively. Similarly, for P. chabaudi AS, gene expression was captured every 3 h during its IDC with two replicates in a recent study [65], of which the 5.5 h, 11.5 h and 23.5 h timepoints on day 2 were chosen to denote ring, trophozoite and schizont stages respectively (GSM3884649, GSM3884650, GSM3884653, GSM388 46534, GSM3884661 and GSM38846612 in [91]). P. yoelii and P. berghei transcriptome data were obtained from [58] (raw dataset in [92]) and [26] respectively.

SNP calling and molecular evolution analysis
Illumina paired-end reads for a total of 30 RMP isolates produced in this study or sourced from previous studies (see Additional File 7) were used for SNP calling. In the case of isolates sequenced in this study, the 350 bp fragment size sequencing data was used. First, to produce a high-quality pan-RMP SNP dataset for phylogeny construction, all quality-trimmed reads were mapped onto the PvvCY reference genome using BWA tool [93] with default parameters. MAPQ values of the mapped reads were fixed and duplicated reads removed using CleanSam, FixMateInformation and MarkDuplicates commands in picardtools [94] and only uniquely mapped reads were retained using samtools with parameter -q 1 [95]. Raw SNPs were called from the mapped reads using samtools mpileup and bcftools with following parameters-minimum base quality of 20, minimum mapping quality of 10 and ploidy of 1. SNPs with quality (QUAL) less than 20, read depth (DP) less than 10, mapping quality (MQ) less than 2 and allele frequency (AF1) less than 80% were removed. Further, only SNPs present in protein-coding genes were retained and those present in low-complexity regions (predicted by DustMasker [96]) and sub-telomeric multigene family members were excluded.
The filtered SNPs from different samples were merged and SNP positions with missing calls in more than six samples were removed. This filtered high-quality set of 1,020,956 SNP positions were used to infer maximum likelihood phylogeny (see Additional File 7).
For inferring Ka/Ks ratios between P. vinckei isolates and PvvCY, filtered SNPs obtained above were merged as before but excluding PvpBS due to its high missing call rate. Only SNP positions with no missing calls in any sample were retained and morphed onto PvvCY gene sequences using gatk command FastaAlternateRe-ferenceMaker [97] to produce isolate-specific gene sequences which were then used for pairwise sequence comparisons to identify synonymous and nonsynonymous substitutions. Ka/Ks ratios were calculated using KaKs Calculator [98] and averaged across isolates if more than one was available for a subspecies.
For comparisons against P. v. petteri, P. yoelii and P. chabaudi, sample reads were mapped onto PvpCR, P. yoelii 17X and P. chabaudi AS genomes respectively and subsequent steps were followed as before. Similar to PvpBS, PysEL was excluded from Ka/Ks analysis due to high missing rate.
One-to-one orthologous proteins from each of the 3920 ortholog groups that form the core proteome were aligned using MUSCLE [99]. Alignments were trimmed using trimAl [100] removing all gaps and concatenated into a partitioned alignment using catsequence. An initial RAxML [57] run was performed on individual alignments to identify best amino acid substitution model under the Akaike Information Criterion (--auto-prot = aic). These models were then used to run a partitioned RAxML analysis on the concatenated protein alignment using PROTGAMMA model for rate heterogeneity.
For constructing isolate-level phylogeny, the vcf files containing high-quality SNPs were first converted to a matrix for phylogenetic analysis using vcf2phylip [101]. RAxML tree inference was performed using GTRGAM MA model for rate heterogeneity along with ascertainment bias correction (--asc-corr = stamatakis) since we used only variant sites.
Maximum likelihood trees for multigene families were constructed based on nucleotide sequence alignments of member genes that included intron sequences if present (except in the case of pir family where introns were excluded). Alignments were performed using MUSCLE with default parameters, frame-shifts edited manually in AliView [102] followed by automated trimming with trimAl using -gappyout parameter. In all the phylogenies, bootstrapping was conducted until the majority-rule consensus tree criterion (-I autoMRE) was satisfied (usually 150-300 replicates). Phylogenetic trees were visualised and annotated in the iTOL server [81].

Plasmid construction and transfection in P. vinckei
The pPvvCY-p230p-gfpLuc plasmid was constructed using MultiSite Gateway cloning system (Invitrogen). attB-flanked 5′ and 3′ homology arms were obtained by amplifying 800 bp regions upstream and downstream of PVVCY_0300700. These fragments were subjected to independent BP recombination with pDONRP4-P1R (Invitrogen) to generate entry plasmids pENT12-5U and pENT41-3U, respectively. Similarly, the gfpLuc cassette from pL1063 was amplified and subjected to LR reaction to obtain pENT23-gfpLuc. BP reaction was performed using the BP Clonase II enzyme mix (Invitrogen) according to the manufacturer's instructions.
Plasmodium vinckei vinckei CY schizont-enriched fraction was collected by differential centrifugation on 50% Nycodenz in incomplete RPMI1640 medium, and 20 μg of ApaI-and StuI-double digested linearized transfection constructs were electroporated to 1 × 10 7 of enriched schizonts using a Nucleofector device (Amaxa) with human T cell solution under programme U-33. Transfected parasites were intravenously injected into 7-week-old ICR female mice, which were treated by administering pyrimethamine in the drinking water (0.07 mg/mL) 24 h later for a period of 4-7 days. Drug-resistant parasites were cloned by limiting dilution with an inoculum of 0.3 parasites/100 μL injected into 10 female ICR mice. Two clones were obtained, and integration of the transfection constructs was confirmed by PCR amplification with a unique set of primers for the modified p230p gene locus. Live imaging of parasites was performed on thin smears of parasite-infected blood prepared on glass slides stained with Hoechst 33342. Fluorescent and differential interference contrast (DIC) images were captured using an AxioCam MRm CCD camera (Carl Zeiss, Germany) fixed to an Axio imager Z2 fluorescent microscope with a Plan-Apochromat 100 ×/1.4 oil immersion lens (Carl Zeiss) and Axiovision software (Carl Zeiss). GFPexpressing P. vinckei oocysts in mosquito midguts were imaged in SMZ25 microscope (Nikon).

Mosquito transmission and genetic crossing of P. vinckei parasites
To determine the optimal transmission temperature for P. vinckei baforti isolates, infected CBA mice were anaesthetized on day 3 post-inoculation and ∼100 female Anopheles stephensi mosquitoes (7 to 12 days post emergence) were allowed to take a blood meal for 30 min without interruption after confirming presence of gametocytes by microscopy. Three batches of ∼100 mosquitoes were fed at three different temperatures-21°C, 23°C and 26°C. The fed mosquitoes were maintained at the feed temperatures and at 70% humidity. To check for presence of oocysts/sporozoites, mosquitoes were dissected, and their midguts or salivary glands were suspended in a drop of PBS solution atop a glass slide, covered by a coverslip and studied under a microscope.
For genetic crossing, isolates were harvested from donor mice and mixed to achieve a 1:1 ratio and 1 × 10 6 parasites of this mixture was inoculated into four female CBA mice. Three days after inoculation, after confirming the presence of gametocytes, two infected CBA mice were anaesthetized and placed on two mosquito cages, each containing around 80 mosquitoes each. Mosquitoes were allowed to feed on the mice without interruption for 40 min at 24°C. A fresh feed was again performed on the 4th day post-inoculation with the other two CBA mice and two fresh cages of mosquitoes. In total, 5-10 female mosquitoes from each cage were dissected on the 9th and 12th day after the blood meal to check for presence of oocysts in the mosquito midguts. Twenty days after the blood meal, the mosquitoes were dissected and the salivary glands were removed, placed in 0.5-0.7 ml PBS solution and gently disrupted to release sporozoites. The suspensions from day 3 and day 4 feeds were injected intravenously into an ICR mouse each. When the mice were positive for blood-stage parasites, they were sub-inoculated into ten ICR mice with an inoculum of 0.6 parasites/100 μL to obtain clones from the potential cross-progeny by limiting dilution. Eight days post infection, four mice were positive for parasites and these clones were screened for the presence of both PvsEH and PvsEL alleles within the chromosomes.