Genetic affinities of an eradicated European Plasmodium falciparum strain

Malaria was present in most of Europe until the second half of the 20th century, when it was eradicated through a combination of increased surveillance and mosquito control strategies, together with cross-border and political collaboration. Despite the severe burden of malaria on human populations, it remains contentious how the disease arrived and spread in Europe. Here, we report a partial Plasmodium falciparum nuclear genome derived from a set of antique medical slides stained with the blood of malaria-infected patients from Spain’s Ebro Delta, dating to the 1940s. Our analyses of the genome of this now eradicated European P. falciparum strain confirms stronger phylogeographical affinity to present-day strains in circulation in central south Asia, rather than to those in Africa. This points to a longitudinal, rather than a latitudinal, spread of malaria into Europe. In addition, this genome displays two derived alleles in the pfmrp1 gene that have been associated with drug resistance. Whilst this could represent standing variation in the ancestral P. falciparum population, these mutations may also have arisen due to the selective pressure of quinine treatment, which was an anti-malarial drug already in use by the time the sample we sequenced was mounted on a slide.

using whole-genome human baits was done following Mybait Human Whole genome to manual version 3.01 (from www.microarray.com/pdf/Mybaits-manual-v3.pdf). After hybridization of the ancient (a)DNA libraries with the human baits for 24 hours, we let it bind to streptavidin magnetic beads for 30 minutes at 65 ºC. Finally, we collected the supernatant (fraction that did not bind to the beads) and cleaned it using QiaQuick PCR Purification Kit (Qiagen) following the manufacturer's instructions. The samples were eluted in 30 μl of Elution Buffer (EB, Qiagen) after 10-minutes of incubation at 37 ºC.

Amplification of capture-depleted products
After we had estimated the optimal number of cycles with qPCR, capture-depletion products were amplified for five cycles and P. falciparum DNA reads captured for 22 cycles using 2x KAPA HotStart ReadyMix and re-amplification primers IS5 and IS6 (3).
The samples were then quantified on an Agilent 2100 Bioanalyzer (Agilent technologies) and pooled in equimolar amounts. The pool was sequenced in one lane of an Illumina HiSEQ 4500 run in 80 SR mode. A library blank and an extraction blank control were included and showed no evidence of contamination with exogenous P. falciparum DNA. This extraction and amplification process resulted in us using all available material and slides.

Reference dataset
In order to represent the global P. falciparum diversity, we selected 434 worldwide P.  (4). We also selected a P. praefalciparum genome (accession code ERS437570) as an outgroup species (5). We detail the dataset in Supplementary Table S1.

Mitochondrial analysis
We extracted only the mtDNA alignment of the population genetics dataset, which comprised 5967 bp and 251 SNPs across 426 samples, including the P. praefalciparum outgroup. A pairwise similarity matrix was constructed based on the raw SNP differences and used to generate a minimum spanning network using the spantree() function from the R package Vegan (21).

P. falciparum 18s rDNA sequence
We compared the reads mapped against the 18s rDNA gene with 2 previously published sequences of the same gene (22). Using BLASTn, both published sequences showed 100% identity with the 18s rDNA of P. falciparum (23).
Unfortunately, our Ebro-1944 sequence did not overlap with this specific genetic region, but showed a high degree of identity with P. falciparum sequences (>95%).

Imputation of H191Y and I876V genetic variants at the pfmrp1 gene
Given the low coverage of the Ebro-1944 nuclear genome, we conducted additional analyses to validate the presence and absence of variants involved in anti-malarial drug resistance. We performed imputation over a 100kb window of the genome containing the pfmrp1 gene, and used GATK UnifiedGenotyper to call variants in this region (24). We then selected all samples with more than 80% of the positions called at a depth of 20x. We filtered out all the positions with indels and with a minor allele count   The specific nucleotide positions (x-axis) at which a substitution is present at the 5' end (left) and 3' end (right) of the mapped reads is provided. In red, the C to T substitution frequency; in blue, the G to A substitution frequency; in grey, the frequency of all other substitutions. The elevation in C to T substitutions at the 5' end and G to A substitutions at the 3' end suggest DNA damage in Ebro-1944 is consistent with the degradation expected in post-mortem historical samples rather than modern contamination. Figure S4: f4-statistics. (left-panel) f4 statistics and 5-95% confidence intervals (yaxis) testing the relationship f4(P. preafalciparum, Ebro-1944; X, Y), where X and Y iterate through all groups included in our global dataset, as given in the header of each plot. (right-panel) Z-scores following jackknife resampling where an absolute Z-score greater than 2 is considered significant. Regional groupings are coloured as in Fig. 1 of the main text. A negative f4 statistic indicates that Ebro-1944 has a greater affinity to X (x-axis label) over Y. Figure S5: Chromopainter's inferred proportion of haplotypes shared (colour scale) between 30 strains of P. falciparum with 100% SNP overlap across 8195 variant sites. The x-axis colour provides the continental region where strains were collected. Ebro-1944 shares more haplotypes with strains from central south Asia relative to Africa.