Rapid and iterative genome editing in the zoonotic malaria parasite Plasmodium knowlesi: New tools for P. vivax research

Tackling relapsing Plasmodium vivax and zoonotic Plasmodium knowlesi infections is critical to reducing malaria incidence and mortality worldwide. Understanding the biology of these important and related parasites was previously constrained by the lack of robust molecular and genetic approaches. Here, we establish CRISPR-Cas9 genome editing in a culture-adapted P. knowlesi strain and define parameters for optimal homology-driven repair. We establish a scalable protocol for the production of repair templates by PCR and demonstrate the flexibility of the system by tagging proteins with distinct cellular localisations. Using iterative rounds of genome-editing we generate a transgenic line expressing P. vivax Duffy binding protein (PvDBP), a lead vaccine candidate. We demonstrate that PvDBP plays no role in reticulocyte restriction but can alter the macaque/human host cell tropism of P. knowlesi. Critically, antibodies raised against the P. vivax antigen potently inhibit proliferation of this strain, providing an invaluable tool to support vaccine development.

[...] confirmed by live microscopy (Figure 1C). The eGFP positivity rate was calculated the day after transfection (day 1) to evaluate transfection efficiency (8.4 % ± 2.1 SD). The eGFP positivity was then assessed again once parasites reached 0.5 % parasitemia (day 12), indicating that 83.3 % (± 1.8 SD) of the parasites had integrated the construct (Figure 1D). Parasites transfected with pCas/sg_p230p without pDonor_p230p became visible in culture several days after the integrated lines. An intact guide and PAM site were detected in these parasites, suggesting that a small population of parasites did not form a DSB. Parasites transfected with pCas/sg without a cloned sgRNA appeared in culture within a few days after transfection, with growth rates comparable to the eGFP plasmid, suggesting that Cas9 expression without a targeting sgRNA is not toxic (Figure 1E). Integrated lines were grown for one week before negative selection with 5-Fluorocytosine and subsequent limiting dilution cloning. Clones were identified using a plaque-based assay (S1B Figure) previously used for P. falciparum (32), and 10/10 genotyped clones harboured correctly integrated, markerless eGFP (Figure 1F).

[...] with shorter HR length (S2D Figure). Parasites transfected with 800 and 1600 bp HR constructs were the fastest to reach 1 % parasitemia, on day 12 and day 9 post transfection, respectively (Figure 2E). For the 50 and 100 bp HR constructs no eGFP positive parasites were detected by fluorescence microscopy, suggesting very low targeting efficiencies. Constructs with HRs >400 bp provided GFP positivity ranging from 79 to 81 % (Figure 2F), which, taken together with PCR yields and transfection recovery time, suggests an optimal HR length of at least ~800 bp.

To undertake large gene deletion or replacement experiments, HRs may need to be placed at a distance from the Cas9-induced DSB, and it is well known in other systems that efficiency rapidly declines with distance to the DSB (35). In P. falciparum integration efficiencies decrease drastically with distance over 250 bp from the PAM site (36). To determine how distance from the DSB affected efficiency of integration, we used the same p230p PAM site and moved our 400 bp HRs varying distances away from the DSB, ranging from 0 to 5 kb (Figure 2G and S2D Figure). Whilst all transfections were PCR positive for integration and reached 1 % parasitemia at similar times (14-20 days) (S2D Figure and Figure 2H), the integration efficiency declined with distance from the DSB. This decline was surprisingly small, with HRs placed even 5 kb away from either side of the DSB yielding a 14 % (± 18 SD) integration efficiency (Figure 2I). Interestingly, we found that extending HR length to 800 bp restored integration efficiencies to 54.8 % (± 8.7 SD) at a 5 kb distance from the DSB (Figure 2I). Thus, HR length can directly offset efficiency losses due to distance from the DSB, and this system can readily remove genes at least as large as 10 kb in size from a single PAM site, accounting for ~98 % of genes in the P. knowlesi genome (37).
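The efficiencies above are reported as a mean percentage of eGFP-positive parasites with an SD, and the 10 kb figure follows from placing one HR 5 kb on each side of the DSB. The arithmetic can be sketched as below. This is a minimal illustration only: the counts are invented, the function names are hypothetical, and the use of the sample SD is an assumption, since the text does not state which SD was calculated.

```python
from statistics import mean, stdev


def percent_positive(egfp_positive: int, total: int) -> float:
    """Percentage of counted parasites that are eGFP positive."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * egfp_positive / total


def summarise(replicates):
    """Return (mean, sample SD) of percent positivity over (positive, total) pairs.

    Assumption: the sample SD is used; the source does not specify.
    """
    values = [percent_positive(p, t) for p, t in replicates]
    return mean(values), stdev(values)


def max_deletion_bp(hr_offset_bp: int) -> int:
    """Region removed when one HR sits hr_offset_bp away on each side of the DSB."""
    return 2 * hr_offset_bp


# Hypothetical counts (eGFP-positive, total) from three transfections.
# These are NOT data from the study.
counts = [(110, 200), (98, 200), (121, 200)]
avg, sd = summarise(counts)
print(f"integration efficiency: {avg:.1f} % (± {sd:.1f} SD)")
print(f"deletable region with HRs 5 kb either side: {max_deletion_bp(5000)} bp")
```

The second function simply makes explicit why a 5 kb offset on either side of a single PAM site corresponds to deletions of up to 10 kb.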

Cas9-based PCR constructs enable rapid and flexible gene tagging in P. knowlesi
Having demonstrated consistent performance of an sgRNA sequence in the sgRNA/Cas9 suicide vector and PCR constructs for targeting a single control locus, we next sought to determine how robust the system is for targeting a range of loci. We therefore used the PCR-based approach for fusion of fluorescent or epitope tags to proteins of interest (Figure 3A). For C-terminal tags, the PCR repair templates were generated by creating fusions of the tag with HRs targeting the 3' end of the gene and the 3'UTR. Similarly, N-terminal tag repair templates were created by flanking the tag with HRs targeting the 5'UTR and the 5' end of the coding region. In each case a PAM site was selected that crossed the stop codon (for C-terminal tags) or start codon (for N-terminal tags) such that integration of the tag alone, with no other exogenous sequence, was sufficient to disrupt the PAM site. For genes such as the Chloroquine Resistance Transporter (CRT), where the PAM site preceded the stop codon, intervening sequences were recodonised when generating the 5'HR to disrupt the PAM site using silent mutations. We selected five genes with disparate subcellular locations and functions to test this approach: the micronemal protein apical membrane antigen 1 (AMA1) (38), rhoptry neck protein 2 (RON2) (39), the inner membrane complex protein myosin A (MyoA) (40), the digestive vacuole membrane protein CRT, which is involved in drug resistance (41), and Kelch13 (K13), a protein found in cytoplasmic foci and involved in artemisinin resistance (42). A single sgRNA was selected for each, and repair templates were generated by fusion PCR to incorporate an eGFP, mCherry (both with a 24 bp glycine linker) or hemagglutinin (HA) tag (S3A-E Figure). An N-terminal tag was used for K13, as previous work in P. falciparum suggested that C-terminal tagging affected parasite growth (42), and C-terminal tags were used for all the other targets. All lines grew up quickly after transfection, reaching 1 % parasitemia between 8 and 15 days after transfection, and PCR analysis indicated that correct integration had occurred (Figure 3B). Whilst this is, to our knowledge, the first time each of these proteins has been tagged in P. knowlesi, all demonstrated localisation patterns consistent with previous reports for P. falciparum (Figure 3C). AMA1, MyoA and K13 showed clear bands at the expected size on western blots. The CRT-eGFP fusion protein showed a band at ~50 kDa, in line with work in P. falciparum which showed that CRT-eGFP migrates faster than its predicted size of 76 kDa (S3F Figure) (41). We were unable to visualise a band for RON2-HA, most likely due to poor blotting transfer of this 240 kDa protein. Together, these results demonstrate that the fusion PCR approach can be used to tag P. knowlesi genes rapidly and robustly at a variety of loci. Analysis of the equivalent P. falciparum loci revealed that only 2/5 had suitably positioned PAM sites, and that the equivalent UTR regions had an average GC-content of only 11.8 % (36 % for P. knowlesi), suggesting a similar approach would have been more challenging in P. falciparum (Table S1).

Transgenic P. knowlesi orthologue replacement lines provide surrogates for P. vivax vaccine development and DBP tropism studies
Having demonstrated the utility of this technique for rapidly manipulating genes of interest, we next sought to use this system to study P. vivax biology. The orthologous RBC ligands PkDBPα and PvDBP mediate host cell invasion by binding to the DARC receptor on human RBCs in P. knowlesi and P. vivax, respectively (5-8). PvDBP is currently the lead candidate for a P. vivax blood stage vaccine (12-14), thus P. knowlesi could provide an ideal surrogate for vaccine testing in the absence of a robust in vitro culture system for P. vivax. Whilst likely functionally equivalent, the DBP orthologues are antigenically distinct (~70 % amino acid identity in binding region II), so we used genome-editing tools to generate transgenic P. knowlesi parasites in which DARC binding is provided solely by PvDBP. We first carried out an orthologue replacement (OR) of the full-length PkDBPα with PvDBP in the P. [...] (20) and PkDBPα is required to mediate this interaction (7), thus the successful replacement indicates that the Pv orthologue can fully complement its role in DARC binding and parasite invasion.

P. knowlesi contains two DBPα paralogues, DBPβ and DBPγ, which are highly homologous at the nucleotide (91-93 % identity) and amino acid (68-88 % identity) levels, but are thought to bind to distinct sialic acid-modified receptors unique to macaque RBCs (16). The PkDBPα sgRNA was carefully designed to be distinct from the equivalent DBPβ and DBPγ target sequences (85 % identical to DBPγ and 47.8 % to DBPβ) because, as in other systems, off-target Cas9-induced DSBs are a major issue (43, 44). We therefore sequenced the four most similar target sequences, including one in DBPγ, in the PvDBP OR lines (Table S2) and did not detect any off-target mutations, suggesting that, as for other malaria parasites (22), the absence of non-homologous end joining (26) ameliorates the potential for off-target mutations. However, diagnostic PCRs for DBPβ failed, as did PCRs for genes flanking the DBPβ locus. Whole genome sequencing revealed that in one of two independent PkDBPα OR and PvDBP OR clones a ~44 kb truncation at one end of chromosome 14 had occurred (S5 Figure) [...]

In this work, we adapt CRISPR-Cas9 genome editing to the zoonotic malaria parasite P. knowlesi. Whilst various approaches for CRISPR-Cas9 have been used for other malaria parasites (22, 29, 44, 46), here we combine a plasmid containing a single recyclable positive selection marker with a fusion PCR-based approach for generation of repair templates. This allows for seamless insertion or deletion at any location within a gene and unlimited iterative modifications of the genome. Genome-wide reverse genetics screens have been applied with great success to the rodent malaria parasite, P. berghei (47, 48), but they have remained challenging for P. falciparum, and impossible for P. vivax. The tools presented here will enable scalable construct assembly and genome-wide systematic knockout or tagging screens in an alternative human infective species, thus providing a complementary tool to address both shared and species-specific biology. The analysis of lines with multiple tagged or deleted genes is particularly valuable for multigene families with highly redundant functions, as exemplified by our modification of all three P. knowlesi DBP genes.

Here we investigate key parameters associated with successful genome editing and show that the process is also highly robust; targeting of the p230p locus demonstrated successful editing for 25/25 transfections and only 1/10 sgRNAs targeting different loci failed to generate an edited line. The failure of an sgRNA guide (AGAAAATAGTGAAAACCCAT) designed to target the DBPβ locus, a non-essential gene, suggests that multiple guides may need to be tested for some loci. We did not detect any off-target effects, consistent with other reports of CRISPR-Cas9 use in malaria parasites (22, 29, 44, 46). Negative selection of the pCas9/sg plasmid then enables generation of markerless lines, allowing unlimited iterative modifications of the genome, with each round requiring only ~30 days (including dilution cloning). We systematically tested key parameters associated with successful genome editing and found that increasing HR length enhanced integration efficiency proportionately, a trend seen in both P. falciparum and P. berghei (31, 44, 49). Whilst integration was detected with HRs as short as 50 bp, efficient editing was achieved with HRs between 200 and 800 bp. We were also able to examine how distance from the DSB affected editing efficiency. Whilst in other systems editing efficiency decreases rapidly as the DSB distance increases, we saw only a steady decline with distance, an effect which could be ameliorated by simply increasing HR length.

By applying these techniques to the P. knowlesi and P. vivax DBP family we have been able to examine the role of these genes in host and reticulocyte tropisms of the two species. Even after long-term adaptation to culture with human RBCs, P. knowlesi parasites can retain a strong preference for invasion of macaque RBCs (16, 33).
Both DBPγ and DBPβ have been shown to bind to proteins with a distinct sialic acid residue found in non-human primates, but absent in humans (16). Deletion of these genes had no effect on invasion of human RBCs, but interestingly also had no effect on invasion efficiency in macaque RBCs, demonstrating that PkDBPα alone is sufficient to retain full invasive capacity and that the DBP proteins are not responsible for the macaque cell preference retained in the A1-H.1 human adapted line. Despite being closely related to P. knowlesi and other macaque infecting species such as P. cynomolgi, P. vivax cannot infect macaques, and the PvDBP protein has been suggested to play a role in enforcing this tropism, as key interacting residues are missing within the macaque DARC protein (11). P. knowlesi parasites expressing PvDBP in the absence of DBP paralogues demonstrate a clear reduction in invasion capacity in macaque cells, resulting in an overall shift towards preference for human cells, consistent with PvDBP binding macaque DARC less efficiently. Nevertheless, as invasion capacity remained quite close to that seen for human RBCs, it seems unlikely that the PvDBP protein alone represents a significant barrier to P. vivax infection of macaques. Another key difference between the two species is that, unlike P. knowlesi, P. vivax has a strict restriction to invasion of reticulocytes. A second family of RBC binding proteins, known as the reticulocyte binding-like proteins (RBPs), has previously been implicated in this tropism. More recently, the PvDBP protein itself has been implicated, with work using recombinant PvDBP-RII suggesting that whilst DARC is present on both reticulocytes and mature normocytes, changes during red cell maturation mean that DARC is only accessible to PvDBP binding in young reticulocytes (10). Here we show that transgenic P. knowlesi parasites using PvDBP for invasion have no such restriction, invading human RBCs (which typically contain less than 0.5 % reticulocytes) with the same efficiency as those expressing PkDBP, thus providing compelling evidence that PvDBP plays no role in the reticulocyte tropism. Recent work determining that PvRBP2b, which lacks an orthologue in P. knowlesi, binds to the reticulocyte-specific marker CD71 (50) further supports the RBPs as the key to reticulocyte tropism. Importantly, the ability to compare and contrast the activity of Pk/Pv DBP family members in parasitological assays will provide a vital new tool to test hypotheses and models arising from studies that have until now relied on assays using recombinant protein fragments.

Efforts to develop a P. vivax vaccine to elicit antibodies against the lead candidate PvDBP have predominantly relied on ELISA-based assays, which assess the ability of antibodies to block recombinant PvDBP-RII binding to DARC (18), but are likely to be less informative than parasitological assays. Some epitopes identified in recombinant protein assays may be inaccessible in the context of invasion, and it is also possible that not all inhibitory antibodies directly block receptor engagement. DARC-DBP binding is only one step in the multi-step invasion process, with subsequent conformational changes and potential downstream signalling roles for the protein (51). The full-length DBP antigen is a 140 kDa protein containing a C-terminal transmembrane domain, and as such structural and biochemical analysis of the protein has almost exclusively focused on the PvDBP-RII fragment alone. The P. knowlesi PvDBP OR line thus provides an opportunity to interrogate the function of the full-length protein. Whilst efforts to standardise ex vivo P. vivax assays have been successful (15), they remain hugely challenging and low throughput, and rely on genetically diverse P. vivax clinical isolates that are maintained in culture for only a single cycle of RBC invasion. A vaccine against P. vivax must ultimately elicit antibodies with strain-transcending inhibitory activity, but the ability to test on a defined genetic background can provide a significant advantage when it comes to benchmarking and prioritising target epitopes and characterising sera raised against them. Here we use the PvDBP sequence from the SalI reference strain, but multiple lines expressing distinct PvDBP variants could be generated in future to systematically examine inhibition in heterologous strains. Isolates refractory to a given test antibody in ex vivo assays can then be sequenced and direct the generation of new transgenic P. knowlesi PvDBP OR variant lines to support rational vaccine development. These assays in turn can provide vital triaging for non-human primate models and controlled human challenge infections (13, 15, 17, 19), both of which carry the imperative to ensure that only highly optimised antigens are tested.
The transgenic P. knowlesi OR lines developed here represent the ideal platform for scalable testing of polyclonal and monoclonal sera from vaccine trials and natural P. vivax infections. This will enable detailed investigation of epitopes providing invasion inhibitory activity and a means for systematic development of a strain-transcending vaccine. Our work also revealed low-level cross-reactivity of PvDBP-RII antibodies against P. knowlesi and suggests cross-immunity between the two species could exist in the field, which may have a significant impact on disease outcome. Understanding the precise epitopes involved could facilitate development of a dual species vaccine, and epitopes conserved across species are also more likely to be conserved across polymorphic strains of P. vivax. The same approach could readily be applied to other potential vaccine candidates, novel drug targets or to investigate mechanisms of drug resistance, which are also thought to differ between P. falciparum and P. vivax (52).

In conclusion, we demonstrate that adaptation of CRISPR-Cas9 genome editing to P. knowlesi provides a powerful system for scalable genome editing of malaria parasites and can provide critical new tools for studying both shared and species-specific biology.

After 7 days the media was changed and 0.2 % fresh blood added. On day 11 the plate was screened for plaques, in an assay modified from P. falciparum (32). Plaque positive cultures were transferred to 24 well plates containing 1 ml media with 2 % haematocrit and used for genotyping.

DNA Constructs and PCRs
Preparative DNA for plasmid cloning and PCR fusion constructs was amplified with CloneAmp (Takara) using the following cycle conditions: 32 cycles of 5 s at 98 °C, 20 s at 55 °C, and 5 s/kb at 72 °C. Genomic DNA was prepared using the DNeasy blood and tissue kit (Qiagen).

Cloning of pDonor_pkdbpγ: Plasmid pDonor was modified by restriction cloning to include two HRs from the PkDBPγ 5' and 3'UTRs using primers olFM245 and olFM0246 (adding SacII/SpeI sites) and primers olFM0247 and olFM248 (adding NotI/NcoI sites), respectively. A spacer sequence, to aid in subsequent diagnostic PCRs, was generated by polymerase cycling assembly (PCA). Briefly, the spacer sequence was synthesised using primers of 60 bp length with 20 bp of sequence homologous to the adjacent primers on each side. Final concentrations of 0.6 μM for outer primers (ol488 and ol492) and 0.03 μM for inner primers (ol489, ol490, ol491 and ol503) were used for PCA, with the same cycle conditions as described for PCR. The final product was inserted with SpeI and NcoI restriction sites between the HRs as described for pDonor_pkdbpα cloning, to replace the deleted DBPγ genes. Primer sequences are listed in Table S4 and S6.
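The PCA primer layout described above (60 bp primers, each sharing 20 bp with its neighbours) can be sketched as a simple tiling of the target sequence. This is an illustrative sketch under stated assumptions rather than the design procedure actually used: the function names are hypothetical, the spacer below is a placeholder rather than the real spacer sequence, and it assumes every oligo is exactly 60 nt and that oligos alternate strands. Under those assumptions six oligos span 6 × 60 − 5 × 20 = 260 bp.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def assembled_length(n_oligos: int, oligo_len: int = 60, overlap: int = 20) -> int:
    """Length spanned by n_oligos tiles of oligo_len sharing `overlap` bases."""
    return n_oligos * oligo_len - (n_oligos - 1) * overlap


def tile_oligos(template: str, oligo_len: int = 60, overlap: int = 20):
    """Split template into overlapping oligos on alternating strands.

    Even-indexed oligos are top strand; odd-indexed oligos are reverse
    complemented so that neighbours can anneal through the shared overlap.
    """
    step = oligo_len - overlap
    if len(template) < oligo_len or (len(template) - oligo_len) % step != 0:
        raise ValueError("template length must equal n*oligo_len - (n-1)*overlap")
    oligos = []
    for i, start in enumerate(range(0, len(template) - oligo_len + 1, step)):
        window = template[start:start + oligo_len].upper()
        oligos.append(window if i % 2 == 0 else reverse_complement(window))
    return oligos


# Placeholder 260 bp sequence; NOT the spacer used in the study.
spacer = ("ACGTTGCA" * 33)[:260]
oligos = tile_oligos(spacer)
for i, oligo in enumerate(oligos, start=1):
    role = "outer" if i in (1, len(oligos)) else "inner"
    print(i, role, oligo)
```

The first and last oligos play the role of the outer primers, which the protocol supplies at a 20-fold higher concentration than the inner ones so that the full-length product is preferentially amplified.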

Three-step nested PCR
Generation of each PCR repair template was carried out by a three-step nested PCR method to fuse together HRs with the insert DNA (eGFP expression cassette, eGFP with N-terminal linker or mCherry with C-terminal linker). In a first set of PCRs, the DNA insert (eGFP expression cassette or tag) and the HRs for integration into the region of interest were individually amplified in duplicate. The HRs contained overhangs of at least 20 bp and 58 °C Tm with homology to the insert DNA (HR1 with a C-terminal overhang homologous to the N-terminus of the insert DNA and HR2 with an N-terminal overhang homologous to the C-terminus of the insert DNA). All duplicates were pooled and products were extracted from agarose gel (Qiagen) to remove primers and background amplicons. In a second nested PCR, HR1 was fused to the donor amplicon in duplicate, with double the amount of time allowed for the elongation step (10 s/kb), and again the product was gel extracted. In the final step the HR1-insert and HR2 were fused together, resulting in the final product HR1-insert-HR2 (Fig 2A). PCR repair templates for HA tagging were generated in a two-step PCR method. First the HRs were individually amplified with the addition of 27 bp HA sequence overhangs on the 3' end of HR1 and the 5' end of HR2. In the second nested PCR HR1 and HR2 were fused. All primers are listed in Table S4 and all primer combinations for each construct are listed in Table S5.

[...] proteins were detected with mouse anti-GFP (Sigma, 1:5,000), rat anti-HA (Sigma 3F10 clone, 1:5,000), or rabbit anti-mCherry (ChromoTek, 1:5,000). Primary antibodies were detected using HRP-conjugated secondary antibodies (Bio-Rad, 1:5,000) and ECL (ThermoFisher Pierce). Chemiluminescence was captured using the Azure c600 system.
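For the fusion PCR overhangs described in the three-step nested PCR method (at least 20 bp and a Tm of 58 °C), the selection of an overhang from the end of the insert can be sketched as follows. This is a hedged illustration only: the text does not say how Tm was calculated, so the Wallace rule (2 °C per A/T, 4 °C per G/C) is used here purely as an assumption, the function names are hypothetical, and the insert sequence is a made-up placeholder.

```python
def wallace_tm(seq: str) -> int:
    """Approximate Tm in °C by the Wallace rule: 2*(A+T) + 4*(G+C).

    Assumption: chosen for simplicity; the source does not name a Tm method.
    """
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


def pick_overhang(insert_end: str, min_len: int = 20, min_tm: float = 58.0) -> str:
    """Shortest prefix of insert_end meeting both the length and Tm thresholds."""
    for n in range(min_len, len(insert_end) + 1):
        candidate = insert_end[:n]
        if wallace_tm(candidate) >= min_tm:
            return candidate
    raise ValueError("no prefix satisfies the length and Tm thresholds")


# Hypothetical first bases of an insert; NOT a sequence from the study.
insert_start = "ATGAAAGCTTTAGATAAATTAGCAGGTACCAAAGGT"
overhang = pick_overhang(insert_start)
print(len(overhang), wallace_tm(overhang), overhang)
```

In an AT-rich context the overhang needs to be extended beyond 20 bp to reach the Tm threshold, which is why both conditions are checked together rather than fixing the length.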

Immunofluorescence assays and live cell imaging
Immunofluorescence assays were performed using blood smears fixed with 4 % paraformaldehyde for

Growth inhibition activity assays
Assays of growth inhibition activity (GIA), in the presence of anti-PvDBP_RII antibodies, were carried out using total IgG purified from rabbit sera using protein G columns (Pierce). Immunisation of rabbits against PvDBP_RII (SalI) has been described previously (55). Purified IgG was buffer-exchanged into RPMI 1640 medium, concentrated using ultra centrifugal devices (Millipore) and filter sterilized through a 0.22 μm filter (Millipore) prior to being aliquoted and frozen at -20 °C until use.

Raw sequence data for the A1-H.1 parental line was extracted from the European Nucleotide Archive as per (33, 57). The raw sequence data (accession number ERS3042513) was processed as previously described (58). In brief, the raw sequence data was aligned onto the A1-H.1 reference genome using the bwa-mem short read alignment algorithm (59), and coverage statistics were obtained using the sambamba software (60) to be plotted using R.

PCRs specific to each GOI locus were carried out to amplify the wild type locus (schematic positions olFwd1 + olRev2), integration locus (schematic positions olFwd1 + olRev3) and a control targeting an unrelated locus (ol75 + ol76). The list of specific primers used for each GOI is shown in Table S2. As no DNA is removed in this process, the wild type specific locus primers also generate slightly larger [...] shown. PCR reactions detecting the wild type locus, integration locus and a control PCR targeting an unrelated locus (ol75 + ol76) using approximately 3 ng/µl genomic DNA. Primers are listed in Table S4.