Targeted insertion and reporter transgene activity at a gene safe harbor of the human blood fluke, Schistosoma mansoni

Summary
The identification and characterization of genomic safe harbor sites (GSHs) can facilitate consistent transgene activity with minimal disruption to the host cell genome. We combined computational genome annotation and chromatin structure analysis to predict the location of four GSHs in the human blood fluke, Schistosoma mansoni, a major infectious pathogen of the tropics. A transgene was introduced via CRISPR-Cas-assisted homology-directed repair into one of the GSHs in the egg of the parasite. We observed a gene editing efficiency of 24% and transgene-encoded fluorescence in 75% of gene-edited schistosome eggs. The approach advances functional genomics for schistosomes by providing a tractable path for generating transgenics using homology-directed, repair-catalyzed transgene insertion. We also suggest that this work will serve as a roadmap for the development of similar approaches in helminths more broadly.


In brief
Approaches to generate tractable transgenics are needed for helminths responsible for major neglected tropical diseases. Ittiprasert et al. computationally identify genomic safe harbor sites and successfully insert a reporter transgene in one of them, demonstrating tractable transgenic studies in the human blood fluke, Schistosoma mansoni.

INTRODUCTION
Clustered regularly interspaced short palindromic repeats (CRISPR) technology has revolutionized functional genomics. [1][2][3] Transgenesis approaches are integral in diverse applications including therapeutics and deciphering host-pathogen interactions. With progress emanating from model species and cell lines, tools can frequently be adapted and transferred to non-model species. Among these are the helminths responsible for major neglected tropical diseases, which cause substantial morbidity and mortality. 4 Infections with helminths also are responsible for substantial economic and disease burdens in agriculture and animal health. 5 These public health and economic imperatives motivated international collaboration for parasite omics research that has resulted in outsized databases of genomes and proteomes and gene, transcript, and protein annotations. [6][7][8][9] In the current post-genomics era, and despite the availability of these omics data, tools for functional genomics in parasitic helminths have been limited to RNAi, which performs with variable efficacy. 10,11 Therefore, CRISPR-based transgenesis protocols for functionally characterizing genes of interest such as those coding for putative drug and/or vaccine targets are a prominent research priority. Moreover, progress with gene editing in schistosomes will facilitate its use in other major invertebrate clades of the Protostomia, including the planarians, for which CRISPR-based genetics have yet to be reported.
MOTIVATION
Functional genomics methods are needed to advance the study of helminth parasites, which cause neglected tropical diseases of significant global health burden. The motivation for this work was to develop a tractable method for transgenic studies of the human blood fluke, Schistosoma mansoni, by identifying genome safe harbor sites and demonstrating successful homology-directed transgene insertion.
CRISPR enables targeted site-specific mutation(s), obviating an impediment of earlier transgenesis approaches that relied on vector-based particle bombardment, 12 lentiviruses, 13 and transposons such as piggyBac. 14 These latter approaches could lead to genetic instability, multi-copy insertion, or inactivation of the transgene and interference with the endogenous gene under investigation. These issues can be overcome in the process of genome editing, where double-stranded breaks (DSBs) are resolved by several discrete repair mechanisms, particularly the predominant, error-prone non-homologous end joining (NHEJ) pathway and homology-directed repair (HDR). HDR efficiency can be improved when a modified double-strand (ds) DNA donor is supplied as the repair template. 15 CRISPR-Cas-assisted HDR has been applied in Schistosoma mansoni 16,17 with a promoter-free, single-strand deoxynucleotide. Overlapping CRISPR target sites improve precise HDR insertion in embryonic stem cells, 18,19 and modification of the 5′ termini of long dsDNA donors bolsters efficient, single-copy integration through the retention of a monomeric donor conformation, thereby enabling gene replacement and tagging. 20
One of the caveats of transgene integration is that transgene insertion into an arbitrarily chosen position in the genome may lead to loss of expression due to disruption of cell function or repressive chromatin structure in the target region. This had been identified as a major drawback, initially in gene therapy approaches, and it has led to the concept of genome safe harbors (GSHs). [21][22][23][24] An ideal GSH has been defined as a region (1) that does not overlap (predicted) functional DNA elements and (2) that lacks heterochromatic marks that could impede transcription. 25 This approach was successfully used in Caenorhabditis elegans based on annotations from the ENCODE and modENCODE consortia. 26 For non-model organisms, chromatin structure annotations are often unavailable, and experiments resort to criterion (1). For instance, transgene insertion into a GSH of the human filarial parasite Brugia malayi has been reported, but in that case, GSHs were predicted based on four sequence annotation features alone: to be located in intergenic regions, to be unique in the genome, to contain a terminal protospacer adjacent motif (PAM) necessary for targeting by the sgRNA-Cas9 ribonucleoprotein complex (RNP), and to have that PAM situated >2 kb from the nearest predicted coding region. 27
In this study, we profited from the availability of chromatin data and chromatin accessibility data and combined them in a computational investigation with genome sequence information to identify potential GSH sites in S. mansoni. Furthermore, we adapted CRISPR-Cas9-based approaches to insert a reporter transgene into the most qualified out of four predicted candidate GSHs. The donor transgene encoded EGFP under the control of the schistosome ubiquitin gene promoter and terminator. The targeted region was free of repetitive sequences and neighboring long non-coding regions, a situation likely to minimize off-target CRISPR-Cas activity. Multiple sites within this region were targeted with overlapping guide RNAs, deployed in unison to enhance editing efficiency and HDR in the presence of the phosphorothioate-modified DNA donor. A knockin (KI) efficiency of 75% was observed for expression of EGFP in miracidia developing within the schistosome eggshell.

RESULTS
GSHs predicted in the schistosome genome
To identify potential GSH sites, we performed in silico analyses based on accepted criteria, introduced principles, 28 and genome resources for S. mansoni, which could satisfy benign and stable gene expression. Notably, we sought to identify intergenic GSHs, rather than intragenic GSHs. 28 Four regions satisfied our criteria (Figure 1) and were termed GSH1 (1,416 bp; location, chromosome 3:13380432-13381848), GSH2 (970 bp; chromosome 2:15434976-15435954), GSH3 (752 bp; chromosome 2:9689988-9690739), and GSH4 (138 bp; chromosome 3:13381901-13382038), respectively. We note that several protein-coding loci were situated in the vicinity of these gene-free GSHs, although these genes were >2 kb distant from any GSH: Smp_052890, uncharacterized protein; Smp_150460, copper transport protein; Smp_071830, uncharacterized protein; Smp_245610, uncharacterized protein; and Smp_131070, condensing complex (Figures 1B-1D). Most of these genes encode as yet uncharacterized proteins and may be non-essential genes based on orthology to essential genes known from eukaryotes. 29 Based on CRISPR-specific considerations for programmed transgene insertion, particularly the presence of multiple PAMs, GSH1 qualified as the most useful of the four GSHs for the present investigation, and hence programmed gene editing at GSH1 is the focus of the findings detailed below.
Efficiency of programmed mutation at GSH1 enhanced by multiple gRNAs
We proceeded to investigate the efficiency of programmed mutation and reporter transgene activity at GSH1. Overlapping gRNAs were employed, an approach that enhanced KI efficiency in mammalian cell lines and embryos. 18,19 Among the gRNAs exhibiting on-target specificity for GSH1, three overlapping gRNAs (sgRNA1, sgRNA2, and sgRNA3), which lacked self-complementarity and off-target matches to the S. mansoni genome (Figures 1B, 1E, and 2A), were selected from among CRISPR-Cas9 target sites. 30,31 The RNPs of Cas9 nuclease and sgRNA were assembled, after which four discrete mixtures of RNPs were used. Three of the mixtures included dual RNPs (RNP1+RNP2, RNP2+RNP3, and RNP1+RNP3), and the fourth included the triple RNPs (RNP1+RNP2+RNP3).
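The four mixtures described above are simply every combination of two or more of the three RNPs. A minimal sketch of that enumeration follows; the function name `rnp_mixtures` is hypothetical and is used here only for illustration.

```python
from itertools import combinations

def rnp_mixtures(rnps, min_size=2):
    """Return every mixture containing at least `min_size` of the given RNPs."""
    return [
        combo
        for size in range(min_size, len(rnps) + 1)
        for combo in combinations(rnps, size)
    ]

# Three RNPs give three dual mixtures and one triple mixture.
mixtures = rnp_mixtures(["RNP1", "RNP2", "RNP3"])
for mix in mixtures:
    print("+".join(mix))
```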
The mixtures of multiple RNPs, along with the DNA donor encoding EGFP, were co-electroporated into schistosome eggs. The transfected eggs were cultured for 15 days, after which EGFP expression was quantified. Efficiency of genome editing, both in controls and experimental groups, was assessed using DECODR 32 analysis of Sanger sequence chromatograms of amplicons that spanned the DSBs. Analysis of PCR products from DNA using indel primers flanking the DSBs (Figure 2A) revealed knockout (KO) efficiency, as assessed by indel (insertions/deletions)-bearing alleles resulting from the dual gRNAs, as follows: KO frequencies at GSH1 of 5.4% (0.8%-10.4%), 3.6% (1.2%-19.3%), and 12.6% (4.9%-19.3%) for RNP1+RNP2, RNP2+RNP3, and RNP1+RNP3, respectively (Figure 2B). The dual RNPs induced short deletions of one to several nucleotides at the predicted DSB for sgRNA1, 2, and/or 3 (Figures 2C and 2E). Mutations were not evident in amplicons from the control groups. The triple RNPs resulted in 23.9% KO (2.4%-71.9%), higher than achieved with any mixture of the dual RNPs (Figure 2B).
Overlapping gRNAs enhanced efficiency of CRISPR knockin
As multiple gRNAs with overlapping sequences can enhance CRISPR-Cas9-mediated HDR efficiency 18 and given that triple overlapping gRNAs performed better than dual gRNAs in initiating programmed mutation at GSH1 in eggs (Figure 2B), we investigated KI of a reporter transgene at GSH1 with the triple overlapping sgRNA/RNPs (Figures 3A and 3B). We employed the gene encoding EGFP driven by the promoter of the endogenous S. mansoni ubiquitin gene (Smp_335990) and its cognate terminator region as the repair template for programmed HDR (Figure 3A). The donor template included homology arms (HAs) specific for GSH1, located on the 5′-flanking region of target 1 and the 3′-flanking region of target 3 (Figure 3B). The donor template was delivered as linearized, long, double-stranded DNA (lsDNA) of 4,451 bp in length. Aiming for precise and efficient single-copy integration of the donor transgene into GSH1 by HDR, the 5′ termini of the DNA donor amplicons were chemically modified 20 to shield the donor template from multimerization and from integration at the DSB via the NHEJ repair pathway (Figure 3A).
At the outset, we investigated the impact of length of the HA by comparing the performance of the donor template bearing HAs of increasing length of 200, 400, and 600 bp. Dual (RNP1+RNP3) and triple (RNP1+RNP2+RNP3) RNP mixtures were used in this investigation. EGFP expression was not evident in eggs electroporated with lsDNA donors with 200- and 400-bp HAs at 5 days after transfection (not shown). By contrast, we observed a few EGFP-positive eggs (~2%-3% with at least a small number of EGFP-expressing cells; data from four biological replicates) with the lsDNA donor with 600-bp HA (not shown). Subsequently, we focused the investigation for EGFP expression on transfection with the donor transgene flanked by 600-bp HA using the triple RNPs and monitored EGFP expression for up to 15 days. Thereafter, on examination using spectrally resolved, confocal laser scanning microscopy (CLSM), the EGFP signals were detected in the eggs of the experimental group, which received the CRISPR materials including the lsDNA donor with 600-bp HA. EGFP signals remained until 15 days, when the experiment ended. EGFP signals were not observed in the negative control groups, although the autofluorescence characteristic of schistosome eggs was apparent. 33 EGFP signals were also detected in the lsDNA donor control (without RNPs) for several days, indicating that extrachromosomal lsDNA expressed EGFP transiently after the transfection.
Figure 2. Efficiency of NHEJ enhanced using overlapping gRNAs
(A) Schematic map of sites of the overlapping gRNAs (blue, red, and green arrows for target 1, 2, and 3, respectively) within GSH1 (yellow box), along with primer locations for indel analysis (purple arrows). The black arrows indicate the DSB programmed by sgRNA1, 2, and 3.
(B) Efficiency of CRISPR at GSH1 in eggs of S. mansoni, as assessed with the DECODR algorithm using distance, following transfection with overlapping gRNPs: RNP1+RNP2 (blue dots), RNP2+RNP3 (red), RNP1+RNP3 (green), and RNP1+RNP2+RNP3 (purple). Significantly higher CRISPR efficiency was obtained with the three overlapping gRNPs, mean = 23.6%, than the other groups (p ≤ 0.001). Among the groups transfected with dual RNPs, efficiency obtained with the RNP1+RNP3 treatment group, mean = 12.6%, was significantly higher than either of the other groups, RNP1+RNP2 at 5.4% and RNP2+RNP3 at 3.6% (p ≤ 0.01; one-way ANOVA with 95% confidence intervals, six biological replicates; GraphPad Prism).
(C-E) Representative alleles, in the schistosome genome, bearing indels at the target site in GSH1 following transfection with dual RNPs, as a gauge of efficiency in CRISPR-catalyzed gene editing. The reference WT allele is shown above the KO allele. KOs were identified for small deletions, 1-6 nt in length (dash) or insertion/substitution (black boxes). The vertical black line boxes show PAM sites.
(F) A representative example of a KO allele bearing a large-sized deletion resulting from transfection with the triple gRNPs.
Next, we investigated the programmed KI by PCR-based analysis for the presence of the expected amplicons spanning the 5′ and 3′ flanks of the donor transgene, i.e., flanking the ubiquitin promoter, EGFP, and the ubiquitin terminator sequences. At the 5′-flanking region, we used a forward primer specific for the genome upstream of the 5′ HA paired with a reverse primer specific for the ubiquitin promoter (Figure 3B). For the 3′ integration junction, a reverse primer specific for a site downstream of the 3′ terminus of the HA paired with a forward primer specific for the ubiquitin terminator was used. Fragments representing the 3′ KI and 5′ KI integration regions of 983 bp and 728 bp, respectively, were observed in the treatment groups but not in the control groups (Figure 3C). EGFP transcripts were observed in the KI experimental group, although some variability in transcript abundance among the biological replicates was seen based on the signals obtained for SmGAPDH, which served as the reference gene (Figure 3D).
Reporter transgene expression in edited eggs
EGFP positivity and intensity were quantified using spectral laser scanning CLSM. 33 Active transgene expression was confirmed within miracidia developing inside transfected eggs (Figures 4A and 4B). EGFP appeared to be expressed by numerous diverse cells throughout the developing larvae, whereas morphological malformation was not observed in transgenic eggs and their enclosed larvae. More intense EGFP fluorescence was consistently recorded and quantified at 509 nm in eggs from the experimental treatment group (Figures 4B1 and 4B2) than in the mock eggs and in eggs transfected solely with donor template (Figures 4A1 and 4A2). Subsequently, on day 15 following transfection with the repair template in the presence or absence of the RNPs mixture, we quantified EGFP intensity in eggs that contained a miracidium by normalization with the EGFP signal from lsDNA-only transfected eggs. Fluorescence intensity differed markedly between these two groups: by 15 days after transfection, 75% of miracidia in eggs from the KI group emitted EGFP fluorescence, whereas 25% of eggs containing a miracidium transfected with the lsDNA donor only emitted EGFP (Figures 4C and 4D) (p ≤ 0.001; n = 402 eggs in the experimental group, n = 397 eggs in the lsDNA-only group; Figure 4E; collected from four independent biological replicates).
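To give a sense of scale for the difference between 75% of 402 eggs and 25% of 397 eggs, the following is a minimal sketch of a pooled two-proportion z-test. The text does not state which test produced the reported p value for this comparison, so the choice of test is an assumption made for illustration, and the proportions are the rounded percentages quoted above rather than raw counts.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p value)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Rounded proportions and group sizes quoted in the text (assumed inputs).
z, p_value = two_proportion_z(0.75, 402, 0.25, 397)
print(f"z = {z:.1f}, p = {p_value:.2e}")
```

Under these assumptions the difference is many standard errors from zero, which is consistent with the reported p ≤ 0.001.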
In addition, we scored the intensity of fluorescence at 509 nm, 33 the emission wavelength for EGFP, as shown in Figures 4C2, 4C3, and 4D (green curve). To this end, we subtracted the signal at 509 nm from the autofluorescence spectrum, which originated from the eggshell. The EGFP-specific signal in the control lsDNA donor repair template treatment group, mean = 1,290 arbitrary units (AU), 34 was significantly lower than that in the experimental group transfected with the triple, overlapping guide RNPs mixture in the presence of the donor repair template, 6,905 AU (range, 4,973-8,963) (p ≤ 0.001) (Figure S2). Moreover, emission of EGFP was not detected in the control groups, i.e., mock-transfected and WT eggs (not shown). Diverse cells and tissues of the developing miracidium expressed the fluorescence reporter gene, so EGFP expression appeared not to be restricted to specific cells (Figures 4B and 4C).
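Separating an EGFP-specific signal from eggshell autofluorescence can be illustrated with a two-component linear unmixing. The sketch below is not the authors' imaging pipeline: the Gaussian reference spectra, their widths, the 560 nm autofluorescence peak, and the function names are all hypothetical, and only the 509 nm EGFP emission peak comes from the text.

```python
import math

def gaussian(wavelengths, center, width):
    """Toy emission spectrum: a Gaussian band (hypothetical shape)."""
    return [math.exp(-(((w - center) / width) ** 2)) for w in wavelengths]

def unmix_two(observed, ref_a, ref_b):
    """Least-squares abundances (a, b) such that observed ~ a*ref_a + b*ref_b."""
    saa = sum(x * x for x in ref_a)
    sbb = sum(y * y for y in ref_b)
    sab = sum(x * y for x, y in zip(ref_a, ref_b))
    sao = sum(x * o for x, o in zip(ref_a, observed))
    sbo = sum(y * o for y, o in zip(ref_b, observed))
    det = saa * sbb - sab * sab  # zero only if the references are collinear
    a = (sao * sbb - sab * sbo) / det
    b = (saa * sbo - sab * sao) / det
    return a, b

wavelengths = list(range(480, 651, 10))          # nm, assumed detection window
egfp_ref = gaussian(wavelengths, 509, 20)        # peak at 509 nm, width assumed
auto_ref = gaussian(wavelengths, 560, 60)        # broad autofluorescence, assumed

# Simulated pixel: 3.0 units of EGFP on top of 1.5 units of autofluorescence.
observed = [3.0 * e + 1.5 * a for e, a in zip(egfp_ref, auto_ref)]
egfp_amount, auto_amount = unmix_two(observed, egfp_ref, auto_ref)
print(round(egfp_amount, 3), round(auto_amount, 3))
```

The design point is that the EGFP contribution is estimated from the whole spectrum rather than from a single wavelength, so a broad background that overlaps 509 nm does not inflate the EGFP estimate.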

Impact on egg viability by electroporation
During the investigation, we also examined delivery of CRISPR materials using electroporation of the schistosome egg, an approach originally described for transfection of the schistosomulum stage of S. mansoni. 35 At the outset, electroporation voltage was investigated, using a single pulse of 20 ms duration at 125, 150, 200, and 250 V to deliver the RNPs and donor template into the eggs. Thereafter, the eggs were cultured, and miracidial hatching was assessed 7 days later. Survival and/or larval growth inside the egg in the 125-V treatment group was not significantly affected; the rate of miracidial hatching was 26.6% ± 3.2%, compared with 31.9% ± 2.6% in the non-electroporated group (Figure S3A). By contrast, increasing the voltage negatively impacted hatching of miracidia from the eggs: 150 V, 22.8% ± 2.2%; 200 V, 11.4% ± 1.2%; 250 V, 3.9% ± 2.0% (p < 0.001, two-way ANOVA) (Figure S3A). Last, we investigated the impact of the CRISPR materials in addition to voltage. Using electroporation at 125 V, we monitored hatching in two biological replicates. In the first, 41.2% ± 2.1%, 39.5% ± 1.5%, and 40.9% ± 1.9% (mean ± SE) of miracidia hatched from the wild-type (WT), donor-transfection-only, and experimental (EGFP KI) groups, respectively; in the second, the corresponding rates were 59.1% ± 2.4%, 60.5% ± 0.6%, and 60.1% ± 1% (Figures S3B and S3C).

DISCUSSION
To advance functional genomics for helminths, we identified four potential GSH sites in S. mansoni, optimized conditions for delivery and structure of transgene cargo, and inserted the reporter transgene into the most qualified intergenic site, GSH1, by programmed CRISPR-Cas9 HDR. We confirmed integration of the transgene by amplicon sequencing as well as EGFP reporter activity using RT-PCR and CLSM analyses. Our approach for programmed editing in this helminth involved electroporation-based delivery to the schistosome egg of RNPs with overlapping gRNAs in the presence of a phosphorothioate-modified, double-stranded donor targeting GSH1. The procedure yielded 24% editing efficiency that was accompanied by transgene activity in 75% of miracidia in the genome-edited schistosome eggs. The donor dsDNA encoded the EGFP gene driven by the schistosome ubiquitin promoter and terminator. Furthermore, clear EGFP signals indicated the suitability of the regulatory elements of the ubiquitin gene to induce transgene expression.
This methodical approach provides a tractable path toward transgenic helminths using HDR-catalyzed transgene insertion. Our criteria to predict GSH included location in euchromatin to avoid silencing of the transgene, a unique genome-target sequence to minimize off-target events, avoidance of lncRNA-encoding genes, presence of epigenetic marks for open chromatin structure, and the absence of epigenetic marks indicating heterochromatin. We termed the intergenic sites GSH1, -2, -3, and -4, which were located on chromosomes 2 and 3. S. mansoni has seven pairs of autosomes and one pair of sex chromosomes, Z and W, with the female schistosome being the heterogametic sex. 36 In addition, we assessed the GSH1 locus for CRISPR-Cas9 integration, gene editing, and overexpression of EGFP. We edited GSH1 using RNPs of Cas9 endonuclease with multiple overlapping gRNAs. Triple RNPs delivered significantly higher CRISPR-Cas9 efficiency than dual RNPs, as well as longer deletion mutations. In addition, efficient HDR was obtained using a combination of multiple and overlapping RNPs programmed to cleave GSH1 in the presence of a repair template protected by chemical modifications. Our approach successfully inserted an lsDNA (4,551 bp) at GSH1. This outcome aligns with reports in cell lines and rodents involving overlapping gRNAs, where deletions close to the targeted mutation enhanced the efficiency of HDR. 18,37 Overlapping gRNAs rather than simply multiple gRNAs may be more efficient for gene KO in S. mansoni given recent findings involving CRISPR interference that compared both single and multiple gRNAs. 18,19,38 GSH1 represents a promising CRISPR target for S. mansoni. Notably, 75% of eggs exhibited EGFP in the miracidium developing within the eggshell and significantly more fluorescence than seen in the control eggs transfected only with donor template. 
EGFP signals were not present in the control, untreated WT eggs, which by ~10 days following transfection exhibited minimal background fluorescence. Our approach to evade the autofluorescence emitted by eggs, which can confound detection of EGFP, used spectral imaging and linear unmixing, 39,40 an approach to facilitate quantification of EGFP-specific emission to resolve overlap between the EGFP and endogenous fluorophores in schistosomes. Eggs isolated from livers of S. mansoni-infected mice were co-electroporated with two or three RNPs and the donor transgene. Such a preparation of schistosome eggs includes eggs displaying a spectrum of development, from newly laid eggs containing the zygote, through developing embryos and eggs containing the fully developed miracidium (with miracidial movement evident), to some dead eggs. 41 Notably, however, the entry of active CRISPR materials and donor transgene into cells of each egg and each developmental stage cannot be predicted. The suitability of this approach for transfection of the LE population with RNPs has been demonstrated by RNP tracking analysis. 42 Indeed, the outcome would be stochastic: not every egg would be expected to receive the full complement of RNPs and donor. Accordingly, eggs exhibiting minimal EGFP may not have been transfected as efficiently as eggs with stronger EGFP fluorescence.
Fluorescence throughout miracidial tissues was achieved using EGFP driven by the schistosomal ubiquitin promoter and terminator, emphasizing the ubiquitous activity of this gene as predicted by transcriptome analyses. [43][44][45] This outcome confirmed reporter gene activity under the control of these ubiquitin elements and demonstrated the accessibility of GSH1 for the transcriptional machinery after programmed KI. The findings also revealed the feasibility of selection at the microscopic level, which would enable hand-picking of reporter-positive miracidia for snail infection to complete the life cycle. Following snail infection, fluorescing cercariae, or in the case of mono-miracidial infections reporter gene PCR-positive cercariae, could be selected for infection of laboratory rodents to propagate heritably transgenic worms. In future approaches, GSH1 may alternatively be used as a locus to integrate other transgenes, e.g., antibiotic resistance gene(s), to enable drug selection at the stage of embryogenesis or during the intermediate host stage in the snail. Here, oxamniquine is a suitable candidate drug. 42 Where mono-miracidial infections with reporter gene-positive miracidia are performed, additional selection manipulations (microscopic or PCR based) could be undertaken on the clonal cercariae derived by this approach. Thereafter, reporter gene-positive female and male cercariae, in which gender can be confirmed by PCR, 46 would enable genetic crosses in the mammalian host. 47
Cell Reports Methods 3, 100535, July 24, 2023. Report. Open access.
These findings are consequential in that they advance functional genomics for a hitherto unmet challenge to manipulate a pathogen of global public health significance. They confirm that transgenes can be inserted into a predicted GSH to endow individual stages or populations of these pathogens with functions, with broad potential for basic and translational studies. [48][49][50] Whereas this report deals with somatic transgenesis of the schistosome larva, the same approach can be used for transfection of the newly laid egg (NLE) of S. mansoni, a stage that at its origin includes a single zygote (surrounded by vitelline yolk cells). The NLE represents a window to the germline, and the hypothesized accessibility of its zygote may facilitate complete transformation to derive lines of transgenic parasites carrying gain- or loss-of-function mutations. In addition, the gene editing methods developed here can be adapted for KO approaches of other genes of interest in schistosomes, and likely in other platyhelminths for which genome sequences are available to be analyzed for GSHs. The information presented provides insights into efficient transgenesis and forward genetics for S. mansoni and for other parasitic (and free-living) helminths.

Limitations of the study
We focused on GSH1 because there were limitations to progress with the other prospective GSHs 2-4. The CHOPCHOP software predicted only a single CRISPR-Cas9 target in each of these three GSHs, and moreover, the site predicted in GSH4 was not specific and showed potential off-target hits elsewhere in the genome. For GSH3, CHOPCHOP predicted only low, <35%, editing efficiency. Overall, locating a Cas9 PAM 51 is constrained by the AT-rich nature of the schistosome genome. 52 Since this investigation deployed multiple overlapping gRNAs 18 to facilitate homology-directed insertion of the transgene, we ranked GSH1 as the most qualified for our purposes because multiple PAMs were present, along with the absence of off-target activity. Although a distance >2 kb from known genes was one of our criteria for GSHs in S. mansoni, intragenic sites rather than intergenic GSHs nonetheless may have expedient attributes for functional genomics where partial loss of fitness may be less consequential. Yet, intergenic sites are inherently safer given that coding regions or other elements are not disrupted.
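The constraint that AT richness places on PAM availability can be illustrated by counting candidate PAMs on both strands of a sequence. This is a minimal sketch that assumes the NGG motif recognized by Streptococcus pyogenes Cas9; the function name and the two toy sequences are hypothetical and are not taken from the schistosome genome.

```python
def count_ngg_pams(seq):
    """Count NGG PAMs on the forward strand and on the reverse strand.

    A reverse-strand NGG appears on the forward strand as CCN.
    """
    seq = seq.upper()
    forward = sum(1 for i in range(len(seq) - 2) if seq[i + 1:i + 3] == "GG")
    reverse = sum(1 for i in range(len(seq) - 2) if seq[i:i + 2] == "CC")
    return forward + reverse

# Toy sequences of equal length (hypothetical, for illustration only).
at_rich = "ATTTAATATAAGGTTTATAATTA"
gc_rich = "ACGGTCCAGGCTGGACCTGGTCC"
print(count_ngg_pams(at_rich), count_ngg_pams(gc_rich))
```

A target region with several closely spaced PAMs, as reported for GSH1, is what makes a set of overlapping gRNAs possible in the first place.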

STAR+METHODS
Detailed methods are provided in the online version of this paper and include the following:

Materials availability
This study did not generate new unique reagents.
Data and code availability
• The nucleotide sequence reads are available at the NIH Sequence Read Archive, BioProject PRJNA919068, accession numbers SRX18957908-18957932. BED files from the bioinformatics analysis are publicly available on Zenodo. The DOI is https://zenodo.org/record/7602535#.ZAIgqBPMLIM. It is also listed in the key resources table.
• This paper does not report original code.
• Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.

Mice
Mice (female, Swiss Webster) infected with S. mansoni were obtained from the Schistosomiasis Resource Center (Biomedical Research Institute, Rockville, MD) within seven days of infection by cercariae (180 cercariae/mouse/percutaneous infection). The mice were housed at the Animal Research Facility of George Washington University, which is accredited by the American Association for Accreditation of Laboratory Animal Care (AAALAC no. 000347) and has the Animal Welfare Assurance on file with the National Institutes of Health, Office of Laboratory Animal Welfare, OLAW assurance number A3205. All procedures employed were consistent with the Guide for the Care and Use of Laboratory Animals. The Institutional Animal Care and Use Committee of the George Washington University approved the protocol used for maintenance of mice and recovery of schistosomes.

METHOD DETAILS
Computational search for gene safe harbors in Schistosoma mansoni
We undertook a genome analysis focusing on intergenic (gene-free) regions to identify prospective GSHs, using approaches similar to those used on the human genome. 28 We aimed to locate a GSH, a site that would facilitate stable expression of the integrated transgene free of interference from the host genome and which, in parallel, integrates and transcribes the transgene without negative consequences or loss of fitness for the host cell. The search for GSHs included several criteria. First, the site should be adjacent to peaks of H3K4me3, a histone modification associated with euchromatin and transcription start sites. 53 Second, it should neither be near nor contain H3K27me3, a histone mark associated with heterochromatin, in any developmental stage. 53 Third, as the schistosome genome contains highly repetitive elements, 52 the GSH site should be located in a unique tract of the genome sequence. Fourth, it should reside in open, euchromatic chromatin accessible to Tn5 transposase as assessed from ATAC-sequencing, which provides a positive display of transposase integration events 54; consequently, safe harbor candidate regions should deliver an ATAC-sequence signal. Fifth, it should lie in the vicinity of known HIV integration sites: given that HIV integrates preferentially into euchromatin in human cells, 55 we anticipated that HIV integration into the schistosome genome may likewise indicate a region of euchromatin (Figure 1A). 56 To predict loci conforming to the criteria, pooled ChIP-seq data for H3K4me3 and H3K27me3 from previous studies were aligned against S. mansoni genome data (version 9 on the date of analysis). ATAC-seq was performed as described. 57 Peak calls of ChIP-seq and ATAC-seq were done with ChromstaR 21,28,53,54 and stored as BED files.
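The chromatin criteria above amount to interval arithmetic on peak calls: keep regions marked by H3K4me3 and accessible by ATAC-seq, then remove anything overlapping H3K27me3. The following minimal sketch shows that logic on a single chromosome with half-open BED-style coordinates. It is an illustration only, not the pipeline used in the study; the function names and the toy coordinates are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), as in BED files

def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Regions covered by both sorted, non-overlapping interval lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start, end = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out

def subtract(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Parts of the intervals in `a` not covered by any interval in sorted `b`."""
    out = []
    for start, end in a:
        cursor = start
        for b_start, b_end in b:
            if b_end <= cursor or b_start >= end:
                continue  # no overlap with the remaining part of this interval
            if b_start > cursor:
                out.append((cursor, b_start))
            cursor = max(cursor, b_end)
        if cursor < end:
            out.append((cursor, end))
    return out

# Toy peak calls on one chromosome (hypothetical coordinates).
h3k4me3 = [(100, 500), (900, 1500)]
atac = [(300, 1200)]
h3k27me3 = [(1000, 1100)]

candidates = subtract(intersect(h3k4me3, atac), h3k27me3)
print(candidates)
```

In practice the same operations would be run per chromosome and per developmental stage, and the surviving regions would still need to pass the uniqueness and gene-distance criteria.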
BED files were used to identify the presence of H3K4me3 and absence of H3K27me3 in adults, miracidia, in vitro sporocysts, cercariae, and in vitro schistosomula with bedtools intersect. Thereafter, ATAC-seq data from adult male and adult female worms (two replicates each) were intersected to find common ATAC-positive regions. H3K4me3-only (H3K27me3-absent) regions common to all stages and ATAC signals were intersected to find common regions. Next, the HIV integration sites were identified by using data from ERR33833.8. Reads were mapped to the lentivirus genome (HIV-1 vector pNL-3, accession AF324493.2) using Bowtie2 with default parameters. Paired reads were extracted where one end mapped to HIV and the other end mapped to the schistosome genome at a unique location. Genes from the BED files above that were located ≤11 kb from HIV-1 integration sites were identified with bedtools closestbed. Gene expression data for these genes were obtained using the meta-analysis tool, https://meta.schisto.xyz/analysis/, of Lu and Berriman. 44 Computational searches that addressed these criteria predicted, a priori, gene-free (intergenic) GSHs (Figure 1), given that transgene integration into an existing gene could diminish fitness of the genetically modified cell. 23,24 We defined genes as protein coding sequences and sequences coding for long non-coding RNA (lncRNA). In view of our goal to use CRISPR/Cas-mediated HDR to insert the transgene, we searched preferentially for unique sequences, to obviate off-target gene modification, and excluded gene