Genomic Resources for Goniozus legneri, Aleochara bilineata and Paykullia maculata, Representing Three Independent Origins of the Parasitoid Lifestyle in Insects

Parasitoid insects are important model systems for a multitude of biological research topics and widely used as biological control agents against insect pests. While the parasitoid lifestyle has evolved numerous times in different insect groups, research has focused almost exclusively on Hymenoptera from the Parasitica clade. The genomes of several members of this group have been sequenced, but no genomic resources are available from any of the other, independent evolutionary origins of the parasitoid lifestyle. Our aim here was to develop genomic resources for three parasitoid insects outside the Parasitica. We present draft genome assemblies for Goniozus legneri, a parasitoid Hymenopteran more closely related to the non-parasitoid wasps and bees than to the Parasitica wasps, the Coleopteran parasitoid Aleochara bilineata and the Dipteran parasitoid Paykullia maculata. The genome assemblies are fragmented, but complete in terms of gene content. We also provide preliminary structural annotations. We anticipate that these genomic resources will be valuable for testing the generality of findings obtained from Parasitica wasps in future comparative studies.

such non-hymenopteran parasitoid systems would benefit from genomic resources, as it is unknown to what extent insights from hymenopteran parasitoids can be extrapolated to non-hymenopteran parasitoids. Unfortunately, no sequenced genomes are available for any of these groups as yet.
Here, we present draft genome assemblies for three parasitoid insect species that each represent an evolutionary independent acquisition of the parasitoid lifestyle ( Figure 1A). Goniozus legneri is a parasitoid wasp from the superfamily Chrysidoidea (family Bethylidae) that is not part of the species-rich and well-studied Parasitica clade. G. legneri can therefore function as an outgroup in comparative analyses of Parasitica wasps and may have re-evolved the parasitoid lifestyle after it was lost at the base of the Aculeata. G. legneri is a gregarious parasitoid of Lepidopteran larvae that stings and paralyzes its prey before ovipositing on it externally (Khidr et al. 2013). The female then guards the host against utilization by other females (Khidr et al. 2013). Aleochara bilineata is a Coleopteran parasitoid of Dipteran pupae that represents another evolutionary independent acquisition of the parasitoid lifestyle. Females lay their eggs in the proximity of hosts (Bili et al. 2016). After hatching, the first instar larvae search for, enter and parasitize host pupae (Bili et al. 2016). Paykullia maculata is a parasitoid fly from the family of the Rhinophoridae, representing one of the many independent acquisitions of the parasitoid mode within the Diptera. Like all Rhinophoridae, P. maculata parasitizes isopods. Females lay their eggs in the vicinity of isopod aggregations. The larvae latch on to a passing isopod and enter it through an intersegmental membrane (Wijnhoven 2001). They feed on the isopod's hemolymph and later also on non-vital tissues, like the female ovaria (Wijnhoven 2001). The isopod continues to feed and molt normally, until the parasitoid kills it and pupates within the host exoskeleton (Wijnhoven 2001). Here, we present draft genome sequences for these three species.

Samples
Specimens of G. legneri were obtained from a long-term culture at the University of Nottingham (UK), which is descended from founders originally collected in Uruguay (I. Hardy, pers. comm.). Specimens of A. bilineata used for genome sequencing were obtained from a culture maintained at the University of Rennes (France). Specimens of  (Visser et al. 2010). Parasitoid species are indicated in black font and non-parasitoid species in gray font. The three species for which we provide draft genome assemblies are indicated by arrows. (B) Completeness of the genome assemblies and annotation. Orthologs of 1658 genes that are present as single copies in at least 90% of insects were retrieved using Busco v3. From top to bottom: Goniozus legneri genome, G. legneri gene set, Aleochara bilineata genome, A. bilineata gene set, Paykullia maculata genome and P. maculata gene set. (C) Circle plots illustrating the synteny between the 50 largest scaffolds of the genome assemblies of G. legneri, A. bilineata and P. maculata (top half of each circle) and the genomes of the closest available relative with chromosome-or linkage group-level assemblies (A. mellifera, T. castaneum and D. melanogaster, respectively; bottom half of each circle). Links are colored to match the chromosome or linkage group of the model species.
A. bilineata used for flow cytometry were obtained commercially from De Groene Vlieg (www.degroenevlieg.nl). Specimens of P. maculata were cultured from Porcellio scaber, collected in the field in The Netherlands (Wageningen and Amsterdam) during 2015. For each species, DNA was extracted from a single adult female using different spin column protocols. For G. legneri and A. bilineata, DNA was obtained by crushing the insect in 100-200 ml PBS and adding 100 ml Nuclei Lysis Solution, 5 ml RNAse (4 mg/ml)(both Promega), 4 ml Proteinase K (20 mg/ml, Roche) and incubating at 60°for 15 min. A further 340 ml Wizard SV Lysis Buffer (Promega) was then added and the sample was centrifuged at full speed for 5 min. The supernatant was then transferred to a spin column (Promega), rinsed with 500 ml Wizard SV Wash Solution (Promega) three times and eluted in 100 ml H 2 0. P. maculata was crushed in liquid nitrogen and DNA was extracted using QIAamp DNA Mini Kit (Qiagen) following the manufacturer's protocol. The quantity and quality of the DNA samples was assessed using Nanodrop (Thermo Fisher Scientific) and Qubit 2.0 Fluorometer (Invitrogen).
Whole genome sequencing Two Illumina Truseq DNA libraries with slightly different insert sizes (400 and 500 bp) were constructed from each DNA sample. The two libraries for each species were barcoded, pooled and sequenced using 2x100 bp paired-end sequencing on Illumina HiSeq 2000 (G. legneri and A. bilineata), or 2x125 bp paired-end on Illumina HiSeq 2500 (P. maculata).
De novo genome assembly Prior to de novo assembly, we characterized the raw read data using SGA (Simpson 2014) and estimated genome size using KmerGenie (Chikhi and Medvedev 2014). Furthermore, 21-mer counts were obtained using Jellyfish (Marçais and Kingsford 2011) and plotted as histograms using GenomeScope (Vurture et al. 2017). GenomeScope also provided an additional genome size estimate. Ploidy structure was estimated from the 21-mer counts using Smudgeplot (https://github.com/tbenavi1/smudgeplot). Furthermore, we removed co-sequenced genomes to reduce complexity of the read set. To this end, reads were mapped to the mitochondrial DNA of well-sequenced related species (Apis mellifera L06178.1, Tribolium castaneum NC_003081.2 and Drosophila melanogaster U37541.1 for G. legneri, A. bilineata and P. maculata, respectively) and to a panel of 12 Wolbachia strains (wAu GCA_000953315.1; wBm GCA_000008385.1; wCle GCA_000829315.1; wPip_Pel GCA_ 000073005.1; wHa GCA_000376605.1; wNo GCA_000376585.1; wRi GCA_000022285.1; wMel GCA_000008025.1; wFol CP015510.1; wOo GCA_000306885.1; wOv GFA_000530755.1; wPip GCA_ 000208785.1) using Bowtie2 (Langmead and Salzberg 2012) with default settings. Unmapped reads were extracted from the Sam file using samtools view with -f 4 (Li et al. 2009) and reverted back to fastq using bedtools bamToFastq (Quinlan and Hall 2010). Reads were trimmed using platanus_trim (Kajitani et al. 2014). Given that the results from the SGA analysis indicated high levels of heterozygosity for A. bilineata and P. maculata, we chose to perform the de novo assembly for all three species in Platanus, which is specifically geared to deal with short-read data from heterozygous genomes (Kajitani et al. 2014). Assembly was followed by scaffold and gap_ close steps as implemented in the Platanus pipeline. To assess coverage, reads were mapped back to the assembly using Bowtie2 with default settings and per-base genome coverage was calculated using samtools depth.

Annotation
Structural annotation was performed using Maker2 using default settings (Holt and Yandell 2011). For each species, we included a protein training set from the closest relative with a well-annotated reference genome (A. mellifera Amel_4.5, T. castaneum Tcas5.2 and D. melanogaster release 6 for G. legneri, A. bilineata and P. maculata, respectively). Augustus was provided with gene models for the same combinations of species. Annotation statistics were obtained using GAG (Hall et al. 2014). The completeness of the genomes and gene sets was assessed by identifying the number of insect Benchmark Universal Single-Copy Orthologs (BUSCOs) (Simão et al. 2015). BUSCO v3.0.2 was run on both the genome assembly (using -m geno) and the Maker gene set at the peptide level (using -m prot) with the insecta_odb9 lineage dataset as reference. We compared the draft genome of each species to that of its closest available relative with a chromosome-or linkage group-level genome sequence available (A. mellifera Amel_4.5, T. castaneum Tcas5.2 and D. melanogaster release 6 for G. legneri, A. bilineata and P. maculata, respectively) using SyMap v4.2 (Soderlund et al. 2011).

Co-sequenced genomes
To remove any co-sequenced genomes (in addition to Wolbachia and mitochondria, which were removed prior to assembly), we employed the Blobology pipeline (Kumar et al. 2013).

Flow cytometry
Since the genome size predicted for A. bilineata was small (see Results), we performed flow cytometry to provide an independent estimate of genome size for this species. Flow cytometry was performed at Plant Cytometry Services (www.plantcytometry.nl) using the following protocol. Three replicate measurements were obtained. For each measurement, a single head of A. bilineata together with a whole specimen of D. melanogaster (obtained from long-term culture held at the Vrije Universiteit Amsterdam) was fragmented with a sharp razor blade in 500 ml extraction buffer (Sysmex), in a plastic petri disc. After 30 -60 sec of incubation, 2 ml staining buffer (Sysmex: propidium iodide, RNAse, 0,1% dithiothreitol and 1% polyvinylpyrolidone). The sample was then passed through a nylon filter of 50 mm mesh size. After 30 min incubation at room temperature, the filtered solution with stained nuclei was sent through the flow cytometer (Partec Cube). The fluorescence of the stained nuclei, passing through the focus of the light beam of a 50 mW, 532 nm green laser, was measured by a photomultiplier and converted into voltage pulses. These voltage pulses are electronically processed to yield integral and peak signals.

Data availability
The Whole Genome Shotgun projects have been deposited at DDBJ/ ENA/GenBank under the accessions NCVS00000000 (G. legneri), NBZA00000000 (A. bilineata) and NDXZ00000000 (P. maculata). The versions described in this paper are versions NCVS01000000, NBZA01000000 and NDXZ01000000, respectively. Mapped reads and genome annotations are available through http://parasitoids.labs.vu.nl/ parasitoids/. This website also includes genome browsers and viroblast n

RESULTS AND DISCUSSION
We generated 18.6-43.6 Gb data per species, covering each genome .100x (Table 1). These data were assembled into draft genomes that were reasonably close to the predicted size from k-mer analysis for each species (Table 2). The genome size of G. legneri appears small compared to other sequenced genomes of Hymenoptera, but is within the range for parasitoid wasps (e.g., Macrocentrus cingulum: 128 Mb, Fopius arisanus: 153 Mb). The estimated genome size of A. bilineata is smaller than that of other sequenced Coleoptera (smallest to date is Hypothenemus hampei: 151 Mb; see below). The genome size of P. maculata is within the range observed for Diptera (e.g., Zaprionus indianus 124 Mb; Rhagoletis zephyria: 795 Mb). Further work is required to establish the accuracy of our genome size estimates. The genome assemblies presented here were obtained from outbred, heterozygous individuals. The Platanus assembler is specifically designed to handle such data and creates a mosaic of the two haplotypes (Kajitani et al. 2014). We tested this by mapping the sequence reads back to the genome assembly. Bowtie2 in default settings reports only the best alignment for each read and haplotypes that had assembled as separate contigs should have half the coverage as collapsed haplotypes, as reads would only map to one of the haplotypes. In our case, the coverage histograms were unimodal ( Figure S1), indicating that haplotypes were successfully collapsed.
In the search for co-sequenced genomes, the assembly for G. legneri yielded hits to Rhabditida nematodes, some of which are known parasites of insects. A Blast search of the entomopathogenic Rhabditidid Oscheius sp. (Lephoto et al. 2016) against the G. legneri genome assembly revealed two hits, upon which we removed one contig and one partial contig from the assembly. A. bilineata contained Wolbachia, but no other co-sequenced genomes. P. maculata contained no cosequenced genomes.
The genome assemblies are very complete, with 97-99% of BUSCOs present and only 0.4-4.7% fragmented ( Figure 1B). Lacking transcriptome data and other genomic resources for these or closely related species, the structural annotation is less complete. Maker2 annotated 5588-7463 genes per genome (Table 3), which is below the expected value for eukaryotes. BUSCO analysis indicated the gene sets to be 63-85% complete ( Figure 1B). Genes missing from the annotation, but present in the genome assembly were more often fragmented, had relatively low bitscores and shorter alignment length to the BUSCO profile (Table S1). It is thus imperative for future studies to interrogate the genome assembly for genes missing from the gene set.
The level of synteny to well-characterized genome sequences of related model species varied ( Figure 1C). The draft genome of G. legneri shows many collinear regions with the genome of A. mellifera, while the similarity between P. maculata and D. melanogaster is limited, with intermediate collinearity between A. bilineata and T. castaneum ( Figure  1C). These differences are probably caused by a combination of factors, including quality of the draft genome assemblies, levels of relatedness to the selected model organism and rates of genome evolution.
As the genome size estimates and the assembled genome for A. bilineara were smaller than for other Coleoptera, we obtained an independent genome size estimate using flow cytometry. The three replicate measurements indicated a genome that was 1.32x larger than that of D. melanogaster (range 1.27 -1.38, Figure S2). Given D. melanogaster has a genome of 180 Mb, our flow cytometry estimated the genome of A. bilineata to be 238 MB. Remarkably, this is almost exactly twice the size estimated from our genome sequencing analysis. BUSCO analysis indicated that our genome assembly was very complete, so the discrepancy cannot be explained by large amounts of coding sequence missing from the sequencing data. Alternatively, a high proportion of repeat sequences in this genome could result in an underestimation of genome size from the sequencing data. However, the k-mer profile ( Figure S2B) provided no indication that this was the case. The most likely explanation seems endoreduplication in head n tissue of this species. This was supported by the Smudgeplot analysis ( Figure S3B), that indicated the presence of triploid and tetraploid cells.
In summary, we present fragmented, but relatively complete genome assemblies of three parasitoid insects, representing three independent evolutionary origins of the parasitoid lifestyle. These genomes will be valuable for comparisons to the widely studied parasitoid Hymenoptera, for which numerous genomes are available (Branstetter et al. 2018). Our study highlights that useful genomic resources can now be obtained from highly heterozygous individual insects collected from outbred lab cultures or even from the field, relieving the need for labor-intensive inbreeding procedures.