The genome sequence of the nematode Caenorhabditis drosophilae (Rhabditida, Rhabditidae) (Kiontke, 1997)

We present a genome assembly of the free-living nematode Caenorhabditis drosophilae (Nematoda; Chromadorea; Rhabditida; Rhabditidae). The genome sequence is 51.3 megabases in span. Most of the assembly is scaffolded into six chromosomal pseudomolecules, including the X sex chromosome. The mitochondrial genome has also been assembled and is 15.15 kilobases in length.


Background
Caenorhabditis drosophilae is a free-living, gonochoristic nematode species that was originally isolated from its phoretic host Drosophila nigrospiracula, a fly species that feeds on rot of saguaro cactus, Carnegiea gigantea, in Arizona, USA (Kiontke, 1997; Kiontke, 1999). C. drosophilae is bacterivorous and can be maintained on agar plates seeded with Escherichia coli, but presumably feeds on the mixed community found in saguaro rot in the wild. Unlike many other Caenorhabditis species, where it is the third larval "dauer" stage (L3d) that actively associates with the phoretic host, host association in C. drosophilae is established by second-stage larvae (L2) that are attracted to D. nigrospiracula pupae (Kiontke & Sudhaus, 2006). The L2 are predetermined to become L3d and moult just before the adult fly ecloses. The L3ds migrate to a pouch in the head of the fly formed by the retracted ptilinum and leave the fly when it visits a cactus rot (Kiontke, 1997; Kiontke & Sudhaus, 2006). To exit dauer and resume development, C. drosophilae requires an unknown signal from the fly (Kiontke & Sudhaus, 2006).
C. drosophilae is most closely related to the formally undescribed Caenorhabditis sp. 2, and together these comprise the Drosophilae group (Dayi et al., 2021; Kiontke et al., 2004; Kiontke et al., 2011). The Drosophilae group was previously included in a larger Drosophilae supergroup (Dayi et al., 2021; Kiontke et al., 2004; Kiontke et al., 2011), but phylogenetic analyses of whole-genome data suggest that the Drosophilae supergroup is paraphyletic (Dayi et al., 2021; Sloat et al., 2022; Stevens et al., 2019). To promote the use of nematodes in evolution, ecology, and wider biological research, we are sequencing to high quality the genomes of a wide range of species, both free-living and parasitic. Here, we present a chromosome-level reference genome for C. drosophilae strain DF5112, an inbred derivative of DF5077, which was isolated from a rotting saguaro cactus in Arizona, USA by Karin Kiontke.

Genome sequence report
The genome was sequenced from cultured C. drosophilae DF5112. We obtained the strain DF5112 from the Caenorhabditis Genetics Center (CGC). A total of 58-fold coverage in Pacific Biosciences single-molecule HiFi long reads was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 15 missing joins or mis-joins, reducing the scaffold number by 37.5%, and increasing the scaffold N50 by 2.9%.
The final assembly has a total length of 51.33 Mb in 10 sequence scaffolds with a scaffold N50 of 8.7 Mb (Table 1). The snail plot in Figure 1 provides a summary of the assembly statistics, while the distribution of assembly scaffolds on GC proportion and coverage is shown in Figure 2. The cumulative assembly plot in Figure 3 shows curves for subsets of scaffolds assigned to different phyla. Most (99.77%) of the assembly sequence was assigned to 6 chromosomal-level scaffolds, representing 5 autosomes and the X sex chromosome. Chromosome-scale scaffolds were confirmed by the Hi-C data (Figure 4; Table 2). Chromosomes I_II and II_I have been named to represent a reciprocal translocation between chromosomes I and II in C. drosophilae when compared to C. elegans, whose chromosome nomenclature has been used to name this assembly.
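For readers unfamiliar with the scaffold N50 and N90 statistics quoted above, the following is a minimal Python sketch of how such values are conventionally computed from a list of scaffold lengths. The function name `nx` and the toy lengths are illustrative assumptions; they are not taken from the assembly pipeline or from the nxCaeDros1.1 data.

```python
def nx(lengths, fraction=0.5):
    """Return the Nx length: the largest length L such that scaffolds of
    length >= L together cover at least `fraction` of the total span."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold downwards, accumulating span.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length


# Hypothetical toy scaffold lengths (arbitrary units, not real data).
toy_lengths = [10, 9, 8, 7, 6, 5, 1]
n50 = nx(toy_lengths, 0.5)  # half of the total span of 46 is 23
n90 = nx(toy_lengths, 0.9)  # 90% of the total span is 41.4
print(n50, n90)
```

In the toy example the three longest scaffolds (10 + 9 + 8 = 27) are the first to cover half of the span, so the N50 is 8; the N90 is reached only once the scaffold of length 5 is included.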
We mapped the six C. drosophilae chromosomes to the rhabditid nematode ancestral linkage groups (Nigon elements) (Gonzalez de la Rosa et al., 2021) (Figure 5). The X chromosome is the product of a fusion between NigonN and NigonX that occurred in the last common ancestor of all Caenorhabditis species (Gonzalez de la Rosa et al., 2021). NigonA and NigonB (corresponding to C. elegans chromosomes I and II, respectively) have undergone a reciprocal translocation event. The remaining chromosomes correspond to complete Nigon elements, suggesting they have not been involved in fusion or fission events since the last common rhabditid ancestor.
While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to additional haplotypes present in the strain have also been deposited. The mitochondrial genome was also assembled and can be found as a contig within the multifasta file of the genome submission.

Nematode culturing
We obtained C. drosophilae DF5112 from the Caenorhabditis Genetics Center (CGC). DF5112 is an inbred derivative of DF5077, which was isolated from a rotting saguaro cactus in Arizona, USA by Karin Kiontke. We cultured DF5112 on NGM plates seeded with Escherichia coli strain HB101. After most of the bacteria had been consumed, we harvested the nematodes by washing the plates with cold M9 into 50 ml Falcon tubes, which we then centrifuged at 4000 rcf for 8 min with the brake set to the value of 3. We discarded the supernatant and washed the nematodes a further three times using M9 supplemented with 0.01% Tween. We performed a final wash using PBS buffer. We divided the worms into 1.5 ml DNA LoBind® Tubes (Eppendorf). We recorded the weight of each pellet before flash freezing in liquid nitrogen and storing at -70°C.

Long-read DNA extraction and sequencing
We extracted high molecular weight DNA from a 150 mg pellet of nematodes using the MagAttract HMW DNA kit (Qiagen) with the following modifications. The lysis mix was prepared and placed on ice: 200 µl PBS, 20 µl Proteinase K (Qiagen), 4 µl RNase A (Qiagen), 150 µl AL buffer (Qiagen).
We added 75 µl of the lysis buffer mix to the frozen nematode pellet and used a BioMasher II to disrupt the pellet. We added the remaining lysis buffer and mixed with a wide bore tip.
We transferred the lysis solution to a 2 ml DNA LoBind® Tube (Eppendorf) and digested overnight at 45°C, mixing at 600 rpm in a ThermoMixer C (Eppendorf). We added 15 µl of MagAttract Suspension G before we mixed everything with 280 µl Buffer MB. We eluted the MagAttract beads twice using 200 µl of Buffer AE in each elution step. We incubated the second elution mix at 25°C with 1000 rpm for 3 min in the ThermoMixer C before transferring the elution liquid to a new 1.5 ml LoBind microtube. In total, we extracted 2,700 ng high molecular weight DNA. 1,530 ng of this DNA was sheared to an average size of 13.2 kb with a Megaruptor 3 (setting 30) (Diagenode). The sheared DNA was SPRI cleaned with 1.8x of AMPure XP beads (Beckman Coulter).
A PacBio library was prepared from the extracted DNA by the Scientific Operations: Sequencing Operations core at the Wellcome Sanger Institute using the PacBio Low DNA Input Library Preparation Using SMRTbell Express Template Prep Kit 2.0. The library was sequenced on a single PacBio Sequel IIe flow cell.

Hi-C sequencing
Hi-C library preparation and sequencing were performed by the Scientific Operations: Sequencing Operations core at the Wellcome Sanger Institute. A 20 mg pellet of mixed stage nematodes was processed using the Arima Hi-C version 2 kit following the manufacturer's instructions. An Illumina library was prepared using the NEBNext Ultra II DNA Library Prep Kit and sequenced on one-eighth of a NovaSeq S4 lane using paired-end 150 bp sequencing.

[...] et al., 2020) to estimate genome size and heterozygosity. We first generated a primary and alternate assembly from the PacBio HiFi data using Hifiasm (Cheng et al., 2021). We randomly subsampled 10% of the Hi-C reads using samtools 1.14 (Danecek et al., 2021) and aligned them to the hifiasm primary assembly using bwa-mem 0.7.17-r1188 (Li, 2013), filtered out PCR duplicates using picard 2.27.1-0 (available at http://broadinstitute.github.io/picard/), and scaffolded the assembly using YaHS (Zhou et al., 2023). We ran BlobToolKit 2.6.5 (Challis et al., 2020) on the scaffolded assembly and used the interactive web viewer to manually screen for scaffolds derived from non-target organisms. We also ran BlobToolKit on an unscaffolded version of the alternate assembly. We removed one and three Proteobacteria (E. coli) contigs from the primary and alternate assemblies, respectively. After removing contaminants, we used MitoHiFi 2.2 (Uliano-Silva et al., 2023) to extract and annotate the mitochondrial genome. We removed residual haplotypic duplication from each assembly using purge_dups (Guan et al., 2020) and scaffolded the purged primary assembly using Hi-C data (Rao et al., 2014), as previously described. The assembly was checked for contamination and corrected using the gEVAL system (Chow et al., 2016) as described previously (Howe et al., 2021). Manual curation was performed using gEVAL, HiGlass (Kerpedjiev et al., 2018) and PretextView (Harry, 2022). The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva et al., 2023), which runs MitoFinder (Allio et al., 2020) or MITOS (Bernt et al., 2013) and uses these annotations to select the final mitochondrial contig and to ensure the general quality of the sequence.
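The random 10% subsampling of Hi-C reads mentioned above was done with samtools; the exact command is not given in the text. As a conceptual illustration only, the Python sketch below shows one common way to subsample read pairs reproducibly: hashing the read name so that both mates of a pair receive the same keep or drop decision. The function names, the seed, and the tuple layout are assumptions made for this example and do not reproduce the samtools implementation.

```python
import hashlib


def keep_read(name, fraction=0.1, seed=0):
    """Deterministically decide whether to keep a read pair by its name.

    The name is hashed to a pseudo-random value in [0, 1); the pair is
    kept when that value falls below `fraction`.
    """
    digest = hashlib.sha256(f"{seed}:{name}".encode()).digest()
    value = int.from_bytes(digest[:8], "big") / 2**64
    return value < fraction


def subsample_pairs(pairs, fraction=0.1, seed=0):
    """Keep roughly `fraction` of (name, read1, read2) tuples."""
    return [p for p in pairs if keep_read(p[0], fraction, seed)]


# Hypothetical read pairs (names and sequences are made up).
pairs = [(f"read{i}", "ACGT", "TGCA") for i in range(10_000)]
subset = subsample_pairs(pairs, fraction=0.1, seed=42)
print(len(subset))  # close to 1,000, the exact count depends on the seed
```

Because the decision depends only on the seed and the read name, rerunning the function with the same seed returns the same subset, which keeps a subsampling step like this reproducible.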

Evaluation of final assembly
The final assembly was post-processed and evaluated with the three Nextflow (Di Tommaso et al [...] (Manni et al., 2021).
The sanger-tol/blobtoolkit pipeline is a Nextflow port of the previous Snakemake Blobtoolkit pipeline (Challis et al., 2020). It aligns the PacBio reads with SAMtools and minimap2 [...]
Table 3 contains a list of relevant software tool versions and sources.
Table 3. Software tools: versions and sources.

Software tool Version
1. It appears that this is the report of an assembly alone. Annotations are conspicuously absent.
However, it appears the annotations might exist, as the Figure 5 legend notes the use of single-copy orthologs. But, from the methods, it is not obvious to me how single-copy orthologs were inferred. Additionally, the BUSCO-associated methods refer to "blastp," which requires amino acid sequences (and suggests the existence of annotations). However, it is possible that the approaches used to generate Figure 5 (and the BUSCO completeness scores) are entirely annotation-independent. Beyond this, as the assembly size is super small for a Caenorhabditis species (51.3 MB), it is natural to wonder how many genes are present. Could this be the smallest Caenorhabditis genome yet? I suspect it may be, as it is smaller than the 59 MB C. niphades assembly! Yet, in this emerging era of preprints, micropublications, and data notes, I feel somewhat uncomfortable making a comment like this. Nonetheless, the lack of annotations makes this resource less useful and more difficult to contextualize than it would be otherwise.
Note, if the BUSCO scores and Nigon painting results were generated in an annotation-independent manner (or if I just overlooked the existence of available annotations), this comment can be ignored.
2. The reported methods and assembly statistics are transparent and sufficient for the aims of a data note. Table 3 is fantastic. But, I do not believe I see any links to the code used to generate this assembly? While I do not believe this is strictly required for a data note like this, I believe this would be helpful for those who want to generate nematode assemblies in a similar manner. Again, if I overlooked this link in the publication, this comment can be ignored.
3. I assume the bottom y-axis of Figure 2 corresponds to coverage because "_cov" is on the label. If this is noting something else, then this axis label should be changed.
4. This is a very minor point. But, while blobtoolkit seems like a fantastic tool for evaluating genome assemblies, Figure 1 is not exactly intuitive to me. Multiple two-axis Cartesian plots are preferable to a single, information-dense circle. However, I understand the aim here is to just get this out, so this figure is sufficient for this paper.
And that's it. Thanks for assembling this genome and sharing it with the community!

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Genetics, evolution, development, and genomics. I have never personally assembled a genome for publication, although I have published work in comparative genomics.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
Reviewer Report 06 August 2024
https://doi.org/10.21956/wellcomeopenres.24695.r89257
© 2024 Qing X. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Xue Qing
Nanjing Agricultural University, Nanjing, China
This MS describes the genome sequencing and assembly for Caenorhabditis drosophilae. The manuscript is well written, and the genome was properly assembled and analyzed. The generation of a chromosome-level genome is a major achievement, which is very valuable for the community. It's a pity the authors did not sequence the transcriptome or predict the genes. It would be interesting to see the genome composition. However, this may not be necessary for this type of data report.

Comments:
In the Background, the authors detail the biology of the sequenced species, but information is lacking on genomes in Caenorhabditis (how many have been sequenced to chromosome level, size range...), or at least on closely related species within the Drosophilae group. In fact, one very important result for me is the small genome size, which is half that of C. elegans.


Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: genomics, diversity, evolution, Nematodes
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Asher D. Cutter
University of Toronto, Toronto, Ontario, Canada
Kieninger et al. present in a Data Note the genome sequencing and assembly for the nematode species Caenorhabditis drosophilae. This new chromosome-scale genome assembly will be a valuable contribution to the broader research community. Overall, the data collection and analysis appear sound, with standard procedures and tools that are available to the research community, the links to which are provided in a table. The presentation also is clear. I have no major concerns or major criticisms.

Minor correction:
I noticed what appears to be a typo for a species name of the host insect in the first sentence of the Background that should be corrected (nigrospiculata -> nigrospiracula).
Is the rationale for creating the dataset(s) clearly described?

Minor comments
The article could benefit from providing more context about why C. drosophilae was chosen for genome sequencing and its significance in evolutionary or ecological studies. A brief introduction situating the species within the broader context of nematode biology would enhance the relevance of the study. The Background section could be expanded to provide more context.
Reviewer Expertise: Single cell omics, Genomics, NGS, Bioinformatics
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
I have indicated that the description of the methods only partially allows replication, because it appears that specific command line parameters are not mentioned for various software tools and because the manuscript describes manual curations. The latter is not really an issue because I am sure that the manual curations led to some improvements of the final assembly. However, unless all the changes have been documented or access to the raw assembly is provided, this makes the reproducibility of the complete assembly process a bit difficult. However, I do not see that as a real problem and I would leave it to the authors if and how to address this.

Specific comments
○ "Chromosome-scale scaffolds were confirmed by the Hi-C data (Figure 4; Table 2)." -> Calling this "confirmation" is a bit circular since the scaffolds were generated using the Hi-C data. My suggestion: "Figure 4 shows the chromosomal layout of the C. drosophilae genome as inferred from Hi-C data."
○ The section "Evaluation of final assembly" describes a computational pipeline. As I am not completely familiar with all the software, I would greatly appreciate a small description of what useful information could be gained by running these programs. I think what would also be helpful is a statement of what these results looked like for the C. drosophilae genome, e.g. what does the coverage distribution look like across the genome? Is the X:autosome coverage ratio as expected?
Reviewer Expertise: Nematode genomics
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Figure 1. Genome assembly of Caenorhabditis drosophilae, nxCaeDros1.1: metrics. The BlobToolKit snail plot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 51,328,830 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (10,235,040 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (8,652,941 and 6,839,909 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the nematoda_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/Caenorhabditis%20drosophilae/dataset/GCA_963572285.1/snail.

Figure 2. Genome assembly of Caenorhabditis drosophilae, nxCaeDros1.1: BlobToolKit GC-coverage plot. Sequences are coloured by phylum. Circles are sized in proportion to sequence length. Histograms show the distribution of sequence length sum along each axis. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/Caenorhabditis%20drosophilae/dataset/GCA_963572285.1/blob.
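As a rough illustration of the two per-sequence quantities plotted in a GC-coverage plot, the Python sketch below computes a GC proportion and a mean read coverage for a single sequence. The function names and toy inputs are assumptions for this example; BlobToolKit's own calculation may differ in detail, for instance in how ambiguous bases are handled.

```python
def gc_proportion(seq):
    """Fraction of G and C among unambiguous (A, C, G, T) bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return gc / acgt


def mean_coverage(aligned_bases, seq_length):
    """Mean coverage: total read bases aligned to a sequence / its length."""
    if seq_length <= 0:
        raise ValueError("seq_length must be positive")
    return aligned_bases / seq_length


# Hypothetical toy values, not taken from the nxCaeDros1.1 assembly.
print(gc_proportion("GGCCAATTNN"))  # N bases are ignored
print(mean_coverage(5800, 100))
```

Plotting each scaffold at (GC proportion, coverage) is what lets contaminant sequences, which often differ from the target genome in one or both values, stand out as separate clusters.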

Figure 3. Genome assembly of Caenorhabditis drosophilae, nxCaeDros1.1: BlobToolKit cumulative sequence plot. The grey line shows cumulative length for all sequences. Coloured lines show cumulative lengths of sequences assigned to each phylum using the buscogenes taxrule. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/Caenorhabditis%20drosophilae/dataset/GCA_963572285.1/cumulative.

Figure 4. Genome assembly of Caenorhabditis drosophilae, nxCaeDros1.1: Hi-C contact map of the nxCaeDros1.1 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=YVko6Y-FRiWq1BLVluzh8A.

Figure 5. Nigon painting of Caenorhabditis drosophilae. Counts of Benchmarking Universal Single-Copy Orthologue (BUSCO) loci in 500 kb windows in the six C. drosophilae chromosomes, coloured by their allocation to the seven Nigon elements (A-E, N, X) (Gonzalez de la Rosa et al., 2021).
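The windowed counting behind a Nigon painting like Figure 5 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the input is taken to be a list of (chromosome, position, Nigon element) tuples, and the function name and toy loci are hypothetical rather than drawn from the authors' workflow.

```python
from collections import Counter, defaultdict

WINDOW = 500_000  # 500 kb windows, as in the Figure 5 legend


def count_by_window(loci, window=WINDOW):
    """Count loci per (chromosome, window start), split by Nigon element.

    `loci` is an iterable of (chromosome, position_bp, nigon_element).
    """
    counts = defaultdict(Counter)
    for chrom, pos, nigon in loci:
        start = (pos // window) * window  # left edge of the window
        counts[(chrom, start)][nigon] += 1
    return counts


# Hypothetical toy loci (positions and assignments are made up).
toy_loci = [
    ("X", 120_000, "N"),
    ("X", 480_000, "N"),
    ("X", 510_000, "X"),
    ("I_II", 1_200_000, "A"),
    ("I_II", 1_300_000, "B"),
]
for (chrom, start), by_nigon in sorted(count_by_window(toy_loci).items()):
    print(chrom, start, dict(by_nigon))
```

A window containing loci from more than one Nigon element, such as the toy I_II window here, is the kind of pattern expected where elements have been fused or exchanged by translocation.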

○ A section discussing the implications of the genome assembly could provide insights into how this research contributes to understanding nematode evolution, ecology, or other biological aspects.
○ Please add more detail to some figure legends for a broader audience in biology.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.

○ Figure 4: This figure lacks a color scale and an indication of how much sequence a single dot represents.

○ Figure 5: Just out of curiosity, I wonder whether the reciprocal translocation between chromosomes I and II is shared with any other published Caenorhabditis genomes, or did this happen quite recently in the lineage leading to C. drosophilae?

Table 2. Chromosomal pseudomolecules in the genome assembly of Caenorhabditis drosophilae, nxCaeDros1.
INSDC accession    Name    Length (MB)    GC Percent
○ Table 1, Figure 1 and Figure 3 are redundant, presenting nearly the same information. It would be interesting to include a table showing genome statistics across the genus Caenorhabditis, or a phylogenomic tree showing the placement of Caenorhabditis drosophilae. Alternatively, you may replace one of these figures with a GenomeScope graph, or a synteny analysis in comparison to C. elegans or other species in the Drosophilae group.
○ In the MS, it is stated that the final assembly has a total length of 51.33 Mb in 10 sequence scaffolds, but in Table 1 it is 9 scaffolds.
○ There are many mistakes in Table 3, e.g. HiGlass and BUSCO each appear twice in Table 3, and Blast should be capitalized. In the text BlobToolKit is version 2.6.5, but in the table it is 4.3.7. I haven't checked them all, but please check carefully to make sure they are correct.

Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Evolutionary genomics
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Department of Pathology & Immunology, Baylor College of Medicine, Houston, TX, USA
The article by Kieninger et al. provides a comprehensive genomic analysis of the nematode species C. drosophilae. This study highlights the organism's genetic structure and assembly methodology. The authors show that the final assembly spans 51.33 Mb across 10 scaffolds, with a notable scaffold N50 of 8.7 Mb. The genomic completeness is robust, with a Quality Value (QV) of 55.6, k-mer completeness of 99.99%, and BUSCO assessment showing 93.3% completeness against the nematoda_odb10 reference set. The authors have leveraged bioinformatics tools such as BlobToolKit for genome evaluation, MitoHiFi for mitochondrial genome annotation, and gEVAL for contamination screening and correction. Their analyses have revealed evolutionary insights, including a fusion event on the X chromosome and reciprocal translocations between chromosomes I and II, compared to the model nematode C. elegans. Overall, this article is well written; it not only advances understanding of C. drosophilae's genomic architecture but also sets methodological standards for future comparative genomic studies within the Caenorhabditis genus. Most of my comments are minor and only aim at improving the manuscript.