The genome sequence of the drosophilid fruit fly, Drosophila phalerata (Meigen, 1830)

We present a genome assembly from an individual male Drosophila phalerata (drosophilid fruit fly, Arthropoda; Insecta; Diptera; Drosophilidae). The genome sequence is 223.9 megabases in span. Most of the assembly is scaffolded into 7 chromosomal pseudomolecules, including the X and Y sex chromosomes. The mitochondrial genome has also been assembled and is 16.14 kilobases in length. Gene annotation of this assembly on Ensembl identified 18,973 protein coding genes.


Background
Drosophila phalerata Meigen, 1830 is a small to medium-sized (2.2-3.0 mm) yellow-brown drosophilid 'fruit fly' (Figure 1A-D), distantly related to the laboratory model Drosophila melanogaster. It is one of around 30 British and Irish species of Drosophila (Chandler, 2023) and, like most of its close relatives, it is a specialist fungus breeder (Scott Chialvo et al., 2019). Drosophila phalerata is among the most abundant fungus-breeding drosophilids in the UK (Shorrocks, 1977), and is not thought to be threatened. The adults are easily collected or reared from many fungi, including Phallus impudicus, Polyporus squamosus, Amanita rubescens, Pluteus cervinus and Tricholomopsis platyphylla (Charlesworth & Shorrocks, 1980; Shorrocks & Charlesworth, 1980). They are also attracted to decaying plant material (Offenberger & Klarenberg, 1992) and can be collected in smaller numbers from yeasted fruit bait (e.g. Basden, 1955). This species is broadly distributed in wooded areas across Europe, from the Azores and Madeira in the south and west, to Finland in the north, and Iran in the east (Bächli, 2023). In the UK, the adults are most active from May to October (GBIF Secretariat, 2023), but can be caught at any time of year (Basden, 1955). Egg-to-adult development time is thought to be around 25 to 32 days in the field, and there are likely to be 4 to 5 generations in a season, although larval development is faster at higher temperatures (Driessen et al., 1989). Adults that eclose late in the year enter a daylength-induced diapause over winter, and emerge in the following spring (Muona & Lumme, 1981). Drosophila phalerata has been studied for both its courtship behaviour (e.g. Hedlund et al., 1996; Neems et al., 1997), and its interaction with parasitoids and pathogens (Bozler et al., 2017; Driessen et al., 1989; Gillis & Hardy, 1997; Pannebakker et al., 2008). The D. phalerata genome was first sequenced, but not assembled, in 2019 using short-read data for a study of immune gene evolution (Hill et al., 2019).
Here we present a chromosomally complete genome sequence for Drosophila phalerata, derived from the DNA of two male offspring of a wild female collected on a sulphur tuft fungus (Hypholoma fasciculare) on the Penns in the Rocks estate, East Sussex, as part of the Darwin Tree of Life Project. This project is a collaborative effort to sequence all named eukaryotic species in the Atlantic Archipelago of Britain and Ireland. This genome sequence is proving useful for resolving relationships among the Drosophilidae (Kim et al., 2023), and will further build on the value of this family as a model clade for comparative genomics and molecular evolution.

Genome sequence report
The genome was sequenced from one male Drosophila phalerata (Figure 1) [...] 10 haplotypic duplications, reducing the assembly length by 1.84% and the scaffold number by 16.11%, and increasing the scaffold N50 by 7.91%. The final assembly has a total length of 223.9 Mb in 525 sequence scaffolds with a scaffold N50 of 36.5 Mb (Table 1). The snailplot in Figure 2 provides a summary of the assembly statistics, while the distribution of assembly scaffolds on GC proportion and coverage is shown in Figure 3. The cumulative assembly plot in Figure 4 shows curves for subsets of scaffolds assigned to different phyla. Most (96.01%) of the assembly sequence was assigned to 7 chromosomal-level scaffolds, representing 5 autosomes and the X and Y sex chromosomes. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 5; Table 2). The X chromosome was assigned by half coverage and synteny to Drosophila innubila (GCA_004354385.2) (Hill et al., 2019). Chromosome Y was identified by half coverage. While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.
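As an illustration of the scaffold N50 statistic quoted above, the following is a minimal Python sketch of how an N50 (or, more generally, an Nx) value can be computed from a list of scaffold lengths. The function name `nx` and the toy lengths are hypothetical and are not taken from the assembly; this is not the pipeline used to produce Table 1.

```python
def nx(lengths, x=50):
    """Return the Nx length for a list of scaffold lengths.

    Nx is the length of the scaffold at which the running total of
    scaffold lengths, taken from longest to shortest, first reaches
    x% of the total assembly span.
    """
    if not lengths:
        raise ValueError("no scaffold lengths given")
    if not 0 < x <= 100:
        raise ValueError("x must be in the range (0, 100]")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding issues.
        if running * 100 >= total * x:
            return length


# Hypothetical toy scaffold lengths (arbitrary units), for illustration only.
toy_lengths = [40, 30, 20, 5, 3, 2]
toy_n50 = nx(toy_lengths, 50)
toy_n90 = nx(toy_lengths, 90)
print(toy_n50, toy_n90)
```

With these toy values the total span is 100, so the N50 is reached at the second-longest scaffold and the N90 at the third.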
The mitochondrial genome was also assembled and can be found as a contig within the multifasta file of the genome submission.
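The half-coverage criterion mentioned above for the sex chromosomes rests on the fact that a male carries one copy each of X and Y but two copies of every autosome, so reads from X and Y are expected at roughly half the autosomal depth. The sketch below illustrates that reasoning only. The function name `classify_by_coverage`, the tolerance, and the depth values are assumptions made for illustration, not the authors' procedure or data.

```python
from statistics import median


def classify_by_coverage(coverage, tolerance=0.2):
    """Label scaffolds by read depth relative to the median scaffold depth.

    coverage: dict mapping scaffold name -> mean read depth.
    Returns a dict mapping scaffold name -> "full", "half" or "other".
    """
    baseline = median(coverage.values())
    labels = {}
    for name, depth in coverage.items():
        ratio = depth / baseline
        if abs(ratio - 1.0) <= tolerance:
            labels[name] = "full"   # consistent with a diploid autosome
        elif abs(ratio - 0.5) <= tolerance:
            labels[name] = "half"   # consistent with a hemizygous X or Y
        else:
            labels[name] = "other"
    return labels


# Hypothetical mean depths for seven scaffolds (not measured values).
toy_depths = {"1": 50, "2": 49, "3": 51, "4": 48, "5": 52, "X": 25, "Y": 23}
toy_labels = classify_by_coverage(toy_depths)
print(toy_labels)
```

Depth alone cannot distinguish X from Y, which is consistent with the text additionally using synteny to Drosophila innubila to assign the X chromosome.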

Sample acquisition and nucleic acid extraction
The Drosophila phalerata specimens were first-generation male progeny from a wild-collected female. The sequenced flies were reared on laboratory Drosophila medium supplemented with ~2 cm³ of commercial mushroom (Agaricus bisporus) [...] (Bernt et al., 2013) and uses these annotations to select the final mitochondrial contig and to ensure the general quality of the sequence.
Table 3 contains a list of relevant software tool versions and sources.

Genome annotation
The BRAKER2 pipeline (Brůna et al., 2021) was used in the default protein mode to generate annotation for the Drosophila phalerata assembly (GCA_951394115.1) in Ensembl Rapid Release.

Wellcome Sanger Institute – Legal and Governance
The materials that have contributed to this genome note have been supplied by a Darwin Tree of Life Partner. The submission of materials by a Darwin Tree of Life Partner is subject to the 'Darwin Tree of Life Project Sampling Code of Practice', which can be found in full on the Darwin Tree of Life website. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project.
Further, the Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use. The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible. The overarching areas of consideration are:
• Ethical review of provenance and sourcing of the material

This genome resource is excellent from the summary statistics, with high BUSCO numbers and sequence continuity, and the majority of sequences contained on the 5 pseudochromosomes (plus two sex chromosomes and mitochondrion). All in all, this is a valuable contribution by the Darwin Tree of Life.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: I have published with Peter Holland more than three years ago, and confirm that this potential conflict of interest did not affect my ability to write an objective and unbiased review of the article.
Reviewer Expertise: Genomics, evolution, invertebrates
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Ravikumar Dodiya
Sardarkrushinagar Dantiwada Agricultural University, Sardarkrushinagar, Dantiwada, India
The genomic sequencing and annotation of Drosophila phalerata, a fungus-breeding fruit fly species, have been meticulously detailed in the provided report. Leveraging advanced sequencing technologies, the genome was assembled to a high standard, encompassing 223.9 Mb across 525 sequence scaffolds. Notably, chromosome-level scaffolds were identified, with a significant portion of the assembly assigned to seven chromosomal-level structures, including autosomes and sex chromosomes. Annotation efforts revealed a rich genetic landscape, comprising 18,973 protein-coding genes and 19,808 transcribed mRNAs. The process involved meticulous sample acquisition, laboratory rearing, and DNA extraction, followed by thorough sequencing protocols.
To ensure accuracy and completeness, the assembly underwent extensive curation and evaluation, employing bioinformatic tools and high-quality standards. Overall, this comprehensive genomic analysis significantly enhances our understanding of Drosophila phalerata, shedding light on its evolutionary context within the Drosophilidae family and providing a valuable resource for further research in molecular evolution and comparative genomics.
Is the rationale for creating the dataset(s) clearly described?

Manee M Manee
King Abdulaziz City for Science and Technology, Riyadh, Saudi Arabia
Obbard et al. generated a complete genome assembly of the drosophilid fruit fly, Drosophila phalerata, derived from a male specimen. The assembly is 223.9 Mb in size, with most of the genome organized into seven chromosomal pseudomolecules, including the sex chromosomes X and Y, and a fully assembled mitochondrial genome. The annotation process identified 18,973 protein-coding genes. This research enhances our understanding of the genetic structure and evolutionary biology of Drosophila phalerata and provides a valuable resource for further comparative genomic studies within the Drosophilidae family. Here are minor suggestions:
1- I would ask the authors to clarify why the accession number in the text is version 1 (GCA_951394115.1) and in the

Figure 1. A: Male (above) and female (below) Drosophila phalerata presented with a 3 mm scale bar. B and C: Male and female Drosophila phalerata, respectively, photographed on oyster mushrooms (Pleurotus spp.) in the Hermitage of Braid, Edinburgh, Scotland. D: The four lab-reared brothers selected for sequencing. Sample SAMEA12110456 (left) was used for Hi-C sequencing, and sample SAMEA12110457 (second left) was used for PacBio sequencing. E: The sulphur tuft fungus (Hypholoma fasciculare) from which the mother of the sequenced flies was collected (Penns in the Rocks Estate, East Sussex, England; 51.093N, 0.1698E).

Figure 2. Genome assembly of Drosophila phalerata, idDroPhal2.2: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 223,932,721 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (77,813,368 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (61,822,819 and 962,970 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the diptera_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/CATOAP01/dataset/CATOAP01/snail.

Figure 3. Genome assembly of Drosophila phalerata, idDroPhal2.2: BlobToolKit GC-coverage plot. Scaffolds are coloured by phylum. Circles are sized in proportion to scaffold length. Histograms show the distribution of scaffold length sum along each axis. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/CATOAP01/dataset/CATOAP01/blob.

Figure 4. Genome assembly of Drosophila phalerata, idDroPhal2.2: BlobToolKit cumulative sequence plot. The grey line shows cumulative length for all scaffolds. Coloured lines show cumulative lengths of scaffolds assigned to each phylum using the buscogenes taxrule. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/CATOAP01/dataset/CATOAP01/cumulative.

Figure 5. Genome assembly of Drosophila phalerata, idDroPhal2.2: Hi-C contact map of the idDroPhal2.2 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=I0rbqGqnTLC2Q5kb_VqYOg.

Reviewer Report 24 May 2024
https://doi.org/10.21956/wellcomeopenres.22837.r83128
© 2024 Ishengoma E. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Edson Ishengoma
University of Dar es Salaam, Dar es Salaam, Dar es Salaam, Tanzania
This manuscript presents the assembled genome of Drosophila phalerata, a fruit-fly species that feeds on mushrooms. As usual, Drosophilid species have four pairs of chromosomes including the sex chromosomes, X and Y in males. The 7 chromosomal pseudomolecules, including the X and Y pair, that have been scaffolded during this study reflect the typical karyotype of Drosophilids. The ~19K proteins extracted from this genome will obviously advance the study of the ecology, evolution and functional diversity of Drosophila species.
Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Genomics, Bioinformatics, Evolution
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Reviewer Report 17 May 2024
https://doi.org/10.21956/wellcomeopenres.22837.r83125
© 2024 Dodiya R. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Table 2. Chromosomal pseudomolecules in the genome assembly of Drosophila phalerata, idDroPhal2.
INSDC accession | Chromosome | Length (Mb) | GC%
Agreement entered into by the Darwin Tree of Life Partner, Genome Research Limited (operating as the Wellcome Sanger Institute), and in some circumstances other Darwin Tree of Life collaborators.

Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

table version 2 (GCA_951394115.2). It seems they updated to version 2, and they should specify which version they are reporting. Please check the parameters in Table 1 against the numbers reported in https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_951394115.2/
2- I would suggest including in the background, or any appropriate section of the manuscript, why the study of Drosophila phalerata's genome is important. For instance, mention any specific applications of this research in evolutionary biology or other fields.
Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.