The genome of Roselle's flesh fly, Sarcophaga (Helicophagella) rosellei (Böttcher, 1912)

We present a genome assembly from an individual male Sarcophaga rosellei (Roselle's flesh fly; Arthropoda; Insecta; Diptera; Sarcophagidae). The genome sequence is 541 megabases in span. Most of the assembly is scaffolded into six chromosomal pseudomolecules, including the assembled X sex chromosome. The mitochondrial genome has also been assembled and is 19.5 kilobases in length. Gene annotation of this assembly on Ensembl has identified 15,437 protein-coding genes.


Background
Roselle's flesh fly (Sarcophaga rosellei) is a medium-sized (6.5-11 mm) (van Emden, 1954) flesh fly with a Palearctic distribution (Pape, 1996). The species was named by Böttcher in 1912 in honour of Dr du Roselle, who produced the first illustrations of male Sarcophagid genitalia (Böttcher, 1912; Senior-White, 1924). As with other members of the genus, S. rosellei has an overall grey/black colouration, with large red or orange eyes, three longitudinal stripes on the thorax, and a checked abdomen. S. rosellei is found across England and Wales, where it is most common from May to August, but is scarce in Scotland (NBN Atlas, Accession number NBNSYS0000156291).
The genus Sarcophaga contains roughly 890 species divided into around 69 subgenera (Buenaventura et al., 2017), and S. rosellei is placed in the Helicophagella subgenus, along with four other UK Sarcophagid species (S. agnata; S. crassimargo; S. hirticrus; S. melanura). Helicophagella is probably not monophyletic (Buenaventura & Pape, 2017), and is split into two subgroups: the noverca group and the melanura group, roughly along dietary lines, with melanura group members breeding in faeces and noverca group members breeding in snails (Blackith et al., 1997). S. rosellei is a member of the noverca group and has been recorded as preying on snails, a relationship that may explain the association of S. rosellei with calcareous soils (Blackith et al., 1997; Rozkošný & Vanhara, 1993). The S. rosellei genome assembly, together with those of other Sarcophaga species from the Darwin Tree of Life Project and elsewhere, is likely to be of great benefit to resolving the phylogeny of the genus and identifying patterns of dietary shifts.

Genome sequence report
The genome was sequenced from one male Sarcophaga rosellei specimen (Figure 1) collected from Wytham Woods, Oxfordshire (biological vice-county: Berkshire), UK (latitude 51.77, longitude -1.33). A total of 54-fold coverage in Pacific Biosciences single-molecule HiFi long reads and 65-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 19 missing joins or mis-joins and removed three haplotypic duplications, reducing the scaffold number by 7.14% and increasing the scaffold N50 by 2.23%.
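The fold-coverage figures above can be turned into an approximate sequencing yield. The sketch below is illustrative arithmetic only: it assumes coverage was computed against the 541,393,943 bp assembly span (the report does not state which denominator was used), and the function name `implied_yield_bp` is hypothetical.

```python
# Illustrative only: relate fold coverage to total sequenced bases.
# Assumption: coverage = total read bases / assembly span.

ASSEMBLY_SPAN_BP = 541_393_943  # total assembly length reported for idSarRose1.1


def implied_yield_bp(fold_coverage: int, span_bp: int = ASSEMBLY_SPAN_BP) -> int:
    """Return the number of sequenced bases implied by a fold coverage."""
    return fold_coverage * span_bp


hifi_bases = implied_yield_bp(54)   # PacBio HiFi long reads
tenx_bases = implied_yield_bp(65)   # 10X Genomics read clouds

print(f"HiFi: ~{hifi_bases / 1e9:.1f} Gb, 10X: ~{tenx_bases / 1e9:.1f} Gb")
```

Under that assumption, 54-fold HiFi coverage corresponds to roughly 29.2 Gb of read data and 65-fold 10X coverage to roughly 35.2 Gb.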
The final assembly has a total length of 541.4 Mb in 169 sequence scaffolds with a scaffold N50 of 101.2 Mb (Table 1). Most (98.81%) of the assembly sequence was assigned to six chromosomal-level scaffolds, representing five autosomes and the X sex chromosome. Chromosome-scale scaffolds are named by synteny based on the genome assembly of Sarcophaga caerulescens GCA_927399465.1 (Figure 2-Figure 5; Table 2). The assembly has a BUSCO v5.3.2 (Manni et al., 2021) completeness of 99.0% (single 98.3%, duplicated 0.7%) using the OrthoDB v10 Diptera reference set. While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.
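For readers unfamiliar with the contiguity metrics quoted above, the following is a minimal sketch of how a scaffold N50 (or N90) can be computed from a list of scaffold lengths, together with a back-calculation of the pre-curation scaffold count from the reported 7.14% reduction. The function names `nxx` and `pre_curation_count` and the toy lengths are hypothetical; this is not the pipeline used by the authors.

```python
# Minimal sketch of the Nxx contiguity statistic and the curation arithmetic.
# The toy scaffold lengths are invented for illustration only.


def nxx(lengths, fraction=0.5):
    """Return the length L such that scaffolds of length >= L together
    cover at least `fraction` of the total span (N50 for 0.5, N90 for 0.9)."""
    if not lengths:
        raise ValueError("no scaffold lengths given")
    target = fraction * sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return min(lengths)  # only reached if fraction > 1


def pre_curation_count(final_count, percent_reduction):
    """Estimate the scaffold count before curation from the final count
    and the reported percentage reduction."""
    return round(final_count / (1 - percent_reduction / 100))


toy = [50, 40, 30, 20, 10]
print(nxx(toy), nxx(toy, 0.9))          # N50 and N90 of the toy assembly
print(pre_curation_count(169, 7.14))    # scaffolds implied before curation
```

With the reported values, 169 final scaffolds and a 7.14% reduction imply about 182 scaffolds before curation, that is, a net removal of 13 scaffolds.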

Sample acquisition and nucleic acid extraction
A male Sarcophaga rosellei specimen (idSarRose1) was collected using a net in Wytham Woods, Oxfordshire (biological vice-county: Berkshire), UK (latitude 51.77, longitude -1.33) on 4 August 2020. The specimen was collected and identified by Steven Falk (independent researcher). The specimen was snap-frozen on dry ice.
DNA was extracted at the Tree of Life laboratory, Wellcome Sanger Institute (WSI). The idSarRose1 sample was weighed and dissected on dry ice with tissue set aside for Hi-C sequencing. Thorax tissue was disrupted using a Nippi Powermasher fitted with a BioMasher pestle. High molecular weight (HMW) DNA was extracted using the Qiagen MagAttract HMW DNA extraction kit. Low molecular weight DNA was removed from a 20-ng aliquot of extracted DNA using 0.8X AMPure XP purification kit prior to 10X Chromium sequencing; a minimum of 50 ng DNA was submitted for 10X sequencing. HMW DNA was sheared into an average fragment size of 12-20 kb in a Megaruptor 3 system with speed setting 30. Sheared DNA was purified by solid-phase reversible immobilisation.
RNA was extracted from abdomen tissue of idSarRose1 in the Tree of Life Laboratory at the WSI using TRIzol, according to the manufacturer's instructions. RNA was eluted in 50 μL RNAse-free water and its concentration was assessed using a Nanodrop spectrophotometer and Qubit Fluorometer using the Qubit RNA Broad-Range (BR) Assay kit. RNA integrity was analysed using the Agilent RNA 6000 Pico Kit and Eukaryotic Total RNA assay.

Sequencing
Pacific Biosciences HiFi circular consensus and 10X Genomics read cloud DNA sequencing libraries were constructed according to the manufacturers' instructions. Poly(A) RNA-Seq libraries were constructed using the NEB Ultra II RNA Library Prep kit. DNA and RNA sequencing was performed by the Scientific Operations core at the WSI on Pacific Biosciences SEQUEL II (HiFi) and Illumina NovaSeq 6000 (RNA-Seq and 10X) instruments. Hi-C data were also generated from head tissue of idSarRose1 using the Arima v2 kit and sequenced on the Illumina NovaSeq 6000 instrument.
Table 3 contains a list of all software tool versions used, where appropriate.

Genome annotation
The Ensembl gene annotation system (Aken et al., 2016) was used to generate annotation for the S. rosellei assembly GCA_930367235.1. Annotation was created primarily through alignment of transcriptomic data to the genome, with gap filling via protein-to-genome alignments of a select set of proteins from UniProt (UniProt Consortium, 2019).

Ethics/compliance issues
The materials that have contributed to this genome note have been supplied by a Darwin Tree of Life Partner. The submission of materials by a Darwin Tree of Life Partner is subject to the Darwin Tree of Life Project Sampling Code of Practice. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project. Each transfer of samples is further undertaken according to a Research Collaboration Agreement or Material Transfer Agreement entered into by the Darwin Tree of Life Partner, Genome Research Limited (operating as the Wellcome Sanger Institute), and in some circumstances other Darwin Tree of Life collaborators.

Data availability
The genome sequence is released openly for reuse. The Sarcophaga rosellei genome sequencing initiative is part of the Darwin Tree of Life (DToL) project. All raw sequence data and the assembly have been deposited in INSDC databases. Raw data and assembly accession identifiers are reported in Table 1.

Figure 2. Genome assembly of Sarcophaga rosellei, idSarRose1.1: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 541,393,943 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (121,533,712 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (101,196,005 and 86,778,848 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the diptera_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/idSarRose1.1/dataset/CAKNFA01/snail.

Figure 5. Genome assembly of Sarcophaga rosellei, idSarRose1.1: Hi-C contact map. Hi-C contact map of the idSarRose1.1 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=MIvbf0y8SPi6oXdSK5TiJA.

Table 3. Software tools and versions used.

Open Peer Review
Current Peer Review Status: Version 1
This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Fan Song, Department of Entomology and MOA Key Lab of Pest Monitoring and Green Management, College of Plant Protection, China Agricultural University, Beijing, China
This article presented a genome assembly of Roselle's flesh fly Sarcophaga rosellei from an individual male adult. This manuscript is generally well-written and described this genome clearly. Two minor adjustments should be addressed.
○ It is suggested that the horizontal and vertical coordinates of Figure 5 could have a label showing the size of the genome and indicating which is the X chromosome.
○ It is recommended to add annotation of repeat sequences of the genome.
Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Insect Genome; Phylogenomics
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Taxonomy, phylogeny, systematics and evolution of calyptrate flies, including morphology, anatomy and ecology.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.