The genome sequence of the yellow-legged black legionnaire, Beris chalybata (Forster, 1771)

We present a genome assembly from an individual male Beris chalybata (the yellow-legged black legionnaire; Arthropoda; Insecta; Diptera; Stratiomyidae). The genome sequence is 541.9 megabases in span. Most of the assembly is scaffolded into 6 chromosomal pseudomolecules, including the X and Y sex chromosomes. The mitochondrial genome has also been assembled and is 16.8 kilobases in length. Gene annotation of this assembly on Ensembl identified 17,511 protein coding genes.


Amendments from Version 1
We have added new information about sequencing of an RNA specimen within this BioProject; this has been added to the methods section under specimen collection, RNA extraction and sequencing. We have corrected the tissue used for Hi-C sequencing. An expanded section has been added to Table 1 to give the details of sequencing for the DNA PacBio, Illumina Hi-C and RNA sequencing.
Any further responses from the reviewers can be found at the end of the article.
The Beris chalybata reference genome was sequenced as part of the Darwin Tree of Life Project, a collaborative effort to sequence all named eukaryotic species in the Atlantic Archipelago of Britain and Ireland. This article presents a chromosomally complete genome sequence for B. chalybata based on one male specimen from Wytham Woods, Oxfordshire, UK. The high-quality data will provide a genomic foundation for analysing the biodiversity, molecular adaptations and evolutionary history of this species.

Genome sequence report
The genome was sequenced from one male Beris chalybata (Figure 1) collected from Wytham Woods, Oxfordshire, UK (51.78, -1.34). A total of 38-fold coverage in Pacific Biosciences single-molecule HiFi long reads was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data, which produced 108.85 Gbp from 720.83 million reads, yielding an approximate coverage of 201-fold. Manual assembly curation corrected 30 missing joins or mis-joins, reducing the scaffold number by 58.62% and increasing the scaffold N50 by 68.03%.
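The Hi-C coverage quoted above can be checked with simple arithmetic: fold coverage is the total sequenced bases divided by the assembly span. The short Python sketch below is illustrative only and is not part of the authors' pipeline. It assumes the exact assembly span of 541,871,008 bp given in the Figure 2 legend, and it assumes that the 720.83 million "reads" are counted as individual reads rather than read pairs, which the text does not state.

```python
# Illustrative check of the Hi-C coverage figures reported in the text.
# Assumption: assembly span taken from the Figure 2 legend (541,871,008 bp).
# Assumption: "720.83 million reads" counts individual reads, not pairs.

def fold_coverage(total_bases: float, genome_span: float) -> float:
    """Return sequencing depth as total bases divided by genome span."""
    if genome_span <= 0:
        raise ValueError("genome_span must be positive")
    return total_bases / genome_span

HIC_BASES = 108.85e9          # 108.85 Gbp of Hi-C data
HIC_READS = 720.83e6          # 720.83 million reads
ASSEMBLY_SPAN = 541_871_008   # bp

hic_coverage = fold_coverage(HIC_BASES, ASSEMBLY_SPAN)
mean_bases_per_read = HIC_BASES / HIC_READS

print(f"Hi-C coverage: {hic_coverage:.1f}-fold")
print(f"Mean bases per read: {mean_bases_per_read:.1f}")
```

Rounded to the nearest whole number, the computed depth agrees with the approximately 201-fold coverage reported in the text.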
The final assembly has a total length of 541.9 Mb in 11 sequence scaffolds with a scaffold N50 of 142.4 Mb (Table 1). The snail plot in Figure 2 provides a summary of the assembly statistics, while the distribution of assembly scaffolds on GC proportion and coverage is shown in Figure 3. The cumulative assembly plot in Figure 4 shows curves for subsets of scaffolds assigned to different phyla. Most (99.98%) of the assembly sequence was assigned to 6 chromosomal-level scaffolds, representing 4 autosomes and the X and Y sex chromosomes. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 5; Table 2). The region of the X chromosome from ~1.5-5 Mbp is of uncertain order and orientation. While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited. The mitochondrial genome was also assembled and can be found as a contig within the multifasta file of the genome submission.
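For readers unfamiliar with the scaffold N50 statistic used above, the following Python sketch shows one common way to compute it: sort scaffolds from longest to shortest and report the length of the scaffold at which the running total first reaches half of the assembly span. The function name `nxx` and the toy scaffold lengths are hypothetical and chosen only for illustration; they are not the scaffold lengths of this assembly, and this is not the code used to produce the reported statistics.

```python
# Minimal sketch of the N50 / N90 calculation on hypothetical data.
# The toy lengths below are NOT taken from the Beris chalybata assembly.

from typing import Iterable

def nxx(lengths: Iterable[int], fraction: float = 0.5) -> int:
    """Return the Nxx statistic: the length of the scaffold at which the
    cumulative length, counted from the longest scaffold downwards,
    first reaches `fraction` of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one scaffold length is required")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the interval (0, 1]")
    threshold = fraction * sum(ordered)
    running_total = 0
    for length in ordered:
        running_total += length
        if running_total >= threshold:
            return length
    return ordered[-1]  # not reached in practice; kept for completeness

toy_scaffolds = [100, 80, 60, 40, 20]   # hypothetical lengths, total 300
toy_n50 = nxx(toy_scaffolds, 0.5)       # threshold 150, reached at 100 + 80
toy_n90 = nxx(toy_scaffolds, 0.9)       # threshold 270, reached at 280

print(f"N50 = {toy_n50}, N90 = {toy_n90}")
```

The same function with `fraction=0.9` gives the N90 value, the second statistic shown in the snail plot.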

Sample acquisition and nucleic acid extraction
Two Beris chalybata specimens, used for DNA (specimen ID Ox002131, ToLID idBerChal2) and RNA (specimen ID Ox002132, ToLID idBerChal3) sequencing, were netted in Wytham Woods, Oxfordshire (biological vice-county Berkshire), UK (latitude 51.78, longitude -1.34) on 2022-04-28. The specimen used for Hi-C sequencing (specimen ID Ox001354, ToLID idBerChal1) was netted in Dry Sandford Pit, Oxfordshire, UK (latitude 51.7, longitude -1.32) on 2021-05-10. The specimens were collected and identified by Liam Crowley (University of Oxford) and preserved on dry ice.
RNA was extracted from whole organism tissue of idBerChal3 in the Tree of Life Laboratory at the WSI using the RNA Extraction: Automated MagMax™ mirVana protocol (do Amaral et al., 2023). The RNA concentration was assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer using the Qubit RNA Broad-Range Assay kit. The integrity of the RNA was analysed using the Agilent RNA 6000 Pico Kit and Eukaryotic Total RNA assay.
[...] et al., 2020) or MITOS (Bernt et al., 2013) and uses these annotations to select the final mitochondrial contig and to ensure the general quality of the sequence.
Table 3 contains a list of relevant software tool versions and sources.

Genome annotation
The BRAKER2 pipeline (Brůna et al., 2021) was used in the default protein mode to generate annotation for the Beris chalybata assembly (GCA_949128065.1) in Ensembl Rapid Release at the EBI.

Wellcome Sanger Institute – Legal and Governance
The materials that have contributed to this genome note have been supplied by a Darwin Tree of Life Partner. The submission of materials by a Darwin Tree of Life Partner is subject to the 'Darwin Tree of Life Project Sampling Code of Practice', which can be found in full on the Darwin Tree of Life website here. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project.
Further, the Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use. The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible. The overarching areas of consideration are:
• Ethical review of provenance and sourcing of the material

Frederick Jaya
Sydney Informatics Hub, The University of Sydney (Ringgold ID: 4334), Sydney, New South Wales, Australia
This paper presents a high-quality genome assembly of Beris chalybata. The paper is well contextualised, outlining the defining characteristics of the species, previous sequencing of closely related taxa, and how this paper fits within the broader scope of the ToL project.
The authors generated high-coverage PacBio long reads and Hi-C sequencing to produce a complete, high-quality assembly following best practices.
I appreciate the inclusion of the manual curation outcomes, and the note on uncertainty regarding the X chromosome's orientation; this information will be particularly useful for researchers working with this resource.
While high-coverage RNA-seq data was generated, it is unclear how this data was used. The authors should explicitly describe the role of RNA-seq in the genome annotation.
Overall, I have no major concerns with the paper and commend the availability of versioned pipelines, software, and metadata for ensuring reproducibility.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format?
Is the rationale for creating the dataset(s) clearly described?

Levente Laczkó
University of Debrecen, Debrecen, Hungary
The authors present the high-quality genome assembly of Beris chalybata, in which multiple sequencing approaches were used to reconstruct the complete genome of the species. The background is well described. I agree that genomic resources such as that presented in this article support the in-depth study of species. The sequencing and data analysis methods are appropriate. The primary assembly has high contiguity and the assembly was finalized using Hi-C contact maps, also identifying potential contaminants. The final assembly shows high completeness.
The Sequencing part has a potential error. I assume that "$HIC_TISSUE" refers to the tissue type used for the experiments, but perhaps it was not interpreted during file conversion(?).
The methodology is detailed and reproducible, except for the genome annotation. I suggest detailing the gene prediction steps as they could have a large impact on the results. Please also indicate whether soft-masking was used and describe the reference dataset for evidence-based prediction (if it was used). I also suggest using BUSCO in proteome mode and/or OMArk to assess the completeness of the genome annotation. The metadata of the samples are very well described, which facilitates the reusability of the data.
Apart from what I have described above, I have no concerns about this article.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: genomics, bioinformatics
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Figure 2. Genome assembly of Beris chalybata, idBerChal2.1: metrics. The BlobToolKit snail plot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 541,871,008 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (148,233,473 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (142,415,298 and 114,440,707 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the diptera_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/idBerChal2_1/dataset/idBerChal2_1/snail.

Figure 3. Genome assembly of Beris chalybata, idBerChal2.1: BlobToolKit GC-coverage plot. Sequences are coloured by phylum. Circles are sized in proportion to sequence length. Histograms show the distribution of sequence length sum along each axis. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/idBerChal2_1/dataset/idBerChal2_1/blob.

Figure 4. Genome assembly of Beris chalybata, idBerChal2.1: BlobToolKit cumulative sequence plot. The grey line shows cumulative length for all sequences. Coloured lines show cumulative lengths of sequences assigned to each phylum using the buscogenes taxrule. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/idBerChal2_1/dataset/idBerChal2_1/cumulative.

Figure 5. Genome assembly of Beris chalybata, idBerChal2.1: Hi-C contact map of the idBerChal2.1 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=L6wMorrpTcCCGJJUm_wMKg.

Table 3. Software tools: versions and sources.
Column headings: Software tool; Version
Table 2. Chromosomal pseudomolecules in the genome assembly of Beris chalybata, idBerChal2.
Column headings: INSDC accession; Chromosome; Length (Mb); GC%
Protocols developed by the WSI Tree of Life laboratory are publicly available on protocols.io (Denton et al., 2023b).

Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
Version 1
This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.