The genome sequence of a soldier beetle, Cantharis flavilabris Fallén, 1807

We present a genome assembly from an individual female Cantharis flavilabris (soldier beetle; Arthropoda; Insecta; Coleoptera; Cantharidae). The genome sequence is 348.3 megabases in span. Most of the assembly is scaffolded into 7 chromosomal pseudomolecules, including the X sex chromosome. The mitochondrial genome has also been assembled and is 17.5 kilobases in length. Gene annotation of this assembly on Ensembl identified 22,711 protein-coding genes.


Background
The genome of the soldier beetle, Cantharis flavilabris, was sequenced as part of the Darwin Tree of Life Project, a collaborative effort to sequence all named eukaryotic species in the Atlantic Archipelago of Britain and Ireland. Here we present a chromosomally complete genome sequence for Cantharis flavilabris, based on one female specimen from Leith Hill, England, UK.

Genome sequence report
The genome was sequenced from one female Cantharis flavilabris (Figure 1) collected from Leith Hill, England, UK (51.18, -0.37). A total of 74-fold coverage in Pacific Biosciences single-molecule HiFi long reads was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 16 missing joins or mis-joins and removed 5 haplotypic duplications, reducing the assembly length by 0.60% and the scaffold number by 6.90%, and increasing the scaffold N50 by 11.36%.
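As a rough illustration of what "74-fold coverage" implies, the sketch below relates mean sequencing depth to total read bases and genome span. It assumes coverage is defined as total read bases divided by the final assembly span (348,304,867 bp, the figure given in the Figure 2 caption); the actual HiFi read yield is not stated in this note, and the reported coverage may have been computed against a different size estimate. The function names are hypothetical.

```python
def fold_coverage(total_read_bases: int, genome_span: int) -> float:
    """Mean sequencing depth: total sequenced bases divided by genome span."""
    if genome_span <= 0:
        raise ValueError("genome_span must be positive")
    return total_read_bases / genome_span


def bases_for_coverage(target_fold: int, genome_span: int) -> int:
    """Total read bases needed to reach a target mean depth."""
    return target_fold * genome_span


# Assumption: the final assembly span stands in for the genome size.
ASSEMBLY_SPAN_BP = 348_304_867

approx_yield = bases_for_coverage(74, ASSEMBLY_SPAN_BP)
print(f"Approximate HiFi yield implied by 74x: {approx_yield / 1e9:.1f} Gb")
```

Under these assumptions, 74-fold coverage corresponds to roughly 26 Gb of HiFi read data.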
The final assembly has a total length of 348.3 Mb in 26 sequence scaffolds with a scaffold N50 of 47.5 Mb (Table 1). The snailplot in Figure 2 provides a summary of the assembly statistics, while the distribution of assembly scaffolds on GC proportion and coverage is shown in Figure 3. The cumulative assembly plot in Figure 4 shows curves for subsets of scaffolds assigned to different phyla. Most (99.8%) of the assembly sequence was assigned to 7 chromosomal-level scaffolds, representing 6 autosomes and the X sex chromosome. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 5; Table 2). Chromosome X was assigned by synteny to Cantharis rufa (GCA_947369205.1) (Sivell et al., 2023) and Cantharis rustica (GCA_911387805.1) (Sivell et al., 2021). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited. The mitochondrial genome was also assembled and can be found as a contig within the multifasta file of the genome submission.
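For readers unfamiliar with the scaffold N50 and N90 statistics quoted above, the following minimal sketch shows the conventional definition: the length of the scaffold at which the running total of size-sorted scaffolds first reaches 50% (or 90%) of the assembly span. The scaffold lengths used here are toy values chosen for illustration, not the icCanFlav1.1 scaffolds, and the function name `nx` is hypothetical; this is not the code used to produce the reported metrics.

```python
from typing import Sequence


def nx(lengths: Sequence[int], fraction: float = 0.5) -> int:
    """Return the Nx scaffold length (N50 when fraction=0.5, N90 when 0.9).

    Scaffolds are sorted from longest to shortest; the result is the length
    of the scaffold at which the cumulative sum first reaches the requested
    fraction of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    threshold = sum(lengths) * fraction
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length
    return min(lengths)  # only reachable through floating-point rounding


# Toy scaffold lengths (hypothetical), total span 300
toy_scaffolds = [100, 80, 60, 40, 20]
print("N50:", nx(toy_scaffolds, 0.5))
print("N90:", nx(toy_scaffolds, 0.9))
```

Applied to the 26 scaffold lengths of the deposited assembly, the same calculation would be expected to reproduce the N50 and N90 values given in the Figure 2 caption.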

Sample acquisition and nucleic acid extraction
A female Cantharis flavilabris (specimen ID NHMUK014439758, ToLID icCanFlav1) was collected from Leith Hill, England, UK (latitude 51.18, longitude -0.37) on 2021-06-20. The specimen was collected and identified by Maxwell Barclay. HMW DNA was extracted using the Automated MagAttract v1 protocol (Oatley et al., 2023a). HMW DNA was sheared into an average fragment size of 12-20 kb in a Megaruptor 3 system with speed setting 30 (Todorovic et al., 2023). Sheared DNA was purified by solid-phase reversible immobilisation (Oatley et al., 2023b): in brief, the method employs a 1.8X ratio of AMPure PB beads to sample to eliminate shorter fragments and concentrate the DNA. The concentration of the sheared

Genome annotation
The BRAKER2 pipeline (Brůna et al., 2021) was used in the default protein mode to generate annotation for the Cantharis flavilabris assembly (GCA_949152465.1) in Ensembl Rapid Release.

Wellcome Sanger Institute - Legal and Governance
The materials that have contributed to this genome Further, the Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use.
The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible. The overarching areas of consideration are:
• Ethical review of provenance and sourcing of the material

Barclay and colleagues present a genome assembly for the soldier beetle Cantharis flavilabris.
Great care was taken in identifying an individual of the target species and producing this assembly.
The authors used cutting-edge data and pipelines to produce an excellent chromosomal-level assembly. This resource will be of great use for genomic work in this species.
I therefore recommend this article.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Evolutionary genomics
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

It is disappointing to see no background included. This does not provide any detail on this species or the biological context of this assembly.
The methods are cutting edge, in line with other DToL genomes. The Hi-C protocol is not repeatable, and as such I cannot recommend this manuscript.
Is the rationale for creating the dataset(s) clearly described?

Yangzi Wang
University of Mainz, Mainz, Germany
The manuscript titled "The genome sequence of a soldier beetle, Cantharis flavilabris Fallén, 1807" presents a high-quality genome assembly for the soldier beetle, Cantharis flavilabris. The authors have employed a combination of robust methodological frameworks that include third-generation long-read sequencing technology, Hi-C, rigorous data curation, and reliably automated gene annotation processes. Below is a detailed evaluation of the manuscript.
The project demonstrates a high level of professionalism. The genome data has been deposited in NCBI and EMBL, ensuring transparency and easy accessibility. The accessibility of the genome assembly and the raw data has been verified, confirming that it matches the descriptions provided in the manuscript. The authors have utilized a combination of PacBio HiFi long reads and Hi-C data to achieve a high-quality assembly. Manual curation further improved the assembly by correcting missing joins and removing haplotypic duplications. This approach has resulted in a polished genome sequence. The evaluation shows the genome assembly presented is of high quality, with several key measures highlighting its completeness and accuracy: 348.3 Mb genome span; 47.5 Mb scaffold N50; chromosome-level assembly; 98.7% BUSCO completeness. These data suggest that the assembly is complete and highly contiguous, providing a reliable reference genome for further studies. In addition to the nuclear genome, the mitochondrial genome was also assembled, measuring 17.5 kb in length. This presents another valuable genetic dataset for the species. The Ensembl rapid annotation pipeline was employed to annotate the genome, resulting in the identification of 22,711 protein-coding genes. This automated process ensures consistency and accuracy in gene prediction, which paves the way for further functional genomics studies. Furthermore, the manuscript includes detailed figures illustrating the genome assembly statistics, scaffold distribution, and other key aspects.
There is a minor error in the manuscript: the raw data identifiers ERR10809391 and ERR10802468 appear to be from NCBI SRA, not INSDC, as mentioned. I only found the data using these accession numbers on NCBI. Please correct this in the manuscript. Additionally, I recommend including both accession numbers in the Data Availability section to facilitate quick data access for readers.
In conclusion, this manuscript represents a valuable genomic dataset of the soldier beetle for the research community. The authors have produced a high-quality genome assembly for the soldier beetle using cutting-edge sequencing technologies and stringent data curation. The availability of this genome assembly in public databases, along with the comprehensive methodological details provided, ensures that this work will be a valuable resource for future research. The authors are to be commended for their professionalism and the high standards of their work.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Genomics, population genomics, epigenomics
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Figure 2. Genome assembly of Cantharis flavilabris, icCanFlav1.1: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 348,304,867 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (106,117,081 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (47,548,398 and 33,898,588 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the endopterygota_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/CASCJT01/dataset/CASCJT01/snail.

Figure 3. Genome assembly of Cantharis flavilabris, icCanFlav1.1: BlobToolKit GC-coverage plot. Scaffolds are coloured by phylum. Circles are sized in proportion to scaffold length. Histograms show the distribution of scaffold length sum along each axis. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/CASCJT01/dataset/CASCJT01/blob.

Figure 4. Genome assembly of Cantharis flavilabris, icCanFlav1.1: BlobToolKit cumulative sequence plot. The grey line shows cumulative length for all scaffolds. Coloured lines show cumulative lengths of scaffolds assigned to each phylum using the buscogenes taxrule. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/CASCJT01/dataset/CASCJT01/cumulative.

Figure 5. Genome assembly of Cantharis flavilabris, icCanFlav1.1: Hi-C contact map of the icCanFlav1.1 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=Kkxirt3sTYK9ripEyx-0Yw.
note have been supplied by a Darwin Tree of Life Partner. The submission of materials by a Darwin Tree of Life Partner is subject to the 'Darwin Tree of Life Project Sampling Code of Practice', which can be found in full on the Darwin Tree of Life website. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project.

Reviewer Report 06 August 2024
https://doi.org/10.21956/wellcomeopenres.24701.r87914
© 2024 Nash W. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Will Nash
Research Faculty, Earlham Institute, Norwich Research Park, Norwich, England, UK
The genome of a soldier beetle is presented. It is sequenced from 74x of PacBio HiFi and scaffolded with Hi-C. The assembly is 348.3 Mb over 26 scaffolds with an N50 of 47.5 Mb. 23,081 genes are annotated.

Table 3 contains a list of relevant software tool versions and sources.

Table 3. Software tools: versions and sources.
Software tool Version

Romain Villoutreix
Universite de Montpellier, Montpellier, Occitanie, France

Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? No
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.