The genome sequence of the Kite-tailed Robberfly, Machimus atricapillus (Fallén, 1814)

We present a genome assembly from an individual male Machimus atricapillus (the Kite-tailed Robberfly; Arthropoda; Insecta; Diptera; Asilidae). The genome sequence is 268.6 megabases in span. Most of the assembly is scaffolded into six chromosomal pseudomolecules, including the X and Y sex chromosomes. The mitochondrial genome has also been assembled and is 16.3 kilobases in length. Gene annotation of this assembly on Ensembl identified 10,978 protein-coding genes.


Background
Machimus atricapillus (Fallén, 1814) is a medium-sized, grey-brown robberfly (Asilidae) with a body length of 11 to 18 mm (Stubbs & Drake, 2014; van den Broek & Schulten, 2017; Wolff et al., 2018). It is one of a group of several similar Machimus (Loew, 1849) species. Males can be easily separated from all other UK Asilidae by the presence of a projection on the hind edge of sternite eight, which is clothed with black hairs and normally indented to form a shape similar to the tail of a kite (Milvus Lacépède, 1799) (Stubbs & Drake, 2014), giving it the common name of Kite-tailed Robberfly. Females are more difficult to identify but can be distinguished from similar UK Machimus species based on a combination of features including leg colour and the hairing of the sternites (Smart, 2005).
Larvae of other Machimus species are known to be soil dwelling and feed on the larvae of beetles (Coleoptera) in the families Scarabaeidae, Chrysomelidae, and Curculionidae (Musso, 1983). Adults are predaceous on a range of insects, with most prey consisting of other Diptera (Parmenter, 1942). In the UK the adult flight period spans from May to October with a peak in July and August (Stubbs & Drake, 2014). Adults are often found sunning themselves on vantage points such as fence posts, tree trunks or foliage, or sitting on bare ground (Stubbs & Drake, 2014).
Machimus atricapillus is widely distributed across southern Britain in open habitats with dry soils but becomes rarer in the north and has not been recorded from Ireland or most of Scotland (Chandler, 2022; Harvey, 2018; Stubbs & Drake, 2014). M. atricapillus is widespread in Europe, though absent from the far north (Lehr, 1988), and is found widely through Russia as far east as Sakhalin Island (Astakhov et al., 2019) and in Iran (Ghahari et al., 2014).
The high-quality genome sequence described here is the first one reported for M. atricapillus, to our knowledge, and has been generated as part of the Darwin Tree of Life project. It will aid future research on the taxonomy of the genus Machimus and the phylogeny of the wider Asilidae as well as contributing to our understanding of the biology, physiology and ecology of M. atricapillus.

Genome sequence report
The genome was sequenced from one male Machimus atricapillus specimen (Figure 1) collected from Hartslock Reserve, Oxfordshire (latitude 51.511263, longitude -1.112222). A total of 47-fold coverage in Pacific Biosciences single-molecule HiFi long reads was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 17 missing joins or mis-joins, reducing the scaffold number by 9.68% and increasing the scaffold N50 by 5.34%.
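As a rough consistency check on the curation figures above, the reported percentage changes can be inverted to estimate the pre-curation values. The sketch below assumes both percentages are expressed relative to the pre-curation assembly. The pre-curation scaffold count and N50 it derives are estimates, not values reported in this note, and the function names are illustrative only.

```python
# Sketch: invert the reported percentage changes to estimate pre-curation values.
# Assumption (not stated in the note): percentages are relative to the
# pre-curation assembly.

FINAL_SCAFFOLDS = 28            # final scaffold count reported in the note
FINAL_N50_BP = 54_569_305       # scaffold N50 from the Figure 2 caption
SCAFFOLD_REDUCTION_PCT = 9.68   # reported reduction in scaffold number
N50_INCREASE_PCT = 5.34         # reported increase in scaffold N50


def value_before_decrease(after: float, pct_decrease: float) -> float:
    """Return the starting value given the value after a percentage decrease."""
    return after / (1.0 - pct_decrease / 100.0)


def value_before_increase(after: float, pct_increase: float) -> float:
    """Return the starting value given the value after a percentage increase."""
    return after / (1.0 + pct_increase / 100.0)


scaffolds_before = value_before_decrease(FINAL_SCAFFOLDS, SCAFFOLD_REDUCTION_PCT)
n50_before_bp = value_before_increase(FINAL_N50_BP, N50_INCREASE_PCT)

print(f"Estimated scaffolds before curation: {scaffolds_before:.2f}")
print(f"Estimated N50 before curation: {n50_before_bp / 1e6:.2f} Mb")
```

Under this assumption the estimate rounds to 31 scaffolds before curation (a net reduction of 3, i.e. 3/31, or about 9.68%) and an N50 of roughly 51.8 Mb.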
The final assembly has a total length of 268.6 Mb in 28 sequence scaffolds with a scaffold N50 of 54.6 Mb (Table 1). Most (99.99%) of the assembly sequence was assigned to six chromosomal-level scaffolds, representing four autosomes and the X and Y sex chromosomes. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 2-Figure 5; Table 2). The assembly has a BUSCO v5.3.2 (Manni et al., 2021) completeness of 96.0% (single 94.7%, duplicated 1.4%) using the diptera_odb10 reference set. While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.
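The BUSCO completeness quoted above is given in Table 1 in the compact BUSCO summary notation. The following is a minimal sketch of how such a summary string could be parsed and sanity-checked. The regular expression and the function name `parse_busco_summary` are illustrative assumptions and are not part of BUSCO or BlobToolKit.

```python
import re

# Illustrative pattern for the compact BUSCO summary notation used in Table 1.
BUSCO_PATTERN = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>[\d,]+)"
)


def parse_busco_summary(summary: str) -> dict:
    """Parse a BUSCO one-line summary into percentages and the orthologue count."""
    match = BUSCO_PATTERN.search(summary.replace(" ", ""))
    if match is None:
        raise ValueError(f"Not a recognisable BUSCO summary: {summary!r}")
    fields = {key: float(match.group(key)) for key in "CSDFM"}
    # The orthologue count may contain a thousands separator, e.g. "3,285".
    fields["n"] = int(match.group("n").replace(",", ""))
    return fields


# Summary string as quoted in Table 1 of this note.
busco = parse_busco_summary("C:96.0%[S:94.7%,D:1.4%],F:1.2%,M:2.7%,n:3,285")

print(busco)
print("S + D - C     =", round(busco["S"] + busco["D"] - busco["C"], 1))
print("C + F + M     =", round(busco["C"] + busco["F"] + busco["M"], 1))
```

With the values reported here, S + D comes to 96.1% against a reported C of 96.0%, and C + F + M comes to 99.9%. Differences of this size are consistent with each percentage having been rounded to one decimal place.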

Sample acquisition and nucleic acid extraction
A male Machimus atricapillus specimen (idMacAtri3) was collected from Hartslock Reserve, Oxfordshire (latitude 51.511263, longitude -1.112222) on 20 August 2020, using an aerial net. RNA was extracted from head and thorax tissue of idMacAtri1 and idMacAtri4 in the Tree of Life Laboratory at the WSI using TRIzol, according to the manufacturer's instructions. RNA was then eluted in 50 μl RNAse-free water and its concentration assessed using a Nanodrop spectrophotometer and Qubit Fluorometer using the Qubit RNA Broad-Range (BR) Assay kit. Analysis of the integrity of the RNA was done using the Agilent RNA 6000 Pico Kit and Eukaryotic Total RNA assay.

Sequencing
Pacific Biosciences HiFi circular consensus and 10X Genomics read cloud DNA sequencing libraries were constructed according to the manufacturers' instructions. Poly(A) RNA-Seq libraries were constructed using the NEB Ultra II RNA Library Prep kit. DNA and RNA sequencing were performed by the Scientific Operations core at the WSI on Pacific Biosciences SEQUEL II (HiFi) and Illumina NovaSeq 6000 (RNA-Seq) instruments. Hi-C data were also generated from head and thorax tissue of idMacAtri3 using the Arima v2 kit and sequenced on the Illumina NovaSeq 6000 instrument.
[...] et al., 2021). Manual curation was performed using HiGlass (Kerpedjiev et al., 2018) and Pretext (Harry, 2022). The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva et al., 2022), which performed annotation using MitoFinder (Allio et al., 2020). The genome was analysed and BUSCO scores were generated within the BlobToolKit environment (Challis et al., 2020). Table 3 contains a list of all software tool versions used, where appropriate.

Genome annotation
The Ensembl gene annotation system (Aken et al., 2016) was used to generate annotation for the M. atricapillus assembly (GCA_933228815.1). Annotation was created primarily through alignment of transcriptomic data to the genome, with gap filling via protein-to-genome alignments of a select set of proteins from UniProt (UniProt Consortium, 2019).

Ethics and compliance issues
The materials that have contributed to this genome
This is a high-quality chromosome-level assembly of a new species, the Kite-tailed Robberfly, Machimus atricapillus, that will benefit the whole community. More details may be provided on the scaffolding (e.g. the scaffolding is lacking details on the chromosome boundaries, and the middle part is poorly connected, so it is hard to assess the quality of the scaffolding of these areas. To which chromosome does it belong?). You could also elaborate on how X and Y were identified.
Providing the BUSCO score for the gene annotation would be great. More details on the gene annotation pipeline would be useful. Moreover, repeated elements were certainly identified, and some quantitative numbers could be provided.
1. Is the rationale for creating the dataset(s) clearly described? Yes. It will be useful for ecology, evolution, and taxonomic questions.
2. Are the protocols appropriate and is the work technically sound? Yes.
3. Are sufficient details of methods and materials provided to allow replication by others? Yes. However, more details would be welcome regarding whether any custom parameters were used or only default parameters were used for all software (HiFiasm, YaHs, Long-Ranger, Freebayes, MitoHiFi, etc). Is the number of chromosomes identified in agreement with expectations from this genus?
4. Are the datasets clearly presented in a usable and accessible format? Yes.

Kuppusamy Sivasankaran
Division of Taxonomy and Biodiversity, Entomology Research Institute, Loyola College, Chennai, Tamil Nadu, India
I appreciate the authors for assembling the Machimus atricapillus (Fallén, 1814) whole genome sequence. The authors have used the proper software for assembly and annotation of the genome. In genome assembly, the authors identified 10,978 protein-coding genes and 694 noncoding genes.

Minor clarification:
○ Authors have not written the species name in italics in the abstract. It can be written in italics.
○ Mitochondrial genome length 16,324 bp not mentioned in the text. The mitogenome length can be included in the text.
○ Page no 3: The last paragraph 5th line "and the phylogeny of the wider" was given in italics. The sentence can be changed to the regular font.
○ Page no 5: The first paragraph starts, "The specimen was collected and identified…" The sentence can be modified as: "The collected specimen was identified by Sam Thomas (Natural History Museum) and preserved in liquid nitrogen."
Overall, the manuscript can be accepted for indexing.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Phylogenetic analysis of moths using mitogenomes
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
Minor comment: "and the phylogeny of the wider" in the last paragraph of the Background section needs to be de-italicized.
Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Alex Makunin
Wellcome Sanger Institute, Hinxton, UK
The manuscript by Thomas et al. presents a high-quality genome assembly for the Kite-tailed Robberfly, Machimus atricapillus. The assembly results are presented in a concise yet detailed fashion. Gene annotation is presented as a very high-level summary in both results and methods.
Small notes:
○ Percentage of assembly sequence assigned to six chromosomal-level scaffolds (99.99%) does not seem to match Table 2 with 3.78 Mbp unplaced scaffolds.
○ From Hi-C figure, it seems that chromosomes were oriented as short arm - centromere - long arm. If that is the case, can this be indicated in the genome report?
Reviewer Expertise: comparative genomics
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
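The reviewer's first note can be checked with simple arithmetic. The sketch below assumes the 3.78 Mbp unplaced total quoted by the reviewer (a rounded figure; Table 2 itself is not reproduced here) and the assembly span given in the Figure 2 caption. The function name is illustrative.

```python
# Sketch: percentage of the assembly placed on chromosomes, given the total
# span and the amount of unplaced sequence.

ASSEMBLY_SPAN_BP = 268_644_068  # total assembly span from the Figure 2 caption
UNPLACED_BP = 3_780_000         # assumption: reviewer's rounded 3.78 Mbp figure


def percent_placed(total_bp: int, unplaced_bp: int) -> float:
    """Return the percentage of total_bp that is not in unplaced scaffolds."""
    if total_bp <= 0 or not 0 <= unplaced_bp <= total_bp:
        raise ValueError("Expected 0 <= unplaced_bp <= total_bp and total_bp > 0")
    return 100.0 * (total_bp - unplaced_bp) / total_bp


placed_pct = percent_placed(ASSEMBLY_SPAN_BP, UNPLACED_BP)
print(f"Placed on chromosomes: {placed_pct:.2f}%")
```

Under these assumptions about 98.59% of the span would be placed, which differs from the 99.99% stated in the report and is the mismatch the reviewer points to.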

Figure 2. Genome assembly of Machimus atricapillus, idMacAtri3.1: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 268,644,068 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (83,901,797 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (54,569,305 and 29,817,269 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the diptera_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/idMacAtri3.1/dataset/CAKOGC01/snail.
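The N50 and N90 values in the caption above follow the standard definition: the length of the scaffold at which the cumulative length, taken from longest to shortest, first reaches 50% (or 90%) of the assembly span. A minimal sketch of that calculation is given below. The scaffold lengths used are hypothetical toy values, not the lengths of this assembly, and the function name `nx` is illustrative.

```python
from typing import Sequence


def nx(lengths: Sequence[int], x: float) -> int:
    """Return the Nx length: the scaffold length at which the cumulative sum,
    taken from longest to shortest, first reaches x% of the total."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < x <= 100:
        raise ValueError("x must be in the range (0, 100]")
    threshold = sum(lengths) * x / 100.0
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length
    return min(lengths)  # only reached through floating-point edge cases


# Hypothetical toy scaffold lengths, for illustration only.
toy_lengths = [10, 8, 6, 4, 2]
toy_n50 = nx(toy_lengths, 50)
toy_n90 = nx(toy_lengths, 90)
print("toy N50:", toy_n50, "toy N90:", toy_n90)

# Each of the 1,000 snail plot bins covers 0.1% of the assembly span.
ASSEMBLY_SPAN_BP = 268_644_068
bin_size_bp = ASSEMBLY_SPAN_BP / 1000
print(f"Bin size: {bin_size_bp:,.3f} bp")
```

For the toy data the total is 30, so the 50% threshold of 15 is reached at the second scaffold (length 8) and the 90% threshold of 27 at the fourth (length 4). For this assembly each snail plot bin corresponds to about 268,644 bp.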

Figure 5. Genome assembly of Machimus atricapillus, idMacAtri3.1: Hi-C contact map of the idMacAtri3.1 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=fnEw__WoTp6esZWuwMdF5w.
note have been supplied by a Darwin Tree of Life Partner. The submission of materials by a Darwin Tree of Life Partner is subject to the Darwin Tree of Life Project Sampling Code of Practice. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project. All efforts are undertaken to minimise the suffering of animals used for sequencing. Each transfer of samples is further undertaken according to a Research Collaboration Agreement or Material Transfer Agreement entered into by the Darwin Tree of Life Partner, Genome Research Limited (operating as the Wellcome Sanger Institute), and in some circumstances other Darwin Tree of Life collaborators.
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: population genomics
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
Reviewer Report 02 November 2023
https://doi.org/10.21956/wellcomeopenres.21155.r67245
© 2023 Sivasankaran K. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

○ Figure 5 would benefit from a grid separating chromosomes; at the moment it is unclear if the small block with small affinity with other genomic regions belongs to chrX or chr2.

Table 1. Genome data for Machimus atricapillus, idMacAtri3.1.
Project accession data
BUSCO**: C:96.0%[S:94.7%,D:1.4%],F:1.2%,M:2.7%,n:3,285 (benchmark*: C ≥ 95%)
Percentage of assembly mapped to chromosomes: 99.99% (benchmark*: ≥ 95%)
Sex chromosomes: X and Y chromosomes (benchmark*: localised homologous pairs)
Organelles: Mitochondrial genome assembled (benchmark*: complete single alleles)
Raw data accessions
* Assembly metric benchmarks are adapted from column VGP-2020 of "Table 1: Proposed standards and metrics for defining genome assembly quality" from (Rhie et al., 2021).
** BUSCO scores based on the diptera_odb10 BUSCO set using v5.3.2. C = complete [S = single copy, D = duplicated], F = fragmented, M = missing, n = number of orthologues in comparison. A full set of BUSCO scores is available at https://blobtoolkit.genomehubs.org/view/idMacAtri3.1/dataset/CAKOGC01/busco.

The specimen was collected and identified by Sam Thomas (Natural History Museum) and preserved in liquid nitrogen. A second M. atricapillus specimen (idMacAtri1) was collected by Ryan Mitchell (Natural History Museum) from Hartslock Reserve, Oxfordshire (latitude 51.511263, longitude -1.112222) on 20 August 2020, using an aerial net. This specimen was used for RNA sequencing. A third M. atricapillus specimen (idMacAtri4) was collected by Liam Crowley (University of
DNA was extracted at the Tree of Life laboratory, Wellcome Sanger Institute (WSI). The idMacAtri3 sample was weighed and dissected on dry ice with tissue set aside for Hi-C sequencing. Abdomen tissue was disrupted using a Nippi Powermasher fitted