The genome sequence of the 6-spot burnet, Zygaena filipendulae (Linnaeus, 1758)

We present a genome assembly from an individual female Zygaena filipendulae (6-spot burnet; Arthropoda; Insecta; Lepidoptera; Zygaenidae). The genome sequence is 365.9 megabases in span. The majority of the assembly (99.99%) is scaffolded into 31 chromosomal pseudomolecules, with the W and Z sex chromosomes assembled. The complete mitochondrial genome was also assembled and is 15.6 kilobases in length. Gene annotation of this assembly on Ensembl has identified 12,493 protein-coding genes.


Background
The six-spot burnet moth, Zygaena filipendulae (Linnaeus, 1758), is an aposematic, chemically defended, day-flying moth in the family Zygaenidae with a distribution that ranges across Europe. There are 98 described species of burnet moths in Zygaena (Hofmann & Gerald Tremewan, 2005). Some Zygaena species have become model organisms for studying the evolution of chemical defence compounds (Zagrobelny et al., 2019). The forewings of Z. filipendulae are black and distinctively marked with six red spots. This species can biosynthesize cyanogenic glucosides de novo or obtain them from Fabaceae host plants, storing them in cuticular cavities and hemolymph for later use as a defensive secretion (Franzl et al., 1986). The three enzymes involved in this biosynthetic pathway in Z. filipendulae are the cytochromes P450 CYP405A2 and CYP332A3 and the glucosyl transferase UGT33A1 (Zagrobelny et al., 2019). A genome of Z. filipendulae is much needed, especially for understanding the genetics of cyanogenic glucoside biosynthesis.

Genome sequence report
The genome was sequenced from a single female Z. filipendulae collected from the Ant Hills region, Wytham, Berkshire, UK (Figure 1). A total of 58-fold coverage in Pacific Biosciences single-molecule HiFi long reads and 92-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 3 missing joins or misjoins and removed no haplotypic duplications, reducing the assembly size by 0.004% and the scaffold number by 8.33%; the scaffold N50 remained the same.
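The figures quoted above can be cross-checked with some back-of-envelope arithmetic. The sketch below is illustrative only: it assumes that fold coverage was computed against the final assembly span of 365.9 Mb (the text does not say which denominator was used), and it infers the pre-curation scaffold count from the final count of 55 scaffolds reported in this section. The function names are hypothetical.

```python
# Back-of-envelope checks using figures quoted in the text.
ASSEMBLY_SPAN_MB = 365.9      # final assembly span (Mb), from the text
FINAL_SCAFFOLDS = 55          # final scaffold count, from the text
SCAFFOLD_REDUCTION = 0.0833   # 8.33% reduction during curation, from the text


def approx_yield_gb(fold_coverage: float, span_mb: float = ASSEMBLY_SPAN_MB) -> float:
    """Approximate sequenced bases in Gb, ASSUMING coverage = bases / assembly span."""
    return fold_coverage * span_mb / 1000.0


def scaffolds_before_curation(final_count: int, fractional_reduction: float) -> int:
    """Invert final = initial * (1 - reduction), rounded to a whole scaffold."""
    return round(final_count / (1.0 - fractional_reduction))


hifi_gb = approx_yield_gb(58)   # PacBio HiFi, 58-fold
tenx_gb = approx_yield_gb(92)   # 10X Genomics read clouds, 92-fold
initial_scaffolds = scaffolds_before_curation(FINAL_SCAFFOLDS, SCAFFOLD_REDUCTION)

print(f"HiFi ~{hifi_gb:.1f} Gb; 10X ~{tenx_gb:.1f} Gb; "
      f"scaffolds before curation ~{initial_scaffolds}")
```

Under these assumptions the yields come out at roughly 21 Gb of HiFi data and 34 Gb of 10X data, and an 8.33% reduction to 55 scaffolds is consistent with about 60 scaffolds before curation. These are inferred estimates, not values reported by the authors.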
The final assembly has a total length of 365.9 Mb in 55 sequence scaffolds with a scaffold N50 of 12.6 Mb (Table 1). The majority, 99.99%, of the assembly sequence was assigned to 31 chromosomal-level scaffolds, representing 29 autosomes (numbered by sequence length) and the W and Z sex chromosomes (Figure 2-Figure 5; Table 2).
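For readers unfamiliar with the scaffold N50 statistic, the following minimal sketch shows how it is conventionally defined: sort sequences from longest to shortest and report the length at which the running total first reaches half of the assembly span. The function name `nx` and the toy lengths are hypothetical; this is not the pipeline used to produce the reported 12.6 Mb value.

```python
from typing import Iterable


def nx(lengths: Iterable[int], fraction: float = 0.5) -> int:
    """Return the Nx length: the length of the sequence at which the cumulative
    sum of lengths (sorted longest first) reaches `fraction` of the total span."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    threshold = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    return ordered[-1]  # only reached through floating-point edge cases


# Toy lengths (hypothetical; not the Z. filipendulae scaffolds).
toy_lengths = [50, 40, 30, 20, 10]
toy_n50 = nx(toy_lengths, 0.5)
toy_n90 = nx(toy_lengths, 0.9)
print(toy_n50, toy_n90)
```

For the toy set, the total span is 150, so the N50 is 40 (50 + 40 reaches 75) and the N90 is 20. Applied to the real scaffold lengths, the same definition underlies the N50 and N90 values shown in the snail plot.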
The assembly has a BUSCO v5.2.2 (Manni et al., 2021) completeness of 97.8% (single 97.3%, duplicated 0.5%) using the lepidoptera_odb10 reference set (n=954). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.
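The BUSCO percentages can be translated into approximate gene counts. The sketch below checks that the single-copy and duplicated fractions add up to the reported completeness and converts each to a count out of 954. Because the percentages are rounded to one decimal place, the counts are estimates rather than values reported in the text.

```python
N_BUSCOS = 954  # size of the lepidoptera_odb10 set, from the text

# Percentages as reported in the text.
percentages = {"complete": 97.8, "single": 97.3, "duplicated": 0.5}

# Complete = single-copy + duplicated; verify the reported split is consistent.
split_is_consistent = abs(
    percentages["single"] + percentages["duplicated"] - percentages["complete"]
) < 0.05

# Approximate counts (estimates, since the percentages are rounded).
approx_counts = {
    name: round(pct / 100 * N_BUSCOS) for name, pct in percentages.items()
}
approx_not_complete = N_BUSCOS - approx_counts["complete"]

print(split_is_consistent, approx_counts, approx_not_complete)
```

This gives roughly 933 complete BUSCOs (about 928 single-copy and 5 duplicated), leaving about 21 genes that are either fragmented or missing.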

Sample acquisition and nucleic acid extraction
Two Z. filipendulae specimens (ilZygFili1, genome assembly; and ilZygFili3, RNA-Seq) were collected using a net from the Ant Hills region and Wytham Woods, Wytham, Berkshire, UK (latitude 51.765, longitude -1.327) by Douglas Boyes (University of Oxford). The specimens were identified by Douglas Boyes and snap-frozen on dry ice. A further Z. filipendulae specimen (ilZygFili2, Hi-C) was collected using a net from Wytham Woods, Berkshire, UK (latitude 51.771, longitude -1.338) by Liam Crowley (University of Oxford). The specimen was identified by Liam Crowley and snap-frozen on dry ice.
DNA was extracted at the Tree of Life laboratory, Wellcome Sanger Institute. The ilZygFili1 sample was weighed and dissected on dry ice. Whole organism tissue was cryogenically disrupted to a fine powder using a Covaris cryoPREP Automated Dry Pulveriser, receiving multiple impacts. The concentration of the sheared and purified DNA was assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit dsDNA High Sensitivity Assay kit. Fragment size distribution was evaluated by running the sample on the FemtoPulse system.
RNA was extracted from abdomen tissue of ilZygFili3 in the Tree of Life Laboratory at the Wellcome Sanger Institute (WSI) using TRIzol, according to the manufacturer's instructions. RNA was then eluted in 50 μl RNAse-free water and its concentration assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit RNA Broad-Range (BR) Assay kit. The integrity of the RNA was analysed using the Agilent RNA 6000 Pico Kit and Eukaryotic Total RNA assay.

Sequencing
Pacific Biosciences HiFi circular consensus and 10X Genomics Chromium read cloud sequencing libraries were constructed according to the manufacturers' instructions. Sequencing was performed by the Scientific Operations core at the Wellcome Sanger Institute on Pacific Biosciences SEQUEL II (HiFi), Illumina NovaSeq 6000 (10X) and Illumina HiSeq 4000 (RNA-Seq) instruments. Hi-C data were generated in the Tree of Life laboratory from head/thorax tissue of ilZygFili2 using the Arima v2 kit and sequenced on a NovaSeq 6000 instrument.

Genome annotation
The Ensembl gene annotation system (Aken et al., 2016) was used to annotate the Z. filipendulae assembly, identifying 12,493 protein-coding genes.

Figure 1. Image of the Zygaena filipendulae specimen taken prior to preservation and processing.

Figure 2. Genome assembly of Zygaena filipendulae, ilZygFili1.2: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 365,946,273 bp assembly. The distribution of chromosome lengths is shown in dark grey with the plot radius scaled to the longest chromosome present in the assembly (16,101,494 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 chromosome lengths (12,640,274 and 8,250,661 bp), respectively. The pale grey spiral shows the cumulative chromosome count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the lepidoptera_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/ilZygFili1.2/dataset/CAJRBF02/snail#Filters.

Figure 3. Genome assembly of Zygaena filipendulae, ilZygFili1.2: GC coverage. BlobToolKit GC-coverage plot. Scaffolds are coloured by phylum. Circles are sized in proportion to chromosome length. Histograms show the distribution of chromosome length sum along each axis. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/ilZygFili1.2/dataset/CAJRBF02/blob#Filters.

Figure 5. Genome assembly of Zygaena filipendulae, ilZygFili1.2: Hi-C contact map. Hi-C contact map of the ilZygFili1.2 assembly, visualised in HiGlass. Chromosomes are arranged in size order from left to right and top to bottom. The interactive Hi-C map can be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=Aqyc_jJbQjuSzW9eMHqPQg.

Table 2. Chromosomal pseudomolecules in the genome assembly of Zygaena filipendulae, ilZygFili1.2. Columns: INSDC accession, Chromosome, Size (Mb), GC%.
The materials that have contributed to this genome note have been supplied by a Darwin Tree of Life Partner. The submission of materials by a Darwin Tree of Life Partner is subject to the Darwin Tree of Life Project Sampling Code of Practice. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project. Each transfer of samples is further undertaken according to a Research Collaboration Agreement or Material Transfer Agreement entered into by the Darwin Tree of Life Partner, Genome Research Limited (operating as the Wellcome Sanger Institute), and in some circumstances other Darwin Tree of Life collaborators. Members of the Darwin Tree of Life Barcoding collective are listed here: https://doi.org/10.5281/zenodo.6418156. Members of the Wellcome Sanger Institute Tree of Life programme are listed here: https://doi.org/10.5281/zenodo.6418327. Members of Wellcome Sanger Institute Scientific Operations: DNA Pipelines collective are listed here: https://doi.org/10.5281/zenodo.5746904. Members of the Tree of Life Core Informatics collective are listed here: https://doi.org/10.5281/zenodo.6125046. Members of the Darwin Tree of Life Consortium are listed here: https://doi.org/10.5281/zenodo.6418363.

Table 3. Software tools used.