The genome sequence of the Chestnut, Conistra vaccinii (Linnaeus, 1761)

We present a genome assembly from an individual male Conistra vaccinii (the Chestnut; Arthropoda; Insecta; Lepidoptera; Noctuidae). The genome sequence is 720.8 megabases in span. Most of the assembly is scaffolded into 31 chromosomal pseudomolecules, including the Z sex chromosome. The mitochondrial genome has also been assembled and is 15.44 kilobases in length. Gene annotation of this assembly on Ensembl identified 13,109 protein coding genes.


Background
The Chestnut, Conistra vaccinii, is a smallish to medium-sized (ca. 33-36 mm wingspan, 14-15 mm forewing length) (Bretherton et al., 1983; McMannis & Fiedler, 2019) noctuid moth, usually with a black mark in the lower part of the reniform stigma of the forewing; its colouration varies from plain reddish or chestnut brown to a pattern marbled with yellower brown or grey. It is distinguished from the very similar C. ligula (Esper, 1791) by the relatively unpointed forewing apex and the more rounded termen. The adult emerges in the autumn, flying between September and November and overwintering as an adult, reappearing in February to May (Randle et al., 2019). Like some other members of its genus, C. vaccinii wakes during hibernation to feed in the winter; by contrast, adults of C. ligula die off near the beginning of the new year. Feeding on fruit during the winter increases fecundity but apparently not longevity (McMannis & Fiedler, 2019).
Conistra vaccinii is found most often in non-coniferous woodland, and also in scrub, heathland, hedgerows and gardens in the UK (Waring et al., 2017). The adult, emerging in September and October, is attracted to the flowers of ivy (Hedera helix L.) and to ripe blackberries and other fruits, and feeds on Salix blossoms in the spring, by which time it can lay fertile eggs (Bretherton et al., 1983). The eggs hatch within 11 to 14 days, and the larva feeds from late April to June on deciduous trees and shrubs such as blackthorn, hawthorn, birch and sweet chestnut, sometimes later descending to feed on herbaceous plants (Waring et al., 2017). Full-grown, it is about 30 mm long and forms a loose cocoon in the ground (Bretherton et al., 1983), pupating about two months later (Waring et al., 2017).
The Chestnut is generally common and widespread in Britain, the Isle of Man and the Channel Islands (it is sparsely recorded in Ireland) (NBN Atlas Partnership, 2023), and in the Palaearctic as far east as Central Asia and as far south as the shores of the Mediterranean, but it is absent from northern Scandinavia and northern Russia (GBIF Secretariat, 2023). Both abundance (+36%) and distribution (+41%), notably in Scotland, have increased since 1970 (Randle et al., 2019).

Genome sequence report
The genome was sequenced from one male Conistra vaccinii (Figure 1) collected from High Wycombe, Buckinghamshire, UK (51.63, -0.74). A total of 36-fold coverage in Pacific Biosciences single-molecule HiFi long reads was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 9 missing joins or mis-joins and removed 4 haplotypic duplications, reducing the assembly length by 0.8%.
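To give a rough sense of scale, the 36-fold coverage figure can be converted into an approximate read yield. The sketch below assumes that coverage is simply the total number of sequenced bases divided by the genome size, and it uses the reported assembly span (720.8 Mb) as a stand-in for genome size. The function name `estimated_yield_gb` and that approximation are illustrative assumptions, not part of the reported pipeline.

```python
def estimated_yield_gb(coverage, genome_size_mb):
    """Approximate sequenced bases (in Gb) implied by a fold-coverage figure.

    Assumes coverage = total bases / genome size (an illustrative simplification).
    """
    return coverage * genome_size_mb / 1000.0


# 36-fold coverage over a 720.8 Mb span (assembly span used as a proxy for genome size)
hifi_yield = estimated_yield_gb(36, 720.8)
print(f"~{hifi_yield:.1f} Gb of HiFi data")
```

Under these assumptions the HiFi data set would amount to roughly 26 Gb; the true yield depends on the actual genome size and on how coverage was calculated.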
The final assembly has a total length of 720.8 Mb in 42 sequence scaffolds with a scaffold N50 of 24.5 Mb (Table 1). The snailplot in Figure 2 provides a summary of the assembly statistics, while the distribution of assembly scaffolds on GC proportion and coverage is shown in Figure 3. The cumulative assembly plot in Figure 4 shows curves for subsets of scaffolds assigned to different phyla. Most (99.92%) of the assembly sequence was assigned to 31 chromosomal-level scaffolds, representing 30 autosomes and the Z sex chromosome. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 5; Table 2). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited. The mitochondrial genome was also assembled and can be found as a contig within the multifasta file of the genome submission.
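For readers unfamiliar with the scaffold N50 statistic quoted above, the following Python sketch shows how an Nx value is conventionally derived from a list of scaffold lengths. The function name `nx_length` and the toy lengths are hypothetical and are not taken from this assembly; the reported N50 and N90 values come from the assembly statistics, not from this code.

```python
def nx_length(lengths, fraction=0.5):
    """Return the Nx length: the length L such that scaffolds of length >= L
    together cover at least `fraction` of the total span."""
    if not lengths:
        raise ValueError("no scaffold lengths given")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(lengths, reverse=True)  # longest scaffold first
    threshold = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    return ordered[-1]


# Toy example (hypothetical lengths, total span 150)
toy_lengths = [50, 40, 30, 20, 10]
n50 = nx_length(toy_lengths, 0.5)
n90 = nx_length(toy_lengths, 0.9)
print(n50, n90)
```

Sorting from longest to shortest and accumulating lengths until half of the total span is reached means that a small number of long scaffolds, as in a chromosome-level assembly, gives a high N50.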

Sample acquisition and nucleic acid extraction
RNA was extracted from abdomen tissue of ilConVacc1 in the Tree of Life Laboratory using the Life RNA Extraction: Automated MagMax™ mirVana protocol (https://dx.doi.org/10.17504/protocols.io.6qpvr36n3vmk/v1). The RNA concentration was assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit RNA Broad-Range (BR) Assay kit. The integrity of the RNA was analysed using the Agilent RNA 6000 Pico Kit and Eukaryotic Total RNA assay.
[...] et al., 2020). The assembly was then scaffolded with Hi-C data (Rao et al., 2014) using YaHS (Zhou et al., 2023). The assembly was checked for contamination and corrected as described previously (Howe et al., 2021). Manual curation was performed using HiGlass (Kerpedjiev et al., 2018) and Pretext (Harry, 2022). The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva et al., 2023), which runs MitoFinder (Allio et al., 2020) or MITOS (Bernt et al., 2013) and uses these annotations to select the final mitochondrial contig and to ensure the general quality of the sequence.
Table 3 contains a list of relevant software tool versions and sources.

Genome annotation
The Ensembl gene annotation system (Aken et al., 2016) was used to generate annotation for the Conistra vaccinii assembly (GCA_948150665.1). Annotation was created primarily through alignment of transcriptomic data to the genome, with gap filling via protein-to-genome alignments of a select set of proteins from UniProt (UniProt Consortium, 2019).

Wellcome Sanger Institute - Legal and Governance
The materials that have contributed to this genome note have been supplied by a Darwin Tree of Life Partner. The submission of materials by a Darwin Tree of Life Partner is subject to the 'Darwin Tree of Life Project Sampling Code of Practice', which can be found in full on the Darwin Tree of Life website. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project.
Further, the Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use. The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible. The overarching areas of consideration are:
• Ethical review of provenance and sourcing of the material

Lees and colleagues report the high-quality genome assembly and annotation of the Chestnut, a common noctuid moth in the British Isles. The natural history section gives a concise summary of the organism and notes interesting features of the biology and taxonomy of the Chestnut. The assembly and report follow carefully developed DToL workflows and templates, and these comprehensively capture metadata for software versions and assembly properties. I have just minor questions/comments.
1. Although this is stated in the methods, please also highlight in the main text (in the genome sequence report section) that the Hi-C/RNA-seq data come from a second individual.
2. Can I query the bead ratio stated in the methods: "In brief, the method uses a 1.8X ratio of AMPure PB beads to sample to eliminate shorter fragments and concentrate the DNA." In my understanding, and from the guidelines on protocols.io, a lower ratio of beads to sample would be used to eliminate smaller fragments. Should this be 0.6x?
Is the rationale for creating the dataset(s) clearly described?

Guanghong Liang
Fujian Agriculture and Forestry University, Fuzhou, Fujian, China
This paper sequences and assembles the genome of Conistra vaccinii, which is reported to be 720.8 megabases in size and to comprise 31 chromosomes, and also completes the assembly of a 15.44 kilobase mitochondrial genome. It describes in detail the entire process from genome sequencing to assembly and reveals a substantial number of protein-coding genes within the Conistra vaccinii genome. This will be very helpful for identifying this species among the numerous specimens collected by light trap, and will also play a significant role in revealing phylogenetic relationships in noctuid moths.
However, the paper also needs improvement as follows:
1. The sequencing depth needs to be discussed.
2. The abstract section is overly succinct; I would suggest that the abstract be rewritten to increase its appeal.
3. The significance of the objective in the introduction is also insufficient. For example, why did the authors try to reveal the genome sequence of Conistra vaccinii: just out of interest? Is there any taxonomic difficulty in identifying specimens collected by light trap?
4. The clarity of the images is inadequate; in particular, two specimen features are incomplete.
5. Please add detailed information about the instruments and reagents used in this work, as well as the names and countries of the manufacturers, in the Materials and Methods section.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format?

Figure 2. Genome assembly of Conistra vaccinii, ilConVacc3.1: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 720,850,966 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (31,631,061 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (24,458,057 and 17,604,836 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the lepidoptera_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/Conistra%20vaccinii/dataset/CANUHQ01/snail.
A male Conistra vaccinii (specimen ID NHMUK013267905, ToLID ilConVacc3) was caught in a light trap in High Wycombe, Buckinghamshire, UK (latitude 51.63, longitude -0.74) on 2021-02-16. The specimen was collected and identified by David Lees (Natural History Museum) and then dry frozen at -80 °C. The specimen used for Hi-C data and RNA sequencing (specimen ID Ox000322, ToLID ilConVacc1) was collected from Wytham Woods, Oxfordshire, UK (latitude 51.77, longitude -1.33) on 2020-01-08. The specimen was collected and identified by Liam Crowley (University of Oxford), and then frozen on dry ice.
High molecular weight (HMW) DNA was extracted at the Tree of Life laboratory, Wellcome Sanger Institute (WSI), using the main processes: sample preparation; sample homogenisation; HMW DNA extraction; HMW DNA fragmentation; and fragmented DNA clean-up. The ilConVacc3 sample was weighed and dissected on dry ice with tissue set aside for Hi-C sequencing (as per the protocol at https://dx.doi.org/10.17504/protocols.io.x54v9prmqg3e/v1). For sample homogenisation,

Figure 5. Genome assembly of Conistra vaccinii, ilConVacc3.1: Hi-C contact map of the ilConVacc3.1 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=eDBoITIdQUW5SuXhgYE6hw.

Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.

I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.