The genome sequence of the Feathered Gothic, Tholera decimalis (Poda, 1761)

We present a genome assembly from an individual female Tholera decimalis (the Feathered Gothic; Arthropoda; Insecta; Lepidoptera; Noctuidae). The genome sequence is 1,334.1 megabases in span. Most of the assembly is scaffolded into 31 chromosomal pseudomolecules, including the Z sex chromosome. The mitochondrial genome has also been assembled and is 15.4 kilobases in length. Gene annotation of this assembly on Ensembl identified 12,771 protein coding genes.


Background
The use of a network of standardised light-traps operated every night since 1970, coupled with millions of individual records collated by amateur and professional recorders, has permitted analysis of long-term trends in moth abundance. These studies indicate that over 60% of larger moth species have decreased in abundance in Britain in the past half century (Conrad et al., 2006; Fox et al., 2019; Randle et al., 2019). The picture for geographical distribution within Britain is more complex, with roughly equal numbers of species expanding their range or facing range contraction (Randle et al., 2019).
The Feathered Gothic Tholera decimalis is a moth that has had mixed fortunes in this period and may provide a useful study case. The quantitative data indicate that T. decimalis has suffered a large decline in abundance since 1970, while its geographic range showed a large contraction followed by a more recent expansion from 2000 to 2016 (Randle et al., 2019). Currently, T. decimalis is found widely across England and Wales, particularly in southern counties, but it is now absent from most of Scotland and very rare in Ireland and Northern Ireland (National Biodiversity Data Centre, 2023; Randle et al., 2019; Thompson & Nelson, 2003). The species has a patchy distribution across Eurasia with records concentrated in the Netherlands, Austria, Switzerland and Scandinavia; there are sporadic records further east through Russia to Mongolia (GBIF Secretariat, 2023).
The moth is associated with rough grassland, downland and open woodland, with adults laying eggs during August and September. The eggs overwinter before the hatched larvae feed on grasses between March and July (Stokoe, 1948). The adult moth has bold white markings along the wing veins, streaked over ornately patterned rich brown forewings. Males have very pronounced bipectinate (feathered) antennae; this feature, not seen in females, is likely an adaptation to accommodate increased numbers of olfactory receptors for pheromone detection. Further research is needed into the chemical biology of this species, particularly as electroantennogram recording showed no response to 30 putative pheromone components that elicit responses in related species (Renou et al., 1991).
A genome sequence for T. decimalis will enable studies into the biochemical basis of olfactory reception and molecular adaptations for grass feeding, and may facilitate future research into the biological factors affecting responses to environmental and land use changes.

Genome sequence report
The genome was sequenced from one female Tholera decimalis (Figure 1) collected from Wytham Woods, Oxfordshire, UK (latitude 51.77, longitude -1.34). A total of 45-fold coverage in Pacific Biosciences single-molecule HiFi long reads and 29-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 58 missing joins or mis-joins and removed 22 haplotypic duplications, reducing the assembly length by 1.26% and the scaffold number by 25%.
The final assembly has a total length of 1,334.1 Mb in 84 sequence scaffolds with a scaffold N50 of 44.3 Mb (Table 1). Most (99.75%) of the assembly sequence was assigned to 31 chromosomal-level scaffolds, representing 30 autosomes and the Z sex chromosome. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 2-Figure 5; Table 2). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited. The mitochondrial genome was also assembled and can be found as a contig within the multifasta file of the genome submission.
Metadata for specimens, spectral estimates, sequencing runs, contaminants and pre-curation assembly statistics can be found at https://links.tol.sanger.ac.uk/species/988041.
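The scaffold N50 reported above is a length-weighted median: scaffolds are sorted from longest to shortest, and N50 is the length of the scaffold at which the running total first reaches half of the assembly span. The following is a minimal sketch of that calculation, not the code used to produce Table 1; the function name `nx` and the toy lengths are illustrative assumptions.

```python
def nx(lengths, fraction=0.5):
    """Return the Nx scaffold length for a list of scaffold lengths.

    Scaffolds are sorted longest first; the result is the length of the
    scaffold at which the cumulative span first reaches `fraction` of the
    total assembly span (fraction=0.5 gives N50, 0.9 gives N90).
    """
    if not lengths:
        raise ValueError("no scaffold lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length


# Toy example with hypothetical scaffold lengths (not the real assembly).
toy_lengths = [50, 40, 30, 20, 10]
print("N50:", nx(toy_lengths))        # half of the 150 bp span is reached at the 40 bp scaffold
print("N90:", nx(toy_lengths, 0.9))   # 90% of the span is reached at the 20 bp scaffold
```

Applied to the 84 scaffold lengths of the deposited assembly, the same calculation would be expected to give the N50 and N90 values shown in Figure 2.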

Sample acquisition and nucleic acid extraction
A male Tholera decimalis (ilThoDeci1) was collected from Wytham Woods, Oxfordshire (biological vice-county: Berkshire), UK (latitude 51.77, longitude -1.34) on 8 September 2020. The specimen was taken from woodland habitat by Douglas Boyes (University of Oxford) using a light trap. The specimen was identified by the collector and snap-frozen on dry ice.
DNA was extracted at the Tree of Life laboratory, Wellcome Sanger Institute (WSI). The ilThoDeci1 sample was weighed and dissected on dry ice with tissue set aside for Hi-C sequencing. Head and thorax tissue was cryogenically disrupted to a fine powder using a Covaris cryoPREP Automated Dry Pulveriser, receiving multiple impacts. High molecular weight (HMW) DNA was extracted using the Qiagen MagAttract HMW DNA extraction kit. Low molecular weight DNA was removed from a 20 ng aliquot of extracted DNA using the 0.8X AMPure XP purification kit prior to 10X Chromium sequencing; a minimum of 50 ng DNA was submitted for 10X sequencing. HMW DNA was sheared into an average fragment size of 12-20 kb in a Megaruptor 3 system with speed setting 30. Sheared DNA was purified by solid-phase reversible immobilisation using AMPure PB beads with a 1.8X ratio.
RNA was extracted from abdomen tissue of ilThoDeci1 in the Tree of Life Laboratory at the WSI using TRIzol, according to the manufacturer's instructions. RNA was then eluted in 50 μl RNAse-free water and its concentration assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit RNA Broad-Range (BR) Assay kit. RNA integrity was analysed using the Agilent RNA 6000 Pico Kit and Eukaryotic Total RNA assay.

Sequencing
Pacific Biosciences HiFi circular consensus and 10X Genomics read cloud DNA sequencing libraries were constructed according to the manufacturers' instructions. DNA sequencing was performed by the Scientific Operations core at the WSI on Pacific Biosciences SEQUEL II (HiFi), Illumina HiSeq 4000 (RNA-Seq) and Illumina NovaSeq 6000 (10X) instruments. Hi-C data were also generated from head tissue of ilThoDeci1 using the Arima2 kit and sequenced on the Illumina NovaSeq 6000 instrument.
Table 3 contains a list of software tool versions and sources.

Genome annotation
The Ensembl gene annotation system (Aken et al., 2016) was used to generate annotation for the Tholera decimalis assembly.

In this well-written manuscript, Boyes et al. describe the genome sequencing, assembly, and preliminary annotation of a single female Tholera decimalis. Assessed against the 3C criterion (contiguity, completeness, and correctness), the assembly they present is highly contiguous, with a 99% BUSCO v5.3.2 completeness score and minimal duplicated sequences. The authors have also assembled the Z sex chromosome and mitochondrial genome.
There is a logical flow between sections and paragraphs, and figures and tables are appropriately labeled and align with the text. The authors have also appropriately addressed ethical issues. We found the manuscript itself to be very informative, with sufficient detail. The genome (when released) will be of great utility to the Lepidopteran research community. We had just a few suggestions/comments that would help improve this resource:
1. While the assembly, annotation, and quality assessment methods are well described, it would be of great help to the community if the authors were to share their code/scripts/pipeline via a GitHub/Zenodo page. This would also ensure reproducibility of the methods.
2. We were also unable to access the assembly via the Darwin Tree of Life portal (perhaps because of an embargo on release). It would be very helpful to make sure that the resource is released.
3. On page 3, under methods (sample acquisition and nucleic acid extraction), it is mentioned that a "male" Tholera decimalis was collected. Was it actually a female? Please clarify this.
Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? No
Are the datasets clearly presented in a useable and accessible format? Partly
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Population genomics, genomics, evolution
We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however we have significant reservations, as outlined above.

Jacqueline Heckenhauer
Senckenberg Research Institute and Natural History Museum Frankfurt, Frankfurt, Germany
In their data note, the authors present the genome sequence of the primary assembly and contigs of the alternate haplotype of Tholera decimalis using a de novo assembly method following the approach used by the Darwin Tree of Life Project. The presented genome is of high quality and released openly for reuse. Therefore this high-quality reference genome is very beneficial to the field.
However, I suggest that the authors include a short summary of the repetitive DNA components of the assembly and submit the library of repetitive element (RE) sequences to a public repository of repetitive elements (e.g., Dfam). This way the community would benefit even more from these data. The diversity of available insect genomes has rapidly expanded, but the rate of community contributions to RE databases, which are important for RE annotation, has not kept pace, preventing high-resolution study of REs in many groups.
Was the genome size of this specimen / species estimated by flow cytometry or similar?
The computational programs used in that process were clearly summarized in Table 3 with precise version information. The reviewer has no more requests on the manuscript as a genome sequencing report.
Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Genome informatics for non-model organisms
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Figure 2. Genome assembly of Tholera decimalis, ilThoDeci1.2: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 1,334,081,899 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (55,901,591 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (44,326,127 and 30,461,373 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the lepidoptera_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/ilThoDeci1.2/dataset/CALPBR02/snail.
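The size-ordered binning described in the caption can be illustrated with a short sketch. With 1,000 bins, each bin covers 1,334,081,899 / 1,000 ≈ 1.33 Mb of cumulative assembly span, and the radius drawn for a bin is the length of the scaffold that contains the end of that bin once scaffolds are laid out longest first. The code below is an illustration of that idea only and is not taken from BlobToolKit; the function name `snail_radii` and the toy lengths are assumptions.

```python
from bisect import bisect_left
from itertools import accumulate


def snail_radii(lengths, bins=1000):
    """Return, for each of `bins` equal slices of cumulative assembly span,
    the length of the scaffold that contains the end of that slice when
    scaffolds are ordered from longest to shortest."""
    ordered = sorted(lengths, reverse=True)
    cumulative = list(accumulate(ordered))
    total = cumulative[-1]
    radii = []
    for i in range(1, bins + 1):
        # End of bin i in cumulative bp, rounded up so the comparison stays in integers.
        position = (total * i + bins - 1) // bins
        # First scaffold whose cumulative span reaches this position.
        idx = bisect_left(cumulative, position)
        radii.append(ordered[idx])
    return radii


# Toy example with hypothetical scaffold lengths and only 10 bins.
radii = snail_radii([50, 40, 30, 20, 10], bins=10)
print(radii)
print("N50 (bin ending at 50%):", radii[4])
print("N90 (bin ending at 90%):", radii[8])
```

Under this layout the N50 and N90 arcs in the figure correspond to the radii at the bins ending at 50% and 90% of the circumference.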
Assembly was carried out with Hifiasm (Cheng et al., 2021) and haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). One round of polishing was performed by aligning 10X Genomics read data to the assembly with Long Ranger ALIGN, calling variants with FreeBayes (Garrison & Marth, 2012). The assembly was then scaffolded with Hi-C data (Rao et al., 2014) using YaHS (Zhou et al., 2023). The assembly was checked for contamination as described previously (Howe et al., 2021). Manual curation was performed using HiGlass (Kerpedjiev et al., 2018) and Pretext (Harry, 2022). The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva et al., 2022), which runs MitoFinder (Allio et al., 2020) or MITOS (Bernt et al., 2013) and uses these annotations to select the final mitochondrial contig and to ensure the general quality of the sequence. To evaluate the assembly, MerquryFK was used to estimate consensus quality (QV) scores and k-mer completeness (Rhie et al., 2020). The genome was analysed within the BlobToolKit environment (Challis et al., 2020) and BUSCO scores (Manni et al., 2021; Simão et al., 2015) were calculated.
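For readers unfamiliar with consensus quality scores, the sketch below shows how a k-mer based QV can be derived, assuming the formulation described for Merqury by Rhie et al. (2020): the fraction of assembly k-mers also found in the reads is converted to a per-base probability of correctness, and the resulting error rate is expressed on the Phred scale. This is not the MerquryFK code, the function name is hypothetical, and the k-mer counts in the usage lines are made-up numbers rather than values from this assembly.

```python
import math


def kmer_based_qv(shared_kmers, total_kmers, k):
    """Estimate a Phred-scaled consensus quality value from k-mer counts.

    shared_kmers: assembly k-mers that are also present in the read set
    total_kmers:  all k-mers in the assembly
    k:            k-mer length
    """
    if not 0 < shared_kmers <= total_kmers:
        raise ValueError("expected 0 < shared_kmers <= total_kmers")
    # Probability that a single base is correct, assuming a k-mer is
    # found in the reads only if all k of its bases are correct.
    p_base_correct = (shared_kmers / total_kmers) ** (1.0 / k)
    error_rate = 1.0 - p_base_correct
    if error_rate == 0.0:
        return math.inf  # no unsupported k-mers observed
    return -10.0 * math.log10(error_rate)


# Hypothetical counts for illustration only.
print(round(kmer_based_qv(999, 1000, 1), 2))           # error rate 0.001, i.e. about QV 30
print(round(kmer_based_qv(999_000, 1_000_000, 21), 2))
```

A QV of 30 corresponds to roughly one consensus error per 1,000 bases, QV 40 to one per 10,000, and so on.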

Figure 5. Genome assembly of Tholera decimalis, ilThoDeci1.2: Hi-C contact map. Hi-C contact map of the ilThoDeci1.2 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=IsQwpVaRQ36c521_VU0W0w.

Reviewer Report 04 September 2023
https://doi.org/10.21956/wellcomeopenres.21483.r66112
© 2023 Heckenhauer J. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

© 2023 Bono H. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Hidemasa Bono
1 Genome Editing Innovation Center, Hiroshima University, Higashi-Hiroshima, Hiroshima, Japan
2 Graduate School of Integrated Sciences for Life, Hiroshima University, Higashi-Hiroshima, Hiroshima, Japan
3 Database Center for Life Science, Research Organization of Information and Systems, Mishima, Shizuoka, Japan
The authors present the genome sequencing of the Feathered Gothic, Tholera decimalis. The reads used in this study were sufficiently deep (45-fold coverage in PacBio HiFi) with Hi-C reads. The assembly constructed was very good, judging from the BUSCO v5.3.2 completeness score (99.0%).

Table 3. Software tools: versions and sources (columns: software tool, version).

Arun Sethuraman
San Diego State University, San Diego, California, USA
Priyanshi Shah
Biology, San Diego State University, San Diego, California, USA

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
https://doi.org/10.21956/wellcomeopenres.21483.r57063