The genome sequence of the Oak Beauty, Biston strataria (Hufnagel, 1767)

We present a genome assembly from an individual male Biston strataria (the Oak Beauty; Arthropoda; Insecta; Lepidoptera; Geometridae). The genome sequence is 424.0 megabases in span. Most of the assembly is scaffolded into 16 chromosomal pseudomolecules, including the Z sex chromosome. The mitochondrial genome has also been assembled and is 15.61 kilobases in length. Gene annotation of this assembly on Ensembl identified 18,406 protein-coding genes.


Background
The Oak Beauty Biston strataria (synonym stratarius) is a large geometrid moth (wingspan 40-50 mm) with two jagged chestnut brown bands, edged in black, crossing the speckled white and black forewings. B. strataria is found across northern, central and eastern Europe, including southern parts of Scandinavia, central and southern counties of Britain, and scattered sites across Ireland. There are also records from Russia, Georgia, Turkmenistan and Kazakhstan (GBIF Secretariat, 2023).
B. strataria is a close relative of the Peppered moth B. betularia; the two species have similar wing size and shape, although the markings are quite different. As with B. betularia, melanism has been reported in B. strataria. At least two distinct melanic forms are reported: ab. robinaria, in which the pale areas of the forewings are suffused with black scales, and ab. melanaria, widely recorded in the Netherlands, which is uniformly black (West, 2005). Neither melanic variant has been common in Britain, even during industrial periods when melanic B. betularia increased in frequency. West (2005) suggests that ab. robinaria is caused by a dominant allele with viable heterozygotes and homozygotes; Ford (1967) reported that one of the melanic forms is lethal when homozygous, but it is unclear which he was referring to.
The larvae of B. strataria feed on the leaves of many deciduous trees and, despite the common name, are not oak specialists. Indeed, the German common name, Pappel-Dickleibspanner, refers to living on poplar trees (Populus spp.). At rest, the mature larvae grip twigs of the food plant with posterior claspers and prolegs, and extend their thin, lumpy body at a sharp angle. This 'twig-like' posture is thought to be an example of masquerade, where the larva is clearly visible to predators but misidentified; this contrasts with crypsis, in which an individual is not detected (Skelhorn et al., 2009). Attempts to test the masquerade hypothesis have yielded supportive, but not conclusive, evidence (Skelhorn et al., 2009).
The genome sequence of Biston strataria was determined as part of the Darwin Tree of Life project. The complete genome sequence will aid research into the molecular basis of wing colour polymorphism and into adaptations to polyphagy, and will contribute to the growing set of resources for studying molecular evolution in the Lepidoptera.

Genome sequence report
The genome was sequenced from one male Biston strataria (Figure 1) collected from Wytham Woods, Oxfordshire, UK (51.77, -1.34). A total of 48-fold coverage in Pacific Biosciences single-molecule HiFi long reads was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected one missing join and removed one haplotypic duplication, reducing the scaffold number by 4.35%.
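The percentage reduction in scaffold number quoted above is a simple ratio of counts, and a small calculation may help readers see how a single net change can give a figure of this size. The following is a minimal sketch only: the source reports the percentage but not the scaffold counts before and after curation, so the counts 23 and 22 below are hypothetical values chosen for illustration and are not taken from the assembly.

```python
def percent_reduction(before: int, after: int) -> float:
    """Return the percentage decrease in scaffold count from `before` to `after`."""
    if before <= 0:
        raise ValueError("'before' must be a positive scaffold count")
    return 100.0 * (before - after) / before


# Hypothetical counts (assumption): the source gives only the percentage.
ASSUMED_SCAFFOLDS_BEFORE = 23
ASSUMED_SCAFFOLDS_AFTER = 22

reduction = percent_reduction(ASSUMED_SCAFFOLDS_BEFORE, ASSUMED_SCAFFOLDS_AFTER)
print(f"{reduction:.2f}%")  # prints "4.35%"
```

Under these assumed counts, a net loss of one scaffold out of 23 corresponds to a 4.35% reduction; the actual counts used by the authors may differ.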
The final assembly has a total length of 424.0 Mb in 21 sequence scaffolds with a scaffold N50 of 31.2 Mb (Table 1). The snailplot in Figure 2 provides a summary of the assembly statistics, while the distribution of assembly scaffolds on GC proportion and coverage is shown in Figure 3. The cumulative assembly plot in Figure 4 shows curves for subsets of scaffolds assigned to different phyla. Most (99.94%) of the assembly sequence was assigned to 16 chromosomal-level scaffolds, representing 15 autosomes and the Z sex chromosome. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 5; Table 2). Chromosome Z was assigned by synteny to Biston betularia (GCA_905404145.2) (Boyes et al., 2022). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited. The mitochondrial genome was also assembled and can be found as a contig within the multifasta file of the genome submission.
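For readers unfamiliar with the scaffold N50 statistic reported above, the following minimal sketch shows how an Nx value can be computed from a list of scaffold lengths: scaffolds are sorted from longest to shortest, and Nx is the length of the scaffold at which the running total first reaches x% of the assembly span. The function name `nx` and the toy lengths are illustrative assumptions; they are not the lengths of the Biston strataria scaffolds, and this is not the code used to produce Table 1.

```python
def nx(lengths, percent=50):
    """Return the Nx scaffold length (N50 by default) for a list of lengths."""
    if not lengths:
        raise ValueError("at least one scaffold length is required")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range (0, 100]")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until x% of the span is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * percent:
            return length


# Toy scaffold lengths (hypothetical, arbitrary units).
toy_lengths = [10, 8, 6, 4, 2]
print(nx(toy_lengths, 50))  # prints 8
print(nx(toy_lengths, 90))  # prints 4
```

In the toy example the total span is 30, so the two longest scaffolds (10 + 8 = 18) are enough to pass the halfway mark and the N50 is 8.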

Sample acquisition and nucleic acid extraction
A male Biston strataria (specimen ID Ox001101, ToLID ilBisStrt2) was collected from Wytham Woods, Oxfordshire (biological vice-county Berkshire), UK (latitude 51.77, longitude -1.34) on 2021-03-31 using a light trap. The specimen was collected and identified by Douglas Boyes (University of Oxford) and preserved on dry ice. This specimen was used for DNA and RNA sequencing.
The specimen used for Hi-C sequencing (specimen ID NHMUK014043020, ToLID ilBisStrt1) was collected in a light trap from High Wycombe, Buckinghamshire, UK (latitude 51.63, longitude -0.74) on 2021-03-04. The specimen was collected and identified by David Lees (Natural History Museum) and preserved by dry freezing at -80 °C.
The workflow for high molecular weight (HMW) DNA extraction at the Wellcome Sanger Institute (WSI) includes a sequence of core procedures: sample preparation, sample homogenisation, DNA extraction, fragmentation, and clean-up. In sample preparation, the ilBisStrt2 sample was weighed and dissected on dry ice (Jay et al., 2023). Tissue from the thorax was homogenised using a PowerMasher II tissue disruptor (Denton et al., 2023a). HMW DNA was extracted in the WSI Scientific Operations core using the Automated MagAttract v2 protocol (Oatley et al., 2023). The DNA was sheared into an average fragment size of 12-20 kb in a Megaruptor 3 system.
RNA was extracted from remaining thorax tissue of ilBisStrt2 in the Tree of Life Laboratory at the WSI using the RNA Extraction: Automated MagMax™ mirVana protocol (do Amaral et al., 2023). The RNA concentration was assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer using the Qubit RNA Broad-Range Assay kit. Analysis of the integrity of the RNA was done using the Agilent RNA 6000 Pico Kit and Eukaryotic Total RNA assay.
Protocols developed by the WSI Tree of Life laboratory are publicly available on protocols.io (Denton et al., 2023b).

Sequencing
Pacific Biosciences HiFi circular consensus DNA sequencing libraries were constructed according to the manufacturers' instructions. Poly(A) RNA-Seq libraries were constructed using the NEB Ultra II RNA Library Prep kit. DNA and RNA sequencing was performed by the Scientific Operations core at the WSI on Pacific Biosciences SEQUEL II (HiFi) and Illumina NovaSeq 6000 (RNA-Seq) instruments. Hi-C data were also generated from head and thorax tissue of ilBisStrt1 using the Arima2 kit and sequenced on the Illumina NovaSeq 6000 instrument.

Genome assembly, curation and evaluation
Assembly was carried out with Hifiasm (Cheng et al., 2021) and haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). The assembly was then scaffolded with Hi-C data (Rao et al., 2014) using YaHS (Zhou et al., 2023). The assembly was checked for contamination and corrected as described previously (Howe et al., 2021). Manual curation was performed using HiGlass (Kerpedjiev et al., 2018) and Pretext (Harry, 2022). The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva et al., 2023), which runs MitoFinder (Allio et al., 2020) or MITOS (Bernt et al., 2013) and uses these annotations to select the final mitochondrial contig and to ensure the general quality of the sequence.
Table 3 contains a list of relevant software tool versions and sources.

Genome annotation
The BRAKER2 pipeline (Brůna et al., 2021) was used in the default protein mode to generate annotation for the Biston strataria assembly (GCA_950106695.1) in Ensembl Rapid Release.

Wellcome Sanger Institute - Legal and Governance
The materials that have contributed to this genome note have been supplied by a Darwin Tree of Life Partner. The submission of materials by a Darwin Tree of Life Partner is subject to the 'Darwin Tree of Life Project Sampling Code of Practice', which can be found in full on the Darwin Tree of Life website. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project.
Further, the Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use. The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible. The overarching areas of consideration are:
• Ethical review of provenance and sourcing of the material

Kumar Saurabh Singh
3. Regarding the manual assembly correction, if the Hi-C map was utilized, could you provide a graphical representation indicating where corrections were applied within the contact map? Additionally, it was unclear how correcting a single missing join resulted in a significant reduction of 4.35% in scaffold count. I recommend providing more detailed information on this aspect to help readers understand the connection within the correction process.
4. In the annotation section, there seems to be no explanation regarding whether the RNAseq data was utilized as external evidence for predicting gene models. I recommend adding those details in that section.
Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Bioinformatics, genomics, transcriptomics, metabolomics, data-integration
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Figure 2. Genome assembly of Biston strataria, ilBisStrt2.1: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 424,056,544 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (43,732,088 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (31,208,321 and 25,983,362 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the lepidoptera_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/ilBisStrt2_1/dataset/ilBisStrt2_1/snail.
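The bin arithmetic in the Figure 2 caption can be checked directly. The sketch below uses only the numbers stated in the caption (assembly span, number of bins, longest scaffold length); the function names are illustrative assumptions and this is not BlobToolKit code.

```python
ASSEMBLY_SPAN_BP = 424_056_544    # assembly span, from the Figure 2 caption
N_BINS = 1_000                    # number of size-ordered bins, from the caption
LONGEST_SCAFFOLD_BP = 43_732_088  # longest scaffold, from the caption


def bin_size_bp(span_bp: int, n_bins: int) -> float:
    """Base pairs represented by each equal-sized bin of the plot."""
    return span_bp / n_bins


def fraction_of_span(length_bp: int, span_bp: int) -> float:
    """Fraction of the total assembly span contributed by one scaffold."""
    return length_bp / span_bp


size = bin_size_bp(ASSEMBLY_SPAN_BP, N_BINS)
percent_per_bin = 100 / N_BINS
longest_fraction = fraction_of_span(LONGEST_SCAFFOLD_BP, ASSEMBLY_SPAN_BP)

print(f"{size:,.3f} bp per bin ({percent_per_bin}% of the assembly)")
print(f"longest scaffold covers {100 * longest_fraction:.1f}% of the span")
```

Each bin therefore represents roughly 424 kb, and the longest scaffold alone accounts for about a tenth of the assembly span.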

Figure 3. Genome assembly of Biston strataria, ilBisStrt2.1: BlobToolKit GC-coverage plot. Scaffolds are coloured by phylum. Circles are sized in proportion to scaffold length. Histograms show the distribution of scaffold length sum along each axis. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/ilBisStrt2_1/dataset/ilBisStrt2_1/blob.
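The GC axis of a GC-coverage plot is based on the proportion of G and C bases in each scaffold. As a minimal sketch of that quantity, the function below computes the GC proportion of a sequence string. The name `gc_fraction` and the decision to ignore ambiguous bases such as N are assumptions made for illustration; BlobToolKit's own calculation may treat such bases differently.

```python
def gc_fraction(sequence: str) -> float:
    """Return the proportion of G and C among the A, C, G and T bases."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    # Ambiguous bases (for example N) are excluded from the denominator;
    # this is an assumption of the sketch, not a documented tool behaviour.
    return gc / (gc + at)


# Toy sequence (hypothetical), mixing case and including ambiguous bases.
print(gc_fraction("ATGCatgcNN"))  # prints 0.5
```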

Figure 4. Genome assembly of Biston strataria, ilBisStrt2.1: BlobToolKit cumulative sequence plot. The grey line shows cumulative length for all scaffolds. Coloured lines show cumulative lengths of scaffolds assigned to each phylum using the buscogenes taxrule. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/ilBisStrt2_1/dataset/ilBisStrt2_1/cumulative.

Figure 5. Genome assembly of Biston strataria, ilBisStrt2.1: Hi-C contact map of the ilBisStrt2.1 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=H2uSqyMdS6ST2KSsBHf3Cg.
Wageningen University & Research, Wageningen, Gelderland, The Netherlands
The authors have assembled the genome of the male Oak Beauty moth, Biston strataria (Hufnagel, 1767), utilizing advanced genome sequencing technologies such as Pacific Biosciences single-molecule HiFi sequencing and chromosome conformation Hi-C data. They have deposited the raw datasets in the NCBI and included the accessions for accessing this data. While I found the manuscript comprehensive, I have a few comments on areas that were unclear to me or suggestions for further enhancing the clarity and quality of the manuscript.
1. The introduction discusses other moth species closely related to B. strataria. I believe including a phylogenetic tree illustrating the position of B. strataria would greatly enhance the manuscript, effectively conveying the rationale for sequencing this moth species.
2. I couldn't find details regarding the coverage and quantity of sequencing data generated for the Hi-C and RNAseq libraries. Additionally, annotating the Hi-C contact map with chromosome numbers would be helpful for readers.

Table 3. Software tools: versions and sources.
[Table body not recovered; column headers: Software tool, Version]

Open Peer Review
Current Peer Review Status: Version 1
This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Fundacion Centro Nacional de Analisi Genomico, Barcelona, Catalonia, Spain
The genome note submitted by Boyes et al. reports the genome sequence of the Oak Beauty, Biston strataria. The assembly is chromosome-scale and of high quality, meeting the minimum standards recommended by the Earth Biogenome Project. All protocols are appropriate and well-documented. All data conform to FAIR principles, with read data and assemblies being available in the ENA. BlobToolKit figures are interactive. Additional QC data are provided at https://tolqc.cog.sanger.ac.uk/, supplementing the figures published in the data note.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.