The genome sequence of the July Highflyer, Hydriomena furcata (Thunberg, 1784)

We present a genome assembly from an individual male Hydriomena furcata (the July Highflyer; Arthropoda; Insecta; Lepidoptera; Geometridae). The genome sequence is 423.3 megabases in span. Most of the assembly is scaffolded into 28 chromosomal pseudomolecules, including the Z sex chromosome. The mitochondrial genome has also been assembled and is 15.89 kilobases in length. Gene annotation of this assembly on Ensembl identified 17,324 protein-coding genes.


Background
The July highflyer, Hydriomena furcata, is a widely distributed moth in the family Geometridae found across Europe and Asia, ranging from Portugal and Ireland to the far east of Russia and Japan. The species is also widespread in Canada and the United States (GBIF Secretariat, 2023). In the UK, H. furcata is univoltine, with adults on the wing in July and August in woodlands, moorlands, hedgerows and suburban areas (NBN Atlas Partnership, 2023). The forewings of the adult moth are delicately patterned with a series of wavy bands of alternating dark and light shading, most likely providing protection through crypsis on patchy and irregular surfaces including tree bark and lichen. Wing shape is unusual, with a distinct 'shoulder' at the base of the leading wing edge, but colouration is highly variable. One common form has cream and black bands crossing a bright olive green ground colour; other individuals are dark reddish-brown with little green visible.
Larvae of H. furcata have been recorded feeding on the leaves of a wide range of deciduous plants including willow (Salix), poplar (Populus), alder (Alnus), birch (Betula), bilberry (Vaccinium myrtillus) and heather (Calluna vulgaris) (South, 1961). High densities can sometimes be recorded; one study estimated that the abundant larvae of H. furcata consumed 50% of the annual growth of mature heather in a moorland region of northern England (Fielding, 1992). In stands of heather (Calluna vulgaris), late instar larvae spin a small silken feeding web on mature plants (Fielding, 1992).
The genome of the July highflyer, Hydriomena furcata, will facilitate studies into polyphagy and insect colour polymorphism, and contribute to the growing set of resources for studying lepidopteran evolution. Here we present a chromosomally complete genome sequence for Hydriomena furcata, based on one male specimen of the green form, collected from Wytham Woods, Oxfordshire, UK.

Genome sequence report
The genome was sequenced from one male Hydriomena furcata (Figure 1) collected from Wytham Woods, Oxfordshire, UK (51.77, -1.33). A total of 42-fold coverage in Pacific Biosciences single-molecule HiFi long reads and 98-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 18 missing joins or mis-joins and removed one haplotypic duplication, reducing the scaffold number by 24.44% and decreasing the scaffold N50 by 2.58%. The final assembly has a total length of 423.3 Mb in 34 sequence scaffolds with a scaffold N50 of 16.1 Mb (Table 1). A summary of the assembly statistics is shown in Figure 2, while the distribution of assembly scaffolds on GC proportion and coverage is shown in Figure 3. The cumulative assembly plot in Figure 4 shows curves for subsets of scaffolds assigned to different phyla. Most (99.97%) of the assembly sequence was assigned to 28 chromosomal-level scaffolds, representing 27 autosomes and the Z sex chromosome. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 5; Table 2). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited. The mitochondrial genome was also assembled and can be found as a contig within the multifasta file of the genome submission.
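As an illustration of the statistics quoted above, the following minimal Python sketch shows how a scaffold N50 and a percentage change between pre- and post-curation values are conventionally computed. The function names `scaffold_n50` and `percent_change` are hypothetical, the toy scaffold lengths are invented for the example, and the pre-curation count of 45 scaffolds is an assumption back-calculated from the reported 24.44% reduction to 34 scaffolds; it is not a figure stated in this note.

```python
def scaffold_n50(lengths):
    """Return the N50: the largest length L such that scaffolds of
    length >= L together cover at least half of the total assembly span."""
    if not lengths:
        raise ValueError("no scaffold lengths supplied")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point issues at the halfway point.
        if 2 * running >= total:
            return length


def percent_change(before, after):
    """Signed percentage change from a pre-curation to a post-curation value."""
    return (after - before) / before * 100


# Toy example with invented scaffold lengths (not the real assembly).
toy_lengths = [10, 8, 6, 4, 2]
toy_n50 = scaffold_n50(toy_lengths)

# Assumed pre-curation scaffold count (45) versus the reported final count (34).
scaffold_change = round(percent_change(45, 34), 2)
```

Under these assumptions, a drop from 45 to 34 scaffolds corresponds to a change of about -24.44%, which is consistent with the reduction reported in the text.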
Metadata for specimens, spectral estimates, sequencing runs, contaminants and pre-curation assembly statistics can be found at https://links.tol.sanger.ac.uk/species/104460.
The resulting annotation includes 17,492 transcribed mRNAs from 17,324 protein-coding genes.

Sample acquisition and nucleic acid extraction
A male Hydriomena furcata (specimen ID Ox000534, ToLID ilHydFurc1) was collected from Wytham Woods, Oxfordshire (biological vice-county Berkshire), UK (latitude 51.77, longitude -1.33) on 2020-06-25 using a light trap. The specimen was collected and identified by Douglas Boyes (University of Oxford) and preserved on dry ice.
DNA was extracted at the Tree of Life laboratory, Wellcome Sanger Institute (WSI). The ilHydFurc1 sample was weighed and dissected on dry ice with tissue set aside for Hi-C [...]
RNA was extracted from head and thorax tissue of ilHydFurc1 in the Tree of Life Laboratory at the WSI using TRIzol, according to the manufacturer's instructions. RNA was then eluted in 50 μl RNAse-free water and its concentration assessed using a Nanodrop spectrophotometer and Qubit Fluorometer [...]
[...] et al., 2020) or MITOS (Bernt et al., 2013) and uses these annotations to select the final mitochondrial contig and to ensure the general quality of the sequence.
Table 3 contains a list of relevant software tool versions and sources.

Genome annotation
The BRAKER2 pipeline (Brůna et al., 2021) was used in the default protein mode to generate annotation for the Hydriomena furcata assembly (GCA_912999785.1) in Ensembl Rapid Release.

Wellcome Sanger Institute - Legal and Governance
The materials that have contributed to this genome

Further, the Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use. The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible. The overarching areas of consideration are:
• Ethical review of provenance and sourcing of the material

They presented a genome assembly of the July Highflyer using PacBio long-read HiFi sequencing data and Hi-C data, obtaining a high-quality nuclear genome as well as the mitochondrial genome. In addition, they annotated protein-coding genes using short- and long-read RNA-seq data, identifying 17,324 protein-coding genes in a draft genome. Overall, they presented a high-quality result and well-described methods.

1. It would be great to have a comparative analysis of the assembled nuclear (host) and mitochondrial genomes, as well as the protein-coding genes, against the genomes of closely related species, which would provide interesting insights.
2. In addition, I suggest that the authors examine repeat elements in the genome: how abundant they are and which kinds of repeats are prevalent.
Is the rationale for creating the dataset(s) clearly described? Partly
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Genome assembly, Structural variations

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Michael Hiller
Senckenberg Nature Research Society, Frankfurt am Main, Germany

○ This data note reports a high-quality primary assembly of the July Highflyer.
○ The genome was generated from PacBio HiFi data, polished with 10X reads and scaffolded with Hi-C data.
○ The assembly has a very high base accuracy and BUSCO completeness, and almost all of the contigs are assigned to chromosome-level scaffolds. The mito genome was also assembled and genes were annotated in both the mito and nuclear genome.
○ All methods and procedures are clearly described.
○ The assembly contributes to the genomic resources available for Lepidoptera.

I have two minor comments:
○ I wonder why polishing with the 10X data was necessary, as one often / mostly achieves a QV > 60 with HiFi reads alone and this study generated sufficient (42X) HiFi coverage.
○ In Figure 1, it would be helpful to note whether the missing piece of the left wing is a natural feature or damage caused by sampling the individual. I am also wondering if the wing color variation can be illustrated in case images are available.

Kay Lucek
University of Neuchâtel, Neuchâtel, Switzerland

The authors present the chromosome-level genome assembly of a male specimen of the July highflyer, Hydriomena furcata. The assembly consists of 28 chromosomes including the Z chromosome. The assembly is highly complete, as revealed by the high BUSCO score, but not fully phased. Sequencing and genome assembly follow the current state of the art and use established methods.
Overall, the presented assembly will be of great value to study genome architecture as well as genome size evolution in Lepidoptera. I particularly appreciate the annotation using RNA-seq data, although the specific dataset is indicated only indirectly, via the referenced BioProject.

Rodolpho S.T Menezes
State University of Santa Cruz, Ilhéus, Brazil

The manuscript by Boyes and Holland presents a genome assembly from a male of the July highflyer, Hydriomena furcata (Thunberg, 1784). The final assembly is 423.3 megabases and most of the assembly is scaffolded into 28 chromosomal pseudomolecules, including the sex chromosome. Overall the manuscript presents an advance in the genomic field of Lepidoptera. This genome information will be interesting for comparative genomics investigation and also for evolution and conservation studies. The methodology is thoroughly explained, the figures are well constructed and the manuscript is well written. However, I think the authors should compare their results with the assembled genomes from other Lepidoptera species.
Minor point: Since the authors assembled the mitochondrial genome, I think it would be interesting to show general mitochondrial information (GC/AT content…) for this species.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Figure 2. Genome assembly of Hydriomena furcata, ilHydFurc1.1: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 423,317,948 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (25,664,163 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (16,092,901 and 10,438,587 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the lepidoptera_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/ilHydFurc1.1/dataset/CAJVWF01.1/snail.
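The size-ordered binning described in the Figure 2 caption can be sketched roughly as follows. This is only an illustrative reading of the caption, not BlobToolKit's own implementation: the function name `snail_bins` is hypothetical, the toy scaffold lengths are invented, and the choice to report the scaffold that covers the end position of each bin is an assumption made for this example. Only the assembly span (423,317,948 bp) and the bin count (1,000) are taken from the caption.

```python
from bisect import bisect_left
from itertools import accumulate


def snail_bins(lengths, n_bins=1000):
    """For each of n_bins equal slices of the total span, return the length of
    the scaffold covering the end of that slice, with scaffolds ordered from
    longest to shortest (an assumed reading of 'size-ordered bins')."""
    ordered = sorted(lengths, reverse=True)
    cumulative = list(accumulate(ordered))
    total = cumulative[-1]
    bins = []
    for i in range(1, n_bins + 1):
        # Ceiling of total * i / n_bins using integer arithmetic only.
        target = -(-total * i // n_bins)
        # First scaffold whose cumulative span reaches the target position.
        idx = bisect_left(cumulative, target)
        bins.append(ordered[idx])
    return bins


# Span and bin count from the caption: each bin covers 0.1% of the assembly.
reported_span_bp = 423_317_948
bin_width_bp = reported_span_bp / 1000

# Toy example with invented scaffold lengths and 10 bins.
toy_bins = snail_bins([50, 30, 20], n_bins=10)
```

With the reported span, each of the 1,000 bins corresponds to roughly 423 kb of sequence, so the longest chromosomes occupy several dozen consecutive bins each.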

Figure 5. Genome assembly of Hydriomena furcata, ilHydFurc1.1: Hi-C contact map of the ilHydFurc1.1 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=emIrPe0SQe6WaCmtohrc7A.
note have been supplied by a Darwin Tree of Life Partner. The submission of materials by a Darwin Tree of Life Partner is subject to the 'Darwin Tree of Life Project Sampling Code of Practice', which can be found in full on the Darwin Tree of Life website here. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all

Reviewer Report 15 July 2024
https://doi.org/10.21956/wellcomeopenres.22345.r84783
© 2024 Hiller M. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.