The genome sequence of the Streak, Chesias legatella (Denis & Schiffermüller, 1775)

We present a genome assembly from an individual male Chesias legatella (the Streak; Arthropoda; Insecta; Lepidoptera; Geometridae). The genome sequence is 310.3 megabases in span. Most of the assembly is scaffolded into 31 chromosomal pseudomolecules, including the Z sex chromosome. The mitochondrial genome has also been assembled and is 20.1 kilobases in length. Gene annotation of this assembly on Ensembl identified 15,520 protein-coding genes.


Background
The Streak, Chesias legatella, is a medium-sized geometrid moth, dark greyish brown with a prominent creamish-white streak towards the apex of the forewing and an elliptic dark shape in the discal area enclosing another whitish-cream dash. It flies late in the temperate season, usually September to early November in the UK (Randle et al., 2019), and overwinters as an egg. At rest it has a rather unusual posture for a looper moth, sometimes rolling its wings partially around a twig.
The Streak is a species of open woodland and heathland in the UK, especially on sandy substrates, feeding on Broom (Cytisus scoparius L.) (Wall, 1975), or occasionally Tree Lupin (Lupinus arboreus) (Wall, 1975; Waring et al., 2017).
C. legatella is generally common and widespread in the western Palaearctic only, from southern Scandinavia to the northern Mediterranean, but has relatively few records from eastern Europe (GBIF Secretariat, 2022). It is widespread in the UK and eastern Ireland (NBN Atlas Partnership, 2021), but the distribution is patchy, and it is vulnerable, with evidence for a significant decline since 1970 (Conrad et al., 2006) that has affected its distribution (Randle et al., 2019).
There is a single DNA barcode cluster on BOLD, BOLD: AAF2574 (16 March 2023), which is 5.46% pairwise divergent from that of Chesias capriata Prout, 1904 from Italy (BOLD: AAW3724). C. legatella has six other known congeners including the Broom-tip C. rufata (Fabricius, 1775) and belongs to the larentiine tribe Chesiadini, an early diverging one within the subfamily Larentiinae (after Trichopterygini), based on a study of ten nuclear protein coding genes and COI (Murillo-Ramos et al., 2019: Figure 3). The genus Chesias falls sister to the genus Aplocera Stephens, 1827 in the study of Õunap, Viidalepp and Truuverk (2016: Figure 2). The whole genome will be useful for more detailed evolutionary studies.
The species is of no economic concern, although it has been considered as a possible agent of biological control of Broom (Syrett et al., 1999).
The genome of Chesias legatella was sequenced as part of the Darwin Tree of Life Project, a collaborative effort to sequence all named eukaryotic species in the Atlantic Archipelago of Britain and Ireland. Here we present a chromosomally complete genome sequence for Chesias legatella, based on one specimen from Beinn Eighe National Nature Reserve, Scotland.

Genome sequence report
The genome was sequenced from one male Chesias legatella (Figure 1) collected from Beinn Eighe National Nature Reserve, Scotland, UK (latitude 57.63, longitude -5.35). A total of 53-fold coverage in Pacific Biosciences single-molecule HiFi long reads was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected seven missing or mis-joins and removed two haplotypic duplications, reducing the scaffold number by 5.26%.
The final assembly has a total length of 310.3 Mb in 36 sequence scaffolds with a scaffold N50 of 11.0 Mb (Table 1). Most (99.95%) of the assembly sequence was assigned to 31 chromosomal-level scaffolds, representing 30 autosomes and the Z sex chromosome. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 2-Figure 5; Table 2). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.
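For readers unfamiliar with the scaffold N50 statistic quoted here, the following is a minimal Python sketch of how an Nx value can be computed from a list of scaffold lengths. It is an illustration only, not the pipeline used for this assembly; the function name `nx` and the toy lengths are hypothetical.

```python
def nx(lengths, percent=50):
    """Return the Nx length: the largest length L such that scaffolds of
    length >= L together span at least `percent` % of the total assembly.

    `percent` is an integer (50 for N50, 90 for N90) so that the
    comparison below stays in exact integer arithmetic.
    """
    if not lengths:
        raise ValueError("no scaffold lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold downwards, accumulating span.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 100 >= percent * total:
            return length


# Toy example (hypothetical lengths, total span = 300)
toy = [80, 70, 50, 40, 30, 20, 10]
n50 = nx(toy, 50)  # 80 + 70 = 150 reaches 50% of 300
n90 = nx(toy, 90)  # 80 + 70 + 50 + 40 + 30 = 270 reaches 90% of 300
print(n50, n90)
```

Applied to the real scaffold lengths of an assembly, the same calculation yields the N50 and N90 figures of the kind reported in Table 1 and Figure 2.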
Amendments from Version 1
Changes have been made to the text in response to reviewers' comments. We have added information about genome annotation of Chesias legatella.
We now link to new annotation data from Ensembl at the European Bioinformatics Institute for the genome assembly reported in this data note.
We attempted to improve the appearance of the specimen photograph of the sampled moth.
The software Pretext was corrected to PretextView.
Any further responses from the reviewers can be found at the end of the article.

Metadata for specimens, barcode results, spectra estimates, sequencing runs, contaminants and pre-curation assembly statistics are given at https://links.tol.sanger.ac.uk/species/934925. The estimated Quality Value (QV) of the final assembly is 65.8 with k-mer completeness of 100%, and the assembly has a BUSCO v5.3.2 completeness of 98.4% (single = 98.0%, duplicated = 0.4%), using the lepidoptera_odb10 reference set (n = 5,286).
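To put the reported Quality Value of 65.8 in context, the sketch below converts a QV into a per-base error probability, assuming the QV is Phred-scaled (QV = -10 log10 of the error probability), as is conventional for k-mer based estimates. The function names are hypothetical, and the expected error count is a back-of-envelope figure based on the 310,278,188 bp span given in Figure 2, not a result from the paper.

```python
import math


def qv_to_error_rate(qv):
    """Per-base error probability implied by a Phred-scaled quality value."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(p):
    """Inverse conversion: Phred-scaled quality value for error probability p."""
    return -10 * math.log10(p)


ASSEMBLY_SPAN_BP = 310_278_188  # total assembly length from Figure 2

rate = qv_to_error_rate(65.8)            # roughly 2.6e-7 errors per base
expected_errors = rate * ASSEMBLY_SPAN_BP  # on the order of 80 bases

print(f"error rate per base: {rate:.3g}")
print(f"expected erroneous bases: {expected_errors:.0f}")
```

Under this assumption, a QV of 65.8 corresponds to fewer than one expected error per three million bases.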

Sample acquisition and nucleic acid extraction
A male Chesias legatella (specimen number NHMUK014543814, ToLID ilCheLega1) was collected from Beinn Eighe National Nature Reserve, Scotland, UK (latitude 57.63, longitude -5.35) on 10 September 2021. The specimen was collected by David Lees (Natural History Museum) using a light trap. The specimen was identified by the collector and preserved at -80°C.
The ilCheLega1 sample was weighed and dissected on dry ice with tissue set aside for Hi-C sequencing. Head and thorax tissue of ilCheLega1 was disrupted using a Nippi Powermasher fitted with a BioMasher pestle. DNA was extracted at the Wellcome Sanger Institute (WSI) Scientific Operations core using the Qiagen MagAttract HMW DNA kit, according to the manufacturer's instructions.

Sequencing
Pacific Biosciences HiFi circular consensus DNA sequencing libraries were constructed according to the manufacturers' instructions. DNA sequencing was performed by the Scientific Operations core at the WSI on a Pacific Biosciences SEQUEL II (HiFi) instrument. Hi-C data were also generated from tissue of ilCheLega1 using the Arima v2 kit and sequenced on the Illumina NovaSeq 6000 instrument.

Genome annotation
The BRAKER2 pipeline (Brůna et al., 2021) was used in the default protein mode to generate annotation for the Chesias legatella assembly.

Version 1
Reviewer Report 01 March 2024 https://doi.org/10.21956/wellcomeopenres.21381.r71927 © 2024 Nomura S. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Shota Nomura
Division of Evolutionary Developmental Biology, National Institute for Basic Biology, Okazaki, Japan
This study performed the chromosome-level genome assembly of the Streak, Chesias legatella. The authors assembled the contigs using PacBio single-molecule HiFi long reads and scaffolded them using Hi-C data. As a result, the authors obtained 36 scaffolds with a scaffold N50 of 11.0 Mb and a BUSCO score of 98.4%. Of the assembly, 99.95% was assigned to 30 autosomes and the Z sex chromosome, resulting in a chromosome-level genome assembly with very high completeness.
In this manuscript, the methods of analysis were appropriate and well explained. I note below a few minor questions.
1. The authors wrote "the whole genome will be useful for more detailed evolutionary studies". For how many species of the same genus or tribe have chromosome-level genome sequences been published? This information would be useful to readers who want to perform evolutionary studies and would be better explained in the paragraph.
2. As in other papers submitted to Wellcome Open Research, genome annotation should be performed using BRAKER2 or another pipeline. Annotation information is very useful to readers who use the assembled genome sequences for other studies. Also, the statement in the Data Availability section ("The genome will be annotated using available RNA-Seq data and presented through the Ensembl pipeline at the European Bioinformatics Institute") should be modified.
3. How did the authors distinguish the Z sex chromosome from the autosomes? If the authors obtained male and female reads, the chromosomes can be distinguished based on differences in read mapping between the sexes. Did the authors use such a method? In any case, the method used to distinguish them should be explained.
4. "Hi-C data" should be changed to "Hi-C library" in the Sequencing paragraph of the Methods section.
Is the rationale for creating the dataset(s) clearly described?

Fahad Alqahtani
King Abdulaziz City for Science and Technology, Riyadh, Saudi Arabia
This paper is about the genome sequence of the Streak, Chesias legatella (Denis & Schiffermüller, 1775). The authors report that they successfully reconstructed the genome at the chromosomal level for a male specimen of Chesias legatella. They utilized two sequencing technologies, Pacific Biosciences SEQUEL II and Hi-C Illumina, to assemble the genome, which is approximately 310.3 Mb in size. The completeness of the genome assembly was assessed with BUSCO analysis, which showed that 98.4% of common genes were completely present. However, minor comments should be addressed:
- The photo of the Streak, Chesias legatella, in Figure 1 needs to be improved.
- The NCBI-BioSample (SAMEA14448143) entry should be updated to reflect that the specimen is male, as stated in the paper.
- It is recommended to mention the related species used for guiding the mitochondrial genome annotation with the Mitofinder tool in the methods section.
- The term "Pretext" in the Genome Assembly section should be corrected to "PretextView".

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Bioinformatics
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Reviewer Report 08 January 2024 https://doi.org/10.21956/wellcomeopenres.21381.r71925 © 2024 Lucek K. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Kay Lucek
Department of Environmental Sciences, University of Neuchâtel, Neuchâtel, Switzerland
The genome sequence of Chesias legatella provides great potential for future evolutionary studies. Importantly, the notion that this species could be used as an agent for biological control "probably without consequences" needs to be carefully assessed before such measures are implemented.
Overall the standard pipelines for genome assembly of the Darwin Tree of Life project have been thoroughly implemented.
Although the data availability statement suggests that "available RNA-Seq data" will be used to annotate the genome, no information is provided on how this dataset was generated or on its quality.
Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Partly

Figure 2. Genome assembly of Chesias legatella, ilCheLega1.1: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 310,278,188 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (14,956,362 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (11,018,016 and 7,493,246 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the lepidoptera_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/ilCheLega1.1/dataset/CANAHS01/snail.

Figure 5. Genome assembly of Chesias legatella, ilCheLega1.1: Hi-C contact map of the ilCheLega1.1 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=ZLBbPbLmQzijgU5VCAKOxg.

Table 2. Chromosomal pseudomolecules in the genome assembly of Chesias legatella, ilCheLega1.
Columns: INSDC accession; Chromosome; Size (Mb); GC%
Ethics and compliance issues
The materials that have contributed to this genome note have been supplied by a Darwin Tree of Life Partner. The submission of materials by a Darwin Tree of Life Partner is subject to the Darwin Tree of Life Project Sampling Code of Practice. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project. All efforts are undertaken to minimise the suffering of animals used for sequencing. Each transfer of samples is further undertaken according to a Research Collaboration Agreement or Material Transfer Agreement entered into by the Darwin Tree of Life Partner, Genome Research Limited (operating as the Wellcome Sanger Institute), and in some circumstances other Darwin Tree of Life collaborators.

Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Partly
Are the datasets clearly presented in a useable and accessible format? Partly
Competing Interests: No competing interests were disclosed.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
https://doi.org/10.21956/wellcomeopenres.21381.r75002 © 2024 Alqahtani F. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.