The genome sequence of the Fulvous Clothes Moth, Tinea semifulvella (Haworth, 1828)

We present a genome assembly from an individual male Tinea semifulvella (the Fulvous Clothes Moth; Arthropoda; Insecta; Lepidoptera; Tineidae). The genome sequence is 596.6 megabases in span. The whole assembly is scaffolded into 45 chromosomal pseudomolecules, with the Z sex chromosome assembled. The mitochondrial genome has also been assembled and is 16.8 kilobases in length. Gene annotation of this assembly on Ensembl has identified 11,516 protein-coding genes.


Background
Tinea semifulvella is a micro-moth in the family Tineidae, a cosmopolitan group of moths, many of which are associated with human habitation and some of which have become pests. Although small (forewing length 6-10 mm), T. semifulvella, unlike many moths in the genus, is distinctive. It has a reddish head and a dirty white forewing, the final third of which is orangey-brown. There is a small dark dot on the back.
The moth is common and widespread throughout Britain, but more local in its distribution in Ireland. It occurs throughout Europe, and as far east as Iran (Gaedike, 2019). It is on the wing between May and October and may well be double-brooded in the southern part of its UK range (Sterling & Parsons, 2018). The moth is found in a range of habitats and comes to light. It is associated with birds' nests, particularly those which occur in the open. This is unusual, as most other moths found in bird nests have an association with hole-nesting species (Boyes & Lewis, 2019). It has been suggested that this might be a strategy to avoid intraspecific competition (Boyes, 2018). The moth has also been found on wool out of doors and on dead animals (Sterling & Parsons, 2018), and in hen-houses (Gaedike, 2019), suggesting it is keratinophagous.
The genome of T. semifulvella was sequenced as part of the Darwin Tree of Life Project, a collaborative effort to sequence all named eukaryotic species in the Atlantic Archipelago of Britain and Ireland. Here we present a chromosomally complete genome sequence for Tinea semifulvella based on one male specimen from Wytham Woods, Oxfordshire, UK.

Genome sequence report
The genome was sequenced from one male T. semifulvella specimen (Figure 1) collected from a grassland area of Wytham Woods (latitude 51.78, longitude -1.32). A total of 45-fold coverage in Pacific Biosciences single-molecule HiFi long reads and 62-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 21 missing or mis-joins and removed one haplotypic duplication, reducing the scaffold number by 30.77% and increasing the scaffold N50 by 5.66%.
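The percentages above can be turned back into approximate pre-curation values with simple arithmetic. The sketch below is only a back-calculation from figures quoted in this note (45 final scaffolds, and the 12,715,305 bp scaffold N50 given in the Figure 2 caption); the pre-curation metrics themselves are not reported here, so the results are rounded estimates, and the helper names are hypothetical.

```python
# Values quoted in the text and in the Figure 2 caption.
FINAL_SCAFFOLDS = 45
SCAFFOLD_REDUCTION_PCT = 30.77
FINAL_N50_BP = 12_715_305
N50_INCREASE_PCT = 5.66


def value_before_decrease(after, pct):
    """Invert 'reduced by pct percent': before * (1 - pct/100) = after."""
    return after / (1 - pct / 100)


def value_before_increase(after, pct):
    """Invert 'increased by pct percent': before * (1 + pct/100) = after."""
    return after / (1 + pct / 100)


# Estimates only: the published percentages are rounded to two decimals.
pre_scaffolds = value_before_decrease(FINAL_SCAFFOLDS, SCAFFOLD_REDUCTION_PCT)
pre_n50_bp = value_before_increase(FINAL_N50_BP, N50_INCREASE_PCT)

print(f"Estimated scaffolds before curation: {pre_scaffolds:.1f}")
print(f"Estimated scaffold N50 before curation: {pre_n50_bp / 1e6:.2f} Mb")
```

Under these assumptions the figures correspond to about 65 scaffolds and a scaffold N50 of roughly 12.0 Mb before curation.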
The final assembly has a total length of 596.6 Mb in 45 sequence scaffolds with a scaffold N50 of 12.7 Mb (12,715,305 bp) (Table 1). All of the assembly sequence was assigned to 45 chromosomal-level scaffolds, representing 44 autosomes and the Z sex chromosome. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 2-Figure 5; Table 2). The assembly has a BUSCO v5.3.2 (Manni et al., 2021) completeness of 95.3% (single 94.5%, duplicated 0.8%) using the lepidoptera_odb10 reference set. While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.
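For readers unfamiliar with the scaffold N50 statistic, the following minimal sketch shows the standard definition: the length of the scaffold at which the running total of size-ordered scaffolds first reaches half of the assembly span. The function name and the toy lengths are illustrative assumptions, not the ilTinSemi1.1 data or the authors' code.

```python
def nx(lengths, percent=50):
    """Return the Nx length in the same unit as `lengths`.

    Scaffolds are sorted from longest to shortest; Nx is the length of the
    scaffold at which the running total first reaches `percent` percent of
    the total span. Integer arithmetic avoids floating-point edge cases.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running * 100 >= percent * total:
            return length


def bp_to_mb(bp):
    """Convert base pairs to megabases (1 Mb = 1,000,000 bp)."""
    return bp / 1_000_000


# Toy scaffold lengths in bp (hypothetical values for illustration only).
toy_lengths = [40, 30, 20, 10]
print(nx(toy_lengths))        # N50 of the toy set
print(nx(toy_lengths, 90))    # N90 of the toy set

# The scaffold N50 quoted in the Figure 2 caption, expressed in Mb.
print(round(bp_to_mb(12_715_305), 1))
```

Expressing the N50 in base pairs or in megabases gives the same quantity; only the unit label changes.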

Sample acquisition and nucleic acid extraction
Three T. semifulvella specimens (ilTinSemi1, ilTinSemi2 and ilTinSemi3) were collected using a light trap in Wytham Woods, Oxfordshire (biological vice-county: Berkshire), UK (latitude 51.78, longitude -1.32) on the following dates: 21 September 2019, 13 June 2020 and 5 July 2020, respectively. The specimens were collected and identified by Douglas Boyes (University of Oxford), and snap-frozen on dry ice.
DNA was extracted from whole organism tissue of ilTinSemi1 at the Wellcome Sanger Institute (WSI) Scientific Operations core using the Qiagen MagAttract HMW DNA kit, according to the manufacturer's instructions.
RNA was extracted from whole organism tissue of ilTinSemi2 in the Tree of Life Laboratory at the WSI using TRIzol, according to the manufacturer's instructions. RNA was then eluted in 50 μl RNAse-free water and its concentration assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit RNA Broad-Range (BR) Assay kit. The integrity of the RNA was analysed using the Agilent RNA 6000 Pico Kit and Eukaryotic Total RNA assay.

Sequencing
Pacific Biosciences HiFi circular consensus and 10X Genomics read cloud DNA sequencing libraries were constructed according to the manufacturers' instructions. Poly(A) RNA-Seq libraries were constructed using the NEB Ultra II RNA Library Prep kit. DNA and RNA sequencing was performed by the Scientific Operations core at the WSI on Pacific Biosciences SEQUEL II (HiFi), Illumina HiSeq 4000 (RNA-Seq) and HiSeq X Ten (10X) instruments. Hi-C data were also generated from ilTinSemi3 using the Arima v2 kit and sequenced on the Illumina NovaSeq 6000 instrument.

Genome assembly
Assembly was carried out with Hifiasm (Cheng et al., 2021) and haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). One round of polishing was performed by aligning 10X Genomics read data to the assembly with Long Ranger ALIGN, calling variants with freebayes (Garrison & Marth, 2012). The assembly was then scaffolded

Genome annotation
The Ensembl gene annotation system (Aken et al., 2016) was used to generate annotation for the T. semifulvella genome assembly (GCA_910589645.1). Annotation was created primarily

The genome of the Fulvous Clothes Moth, Tinea semifulvella (Haworth, 1828) was sequenced and assembled using appropriate techniques. In this assembly, 45 chromosomal pseudomolecules, including the Z sex chromosome, have been confirmed.

Minor comment
In the first line of the Background section, the first letter of the species name (Tinea semifulvella) is not in italics. It should be italicised.
Above all, I confirm that the manuscript meets the necessary scientific standard and is suitable for indexing.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Phylogenetic analysis of moths
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Saurav Baral
Tata Institute of Fundamental Research, Bengaluru, India
The findings presented in this paper contribute significantly to the expanding repository of Lepidoptera genomes. In addition to presenting a chromosomally resolved genome, the authors have delivered a meticulous annotation, enhancing the overall utility of the amassed dataset. Although over 500 lepidopteran genomes have been assembled to the chromosomal level, only a small fraction (fewer than 50) have comprehensive annotations. The near-complete annotation achieved in this study holds great promise for advancing research in comparative genomics. Such a complete genomic dataset contributes significantly to the study of genes, gene families and genome evolution, and I would like to both thank and congratulate the authors for producing it. With that said, there are still improvements that can be made to how the data are presented in the paper.
The paper is concise, which is good, but some sections can be improved.

Issues:
1. The current image of the specimen, Figure 1, does not contain a proper scale. Please add a proper image of the species, with an appropriate scale.
2. "The moth is found in a range of habitats and comes to light." Do you mean the moth is diurnal? Please use appropriate scientific terms and avoid ambiguous statements.
3. "Manual assembly curation corrected 21 missing or mis-joins and removed one haplotypic duplication, reducing the scaffold number by 30.77% and increasing the scaffold N50 by 5.66%." Please provide the exact metrics for these in addition to the percentages.
4. "The final assembly has a total length of 596.6 Mb in 45 sequence scaffolds with a scaffold N50 of 12715305 Mb". The N50 is too large.
5. "All of the assembly sequence was assigned to 45 chromosomal-level scaffolds, representing 44 autosomes and the Z sex chromosome." Please clarify whether there is prior information about the number of chromosomes, or whether closely related species show a similar chromosome number.
6. Please add a table comparing this genome assembly/annotation with the assembly/annotation of some related Lepidoptera. This would improve clarity and provide a comparative context for understanding this analysis.
This project employs standard methodologies, yielding results that align with findings from analogous lepidopteran species. However, the authors are encouraged to provide more comprehensive details regarding the annotation pipeline in the Methods section, which currently lacks specificity. The paper also requires modifications in the Background section for clarity and scientific precision. Additionally, incorporating a comparative analysis within the Genome Sequence Report would enrich the contextual understanding of the metrics presented in the paper. While the quality of the work and the results is commendable, their excellence may not be fully apparent without a broader comparative context. With these minor corrections, the paper may be accepted for indexing.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Partly
Are the datasets clearly presented in a useable and accessible format? Partly
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: I study molecular evolution across gene families using sequences extracted from published genomic datasets.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
The assembly includes forty-five chromosomal pseudomolecules and the mitochondrial genome, along with the Z sex chromosome. The genome assembly is of high quality, with a BUSCO completeness score of 95.3%. Here are some minor comments:
1. Is the scaffold N50 value 12715305 Mb? Please verify the value and unit.
2. Could you provide the BUSCO estimation for the annotated sets of protein-coding genes?
Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Population genetics, genomics, and pest control.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Figure 2. Genome assembly of Tinea semifulvella, ilTinSemi1.1: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 596,601,316 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (35,936,759 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (12,715,305 and 9,702,360 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the lepidoptera_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/ilTinSemi1_1.1/dataset/ilTinSemi1_1.1/snail.
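The size-ordered binning described in this caption can be illustrated with a short sketch. It is one plausible reading of the caption, not BlobToolKit's own implementation: scaffolds are sorted from longest to shortest, the assembly span is cut into equal bins, and each bin records the length of the scaffold that covers the end of that bin. The function name and the toy lengths are hypothetical.

```python
def binned_scaffold_lengths(lengths, n_bins=1000):
    """For each of n_bins equal slices of the assembly span, return the
    length of the scaffold covering the end of that slice, with scaffolds
    ordered from longest to shortest (an assumed reading of the caption)."""
    if not lengths:
        raise ValueError("no scaffold lengths given")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    result = []
    idx = 0
    cumulative = ordered[0]
    for b in range(1, n_bins + 1):
        # Advance to the scaffold that reaches the end of bin b.
        # Integer comparison: cumulative / total < b / n_bins.
        while cumulative * n_bins < b * total:
            idx += 1
            cumulative += ordered[idx]
        result.append(ordered[idx])
    return result


# Toy example with four hypothetical scaffolds and ten bins.
print(binned_scaffold_lengths([40, 30, 20, 10], n_bins=10))

# Span of one bin when the 596,601,316 bp assembly is cut into 1,000 bins.
bin_span_bp = 596_601_316 / 1000
print(round(bin_span_bp))
```

With 1,000 bins, each bin covers roughly 0.6 Mb of the assembly, which matches the 0.1% per bin stated in the caption.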

Figure 5. Genome assembly of Tinea semifulvella, ilTinSemi1.1: Hi-C contact map. Hi-C contact map of the ilTinSemi1.1 assembly against the specimen ilTinSemi3, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=HL8wleGpRM2Mr4BHie6J_Q.

Reviewer Report 03 May 2024
https://doi.org/10.21956/wellcomeopenres.21153.r77434
© 2024 Baral S. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Table 2. Chromosomal pseudomolecules in the genome assembly of Tinea semifulvella, ilTinSemi1.
Columns: INSDC accession; Chromosome; Size (Mb); GC%.
Darwin Tree of Life Partner. The submission of materials by a Darwin Tree of Life Partner is subject to the Darwin Tree of Life Project Sampling Code of Practice. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project. All efforts are undertaken to minimise the suffering of animals used for sequencing. Each transfer of samples is further undertaken according to a Research Collaboration Agreement or Material Transfer Agreement entered into by the Darwin Tree of Life Partner, Genome Research Limited (operating as the Wellcome Sanger Institute), and in some circumstances other Darwin Tree of Life collaborators.