The genome sequence of the Acer Sober, Anarsia innoxiella (Gregersen & Karsholt, 2017)

We present a genome assembly from an individual female Anarsia innoxiella (the Acer Sober; Arthropoda; Insecta; Lepidoptera; Gelechiidae). The genome sequence is 302.9 megabases in span. Most of the assembly is scaffolded into 31 chromosomal pseudomolecules, including the Z and W sex chromosomes. The mitochondrial genome has also been assembled and is 15.25 kilobases in length.


Background
Anarsia innoxiella (Acer Sober) is a leaf-mining micro-moth in the family Gelechiidae, which was described as new to science in 2017. The moth was found to be a distinct species from the very similar A. lineatella, which is a serious pest of fruit trees (Prunus spp.). In contrast, the larvae of A. innoxiella feed on Acer species (Sapindaceae), particularly field maple (Acer campestre). This difference led to the suspicion that A. innoxiella was a separate species, and this was confirmed by DNA barcoding. Taxonomic work found subtle morphological differences between the two species (Gregersen & Karsholt, 2017). This small adult moth (forewing length 6-8 mm) has light and dark grey mottled forewings with black longitudinal streaks, and an especially prominent streak in the middle of the wing. There is some variation in appearance, with some adults appearing light and variegated and others appearing darker and more closely resembling A. lineatella. In males, the species can be confirmed by genital dissection (Palmer, 2017).
As the species is newly described, information about its worldwide distribution is incomplete, but it appears to be widespread in central Europe and southern Scandinavia, and can be locally common (GBIF Secretariat, 2023). In the UK, critical re-examination of previously determined specimens of A. lineatella has found the earliest confirmed record of A. innoxiella to be from West Sussex in 1991. It has since been verified from many southern counties, and it is believed that most, but not all, of the previous records of A. lineatella are actually A. innoxiella. Since 1991, it has occurred annually, and these records suggest that the moth is single-brooded, flying between late June and early August (Palmer, 2017).
A genome sequence from A. innoxiella will be useful for further research into this cryptic group of moths. The genome of A. innoxiella was sequenced as part of the Darwin Tree of Life Project, a collaborative effort to sequence all the named eukaryotic species in the Atlantic Archipelago of Britain and Ireland. Here we present a chromosomally complete genome sequence for A. innoxiella based on a female specimen from Wytham Woods, Oxfordshire, UK.

Genome sequence report
The genome was sequenced from one female Anarsia innoxiella (Figure 1) collected from Wytham Woods, Oxfordshire, UK (51.77, -1.31). A total of 78-fold coverage in Pacific Biosciences single-molecule HiFi long reads was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 23 missing joins or mis-joins and removed two haplotypic duplications, reducing the scaffold number by 20% and increasing the scaffold N50 by 1.3%. The final assembly has a total length of 302.9 Mb in 31 sequence scaffolds with a scaffold N50 of 10.4 Mb (Table 1). Most (99.99%) of the assembly sequence was assigned to 31 chromosomal-level scaffolds, representing 29 autosomes and the W and Z sex chromosomes. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 2-Figure 5; Table 2). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited. The mitochondrial genome was also assembled and can be found as a contig within the multifasta file of the genome submission.
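The scaffold N50 reported above is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly span. The following is a minimal sketch of that calculation, not the pipeline used for this assembly; the function name `nx` and the toy scaffold lengths are hypothetical and chosen only for illustration.

```python
def nx(lengths, fraction=0.5):
    """Return the Nx length for a list of scaffold lengths.

    Scaffolds are taken from longest to shortest until their cumulative
    length reaches `fraction` of the total span; the length of the last
    scaffold added is the Nx value (N50 for 0.5, N90 for 0.9).
    """
    if not lengths:
        raise ValueError("no scaffold lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length


# Hypothetical toy lengths (arbitrary units), NOT taken from the assembly.
toy_lengths = [50, 40, 30, 20, 10]
n50 = nx(toy_lengths, 0.5)
n90 = nx(toy_lengths, 0.9)
print(n50, n90)
```

Applied to the real scaffold lengths of an assembly (for example, parsed from a FASTA index), the same function would give the values that assembly statistics tools report as N50 and N90.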
Metadata for specimens, spectral estimates, sequencing runs, contaminants and pre-curation assembly statistics can be found at https://links.tol.sanger.ac.uk/species/2566270.

Sample acquisition and nucleic acid extraction
A female Anarsia innoxiella (ilAnaInnx2) was collected from Wytham Woods, Oxfordshire (biological vice-county Berkshire), UK (latitude 51.77, longitude -1.31) on 2021-07-17. The specimen was taken from woodland habitat using a light trap. This specimen was used for DNA sequencing. A second specimen (ilAnaInnx1) was collected from Wytham Woods (latitude 51.77, longitude -1.31) on 2020-08-01. This specimen was used for Hi-C scaffolding. Both specimens were collected and identified by Douglas Boyes (University of Oxford) and were preserved on dry ice. The sample was prepared for DNA extraction at the Tree of Life laboratory, Wellcome Sanger Institute (WSI). The ilAnaInnx2 sample was weighed and dissected on dry ice with tissue set aside for Hi-C sequencing. Whole organism tissue was disrupted using a Nippi Powermasher fitted with a BioMasher pestle. DNA was extracted from whole organism tissue of ilAnaInnx2 at the WSI Scientific Operations core using the Qiagen MagAttract HMW DNA kit, according to the manufacturer's instructions.

Sequencing
Pacific Biosciences HiFi circular consensus DNA sequencing libraries were constructed according to the manufacturers' instructions. DNA sequencing was performed by the Scientific Operations core at the WSI on a Pacific Biosciences SEQUEL II (HiFi) instrument. Hi-C data were also generated from whole organism tissue of ilAnaInnx1 using the Arima2 kit and sequenced on the Illumina NovaSeq 6000 instrument.

Genome assembly, curation and evaluation
Assembly was carried out with Hifiasm (Cheng et al., 2021) and haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). The assembly was then scaffolded with Hi-C data (Rao et al., 2014) using YaHS

Maurijn van der Zee
Institute of Biology, Leiden University, Leiden, The Netherlands

Boyes and Boyes present here the genome sequence of Anarsia innoxiella. It is a high-quality genome, sequenced at 78x PacBio coverage. In particular, the use of Hi-C data to scaffold the contigs into chromosomes makes this assembly stand out from what was previously standard in the field of genome sequencing.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes

Competing Interests: No competing interests were disclosed.
Reviewer Expertise: comparative genomics, evo-devo, life histories
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Reviewer Report 12 August 2024 https://doi.org/10.21956/wellcomeopenres.21616.r92243
Overall, the presented assembly will be of great value for studying cryptic diversification in moths, given that this species has only recently been split from another species based on mitochondrial barcode sequences. Once available, it would be interesting to perform a comparative analysis between the two species and contrast former taxonomical inferences.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes

Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Speciation genomics
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Figure 2. Genome assembly of Anarsia innoxiella, ilAnaInnx2.1: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 302,928,861 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (17,173,194 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (10,425,486 and 7,192,583 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the lepidoptera_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/ilAnaInnx2_1/dataset/ilAnaInnx2_1/snail.

Figure 3. Genome assembly of Anarsia innoxiella, ilAnaInnx2.1: BlobToolKit GC-coverage plot. Scaffolds are coloured by phylum. Circles are sized in proportion to scaffold length. Histograms show the distribution of scaffold length sum along each axis. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/ilAnaInnx2_1/dataset/ilAnaInnx2_1/blob.
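One axis of a GC-coverage plot is the GC proportion of each scaffold. As a rough illustration of that quantity, the sketch below computes the GC fraction of a sequence string, excluding uncalled bases (N) from the denominator. This is an assumption made for the example and is not necessarily how BlobToolKit computes the value; the function name `gc_fraction` and the example sequences are hypothetical.

```python
def gc_fraction(sequence):
    """Return the fraction of G and C among called bases (A, C, G, T).

    Case is ignored. Bases other than A, C, G and T (such as N) are
    excluded from the denominator. Returns 0.0 if no bases are called.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    called = gc + at
    return gc / called if called else 0.0


# Hypothetical toy sequences, NOT taken from the assembly.
example_mixed = gc_fraction("ATGCNN")
example_gaps_only = gc_fraction("NNNN")
print(example_mixed, example_gaps_only)
```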

Figure 4. Genome assembly of Anarsia innoxiella, ilAnaInnx2.1: BlobToolKit cumulative sequence plot. The grey line shows cumulative length for all scaffolds. Coloured lines show cumulative lengths of scaffolds assigned to each phylum using the buscogenes taxrule. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/ilAnaInnx2_1/dataset/ilAnaInnx2_1/cumulative.

Figure 5. Genome assembly of Anarsia innoxiella, ilAnaInnx2.1: Hi-C contact map of the ilAnaInnx2.1 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=Vjcc-TrMQ5ua30J26JKCBg.