The genome sequence of the long-horned flat-body, Carcina quercana (Fabricius, 1775)

We present a genome assembly from an individual male Carcina quercana (the long-horned flat-body; Arthropoda; Insecta; Lepidoptera; Depressariidae). The genome sequence is 409 megabases in span. Most of the assembly (99.96%) is scaffolded into 30 chromosomal pseudomolecules, including the assembled Z sex chromosome. The complete mitochondrial genome was also assembled and is 15.3 kilobases in length. Gene annotation of this assembly on Ensembl identified 18,108 protein-coding genes.


Background
The long-horned flat-body, Carcina quercana (Fabricius, 1775), is a micromoth belonging to the Depressariidae family. It can be identified by its pastel purple and yellow wing patterning and notably long antennae. In the Western Palaearctic, C. quercana is widespread in Europe, including the UK, and reaches its eastern limit in the Middle East. The species has also recently been introduced into North America. Across its range, C. quercana is generally common but rarely abundant.
The species prefers woodland and garden habitats and is moderately polyphagous on deciduous trees, favouring species within the Fagaceae family (Quercus and Fagus spp.) and the Rosaceae family. Adults fly from May to October, peaking in July, and produce larvae that skeletonise under a silken web. C. quercana has been described as a minor pest of Rosaceae fruit trees, such as apple, pear, cherry and plum among others (Alford, 2016).
Carcina quercana represents a lineage otherwise not present in Europe and is thus of phylogenomic value. It is the only UK representative of the Peleopodidae (hitherto usually included as a subfamily of Depressariidae). This gelechioid family turns out to have previously unsuspected richness in the Old World tropics, containing various lineages previously placed in Oecophoridae and Depressariidae (Wang & Li, 2020), but the species has not been included generally in multi-genomic studies to date.

Genome sequence report
The genome was sequenced from a single male C. quercana (Figure 1) collected from the Ant Hills region, Wytham, Berkshire, UK. A total of 57-fold coverage in Pacific Biosciences single-molecule HiFi long reads and 99-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 15 missing joins or misjoins, reducing the assembly size by 0.23% and the scaffold number by 32.61%, and increasing the scaffold N50 by 10.28%.
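The summary statistics quoted in this report (fold coverage, scaffold N50 and the percentage changes after curation) follow from simple arithmetic on scaffold lengths and read yield. The Python sketch below is illustrative only and is not the pipeline used for this assembly. The function names are hypothetical, the scaffold lengths are toy values, and two inputs are assumptions: that coverage is expressed relative to the 409 Mb assembly span, and that the scaffold count before curation was 46. The value of 46 is inferred because a drop from 46 to the final 31 scaffolds reproduces the reported 32.61% reduction; it is not stated in the text.

```python
def implied_yield_bp(fold_coverage, genome_span_bp):
    """Approximate total sequenced bases implied by a given fold coverage."""
    return fold_coverage * genome_span_bp


def scaffold_n50(lengths):
    """Return the length L such that scaffolds of length >= L together
    contain at least half of the total assembly length."""
    if not lengths:
        raise ValueError("at least one scaffold length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def percent_change(before, after):
    """Signed percentage change from `before` to `after`."""
    return 100.0 * (after - before) / before


# Assumption: coverage is taken relative to the 409 Mb assembly span.
GENOME_SPAN_BP = 409_000_000
hifi_bp = implied_yield_bp(57, GENOME_SPAN_BP)   # about 23.3 Gb of HiFi reads
tenx_bp = implied_yield_bp(99, GENOME_SPAN_BP)   # about 40.5 Gb of 10X reads

# Toy scaffold lengths (arbitrary units), not data from this assembly.
toy_n50 = scaffold_n50([10, 20, 30, 40, 50])     # -> 40

# Assumption: 46 scaffolds before curation (inferred, see lead-in), 31 after.
scaffold_change = percent_change(46, 31)         # about -32.61
```

Under these assumptions, the HiFi and 10X data sets would correspond to roughly 23.3 Gb and 40.5 Gb of sequence respectively. If coverage was instead calculated against an estimated genome size, the implied yields would differ accordingly.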
The final assembly has a total length of 409 Mb in 31 sequence scaffolds with a scaffold N50 of 15.7 Mb (Table 1). Most of the assembly sequence (99.96%) was assigned to 30 chromosomal-level scaffolds, representing 29 autosomes (numbered by sequence length) and the Z sex chromosome (Figures 2-5; Table 2).
DNA was extracted at the Scientific Operations Core, Wellcome Sanger Institute. The ilCarQuer1 sample was weighed and dissected on dry ice. Whole organism tissue was disrupted by manual grinding in a lysis buffer with a disposable pestle. Fragment size analysis of 0.01-0.5 ng of DNA was then performed using an Agilent FemtoPulse. High molecular weight (HMW) DNA was extracted using the Qiagen MagAttract HMW DNA extraction kit. Low molecular weight DNA was removed from a 200 ng aliquot of extracted DNA using the 0.8X AMPure XP purification kit prior to 10X Chromium sequencing, and a minimum of 50 ng DNA was submitted for 10X sequencing. HMW DNA was sheared to an average fragment size of 12-20 kb in a Megaruptor 3 system with speed setting 30. Sheared DNA was purified by solid-phase reversible immobilisation using AMPure PB beads with a 1.8X ratio of beads to sample to remove the shorter fragments and concentrate the DNA sample. The concentration of the sheared and purified DNA was assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit dsDNA High Sensitivity Assay kit. Fragment size distribution was evaluated by running the sample on the FemtoPulse system.

Sequencing
Pacific Biosciences HiFi circular consensus and 10X Genomics Chromium read cloud sequencing libraries were constructed according to the manufacturers' instructions. Sequencing was performed by the Scientific Operations core at the Wellcome Sanger Institute on Pacific Biosciences SEQUEL II (HiFi) and Illumina HiSeq (10X) instruments. Hi-C data were generated in the Tree of Life laboratory from whole organism

Beatriz Vicoso
Institute of Science and Technology Austria, Klosterneuburg, Austria
A chromosome-level assembly, obtained from PacBio HiFi reads and Hi-C data for the micromoth Carcina quercana, is presented in this manuscript. It consists of 30 pseudo-chromosomes, including the Z chromosome, that include over 99% of the assembled sequence. As expected for such a high-quality assembly, the BUSCO score is >98%. This therefore represents a great resource for a species for which there were until now no genomic resources. I do not have any criticism, just some very minor suggestions for text edits.
The rationale for getting the data is well described, as this species is the only representative of the Peleopodidae subfamily in the UK. Some words about what resources are available for this group outside of the UK would have been helpful (I could not find any other than the Illumina sequencing of Acria ceramitis, and that seems perhaps worth emphasizing).
The protocols are appropriate and the methods very well described. The only thing that could have been added is information about which parameters were changed from the defaults for each of the software tools listed in Table 3 (or a statement that default parameters were used throughout).
A short sentence explaining conceptually how the gene annotation was performed without RNA would also have been helpful.
The data are easily accessed once one finds the correct page on the Darwin Tree of Life website. A direct link to this page would have been helpful.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.