The genome sequence of the peacock moth, Macaria notata (Linnaeus, 1758)

We present a genome assembly from an individual male Macaria notata (the peacock moth; Arthropoda; Insecta; Lepidoptera; Geometridae). The genome sequence is 394 megabases in span. The majority of the assembly (99.98%) is scaffolded into 29 chromosomal pseudomolecules with the Z sex chromosome assembled. The complete mitochondrial genome was also assembled and is 15.4 kilobases in length.


Background
The peacock moth, Macaria notata (Linnaeus, 1758), is a moth in the family Geometridae. This species has wings with a greyish-white ground colour dorsally, and whiter coloration with many small brown spots and stripes on the underside. The dorsal forewing has some brown patches along the costal margin, and a darker-brown patch covering part of the postmedial line. The forewing also has a notable brown, crescent-shaped concavity just below the apex. The hindwing margin is slightly elongated at the end of vein M3 (Friedrich, 2022). This species thrives in Holarctic deciduous forests and meadows rich in thick brush. Caterpillars tend to feed on leaves of deciduous trees such as oak, birch, alder, poplar, and blackthorn (Friedrich, 2022). The peacock moth's Palearctic distribution ranges from central and northern Europe to eastern Siberia and Japan, and as far south as Iran. There are also populations in Canada and on both the east and west coasts of the northern United States (Moth Photographers Group, 2022). Its flight season ranges from May to September, with two generations in central Europe (Friedrich, 2022). The moth is often attracted to light and can be collected commonly. Genomic research for this species could be used to determine zoogeographical patterns within multiple European states, and its relationship to other closely related species (Can et al., 2018; Õunap et al., 2011).

Genome sequence report
The genome was sequenced from a single male M. notata collected from Wytham Woods, Berkshire, UK (Figure 1). A total of 58-fold coverage in Pacific Biosciences single-molecule HiFi long reads and 104-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 1 misjoin, reducing the scaffold number by 3.13%.
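As a rough illustration of what the fold-coverage figures imply, the sketch below converts coverage into an approximate sequencing yield. It assumes that coverage is defined as total sequenced bases divided by genome size, and it uses the assembly span given in the Figure 2 caption (393,787,475 bp) as a stand-in for genome size. The function and variable names are hypothetical, and the resulting yields are estimates rather than values reported in this note.

```python
# Minimal sketch: approximate sequencing yield implied by a fold-coverage figure.
# Assumption (not stated in the note): coverage = total bases / genome size,
# with the assembly span used as a proxy for genome size.

ASSEMBLY_SPAN_BP = 393_787_475  # assembly span from the Figure 2 caption


def bases_for_coverage(fold_coverage: int, genome_size_bp: int) -> int:
    """Return the number of sequenced bases implied by a fold coverage."""
    return fold_coverage * genome_size_bp


hifi_bases = bases_for_coverage(58, ASSEMBLY_SPAN_BP)   # PacBio HiFi reads
tenx_bases = bases_for_coverage(104, ASSEMBLY_SPAN_BP)  # 10X read clouds

print(f"HiFi: ~{hifi_bases / 1e9:.1f} Gb")  # about 22.8 Gb
print(f"10X:  ~{tenx_bases / 1e9:.1f} Gb")  # about 41.0 Gb
```

Under these assumptions, 58-fold HiFi coverage corresponds to roughly 23 Gb of long-read data and 104-fold 10X coverage to roughly 41 Gb.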
The final assembly has a total length of 394 Mb in 31 sequence scaffolds with a scaffold N50 of 13.8 Mb (Table 1). The majority, 99.98%, of the assembly sequence was assigned to 29 chromosomal-level scaffolds, representing 28 autosomes (numbered by sequence length) and the Z sex chromosome (Figure 2-Figure 5; Table 2).
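For readers unfamiliar with the N50 statistic quoted above, the following sketch shows the usual definition: the length of the shortest scaffold in the smallest set of longest scaffolds that together span at least half of the assembly. The function name and the toy scaffold lengths are hypothetical; the individual scaffold lengths of ilMacNota1.1 are not listed in this section, so the example does not reproduce the reported 13.8 Mb value.

```python
# Minimal sketch of the Nx statistic (N50 when percent=50, N90 when percent=90).
# The toy lengths below are illustrative only and are not taken from the assembly.

def n_stat(lengths, percent=50):
    """Return the Nx length: walking from the longest sequence downwards,
    the length at which the running total first reaches percent% of the span."""
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer arithmetic avoids floating-point rounding at the threshold.
        if running * 100 >= total * percent:
            return length


toy_lengths = [80, 70, 50, 40, 30, 20]   # hypothetical scaffold lengths
print(n_stat(toy_lengths))               # 70
print(n_stat(toy_lengths, percent=90))   # 30
```

In the toy example the total span is 290, so the two longest sequences (80 + 70 = 150) are the first to cover half of it, giving an N50 of 70.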

Raw data accessions
PacificBiosciences SEQUEL II: ERR7254658
[...] phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.

Sample acquisition and nucleic acid extraction
A single M. notata specimen (ilMacNota1) was collected using a light trap from Wytham Woods, Berkshire, UK (latitude 51.772, longitude -1.338) by Douglas Boyes (University of Oxford). The specimen was identified by Douglas Boyes and snap-frozen on dry ice.
DNA was extracted at the Tree of Life laboratory, Wellcome Sanger Institute. The ilMacNota1 sample was weighed and dissected on dry ice with tissue set aside for Hi-C sequencing. Abdomen tissue was disrupted using a Nippi Powermasher fitted with a BioMasher pestle. Fragment size analysis of 0.01-0.5 ng of DNA was then performed using an Agilent FemtoPulse. High molecular weight (HMW) DNA was extracted using the Qiagen MagAttract HMW DNA extraction kit. Low molecular weight DNA was removed from a 200-ng aliquot of extracted DNA using 0.8X AMPure XP purification kit prior to 10X Chromium sequencing; a minimum of 50 ng DNA was submitted for 10X sequencing. HMW DNA was sheared into an average fragment size of 12-20 kb in a Megaruptor 3 system with speed setting 30. Sheared DNA was purified by solid-phase reversible immobilisation using AMPure [...] generated in the Tree of Life laboratory from head and thorax of ilMacNota1 using the Arima v2 kit and sequenced on a NovaSeq 6000 instrument.

Genome assembly
Assembly was carried out with Hifiasm (Cheng et al., 2021) [...]

Boyes and colleagues report the genome sequence of the peacock moth Macaria notata (Linnaeus, 1758). This species can be found in woodland habitat in England and Wales, and there are also reports of its occurrence in Scotland and Ireland. Molecular data for this species were scarce prior to this report, and are mainly confined to COI sequences (plus a few other genes) deposited in the NCBI database. This new genome resource will be very useful for further studies, including but not limited to understanding its evolutionary relationships with other moths, identifying whether there could be cryptic species, or revealing population structure in relation to habitats or geographical locations.
Judging from the summary statistics, this genome resource is excellent, with high BUSCO numbers, high sequence continuity (contig and scaffold N50), and the majority of sequences contained in the 29 pseudochromosomes (plus the mitochondrion). To sum up, this is a valuable contribution.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: I published with Peter Holland more than three years ago, and confirm that this potential conflict of interest did not affect my ability to write an objective and unbiased review of the article.
Reviewer Expertise: Genomics, evolution, invertebrates
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Sam Whiteford
University of Liverpool, Liverpool, UK
The authors report a genome assembly for the moth species Macaria notata, representing a mosaic of the two haplotypes present in the sampled individual. The alternate mosaic of haplotypes is also made available. The sequencing technologies used and the assembly methodologies applied are appropriate for generating very high-quality genome representations. Some phenotypic features and ecological background of the organism are given for context. The inclusion of a photograph of the sequenced specimen is nice, and the plots are informative of the data collected and the quality assessments performed. The interactive plots are a nice touch, and useful for taking a closer look at the data, particularly the Hi-C maps. The DNA extraction methodology is described in a good level of detail. The underlying raw data appear to be accessible. The authors provide some standard genome quality assessments such as the BUSCO score and coverage vs GC plots. BUSCO analysis shows a very low number of missing, fragmented and duplicated genes, suggesting that genic regions are complete and an accurate haploid representation has been produced.

I have a few comments:
The date that the individual was sampled may be useful metadata to include if available.
A copy or even a rough reference version of the code used for the assembly itself and the subsequent QC steps would be useful. I think the assembly portion is largely reproducible from the written description; however, the exact parameters used for the various programs are important. The manual curation steps are a little opaque (e.g. which underlying scaffold was broken and where, and what the resulting scaffolds are called), but I'm sympathetic to the difficulty of publishing a perfectly reproducible genome assembly (especially given the scale of DToL). Also, the availability of multiple intermediate genome assemblies for the sake of complete transparency can create unnecessary confusion.
One of the chromosomes/pseudomolecules is labelled as the Z chromosome, but I do not see a description of how this assignment was made. This may be an important detail for researchers interested in Z evolutionary history. For this reason, I scored the "replication detail" question as partial. Inclusion of the Z chromosome assignment methodology would improve the replicability.
There are some useful plots presented in the Tree of Life QC webpages that would be nice to include in these genome notes. In particular, the k-mer results from the HiFi data would justify the resulting assembled genome size and give an estimate of polymorphism and repetitiveness (although the latter points may be beyond the scope of a genome note).
The snail plot is an aesthetically appealing presentation, but the radial layout combined with the multiple layers might make it tricky to interpret for a general audience.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Partly
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Lepidoptera genomics/genetics
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Figure 1. Image of the Macaria notata specimen taken prior to preservation and processing.

Figure 2. Genome assembly of Macaria notata, ilMacNota1.1: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 393,787,475 bp assembly. The distribution of chromosome lengths is shown in dark grey with the plot radius scaled to the longest chromosome present in the assembly (23,248,439 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 chromosome lengths (13,783,192 and 10,416,434 bp), respectively. The pale grey spiral shows the cumulative chromosome count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the lepidoptera_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/ilMacNota1.1/dataset/CAKMJI01/snail.
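The binning described in the Figure 2 caption can be illustrated with a short sketch. It orders sequences from longest to shortest, divides the cumulative span into equal bins, and records the length of the sequence that covers the end of each bin. This is a reading of the caption only, not BlobToolKit's own code; the function name and the toy lengths are hypothetical, and only the assembly span is taken from the caption.

```python
# Minimal sketch of size-ordered binning as described in the Figure 2 caption.
# This is NOT BlobToolKit's implementation; it is an illustrative assumption.

def snail_profile(lengths, n_bins=1000):
    """For each of n_bins equal slices of the total span, return the length of
    the sequence (longest first) that covers the end of that slice."""
    if not lengths:
        raise ValueError("at least one sequence length is required")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    profile = []
    index = 0
    cumulative = ordered[0]
    for bin_number in range(1, n_bins + 1):
        bin_end = total * bin_number // n_bins  # end coordinate of this bin
        while cumulative < bin_end:
            index += 1
            cumulative += ordered[index]
        profile.append(ordered[index])
    return profile


# Toy example with hypothetical lengths and 10 bins instead of 1,000.
toy_profile = snail_profile([60, 30, 10], n_bins=10)
print(toy_profile)  # [60, 60, 60, 60, 60, 60, 30, 30, 30, 10]

# Width of one bin for the span stated in the caption (0.1% of the assembly).
bin_width_bp = 393_787_475 / 1000
print(bin_width_bp)  # roughly 393,787 bp per bin
```

Read this way, the bin at the halfway point of the profile carries the N50 length and the bin at 90% carries the N90 length, which is consistent with the arcs described in the caption.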

Figure 5. Genome assembly of Macaria notata, ilMacNota1.1: Hi-C contact map. Hi-C contact map of the ilMacNota1.1 assembly, visualised in HiGlass. Chromosomes are arranged in size order from left to right and top to bottom. The interactive Hi-C map can be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=Ng_bHCybQ9aZiaw8aC3a4w.

Reviewer Report 07 February 2023
https://doi.org/10.21956/wellcomeopenres.20079.r54227
© 2023 Whiteford S. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Table 2. Chromosomal pseudomolecules in the genome assembly of Macaria notata, ilMacNota1.1. Columns: INSDC accession; Chromosome; Size (Mb); GC%.
The materials that have contributed to this genome note have been supplied by a Darwin Tree of Life Partner. The submission of materials by a Darwin Tree of Life Partner is subject to the Darwin Tree of Life Project Sampling Code of Practice. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project. The genome sequence is released openly for reuse. The M. notata genome sequencing initiative is part of the Darwin Tree of Life (DToL) project. All raw sequence data and the assembly have been deposited in INSDC databases. The genome will be annotated and presented through the Ensembl pipeline at the European Bioinformatics Institute. Raw data and assembly accession identifiers are reported in Table 1.

Integrative Approach to Understand the Biogeography, Taxonomy and Ecology of the Macroheteroceran Fauna of the Amanos Mountains in Southern Turkey. J Entomol Res Soc. 2018; 20(2): 91-101.
Challis R, Richards E, Rajan J, et al.: BlobToolKit-Interactive Quality