The genome sequence of the White-point, Mythimna albipuncta (Denis & Schiffermüller, 1775)

We present a genome assembly from an individual male Mythimna albipuncta (the White-point; Arthropoda; Insecta; Lepidoptera; Noctuidae). The genome sequence is 698.6 megabases in span. Most of the assembly is scaffolded into 31 chromosomal pseudomolecules, including the Z sex chromosome. The mitochondrial genome has also been assembled and is 15.38 kilobases in length. Gene annotation of this assembly on Ensembl identified 13,679 protein-coding genes.


Background
Mythimna albipuncta, the White-point, is a moth in the family Noctuidae found across much of central and northern Europe, with scattered records from Ukraine, Estonia, Russia, Tunisia and Morocco (GBIF Secretariat, 2023). The adult moth has ochreous-brown forewings with indistinct markings apart from a conspicuous white spot in the position of the reniform stigma. It can be distinguished from the similar Clay moth, Mythimna ferrago (Boyes et al., 2022), by a less elongated forewing shape and by the white spot being either diamond-shaped or rounded.
Through most of the twentieth century, the species was not a resident breeding species in Britain, but was recorded as an infrequent immigrant species along the south and east coasts. There was evidence of a second brood produced by early summer migrant moths in southern counties such as Dorset (Davey, 2009). In the past twenty years, numbers of records have increased dramatically; for example, in Norfolk there were just 4 records in 2000, but this number increased to almost 4000 records in 2021 (NorfolkMoths, 2023). The increase in recording frequency, seen in all southern counties of Britain, is attributed to widespread establishment as a resident breeding species, supplemented by ongoing influx of migrant individuals from France and Spain (Davey, 2009).
The larvae of M. albipuncta feed on various species of grass including cock's-foot (Dactylis glomerata), overwintering at the larval stage. Hibernation is clearly not obligatory, as a second brood can occur in summer in Britain (Davey, 2009) and in captivity adults can be reared from the egg in 2 to 3 months (Heath & Emmet, 1983).
The genome sequence of Mythimna albipuncta was determined and assembled as part of the Darwin Tree of Life project. The complete genome sequence will contribute to the growing set of resources for studying molecular evolution in the Lepidoptera.

Genome sequence report
The genome was sequenced from one male Mythimna albipuncta (Figure 1) collected from Wytham Woods, Oxfordshire, UK (51.77, …). A total of 37-fold coverage in Pacific Biosciences single-molecule HiFi long reads and 57-fold coverage in 10X Genomics read clouds was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 7 missing joins or mis-joins and removed one haplotypic duplication, reducing the scaffold number by 5.71%, and increasing the scaffold N50 by 0.44%. The final assembly has a total length of 698.6 Mb in 33 sequence scaffolds with a scaffold N50 of 23.9 Mb (Table 1). The snailplot in Figure 2 provides a summary of the assembly statistics, while the distribution of assembly scaffolds on GC proportion and coverage is shown in Figure 3. The cumulative assembly plot in Figure 4 shows curves for subsets of scaffolds assigned to different phyla. Most (99.98%) of the assembly sequence was assigned to 31 chromosomal-level scaffolds, representing 30 autosomes and the Z sex chromosome. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 5; Table 2). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited. The mitochondrial genome was also assembled and can be found as a contig within the multifasta file of the genome submission.
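The scaffold N50 and the curation percentages quoted above follow from simple arithmetic on scaffold lengths and counts. The following is a minimal sketch, assuming the conventional definition of Nx (the length of the scaffold at which the cumulative length of size-ordered scaffolds first reaches x% of the assembly). The function names and the toy scaffold lengths are illustrative and are not taken from the assembly pipeline, and the pre-curation count of 35 scaffolds is an assumption inferred from the reported 5.71% reduction to 33 scaffolds.

```python
def nx(lengths, percent=50):
    """Return the Nx scaffold length for a list of integer scaffold lengths.

    Scaffolds are sorted from longest to shortest; Nx is the length of the
    scaffold at which the running total first reaches `percent` of the sum.
    """
    if not lengths:
        raise ValueError("at least one scaffold length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= percent * total:
            return length


def percent_reduction(before, after):
    """Percentage by which a count falls when it goes from `before` to `after`."""
    return 100 * (before - after) / before


# Toy scaffold lengths (invented for illustration, not the real assembly).
toy_lengths = [40, 30, 20, 10]
toy_n50 = nx(toy_lengths, 50)  # 40 + 30 = 70 reaches half of 100
toy_n90 = nx(toy_lengths, 90)  # 40 + 30 + 20 = 90 reaches 90% of 100

# Assumed pre-curation scaffold count of 35, reduced to the reported 33.
scaffold_reduction = round(percent_reduction(35, 33), 2)
```

With the real scaffold lengths from the deposited assembly as input, `nx` would be expected to give the N50 and N90 values shown in Figure 2, provided the same definition is used.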
The resulting annotation includes 25,614 transcribed mRNAs from 13,679 protein-coding and 3,250 non-coding genes.
The specimen was collected and identified by Douglas Boyes (University of Oxford) and snap-frozen on dry ice.
The workflow for high molecular weight (HMW) DNA extraction at the Wellcome Sanger Institute (WSI) includes a sequence of core procedures: sample preparation; sample … et al., 2020) and BUSCO scores (Manni et al., 2021; Simão et al., 2015) were calculated.
Table 3 contains a list of relevant software tool versions and sources.

Genome annotation
The Ensembl gene annotation system (Aken et al., 2016) was used to generate annotation for the Mythimna albipuncta assembly (GCA_929112965.1). Annotation was created primarily through alignment of transcriptomic data to the genome, with gap filling via protein-to-genome alignments of a select set of proteins from UniProt (UniProt Consortium, 2019).

Wellcome Sanger Institute - Legal and Governance
The materials that have contributed to this genome note have been supplied by a Darwin Tree of Life Partner. The submission of materials by a Darwin Tree of Life Partner is subject to the 'Darwin Tree of Life Project Sampling Code of Practice', which can be found in full on the Darwin Tree of Life website here. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project.
Further, the Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use. The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible. The overarching areas of consideration are:
• Ethical review of provenance and sourcing of the material
• Legality of collection, transfer and use (national and international)
Each transfer of samples is further undertaken according to a Research Collaboration Agreement or Material Transfer Agreement entered into by the Darwin Tree of Life Partner, Genome Research Limited (operating as the Wellcome Sanger Institute), and in some circumstances other Darwin Tree of Life collaborators.
The assembly is mostly in 31 pseudochromosomes and is of high quality. In addition, the gene annotation is provided using the Ensembl gene annotation system.
As a minor improvement, it could mention which transcriptome data (if any) were used in the gene annotation.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Genome assemblies of various species, including butterflies and moths.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
Boyes et al. have completed the genome assembly and annotation of the White-point moth (Mythimna albipuncta). After providing a background on the biology and recent natural history of the White-point moth, they describe in detail their standard protocol for genome assembly, which is consistent with other reports from the Darwin Tree of Life project. They provide details on the sampling, protocols, and resulting assemblies. The resulting assembly, which is high-quality, complete, and contiguous, is already deposited in public databases.
I only need clarification on the below points: Both chromosome number and genome size vary dramatically in Lepidoptera (1,2). I have not found previous estimations of genome size, so how was the coverage needed for sequencing determined?
Annabel Whibley
1 The University of Auckland, Auckland, Auckland, New Zealand
2 Grapevine Improvement, Bragato Research Institute, Lincoln, New Zealand
Boyes, Holland and colleagues present the genome assembly and annotation of the White-point moth (Mythimna albipuncta). The report follows the DToL template, with systematic and comprehensive reporting of protocols, sample and assembly properties and metadata and appropriate deposition of resources in public databases. The assembly is a high-quality resource, with excellent contiguity, completeness and accuracy properties. The natural history background is knowledgeable and engaging and this genomic resource will be of value to the scientific community.
Minor comments: I would prefer that the collection location co-ordinates be given with their longitude and latitude qualifiers (these appear in the methods, but not in the initial genome sequence report).
I have repeatedly raised in these Data Note reviews that I believe there is an error in the PB-AMPure bead ratio reported for the clean-up of sheared DNA. In my understanding this is a 0.6x ratio of beads to sample volume, not 1.8x as stated in the template.
I would also prefer to see the k-mer size used in MerquryFK estimates of genome properties noted, along with some details of other non-default parameter settings, for example in mapping Hi-C reads with BWA, though I acknowledge that this information will be contained within the Zenodo Nextflow workflow archives.
Is the rationale for creating the dataset(s) clearly described?

Kuppusamy Sivasankaran
Loyola College, Chennai, Tamil Nadu, India
The genome of the White-point Mythimna albipuncta (Denis & Schiffermüller, 1775) was sequenced with appropriate techniques and assembled using standard software. Through the genome annotation, the authors have identified protein-coding genes, non-coding genes and gene transcripts.

Some minor comments
The authors have given the species name Mythimna albipuncta in full throughout the article. The genus name should be given in full at first mention and abbreviated thereafter, as in M. albipuncta, for the rest of the article.
The number and title of Table 3 appear within the running text on page 9. I think this may be incorrect. The table number and title can be deleted or replaced.
In the first line of the third paragraph of the Background, the phrase "overwintering at the larval stage" after the comma does not read clearly at the end of the sentence. It can be rewritten.
The last sentence of the fourth paragraph of the Background can be modified to "The complete genome sequence will contribute to the growing set of resources for phylogenomic analysis in the order Lepidoptera".
Above all, I confirm that the manuscript meets the necessary scientific standard and is suitable for indexing.
Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Phylogenetic analysis of Noctuoidea moths
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Figure 2. Genome assembly of Mythimna albipuncta, ilMytAlbi1.1: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 698,566,279 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (37,427,200 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (23,908,972 and 16,462,175 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the lepidoptera_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/CAKMYI01/dataset/CAKMYI01/snail.
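The size-ordered binning described in this caption can be illustrated with a short sketch. It assumes that each bin is labelled with the length of the scaffold that contains the end of that bin once scaffolds are laid out from longest to shortest; this is an interpretation of the caption, not BlobToolKit's actual implementation, and the function name and toy lengths are invented for illustration.

```python
from bisect import bisect_left
from itertools import accumulate


def size_ordered_bins(lengths, n_bins=1000):
    """Assign a scaffold length to each of n_bins equal slices of the assembly.

    Scaffolds are ordered longest first; bin i (1-based) takes the length of
    the scaffold whose cumulative span first reaches i/n_bins of the total.
    """
    ordered = sorted(lengths, reverse=True)
    cumulative = list(accumulate(ordered))
    total = cumulative[-1]
    bins = []
    for i in range(1, n_bins + 1):
        # Ceiling division keeps the threshold in exact integer arithmetic.
        position = -(-i * total // n_bins)
        bins.append(ordered[bisect_left(cumulative, position)])
    return bins


# Toy example with ten bins (invented lengths, not the real scaffolds).
bins = size_ordered_bins([40, 30, 20, 10], n_bins=10)

# Span of one bin in the real plot: 0.1% of the 698,566,279 bp assembly.
bin_span_bp = 698_566_279 / 1000
```

Under this interpretation, the bin that ends at 50% of the assembly carries the N50 scaffold length and the bin that ends at 90% carries the N90 length, which is how the orange and pale-orange arcs relate to the dark grey length distribution.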

Figure 3. Genome assembly of Mythimna albipuncta, ilMytAlbi1.1: BlobToolKit GC-coverage plot. Scaffolds are coloured by phylum. Circles are sized in proportion to scaffold length. Histograms show the distribution of scaffold length sum along each axis. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/CAKMYI01/dataset/CAKMYI01/blob.

Figure 4. Genome assembly of Mythimna albipuncta, ilMytAlbi1.1: BlobToolKit cumulative sequence plot. The grey line shows cumulative length for all scaffolds. Coloured lines show cumulative lengths of scaffolds assigned to each phylum using the buscogenes taxrule. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/CAKMYI01/dataset/CAKMYI01/cumulative.

Figure 5. Genome assembly of Mythimna albipuncta, ilMytAlbi1.1: Hi-C contact map of the ilMytAlbi1.1 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=CJQyXHvuRnOqIjJo41kPdQ.

Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
Reviewer Report 14 May 2024
https://doi.org/10.21956/wellcomeopenres.22887.r81888
© 2024 Sivasankaran K. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.