The genome sequence of a cockchafer, Melolontha melolontha (Linnaeus, 1758)

We present a genome assembly from an individual male Melolontha melolontha (a cockchafer; Arthropoda; Insecta; Coleoptera; Scarabaeidae). The genome sequence is 1,656.9 megabases in span. Most of the assembly is scaffolded into 10 chromosomal pseudomolecules, including the X sex chromosome. The mitochondrial genome has also been assembled and is 18.4 kilobases in length. Gene annotation of this assembly on Ensembl identified 17,392 protein coding genes.


Background
The beetle Melolontha melolontha (Linnaeus, 1758) (Coleoptera: Scarabaeidae) is one of several members of the genus commonly known in English as the Cockchafer or May Bug. The name 'melolontha' originates from the ancient Greek for 'fig-sheep' because of the tendency for the beetle to feed on wild figs. It is widely distributed across Europe from the west coast to Ukraine and Turkey in the east, and as far north as southern Scandinavia. It was formerly found in large numbers. Populations have been drastically reduced due to changes in land use and the widespread use of insecticides but have been recovering since the 1980s. In the United Kingdom, it remains locally common in England and Wales with a few scattered records in Scotland.
Cockchafers are large and distinctive enough to make an impression in the public consciousness (Figure 1), appearing in popular art, including paintings, opera, postcards, greeting cards and stamps, and as novelty chocolates (Jones, 2018).
The larvae of M. melolontha feed on roots, taking about three years to develop to pupation. They can cause heavy damage to grasslands, fruit plantations and vineyards. In addition, the adults feed voraciously on the leaves of broadleaf trees, usually oak. In Central Europe the species has been regarded as an agricultural and horticultural pest. A range of control methods have been applied (Malusá et al., 2020), including the use of biological control agents, such as the entomopathogenic fungus Beauveria brongniartii (Kessler et al., 2004) and nematodes (Erbaş et al., 2014).
The specimen used for genome assembly was an adult male. Male cockchafers are known to be strongly attracted to light, and this one flew into a dwelling one warm spring evening on 20 May 2020 in a rural village in Somerset, south-west England. The high-quality genome sequence for a male M. melolontha reported here has been generated as part of the Darwin Tree of Life project. It will aid in understanding the biology, physiology and ecology of the species.

Genome sequence report
The genome was sequenced from one male Melolontha melolontha specimen collected from Yeovil, Somerset, UK (latitude 50.97, longitude -2.68). A total of 33-fold coverage in Pacific Biosciences single-molecule HiFi long reads was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected eight missing joins or mis-joins and removed four haplotypic duplications, reducing the assembly length by 0.15% and the scaffold number by 8.33%, and increasing the scaffold N50 by 1.14%.
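The curation percentages can be inverted to give approximate pre-curation figures. The sketch below is illustrative arithmetic only: it assumes each percentage is expressed relative to the pre-curation value, and it uses the final totals reported in this note (1,656.9 Mb, 55 scaffolds, scaffold N50 of 180.5 Mb). The function name before_curation is hypothetical and is not part of any assembly tool.

```python
def before_curation(after: float, pct_change: float) -> float:
    """Invert a percentage change: after = before * (1 + pct_change / 100)."""
    return after / (1 + pct_change / 100)


# Final (post-curation) values reported in this note.
final_length_mb = 1656.9
final_scaffolds = 55
final_n50_mb = 180.5

# Negative values are reductions, positive values are increases.
length_before = before_curation(final_length_mb, -0.15)
scaffolds_before = before_curation(final_scaffolds, -8.33)
n50_before = before_curation(final_n50_mb, 1.14)

print(f"length before curation ~ {length_before:.1f} Mb")
print(f"scaffolds before curation ~ {round(scaffolds_before)}")
print(f"scaffold N50 before curation ~ {n50_before:.1f} Mb")

# Approximate HiFi yield implied by 33-fold coverage of the final span.
hifi_yield_gb = 33 * final_length_mb / 1000
print(f"HiFi data ~ {hifi_yield_gb:.1f} Gb")
```

Under these assumptions the scaffold count before curation comes out at roughly 60, which fits an 8.33% reduction being 5 scaffolds out of 60.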
The final assembly has a total length of 1,656.9 Mb in 55 sequence scaffolds with a scaffold N50 of 180.5 Mb (Table 1). Most (99.46%) of the assembly sequence was assigned to 10 chromosomal-level scaffolds, representing 9 autosomes and the X sex chromosome. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 2-Figure 5; Table 2). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited. The mitochondrial genome was also assembled and can be found as a contig within the multifasta file of the genome submission.
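For readers unfamiliar with the metric, scaffold N50 is the length L such that scaffolds of length L or longer together cover at least half of the assembly. The following is a minimal sketch of how sequence lengths and N50 could be computed from a multifasta file. It is not the pipeline used for this assembly; the function names and the toy sequence names (including scaffold_MT) are invented for illustration.

```python
from typing import Dict, Iterable, List


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each sequence name (first word of its '>' header) to its length.

    Assumes every header carries a non-empty name.
    """
    lengths: Dict[str, int] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length L such that sequences of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


# Toy multifasta; names and sequences are made up for illustration.
toy = """>scaffold_1
ACGTACGTAC
>scaffold_2
ACGTAC
>scaffold_MT mitochondrion
ACGT
""".splitlines()

toy_lengths = fasta_lengths(toy)
toy_n50 = n50(list(toy_lengths.values()))
print(toy_lengths, toy_n50)
```

With a real submission, the same functions could be applied to an open file handle instead of the toy list, and the mitochondrial contig picked out by whatever name it carries in the deposited file.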

Sample acquisition and nucleic acid extraction
A male Melolontha melolontha specimen (icMelMelo1) was collected from Yeovil, Somerset, UK (latitude 50.97, longitude -2.68) on 20 May 2020. The specimen came to light from a rural garden and was collected by Mike Ashworth (independent researcher). The specimen was identified by the collector and preserved on dry ice.
The sample was prepared and DNA was extracted at the Tree of Life laboratory, Wellcome Sanger Institute (WSI). The icMelMelo1 sample was weighed and dissected on dry ice with tissue set aside for Hi-C sequencing. Thorax tissue was disrupted using a Nippi Powermasher fitted with a BioMasher pestle. High molecular weight (HMW) DNA was extracted using the Qiagen MagAttract HMW DNA extraction kit. HMW DNA was sheared.
RNA was extracted from abdomen tissue of icMelMelo1 in the Tree of Life Laboratory at the WSI using TRIzol, according to the manufacturer's instructions. RNA was then eluted in 50 μl RNAse-free water and its concentration assessed. RNA-Seq libraries were prepared using the NEB Ultra II RNA Library Prep kit. DNA and RNA sequencing were performed by the Scientific Operations core at the WSI on Pacific Biosciences SEQUEL II (HiFi) and Illumina NovaSeq 6000 (RNA-Seq) instruments. Hi-C data were also generated from head and thorax tissue of icMelMelo1 using the Arima2 kit and sequenced on the HiSeq X Ten instrument.

Genome assembly, curation and evaluation
Assembly was carried out with Hifiasm (Cheng et al., 2021) and haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). The assembly was then scaffolded with Hi-C data (Rao et al., 2014) using YaHS (Zhou et al., 2023). The assembly was checked for contamination as described previously (Howe et al., 2021). Manual curation was performed using HiGlass (Kerpedjiev et al., 2018) and PretextView (Harry, 2022). The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva et al., 2022), which runs MitoFinder (Allio et al., 2020) or MITOS (Bernt et al., 2013) and uses these annotations to select the final mitochondrial contig and to ensure the general quality of the sequence.
Table 3 contains a list of relevant software tool versions and sources.

Genome annotation
The Ensembl gene annotation system (Aken et al., 2016) was used to generate annotation for the Melolontha melolontha assembly (GCA_935421215.1). Annotation was created primarily through alignment of transcriptomic data to the genome, with gap filling via protein-to-genome alignments of a select set of proteins from UniProt (UniProt Consortium, 2019).

Ethics and compliance issues
The materials that have contributed to this genome

2. Species Identification: The authors should cite the literature used for species identification.

3. Mitochondrial Genome: I strongly recommend submitting the mitochondrial genome separately with an independent accession number. Moreover, while the authors described the methods, they did not provide information about the mitochondrial genome annotation (coding genes, rRNAs, and tRNAs). It is expected that the authors submit the annotations obtained by MITOS separately.

4. Annotation: The authors should identify and quantify the transposable elements in the genome.

5. Command Lines: It is common practice to include the command lines used with all software. Providing this information would be very useful for readers.

6.
Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are the datasets clearly presented in a useable and accessible format? Partly
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Transcriptomic genomics proteomics
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
2. In relation to the identification of sex chromosomes, I would recommend that information on the sex determination mechanisms or sex chromosomes of this or related species be described in the background.
Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Marine invertebrate, Evo-Devo
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Beulah Garner
Natural History Museum, London, England, UK
In this data note, the authors present the genome assembly of Melolontha melolontha, the European cockchafer. The assembly consists of 10 chromosome scaffolds, which is consistent with other studies of this kind within the Scarabaeoidea.
The latest methods for genome sequencing, assembly, annotation and characterization used in this study are standardized across the Darwin Tree of Life Consortium, being clearly and methodically described here. Sequencing assembly is clearly supplemented with the appropriate illustrations. The standard for generating high quality genome assemblies is high, therefore this genome sequence is a valuable resource for studies into the systematics of this species as well as its relatives. As a curiosity, why were DNA and RNA extracted from different parts of the beetle (Johnson et al., 2013 [Ref 1])?
The title should make clear that this is the European cockchafer and in fact a beetle (Coleoptera). The species name, author and year should be reported consistently. Cite the original publication (Wägele et al., 2011 [Ref 2]). I would like to see references for the statements on M. melolontha decline and for its recovery since the 1980s to improve the overall rationale. I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
1758). The genome assembly utilized three sequencing technologies: Pacific Biosciences SEQUEL II, Hi-C Illumina, and PolyA RNA-Seq Illumina. The completeness of the genome assembly was assessed using BUSCO analysis, which indicated a genome size of 1,656.9 Mb, with 98.9% of common genes completely present. Figure 1 is very detailed, showcasing some features clearly. However, minor comments should be addressed:
○ In the "Genome Sequence Report," the authors have incorrectly stated the total length of the final assembly as "1,656,9 Mb"; this should be corrected to 1,656.9 Mb.

○ Also, in the "Genome Sequence Report" section, the authors state that the number of sequence scaffolds is 55, which differs from the 54 reported in NCBI. This difference may be due to the inclusion of the mitochondrial sequence; if so, the authors should clarify that the mitochondrial sequence is included. This adjustment should also be reflected in Table 1.
○ In Table 1, the authors report the number of contigs as 563, while NCBI lists 562.
○ "Pretext" in the Genome Assembly section should be corrected to "PretextView".
Reviewer Expertise: Bioinformatics
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Figure 2. Genome assembly of Melolontha melolontha, icMelMelo1.2: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 1,656,884,372 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (253,604,678 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (180,468,607 and 119,236,436 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the endopterygota_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/icMelMelo1.2/dataset/CAKXYW02/snail.
note have been supplied by a Darwin Tree of Life Partner. The submission of materials by a Darwin Tree of Life Partner is subject to the Darwin Tree of Life Project Sampling Code of Practice. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project. All efforts are undertaken to minimise the suffering of animals used for sequencing. Each transfer of samples is further undertaken according to a Research Collaboration Agreement or Material Transfer Agreement entered into by the Darwin Tree of Life Partner, Genome Research Limited (operating as the Wellcome Sanger Institute), and in some circumstances other Darwin Tree of Life collaborators.

References
1. Johnson BR, Atallah J, Plachetzki DC: The importance of tissue specificity for RNA-seq: highlighting the errors of composite structure extractions. BMC Genomics. 2013; 14: 586.
2. Wägele H, Klussmann-Kolb A, Kuhlmann M, Haszprunar G, et al.: The taxonomist - an endangered race. A practical proposal for its survival. Front Zool. 2011; 8 (1): 25.
Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Coleoptera taxonomy and systematics.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.