The genome sequence of the two-spot ladybird, Adalia bipunctata (Linnaeus, 1758)

We present a genome assembly from an individual male Adalia bipunctata (the two-spot ladybird; Arthropoda; Insecta; Coleoptera; Coccinellidae). The genome sequence is 475 megabases in span. Most of the assembly (94.87%) is scaffolded into 11 chromosomal pseudomolecules, including the X and Y sex chromosomes. The complete mitochondrial genome was also assembled and is 21.2 kilobases in length. Gene annotation of this assembly in Ensembl identified 13,611 protein-coding genes.


Background
The two-spot ladybird, Adalia bipunctata (Linnaeus, 1758), is a Holarctic species native to Europe, Central Asia and North America. A. bipunctata was once the second most common ladybird in the US, but the invasion of a predatory Asian species, the harlequin ladybird Harmonia axyridis, has seen a rapid decline in the two-spot population over the last decade (Kenis et al., 2020). This widespread species occupies a variety of habitats, from deciduous or coniferous woodlands to orchards and crops. In temperate regions, adults appear in March and are known to overwinter in large groups, along with other common species, among loose bark, leaf litter and outhouses. Both adult and larval forms of A. bipunctata are voracious aphidophagous hunters, making them suitable biocontrol agents against aphids in agricultural systems (Riddick, 2017). Two-spots exhibit complex polymorphism: typical morphs have vivid red elytra, each conspicuously marked with a large black spot in the middle (Figure 1), whilst melanic morphs display black elytra with red spots (Rutkowski et al., 2019).
The two-spot ladybird is a classic model for population genetics studies, and a complete genome assembly of A. bipunctata may help to characterise the genetic diversity underpinning phenotypic polymorphisms among populations across different environments (Gautier et al., 2018).
We present a complete genome assembly for A. bipunctata as part of the Darwin Tree of Life project, which aims to sequence the genomes of 70,000 species of eukaryotic organisms in Britain and Ireland.

Genome sequence report
The genome was sequenced from an individual male A. bipunctata (icAdaBipu1) purchased live from Dragonfli, Essex, UK. A total of 48-fold coverage in Pacific Biosciences single-molecule HiFi long reads and 80-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 72 missing joins or misjoins and removed 17 haplotypic duplications, reducing the assembly size by 2.45% and the scaffold number by 34.08%, and increasing the scaffold N50 by 105.95%. The final assembly has a total length of 475 Mb in 118 sequence scaffolds with a scaffold N50 of 45.9 Mb (Table 1). Most of the assembly sequence (94.87%) was assigned to 11 chromosomal-level scaffolds, representing 9 autosomes (numbered by sequence length) and the X and Y sex chromosomes (Figure 2-Figure 5; Table 2).
While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.
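For readers less familiar with the scaffold N50 and the percentage changes quoted in this report, the following is a minimal sketch of how such figures are conventionally computed. It is not the pipeline used for this assembly; the function names and the toy scaffold lengths are hypothetical and chosen only for illustration.

```python
def scaffold_n50(lengths):
    """Return the N50 of a collection of scaffold lengths (in bp).

    N50 is the length of the scaffold at which the running total of
    lengths, taken from longest to shortest, first reaches at least
    half of the total assembly length.
    """
    if not lengths:
        raise ValueError("at least one scaffold length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


# Hypothetical toy lengths, NOT the real icAdaBipu1 scaffolds.
toy_lengths = [50, 40, 30, 20, 10]
# Total is 150, half is 75; 50 + 40 = 90 reaches it, so N50 is 40.
print(scaffold_n50(toy_lengths))
```

Applied to the scaffold lengths before and after curation, `percent_change` would give values of the kind reported above (for example, the change in scaffold N50), provided both sets of lengths are available.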

Genome annotation report
The A. bipunctata genome was annotated using the Ensembl rapid annotation pipeline (Table 1; https://rapid.ensembl.org/Adalia_bipunctata_GCA_910592335.1/Info/Index). The resulting annotation includes 26,646 gene transcripts from 13,611 protein-coding genes and 3,277 non-coding genes.

Methods
Sample acquisition and DNA extraction
One male A. bipunctata specimen (icAdaBipu1), purchased live from Dragonfli, Essex, UK, was used for this genome assembly. The specimen was preserved on dry ice. DNA was extracted at the Tree of Life laboratory, Wellcome Sanger Institute. The icAdaBipu1 sample was weighed and dissected on dry ice with tissue set aside for Hi-C sequencing. Whole tissue was disrupted using a Nippi Powermasher fitted with a BioMasher pestle. Fragment size analysis of 0.01-0.5 ng of DNA was then performed using an Agilent FemtoPulse. High molecular weight (HMW) DNA was extracted using the Qiagen MagAttract HMW DNA extraction kit. Low molecular weight DNA was removed from a 200 ng aliquot of extracted DNA using a 0.8X AMPure XP purification kit prior to 10X Chromium sequencing; a minimum of 50 ng DNA was submitted for 10X sequencing. HMW DNA was sheared into an average fragment size of 12-20 kb in a Megaruptor 3 system with speed setting 30. Sheared DNA was purified by solid-phase reversible immobilisation using AMPure PB beads with a 1.8X ratio of beads to sample to remove the shorter fragments and concentrate the DNA sample. The concentration of the sheared and purified DNA was assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit dsDNA High Sensitivity Assay kit. Fragment size distribution was

© 2023 Nomura S. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Division of Evolutionary Developmental Biology, National Institute for Basic Biology, Okazaki, Japan
This study reports a chromosome-level genome assembly of the two-spot ladybird, Adalia bipunctata. The authors assembled the contigs using PacBio single-molecule HiFi long reads and scaffolded them using Hi-C data. As a result, the authors obtained 118 scaffolds with a scaffold N50 of 45.9 Mb and a BUSCO score of 97.6%. Of the assembly, 94.87% was assigned to 9 autosomes and the X and Y sex chromosomes, which is consistent with the number of chromosomes observed in A. bipunctata.
The authors obtained a high-quality chromosome-level assembly, and the methods of analysis were appropriate and well explained. I consider this manuscript worthy of publication. I note below a few minor questions.
1. For how many ladybird species have chromosome-level genome sequences been published? I know the harlequin ladybird has been well studied and that its genome assembly has been published, but has chromosome-level assembly been done for other ladybird species? Also, do the genome composition or the number of chromosomes in the two-spot ladybird differ from those of other ladybirds that have chromosome-level genome assemblies? This information would be useful to readers who use the assembled genome sequences and would be best explained in the Background section.
2. How did the authors distinguish the X and Y sex chromosomes from the autosomes? If the authors obtained male and female reads, the sex chromosomes can be distinguished based on differences in read mapping between the sexes. Did the authors use such a method? In any case, it would be better to explain the method used to distinguish them.

Is the rationale for creating the dataset(s) clearly described? Partly
Are the protocols appropriate and is the work technically sound?

coincide with the cytological data? How does the genome size and GC content relate to other insects and beetles? Any property of the genome that deviates from expectations? Similarly, the results of the analyses shown in the figures are not well explained and not interpreted at all, leaving it to the reader to make sense of those findings (see comments below).
My second concern is the annotation: a well-annotated gene set is a very important aspect of making the resource useful for others. In that respect, I remain a bit unsure about the comparatively low number of genes. If true, this would represent a surprising case of gene loss, which should be highlighted. However, it could as well be an erroneous number due to annotation issues. Actually, it appears that not much emphasis has been put on that topic: the annotation procedure is described rather briefly, information on quality control or the optimization of annotation parameters is not given, and it appears that neither transcriptomic data nor comparative annotation across species has been used to enhance the annotation. In summary, some more effort to extract a good gene set would make the resource even more valuable, unless such efforts are underway by groups dedicated to that challenge.
Finally, the presentation and description of some data could be improved in order to help understanding the analyses.
The following aspects have remained unclear to me or were hard to read in the current form:
○ I remained a bit unsure how the chromosomes were assembled from the contigs. I guess this was based on Hi-C data?
○ Figure 2 was hard to read for me. It may be more obvious for genomics aficionados, but to help the "users" one could make the figure more intuitive. Some suggestions:
- mark the extent of individual scaffolds with lines (similar to the red line marking the largest scaffold) and number them in order to relate them to the other data
- mark the X and Y chromosomes
- the use of colors for indicating the length of scaffolds was difficult to interpret and it leads to some weird depictions: the scaffold covering the N50 line seems to belong at the same time to N50 (orange) and larger scaffolds (grey). Does that scaffold really exactly represent the N50 value? Same with N90. Suggestion for an alternative: mark N50/N90 simply with dotted circles instead of colors and the length of the scaffolds with a line instead of a color.
- visualized on that scale, the GC content seems to be essentially constant; hence, there is not much information that can be gained from the blue circles outside. Does the GC content really not vary along the chromosomes (e.g. in centromeric regions)? I would have loved to know which chromosomes have the higher AT content (the smaller chromosomes? sex chromosomes?)
○ Figure 3: I have remained unsure what actually was plotted here. The 118 scaffolds? I did not understand how you map a scaffold to a phylum. Based on homology of the contained protein-coding genes? Isn't it trivial that all scaffolds map to arthropods, given that beetles belong to that clade? What do the axes represent? (What is gc coverage, do you mean gc content? What is the axis "ERR6842...."?) An interesting aspect appeared to be that there are scaffolds with clearly lower GC content, but what does that mean? Do these represent the centromeres? Or the sex chromosomes? I have to confess that I have remained unsure what I can actually learn from that figure. Either the results should be better described and interpreted or, instead of that plot, the GC content could be plotted along each chromosome, which might be more informative.
○ Fig. 2), mark the sex chromosomes. Any curious patterns worth mentioning?

Is the rationale for creating the dataset(s) clearly described? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Partly