The genome sequence of the white admiral, Limenitis camilla (Linnaeus, 1764)

We present a genome assembly from an individual female Limenitis camilla (the white admiral; Arthropoda; Insecta; Lepidoptera; Nymphalidae). The genome sequence is 435 megabases in span. Most of the assembly (99.97%) is scaffolded into 31 chromosomal pseudomolecules, corresponding to 29 autosomes plus the W and Z sex chromosomes. The complete mitochondrial genome was also assembled and is 15.2 kilobases in length. Gene annotation of this assembly on Ensembl identified 12,489 protein coding genes.


Background
The white admiral, Limenitis camilla (Linnaeus, 1764), is a widespread species in temperate Eurasia, and is found in southern Britain. While L. camilla is considered a species of Least Concern according to the IUCN Red List for Europe (van Swaay et al., 2010), it is listed as Vulnerable on the UK Red List (Fox et al., 2022). In Britain there has been a dramatic decline in populations during the last two decades, but the reasons for this are unclear (Fox et al., 2015; Fox et al., 2022).
The species is found in shady woodland areas, where its larval host plants, honeysuckles (Lonicera spp.), grow. Adults are attracted to bramble flowers for nectar. The white admiral is generally univoltine, but it can have two overlapping generations in some parts of its range, where adults can be found from May to September or even the beginning of October (Tshikolovets, 2011; Vila et al., 2018).
L. camilla belongs to a species-rich genus that has its centre of diversity in eastern Asia (Tseng et al., 2022). Only three species are found in Europe: L. camilla, L. reducta, and L. populi. These three species are not closely related to each other and appear to represent independent colonisations of Europe from Asia (Tseng et al., 2022). While Lorkovic (1941) reports 31 chromosome pairs for Limenitis camilla, other researchers have documented 30 chromosome pairs (Beliajeff, 1930; Maeki & Makino, 1953; Maeki, 1961).

Genome sequence report
The genome was sequenced from a single female L. camilla (Figure 1) collected from Lupşa, Apuseni Mountains, Alba, Romania. A total of 60-fold coverage in Pacific Biosciences single-molecule HiFi long reads and 87-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 21 missing joins or misjoins and removed two haplotypic duplications, reducing the assembly size by 0.01% and the scaffold number by 17.05%.
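As a rough guide to what the fold-coverage figures imply, the sketch below converts coverage into an approximate sequence yield. It assumes that coverage was calculated against the 435 Mb assembly span, which the text does not state; the function name is illustrative only.

```python
# Rough conversion from fold coverage to sequence yield.
# Assumption: coverage is expressed relative to the 435 Mb assembly span.
ASSEMBLY_SPAN_MB = 435


def approx_yield_gb(fold_coverage, span_mb=ASSEMBLY_SPAN_MB):
    """Return the approximate yield in gigabases for a given fold coverage."""
    return fold_coverage * span_mb / 1000


hifi_gb = approx_yield_gb(60)  # PacBio HiFi reads
tenx_gb = approx_yield_gb(87)  # 10X Genomics read clouds

print(f"HiFi: ~{hifi_gb:.1f} Gb, 10X: ~{tenx_gb:.1f} Gb")
```

Under that assumption the two data sets correspond to roughly 26 Gb and 38 Gb of sequence, respectively.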
The final assembly has a total length of 435 Mb in 73 sequence scaffolds with a scaffold N50 of 15.2 Mb (Table 1). Most of the assembly sequence (99.97%) was assigned to 31 chromosomal-level scaffolds, representing 29 autosomes (numbered by sequence length) plus the W and Z sex chromosomes (Figure 2–Figure 5; Table 2), in agreement with previous authors suggesting a haploid chromosome number of n = 30 (e.g. Beliajeff, 1930).
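For readers unfamiliar with the N50 statistic quoted above, the following minimal sketch shows how it is conventionally computed: scaffolds are sorted from longest to shortest, and N50 is the length of the scaffold at which the running total first reaches half of the assembly span. The toy lengths are hypothetical and are not taken from this assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical toy scaffold lengths (arbitrary units), for illustration only.
toy_lengths = [20, 15, 10, 5]
toy_n50 = n50(toy_lengths)
print(toy_n50)
```

Applied to the real scaffold lengths, the same calculation would be expected to give the reported 15.2 Mb.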
The assembly has a BUSCO v5.3.2 (Manni et al., 2021) completeness of 98.8% (single 98.4%, duplicated 0.4%) using the lepidoptera_odb10 reference set (n = 5,286). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.
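The BUSCO figures can be checked with simple arithmetic: completeness is the sum of the single-copy and duplicated percentages, and multiplying by the reference set size gives an approximate gene count. The count below is only an estimate, because the published percentages are rounded to one decimal place.

```python
# Values taken from the text above.
N_BUSCO = 5286          # size of the lepidoptera_odb10 reference set
SINGLE_PCT = 98.4       # complete, single-copy
DUPLICATED_PCT = 0.4    # complete, duplicated

# Completeness is the sum of the two "complete" categories.
complete_pct = round(SINGLE_PCT + DUPLICATED_PCT, 1)

# Approximate number of complete BUSCO genes (estimate from rounded percentages).
approx_complete_genes = round(N_BUSCO * complete_pct / 100)

print(complete_pct, approx_complete_genes)
```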

Genome annotation report
Annotation of the GCA_905147385.1 assembly was generated using the Ensembl genome annotation pipeline (Table 1; Ensembl annotation). The resulting annotation includes 12,489 protein coding genes with an average length of 14,179.79 bp and an average coding length of 1,531.93 bp, and 2,538 non-protein coding genes. There is an average of 7.54 exons and 6.54 introns per canonical protein coding transcript, with an average intron length of 1,694.97 bp. A total of 4,763 gene loci have more than one associated transcript. The annotation has a BUSCO v5.1.2 completeness of C:96.3%[S:95.8%,D:0.5%],F:0.9%,M:2.8%,n:5,286 using lepidoptera_odb10. The annotation identified a repeat content of 38.17%.
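The compact BUSCO notation used above (C = complete, S = single-copy, D = duplicated, F = fragmented, M = missing, n = reference set size) can be unpacked and sanity-checked as in the sketch below. The parser is an illustrative helper, not part of BUSCO or the Ensembl pipeline, and n is written as 5,286 to match the lepidoptera_odb10 set size given in the genome sequence report.

```python
import re


def parse_busco(summary):
    """Parse a BUSCO short-summary string into percentages plus the set size n."""
    cleaned = summary.replace(" ", "")
    pattern = (
        r"C:([\d.]+)%\[S:([\d.]+)%,D:([\d.]+)%\],"
        r"F:([\d.]+)%,M:([\d.]+)%,n:([\d,]+)"
    )
    match = re.search(pattern, cleaned)
    if match is None:
        raise ValueError("not a BUSCO summary string")
    c, s, d, f, m = (float(x) for x in match.groups()[:5])
    n = int(match.group(6).replace(",", ""))
    return {"C": c, "S": s, "D": d, "F": f, "M": m, "n": n}


busco = parse_busco("C:96.3%[S:95.8%,D:0.5%],F:0.9%,M:2.8%,n:5,286")

# Complete + fragmented + missing should account for the whole set.
categories_sum = round(busco["C"] + busco["F"] + busco["M"], 1)

# A transcript with k exons has k - 1 introns, so the averages differ by one.
introns_from_exons = round(7.54 - 1, 2)

print(busco, categories_sum, introns_from_exons)
```

The reported values are internally consistent: S and D sum to C, the three top-level categories sum to 100%, and the average intron count is one less than the average exon count.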

Sample acquisition and nucleic acid extraction
A single female L. camilla specimen (ilLimCami1) was collected from Lupşa, Apuseni Mountains, Alba, Romania (latitude 46.416, longitude 23.192) by Roger Vila (Institut de Biologia
DNA was extracted at the Scientific Operations Core, Wellcome Sanger Institute. The ilLimCami1 sample was weighed and dissected on dry ice, with tissue set aside for Hi-C sequencing. Whole organism tissue was disrupted by manual grinding with a disposable pestle. Fragment size analysis of 0.01-0.5 ng of DNA was then performed using an Agilent FemtoPulse. High molecular weight (HMW) DNA was extracted using the Qiagen MagAttract HMW DNA extraction kit. Low molecular weight DNA was removed from a 200-ng aliquot of extracted DNA using a 0.8X AMPure XP purification kit prior to 10X Chromium sequencing; a minimum of 50 ng DNA was submitted for 10X sequencing. HMW DNA was sheared into an average fragment size of 12-20 kb in a Megaruptor 3 system with speed setting 30. Sheared DNA was purified by solid-phase reversible immobilisation using AMPure PB beads with a 1.8X ratio of beads to sample to remove the shorter fragments and concentrate the DNA sample. The concentration of the sheared and purified DNA was assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit dsDNA High Sensitivity Assay kit. Fragment size distribution was evaluated by running the sample on the FemtoPulse system.
RNA was extracted from remaining whole organism tissue of ilLimCami1 in the Tree of Life Laboratory at the WSI using TRIzol, according to the manufacturer's instructions. RNA was then eluted in 50 μl RNAse-free water and its concentration assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit RNA Broad-Range (BR) Assay kit. Analysis of the integrity of the RNA was done using the Agilent RNA 6000 Pico Kit and Eukaryotic Total RNA assay.

Sequencing
Pacific Biosciences HiFi circular consensus and 10X Genomics Chromium read cloud sequencing libraries were

Genome assembly
Assembly was carried out with Hifiasm (Cheng et al., 2021); haplotypic duplication was identified and removed with

Genome annotation
The Ensembl gene annotation system (Aken et al., 2016) was used to generate annotation for the L. camilla assembly (GCA_905147385.1). Annotation was created primarily through alignment of transcriptomic data to the genome, with

Table 3. Software tools used.

Comment 2: Authors have assembled the mitochondrial genome in MitoHiFi software. But they haven't given the total length of mitochondrial genome sequence in the text. The mitochondrial genome length could be added in the text.

Software tool Version
Overall, the manuscript can be accepted for indexing.

Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others?

Are datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Phylogenetic analysis of clade Macroheterocera moths (Lepidoptera) using mitochondrial genome sequence.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
Reviewer Report 28 September 2023
https://doi.org/10.21956/wellcomeopenres.20619.r67056
The gene annotation, conducted using Ensembl, identifies a substantial number of protein coding genes, contributing significantly to our understanding of the genetic basis of L. camilla. The methods section is detailed and well-structured, covering sample acquisition, nucleic acid extraction, and sequencing, ensuring the reliability of the obtained genomic data.
Overall, this article stands as a valuable contribution to the field of biodiversity research and genomics. The genome assembly of Limenitis camilla, along with the ecological context provided, opens avenues for further investigations into the species' biology, adaptation, and population genomics.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others?
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Bioinformatics, entomology
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
This manuscript by Vila et al. describes "The genome sequence of the white admiral, Limenitis camilla". I enjoyed reading this announcement. Thanks for the work you have done on this. I apologize that my review is probably longer than your data note. I think most of these things can be addressed with just a few words, but I felt I owed an explanation for why I was asking, which contributed to the length of the review.
In the introduction, it would be helpful to know what type of chromosomes this species has with respect to metacentric vs. holocentric vs. telocentric. I suspect they should have telomeres, although biological diversity is great and I try not to make assumptions. I don't see telomeres mentioned at all; I am curious if you were able to use them to orient contigs in the pseudomolecule or if they were unresolved.
Figure 2, I don't understand most of the image (gray/peach/orange portions). The data presented in the corners are odd because they could just be put in Table 1. The blue and pale blue make sense, although they are not informative since there is little to no variation from the mean. This likely owes to the size of the bins and/or the scale. The size of the bins is also not clear to me. Are there 1000 of them, presumably each 435,113 bp in size? Or are they each 1000 bp? Either way the bin size is too large to be informative about events that would alter GC content like the presence of ncRNAs or as a proxy for atypical nucleotide content. Most of the information in this figure (possibly all of the information in this figure) would be better served being moved to
expect for males and females in a ZW system. In XY animals, there are a large number of male specific genes that wouldn't be transcribed. Is that true of ZW systems too? Or maybe, given biodiversity, is that true of this particular system?
For PacBio, what was the read N50 and the total number of subreads generated?
The methods do not describe how the pseudomolecules were made. Presumably it was using Hi-C but looking at Figure 4 there are some regions that seem like they would have been difficult to distinguish. (Maybe if I knew where the scaffold boundaries are it would be more clear). But also, when you know you have two scaffolds that belong in a pseudomolecule, how were they put together? How were they both ordered and oriented? Is there a way that if I wanted to I could now disentangle the two scaffolds and differentiate scaffold breaks from pseudomolecule breaks?
I know, everyone loves BUSCO. But that's just ~200 proteins when you expect 12,000. I would love it if you could generate a promer plot of your pseudomolecules against a closely related Lepidoptera (particularly if there is one that has closed chromosomes). It really would give a bigger picture of the completeness of your genome. It would also show if the genomes are colinear, which might help identify if there are any misassemblies. (I expect that intrachromosome rearrangements >>> interchromosome rearrangements when I say this, although not sure if that is true in Lepidoptera). You might think it is beyond the scope of the announcement. But it really is SO easy to do and would say SO much with respect to the quality of the assembly (at least more so than Figures 2-4...)
Unless the journal requires this format, I would prefer that the WGS accession and the SRA accessions (either the experiment or the run accessions) are in the data availability section rather than the table. Or that they are both places.
In addition, annotation is described and I'm not finding that anywhere. It very well could be that I'm incompetent, but I checked several different places associated with this accession (CAJHVI010000000 as well as GCA_905147385.1) and I don't see them. Even when all else fails, usually you can see the genes in the graphical viewer at NCBI, but I don't see them. Please ensure the annotation was deposited and is available.
I know insects sometimes have HGT/LGT from Wolbachia. The announcement doesn't mention this. I think it is important for all genome announcements of arthropods (and maybe nematodes) to address this issue. Were all Wolbachia reads screened out, or were they left in? Are there integrations? If there are integrations, how were they handled with assembly? How were they handled for annotation? Lastly, did this female have a Wolbachia endosymbiont or any other endosymbiont? Are they known to have endosymbionts? Has that even been assessed? Was there evidence for any endosymbionts in the genome data? Would the presence of endosymbionts be expected to vary based on the tissues used for each of these extractions?
Is the rationale for creating the dataset(s) clearly described? Yes

Are sufficient details of methods and materials provided to allow replication by others? Partly
Are the datasets clearly presented in a useable and accessible format? Partly

Figure 1. Forewings and hindwings of the female L. camilla specimen from which the genome was sequenced. Dorsal (left) and ventral (right) surface view of wings from specimen RO_LC_940 (ilLimCami1) from Lupşa, Alba, Romania, used to generate Pacific Biosciences, 10X Genomics, Hi-C and RNA-Seq data.

Figure 2. Genome assembly of L. camilla, ilLimCami1.1: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 435,112,716 bp assembly. The distribution of chromosome lengths is shown in dark grey with the plot radius scaled to the longest chromosome present in the assembly (21,933,589 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 chromosome lengths (15,214,206 and 11,199,090 bp), respectively. The pale grey spiral shows the cumulative chromosome count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the lepidoptera_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/ilLimCami1.1/dataset/CAJHVI01/snail.
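The bin size implied by the Figure 2 caption follows directly from the stated numbers: 1,000 bins, each covering 0.1% of the assembly span. The short calculation below is a sketch of that arithmetic only; the variable names are illustrative and it does not reproduce BlobToolKit's own code.

```python
# Values taken from the Figure 2 caption.
ASSEMBLY_SPAN_BP = 435_112_716
N_BINS = 1_000
LONGEST_CHROMOSOME_BP = 21_933_589

# Each bin represents 0.1% of the assembly span.
bin_size_bp = ASSEMBLY_SPAN_BP / N_BINS

# Number of bins covered by the longest chromosome.
bins_for_longest = LONGEST_CHROMOSOME_BP / bin_size_bp

print(f"bin size: {bin_size_bp:,.0f} bp")
print(f"longest chromosome spans about {bins_for_longest:.1f} bins")
```

Each bin therefore covers roughly 435 kb, so GC, AT and N percentages in the outer track are averages over windows of that size.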

Figure 3. Genome assembly of L. camilla, ilLimCami1.1: GC coverage. BlobToolKit GC-coverage plot. Scaffolds are coloured by phylum. Circles are sized in proportion to scaffold length. Histograms show the distribution of scaffold length sum along each axis. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/ilLimCami1.1/dataset/CAJHVI01/blob.

Figure 4. Genome assembly of L. camilla, ilLimCami1.1: cumulative sequence. BlobToolKit cumulative sequence plot. The grey line shows cumulative length for all scaffolds. Coloured lines show cumulative lengths of scaffolds assigned to each phylum using the buscogenes taxrule. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/ilLimCami1.1/dataset/CAJHVI01/cumulative.

Figure 5. Genome assembly of L. camilla, ilLimCami1.1: Hi-C contact map. Hi-C contact map of the ilLimCami1.1 assembly, visualised in HiGlass. Chromosomes are arranged in size order from left to right and top to bottom. The interactive Hi-C map can be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=C4aYtOqlRRioeJ532sWPiA.

Table 1. Genome data for L. camilla, ilLimCami1.1. Project accession data
Table 3 contains a list of all software tool versions used, where appropriate.

Review Current Peer Review Status: Version 1
This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
gap filling via protein-to-genome alignments of a select set of proteins from UniProt (UniProt Consortium, 2019).

https://doi.org/10.21956/wellcomeopenres.20619.r67052
© 2023 Sivasankaran K.
Kuppusamy Sivasankaran
Division of Taxonomy and Biodiversity, Entomology Research Institute, Loyola College, Chennai, Tamil Nadu, India

Review report
I admire the author for compiling Limenitis camilla's (Linnaeus 1764) whole genomic sequence. The authors have followed the appropriate techniques for whole genome sequencing. The proper software was followed for genome assembly and sequence annotation. The authors have annotated PCGs in the whole genome sequence. They have given the total number of PCGs (12,489) in the table. A few comments on the manuscript:
Comment 1: According to Fox et al., 2022 the species Limenitis camilla (Linnaeus 1764) is in the vulnerable list. Have the authors got approval from the Insect Red Data Book Committee or Royal Society for Nature Conservation, UK for DNA isolation?
Table 1 and removing the figure. To beat a dead horse (in case you leave the figure in), I also don't understand what the two scales are. The 400+ Mbp one makes sense. But what is the other?
Figure 3, coloring in the key doesn't match the coloring in the figures. I'm curious about what is going on with the contigs at 50% GC that don't get mentioned. Are they the rRNA arrays? A virus (integrated or otherwise)? Some sort of gene transfer event? I was initially guessing that the large circle that is at 1/2 sequencing depth is probably a sex chromosome but it is hard to tell if it matches Arthropods or something else. I'm also unsure in a female from a W/Z system if you expect to have 1/2 sequencing depth for one or both of the sex chromosomes. You need to explain what the expected chromosomal state of a female is in this particular insect (since it can vary even in XY and XO systems, with which most people are familiar). But back to the coloring, I can't tell if these sequences are arthropod or not. This is true for the static image as well as when you open the blobtoolkit URL. I would suggest that since there seems to be some use of transparency in rendering the figure, you end up with shading. Because of this, you likely need to use two different colors, not two shades of the same/similar color. Regardless, a discussion on contamination and contamination removal is needed. Did you have absolutely no contamination? Do you think any of these outliers are contamination? This feeds into the question at the end about Wolbachia and endosymbionts as well.
The same coloring problem exists for Figure 4. In addition, I question the inclusion of Figure 4 as I don't see what value it adds.
For table 2, it would be helpful to have the sequencing depth for the various technologies in this table. It would also be helpful to know how many contigs and/or scaffolds are in each pseudomolecule as well as how many telomeres and/or centromeres (if they even have centromeres, see previous comment on this).
I am so glad you included Figure 5, but it really needs to be labeled with the pseudomolecule numbers. It seems obvious until you try to figure out where Z and W are. If Z and W were labeled I would know if that unusual sharing between some of the last pseudomolecules is due to potential misassemblies or something like a pseudoautosomal region. (A sentence or two discussing this would be helpful). Related to that, it would be helpful if you mention if you even expect a pseudoautosomal region. I know they happen in XY systems, so I assume they do in ZW systems, but biology is incredibly rich in diversity.
Fragment size analysis is described before DNA extraction, which seems odd to me. Please clarify, particularly since fragment size is described twice. I'm unclear what was done.
How was the Illumina RNAseq library prepared (e.g. what kit and/or reagents was/were used)? Was it really only from the one female? (If so, are you only getting female transcripts?) How were the reads QC'ed and trimmed, if they were trimmed? What was the read length? How many reads in total were sequenced?
In the annotation section, "gap filling via", what does that really mean? What gaps? Do you mean large regions of the genome without RNASeq reads mapping to them? Or do you mean missing exons in genes with low sequencing depth? And remind me, what to