The genome sequence of the springtail Allacma fusca (Linnaeus, 1758)

We present a genome assembly from an individual male Allacma fusca (the springtail; Arthropoda; Collembola; Symphypleona; Sminthuridae). The genome sequence is 392.8 megabases in span. Most of the assembly is scaffolded into 6 chromosomal pseudomolecules, including the X1 and X2 sex chromosomes. The mitochondrial genome has also been assembled and is 14.94 kilobases in length.


Background
Springtails are one of the most abundant groups of soil animals, found in all sorts of biomes and habitats worldwide (Hopkin, 1997). Allacma fusca is the very first springtail described by De Geer in 1744, which perhaps does not come as a surprise given it is one of the largest springtails in the UK (up to 3.5 mm; Hopkin, 1997). Allacma fusca is a brown-coloured (Figure 1) globular springtail (Symphypleona; Sminthuridae), commonly found on the bark of trees overgrown with green algae. It is native to the Palaearctic region, but has more recently also been found on the east coast of the Nearctic and in Australia (GBIF, 2023). Tree trunks are a rather unusual habitat for a globular springtail, they usually live in more humid environments closer to the ground -soil and leaf litter. While other springtails breathe through their cuticle, A. fusca, adapted for lifestyle on trees, features a thicker cuticle and a complex system of trachea (Betsch & Vannier, 2009). This is likely an example of convergent evolution with the tracheae of insects (Hopkin, 1997).
The genetics of A. fusca features several peculiarities. The female karyotype consists of 2n = 12 chromosomes, but the male karyotype is more complex (Dallai et al., 2000). Just like in females, the male zygote is initially fully diploid (2n = 12), but in very early embryogenesis, two chromosomes, often referred to as X1 and X2, are eliminated. As a result, males show a 2n = 10 karyotype with X1X2 00 sex chromosomes, although these chromosomes are not the primary mechanism of sex determination (Dallai et al., 2000), which is thought to be maternally controlled (Haig, 1993). The spermatogenesis of males is also unusual: one full haploid set of chromosomes co-segregates into one secondary spermatocyte, while the other contains only four autosomes and no X chromosomes (Dallai et al., 2000). The cell with the two missing chromosomes immediately degenerates, while the cell with the complete haploid set undergoes the rest of meiosis and forms two spermatozoa (Dallai et al., 2000). The sperm contains one parental haplotype only (Jaron et al., 2022). It is hypothesised that the retained genome is of maternal origin, and that A. fusca therefore most likely reproduces via so-called paternal genome elimination (Jaron et al., 2022). A similar type of spermatogenesis is found across at least five globular springtail families (Dallai et al., 1999; Dallai et al., 2000; Dallai et al., 2001; Dallai et al., 2004), and therefore paternal genome elimination is thought to be ancestral to all globular springtails (Symphypleona). The chromosomal genome assembly will contribute to understanding this peculiar reproductive strategy and other unusual features of this species.

Genome sequence report
The genome was sequenced from one male Allacma fusca from a collection at the Ashworth Laboratories, University of Edinburgh. A total of 80-fold coverage in Pacific Biosciences single-molecule HiFi long reads was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 221 missing joins or mis-joins and removed 23 haplotypic duplications, reducing the assembly length by 1.83% and the scaffold number by 34.93%, and decreasing the scaffold N50 by 18.48%.
The final assembly has a total length of 392.8 Mb in 95 sequence scaffolds with a scaffold N50 of 66.1 Mb (Table 1). Most (98.82%) of the assembly sequence was assigned to 6 chromosomal-level scaffolds, representing 4 autosomes and the X1 and X2 sex chromosomes. A heterozygous inversion was observed on chromosome 3 from ~5,000 kbp to ~45,000 kbp. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 2-Figure 5; Table 2). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited. The mitochondrial genome was also assembled and can be found as a contig within the multifasta file of the genome submission.
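For readers unfamiliar with the scaffold N50 statistic quoted above, the following is a minimal sketch of how it is conventionally computed: scaffolds are sorted from longest to shortest, and N50 is the length of the scaffold at which the running total first reaches half of the assembly span. The function name scaffold_n50 and the toy lengths are hypothetical and used only for illustration; they are not the lengths of the A. fusca scaffolds, and this is not the pipeline used to produce Table 1.

```python
def scaffold_n50(lengths):
    """Return the N50 of a collection of scaffold lengths (in bp).

    N50 is the length L such that scaffolds of length >= L together
    cover at least half of the total assembly span.
    """
    if not lengths:
        raise ValueError("at least one scaffold length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold downwards, accumulating span.
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to stay in integer arithmetic.
        if 2 * running >= total:
            return length


# Hypothetical toy assembly, not real data from this genome note.
toy_lengths = [10, 20, 30, 40]
print(scaffold_n50(toy_lengths))
```

Because the statistic depends only on the sorted lengths, joining or breaking scaffolds during manual curation can move N50 in either direction even when the total span barely changes.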
Metadata for specimens, spectral estimates, sequencing runs, contaminants and pre-curation assembly statistics can be found at https://links.tol.sanger.ac.uk/species/39272.

Sample acquisition and nucleic acid extraction
The specimen selected for this genome assembly was a male Allacma fusca (ToLID qeAllFusc8) taken from a collection in the Ashworth Laboratories, University of Edinburgh, which had been aspirated from the bark of various smooth-bark trees. The specimen was identified using the key of Hopkin (2007). The specimens in the collection were kept in a cylindrical plaster cage with mossy bark pieces. The specimen was harvested on 2021-10-04 and frozen from live prior to shipping and sample preparation.
The specimen used for Hi-C sequencing (specimen number Ox000724, ToLID qeAllFusc3) was collected by Kamil  was fitted with a BioMasher pestle. High molecular weight (HMW) DNA was extracted using the Salting Out extraction protocol (Steps 2 and 14 of christopher.laumer, 2023). HMW DNA was sheared into an average fragment size of 12-20 kb in a Megaruptor 3 system with speed setting 30. Sheared DNA was size selected using a BluePippin system. The concentration of the sheared and purified DNA was assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit dsDNA High Sensitivity Assay kit. Fragment size distribution was evaluated by running the sample on the FemtoPulse system.

Sequencing
Pacific Biosciences HiFi circular consensus DNA sequencing libraries were constructed according to the manufacturers' instructions using the ultra-low input protocol. DNA sequencing was performed by Edinburgh Genomics on the Pacific Biosciences SEQUEL II (HiFi) instrument. Hi-C data were also generated from whole organism tissue of qeAllFusc3 using the Arima2 kit and sequenced on the Illumina NovaSeq 6000 instrument at the Sanger Institute.

Wellcome Sanger Institute - Legal and Governance
The materials that have contributed to this genome note have been supplied by a Tree of Life collaborator. The Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use. The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible. The overarching areas of consideration are: • Ethical review of provenance and sourcing of the material

Data availability
European Nucleotide Archive: Allacma fusca. Accession number PRJEB53479; https://identifiers.org/ena.embl/PRJEB53479 (Wellcome Sanger Institute, 2022).
The genome sequence is released openly for reuse. The Allacma fusca genome sequencing initiative is part of the Darwin Tree of Life (DToL) project. All raw sequence data and the assembly have been deposited in INSDC databases. The genome will be annotated using available RNA-Seq data and presented through the Ensembl pipeline at the European Bioinformatics Institute. Raw data and assembly accession identifiers are reported in Table 1.

Francesco Cicconardi
School of Biological Sciences, University of Bristol, Bristol, England, UK
In this article, the authors report on the chromosome-level genome assembly of the springtail Allacma fusca (Collembola: Symphypleona), one of the few representatives of an under-appreciated group of terrestrial arthropods. I'm also glad they picked a male, so they were able to identify the "sex-chromosomes". This may help to understand sex determination in these organisms, and whether it differs from that of non-Symphypleona springtails. The assembly appears to be of good quality, using appropriate methods (HiFi long reads, 80-fold coverage, Hi-C data to confirm scaffolding, manual curation of the final assembly) and following the high standard of the Darwin Tree of Life Project. I have only minor comments. I don't think that the sentence "Tree trunks are a rather unusual habitat for a globular springtail, they usually live in more humid environments closer to the ground -soil and leaf litter." in the background section is very accurate. In fact, the temperate forests where these springtails live still need to be humid for them to actually be found. Also, there are multiple instances where Symphypleona do not live in a very humid environment, such as the epiphyte Sminthurus viridis. I'd change or remove the sentence.
It is a bit disappointing that the authors didn't attempt to annotate the genome. Given the almost complete lack of well-annotated springtail genomes, some effort from the Darwin Tree of Life Project consortium would have been very beneficial for the whole community.
Finally, I was wondering why the method section does not list in more detail how the various tools were run (e.g. options).

Is the rationale for creating the dataset(s) clearly described? Partly
Are the protocols appropriate and is the work technically sound?
globular springtail, they usually live in more humid environments closer to the ground -soil and leaf litter.' However, it may not be true. Many symphypleones are typical aboveground dwellers, and appear to be rather tolerant to desiccation. They are usually found on living plants, even in the canopies of tall trees. Beating branches of trees and shrubs often produces abundant specimens of globular springtails (mostly Katiannidae, Bourletiellidae, and Sminthuridae). In fact, only neelipleones and some symphypleones (Arrhopalites, some Sminthurinus, etc.) are specialized to a soil life. Probably they would say 'Tree trunks are an unusual habitat for most springtails'.
5. Also in the background, 'While other springtails breathe through their cuticle, A. fusca, adapted for lifestyle on trees, features a thicker cuticle and a complex system of trachea' sounds like A. fusca is the only springtail species possessing trachea; however, this is not true. Trachea are shared by many symphypleones and by a family of elongated springtails, Actaletidae. Perhaps they would like to say 'while MOST other springtails breathe through their cuticle'.

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes