The genome sequence of the Eurasian red squirrel, Sciurus vulgaris Linnaeus 1758

We present a genome assembly from an individual male Sciurus vulgaris (the Eurasian red squirrel; Vertebrata; Mammalia; Eutheria; Rodentia; Sciuridae). The genome sequence is 2.88 gigabases in span. The majority of the assembly is scaffolded into 21 chromosomal-level scaffolds, with both X and Y sex chromosomes assembled.


Background
The Eurasian red squirrel, Sciurus vulgaris, is native to northern Eurasia. In the Atlantic Archipelago of Britain and Ireland, S. vulgaris is under threat from anthropogenic pressure on its native woodland habitats, and from competition with the introduced American grey squirrel, Sciurus carolinensis, particularly as mediated by squirrelpox virus (Chantrey et al., 2014). The current population of S. vulgaris in the Atlantic Archipelago is estimated to be 150,000, and there are extensive efforts to conserve this species and expand its range (Hardouin et al., 2019). Here we present a chromosomally assembled genome sequence for S. vulgaris, based on a male specimen from Britain. This genome sequence will be of utility in population genomic analysis of fragmented S. vulgaris populations (Barratt et al., 1999), in managing reintroductions and in investigating the biology of susceptibility to squirrelpox virus (Darby et al., 2014).

Genome sequence report
The genome was sequenced from DNA extracted from a naturally deceased male S. vulgaris collected as part of a squirrel monitoring project run by the Wildlife Trust for Lancashire, Manchester and North Merseyside. A total of 51-fold coverage in Pacific Biosciences single-molecule long reads (N50 19 kb) and 44-fold coverage in 10X Genomics read clouds (from molecules with an estimated N50 of 69 kb) were generated. Primary assembly contigs were scaffolded with 10X read clouds, Hi-C chromosome conformation data, and 111-fold coverage of Bionano optical maps. The final assembly has a total length of 2.88 Gb in 638 sequence scaffolds with a scaffold N50 of 153.9 Mb (Table 1). The majority, 92.7%, of the assembly sequence was assigned to 21 chromosomal pseudomolecules representing 19 autosomes (numbered by sequence length), and the X and Y sex chromosomes (Figure 1-Figure 4; Table 2). The assembly has a BUSCO (Simão et al., 2015) completeness of 93.8% using the mammalia_odb9 reference set. The primary assembly is a large-scale mosaic of both haplotypes (i.e. is not fully phased) and we have therefore also deposited the contigs corresponding to the alternate haplotype. The genome can be compared to that of the grey squirrel, Sciurus carolinensis, which we have also assembled.
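For readers unfamiliar with the N50 statistic quoted above, the following is a minimal Python sketch of how it is conventionally computed from a list of sequence lengths: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total. The function name `n50` and the toy lengths are illustrative assumptions, not values or code from this assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Conventional definition: sort lengths from longest to shortest and
    return the length at which the running total first reaches half of
    the summed length.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy scaffold lengths (arbitrary units), purely illustrative.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))  # prints 70
```

Applied to the 638 scaffold lengths of the real assembly, the same calculation would be expected to give the scaffold N50 reported in Table 1.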

Methods
The red squirrel specimen was collected from a garden in Beechwood Drive, Formby, Merseyside, L37 2DQ (grid ref: SD2829706400; lat/long: 53.549316, -3.0836773) by the Wildlife Trust for Lancashire, Manchester and North Merseyside as part of an ongoing programme of recovery of dead squirrels. The spleen was dissected out during autopsy. A full tissue dissection and preservation in 80% ethanol were undertaken and the specimen was accessioned by the Natural History Museum, London.
DNA was extracted using an agarose plug extraction from spleen tissue following the Bionano Prep Animal Tissue DNA Isolation Soft Tissue Protocol. Pacific Biosciences CLR long read and 10X Genomics read cloud sequencing libraries were constructed according to the manufacturers' instructions. Sequencing was performed by the Scientific Operations core at the Wellcome Sanger Institute on Pacific Biosciences SEQUEL I and Illumina HiSeq X instruments. Hi-C data were generated by the Aiden lab using an optimised version of their protocols (Dudchenko et al., 2017). Bionano data were generated in the Rockefeller University Vertebrate Genome laboratory using the Saphyr instrument. Ultra-high molecular weight DNA was extracted using the Bionano Prep Animal Tissue DNA Isolation Soft Tissue Protocol and assessed by pulsed field gel and Qubit 2 fluorimetry. DNA was labeled for Bionano Genomics optical mapping following the Bionano Prep Direct Label and Stain (DLS) Protocol and run on one Saphyr instrument chip flowcell. The total yield of tagged molecules ≥ 150 kb with at least 9 sites was 320.6 Gb (N50 0.25 Mb). A CMAP (Bionano assembly consensus genome map) was assembled de novo using Bionano Solve (see Table 3 for software versions and sources), yielding 574 maps with a total map length of 3.28 Gb and a map N50 of 86.34 Mb.
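As a rough consistency check, the Bionano molecule yield above can be related to the 111-fold optical map coverage quoted in the genome sequence report. The sketch below assumes that fold coverage is simply yield divided by the 2.88 Gb assembly span; the authors may have used a different genome size estimate, and the function name `fold_coverage` is illustrative.

```python
def fold_coverage(yield_gb, genome_gb):
    """Approximate fold coverage as data yield divided by genome span.

    Assumption: the denominator is the assembly span; a separate genome
    size estimate would change the result slightly.
    """
    if genome_gb <= 0:
        raise ValueError("genome span must be positive")
    return yield_gb / genome_gb


# Figures taken from the text: 320.6 Gb of filtered Bionano molecules
# and a 2.88 Gb assembly span.
bionano_cov = fold_coverage(320.6, 2.88)
print(f"{bionano_cov:.1f}")  # prints 111.3
```

Under this assumption the result rounds to the 111-fold coverage reported for the optical maps.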
Assembly followed a modified version of the Vertebrate Genomes Project assembly protocols. In brief, assembly was carried out using Falcon-unzip (Chin et al., 2016), haplotypic duplication was identified and removed with purge_dups (Guan et al., 2019) and a first round of scaffolding was carried out with 10X Genomics read clouds using scaff10x. Hybrid scaffolding was performed using the Bionano DLE-1 data and Bionano Solve.
Scaffolding with Hi-C data (Rao et al., 2014) was carried out with 3D-DNA (Dudchenko et al., 2017), followed by manual curation with Juicebox Assembly Tools (Dudchenko et al., 2018; Durand et al., 2016; Robinson et al., 2018). The Hi-C scaffolded assembly was polished using arrow with the PacBio data, then polished with the 10X Genomics Illumina data by aligning to the assembly with longranger align, calling variants with freebayes (Garrison & Marth, 2012) and applying homozygous non-reference edits using bcftools consensus. Two rounds of the Illumina polishing were applied. The assembly was
The genome sequence is released openly for reuse. The S. vulgaris genome sequencing initiative is part of the Wellcome Sanger Institute's "25 genomes for 25 years" project. It is also part of the Vertebrate Genomes Project (VGP) ordinal references programme, the DNA Zoo Project and the Darwin Tree of Life (DToL) project. The specimen has been preserved in ethanol and deposited with the Natural History Museum, London under registration number NHMUK ZD 2019.213, where it will remain accessible to the research community for posterity. All raw sequence data and the assembly have been deposited in the ENA. The genome will be annotated and presented through the Ensembl pipeline at the European Bioinformatics Institute. Raw data and assembly accession identifiers are reported in Table 1.
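The Illumina polishing step in the Methods applies only homozygous non-reference variant calls as edits to the assembly. The sketch below illustrates that selection criterion on plain VCF text for a single sample. It is not the authors' pipeline, which used freebayes and bcftools consensus with options not stated here; the function names, scaffold names and records are hypothetical.

```python
def is_homozygous_alt(genotype):
    """True for a diploid GT such as '1/1' or '2|2': both alleles are
    identical and neither is the reference allele ('0') or missing ('.')."""
    alleles = genotype.replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:
        return False
    return alleles[0] == alleles[1] and alleles[0] != "0"


def homozygous_alt_records(vcf_lines):
    """Keep single-sample VCF data lines whose GT is homozygous non-reference."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no FORMAT/sample columns to inspect
        format_keys = fields[8].split(":")
        sample_values = fields[9].split(":")
        if "GT" not in format_keys:
            continue
        genotype = sample_values[format_keys.index("GT")]
        if is_homozygous_alt(genotype):
            kept.append(line)
    return kept


# Hypothetical records: only the 1/1 and 1|1 calls should be kept.
toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample",
    "scaffold_1\t100\t.\tA\tG\t50\t.\t.\tGT:DP\t1/1:30",
    "scaffold_1\t200\t.\tC\tT\t50\t.\t.\tGT:DP\t0/1:28",
    "scaffold_2\t300\t.\tG\tGA\t50\t.\t.\tGT:DP\t1|1:25",
]
kept = homozygous_alt_records(toy_vcf)
print([record.split("\t")[1] for record in kept])  # prints ['100', '300']
```

Restricting edits to homozygous non-reference calls means that heterozygous sites, which reflect genuine differences between the two haplotypes rather than consensus errors, are left unchanged in the polished sequence.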