The genome sequence of the hazel dormouse, Muscardinus avellanarius (Linnaeus, 1758)

We present a genome assembly from an individual male Muscardinus avellanarius (the hazel dormouse; Chordata; Mammalia; Rodentia; Gliridae). The genome sequence is 2,497.5 megabases in span. Most of the assembly is scaffolded into 24 chromosomal pseudomolecules, including the X and Y sex chromosomes. The mitochondrial genome has also been assembled and is 16.73 kilobases in length.


Background
The hazel or common dormouse, Muscardinus avellanarius (Linnaeus, 1758), is brown to ochre-coloured and has a furred tail. It can be as small as 7 cm, and lives arboreally in forests of all types and age classes, including pure spruce forests, park landscapes and riverine forests, as well as in hedges and shrubberies. The animals hibernate in self-constructed nests on the ground or between rootstocks, and sometimes in nest boxes. The hazel dormouse feeds on seeds, flower buds, berries and other fruits, and nuts, especially hazelnuts. Its diet is mostly vegetarian, but in early summer up to 50% of the food can consist of insects. It is the only extant species of the genus Muscardinus.
The hazel dormouse is found in northern Europe and Asia Minor and is strictly protected in Europe under the Habitats Directive (Annex IV) and the Bern Convention (Annex III). It is the only dormouse native to Britain, where it has been protected under the Wildlife and Countryside Act since its reintroduction in 1993 as part of the English Nature Species Recovery Programme. It has been introduced to Ireland. In the 2006 European report (Kryštufek et al., 2007) as well as the 2016 global assessment (Hutterer et al., 2021) of the IUCN Red List of Threatened Species, the hazel dormouse is listed as 'least concern'. The German Red List classifies the hazel dormouse as 'near threatened', noting that current observations are rare and that the population shows a moderate long-term decline (Meinig et al., 2020). The hazel dormouse is threatened by the loss and resulting fragmentation of woodland habitat and by changes in woodland management practices: it depends on forests in combination with hedges and bushes, because dormice do not cross large open spaces.
Based on mitochondrial markers, Mouton et al. (2012) identified two highly divergent M. avellanarius lineages (lineage 1 in France, Belgium, Switzerland and Italy; lineage 2 in Poland, Germany, Latvia, Lithuania, the Balkan Peninsula and Turkey) and low genetic diversity within the lineages. This underlines the importance of further genomic studies to define conservation units for this species.
Here we present a chromosomally complete genome sequence for Muscardinus avellanarius based on one male specimen from Friesheimer Busch, Germany.

Genome sequence report
The genome was sequenced from one male Muscardinus avellanarius (Figure 1) collected from Friesheimer Busch, Germany. A total of 58-fold coverage in Pacific Biosciences single-molecule HiFi long reads was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 34 missing joins or mis-joins, reducing the scaffold number by 6.58%.
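As a rough illustration of what the 58-fold figure implies, the sketch below converts fold coverage into sequencing yield and back. It assumes coverage is defined as total read bases divided by genome size and uses the 2,497.5 Mb assembly span as the genome size; the note does not state which size estimate the coverage was calculated against, and both function names are hypothetical.

```python
def required_yield_gb(coverage: float, genome_size_mb: float) -> float:
    """Sequence yield (Gb) needed to reach a given mean fold coverage.

    Assumes coverage = total read bases / genome size, with the genome
    size taken as the assembly span (an assumption, not stated in the note).
    """
    return coverage * genome_size_mb / 1000.0


def fold_coverage(total_bases: float, genome_size_bp: float) -> float:
    """Mean fold coverage from the total number of sequenced bases."""
    return total_bases / genome_size_bp


if __name__ == "__main__":
    # 58-fold coverage over a 2,497.5 Mb assembly span
    print(f"~{required_yield_gb(58, 2497.5):.1f} Gb of HiFi data")
```

Under these assumptions, 58-fold coverage corresponds to roughly 145 Gb of HiFi sequence.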
The final assembly has a total length of 2,497.5 Mb in 212 sequence scaffolds with a scaffold N50 of 119.5 Mb (Table 1). The snailplot in Figure 2 provides a summary of the assembly statistics, while the distribution of assembly scaffolds on GC proportion and coverage is shown in Figure 3. The cumulative assembly plot in Figure 4 shows curves for subsets of scaffolds assigned to different phyla. Most (99.78%) of the assembly sequence was assigned to 24 chromosomal-level scaffolds, representing 22 autosomes and the X and Y sex chromosomes. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 5; Table 2). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited. The mitochondrial genome was also assembled and can be found as a contig within the multifasta file of the genome submission.
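The scaffold N50 quoted above follows the usual definition: sort scaffolds from longest to shortest and report the length at which the running total first reaches half of the assembly span (N90 uses 90%). A minimal sketch of that definition, using a hypothetical helper `nx` and a toy list of lengths rather than the real mMusAve1.1 scaffolds:

```python
from typing import Iterable


def nx(lengths: Iterable[int], fraction: float = 0.5) -> int:
    """Return the Nx length of an assembly.

    Nx is the length L such that sequences of length >= L together cover
    at least `fraction` of the total span. fraction=0.5 gives N50 and
    fraction=0.9 gives N90.
    """
    sorted_lengths = sorted(lengths, reverse=True)  # longest first
    if not sorted_lengths:
        raise ValueError("no sequences supplied")
    target = fraction * sum(sorted_lengths)
    running = 0
    for length in sorted_lengths:
        running += length
        if running >= target:
            return length
    return sorted_lengths[-1]  # only reachable through float rounding


if __name__ == "__main__":
    toy = [100, 80, 60, 40, 20]  # toy scaffold lengths, not real data
    print(nx(toy), nx(toy, 0.9))
```

For the toy list, `nx` returns 80 for N50, because 100 + 80 = 180 is the first running total to reach half of the 300-unit span, and 40 for N90. Applied to the 212 scaffold lengths of mMusAve1.1 (not reproduced here), the same definition should give the reported N50 of 119.5 Mb.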

Sample acquisition and nucleic acid extraction
The sequenced individual Muscardinus avellanarius (specimen ID SAN00002415, ToLID mMusAve1) was found dead in Friesheimer Busch, Germany, on 2021-07-21 and then frozen in a standard -20 °C freezer. Astrid Böhne, Christine Thiel-Bender and Sandra Kukowka collected the specimen and identified the species. Subsamples for genome sequencing were taken from the frozen individual with minimal thawing, transferred onto dry ice and subsequently stored at -80 °C until further processing. Permission to take the dead hazel dormouse and use it for research was granted by the responsible authority, the Untere Naturschutzbehörde Rhein-Erft-Kreis, as an exemption under §45 Abs. 4 of the Bundesnaturschutzgesetz (Germany).
The sheared DNA was purified following either the manual solid-phase reversible immobilisation (SPRI) protocol (https://dx.doi.org/10.17504/protocols.io.kxygx3y1dg8j/v1) or the automated SPRI protocol (https://dx.doi.org/10.17504/protocols.io.q26g7p1wkgwz/v1) for higher throughput. In brief, the method uses a 1.8X ratio of AMPure PB beads to sample to eliminate shorter fragments and concentrate the DNA. The concentration of the sheared and purified DNA was assessed using a Nanodrop spectrophotometer.
Further, the Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use. The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible. The overarching areas of consideration are:
• Ethical review of provenance and sourcing of the material

Richard Edwards
Minderoo OceanOmics Centre at UWA, The University of Western Australia Oceans Institute (Ringgold ID: 590767), Perth, Western Australia, Australia
This manuscript presents the genome sequence of the hazel dormouse. The assembly is in very good shape, with high contiguity and chromosome-level scaffolding. It comfortably exceeds the VGP-2020 standards and represents a useful genomic resource for this, and related, organisms.
Overall, the presentation of the data is thorough and clear. I only have a few minor comments for consideration.
○ Abstract: "Most of the assembly is scaffolded into 24 chromosomal pseudomolecules, including the X and Y sex chromosomes." I think this undersells the quality of the genome, which has less than 0.25% of its sequence unplaced (Table 1).
○ Introduction: "... and lives arboreally in forests of all types and age classes," Would it be more accurate to say "temperate forests of all types"? From a global perspective, I don't think this statement is true. It does not live in rainforests, for example.
○ Page 3: "Muscardinus avellanarius based on one male specimen from Friesheimer Busch, Germany": it might be worth clarifying that this is an individual from Lineage 2. (Was this checked?)
○ Page 3, typo: "... corresponding to the second haplotype have also been deposited.The mitochondrial genome…": a space is needed between "deposited." and "The".
○ Table 1: I note that the BUSCO duplication rate is quite high at 2.9%. Is this typical for the lineage, or is it a consequence of having both X and Y in the assembly?
○ Figure 5: It would be useful to have the chromosomes labeled, as the order in Figure 5 and Table 2 is not the same. The X chromosome has a clear lack of contacts. Is this because it is represented at half-depth in the HiC data? The 3' end of some of the chromosomes appears to be lacking HiC signal. Is this expected? Some of the chromosome ends also appear to be incorrectly aligned with the HiC signal, e.g. chromosomes 5, 6, X and 7. This appears to be just a plotting issue, as the interactive map looked fine. Is it possible to fix the plot in the figure?
Is the rationale for creating the dataset(s) clearly described? Yes

Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others?

Details of methods and materials:
The methods section is detailed, covering all aspects from sample acquisition to DNA extraction and sequencing. The protocols are cited with DOIs, providing easy access to the procedures followed. This level of detail is sufficient for replication, as it includes information on sample preparation, DNA fragmentation, and sequencing technologies used.

Presentation of datasets:
The datasets are presented in a clear and accessible manner. The article includes figures such as the BlobToolKit Snailplot and GC-coverage plot, which visually summarize the assembly metrics.
The data is deposited in public repositories, with accession numbers provided for easy access. The comprehensive presentation ensures that the datasets can be readily used by other researchers.
The article successfully presents a high-quality genome assembly of the hazel dormouse, providing a valuable resource for conservation biology. The rationale, methods, and data presentation are all robust, making the study a significant contribution to the field. No major revisions are necessary. The article is well-structured and thorough, meeting all criteria for a robust and reproducible genomic study.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: conservation genetics
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
Mostafa Ghaderi-Zefrehei
Yasouj University, Yasuj, Kohgiluyeh and Boyer-Ahmad Province, Iran
I am writing to express my enthusiastic support for the manuscript titled "The genome sequence of the hazel dormouse, Muscardinus avellanarius (Linnaeus, 1758)", which I reviewed. In my view, this work has been thoughtfully formulated, meticulously executed and is highly insightful. The authors provide noteworthy insights that improve our understanding of this species, making it an important contribution. The authors also did an excellent job of discussing their findings in the context of the existing literature. I am confident that this manuscript will be of great interest to readers of Wellcome Open Research. I highly recommend that it be considered for indexing.
In addition to the strengths mentioned above, and given the fact that I don't consider myself an authority on this particular type of organism, here are some observations regarding your project.
… "The genome was sequenced from one male Muscardinus avellanarius (Figure 1) collected from Friesheimer Busch, Germany. A total of 58-fold coverage in Pacific Biosciences single-molecule HiFi long reads was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 34 missing joins or mis-joins, reducing the scaffold number by 6.58%." …

Strength:
The use of Pacific Biosciences single-molecule HiFi long reads, which span tens of thousands of bases, is a major asset. It allows the assembly of complex regions, such as repetitive sequences and centromeres, that are difficult to resolve with standard short-read sequencing. Another advantage is the use of chromosome conformation Hi-C data to scaffold the primary assembly contigs. Hi-C data capture physical contacts between DNA segments, which helps to order and orient contigs along chromosomes.
Cautions/considerations: Although sequencing a single individual is a typical starting point for genome sequencing initiatives, it is important to acknowledge that it will capture little of the genetic variability among individuals within this species. To gain a comprehensive understanding of the Muscardinus avellanarius genome, including more individuals from distinct populations would be advantageous. The completeness of the assembled genome is not directly indicated in the passage. However, it is important to evaluate this aspect to establish how much of the Muscardinus avellanarius genome has been captured. Such knowledge is significant for downstream analyses and interpretation. Genome annotation, the crucial step of identifying and characterizing genes and other functional elements, is not discussed for Muscardinus avellanarius. Such an undertaking would greatly aid understanding of the organism's genetic composition and biological processes.
"… The final assembly has a total length of 2497.5 Mb in 212 sequence scaffolds with a scaffold N50 of 119.5 Mb (Table 1). The snailplot in Figure 2 provides a summary of the assembly statistics, while the distribution of assembly scaffolds on GC proportion and coverage is shown in Figure 3. The cumulative assembly plot in Figure 4 shows curves for subsets of scaffolds assigned to different phyla. Most (99.78%) of the assembly sequence was assigned to 24 chromosomal-level scaffolds, representing 22 autosomes and the X and Y sex chromosomes. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 5; Table 2). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited. The mitochondrial genome was also assembled and can be found as a contig within the multifasta file of the genome submission ..." Although the assembly is presented as a single haplotype, it is not fully phased. To facilitate studies of heterozygosity and genome-wide association studies, it is crucial to establish the arrangement of alleles on each chromosome through phasing.

Strength:
Validation of the chromosome-scale scaffolds with Hi-C data reinforces confidence in the accuracy and contiguity of the assembly.
Cautions/considerations: Although the assembly is deposited as a single haplotype, it is not fully phased. Phasing plays a crucial role in research on heterozygosity because it determines the arrangement of alleles on each chromosome. The passage also does not directly report the completeness of the assembled genome. Evaluating the completeness of this assembly is vital for establishing how much of the Muscardinus avellanarius genome has been captured.

Is the rationale for creating the dataset(s) clearly described? Partly
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Bioinformatics, Systems Genetics
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Figure 2. Genome assembly of Muscardinus avellanarius, mMusAve1.1: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference, with each bin representing 0.1% of the 2,497,520,702 bp assembly. The distribution of scaffold lengths is shown in dark grey, with the plot radius scaled to the longest scaffold present in the assembly (239,886,027 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (119,481,734 and 64,769,431 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale, with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the glires_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/Muscardinus%20avellanarius/dataset/mMusAve1_1/snail.
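To make the binning described in this caption concrete: with 1,000 bins over 2,497,520,702 bp, each bin covers about 2,497,521 bp. The sketch below lays out size-ordered scaffolds end to end and records, for each bin, the length of the scaffold at the bin's start; dividing by the longest scaffold gives a radius comparable to the dark-grey distribution. It is a hypothetical illustration of the idea, not BlobToolKit's own code, and `snail_bins` is an invented name.

```python
from bisect import bisect_right
from itertools import accumulate


def snail_bins(lengths, n_bins=1000):
    """Return (bin_size, per-bin scaffold length) for size-ordered bins.

    Scaffolds are sorted longest first and laid end to end; each of the
    n_bins bins spans total / n_bins bp. For each bin we report the length
    of the scaffold covering the bin's start coordinate. Hypothetical
    sketch, not BlobToolKit's implementation.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    ends = list(accumulate(ordered))  # cumulative end coordinate of each scaffold
    bin_size = total / n_bins
    per_bin = []
    for i in range(n_bins):
        start = i * bin_size
        idx = bisect_right(ends, start)  # first scaffold whose end lies beyond start
        per_bin.append(ordered[idx])
    return bin_size, per_bin


if __name__ == "__main__":
    size, lengths_per_bin = snail_bins([100, 80, 60, 40, 20], n_bins=10)
    print(size, lengths_per_bin)
```

On this toy assembly with 10 bins, the bin starting at the 50% mark (index 5) falls in the 80-unit scaffold and the bin at the 90% mark (index 9) in the 40-unit scaffold, matching the toy N50 and N90. This is why the N50 and N90 arcs sit at the halfway and 90% positions around the circumference.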

Figure 5. Genome assembly of Muscardinus avellanarius, mMusAve1.1: Hi-C contact map of the mMusAve1.1 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=U7gYOKXVRseTzqyRCHHU5g.

Table 1. Genome data for Muscardinus avellanarius, mMusAve1.1.
The workflow for high molecular weight (HMW) DNA extraction at the Wellcome Sanger Institute (WSI) includes a sequence of core procedures: sample preparation; sample homogenisation; DNA extraction; HMW DNA fragmentation; and fragmented DNA clean-up. The mMusAve1 sample was weighed and dissected on dry ice, with tissue set aside for Hi-C sequencing (as per the protocol at https://dx.doi.org/10.17504/protocols.io.x54v9prmqg3e/v1). For sample homogenisation, thorax tissue was cryogenically disrupted using the Sample Homogenisation: Covaris cryoPREP® Automated Dry Pulverizer

Table 3. Software tools: versions and sources.
DNA concentration was also measured using the Qubit dsDNA High Sensitivity Assay kit. Fragment size distribution was evaluated by running the sample on the FemtoPulse system. Protocols employed by the Tree of Life laboratory are publicly available on protocols.io: https://dx.doi.org/10.17504/protocols.io.8epv5xxy6g1b/v1.
The Hi-C map for the final assembly was produced using bwa-mem2 (Vasimuddin et al., 2019) in the Cooler file format (Abdennur & Mirny, 2020). To assess the assembly metrics, the k-mer completeness and QV consensus quality values were calculated in Merqury (Rhie et al., 2020).
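For readers unfamiliar with these metrics, the sketch below shows the k-mer reasoning behind them as we understand it from Rhie et al. (2020). A k-mer present in the assembly but absent from the reads is taken to contain at least one consensus error, which yields a per-base error estimate and hence a Phred-scaled QV; completeness is the share of reliable read k-mers that also appear in the assembly. The function names are hypothetical, the counts would come from a k-mer counter such as Meryl, and this is not Merqury's own implementation.

```python
import math


def kmer_qv(asm_only_kmers: float, total_asm_kmers: float, k: int) -> float:
    """Phred-scaled consensus quality from assembly-only k-mers.

    A k-mer found in the assembly but not in the reads is assumed to
    carry an error. Per-base accuracy P = (1 - asm_only/total) ** (1/k),
    error E = 1 - P, and QV = -10 * log10(E).
    """
    if asm_only_kmers == 0:
        return math.inf  # no error-bearing k-mers detected
    p_correct = (1 - asm_only_kmers / total_asm_kmers) ** (1 / k)
    return -10 * math.log10(1 - p_correct)


def kmer_completeness(shared_kmers: int, read_solid_kmers: int) -> float:
    """Percentage of reliable (solid) read k-mers also found in the assembly."""
    return 100 * shared_kmers / read_solid_kmers


if __name__ == "__main__":
    k = 21  # an assumed k-mer size for illustration
    # Hypothetical counts chosen so the per-base error rate is 1e-4
    total = 1_000_000_000
    asm_only = (1 - (1 - 1e-4) ** k) * total
    print(round(kmer_qv(asm_only, total, k), 3))
```

For instance, an estimated per-base error rate of 1 in 10,000 corresponds to QV 40.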

The 'Darwin Tree of Life Project Sampling Code of Practice' can be found in full on the Darwin Tree of Life website. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.