The genome sequence of the Western Capercaillie Tetrao urogallus Linnaeus, 1758

We present a genome assembly from an individual male Tetrao urogallus (the Western Capercaillie; Chordata; Aves; Galliformes; Phasianidae). The genome sequence is 1,013.2 megabases in length. Most of the assembly is scaffolded into 39 chromosomal pseudomolecules, including the Z sex chromosome. The mitochondrial genome has also been assembled and is 16.68 kilobases in length.


Background
The Western Capercaillie (Tetrao urogallus) (Figure 1) is the world's largest grouse species. It is found in mixed coniferous forests across Eurasia, from northern Spain through to Russia. While populations in the northern parts of its range, including Scandinavia and Russia, are large, there have been dramatic declines in central and western Europe, likely due to habitat fragmentation and hunting. In the UK, the capercaillie was driven to extinction in the 18th century, and the present-day population is the result of a successful reintroduction in Scotland during the 1830s (Stevenson, 2007). However, after reaching an estimated population size of 20,000 birds in the 1970s, the capercaillie is once again facing extinction in the UK (Moss et al., 2000; Wilkinson et al., 2024). Huge declines have occurred in the last few decades, with the most recent national survey estimating that only 532 birds remain (Wilkinson et al., 2024). Since the first UK listings of 'Birds of conservation concern' in 1996, the capercaillie has been included as a red-list priority species. The capercaillie's unusual mating system consists of spring leks, in which males congregate in open forest areas to display via dances and clicking and popping sounds. These sounds are thought to have led to its name in Gaelic, capall coille, meaning 'horse of the woods'. These displays, its historic status as a game bird and its association with the last remaining wild Caledonian pine forests have led to its iconic status in Scotland.
Capercaillie require extensive areas of woodland dominated by Scots pine (Pinus sylvestris), which in the UK is only available in Scotland. This specific habitat need makes capercaillie particularly vulnerable to habitat fragmentation and unfavourable forest management. Limited and fragmented habitat is a recognised cause of population decline and can lead capercaillie populations to become isolated. Less habitat means capercaillie may also be more prone to the impacts of predation and human disturbance. The Review of Capercaillie Conservation and Management commissioned by NatureScot in 2021 included consideration of several fundamental issues facing the species, including predation and human disturbance (NatureScot, 2022). Mortality associated with deer fence collisions and reduced breeding success associated with high April temperatures and high June rainfall were also cited as fundamental issues.
With the UK capercaillie population now at a critically low level, the Cairngorms National Park Authority and NatureScot are tasked with bringing together stakeholders from across the spectrum to explore a range of options to help the species. This includes coordinating activities ranging from fence marking and removal to working with access takers, expanding pinewood habitat at landscape scale and exploring the feasibility of reinforcement.
An adaptive, evidence-led approach to improve management for this species is the ultimate aim, and a key part of this approach has been the generation of genomic tools to improve management decisions. The RZSS WildGenes team at the Royal Zoological Society of Scotland has generated genetic data from tissue and blood samples from across Europe to create a panel of target enrichment probes that can be applied to degraded but non-invasively collected feather and faecal samples. The collection of these sample types minimises disturbance to the remaining birds and will add insight to the field approaches currently in use. The new panel of genomic markers is being used to investigate the geographic origin of the capercaillie in Scotland, population structure, genetic diversity and individual identification. The use of the enrichment probes, however, relies on mapping of sequences to a reference genome; until now, the Greater prairie chicken genome has been used, with limited success. The generation of a capercaillie genome will now allow researchers to identify a greater number of variable markers within this species, increasing our ability to monitor the population in future. By identifying unique individual genetic signatures in the samples, the aim is to improve the accuracy of current population estimates, not only in Scotland, but across Europe.

Genome sequence report
The genome was sequenced from a male Tetrao urogallus found deceased in Cairngorms, Scotland, UK. A total of 23-fold coverage in Pacific Biosciences single-molecule HiFi long reads was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 66 missing joins or mis-joins and removed 2 haplotypic duplications, reducing the scaffold number by 7.29%.
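As a rough illustration of what 23-fold coverage implies, the sketch below relates fold coverage to total sequenced bases. It assumes coverage is calculated against the final assembly length (1,013,184,029 bp, from Figure 2); the note does not state which denominator was used, and the function names are hypothetical.

```python
def fold_coverage(read_lengths_bp, genome_size_bp):
    """Mean fold coverage: total sequenced bases divided by genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return sum(read_lengths_bp) / genome_size_bp


def bases_for_coverage(target_fold, genome_size_bp):
    """Total read bases needed to reach a target fold coverage."""
    return target_fold * genome_size_bp


# Assumption: the assembly length stands in for the genome size.
ASSEMBLY_BP = 1_013_184_029

approx_hifi_bases = bases_for_coverage(23, ASSEMBLY_BP)
print(f"~{approx_hifi_bases / 1e9:.1f} Gb of HiFi reads for 23-fold coverage")
# prints "~23.3 Gb of HiFi reads for 23-fold coverage"
```

Under that assumption, 23-fold coverage corresponds to roughly 23.3 Gb of HiFi read data; the true yield may differ if coverage was computed against an estimated genome size instead.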
The final assembly has a total length of 1,013.2 Mb in 317 sequence scaffolds with a scaffold N50 of 71.4 Mb (Table 1). The snail plot in Figure 2 provides a summary of the assembly statistics, while the distribution of assembly scaffolds on GC proportion and coverage is shown in Figure 3. The cumulative assembly plot in Figure 4 shows curves for subsets of scaffolds assigned to different phyla. Most (99%) of the assembly sequence was assigned to 39 chromosomal-level scaffolds, representing 38 autosomes and the Z sex chromosome. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 5; Table 2). The Z sex chromosome was identified by alignment to Gallus gallus (GCA_016699485.1). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited. The mitochondrial genome was also assembled and can be found as a contig within the multifasta file of the genome submission. The estimated Quality Value (QV) of the final assembly is 59.5 with k-mer completeness of 100.0%, and the assembly has a BUSCO v5.3.2 completeness of 96.6% (single = 96.3%, duplicated = 0.3%), using the aves_odb10 reference set (n = 8,338).
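For readers less familiar with the summary statistics above, the following minimal sketch shows how a scaffold N50 (or N90) and the error rate implied by a Phred-scaled QV are conventionally calculated. The scaffold lengths are toy values, not the bTetUro1.1 scaffolds, and the function names are hypothetical; this is not the code used to produce the reported figures.

```python
def nx(lengths, x=50):
    """Return the Nx length: the largest length L such that scaffolds of
    length >= L together contain at least x% of the total assembly length."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * x:
            return length
    return min(lengths)  # only reached if x > 100


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


# Toy scaffold lengths (illustrative only), total = 300.
toy_lengths = [100, 80, 60, 40, 20]
print(nx(toy_lengths, 50))  # 80
print(nx(toy_lengths, 90))  # 40

# QV 59.5, as reported for this assembly.
print(f"{qv_to_error_rate(59.5):.2e}")  # 1.12e-06
```

On the Phred scale, a QV of 59.5 corresponds to roughly one expected base error per 890,000 bases, which is why values approaching 60 are regarded as high consensus accuracy.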

Sample acquisition
A male capercaillie carcass (specimen ID SAN0001380, ToLID bTetUro1) was found in Anagach Wood, Strathspey, Scotland on 2020-06-01. The carcass weighed 3.72 kg and was stored frozen at -20°C until a post-mortem was conducted on 2020-08-04. The post-mortem did not identify a cause of death; however, a skeletal muscle tissue sample was taken and placed in 90% ethanol. This sample was stored at -20°C until transfer to Darwin Tree of Life. The carcass was collected [...] HMW DNA was extracted using the Automated MagAttract v2 protocol (Oatley et al., 2023a). DNA was sheared into an [...] Protocols developed by the WSI Tree of Life laboratory are publicly available on protocols.io (Denton et al., 2023).

Sequencing
Pacific Biosciences HiFi circular consensus DNA sequencing libraries were constructed according to the manufacturers' instructions. DNA sequencing was performed by the Scientific Operations core at the WSI on a Pacific Biosciences SEQUEL II instrument. Hi-C data were also generated from muscle tissue of bTetUro1 using the Arima2 kit and sequenced on the Illumina NovaSeq 6000 instrument.

Genome assembly, curation and evaluation
Assembly was carried out with Hifiasm (Cheng et al., 2021) and haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). The assembly was then scaffolded with Hi-C data (Rao et al., 2014) using YaHS (Zhou et al., 2023). The assembly was checked for contamination and corrected as described previously (Howe et al., 2021). Manual curation was performed using HiGlass (Kerpedjiev et al., 2018) and PretextView (Harry, 2022). The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva et al., 2023), which runs MitoFinder (Allio et al., 2020) or MITOS (Bernt et al., 2013) and uses these annotations to select the final mitochondrial contig and to ensure the general quality of the sequence.

Table 3. Software tools: versions and sources.
1. When referencing "aves_odb10", please include the full designation as "avian database (aves_odb10)" to ensure clarity for readers who may not be familiar with BUSCO databases.
2. Has karyotyping been performed to verify the chromosome count?
3. The use of the Ensembl pipeline for annotation is only briefly mentioned in the data availability section. It would be beneficial to discuss this more prominently in the methods and results sections, including details such as the number of genes identified.

Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Genomics
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Reviewer Expertise: genomics, genome assembly, gene prediction, annotation
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Figure 1. Scottish lek site in Caledonian pine forest; displaying male with 6 females. Credit: Mark Hamblin.

Figure 2. Genome assembly of Tetrao urogallus, bTetUro1.1: metrics. The BlobToolKit snail plot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 1,013,184,029 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (193,917,226 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (71,401,156 and 12,783,881 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the aves_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/bTetUro1_1/dataset/bTetUro1_1/snail.

Figure 3. Genome assembly of Tetrao urogallus, bTetUro1.1: BlobToolKit GC-coverage plot. Sequences are coloured by phylum. Circles are sized in proportion to sequence length. Histograms show the distribution of sequence length sum along each axis. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/bTetUro1_1/dataset/bTetUro1_1/blob.

Figure 4. Genome assembly of Tetrao urogallus, bTetUro1.1: BlobToolKit cumulative sequence plot. The grey line shows cumulative length for all sequences. Coloured lines show cumulative lengths of sequences assigned to each phylum using the buscogenes taxrule. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/bTetUro1_1/dataset/bTetUro1_1/cumulative.

Figure 5. Genome assembly of Tetrao urogallus, bTetUro1.1: Hi-C contact map of the bTetUro1.1 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=JQ17gbhIQJWxkHqBdByFYQ.

Reviewer Report 30 April 2024
https://doi.org/10.21956/wellcomeopenres.23514.r80822
© 2024 Alioto T. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Tyler Alioto
Fundacion Centro Nacional de Analisi Genomico (Ringgold ID: 478092), Barcelona, Catalonia, Spain
The genome note submitted by Ball et al. reports the genome sequence of the Western Capercaillie, Tetrao urogallus. Although Northern populations are large, this grouse species is endangered in the UK, with fewer than a thousand individuals estimated to remain. The assembly is chromosome-scale and of high quality, meeting the minimum standards recommended by the Earth Biogenome Project. All protocols are appropriate and well-documented. All data conform with FAIR principles, with read data and assemblies being available in the ENA. Blobtoolkit figures are interactive. Additional QC data are provided at https://tolqc.cog.sanger.ac.uk/, supplementing the figures published in the data note.
Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.

Table 3 contains a list of relevant software tool versions and sources.

Wellcome Sanger Institute - Legal and Governance
The materials that have contributed to this genome note have been supplied by a Darwin Tree of Life Partner. The submission of materials by a Darwin Tree of Life Partner is subject to the 'Darwin Tree of Life Project Sampling Code of Practice', which can be found in full on the Darwin Tree of Life website. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project. Further, the Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use. The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible. The overarching areas of consideration are: