The First Genome of the Cold-Water Octocoral, the Pink Sea Fan, Eunicella verrucosa

Abstract
Cold-water corals form an important part of temperate benthic ecosystems by increasing three-dimensionality and providing an important ecological substrate for other benthic fauna. However, the fragile three-dimensional structure and life-history characteristics of cold-water corals can leave populations vulnerable to anthropogenic disturbance. Meanwhile, the ability of temperate octocorals, particularly shallow-water species, to respond to changes in their environment linked to climate change has not been studied. This study reports the first genome assembly of the pink sea fan (Eunicella verrucosa), a temperate shallow-water octocoral species. We produced an assembly of 467 Mb, comprising 4,277 contigs and an N50 of 250,417 bp. In total, 213 Mb (45.96% of the genome) comprised repetitive sequences. Annotation of the genome using RNA-seq data derived from polyp tissue and gorgonin skeleton resulted in 36,099 protein-coding genes after 90% similarity clustering, capturing 92.2% of the complete Benchmarking Universal Single-Copy Orthologs (BUSCO) genes. Functional annotation of the proteome using orthology inference identified 25,419 annotated genes. This genome adds to the very few genomic resources currently available in the octocoral community and represents a key step in allowing scientists to investigate the genomic and transcriptomic responses of octocorals to climate change.


Introduction
The pink sea fan, Eunicella verrucosa, is a temperate octocoral within the soft coral order Malacalcyonacea (formerly Alcyonacea; see McFadden et al. 2022) and a member of the Gorgoniidae family. This species is distributed across the northeast Atlantic from western Ireland to (reportedly) the coast of Mauritania in West Africa (Hayward and Ryland 2017) and as far east as the Aegean Sea (Chimienti et al. 2020). The species is mostly found in dense "forest-like" aggregations (Chimienti et al. 2020; Jenkins and Stevens 2022), while at its range edge, for example, Pembrokeshire, southwest Wales (Holland et al. 2017), it exhibits a patchy distribution. Its depth ranges from 3 to 50 m within the northeast Atlantic (Readman and Hiscock 2017) and down to 200 m in the Mediterranean Sea (Sartoretto and Francour 2012; Chimienti et al. 2019).
Gorgonians often act as key ecological substrates for many epifauna, increasing the structural complexity of benthic ecosystems (Wood 2013; Pikesley et al. 2016). For E. verrucosa, slow growth (Coz et al. 2012), longevity, and physical three-dimensional structure can render local populations vulnerable to ecological pressures, including physical disturbance (Readman and Hiscock 2017) and disease (Hall-Spencer et al. 2007). Given the current distribution of E. verrucosa, and observations of the relationship between thermal regime and distribution in other octocorals (Ferrier-Pages et al. 2009; Haguenauer et al. 2013; Arizmendi-Mejía et al. 2015; Crisci et al. 2017; Oualid et al. 2023), seawater temperature may be a key pressure underpinning local population persistence. Despite this, no dedicated study of the effect of temperature on this species has been conducted (although see Jenkins and Stevens 2022), whilst genomic and transcriptomic analyses have been limited by the lack of a genome for the species.
Across most of its range, E. verrucosa is protected under the EU Habitats Directive Annex 1 and under Ecologically or Biologically Significant Marine Areas (EBSAs) throughout the Mediterranean Sea. In the United Kingdom, it is a "protected feature" used for the designation of Marine Protected Areas (MPAs), and previous research into genetic connectivity across southwest Britain (Holland et al. 2013, 2017) has been used to assess whether MPAs represent an "ecologically coherent" network (Jenkins and Stevens 2018).
Ecological research questions are increasingly focused on the adaptive potential of marine taxa to environmental change and how subsequent conservation measures, such as MPA designations, can be more resilient to future anthropogenic and ecological pressures (Donelson et al. 2019; Hoppit et al. 2022). Very few genomic resources for octocorals are currently available, hindering such investigations into their adaptive capacity and the implications this may have for effective conservation and mitigation practices. This report presents the first annotated genome of the pink sea fan, which will augment the limited genomic resources available in octocoral research, allowing scientists to investigate hypotheses concerning the species' potential responses to environmental change.

Results and Discussion
Assembly
We generated 29.96 Gb (∼46.8-fold coverage) of PacBio circular consensus reads (>1 kb in length), producing an initial genome assembly of 467 Mb comprising 11,043 contigs with an N50 of 183,250 bp ("Raw" assembly, fig. 1C). After identification and removal of 6,766 haplotigs (supplementary fig. S1, Supplementary Material online), we produced a final assembly with 4,277 contigs, with an improved N50 of 250,417 bp (fig. 1B; "Purged" assembly, fig. 1C). Preliminary Benchmarking Universal Single-Copy Orthologs (BUSCO) assessment using the metazoan conserved orthologs (n = 954) showed a completeness score of 86.5% (82.8% single-copy, 3.7% duplicated, 6.9% fragmented, and 6.6% missing) (fig. 1B and 1C). In comparison with the very few available octocoral genomes, the number of contigs suggests a more contiguous assembly than that of Paramuricea clavata and fewer missing BUSCO genes than the assemblies of P. clavata and Trachythela sp. (table 1).
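As an aside on the contiguity metric quoted above, N50 is the contig length at which the cumulative span of contigs, sorted from longest to shortest, first reaches half of the total assembly span. The following is a minimal sketch of that calculation; the function name `n50` and the toy contig lengths are illustrative assumptions and are not taken from the E. verrucosa assembly or from the assembler's own reporting code.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly span.
    """
    if not lengths:
        raise ValueError("no contig lengths supplied")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest contig down, accumulating span.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Toy contig lengths in bp (hypothetical, for illustration only).
toy_contigs = [100, 80, 60, 40, 20]
toy_n50 = n50(toy_contigs)
print(toy_n50)
```

Because removing short haplotigs shrinks the total span without touching the longest contigs, a purging step of this kind would generally be expected to leave N50 unchanged or raise it, which is consistent with the increase reported here.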
We identified 213 Mb (45.96%) of repetitive sequence in the genome assembly, of which 18% comprised unclassified repeats and the remaining 27% were categorized into repeat families, the highest being DNA repeats (∼12%) (fig. 1D and supplementary table S2, Supplementary Material online). This is comparable with the genomes of P. clavata (Ledoux et al. 2020) and Trachythela sp. (Zhou et al. 2020), which had 49% and 58.9% of the genome composed of repetitive elements, respectively (table 1). When compared with the genome of Dendronephthya gigantea (12% repetitive elements), a much greater proportion of repetitive elements was identified in E. verrucosa, but D. gigantea has a considerably smaller assembly size (table 1).

Annotation
We performed gene prediction using paired-end RNA-seq data. The initial annotation ("Raw" annotation, fig. 1C) was produced using the initial gene predictions, recovering 41,933 genes. BUSCO analysis showed high proteome contiguity and completeness: 92.2% complete BUSCO genes (85.1% single-copy, 7.1% duplicated, 3.7% fragmented, and 4.1% missing) but a high number of duplicated genes (fig. 1C and supplementary table S2, Supplementary Material online), indicating redundancy in this initial gene set. Filtering this initial proteome for the longest isoform reduced the BUSCO gene duplication from 7.1% to 3.9% ("Longest isoform" annotation, fig. 1C and supplementary table S1, Supplementary Material online). Despite this, the number of annotated genes was still higher than expected (40,003 protein-coding genes), especially given the number of genes annotated in other octocorals (table 1). Annotations were therefore filtered for 90% clustering similarity, removing 3,904 genes and resulting in an annotation containing 36,099 genes ("90% similarity" annotation, fig. 1C). This final annotation contained 92.2% complete BUSCOs and lowered the duplication rate further from 3.9% to 2.3% (fig. 1C). Overall, this indicates that the E. verrucosa proteome has the second highest BUSCO gene completeness among available octocoral proteomes (table 1). Functional annotation of the final annotation using eggNOG-mapper and InterProScan identified 25,419 functionally annotated genes, containing 92.3% complete BUSCO genes (88.6% single-copy, 3.7% duplicated, 3.7% fragmented, and 4.0% missing). All versions of these gene annotations are available on GitHub (https://github.com/klmacleod/pinkseafan-genome).
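The gene counts and BUSCO percentages reported in this section can be cross-checked with simple arithmetic: "complete" BUSCOs are the sum of single-copy and duplicated genes, and the four categories should total 100%. The sketch below does this bookkeeping using only figures quoted in the text; the variable names and the helper `summarise` are illustrative assumptions and are not part of BUSCO itself.

```python
# Gene counts reported in the text for the last two filtering stages.
longest_isoform_genes = 40_003
removed_by_clustering = 3_904
final_genes = longest_isoform_genes - removed_by_clustering

# BUSCO categories (percent) as reported in the text:
# (single-copy, duplicated, fragmented, missing).
busco_reports = {
    "purged assembly": (82.8, 3.7, 6.9, 6.6),
    "raw annotation": (85.1, 7.1, 3.7, 4.1),
    "functional annotation": (88.6, 3.7, 3.7, 4.0),
}


def summarise(single, duplicated, fragmented, missing):
    """Return (complete %, total %) for one BUSCO report."""
    complete = round(single + duplicated, 1)
    total = round(single + duplicated + fragmented + missing, 1)
    return complete, total


print(final_genes)
for label, categories in busco_reports.items():
    complete, total = summarise(*categories)
    print(f"{label}: complete={complete}%, total={total}%")
```

With these inputs, the final gene count and the completeness values agree with those stated in the text (36,099 genes; 86.5%, 92.2%, and 92.3% complete).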

Comparative Proteome Analysis
Using our annotation data sets, we performed a comparative analysis using other available coral proteomes.
Filtering of D. gigantea and Stylophora pistillata annotation sequences at 90% sequence similarity using CD-HIT produced reference Blast databases of 18,649 and 22,553 genes, respectively. BlastP of the E. verrucosa proteome against both reference proteomes, using an e-value cut-off of 0.00001 and then filtering for a 95% overlap hit ratio, gave 30,308 hits against D. gigantea and 25,876 hits against S. pistillata. Protein comparison of shared functional genes, annotated via eggNOG-mapper and InterProScan, indicated that 23,827 (93.7%) genes were present in all

This first annotated genome of E. verrucosa represents a key tool in supporting future investigation into any potential genomic and transcriptomic mechanisms underpinning octocoral responses to environmental change and may also aid comparative analysis between octocorals and the more widely studied tropical, stony corals.

Sample Collection
A detailed description of the sample collection can be found in the Supplementary Material. Six colonies were collected via SCUBA at 8.5-12.7 m depth in Plymouth Sound, England (lat. 50.33, long. −4.14) (L/2019/00143), representing the northern region of the species' global distribution. Colonies were transported in chilled seawater to a 350-L artificial seawater tank at the Aquatic Resources Centre, University of Exeter. Colonies were left to acclimate for 19 months at 14.3 °C (±0.5 °C) prior to DNA extraction.

Genomic DNA Extraction, Library Preparation, and Sequencing
Extracting DNA and RNA of sufficient quality and quantity from octocoral polyps is notoriously challenging. Whilst the underlying reasons for this are not well understood, we have dedicated significant effort to optimizing extraction (detailed protocols are available in the Supplementary Material). Briefly, genomic DNA was extracted from polyp tissue using a salting-out protocol (Jenkins et al. 2019) optimized for the semi-rigid, gorgonian protein tissue of E. verrucosa. DNA integrity was assessed on a 1% agarose gel, purity using a NanoDrop 1000 spectrophotometer, and concentration using the Invitrogen dsDNA HS Assay kit and a Qubit 4 Fluorometer. DNA extractions were cleaned using the Qiagen DNeasy PowerClean Pro Cleanup Kit according to the manufacturer's protocol up to the DNA precipitation step, which was performed using isopropanol, followed by elution of genomic DNA as in the salting-out protocol.
The quality of genomic DNA was checked on a pulsed field gel (Bio-Rad Chef-DR II). SMRTbell libraries were prepared using a SMRTbell Express Template Prep Kit 2.0 (Pacific Biosciences) including a size selection of 15 kb or greater using a BluePippin (BPLUS10, Sage Science). The library was diffusion loaded at 5 pM on the PacBio Sequel I using SMRTcell 1Mv3. Data were sequenced across three SMRT cells (expected output range of 120-180 Gb). Sequencing was carried out by the University of Exeter Sequencing Service.

Genome Assembly and Quality Assessment
An estimate of the expected size of the E. verrucosa genome was not available, and we did not generate short-read data to estimate the genome size using k-mer profiling. We therefore used the C-value of the only soft coral for which one was available, Sarcophyton sp. (640 Mbp; Adachi et al. 2017), as a guide for the expected genome assembly span.
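To make the role of this guide size concrete, fold coverage is simply the total number of sequenced bases divided by the assumed genome size. The sketch below assumes that the read yield quoted in the Results (29.96) is in gigabases and uses the 640 Mbp Sarcophyton sp. C-value as the denominator; the function name `fold_coverage` is an illustrative assumption.

```python
def fold_coverage(total_bases, genome_size_bases):
    """Return sequencing depth as total bases over genome size."""
    if genome_size_bases <= 0:
        raise ValueError("genome size must be positive")
    return total_bases / genome_size_bases


# Read yield from the Results, assumed here to be gigabases.
read_bases = 29.96e9
# Guide genome size taken from the Sarcophyton sp. C-value (640 Mbp).
guide_genome_size = 640e6

coverage = fold_coverage(read_bases, guide_genome_size)
print(f"{coverage:.1f}-fold")
```

Under these assumptions the result matches the ∼46.8-fold coverage reported in the Results. Using the final assembly span (467 Mb) as the denominator instead would give a higher figure, so the choice of denominator should be stated whenever coverage is quoted.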
The genome was assembled using Flye v2.8.3 (Kolmogorov et al. 2019) with default settings for PacBio long reads. Based on initial assessment of assembly contiguity and BUSCO completeness, reads shorter than 1,000 bp were excluded from the assembly. Contamination of contigs was assessed using Blobtools v1.1.1 (Laetsch et al. 2020), but no evidence of contamination was found (supplementary figs. S3, and (Guan et al. 2020). To evaluate the assembly, the presence and completeness of orthologs were assessed with BUSCO v5.1.3 (Simão et al. 2015) using the metazoan database containing 954 genes.

RNA-Seq Extraction, Library Preparation, and Sequencing
RNA-seq data were collected from E. verrucosa colonies which had undergone an ex situ thermal experiment. Briefly, 10 cm fragments were individually exposed to thermal regimes representing the minima and maxima across the species' range. After 24 h, whole fragments were flash frozen in liquid nitrogen and stored at −80 °C. In total, 24 fragments were sampled. RNA extraction was conducted using QIAzol Lysis Reagent (QIAGEN) and 1-bromo-3-chloropropane (BCP), and RNA-seq libraries were prepared from total RNA using the NEBNext® Ultra™ RNA Library Prep Kit (see Supplementary Materials). Quantified libraries were pooled and paired-end sequenced on a NovaSeq 6000 S4 flow cell (Illumina, Inc).
Functional annotations were assigned to the final set of predicted genes using eggNOG-mapper v2 (Cantalapiedra et al. 2021), which examines orthologous gene clustering through the detection of orthologous groups, and InterProScan v5.61-93 (Jones et al. 2014), which performs annotation of protein family and domain information through integration of protein signatures. Functional annotations from both sources were then integrated using annotate from funannotate v1.7.4 (https://github.com/nextgenusfs/funannotate) to produce a final annotation file that was then assessed for BUSCO completeness.

Comparative Analysis
To further assess the final set of functionally annotated genes, the number of functional genes shared with other corals was compared. The proteomes of D. gigantea (Jeon et al. 2019) (GCF_004324835.1) and S. pistillata (Voolstra et al. 2017) (GCF_002571385.1) were downloaded from the NCBI database, and annotations were clustered by 90% similarity using CD-HIT (Fu et al. 2012). The number of shared homologous genes was assessed using BlastP (Altschul et al. 1990) searches of the E. verrucosa proteome against these reference proteomes. Annotations of predicted protein-coding genes were identified using eggNOG-mapper v2 and InterProScan v5.61-93. A recent genome assembly of the deep-water octocoral Trachythela sp. (GCA_016169945.1) is available online but unfortunately arrived too late for inclusion in our comparative analysis.
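The hit filtering described for the BlastP comparison (an e-value cut-off of 0.00001 followed by a 95% overlap filter) can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline used in this study: the text does not define "overlap hit ratio", so it is taken here to be alignment length divided by query length, and the input is assumed to be tab-separated BLAST output with the columns qseqid, sseqid, pident, length, qlen, and evalue. The function name `filter_hits` and the toy records are hypothetical.

```python
import csv
import io

EVALUE_CUTOFF = 1e-5   # e-value cut-off quoted in the text
MIN_OVERLAP = 0.95     # 95% overlap; definition assumed (see lead-in)


def filter_hits(handle, evalue_cutoff=EVALUE_CUTOFF, min_overlap=MIN_OVERLAP):
    """Return query IDs with at least one hit passing both filters.

    Assumed column order per row:
    qseqid, sseqid, pident, length, qlen, evalue
    """
    kept = set()
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue
        qseqid, _sseqid, _pident, length, qlen, evalue = row
        # Overlap is assumed to be alignment length over query length.
        overlap = int(length) / int(qlen)
        if float(evalue) <= evalue_cutoff and overlap >= min_overlap:
            kept.add(qseqid)
    return kept


# Toy records (hypothetical): geneA passes, geneB fails the overlap
# filter, geneC fails the e-value filter.
toy_table = (
    "geneA\tref1\t88.0\t196\t200\t1e-50\n"
    "geneB\tref2\t91.0\t100\t200\t1e-30\n"
    "geneC\tref3\t75.0\t200\t200\t1e-3\n"
)
shared = filter_hits(io.StringIO(toy_table))
print(sorted(shared))
```

Counting unique query IDs, as done here, gives the number of query proteins with a qualifying match; counting rows instead would give the number of hits, which can exceed the size of the reference database.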

Supplementary Material
Supplementary data are available at Genome Biology and Evolution online (http://www.gbe.oxfordjournals.org/).