Long-Read–Based Hybrid Genome Assembly and Annotation of Snow Algal Strain CCCryo 101-99 (cf. Sphaerocystis sp., Chlamydomonadales)

Abstract
Polar regions harbor a diversity of cold-adapted (cryophilic) algae, which can be categorized into psychrophilic (obligate cryophilic) and cryotrophic (nonobligate cryophilic) snow algae. Both can accumulate significant biomasses on glacier and snow habitats and play major roles in global climate dynamics. Despite their significance, genomic studies on these organisms remain scarce, hindering our understanding of their evolutionary history and adaptive mechanisms in the face of climate change. Here, we present the draft genome assembly and annotation of the psychrophilic snow algal strain CCCryo 101-99 (cf. Sphaerocystis sp.). The draft haploid genome assembly is 122.5 Mb in length and is represented by 664 contigs with an N50 of 0.86 Mb, a Benchmarking Universal Single-Copy Orthologs (BUSCO) completeness of 92.9% (n = 1,519), a maximum contig length of 5.3 Mb, and a guanine-cytosine (GC) content of 53.1%. In total, 28.98% of the genome (35.5 Mb) contains repetitive elements. We identified 417 noncoding RNAs and annotated the chloroplast genome. The predicted proteome comprises 14,805 genes with a BUSCO completeness of 97.8%. Our preliminary analyses reveal a genome with a higher repeat content compared with mesophilic chlorophyte relatives, alongside enrichment in gene families associated with photosynthesis and flagella functions. Our current data will facilitate future comparative studies, improving our understanding of the likely response of polar algae to a warming climate as well as their evolutionary trajectories in permanently cold environments.


Significance
In polar regions, cold-adapted algae thrive and bloom, yet genomic studies on these organisms are scarce. This is hindering our understanding of their adaptive mechanisms in the face of climate change. Here, we provide the first genome assembly of the psychrophilic snow algal strain CCCryo 101-99, revealing a higher repeat content compared with mesophilic relatives, and enrichments in gene families crucial for photosynthesis and flagellar functions. These genome data will allow further comparative analyses and help us obtain critical insights into the adaptive strategies of polar algae and their responses to a changing climate.

Introduction
Polar and alpine regions are characterized by extreme environmental conditions with snow and ice covering ∼12% of the Earth's land surface. Significant yet not well-studied components of the polar biota are cold-adapted eukaryotic snow and glacier ice algae, which are the main primary producers in these ecosystems. They shape local biogeochemical cycles and affect albedo through their pigmentation, thus enhancing melting (Lutz et al. 2016; Chevrollier et al. 2023). On semipermanent and permanent snow fields, different snow algae phenotypes bloom during the summer melt seasons, despite the harsh conditions characterized by low temperatures, high irradiation, and oligotrophic nutrient levels. Snow algae produce green, yellow, orange, and red algal blooms on snow fields worldwide (Lutz et al. 2016; Segawa et al. 2018). The ever-lengthening melt seasons, particularly in future warming climate scenarios, will lead to an expansion of the habitat extent of snow algae, further exacerbating glacier melt (Lutz et al. 2014; Segawa et al. 2018).
Various chlorophyte algae species belonging to the genera Chlainomonas, Chlamydomonas, Chlorominima, Chloromonas, Limnomonas, Microglena, Raphidonema, Sanguina, and Scotiella have been described from snow habitats, and their ecology, physiology, biogeographical distribution, and taxonomic diversity have been characterized (e.g. Remias et al. 2018; Procházková et al. 2019; Tesson and Pröschold 2022; Novis et al. 2023 and references therein). Molecular studies, particularly metabarcoding and metagenomic approaches, have only recently provided the first critical insights into polar biodiversity and evolution and adaptation to permanently cold environments (Clark et al. 2023). However, most work has focused on marine settings, and there are only a few available psychrophilic algae genomes (Blanc et al. 2012; Mock et al. 2017; Zhang et al. 2020, 2021; Hulatt et al. 2024). This hampers our ability to understand their evolutionary history, metabolic potential, and cellular responses and to better address their potential roles in future global warming climate scenarios. Here, we present extensive DNA and RNA sequencing (RNA-seq) data, a long-read-based draft assembly, annotations, and preliminary functional enrichment results of the genome of the psychrophilic snow algal strain CCCryo 101-99 (cf. Sphaerocystis sp.).

Results and Discussion
Genome Assembly
DNA sequencing using Illumina, PacBio, and Nanopore generated 31,984,466 (7.55 Gb), 1,986,964 (14.1 Gb, read N50: 12.8 kb), and 1,591,524 (1.9 Gb, read N50: 2 kb, max: 400 kb) reads, respectively. For the estimated genome size of 160 Mb, PacBio and Nanopore reads cumulatively provided 98× sequencing depth (86× and 12×, respectively). For error correction with Illumina short reads, 21,366,832 (3.7 Gb) trimmed and quality-filtered reads were mapped onto the long-read assembly with a mapping rate of 98.8%. We compared the metrics and qualities of the assemblies produced by different error correction strategies and assemblers (supplementary table S1, Supplementary Material online). The Canu hybrid long-read assembly corrected with Illumina short reads was the longest and most contiguous and had the highest mapping rate and BUSCO coverage and the smallest number of assembly errors. An initial error correction step with long reads decreased the BUSCO coverage and contig N50 metrics and introduced additional assembly errors, so the correction was performed using short reads only.
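The depth figures above follow from dividing sequenced bases by the estimated genome size. The following is a minimal sketch of that arithmetic, not the authors' script; the function and variable names are illustrative, and the yields are the rounded values quoted in the text. Because those yields are rounded, the results (roughly 88× and 12×) need not match the reported 86× and 12× exactly.

```python
def sequencing_depth(total_bases: float, genome_size: float) -> float:
    """Average sequencing depth = sequenced bases / genome size (both in bp)."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# k-mer-based genome size estimate quoted in the text.
GENOME_SIZE_BP = 160e6

# Rounded long-read yields quoted in the text (assumption: used as-is).
yields_bp = {"PacBio": 14.1e9, "Nanopore": 1.9e9}

depths = {
    platform: sequencing_depth(bases, GENOME_SIZE_BP)
    for platform, bases in yields_bp.items()
}

for platform, depth in depths.items():
    print(f"{platform}: {depth:.1f}x")
print(f"Combined long-read depth: {sum(depths.values()):.1f}x")
```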
In contaminant screening, 107 contigs comprising 73% of the total genome length (∼90 Mb) were assigned to Chlorophyta and 8 contigs (1.8 Mb) to Streptophyta, and 557 contigs (∼30 Mb) had no hits. In total, 66 contigs totaling 380 kb that were assigned to Actinomycetota and Pseudomonadota were removed. The corrected and filtered haploid draft genome is 122.5 Mb in length and is represented by 664 contigs (68.3× average read coverage) with an N50 of 0.86 Mb, a BUSCO completeness of 92.9% (complete: 92.9% [single-copy: 92.4%, duplicated: 0.5%], fragmented: 2.5%, missing: 4.6%, chlorophyta_odb10, n = 1,519, mode = genome), a maximum contig length of 5.3 Mb, and a guanine-cytosine (GC) content of 53.1% (Table 1). The total genome size was estimated to be around 160 Mb based on k-mer counts of Illumina sequencing reads. One contig, 458 kb in length and assigned to Chlorophyta, was identified as a candidate chloroplast genome based on GC content (36.5%) and read coverage (2,586×).
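Two of the assembly statistics reported above, contig N50 and GC content, have simple definitions that can be sketched in a few lines. The functions below are an illustrative sketch under the usual definitions (N50 as the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half the assembly size; GC content as the fraction of G and C among unambiguous bases). They are not the tools used in this study, and the toy inputs are invented for demonstration.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return ordered[-1]  # not reached for non-negative lengths


def gc_content(sequence: str) -> float:
    """Return the fraction of G/C among A, C, G, and T bases (N etc. ignored)."""
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / counted


# Toy data, invented for illustration only.
toy_contig_lengths = [2, 3, 4, 5, 6]
toy_sequence = "ATGCGCNN"

print("N50:", n50(toy_contig_lengths))
print(f"GC content: {gc_content(toy_sequence):.2%}")
```

With real data, the lengths and sequences would come from the assembly FASTA file; parsing is left out here to keep the sketch self-contained.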

Genome Annotation
The repeat content of the genome is 28.98% (35.5 Mb), and the most abundant category was unidentified repeats (13.24%; supplementary table S2, Supplementary Material online). Long terminal repeats (LTRs) comprised 8.85% of the genome, and the most abundant LTR family was Gypsy/DIRS1 (8.41%), which has been found in diverse eukaryotic species including green algae (Piednoël et al. 2011). The second most abundant repeat category was long interspersed retrotransposable elements (LINEs, 4.38%), and the most abundant LINE family was L1/CIN4 (2.67%). The CCCryo 101-99 genome has a slightly higher repeat content compared with its mesophilic chlorophyte relatives, 22% in Chlamydomonas reinhardtii (Payne et al. 2023) and 24.76% in Volvox carteri (Prochnik et al. 2010), but not as high as other cold-adapted chlorophytes, 64% and 49% in Chlamydomonas sp. ICE-L and UWO241, respectively (Zhang et al. 2020, 2021). Increased content of tandem repeats and transposable elements in the genome can potentially be helpful for organisms that survive in extreme habitats by creating genetic diversity that can drive the emergence of new adaptive traits (Schrader and Schmitz 2019).
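The repeat percentage quoted above is the repeat-masked length divided by the assembly length. As a quick consistency check, the sketch below recomputes it from the two rounded lengths given in the text; the helper name is illustrative and not part of any annotation tool.

```python
def percent_of_genome(feature_mb: float, assembly_mb: float) -> float:
    """Express a feature's total length as a percentage of the assembly length."""
    if assembly_mb <= 0:
        raise ValueError("assembly_mb must be positive")
    return 100.0 * feature_mb / assembly_mb


# Rounded values quoted in the text: 35.5 Mb of repeats in a 122.5 Mb assembly.
repeat_pct = percent_of_genome(35.5, 122.5)
print(f"Repeat content: {repeat_pct:.2f}%")  # prints "Repeat content: 28.98%"
```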
The assembled chloroplast genome is circular, has one large (254 kb) and one small (202 kb) single-copy region, and has an inverted repeat comprising five rRNA and one tRNA genes (Fig. 1). In total, 112 genes were annotated in the chloroplast genome including 10 rRNA and 24 tRNA genes. In addition, four group I and three group II catalytic introns were identified. We performed additional analyses to assemble the mitochondrial genome using 65 Chlamydomonadales mitogenomes as a reference; however, no genes could be annotated on assembled contigs. The mitochondrial genomes of chlorophytes harbor significant diversity in terms of gene content and genome structure (Popescu and Lee 2007). Chlamydomonas-like mitochondrial genomes have a reduced organizational pattern characterized by small genome size, limited gene content, and the presence of fragmented and scrambled rRNA coding regions (Nedelcu et al. 2000). Our results suggest that CCCryo 101-99 follows this Chlamydomonas-like mitochondrial genome evolution.
Genome annotation assigned a protein or domain to 13,707 genes in total (92.5%) (supplementary table S3, Supplementary Material online). The most enriched category (enrichment score = 37.81) in the functional enrichment analysis against the C. reinhardtii genome included the Gene Ontology (GO) term "chloroplast thylakoid membrane" and photosynthetic genes (i.e. psbW, petH, and rubA). Duplicates and increased protein accumulation of the photosynthetic genes have been reported (Zhang et al. 2020) in the cold-adapted Antarctic green alga Chlamydomonas sp. UWO241 (i.e. compared with the mesophilic relative C. reinhardtii). These authors hypothesized that this is an indication of an adaptation to the cold. The second most enriched category in our annotation (enrichment score = 37.29) contained flagellum-associated genes (CFAP58/91/157/221). During its asexual life cycle, CCCryo 101-99 produces motile spores (zoospores) representing young stages of haploid spores. These are not frequently observed in laboratory-grown cultures as they shed their flagella shortly after attaching to surfaces and develop into nonmotile adult spores that divide several times by multiple mitoses and develop into sporangia. Later, young motile zoospores are released from these sporangia to fulfill the asexual life cycle. Nothing is known about their sexual reproduction.
In summary, our analyses on the genome of the snow algal strain CCCryo 101-99 (cf. Sphaerocystis sp.) revealed a slightly higher (28.98%) repeat content compared with the mesophilic chlorophyte relative C. reinhardtii. Furthermore, the gene families associated with photosynthesis and flagellar functions were enriched. The former may contribute to the adaptation to cold climates by maintaining a high level of CO2 fixation and, thus, optimizing energy production, while the latter may be important for an effective dispersal strategy in snow habitats by maintaining motility in young cell stages. Our draft-assembled snow algal genome and ongoing work on comparative snow algal genomics will allow in-depth comparisons with the increasing number of available genomes. This will improve our understanding of the evolution and diversification of photosynthetic eukaryotes adapted to living in permanently cold climates and their responses to future global warming climate scenarios.

Strain Description
The snow algal strain CCCryo 101-99 was isolated in 1998 from moss patches along melt streams flowing down from the snow fields above, on Brøggerhalvøya northwest of Ny-Ålesund, Svalbard, Norway. The strain has been deposited and since then been maintained at the CCCryo Culture Collection of Cryophilic Algae at the Fraunhofer IZI-BB, Potsdam, Germany (Leya 2022). The taxonomic identity of CCCryo 101-99 is still under debate; therefore, for the time being, it is called cf. Sphaerocystis sp. According to its phylogenetic marker (rbcL), it shows relatedness (data not shown) to the genus Achoma (Novis and Visnovsky 2012). Detailed studies are currently in progress. It can be classified as a psychrophilic snow alga as its optimum temperature for growth lies at around 14 °C; its maximum temperature lies above 18 °C (Leya et al. 2009), but above that it starts ceasing growth and eventually dies. A progeny of CCCryo 101-99 spent 531 d outside the International Space Station in its desiccated cyst stage as part of the BIOMEX experiments and survived (de Vera et al. 2019). CCCryo 101-99 extracts are also used in cosmetic products to guard against aging of the skin caused by extrinsic factors, such as ultraviolet radiation or air pollution, or intrinsic factors such as aging-specific gene expression levels (Stutz et al. 2010). The strain is known to synthesize the antioxidants and pigments astaxanthin, adonixanthin, canthaxanthin, echinenone, hydroxyechinenone, and saccharose (Leya 2022).

Cultivation and DNA/RNA Extractions
The cultures for nucleic acid extractions were grown in 3N Bold's Basal Medium at pH 5.5 and 2 °C (green biomass) or 12 °C (orange biomass induced from green biomass) under axenic conditions and continuous illumination. Genomic DNA (gDNA) was extracted from green biomass using the PowerSoil DNA Isolation kit (QIAGEN, Germany) for Illumina sequencing and the QIAGEN Genomic-tips extraction kit for PacBio sequencing, following the manufacturer's instructions. gDNA extraction for Nanopore sequencing was performed using a cetyltrimethylammonium bromide (CTAB)/phenol extraction of green biomass following Chaux-Jukic et al. (2024). We assessed the size, quantity, and integrity of the extracts on a TapeStation 4150 (Agilent Technologies) and a Qubit 4 Fluorometer (Thermo Scientific) and via gel electrophoresis with a low-concentration agar (0.4%). For RNA extractions, freeze-dried green and orange cells were processed following Sanz-Luque and Montaigu (2018), with the only modification being the replacement of the sodium dodecyl sulfate lysis buffer with CTAB.

Library Preparations and Sequencing
Illumina library preparations and sequencing were done at the Genome Analysis Centre (Earlham Institute, UK) on an Illumina MiSeq with the 500-cycle v2 chemistry kit in 250 bp paired-end mode. PacBio library preparation was performed using SMRTbell Template Prep Kit v1.0, and the sequencing was done on a PacBio Sequel SMRT Cell (2.1 chemistry) at the NERC Biomolecular Analysis Facility, Liverpool. Nanopore library preparation followed the SQK-LSK110 protocol (Oxford Nanopore, Oxford, UK), and sequencing was performed on a MinION with a FLO-MIN106 flow cell, controlled using MinKNOW (19.10.1) at Aarhus University. Raw Nanopore fast5 reads were basecalled with GPU-Guppy (3.2.6+afc8e14) under default settings.
Eukaryotic mRNA was selected from 1 µg of the RNA pool using the NEBNext Poly(A) mRNA Magnetic Isolation Module (New England Biolabs, #E7490) following the manufacturer's instructions. Sequencing libraries were prepared using the NEBNext Ultra II Directional RNA Library Prep Kit for Illumina (New England Biolabs, #E7765) following the manufacturer's instructions, with 5× dilution of the NEBNext Adaptor and 14 library amplification cycles. The resulting libraries were measured on a Qubit 4 (Thermo Scientific) using the dsDNA High Sensitivity assay and visualized on a TapeStation 4150 (Agilent Technologies) using a D1000 ScreenTape and reagents. Sequencing was performed on an Illumina NextSeq 500 with the 300-cycle v2.5 chemistry kit in 151 bp paired-end mode.
Adapter removal and end trimming of Illumina RNA-seq reads were performed using Trimmomatic v.0.39 (Bolger et al. 2014) with the parameters "LEADING:5 TRAILING:5 SLIDINGWINDOW:5:10 MINLEN:36". Filtered RNA-seq reads were mapped onto the soft-masked genome using STAR v2.5.2b (Dobin et al. 2013). Coding regions were predicted by incorporating RNA-seq mapping data as extrinsic evidence with BRAKER v3.06 (Stanke et al. 2006, 2008), which is an automated pipeline that utilizes the ab initio gene prediction tools GeneMark and AUGUSTUS. The 5′ and 3′ UTR regions were predicted with GUSHR (https://github.com/Gaius-Augustus/GUSHR) based on RNA-seq coverage information, and the predictions were added as gene features to structural annotations. The models and splice patterns were manually checked for a subset of genes with IGV v2.14.1 (Robinson et al. 2011). The longest isoform for each gene was selected for the downstream analyses.
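The final step above, keeping only the longest isoform per gene, can be sketched as follows. This is an illustrative sketch rather than the authors' script. It assumes transcript identifiers of the form "<gene>.t<N>" (for example "g12.t2"), which is an assumption about the annotation output and should be adapted to the actual naming scheme; the function names and the toy sequences are hypothetical.

```python
from typing import Callable, Dict


def gene_id_from_transcript(transcript_id: str) -> str:
    """Derive a gene ID, assuming the naming scheme '<gene>.t<N>' (assumption)."""
    gene, separator, suffix = transcript_id.rpartition(".t")
    if separator and suffix.isdigit():
        return gene
    return transcript_id  # fall back to treating the transcript as its own gene


def longest_isoforms(
    proteins: Dict[str, str],
    gene_of: Callable[[str], str] = gene_id_from_transcript,
) -> Dict[str, str]:
    """Keep one sequence per gene: the longest (first one wins on ties)."""
    best_transcript: Dict[str, str] = {}
    for transcript_id, sequence in proteins.items():
        gene = gene_of(transcript_id)
        current = best_transcript.get(gene)
        if current is None or len(sequence) > len(proteins[current]):
            best_transcript[gene] = transcript_id
    return {tid: proteins[tid] for tid in best_transcript.values()}


# Toy protein set, invented for illustration only.
toy_proteins = {
    "g1.t1": "MAAA",
    "g1.t2": "MAAAAA",
    "g2.t1": "MK",
}
selected = longest_isoforms(toy_proteins)
print(sorted(selected))
```

Passing a different `gene_of` callable is one way to support other identifier schemes without changing the selection logic.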

Table 1
Genome assembly and annotation statistics of CCCryo 101-99 in comparison with other chlorophyte algae genomes
a Zhang et al. 2021.
b Hulatt et al. 2024.
c Craig et al. 2021.
d 17 chromosomes + 37 unassembled scaffolds.