The genome sequence of Pycnococcus provasolii (CCAP190/2) (Guillard, 1991)

We present a genome assembly from cultured Pycnococcus provasolii (a marine green alga; Chlorophyta; None; Pseudoscourfieldiales; Pycnococcaceae). The genome sequence is 32.2 megabases in span. Most of the assembly is scaffolded into 44 chromosomal pseudomolecules (99.67%). The mitochondrial and plastid genomes have also been assembled, with lengths of 24.3 kilobases and 80.2 kilobases, respectively.


Background
Pycnococcus provasolii is a marine green alga first described by Guillard et al. (1991) and was one of the first planktonic eukaryotic species to be isolated from below the pycnocline (Vaulot et al., 2008). Despite the early isolation of Pycnococcus, information on the genus remains scarce. P. provasolii is part of a group of small coccoid algae collectively termed prasinophytes, located at the base of the Chlorophyta, which is a sister group of the Streptophyta in the Viridiplantae (Lemieux et al., 2014). Given the basal position of prasinophytes within the Chlorophyta, genomic data from its members are important, as they can hold the key to understanding the nature of the last common ancestor of all green plants.
Phylogenetic studies based on the 18S small-subunit (SSU) rDNA identified nine distinct lineages within the prasinophytes (clades I-IX) (Guillou et al., 2004; Viprey et al., 2008). The paraphyletic origin of the group is reflected in a vast diversity of cell shapes and photosynthetic pigments (Latasa et al., 2004; Leliaert et al., 2012; Lopes dos Santos et al., 2017; Tragin et al., 2016). Initially, P. provasolii was placed within the order Mamiellales due to the presence of the photopigment prasinoxanthin, a diagnostic feature of the order at that time. The discovery of prasinoxanthin in another species, Pseudoscourfieldia marina, led to the formation of a new order, Pseudoscourfieldiales, with the family Pycnococcaceae, or prasinophyte clade V (Lemieux et al., 2014; Sym & Pienaar, 1993). Both species share an 18S rRNA gene identity of 100 percent (Fawley et al., 1999; Guillou et al., 2004).
Pycnococcus provasolii is described morphologically as a solitary, spherical coccoid cell ranging between 1.5 and 4.0 μm in diameter (Figure 1). The cell wall of Pycnococcus provasolii has the ultrastructural characteristics of green algae and lacks sporopollenin. Environmental DNA studies show a wide distribution of P. provasolii across different marine regions and light conditions, at depths ranging between 0 and 100 m (Viprey et al., 2008; Zingone et al., 2011). For a phototroph, sensitivity to light is crucial for survival. The photopigment composition of P. provasolii includes chl a, chl b, Mg 2,4-divinylphaeoporphyrin a5 monomethyl ester, and prasinoxanthin as a major xanthophyll (Guillard et al., 1991; Iriarte & Purdie, 1993). Further, a study by Makita et al. (2021) reported the presence of a bifunctional photoreceptor (PpDUC1) in the chloroplast genome, composed of a phytochrome (PHY) and a cryptochrome (CRY). They hypothesise that its presence might have widened P. provasolii's spectral utilisation.
The culture strain CCAP190/2 was isolated by C. Campbell in 2011 from Loch Scridain in Mull (Argyll, Scotland). Here we present the chromosomally complete genome of P. provasolii (190/2), which will help address a grand challenge in protist research, namely the lack of relevant genome sequences, and help to understand the diversity and evolutionary history of the light response system in the Viridiplantae.

Genome sequence report
The genome was sequenced from the strain CCAP190/2 of Pycnococcus provasolii (Figure 1) maintained in culture by the Culture Collection of Algae and Protozoa, Oban, Scotland. A total of 444-fold coverage in Pacific Biosciences single-molecule HiFi long reads was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 8 missing joins or mis-joins and removed 6 haplotypic duplications, reducing the scaffold number by 1.
The final assembly has a total length of 32.2 Mb in 44 sequence scaffolds with a scaffold N50 of 0.8 Mb (Table 1). The snailplot in Figure 2 provides a summary of the assembly statistics, while the distribution of assembly scaffolds on GC proportion and coverage is shown in Figure 3. The cumulative assembly plot in Figure 4 shows curves for subsets of scaffolds assigned to different phyla. Most (99.67%) of the assembly sequence was assigned to 44 chromosomal-level scaffolds. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 5; Table 2). Chromosomes 30 and 39 have roughly half coverage, which could be explained by the fact that some green algae can possess a mixture of diploid and haploid chromosomes. Telomeres have been identified on both ends of 36 of the 44 chromosomes. While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited. The mitochondrial and plastid genomes were also assembled and can be found as contigs within the multifasta file of the genome submission.
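The scaffold N50 quoted above is the length L such that scaffolds of length at least L together contain at least half of the assembled bases. A minimal Python sketch of that calculation is given here; the function name `nx` and the toy scaffold lengths are illustrative assumptions and are not taken from the pipeline or data used for this assembly.

```python
def nx(lengths, fraction=0.5):
    """Return the Nx scaffold length for a list of scaffold lengths.

    Scaffolds are visited from longest to shortest; the result is the
    length of the scaffold at which the running total first reaches
    `fraction` of the total assembly span (0.5 gives N50, 0.9 gives N90).
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    threshold = sum(lengths) * fraction
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length


# Toy example with hypothetical scaffold lengths (not real data).
toy_lengths = [10, 8, 6, 4, 2]
n50 = nx(toy_lengths)        # half of 30 is 15; reached at the 8-unit scaffold
n90 = nx(toy_lengths, 0.9)   # 90% of 30 is 27; reached at the 4-unit scaffold
print(n50, n90)
```

Applied to the real scaffold lengths of an assembly, the same function with `fraction=0.9` would give the N90 value shown alongside the N50 in snail plots.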
Metadata for specimens, spectra estimates, sequencing runs, contaminants and pre-curation assembly statistics can be found at https://links.tol.sanger.ac.uk/species/41880.
DNA was extracted at the Tree of Life laboratory, Wellcome Sanger Institute (WSI). The ucPycProv1 sample was weighed on dry ice with some of the sample set aside for Hi-C sequencing.
The sample was cryogenically disrupted to a fine powder using a Covaris cryoPREP Automated Dry Pulveriser, receiving multiple impacts. High molecular weight (HMW) DNA was extracted using the Qiagen MagAttract HMW DNA extraction kit. HMW DNA was sheared into an average fragment size of 12-20 kb in a Megaruptor 3 system with speed setting 30. Sheared DNA was purified by solid-phase reversible immobilisation using AMPure PB beads with a 1.8X ratio of beads to sample to remove the shorter fragments and concentrate the DNA sample. The concentration of the sheared and purified DNA was assessed using a Nanodrop spectrophotometer and Qubit Fluorometer and Qubit dsDNA High Sensitivity Assay

Mitochondrial and plastid contigs were detected using Tiara.
Table 3 contains a list of relevant software tool versions and sources.

Wellcome Sanger Institute - Legal and Governance
The materials that have contributed to this genome note have been supplied by a Darwin Tree of Life Partner. The submission of materials by a Darwin Tree of Life Partner is subject to the 'Darwin Tree of Life Project Sampling Code of Practice', which can be found in full on the Darwin Tree of Life website here. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project.
Further, the Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use. The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible. The overarching areas of consideration are:
• Ethical review of provenance and sourcing of the material
The larger genome size and one more chromosome in strain CCAP190/2 than in strain NIES-2893 suggest remarkable genomic differentiation between intraspecific strains.
Besides the nuclear genome, the newer work by Green and colleagues assembled the mitochondrial and chloroplast genomes. They were 24 and 80 kbp long, respectively, placing them towards the small end of algal mitogenomes and plastid genomes, but consistent with the small cell size and small nuclear genome of prasinophytes. The organelle genomes are a nice addition.
It would be nice to see a comparison of the mitogenomes and plastid genomes with those of the other strain, and information about whether the organelle genomes are as differentiated between strains as the nuclear genome. The lack of such a comparison appears to be due to the unavailability of organelle genomes for strain NIES-2893. Nevertheless, some evolutionary insights from comparing these organelle genomes with those of other algal lineages would make the data shine much more.
Besides, I do not seem to be able to find gene numbers predicted from the CCAP190/2 nuclear genome. The number and its comparison with the other strains are highly desirable.
In addition, it is curious that the nuclear genome assemblies from the two sequenced strains were not compared in the current report. To me, this constitutes a major shortfall of the report, even though it is a data report. In particular, a systematic synteny analysis would shed light on how much similarity and dissimilarity occur in terms of the organization of DNA between the strains.
Finally, the previous genomic work has revealed a dual photoreceptor containing phytochrome and cryptochrome, initially from a metagenome and subsequently confirmed by the genome data of strain NIES-2893 (Makita et al. 2021). Does this also occur in this CCAP190/2 strain? If so, are there differences in its organization, copy number, and localization in the chromosomes?
Conceivably, a research paper might be underway providing more complete comparative genomics information and insights into evolution and ecological niche differentiation. I look forward to such a paper.
Perhaps the only places for additional detail would be:
○ How long was the culture grown for in order to obtain the cell density needed for the extraction?
○ The methods / pattern used to identify the telomere sequence are not described specifically, nor is it stated whether this repeat sequence is a common one or was determined based on this dataset.
○ Do the authors feel the lower completeness of the BUSCO (12% missing) is really a function of genes lost in the green algae in general and not present in the chlorophyta_odb10 marker set?
But these are all minor points and this is generally a well written and complete genome announcement report.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Genomics, systematics, bioinformatics and genome assembly and annotation.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Figure 1. Light microscopy image of the strain 190/2 of Pycnococcus provasolii; available in the Culture Collection of Algae and Protozoa (CCAP).

Figure 2. Genome assembly of Pycnococcus provasolii, ucPycProv1.2: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 32,253,916 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (1,490,684 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (757,545 and 486,817 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the chlorophyta_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/Pycnococcus%20provasolii/dataset/ucPycProv1_1/snail.

Figure 5. Genome assembly of Pycnococcus provasolii, ucPycProv1.2: Hi-C contact map of the ucPycProv1.2 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=PPdGu0uUSVObW9qATZ_0Ag.
• Legality of collection, transfer and use (national and international)
Each transfer of samples is further undertaken according to a Research Collaboration Agreement or Material Transfer Agreement entered into by the Darwin Tree of Life Partner, Genome Research Limited (operating as the Wellcome Sanger Institute), and in some circumstances other Darwin Tree of Life collaborators.
doi.org/10.21956/wellcomeopenres.22529.r81985
© 2024 Lin S. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Senjie Lin, Department of Marine Sciences, University of Connecticut, Storrs, Connecticut, USA
This is the second strain of Pycnococcus provasolii to have its genome sequenced, and a valuable addition to the growing algal genomic database. Previously, Makita et al. (2021; https://doi.org/10.1038/s41467-021-23741-5) sequenced the ~22.7 Mbp genome of P. provasolii strain NIES-2893 using paired-end and mate-pair short reads and nanopore long reads. The reads were assembled into 43 scaffolds. From the genome, 11,297 genes were predicted, with 89.1% (270/303) BUSCO completeness. This second genome was sequenced using PacBio single-molecule HiFi long-read sequencing coupled with Hi-C analysis. The genome was assembled into 32.2 Mbp in 44 scaffolds with lengths ranging from 0.26 to 1.49 Mbp and an N50 of 0.8 Mbp. The assembly exhibited 86.9% BUSCO completeness. The sequencing technology in the newer work is clearly more cutting-edge from today's point of view, but the older work also did a great job and achieved a reasonable assembly, in fact with a slightly higher BUSCO completeness than the newer study.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
1. Makita Y, Suzuki S, Fushimi K, Shimada S, et al.: Identification of a dual orange/far-red and blue light photoreceptor from an oceanic green picoplankton. Nat Commun. 2021; 12(1): 3593.