The genome sequence of the Annual Mercury, Mercurialis annua L., 1753 (Euphorbiaceae)

We present a genome assembly from a diploid female Mercurialis annua (the Annual Mercury; Tracheophyta; Magnoliopsida; Malpighiales; Euphorbiaceae). The genome sequence is 453.2 megabases in span. Most of the assembly is scaffolded into 8 chromosomal pseudomolecules, including the X chromosome. The organelle genomes have also been assembled, and the mitochondrial genome is 435.28 kilobases in length, while the plastid genome is 169.65 kilobases in length.


Background
The Annual Mercury, Mercurialis annua, is a widespread species native to Europe, north Africa and western Asia. It is an ancient introduction (archaeophyte) in Britain and Ireland, where it is most abundant in England and decreases in abundance to the north, being generally rare in Scotland (Stace, 2010). The species is a wind-pollinated annual found predominantly in disturbed habitats such as roadsides and gardens. It can be distinguished from the native congener M. perennis by a range of traits, such as its lack of a perennating rhizome, more frequent branching and paler green colour, as well as by its distinct ecological preference for less shady sites.
When one of us (JRP) began his investigations as a graduate student into the ecological and genetic reasons for the evolution and maintenance of dioecy, he adopted M. annua, a rare case of a dioecious annual species, as a model. M. annua had featured in an early woodblock published by Linnaeus (1749) that drew attention to the sexuality of plants and the transfer of pollen from males to females. It was also studied by Kuhn (1939) in investigations of the meaning of sex-ratio variation, and in early work by Westergaard (1958) on mechanisms of sex determination in plants. In 1963, the French student Bernard Durand published his PhD thesis as a monograph on the biosystematics and cytogenetics of M. annua, revealing it to be a complex of several polyploid lineages that vary in their sexual system (Durand, 1963). Whereas diploid M. annua is dioecious, higher ploidy levels are variously monoecious or androdioecious (androdioecy being the co-occurrence in a population of males and functional hermaphrodites; in this case, males and monoecious individuals) (Durand, 1963; Durand & Durand, 1992). Androdioecy is a very rare sexual system that had not been recognised in any plant species (Charlesworth, 1984) until 1990 (Liston et al., 1990); Durand's (1963) monograph had been overlooked (Pannell, 1997). Over the last three decades, however, the M. annua species complex has proven a rich model for studying not only evolutionary transitions between sexual systems (reviewed in Pannell et al., 2008) and ploidy levels (reviewed in Pannell et al., 2004), but also the ecological genetics and genomic implications of metapopulations and range expansions (González-Martínez et al., 2017; Obbard et al., 2006; Pujol et al., 2009; Pujol & Pannell, 2008).
Work on the sex chromosomes of M. annua prompted the first genome assembly for the species (Veltsos et al., 2018). That assembly, of a male M. annua, was based on Illumina short reads, with low-coverage, early-generation PacBio long reads used for scaffolding. The resulting assembly was highly fragmented (720,537 scaffolds; scaffold N50 of 6,398 bp) and of limited completeness, with 76.1% complete BUSCOs. Despite its relatively low contiguity, this first assembly proved useful for studies of the genomic implications of the species' range expansion in Europe (González-Martínez et al., 2017) and of the evolution of sex chromosomes in both diploid and polyploid M. annua (Gerchen et al., 2022; Toups et al., 2022; Veltsos et al., 2018; Veltsos et al., 2019).
A new reference genome of M. annua has now been sequenced as part of the Darwin Tree of Life Project. Here we present the chromosomally complete genome sequence for this species, based on one female specimen collected from the Royal Botanic Gardens, Kew. This new genome has substantially higher contiguity and completeness than earlier assemblies and is already providing a timely and much-needed resource. It is being used to investigate the spectacularly rapid breakdown of dioecy observed in replicate populations from which males were initially removed experimentally, and in which the 'leaky' production of male flowers by females has been amplified many-fold by frequency-dependent selection on population sex allocation (Cossard et al., 2021). It will also strengthen ongoing work on the comparative genomics of sex chromosomes across the genus in the context of genome duplication, hybridisation and the introgression of the Y chromosome between distantly related species. Finally, it will be valuable for ongoing ecological genomic analyses of the impact of admixture among populations on population dynamics and local adaptation at and beyond species range margins. We plan to supplement this genome assembly from an XX female with a genome from a male, in order to understand differences in structure and gene content between the X and Y chromosomes.

Genome sequence report
The genome was sequenced from one female Mercurialis annua (Figure 1) collected from the Royal Botanic Gardens, Kew (51.48, …). Using flow cytometry, the genome size (1C-value) was estimated to be 0.71 pg, equivalent to 690 Mb. A total of 58-fold coverage in Pacific Biosciences single-molecule HiFi long reads and 102-fold coverage in 10X Genomics read clouds was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 12 missing joins or mis-joins and removed 7 haplotypic duplications, reducing the assembly length by 0.67% and the scaffold number by 1.52%, and decreasing the scaffold N50 by 7.45%.
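The arithmetic behind the genome-size, coverage and curation figures above can be sketched as follows. This is a minimal illustration, assuming the widely used conversion factor of roughly 978 Mb per picogram of DNA (not stated in the text) and defining fold coverage as sequenced bases divided by a genome-size estimate; which estimate served as the denominator here is not stated, so the read yield in the example is hypothetical.

```python
# Assumed conversion factor for flow-cytometry C-values:
# 1 pg of DNA corresponds to roughly 978 Mb (not taken from the text).
MB_PER_PG = 978.0


def pg_to_mb(c_value_pg: float) -> float:
    """Convert a 1C-value in picograms to an approximate size in megabases."""
    return c_value_pg * MB_PER_PG


def fold_coverage(total_bases: int, genome_size_bp: int) -> float:
    """Fold coverage = total sequenced bases / genome size (both in bp)."""
    return total_bases / genome_size_bp


def percent_change(before: float, after: float) -> float:
    """Signed percentage change, e.g. pre- vs post-curation assembly statistics."""
    return 100.0 * (after - before) / before


if __name__ == "__main__":
    # About 694 Mb, consistent with the ~690 Mb quoted above.
    print(f"0.71 pg ~ {pg_to_mb(0.71):.0f} Mb")
    # Hypothetical read yield against the 453,168,992 bp assembly span.
    print(f"{fold_coverage(26_000_000_000, 453_168_992):.1f}x")
```

A negative `percent_change` corresponds to the reductions reported for assembly length, scaffold number and scaffold N50 after curation.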
The final assembly has a total length of 453.2 Mb in 63 sequence scaffolds, with a scaffold N50 of 56.1 Mb (Table 1). The snailplot in Figure 2 summarises the assembly statistics, while the distribution of assembly scaffolds by GC proportion and coverage is shown in Figure 3. The cumulative assembly plot in Figure 4 shows curves for subsets of scaffolds assigned to different phyla. Most (99.86%) of the assembly sequence was assigned to 8 chromosomal-level scaffolds. Chromosome assignment for this assembly is based on the genetic map produced by Veltsos et al. (2019). Because the sex chromosomes are homomorphic and have recombined over an extensive autosomal region, they were identified as a single linkage group; for this diploid female specimen, the X chromosome has therefore been designated LG1-X (Figure 5; Table 2). While not fully phased, the deposited assembly represents one haplotype. Contigs corresponding to the second haplotype have also been deposited. The mitochondrial and plastid genomes were also assembled and deposited.
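For readers less familiar with these contiguity metrics, the scaffold N50 quoted above (and the N90 shown in Figure 2) can be computed as in the sketch below, together with the percentage of sequence held in a subset of scaffolds, as in the 99.86% assigned to chromosome-level scaffolds. The function names are hypothetical and the example lengths are toy values, not the ddMerAnnu1 scaffolds.

```python
from typing import Sequence


def nx(lengths: Sequence[int], x: float = 50.0) -> int:
    """Return the Nx statistic: the length L such that scaffolds of length >= L
    together contain at least x% of the total assembly span."""
    if not lengths:
        raise ValueError("no scaffold lengths given")
    threshold = sum(lengths) * x / 100.0
    running = 0
    # Walk scaffolds from longest to shortest, accumulating span.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length
    return 0  # not reached for 0 < x <= 100


def percent_in(lengths: Sequence[int], subset: Sequence[int]) -> float:
    """Percentage of the total span contained in a subset of scaffolds."""
    return 100.0 * sum(subset) / sum(lengths)


toy = [80, 70, 50, 30, 20, 10]
print(nx(toy), nx(toy, 90), round(percent_in(toy, [80, 70]), 2))
```

With the toy lengths (total 260), the two longest scaffolds already exceed half the span, so the N50 is 70; the N90 is 20.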
Metadata for specimens, spectral estimates, sequencing runs, contaminants and pre-curation assembly statistics can be found at https://links.tol.sanger.ac.uk/species/3986. DNA was extracted at the Tree of Life laboratory, Wellcome Sanger Institute (WSI). The ddMerAnnu1 sample was … RNA was extracted from leaf tissue of ddMerAnnu1 in the Tree of Life Laboratory at the WSI using TRIzol, according to the manufacturer's instructions. RNA was then eluted in 50 μl RNase-free water, and its concentration was assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit RNA Broad-Range (BR) Assay kit. RNA integrity was assessed using an Agilent RNA 6000 Pico Kit and the Eukaryotic Total RNA assay.

Sequencing
Pacific Biosciences HiFi circular consensus and 10X Genomics read cloud DNA sequencing libraries were constructed according to the manufacturers' instructions. Poly(A) RNA-Seq … weighed and dissected on dry ice with tissue set aside for Hi-C sequencing. Leaf tissue was cryogenically disrupted to a fine powder using a Covaris cryoPREP Automated Dry Pulveriser, receiving multiple impacts. High molecular weight (HMW) DNA was extracted using the Qiagen MagAttract HMW DNA … (Bernt et al., 2013) and uses these annotations to select the final mitochondrial contig and to ensure the general quality of the sequence. The mitochondrial and plastid genomes were assembled using MBG (Rautiainen & Marschall, 2021) from PacBio HiFi reads mapping to related genomes. A representative circular sequence was then selected from the graph for each genome based on read coverage.
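The final selection step for each organelle genome can be illustrated with a minimal sketch. It does not reproduce MBG's output format or the pipeline actually used; it assumes that candidate paths have already been extracted from the assembly graph, each with a length, a mean read coverage and a circularity flag (the `CandidatePath` type is hypothetical), and keeps the circular candidate with the highest coverage. Breaking ties by length is an arbitrary choice made here.

```python
from dataclasses import dataclass


@dataclass
class CandidatePath:
    """Hypothetical candidate path extracted from an organelle assembly graph."""
    name: str
    length_bp: int
    mean_coverage: float
    is_circular: bool


def pick_representative(paths: list[CandidatePath]) -> CandidatePath:
    """Return the circular candidate with the highest mean read coverage."""
    circular = [p for p in paths if p.is_circular]
    if not circular:
        raise ValueError("no circular candidate found")
    # Highest coverage wins; ties are broken by preferring the longer path.
    return max(circular, key=lambda p: (p.mean_coverage, p.length_bp))


candidates = [
    CandidatePath("path_a", 435_000, 40.0, True),
    CandidatePath("path_b", 500_000, 90.0, False),  # linear, so excluded
    CandidatePath("path_c", 169_000, 120.0, True),
]
print(pick_representative(candidates).name)
```

In practice the function would be applied separately to the mitochondrial and plastid candidate sets.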
Table 3 contains a list of relevant software tool versions and sources.

Wellcome Sanger Institute - Legal and Governance
The materials that have contributed to this genome note have been supplied by a Darwin Tree of Life Partner. The submission of materials by a Darwin Tree of Life Partner is subject to the 'Darwin Tree of Life Project Sampling Code of Practice', which can be found in full on the Darwin Tree of Life website. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project.
Further, the Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use. The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible. The overarching areas of consideration are:
• Ethical review of provenance and sourcing of the material

Yoshinori Fukasawa
Center for Bioscience Research and Education, Utsunomiya University, Tochigi, Japan
This study presents a novel genome assembly for the Annual Mercury, Mercurialis annua, a widespread plant species renowned for its complex sexual system. The assembly, generated using PacBio HiFi reads supplemented with 10X Genomics linked reads, was scaffolded using Hi-C data.
The resulting assembly represents a substantial improvement over prior assemblies in terms of contiguity and completeness, as evidenced by high BUSCO scores and consensus quality values determined using Merqury.
The assembled genome spans 453.2 Mb and is predominantly scaffolded into eight chromosomal pseudomolecules, including the X chromosome. The mitochondrial and plastid genomes were also successfully assembled. This high-quality genome assembly serves as a valuable reference for Mercurialis annua and meets current technical standards in the field. The assembly exhibits significantly improved contiguity and completeness compared to previous efforts, making it an invaluable resource for future research. It comprehensively encompasses the entire genetic sequence, including organelle genomes.
Minor points should be addressed:
1 - Mitochondrial Genome Assembly: The authors should explicitly state which method was primarily used for the mitochondrial genome assembly, either MitoHiFi or MBG, and provide a clear rationale for their choice. The current description creates ambiguity and necessitates further clarification.
2 - Origin of K-mers: The source of the k-mers used in Merqury should be clearly stated. This could be addressed by specifying whether they were derived from HiFi reads, 10X linked reads, or a combination of both.
3 - Figure 5: Labels for both x- and y-axes, along with clear identification of each chromosome, should be included in Figure 5 to improve its clarity and interpretability.
4 - High-Level Statistics: The authors could consider including additional high-level statistics, such as the presence or absence of telomeres and centromeres, to provide a more comprehensive overview of the assembly's quality and completeness for this level of contiguity. These statistics would be particularly relevant for readers interested in the structural organization of the Mercurialis annua genome.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Partly
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Plant genomics and bioinformatics
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Hoang Dang Khoa Do
NTT Hi-Tech Institute, Nguyen Tat Thanh University, Ho Chi Minh City, Ho Chi Minh, Vietnam
The authors reported a procedure for sequencing and assembling the nuclear genome of Mercurialis annua. Although the complete sequences of chromosomes were not reported, the 8 chromosomal pseudomolecules, including the X chromosome, provide new insights into the genomic data of M. annua in comparison to previous studies. The X chromosome sequence is a useful source for further comparative genomic studies of sex chromosomes in M. annua. Additionally, the complete chloroplast and mitochondrial genomes of M. annua were successfully sequenced. However, the authors might characterize the features of these organelle genomes, which are essential data for exploring the evolutionary history of M. annua.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound?

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Figure 2. Genome assembly of Mercurialis annua, ddMerAnnu1.2: metrics. The BlobToolKit snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference, with each bin representing 0.1% of the 453,168,992 bp assembly. The distribution of scaffold lengths is shown in dark grey, with the plot radius scaled to the longest scaffold present in the assembly (76,280,018 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (56,051,264 and 41,830,924 bp, respectively). The pale grey spiral shows the cumulative scaffold count on a log scale, with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the eudicots_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/ddMerAnnu1.1/dataset/CALLYH01/snail.
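The bin-wise base composition drawn in the outer ring of the snailplot can be approximated as below: with 1,000 bins over the 453,168,992 bp assembly, each bin spans roughly 453 kb. This is a simplified sketch, not BlobToolKit's own code; it assumes the scaffolds have already been concatenated in size order into a single string, and it counts only A, C, G, T and N, so other ambiguity codes are ignored.

```python
def bin_composition(seq: str, n_bins: int) -> list[tuple[float, float, float]]:
    """Split seq into n_bins contiguous, near-equal bins and return
    (GC%, AT%, N%) for each bin."""
    if n_bins <= 0 or n_bins > len(seq):
        raise ValueError("n_bins must be between 1 and len(seq)")
    out = []
    for i in range(n_bins):
        # Integer bin boundaries; every bin gets at least one base.
        start = i * len(seq) // n_bins
        end = (i + 1) * len(seq) // n_bins
        chunk = seq[start:end].upper()
        gc = chunk.count("G") + chunk.count("C")
        at = chunk.count("A") + chunk.count("T")
        n = chunk.count("N")
        length = len(chunk)
        # Percentages may sum to < 100 if other IUPAC codes are present.
        out.append((100 * gc / length, 100 * at / length, 100 * n / length))
    return out


print(bin_composition("GGCCAATTNN", 2))
```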

Figure 3. Genome assembly of Mercurialis annua, ddMerAnnu1.2: BlobToolKit GC-coverage plot. Scaffolds are coloured by phylum. Circles are sized in proportion to scaffold length. Histograms show the distribution of scaffold length sum along each axis. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/ddMerAnnu1.1/dataset/CALLYH01/blob.

Figure 4. Genome assembly of Mercurialis annua, ddMerAnnu1.2: BlobToolKit cumulative sequence plot. The grey line shows cumulative length for all scaffolds. Coloured lines show cumulative lengths of scaffolds assigned to each phylum using the buscogenes taxrule. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/ddMerAnnu1.1/dataset/CALLYH01/cumulative.

Figure 5. Genome assembly of Mercurialis annua, ddMerAnnu1.2: Hi-C contact map of the ddMerAnnu1.2 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=Q-UX07cRTeCFRydfKzUNKA.

Reviewer Report 06 June 2024
https://doi.org/10.21956/wellcomeopenres.23239.r84817
© 2024 Do H. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.