The genome sequence of bittersweet, Solanum dulcamara L. (Solanaceae)

We present a genome assembly from an individual Solanum dulcamara (bittersweet; Eudicot; Magnoliopsida; Solanales; Solanaceae). The genome sequence is 946.3 megabases in span. Most of the assembly is scaffolded into 12 chromosomal pseudomolecules. The mitochondrial and plastid genomes have also been assembled, with lengths of 459.22 kilobases and 161.98 kilobases respectively.


Background
Bittersweet, Solanum dulcamara, is a woody, perennial vine with foetid leaves. It has flowers with purple, reflexed petals and yellow stamens (Figure 1). Pollinated by bees, it forms a bright red, ovoid berry that is dispersed by birds. It is found across Europe, north to Scandinavia and south to Greece and North Africa. It also grows throughout west and central Asia to Manchuria and has been introduced in North America. It occurs commonly across Britain and Ireland, but is rarer in the extreme north and west of the archipelago.
Solanum dulcamara is found in a variety of habitats, but it prefers moist places such as fens, marshes and lake and river shores, often periodically inundated. It can also be found in open woodland, hedgerows, and as a garden weed. The sample studied here originates from the shore of the River Thames in Kingston.
Bittersweet has been used by herbalists since ancient times in Europe to treat bruises and to fend off evil spirits (Drage, 1665; Gerarde, 1597; Grieve, 1971). Its toxicity is relatively low, and poisoning is rare. The fruit is reported to be extremely bitter at first, followed by a sweet aftertaste, hence the name (Mabey et al., 1997).
It contains alkaloids that inhibit bacterial and tumorous growth (Kumar et al., 2009) and its antidermatophytic effect may be applied to treat ringworm and eczema (Bakshi et al., 2008; Fallahzadeh & Mohammadi, 2020).
The high-quality genome of Solanum dulcamara presented here complements the published chloroplast genome (Amiryousefi et al., 2018) and will be a useful resource for those studying Solanaceae and the distribution of alkaloids and medicinally useful compounds in this family.

Genome sequence report
The genome was sequenced from a specimen of Solanum dulcamara (Figure 1) collected from Kingston upon Thames, Surrey, UK (51.42, -0.31). Using flow cytometry, the genome size (1C-value) was estimated to be 1.24 pg, equivalent to 1,210 Mb. A total of 27-fold coverage in Pacific Biosciences single-molecule HiFi long reads and 66-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 25 missing joins or misjoins and removed 15 haplotypic duplications, reducing the assembly length by 3.35% and the scaffold number by 26.71%, and increasing the scaffold N50 by 0.55%.
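The picogram-to-megabase conversion quoted above can be checked with a short calculation. This is a minimal sketch, assuming the commonly used factor of 978 Mb per pg, which the text does not state explicitly; the function and constant names are illustrative only.

```python
# Assumption: 1 pg of DNA corresponds to roughly 978 Mb (not stated in the text).
MB_PER_PG = 978.0


def pg_to_mb(c_value_pg: float) -> float:
    """Convert a 1C-value in picograms to megabases."""
    return c_value_pg * MB_PER_PG


estimated_mb = pg_to_mb(1.24)   # flow cytometry 1C-value reported in the text
assembled_mb = 946.3            # assembly span reported in the text
fraction_assembled = assembled_mb / estimated_mb

print(f"Estimated genome size: {estimated_mb:.0f} Mb")
print(f"Assembly span relative to estimate: {fraction_assembled:.1%}")
```

Under this assumed factor the estimate agrees with the reported 1,210 Mb to three significant figures, and the comparison shows that the assembled span is somewhat smaller than the flow cytometry estimate.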
The final assembly has a total length of 946.3 Mb in 105 sequence scaffolds with a scaffold N50 of 80.0 Mb (Table 1). Most (99.88%) of the assembly sequence was assigned to 12 chromosomal-level scaffolds. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 2-Figure 5; Table 2). The order and orientation of scaffolds in repetitive regions on chromosome 6 (15.6 to 16.5 Mbp) are uncertain.
Chromosome 10 shows an inversion between haplotypes (41 to 44 Mbp). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited. The mitochondrial and plastid genomes were also assembled and can be found as contigs within the multifasta file of the genome submission.
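For readers unfamiliar with the scaffold N50 statistic reported above, the following is a minimal sketch of how it is conventionally computed from a list of scaffold lengths. The function name and the toy lengths are hypothetical and are not taken from the daSolDulc1 assembly.

```python
def scaffold_n50(lengths):
    """Return the N50: the largest length L such that scaffolds of
    length >= L together cover at least half of the total length."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold downwards until half the span is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy scaffold lengths, for illustration only.
toy_lengths = [90, 80, 70, 40, 20]
print(scaffold_n50(toy_lengths))
```

Because the statistic depends only on the sorted lengths, merging short scaffolds into chromosomes during curation can raise the N50 even when the total assembly length falls.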
Metadata for specimens, spectral estimates, sequencing runs, contaminants and pre-curation assembly statistics can be found at https://links.tol.sanger.ac.uk/species/45834.

Sample acquisition, genome size estimation and nucleic acid extraction
A specimen of Solanum dulcamara (specimen ID KDTOL10035, individual daSolDulc1) was picked by hand from Canbury Gardens, Kingston upon Thames, Surrey (latitude 51.42, longitude -0.31) on 2020-08-12. The specimen was collected and identified by Maarten J. M. Christenhusz (Royal Botanic Gardens, Kew) and was frozen at -80°C.
The genome size was estimated by flow cytometry using the fluorochrome propidium iodide and following the 'one-step' method as outlined in Pellicer et al. (2021). Specifically for this species, General Purpose Buffer (GPB) supplemented with 3% PVP and 0.08% (v/v) beta-mercaptoethanol was used for isolation of nuclei (Loureiro et al., 2007), and the internal calibration standard used was Petroselinum crispum 'Champion Moss Curled' with an assumed 1C-value of 2,200 Mb (Obermayer et al., 2002).
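With an internal calibration standard, the sample genome size is generally obtained by scaling the standard's known 1C-value by the ratio of the two fluorescence peak positions. The sketch below illustrates that arithmetic only. The peak means are hypothetical values chosen for illustration, not measurements from this study, and the function name is an assumption.

```python
STANDARD_1C_MB = 2200.0  # assumed 1C-value of the Petroselinum crispum standard (from the text)


def estimate_1c_mb(sample_peak: float, standard_peak: float,
                   standard_1c_mb: float = STANDARD_1C_MB) -> float:
    """Scale the standard's 1C-value by the sample/standard peak ratio."""
    if sample_peak <= 0 or standard_peak <= 0:
        raise ValueError("peak positions must be positive")
    return standard_1c_mb * sample_peak / standard_peak


# Hypothetical mean fluorescence peak positions (arbitrary units).
sample_peak = 55.0
standard_peak = 100.0
print(f"{estimate_1c_mb(sample_peak, standard_peak):.0f} Mb")
```

The estimate is therefore only as reliable as the value assumed for the standard, which is why the text reports the standard's 1C-value alongside the result.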
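DNA was extracted at the Tree of Life laboratory, Wellcome Sanger Institute (WSI). The daSolDulc1 sample was weighed [...]
RNA was extracted from leaf tissue of daSolDulc1 in the Tree of Life Laboratory at the WSI using TRIzol, according to the manufacturer's instructions. RNA was then eluted in 50 μl RNAse-free water and its concentration assessed using a Nanodrop spectrophotometer and Qubit Fluorometer using the Qubit RNA Broad-Range (BR) Assay kit. Analysis of the integrity of the RNA was done using the Agilent RNA 6000 Pico Kit and Eukaryotic Total RNA assay.

Genome assembly, curation and evaluation
Assembly was carried out with Hifiasm (Cheng et al., 2021) and haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). One round of polishing was performed by aligning 10X Genomics read data to the assembly with Long Ranger ALIGN, calling variants with FreeBayes (Garrison & Marth, 2012). The assembly was then scaffolded with Hi-C data (Rao et al., 2014) using SALSA2 (Ghurye et al., 2019). The assembly was checked for contamination and corrected using the gEVAL system (Chow et al., 2016) as described previously (Howe et al., 2021). Manual curation was performed using gEVAL, HiGlass (Kerpedjiev et al., 2018) and Pretext (Harry, 2022). The mitochondrial and chloroplast genomes were assembled using MBG from PacBio HiFi reads mapping to related genomes (Rautiainen & Marschall, 2021). A representative circular sequence was selected for each from the graph based on read coverage.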
A Hi-C map for the final assembly was produced using bwa-mem2 (Vasimuddin et al., 2019) in the Cooler file format (Abdennur & Mirny, 2020). To assess the assembly metrics, the k-mer completeness and QV consensus quality values were calculated in Merqury (Rhie et al., 2020). This work was done using Nextflow (Di Tommaso et al., 2017) DSL2 pipelines "sanger-tol/readmapping" (Surana et al., 2023a) and "sanger-tol/genomenote" (Surana et al., 2023b). The genome was analysed within the BlobToolKit environment (Challis et al., 2020) and BUSCO scores (Manni et al., 2021; Simão et al., 2015) were calculated.
Further, the Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use.
The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible. The overarching areas of consideration are:
• Ethical review of provenance and sourcing of the material

Xin Liu
State Key Laboratory of Agricultural Genomics, BGI (Beijing Genomics Institute)-Shenzhen, Shenzhen, China
The data note by Christenhusz et al. described sequencing and genome assembly of bittersweet.
The genome is relatively big, but the assembly is of high quality. The method is properly described and the genome assembly was properly assessed. The high quality genome dataset generated here can be valuable for future research.
Is the rationale for creating the dataset(s) clearly described? Yes
Reviewer Expertise: Solanaceae as a source of novel disease resistance genes
We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Figure 1. Photographs of the Solanum dulcamara (daSolDulc1) specimen used for genome sequencing. a. Habit. b, d. Inflorescence. c. Flower as seen from the front. e. Fruit.

Figure 2. Genome assembly of Solanum dulcamara, daSolDulc1.2: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 946,926,110 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (89,582,784 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (80,040,668 and 67,654,137 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the solanales_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/Solanum/dataset/CAMXBZ02/snail.

Figure 3. Genome assembly of Solanum dulcamara, daSolDulc1.2: BlobToolKit GC-coverage plot. Scaffolds are coloured by phylum. Circles are sized in proportion to scaffold length. Histograms show the distribution of scaffold length sum along each axis. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/Solanum/dataset/CAMXBZ02/blob.

Figure 4. Genome assembly of Solanum dulcamara, daSolDulc1.2: BlobToolKit cumulative sequence plot. The grey line shows cumulative length for all scaffolds. Coloured lines show cumulative lengths of scaffolds assigned to each phylum using the buscogenes taxrule. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/Solanum/dataset/CAMXBZ02/cumulative.

Figure 5. Genome assembly of Solanum dulcamara, daSolDulc1.2: Hi-C contact map of the daSolDulc1.2 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=aQnjuQF5TYGj41Ao0PQH1g.
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Genome assembly, plant genomics
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Reviewer Report 12 October 2023
https://doi.org/10.21956/wellcomeopenres.22150.r67452
Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.

Table 3. Software tools: versions and sources.

Analysis of the integrity of the RNA was done using the Agilent RNA 6000 Pico Kit and Eukaryotic Total RNA assay.

Genome assembly, curation and evaluation
Assembly was carried out with Hifiasm (Cheng et al., 2021) and haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). One round of polishing was performed by aligning 10X Genomics read data to the assembly with Long Ranger ALIGN, calling variants with FreeBayes (Garrison & Marth, 2012). The assembly was then scaffolded with Hi-C data (Rao et al., 2014) using SALSA2 (Ghurye et al., 2019).

Darwin Tree of Life Project Sampling Code of Practice', which can be found in full on the Darwin Tree of Life website here. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project.

Open Peer Review Current Peer Review Status: Version 1
This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.