The genome sequence of common fleabane, Pulicaria dysenterica (L.) Bernh. (Asteraceae)

We present a genome assembly from an individual Pulicaria dysenterica (common fleabane; Tracheophyta; Magnoliopsida; Asterales; Asteraceae). The genome sequence is 833.2 megabases in span. Most of the assembly is scaffolded into 9 chromosomal pseudomolecules. The mitochondrial and plastid genomes were assembled and have lengths of 375.47 kilobases and 150.94 kilobases respectively.


Background
Pulicaria dysenterica belongs to Asteraceae and is a rhizomatous, perennial herb of damp or wet, open habitats. It can form dense clusters, especially in marshy places like fen-meadows, reed beds, dune slacks, wet hollows, and the edges of lakes, rivers, canals, streams, ditches and seepages on sea cliffs. It is also found at the edges of damp woodland and roadside verges, and it is sometimes cultivated along pond edges in gardens, where it prefers sun and plenty of water. It can be found from sea level to 325 m elevation and is a Eurasian Southern-temperate element in the flora, reaching its northern limit in eastern Denmark. It is common in England, Wales and Ireland, but rare in Scotland (OABIF, 2022).
Plants are hairy and grow to about 60 cm tall with alternate leaves that clasp the stem. The golden-yellow flowers are arranged in dense heads with a centre of up to 100 bisexual disc florets surrounded by up to 30 narrow ray florets, which are female. In fruit, the flower heads reflex, exposing the fluffy pappus. Seeds are dispersed by wind.
The leaves of common fleabane are the main food for the fleabane tortoise beetle (Cassida murraea), and the larvae of the dusky plume (Oidaematophorus lithodactyla), a micromoth, also feed on leaves and shoots (Kimber, 2022; Salisbury, 2004). The larvae of three other micromoths also feed on fleabane: larvae of the light fleabane neb (Ptocheuusa paupella) and dark fleabane neb (Apodia bifractella) feed on the seeds and flowers, and the larvae of the fleabane fanner (Digitivalva pulicariae) mine the leaves (Kimber, 2022).
Both the generic name Pulicaria and common name fleabane refer to its former use as an insectifuge (Pulicaria is derived from Pulex, a genus of fleas). Dried stems were burned to rid linen of fleas and other insects. The salty-astringent juice from the fresh plant was formerly used for various maladies including dysentery, hence its specific name (Horwood, 1919). A recent study has demonstrated biological activity against a range of bacteria, potentially supporting the use against dysentery (Radulović et al., 2022). This is the first complete genome sequence of Pulicaria, and it will contribute to studies of Asteraceae, such as those involving genomics and genome size (Palazzesi et al., 2022).

Genome sequence report
The genome was sequenced from one Pulicaria dysenterica specimen (Figure 1) collected from Kingston upon Thames, Surrey, UK (51.42, -0.31). Using flow cytometry, the genome size (1C-value) was estimated to be 1.10 pg, equivalent to 1,070 Mb. A total of 28-fold coverage in Pacific Biosciences single-molecule HiFi long reads and 35-fold coverage in 10X Genomics read clouds was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 30 missing joins or mis-joins and removed 6 haplotypic duplications, increasing the assembly length by 8.43%, reducing the scaffold number by 11.43%, and decreasing the scaffold N50 by 45.48%.
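As a rough illustration of the genome size arithmetic in this paragraph, the Python sketch below converts a 1C-value in picograms to megabases and compares the assembly span with the flow cytometry estimate. It assumes the widely used factor of 1 pg ≈ 978 Mb; the note does not state which factor was applied, and the function names are hypothetical. With this factor, 1.10 pg gives about 1,076 Mb, close to the 1,070 Mb reported; the small difference may reflect rounding of the 1C-value or a slightly different factor.

```python
# Assumed conversion factor (1 pg of DNA is roughly 978 Mb).
# The genome note does not say which factor was used, so treat this as an assumption.
MB_PER_PG = 978.0


def pg_to_mb(c_value_pg: float) -> float:
    """Convert a 1C-value in picograms to an approximate size in megabases."""
    return c_value_pg * MB_PER_PG


def assembled_fraction(assembly_mb: float, estimate_mb: float) -> float:
    """Return the assembly span as a fraction of the estimated genome size."""
    if estimate_mb <= 0:
        raise ValueError("estimated genome size must be positive")
    return assembly_mb / estimate_mb


if __name__ == "__main__":
    # Values taken from the text: 1.10 pg, 833.2 Mb assembly, 1,070 Mb estimate.
    print(f"1.10 pg is about {pg_to_mb(1.10):.0f} Mb")
    print(f"Assembly covers {assembled_fraction(833.2, 1070.0):.1%} of the estimate")
```

On these figures the assembly span is a little under four fifths of the flow cytometry estimate.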
The final assembly has a total length of 833.2 Mb in 60 sequence scaffolds with a scaffold N50 of 99.1 Mb (Table 1). Most (99.66%) of the assembly sequence was assigned to 9 chromosomal-level scaffolds. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 2-Figure 5; Table 2). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited. The mitochondrial and plastid genomes were also assembled and can be found as contigs within the multifasta file of the genome submission.
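The scaffold N50 quoted above is the length of the scaffold at which the cumulative length of scaffolds, sorted from longest to shortest, first reaches half of the total assembly span. A minimal Python sketch of that calculation is given here for readers unfamiliar with the metric; the function name and the toy lengths are hypothetical, they are not the daPulDyse1 scaffold lengths, and this is not the pipeline used to produce Table 1.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths.

    Sequences are sorted from longest to shortest and summed until the
    running total reaches at least half of the overall span; the length
    of the sequence that crosses that threshold is the N50.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:
            return length
    # Unreachable for positive lengths; kept as a defensive guard.
    raise ValueError("could not compute N50")


if __name__ == "__main__":
    # Hypothetical toy scaffold lengths, for illustration only.
    toy_scaffolds = [100, 80, 60, 40, 20]
    print(n50(toy_scaffolds))
```

In the toy example the total span is 300, so the threshold of 150 is crossed by the second scaffold and the N50 is 80.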
Metadata for specimens, spectral estimates, sequencing runs, contaminants and pre-curation assembly statistics can be found at https://links.tol.sanger.ac.uk/species/56535.

Sample acquisition, genome size estimation and nucleic acid extraction
A sample of an individual Pulicaria dysenterica (specimen ID KDTOL10042, ToLID daPulDyse1) was picked by hand along the River Thames in Canbury Gardens, Kingston upon Thames, Surrey (latitude 51.42, longitude -0.31) on 2020-08-12. The specimen was collected and identified by Maarten J. M. Christenhusz (Royal Botanic Gardens, Kew) and frozen at -80°C.
The genome size was estimated by flow cytometry using the fluorochrome propidium iodide and following the 'one-step'   ,200 Mb (Obermayer et al., 2002).
DNA was extracted at the Tree of Life laboratory, Wellcome Sanger Institute (WSI). The daPulDyse1 sample was weighed and dissected on dry ice with tissue set aside for Hi-C sequencing.
Flower and leaf samples were cryogenically disrupted to a fine powder using a Covaris cryoPREP Automated Dry Pulveriser, receiving multiple impacts. High molecular weight (HMW) DNA was extracted using the Qiagen Plant MagAttract DNA extraction kit. Low molecular weight DNA was removed from a 20 ng aliquot of extracted DNA using the 0.8X AMPure XP purification kit prior to 10X Chromium sequencing; a minimum of 50 ng DNA was submitted for 10X sequencing. HMW DNA was sheared into an average fragment size of 12-20 kb in a Megaruptor 3 system with speed setting 30. Sheared DNA was purified by solid-phase reversible immobilisation using AMPure PB beads with a 1.8X ratio of beads to sample to remove the shorter fragments and concentrate the DNA sample. The concentration of the sheared and purified DNA was assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit dsDNA High Sensitivity Assay kit.
Fragment size distribution was evaluated by running the sample on the FemtoPulse system.
RNA was extracted from leaf tissue of daPulDyse1 in the Tree of Life Laboratory at the WSI using TRIzol, according to the manufacturer's instructions. RNA was then eluted in 50 μl RNAse-free water and its concentration assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit RNA Broad-Range (BR) Assay kit. Analysis of the integrity of the RNA was done using the Agilent RNA 6000 Pico Kit and Eukaryotic Total RNA assay.

Sequencing
Pacific Biosciences HiFi circular consensus and 10X Genomics read cloud DNA sequencing libraries were constructed according    The assembly was checked for contamination and corrected as described previously (Howe et al., 2021). Manual curation was performed using HiGlass (Kerpedjiev et al., 2018) and Pretext (Harry, 2022). The mitochondrial and chloroplast genomes were assembled using MBG from PacBio HiFi reads mapping to related genomes (Rautiainen & Marschall, 2021). A representative circular sequence was selected for each from the graph based on read coverage.

Wellcome Sanger Institute - Legal and Governance
The materials that have contributed to this genome note have been supplied by a Darwin Tree of Life Partner. The submission of materials by a Darwin Tree of Life Partner is subject to the 'Darwin Tree of Life Project Sampling Code of Practice', which can be found in full on the Darwin Tree of Life website here. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project.
Further, the Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use. The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible. The overarching areas of consideration are:
• Ethical review of provenance and sourcing of the material

The presentation of the genome assembly data is sufficient to assess the quality of the genome.
The article provides genome assembly benchmarks used in other eukaryotic species genome assembly evaluations, and the reported genome assembly meets or exceeds all benchmarks for quality. The main assembly was not phased but haplotypic duplications were identified and removed to an alternate assembly that is also reported.
The description of the methods is sufficient to enable assessment of the work and for someone to reproduce the work. All data have been properly provided to permanent online repositories, including the original sequence data, the main assembly, and the partial alternate haplotype. The article also reports the sampling records and processes used by DTOL to ensure compliance with ethical and legal standards.
This article meets the expected standards for a report of a de novo plant genome assembly. My only suggestion for a change would be to provide citation(s) for the biological and morphological information provided in the first two paragraphs of the background section.

Ayoob Alfalahi
University of Anbar, Anbar, Iraq
Thank you for inviting me to review the attached manuscript.
I found the article very interesting. In it, the authors employ modern molecular techniques to determine the whole genome sequence of Pulicaria dysenterica (common fleabane; Tracheophyta; Magnoliopsida; Asterales; Asteraceae). The mitochondrial and plastid genomes of fleabane were assembled too.
However, after thoroughly reading the manuscript, I'm sending the following comments:
1. The introduction doesn't provide sufficient background and doesn't include all up-to-date relevant references. Also, the main objectives should be included (e.g. why it is important to complete the genome sequence for this plant species).
2. The methods can be further detailed; however, before it was diluted to the final work concentration, the quality and quantity of the extracted DNA and RNA should be indicated. Also, the authors may clarify whether or not the collected plant sample was free of pest infections. What was the weight of plant material used to extract the DNA?
3. Close-up images of the entire plant sample would be more valuable in describing the plant morphology, especially in such important genomic studies.
4. I think it is important to indicate the origin and the manufacturer of the materials and instruments used.
I hope the authors will be able to use these comments to improve their article.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Partly
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Plant Biotechnology
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Figure 1. Photographs of the Pulicaria dysenterica (daPulDyse1) specimen used for genome sequencing. A. Habitat along Lower Ham Road in Kingston upon Thames. B. Detail of inflorescences.

Figure 2. Genome assembly of Pulicaria dysenterica, daPulDyse1.2: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 833,756,611 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (118,229,540 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (99,126,001 and 73,589,420 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the eudicots_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/Pulicaria%20dysenterica/dataset/CAMXCE02/snail.

Figure 5. Genome assembly of Pulicaria dysenterica, daPulDyse1.2: Hi-C contact map of the daPulDyse1.2 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=JSiZBnchQRe3y3PaJKhs4Q.

© 2023 Gaines T. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Todd Gaines
Colorado State University, Fort Collins, Colorado, USA
The article reports assembly of the genome of common fleabane, Pulicaria dysenterica, as part of the Darwin Tree of Life Consortium. The genome sequencing was conducted using reasonably high coverage (28X) PacBio HiFi long read sequencing combined with 10X Genomics sequencing and Hi-C chromosome conformation data. The initially assembled contigs were scaffolded into chromosome-level scaffolds (9, one for each chromosome of this species). The final assembly size was 833.2 Mb, close to the genome size estimated by flow cytometry of 1,070 Mb.

Table 3 contains a list of relevant software tool versions and sources.

Peer Review Current Peer Review Status: Version 1
Members of the Tree of Life Core Informatics collective are listed here: https://doi.org/10.5281/zenodo.5013541. Members of the Darwin Tree of Life Consortium are listed here: https://doi.org/10.5281/zenodo.4783558.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
https://doi.org/10.21956/wellcomeopenres.22149.r69129
© 2023 Alfalahi A. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.