The genome sequence of the English holly, Ilex aquifolium L. (Aquifoliaceae)

We present a genome assembly from an individual Ilex aquifolium (the English holly; Eudicot; Magnoliopsida; Aquifoliales; Aquifoliaceae). The genome sequence is 800.0 megabases in span. Most of the assembly is scaffolded into 20 chromosomal pseudomolecules. The assembled mitochondrial and plastid genomes are 538.43 kilobases and 157.52 kilobases long, respectively.


Background
Common holly, Ilex aquifolium L., is an evergreen, dioecious shrub or small tree, found across western and southern Europe, east to Germany and Romania and south to Morocco, but it has been planted and naturalised from cultivation across Europe and in Tasmania, New Zealand and western North America (POWO, 2023).
It is one of the few evergreen hardwood trees in Britain, where it usually grows in shaded deciduous forests under oak or beech. It most frequently grows as a shrub up to 3 m tall, but occasionally trees of up to 23 m are found (e.g. Peterken & Lloyd, 1967). It is long-lived, with some individuals known to be more than 300 years old (De Cleene & Lejeune, 2000); in Britain, some trees have been shown to be more than 250 years old (Peterken & Lloyd, 1967). The evergreen leaves are also long-lived and can stay alive for up to five years. White, four-parted, unisexual flowers appear in the leaf axils. These are sweet-scented and are pollinated by bees, wasps, flies and small butterflies. On female plants the flowers develop into red drupes with the persistent stigma on top. Seeds are dispersed in late winter by birds or rodents. The species is adaptable to different situations and can quickly invade clear-cuts or forest margins, but holly is damaged by fire, grazing and severe frost, which can delimit its distribution. It is also the host for the larvae of the holly leaf miner, Phytomyza ilicis (Curtis, 1846), which causes yellow patches on the leaves.
The evergreen leaves of holly have spinose margins in the lower part of the plant, whereas they become smooth and spineless in the upper part. This phenotypic plasticity is a response to grazing. Epigenetic variation caused by changes in DNA methylation is responsible for phenotypic plasticity in Ilex aquifolium (Herrera & Bazaga, 2013).
In the Mediterranean, holly was associated with the Roman festival Saturnalia (17 December), where branches were given to friends as a good luck charm. Early usage of holly in the Christian church is probably derived from this usage (De Cleene & Lejeune, 2000). In Germanic and Celtic druidic traditions, it was associated with winter solstice festivities, because it was one of the few evergreen trees in the forest and therefore considered sacred. These associations with winter remained and were later merged with Christmas, so holly is now commonly featured in Christmas decorations with other evergreen species including Viscum album L. (mistletoe) and Hedera helix L. (ivy). With the latter species, holly is the focus of the traditional Christmas carol "The holly and the ivy". These evergreen plants are symbols of life for northern peoples, especially in Britain. Holly was seen as a protective plant and hence was frequently planted near churches and monasteries, often with other evergreen species including Taxus baccata L. (yew). Many apotropaic powers are attributed to holly (Grigson, 1955).
It also has an association with Easter, especially in Germany and Switzerland, where the spiny branches were used for chastisement and were also burned to celebrate the resurrection of Christ. With its blood-red berries and spiny leaves, it is also a symbol of the crown of thorns (De Cleene & Lejeune, 2000).
The wood was traditionally used to make Great Highland bagpipes in Scotland, although this usage was abandoned in favour of cheaper tropical hardwoods that became available during the 19th century (Dickson, 2009). The wood is easily worked and is often used for handles and cabinetry. Bark was once used to make a tar-like substance to catch birds (Dodoens, 1554). The same book also describes a treatment for abdominal pain; however, as holly is poisonous, it should not be consumed.
The name holly comes from the Old English 'holen', probably derived from a Proto-Indo-European root for prick; holly and its Celtic equivalents ('kelen' in Cornish; 'celyn' in Welsh) are frequent placename elements. Ilex is the Latin word for the holm oak (Quercus ilex L.), which has similar prickly leaves, and aquifolium is derived from Latin 'acus', needle, and 'folium', leaf. All these terms refer to the prickly leaves.
We sampled a female individual from a native stand at Petersham Common in Surrey, where it grows in mixed woodland with Acer campestre L., Carpinus betulus L. and Quercus robur L. Like the genome of the Chinese Ilex micrococca Maxim. (Yao et al., 2022), the genome of Ilex aquifolium will be useful for population genetic studies, especially those of phenotypic variation in glacial refuges across the range of this species (Mihali et al., 2022). We hope that this high-quality genome will further studies on population genetics, epigenetics and domestication in Aquifoliaceae.

Genome sequence report
The genome was sequenced from a specimen of Ilex aquifolium (Figure 1) collected from Petersham Common, Richmond, Surrey, UK (51.45, -0.30). Using flow cytometry, the genome size (1C-value) was estimated to be 1.04 pg, equivalent to 1,010 Mb. A total of 36-fold coverage in Pacific Biosciences single-molecule HiFi long reads was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 16 missing joins or mis-joins and removed 7 haplotypic duplications, reducing the assembly length by 0.48% and increasing the scaffold number by 0.95%, while decreasing the scaffold N50 by 0.84%.
The final assembly has a total length of 800.0 Mb in 104 sequence scaffolds with a scaffold N50 of 37.4 Mb (Table 1). The snailplot in Figure 2 provides a summary of the assembly statistics, while the distribution of assembly scaffolds on GC proportion and coverage is shown in Figure 3. The cumulative assembly plot in Figure 4 shows curves for subsets of scaffolds assigned to different phyla. Most (98.33%) of the assembly sequence was assigned to 20 chromosomal-level scaffolds. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 5; Table 2). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited. The mitochondrial and plastid genomes were also assembled and can be found as contigs within the multifasta file of the genome submission.
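The scaffold N50 and N90 values reported for the assembly follow the usual definition: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least 50% (or 90%) of the assembly span. A minimal Python sketch of that calculation is given here; the function name `scaffold_nx` is hypothetical and the lengths are toy values, not the drIleAqui2.1 scaffolds.

```python
def scaffold_nx(lengths, fraction=0.5):
    """Return the Nx length for a list of scaffold lengths.

    Scaffolds are sorted from longest to shortest and summed until the
    running total reaches `fraction` of the assembly span; the length of
    the scaffold that crosses the threshold is the Nx value.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    threshold = sum(lengths) * fraction
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length
    # Only reachable through floating-point rounding at fraction == 1.
    return min(lengths)


# Toy scaffold lengths (illustrative only, not taken from the assembly).
toy_lengths = [10, 8, 6, 4, 2]
n50 = scaffold_nx(toy_lengths, 0.5)  # threshold 15, crossed at the 8-unit scaffold
n90 = scaffold_nx(toy_lengths, 0.9)  # threshold 27, crossed at the 4-unit scaffold
```

Applied to the full list of scaffold lengths from an assembly FASTA index, the same function would give the N50 and N90 figures summarised in a snailplot.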

Sample acquisition, genome size estimation and nucleic acid extraction
A specimen of Ilex aquifolium (specimen ID KDTOL10104, ToLID drIleAqui2) was picked by hand in Petersham Common, Richmond, Surrey, UK (latitude 51.45, longitude -0.30) on 2020-09-08. The specimen was collected and identified by Maarten J. M. Christenhusz (Independent) and frozen at -80 °C.
The genome size was estimated by flow cytometry using the fluorochrome propidium iodide and following the 'one-step' method as outlined in Pellicer et al. (2021). For this species, the General Purpose Buffer (GPB) supplemented with 3% PVP and 0.08% (v/v) beta-mercaptoethanol was used for isolation of nuclei (Loureiro et al., 2007), and the internal calibration standard was Petroselinum crispum 'Champion Moss Curled' with an assumed 1C-value of 2,200 Mb (Obermayer et al., 2002).
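With an internal calibration standard, the 1C-value of the sample is conventionally obtained by scaling the 1C-value of the standard by the ratio of the mean fluorescence of the sample and standard G1 peaks. The sketch below assumes that conventional ratio. The function name and the peak positions are hypothetical illustrations, not measurements from this study; only the 2,200 Mb value for the standard comes from the text.

```python
def estimate_1c_mb(sample_peak, standard_peak, standard_1c_mb=2200.0):
    """Estimate a sample 1C-value (Mb) from flow cytometry peak positions.

    sample_peak and standard_peak are the mean fluorescence values of the
    G1 peaks of the sample and of the internal standard, in the same
    arbitrary units. standard_1c_mb defaults to the 2,200 Mb assumed for
    Petroselinum crispum in the text.
    """
    if sample_peak <= 0 or standard_peak <= 0:
        raise ValueError("peak positions must be positive")
    return standard_1c_mb * sample_peak / standard_peak


# Hypothetical peak positions, chosen only to make the arithmetic easy to follow.
example_mb = estimate_1c_mb(sample_peak=100.0, standard_peak=200.0)
```

A sample peak at half the fluorescence of the standard therefore corresponds to half the standard's 1C-value.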
The workflow for high molecular weight (HMW) DNA extraction at the Wellcome Sanger Institute (WSI) includes a sequence of core procedures: sample preparation, sample homogenisation, DNA extraction, fragmentation, and clean-up.
In sample preparation, the drIleAqui2 sample was weighed and dissected on dry ice (Jay et al., 2023). For sample homogenisation, leaf tissue was cryogenically disrupted
Protocols developed by the WSI Tree of Life core laboratory are publicly available on protocols.io (Denton et al., 2023).

Sequencing
Pacific Biosciences HiFi circular consensus DNA sequencing libraries were constructed according to the manufacturer's instructions. DNA sequencing was performed by the Scientific Operations core at the WSI on a Pacific Biosciences SEQUEL II instrument. Hi-C data were also generated from leaf tissue of drIleAqui2 using the Arima2 kit and sequenced on the Illumina NovaSeq 6000 instrument.
Table 3 contains a list of relevant software tool versions and sources.

Wellcome Sanger Institute - Legal and Governance
The materials that have contributed to this genome note have been supplied by a Darwin Tree of Life Partner. The submission of materials by a Darwin Tree of Life Partner is subject to the 'Darwin Tree of Life Project Sampling Code of Practice', which can be found in full on the Darwin Tree of Life website. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project.
Further, the Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use.
The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible.
The overarching areas of consideration are:
• Ethical review of provenance and sourcing of the material

Paola Gaiero
Universidad de la República, Montevideo, Uruguay
This dataset reports the genome sequencing and assembly of a highly emblematic and popular plant, the English holly. The significant cultural value and traditional knowledge associated with this plant, together with its links to landmark dates for western civilization, make it a very suitable species for inclusion in the Tree of Life project and justify the rationale behind creating this dataset.
The genome was sequenced with the latest technologies applied to plant genomes, and scaffolding to the chromosome level was properly achieved using high-quality Hi-C data, as presented in the contact map in Figure 5. The assembly of one haplotype is presented and the contigs for the alternative haplotype are also provided. Organelle genomes are also assembled and presented. All methods are sound, robust and clearly presented in enough detail to allow replication. Access to all the software and datasets is easily provided.
A very informative snailplot is used to present the different features of this genome assembly. Metrics in this plot are compared to standards in Table 1 and are all highly satisfactory. Although I understand that this is a dataset report and therefore all the metrics and features of this genome are to be presented and exploited, I am not entirely sure about the relevance of Figures 3 and 4. If possible, I would suggest that the authors include an explanation or interpretation of why the genome assembly covers 800 Mbp out of the estimated genome size of 1,010 Mbp.
To sum up, this is a sound and robust dataset, yielding a high-quality, chromosome-scale genome assembly of the emblematic English holly, which makes publicly available a full haplotype together with the contigs for the alternative haplotype and the full chloroplast and mitochondrial genome assemblies. It is undoubtedly a useful contribution to the plant genomics community and to the Tree of Life project.

Xin Yao
Center for Integrative Conservation, Xishuangbanna Tropical Botanical Garden, University of the Chinese Academy of Sciences, Beijing, Beijing, China
Ilex aquifolium is an important species in terms of its cultural and ecological value. The chromosome-scale genome of this species will definitely be helpful to future studies of its ecology, evolution, breeding, and so on. However, the manuscript could be improved. Firstly, the background was not well structured. The authors used eight paragraphs to introduce the morphology, culture, and utilization of I. aquifolium. All of this content could be merged into one or two paragraphs. Yet, the chromosome-scale genome of this species would be more directly beneficial to its ecology, evolution, and breeding. So, the background could say a little more about how these aspects need a chromosome-scale genome.
Reviewer Expertise: Plant genomics, bioinformatics, genetics and breeding, evolution.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Figure 2. Genome assembly of Ilex aquifolium, drIleAqui2.1: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 800,663,922 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (62,197,019 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (37,412,598 and 30,299,755 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the eudicots_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/drIleAqui2_1/dataset/drIleAqui2_1/snail.

Figure 3. Genome assembly of Ilex aquifolium, drIleAqui2.1: BlobToolKit GC-coverage plot. Scaffolds are coloured by phylum. Circles are sized in proportion to scaffold length. Histograms show the distribution of scaffold length sum along each axis. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/drIleAqui2_1/dataset/drIleAqui2_1/blob.

Figure 4. Genome assembly of Ilex aquifolium, drIleAqui2.1: BlobToolKit cumulative sequence plot. The grey line shows cumulative length for all scaffolds. Coloured lines show cumulative lengths of scaffolds assigned to each phylum using the buscogenes taxrule. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/drIleAqui2_1/dataset/drIleAqui2_1/cumulative.

Figure 5. Genome assembly of Ilex aquifolium, drIleAqui2.1: Hi-C contact map of the drIleAqui2.1 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=NkyM_TiaRAqOF7SXZC8Kbg.
Try to keep the whole background concise and readable. Secondly, the methods section was not very clear. Did the authors only use the sequence data produced by the third-generation sequencer and Hi-C? What were the sequencing depths of the third-generation sequencing and Hi-C sequencing? Ideally, the authors should make a genome comparison between the I. aquifolium genome and other published Ilex genomes. I hope that the annotations of this I. aquifolium genome can be published soon.
Is the rationale for creating the dataset(s) clearly described? Partly
Are the protocols appropriate and is the work technically sound?
Are sufficient details of methods and materials provided to allow replication by others? Partly
Are the datasets clearly presented in a useable and accessible format? Partly
Competing Interests: No competing interests were disclosed.

• Legality of collection, transfer and use (national and international)
Each transfer of samples is further undertaken according to a Research Collaboration Agreement or Material Transfer Agreement entered into by the Darwin Tree of Life Partner, Genome Research Limited (operating as the Wellcome Sanger Institute), and in some circumstances other Darwin Tree of Life collaborators.
DToL) project. All raw sequence data and the assembly have been deposited in INSDC databases. The genome will be annotated using available RNA-Seq data and presented through the Ensembl pipeline at the European Bioinformatics Institute. Raw data and assembly accession identifiers are reported in Table 1.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
https://doi.org/10.21956/wellcomeopenres.22960.r80521
© 2024 Yao X. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.