Ultra-Barcoding Discovers a Cryptic Species in Paris yunnanensis (Melanthiaceae), a Medicinally Important Plant

Ultra-barcoding is a technique using whole plastomes and nuclear ribosomal DNA (nrDNA) sequences for plant species identification. Paris yunnanensis is a medicinal plant of great economic importance for the pharmaceutical industry. However, the alpha taxonomy of P. yunnanensis is still uncertain, hindering effective conservation and management of the germplasm. To resolve long-standing taxonomic disputes regarding this species, we newly generated the complete plastomes and nrDNA sequences from 22 P. yunnanensis accessions. Ultra-barcoding analyses suggest that P. yunnanensis as currently circumscribed is made up of two distinct genetic lineages, corresponding to the two phenotypes (“typical” and “high stem” form) identified early in our study. With distinct morphologies and distribution, the “high stem” form should be recognized as a previously unrecognized species; here it is described as a new species, P. liiana sp. nov. Moreover, the ultra-barcoding data do not support treatment of P. yunnanensis as a conspecific variety under Paris polyphylla. Our study represents a guiding practical application of ultra-barcoding for discovery of cryptic species in taxonomically challenging plant taxa. The findings highlight the great potential of ultra-barcoding as an effective tool for resolving perplexing problems in plant taxonomy.

Ultra-barcoding is a technique using whole plastomes and nuclear ribosomal DNA (nrDNA) sequences for plant species identification. Paris yunnanensis is a medicinal plant of great economic importance for the pharmaceutical industry. However, the alpha taxonomy of P. yunnanensis is still uncertain, hindering effective conservation and management of the germplasm. To resolve long-standing taxonomic disputes regarding this species, we newly generated the complete plastomes and nrDNA sequences from 22 P. yunnanensis accessions. Ultra-barcoding analyses suggest that P. yunnanensis as currently circumscribed is made up of two distinct genetic lineages, corresponding to the two phenotypes ("typical" and "high stem" form) identified early in our study. With distinct morphologies and distribution, the "high stem" form should be recognized as a previously unrecognized species; here it is described as a new species, P. liiana sp. nov. Moreover, the ultra-barcoding data do not support treatment of P. yunnanensis as a conspecific variety under Paris polyphylla. Our study represents a guiding practical application of ultra-barcoding for discovery of cryptic species in taxonomically challenging plant taxa. The findings highlight the great potential of ultra-barcoding as an effective tool for resolving perplexing problems in plant taxonomy.

INTRODUCTION
DNA barcoding involves the standardized use of one or a few DNA regions for identification and discrimination of species (Hebert et al., 2003;Hollingsworth, 2011;Hollingsworth et al., 2016), as well as the discovery of cryptic or novel species (Hebert et al., 2003;Bell et al., 2012). Although the mitochondrial gene cytochrome oxidase 1 (COI) performs well as a standard animal DNA barcode (Ward et al., 2005;Hajibabaei et al., 2006;Pons et al., 2006), reliable species discrimination based on standard DNA barcodes (i.e., rbcL, matK, trnH-psbA, and ITS) remains problematic in plants (Hollingsworth et al., 2009Hollingsworth, 2011;Li et al., 2011;Coissac et al., 2016). With the advent of next-generation DNA sequencing (NGS) technologies, the concept of DNA barcoding for plant species has been extended from one or several sequence loci to large amounts of genomic data (Li et al., 2015;Coissac et al., 2016;Hollingsworth et al., 2016). Complete plastid genomes (plastomes) and entire nuclear ribosomal DNA (nrDNA) sequences harbor many more sequence variations, making them far more sensitive and effective than standard DNA barcodes, especially among very closely related taxa (Nock et al., 2011;Kane et al., 2012;Ruhsam et al., 2015;Ji et al., 2019a;Zhu et al., 2019;Li et al., 2020). The extension of standard DNA barcodes to whole plastomes and nrDNA sequences has been referred to as "ultra-barcoding" (Kane et al., 2012). However, practical application of this technique for discovery of cryptic or novel species in taxonomically difficult plant taxa is still absent from literature.
Paris yunnanensis Franch. (Melanthiaceae), a perennial rhizomatous herb distributed in southwestern China and northern Myanmar (Li, 1998), has great economic value. Dried rhizome of this plant, bearing the pharmaceutical name "Rhizoma Paridis," is a traditional medicine in China with hemostatic, anti-inflammatory, analgesic, antipyretic, and other therapeutic properties (China Pharmacopoeia Commission, 2015). Phytochemical investigations revealed steroidal saponins as the main components responsible for the bioactivities of this plant . There are about 70 commercial drugs and health products that use Rhizoma Paridis as raw materials, including "Yunnan Baiyao," a famous Chinese medicine, and "Gongxuening Capsule," a gynecological hemostatic based on extractions of Rhizoma Paridis. The value of these pharmaceutical products is estimated to be more than 10 billion CNY (∼1.5 billion USD) per year (Huang et al., 2012).
Although Paris is morphologically distinct from other angiosperm genera, the rhizome, leaf, flower, stamen, ovary, fruit, and seed morphologies, which have been widely used for classification, are highly divergent among species (Hara, 1969;Takhtajan, 1983;Li, 1998;Ji et al., 2006Ji et al., , 2019b. Since the description of P. yunnanensis by Franchet (1888), its taxonomic rank has been in dispute. Handel-Mazzetti (1936) proposed that the morphologies of P. yunnanensis are largely homologous to those of Paris polyphylla, and thus reduced it to a conspecific variety (P. polyphylla var. yunnanensis) of the latter species; this treatment was followed by Hara (1969) and Li (1998). However, Takhtajan (1983) argued that P. yunnanensis is morphologically different from P. polyphylla and should be treated as a separate species. We detected a number of morphological differences among P. yunnanensis accessions during the early stage of our study, based on which we identified two phenotypes ("typical" form and "high stem" form, Figure 1). The morphological variation within P. yunnanensis suggests that the taxonomic delimitation of this economically important plant needs to be re-assessed. Given the great economic importance of P. yunnanensis, satisfactory resolution of these taxonomic issues will be conductive to exploration and protection of its germplasm.
Genome skimming, involving a relatively low coverage shotgun sequencing of genomic DNA, is an efficient and cost-effective approach to recover highly repetitive genome components such as nrDNA or organelle genomes (Straub et al., 2012). The genome skimming approach using NGS can recover plastomes, nrDNA clusters and sometimes even the complete nuclear genome at relatively low sequencing depth, and these sequence data can be both backwards-compatible with the standard plant barcodes, and forward-compatible with whole genome sequencing (Straub et al., 2012;Coissac et al., 2016;Hollingsworth et al., 2016). Because of the significant advantages, this approach has great promise for extending the concept of DNA barcoding from one or a few DNA regions to genomes . Recently, genome skimming has been employed to genomic data for species discrimination in several taxonomically challenging plant groups, for instance, Theobroma (Kane et al., 2012), Araucaria (Ruhsam et al., 2015), Diospyros (Turner et al., 2016), and Panax (Ji et al., 2019a).
In this study, we sampled 22 P. yunnanensis individuals, representing the two phenotypes identified, and generated complete plastomes and nrDNA sequences for these individuals using a genome skimming approach. Based on ultra-barcoding analyses, we aimed to elucidate (1) whether P. yunnanensis is related closely enough to P. polyphylla to warrant taxonomic treatment as conspecific varieties, and (2) whether the two phenotypes within P. yunnanensis represent distinct taxa.

Plant Materials and Low-Coverage Shotgun Sequencing of Genomes
A total of 22 individuals of P. yunnanensis as currently circumscribed (16 accessions of "typical" form and 6 accessions of "high stem" form) were collected from the wild according to records of herbarium specimens, approximately covering the geographic range of the species ( Table 1). The vouchers were identified by Dr. Yunheng Ji and deposited at the herbarium of the Kunming Institute of Botany, Chinese Academy of Sciences (KUN).
Genomic DNA was extracted from ∼20 mg silica-dried leaf tissues, using the cetyltrimethylammonium bromide (CTAB) method (Doyle and Doyle, 1987). Approximately 5 µg purified genomic DNA was sheared to fragments of 300-500 bp by sonication. Paired-end libraries with an average insert size of 350 bp were prepared using a TruSeq DNA Sample Prep Kit (Illumina, Inc., United States), according to the manufacturer's instructions. The libraries were paired-end sequenced on the Illumina HiSeq 2000 platform. Raw reads were filtered to remove adaptors and low-quality reads using the NGS QC Toolkit (Patel and Jain, 2012), setting the cutoff value for percentage read length to 80 and Phred quality score to 30.

Recovery and Annotation of Plastomes
High-quality reads were assembled to generate complete plastomes with GetOrganelle pipeline developed by Jin et al. (2018). The plastome sequence of P. yunnanensis (GenBank accession: MN125587) was used as a reference for plastome assembly. All of the plastid-like reads were assembled into contigs by SPAdes v3.10.1 (Bankevich et al., 2012) with the k-mer defined as 75, 85, 95, and  105. A customized python script (Jin et al., 2018), which uses BLAST and a built-in library to search the plastidlike contig, was employed to connect verified contigs into plastomes in Bowtie 2 (Langmead and Salzberg, 2012), with its default parameters. The assembled plastomes were annotated using the Dual Organellar Genome Annotator database (Wyman et al., 2004). Start and stop codons and intron/exon boundaries for protein-coding genes were checked manually. Annotated tRNA genes were further verified using tRNAscan-SE 1.21 (Schattner et al., 2005) with default parameters. Gene content and arrangement of P. yunnanensis plastomes were visualized and compared using MUMmer 3.0 (Kurtz et al., 2004). Boundaries of the large single copy (LSC), inverted repeat (IR), and small single copy (SSC) regions in each plastome were compared using Geneious v10.2.3 (Kearse et al., 2012).

Recovery of rDNA Sequences
Before the assembly of nrDNA clusters, all plastid-like reads were excluded from the Illumina data. The complete nrDNA sequence (including 26S, 18S, and 5.8S ribosomal RNA genes and ITS regions) of P. yunnanensis (MN174873) was used as a reference. Contigs mapping to reference nrDNA sequences were assembled using the processes described above. Nuclear ribosomal RNA genes and their boundaries with ITS regions were annotated and defined by comparison with the reference sequence using Geneious v10.2.3 (Kearse et al., 2012).

Data Analysis
The efficiency of the complete plastomes and nrDNA sequences for species identification were investigated using tree-based methods. Based on the tree topologies, species-level monophyly of P. yunnanensis as currently circumscribed and its relationships with congeneric species were examined. In addition to the 22 P. yunnanensis plastomes and nrDNA sequences newly sequenced in this study (Table 1), 31 plastomes and nrDNA sequences determined from our previous studies (Huang et al., 2016;Ji et al., 2019a; and representing species in the genus Paris were included in the phylogenetic analyses. Plastome and nrDNA sequences were respectively aligned using the program MAFFT (Katoh and Standley, 2013) with manual adjustment where necessary. Alignment of sequences are deposited in the online database Treebase 1 .
Phylogenetic analysis of each dataset was performed using maximum likelihood (ML) and Bayesian inference (BI). The complete plastome (MN125577) and nrDNA (MN174897) sequences of Trillium tschonoskii were used as the outgroup to root the plastome and nuclear trees, respectively. Conflict between plastid and nuclear datasets was statistically tested using the incongruence length difference (ILD) test (Farris et al., 1994) implemented in PAUP * 4.0b10 (Swofford, 2002) for 1,000 replicates.
The best-fit substitution model for plastomes (GTR + G) and nrDNA (GTR + G + I) was determined using MODELTEST 3.7 (Posada and Crandall, 1998) with the Akaike information criterion (Posada and Buckley, 2004). ML analyses were performed in the software RAxML-HPC BlackBox v8.1.24 (Stamatakis, 2006). The best-scoring ML tree for each dataset was generated with 1,000 replicates to provide bootstrap percentage (BP) support values. BI analyses were performed using MrBayes v3.2 (Ronquist and Huelsenbeck, 2003). Two independent Markov Chain Monte Carlo (MCMC) simulations were run with 1,000,000 generations, sampling every 100 generations. An initial 25% of the sampled trees were discarded as burn-in. Posterior probability (PP) values were computed from the remaining trees. Stationarity was considered to be reached when the average standard deviation of the split frequencies was <0.01.

Illumina Sequencing
Illumina sequencing generated between 9,448,962 and 25,342,424 paired-end clean reads per sample. Of those, 158,172-1,067,241 and 8,299-22,466 reads were mapped to the reference plastome and ribosomal DNA sequences, respectively (Supplementary Table S1). De novo assembly based on these data covered the entire plastome and nrDNA for all samples, with average coverage ranging from 44,917 to 1,011 and 211,637 to 572,407 times, respectively. The sequences newly generated in this study were deposited in NCBI GenBank, and their accession numbers are shown in Table 1.

Phylogenies Based on nrDNA Sequences
Assembly of nrDNA sequences entirely covered the 18S rDNA, ITS1, 5.8S rDNA, ITS2, and 26S rDNA clusters. Sequence lengths for the "typical" P. yunnanensis (16 accessions) and "high stem" phenotype (6 accessions) were 5,856 and 5,857 bp, respectively. Phylogenetic trees based on maximum likelihood (ML) and Bayesian inference (BI) analyses had a very similar topology overall, but exhibited minor differences within interior nodes. Both ML and BI analyses failed to recover all P. yunnanensis accessions as a monophyletic lineage, instead grouping them into two phylogenetically disparate clades (Figure 2). The first clade consisted of all "high stem" accessions while the second clade included all "typical" P. yunnanensis accessions. The monophyly of both clades received full branch support (BP = 100, PP = 1), and the clades were separated from each other by Paris yanchii, Paris lancifolia (≡ P. polyphylla var. stenophylla), and the clade comprising Paris tengchongensis, Paris forrestii, Paris rugosa, Paris mairei, P. polyphylla, Paris luquanensis, and Paris marmorata. Moreover, the nrDNA phylogenies indicated that P. polyphylla is sister to P. mairei (BP = 100, BI = 1) and closely related to P. luquanensis and P. marmorata (BP = 100, BI = 1). However, not only P. yunnanensis accessions but also Paris chinensis (≡ P. polyphylla var. chinensis) and P. lancifolia, once treated as conspecific varieties of P. polyphylla (Hara, 1969;Li, 1998), were phylogenetically disparate from P. polyphylla in both ML and BI trees (Figure 2).

Plastome Phylogenies
In this study, 22 P. yunnanensis plastomes were recovered, using the genome skimming approach. The plastome size of "typical" P. yunnanensis and "high stem" accessions varied from 157,641 to 158,254 bp and 157,951 to 158,526 bp, which possessed the typical quadripartite structure of flowering plants, consisting of a LSC, a SSC, and a pair of IRs (Figure 3). All plastomes contained the same 114 unique genes, including 80 protein-coding genes, 30 tRNA genes, and four rRNA genes (Supplementary Table S2). Several internal stop codons in coding regions of the cemA gene identified it as a pseudogene in all newly generated plastomes.
The incongruence length difference (ILD) test revealed strong discordance between plastome and nrDNA datasets (p < 0.001). Although both ML and BI analyses similarly grouped "high stem" and "typical" P. yunnanensis plastomes into two phylogenetically independent clades, the relationships of these two groups with congeneric species differed greatly from those revealed by nrDNA phylogenies. Since P. luquanensis was nested into "typical" P. yunnanensis accessions, both ML and BI analyses failed to resolve the latter as monophyletic (Figure 4). In addition, plastome phylogenies did not recover the sister relationships between "typical" P. yunnanensis accessions and P. yanchii, as well as between "high stem" accessions and P. lancifolia. Instead, P. yanchii and P. lancifolia formed a well-supported clade (BP = 99, PP = 1) sister to the clade consisting of P. mairei, P. marmorata, and P. polyphylla (BP = 84, PP = 1). Similar to nrDNA phylogenies, P. chinensis, P. lancifolia, P. polyphylla, and P. yunnanensis accessions were resolved as phylogenetically disparate in both tree topologies (Figure 4).

DISCUSSION
Paris yunnanensis is a medicinal plant with great economic importance to the pharmaceutical industry. In this study, we aimed to resolve long-standing taxonomic disputes regarding this species using ultra-barcoding technique. The complete plastomes and nuclear ribosomal DNA regions from 22 Paris yunnanensis accessions were newly generated to investigate the species-level monophyly of the plant and its relationships with the congeneric species. Our data not only allowed recognition of a cryptic species in P. yunnanensis, but also led us to resolve long-standing controversies regarding the taxonomic status of this species. This study represents a guiding practical application of ultrabarcoding technique for discovery of cryptic or novel species. The findings highlight the great potential of ultra-barcoding as an effective tool for resolving perplexing problems in taxonomically difficult plant taxa, and have implications for the conservation and management of P. yunnanensis germplasm.

Putative Hybridization
Similar to previous studies (Ji et al., 2006(Ji et al., , 2019b, we found that nrDNA and plastome phylogenies were largely incongruent in Paris. With respect to the target species of this study, the cytonuclear incongruence primarily involved the non-monophyly of "typical" P. yunnanensis accessions in plastome trees. Notably, cytonuclear discordance is a commonly  investigated phenomenon in plant phylogenetics (Rieseberg and Soltis, 1991), which can be attributed to incomplete sorting of cytoplasmic polymorphisms or "chloroplast capture" resulting from hybridization (Wendel and Doyle, 1998). Ji et al. (2006) proposed that natural hybridization between some sympatric Paris species is feasible if the pollination mechanisms are compatible. In addition, observation of morphological intermediates between P. yunnanensis and P. luquanensis suggests that natural hybridization may occur between these two species (Ji et al., 2006).
Plastome tree topologies indicated that the non-monophyly of "typical" P. yunnanensis accessions results from clustering of P. luquanensis with P. yunnanensis accessions collected from northern Yunnan and southwestern Sichuan. Within these regions, P. yunnanensis is sympatric with P. luquanensis. Therefore, the non-monophyly of "typical" P. yunnanensis plastomes may have been caused by chloroplast capture, with the plastome from P. yunnanensis being introgressed into the nuclear background of P. luquanensis by hybridization (Rieseberg and Soltis, 1991;Rieseberg and Wendel, 1993). This assumption can be further tested through analyzing multiple loci of nuclear genes and sampling populations of both species.

Evidence for a Cryptic Species Within Paris yunnanensis
The plasticity of morphological characteristics and lack of taxonomically robust characters among Paris species have made the taxonomy of this genus historically difficult to reconstruct, especially for Chinese and Himalayan species (Franchet, 1888;Hara, 1969;Takhtajan, 1983;Li, 1998). Despite the great commercial value of P. yunnanensis to the pharmaceutical industry, the alpha taxonomy of this plant is still uncertain. In this study, we used complete plastomes and nrDNA sequences as ultra-barcodes to assess the specieslevel monophyly of P. yunnanensis as currently circumscribed.
Our data failed to group all accessions into a single and monophyletic clade, but resolved them as two phylogenetically disparate and well-supported clades corresponding to the two phenotypes identified in P. yunnanensis. This suggests that both "typical" and "high stem" forms represent two evolutionarily distinct lineages. The "high stem" form shows significant morphological differences from "typical" P. yunnanensis, which include plant height, leaf-blade shape, length, and width, sepal shape, petal color and width, and color of fruit at maturity (Figure 1 and Table 2). Interestingly, their aerial shoots also exhibit distinct growth patterns (Figure 5). Specifically, opening of flowers in "high stem" form is usually 5-15 days earlier than the leaf unfolding, when the pedicels extend out 10-30 cm above stem apex. On the contrast, flowering and leaf expansion synchronize in "typical" P. yunnanensis, whose pedicels do not obviously elongate until the full expansion of leaves. In addition, the FIGURE 5 | Comparison of the development of aerial shoot between "typical" Paris yunnanensis (P. yunnanensis s.s.) and "high stem" form (P. liiana sp. nov.).
Frontiers in Plant Science | www.frontiersin.org two phenotypes possess distinct distribution ranges. The "high stem" populations occur in southern Yunnan, western Guangxi, and southwestern Guizhou, whereas "typical" P. yunnanensis (P. yunnanensis s.s.) is mainly distributed in central, northern, northwestern, and western Yunnan, southwestern Sichuan, and southeastern Tibet (Figure 6). There is little overlap between their respective distribution ranges. This evidence justifies the "high stem" form being recognized as a distinct taxon. Moreover, the phylogenetic relationships of the "high stem" form with related, well-defined, congeneric species suggest that recognition of it as a distinct species is appropriate.
The nrDNA and plastome tree topologies both indicated that P. yunnanensis s.s. (≡ P. polyphylla var. yunnanensis), P. chinensis (≡ P. polyphylla var. chinensis), and P. lancifolia (≡ P. polyphylla var. stenophylla) are genetically distinct from P. polyphylla. The ultra-barcoding data provide no support for treating these four taxa as conspecific varieties (Handel-Mazzetti, 1936;Hara, 1969;Li, 1998), but justify that they should be recognized as distinct species. It is notable that our sampling of P. chinensis and P. lancifolia (one individual per species) might be limiting for molecular study. Further study of their species-level monophyly by sampling multiple accessions per species is warranted.
With more variable characters than standard DNA barcodes, genomic data have been recommended as next-generation DNA barcodes for plants (Kane and Cronk, 2008;Nock et al., 2011;Kane et al., 2012;Ruhsam et al., 2015;Hollingsworth et al., 2016) and utilization of these extended barcodes in plant species identification is referred to as ultra-barcoding (Kane et al., 2012) or "plant barcoding 2.0" . However, the efficiency of ultra-barcodes for the discovery of cryptic and novel species has seldom been evaluated (Kane and Cronk, 2008;Kane et al., 2012;Hollingsworth et al., 2016). The practical application of ultra-barcodes in this study not only allowed recognition of a cryptic species in P. yunnanensis, but also led us to infer possible hybridization between P. luquanensis and "typical" P. yunnanensis (P. yunnanensis s.s.). Therefore, the ultra-barcoding approach has great promise for discovery of novel taxa, and offers significant advantages in interpreting possible hybridization events and identifying hybrids.

Implications for the Management of Paris yunnanensis Germplasm Resources
Germplasm resources are the genetic material basis for plant breeding and crop improvement (Nass et al., 2012).
Proper circumscription of a species and identification of germplasm diversity is a critical prerequisite to conservation and management efforts. We propose a narrow species delimitation for P. yunnanensis, based on successful distinction of P. yunnanensis s.s. from the cryptic species using complete plastome and nrDNA sequences as ultra-barcodes. In addition, both datasets possess high levels of infraspecific sequence variation in P. yunnanensis s.s. Thus, ultra-barcoding could be an effective tool for identifying P. yunnanensis s.s. and for investigating its germplasm diversity.
As we discussed above, cytonuclear discordance observed in P. yunnanensis s.s. implies that natural hybridization may occur between this plant and its sympatric congeneric species, P. luquanensis. From the perspective of germplasm conservation and management, great attention should be paid to the protection of "genetically genuine" individuals and populations. Ultrabarcoding may help exclude possible hybrids for construction of a core germplasm resource. Based on this, we could search for elite germplasm that is highly productive and contains high levels of steroidal saponins for breeding needs. Given that distant hybridization can either result in parental advantages or create heterosis (Cicin, 1954;Whitney et al., 2010), it will not only extend the germplasm resources of P. yunnanensis s.s., but can also improve the performance of the outcrossing offspring. Therefore, hybrids are indispensable complements to the core germplasm resources. Ultra-barcoding will serve as a useful tool for genotyping hybrid germplasm and interpreting parentage. Elucidating these issues will improve future crossbreeding research.