Comparison of the abilities of universal, super, and specific DNA barcodes to discriminate among the original species of Fritillariae cirrhosae bulbus and its adulterants

Fritillariae cirrhosae bulbus is a well-known traditional Chinese medicine used to relieve cough and eliminate phlegm. The medicine originates from dried bulbs of five species and one variety of Fritillaria. Recently, immature bulbs from other congeneric species, such as F. ussuriensis, have been sold as adulterants of Fritillariae cirrhosae bulbus in medicine markets owing to the high price and limited availability of the genuine medicine. However, traditional methods cannot reliably distinguish the bulbs of the different original species of Fritillariae cirrhosae bulbus from one another or from its adulterants, even though these medicines differ in price and treatment efficacy. The present study adopted DNA barcoding to identify these species and compared the discriminatory power of super, universal, and specific barcodes in Fritillaria. The results revealed that the super-barcode had strong discriminatory power (87.5%). Among universal barcodes, matK provided the best species resolution (87.5%), followed by ITS (62.5%), rbcL (62.5%), and trnH-psbA (25%). The combination of these four universal barcodes provided the highest discriminatory power (87.5%), which was equivalent to that of the super-barcode. Two plastid regions, ycf1 and psbM-psbD, had much better discriminatory power (both 87.5%) than did other plastid barcodes, and were suggested as potential specific barcodes for identifying Fritillaria species. Phylogenetic analyses indicated that F. cirrhosa was not a “good” species but was composed of multiple lineages, which might have affected the evaluation of discriminatory ability. This study revealed that the complete plastid genome, used as a super-barcode, was an efficient and reliable tool for identifying the original species of Fritillariae cirrhosae bulbus and its adulterants.

Introduction
... generally sequenced only one individual per species, and thus they could not effectively compare the intra- and interspecific genetic distances in Fritillaria species. This leads to the question of whether species of Fritillaria, especially the original plants used to make Fritillariae cirrhosae bulbus and its adulterants, could be correctly identified at the species level by analyses of the complete plastid genome.
In the present study, we used the complete plastid genome sequence as a super-barcode to: identify the botanical origins of Fritillariae cirrhosae bulbus and its adulterants; compare the discriminatory power of the super-barcode with that of universal DNA barcodes and their combinations; and scan the highly variable gene regions as potential specific barcodes for species identification of Fritillariae cirrhosae bulbus. The present study provided abundant information for further development of super-barcodes and broadened the horizon over which other rapid and accurate molecular techniques for species identification in Fritillaria can be explored.

Material sampling
A total of 32 individuals of Fritillaria, representing the five original species of Fritillariae cirrhosae bulbus and three species of its adulterants, together with one individual of F. anhuiensis S.C. Chen & S.F. Yin as an outgroup, were used in the tree-building analyses in this study (Fig 1, S1 Table). Individuals of the five original species that produce the genuine medicine were collected from wild habitats in Southwest China, while those of the adulterants were collected from cultivated populations in Zhejiang, Jilin, and Xinjiang, China. No species of the genus Fritillaria is listed as a nationally protected plant in China (Information System of Chinese Rare and Endangered Plants, http://www.iplant.cn/rep/), and their collection for scientific research is therefore permitted. Fresh leaves were sampled from healthy, mature individuals in the field or at cultivation bases and then dried with silica gel. In addition, 3-5 flowering individuals were dug up and preserved as voucher specimens. Geographic information (latitude, longitude, altitude, etc.) for the sampling locations was recorded with a Global Positioning System receiver (GPS; Garmin, Olathe, KS, USA). All voucher specimens of Fritillaria were identified and then deposited at the Herbarium of Medicinal Plants and Crude Drugs of the College of Pharmacy and Chemistry, Dali University, Dali, China (S1 Table).

DNA extraction, sequencing, and assembly
Total genomic DNA was extracted from about 100 mg of dried leaf material using a modified CTAB method [44,45]. The DNA content was checked by electrophoresis on 1.2% agarose gels, and its concentration was determined using a SmartSpec™ Plus Spectrophotometer (Bio-Rad, Hercules, CA, USA). DNA extracts were then fragmented for the construction of 300 bp short-insert paired-end (PE) libraries and sequenced on an Illumina HiSeq 2000/2500 sequencer at the Beijing Genomics Institute (BGI, Shenzhen, China).
The raw data were filtered using Trimmomatic v.0.32 [46] with default settings. Paired-end reads in the clean data were then filtered and assembled into contigs using GetOrganelle.py [47] with Fritillaria cirrhosa (accession number: KF769143) as the reference [48], which calls Bowtie2 v., BLASTn v., and SPAdes v.3.10 [49]. The de novo assembly graphs were visualized and edited using Bandage Linux dynamic v.8.0 [50], and a whole or nearly whole circular plastid genome (plastome) was then generated.

Annotation and sequence submission
The plastomes were annotated by aligning them to the complete plastid genome sequence available in NCBI using MAFFT [48,51] with default parameters, followed by manual adjustment in Geneious v.11.1.4 [52]. Circular genome maps were drawn with OGDRAW v.1.2 [53]. The ITS sequences were also obtained by Illumina sequencing and extracted from the raw data. The annotated plastid genomes of the nine Fritillaria species and their ITS sequences were submitted to NCBI under the accession numbers MH588404-MH588436 for the ITS sequences and those listed in Table 1 for the complete plastid genomes.

Variable site analysis
After the plastid genome sequences were aligned with MAFFT v.7.129, the alignment was adjusted manually in BioEdit [51,54]. A sliding window analysis was conducted in DnaSP v.6.11 [55] to determine the nucleotide variability (Pi) across the whole plastid genome, with a window length of 600 bp and a step size of 200 bp. DnaSP was also used to identify and quantify the insertions/deletions (indels), mutations, and nucleotide variability (Pi) in all aligned datasets. The p-distances, GC content, variable sites, and parsimony-informative sites in the genomes were identified and analyzed in MEGA v.7.0.26 [56].
Figure caption (fragment): ... [8,9] and herbarium specimens (http://www.cvh.ac.cn/). The letters a-f refer to these species and their distribution. Photos of the eight Fritillaria species studied are also included.

Species discrimination of universal, super, and specific barcodes
To evaluate the success rates of species discrimination with each barcode, we used a tree-building method to analyze 14 datasets for each of the single regions examined and their combinations. All single regions, including universal DNA barcodes and highly variable loci, were extracted from the complete plastid genome sequences, except the ITS/ITS2 region. These datasets were aligned with MAFFT [51] and used to build neighbor-joining (NJ) trees based on p-distances in MEGA [56]. The plastome of F. anhuiensis (accession number: MH593363) was used as the outgroup in these tree-building analyses. Species were regarded as being successfully discriminated if all the individuals of a given species formed a monophyletic group [57].

(Table 1). Moreover, a total of 115 genes were found, namely 78 protein coding genes, 30 tRNA genes, and 4 rRNA genes, as well as 3 pseudogenes (Table 2, S1 Fig). The protein coding genes present in the plastid genome of Fritillaria included 9 genes for large ribosomal proteins, 12 genes for small ribosomal proteins, 5 genes for photosystem I, 15 genes for photosystem II, and 6 genes for ATP synthase (

Highly variable regions in plastid genome
Seven highly variable regions from the plastid genomes, namely three intergenic regions (psbM-psbD, rps4-trnL-UAA, and ndhF-trnL-UAG), three gene regions (matK, ndhD, and ycf1), and one intron region (petB-intron), were selected as potential specific barcodes for use in species identification in Fritillaria (Fig 2). Among these regions, ycf1 was the longest, followed by psbM-psbD, while the petB-intron region was the shortest (Table 3). Furthermore, all highly divergent fragments were found in the LSC and SSC regions, whereas none were present in the IR regions.
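The sliding-window scan used to locate such regions (600 bp window, 200 bp step, as stated under Variable site analysis) can be illustrated with a minimal Python sketch. This is not the DnaSP implementation: the function names are hypothetical, the toy alignment is invented for illustration, and Pi is approximated here as the mean pairwise proportion of differing sites, skipping gaps and ambiguous bases pair by pair.

```python
from itertools import combinations

VALID = set("ACGT")

def pairwise_diff(a, b):
    """Return (differences, compared sites) for two aligned sequences.
    Sites with gaps or ambiguity codes in either sequence are skipped."""
    diffs = sites = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in VALID and y in VALID:
            sites += 1
            diffs += x != y
    return diffs, sites

def nucleotide_diversity(seqs):
    """Mean proportion of differing sites over all sequence pairs (Pi)."""
    total, n_pairs = 0.0, 0
    for a, b in combinations(seqs, 2):
        diffs, sites = pairwise_diff(a, b)
        if sites:
            total += diffs / sites
            n_pairs += 1
    return total / n_pairs if n_pairs else 0.0

def sliding_window_pi(seqs, window=600, step=200):
    """Return a list of (window midpoint, Pi) along an alignment."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(0, max(length - window, 0) + 1, step):
        chunk = [s[start:start + window] for s in seqs]
        midpoint = start + len(chunk[0]) // 2
        results.append((midpoint, nucleotide_diversity(chunk)))
    return results

# Toy alignment (invented); a real run would use the aligned plastomes.
toy = ["AAAAAAAA", "AAAATAAA", "AAAATAAA"]
for midpoint, pi in sliding_window_pi(toy, window=4, step=2):
    print(midpoint, round(pi, 4))
```

Windows whose Pi stands out from the genome-wide background would be the candidates for specific barcodes; the threshold for "highly variable" is a choice the sketch leaves open.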

DNA barcoding gap assessment
The inter- and intraspecific distances were calculated for each of the 14 datasets (Table 3). Among these datasets, the ITS2 region exhibited the highest inter- and intraspecific distances (0.0326 and 0.0094, respectively), followed by ITS (0.0235 and 0.0044), trnH-psbA (0.0147 and 0.0010), and ndhF-trnL-UAG (0.0112 and 0.0008), whereas these distances were lowest for rbcL (0.0033 and 0.0005). The inter- and intraspecific distances of the complete plastid genome were also relatively low (0.0037 and 0.0005, respectively) compared with those calculated for other datasets. Furthermore, the barcoding gap between inter- and intraspecific distances based on the p-distance model revealed that matK had the highest interspecific gap (divergence), and that inter- and intraspecific distances overlapped for almost all single regions and their combinations, with matK as the exception (Fig 3, S2 Fig).
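The barcoding-gap comparison can be sketched as follows. This is a simplified illustration, assuming that a gap exists when the smallest interspecific p-distance exceeds the largest intraspecific p-distance; the function names, species labels, and sequences are hypothetical, and the actual distances in this study were computed in MEGA.

```python
from itertools import combinations

VALID = set("ACGT")

def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences,
    ignoring sites with gaps or ambiguity codes."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in VALID and y in VALID]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)

def barcoding_gap(samples):
    """samples: list of (species, aligned sequence) tuples.
    Returns summary statistics of intra- and interspecific distances."""
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(samples, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    summary = {
        "mean_intra": sum(intra) / len(intra) if intra else 0.0,
        "mean_inter": sum(inter) / len(inter) if inter else 0.0,
        "max_intra": max(intra, default=0.0),
        "min_inter": min(inter, default=0.0),
    }
    # Positive value: a clear gap; zero or negative: the ranges overlap.
    summary["gap"] = summary["min_inter"] - summary["max_intra"]
    return summary

# Toy data (invented): two individuals for each of two species.
toy = [
    ("sp_A", "ACGTACGTAC"),
    ("sp_A", "ACGTACGTAT"),
    ("sp_B", "ACGTTCGTGG"),
    ("sp_B", "ACGTTCGTGG"),
]
print(barcoding_gap(toy))
```

A region can have high mean distances and still show no gap if its intraspecific range is wide, which is the pattern reported here for ITS2.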

Discriminatory powers of all regions and their combinations
We calculated the species discrimination ability of each region and their combinations based on the 14 datasets using tree-building methods (Fig 4). The super-barcode, comprising complete plastid genomes, showed the highest power for species identification (87.5%), with the strongest bootstrap values (Fig 5). The only species not resolved was F. cirrhosa, probably because of its polyphyletic nature.
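The monophyly criterion behind these success rates can be sketched in Python. The example assumes a rooted tree written as nested tuples of leaf names and a hypothetical mapping from leaf to species; both are invented for illustration and do not reproduce the trees in Fig 5. A species counts as discriminated when the set of its individuals matches the leaf set of some clade exactly.

```python
def clade_leaf_sets(tree):
    """Collect the leaf set of every clade in a nested-tuple rooted tree."""
    clades = set()

    def walk(node):
        if isinstance(node, str):          # a leaf
            leaves = frozenset([node])
        else:                              # an internal node
            leaves = frozenset().union(*(walk(child) for child in node))
        clades.add(leaves)
        return leaves

    walk(tree)
    return clades

def discrimination_rate(tree, species_of):
    """species_of: dict mapping ingroup leaf name -> species name.
    Returns (per-species monophyly flags, success rate in percent)."""
    clades = clade_leaf_sets(tree)
    members = {}
    for leaf, species in species_of.items():
        members.setdefault(species, set()).add(leaf)
    resolved = {sp: frozenset(leaves) in clades
                for sp, leaves in members.items()}
    rate = 100.0 * sum(resolved.values()) / len(resolved)
    return resolved, rate

# Toy tree (invented): species B is split across two lineages.
toy_tree = ("outgroup",
            ((("A1", "A2"), ("D1", "D2")),
             (("B1", ("C1", "C2")), "B2")))
toy_species = {"A1": "A", "A2": "A", "B1": "B", "B2": "B",
               "C1": "C", "C2": "C", "D1": "D", "D2": "D"}
resolved, rate = discrimination_rate(toy_tree, toy_species)
print(resolved, rate)
```

With eight species, a success rate of 87.5% corresponds to seven of the eight species forming monophyletic groups (7/8), and 62.5% and 25% correspond to five and two species, respectively.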
According to the NJ trees, each of the DNA regions psbM-psbD, ycf1, and matK alone could be used to efficiently identify the original plants of Fritillariae cirrhosae bulbus and its adulterants. In contrast, the ITS, ndhD, ndhF-trnL-UAG, and rps4-trnL-UAA regions only distinguished the adulterant species from the genuine ones. Moreover, ITS2 could discriminate the adulterants from the genuine medicine, but could not effectively identify any of

Definition of Fritillaria cirrhosa
In the NJ trees inferred from the whole plastid genomes, other DNA regions, or combinations, the six individuals of F. cirrhosa collected from three locations were clustered into two distinct clades: one was placed close to F. unibracteata and the other was close to F. przewalskii with strong support values (Fig 5).

Discussion
DNA barcoding has been demonstrated to be an efficient tool for identifying traditional medicines as well as their original species [23,58-60]. In the present study, this tool was used to discriminate among the five original species of Fritillariae cirrhosae bulbus and three of its adulterants.

Performance of universal DNA barcodes
In plants, the term universal DNA barcodes, which are identified based on Sanger sequencing, generally refers to rbcL and matK, supplemented with trnH-psbA and ITS/ITS2 [21,22]. Among the four universal DNA barcodes, matK had the highest species discriminatory power and successfully distinguished all the original species of Fritillariae cirrhosae bulbus except F. cirrhosa. In contrast, ITS2 and trnH-psbA showed the lowest species resolution and failed to correctly identify the original species of Fritillariae cirrhosae bulbus, although these two barcodes were previously suggested as core barcodes for species identification in traditional Chinese medicines [22,61]. ITS/ITS2 correctly discriminated the genuine medicine from the adulterants (Fig 5, S3 Fig). This result was nearly consistent with that of our previous study [24]. In fact, ITS/ITS2 were rich in variable sites and nucleotide diversity (Pi) compared with the studied plastid DNA regions (Table 3), but the obvious overlap between inter- and intraspecific distances in these regions among and within Fritillaria species might reduce their discriminating ability (Fig 3).
It should be noted that the matK sequences used for analysis were obtained from the complete plastid genome, not by Sanger sequencing. If the obstacles to Sanger sequencing of matK are resolved, matK will become the ideal candidate barcode for identifying the original species of Fritillariae cirrhosae bulbus, despite its lower nucleotide diversity (Pi) when compared with that of other regions. The best performance of this barcode might be attributed to the existence of a clear barcoding gap (Table 3, Fig 3). The matK gene has been confirmed to perform poorly in discriminating species of Fritillaria and other genera such as Primula [62] and Garcinia [63] because of its low success rates in PCR amplification and Sanger sequencing [24]. However, next-generation sequencing (NGS) technology could resolve the aforementioned deficiency of the Sanger method, as it was revealed in the present study that this marker has satisfactory discriminatory power. Therefore, we propose that NGS technology should be adopted for DNA sequencing of matK to overcome the disadvantages of the Sanger sequencing method for such fragments.
Furthermore, combined barcodes generally perform better in species discrimination than single barcodes [23]. In the present study, combinations of the four barcodes (HKLI) or three plastid regions (HKL) also showed strong species discrimination abilities (87.5%) (Fig 5, S3 Fig). In short, combining multiple loci could raise both the discrimination ability and the reliability of identification for Fritillaria species.

Super-barcode: a crucial candidate DNA barcode in Fritillaria
Because of the low power for species discrimination of universal DNA barcodes, new methods are necessary to discriminate closely related species [23,37]. Complete plastid genomes are extremely rich in genetic variation and have been shown to be powerful tools for resolving the phylogenetic relationships of complex groups [30,42,43,64,65]. Their use can greatly improve the resolution at lower taxonomic levels in plant phylogeny, phylogeography, and population genetics; therefore, they were also proposed as a type of super DNA barcode that is likely to resolve the defects of the universal DNA barcodes [23,66]. In this study, plastid genomes of Fritillaria species, with lengths from 151,518 to 152,073 bp, provided abundant informative sites for species identification. As a result, this super-barcode identified almost all of the original species of Fritillariae cirrhosae bulbus and its adulterants with high bootstrap values, except for F. cirrhosa because of its possible polyphyletic nature (Fig 5). The complete plastid sequences in the present study were obtained from dried leaf materials using NGS methods. The use of this super-barcode might face considerable challenges if DNA had to be extracted from specific materials, such as kiln-dried specimens or medicines. However, recent procedures have been developed that use total DNA as a template for genome skimming to assemble the plastid genome, which not only solves the problem of extracting plastid DNA from dried or even degraded materials but also simplifies the whole process [23,67,68]. As sequencing technology and bioinformatics continue to improve rapidly, super DNA barcodes will become more popular and may eventually replace Sanger-based DNA barcoding. For now, super DNA barcodes could be adopted as useful complements to universal DNA barcodes, especially in identifying closely related species.

Specific DNA barcode: a trade-off between universal and super-barcode
The present study revealed that, when adopting the NGS method, the universal DNA barcodes were limited in their ability to discriminate species, except for the matK region, which showed a high success rate for identifying species in Fritillaria (87.5%). Although the super-barcode exhibited high discriminatory power and sufficient reliability in this study, its use might be limited by the complexity of data analysis and the high cost of sequencing. Therefore, it is of great importance to search for specific barcodes from highly variable regions that can be used as a trade-off between universal and super DNA barcodes.
Highly variable regions in the plastome could help to effectively resolve phylogenetic relationships and identify species within complicated groups [30,69]. In the present study, six highly variable regions were extracted from the complete plastid genomes. Among these regions, ycf1 had the same high species discriminating ability (87.5%) as the combination of four markers (HKLI); similarly, psbM-psbD showed strikingly high discriminatory power (87.5%) (Fig 4, Fig 5). In fact, ycf1 had been successfully used to reconstruct the phylogenies of the family Orchidaceae [70] and the genus Astragalus [71], and it showed an extremely strong power to resolve interspecific relationships.
Therefore, we conclude that ycf1 and psbM-psbD can be used as potential specific barcodes for Fritillaria. However, the considerable length of these regions causes difficulties during PCR amplification and Sanger sequencing [72]. Designing suitable primers to amplify shorter, more variable sections of those regions (about 1000 bp) might therefore be another approach for Sanger sequencing.

Delimitation of F. cirrhosa and its effect on species discrimination abilities of DNA barcodes
In the current study, the NJ trees generated using plastid genomes, universal barcodes, specific barcodes, and their combinations failed to resolve the samples of F. cirrhosa from different locations into one clade (Fig 5, S3 Fig). In the tree based on genome sequences, the six individuals were divided into two different clades: one (BM1-1, BM1-2, and BM2-2) was sister to F. unibracteata, and the other (BM2-1, BM3-1, and BM3-2) was placed close to F. przewalskii. These results indicated that F. cirrhosa is not a "good" species but contains multiple different lineages, which was also supported by the previous analysis of population genetics data based on amplified fragment length polymorphism (AFLP) markers (unpublished data). It is well known that F. cirrhosa has extremely complex variations in its morphology, especially its floral characteristics. According to the field survey, individuals from Lijiang (ZDQ15019) possess yellow-green tepals, whereas individuals from Shangri-La (ZDQ13053) possess dark purple tepals. The complex morphology of this species unavoidably causes confusion in its taxonomy. Delimitation, as well as phylogeny, of F. cirrhosa and its closely related species is still controversial [8] and requires further study based on more samples and better markers. Thus, if the polyphyletic condition of this species is not considered, we could conclude that the best-performing datasets, namely the whole plastid genomes, matK, ycf1, and psbM-psbD, as well as the combination HKLI, were able to discriminate all the original species of Fritillariae cirrhosae bulbus and its adulterants with a success rate of 100%.

Conclusion
In the present study, 32 individuals from eight species, representing five species of the original plants of Fritillariae cirrhosae bulbus and three of its adulterants, were employed to compare the species discriminatory powers of universal, super, and specific DNA barcodes. The results revealed that the whole plastid genome used as a super-barcode exhibited a powerful ability to identify Fritillaria species, with high reliability. Among the universal barcodes, only matK could discriminate almost all the original species when NGS methods are employed. It should be noted that ITS2 separated genuine Fritillariae cirrhosae bulbus from its adulterants, but it could not correctly identify the original species. Among the highly variable regions examined, ycf1 and psbM-psbD are considered the primary potential specific barcodes for Fritillaria species, but their successful sequencing using the Sanger method will depend on developing primers that amplify the barcodes in sections. Moreover, NJ analysis based on complete plastid genomes, as well as other regions, revealed that F. cirrhosa was polyphyletic, and the variations in its morphology require further research at the population level. As the costs of NGS continue to decrease and data analysis methods are simplified, the use of super-barcodes might become the primary method for species discrimination in plants. Overall, the results of this study clarify the species discrimination ability of super, universal, and specific barcodes in complex groups, and provide new knowledge for accurately identifying the original plants of Fritillariae cirrhosae bulbus and its adulterants.