Internal transcribed spacer 2 (ITS2) barcodes: A useful tool for identifying Chinese Zanthoxylum

Premise of the Study The genus Zanthoxylum in the Rutaceae family of trees and shrubs has a long history of domestication and cultivation in Asia for both economic and medicinal purposes. However, many Zanthoxylum species are morphologically similar and are easily confused. This often leads to false authentication of source materials and confusion in herbal markets, hindering their safe utilization and genetic resource conservation. DNA barcoding is a promising tool for identifying plant taxa. Methods We used three candidate DNA barcoding regions (ITS2, ETS, and trnH‐psbA) to identify 69 accessions representing 13 Chinese Zanthoxylum species. The discriminatory capabilities of these regions were evaluated in terms of PCR amplification success, intra‐ and interspecific divergence, DNA barcoding gaps, and identification efficiency using the BLAST and tree‐building methods. Results ITS2 proved the most useful for discriminating Chinese Zanthoxylum species, with a correct identification rate of 100%, and this region also exhibited significantly higher intra‐ and interspecific divergence. Discussion Phylogenetic analysis confirmed that ITS2 has a powerful discriminatory ability both at and below the species level. We confirmed that ITS2 is a powerful barcoding region for identifying Chinese Zanthoxylum species, and will be useful for analyzing and managing Chinese Zanthoxylum germplasm collections.

DNA region (Hebert et al., 2003(Hebert et al., , 2004. The cytochrome c oxidase subunit I (COI) gene has been widely found to have a high ability to differentiate between animal species (Hebert et al., 2003). However, in plants, the low mutation rate of the COI gene has led to the search for alternative barcoding regions (Chase et al., 2005;Kress et al., 2005;CBOL Plant Working Group, 2009), and studies have focused on chloroplast and nuclear genes (Cowan et al., 2006;Pennisi, 2007). Many loci have been proposed as plant barcodes. Although no consensus has emerged, different research groups have embraced DNA barcoding as a practical tool for taxa identification and have proposed various different loci as preferred plant barcodes (Pennisi, 2007;Erickson et al., 2008;Lahaye et al., 2008). For example, the Consortium for the Barcode of Life Plant Working Group (CBOL) evaluated seven plastid regions for land plants and proposed the two-locus rbcL+matK as a universal plant DNA barcode (CBOL Plant Working Group, 2009). However, because rbcL has low sequence variation  and matK has low primer universality (Sass et al., 2007;Piredda et al., 2011;Kool et al., 2012), this two-locus barcode is far from perfect, and the search for a more appropriate DNA barcoding region for plants is ongoing.
In the nuclear genome, the ITS gene, especially ITS2, is a potential DNA barcode, and it has been used in multi-loci barcode combinations (Chase et al., 2005;Feng et al., 2016a). Schultz et al. (2005) advocated that ITS2 has potential as a general phylogenetic marker. Chen et al. (2010) tested the discriminatory ability of ITS2 in more than 6600 plant samples belonging to 4800 species from 753 distinct genera, and proposed that it can be used as a standard DNA barcode to identify medicinal plants. ITS2 has since been shown to be a promising and effective region for identifying medicinal samples (Feng et al., 2016a;Han et al., 2016).
The ETS region is often used for phylogenetic analysis (Li et al., 2007). Acevedo-Rosas et al. (2004) used the ETS region to determine relationships among species of Graptopetalum Rose and closely related genera, and found that it had the highest number of parsimony-informative sites. Li et al. (2002) used the ETS sequence to examine paraphyletic relationships in Syringa L. and Ligustrum L., and Mo (2015) tested the ETS sequence in Paulownia Siebold & Zucc. and confirmed it as a candidate DNA barcode for this genus.
The chloroplast genome, particularly the trnH-psbA region, is one of the more variable intergenic spacers in plants and can be recommended as a supplementary barcode (Hollingsworth et al., 2011). Kress et al. (2005) proposed that the trnH-psbA plastid spacer region is suitable for DNA barcoding of flowering plants, and it was used in a two-locus barcode system. The trnH-psbA region was later used for a wide range of plants, and an increasing number of researchers have proposed it as a supplementary DNA barcode for plant taxa Chen et al., 2010).
Because identifications based on molecular markers are independent of environmental variation, molecular markers have been popular tools for modern taxonomists due to their stability and universality (Feng et al., 2016a). Some molecular markers, including simple sequence repeat (SSR) and sequence-related amplified polymorphism (SRAP) markers, have been used for analysis of Zanthoxylum species. In addition, some DNA sequences, including ITS nrDNA (Shen et al., 2005) and chloroplast regions (matK, rbcL, trnL-trnF), have been used to assess the phylogeny of the Zanthoxylum genus (Feng et al., 2016b).
Herein, we investigated the utility and species identification ability of DNA barcoding using ITS2, ETS, and trnH-psbA in Chinese Zanthoxylum and evaluated the ability to discriminate below the species level in this genus.

Plant materials
A total of 69 individuals of 13 species were included in this study. Samples were from two sources: (1) 61 field collection specimens from all provinces in China presently known to contain Zanthoxylum, namely Shaanxi, Yunnan, Guizhou, Sichuan, Gansu, Hebei, Chongqing, and Shanxi, and (2) eight vouchered herbarium specimens from the Northwest A&F University Herbaria. Of the vouchered specimens, seven species were represented by more than one individual, and six species were represented by a single individual. These individuals represent a diverse mix of cultivated and wild Zanthoxylum species. Spatial coordinates were registered using a Global Positioning System (GPS) (Appendix S1).

DNA extraction, amplification, and sequencing
Genomic DNA was extracted from either silica-dried leaves or herbarium specimens using the modified cetyltrimethylammonium bromide procedure (Porebski et al., 1997). Universal primers for selected regions and sources are listed in Table 1. PCR amplification was conducted in 40-μL volumes containing 20 μL of 2× Taq Master Mix (CWBIO Biotechnology Co. Ltd., Beijing, China), 2.0 μL of genomic DNA, 0.5 μL of each primer, and 17 μL of double-distilled water. Cycling parameters for amplifying the ITS2 markers were as described previously by Chen et al. (2010). Cycling parameters for the ETS region were 96°C for 5 min; followed by 30 cycles of 96°C for 30 s, 55°C for 45 s, and 72°C for 40 s; and a final extension at 72°C for 5 min. Cycling parameters for the trnH-psbA region were 94°C for 5 min; followed by 30 cycles of 94°C for 45 s, 57°C for 30 s, and 72°C for 72 s; and a final extension at 72°C for 10 min. PCR products were run on 1.5% agarose gels and sequenced by AuGCT DNA-SYN Biotechnology (Beijing, China). All sequences reported in this study have been deposited in the National Center for Biotechnology Information (NCBI) GenBank database (accession numbers MF070103-MF070230, MF039484-MF039544; Appendix S2).

Data analysis
ITS2 sequences were annotated using ITS2 annotation tools based on the hidden Markov model to remove 5.8S and 28S DNA sequences (Keller et al., 2009). Sequences of three candidate regions were subsequently aligned with ClustalW, and genetic distances were calculated using MEGA 6.0 based on the Kimura 2-parameter (K2P) model (Tamura et al., 2013). The three parameters calculated to characterize interspecific divergence using the K2P model were average interspecific distances, average theta prime (mean genetic variation between different species, thus eliminating biases associated with different numbers of samples among species), and smallest interspecific distances (the minimum interspecific distance within each genus was at least two species) (Meyer and Paulay, 2005;Chen et al., 2010;Gao et al., 2010). We also used the parameters average intraspecific distance, theta, and coalescent theta to evaluate the intraspecific distances based on the K2P model (Meyer and Paulay, 2005;Chen et al., 2010;Gao et al., 2010). The distribution of interspecific and intraspecific divergence was compared using DNA barcoding gaps (Moritz and Cicero, 2004;Chen et al., 2010). Wilcoxon signed-rank tests were calculated using IBM SPSS Statistics version 19.0 (Meyer and Paulay, 2005). We also applied BLAST and tree-building methods to evaluate the species authentication capacity. In the BLAST method, all regions of Zanthoxylum species were used as query sequences. Correct identification was concluded when the best BLAST hit of the query sequence was from the expected species, ambiguous identification was concluded when the best BLAST hits for a query sequence were from several species including the expected species, and incorrect identification was concluded when the best BLAST hit was not from the expected species van der Merwe et al., 2014). The tree-building method was performed by constructing neighbor-joining trees in MEGA 6.0 using the K2P model with bootstrap support values for individual clades computed by running 1000 bootstrap replicates. Species were considered identified if all individuals of a species formed a monophyletic group (Li et al., 2011). Citrus limetta Risso, C. cavaleriei H. Lév., and Toddalia asiatica (L.) Lam. were used as outgroups.

Amplification and sequencing of ITS2, ETS, and trnH-psbA
In this study, two nuclear DNA regions and one chloroplast region were selected as candidate barcodes. Analysis of the efficiency of PCR amplification showed that ITS2 and trnH-psbA achieved the highest rate (100%), followed by ETS (97.10%, except for the samples of Z. echinocarpum Hemsl. and Z. multijugum Franch.; Appendix S3). The sequencing success rates of ITS2 and ETS were 88.41% and 94.02%, respectively, while that of trnH-psbA was 94.20% (  Table 3). Thus, the lengths and GC content of the three candidate regions for the tested species were relatively variable.

Assessment of intraspecific and interspecific genetic divergence
We used MEGA 6.0 based on the K2P model to estimate the genetic divergence of all species. The ITS2 and ETS regions displayed the highest interspecific and intraspecific divergence according to average interspecific distances, theta prime, average intraspecific distances, coalescent depth, and theta, while the trnH-psbA region yielded lower values for these parameters (Table 4). Moreover, Wilcoxon signed-rank tests confirmed that the ITS2 and ETS regions exhibited the highest interspecific variability and shared similar divergence, whereas variability and divergence were somewhat lower for the trnH-psbA region (Table 5). For intraspecific divergence, Wilcoxon signed-rank tests indicated that trnH-psbA showed the lowest variation between conspecific individuals, whereas ETS showed the highest (Table 6).

DNA barcoding gap and species discrimination
Based on the K2P model of intraspecific and interspecific divergence, we investigated the distribution of genetic distance among Zanthoxylum species at a scale of 0.01 distance units. The results indicated that none of the three regions exhibited distinct DNA barcoding gaps, and each distribution showed a slight overlap between intra-and interspecific distances (Fig. 1). In order to assess the identification efficiency of each region, we calculated the discrimination capacity using the BLAST and tree-building methods. Regardless of whether the BLAST or the tree-building method was used, the results showed that the ITS2 region had the highest discrimination efficacy (100%) at the species level. Meanwhile, ETS possessed 90.91% identification success rates at the species level for both the BLASTA1 method and the tree-building method. The discrimination efficacy of the trnH-psbA region is relatively lower (BLAST: 75.0%, tree-building: 91.67%) ( Table 7).

Phylogenetic analysis
The primary purpose of the phylogenetic tree was not to evaluate the evolutionary relationships, but to test whether species are recovered as monophyletic, using phylogenetic techniques and bootstrap resampling. Our neighbor-joining tree showed that the ITS2 region could distinguish between all tested species of the genus Zanthoxylum in 100% of cases; each species clustered as a distinct monophyletic group ( Fig. 2A). However, two samples of Z. dissitum did not cluster together using the ETS region, and two samples of Z. piperitum did not cluster together using trnH-psbA (Fig. 2B,  C). Interestingly, we found that the Z. armatum accessions divided into two subgroups (III and IV) using all three regions. Subgroup III mainly consisted of cultivar accessions from 'Dingtanhuajiao' (ZA01-ZA05), 'Jiangjinqinghuajiao' (ZA14), and 'Tengjiao' (ZA15), whereas accessions of subgroup IV were wild varieties. In addition, the Z. armatum cultivar 'Dingtanhuajiao' is distributed exclusively in Guizhou Province and was recovered as monophyletic by ITS2 and ETS regions. Separation between cultivars in terms of phylogeny was also observed in Z. bungeanum, and 'Hanchengdahongpao' (ZB06-ZB09 and ZB22-ZB25) and 'Fengxiandahongpao' (ZB01-ZB05) were clearly clustered by ITS2 and ETS regions into subgroups I and II. Together, these findings indicate that the ITS2 region possesses powerful discriminatory ability both at and below the species level.

DISCUSSION
A rapid and accurate method for authenticating species from the genus Zanthoxylum is important to ensure the safe use of drugs made from these medicinal herbs, as well as to maximize their economic value. DNA barcoding is a relatively new taxonomic method for fast, efficient, and reliable species identification (Hebert et al., 2004;Chen et al., 2010). An ideal DNA barcode should be routinely retrievable with a single universal primer pair and should exhibit high taxonomic coverage and high resolution. Given all these considerations, we tested three regions (ITS2, ETS, and trnH-psbA) for their utility as DNA barcode targets to distinguish 13 Zanthoxylum species. To the best of our knowledge, this is the first study using these three regions for species identification in the Zanthoxylum genus.
Compared with the trnH-psbA barcode in the chloroplast region, the ITS2 and ETS regions showed higher interspecific variation and discriminatory power, and accurately discriminated all Zanthoxylum species tested in this study. Specifically, ITS2 distinguished 100% of monophyletic species, and performed better than the other two loci. The nuclear ITS2 region is often regarded as a candidate DNA barcode because of its valuable characteristics, including availability of conserved regions for designing universal primers, ease of amplification, and sufficient variability to distinguish even closely related species .
The ETS nuclear region has previously been used as a DNA barcode for the genus Paulownia (Mo, 2015). However, our current results indicate that ETS is not an ideal candidate DNA barcode for the Zanthoxylum genus, based on the criteria applied. First, the length of ITS2 is relatively short compared to ETS. Second, the PCR amplification efficiency of ITS2 was higher than that of ETS. Third, using the BLAST method, ITS2 achieved a higher success rate for identification of samples at the species level compared with ETS. Although ETS had higher interspecific divergence than ITS2 in our study, Wilcoxon signed-rank tests indicated no statistical differences between the two regions (Tables 5, 6).
In addition, the phylogenetic tree indicated that the ITS2 and ETS regions showed powerful discriminatory ability not only at the species level, but also below the species level. The Qingling Mountains form the boundary between southern and northern China; the 'Hanchengdahongpao' accession is perhaps the most well-known representative of Z. bungeanum in northern China, whereas 'Fengxiandahongpao' is the most common variety in the south. Based on our phylogenetic analysis, these representative accessions were identified correctly using both ITS2 and ETS regions, and the 'Dingtanhuajiao' cultivated accession of Z. armatum could also be distinguished from the other cultivated accessions by both ITS2 and ETS ( Fig. 2A, B). These results imply that ITS2 is superior to ETS as a DNA marker in the Zanthoxylum genus and that the ETS spacer should be used as a complementary barcode in the Zanthoxylum genus.
In previous studies, the chloroplast region trnH-psbA was dismissed early on because of considerable interspecific variation, and even intraspecific variation, including the presence of inversions and insertion-deletion polymorphisms (indels), which make it difficult to align among broad groups of taxa (Fazekas et al.,

2008
; Whitlock et al., 2010). In the present study, we did not find these problems with Zanthoxylum. However, using the full data set constructed in this study, the trnH-psbA region proved unsuitable for DNA barcoding in Zanthoxylum. For example, using the BLAST method, trnH-psbA had the lowest correct identification rate at the species level among the three regions tested. Moreover, this limitation is supported by the wide variation in sequence length. Variable lengths complicate sequence alignment, which increases the difficulty and lowers the accuracy in real-world applications. In addition, the trnH-psbA region exhibited lower interspecific and intraspecific divergence than the ITS2 and ETS regions. All of these results indicate that the trnH-psbA region does not have potential for DNA barcoding in Chinese Zanthoxylum. Surprisingly, herbarium materials accounted for most of the failed amplification and sequencing results, and removing these results increased the sequencing success rate for ITS2, ETS, and trnH-psbA from 88.41%, 94.02%, and 94.20% to 94.20%, 95.52%, and 98.55%, respectively. We believe this may be related to primer design or fundamental changes in gene structure during herbarium specimen preparation and storage, but it could also be related to the existence of homologous sequences (Starr et al., 2009;Xiao et al., 2010;Hollingsworth et al., 2011). Thus, sequencing technology should be improved to obtain more high-quality sequences. In addition, Kress et al. (2005) advocated that using well-preserved specimens is important for successful DNA barcoding. However, herbarium specimens often vary in how and when they are dried after being pressed, and it is not always certain that specimens were stored in a good state. We collected more than one herbarium sample for each species, but DNA could not be extracted from some samples, and six species included only a single individual. Therefore, we agree that it is best to use fresh samples for DNA barcoding.
Ideally, a perfect DNA barcode should have a distinct DNA barcoding gap and no overlap (Meyer and Paulay, 2005). However, none of the three loci in this study have a distinct DNA barcoding gap (Fig. 1). Moritz and Cicero (2004) suggested that the overlap is considerably greater when a larger proportion of closely related taxa are included. Our results suggest that regions in some cultivars (Z. bungeanum and Z. armatum) have undergone strong natural or artificial selection. A previous study indicated that cultivated Zanthoxylum species experienced multiple long-distance dispersal events, as well as several vicariance events, and the ancestors of Zanthoxylum first colonized the Chinese provinces of Yunnan and Guizhou (Feng et al., 2016b). Our Zanthoxylum samples included both wild and cultivated varieties, which led to maximum intraspecific distances that were greater than minimum interspecific distances. The essence of DNA barcoding technology is the use of a universal barcode. However, in reality, and regardless of the DNA barcoding employed, there will always be some species that would be better resolved by a specific DNA barcode (Pang et al., 2012). Numerous studies have focused on a large number of plant species with distant genetic relationships, but the present study is one of only a few to evaluate the efficacy of DNA barcoding among congeneric Zanthoxylum species. In recent years, various studies have proved that ITS2 is a universal barcode in Asteraceae, Fabaceae, and other families Liu et al., 2010). Chen et al. (2010) compared seven candidate DNA barcoding regions (ITS2,matK,rbcL,ycf5,and rpoC1) and proposed that ITS2 can be used as a standard DNA barcode in medicinal plants. As far as we know, DNA barcoding is currently used for identification at the species level, and there are few reports of identification at cultivar level. According to the data of ITS2 sequence characteristics, intra-and interspecific K2P genetic distances, and a neighborjoining algorithm, relatively large variation occurred not only at the species level but also at the cultivar level. For example, the sequence characteristics also revealed that the cultivars 'Hangchengdahong, ' 'Fengxiandahongpao' (Z. bungeanum), and 'Dingtanhuajiao' (Z. armatum) exhibit more divergence than other cultivars, probably resulting from their wide distribution and diverse habitats. The variation of nuclear ribosomal ITS2 sequences within Chinese Zanthoxylum cultivars may stem from the domestication process and the geographical barrier. The two Z. bungeanum cultivars are distributed south and north of the Qinling Mountains, which block the genetic flow of the two cultivars and create a unique genetic divergence. In our study, the ITS2 data demonstrate 100% accuracy in identifying the Chinese Zanthoxylum species using both the BLAST and tree-building methods. Furthermore, the ITS2 sequence length is short in Zanthoxylum, which is suitable for the PCR amplification and sequencing. We believe that, in addition to its use as a standard phylogenetic marker, ITS2 is a useful DNA barcode in Chinese Zanthoxylum species. We therefore recommend the nuclear ITS2 region for DNA barcoding in Chinese Zanthoxylum and its application in Chinese Zanthoxylum germplasm collections and breeding programs. Our results provide valuable baseline data for the development of more efficient conservation and management plans for this important aromatic species.

ACKNOWLEDGMENTS
The authors thank Shi-Jing Feng and Zhen-Shan Liu for their assistance in the laboratory. This study was funded by the Zanthoxylum Selection and High Quality and High Yield Technology Demonstration Project of Northwest A&F University (grant no. TGZX2016-08).

DATA ACCESSIBILITY
The raw data of three DNA barcoding sequences of 69 Zanthoxylum accessions have been deposited in GenBank (accession no. MF070103-MF070230 and MF039484-MF039544).

SUPPORTING INFORMATION
Additional Supporting Information (Appendices S1-S3) may be found online in the supporting information tab for this article.