Development and Application of InDel Markers for Authentication of the Korean Herbs Zanthoxylum schinifolium and Zanthoxylum piperitum

Zanthoxylum schinifolium and Zanthoxylum piperitum are the sources of the well-known traditional Korean herbal medicines “sancho” (prickly ash) and “chopi” (Korean pepper), respectively. Sancho and chopi are often indiscriminately mixed due to the similar appearance of the herbal materials when used as spices and herbal medicines. Moreover, commercial sancho and chopi products often contain adulterants, which is insufficient to ensure food efficacy and safety. In this study, we developed hypervariable insertion/deletion (InDel) markers to distinguish between sancho and chopi products by comparing the complete chloroplast genome sequences of four Zanthoxylum species deposited in the National Center for Biotechnology Information (NCBI) GenBank. Comparative analyses of the nucleotide diversity (Pi) of these Zanthoxylum genomes revealed four hypervariable divergent sites (trnH-psbA, psbZ-trnG, trnfM-rps14, and trnF-ndhK) with Pi > 0.025 among 520 windows. Of these four regions, including two genic and two intergenic regions, only psbZ-trnG yielded accurate PCR amplification results between commercial sancho and chopi products from the Korean herbal medicine market. We therefore selected psbZ-trnG, an InDel-variable locus with high discriminatory powers, as a candidate DNA barcode locus. This InDel marker could be used as a valuable, simple, and efficient tool for identifying these medicinal herbs, thereby increasing the safety of these spices and herbal materials in the food market.

from Korea, including Z. schinifolium var. inermis (Nakai) T.B. Lee, Z. piperitum f. pubescens (Nakai) W.T. Lee, Z. schinifolium SieboldandZucc., Z. piperitum (L.) DC., Z. schinifolium f. microphyllum (Nakai) W.T. Lee, Zanthoxylum planispinum SieblodandZucc., and Zanthoxylum coreanum Nakai [11][12][13]. Z. schinifolium and Z. piperitum are the earliest cultivated trees in southern regions of Korea. These trees are of major agricultural importance as sources of spices and traditional medicines [9,10]. Whereas Z. schinifolium thorns are arranged in an alternating pattern on branches, Z. piperitum thorns are arranged in an opposing pattern [11]. However, it is difficult to visually discriminate between dried and powdered herbal products made from Z. schinifolium and Z. piperitum. Furthermore, commercial herbal products commonly include a mixture of Z. schinifolium and Z. piperitum. Since these plants are used in herbal medicines and as health food supplements, accurate methods should be developed to identify and characterize the two species.
Several studies have focused on developing molecular markers that can discriminate between various Zanthoxylum species used as traditional medicines in different countries. These molecular markers include amplified fragment length polymorphism (AFLP) markers for distinguishing between Zanthoxylum acanthopodium and Zanthoxylum oxyphyllum [14], sequence-related amplified polymorphism (SRAP) markers [15], internal transcribed spacer (ITS) rDNA-specific markers [16], and ISSR markers [17]. However, the use of these markers is limited by population dynamics and their reproducibility as diagnostic markers in the food market. With the improvement of next-generation sequencing (NGS) technology, chloroplast (cp) genome assembly has become more accessible than Sanger sequencing. The development of molecular markers has also become cost effective through comparisons of cp genomes. The complete cp genomes of several Zanthoxylum species have been sequenced by de novo assembly using a small amount of whole-genome sequencing data [18][19][20].
In the current study, we compared the published cp genome sequences of four Zanthoxylum species and searched for species-specific regions with hypervariable nucleotide diversity amongmembers of Zanthoxylum. Our aim was to develop interspecies insertion/deletion (InDel) markers that could be used to discriminate between Z. schinifolium and Z. piperitum and prevent the indiscriminate mixing of these two materials in commercial products.

Comparison of Cp Genomes and Identification of Hypervariable Loci
All cp genome sequences in plants of the Zanthoxylum genus with complete genome sequence information were downloaded from GenBank ( Table 1). The sequences were aligned using the Clustal W algorithm from MEGA7.0 [21] and CLC viewer 8.0 software (CLC bio, Aarhus, Denmark). The gene distribution patterns and similarities in the Zanthoxylum cp genomes were compared and visualized using mVISTA software (http://genome.lbl.gov/vista/mvista/submit.shtml) in Shuffle-LAGAN mode with the annotated Z. piperitum KT153018 cp genome as a reference. The variability of the aligned genomes was evaluated using the sliding window method with DnaSP ver. 5.0. software [22]. The window size was set to 600 base pairs (bp), the typical length of DNA markers. The step size was set to 300bp for relatively accurate positioning of hypervariable InDels. Only regions with a nucleotide diversity (Pi) value of >0.025 were considered. Hypervariable sites and genetic distance in the cp genome were calculated using MEGA 7.0. The InDel events were checked manually based on the aligned sequence matrix.  Table 2 lists the Z. schinifolium and Z. piperitum collections from the National Institute of Biological Resources used in this study. Thirteen spice and powdered herbal materials (ten sancho and six chopi samples) were purchased from verified local market sources in Korea and China (Table 3). Species identification was performed by the National Institute of Biological Resources, Korea. Prior to total genomic DNA extraction, 50mg (dry weight) of each sample was added to a tube filled with stainless steel beads (2.38 mm in diameter) from a PowerPlantPro DNA Isolation Kit (Qiagen, Valencia, CA, USA), and the mixture was homogenized in a Precellys®Evolution homogenizer (Bertin Technologies, Montigny-le-Breonneux, France). Genomic DNA was extracted from the collected samples using the PowerPlantPro DNA Isolation Kit according to the manufacturer's instructions. Guseong-ri, Jugwang-myeon, Goseong-gun, Gangwon-do, Korea NIBRVP0000470134 10 Seongu-ri, Onjeong-myeon, Uljin-gun, Gyeongsangbuk-do, Korea NIBRVP0000538333

Development and Validation of the InDel Molecular Marker
To validate interspecies polymorphisms within the cp genomes and to develop DNA genetic markers for identifying these four Zanthoxylum species, primers were designed using Primer 3 Plus (http: //www.bioinformatics.nl/cgi-bin/primer3plus/primer3plus.cgi), and National Center for Biotechnology Information (NCBI) Primer-BLAST online tool (https://www.ncbi.nlm.nih.gov/tools/primer-blast/) was performed based on the mutational hotspot regions (hypervariable regions) found in these Zanthoxylum cp genomes. PCR amplifications were performed in a reaction volume of 50 µL containing 5 µL 10× Ex Taq buffer (with MgCl 2 ), 4 µL dNTP mixture (each 2.5 mM), Ex Taq (5 U/µL) (TaKaRa Bio, Ostu, Japan), 10 ng genomic DNA templates, and 1 µL (10 pM) forward and reverse primers. The mixtures were denatured at 95 • C for 5 min and amplified for 40 cycles at 95 • C for 30 s, 55 • C for 20 s, and 72 • C for 30 s, with a final extension at 72 • C for 5 min. To detect PCR amplicons, the PCR products were separated by capillary electrophoresis (QIAxcel Advanced System, Qiagen, Hilden, Germany) using a QIAxcel DNA High Resolution Kit via the 0M500 method (Qiagen). The target DNA was extracted and purified using a MinElute PCR Purification Kit (Qiagen). Purified PCR products were sent to CosmoGenetech for sequencing (Seoul, Korea) with both forward and reverse primers. The sequencing results were analyzed by BLAST searches of the GenBank database. Sequence alignment and data visualization were carried out with CLC sequence viewer 8.0.
A 2 µL aliquot of each PCR-amplified trnH-psbA product (concentration 0.6 µg/µL to 1 µg/µL) was digested in 2 µL of 10× Cut Smart buffer, two units of PleI restriction enzyme (New England Biolabs, Ipswich, MA; NEB), and 15.8 µL distilled H 2 O in a final volume of 20 µL, followed by incubation at 37 • C for 20 min and inactivation at 65 • C for 20 min. The digested fragments were separated by electrophoresis on 1.5% agarose gels stained with ethidium bromide, and the fragment patterns were visualized under UV light.

Comparative Analysis of the Cp Genomes of Various Zanthoxylum Species
To investigate the level of sequence divergence among Zanthoxylum cp genomes, we performed a comparative analysis of three Zanthoxylum cp genomes with mVISTA using the annotated Z. piperitum sequence as a reference (Figure 1). The cp genomes of the Zanthoxylum species showed high sequence similarity, with identities of <90% in only a few regions, pointing to a high level of conservation among the cp genomes. However, sliding window analysis using DnaSP detected highly variable regions in the Zanthoxylum cp genomes (Figure 2). We calculated the nucleotide diversity (Pi) values in the four Zanthoxylum cp genomes as an indicator of divergence at the sequence level. The large single-copy (LSC) regions and small single-copy (SSC) regions were more divergent than the inverted repeat (IR) regions.
Among the 520 windows examined, IR regions were more conserved than LSC and SSC regions, with average Pi values of 0.0015 and 0.0012 for IRa and IRb, respectively (for regions other than those with a Pi value = 0). The Pi values for the LSC regions averaged 0.0082, whereas the SSC regions had a Pi value of 0.0103, and the average Pi value for all regions was 0.00609. Four mutational hotspots in the cp genomes showed markedly higher Pi values (>0.025), including two intergenic regions (trnH-psbA, 0.02833; psbZ-trnG, 0.05067) and two genic regions (trnfM-rps14, 0.05367; trnF-ndhK, 0.02667 and 0.03133) (Figure 2). Although the SSC and IR regions were generally highly conserved, the four regions located in the LSC regions were particularly divergent.

Development of InDel Markers to Discriminate between Z. Schinifolium and Z. Piperitum
Based on multiple alignments of complete cp genome sequences, we selected the four most highly variable InDel loci as candidate DNA markers (Table 4). We confirmed these four InDel regions by PCR amplification and sequencing and investigated their utility for discriminating between Z. schinifolium and Z. piperitum (Figure 3). We produced four markers (ZanID1, ZanID2, ZanID3, and ZanID4) that were specific to Z. schinifolium and Z. piperitum and were derived from long InDels in the intergenic regions psbZ-trnG, trnfM-rps14, and trnF-ndhK, respectively. ZanID1 was derived from 11, 3, and 6bp InDels in the trnH-psbA locus and was specific to Z. schinifolium and Z. piperitum (Figure 3). ZanID2 was derived from 19, 28, and 5 bp InDels and 40 and 23 bp tandem repeats (TRs) in the psbZ-trnG locus. ZanID3 was derived from 19, 28, 39, and 6 bp InDels and a 23 bp TR in trnfM-rps14. ZanID4 was derived from 32, 6, 59, and 50 bp InDels and an 8 bp TR in trnF-ndhK.

Utilization of InDel Markers in the Korean Food Market
To validate the utility of our newly developed markers to identify commercial dried herbal materials, we extracted genomic DNA from powdered or dried samples of ten sancho and six chopi products and amplified them by PCR using the newly developed primers ( Table 4). The banding patterns of the ZanID1 marker revealed that lanes 1, 2, 5, 6, 8, and 10 contained sancho samples, while lanes 3, 4, 9, and 11-16 were identified as chopi ( Figure 4A). Different banding patterns were obtained using ZanID2, with lanes 2, 3, 5, 7, 8, and 10 containing sancho samples and the other samples identified as chopi ( Figure 4B). Interestingly, samples 2, 5, and 7 clearly produced double bands in both species ( Figure 4B). Furthermore, analysis of the banding patterns of the ZanID3 and ZanID4 markers revealed that lanes 2, 3, 5, 6, 7, 8, and 10 contained sancho samples, while the other samples were identified as chopi ( Figure 4C,D). These three markers produced considerably different banding patterns, making it difficult to discriminate between sancho and chopi.
To select markers that could accurately discriminate between sancho and chopi, we performed PCR-RFLP analysis of the13 samples by developing a PCR-RFLP test to identify sancho samples. Many taxonomic studies of land plants have been based on the trnH-psbA region of cpDNA, as this DNA barcode exhibits high rates of sequence divergence among species. Based on the partial sequences of trnH-psbA in the cp genome that are shared between Z. schinifolium and Z. piperitum, we predicted that the PleI restriction enzyme would produce species-specific RFLP patterns and could therefore be used to identify Z. schinifolium based on the trnH-psbA locus. As shown in Figure 5, the fragment sizes for the two species were as follows: In Z. schinifolium, PleI produced two fragments (436 and 107 bp) from PCR products (lanes 2, 3, 5, 7, 8, and 10) of trnH-psbA (543 bp); in Z. piperitum, this enzyme did not digest the PCR products of trnH-psbA (562 bp). These results indicate that the psbZ-trnG marker is suitable for use as a reliable marker for differentiating between Z. schinifolium and Z. piperitum spices in the food market.    Table 4 for details); 16 samples were amplified by PCR using four primer pairs; ZanID1_F and R (A), ZanID2_F and R (B), ZanID3_F and R (C), and ZanID4_F and R (D). Numbers indicate sancho and chopi samples, as described in Table 4.

Discussion
Zanthoxylum formed a phylogenetic group in previous molecular phylogenetic studies [5,15,16]; however, these studies did not sufficiently resolve the relationships among some of its taxa. These studies were based on the ITS sequences of nuclear ribosomal DNA and the trnL-trnF, matK-trnK, atpB, atp-rbcL, and rbcL sequences of the cp genome [23][24][25]. Although these regions were considered to be universal DNA barcodes for higher plants, they did not allow the assessment of the usefulness of these loci in the barcoding of some taxa. Therefore, advances in NGS technologies have made it possible to sequence whole cp genomes and identify molecular markers. Highly variable markers derived from the cp genomes of different species at the genus level have uncovered many loci that are informative for systematic botany and DNA barcoding research [26,27].
Here, we retrieved the complete cp genome sequences of four Zanthoxylum species from the NCBI database and compared species-specific cp diversity in Zanthoxylum. We confirmed the variation among species, with an average nucleotide diversity value (Pi) of 0.00609 among the four Zanthoxylum species. Although the average Pi value for the SSC region was relatively high, high sequence divergence was detected at loci trnH-psbA, psbZ-trnG, trnfM-rps14, and trnF-ndhK in the LSC. Indeed, the trnH-psbA locus is highly variable in most plants and is known as a universal DNA barcoding region [28,29]. We used PCR amplification and sequencing to validate four hypervariable markers to distinguish between Z. schinifolium and Z. piperitum and to discriminate between sancho and chopi spice materials consumed in the online food market. However, PCR amplification of the four markers produced variable banding patterns, making it difficult to discriminate between sancho and chopi. Therefore, we designed a PCR-RFLP method using PleI digestion of the trnH-psbA DNA barcoding region, resulting in the production of two fragments (436 and 107 bp) only in Z. schinifolium. This marker, the ZanID2 marker from the psbZ-trnG region, is therefore suitable for discriminating between Z. schinifolium and Z. piperitum in sancho and chopi. The ZanID2 marker shows high sensitivity and specificity for detecting both sancho and chopi samples. Among the 10 sancho samples examined, three were successfully detected as sancho (30%), whereas three other samples (30%) produced a double band pattern that was clearly detected in both sancho and chopi samples. As expected, these results confirm the notion that products labeled as sancho that are sold in the spice and herbal medicine market often contain a mixture of sancho and chopi.
Although our results confirm that our newly developed InDel markers can be used to authenticate Z. schinifolium and Z. piperitum based on available cp genome data, more complete cp genome sequences are needed to comprehensively evaluate these InDel markers in the Zanthoxylum genus.