Morphological and molecular datasets for Kaempferia species

This study compared morphological and molecular data for the identification of Kaempferia species. Voucher specimens of each species were deposited at the Institute of Bioscience (IBS), Universiti Putra Malaysia (UPM), and the ITS sequences of each species were deposited in NCBI (https://www.ncbi.nlm.nih.gov/) as GenBank accessions. DNA was extracted using a modified CTAB method, and PCR amplification was carried out using the Internal Transcribed Spacer (ITS) primers ITS4 and ITS5. PCR amplification products were visualized by gel electrophoresis. The products were sequenced, and the sequence characteristics of ITS rDNA in Kaempferia are shown. Qualitative and quantitative scoring of morphological characters and the measuring techniques for Kaempferia species are included. In addition, a brief review of molecular markers used in phylogenetic studies of Zingiberaceae is included in this dataset.


Data
Two datasets are described and compared. Morphological data and molecular data from ITS markers show taxonomic congruence, and both can be used to discriminate among closely related Kaempferia species.

Taxon sampling
Twenty-one accessions of Kaempferia, representing six species and one cultivar, were included in this study. Three samples were used for each plant. The samples were authenticated by Dr. Shamsul Khamis, and voucher specimens of these plants have been deposited at the Institute of Bioscience (IBS), Universiti Putra Malaysia (UPM) (Table 1). The rhizomes of all plant samples were cultivated at Field 2, UPM (GPS coordinates 3°0'26"N 101°42'16"E). Cultivated plants were assessed for morphological characters and used for genomic DNA extraction.

Molecular methods
Total genomic DNA was extracted from 0.1 g of fresh leaf sample using a modified cetyltrimethyl ammonium bromide (CTAB) extraction protocol [2]. Extracted DNA was quantified using a Nanodrop 200c spectrophotometer (Thermo Scientific) and stored at −20°C until further use. Standard polymerase chain reaction (PCR) was used to amplify the target gene region. The ITS region was amplified (35 cycles of 94°C for 3 min, 55°C for 20 sec, 72°C for 2 min) using primers ITS4 (5'-TCCTCCGCTTATTGATATGC-3') and ITS5 (5'-GGAAGTAAAAGTCGTAACAAGG-3') [3]. A Thermocycler (Biometra) was used for PCR amplification. Each PCR reaction had a total volume of 25 µL, containing 15 ng of template DNA, 12.5 µL of 2x Type-it PCR Master Mix (Red-Taq), 1 µL each of 10 µM ITS4 and ITS5, and 8.5 µL of RNase-free water (Qiagen). Amplification products were separated by electrophoresis on 1.5% (w/v) agarose gels with 1x TBE buffer at 70 V for 75 min, stained with GelRed™ Nucleic Acid Stain, and visualized under UV light using a Bio-Rad Molecular Imager Gel Doc™ XR+ with Image Lab™ Software (Bio-Rad Laboratories, Inc., USA) (Appendix 1). GeneRuler 100 bp ladder (Fermentas) was used as the DNA molecular weight marker. Amplified products were sent for purification and sequencing (First BASE Laboratories Sdn. Bhd.).
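The reaction composition above can be checked with simple arithmetic. The following is a minimal sketch, assuming the reagent volumes are in microlitres and the primer stocks are 10 micromolar; the function names and the 10% pipetting overage are illustrative assumptions and are not part of the published protocol.

```python
# Per-reaction reagent volumes in microlitres, taken from the text
# (units assumed to be uL; primer stocks assumed to be 10 uM).
REACTION_VOLUME_UL = 25.0
COMPONENTS_UL = {
    "2x master mix": 12.5,
    "primer ITS4": 1.0,
    "primer ITS5": 1.0,
    "RNase-free water": 8.5,
}


def template_volume_ul(total=REACTION_VOLUME_UL, components=COMPONENTS_UL):
    """Volume left for template DNA once the listed reagents are added."""
    return total - sum(components.values())


def final_concentration(stock_conc, stock_volume_ul, total_volume_ul=REACTION_VOLUME_UL):
    """Dilution rule C1*V1 = C2*V2 solved for C2 (same unit as stock_conc)."""
    return stock_conc * stock_volume_ul / total_volume_ul


def master_mix(n_reactions, overage=0.10, components=COMPONENTS_UL):
    """Scale reagent volumes to n reactions plus a hypothetical pipetting overage."""
    factor = n_reactions * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, vol in components.items()}


if __name__ == "__main__":
    print("volume left for template (uL):", template_volume_ul())
    print("final primer concentration (uM):", final_concentration(10.0, 1.0))
    for name, vol in master_mix(21).items():
        print(f"{name}: {vol} uL")
```

Under these assumptions the listed reagents occupy 23 µL, leaving 2 µL for the 15 ng of template, and each primer ends up at 0.4 µM in the final reaction.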

Sequence alignment and phylogenetic analysis
Sequenced ITS regions were trimmed and aligned using Clustal X with default values (e.g., gap opening cost = 15) [4,5], and the resulting alignments were checked manually. Gaps were retained for further analysis. Alignments are available (Appendix 2). Nucleotide diversity (π), estimated values of transition/transversion bias (R), nucleotide substitutions (r) for each nucleotide pair, evolutionary divergence between sequences, and cluster analysis among the Kaempferia sequences were estimated using MEGA 6 [6]. Disregarding indels, nucleotide sequence divergence between pairs of taxa was calculated using the Kimura 2-parameter model [9]. For phylogenetic inference, maximum parsimony analysis was performed using Phylogenetic Analysis Using Parsimony (PAUP*) ver. 4.0 Beta 10 [7]. Gaps were treated as missing data. The most parsimonious trees were obtained through a heuristic search with tree bisection reconnection (TBR) branch swapping and 10,000 random sequence additions. A bootstrap analysis (1,000 replicates) with heuristic search was also performed. The phylogenetic tree was re-rooted using the ITS sequence of Tamijia flagellaris (K. Schum.) Ridl. (NCBI accession number AF478797.1), as reported in a previous study [8].
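To illustrate the pairwise divergence calculation, the sketch below computes the Kimura 2-parameter distance, d = −½ ln[(1 − 2P − Q)·√(1 − 2Q)], where P and Q are the proportions of transitional and transversional differences. It is a standalone illustration, not the MEGA implementation; the function name is hypothetical, and skipping any site with a gap or ambiguous base is an assumed reading of "disregarding indels".

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Sites where either sequence has a gap or a non-ACGT symbol are skipped
    (pairwise deletion), which is an assumption about how indels are ignored.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguity code: site not compared
        sites += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    # math.log raises ValueError if the sequences are too divergent
    # for the model (argument <= 0).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


if __name__ == "__main__":
    # Toy sequences, not data from this study.
    print(k2p_distance("ACGTACGTAC", "ACGTACGTAT"))
```

For the toy pair shown, one transition in ten compared sites gives P = 0.1 and Q = 0, so the distance is slightly larger than the uncorrected proportion of differences.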

Morphological methods
Morphological traits of 21 accessions representing six species were studied at the mature plant stage. Twenty-four qualitative and quantitative characteristics were categorized and transformed into scores representing each category (Appendix 3). Leaf characters were classified according to Hickey [10], and colour was identified using the Royal Horticultural Society (RHS) colour chart [11,12] (Appendix 4). The accessions were clustered using the unweighted pair group method with arithmetic average (UPGMA), which gave a good cophenetic correlation with the original distance matrices. Multivariate analysis was carried out by transforming the presence/absence scores for each species and analysing them using Gower's Similarity Index. A dendrogram and similarity coefficients were produced using MVSP (Multivariate Statistical Programme) to estimate the variability among the Kaempferia species.

Table 1 Voucher specimen numbers for Kaempferia species deposited in Institute of Bioscience (IBS), Universiti Putra Malaysia (UPM) and GenBank accession number for ITS sequences of each species deposited in NCBI (https://www.ncbi.nlm.nih.gov/).
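The morphological clustering described above (Gower's similarity followed by UPGMA) can be sketched as follows. This is an illustrative standalone version, not the MVSP implementation; the function names, the "cat"/"num" character coding, and the toy scores are assumptions made for the example.

```python
def gower_similarity(a, b, kinds, ranges):
    """Gower's general similarity between two accessions.

    kinds[k] is "cat" (categorical score) or "num" (quantitative);
    ranges[k] is the observed range of character k (used only for "num").
    Characters missing (None) in either accession are skipped.
    """
    total, compared = 0.0, 0
    for x, y, kind, rng in zip(a, b, kinds, ranges):
        if x is None or y is None:
            continue
        compared += 1
        if kind == "cat":
            total += 1.0 if x == y else 0.0
        else:
            total += 1.0 - (abs(x - y) / rng if rng else 0.0)
    if compared == 0:
        raise ValueError("no characters in common")
    return total / compared


def upgma(names, dist):
    """UPGMA clustering; dist[i][j] is the distance between names[i] and names[j].

    Returns the tree topology as nested tuples.
    """
    clusters = {i: (names[i], 1) for i in range(len(names))}  # id -> (tree, size)
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}
    next_id = len(names)
    while len(clusters) > 1:
        i, j = min(d, key=d.get)  # closest pair of clusters
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        new = {}
        for k in clusters:
            d_ik = d[(min(i, k), max(i, k))]
            d_jk = d[(min(j, k), max(j, k))]
            # Size-weighted mean keeps the distance an average over all members.
            new[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        d = {pair: v for pair, v in d.items() if i not in pair and j not in pair}
        d.update(new)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    return next(iter(clusters.values()))[0]


if __name__ == "__main__":
    # Toy scores (two categorical characters, one measurement); not study data.
    names = ["acc1", "acc2", "acc3", "acc4"]
    scores = [[1, 2, 10.0], [1, 2, 12.0], [2, 1, 20.0], [2, 1, 30.0]]
    kinds = ["cat", "cat", "num"]
    ranges = [None, None, 20.0]
    dist = [[1.0 - gower_similarity(a, b, kinds, ranges) for b in scores] for a in scores]
    print(upgma(names, dist))
```

Distances are taken as one minus the similarity, so accessions with identical categorical scores and close measurements are joined first.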

Appendix B
See Table B1.

Table B1 Sequence characteristics of ITS rDNA in Kaempferia.