Population genetic and phytochemical dataset of Saraca asoca: A traditionally important medicinal tree

The data presented in this article is in support of the research paper “Genetic and phytochemical investigations for understanding population variability of the medicinally important tree Saraca asoca to help develop conservation strategies” (Hegde et al., 2018). This article provides PCR-based Inter-Simple Sequence Repeat (ISSR) and HPLC datasets of 106 individual samples of Saraca asoca collected from various geographical ranges of the Western Ghats of India. The ISSR data includes information on genetic diversity and images of population structures generated through amplified DNA products from Saraca asoca leaf samples. Phytochemical data obtained from HPLC includes the concentrations (mg/g) of gallic acid (GA), catechin (CAT), and epicatechin (EPI). The data also presents information obtained from various statistical analyses, viz. standard error of the mean values, distribution variables, prediction accuracy, and multiple logistic regression analysis.



Value of data
The data presented here provides information on the genetic and phytochemical profiles of 106 accessions of S. asoca from various parts of the Western Ghats, which is useful for understanding the population genetics and phytochemical variability (with respect to selected major compounds) of this important medicinal tree species. The data could be used in future investigations of S. asoca and will help develop its conservation strategies.

Data
The data presented here was the basis of the research article by Hegde et al. [1] and comprises seven figures and seven tables related to that article. The first figure (Fig. 1) presents the percentage of molecular variance of 106 individuals of Saraca asoca collected from 11 populations. The second figure (Fig. 2) represents the relationship between genetic distances and geographical distances of the above samples, assessed with ISSR markers by the Mantel test. The third figure (Fig. 3) presents the population structure of the samples by admixture analysis. These datasets were obtained after testing 20 primers and selecting only those that showed reproducible bands upon repetition of the assays. The gel profiles were scored based on the presence (1) and absence (0) of bands, and various multivariate analyses were carried out on the binary data thus obtained.

The fourth figure (Fig. 4) compares S. asoca with one of its adulterants/substitutes, Polyalthia longifolia, and provides information on the distribution of these two species with reference to the concentrations of the three phytochemical constituents used as markers in the study, viz. (a) distribution of S. asoca and P. longifolia samples by gallic acid (GA) concentration, (b) distribution by epicatechin (EPI) concentration and (c) distribution by catechin (CAT) concentration. The S. asoca data was obtained after quantification and analysis of GA, CAT and EPI from leaf and bark extracts using HPLC, while the P. longifolia data was taken from previous literature [2]. The fifth figure (Fig. 5) presents Receiver Operating Characteristic (ROC) plots of GA, EPI and CAT contents in leaf and bark of 106 individuals from 11 populations of S. asoca. The dataset was obtained after quantification and analysis of GA, CAT and EPI from S. asoca leaf and bark extracts (mg/g) using HPLC (Tables 2 and 3). The sixth figure (Fig. 6) presents Principal Component Analysis (PCA) of the S. asoca samples (a) with respect to combined GA, EPI and CAT contents in bark and leaf and (b) with respect to ISSR-based genetic markers. These datasets were obtained from the quantities of GA, CAT and EPI and from the binary data obtained after scoring of DNA fingerprints of S. asoca, respectively. The seventh figure (Fig. 7) presents ISSR fingerprints of S. asoca with primer UBC814. This data was acquired after electrophoresis of amplified PCR products in agarose gels, which were photographed using a gel documentation system (Syngene, UK).

The first table (Table 1) presents details of the primers used in the ISSR assays and the amplification profiles in 11 populations of S. asoca. These datasets were obtained after testing 20 primers and selecting only those primers that consistently produced reproducible bands in at least three independent repeat assays. The data was acquired from the binary data scored from the fingerprints of S. asoca samples, with presence and absence of individual bands taken as 1 and 0, respectively. The second (Table 2A–D) and third (Table 3A–D) tables present the contents of GA, CAT and EPI in S. asoca samples quantified (mg/g) using HPLC. The fourth table (Table 4) presents the standard error of the mean (SEM) of the chemical constituents GA, CAT and EPI. The fifth table (Table 5) presents the variation in total chemical constituents (mg/g) within the 11 populations of S. asoca. The second, third, fourth and fifth tables show data acquired from HPLC assays with further statistical analysis. The sixth table (Table 6) presents information on the ISSR markers that are highly associated with high (≥75th percentile) concentrations of phytochemicals in the 11 S. asoca populations. The seventh table (Table 7) presents the prediction accuracy of models for phytochemical content (≥75th percentile = High, else = Low) in the 11 S. asoca populations. These data (Tables 6 and 7) were obtained from multiple regression analysis of both the HPLC data and the ISSR-based binary data obtained from 106 accessions of S. asoca.
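
The high/low coding behind Tables 6 and 7 can be illustrated with a minimal sketch. It assumes that the 75th percentile is computed with linear interpolation over the observed values (the source does not state the percentile method), and the function names, example concentrations and model output are hypothetical, not taken from the dataset.

```python
import statistics


def label_high_low(concentrations):
    """Label each value 'high' if it is >= the 75th percentile, else 'low'."""
    # quantiles(n=4) returns the three quartile cut points; index 2 is Q3.
    # method="inclusive" is an assumption about how the percentile is defined.
    q3 = statistics.quantiles(concentrations, n=4, method="inclusive")[2]
    labels = ["high" if c >= q3 else "low" for c in concentrations]
    return q3, labels


def prediction_accuracy(observed, predicted):
    """Fraction of correct predictions per class and overall."""
    result = {}
    for cls in ("high", "low"):
        idx = [i for i, o in enumerate(observed) if o == cls]
        correct = sum(1 for i in idx if predicted[i] == cls)
        result[cls] = correct / len(idx) if idx else float("nan")
    matches = sum(o == p for o, p in zip(observed, predicted))
    result["overall"] = matches / len(observed)
    return result


# Hypothetical concentrations (mg/g), not values from the dataset.
example = [1.0, 2.0, 3.0, 4.0, 5.0]
cutoff, observed = label_high_low(example)
predicted = ["low", "low", "high", "high", "high"]  # made-up model output
accuracy = prediction_accuracy(observed, predicted)
print(cutoff, observed, accuracy)
```

In this coding, roughly a quarter of the accessions fall in the "high" class, so per-class accuracy is worth reporting alongside the overall figure.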

Plant material collection
The plant materials were collected from the Western Ghats regions of Karnataka, Maharashtra and Goa states of India [1]. A total of 106 accessions of Saraca asoca (Roxb.) De Wilde leaf and bark from 11 populations were collected and authenticated by a taxonomist. A voucher specimen has been deposited at ICMR-National Institute of Traditional Medicine with Voucher Number: RMRC 997. The identity of the species was also authenticated by amplification and sequencing of the matK region of the voucher specimen [1]. Leaf samples from all accessions were stored at −80 °C for DNA extraction. Leaf and bark samples were shade dried before the extraction process.

DNA extraction
DNA extraction was performed by a modified CTAB method using 1 g of leaf sample from each of the 106 accessions [3]. The isolated DNA was electrophoresed on 1% agarose gels, stained with GelRed for           [1,4]. The PCR products were separated by electrophoresis in a 1.5% agarose gel at 80 V, stained with GelRed, and visualized using a gel documentation system (Syngene, UK). The banding patterns of the accessions were scored as presence (1) or absence (0) and a binary matrix was constructed [1,5]. Polymorphism characteristics of each primer, such as Polymorphic Information Content (PIC) and Marker Index (MI), were recorded [1,6]. The relationship between geographical and genetic distance and the analysis of molecular variance (AMOVA) were computed using GenAlEx 6.5 [7,8]. Population genetic structure was analysed using STRUCTURE version 2.3.1 with the admixture model to determine the number of sub-populations [1,9–11].
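
To make the step from scored bands to a distance-based test concrete, the following is a minimal sketch of a permutation Mantel test on a binary band matrix. It is not the GenAlEx procedure: the Jaccard distance, the planar Euclidean geographic distance, the one-sided permutation p-value, and all function names, band profiles and coordinates are assumptions chosen for illustration.

```python
import math
import random


def jaccard_distance(a, b):
    """1 - (shared bands / bands present in either profile)."""
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return 0.0 if either == 0 else 1.0 - shared / either


def distance_matrix(items, dist):
    """Full symmetric matrix of pairwise distances."""
    return [[dist(a, b) for b in items] for a in items]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("distances must not all be identical")
    return sxy / math.sqrt(sxx * syy)


def mantel(d1, d2, permutations=999, seed=0):
    """Correlate two distance matrices; permute rows/columns of d2 jointly."""
    n = len(d1)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    x = [d1[i][j] for i, j in pairs]
    r_obs = pearson(x, [d2[i][j] for i, j in pairs])
    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        y = [d2[order[i]][order[j]] for i, j in pairs]
        if pearson(x, y) >= r_obs:
            hits += 1
    # One-sided p-value; the observed statistic counts as one permutation.
    return r_obs, (hits + 1) / (permutations + 1)


# Hypothetical band profiles (1 = present, 0 = absent) and site coordinates (km).
profiles = [[1, 1, 0, 0, 1], [1, 1, 0, 1, 1], [0, 1, 1, 1, 0], [0, 0, 1, 1, 0]]
coords = [(0, 0), (10, 0), (200, 50), (220, 60)]
genetic = distance_matrix(profiles, jaccard_distance)
geographic = distance_matrix(coords, math.dist)
r, p = mantel(genetic, geographic, permutations=199, seed=1)
print(r, p)
```

Permuting sample labels rather than individual distances keeps the dependence among entries of each matrix intact, which is why the same shuffled order is applied to both rows and columns.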

Extract preparation
Extraction was carried out using 5 g of shade-dried powdered sample (leaf and bark) in 50 mL petroleum ether for 12–16 h. This procedure was repeated twice and the pooled extracts were evaporated to dryness. Further, 50 mL of methanol:water (70:30) was added to this and the mixture was kept for 12–16 h, followed by 15 min of sonication [1]. This extraction was repeated two times to collect a total of 150 mL of extract, which was further filtered and evaporated to dryness [1].

EPI, GA and CAT concentrations and their analysis
The leaf and bark samples from all accessions were processed by an HPLC-based method for quantitation of gallic acid (GA), epicatechin (EPI) and catechin (CAT) [1]. The GA, EPI and CAT concentrations (in mg/g) of all 106 accessions were summarised in terms of range (minimum and maximum), standard deviation, mean, 95% confidence interval, median and inter-quartile range. The distributions for S. asoca, together with those for its common adulterant/substitute (P. longifolia) taken from a previous study [2], were used to construct dot plots with median values. The GA, CAT and EPI concentrations were used to construct receiver operating characteristic curves for bark, for leaf and for all samples, with false positivity (1 − specificity) on the X-axis and sensitivity on the Y-axis (Fig. 5). Considering GA, EPI and CAT as dependent variables and bands as independent variables, a multiple logistic regression was performed (Table 6). Table 7 depicts the prediction of high and low concentrations and the overall prediction ability of each model. These studies were performed separately for leaf and bark samples of S. asoca. BioVinci version 1.1.0 for Windows (BioTuring Inc., San Diego, California, USA) was used to perform Principal Component Analysis (PCA) [1].
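
The descriptive summary and the ROC construction described above can be sketched as follows. This is an illustration under stated assumptions, not the software used in the study: the 95% confidence interval uses a normal approximation, quartiles use linear interpolation, the binary labels fed to the ROC function are hypothetical (the source does not spell out the classes), and all names and numbers are made up.

```python
import math
import statistics
from statistics import NormalDist


def summarise(values, level=0.95):
    """Range, mean, SD, confidence interval, median and inter-quartile range."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    sem = sd / math.sqrt(len(values))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # normal approximation (assumption)
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values), "max": max(values), "mean": mean, "sd": sd,
        "ci": (mean - z * sem, mean + z * sem), "median": median, "iqr": q3 - q1,
    }


def roc_points(scores, labels):
    """(1 - specificity, sensitivity) at every observed threshold.

    labels: 1 = positive class, 0 = negative class; a sample is called
    positive when its score is >= the threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical concentrations (mg/g) and class labels, not data from the study.
summary = summarise([1.0, 2.0, 3.0, 4.0, 5.0])
curve = roc_points([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
print(summary)
print(curve, auc(curve))
```

With about a hundred accessions the normal approximation is close to a t-based interval; for small groups a t quantile would be the safer choice.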