Toward the identification of molecular markers associated with phytochemical traits in the Iranian sumac (Rhus coriaria L.) population

Abstract Sumac (Rhus coriaria L.) is one of the important forest species dispersed in the northwest of Iran. It is one of the important spice in Iran and the Middle East because of active components containing organic acids, phenolic acids, flavonoids, anthocyanins, tannins and terpenoids. This study aimed to investigate population structure and linkage disequilibrium (LD) extent within Rhus coriaria L. genotypes using ISSR markers and identify molecular markers associated with phytochemical traits using association analysis. In the molecular part of the experiment, the genetic diversity of 75 sumac genotypes from five different areas of northwest Iran was assessed by 18 ISSR primers. In the phenotypic assessment, the fruits of the sumac genotypes were analyzed using HPLC‐LC/MSMS for determining phytochemical components including maleic acid, ellagic acid, maleic acid hexoside, gallic acid, coumaric acid, quercetin, caftaric acid, and linoleic acid. The phenotypic data analysis revealed the great phenotypic diversity among and within Iranian sumac populations for the studied phytochemical traits. The studied sumac genotypes were divided into two subpopulations based on molecular marker‐based structure analysis. A significant level of LD was observed in 11.64% of the ISSR marker pairs (p < .05). The mixed linear model procedure showed that 12 loci had a significant association with investigated traits. The ISSR loci identified in this study can be potentially used in marker‐assisted selection in sumac breeding programs.

Plants have different classes of phytochemicals. Phenolic compounds and flavonoids are the largest phytochemical molecules with antioxidant properties in plants (Andreu et al., 2018;Okpuzor et al., 2009;Ryu et al., 2006;Wang et al., 2016;Zahoor et al., 2018). Many research works showed that medicinal plants, fruits, and vegetables have various proteins, enzymes, and phytochemicals with antioxidant activity, which are the reason for their useful health effects (Scalbert et al., 2005;Zhang et al., 2016).
A few studies were performed on determining the chemical composition of R. coriaria leaves and fruit epicarps (Regazzoni et al., 2013;Van Loo et al., 1988). R. coriaria has a wide range of active components containing hydrolysable tannins and various organic acids such as malic and citric acids (Kossah et al., 2010;Shabbir, 2012), fatty acids, vitamins, and terpenoid derivatives (Abu-Reidah et al., 2014). Phenotyping of complex metabolomic characteristics is done easily by the metabolomics tools, such as gas chromatography-mass spectrometer (GC-MS) (Saito & Matsuda, 2010).
Acquisition of any detailed information on genetic controlling of phenotypic variations is an important step before running any breeding activity in plants including sumac (Jin et al., 2010). The development of molecular marker technology has facilitated the identification of the genetic basis of quantitative traits (Semagn et al., 2010). ISSR is a microsatellite-based method that does not require basic information on the genome and design of primers and produces many polymorphic patterns (Zietkiewicz et al., 1994). This marker is widely used in studies of genetic diversity, phylogeny, genomic mapping, and evolutionary biology in a wide range of crop and medicinal plant species (Reddy et al., 2002). Linkage and association analyses are the two most used methods for elucidating the genetic structure of quantitative or metric traits in plants. Association study was first applied in 2001 in maize concerning mapping quantitative trait loci (QTL) controlling flowering time .
Association analysis identifies the genotypic markers associated with variation of phenotypic traits based on "naturally produced linkage disequilibrium (LD)," and it has higher resolution (Flint-Garcia et al., 2005;Xu et al., 2020) and takes shorter research time than linkage analysis (Flint-Garcia et al., 2003). For running effective association analysis, LD should be surveyed in each region of chromosome or linkage group Sorkheh et al., 2008). Some factors like small population size, inbreeding, population admixture, genetic drift, autogamy and epistasis increase LD, vice versa, allogamy, and high recombination and mutation rate decrease LD level (Al-Maskri et al., 2012). Patterns of LD have been investigated in several crop species. It was observed, LD decays rapidly in some plants, for instance, within 1.1 kb in cultivated sunflower genotypes (Liu & Burke, 2006) or 300 bp in wild grapevine (Lijavetzky et al., 2007), whereas it decays slowly in some plants, for instance, within 100-200 kb in rice diverse lines (Huang et al., 2010;McNally et al., 2009) or 250 kb in cultivated soybean genotypes (Lam et al., 2010). In association analysis studies, the true association between a genotypic marker and QTL is a point of considerable interest. The existence of genotypes from different origins within the population (association panel) results in LD between unlinked loci. On other hand, the amount of LD in an association panel is associated with effective population size (Ne) (Sved, 1971); normally in a population with small size, the higher relatedness among genotypes (kinship) is seen so that it results in higher LD estimation (Falconer & Mackay, 1996). Then, structured population and kinship result in biased estimation of LD, which may augment the rate of false positives in association studies, that is, identifying markers that are not being involved in the variation of quantitative traits (Bradbury et al., 2007).
There are some statistical analyses for preventing population structure and kinship effects on association analysis (Cardon & Palme, 2003). A mixed linear model (MLM) is one of the statistical approaches that minimize the effect of troublesome factors such as population structure and kinship by incorporating these effects as covariates in the model . AM has been used in maize, wheat, rice, barley, and sorghum to clarify the genetic structure of complex traits such as yield, flowering, and disease resistance (Agrama et al., 2007;Breseghello & Sorrells, 2006;Cockram et al., 2010;Maccaferri et al., 2011;Murray et al., 2009;Ravel et al., 2006;Stracke et al., 2009;Thornsberry et al., 2001;Tommasini et al., 2007;Zhang et al., 2016). AM has also been used in forest trees and fruit crops like conifers (Eckert et al., 2010;Gonzalez-Martinez et al., 2007), eucalyptus (Thumma et al., 2005), peach (Aranzana et al., 2010;Cevik et al., 2009), and grape (Chitwood et al., 2014;Emanuelli et al., 2010;Tello et al., 2016) for QTL mapping. The objectives of the present study were to investigate population structure and LD extent within Rhus coriaria L. population using ISSR markers and identify molecular markers associated with phytochemical traits using association analysis.

| Sample preparation
Fruits of sumac were collected and dried. Epicarps of fruits were separated from kernels and ground to powder by the household mill.
The powder was kept at room temperature until extraction.

| Extraction of phenolic compounds
Ten milliliters of methanol (HPLC grade) (80% v/v) added to grounded fruit epicarps (0.5 gr) and sonicated for 45 min at room temperature.
The mixture was centrifuged for 15 min at 3,000 g at room temperature and then filtered through a 0.2 µm syringe filter (PALL, USA) and stored at 20ºC until analysis.

| HPLC-LC/MS-MS analysis
Chromatographic separation was performed using an Agilent ZORBAX SB-C18 (4.6 × 150 mm, 5 µm). Mass Lynx software, version 4.1, was used for instrument control and data acquisition. The analysis was performed in positive and negative ion mode ( Figure 2).

| ISSR genotyping
Genomic DNA extraction was performed following the method described by Doyle and Doyle (1990). Eighteen ISSR primers were used for genotyping of individuals (Table 1). The PCR program consisted of an initial denaturation at 94°C for 4 m, followed by 38 cycles of 94°C for 30 s, annealing temperature depending on primers composition (52 to 60ºC, Table 1) for 30 s, and 72°C for 2 m, with a final extension at 72°C for 10 m. Scoring of separated amplified fragments in agarose gels was performed according to present (1) or absence F I G U R E 1 Regions where sumac populations were sampled (0) of bands, and the produced binary data were used for statistical analysis.

| Data analysis
In order to clarify the phenotypic variation within and among populations, the descriptive statistic was calculated using SAS software version 9.4. Estimating the population structure was performed using the Bayesian approach in the software package Structure 2.3.4 (Pritchard et al., 2000). Ten independent runs were applied. Five replications were considered for each run. The runs were performed under correlated allele frequencies and the admixture model. The burn-in period length was 100,000, followed by 100,000 MCMC (Markov Chain Monte Carlo) replications. The optimal subpopulations number (K) was chosen considering the log probability of data [LnP(D)] (Rosenberg et al., 2002) and ΔK method presented by Evanno et al., (2005). Individuals with membership probability of equal or greater than 0.70 were allocated to subpopulations, while those with lower probabilities were assigned as admixed. Q-matrix was derived from the last run of Structure (Pritchard et al., 2000).
The linkage disequilibrium (LD) was calculated with TASSEL 2.1 (Bradbury et al., 2007). Association analysis was performed to analyze marker-trait association using a mixed linear model (MLM) including both Q and K matrices  as covariates in the model using TASSEL software.

| Phenotypic variability
The descriptive statistics including central tendency and variability for studied traits were calculated and summarized in Table 2. A high level of phenotypic variability was observed among and within populations for studied traits in sumac populations ( Table 2). The highest (708.01) and lowest (355.52) mean value for gallic acid 6.5 was observed in population 4 (Aghberaz; East Azerbaijan) and population 2 (Dareh-Khan; West Azerbaijan), respectively. The highest (887) and lowest (532.54) mean value for TA B L E 1 Geographical location of the sumac fruit collection sites and selected ISSR primers for evaluation of sumac genetic diversity West Azerbaijan) and population 4 (Aghberaz; East Azerbaijan).
In comparing five studied sumac populations, the highest (38,040.87) and lowest (5.54) mean values were observed in malic acid and linoleic acid, respectively. Malic acid and gallic acid 8.7 had the highest (308.35) and lowest (80.45) genotypic coefficient of variation (CVg) among the five studied sumac populations.

| Population structure
The population structure in the sumacs (Rhus coriaria L.) association panel comprising 75 individuals was analyzed using 132 ISSR markers amplified by 18 ISSR primers. The optimal K value was 2, based on diagrams of LnP(D) and ΔK (Figure 3a). Structure analysis revealed that 63 individuals presented a membership probability of Q ≥ 0.70, and they were assigned to one of the two subpopulations. Twelve individuals with Q < 0.70 were determined admixed ( Figure 3b). Results showed that 31 genotypes (41.33%) were assigned to subpopulation 1 and 32 genotypes (42.67%) assigned to subpopulation 2.

| Linkage disequilibrium (LD) assay
LD was measured by r 2 and D′. The r 2 values among ISSR markers pairs showed an average value of 0.057. D′ value ranged from 0.000 to 1 with an average value of 0.59. A significant level of LD was observed in 11.64% of the ISSR markers pairs (p ≤.01) in the studied Iranian sumac germplasm (Figure 3c).
characters (ellagic acid 13.97 and quercetin), UBC841 was found to be related with ellagic acid, gallic acid, and coumaric acid, UBC867 was found to be related with caftaric acid and linoleic acid, and UBC842 was found to be associated with coumaric acid and maleic acid hexoside 6.11 (Table 3).

| D ISCUSS I ON
Results showed extensive phenotypic diversity within and between Iranian sumac populations. High phenotypic and genetic diversity is necessary for conducting association genetic. In the investigated Iranian sumac populations, malic acid was the predominant component and linoleic acid was a component with the lowest mean Syrian and Chinese sumac, and they reported that the fruit of Syrian sumac contained higher amounts of organic acids than Chinese sumac and they observed that the predominant acid in both species was malic acid. It was reported that the major fatty acids in the sumac were oleic, linoleic, and palmitic acids (Dogan & Akgul, 2005;Ünver & Özcan, 2010).
According to Stansfield (1991), if the heritability of a trait is more than 0.5, the trait has high heritability; if it is between 0.2 and 0.5, the heritability is medium; and if it is lower than 0.2, it has low heritability. Given this, malic acid, malic acid hexoside 2.71, coumaric acid, coumaric acid 8.9, caftaric acid, linoleic acid, linoleic acid 5, and ellagic acid 11.49 showed high heritability. There are some reports about the low heritability of primary and secondary metabolites including sugars and organic acids in tomato fruits (Sauvage et al., 2014;Zhang et al., 2016).
A significant level of LD was observed in ISSR markers pairs in the studied Iranian sumac germplasm that encouraged us to run association genetic. LD is defined as a nonrandom association of alleles at different loci placed on the same or different chromosomes (linkage groups) (Mackay & Powell, 2007). A few studies presented reports about the LD patterns in crop plants (Khan & Korban, 2012;Yin et al., 2004). The structure of LD across the genome affects the resolution of association analysis (Remington et al., 2001). LD pattern is different between forest trees and fruit crops; domesticated trees have extended LD compared to undomesticated ones (Khan & Korban, 2012). Extension of LD between 50 and 100 cM in local Michigan populations of Arabidopsis was reported by Abdurakhmonov and Abdukarimov (2008). LD analysis in peach has shown increasing LD to 13-15 centimorgan (cM) (Aranzana et al., 2010). Genome-wide LD in forest trees investigated and the results showed that LD in Populus trichocarpa extended up to 16-34 kb (Yin et al., 2004), less than 500 bp in Populus termula (Ingvarsson, 2005), 2000 bp in Pinus taeda (Brown et al., 2004), 1,000 bp in Pseudostuga menziensii (Krutovsky & Neale, 2005), and 100-200 bp in Picea abies (Rafalski & Morgante, 2004).
Based on structure analysis, the studied Iranian sumac association panel was subdivided into 2 subpopulations. The admixture of populations is an important factor that affects the power and precision of association genetic (Zhang et al., 2012). False-positive marker-trait association occurs when the impact of population structure or familial relatedness was disregard . A mixed linear model can help scientists to overcome this problem by considering the troublesome agents as covariates in the model  and also increase resolution from the level of a 20 cM region to that of an individual gene (Sorkheh et al., 2008).
The mixed linear model was applied for detecting ISSR markers linked to genomic regions controlling phytochemical traits in the Iranian sumac association panel, and some significantly molecular markers were identified. Association studies in sumac have been used for identifying genomic regions related to fatty acid compositions (Qu et al., 2017;Zhu et al., 2019). It has been reported that the FAD2 gene proceeds the linolenic acid synthesis (Zhu et al., 2019).
Metabolite-based association study has validated the metabolome GWAS in genetic improvement of complex traits (Matsuda et al., 2015;Wen et al., 2014). Schauer et al., (2008)  Some association studies have been carried out in forest trees.
For instance, Thumma et al., (2005) reported an association between a polymorphism in the ccr gene and microfibril angle in Eucalyptus.
In another study, an association of SSR markers with resin yield was investigated in 240 genotypes of Pinus roxburghii and the results showed that two chloroplastic SSRs; Pt71936 and Pt87568 and one nuclear SSR; pm09a showed significant association with resin yield (Rawat et al., 2014). In maritime pine, Lepoittevin et al., (2012) found the association of 184 SNPs marker with the height growth rate. Also, Cabezas et al., (2015) found four SNPs associated with total height and polycyclism in maritime pines. Jugran et al., (2012) reported three ISSR markers associated with antioxidant activity in Valeriana jatamansi.
TA B L E 3 ISSR loci identified for the studied phytochemical traits in the studied sumac germplasm using association analysis In the present study, some marker loci were common between traits. The common markers can be due to linkage or pleiotropic effects. The common markers are useful because it makes the possible simultaneous selection for several traits and can increase the efficiency of marker-assisted selection programs.

| CON CLUS IONS
Results of the present study suggest that the studied sumac germplasm was subdivided into 2 subpopulations. 84% of studied genotypes were considered to one of the two subpopulations and just 16% of them were assigned admixed. In the present study, QTLs controlling phytochemical traits in sumac were identified for the first time. A total of 12 ISSR markers associated with studied traits were identified using an association mapping approach. These ISSR markers provide primary molecular information for marker-assisted selection in sumac. Results of MLM association mapping showed that some locus was common for more than one trait. These common markers are useful in sumac breeding programs and help in markerassisted selection programs.

CO N FLI C T O F I NTE R E S T
The authors declare that they have no competing interests.

DATA AVA I L A B I L I T Y S TAT E M E N T
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.