Mobile genomic element diversity in world collection of safflower (Carthamus tinctorius L.) panel using iPBS-retrotransposon markers

Safflower (Carthamus tinctorius L.) is a multipurpose dryland crop that yields high-quality edible oil. The present study aimed to investigate the genetic diversity and population structure of 131 safflower accessions originating from 28 different countries using 13 iPBS-retrotransposon markers. A total of 295 iPBS bands were observed, of which 275 (93.22%) were polymorphic. Mean polymorphism information content (0.48) and diversity parameters including mean effective number of alleles (1.33), mean Shannon's information index (0.33), overall gene diversity (0.19), F-statistic (0.21), and inbreeding coefficient (1.00) reflected the presence of a sufficient amount of genetic diversity in the studied plant materials. Analysis of molecular variance (AMOVA) showed that more than 40% of the genetic variation was derived from populations. Model-based structure, principal coordinate analysis (PCoA), and unweighted pair-group method with arithmetic means (UPGMA) algorithms clustered the 131 safflower accessions into four main populations (A, B, C, and D) and an unclassified population, with no meaningful relationship to geographical origin. The most diverse accessions originated from Asian countries including Afghanistan, Pakistan, China, Turkey, and India. Four accessions, Turkey3, Afghanistan4, Afghanistan2, and Pakistan24, were the most genetically distant and might be recommended as candidate parents for breeding purposes. The findings of this study are most probably supported by the seven similarity centers hypothesis of safflower. This is the first study to explore the genetic diversity and population structure of safflower accessions using iPBS-retrotransposon markers. The information provided in this work will therefore be helpful for scientists interested in safflower breeding.


Introduction
Safflower (Carthamus tinctorius L.) belongs to the Compositae family; it is self-pollinated and has a haploid genome size of about 1.4 Gb and 2n = 24 chromosomes [1]. This crop is cultivated over wide geographical zones throughout the world for several uses, including production of dyes, extraction of edible oil, and various medicinal applications [2]. Safflower has been in use since ancient times, and archeological remains of Carthamus spp. dating to about 7500 BC were found at sites in Syria [3]. From these sites, cultivation of safflower spread to other regions such as Egypt, the Aegean, and southeastern Europe.
It is estimated that safflower is cultivated in nearly 20 countries, with a total cultivated area of 1,140,002 hectares and a production of 948,516 tons [7]. The major safflower-producing countries include the Russian Federation (286,351 tons), Kazakhstan (167,243 tons), Mexico (121,767 tons), the USA (99,830 tons), Turkey (58,000 tons), and India (53,000 tons), which account for about 71% of the total world production [7]. Despite containing a good amount of polyunsaturated fatty acids and being resistant to dry conditions, safflower has not gained the status of a major oilseed crop. The primary factors that have prevented its cultivation on a large scale are low seed yield, low oil content, susceptibility to biotic stresses, and spininess [8]. Therefore, enhanced acceptability and utilization of safflower as an oilseed crop will require genetic improvement for the traits of interest. To this end, genetic diversity can provide a good source of variation upon which breeding programs can build [9]. Unfortunately, current safflower germplasm and breeding lines display low levels of genetic diversity and are therefore of reduced usefulness in breeding programs. An extensive characterization of genetic and phenotypic diversity in global safflower germplasm can help broaden the genetic base of the crop and identify elite accessions [10][11]. Safflower genetic diversity has been investigated using different molecular markers, including Random Amplified Polymorphic DNA (RAPD), Inter Simple Sequence Repeat (ISSR), Amplified Fragment Length Polymorphism (AFLP), Simple Sequence Repeats (SSRs), and Single Nucleotide Polymorphisms (SNPs) [4-11-12-13-14-15-16-17-18-19-20], but so far iPBS-retrotransposon markers have not been used to investigate genetic diversity in safflower.
Retrotransposons are an important component of the plant genome in terms of structural evolution and have great potential to change their position and copy number across the genome [21]. They make up 50 to 90% of various plant genomes, depending on the species [22]. Long terminal repeat (LTR) and non-long terminal repeat (non-LTR) retrotransposons are the two classes, and plant genomes contain a higher proportion of LTR than non-LTR retrotransposons [23]. Limitations of existing retrotransposon marker systems led to the development of a new, universally applicable marker system named inter-primer binding site (iPBS) retrotransposons [23][24]. iPBS is a PCR-based, universal marker system that relies on the presence of a tRNA as the reverse transcriptase primer binding site [25]. The low cost and high efficiency of iPBS-retrotransposons make them a good marker system [23]. Various crops such as pea, chickpea, Lens, Turkish okra, tobacco, and common bean have been studied efficiently using the iPBS-retrotransposon marker system [26-27-28-29-30].
Several crop species have been improved using molecular markers in breeding programs [31]. For safflower, however, genetics and genomics have been less studied, which can explain the lack of reliable marker systems for developing superior safflower cultivars [32][33]. This study was conducted to evaluate the genetic diversity and population structure of safflower accessions using iPBS-retrotransposons, as a starting point for further scientific investigations and practical breeding applications.

Plant materials and DNA isolation
The experimental materials comprised 131 safflower accessions collected from 28 different countries. Among these accessions, 94, 17, and 20 were obtained from the United States Department of Agriculture (USDA), the Plant Genetic Resources Institute, Pakistan, and the Turkish Central Research Institute for Field Crops, respectively (Table 1). The 94 accessions from USDA and the 17 from Pakistan were landraces. The 20 Turkish accessions were single-plant selections from international germplasm from USDA and are candidate cultivars. Seeds of each accession were sown at the research and experimental area of Bolu Abant Izzet Baysal University. Fresh, young, healthy leaves were harvested at the proper time for DNA isolation, brought to the laboratory, and frozen at -80 °C for later use. DNA was extracted from the bulked leaves of each accession following the CTAB protocol [34] with slight modifications [35]. The DNA concentration of each accession was measured on a 0.8% agarose gel and confirmed with a NanoDrop (DeNovix DS-11 FX, USA). The final DNA concentration of the 131 accession samples to be used in polymerase chain reactions (PCR) was adjusted to 5 ng/μL; the samples were stored at -25 °C until the start of PCR amplifications.

iPBS-retrotransposon PCR amplifications
Seventy iPBS-retrotransposon primers were initially screened for PCR amplification using eight randomly selected safflower accessions [25]. Of these 70 primers, 13 were polymorphic, produced strong bands, and were selected for PCR amplification (Table 2). PCR amplifications were carried out in a total reaction volume of 20 μL comprising 3 ng/μL template DNA, 2 μL dNTPs (Thermo Scientific), 0.2 μL U Taq DNA polymerase (Thermo Scientific), 3.2 μL primer, 2 μL 1x PCR buffer (Thermo Scientific), 2 μL MgCl2, and 7.6 μL distilled water. The reaction program consisted of an initial denaturation at 95 °C for 3 min, followed by 30 cycles of denaturation at 95 °C for 15 s and annealing at 50-65 °C (depending on the primer) for 1 min, and a final extension at 72 °C for 5 min [25]. The amplified fragments were electrophoresed on a 1.2% (w/v) agarose gel in 0.5x TBE buffer at a constant voltage of 120 V for 230 min. The gel was stained with ethidium bromide, visualized under UV light using a Gel Doc XR+ imaging system (Bio-Rad, USA), and photographed. A 100 bp+ DNA ladder was used as the molecular weight marker.

Data analysis
Strong, clear, and unambiguous bands were selected for scoring. iPBS-retrotransposon markers are dominantly inherited and were therefore scored using a binary system: 0 for the absence and 1 for the presence of a specific band with respect to the 100 bp+ DNA ladder (Fig 1). For individual iPBS-retrotransposon markers, PopGene ver. 1.32 [36] was used to estimate genetic diversity parameters including the effective number of alleles (Ne), Shannon's information index (I), and gene diversity (He) (Table 3). Polymorphism information content (PIC) was computed for each iPBS-retrotransposon marker following the criteria of Baloch et al. [28]. At the level of the safflower samples, the diversity metrics evaluated included the overall gene diversity (Ht), the inbreeding coefficient (Fis), and the pairwise Fst (a measure of genetic structure), all of which were determined using the hierfstat R package [37] following the algorithms of Goudet et al. [38] and Yang [39]. R statistical software was used to compute pairwise genetic distance (GDj) as measured by Jaccard's coefficient [40]. Population structure was assessed using the Bayesian model-based clustering software STRUCTURE, the unweighted pair group method with arithmetic mean (UPGMA), and principal coordinate analysis (PCoA). The most suitable number of clusters (K subpopulations) was determined following the protocol of Evanno et al. [41] using STRUCTURE software. Ten independent runs were set for each K value, and for each run, the initial burn-in period was set to 500 with 500,000 MCMC (Markov chain Monte Carlo) iterations and no prior information on the origin of individuals. The number of clusters (K) was plotted against ΔK, the change in log probability relative to its standard deviation. Final assignment of individual accessions was based on a membership coefficient greater than or equal to 50%, as suggested by Habyarimana [42] and Nadeem et al. [9].
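The choice of K by the Evanno approach can be illustrated with a minimal sketch. It assumes that ΔK is taken as the absolute second-order difference of the mean log probability across neighbouring K values, divided by the standard deviation of the log probability at K, computed here from the per-K run means (a common simplification). The function name, the input layout, and all log probability values are hypothetical and serve only as illustration; they are not results of this study.

```python
from statistics import mean, stdev

def evanno_delta_k(lnp_by_k):
    """Return {K: delta_K} for every K that has both neighbours K-1 and K+1.

    lnp_by_k maps each tested K to the list of ln P(D) values obtained
    from its replicate runs (hypothetical input layout).
    """
    means = {k: mean(v) for k, v in lnp_by_k.items()}
    sds = {k: stdev(v) for k, v in lnp_by_k.items()}
    delta_k = {}
    for k in sorted(lnp_by_k):
        # Delta K is undefined at the smallest and largest K tested,
        # and when the replicate runs have zero spread.
        if k - 1 in means and k + 1 in means and sds[k] > 0:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            delta_k[k] = second_diff / sds[k]
    return delta_k

# Made-up ln P(D) values, three runs per K, for illustration only.
toy_runs = {
    1: [-5000.0, -5002.0, -4998.0],
    2: [-4600.0, -4603.0, -4597.0],
    3: [-4300.0, -4302.0, -4298.0],
    4: [-4050.0, -4051.0, -4049.0],
    5: [-4040.0, -4045.0, -4035.0],
}
delta = evanno_delta_k(toy_runs)
best_k = max(delta, key=delta.get)  # K with the largest Delta K
print(delta, best_k)
```

With these toy values the largest ΔK falls at K = 4, which is the kind of peak that would be read off the ΔK plot.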
R statistical software was also used to perform the analysis of molecular variance (AMOVA), considering two main population strata: the model-based structure and the country of origin of the accessions.
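The pairwise genetic distance based on Jaccard's coefficient can be sketched as follows for binary band profiles. This is an illustrative Python version, not the R code used in the study; the function name and the toy band profiles are hypothetical. The distance is taken as one minus the ratio of bands shared by both accessions to bands present in at least one of them, so shared absences are ignored.

```python
def jaccard_distance(x, y):
    """Jaccard distance between two 0/1 band profiles of equal length."""
    if len(x) != len(y):
        raise ValueError("profiles must score the same set of bands")
    shared = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    if either == 0:
        # Convention chosen for this sketch: two empty profiles are identical.
        return 0.0
    return 1.0 - shared / either

# Hypothetical band profiles (1 = band present, 0 = band absent).
toy_profiles = {
    "acc1": [1, 1, 0, 1, 0, 0],
    "acc2": [1, 0, 0, 1, 1, 0],
    "acc3": [1, 1, 0, 1, 0, 0],
}
names = sorted(toy_profiles)
pairwise = {
    (a, b): jaccard_distance(toy_profiles[a], toy_profiles[b])
    for i, a in enumerate(names)
    for b in names[i + 1:]
}
print(pairwise)
```

A matrix of such distances is the usual input for UPGMA clustering and PCoA.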

iPBS-retrotransposon marker analysis and genetic diversity
The thirteen most polymorphic iPBS-retrotransposon primers produced a total of 295 clear and strong scorable bands, with an average of 22.69 bands per primer, across the 131 safflower accessions. Of the 295 scorable bands, 275 (93.22%) were polymorphic, with an average of 19.77 bands per primer (Table 3). The highest (36) and lowest (10) numbers of scorable bands were observed for primers iPBS2377 and iPBS2391, respectively. Primers iPBS2376 and iPBS2377 each revealed the highest number of polymorphic bands (29) and exhibited the highest polymorphism information content (PIC), while primer iPBS2391 revealed the lowest number of polymorphic bands (8).
Analysis of molecular variance (AMOVA) was carried out considering two main population strata: the model-based structure and the country of origin of the accessions (Table 5). AMOVA revealed that the country of origin was not significant, while statistically significant effects on the molecular genotypic variability resulted from the model-based structure (P = 0.005), country within model-based populations (P = 0.02), and model-based populations within country (P = 0.047). Variations between countries were not significant (P = 0.07), whereas variations within countries (P = 0.037) and between populations (P = 0.046) were significant (Table 6). The variations within and between populations explained 43 and 5 percent, respectively, of the genetic structure (Table 7). The country within population and the population within country explained 35 and 52 percent, respectively, of the observed structure.
In accordance with the best-supported number of clusters (K = 4), the Bayesian clustering model implemented in STRUCTURE divided the evaluated safflower accessions into four main populations: 31 accessions (23.66% of the total) in population A (black), 22 accessions (16.79%) in population B (red), 33 accessions (25.19%) in population C (blue), and 27 accessions (20.61%) in population D (pink) (Fig 2). Eighteen accessions (at the right-most end of the structure graph) did not reach the membership threshold (50%) and were designated the unclassified population. UPGMA-based clustering divided the 131 safflower accessions into four main clusters corresponding to the four populations (A, B, C, and D) identified using the model-based structure. The unclassified accessions were dispersed throughout the four populations, particularly in population D, where 9 (50%) of the unclassified accessions clustered. With the UPGMA algorithm, two (Jordan4, Jordan5) and five (Israel2, Egypt2, Egypt3, Spain2, Spain4) population B accessions clustered with population D and population C, respectively. Similarly, relative to the model-based clustering algorithm, UPGMA discrepantly clustered the accession Iran8 (population C) in population A (Fig 3). PCoA divided all accessions into four populations (A, B, C, and D), similar to the structure-based clustering, with the unclassified accessions dispersed particularly throughout populations B, C, and D (Fig 4).
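The 50% membership rule and the reported population shares can be illustrated with a short sketch. The membership coefficients below are invented for illustration, and the function name is hypothetical; only the population counts (31, 22, 33, 27, and 18) are taken from the text.

```python
def assign_population(q_row, labels=("A", "B", "C", "D"), threshold=0.5):
    """Assign an accession to the cluster with the largest membership
    coefficient, or to 'unclassified' if that coefficient is below threshold."""
    best = max(range(len(q_row)), key=lambda i: q_row[i])
    return labels[best] if q_row[best] >= threshold else "unclassified"

# Hypothetical membership coefficients for four accessions (rows sum to 1).
toy_q = {
    "acc1": [0.92, 0.03, 0.03, 0.02],
    "acc2": [0.10, 0.55, 0.20, 0.15],
    "acc3": [0.40, 0.30, 0.20, 0.10],  # admixed: no coefficient reaches 0.5
    "acc4": [0.05, 0.05, 0.50, 0.40],  # exactly at the threshold
}
assignments = {name: assign_population(q) for name, q in toy_q.items()}

# Population sizes as reported in the text, expressed as shares of the panel.
reported_counts = {"A": 31, "B": 22, "C": 33, "D": 27, "unclassified": 18}
total = sum(reported_counts.values())
shares = {pop: round(100 * n / total, 2) for pop, n in reported_counts.items()}
print(assignments, total, shares)
```

The counts sum to the 131 accessions of the panel, and the computed shares for populations A to D match the percentages given above.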

iPBS-retrotransposons in assessing genetic diversity of safflower panel
To the best of our knowledge, the present investigation represents the first attempt to elucidate the genetic diversity and population structure of safflower accessions at the DNA level using iPBS-retrotransposons. Retrotransposons are abundant and widely distributed throughout plant genomes [43], and the large amount of error-prone retroviral replication leads to the accumulation of genetic variation [44][45]. iPBS-based markers have been widely used for fingerprinting and genetic diversity investigations in plants [26-35-46-47]. A total of 13 polymorphic iPBS-retrotransposon markers were used in this study to assess genetic diversity in a panel of 131 safflower accessions from 28 different countries, and 295 clear and strong bands were recorded. The average number of bands per primer was 22.69, and 275 (93.22%) of the 295 bands were polymorphic. The mean polymorphism found in this study was higher than that of Yang et al. [20], who reported 82.7% polymorphism using ISSRs in 48 safflower accessions. Furthermore, Sehgal et al. [19] obtained even lower polymorphism levels of 57.6, 68.0, and 71.2% using RAPD, SSR, and AFLP markers, respectively. Polymorphism is one of the key requirements of a good-quality genetic marker, and iPBS markers satisfy this requirement in safflower. Polymorphism information content (PIC) is a widely used metric of the usefulness of molecular markers [48]. The PIC found in this work (0.48) was higher than in the findings of Ambreen et al. [12], Ambreen et al. [13], Barati and Arzani [49], Derakhshan et al. [50], Hamdan et al. [33], and Lee et al. [17], all of whom used SSR markers to evaluate genetic diversity in safflower. Houmanat et al. [51], using ISSR markers in safflower, also found a lower PIC value (0.23) than this study.
These results suggest that more diverse iPBS-retrotransposon marker loci can be identified and effectively used as a tool for assessing genetic diversity and for other investigations relying on genetic variants. A high number of effective alleles is desirable because it represents a higher level of genetic variation. The number of effective alleles found in this work (1.16 to 1.51) was in a similar range to that of Panahi and Neghab [52] (1.29 to 1.72), who used ISSR markers to assess the genetic diversity of Iranian safflower germplasm. Sung et al. [53], using RAPD markers, obtained a lower range of effective number of alleles (1.02 to 1.09) than that found here. Possible reasons for the higher number of effective alleles in this study are differences in the experimental materials evaluated and in the molecular marker system used. Shannon's information index distinguishes the level of genetic diversity available in a population by combining abundance and evenness. Kumar et al. [54], using AFLP markers, reported a lower range of Shannon's information index (0.24 to 0.44) than this study, indicating that the safflower accessions evaluated in this work were more diverse, with genetic variants more evenly distributed throughout the population. This was also confirmed by the level of gene diversity, which was higher than that of Ambreen et al. [12] and Pearl and Burke [18].
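To make the three per-locus indices discussed above concrete, the following sketch computes the effective number of alleles, gene diversity, and Shannon's information index for a single two-allele locus. It is not a reproduction of PopGene's procedure. It assumes, for illustration only, that the null-allele frequency of a dominant marker is estimated as the square root of the band-absence frequency (a Hardy-Weinberg assumption that may not hold well in a selfing crop). The function name and band scores are hypothetical.

```python
import math

def locus_diversity(band_scores):
    """Diversity indices for one dominant (0/1) locus.

    Assumption of this sketch: null-allele frequency q = sqrt(share of
    accessions lacking the band), and p = 1 - q.
    """
    n = len(band_scores)
    absent = sum(1 for s in band_scores if s == 0)
    q = math.sqrt(absent / n)
    p = 1.0 - q
    homozygosity = p * p + q * q
    ne = 1.0 / homozygosity        # effective number of alleles
    he = 1.0 - homozygosity        # gene diversity
    # Shannon's index; alleles with zero frequency contribute nothing.
    shannon = -sum(f * math.log(f) for f in (p, q) if f > 0)
    return {"Ne": ne, "He": he, "I": shannon}

# Hypothetical scores: band present in three of four accessions.
polymorphic = locus_diversity([1, 1, 1, 0])
# Hypothetical monomorphic locus: band present in every accession.
monomorphic = locus_diversity([1, 1, 1, 1])
print(polymorphic, monomorphic)
```

For a two-allele locus, Ne ranges from 1 (monomorphic) to 2 (both alleles equally frequent), which is why reported ranges such as 1.16 to 1.51 are read as intermediate levels of variation.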
To characterize the genetic diversity more clearly, diversity metrics such as overall gene diversity (0.24), Fst (0.21), and Fis (1) were also computed. The Fst (a measure of genetic differentiation) obtained in this work was comparable with the findings of Ambreen et al. [12], who obtained Fst in the range of 0.08 to 0.29. On the other hand, Mokhtari et al. [55] obtained a mean Fis value of 0.01, which is lower than the value (1.00) found in this work. Because safflower is a self-pollinated crop, higher Fis values are expected. In this study, the estimated Fst value (0.21) was higher than the variation explained by the genetic populations as evaluated by the analysis of molecular variance (AMOVA). This difference in magnitude between the two metrics was expected, as Fst accounted only for genetic populations as a source of variation, while AMOVA accounted for both the genetic populations and the provenance of the accessions. Diversity indices were also calculated at the population level; population B showed the highest values for these indices, whereas the unclassified population showed the lowest values and thus a lower level of diversity. The evaluation of pairwise genetic distance showed a mean of 0.288, with the highest genetic distance between accessions Turkey3 and Afghanistan4, followed by Afghanistan2 and Pakistan24, with respective distance values of 0.51 and 0.49. The greatest similarity was found between accessions Afghanistan4 and Afghanistan5, which showed the lowest genetic distance (0.05). A plausible reason for this high genetic similarity is that they originated from common parents. Genetic distances were also calculated at the population level: the highest mean genetic distance was found in population D and the lowest in population A.
Within population A, Turkey16 and China9 showed the maximum genetic distance, and the minimum was found between accessions Afghanistan4 and Afghanistan5. Within population B, the maximum genetic distance was observed between accessions Iraq1 and Jordan4, while the minimum was between accessions Jordan4 and Jordan5. Argentina1 and Iran8 were the two most distinct accessions in population C, showing the maximum genetic distance, while Australia1 and Turkey6 were the two most genetically similar accessions, showing the minimum genetic distance. Within population D, Turkey3 and Iran9 were the most diverse accessions, and Kazakhstan1 and Pakistan14 were two genetically distinct accessions belonging to the unclassified population. Germplasm containing desirable plant traits can be usefully integrated into breeding programs to develop superior cultivars [24], particularly through controlled hybridizations involving genetically distant parental lines. The four most diverse accessions identified above can be recommended as candidate parents for future safflower breeding programs.
The analysis of molecular variance (AMOVA) was used to determine how the total gene diversity was partitioned among and within populations and countries of origin [56]. AMOVA showed that most of the genetic structure was explained by variation among individuals within populations, genetic populations within countries, and countries within genetic populations. These findings are in agreement with Wodajo et al. [57], who, using ISSR markers to evaluate 70 safflower accessions from Ethiopia, reported that within-population variation (98.9%) contributed more to genetic structure than among-population variation (1.1%). The discrepancy in the magnitude of the variance components can be explained by the differing sources of variation included in the AMOVA model: those authors included only the population as a source of variation, while in this work two sources of variation were considered, the population and the country of origin.
The model-based structure application proved more robust and informative in previous investigations [58][59]. Structure was therefore used in this work as a benchmark for clustering algorithms. Using structure, the 131 safflower accessions were partitioned into four main populations (A, B, C, and D), and 18 individuals with poor membership coefficients across clusters were considered an unclassified population (Fig 2). A total of 31, 22, 33, and 27 accessions were assigned to populations A, B, C, and D, respectively. [3] and from this region, it is distributed to other parts of the world. Turkey harbors a great level of biodiversity, is a center of differentiation among the continents, and has played a vital role in connecting the continents with each other [24].
On a continental basis, population A clustered 7 and 24 accessions from the American and Asian continents, respectively. In population B, 3, 11, and 8 accessions originated from Africa, Asia, and Europe, respectively. Population C comprised accessions from America (2), Asia (29), Europe (1), and Oceania (1). In population D, most of the accessions originated from Asia (23), while a few came from Africa (3) and Europe (1). The unclassified population contained genotypes mostly from Asia (11), while the remainder came from Africa (2) and Europe (5); these accessions diverged from the above four populations by forming a separate group. Clearly, the clustering based on molecular markers did not discriminate the origins of the safflower accessions evaluated in this work, which was also confirmed by the AMOVA inferences. Accessions from different countries clustered together, implying that kinship was a stronger determinant of population structure than geographical provenance. In addition to shared parentage, the similarity of accessions within the same group might also be due to convergent evolution and selection [60]. It can therefore be inferred that populations from different geographical regions shared a large proportion of genetic diversity. The design of the experiment in this work cannot explain the observed predominance of Asian safflower accessions. However, the above countries of origin are part of the seven "centers of similarity" (the Far East, India-Pakistan, the Middle East, Egypt, Sudan, Ethiopia, and Europe) recognized by Knowles [5]. Safflower accessions from Afghanistan, Pakistan, Turkey, India, and particularly China were found to be more diverse, as they were present in all populations. The higher diversity observed in the Asian safflower accessions is strong evidence of their wider adaptability, which is supported by the findings of Yang et al. [20] and Zhang [61].
In 1969, Knowles recognized the existence of seven safflower similarity centers across the world. Overall, the centers of similarity were represented by several accessions in this study. However, the molecular marker data used in this study did not provide much support for Knowles's hypothesis on the similarity centers. Indeed, accessions belonging to different similarity centers clustered together. This lack of importance of similarity centers in defining molecular-based populations has been reported in the scientific literature [62]. In population A, the safflower accessions locally collected from Pakistan were mostly (12 accessions) part of the India-Pakistan similarity center. Also, six accessions from Turkey, two from Afghanistan, and two from Iran were present in this population and can be assigned to the Middle East similarity center. Population B comprised safflower accessions from Syria (2), Israel (2), Jordan (4), Afghanistan (1), and Iraq (2), belonging to the Middle East similarity center. Similarly, population B contained safflower accessions from Spain (3), Portugal (5), and Morocco (1), which are part of the Europe similarity center. Population C included safflower accessions from Afghanistan (1), Turkey (6), Iran (5), and Syria (1), representing the Middle East similarity center. Population C also contained accessions from Pakistan (3), Bangladesh (4), and India (2), representing the India-Pakistan similarity center. Population D represented the India-Pakistan similarity center by containing accessions from India (3) and Pakistan (9). Population D also represented the Middle East similarity center because it contained accessions from Afghanistan (1), Turkey (4), Israel (1), and Iran (3). The unclassified population represented the Europe similarity center, as it contained one accession from each of Spain, France, Austria, Hungary, and Portugal.
In the same way, the India-Pakistan similarity center was also represented in the unclassified population through safflower accessions from Pakistan (4) and India (1). More research is still needed to shed light on the safflower similarity centers at the molecular level by collecting and evaluating accessions from all known similarity centers.
The investigation of genetic relationships among the 131 accessions using the UPGMA clustering algorithm resulted in a clustering pattern comparable with that of the model-based algorithm, with a few exceptions: two and five population B accessions clustered with population D and population C, respectively, and UPGMA discrepantly clustered the accession Iran8 (population C) in population A (Fig 3). Since these accessions displayed mostly full membership coefficients in the model-based Structure analysis, the discrepancy observed in UPGMA clustering can be explained by its reduced resolving power relative to the model-based Structure [58][59].
Principal coordinate analysis (PCoA) strongly supported the structure-based clustering of the 131 safflower accessions using 13 iPBS-retrotransposon primers (Fig 4). The four populations were clearly distinguishable, and the unclassified population was dispersed throughout the other populations, particularly populations B, C, and D. These slight discrepancies between PCoA and the model-based structure can derive from differing clustering resolution, with the model-based structure exhibiting more resolution. Indeed, 40% of the variation in the overall genetic structure was not accounted for by the first two PCoA dimensions presented in this work. The above-mentioned misclassifications of accessions in the principal coordinate space can be explained by the existence of genomic admixture. PCoA revealed the same pattern of distribution of similarity centers as identified by the structure-based analysis. Populations A, B, and D represented the Middle East similarity center, as they contained safflower accessions from Turkey, Afghanistan, Iran, Syria, Israel, Jordan, and Iraq. Population C represented the India-Pakistan similarity center, containing safflower accessions from India, Pakistan, and Bangladesh. The Europe similarity center was present in population B and in the unclassified population of the PCoA-based analysis. This suggests that more research is needed to confirm the safflower similarity centers at the molecular level. Overall, iPBS-retrotransposons revealed a good spectrum of genome diversity in safflower, and the explored genetic diversity can be used in future safflower breeding programs. As the iPBS-retrotransposon marker system demonstrated competitive results in this work and in previous investigations, further attention to collecting and evaluating safflower germplasm at the molecular level using iPBS-retrotransposons is warranted as an important tool for enhancing productivity.
To contribute to the ongoing discussion on the safflower similarity centers, robust sampling techniques, including random sampling without replacement, can be applied to the accessions in major world safflower seed repositories; the sampled materials can then be evaluated using clustering algorithms such as those implemented in this work.
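Random sampling without replacement, as suggested above, could be sketched as follows. The repository size, the accession identifiers, the sample size, and the function name are all hypothetical; a fixed seed is used only so that the draw can be repeated.

```python
import random

def sample_accessions(accession_ids, n, seed=0):
    """Draw n distinct accessions at random, without replacement."""
    rng = random.Random(seed)  # seeded generator for a reproducible draw
    return rng.sample(list(accession_ids), n)

# Hypothetical repository of 500 accession identifiers.
repository = [f"ACC{i:04d}" for i in range(1, 501)]
subset = sample_accessions(repository, 131)
print(len(subset), len(set(subset)))
```

Because each accession can be drawn at most once, the sampled panel contains no duplicates and every accession in the repository has the same chance of inclusion.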

Conclusion
A good level of genetic diversity was identified among the 131 safflower accessions. The effect of genetic populations on the genetic structure was significant, but its magnitude was smaller than that of the variation among individuals within genetic populations. The provenance of the samples showed no effect on the genetic structure of the 131 accessions. Our results most probably conform to the seven similarity centers hypothesis of safflower, but further research is still needed to confirm these similarity centers at the molecular level. Generally, safflower accessions from Asian countries such as Afghanistan, Pakistan, China, Turkey, and India were found to be diverse. Specifically, among the 131 safflower accessions, Turkey3, Afghanistan4, Afghanistan2, and Pakistan24 were the most diverse at the molecular level and might be recommended as candidate parents for future safflower breeding programs.