Genetic diversity trend in Indian rice varieties: an analysis using SSR markers

Knowledge of the extent and pattern of diversity in a crop species is a prerequisite for crop improvement, as it helps breeders decide on suitable breeding strategies. Rice is the main staple crop in India, and a large number of varieties are released every year. Studies based on small sets of rice genotypes have reported a loss of genetic diversity, especially after the green revolution. However, a detailed study of the trend of diversity in Indian rice varieties is lacking. SSR markers have proven to be markers of choice for studying genetic diversity. Therefore, the present study was undertaken to characterize and assess trends of genetic diversity, using SSR markers, in a large set of Indian rice varieties (released between 1940 and 2013) conserved in the National Gene Bank of India. A set of 729 Indian rice varieties was genotyped using 36 HvSSR markers to assess genetic diversity and genetic relationships. A total of 112 alleles were amplified, with an average of 3.11 alleles per locus and a mean Polymorphic Information Content (PIC) value of 0.29. Cluster analysis grouped these varieties into two clusters, whereas model-based population structure analysis divided them into three populations. AMOVA based on the hierarchical clusters and the model-based populations showed 3 % and 11 % variation between populations, respectively. Decadal analysis of gene diversity and PIC showed an increasing trend from 1940 to 2005, after which both parameters decreased between 2006 and 2013. In contrast, allele number increased in varieties released and notified between 1940 and 1985, remained nearly constant from 1986 to 2005, and then increased again. Our results demonstrate that Indian rice varieties harbor a large amount of genetic diversity. 
However, trait-based improvement programs in recent decades forced breeders to rely on a few parents, which resulted in a loss of gene diversity between 2006 and 2013. The present study indicates the need to broaden the genetic base of Indian rice varieties by using diverse parents in current breeding programs.


Background
Rice (Oryza sativa L.) is a staple food crop in India and many other parts of the world. In India, it occupies the largest cultivated area and has the largest share of grain production [3]. India is one of the centers of rice diversity, and large diversity has been reported at both inter- and intra-specific levels [36]. Yield, quality traits and tolerance to biotic and abiotic stresses are the major objectives of varietal development [25].
A large number of rice varieties are released and notified in India every year, offering higher yields and tolerance to biotic and abiotic stresses and meeting the requirements of changing farming systems and user demands. Rice varieties of distinct genetic backgrounds hold good promise for future crop improvement, and varietal development of this kind contributed to a large extent to the major increases in agricultural productivity in the twentieth century [10]. However, it is generally thought that continuous selection among crosses of genetically related cultivars has narrowed the genetic base of the crops on which modern agriculture is based, thus contributing to genetic erosion of crop gene pools [33].
A robust and reliable fingerprinting method is required for the identification and purity testing of these varieties [41], as well as for studying genetic relationships among cultivars [18,42]. Genetic characterization of crop plants has gained momentum with the advent of PCR-based molecular markers. SSRs are now the marker of choice for molecular characterization because they are co-dominant, distributed throughout the genome, highly reproducible, variable, reliable, easily scored, abundant and multiallelic [37]. SSR markers have been used by many researchers to characterize rice varieties [9,17,44]. Even a small number of SSR markers can reveal a broad spectrum of genetic diversity because of their multiallelic and highly polymorphic nature [24].
Recent reports suggest that genetic diversity in crop varieties released over the years fluctuates between successive time periods [8,48]. In wheat, gene diversity has been reported to increase [16], decrease [12] or remain constant over time [15,35], and similar trends have been reported in rice [23,47]. Over the last few centuries rice has faced diversity loss [6], especially after the green revolution, owing to the replacement of native varieties with high-yielding varieties [14]. Despite the large number of varieties developed every year, molecular studies on small sets of rice varieties have revealed a narrow genetic base [26,45].
The present study was undertaken to assess the trend in genetic diversity of Indian rice varieties released and notified from 1940 to 2013, and to understand the genetic relationships among these varieties, by employing both hierarchical and model-based approaches with hypervariable simple sequence repeat (HvSSR) markers.

Results and discussion
Our study is the first major effort to analyze the trend of genetic diversity in a large set of Indian rice varieties released over the years. A total of 729 varieties released from 1940 to 2013 were analyzed using HvSSR markers. These varieties possess various agronomically and economically important traits, such as tolerance to biotic and abiotic stresses (drought, cold, salinity and lodging), aroma, grain yield and early maturity (Additional file 1: Table S1).

HvSSR marker based analysis
Thirty-six HvSSR markers were used to characterize 729 rice varieties. Gene diversity, heterozygosity, major allele frequency and PIC were calculated for all 36 markers. A total of 112 alleles were amplified, with an average of 3.11 alleles per locus (Table 1). Similar values have been reported previously: 3.02 alleles per locus with SSR markers in 25 Indian rice hybrids [2], and 3 alleles per locus in 192 Indian rice germplasm lines [25]. The number of alleles per HvSSR primer pair ranged from 2 to 5, with the maximum (5) amplified by HvSSR09-11, HvSSR11-21, HvSSR11-58 and HvSSR12-13. A similar range of alleles per SSR marker (2–5) was reported in 141 Basmati rice accessions from the North Western Himalaya [37]. PIC values ranged from 0.04 (HvSSR06-16) to 0.58 (HvSSR03-37), with a mean of 0.29. Shah et al. [38] and Pachauri et al. [29] reported mean PIC values of 0.37 and 0.38, respectively, in different sets of rice varieties, close to our result. On the other hand, Pal et al. [30] reported a mean PIC value of 0.40 in a set of basmati and non-basmati varieties, and Salgotra et al. [37] reported a mean PIC value of 0.40 in a basmati collection from the north-western Himalaya, both slightly higher than our result. Gene diversity ranged from 0.04 (HvSSR06-16) to 0.66 (HvSSR03-37), with an average of 0.33. This is low compared with the values of 0.52 [25] and 0.54 [6] reported in rice germplasm lines and varieties, respectively. Heterozygosity ranged from 0.00 (HvSSR05-30) to 0.92 (HvSSR03-02), with an average of only 0.15. Low heterozygosity has also been reported in other studies on rice [5,25] and can be attributed to the self-pollinating nature of the crop. 
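The per-locus statistics above can be illustrated with the standard textbook definitions: gene diversity (expected heterozygosity) is 1 - sum(p_i^2) over allele frequencies p_i, PIC further subtracts sum over i<j of 2 * p_i^2 * p_j^2, observed heterozygosity is the fraction of heterozygous calls, and major allele frequency is max(p_i). The sketch below assumes diploid genotype calls stored as allele-label tuples with missing data already removed; `locus_stats` is a hypothetical helper, not PowerMarker's code, and PowerMarker's estimators apply sample-size corrections, so its values may differ slightly from this uncorrected version.

```python
from collections import Counter
from itertools import combinations


def locus_stats(genotypes):
    """Summary statistics for one SSR locus (hypothetical helper).

    genotypes: list of (allele1, allele2) tuples, one per variety,
    with missing calls assumed to be removed beforehand.
    """
    alleles = [a for g in genotypes for a in g]
    total = len(alleles)
    freqs = [count / total for count in Counter(alleles).values()]
    sum_p2 = sum(p * p for p in freqs)
    gene_diversity = 1.0 - sum_p2  # uncorrected expected heterozygosity
    # PIC = 1 - sum(p_i^2) - sum_{i<j} 2 p_i^2 p_j^2
    pic = gene_diversity - sum(2 * (pi * pj) ** 2 for pi, pj in combinations(freqs, 2))
    heterozygosity = sum(a != b for a, b in genotypes) / len(genotypes)
    return {
        "n_alleles": len(freqs),
        "major_allele_freq": max(freqs),
        "gene_diversity": gene_diversity,
        "pic": pic,
        "heterozygosity": heterozygosity,
    }
```

For a toy locus with genotypes AA, AB, BB and AA, allele A has frequency 0.625, so gene diversity is 0.46875, PIC is about 0.359 and observed heterozygosity is 0.25. PIC is always at most gene diversity, which is consistent with the mean PIC (0.29) being below the mean gene diversity (0.33) reported above.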
The major allele frequency of the 36 HvSSR markers ranged from 0.37 (HvSSR03-37) to 0.97 (HvSSR06-16), with an average of 0.76 (Table 1). This average is higher than those reported in previous studies on Indian rice varieties [46] and Korean landraces [19].

Hierarchical cluster analysis
The amplicon data generated by HvSSR markers across the 729 varieties were used to construct a neighbour-joining (NJ) tree. The unrooted tree (Fig. 1) grouped the 729 rice varieties into two major clusters, with 400 varieties in cluster1 and 329 varieties in cluster2. This grouping is consistent with the studies of Upadhyaya et al. [46], Nachimuthu et al. [25] and Das et al. [9], who also reported two clusters in Indian rice germplasm. Further, we also analyzed the clustering pattern of rice hybrids (Additional file 2: Table S2).
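For readers unfamiliar with the neighbour-joining method, the sketch below implements the classic Saitou and Nei algorithm on a small symmetric distance matrix: at each step it joins the pair minimising the Q-criterion, estimates their branch lengths, and replaces them with a new internal node. It is an illustration under the assumption that a variety-by-variety distance matrix is already available; it is not DARwin's implementation, which may differ in details such as tie-breaking and handling of negative branch lengths.

```python
def neighbor_joining(names, dist):
    """Classic neighbour-joining (Saitou and Nei) on a symmetric distance matrix.

    names: at least three taxon labels; dist: list of lists of pairwise distances.
    Returns the unrooted tree as a Newick string with branch lengths.
    """
    nodes = list(names)
    d = [[float(x) for x in row] for row in dist]
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]  # net divergence of each node
        # Pick the pair minimising Q(i, j) = (n - 2) * d_ij - r_i - r_j.
        _, i, j = min(((n - 2) * d[a][b] - r[a] - r[b], a, b)
                      for a in range(n) for b in range(a + 1, n))
        # Branch lengths from i and j to their new parent node u.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from u to every node that has not been joined yet.
        du = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[a][b] for b in keep] + [du[x]] for x, a in enumerate(keep)]
        d.append(du + [0.0])
        nodes = [nodes[k] for k in keep] + [f"({nodes[i]}:{li:g},{nodes[j]}:{lj:g})"]
    # Join the last three nodes at a single internal node (unrooted tree).
    la = 0.5 * (d[0][1] + d[0][2] - d[1][2])
    lb = 0.5 * (d[0][1] + d[1][2] - d[0][2])
    lc = 0.5 * (d[0][2] + d[1][2] - d[0][1])
    return f"({nodes[0]}:{la:g},{nodes[1]}:{lb:g},{nodes[2]}:{lc:g});"
```

With a five-taxon textbook matrix (taxa a to e), the function first joins a and b, then c with the (a,b) node, and returns `(d:2,e:1,(c:4,(a:2,b:3):3):2);`. The result is an unrooted tree, matching the unrooted NJ tree described above.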

Model based population structure
Population structure analysis divided the 729 varieties into three populations (Figs. 2 and 3 and Additional file 3: Figure S1). Population 1 (pop1), population 2 (pop2) and population 3 (pop3) contained 72, 329 and 328 varieties, respectively. Based on membership fractions, varieties within each population were further categorized as pure or admixed: varieties with a membership probability ≥0.80 were considered pure, and those below 0.80 were considered admixed (Table 2). The allele-frequency divergence among populations was 0.0686 between pop1 and pop2, 0.0533 between pop1 and pop3, and 0.0548 between pop2 and pop3 (Table 3). Earlier studies on population structure have reported two to eight subpopulations in different rice collections [1,4,13,20,49–51]. Roy et al. [36] and Upadhyaya et al. [46] also reported a similar number of populations in different sets of Indian rice varieties. The relatively small value of alpha (α = 0.0829) in the present study shows that only a few individuals were admixed. An alpha value approaching zero indicates that most individuals derive mainly from a single population [19], whereas an alpha value greater than 1 indicates that most accessions are admixed [28]. The distribution of rice varieties across populations in relation to their traits was also studied.
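The pure/admixed rule described above can be expressed as a short sketch. STRUCTURE reports, for each variety, a vector of membership fractions (Q) across the K populations; the code below assumes each variety is assigned to the population with its largest fraction and labelled pure when that fraction is at least 0.80. The function names, variety names and Q values are hypothetical and for illustration only.

```python
from collections import Counter


def classify_membership(q_table, threshold=0.80):
    """Assign each variety to the population with its largest membership
    fraction and label it 'pure' (>= threshold) or 'admixture' (< threshold).

    q_table: list of (variety, [q_pop1, q_pop2, ...]) pairs (hypothetical input).
    """
    calls = []
    for variety, q in q_table:
        best = max(range(len(q)), key=lambda k: q[k])
        status = "pure" if q[best] >= threshold else "admixture"
        calls.append((variety, f"pop{best + 1}", status))
    return calls


def tally(calls):
    """Count varieties per (population, status) pair, as summarized in a table."""
    return Counter((pop, status) for _, pop, status in calls)
```

For example, a variety with Q = (0.10, 0.60, 0.30) is assigned to pop2 but labelled admixture, because its largest membership fraction is below 0.80.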
All 30 hybrid varieties and most of the aromatic rice varieties were grouped in pop2 (Fig. 2 and Additional file 3: Figure S1). Of the 329 varieties in pop2, 190 were released after 1979, indicating that most of the recently released varieties belong to pop2. Both the hierarchical and model-based analyses showed that the large groups cluster2 (347 varieties) and pop2 (329 varieties) correspond to each other.
AMOVA and PCoA of clusters obtained using the hierarchical approach
AMOVA of the 729 varieties was performed on the two clusters obtained from hierarchical cluster analysis. The two groups showed 3 % variance between them, whereas 61 % of the variance was found among individuals and 36 % within individuals (Fig. 4 and Table 4). PCoA based on the hierarchical clusters (labeled with two different colours) showed intermixing of the two groups across the coordinates (Fig. 5). The first three axes explained 15.9 % of the cumulative variation (Table 5).
AMOVA was also performed on the three populations obtained using the model-based approach. Among the three populations 11 % variance was recorded, whereas 55 % variance was found among individuals and 34 % within individuals (Fig. 6 and Table 6). Choudhury et al. [7] also reported a similar pattern of variation in Indian rice germplasm using populations derived from the model-based approach. PCoA revealed that large genetic diversity exists in Indian rice varieties; the first three axes explained 15.9 % of the cumulative variation (Table 7). In the PCoA, rice varieties were labeled with three different colours representing the three populations obtained from population structure analysis (Fig. 7). Pop1 and pop2 formed distinct groups, whereas the individuals of pop3 were distributed over pop1 and pop2.
Hierarchical AMOVA showed less variation among groups (3 %) than AMOVA on the model-based populations (11 %). This may be because fewer groups were defined by the hierarchical approach (two clusters) than by the model-based approach (three populations).
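Principal coordinates analysis works on the pairwise distance matrix alone, so colouring the points by hierarchical cluster or by STRUCTURE population does not change the axes; this would explain why the same 15.9 % cumulative variation appears in Tables 5 and 7. The sketch below is classical (Gower) PCoA with NumPy: double-centre the matrix of -0.5 times squared distances, take its eigen-decomposition, and express each positive eigenvalue as a percentage of their sum. It assumes a precomputed distance matrix and is not GenAlEx's implementation, whose standardization options may give somewhat different percentages.

```python
import numpy as np


def pcoa(dist, n_axes=3):
    """Classical principal coordinates analysis of a symmetric distance matrix.

    Returns (coordinates on the first n_axes axes, percent variation per axis).
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Gower double-centring of -0.5 * squared distances.
    centre = np.eye(n) - np.full((n, n), 1.0 / n)
    b = -0.5 * centre @ (d ** 2) @ centre
    eigvals, eigvecs = np.linalg.eigh(b)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]      # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    positive = eigvals > 1e-10 * eigvals[0]
    # Share of the total (positive) variation captured by each axis.
    percent = 100.0 * eigvals[positive] / eigvals[positive].sum()
    scale = np.sqrt(np.clip(eigvals[:n_axes], 0.0, None))
    coords = eigvecs[:, :n_axes] * scale
    return coords, percent[:n_axes]
```

The cumulative variation of the first three axes is then `percent[:3].sum()`. Group labels such as cluster or population membership enter only when the coordinates are plotted, not when they are computed.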

Pedigree-based analysis of hierarchical cluster and model based population
Analysis of varieties sharing common parentage (Additional file 1: Table S1) showed that they were grouped in the same cluster (Fig. 1) or population (Fig. 2; Table 9). There were a few exceptions where varieties with common parentage were not grouped together in the same cluster or population; for example, Kanchi and Vaigai were grouped in the same population (pop2) but in different clusters. A similar pedigree-based grouping was also observed in earlier studies [6,46]: Upadhyaya et al. [46] showed that varieties with at least one common parent were grouped in one cluster, and Choudhary et al. [6] showed that varieties released in different decades were also grouped together owing to common parents in their pedigrees.

Co-linearity between hierarchical cluster and model based population analysis
The co-linearity between the groupings obtained by hierarchical clustering and by model-based population structure was assessed using Venn diagrams. Liu et al. [22], studying a Chinese wild rice collection, also showed that the Venn diagram is a robust method for identifying overlapping accessions. Of the 729 varieties, 244 (56.5 %) were common to pop2 and cluster2, and 252 (55.3 %) were common to pop3 and cluster1 (Fig. 8a and b). These results indicate that the groupings obtained by the hierarchical and model-based approaches are more than 55 % similar.
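The overlap counts behind a two-set Venn diagram reduce to set operations on variety names. The sketch below compares one cluster with one population and, as an assumption, expresses the shared region as a percentage of all unique varieties in the two groups (the convention Venn tools such as Venny commonly use for their displayed percentages); the function name and example varieties are hypothetical.

```python
def venn_two(group_a, group_b):
    """Compare the members of two groups, e.g. cluster2 versus pop2.

    group_a, group_b: iterables of variety names (hypothetical input).
    """
    a, b = set(group_a), set(group_b)
    union = a | b
    shared = a & b
    return {
        "only_a": len(a - b),
        "shared": len(shared),
        "only_b": len(b - a),
        # ASSUMPTION: percentage of all unique varieties in the two groups.
        "shared_pct": 100.0 * len(shared) / len(union) if union else 0.0,
    }
```

For instance, groups {V1, V2, V3, V4} and {V3, V4, V5} share two of five unique varieties, giving 40 %. Under this convention the percentage depends on the sizes of both groups being compared, not on the total of 729 varieties.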

Conclusion
The present study, based on 36 HvSSR markers distributed over all 12 rice chromosomes, suggests that after the green revolution breeders used different parents to improve yield, quality and plant architecture, but that after 2006 breeders' priorities changed: instead of plant architecture, the focus shifted to breeding for biotic and abiotic stress tolerance and trait-specific improvement. This could be the reason why the number of alleles recorded over the period has not decreased,

Plant materials
Seed samples of 729 rice varieties were received from the Indian National Genebank, ICAR-National Bureau of Plant Genetic Resources (NBPGR), New Delhi. Details of each variety, along with passport data (National ID, i.e. Indigenous Collection (IC) number, state, local name, pedigree and traits), are given in Additional file 1: Table S1 [39].

DNA extraction from rice seed
Seeds of each variety (10–12 seeds) were dehusked and used for DNA isolation with the QIAGEN DNeasy Plant Mini Kit (Hilden, Germany). Kernels were ground to a fine powder using a TissueLyser II (Retsch, Germany) with a TissueLyser adapter set (QIAGEN), and DNA was then isolated following the DNeasy Plant Mini Kit protocol.

Genotyping of rice varieties using SSR markers
Initial screening was done with 120 highly variable SSR (HvSSR) marker loci with repeat lengths of 51–70 bp located across all 12 rice chromosomes [40]. The 36 most polymorphic markers (three per chromosome), covering both the long and short arms of each chromosome, were finally selected for genotyping the 729 rice varieties. The annealing temperature (Ta) of each primer pair was standardized by gradient PCR on selected rice samples. Working stocks (10 ng/μl) of genomic DNA were prepared for all 729 varieties. Each PCR reaction mixture (total volume 10 μl) contained 2 μl genomic DNA (10 ng/μl), 0.8 μl of 25 mM MgCl2, 1 μl of 10X buffer, 0.2 μl of each primer (10 nmol), 0.2 μl of 10 mM dNTPs, 0.2 μl of Taq DNA polymerase (Fermentas, Life Sciences, USA) and 5.6 μl distilled water. PCR amplification conditions were: initial denaturation at 94 °C for 4 min; 36 cycles of 94 °C for 30 s, Ta for 45 s and 72 °C for 1 min; and a final extension at 72 °C for 10 min. Amplified products were resolved on 4 % MetaPhor agarose gels at a constant 120 V for 4 h, and gel images were recorded with a gel documentation system (Alpha Imager®, USA).
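The in-reaction concentrations implied by the recipe above follow from C1 × V1 = C2 × V2. The sketch below applies this to the stated 10 μl reaction, giving 2 mM MgCl2, 1X buffer, 0.2 mM dNTPs and 20 ng of template per reaction. Whether the 10 mM dNTP stock is per nucleotide or total is not stated, so the dNTP figure inherits that ambiguity. Counting 0.2 μl separately for each of the two primers, the listed volumes sum to 10.2 μl rather than 10 μl, so the sketch uses the stated 10 μl total. The names `RECIPE` and `in_reaction` are illustrative only.

```python
TOTAL_VOLUME_UL = 10.0  # reaction volume stated in the protocol

# name: (stock concentration, unit, volume added in µl), from the recipe above
RECIPE = {
    "genomic DNA": (10.0, "ng/µl", 2.0),
    "MgCl2": (25.0, "mM", 0.8),
    "PCR buffer": (10.0, "X", 1.0),
    "dNTPs": (10.0, "mM", 0.2),
}


def final_concentration(stock, volume_ul, total_ul=TOTAL_VOLUME_UL):
    """Solve C1 * V1 = C2 * V2 for the in-reaction concentration C2."""
    return stock * volume_ul / total_ul


def in_reaction(recipe=RECIPE, total_ul=TOTAL_VOLUME_UL):
    """Final concentration of every component in the assembled reaction."""
    return {name: (final_concentration(c, v, total_ul), unit)
            for name, (c, unit, v) in recipe.items()}


# Template per reaction in ng (an amount, not a concentration).
template_ng = RECIPE["genomic DNA"][0] * RECIPE["genomic DNA"][2]

# Listed volumes: DNA, MgCl2, buffer, forward primer, reverse primer,
# dNTPs, Taq, water (0.2 µl counted separately for each primer).
listed_volumes_ul = [2.0, 0.8, 1.0, 0.2, 0.2, 0.2, 0.2, 5.6]

if __name__ == "__main__":
    for name, (conc, unit) in in_reaction().items():
        print(f"{name}: {conc:g} {unit}")
    print(f"template DNA: {template_ng:g} ng per reaction")
    print(f"sum of listed volumes: {sum(listed_volumes_ul):.1f} µl")
```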

Statistical analyses
PowerMarker 3.5 [21] was used to calculate major allele frequency, gene diversity, heterozygosity and polymorphic information content (PIC) for each HvSSR locus. Genetic distances between varieties were also calculated with PowerMarker and used to construct an unweighted neighbour-joining (NJ) tree in DARwin 5.0.158 [32]. GenAlEx V6.5 [31] was used for PCoA and AMOVA. Population structure was studied with the model-based program STRUCTURE 2.3.3 [34], with three replicate runs for each K. Each run used a burn-in period of 100,000 steps followed by 100,000 Markov chain Monte Carlo (MCMC) replicates [34]. Membership of each genotype was estimated for K = 1 to 20 genetic clusters under the admixture model with correlated allele frequencies. ΔK was derived from the LnP(D) values obtained for each K [11], and the final number of populations was determined using Structure Harvester (http://taylor0.biology.ucla.edu/structureHarvester/). Venn diagram analysis was performed with Venny 2.1 [27] to identify varieties common to clusters and populations.
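The ΔK statistic mentioned above is commonly computed as |L''(K)| / sd[L(K)], where L(K) is the LnP(D) of the replicate runs at K, L'(K) = mean L(K) - mean L(K-1) and L''(K) = L'(K+1) - L'(K); the K with the largest ΔK is taken as the most likely number of populations. The sketch below is written from that formula, not from Structure Harvester's source code, and assumes at least two replicate runs per K with consecutive K values; the example LnP(D) values are invented for illustration.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Evanno-style Delta K from replicate STRUCTURE runs.

    lnp_by_k: {K: [LnP(D) of each replicate run]}, at least two runs per K.
    Delta K is defined only where K - 1 and K + 1 were also run.
    """
    m = {k: mean(runs) for k, runs in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in m or k + 1 not in m:
            continue
        # |L''(K)| = |L'(K+1) - L'(K)| with L'(K) = mean L(K) - mean L(K-1)
        second_diff = abs((m[k + 1] - m[k]) - (m[k] - m[k - 1]))
        sd = stdev(lnp_by_k[k])  # sample standard deviation across runs
        delta[k] = second_diff / sd if sd > 0 else float("inf")
    return delta
```

With three runs per K as in this study, `max(delta, key=delta.get)` gives the K at which ΔK peaks. ΔK cannot be evaluated at the smallest and largest K tested (here K = 1 and K = 20).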

Ten-year interval analysis
All 729 rice varieties were divided into six intervals on the basis of year of release and notification [1940-1965 (18 varieties), 1966-1975   value of gene diversity, heterozygosity, major allele frequency and PIC of rice varieties falling in the six time intervals were analyzed using PowerMarker. The mean values of gene diversity, heterozygosity, major allele frequency and PIC were plotted against time interval, with years on the X axis and parameter values on the Y axis (Fig. 9).
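The interval analysis groups varieties by release year and then computes diversity statistics within each group. In the sketch below, only the 1940-1965 and 1966-1975 intervals are taken from the text; the later boundaries (1976-1985, 1986-1995, 1996-2005, 2006-2013) are an assumption inferred from the periods discussed in the abstract, and the helper names and input layout are hypothetical. Gene diversity is computed per locus within each interval (uncorrected 1 - sum(p_i^2)) and averaged across loci, assuming every variety is scored at the same loci with no missing data.

```python
from bisect import bisect_right
from collections import Counter, defaultdict

# Start year of each interval. 1940 and 1966 are stated in the text; the
# later boundaries are ASSUMED here for illustration only.
INTERVAL_STARTS = [1940, 1966, 1976, 1986, 1996, 2006]
INTERVAL_LABELS = ["1940-1965", "1966-1975", "1976-1985",
                   "1986-1995", "1996-2005", "2006-2013"]
LAST_YEAR = 2013


def interval_of(year):
    """Return the label of the release interval containing `year`."""
    idx = bisect_right(INTERVAL_STARTS, year) - 1
    if idx < 0 or year > LAST_YEAR:
        raise ValueError(f"release year {year} is outside 1940-{LAST_YEAR}")
    return INTERVAL_LABELS[idx]


def gene_diversity(genotypes):
    """1 - sum(p_i^2) over the alleles observed at one locus."""
    alleles = [a for g in genotypes for a in g]
    n = len(alleles)
    return 1.0 - sum((c / n) ** 2 for c in Counter(alleles).values())


def diversity_by_interval(varieties):
    """Mean gene diversity across loci for each release interval.

    varieties: list of (release_year, {locus: (allele1, allele2)}) with the
    same loci scored for every variety (no missing data assumed).
    """
    groups = defaultdict(list)
    for year, genotype in varieties:
        groups[interval_of(year)].append(genotype)
    means = {}
    for label in INTERVAL_LABELS:
        members = groups.get(label)
        if not members:
            continue
        per_locus = [gene_diversity([m[locus] for m in members]) for locus in members[0]]
        means[label] = sum(per_locus) / len(per_locus)
    return means
```

Plotting the returned values in interval order gives a trend like the one described for Fig. 9. Because diversity is estimated from the pooled varieties of each interval, intervals with few varieties (for example the 18 varieties of 1940-1965) will have noisier estimates.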

Additional files
Additional file 1: Table S1. List of rice varieties characterized using HvSSR markers, with their pedigree, IC number, year of release and traits. (XLSX 79 kb)
Additional file 2: Table S2. List of hybrid rice varieties used in the present study, with year of release and pedigree. (XLSX 10 kb)
Additional file 3: Figure S1. Detailed population structure of 729 rice varieties based on SSR data. (PDF 459 kb)

Acknowledgement
We are thankful to the Director, NBPGR, New Delhi, who provided facilities for this work.

Availability of data and materials
The datasets supporting the results of this article are included within the article and its additional files.