Favorable alleles mining for gelatinization temperature, gel consistency and amylose content in Oryza sativa by association mapping

Background Improving the gelatinization temperature (GT), gel consistency (GC) and amylose content (AC) for parental grain eating and cooking qualities (ECQs) are key factors for enhancing average grain ECQs for hybrid japonica rice. Results In this study, a genome-wide association mapping (GWAS) for ECQs was performed on a selected sample of 462 rice accessions in 5 environments using 262 simple sequence repeat markers. We identified 10 loci and 27 favorable alleles for GT, GC and AC, and some of these loci were overlapped with starch synthesis-related genes. Four SSR loci for the GT trait were distributed on chromosomes 3, 5, 8, and 9, of which two SSR loci were novel. Two SSR loci associated with the GC trait were distributed on chromosomes 3 and 6, although only one SSR locus was novel. Four SSR loci associated with the AC trait were distributed on chromosomes 3, 6, 10, and 11, of which three SSR loci were novel. The novel loci RM6712 and RM6327 were simultaneously identified in more than 2 environments and were potentially reliable QTLs for ECQs, with 15 parental combinations being predicted. These QTLs and parental combinations should be used in molecular breeding to improve japonica rice average ECQs. Conclusions Among the 10 SSR loci associated with GT, GC and AC for grain ECQs detected in 27 favorable alleles, the favorable allele RM3600-90bp on chromosome 9 could significantly reduce GT, RM5753-115bp on chromosome 6 could significantly increase GC, and RM6327-230bp on chromosome 11 could significantly reduce AC in hybrid japonica rice mixed rice samples. Electronic supplementary material The online version of this article (10.1186/s12863-019-0735-y) contains supplementary material, which is available to authorized users.


Background
Rice (Oryza sativa L.) is one of the most important food crops grown worldwide for more than half of the world's population [1]. The success of hybrid rice in China was a significant step for improving rice grain yield, with the yield of hybrid rice being 15-20% higher than that of high-yielding conventional varieties [2]. During the last 20 years, the annual planting acreage for hybrid indica rice in China accounted for approximately 80% of the total indica rice area, while hybrid japonica rice only accounted for approximately 5% of the total japonica rice area [3]. One of the main reasons for the slow development of hybrid japonica rice is that the grain eating and cooking qualities (ECQs) are not as good as conventional japonica rice [4]. For hybrid rice cultivars, although F 1 hybrid rice plant populations are consistent, grains harvested from commercial F 1 rice hybrid plants represent the F 2 seed generation and are segregated for some grain characteristics. Thus, we must consider the effect of this segregation on rice ECQs [5].
The ECQs have been widely studied in cereals. Mo (1988) first showed that the inheritance of rice ECQs is complicated by the maternal plants or cytoplasm effects, and is mainly controlled by triploid endosperm genotypes [31]. They were determined througha mating design and its corresponding statistical method, through which a model was used to independently test the genetic effects of the endosperm and maternal genotype, as well as effects of cytoplasmic differences in the segregated generations [32]. Xu et al. (1995) reported that the inheritance of rice ECQs was mainly controlled by the triploid endosperm genotype with slight cytoplasmic effects [33]. Although the aforementioned models were used to identify the genetic expression of quality traits in cereal, these reports usually used heterozygous individuals from the segregated progeny. He et al. (1999) established a doubled haploid (DH) population to identify nine QTLs for rice ECQs; he proposed that a permanent segregation population can provide sufficiently homozygous generations for the analysis of rice ECQs [10]. Aluko et al. (2004) and Bao et al. (2002) suggested that minor QTLs for rice ECQs can be detected for the near-isogenic line (NIL) population [14,34]. Though some QTLs for rice ECQs have been successful analyzed in the abovementioned studies, traditional biparental segregated populations showed several limitations, including finite genetic variation and recombination [35]. Association analysis is a method based on the linkage disequilibrium to detect the correlations between phenotypic variation and genotype [36]. Additionally, compared with traditional QTL mapping, association analysis that occurs in the natural population has the advantage of more recombination probabilities and carries on the validly complement for segregation population; for example, previous studies showed that linkage disequilibrium mapping improves the QTL mapping resolution by 500 times [37]. With the rapid development of plant genomics, association analysis has enabled researchers to develop polymorphisms and locate alleles in the genome [38]. Therefore, it has become a valuable and powerful method to exploit SSR markers for complex quantitative traits [39][40][41][42]. Borba et al. (2010) and Xu et al. (2016) presented to use the marker-trait associations for mining of alleles associated with rice grain quality traits [15,43]. Although a few reports have focused on association mapping for rice ECQs, the geographical and ecological distribution of the experimental populations was insufficient and without abundant genetic diversity, and the extent of linkage disequilibrium (LD) was low. This technique has allowed researchers to exploit natural diversity and mine valuable genes in the genome.
A natural population of 462 accessions derived from seven different ecological regions was developed in our laboratory for QTL identification [44][45][46]. This population has several advantages over the permanent segregation population for detecting QTLs and mining valuable genes in the genome. On one hand, the natural population contains sufficient germplasm resources that are suitable for mining more favorable alleles for rice grain quality traits. In contrast, the homozygotic genotype of each accession in the natural population could simplify the QTL detection of grain quality traits and inconsiderate of the effects of triploid endosperm genotypes, both maternal and cytoplasmic.
In this study, an experimental natural population containing a set of 462 rice accessions and 262 SSR marker pairs were used to conduct association mapping for rice ECQs on a genome-wide scale through five environments. The purposes were to detect QTLs significantly associated with the ECQs, mine favorable alleles, and then explore the parental combinations for improving rice ECQs.

Plant materials and field planting
The 462 rice varieties originating from different regions of Asia were used as the plant materials in this study represent a subset of our previous report [44]. Among the varieties, 341 were from China and 121 were from Vietnam (Additional file 1: Table S1). In the summers of 2011, 2012 and 2013, the 462 accessions were grown in the five environments, E1 to E5, which included three locations, Nanjing, Yuanyang and Xinyang (Additional file 2: Table S2). Each entry plot was composed of eight rows with eight plants per row. All plots were arranged with a randomized complete block design and two replicates per environment.

Evaluations of GT (ASS), GC and AC
After harvest, the rough rice grains were air-dried under natural sunshine conditions, stored at room temperature for 3 months, and then kept in a refrigerator at 4°C for reserve. Then, the rough rice was milled into brown rice using a rice huller (JLGJ45, manufactured by the Taizhou Food and Oil Machinery Factory, Zhejiang, China). The brown rice was then processed into milled rice using a rice milling machine (JNMJ3, manufactured by the Huangyan Food and Oil Machinery Factory, Zhejiang, China) as follows. Twenty grams of brown rice per accession was placed into the miller machine and milled for 70 s. One half of the polished head rice was used for measuring the GT and the other half was ground into a fine powder using the flour mill (FW100, Tianjin Tai Si Te Instrument Inc, China) and kept in each Ziploc bags for measuring the GC and AC.
The GT was estimated indirectly as the alkali spreading score (ASS) using the method from Little et al. (1958) [47] with minor modifications. Briefly, 6 grains of intact milled white rice from each accession were placed in a 60-mm petri dish with 10 ml of 1.7% KOH. The samples were separated from each other using forceps and incubated at 30 ±0.5°C for 23 h to allow the grains to spread. The spreading score of the grains was recorded by visual assessment as described by Jennings et al. [48]. Based on the appearance of the endosperm and the degree of dispersion, an ASS score from 1-7 was recorded. Unaffected or slightly swollen endosperms were recorded 1, while those that completely disappeared were recorded as 7 (Fig. 2b). GT values are inversely related to the ASS score. The following are the three classes of GT: ASS from 1 to 3 grade is high GT (> 75°C), from 4 to 5 grade is intermediate GT (70-74°C), and from 6-7 grade is low GT (<70°C) [6].
The GC was determined according to the method of Cagampang et al. [49]. Briefly, 100 mg rice flour with approximately 12% moisture content was weighed in a test tube, to which 0.2 ml of 95% ethanol containing thymol blue was added and gently shaken to prevent clumping of the powder during gelatinization. Two milliliters of 0.2 mol/L KOH were added and vortexed thoroughly. The tubes were covered with glass marbles and boiled in a water bath to reflux for 8 min. After cooling down at room temperature for 5~10 min, the tubes were placed on ice for 20 min and then laid down horizontally on a table surface for 1 h. The gel length (mm), which was measured as the distance from the bottom of the tube to the front of the gel migration, is a measurement of GC. A longer gel correlates with a higher GC value.
The AC of the rice grains was measured using an automatic microplate spectrophotometer (TECAN Infinite 200 Pro, Untersbergstrasse 1A, Austria, Additional file 3: Figure S1A) according to the method described of Zhu et al. [50]. Briefly, 50 mg of test sample flour was weighed and added to a test tube, followed by the addition of 0.5 ml 95% ethanol and 1.5 ml of 3 mol/L NaOH. The tube was placed at 30°C overnight (12-16 h) after being fully skimmed and shaken lightly with distilled water to increase the volume to 40 ml. 10 μl of the mixture was added to a 96-well ELISA plate; the remaining reaction solution consisted of 2 μl of 1 mol/L acetic acid, 3 μl of 2% I 2 -KI, and 185 μl distilled water to bring the final volume to 200 μl (Additional file 3: Figure  S1B). We then used a microplate spectrophotometer to automatically record the optical density value (OD) at 620 nm, which was shown on the computer monitor (Additional file 3: Figure S1C). The AC was calculated using a standard curve.

Phenotypic data analysis
The phenotypic data statistical analyses were performed using the SPSS software (IBM Institute Inc., Armonk, NY, USA). The broad sense heritability (H 2 B ) was calculated by using the formula H 2 B ¼ σ 2 g =ðσ 2 g þ σ 2 e =nÞ [51], where σ 2 g represents the genetic variance, σ 2 e represents the error variance, and n represents the number of replications.

SSR marker genotyping
Young leaves at the tillering stage were collected from each plant of each accession and were genotyped using SSR molecular markers. Genomic DNA was extracted following the methods described by Tai and Tanksley [52]. A total of 262 SSR markers selected from 12 rice chromosomes were used for genotyping the 462 accessions studied. PCR process, PCR product separation and record of band size followed the methods described by [44].

Genotypic data analysis
The population structure of 462 accessions was analyzed with three methods. Firstly, the Bayesian clustering method, using the STRUCTURE 2.3 software to implement [53,54]. Secondly, the neighbour-joining method, using the PowerMarker (version 3.25) software to carry out the polymorphism information content (PIC), then the neighbor-joining tree was built based on Nei's genetic distance and framed using the MEGA 5.0 software [55]. The calculations followed the same method as those described in [44]. Finally, the principal component analysis (PCA) for each subpopulation was calculated by the RTM-GWAS program with default parameters by using the 262 SSR markers [56]. The coefficient of genetic differentiation (F ST ) was estimated to measure the fixation of different alleles in different subpopulations based on the method that was proposed by Weir and Hill [57], and the computing process was completed using the Arlequin 3.11 software [58].

Linkage disequilibrium analysis
Prior to the association analysis, data with a minimum allele frequency (MAF) < 5% by filter alignment were removed from the dataset. According to the level of LD and the genetic distance between the markers with intra-chromosomal combinations, the regression equation of LD with genetic distance was calculated by regression analysis. The relationship between LD and genetic distance was observed by plotting LD decay.

Association analysis
To control for false positives, the K (the kinship matrix) + Q (population structure matrix) model was used for the genome-wide association mapping based on the mixed linear model (MLM) in the TASSEL 5.0 software [59]. The Q matrix was obtained from the STRUCTURE 2.3 analysis results. The K matrix was generated from the relatedness analysis results using the kinship matrix function in TASSEL 5.0. The false discovery rate (FDR) was used to control for the proportion of falsely refused hypotheses for significant associations using the Muralidharan correction method [60]. A -LogP value greater than 3 was set as a threshold for strong associations, and the quantile-quantile (Q-Q) and Manhattan plots for significantly associated markers were performed using TASSEL 5.0. Based on the association locus identified, the phenotypic effects of the alleles were estimated by the 'null allele' (non-amplified allele) for each locus [61]. The formula used for calculating the phenotypic effect of average positive (negative) allelic effects (AAE + or AAE -) followed the same method as described in [44]. Table 1 shows the mean value, coefficient of variation and heritability in the broad sense (H 2 B ) for GT, GC and AC measured in the 462 varieties across the five environments. The CVs ranged from 30.79% for AC to 56.23% for GC, suggesting significant variations in the 3 traits among the accessions across the five environments. Continuous distributions were observed for GT, GC, and AC, and the phenotypic data for all the three traits followed a normal distribution based on the skewness and kurtosis values. A two-way analysis of variance (ANOVA) showed that differences among the 462 accessions for GT, GC and AC were highly significant (P < 0.01), indicating that there were a large amount of genetic variation in the natural population. The average H 2 B across 5 environments for GT, GC, and AC was 95%, 94% and 98%, respectively.

Phenotypic evaluations
For the GT trait, the mean ASS values for the 462 cultivars were 4.65, 4.45, 4.61, 3.95 and 3.6, with 95.0, 92.7, 95.5, 94.9 and 94.6% H 2 B in the five environments, respectively ( Table 1). The graphical representation of the GT for typical accessions is shown in Fig. 1. The minimum alkali spreading score is 1, and its typical accession is 'Yuedao70'. Its rice grains displayed no cracks or swelling; therefore, it belongs to a type with a high gelatinization temperature (Fig. 1a). In contrast, the maximum ASS value is 7, with 'Longdun105' as a typical accession (Fig. 1b). These rice grains are completely dispersed and mixed, demonstrating that it has a lower gelatinization temperature. The performance of the GT trait was mostly stable across the 5 environments, except for the median ASS of (3.6±1.71) in E5, which represents a Xinyang Farm in Henan (32°N, 114°E), indicating GT trait was mainly controlled by genetic factors (Fig. 1c).
The mean GC for the 462 accessions were 53.07, 50.04, 46.09, 44.05 and 49.5mm, with 94.5, 96.2, 96.0, 92.9 and 90.7% H 2 B , respectively. The shortest gel consistency length was 13 mm, with a typical accession as 'Haobuka' that has a hard gel consistency (Fig. 1d). The longest gel consistency length was 134 mm; 'Hon-gnong5' was this carrier accession and it displayed a soft gel consistency (Fig. 1e). The median value of the 462 rice accessions was largely consistent across the different environments, though the GC trait showed slightly lower phenotypic values (44.05±24.77) in E4, which represents a Yuanyang Farm in Henan (35°N, 113°E) (Fig. 1f) Figure 1g shows that the performance for the AC trait was stable across the 5 environments and was mainly controlled by genetic factors.
Analysis of variance for the GT, GC and AC traits in the population containing the 462 accessions from 2011-2013 at the Nanjing site indicated extremely significant effects due to the genotype, year, year × genotype, and replications within year for the GC and AC traits, but nonsignificant effects due to the replications within year on GT (0.83) (Additional file 4: Table S3).
Additionally, the analysis across three sites for 2013 indicated highly significant effects due to the site, genotype, and site × genotype for the GT and GC traits, but nonsignificant effects due to the replications within site for GT (1.12) and GC (2.01) (Additional file 5: Table S4).

Population genetic structure analysis
The genetic diversity of the 462 accessions was detected using 262 SSR markers, and the analysis resulted in a total of 2462 alleles being detected. The number of alleles per locus ranged from 2 (at locus RM206 on chromosome11) to 25 (RM7545 on chromosome10), with an average of 9.40 alleles per locus (Additional file 6: Table S5). The average genetic diversity was 0.4685 and ranged from 0.0068 (RM7403 on chromosome 3) to 0.7569 (RM128 on chromosome 1) (Additional file 6: Table S5). The average PIC value was 0.4208 and ranged from 0.0067 (RM7403 on chromosome 3) to 0.7444 (RM128 on chromosome 1) (Additional file 6: Table S5). Furthermore, there were 379 unique alleles (allele frequency < 0.5%), 848 rare alleles (0.5% ≤ allele frequency <5%), 1231 polymorphic alleles (5% ≤ allele frequency ≤ 95%) and 4 fixed alleles (allele frequency ≥ 95%) among the 2462 alleles, with corresponding proportions of 15.39%, 34.44%, 50.00% and 0.16%, respectively. An analysis of model-based population structure provided evidence for a significant population structure among the 462 rice accessions. The log-likelihood values increased with the increase in the model parameter K; thus, the ΔK as the diagnostic criterion was used to determine a suitable value for K. The highest ΔK value was obtained at K = 5 (Fig. 2a). The 462 accessions could be divided into five subpopulations from SP1 to SP5 (Fig. 2b). The information of each accession belonging to each subpopulation (Q matrix) is summarized in Additional file 1: Table S1. The neighbor-joining tree constructed based on the Nei's genetic distances showed that the entire population was clearly divided into 5 subpopulations (Fig. 2c), being basically consistent with the results from the Structure analysis. Furthermore, principal component analysis (PCA) plots clearly differentiated the whole population into five subpopulations (Fig. 2d), which corresponded with the population structure results.
Among the five subpopulations, the average pairwise F ST value was 0.3827 ( Table 2). The pairwise value of F ST ranged from 0.2766 (between SP2 and SP3) to 0.4356 (between SP1 and SP5), and the corresponding Nei's genetic distance between SP2 and SP3 was the lowest (0.3906). SP2 had the highest gene diversity of 0.5394, with a total of 1323 alleles or 5.03 alleles per locus and a PIC value of 0.4954, followed by SP3 with a genetic diversity of 0.4938 and 1215 alleles or 4.62 alleles per locus and a PIC value of 0.4487 (Fig. 3). In contrast, SP5 had the lowest gene diversity of 0.4149, with 2.63 alleles per locus and a PIC value of 0.3621 (Fig. 3). Molecular variance analysis (AMOVA) showed that 34.48% of the total genetic variation occurred between the subpopulations, while 65.52% occurred in the subpopulations (Additional file 7: Table  S6). These results suggest a high degree of genetic differentiation among the five subpopulations.  Figure S2).  Fig. 4a). For GT, the alleles with positive phenotypical effect values are considered favorable alleles. Table 5 shows a summary of the top three favorable alleles and their typical carriers. Among the 12 favorable alleles for GT listed in Table 5, the RM3600-90bp allele showed the largest positive phenotypic effect value (2.0 grade) ( Table 5). In total, 86 of 462 accessions carried the allele RM3600-90bp, with a typical carrier accession being 'Longdun105' (Fig. 1b), which originated in Northeast China. RM232 (Chr.3: 8409404-8410886), which was associated with GT, had the second highest PVE of 9.74% in E4 and -LogP value and AAE + values of 3.48 and 0.8, respectively (Table 4, Fig. 4a). Four alleles (RM232-160bp, -155bp, -150bp, and -145bp) were detected over the 462 accessions (MAF>5%). All four of these alleles were associated with reducing the GT trait, of which the favorable allele RM232-145bp showed a maximum positive PEV (0.67 grade) for reducing the GT, with the typical carrier accession being 'Wanyedao' (Table 5).
The alleles with positive effects are considered favorable alleles for the GC trait. Five favorable alleles were detected over the entire population, with 2 in E2 and 3 in E4. The   (Table 5). In total, 33 of 462 accessions carried the allele RM5753-115bp, with its typical cultivar being 'Hongnong5' (Fig. 1e). RM6712 had the largest -LogP value at 3.3 and a PVE and positive AAE of 7.46% and 16.6, respectively, in E2 (Table 4, Fig. 4b). Four alleles from RM6712 were detected over the 462 accessions (MAF>5%). RM6712-115bp and -95bp were associated with increasing the GC trait, while RM6712-85bp and -80bp were associated with reducing the GC trait. The favorable allele RM6712-95bp displayed the largest effect (13.12 mm) for GC, with 'Dongnongjing-nuo418' being a typical carrier accession ( Table 5).

Identification of SSR marker loci associated with AC and favorable alleles
Four loci (RM6712, RM5753, RM258 and RM6327) were associated with AC (FDR< 0.05, -LogP≥ 3) and distributed on chromosomes 3, 6, 10 and 11, respectively, with corresponding -Log P values ranging from 3.04 to 4.27, which explained the phenotypic variation ranging from 5.59% to 12.87%. The alleles with negative effects are considered favorable alleles for the AC trait. Ten favorable alleles were detected over the entire population for the AC trait, with the PEV ranging from -0.45% to -4.32%.
The geographical distribution of variant alleles from the RM3600 and RM6327 loci detected among the 462 rice accessions We further analyzed the geographic distribution and the relative frequencies of alleles on the SSR loci associated with rice ECQs among the 462 accessions (Fig. 5a). RM3600 and RM6327 affected the rice ECQs and explained the maximum phenotypic variance of 10.12% and 12.87%, respectively. Therefore, these two SSR loci were selected for analysis.
According to their origins, the 462 cultivars were divided into the following 7 geographical regions; Eastern China, Northern China, Central China, Southwest of China, Northeast of China, Vietnam and Japan. RM3600-80bp, RM3600-85bp, RM3600-90bp and RM3600-95bp, which phenotypically reduced the GT, were frequently encountered with various origins including Central China, Eastern China, and Northern China; Vietnam and Eastern China; Japan, Eastern China, and Northeast of China; and Northeast of China, respectively. RM3600-70bp, RM3600-125bp and RM3600-170bp, which phenotypically increased the GT, originated in Southwest China, Vietnam, and Eastern China; Vietnam; and Vietnam, respectively. RM3600-90bp, which had the maximum PEV (2.00), was concentrated in carried accessions originating in Japan (100.0%) followed by Northeast of China (60.9%) and Eastern China (17.82%).
The rice accessions originating in Vietnam mainly carries the RM3600-70bp, -125bp, -170bp, and -85bp alleles. As rice was spread from south to Central China, the RM3600-125bp and RM3600-170bp alleles disappeared, while the RM3600-80bp, -90bp, and -95bp alleles appeared. Further north, the main rice accession alleles in Northern China and Japan were RM3600-90bp and -95bp, which are associated with reducing the GT trait.
The geographic distribution and relative frequencies of the 462 accessions carrying the RM6327 alleles are shown in Fig. 5b. RM6327-230bp, -200bp, -190bp, -180bp, and -175bp phenotypically reduced the AC and Fig. 5 The geographic distribution of accessions carrying favorable alleles (The source map was taken from http://pixelmap.amcharts.com/). a Geographic distribution and the relative frequencies of SSR marker RM3600 alleles. b Geographic distribution and the relative frequencies of SSR marker RM6327 alleles. The color in the pie charts indicates the marker alleles within each locus status and geographic provenance of the germplasm category were frequently encountered Northeast of China and Northern China; Southwest China; Eastern China; Vietnam and Eastern China; and Northeast of China, Japan, Northern China and Eastern China, respectively. RM6327-245bp, -170bp, -120bp, and -115bp phenotypically increased the AC and originated in Vietnam, Vietnam, Central China, and Eastern China, respectively. The RM6327-175bp allele was concentrated in the carried accessions originating in northeast of China (60.9%) followed by Eastern China (38.2%), Northern China (20.0%) and Japan (11.1%).
The rice accessions that originated in Vietnam mainly carried the RM6327-245bp, -170bp, and -180bp alleles. As rice spread from south to Central China, the RM6327-245bp and -170bp alleles disappeared, while the RM6327-200bp, -230bp, -175bp, -120bp, -115bp, and -190bp alleles appeared. Further north, the main rice accession alleles in Northern China and Japan were RM6327-230bp and -175bp, which are associated with reducing the AC trait. Some alleles in this population disappeared with the northward movement of rice cultivation, though at the same time, additional alleles emerged to adapt to local ecological and geographical conditions and to meet the taste demands of people.

Design for excellent parental combination
Fifteen excellent parental combinations were proposed for pyramiding favorable alleles into one single plant to improve rice ECQs (Table 6). Taking the GT trait as an example, when 'Yebaodao' was crossed with 'Ligengqing' , all of the favorable alleles at the 4 loci could be combined into a single plant, 'Yebaidao' had 2 favorable alleles (RM267-95bp and RM3600-90bp) and 'Ligengqing' had 2 favorable alleles (RM232-145bp and RM264-195bp). Thus, the ASS of this variety could be theoretically increased to a grade of 4.8 (Table 6). All favorable alleles carried by the parents for rice ECQs are listed in Additional file 9: Table S7.
Some cultivars were found repeatedly in the supposed parental combinations; for example, 'Yebaidao' emerged 5 times in the combinations for the GT trait; 'Baikenuo' , 'Hongnong5' , and 'Dongnongjingnuo418' emerged of 4, 2, and 2 times, respectively for the GC trait; and 'Baoxintaihuqing' appeared in all combinations for the AC trait ( Table 6), indicating that these cultivars possesses important values for breeding superior ECQs parents.

Discussion
Without considering the population structure, there would have a high number of false positives in this association analysis. In this study, 462 rice accessions were classified into five subpopulations using a Structure model-based method (Fig. 2b), a dendrogram was created based on the Nei's genetic distances (Fig. 2c), and the results from the PCA was consistent with the population structure based on these two collections (Fig. 2d). Therefore, the results from these three analyses are consistent. The results showed that accessions collected from Vietnam (latitudes < 17°N) were classified into POP1, while most accessions from northeastern China (latitudes ≥ 45°N) were classified into POP2. Thus, the geographic origin of accessions had an effect on the population structure. Different ecological environments and their corresponding unique geographical origins Digit in parentheses of the second column is the locus number may be partly responsible for the genetic differentiation of the population, enabling germplasm accessions to adapt to the environment and the appearance of rare alleles. Ten SSR loci were associated with rice ECQs from the entire set of accessions in this study, including 4 associated with GT, 2 associated with GC, and 4 associated with AC. Four of the 10 associations were in regions where the QTL associated with the ECQs had been identified and reported in previous studies using whole-genome marker resources from the Gramene website (http://www.gramene.org/), NCBI website (https://www.ncbi.nlm.nih.gov) and QTL database website (http://qtaro.abr.affrc.go.jp/), including RM267 and RM264 for GT and RM5753 for GC and AC [13,19,62,63] (Table 7). Six loci were found for the first time in this study, including 2 for GT, 1 for GC and 3 for AC. For the 2 novel SSR loci for GT, RM3600 (Chr.9: 17107752-17107843) had the maximum PVE (10.12% in E2). The SSR locus RM6712 (Chr.3: 35020004-35020027) was a novel locus associated with GC, and it showed the largest -LogP value (3.3). For the 3 novel loci for AC, the average PVE ranged from 5.69% to 12.13% over the 5 environments (Table 4).
The novel locus RM3600 (Chr.9: 17107752-17107843) for GT was detected close to the Isoamylase3 (ISA3) gene (Chr.9: 17857347-17868663) that facilitates and affects starch metabolism in rice. To identify candidate genes to use as strong associated markers with LD analysis among the 262 SSR markers in this study, we found that RM210 (Chr. 9: 19879785-19879804) was the closest marker to RM3600 in physical location; the distance between RM3600 and RM210 is approximately 2.7 Mb, while the ISA3 gene is flanked by RM3600 and RM210. LD analysis was estimated by the D′ values (P < 0.05), the intrachromosomal combinations of RM3600 and RM210 with a D′ value of 0.76, and the minimum distance of LD decay (44.81 cM). The results indicated that RM3600 (-LogP=4.92) for GT was associated with ISA3. Additionally, association analyses for rice populations can provide an effective method for gene identification. The GWAS results in this study may increase our search for QTLs associated with the rice ECQs and provide useful information for further cloning.
In addition to the novel SSR loci detected in this study, four other SSR loci associated with rice ECQs were consistent with previous studies. RM264 on chromosome 8 had a PVE of 7.43% in E1 and RM267 on chromosome 5 associated with GT had a PVE of 7.04% in E3. These findings confirmed the results from Temnykh et al., who mapped microsatellite sequences in rice and reported that SSR loci RM264 and RM267 were associated with GT [62].
Similarly, Lanceras et al. (2000) reported that RM5753 on chromosome 6 was associated with the GC trait [19]. In this study, RM5753 (Chr.6: 30966850-30967050) was identified as co-associated with AC and GC; its physical distance is close to the Wx gene (Chr.6: 1765580-1770644) that encodes granule-bound starch The SSR Marker and the QTL physical position (bp) was inferred from the Gremene (http://www.gramene.org/markers) and NCBI (https://www.ncbi.nlm.nih.gov/Blast.cgi) respectively synthase and controls GC and AC. RM506 (Chr.6: 435648-435677) was used to identify whether the Wx gene was strongly associated with RM5753, which is the closest marker to RM5753 based on physical location among the 262 SSR markers used in this study. The physical distance between the RM5753 and RM508 is approximately 30 Mb. The Wx gene is flanked by these two SSR markers and the intrachromosomal combinations of RM5753 and RM508 had a D′ value of 1; the physical distance of this block was too long to determine whether the candidate gene was associated with RM5753. More markers on this chromosome should be further studied.
The favorable alleles were further mined using the results for the significant SSR loci associated with the rice ECQs obtained from the 462 accessions. Twenty-seven favorable alleles for rice ECQs were mined at 10 SSR loci. Among them, the favorable alleles were mainly carried by accessions collected from Eastern China, Vietnam and Northeast of China. For the AC trait, 55.79% of the favorable alleles were carried by the accessions collected from Eastern China, 31.14% were carried by accessions from Vietnam, and 9.67% were carried by accessions from Northeast of China. Similarly, the favorable alleles were detected in various accessions for GT and GC were consistent with these results. Based on the above results, during rice evolution and its progression from south to north, some alleles disappeared naturally or through artificial selection while others were retained in cultivars; moreover some appeared in modern cultivars for the first time.
For the GT trait, the broad-sense heritability average across the five environments for all 462 accessions was 95%, which was considerably high. Thus, marker-assisted selection (MAS) could obtain the expected results for improving GT. Among the four SSR loci identified for GT, RM3600 (Chr.9: 17107752-17107843) had the largest -LogP, PVE and AAE + values at 4.92, 10.12% and 1.38, respectively. Among the three favorable alleles detected at this locus, RM3600-90 bp showed the largest positive phenotypic effect value (2.0 grade) for reducing GT. This favorable allele was carried by 86 accessions, and 'Long-dun105' was a typical carrier material. We could improve GT greatly using the combinations described in Table 6.
For the GC trait, the broad-sense heritability average across the five environments for all 462 accessions was 94%. Among the two SSR loci associated with GC, RM5753 (Chr.6: 30452023-30452067) had the maximum PVE and positive AAE values of 10.54% and 20.52, respectively. Among the four favorable alleles found at this locus, RM5753-115bp showed the maximum positive phenotypic effect value (25.73 mm) for increasing the GC. Thirty-three accessions carried the RM5753-115bp allele and its typical cultivar is 'Hongnong5'. GC could be improved by the combinations described in Table 6.
Among the three parameters measured for rice ECQs across the five environments, AC had the highest broad-sense heritability average of 98%. There were four SSR loci associated with AC, of which RM6327 (Chr.11: 364257-364310) had the maximum PVE (12.87% in E2, 11.39% in E3). Among the five negative alleles found at this locus, the favorable allele RM6327-230bp in E2 showed the maximum negative phenotypic effect value (-3.78%) for reducing the AC. Its typical cultivar is 'Jinggunuo'. Thus, AC might be improved by the combinations described in Table 6.
Correlations and allele overlapping among the GT, GC and AC traits were observed; for example, GC was significantly negatively correlated with AC (-0.80**) and GT (-0.17**). Additionally, several cases showed that the same allele was significant for multiple traits. We observed RM5753 as coassociated with GC and AC, in which RM5753-195bp and RM5753-115bp simultaneously increased the phenotypic effect values for GC but reduced the PEV for AC (Table 5). RM6712 was also identified as coassociated with GC and AC, in which the favorable alleles RM6712-95bp and -115bp that increased the phenotypic effect values for GC overlapped with alleles that simultaneously reduced the PEV for AC ( Table 5). The above results confirmed the results from He et al. (1999) and Yoko et al. (2015), who reported that GC and AC were both controlled by the Wx gene and displayed minor effects on the QTLs [10,21]. Moreover, these overlapping alleles have the correct sign with respect to trait correlations, and illuminate the genetic basis of trait correlations.
We found that the geographical and ecological distribution of the experimental populations were sufficient with abundant genetic diversity. In total, 2462 alleles were detected with an average of 9.4 alleles per locus (Additional file 6: Table S5). Meanwhile, the proportion of rare alleles (allele frequency <5%) was 34.44% within the 2462 identified alleles. The slightly high ratio of rare alleles might have been caused by the wide distribution of latitudes for the natural population accessions; this result was consistent with the coefficient of variation for rice ECQs from 30.79% to 56.23%. As the rice cultivation area expanded from the plains to plateaus and from south to north, some new alleles appeared and some original alleles were lost, resulting in the emergence of accessions with rare alleles. Additionally, these results show that the natural population contains sufficient germplasm resources and genetic diversity, which are suitable for mining more favorable alleles for rice ECQs.
Analysis of the geographical distribution of the 7 alleles from RM3600 in the experimental natural population found that 76.44% of the favorable alleles (RM3600-90bp, -85bp, and -80bp) were carried by accessions collected from Eastern China (23°30'-38°23'N) and 10.37% of the favorable alleles were from Northeast of China (53°51'-53°33'N) accessions, but rarely found in the Vietnam (8°30'-23°22'N) accessions. In contrast, the RM3600-125bp and -170bp alleles were only found in the Vietnam accessions (Fig. 5a). We found that the accessions collected from Eastern China, Northeast of China and Japan were mainly japonica rice, which had a lower GT with the positive alleles. In contrast, the accessions collected from Vietnam and parts of the Lower Yangtze Region were mainly indica rice, which have a higher GT with negative alleles. Accessions collected from Eastern China and Vietnam carried abundant alleles and were to the Pearl River in Southern China, which implies that the geographic distribution of these alleles reflect the diffusion and diversification of Asian cultivated rice. Additionally, japonica rice was first domesticated in Southern China [64]. This finding also confirms the results from Khush (1997), who reported that Japonica rice moved north from South China and that artificial selection and adaptation from diverse ecological regions has resulted in cultivar diversity [65].
Vietnam, Southwest of China and part of the Lower Yangtze Region mainly produce indica rice with a hard gel consistency, whereas Northeast China, Northern China, Central China and Japan mainly produced japonica rice with a soft gel consistency. This finding confirmed those reported by Wang et al. (2108), who reported that genetic differentiation of rice developed parallel to the development of ecological diversification and that long-term evolution, artificial selection and different ecological conditions contributed to the differentiation of Asian cultivated rice [66]. The above results revealed the proposed history of the spread of Asian cultivated rice in many parts of Southeast Asia. The expansion of rice cultivation eastwards from Southeast Asia was associated with artificial selection and different ecological conditions, which may have led to the favorable alleles that increased the gel consistency softness and adapted to local ecological conditions.