Genetic Basis of Tiller Dynamics of Rice Revealed by Genome-Wide Association Studies

A tiller number is the key determinant of rice plant architecture and panicle number and consequently controls grain yield. Thus, it is necessary to optimize the tiller number to achieve the maximum yield in rice. However, comprehensive analyses of the genetic basis of the tiller number, considering the development stage, tiller type, and related traits, are lacking. In this study, we sequence 219 Korean rice accessions and construct a high-quality single nucleotide polymorphism (SNP) dataset. We also evaluate the tiller number at different development stages and heading traits involved in phase transitions. By genome-wide association studies (GWASs), we detected 20 significant association signals for all traits. Five signals were detected in genomic regions near known candidate genes. Most of the candidate genes were involved in the phase transition from vegetative to reproductive growth. In particular, HD1 was simultaneously associated with the productive tiller ratio and heading date, indicating that the photoperiodic heading gene directly controls the productive tiller ratio. Multiple linear regression models of lead SNPs showed coefficients of determination (R2) of 0.49, 0.22, and 0.41 for the tiller number at the maximum tillering stage, productive tiller number, and productive tiller ratio, respectively. Furthermore, the model was validated using independent japonica rice collections, implying that the lead SNPs included in the linear regression model were generally applicable to the tiller number prediction. We revealed the genetic basis of the tiller number in rice plants during growth, By GWASs, and formulated a prediction model by linear regression. Our results improve our understanding of tillering in rice plants and provide a basis for breeding high-yield rice varieties with the optimum the tiller number.


Introduction
Tiller number in rice is the major trait determining plant architecture. An excessive tiller number causes too many unproductive tillers, a reduced leaf area, and a reduction in photosynthetic efficiency by mutual shading. By contrast, too few tillers lead to low biomass production and a deficiency in the grain filling capacity and carbohydrate production [1]. It is important to minimize unproductive tillers and increase the sink size to improve the harvest index in rice, as in the case of China's super hybrid regression analysis. Lastly, using independent japonica accessions, we verified the applicability of the detected loci to molecular breeding aimed at optimizing tiller numbers.

Plant Materials and Phenotype Analysis
A total of 266 rice accessions, including 219 Korean rice accessions for GWAS and 47 japonica accessions for the verification of linear regression models, were used. Korean rice accessions included 78 landraces, 130 modern cultivars, and 11 Tongil-type cultivars (Table S1). Modern cultivars and Tongil-type cultivars, derived from a cross between temperate japonica and indica, were considered as separate groups. Forty-seven japonica accessions to verify linear regression models were from four origins: Japan, China, Taiwan, and the USA (Table S2). A total of 32 accessions were provided by the National Agrobiodiversity Center (NAC), Rural Development Administration (RDA), South Korea. The other 234 accessions were conserved at the Agricultural Genetic Resource Center, Seoul National University (SNU), Suwon, South Korea. All plant materials were cultivated in an experimental field at SNU, Suwon, South Korea (natural long-day conditions, latitude = 37 • N). Thirty-day-old seedlings were transplanted into a paddy field under the following conditions: One plant per hill, 25 plants per row, 15 cm between plants in a row, 30 cm between rows, and three rows per accession. All phenotypes were measured in mid-row plants, excluding plants near other accessions and border plants. Tiller number traits, including tiller numbers at the early tillering stage (TNE), maximum tillering stage 1 (TNM1), maximum tillering stage 2 (TNM2), and productive tiller number (PTN), were measured at 18,35,42, and 110 days after transplanting (DAT), respectively (Figure 1c). The productive tiller ratio (PTR) indicated the capacity for developing panicle-emerged tillers from whole tillers. PTR was calculated by the ratio of panicle-emerged tillers (productive tillers) to the maximum potential tillers (PTN/TNM2). The heading date (HD) was defined as the time from the date of sowing to the date at which the first panicle emerged in the plant. Panicle emergence was recoded when a tip of the panicle was visible from the flag leaf sheath. The heading interval (HDI) was defined as the time from the date of emergence of the first panicle to the last panicle in a plant. , maximum tillering stage 1 (TNM1) (e), maximum tillering stage 2 (TNM2) (f), productive tiller number (PTN) (g), productive tiller ratio (PTR) (h), heading interval (HDI) (i), and heading date (HD) (j) across accessions. Horizontal and vertical lines below the histogram represent the range and average value, respectively. M, L, and T indicate modern cultivars, landrace, and Tongil-type cultivars, respectively. (k) Correlation network for tiller number traits and heading traits. A correlation network was built based on Pearson's correlation coefficients ® between traits. Each path indicates a correlation between the two traits. The width and transparency of the line denote the strength of the correlation. Weak correlations with R between −0.3 and 0.3 are not shown. Green and red represent positive and negative correlations, respectively.

NGS Analysis and Genotyping
Total DNA was extracted from 90-day-old leaves of each accession by the CTAB method [24]. DNA was sheared into fragments of 450-500 bp and used for DNA library construction using TruSeq Nano DNA Library Prep kits (Illumina, San Diego, CA, USA) according to the manufacturer's protocol. The library size distribution was checked using the Agilent Technologies 2100 Bioanalyzer and a DNA 1000 chip (Santa Clara, CA, USA). Prepared libraries were quantified by qPCR according to the Illumina qPCR quantification protocol. Whole genome sequencing data were generated on the Illumina HiSeq X system to generate 2 × 150 bp paired-end reads with a sequencing depth of >10× per sample. Raw reads were processed to remove adaptors and low-quality bases using Trimmomatic v0.38 [25] with the parameters ILLUMINACLIP:2:30:10 SLIDINGWINDOW:4:15 MINLEN:50. Reads were aligned to the rice reference genome (Nipponbare, IRGSP v1.0) [26] using the BWA v0.7.17 MEM algorithm with default parameters [27]. Aligned reads were sorted using samtools v1.9 [28], and duplicates were removed using Picard v2.20.2 [29]. Nucleotide variants were called by the HaplotypeCaller function of GATK v4.1.2 [30] with the parameters-max-missing 0.95-minQ 30-minDP 5. In addition, nucleotide variants with proportions of heterozygous genotypes of >0.05 were filtered using the vc.getHetcount command in GATK v4.12.

Population Structure and Genetic Relationships
Linkage disequilibrium (LD)-based SNP pruning was performed using PLINK v1.9 [31] with the command-indep-pairwise 50 5 0.2. In total, 37,009 LD-pruned SNPs (minor allele frequency [MAF] > 0.05) were obtained for analyses of population structure and genetic relationships. Population structure was revealed by multi-dimensional scaling (MDS) analysis conducted using the MDS function of PLINK v1.9. Neighbor-joining (NJ) trees were constructed to infer genetic relationships using MEGA v7 [32].

Genome-Wide Association Mapping
Only 1,509,362 SNPs with MAF > 0.05 and bi-allelic genotypes were included in the GWASs. All GWASs were performed using linear mixed-models (LMM) implemented in FaST-LMM v2.07 [33]. The genetic similarities were used to estimate random effects. The p-value thresholds for genome-wide significance were calculated by dividing the significance level 0.05 by the effective number of independent SNPs (p-value of 1.35 × 10 −6 ) [34]. LD patterns between lead SNPs and the other SNPs were evaluated using PLINK v1.9 [31] with the -r2 command to calculate pairwise genotype correlations (r 2 ). Lead SNPs were defined as the SNPs with the lowest p-value in loci, including significant SNPs. Haplotypes were constructed using all variants, including SNPs and InDels, without consideration of MAF. Individuals containing at least one missing or/and heterozygous genotype were excluded from the haplotype analysis.

Statistical Analysis
All statistical analyses were performed using R studio v1.2.5033 [35]. Pearson's correlation coefficients among all phenotypes were calculated without missing observations using the stats package [36]. A correlation network was constructed with R > |0.3| using the corrr package [37].
Multiple linear regression models for TNM2, PTN, and PTR were estimated using phenotypes as dependent variables and lead SNPs in the GWAS as independent variables. Independent variables consisted of lead SNPs not only for TNM2, PTN, and PTR, but also for earlier traits with correlations. Since these three terminal traits are affected by traits at the previous growth stage, the directions of relationships between traits were determined based on the sequence of growth stages (Figure 1k). Variables for the best linear equation were selected based on Akaike information criterion (AIC) in a stepwise algorithm implemented in the stats package [36]. The relative importance of independent variables in the linear regression equation was estimated using the lmg method of the relaimpo package [38]. Observed values and predicted values were compared using the predict function of the stats package [36].

Estimation of Heritability
GCTA v1.93 [39] was used to estimate the SNP-based heritability of traits as proportion of phenotypic variance of traits explained by the SNP subset [40]. The genetic relationship matrix to estimate genetic relatedness between individuals was calculated using filtered 1,509,362 SNPs. Variance in phenotypes was calculated by the restricted maximum likelihood method with the genetic relatedness and eigenvectors from a principal component analysis.

Population Structure of Korean Rice Accessions
An SNP subset obtained by LD-based pruning was used for analyses. An MDS analysis showed that population genetic structures of Korean rice accessions could be classified into two major groups, japonica and Tongil-type accessions, developed from inter-subspecific crosses between indica and temperate japonica (Figure 1a). Japonica accessions were divided into two sub-groups, modern cultivars, and landrace accessions. An NJ tree revealed a similar pattern to that obtained by MDS plot (Figure 1b).

Variation and Changes in the Tiller Number among Growth Stages
The modern japonica cultivar group showed lower TNE and TNM1 than those of japonica landraces and Tongil-type rice (Figure 1d,e). TNM2 was higher in the Tongil-type cultivar group than in the other groups ( Figure 1f). PTN and PTR in all Korean rice accessions were mainly in the ranges of 10-11 and 0.65-0.7, respectively. The highest average PTN and PTR values were found in the landrace group (Figure 1g,h). Tongil-type cultivars revealed the lowest phenotypic variation for all traits, but showed the highest average values and variation in HDI (Figure 1i). The average HD was lower in the landrace group than in the other groups ( Figure 1j).
Phenotypic relationships among all traits were investigated by correlation analysis (Figure 1k). Three tiller number traits at the vegetative stage (TNE, TNM1, and TNM2) were positively correlated. A positive correlation was found between HD and TNM2. HD was negatively correlated with HDI and PTR. PTR was negatively correlated with TNM2, while the PTN was positively correlated with TNM2 and HDI.
We detected association signals for TNE, represented by the lead SNP chr01:42957568 (p = 6.11 × 10 −7 ), at 42.9 Mb on chromosome 1. A candidate gene, OsRLCK57, in strong LD (r 2 > 0.97) was located about 23 kb from chr01:42957568 (Figure 5f). OsRLCK57 is involved in the modulation of BR signaling and is required to develop tillers and panicle secondary branching [9]. Three OsRLCK57 haplotypes were constructed by three variants (Figure 5g). H1, found in only four accessions, showed higher TNE values than those of other haplotypes (Figure 5h).
Another locus at 24.5-24.8 Mb on chromosome 2, led by chr02:24685790 (p = 7.67 × 10 −8 ), was significantly associated with TNM2. OsTOC1 was located about 113 kb from the lead SNP and was in LD with r 2 > 0.65 (Figure 5i). OsTOC1 is an important circadian clock component and photoperiodic heading regulator [18]. Ten haplotypes were detected based on variants in the genic region of OsTOC1. H1 containing chr02:24572219, an InDel located at the junction between the coding region and 3 UTR, was only detected in landrace accessions, showing the highest TNM2 (Figure 5j,k). H9 and H10 with the chr02:24571309 T allele showed relatively lower TNM2 values than those of the other haplotypes (Figure 5j,k).

Linear Regression Model for Three Tiller-Related Traits
The lead SNPs for TNE, TNM1, TNM2, and HD were included in a linear regression analysis of TNM2 (Figure 6a). A linear regression model consisting of seven independent variables effectively explained variation in TNM2 (R 2 = 0.49) ( Table 2). As determined by estimates of relative importance (i.e., the contribution of an individual variable to R 2 ), chr04:31232808 (23.6%) showed the largest contribution to the linear regression model for TNM2 (Figure 6b). In a linear regression analysis of PTN, six lead SNPs were selected as independent variables, explaining 22.3% of the variance in PTN ( Table 2). Chr02:22024430 showed the highest relative importance of 38.1% (Figure 6c). chr04:31232808 showed high relative importance for PTN in addition to TNM2, accounting for 21.5% of R 2 (Figure 6c). The linear regression model for PTR, consisting of six independent variables, exhibited an R 2 of 0.41 (Table 2). Chr06:21572894 and HD1 showed relatively large contributions to the model, with relative importance values of 36.9% and 16.9% (Figure 6d). Furthermore, an independent test population consisting of 47 japonica accessions collected from several countries was employed to verify the accuracy of the linear regression model (Table S2). TNM2, PTN, and PTR were measured in the test population (Figure 6e-g). Predicted values for TNM2, PTN, and PTR were calculated by applying the regression equations to genotype data for the test population, and were compared with observed values. The correlation coefficients I between predicted and observed values of TNM2, PTN, and PTR were 0.73, 0.49, and 0.6, respectively (Figure 6h-j). Linear regression models for TNM2, PTN, and PTR explained 52.7%, 23.8%, and 36.5% of phenotypic variation, respectively, indicating that the models consisting of lead SNPs consistently explained the three traits with similar accuracy in independent temperate japonica accessions (Figure 6h-j).

Discussion
To identify the genetic mechanisms underlying tiller number, comprehensive investigations involving consecutive observations and accounting for relationships with other traits affecting tillering are necessary. We investigated relationships between tiller numbers and the heading traits, and performed a GWAS to dissect the genetic basis for tiller numbers.

Phenotypic Relationships
We detected a positive correlation between TNM2 and PTN, indicating that TNM2 could directly affect the potential to produce a panicle. TNM2 was also positively correlated with HD, which reflects the transition to reproductive development. A later HD is related to a longer duration of vegetative growth and the continued development of vegetative organs, causing an increase in TNM2. We detected positive correlations between TNM2 and HD, as well as between TNM2 and PTN, but not between HD and PTN. These results suggest that HD is not a direct determinant of PTN and might have an indirect effect via TNM2.
PTN is an important trait affecting grain yield as it directly determines the panicle number per plant. PTN showed a significant positive correlation with HDI (Figure 1k). A longer HDI represents a longer duration of reproductive growth and panicle development. These results suggest that panicle number, derived from PTN, could be increased by nonsynchronous flowering caused by a longer HDI.
Unproductive tillers are involved in discontinuing nutrient and carbohydrate translocation to the tillers from the mother stems; furthermore, they compete with reproductive tillers for nutrients in addition to light [43,44]. Therefore, a high ratio of productive to unproductive tillers (PTR) is considered a desirable trait for high-yielding varieties [2]. We observed a strong negative correlation between HD and PTR ( Figure 1k). As mentioned above, HD was positively correlated with TNM2, but was not significantly correlated with PTN. Additionally, PTN was not correlated with PTR. These results indicate that a higher PTR corresponding with an earlier HD could mainly be explained by reducing TNM2, rather than an increase in PTN.

Association Signals for Tillering and Heading
The tiller number at each growth stage is affected by the tillering capacity at earlier stages [45]. Tillering is controlled by the temporal expression of related genes at various development stages. However, some genetic factors are consistently associated with tillering at several stages [46]. In this study, two lead SNPs, chr01:42957568, and chr03:9436356, were associated with TNE ( Figure 2a). chr02:26906676 and chr02:24685790 were associated with TNM1 and TNM2, respectively, exhibiting stage specific-associations (Figure 2b,c). Two lead SNPs, chr04:31093494 and chr04:31232808, were simultaneously associated with TNM2 and PTN and with TNM1 and TNM2, respectively. These results indicate that TNE was controlled by stage-specific QTLs detected only at certain stages. However, tiller numbers after the early stage, TNM1, TNM2, and PTN were affected by combinations of stage-specific and -nonspecific QTLs.
HD1, known to control photoperiodic heading, was a candidate gene for PTR and HD based on strong association signals within 8.33-8.37 Mb on chromosome 6, although HD1 was located approximately 1 Mb away from this region (Figure 2e,f). In the present study, nonfunctional HD1 resulted from two frame-shifting InDels, chr06:9338004, and chr06:9338220 ( Figure 4c). However, only the SNP subset excluding InDels was used for the GWAS, and this association may, therefore, have been missed. Thus, we also performed an additional GWAS for HD and PTR using all variants, including InDels. However, the most significant association was still not detected in the HD1 region. Instead, chr06:8329287 was the lead SNP (p = 2.14 × 10 −15 ; Figure S1). A similar pattern of association signals for HD have been reported in a previous GWAS using Japanese rice varieties [47] and a diverse collection [48]. The discrepancy between the strongest peak and HD1 could be explained by the presence of several linked genes that contribute to heading across the region and/or allelic heterogeneity. Since a GWAS is based on independent comparisons of phenotypic variation for each polymorphic site, statistical significance is reduced when there are several causative alleles [47]. Similar to previous reports, we concluded that the most significant association signal was not detected in the HD1 region, due to allelic heterogeneity. It is notable that nonfunctional HD1 leads to a higher PTR, as well as early heading (Figure 2e,f). This result supports that earlier heading date confers a higher PTR, as mentioned above, and suggests that PTR could be improved by the allele of HD1 related to early heading.
Previous reports have shown that variation in the tiller number is mainly explained by genes directly regulating axillary meristem formation [4,5,49] and genes involved in phytohormone signaling [6][7][8][9]. However, in the present study, we found three candidate genes (OsHAM1, OsHAM2, and OsTOC1) involved in the developmental phase transition near lead SNPs associated with TNM1 and TNM2 (Table 1). In rice, heading results from a developmental switch from the vegetative to reproductive phase, and the repression of heading is involved in the maintenance of vegetative growth, including tiller development. These results imply that natural variation in the tiller number in Korean rice accessions is mainly modulated by genes involved in developmental phase transitions, rather than by genes directly regulating tiller development.

Genetic Determinants of TNM2, PTN, and PTR
To obtain a comprehensive understanding of the genetic basis of TNM2, PTN, and PTR, multiple linear regression models were estimated. The low R 2 value for the linear regression model for PTN (0.22) indicated that a relatively small portion of the variance in PTN could be explained by genetic factors associated with the tiller number and heading traits ( Table 2). This result suggests that other genetic variants should be additionally considered to sufficiently explain PTN variation. Based on the relative importance of independent variables in linear regression models, TNM2 was largely contributed by chr04:3123280 (23.6%), the lead SNP for TNM1 and TNM2, followed by chr02:26906676 (17.9%), the lead SNP for TNM1 (Figure 6b). For PTN, chr02:22024430, the lead SNP for HDI, was the major contributor (38.1%; Figure 6c). Chr06:21572894, HD1, and chr01:34991240, which were associated with HD, were major contributors to PTR, accounting for 36.9%, 16.9%, and 14.6% of R 2 , respectively ( Figure 6d). These results indicate that genetic variants associated with earlier stage traits and correlated traits could also influence TNM2, PTN, and PTR, together with lead SNPs for terminate traits (TNM2, PTN, and PTR).
In a previous study of progeny populations derived from various crosses, the heritability of tiller numbers was 37.8-97.7%, depending on the genetic background of the parents [50]. The heritability of tiller numbers in single-segment substitution lines varied from 0% to 39.1% throughout plant growth from 7 to 63 days after transplanting (DAT), and the highest heritability was observed at 42 DAT [51]. In the USDA minicore rice diversity panel, SNP-based heritability of tiller numbers at 60 days after emergence was in the range from 20-25% [52]. These results indicate that the heritability of tiller numbers varies substantially depending on the growth stage, genetic population, and estimation method. In the present study, the heritability of tiller numbers varied from 39.2% (TNI; DAT 18) to 77% (TNM2; 42 DAT), showing substantial variation among growth stages ( Figure S2). Furthermore, the heritability of the other tiller number traits PTN and PTR was quite high (i.e., 58.1% and 70.1%, respectively), indicating that a higher proportion of variation of tiller number traits could be explained by genetic factors. Linear regression models estimated using significant SNPs could explain 49%, 22%, and 41% of phenotypic variation in TNM2, PTN, and PTR, respectively (Table 2). Taken together, the small proportion of phenotypic variation explained by multiple linear regression models might be attributable to the use of significant SNPs in the GWAS, while heritability was estimated using all SNPs. Regardless, the multiple linear regression models and genetic variants used as independent variables could be effective molecular tools for the prediction of TNM2, PTN, and PTR in rice breeding programs.

Conclusions
In this study, we dissected the basis of rice tiller numbers by analyzing relationships with heading traits. We also revealed the genetic basis of tiller alterations at different growth stages, using GWASs. Several candidate genes were detected in loci significantly associated with tiller number. OsRLCK57 involved in tiller development was associated with the tiller number at the early tillering stage. OsHAM1, OsHAM2, and OsTOC1, which were related to the developmental phase transition, were associated with the tiller number at maximum tillering stages. HD1 controlling flowering time was associated with the productive tiller ratio. Taken together, these results suggest that genes involved in developmental phase transitions, along with gene modulating tiller development, could also determine the rice tillering pattern at different growth stages. Our results provide insight into the genetic basis of overall tillering dynamics.

Supplementary Materials:
The following are available online at http://www.mdpi.com/2223-7747/9/12/1695/s1, Table S1. Information for 219 Korean rice accessions used in the study. Table S2. Information for 47 japonica rice accessions used for the verification of linear regression models. Figure S1. Association signals for HD on chromosome 6 by a GWAS using all variants, including InDels. Figure S2. SNP-based heritability of seven traits.
Author Contributions: S.J. and S.Z. designed the experiment and prepared the manuscript. S.Z. conducted the field experiments and collected phenotype data. S.J. and Y.K.L. prepared genotype data. S.J. performed the computational analysis. D.-G.K. and Z.J. revised original manuscript. H.-J.K. participated in the supervision of the overall work and contributed to the finalization of the manuscript. All authors read and approved the manuscript.
Funding: This study was supported by a grant from the Next-Generation BioGreen 21 Program (no. PJ013165) of the Rural Development Administration (RDA), Republic of Korea. The funding agency had no role in the experimental design, data collection and analysis, and preparation of the manuscript.

Conflicts of Interest:
The authors declare that they have no competing interests.